Eliminate hyphen from AppleScript's text item delimiters - applescript

I am having problems separating words from each other when it comes to equations, because I can't separate the equation into two parts if there is a negative variable involved.
set function to "-3x"
return word 1 of function
That would return "3x", because a hyphen is a text item delimiter, but I want it to return "-3x". Is there any way to remove the hyphen from the text item delimiters, or any other way to include the hyphen in the string?

To give an idea, here's a simple tokenizer for a very simple Lisp-like language:
-- token types
property StartList : "START"
property EndList : "END"
property ANumber : "NUMBER"
property AWord : "WORD"
-- recognized token chars
property _startlist : "("
property _endlist : ")"
property _number : "+-.1234567890"
property _word : "abcdefghijklmnopqrstuvwxyz"
property _whitespace : space & tab & linefeed & return
to tokenizeCode(theCode)
considering diacriticals, hyphens, punctuation and white space but ignoring case and numeric strings
set i to 1
set l to theCode's length
set tokensList to {}
repeat while i ≤ l
set c to character i of theCode
if c is _startlist then
set end of tokensList to {tokenType:StartList, tokenText:c}
set i to i + 1
else if c is _endlist then
set end of tokensList to {tokenType:EndList, tokenText:c}
set i to i + 1
else if c is in _number then
set tokenText to ""
repeat while i ≤ l and character i of theCode is in _number -- check the bounds first so input ending in a number doesn't error
set tokenText to tokenText & character i of theCode
set i to i + 1
end repeat
set end of tokensList to {tokenType:ANumber, tokenText:tokenText}
else if c is in _word then
set tokenText to ""
repeat while i ≤ l and character i of theCode is in _word -- bounds check first, as above
set tokenText to tokenText & character i of theCode
set i to i + 1
end repeat
set end of tokensList to {tokenType:AWord, tokenText:tokenText}
else if c is in _whitespace then -- skip over white space
repeat while i ≤ l and character i of theCode is in _whitespace -- bounds check first, as above
set i to i + 1
end repeat
else
error "Unknown character: '" & c & "'"
end if
end repeat
return tokensList
end considering
end tokenizeCode
The syntax rules for this language are as follows:
A number expression contains one or more digits, "+" or "-" signs, and/or decimal points. (The above code currently doesn't check that the token is a valid number, e.g. it'll happily accept nonsensical input like "0.1.2-3+", but that's easy enough to add.)
A word expression contains one or more characters (a-z).
A list expression begins with a "(" and ends with a ")". The first token in a list expression must be the name of the operator to apply; this may be followed by zero or more additional expressions representing its operands.
Any unrecognized characters are treated as an error.
For example, let's use it to tokenize the mathematical expression "3 + (2.5 * -2)", which in prefix notation is written like this:
set programText to "(add 3 (multiply 2.5 -2))"
set programTokens to tokenizeCode(programText)
--> {{tokenType:"START", tokenText:"("},
{tokenType:"WORD", tokenText:"add"},
{tokenType:"NUMBER", tokenText:"3"},
{tokenType:"START", tokenText:"("},
{tokenType:"WORD", tokenText:"multiply"},
{tokenType:"NUMBER", tokenText:"2.5"},
{tokenType:"NUMBER", tokenText:"-2"},
{tokenType:"END", tokenText:")"},
{tokenType:"END", tokenText:")"}}
Once the text is split up into a list of tokens, the next step is to feed that list into a parser which assembles it into an abstract syntax tree which fully describes the structure of the program.
Like I say, there's a bit of a learning curve to this stuff, but you can write it in your sleep once you've grasped the basic principles. Ask and I'll add an example of how to parse these tokens into usable form later.

Following on from before, here's a parser that turns the tokenizer's output into a tree-based data structure that describes the program's logic.
-- token types
property StartList : "START"
property EndList : "END"
property ANumber : "NUMBER"
property AWord : "WORD"
-------
-- handlers called by Parser to construct Abstract Syntax Tree nodes,
-- simplified here for demonstration purposes
to makeOperation(operatorName, operandsList)
return {operatorName:operatorName, operandsList:operandsList}
end makeOperation
to makeWord(wordText)
return wordText
end makeWord
to makeNumber(numberText)
return numberText as number
end makeNumber
-------
-- Parser
to makeParser(programTokens)
script ProgramParser
property currentToken : missing value
to advanceToNextToken()
if programTokens is {} then error "Found unexpected end of program after '" & currentToken's tokenText & "'."
set currentToken to first item of programTokens
set programTokens to rest of programTokens
return
end advanceToNextToken
--
to parseOperation() -- parses an '(OPERATOR [OPERANDS ...])' list expression
advanceToNextToken()
if currentToken's tokenType is AWord then -- parse 'OPERATOR'
set operatorName to currentToken's tokenText
set operandsList to {}
advanceToNextToken()
repeat while currentToken's tokenType is not EndList -- parse 'OPERAND(S)'
if currentToken's tokenType is StartList then
set end of operandsList to parseOperation()
else if currentToken's tokenType is AWord then
set end of operandsList to makeWord(currentToken's tokenText)
else if currentToken's tokenType is ANumber then
set end of operandsList to makeNumber(currentToken's tokenText)
else
error "Expected word, number, or list expression but found '" & currentToken's tokenText & "' instead."
end if
advanceToNextToken()
end repeat
return makeOperation(operatorName, operandsList)
else
error "Expected operator name but found '" & currentToken's tokenText & "' instead."
end if
end parseOperation
to parseProgram() -- parses the entire program
advanceToNextToken()
if currentToken's tokenType is StartList then
return parseOperation()
else
error "Found unexpected '" & currentToken's tokenText & "' at start of program."
end if
end parseProgram
end script
end makeParser
-------
-- parse the tokens list produced by the tokenizer into an Abstract Syntax Tree
set programTokens to {{tokenType:"START", tokenText:"("}, ¬
{tokenType:"WORD", tokenText:"add"}, ¬
{tokenType:"NUMBER", tokenText:"3"}, ¬
{tokenType:"START", tokenText:"("}, ¬
{tokenType:"WORD", tokenText:"multiply"}, ¬
{tokenType:"NUMBER", tokenText:"2.5"}, ¬
{tokenType:"NUMBER", tokenText:"-2"}, ¬
{tokenType:"END", tokenText:")"}, ¬
{tokenType:"END", tokenText:")"}}
set parserObject to makeParser(programTokens)
set abstractSyntaxTree to parserObject's parseProgram()
--> {operatorName:"add", operandsList:{3, {operatorName:"multiply", operandsList:{2.5, -2}}}}
The ProgramParser object is a very, very simple recursive descent parser, a collection of handlers, each of which knows how to turn a sequence of tokens into a specific data structure. In fact, the Lisp-y syntax used here is so simple it really only requires two handlers: parseProgram, which gets everything underway, and parseOperation, which knows how to read the tokens that make up a (OPERATOR_NAME [OPERAND1 OPERAND2 ...]) list and turn it into a record that describes a single operation (add, multiply, etc) to be performed.
The nice thing about an AST, especially a very simple regular one like this, is you can manipulate it as data in its own right. For instance, given the program (multiply x y) and a definition of y = (add x 1), you could walk the AST and replace any mention of y with its definition, in this case giving (multiply x (add x 1)). In other words, you can do not only arithmetic calculations (algorithmic programming) but algebraic manipulations (symbolic programming) too. That's a bit heady for here, but I'll see about knocking together a simple arithmetical evaluator for later.
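To make the substitution idea concrete, here's a rough sketch in Python rather than AppleScript (just because it's easy to run anywhere). The representation is my own assumption, not what the parser above produces: an operation is a tuple of (operatorName, operand, ...), a word is a string, and a number is a plain number. The substitute function and the definitions dictionary are hypothetical names.

```python
def substitute(node, definitions):
    """Return a copy of node with every defined word replaced by its definition."""
    if isinstance(node, str):
        # a word: swap in its definition if there is one, else leave it alone
        return definitions.get(node, node)
    if isinstance(node, tuple):
        # an operation: keep the operator name, rewrite each operand recursively
        operator, *operands = node
        return (operator, *(substitute(operand, definitions) for operand in operands))
    return node  # a number: nothing to replace


program = ("multiply", "x", "y")
definitions = {"y": ("add", "x", 1)}
print(substitute(program, definitions))
```

This makes a single pass only: a definition that itself mentions other defined words is inserted as-is, so you'd need to repeat the walk (and guard against circular definitions) for anything more ambitious.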

To finish off, here's a simple evaluator for the parser's output:
to makeOperation(operatorName, operandsList)
if operatorName is "add" then
script AddOperationNode
to eval(env)
if operandsList's length ≠ 2 then error "Wrong number of operands."
return ((operandsList's item 1)'s eval(env)) + ((operandsList's item 2)'s eval(env))
end eval
end script
else if operatorName is "multiply" then
script MultiplyOperationNode
to eval(env)
if operandsList's length ≠ 2 then error "Wrong number of operands."
return ((operandsList's item 1)'s eval(env)) * ((operandsList's item 2)'s eval(env))
end eval
end script
-- define more operations here as needed...
else
error "Unknown operator: '" & operatorName & "'"
end if
end makeOperation
to makeWord(wordText)
script WordNode
to eval(env)
return env's getValue(wordText)'s eval(env)
end eval
end script
end makeWord
to makeNumber(numberText)
script NumberNode
to eval(env)
return numberText as number
end eval
end script
end makeNumber
to makeEnvironment()
script EnvironmentObject
property _storedValues : {}
--
to setValue(theKey, theValue)
-- theKey : text
-- theValue : script
repeat with aRef in _storedValues
if aRef's k is theKey then
set aRef's v to theValue
return
end if
end repeat
set end of _storedValues to {k:theKey, v:theValue}
return
end setValue
--
to getValue(theKey)
repeat with aRef in _storedValues
if aRef's k is theKey then return aRef's v
end repeat
error "'" & theKey & "' is undefined." number -1728
end getValue
--
end script
end makeEnvironment
to runProgram(programText, theEnvironment)
set programTokens to tokenizeCode(programText)
set abstractSyntaxTree to makeParser(programTokens)'s parseProgram()
return abstractSyntaxTree's eval(theEnvironment)
end runProgram
This replaces the make... handlers used to test the parser with new handlers that construct full-blown objects representing each type of structure that can make up an Abstract Syntax Tree: numbers, words, and operations. Each object defines an eval handler that knows how to evaluate that particular structure: in a NumberNode it simply returns the number, in a WordNode it retrieves and evaluates the structure stored under that name, in an AddOperationNode it evaluates each operand then sums them, and so on.
For example, to evaluate our original 3 + 2.5 * -2 program:
set theEnvironment to makeEnvironment()
runProgram("(add 3 (multiply 2.5 -2))", theEnvironment)
--> -2.0
In addition, an EnvironmentObject is used to store named values. For example, to store a value named "x" for use by a program:
set theEnvironment to makeEnvironment()
theEnvironment's setValue("x", makeNumber(5))
runProgram("(add 3 x)", theEnvironment)
--> 8
Obviously this will need a bit more work to make it into a proper calculator: a full set of operator definitions, better error reporting, and so on. Plus you'll probably want to replace the parenthesized prefix syntax with a more familiar infix syntax, for which you'll need something like a Pratt parser that can handle precedence, associativity, etc. But once you've got the basics working it's just a matter of reading up on the various techniques and making changes and improvements one by one until you arrive at the desired solution. HTH.
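For a taste of what the infix version involves, here's a minimal Pratt-style (precedence climbing) parser sketched in Python; translating it to AppleScript is left as an exercise. Everything in it is an assumption made for illustration: the binding power numbers, the operator names "subtract", "divide" and "negate", and the tuple-based tree, which is not the record structure used above. It handles only the four left-associative arithmetic operators, unary minus, and parentheses.

```python
import re

# Binding power of each infix operator: higher binds tighter (illustrative values).
BINDING_POWER = {"+": 10, "-": 10, "*": 20, "/": 20}
OPERATOR_NAMES = {"+": "add", "-": "subtract", "*": "multiply", "/": "divide"}
PREFIX_POWER = 30  # unary minus binds tighter than any infix operator


def tokenize(text):
    tokens = re.findall(r"\d+\.?\d*|\.\d+|[A-Za-z]+|[-+*/()]", text)
    # findall silently skips anything it can't match, so check nothing was dropped
    if "".join(tokens) != "".join(text.split()):
        raise SyntaxError("Unknown character in input")
    return tokens


def parse(text):
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = peek()
        if token is None:
            raise SyntaxError("Unexpected end of input")
        pos += 1
        return token

    def parse_expression(min_power):
        token = advance()
        if token == "(":
            left = parse_expression(0)
            if advance() != ")":
                raise SyntaxError("Expected ')'")
        elif token == "-":
            operand = parse_expression(PREFIX_POWER)
            # fold the sign into number literals, otherwise build a negate node
            left = -operand if isinstance(operand, (int, float)) else ("negate", operand)
        elif token in BINDING_POWER or token == ")":
            raise SyntaxError(f"Unexpected {token!r}")
        elif token[0].isalpha():
            left = token  # a word
        else:
            left = float(token) if "." in token else int(token)
        # keep absorbing infix operators that bind tighter than our caller's operator
        while peek() in BINDING_POWER and BINDING_POWER[peek()] > min_power:
            operator = advance()
            right = parse_expression(BINDING_POWER[operator])
            left = (OPERATOR_NAMES[operator], left, right)
        return left

    tree = parse_expression(0)
    if peek() is not None:
        raise SyntaxError(f"Unexpected {peek()!r}")
    return tree


print(parse("3 + 2.5 * -2"))
```

Using a strict "greater than" comparison in the loop is what makes the operators left-associative, so "10 - 4 - 3" groups as (10 - 4) - 3. A right-associative operator such as exponentiation would parse its right-hand side with a slightly lower power instead.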

You can write a calculator in AppleScript if you wish to, but you need to do it as you would in any other language: 1. using a tokenizer to split the input text into a list of tokens, 2. feeding those tokens to a parser which assembles them into an abstract syntax tree, and 3. evaluating that tree to produce a result.
For what you're doing, you could probably write your tokenizer as a regular expression (assuming you don't mind dipping down to NSRegularExpression via the AppleScript-ObjC bridge). For parsing, I recommend reading up on Pratt parsers, which are easy to implement yet powerful enough to support prefix, infix, and postfix operators and operator precedence. For evaluation, a simple recursive AST walking algorithm may well be sufficient, but one step at a time.
These are all well-solved problems, so you won't have any trouble finding tutorials and other online information on how to do it. (Lots of crap, of course, so be prepared to spend some time figuring out how to tell the good from the bad.)
Your one problem is that none will be written specifically for AppleScript, so be prepared to spelunk material written around other languages (Python, Java, etc, etc) and translate from that to AS yourself. That'll require some effort and patience wading through all the programmer-speak, but is eminently doable (I originally cut my teeth on AppleScript and now write my own automation scripting languages) and a great learning exercise for developing your skills.
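As an example of the kind of material you'd be translating from, here's a rough sketch of a regular-expression tokenizer in Python. The pattern and the token type names are my own assumptions, chosen to match the simple Lisp-like examples in this thread; an NSRegularExpression version would need its own pattern syntax and plumbing. Note that the sign is treated as part of the number, which is exactly what the original "-3x" question needs.

```python
import re

# One named alternative per token type (names are illustrative).
TOKEN_PATTERN = re.compile(r"""
    (?P<START>\()                           # start of a list expression
  | (?P<END>\))                             # end of a list expression
  | (?P<NUMBER>[+-]?(?:\d+\.?\d*|\.\d+))    # optional sign, digits, optional decimal point
  | (?P<WORD>[A-Za-z]+)                     # operator or variable name
  | (?P<SKIP>\s+)                           # white space, discarded
""", re.VERBOSE)


def tokenize(code):
    tokens = []
    pos = 0
    while pos < len(code):
        match = TOKEN_PATTERN.match(code, pos)
        if match is None:
            raise ValueError(f"Unknown character: {code[pos]!r}")
        if match.lastgroup != "SKIP":
            # lastgroup is the name of the alternative that matched
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


print(tokenize("-3x"))
print(tokenize("(add 3 (multiply 2.5 -2))"))
```

Because a sign is always glued to the digits that follow it, this only suits the prefix syntax; for infix input such as "3-2" you'd tokenize the sign as a separate operator and let the parser decide whether it is unary or binary.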

Related

Calculator with function Add concatenates result instead of addition [duplicate]

On every site that talks about VBScript, the '&' operator is listed as the string concatenation operator. However, in some code that I have recently inherited, I see the '+' operator being used and I am not seeing any errors as a result of this. Is this an accepted alternative?
The & operator does string concatenation, that is, forces operands to be converted to strings (like calling CStr on them first). +, in its turn, forces addition if one of the expressions is numeric. For example:
1 & 2
gives you 12, whereas
1 + 2
"1" + 2
1 + "2"
give you 3.
So, it is recommended to use & for string concatenation since it eliminates ambiguity.
The + operator is overloaded, whereas the & operator is not. The & operator only does string concatenation. In some circles the & operator is used as a best practice because it is unambiguous, and therefore cannot have any unintended effects as a result of the overloading.
The + operator might backfire when strings can be interpreted as numbers. If you don't want nasty surprises, use & to concatenate strings.
In some cases the + will throw an exception; for example the following:
Sub SimpleObject_FloatPropertyChanging(fvalue, cancel)
'fvalue is a floating point number
MsgBox "Received Event: " + fvalue
End Sub
You will get an exception when the COM object source fires the event - you must do either of the following:
MsgBox "Received Event: " & fvalue
or
MsgBox "Received Event: " + CStr(fvalue)
It may be best in either case to use CStr(value); but using & per above comments for string concatenation is almost always best practice.

VB Control Naming by Variable

Dim LineNo as Integer
LineNo = CStr(channel) 'This can have a value of 1 to 100
If LineNo = 1 then
Text1.Text = "Line one selected"
Elseif LineNo = 2 then
Text2.Text = "Line one selected"
'Etc etc
End if
I need to replace the number "1" in Text1.Text and every other TextBox with the value of LineNo? For example:
Text{LineNo}.Text
So I would not have to do a repeated "If" and have a smaller one line code like this:
Text{LineNo}.Text = "Line " & LineNo & " selected"
How would I do this?
Look into a Control array of text boxes. You could have txtLine(), for example, indexed by the channel number.
LineNo = CStr(channel)
txtLine(channel).Text = "Line " & LineNo & " selected"
To create the array, set the Index property of each of the text boxes to an increasing integer, starting at 0.
If you have a finite and relatively small number, you can use a property. I've used this approach with up to 30+ elements in a pinch. Super simple, easy pattern to recognize and replicate in other places. A bit of a pain if the number of elements changes in the future, but extensible nevertheless.
It uses the Choose function, which takes an index N and returns the Nth element (1-based), hence, the check makes sure that N is > 0 and <= MAX (which you would configure).
Public Property Get txt(ByVal N As Long) As TextBox
Const MAX As Long = 10
If N <= 0 Or N > MAX Then Exit Property ' Will return Nothing. You could return the bound element if you prefer
' Choose takes the 1-based index first, then the list of choices
Set txt = Choose(N, Text1, Text2, Text3, Text4, Text5, Text6, Text7, Text8, Text9, Text10)
End Property
Then, you can simply reference them with the Property, much like an alias:
txt(1).Text = "Line 1 text"
txt(2).Text = "Line 2 text"
If you have an arbitrary number, then you are likely using a control array already, which is simpler because it can be referenced by Index already, so you can directly reference it.
If neither of these work for you and you have a very large number of controls, you can scan the Controls collection in a similar Property, attempting to match ctrl.Name with the pattern of your choice (e.g., matching the first 4 characters to the string "Text", thus matching Text1, Text2, etc, for an unlimited number). Theoretically, this should be future-proofed, but that's just theoretical, because anything can happen. What it does do for you is to encapsulate the lookup in a way that "pretends" to be a control array. Same syntax, just you control the value.
Public Property Get txt(ByVal N As Long) As TextBox
Dim I As Long
For I = 0 To Controls.Count - 1 ' Controls is zero-based
' Perform whatever check you need to. Obviously, if you have a "Label" named
' "Text38", the assignment will throw an error (`TextBox = Label` doesn't work).
' Here the name must be "Text" followed by the requested number, e.g. "Text7" for N = 7.
If Controls(I).Name = "Text" & N Then
Set txt = Controls(I)
Exit For
End If
Next
' If you want the Property to never return null, you could uncomment the following line (preventing dereference errors if you insist on the `.Text` for setting/getting the value):
' If txt Is Nothing Then Set txt = Text1
End Property
Use the same way as above: txt(n).Text = "..."

Usage of + operator in different situations in VBScript

What is the Value of the below in vbscript
1)x=1+"1"
2)x="1"+"1"
3)x=1+"mulla"
Note: In all three cases above, the first operand is either a string or an integer, and the second is always a string.
Case 1: Acting as a numeric, with automatic conversion to numeric during the operation
y=inputbox("Enter a numeric value","") Rem I am using 1 as input
x=1
msgbox x+y Rem value is 2
msgbox x*y Rem value is 1
Case 2: Acting as a string, with no conversion to numeric during the operation, so it fails
y=inputbox("Enter a numeric value","") Rem I am using 1 as input
x=1
if y= x then
msgbox "pass"
else
msgbox "fail"
end if
Case 3: Acting as a string, with explicit conversion to numeric during the operation, so it passes
y=inputbox("Enter a numeric value","") Rem I am using 1 as input
x=1
if Cint(y) = x then
msgbox "pass"
else
msgbox "fail"
end if
I need a logical reason for the different behaviors. In other languages this is straightforward and works as expected.
Reference: Addition Operator (+) (VBScript)
Although you can also use the + operator to concatenate two character strings, you should use the & operator for concatenation to eliminate ambiguity. When you use the + operator, you may not be able to determine whether addition or string concatenation will occur.
The type of the expressions determines the behavior of the + operator in the following way:
If both expressions are numeric, the result is the sum of the two numbers.
If both expressions are strings, the result is the concatenation of the two strings.
If one expression is numeric and the other is a string, the string is converted to a number and the two are added; a type mismatch error is thrown only when the string cannot be converted to a number.
When working with mixed data types it is best to cast your variables into a common data type using a Type Conversion Function.
I agree with most of what #thomas-inzina has said, but the OP has asked for a more detailed explanation, so here goes.
As #thomas-inzina points out, using + is dangerous when working with strings and can lead to ambiguity depending on how you combine different values.
VBScript is a scripting language and, unlike its big brothers (VB, VBA and VB.Net), it's typeless only (some debate about VB and VBA also being able to be typeless, but that's another topic entirely), which means it uses one data type known as Variant. A Variant can infer other data types such as Integer, String, DateTime etc., which is where the ambiguity can arise.
This means that you can get some unexpected behaviour when using + instead of &, as + is not only a concatenation operator when used with strings but also an addition operator when working with numeric data types.
Dim x: x = 1
Dim y: y = "1"
WScript.Echo x + y
Output:
2
Dim x: x = "1"
Dim y: y = "1"
WScript.Echo x + y
Output:
11
Dim x: x = 1
Dim y: y = 1
WScript.Echo x + y
Output:
2
Dim x: x = 1
Dim y: y = "a"
WScript.Echo x + y
Output:
Microsoft VBScript runtime error (4, 5) : Type mismatch: '[string: "a"]'
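If it helps to see the decision rules spelled out, here's a small Python model of how + behaved in the four examples above. It is only a sketch under assumptions: vbscript_plus is a hypothetical name, it ignores Null, Empty and the other Variant subtypes, and Python's idea of which strings convert to numbers is not identical to VBScript's.

```python
def vbscript_plus(a, b):
    """Rough model of VBScript's + for numbers and strings only."""
    if isinstance(a, str) and isinstance(b, str):
        return a + b  # both strings: concatenate

    def to_number(value):
        if not isinstance(value, str):
            return value  # already numeric
        for convert in (int, float):
            try:
                return convert(value)
            except ValueError:
                pass
        # mirrors the "Type mismatch" runtime error
        raise TypeError(f"Type mismatch: {value!r}")

    # at least one operand is numeric: convert any string and add
    return to_number(a) + to_number(b)


print(vbscript_plus(1, "1"))
print(vbscript_plus("1", "1"))
print(vbscript_plus(1, 1))
```

Calling vbscript_plus(1, "a") raises TypeError, which corresponds to the last example. Seeing the rules written out like this is a fair argument for sticking with & whenever you mean concatenation.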

Classic ASP InStr() Evaluates True on Empty Comparison String

I ran into an issue with the Classic ASP VbScript InStr() function. As shown below, the second call to InStr() returns 1 when searching for an empty string in a non empty string. I'm curious why this is happening.
' InStr Test
Dim someText : someText = "So say we all"
Dim emptyString : emptyString = ""
'' I expect this to be true
If inStr(1,someText,"so",1) > 0 Then
Response.write ( "I found ""so""<br />" )
End If
'' I expect this to be false
If inStr(1, someText, emptyString, 1) > 0 Then
Response.Write( "I found an empty string<br />" )
End If
EDIT:
Some additional clarification: The reason for the question came up when debugging legacy code and running into a situation like this:
Function Go(value)
If InStr(1, "Option1|Option2|Option3", value, 1) > 0 Then
' Do some stuff
End If
End Function
In some cases function Go() can get called with an empty string. The original developer's intent was not to check whether value was empty, but rather, whether or not value was equal to one of the piped delimited values (Option1,Option2, etc.).
Thinking about this further it makes sense that every string is created from an empty string, and I can understand why a programming language would assume a string with all characters removed still contains the empty string.
What doesn't make sense to me is why programming languages are implementing this. Consider these 2 statements:
InStr("so say we all", "s") '' evaluates to 1
InStr("so say we all", "") '' evaluates to 1
The InStr() function will return the position of the first occurrence of one string within another. In both of the above cases, the result is 1. However, position 1 always contains the character "s", not an empty string. Furthermore, using another string function like Len() or LenB() on an empty string alone will result in 0, indicating a character length of 0.
It seems that there is some inconsistency here. The empty string contained in all strings is not actually a character, but the InStr() function is treating it as one when other string functions are not. I find this to be unintuitive and unnecessary.
The Empty String is the Identity Element for Strings:
The identity element I (also denoted E, e, or 1) of a group or related
mathematical structure S is the unique element such that Ia=aI=a for
every element a in S. The symbol "E" derives from the German word for
unity, "Einheit." An identity element is also called a unit element.
If you add 0 to a number n the result is n; if you add/concatenate "" to a string s the result is s:
>> WScript.Echo CStr(1 = 1 + 0)
>> WScript.Echo CStr("a" = "a" & "")
>>
True
True
So every String and SubString contains at least one "":
>> s = "abc"
>> For p = 1 To Len(s)
>> WScript.Echo InStr(p, s, "")
>> Next
>>
1
2
3
and Instr() reports that faithfully. The docs even state:
InStr([start, ]string1, string2[, compare])
...
The InStr function returns the following values:
...
string2 is zero-length: returns start
WRT your
However, position 1 always contains the character "s", not an empty
string.
==>
Position 1 always contains the character "s", and therefore an empty
string too.
I'm puzzled why you think this behavior is incorrect. To the extent that asking Does 'abc' contain ''? even makes sense, the answer has to be "yes": All strings contain the empty string as a trivial case. So the answer to your "why is this happening" question is because it's the only sane thing to do.
It is correct, IMHO. At least, I expect the empty string to be part of any other string. But maybe this is a philosophical question. ASP does it this way, so live with it. Practically speaking, if you need different behavior, write your own method, InStrNotEmpty or something, which returns false on an empty search string.
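For what it's worth, other languages behave the same way, and the legacy Go() check is arguably better written as an exact comparison against the list of options than as a substring search. A sketch in Python, where instr and is_known_option are hypothetical helpers (instr only imitates the 1-based, case-insensitive call used in the question, not every corner of InStr):

```python
def instr(text, search, start=1):
    """1-based position of search in text, or 0 if absent (case-insensitive)."""
    return text.lower().find(search.lower(), start - 1) + 1


def is_known_option(value, options="Option1|Option2|Option3"):
    """True only when value equals one whole pipe-delimited option."""
    return value.lower() in (option.lower() for option in options.split("|"))


print(instr("So say we all", "so"))   # found at the first position
print(instr("So say we all", ""))     # the empty string is "found" there too
print(is_known_option(""))            # but it is not one of the options
print(is_known_option("option2"))
```

The exact comparison also rejects values like "Option" or "1|O", which the substring test in Go() would happily accept.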

Can a whose clause be used to filter text element lists such as words, characters and paragraphs

I have the following working example AppleScript snippet:
set str to "This is a string"
set outlist to {}
repeat with wrd in words of str
if wrd contains "is" then set end of outlist to wrd
end repeat
I know the whose clause in AppleScript can often be used to replace repeat loops such as this to significant performance gain. However in the case of text element lists such as words, characters and paragraphs I haven't been able to figure out a way to make this work.
I have tried:
set outlist to words of str whose text contains "is"
This fails with:
error "Can’t get {\"This\", \"is\", \"a\", \"string\"} whose text contains \"is\"." number -1728
, presumably because "text" is not a property of the text class. Looking at the AppleScript Reference for the text class, I see that "quoted form" is a property of the text class, so I half expected this to work:
set outlist to words of str whose quoted form contains "is"
But this also fails, with:
error "Can’t get {\"This\", \"is\", \"a\", \"string\"} whose quoted form contains \"is\"." number -1728
Is there any way to replace such a repeat loop with a whose clause in AppleScript?
From page 534 (working with text) of AppleScript 1-2-3
AppleScript does not consider paragraphs, words, and characters to be
scriptable objects that can be located by using the values of their
properties or elements in searches using a filter reference, or whose
clause.
Here is another approach:
set str to "This is a string"
set outlist to paragraphs of (do shell script "grep -o '\\w*is\\w*' <<< " & quoted form of str)
As #adayzdone has shown, it looks like you are out of luck with that.
But you could try using the offset command like this.
set wrd to "I am here"
set outlist to {}
set str to " This is a word"
if ((offset of space & "is" & space in str) as integer) is greater than 0 then set end of outlist to wrd
Note the spaces around "is" . This makes sure Offset is finding a whole word. Offset will find the first matching "is" in "This" otherwise.
UPDATE.
To use it as the OP wants
set wrd to "I am here"
set outlist to {}
set str to " This is a word"
repeat with wrd in words of str
if ((offset of "is" in wrd) as integer) is greater than 0 then set end of outlist to (wrd as string)
end repeat
-->{"This", "is"}
