If string does NOT contain a REGEX - applescript

I'm looking for a way to write the following javascript code in applescript: If the condition is false then I want to do something.
var regEx = /\d{5}/g;
var str = 'This string contains 12345';
if (!regEx.test(str)){
do something
}
Below is the applescript I started but it doesn't work.
set str to 'This string contains 12345'
set regEx to <NOT SURE HOW APPLESCRIPT HANDLES THIS>
if string does contains "12345" then
do something
end if
In JavaScript, ! means "not". What is the equivalent in AppleScript, and how do I handle RegEx?
My overall goal is to find out whether the name of the folder selected in the Finder window DOES NOT contain any 5-digit number.

tl;dr For any version of macOS from OS X 10.8 onwards you'll need to replace grep's -P option (as used in the "Solution" section below) with the -E option, as explained in the "Different grep utilities" section at the bottom of this post.
As correctly noted in the comments...
Vanilla AppleScript can't handle regex. - vadian
so you'll need to
shell out to something that does know regex - red_menace
Solution:
To meet your requirement with vanilla AppleScript in a way which is analogous to JavaScript's test() method, consider utilizing a custom AppleScript subroutine as follows:
Subroutine:
on regExpTest(str, re)
    set statusCode to do shell script "grep -q -P " & quoted form of re & ¬
        " <<<" & quoted form of str & " 2>/dev/null; echo $?"
    if statusCode is equal to "0" then
        return true
    else
        return false
    end if
end regExpTest
Usage:
set regExp to "\\d{5}"
set str to "This string contains 12345"
if regExpTest(str, regExp) then
display dialog "It DOES match so let's do something"
end if
Running the above script will display a dialog with the given message because there is a match between the regular expression and the specified string.
Note: AppleScript strings use the backslash as an escape character, so you'll notice that the \d metacharacter has been further escaped with an additional backslash, i.e. \\d
Inequality operators:
In Javascript != does not. What is the equivalent in applescript? and how do I handle RegEx?
AppleScript's inequality operators that are analogous to JavaScript's inequality operator (!=) are:
≠
is not
isn't
isn't equal [to]
is not equal [to]
doesn't equal
does not equal
So given your JavaScript if statement:
if (!regEx.test(str)){
// do something
}
We can achieve the same logic, (again using the aforementioned custom regExpTest subroutine), with the following code:
set regExp to "\\d{5}"
set str to "This string contains 1234"
if regExpTest(str, regExp) ≠ true then
display dialog "It DOES NOT match so let's do something"
end if
Note: The str value only includes four consecutive digits, i.e. 1234.
This time running the above script will display a dialog with the given message because there is NOT a match between the regular expression and the specified string.
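There are many variations that can be made to the aforementioned AppleScript if statement to achieve the same desired logic. Note that JavaScript's ! is a logical NOT rather than an inequality operator, and its direct AppleScript counterpart is the not operator, i.e. if not regExpTest(str, regExp) then. Other variations include: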
if regExpTest(str, regExp) is not equal to true then
...
end if
if regExpTest(str, regExp) = false then
...
end if
etc...
regExpTest subroutine explanation:
The aforementioned regExpTest AppleScript subroutine essentially uses the do shell script command to run the same command that you could run directly in the macOS Terminal application. For instance, run the following two commands in Terminal:
grep -q -P "\d{5}" <<<"This string contains 12345" 2>/dev/null; echo $?
Prints:
0
grep -q -P "\d{5}" <<<"This string contains 1234" 2>/dev/null; echo $?
Prints:
1
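For reference, the exit-status logic can also be wrapped in a small shell function. This is only a sketch: regex_test is a hypothetical name, and it uses grep -E with [0-9] (rather than -P with \d) so that it should also run with the BSD grep shipped on macOS.

```shell
#!/bin/sh
# Sketch only: regex_test is a hypothetical helper, not part of the answer above.
# It succeeds (exit status 0) when the string matches the extended regex.
regex_test() {
    # $1 = string to test, $2 = extended regular expression (ERE)
    printf '%s\n' "$1" | grep -Eq -e "$2"
}

# Negate the test with "!", much like if (!regEx.test(str)) in JavaScript.
if ! regex_test "This string contains 1234" '[0-9]{5}'; then
    echo "No five-digit number found"
fi
```

Because grep -q reports the result purely through its exit status, the function can be used directly in an if statement without capturing $?.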
EDIT: Different grep utilities:
As noted in the comment by user3439894, it seems that some versions of the grep utility installed on Mac do not support the -P option, which ensures the RegExp pattern is interpreted as a Perl regular expression. I opted for a Perl regular expression because they're more closely aligned with the regexps used in JavaScript.
However, if you run man grep via your command line and discover that your grep utility doesn't provide the -P option, then change the following line of code in the regExpTest subroutine:
set statusCode to do shell script "grep -q -P " & quoted form of re & ¬
" <<<" & quoted form of str & " 2>/dev/null; echo $?"
to this instead:
set statusCode to do shell script "grep -q -E " & quoted form of re & ¬
" <<<" & quoted form of str & " 2>/dev/null; echo $?"
Note: The -P option has been changed to -E so the pattern is now interpreted as an extended regular expression (ERE) instead.
The shorthand metacharacter \d
You may also find that you need to change the assignment of the regexp pattern from:
set regExp to "\\d{5}"
to
set regExp to "[0-9]{5}"
This time the shorthand metacharacter \d (which is used to match a digit) has been replaced with the equivalent character class [0-9].

As others have said, you can use the Foundation framework’s NSRegularExpression via the AppleScript-ObjC bridge.
That said, Objective-C APIs, while powerful, aren’t exactly AppleScripter-friendly, so I knocked together some “standard libraries” a few years back that wrapped a lot of that general functionality as nice native AppleScript commands.
e.g. Here’s the nearest equivalent to your JavaScript using the Text library’s search text command:
use script "Text"
set str to "This string contains 12345"
set foundMatches to search text str for "\\d{5}" using pattern matching
if foundMatches is not {} then
-- do something
end if
Couldn’t drum up much interest so I no longer do development or support. But they’re free and open (public domain as far as I’m concerned) and still work fine in the current version of macOS AFAIK, so help yourself.

Related

Search and delete specific line of text in txt file - applescript

I would like to search the contents of a .txt file for a specific line of text and delete only that line from the .txt file.
I want to specify the line of text to find as a variable. For example:
set lineOfTextToDelete to "The quick brown fox jumps over the lazy dog."
Contents before:
Let's say the contents of my TestDelta.txt file is:
This is a a paragraph of text.
This is another line of text.
The quick brown fox jumps over the lazy dog.
Here is another line
Contents after:
The following shows the contents of the TestDelta.txt that I want after running the script. As you can see the string which has been assigned to the lineOfTextToDelete variable, i.e. "The quick brown fox jumps over the lazy dog." has been deleted from the contents of the file.
This is a a paragraph of text.
This is another line of text.
Here is another line
What I've tried so far:
Below is what I've tried; however, I'm unsure what I should do next.
set txtfile to "Macintosh HD - Data:Users:crelle:Desktop:TestDelta.txt" as alias
set thisone to read txtfile
set theTextList to paragraphs of thisone
Can anyone help show me what to do?
Here are, in no particular order, a couple of solutions to consider.
Before usage I recommend creating a backup copy of any .txt file that you're going to try them with. These scripts can potentially cause loss of valuable data if not used carefully.
If you have any concerns regarding assignment of the correct filepath to either;
The txtFilePath variable in Solution A
The txtFilePath property in Solution B
then replace either of those lines with the following. This will enable you to choose the file instead.
set txtFilePath to (choose file)
Solution A: Shell out from AppleScript and utilize SED (Stream EDitor)
on removeMatchingLinesFromFile(findStr, filePath)
set findStr to do shell script "sed 's/[^^]/[&]/g; s/\\^/\\\\^/g' <<<" & quoted form of findStr
do shell script "sed -i '' '/^" & findStr & "$/d' " & quoted form of (POSIX path of filePath)
end removeMatchingLinesFromFile
set txtFilePath to "Macintosh HD - Data:Users:crelle:Desktop:TestDelta.txt"
set lineOfTextToDelete to "The quick brown fox jumps over the lazy dog."
removeMatchingLinesFromFile(lineOfTextToDelete, txtFilePath)
Explanation:
The arbitrarily named removeMatchingLinesFromFile subroutine / function contains the tasks necessary to meet your requirement. It lists two parameters; findStr and filePath. In its body we "shell out" twice to sh by utilizing AppleScript's do shell script command.
Let's understand what's happening here in more detail:
The first line that reads;
set findStr to do shell script "sed 's/[^^]/[&]/g; s/\\^/\\\\^/g' <<<" & quoted form of findStr
executes a sed command. The purpose of this command is to escape any potential Basic Regular Expression (BRE) metacharacters that may exist in the given line of text that we want to delete. Ultimately it ensures each character in the given string is treated as a literal when used in the subsequent sed command, thus negating any "special meaning" the metacharacter has.
Refer to this answer for further explanation. Essentially it does the following:
Every character except ^ is placed in its own character set [...] expression to treat it as a literal.
Note that ^ is the one char. you cannot represent as [^], because it has special meaning in that location (negation).
Then, ^ chars. are escaped as \^.
Note that you cannot just escape every char by putting a \ in front of it because that can turn a literal char into a metachar, e.g. \< and \b are word boundaries in some tools, \n is a newline, \{ is the start of a RE interval like \{1,3\}, etc.
Credit for this SED pattern goes to Ed Morton and mklement0.
So, given that the string assigned to the variable named lineOfTextToDelete is:
The quick brown fox jumps over the lazy dog.
we actually end up assigning the following string to the findStr variable after it has been parsed via the sed command:
[T][h][e][ ][q][u][i][c][k][ ][b][r][o][w][n][ ][f][o][x][ ][j][u][m][p][s][ ][o][v][e][r][ ][t][h][e][ ][l][a][z][y][ ][d][o][g][.]
As you can see each character is wrapped in opening and closing square brackets, i.e. [], to form a series of bracket expressions.
To further demonstrate what's happening; launch your Terminal application and run the following compound command:
sed 's/[^^]/[&]/g; s/\^/\\^/g' <<<"The quick brown fox jumps over the lazy dog."
Note: When running the aforementioned compound command directly via the Terminal, the sed pattern contains fewer backslashes (\) than the pattern specified in the AppleScript. This is because AppleScript strings require any backslash to be escaped with an additional backslash.
The second line reading;
do shell script "sed -i '' '/^" & findStr & "$/d' " & quoted form of (POSIX path of filePath)
executes another sed command via the shell. This finds all instances of the given line of text in the file and deletes them.
The -i option specifies that the file is to be edited in-place, and requires a following empty string argument ('') when using the BSD version of sed that ships with macOS.
The '/^" & findStr & "$/d' part is the pattern that we provide to sed.
The ^ metacharacter matches the null string at beginning of the pattern space - it essentially means start matching the subsequent regexp pattern only if it exists at the beginning of the line.
The Applescript findStr variable is the result we obtained via the previous sed command. It is concatenated with the preceding pattern part using the & operator.
The $ metacharacter refers to the end of pattern space, i.e. the end of the line.
The d is the delete command.
The & quoted form of (POSIX path of filePath) part utilizes AppleScript's POSIX path property to transform your specified HFS path, i.e.
Macintosh HD - Data:Users:crelle:Desktop:TestDelta.txt
to the following POSIX-style path:
/Macintosh HD - Data/Users/crelle/Desktop/TestDelta.txt
The quoted form property ensures correct quoting of the POSIX-style path. For example, it ensures any space character(s) in the given pathname are interpreted correctly by the shell.
Again, to further demonstrate what's happening; launch your Terminal application and run the following compound command:
sed -i '' '/^[T][h][e][ ][q][u][i][c][k][ ][b][r][o][w][n][ ][f][o][x][ ][j][u][m][p][s][ ][o][v][e][r][ ][t][h][e][ ][l][a][z][y][ ][d][o][g][.]$/d' ~/Desktop/TestDelta.txt
Let's understand how to use the aforementioned removeMatchingLinesFromFile function:
Firstly we assign the same HFS path that you specified in your question to the arbitrarily named txtFilePath variable:
set txtFilePath to "Macintosh HD - Data:Users:crelle:Desktop:TestDelta.txt"
Next we assign the line of text that we want to find and delete to the arbitrarily named lineOfTextToDelete variable:
set lineOfTextToDelete to "The quick brown fox jumps over the lazy dog."
Finally we invoke the custom removeMatchingLinesFromFile function, passing in the two required arguments, namely lineOfTextToDelete and txtFilePath:
removeMatchingLinesFromFile(lineOfTextToDelete, txtFilePath)
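The two sed steps described above can be sketched as a standalone shell script. This is a hedged variant, not the answer's exact code: escape_bre and remove_matching_lines are hypothetical names, and it writes through a temporary file instead of using sed -i '' so that it does not depend on BSD sed's in-place syntax.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical; the sed pattern is the one quoted above.

# Turn a literal string into a BRE that matches it character for character.
escape_bre() {
    # Wrap every character except ^ in a bracket expression, then escape ^.
    printf '%s\n' "$1" | sed 's/[^^]/[&]/g; s/\^/\\^/g'
}

# Delete every line of file $2 that is exactly equal to the literal text $1.
remove_matching_lines() {
    pattern=$(escape_bre "$1")
    tmp=$(mktemp "${TMPDIR:-/tmp}/lines.XXXXXX") || return 1
    # Anchor with ^ and $ so only whole-line matches are deleted.
    sed "/^${pattern}\$/d" "$2" > "$tmp" && cat "$tmp" > "$2"
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

Copying the result back with cat (rather than mv) keeps the original file's permissions, at the cost of not being atomic.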
Solution B: Using vanilla AppleScript without SED:
The solution provided below does not utilize the shell or SED, and produces the same result as Solution A.
property lineOfTextToDelete : "The quick brown fox jumps over the lazy dog."
property txtFilePath : alias "Macintosh HD - Data:Users:crelle:Desktop:TestDelta.txt"
removeMatchingLinesFromFile(lineOfTextToDelete, txtFilePath)
on removeMatchingLinesFromFile(findStr, filePath)
    set paraList to {}
    repeat with aLine in getLinesFromFile(filePath)
        if contents of aLine is not findStr then set paraList to paraList & aLine
    end repeat
    set newContent to transformListToText(paraList, "\n")
    replaceFileContents(newContent, filePath)
end removeMatchingLinesFromFile

on getLinesFromFile(filePath)
    if (get eof of filePath) is 0 then return {}
    try
        set paraList to paragraphs of (read filePath)
    on error errorMssg number errorNumber
        error errorMssg & errorNumber & ": " & POSIX path of filePath
    end try
    return paraList
end getLinesFromFile

on transformListToText(ListOfStrings, delimiter)
    set {tids, text item delimiters} to {text item delimiters, delimiter}
    set content to ListOfStrings as string
    set text item delimiters to tids
    return content
end transformListToText

on replaceFileContents(content, filePath)
    try
        set readableFile to open for access filePath with write permission
        set eof of readableFile to 0
        write content to readableFile starting at eof
        close access readableFile
        return true
    on error errorMssg number errorNumber
        try
            close access filePath
        end try
        error errorMssg & errorNumber & ": " & POSIX path of filePath
    end try
end replaceFileContents
Explanation:
I'll keep this explanation brief as the code itself is probably easier to comprehend than Solution A.
The removeMatchingLinesFromFile subroutine essentially performs the following with the aid of additional helper functions:
It reads the contents of the given .txt file via the getLinesFromFile function, which returns a list. Each item in the returned list holds one line/paragraph of text found in the .txt file content.
We then loop through each item (i.e. each line of text) via a repeat statement. If the contents of an item do not equal the given line of text to find, we store it in another list, i.e. the list assigned to the paraList variable.
Next, the list assigned to the paraList variable is passed to the transformListToText function along with a newline (\n) delimiter. The transformListToText function returns a new string.
Finally, via the replaceFileContents function, we open for access the original .txt file and overwrite its contents with the newly constructed content.
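For comparison only, the same "keep every line that is not exactly equal to the search text" logic can be sketched in shell without any regular expression, using grep's fixed-string (-F), whole-line (-x) and invert (-v) options. The name remove_exact_lines is hypothetical and this is not part of Solution B itself.

```shell
#!/bin/sh
# Sketch only: remove_exact_lines is a hypothetical helper name.
# Keeps every line of file $2 that is not exactly equal to the literal text $1.
remove_exact_lines() {
    tmp=$(mktemp "${TMPDIR:-/tmp}/lines.XXXXXX") || return 1
    # -F: treat the text as a fixed string, -x: match whole lines, -v: invert.
    grep -Fxv -e "$1" "$2" > "$tmp"
    status=$?
    # grep exits with 1 when no lines are left to print; only >1 is a real error.
    if [ "$status" -le 1 ]; then
        cat "$tmp" > "$2"
        status=0
    fi
    rm -f "$tmp"
    return "$status"
}
```

As in Solution B, no escaping of metacharacters is needed because the comparison is literal.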
Important note applicable to either solution: When specifying the line of text that you want to delete, (i.e. the string that is assigned to the lineOfTextToDelete variable), ensure each and every backslash \ that you may want to search for is escaped with another one. For example; if the line that you want to search for contains a single backslash \ then escape it to become two \\. Similarly if the line that you want to search for contains two consecutive backslashes \\ then escape each one to become four \\\\, and so on.

Sed "invalid command code <", when using sed within Applescript

I am trying to use Sed to do string replacement within my Applescript.
I am using the following command:
set selected_text to do shell script "echo " & "\"" & selected_text & "\"" & " | sed /'s/<\\(.*\\)>/\\1/'"
and I am being met with the following error:
The action “Run AppleScript” encountered an error: “sed: 1: "/s/<\(.*\)>/\1/": invalid command code <”
Expected input:
FirstName LastName <FirstName.LastName#email.com>
Expected output:
FirstName.LastName#email.com
Does anyone have any suggestions?
The better (easier) way is to use quoted form of together with do shell script; AppleScript then escapes the text for you.
The slash in front of the sed expression was wrong. Without it the error disappears, but there is still an error in your regex. With sed's s command you replace whatever the regex matches, so the regex has to match your whole input, and the parentheses define the part you want to keep. The first part matches everything up to the <, the second part (inside the parentheses) is the email address you want to keep, and finally comes the closing >.
set selected_text to do shell script "echo " & quoted form of selected_text & " | sed 's/[^<]*<\\(.*\\)>/\\1/'"
Best, Michael / Hamburg
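To try the corrected expression outside AppleScript, here is a minimal shell sketch. The function name extract_address is hypothetical; the sed expression is the one from the answer, written with single backslashes because no AppleScript string escaping is involved.

```shell
#!/bin/sh
# Sketch only: extract_address is a hypothetical helper name.
extract_address() {
    # [^<]* consumes everything before the <, \(.*\) captures the address,
    # and \1 replaces the whole match with just the captured part.
    printf '%s\n' "$1" | sed 's/[^<]*<\(.*\)>/\1/'
}

extract_address 'FirstName LastName <FirstName.LastName#email.com>'
# prints FirstName.LastName#email.com
```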

What does this variable assignment do?

I'm having to code a subversion hook script, and I found a few examples online, mostly python and perl. I found one or two shell scripts (bash) as well. I am confused by a line and am sorry this is so basic a question.
FILTER=".(sh|SH|exe|EXE|bat|BAT)$"
The script later uses this to perform a test, such as (assume EXT=ex):
if [[ "$FILTER" == *"$EXT"* ]]; then blah
My problem is the above test is true. However, I'm not asking you to assist in writing the script, just explaining the initial assignment of FILTER. I don't understand that line.
Editing in a closer example FILTER line. Of course the script, as written, does not work, because 'ex' returns true, and not just 'exe'. My problem here, however, is only that I don't understand the layout of the variable assignment itself.
Why is there a period at the beginning? ".(sh..."
Why is there a dollar sign at the end? "...BAT)$"
Why are there pipes between each pattern? "sh|SH|exe"
You are probably looking for something like the following:
FILTER="\.(sh|SH|exe|EXE|bat|BAT)$"
for EXT
do
    if [[ "$EXT" =~ $FILTER ]]
    then
        echo "$EXT extension disallowed"
    else
        echo "$EXT is allowed"
    fi
done
Save it as myscript.sh and run it with
bash myscript.sh bash ba.sh
and you will get
bash is allowed
ba.sh extension disallowed
If you don't escape the dot, e.g. with FILTER=".(sh|SH|exe|EXE|bat|BAT)$", you will get
bash extension disallowed
ba.sh extension disallowed
which is (of course) wrong.
For the questions:
Why is there a period at the beginning? ".(sh..."
Because you want to match .sh (as an extension) and not, for example, bash (without the dot). Therefore the . must be escaped, like \., because an unescaped . in a regex means "any character".
Why is there a dollar sign at the end? "...BAT)$"
The $ means end of string. You want to match file.sh and not file.sh.jpg; the .sh should be at the end of the string.
Why are there pipes between each pattern? "sh|SH|exe"
In a regex, the (...|...|...) construction delimits the alternatives, as you surely guessed.
You really need to read a regex tutorial; the topic is more complicated than can be explained in one answer.
PS: NEVER use UPPERCASE variable names; they can collide with environment variables.
This just assigns a string to FILTER; the contents of that string have no special meaning. When you try to match it against the pattern *ex*, the result is true if the value of $FILTER contains the string ex with anything on either side. That is the case here: ex is a substring of exe.
FILTER=".(sh|SH|exe|EXE|bat|BAT)$"
                ^^
                |
                +---- here is the "ex" from the pattern.
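The difference between the substring test and a real regex test can be sketched as follows. This is a plain sh variant under stated assumptions: has_substring and is_blocked are hypothetical helper names, case stands in for the [[ ... == *...* ]] glob test, grep -E stands in for bash's =~ operator, and the dot is escaped as in the corrected FILTER.

```shell
#!/bin/sh
# Sketch only: has_substring and is_blocked are hypothetical helper names.
filter='\.(sh|SH|exe|EXE|bat|BAT)$'

# Glob-style test: true when $1 occurs anywhere inside the text $2.
has_substring() {
    case $2 in
        *"$1"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Regex test: true when $1 ends in a dot followed by a listed extension.
is_blocked() {
    printf '%s\n' "$1" | grep -Eq -e "$filter"
}

has_substring ex "$filter" && echo "substring test: ex occurs in the filter text"
is_blocked ex || echo "regex test: ex is allowed"
is_blocked ba.sh && echo "regex test: ba.sh is blocked"
```

The first test treats the filter as ordinary text, which is why ex "matches"; only the second one interprets it as a pattern.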
As far as I can tell, this resembles a regular expression pattern:
In regular expressions the start of a string is marked with ^; the author seems to have intended the leading . to play a similar role here.
The parentheses hold the exact file extensions to be matched, combined as alternatives ("or") with |.
The $ at the end means the match must end at the end of the string, with nothing after it.
I would say that is how the original author might have looked at it and implemented it.

Script to convert lower case characters into upper case is working differently as service action

I am trying a simple script as a service action in automator which performs this function:
Receives selected text in any application and replaces selected text
with the text containing capital letters
So I used this script:
on run {input, parameters}
set upperCaseString to ""
repeat with i in input
if (ASCII number i) > 96 and (ASCII number i) < 123 then
set upperCaseString to upperCaseString & (ASCII character ((ASCII number i) - 32))
else
set upperCaseString to upperCaseString & (ASCII character (ASCII number i))
end if
end repeat
return upperCaseString
end run
But I found this problem:
It was returning first letter of input as an upper case letter, eg.
input - lowercasetext, output - L, whereas the expected output was -
LOWERCASETEXT.
To check the problem I added this line of code in repeat loop:
display dialog i
and found that it displays the complete text instead of a single character at a time, i.e. instead of displaying l.. o.. w.. for lowercasetext, it displays lowercasetext at once.
Can anyone suggest why this goes wrong as a service action while it works fine in AppleScript Editor?
This works for a lot of languages:
on toUpper(s)
    tell AppleScript to return do shell script "shopt -u xpg_echo; export LANG='" & user locale of (system info) & ".UTF-8'; echo " & quoted form of s & " | tr '[:lower:]' '[:upper:]'"
end toUpper

on toLower(s)
    tell AppleScript to return do shell script "shopt -u xpg_echo; export LANG='" & user locale of (system info) & ".UTF-8'; echo " & quoted form of s & " | tr '[:upper:]' '[:lower:]'"
end toLower
When I run your script, I get the correct result. But one thing you may want to do is to explicitly coerce your result to text. The easiest way to do that would be at the end:
return upperCaseString as text
That may or may not do it for you, but you'll avoid a lot of frustration if you explicitly coerce data when there is a possibility of ambiguity.
Another (faster) way is to leverage the Unix tr (translate) command via do shell script:
set upperCaseString to ¬
    (do shell script ("echo " & quoted form of (input as text) & " | tr a-z A-Z"))
That's enough for English, but you can also add diacritical translation, like so:
set upperCaseString to ¬
    (do shell script ("echo " & quoted form of (input as text) & " | tr a-zäáà A-ZÄÁÀ"))
tr will translate anything to anything, so you can add any characters you may encounter and what you'd like them to translate to. A 'leet-speak' translator comes to mind.
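You will get the same result in the AppleScript Editor if the input variable is set to a list. The input parameter of an Automator action is also a list, so the repeat loop iterates over the list's items (here a single item holding the whole selected text) rather than over individual characters, and your comparison isn't doing what you think. ASCII number then only looks at the first character of that text, which is why only L comes back. Note that text ids have superseded the ASCII character and ASCII number commands; see the AppleScript 10.5 release notes.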
@Matt Strange:
You could also try:
set upperCaseString to ¬
    do shell script "echo " & quoted form of (input as text) & " | tr '[:lower:]' '[:upper:]'"
If you run man tr on OS X 10.10 you will see that the character classes [:lower:] and [:upper:] should be used instead of explicit character ranges like a-z or A-Z, since ranges may not produce correct results, as explained on the manual page.
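The character-class form can be tried directly in a shell. A minimal sketch, assuming the hypothetical helper names to_upper and to_lower; the classes are quoted so the shell cannot expand them as filename patterns.

```shell
#!/bin/sh
# Sketch only: to_upper and to_lower are hypothetical helper names.
to_upper() {
    printf '%s\n' "$1" | tr '[:lower:]' '[:upper:]'
}

to_lower() {
    printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]'
}

to_upper "lowercasetext"   # prints LOWERCASETEXT
```

printf is used instead of echo so that text containing backslashes or a leading hyphen is passed through unchanged.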

Ruby string containing ${...}

In the Ruby string :
"${0} ${1} ${2:hello}"
is ${i} the ith argument in the command that called this particular file?
I tried searching the web for "Ruby ${0}", but the search engines don't like non-alphanumeric characters.
I consulted a Ruby book, which says #{...} will substitute the result of the code in the braces, but it does not mention ${...}. Is this a special syntax to substitute argument values into a string? Thanks very much,
Joel
As mentioned above, ${0} does nothing special inside a string. On their own, $0 gives the name of the script and $1 gives the first capture group of the last regular expression match.
To interpolate a command line argument you'd normally do this:
puts "first argument = #{ARGV[0]}"
However, ARGV is also aliased as $* so you could also write
puts "first argument = #{$*[0]}"
Perhaps that's where the confusion arose?
