How to search from specific starting location in AppleScript

I'm trying to search for a string in a very long bit of text. Normally I'd do something along these lines:
set testString to "These aren't the droids you're looking for. Now we have a ridiculously large amount of text. These ARE the DROIDS you're looking for."
set searchTerm to "droids"
set searchTermLength to count of characters in searchTerm
# Gets string from first appearance of searchTerm
set testStringSearch to characters 19 thru -1 of testString as text
# Finds location of next appearance of searchTerm
set testLocation to offset of searchTerm in testStringSearch
# Returns next location of searchTerm
set theTest to characters testLocation thru (testLocation + searchTermLength - 1) of testStringSearch as text
return theTest
However, the amount of text is so large (120k+ characters) that when I try to set testStringSearch, it hangs for a while.
Since I'm going to be creating a loop where it returns each location of searchTerm, I would like to avoid that lost time, if possible. Is there something I'm missing?

Your biggest bottleneck is when you strip off the beginning of the string:
set testStringSearch to characters 19 thru -1 of testString as text
With 120k+ characters of text, this creates a list of more than 120,000 one-character strings, then coerces that list back into text.
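As a minimal sketch of a cheaper way to take the same slice: asking for a text range instead of a characters range returns the substring directly, with no intermediate list. The variable names relativeLocation and absoluteLocation are illustrative, and the constant 18 is simply the number of characters skipped by starting at position 19.

```applescript
set testString to "These aren't the droids you're looking for. Now we have a ridiculously large amount of text. These ARE the DROIDS you're looking for."
set searchTerm to "droids"

-- A text range yields a substring directly; no list of characters is built
set testStringSearch to text 19 thru -1 of testString

-- offset is relative to testStringSearch and is 0 when nothing is found
set relativeLocation to offset of searchTerm in testStringSearch
if relativeLocation is 0 then return 0

-- Add back the 18 skipped characters to get a position in testString
set absoluteLocation to relativeLocation + 18
return absoluteLocation
```

This still copies the remainder of the string on every slice, so for a loop over all matches a single up-front split is likely to scale better.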
Your best bet would be to turn the string into data you can work with upfront and use that data for the rest of the script. As an example, you could split the string on the target search word and use the remaining string lengths to create a list of offsets:
set offsets to allOffsets("A sample string", "sample")
--> {3}

on allOffsets(str, target)
    set splitString to my explode(str, target)
    set offsets to {}
    set compensation to 0
    set targetLength to length of target
    repeat with i from 1 to ((count splitString) - 1)
        set currentStringLength to length of item i of splitString
        -- The match starts one character past everything consumed so far
        set end of offsets to currentStringLength + compensation + 1
        -- Account for this chunk and the match that follows it
        set compensation to compensation + currentStringLength + targetLength
    end repeat
    return offsets
end allOffsets

on explode(theText, theDelim)
    -- Save and restore the delimiters so the caller's setting is not clobbered
    set oldDelims to AppleScript's text item delimiters
    set AppleScript's text item delimiters to theDelim
    set theList to text items of theText
    set AppleScript's text item delimiters to oldDelims
    return theList
end explode
As you can see, each offset is the length of the current chunk plus 1, plus the compensation variable, which keeps track of the combined length of all the chunks and matches already processed.
Performance
I did find that performance is directly linked to how many occurrences are found in the string. My test data was made up of 20,000 words from a Lorem Ipsum generator.
Run 1:
Target: "lor"
Found: 141 Occurrences
Time: 0.01 seconds
Run 2:
Target: "e"
Found: 6,271 Occurrences
Time: 1.97 seconds
Run 3:
Target: "xor"
Found: 0 Occurrences
Time: 0.00 seconds

Related

Apply noBreak AppleScript Illustrator

My goal here is to apply the no break parameter of Illustrator with AppleScript to two words in a text frame.
I'm able to detect the non-breaking space in a string. Then I need to apply the no break parameter to the words before and after character 202, since the no-break space isn't supported by Illustrator.
set ourText to "Hello my friend Jon<non-breaking-space>Doe."
set findThis to (ASCII character 202)
set MW to words of ourText
repeat with aWord in MW
    if findThis is in aWord then
        set myWord to aWord
        exit repeat
    end if
end repeat
myWord
--> display: Jon Doe
Then I would like to search the text frame for "Jon Doe" and apply the no break parameter to it. I tried this manually in Illustrator and it works.
Your script doesn’t work because you are building a list of words. Spaces (including no-break spaces) are word delimiters, so they are not in your word list (MW).
It will work if we use the no-break space as text item delimiter:
use scripting additions
set theResult to {}
set ourText to "Hello my friends Jon Doe, Jane Doe and Joe Doe!" # Each name contains a no-break space
set findThis to character id 160 # Decimal notation of U+00A0 (no-break space)
set saveTID to AppleScript's text item delimiters
set AppleScript's text item delimiters to findThis # The no-break space
set countTextItems to count of text items of ourText
if countTextItems > 1 then
    repeat with i from 1 to countTextItems - 1
        set end of theResult to word -1 of text item i of ourText & findThis & word 1 of text item (i + 1) of ourText
    end repeat
else
    set theResult to "[no character id " & id of findThis & " found]"
end if
set AppleScript's text item delimiters to linefeed
display dialog theResult as text
set AppleScript's text item delimiters to saveTID
Input (ourText):
Hello my friends Jon[noBreakSpace]Doe, Jane[noBreakSpace]Doe and Joe[noBreakSpace]Doe!
Output (one match per line in the dialog):
Jon[noBreakSpace]Doe
Jane[noBreakSpace]Doe
Joe[noBreakSpace]Doe
Please note that this will fail in cases like
my friends J.[noBreakSpace]Doe
because we are using word inside the repeat loop.
If you often have cases like these then replace word -1 and word 1 with text -1 and text 1. The output then will only contain the two characters around the spaces, but for searching purposes this is still enough.

Filter clipboard and save as data list

I'm trying to get an AppleScript to find some keywords in the clipboard and list them in a new clipboard.
e.g.: I copy this line to the clipboard: "order KAFGEFEF price 999 date 17 order KADFSDGS price 874 date 18"
and the result will be
K1AFGE2FEF
K1ADFSD2GS
or even better
K1AFGE2FEF : 999
K1ADFSD2GS : 17
The data I want to collect always starts with "K1...." and has 10 characters.
I actually had an old JavaScript that kind of did the trick, but I need to use AppleScript instead.
I'm really not sure where to start here; maybe I should start with something around pbcopy and egrep?
Hope that makes sense.
Kind regards.
It is not clear from your question exactly how your clipboard data is structured or what your desired output is. For starters, here is an AppleScript solution that will extract order, price, and date values from the clipboard. It assumes that order, price, and date are always grouped together in that specific order, and that there can be multiple order-price-date groups in a single line of text on the clipboard. For example:
order K1AFGE2FEF price 999 date 17 order K1ADFSD2GS price 874 date 18
Then the following AppleScript will extract each order, price, and date triplet and save it as a three-item sublist in a master list:
set masterList to {}
set tid to AppleScript's text item delimiters
try
    set AppleScript's text item delimiters to "order "
    repeat with i in (get (the clipboard)'s text items 2 thru -1)
        tell i's contents
            try
                set currOrder to first word
                set AppleScript's text item delimiters to "price "
                set currPrice to (get its text item 2)'s first word
                set AppleScript's text item delimiters to "date "
                set currDate to (get its text item 2)'s first word
                if (currOrder starts with "K1") and (currOrder's length = 10) then set end of masterList to {currOrder, currPrice, currDate}
            end try
        end tell
    end repeat
end try
set AppleScript's text item delimiters to tid
return masterList -- {{"K1AFGE2FEF", "999", "17"}, {"K1ADFSD2GS", "874", "18"}}
The master list can then be processed further into whatever output you desire.
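As one possible follow-up, here is a minimal sketch that turns the master list into "order : price" lines and puts them on the clipboard. The masterList literal is copied from the result comment above so that the snippet runs on its own; the names outputLines and outputText are illustrative.

```applescript
-- Sample data copied from the result shown above
set masterList to {{"K1AFGE2FEF", "999", "17"}, {"K1ADFSD2GS", "874", "18"}}

set outputLines to {}
repeat with entry in masterList
    -- Unpack the three-item sublist; the date is not used in this output
    set {currOrder, currPrice, currDate} to contents of entry
    set end of outputLines to currOrder & " : " & currPrice
end repeat

-- Join the lines with linefeeds, restoring the delimiters afterwards
set tid to AppleScript's text item delimiters
set AppleScript's text item delimiters to linefeed
set outputText to outputLines as text
set AppleScript's text item delimiters to tid

set the clipboard to outputText
return outputText
```

With the sample data this yields two lines, "K1AFGE2FEF : 999" and "K1ADFSD2GS : 874".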

I want the first three words of a paragraph as a string with spaces between the words

Is there a way, say using a space char as the string delimiter, to set a string to the first three words of a paragraph, including the spaces?
For example
set a to "This is my test string"
set b to words 1 thru 3 of a
set c to words 1 thru 3 of a as rich text
return {b, c}
Returns {{"This","is","my"},"Thisismy"}
I want to set a variable that, for the value of a above, would be "This is my".
First let's explain what happens. words 1 thru 3 of a as rich text gets a range of words as a list. Then as rich text (which should be as string) coerces the list into a string. When you coerce a list (or record) to a string, AppleScript inserts a separator between the items, called the text item delimiter. By default it is set to "". This means no separator (delimiter) is used and the words are glued together. But let's see what happens when we temporarily set the text item delimiters to space.
set a to "This is my test string"
set b to words 1 thru 3 of a
set {oldTID, AppleScript's text item delimiters} to {AppleScript's text item delimiters, space}
set c to words 1 thru 3 of a as string
set AppleScript's text item delimiters to oldTID
return {b, c}
Now it returns {{"This", "is", "my"}, "This is my"}

Find text from a file and set it as a variable in applescript?

I am trying to build a script that sends me updates and notifications from cex.io. Please keep reading below so I can walk you through to the point where I'm having trouble.
The first simple script in this operation goes to cex.io's trading page for BTC/GHS. It records and saves the text to a file every 4 seconds. It works great. It doesn't need to have Safari refresh because the site pushes info to the browser live.
repeat
    set the webpage_content to ""
    tell application "Safari" to set the webpage_content to the text of document 1
    set theText to webpage_content
    set a to "Macintosh HD:Users:PRIVATE:Desktop:CEX:"
    set theFile to (open for access file ((a) & "CEXRaw") with write permission)
    write theText to theFile
    close access theFile
    delay 4
end repeat
-
And it returns this in a main file every 4 seconds: (note I cut off a chunk from the bottom and the top of the file, because they are unimportant)
GHS:
0.05233439
BTC:
0.00000223
NMC:
0.00002939
LTC:
0.00000000
GHS/BTC
0.02362958 LTC/BTC
0.02438131 NMC/BTC
0.00597565 GHS/NMC
3.96951800 BF1/BTC
1.67000000 Fund Account
GHS/BTC
Last price:
0.02362958
Daily change:
-0.00018042
Today's open:
0.02381000
24h volume:
73812.35539255
-
I now need an AppleScript to read that file and return the wanted values. But I'm lost on how to write it.
It needs to find the number under BTC, and set it as a variable.
It needs to find the number under GHS, and set it as a variable.
It needs to find the number under Last Price, and set it as a variable.
If anyone could script that really quick for me, or tell me how to do it, that would be amazing. Thank you so much!
Well, if those values will always be in the same paragraph counts, you could just pull them by line number.
set theCEXRaw to read file "Macintosh HD:Users:PRIVATE:Desktop:CEX:CEXRaw"
set theGHS to paragraph 2 of theCEXRaw
set theBTC to paragraph 4 of theCEXRaw
set thePRICE to paragraph 17 of theCEXRaw
You'd need to adjust the paragraph numbers. But, assuming that paragraph numbers aren't reliably consistent, in pure AppleScript you'd use AppleScript's text item delimiters.
set theCEXRaw to read file "Macintosh HD:Users:PRIVATE:Desktop:CEX:CEXRaw"
set AppleScript's text item delimiters to {("GHS:
"), ("BTC:
"), ("Last price:
")}
set theCEXRaw to text items of theCEXRaw
set theGHS to paragraph 1 of item 2 of theCEXRaw
set theBTC to paragraph 1 of item 3 of theCEXRaw
set thePRICE to paragraph 1 of item 4 of theCEXRaw
Note that the three delims include a return character inside the quotes. You will want to capture the old delimiter first, so you can restore it, and hopefully you can do the setting of the delimiter outside your repeat loop to save juice.
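A minimal sketch of that save-and-restore pattern follows. It assumes linefeed line endings and uses a short sample string in place of the file contents so that it runs on its own; theParts and oldDelims are illustrative names.

```applescript
-- Sample text standing in for the contents of the CEXRaw file
set theCEXRaw to "GHS:" & linefeed & "0.05233439" & linefeed & "BTC:" & linefeed & "0.00000223" & linefeed & "Last price:" & linefeed & "0.02362958"

-- Capture the current delimiters so they can be restored afterwards
set oldDelims to AppleScript's text item delimiters
set AppleScript's text item delimiters to {"GHS:" & linefeed, "BTC:" & linefeed, "Last price:" & linefeed}
set theParts to text items of theCEXRaw
set AppleScript's text item delimiters to oldDelims

-- Item 1 is whatever preceded the first label; the values follow in order of appearance
set theGHS to paragraph 1 of item 2 of theParts
set theBTC to paragraph 1 of item 3 of theParts
set thePRICE to paragraph 1 of item 4 of theParts
return {theGHS, theBTC, thePRICE}
```

The items come back in the order the labels appear in the text, not the order they are listed in the delimiter list, so the item numbers need adjusting if the page layout changes.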
You could also use do shell script with sed or grep to strip each value.
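One way the shell route might look, as a rough sketch only: the handler name valueAfterLabel is hypothetical, and the approach assumes the text uses linefeed line endings and that each label sits alone on its own line. grep prints the matching line plus the line after it, and sed keeps the second line of that output.

```applescript
on valueAfterLabel(theLabel, theText)
    -- -x matches whole lines only, -F treats the label as a literal string,
    -- -A1 also prints the line after each match; sed -n '2p' keeps that line
    set cmd to "printf '%s' " & quoted form of theText & " | grep -A1 -x -F " & quoted form of theLabel & " | sed -n '2p'"
    return do shell script cmd
end valueAfterLabel

-- Sample text standing in for the contents of the CEXRaw file
set sampleText to "GHS:" & linefeed & "0.05233439" & linefeed & "BTC:" & linefeed & "0.00000223"
set theBTC to valueAfterLabel("BTC:", sampleText)
return theBTC
```

If the label is not found the handler returns an empty string, because the exit status of the pipeline is that of sed rather than grep.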
You could get those values using the offset command, which searches a string for a substring and returns its character position.
e.g., set pos to the offset of "world" in "hello world" -- returns 7
Here is a solution that uses this principle to find your values and convert them to AppleScript numbers:
property line_delimiter : linefeed -- OR return OR return & linefeed, depending on your data
set results to "GHS:
0.05233439
BTC:
0.00000223
NMC:
0.00002939
LTC:
0.00000000
Last price:
0.02362958"
processCEX(results)
on processCEX(in_text)
    set btc_val to searchNumberValueLine("BTC:" & line_delimiter, in_text)
    set ghs_val to searchNumberValueLine("GHS:" & line_delimiter, in_text)
    set last_price_val to searchNumberValueLine("Last price:" & line_delimiter, in_text)
    display dialog ("btc = " & btc_val & return & "ghs = " & ghs_val & return & " last price = " & last_price_val)
end processCEX

on searchNumberValueLine(key_name, input_str)
    set start_index to the offset of key_name in input_str
    if (start_index is not 0) then
        set input_str to text (start_index + ((length of key_name))) thru -1 of input_str
        set end_index to the offset of line_delimiter in input_str
        if (end_index is 0) then
            return input_str as number
        else
            return (text 1 thru (end_index - 1) of input_str) as number
        end if
    else
        return -1
    end if
end searchNumberValueLine
Also, I'd recommend against writing to a text file if you don't need to. That avoids any file I/O issues from reading the same file in a different script while it is being rewritten every 4 seconds.
You could change your code to something like this:
repeat
    set the webpage_content to ""
    tell application "Safari" to set the webpage_content to the text of document 1
    processCEX(webpage_content)
    delay 4
end repeat

How to count the number of space-delimited substrings in a string

Dim str as String
str = "30 40 50 60"
I want to count the number of substrings.
Expected Output: 4
(because there are 4 total values: 30, 40, 50, 60)
How can I accomplish this in VB6?
You could try this:
arrStr = Split(str, " ")
strCnt = UBound(arrStr) + 1
MsgBox strCnt
Of course, if you've got Option Explicit set (which you should), then declare the variables above first.
Your request doesn't make any sense. A string is a sequence of text. The fact that that sequence of text contains numbers separated by spaces is quite irrelevant. Your string looks like this:
30 40 50 60
There are not 4 separate values, there is only one value, shown above—a single string.
You could also view the string as containing 11 individual characters, so it could be argued that the "count" of the string would be 11, but this doesn't get you any further towards your goal.
In order to get the result that you expect, you need to split the string into multiple strings at each space, producing 4 separate strings, each containing a 2-digit numeric value.
Of course, the real question is why you're storing this value in a string in the first place. If they're numeric values, you should store them in an array (for example, an array of Integers). Then you can easily obtain the number of elements in the array using the LBound() and UBound() functions.
I agree with everything Cody stated.
If you really wanted to, you could loop through the string character by character and count the number of times you find your delimiter. In your example, it is space delimited, so you would simply count the number of spaces and add 1, but as Cody stated, those are not separate values.
Are you trying to parse text here or what? Regardless, I think what you really need to do is store your data into an array. Make your life easier, not more difficult.
