Applescript: cleaning a string


I have a string containing illegal characters that I want to remove, but I don't know in advance which characters may be present.
I built a list of the characters I want to keep and wrote this script (adapted from one I found on the web).
on clean_string(TheString)
--Store the current TIDs. To be polite to other scripts.
set previousDelimiter to AppleScript's text item delimiters
set potentialName to TheString
set legalName to {}
set legalCharacters to {"a", "b", "c", "d", "e", "f",
"g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r",
"s", "t", "u", "v", "w", "x", "y", "z", "A", "B", "C", "D", "E",
"F", "G", "H", "I", "J", "K", "L", "M", "N", "O", "P", "Q", "R",
"S", "T", "U", "V", "W", "X", "Y", "Z", "1", "2", "3", "4", "5",
"6", "7", "8", "9", "0", "?", "+", "-", "Ç", "ç", "á", "Á", "é",
"É", "í", "Í", "ó", "Ó", "ú", "Ú", "â", "Â", "ã", "Ã", "ñ", "Ñ",
"õ", "Õ", "à", "À", "è", "È", "ü", "Ü", "ö", "Ö", "!", "$", "%",
"/", "(", ")", "&", "€", "#", "#", "=", "*", "+", "-", ",", ".",
"–", "_", " ", ":", ";", ASCII character 10, ASCII character 13}
--Whatever you want to eliminate.
--Now iterate through the characters checking them.
repeat with thisCharacter in the characters of potentialName
set thisCharacter to thisCharacter as text
if thisCharacter is in legalCharacters then
set the end of legalName to thisCharacter
log (legalName as string)
end if
end repeat
--Make sure that you set the TIDs before making the
--list of characters into a string.
set AppleScript's text item delimiters to ""
--Check the name's length.
if length of legalName is greater than 32 then
set legalName to items 1 thru 32 of legalName as text
else
set legalName to legalName as text
end if
--Restore the current TIDs. To be polite to other scripts.
set AppleScript's text item delimiters to previousDelimiter
return legalName
end clean_string
The problem is that this script is slow as hell and times out.
What I am doing is checking character by character and comparing against the legalCharacters list. If the character is there, it is fine. If not, ignore.
Is there a fast way to do that?
something like
"look at every char of TheString and remove those that are not on legalCharacters"
?
thanks for any help.

What non-ascii characters are you running into? What is your file encoding?
It's much, much more efficient to use a shell script with tr, sed or perl to process text. All of these tools are installed by default in OS X.
You can use a shell script with tr (as in the example below) to strip returns, and you could also use sed to strip spaces (not shown in the example below):
set clean_text to do shell script "echo " & quoted form of the_string & "| tr -d '\\r\\n' "
Technical Note TN2065: do shell script in AppleScript
Or, with perl, this will strip everything except alphanumeric and whitespace characters:
set x to quoted form of "Sample text. smdm#$%%&"
set y to do shell script "echo " & x & " | perl -pe 's/[^[:alnum:][:space:]]//g'"
Search around SO for other examples of using tr, sed and perl to process text with Applescript. Or search MacScripter / AppleScript | Forums

Another Shell script method might be:
set clean_text to do shell script "echo " & quoted form of the_string & "|sed \"s/[^[:alnum:][:space:]]//g\""
that uses sed to delete everything that isn't an alphanumeric character, or space. More regex reference here

Iterating in Applescript is always slow, and there really isn't a faster way around these problems. Logging in loops is an absolutely guaranteed way to slow things down. Use the log command judiciously.
In your specific case, however, you have a length limit, and moving the length check into the repeat loop will potentially cut the processing time down considerably (just under a second to run in Script Debugger regardless of the length of the text):
on clean_string(TheString)
set potentialName to TheString
set legalName to {}
set legalCharacters to {"a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z", "A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O", "P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z", "1", "2", "3", "4", "5", "6", "7", "8", "9", "0", "?", "+", "-", "Ç", "ç", "á", "Á", "é", "É", "í", "Í", "ó", "Ó", "ú", "Ú", "â", "Â", "ã", "Ã", "ñ", "Ñ", "õ", "Õ", "à", "À", "è", "È", "ü", "Ü", "ö", "Ö", "!", "$", "%", "/", "(", ")", "&", "€", "#", "#", "=", "*", "+", "-", ",", ".", "–", "_", " ", ":", ";", ASCII character 10, ASCII character 13}
with timeout of 86400 seconds --86400 seconds = 24 hours
repeat with thisCharacter in the characters of potentialName
set thisCharacter to thisCharacter as text
if thisCharacter is in legalCharacters then
set the end of legalName to thisCharacter
if length of legalName is greater than or equal to 32 then
return legalName as text
end if
end if
end repeat
end timeout
return legalName as text
end clean_string
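The early-exit idea is not specific to AppleScript. As a minimal sketch in Ruby, assuming a shortened, hypothetical whitelist (LEGAL_CHARACTERS below is not the full list from the question) and a hypothetical clean_string method, the loop stops as soon as 32 legal characters have been collected:

```ruby
require "set"

# Hypothetical, shortened whitelist; the list in the question is much longer.
LEGAL_CHARACTERS = Set.new(
  ("a".."z").to_a + ("A".."Z").to_a + ("0".."9").to_a + [" ", "-", "_", "."]
)
MAX_LENGTH = 32

# Keep only whitelisted characters and stop once the limit is reached,
# so very long inputs are never scanned past the point that matters.
def clean_string(text, legal = LEGAL_CHARACTERS, max_length = MAX_LENGTH)
  kept = []
  text.each_char do |char|
    next unless legal.include?(char)
    kept << char
    break if kept.length >= max_length
  end
  kept.join
end

puts clean_string("h\u00e9llo w\u00f6rld!") # prints "hllo wrld"
puts clean_string("a" * 100).length         # prints 32
```

A Set gives constant-time membership checks, which avoids rescanning the whole list for every character.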

BBEdit or TextWrangler will be much, much faster at this. Download TextWrangler (it's free), then open up your file and run Text -> Zap Gremlins... on it. Does that do what you need? If it does, celebrate with a cold beverage. If not, try out BBEdit (it's not free) and create a new Text Factory with as many "Replace All" conditions as you need, then open up your file and run the Text Factory on it.

Related

Finding and Replacing Text for every item in a list Applescript

I've got an Applescript list of shortcuts and I want to replace some special characters with text.
My current list looks like this:
set hotkeyShortcutList to {"$", "U", "J", "G", "R", "⇧+R", "⇧+Y", "⇧+G", "⇧+B", "⇧+P", "⇧+⌫", "⌃+M", "⌃+W", "⌃+S", "⌃+X", "⌃+C", "⌃+V", "⌃+N", "⇧+⌃+N", "⌃+U", "⌃+B", "⇧+⌃+A", "⌃+A", "⌥+I", "⌥+O", "⇧+⌥+I", "⇧+⌥+O", "⌥+B", "⌥+D", "⌥+S", "⌃+⌥+M", "⌃+⌥+B", "⌃+⌥+X", "⇧+⌃+G", "⇧+⌃+⌥+R", "⇧+⌃+⌥+L", "⌃+Å", "⌃+]", "⇧+⌃+Å", "⇧+⌃+}", "⇧+⌃+M", "⇧+⌃+⌥+!", "⇧+⌃+⌥+#", "⇧+⌃+⌥+£", "⇧+⌃+⌥+$", "⇧+⌃+⌥+%", "⇧+⌃+⌥+^", "⌃+1", "⌃+2", "⌃+3", "⌃+4", "⌃+5", "⌃+6", "⇧+⌃+!", "⇧+⌃+\"", "⇧+⌃+#", "⇧+⌃+€", "⇧+⌃+%", "⇧+⌃+&", "K", "⌃+K", "⌃+V", "⇧+⌃+⌥+K", "A", "Y", "Z", "⇧+⌃+*", "⇧+⌃+⌥+*", "X", "⌃+,", "⌃+.", "⇧+⌃+;", "⇧+⌃+:", "⌃+P", "⇧+⌃+)", "⇧+⌃+?", "⌃++", "Space", "[", "]", "V", "L", "P", "S", "N", "Q", "O", "T", "E", "D", "W", "C", "F"}
I would like to search every item of the list for occurrences of certain characters and replace them in each item.
"⇧" replaced with "shift"
"⌃" replaced with "control"
"⌥" replaced with "option"
"⌘" replaced with "command"
"⌫" replaced with "backspace"
"→" replaced with "arrow right"
and so on,
I've tried to modify the following code so that it repeats over each item of the list, but I can only get it to work when I'm working with a single string.
on findAndReplaceInText(theText, theSearchString, theReplacementString)
set AppleScript's text item delimiters to theSearchString
set theTextItems to every text item of theText
set AppleScript's text item delimiters to theReplacementString
set theText to theTextItems as string
set AppleScript's text item delimiters to ""
return theText
end findAndReplaceInText
How do I find and replace for every item of the list?

Try the following example AppleScript code:
set hotkeyShortcutList to {"$", "U", "J", "G", "R", "⇧+R", "⇧+Y", "⇧+G", "⇧+B", "⇧+P", "⇧+⌫", "⌃+M", "⌃+W", "⌃+S", "⌃+X", "⌃+C", "⌃+V", "⌃+N", "⇧+⌃+N", "⌃+U", "⌃+B", "⇧+⌃+A", "⌃+A", "⌥+I", "⌥+O", "⇧+⌥+I", "⇧+⌥+O", "⌥+B", "⌥+D", "⌥+S", "⌃+⌥+M", "⌃+⌥+B", "⌃+⌥+X", "⇧+⌃+G", "⇧+⌃+⌥+R", "⇧+⌃+⌥+L", "⌃+Å", "⌃+]", "⇧+⌃+Å", "⇧+⌃+}", "⇧+⌃+M", "⇧+⌃+⌥+!", "⇧+⌃+⌥+#", "⇧+⌃+⌥+£", "⇧+⌃+⌥+$", "⇧+⌃+⌥+%", "⇧+⌃+⌥+^", "⌃+1", "⌃+2", "⌃+3", "⌃+4", "⌃+5", "⌃+6", "⇧+⌃+!", "⇧+⌃+\"", "⇧+⌃+#", "⇧+⌃+€", "⇧+⌃+%", "⇧+⌃+&", "K", "⌃+K", "⌃+V", "⇧+⌃+⌥+K", "A", "Y", "Z", "⇧+⌃+*", "⇧+⌃+⌥+*", "X", "⌃+,", "⌃+.", "⇧+⌃+;", "⇧+⌃+:", "⌃+P", "⇧+⌃+)", "⇧+⌃+?", "⌃++", "Space", "[", "]", "V", "L", "P", "S", "N", "Q", "O", "T", "E", "D", "W", "C", "F"}
set processedShortcutList to {}
repeat with thisShortcut in hotkeyShortcutList
set thisShortcut to my findAndReplaceInText(thisShortcut, "⇧", "shift")
set thisShortcut to my findAndReplaceInText(thisShortcut, "⌃", "control")
set thisShortcut to my findAndReplaceInText(thisShortcut, "⌥", "option")
set thisShortcut to my findAndReplaceInText(thisShortcut, "⌘", "command")
set thisShortcut to my findAndReplaceInText(thisShortcut, "⌫", "backspace")
set thisShortcut to my findAndReplaceInText(thisShortcut, "→", "arrow right")
set end of processedShortcutList to thisShortcut
end repeat
return processedShortcutList
on findAndReplaceInText(theText, theSearchString, theReplacementString)
set AppleScript's text item delimiters to theSearchString
set theTextItems to every text item of theText
log theTextItems
set AppleScript's text item delimiters to theReplacementString
set theText to theTextItems as string
set AppleScript's text item delimiters to ""
return theText
end findAndReplaceInText
Result:
{"$", "U", "J", "G", "R", "shift+R", "shift+Y", "shift+G", "shift+B", "shift+P", "shift+backspace", "control+M", "control+W", "control+S", "control+X", "control+C", "control+V", "control+N", "shift+control+N", "control+U", "control+B", "shift+control+A", "control+A", "option+I", "option+O", "shift+option+I", "shift+option+O", "option+B", "option+D", "option+S", "control+option+M", "control+option+B", "control+option+X", "shift+control+G", "shift+control+option+R", "shift+control+option+L", "control+Å", "control+]", "shift+control+Å", "shift+control+}", "shift+control+M", "shift+control+option+!", "shift+control+option+#", "shift+control+option+£", "shift+control+option+$", "shift+control+option+%", "shift+control+option+^", "control+1", "control+2", "control+3", "control+4", "control+5", "control+6", "shift+control+!", "shift+control+\"", "shift+control+#", "shift+control+€", "shift+control+%", "shift+control+&", "K", "control+K", "control+V", "shift+control+option+K", "A", "Y", "Z", "shift+control+*", "shift+control+option+*", "X", "control+,", "control+.", "shift+control+;", "shift+control+:", "control+P", "shift+control+)", "shift+control+?", "control++", "Space", "[", "]", "V", "L", "P", "S", "N", "Q", "O", "T", "E", "D", "W", "C", "F"}

Regexp ignores some letters

I'm trying to solve the Chasing Subs problem. I'm trying to generate the regex according to the input data. The goal is to get all substrings (including overlapping ones) whose letters are all unique.
I'm trying to use regexp like this:
regexp = /(?=(?<gs>.)(?<gu>[^\k<gs>])(?<gb>[^\k<gs>\k<gu>])(?<gm>[^\k<gs>\k<gu>\k<gb>])(?<ga>[^\k<gs>\k<gu>\k<gb>\k<gm>])(?<gr>[^\k<gs>\k<gu>\k<gb>\k<gm>\k<ga>])(?<gi>[^\k<gs>\k<gu>\k<gb>\k<gm>\k<ga>\k<gr>])(?<gn>[^\k<gs>\k<gu>\k<gb>\k<gm>\k<ga>\k<gr>\k<gi>])(?<ge>[^\k<gs>\k<gu>\k<gb>\k<gm>\k<ga>\k<gr>\k<gi>\k<gn>]))/
"archipelago".scan(regexp) #=> []
"archipelbgo".scan(regexp) #=> []
"brchipelbgo".scan(regexp) #=> []
"zrchipelzgo".scan(regexp) #=> [["z", "r", "c", "h", "i", "p", "e", "l", "z"]]
Why does it behave like this? Why can't it find anything with "b" and "a"? And why does it return only one (incorrect) result with "z"? What am I doing wrong?

I don't think a regular expression is the correct tool for this problem. We could do the following, however.
def substrings(str)
arr = str.chars
(1..str.size).each_with_object([]) { |n,a|
arr.each_cons(n) { |b| a << b.join if b == b.uniq } }
end
substrings("archipelago")
#=> ["a", "r", "c", "h", "i", "p", "e", "l", "a", "g", "o", "ar", "rc", "ch", "hi",
# "ip", "pe", "el", "la", "ag", "go", "arc", "rch", "chi", "hip", "ipe", "pel",
# "ela", "lag", "ago", "arch", "rchi", "chip", "hipe", "ipel", "pela", "elag",
# "lago", "archi", "rchip", "chipe", "hipel", "ipela", "pelag", "elago", "archip",
# "rchipe", "chipel", "hipela", "ipelag", "pelago", "archipe", "rchipel", "chipela",
# "hipelag", "ipelago", "archipel", "rchipela", "chipelag", "hipelago", "rchipelag",
# "chipelago", "rchipelago"]
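As for why the pattern in the question behaves that way, one likely explanation (an assumption, though it is consistent with the outputs shown in the question) is that \k<gs> inside a character class is not a backreference. It is read as the literal characters k, <, g, s and >, so each class excludes a fixed set of letters such as a, b, g and k regardless of what was captured. The sketch below checks that reading on a two-group version of the pattern, then uses a hypothetical helper, unique_windows, to collect fixed-length windows with no repeated letter:

```ruby
# Two-group version of the pattern from the question.
TWO_GROUPS = /(?<gs>.)(?<gu>[^\k<gs>])/

# If \k<gs> were a backreference here, "zz" could not match, because the
# second character repeats the first. It does match, while "zs" and "zk"
# do not, because the class excludes the literal characters k, <, g, s, >.
p TWO_GROUPS.match?("zz") # => true
p TWO_GROUPS.match?("zs") # => false
p TWO_GROUPS.match?("zk") # => false

# All overlapping windows of the given size whose letters are distinct.
def unique_windows(str, size)
  str.chars.each_cons(size).select { |w| w.uniq.size == size }.map(&:join)
end

p unique_windows("archipelago", 9) # => ["rchipelag", "chipelago"]
```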

Generate a range of special characters with ruby

I'm very new to ruby at the moment but I came from a PHP background and must say that I enjoy doing ruby, a lot. It's a really nice language and the community is strict but helpful.
Today I was looking at stackoverflow and checked one of my answers to a question to generate a random string using PHP. I had actually written a script for this so I thought, why not share it!
This script has some modifiers which allow you to choose whether you want to include the following sets:
1. lowercase a-z
2. [1] + uppercase A-Z
3. [1, 2] + numbers
4. [1, 2, 3] + special characters
5. [1, 2, 3, 4] + some crazy voodooh characters
So in this PHP script I physically typed each set into an array e.g.:
$charSubSets = array(
'abcdefghijklmnopqrstuvwxyz',
'ABCDEFGHIJKLMNOPQRSTUVWXYZ',
'0123456789',
'!@#$%^&*()_+{}|:">?<[]\\\';,.`~',
'µñ©æáßðøäåé®þüúíóö'
);
and this was basically my way of being able to define complexity right there.
Now this looks alright, even in the code but ruby has ranges and ranges are something new and shiny for me to play with so I was thinking of building a random string generator later today just to get some more experience with it.
Now for my question. I know that you can build ranges such as:
'a'..'z'
'A'..'Z'
0..9
and so on. But I was thinking: could you also make a range of special characters, as in only special characters? And if that is possible, could you do the same for the crazy voodooh characters?
The reason I'm asking is because there is no example in the docs or anything on SO explaining this.

Check out Range#to_a which is gotten from Enumerable. Note that on the left hand side of the docs it says that Range includes Enumerable, which means that the methods in Enumerable can be called on Ranges. If you can't find a method in a class, see what modules the docs say are included and click on the link to the included module.
Check out Array#shuffle.
Check out Array#join
Check out Array#[], which will take a range as a subscript, so you can take a slice of an array of random characters.
A two dot Range includes the end. A three dot Range doesn't include the end:
p (1...5).to_a #=> [1, 2, 3, 4]
Putting it all together:
chars = (0..9).to_a + ('A'..'z').to_a + ('!'..'?').to_a
10.times do
puts chars.shuffle[0..5].join
end
--output:--
I(m,E.
%_;i(3
rb=_ef
kJrA9n
YA`e.K
89qCji
Ba1x3D
acp)=8
2paq3I
U0>Znm
(Shakespeare will appear there eventually.)

Yes - this is certainly possible.
Fire up your console e.g. irb or pry.
1. for the special characters:
('!'..'?').to_a
# => [
# "!", "\"", "#", "$", "%", "&", "'", "(", ")", "*", "+", ",", "-",
# ".", "/", "0", "1", "2", "3", "4", "5", "6", "7", "8", "9", ":",
# ";", "<", "=", ">", "?"
# ]
2. for the 'voodooh' characters:
('µ'..'ö').to_a
# => [
# "µ", "¶", "·", "¸", "¹", "º", "»", "¼", "½", "¾", "¿", "À", "Á",
# "Â", "Ã", "Ä", "Å", "Æ", "Ç", "È", "É", "Ê", "Ë", "Ì", "Í", "Î",
# "Ï", "Ð", "Ñ", "Ò", "Ó", "Ô", "Õ", "Ö"
# ]
This is trivial to try yourself. The character codes of the start and end characters (not the position of the keys on your keyboard) define which characters come in between. If I picked ~ instead of ? for the end, it would look like this:
('!'..'~').to_a
# => [
# "!", "\"", "#", "$", "%", "&", "'", "(", ")", "*", "+", ",",
# "-", ".", "/", "0", "1", "2", "3", "4", "5", "6", "7", "8", "9",
# ":", ";", "<", "=", ">", "?", "@", "A", "B", "C", "D", "E", "F",
# "G", "H", "I", "J", "K", "L", "M", "N", "O", "P", "Q", "R", "S",
# "T", "U", "V", "W", "X", "Y", "Z", "[", "\\", "]", "^", "_", "`", "a",
# "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n",
# "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z", "{",
# "|", "}", "~"
# ]
Basically, if character A is 65 and Z is 90, then all characters in between, like B which is 66, will be included. It works like that for anything you put in a range, and since in Ruby everything is an object, you can use anything in a range as long as it implements certain methods (<=> for comparison and succ for stepping through the values), as explained by the docs!
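To make the ordering concrete, here is a small sketch (variable names are only illustrative) that compares a string range with the same span built from character codes, then filters letters and digits out of the printable ASCII range to leave only special characters:

```ruby
start_char = "!"
end_char   = "/"

by_range      = (start_char..end_char).to_a
by_code_point = (start_char.ord..end_char.ord).map(&:chr)

p "A".ord                   # => 65
p by_range == by_code_point # => true

# Printable ASCII without letters and digits: special characters only.
specials = ("!".."~").reject { |c| c.match?(/[a-zA-Z0-9]/) }
p specials.size             # => 32
puts specials.join
```

Filtering a wide range like this avoids typing every special character by hand while still excluding the digits that a range such as '!'..'?' would include.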
EDIT (13-11-2015)
After doing some playing around in my console I came to this solution which "mimics" the given PHP example and perhaps even completes it.
def rng(length = 10, complexity = 4)
subsets = [("a".."z"), ("A".."Z"), (0..9), ("!".."?"), ("µ".."ö")]
chars = subsets[0..complexity].map { |subset| subset.to_a }.flatten
# => [
# "a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l",
# "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x",
# "y", "z", "A", "B", "C", "D", "E", "F", "G", "H", "I", "J",
# "K", "L", "M", "N", "O", "P", "Q", "R", "S", "T", "U", "V",
# "W", "X", "Y", "Z", 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, "!", "\"",
# "#", "$", "%", "&", "'", "(", ")", "*", "+", ",", "-", ".",
# "/", "0", "1", "2", "3", "4", "5", "6", "7", "8", "9", ":",
# ";", "<", "=", ">", "?", "µ", "¶", "·", "¸", "¹", "º", "»",
# "¼", "½", "¾", "¿", "À", "Á", "Â", "Ã", "Ä", "Å", "Æ", "Ç",
# "È", "É", "Ê", "Ë", "Ì", "Í", "Î", "Ï", "Ð", "Ñ", "Ò", "Ó",
# "Ô", "Õ", "Ö"
# ]
chars.sample(length).join
end
Now calling rng will produce results like this:
rng # => "·boÇE»Ñ¼Á¸"
rng(10, 2) # => "nyLYAsxJi9"
rng(20, 2) # => "EOcQdjZa0t36xCN8TkoX"
EDIT#2 (14-05-2020)
As pointed out below in the comments, I did not even provide a documentation link to the relevant concept, in Ruby this is called a Range and can be found here (2.5.0).
If you need docs for your specific version, try googling for ruby range [your ruby version]. You can find out what your version is by running ruby -v in the terminal. Happy rubying :D
all dates are in dd-mm-yyyy format

Regex to check alphanumeric string in ruby

I am trying to validate strings in ruby.
Any string which contains spaces, underscores or any special character should fail validation.
A valid string should contain only the characters a-zA-Z0-9.
My code looks like this:
def validate(string)
regex ="/[^a-zA-Z0-9]$/
if(string =~ regex)
return "true"
else
return "false"
end
I am getting error:
TypeError: type mismatch: String given.
Can anyone please let me know what is the correct way of doing this?

If you are validating a line:
def validate(string)
!string.match(/\A[a-zA-Z0-9]*\z/).nil?
end
No need for return on each.
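Two details are worth spelling out. The pattern in the question, /[^a-zA-Z0-9]$/, only inspects the last character of a line, and ^ and $ anchor at line boundaries whereas \A and \z anchor at the start and end of the whole string. A short sketch of the difference (the method names are illustrative, and match? assumes Ruby 2.4 or later):

```ruby
# Anchored to the whole string: every character must be alphanumeric.
def valid_whole_string?(string)
  string.match?(/\A[a-zA-Z0-9]*\z/)
end

# Anchored to lines: passes as soon as any single line is alphanumeric.
def valid_any_line?(string)
  string.match?(/^[a-zA-Z0-9]*$/)
end

multi_line = "abc\n!!!"
p valid_whole_string?("abc123")   # => true
p valid_whole_string?(multi_line) # => false
p valid_any_line?(multi_line)     # => true, because the first line passes
```

Note that with * the empty string also counts as valid; use + instead if at least one character is required.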

You can just check if a special character is present in the string.
def validate str
chars = ('a'..'z').to_a + ('A'..'Z').to_a + ('0'..'9').to_a
str.chars.detect {|ch| !chars.include?(ch)}.nil?
end
Result:
irb(main):005:0> validate "hello"
=> true
irb(main):006:0> validate "_90 "
=> false

def alpha_numeric?(char)
if (char =~ /[[:alpha:]]/ || char =~ /[[:digit:]]/)
true
else
false
end
end
OR
def alpha_numeric?(char)
if (char =~ /[[:alnum:]]/)
true
else
false
end
end
We are using regular expressions that match letters & digits:
The above [[:alpha:]] ,[[:digit:]] and [[:alnum:]] are POSIX bracket expressions, and they have the advantage of matching Unicode characters in their category. Hope this helps.
Check out the link below for more options:
Ruby: How to find out if a character is a letter or a digit?
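A quick way to see the Unicode behaviour described above, assuming UTF-8 strings (the sample characters are arbitrary):

```ruby
e_acute      = "\u00e9" # é
arabic_three = "\u0663" # Arabic-Indic digit three

p e_acute.match?(/[[:alpha:]]/)      # => true
p e_acute.match?(/[a-zA-Z]/)         # => false
p arabic_three.match?(/[[:digit:]]/) # => true
p arabic_three.match?(/\d/)          # => false
p "_".match?(/[[:alnum:]]/)          # => false
```

If only ASCII a-zA-Z0-9 should pass, as in the question, the POSIX brackets are therefore more permissive than an explicit character class.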

No regex:
def validate(str)
str.count("^a-zA-Z0-9").zero? # ^ means "not"
end

Great answers above but just FYI, your error message is because you started your regex with a double quote ". You'll notice you have an odd number (5) of double quotes in your method.
Additionally, it's likely you want to return true and false as values rather than as quoted strings.

Similar to the very efficient regex-ish approach mentioned already by @steenslag and nearly as fast:
str.tr("a-zA-Z0-9", "").length.zero?
OR
str.tr("a-zA-Z0-9", "").empty?
One benefit of using tr though is that you could also optionally analyze the results using the same basic formula:
str = "ABCxyz*123$"
rejected_chars = str.tr("a-zA-Z0-9", "")
#=> *$
is_valid = rejected_chars.length.zero?
#=> false
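The non-regex variants can be cross-checked against the regex one. The sketch below (the method names and sample strings are hypothetical) runs the count, tr and match? approaches over a few inputs and prints the three results side by side:

```ruby
ALLOWED = "a-zA-Z0-9"

# Counts characters outside the allowed set; "^" negates the set.
def valid_by_count?(str)
  str.count("^#{ALLOWED}").zero?
end

# Deletes allowed characters; anything left over is a rejected character.
def valid_by_tr?(str)
  str.tr(ALLOWED, "").empty?
end

# Looks for any single character outside the allowed set.
def valid_by_regex?(str)
  !str.match?(/[^a-zA-Z0-9]/)
end

samples = ["ABCxyz123", "ABCxyz*123$", "with space", "under_score", ""]
samples.each do |s|
  results = [valid_by_count?(s), valid_by_tr?(s), valid_by_regex?(s)]
  puts "#{s.inspect}: #{results.inspect}"
end
```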

Similar to @rohit89:
VALID_CHARS = [*?a..?z, *?A..?Z, *'0'..'9']
#=> ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m",
# "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z",
# "A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M",
# "N", "O", "P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z",
# "0", "1", "2", "3", "4", "5", "6", "7", "8", "9"]
def all_valid_chars?(str)
a = str.chars
(a - VALID_CHARS).empty?
end
all_valid_chars?('a9Z3') #=> true
all_valid_chars?('a9 Z3') #=> false

Use .match? in Ruby 2.4+.
Ruby 2.4 introduced a convenient boolean-returning .match? method.
In your case, I would do something like this:
# Checks for any characters other than letters and numbers.
# Returns true if there are none. Returns false if there are one or more.
#
def valid?( string )
!string.match?( /[^a-zA-Z0-9]/ ) # NOTE: ^ inside [] set turns it into a negated set.
end

Ruby string split into words ignoring all special characters: Simpler query

I need a query to be split into words everywhere a non word character is used. For example:
query = "I am a great, boy's and I like! to have: a lot-of-fun and #do$$nice&acti*vities+enjoy good ?times."
Should output:
["I", "am", "a", "great", "", "boy", "s", "and", "I", "like", "", "to", "have", "", "a", "lot", "of", "fun", "and", "", "do", "", "nice", "acti", "vities", "enjoy", "good", "", "times"]
This does the trick but is there a simpler way?
query.split(/[ ,'!:\\#\\$\\&\\*+?.-]/)

query.split(/\W+/)
# => ["I", "am", "a", "great", "boy", "s", "and", "I", "like", "to", "have", "a", "lot", "of", "fun", "and", "do", "nice", "acti", "vities", "enjoy", "good", "times"]
query.scan(/\w+/)
# => ["I", "am", "a", "great", "boy", "s", "and", "I", "like", "to", "have", "a", "lot", "of", "fun", "and", "do", "nice", "acti", "vities", "enjoy", "good", "times"]
This is different from the expected output in that it does not include empty strings.

I am adding this answer as @sawa's did not exactly reproduce the desired output:
#Split using any single non-word character:
query.split(/\W/) #=> ["I", "am", "a", "great", "", "boy", "s", "and", "I", "like", "", "to", "have", "", "a", "lot", "of", "fun", "and", "", "do", "", "nice", "acti", "vities", "enjoy", "good", "", "times"]
Now if you do not want the empty strings in the result just use sawa's answer.
The approach above will create many empty strings in the result if the string contains multiple consecutive spaces, as each extra space will be matched again and create a new splitting point. To avoid that we can add an or condition:
# Split using any number of spaces or a single non-word character:
query.split(/\s+|\W/)
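The three patterns differ only in how they treat runs of separators. A small sketch on an input with a comma followed by two spaces (the sample text is arbitrary):

```ruby
text = "a,  b"

p text.split(/\W+/)    # => ["a", "b"]
p text.split(/\W/)     # => ["a", "", "", "b"]
p text.split(/\s+|\W/) # => ["a", "", "b"]
```

With /\s+|\W/ the comma still produces one empty string, but the run of spaces counts as a single separator.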
