How do I break up a string to single character strings in Scheme? - scheme

My input is a string and I need to be able to analyze it lexically and I'd like to know how to decompose it into single character strings and analyze each character individually.

http://docs.racket-lang.org/reference/strings.html
Example:
> (string->list "Apple")
'(#\A #\p #\p #\l #\e)

Related

What does ? mean in (insert "hello" ?\s "world" ?\n)

The functions insert's example is demonstrated as:
(with-temp-buffer
(insert "hello" ?\s "world" ?\n)
(buffer-string))
What does ? mean here?
?\s and ?\n are read syntax for characters. The first is read syntax for the space character; the second for the character NEWLINE (also called Control J, and which sometimes appears as ^J, where the ^J is a newline character.
In Emacs Lisp characters are positive integers. They can be written as numerals for their ASCII codes (values) and they can be read by the Lisp reader when written that way: 32 (for \s) and 10 (for \n). But Elisp also offers a character read syntax introduced by ?, and these two characters have their own ? syntax. Other, more ordinary characters can just be prefixed by ?: ?s is read as the character s, and ?# is read as the character #.
The ? read syntax is more readable for humans than using just integers: 115 (for ?s) and 64 for ?#. But you can always use just the integers -- characters are integers.
Your example: (insert "hello" ?\s "world" ?\n). Function insert accepts strings and chars as its arguments. In this case you pass string "hello", character ?\s (space char), string "world", and character ?\n (newline char, Control J).
See the Elisp manual, node Basic Char Syntax.

Emacs Lisp: getting ascii value of character

I'd like to translate a character in Emacs to its numeric ascii code, similar to casting char a = 'a'; int i = (int)a in c. I've tried string-to-number and a few other functions, but none seem to make Emacs read the char as a number in the end.
What's the easiest way to do this?
To get the ascii-number which represents the character --as Drew said-- put a question mark before the character and evaluate that expression
?a ==> 97
Number appears in minibuffer, with C-u it's written behind expression.
Also the inverse works
(insert 97) will insert an "a" in the buffer.
BTW In some cases the character should be quoted
?\" will eval to 34
A character is a whole number in Emacs Lisp. There is no separate character data type.
Function string-to-char is built-in, and does what you want. (string-to-char "foo") is equivalent to (aref "foo" 0), which is #abo-abo's answer --- but it is coded in C.
String is an array.
(aref "foo" 0)

String gsub - Replace characters between two elements, but leave surrounding elements

Suppose I have the following string:
mystring = "start/abc123/end"
How can you splice out the abc123 with something else, while leaving the "/start/" and "/end" elements intact?
I had the following to match for the pattern, but it replaces the entire string. I was hoping to just have it replace the abc123 with 123abc.
mystring.gsub(/start\/(.*)\/end/,"123abc") #=> "123abc"
Edit: The characters between the start & end elements can be any combination of alphanumeric characters, I changed my example to reflect this.
You can do it using this character class : [^\/] (all that is not a slash) and lookarounds
mystring.gsub(/(?<=start\/)[^\/]+(?=\/end)/,"7")
For your example, you could perhaps use:
mystring.gsub(/\/(.*?)\//,"/7/")
This will match the two slashes between the string you're replacing and putting them back in the substitution.
Alternatively, you could capture the pieces of the string you want to keep and interpolate them around your replacement, this turns out to be much more readable than lookaheads/lookbehinds:
irb(main):010:0> mystring.gsub(/(start)\/.*\/(end)/, "\\1/7/\\2")
=> "start/7/end"
\\1 and \\2 here refer to the numbered captures inside of your regular expression.
The problem is that you're replacing the entire matched string, "start/8/end", with "7". You need to include the matched characters you want to persist:
mystring.gsub(/start\/(.*)\/end/, "start/7/end")
Alternatively, just match the digits:
mystring.gsub(/\d+/, "7")
You can do this by grouping the start and end elements in the regular expression and then referring to these groups in in the substitution string:
mystring.gsub(/(?<start>start\/).*(?<end>\/end)/, "\\<start>7\\<end>")

How do I match repeated characters?

How do I find repeated characters using a regular expression?
If I have aaabbab, I would like to match only characters which have three repetitions:
aaa
Try string.scan(/((.)\2{2,})/).map(&:first), where string is your string of characters.
The way this works is that it looks for any character and captures it (the dot), then matches repeats of that character (the \2 backreference) 2 or more times (the {2,} range means "anywhere between 2 and infinity times"). Scan will return an array of arrays, so we map the first matches out of it to get the desired results.

ruby parametrized regular expression

I have a string like "{some|words|are|here}" or "{another|set|of|words}"
So in general the string consists of an opening curly bracket,words delimited by a pipe and a closing curly bracket.
What is the most efficient way to get the selected word of that string ?
I would like do something like this:
#my_string = "{this|is|a|test|case}"
#my_string.get_column(0) # => "this"
#my_string.get_column(2) # => "is"
#my_string.get_column(4) # => "case"
What should the method get_column contain ?
So this is the solution I like right now:
class String
def get_column(n)
self =~ /\A\{(?:\w*\|){#{n}}(\w*)(?:\|\w*)*\}\Z/ && $1
end
end
We use a regular expression to make sure that the string is of the correct format, while simultaneously grabbing the correct column.
Explanation of regex:
\A is the beginnning of the string and \Z is the end, so this regex matches the enitre string.
Since curly braces have a special meaning we escape them as \{ and \} to match the curly braces at the beginning and end of the string.
next, we want to skip the first n columns - we don't care about them.
A previous column is some number of letters followed by a vertical bar, so we use the standard \w to match a word-like character (includes numbers and underscore, but why not) and * to match any number of them. Vertical bar has a special meaning, so we have to escape it as \|. Since we want to group this, we enclose it all inside non-capturing parens (?:\w*\|) (the ?: makes it non-capturing).
Now we have n of the previous columns, so we tell the regex to match the column pattern n times using the count regex - just put a number in curly braces after a pattern. We use standard string substition, so we just put in {#{n}} to mean "match the previous pattern exactly n times.
the first non skipped column after that is the one we care about, so we put that in capturing parens: (\w*)
then we skip the rest of the columns, if any exist: (?:\|\w*)*.
Capturing the column puts it into $1, so we return that value if the regex matched. If not, we return nil, since this String has no nth column.
In general, if you wanted to have more than just words in your columns (like "{a phrase or two|don't forget about punctuation!|maybe some longer strings that have\na newline or two?}"), then just replace all the \w in the regex with [^|{}] so you can have each column contain anything except a curly-brace or a vertical bar.
Here's my previous solution
class String
def get_column(n)
raise "not a column string" unless self =~ /\A\{\w*(?:\|\w*)*\}\Z/
self[1 .. -2].split('|')[n]
end
end
We use a similar regex to make sure the String contains a set of columns or raise an error. Then we strip the curly braces from the front and back (using self[1 .. -2] to limit to the substring starting at the first character and ending at the next to last), split the columns using the pipe character (using .split('|') to create an array of columns), and then find the n'th column (using standard Array lookup with [n]).
I just figured as long as I was using the regex to verify the string, I might as well use it to capture the column.

Resources