CL Format recipe: Dealing with nil as a value - format

I've been looking through formatting recipes, and I can't quite find what I'm looking for...
(format nil CONTROL-STRING day name num-apples)
Suppose I don't want to change the arguments in the above form, just CONTROL-STRING.
day and num-apples will always be non-nil, but name might be nil.
When name is nil, I want the output to look like
"Today is Monday. Hello, you have 3 apples."
but when name is defined, I want it to look like
"Today is Monday. Hello Adam, you have 3 apples."
So the control string needs to look at name, use it in the non-nil case, not use it in the nil case, but consume it in both cases.
Maybe it can be accomplished by consuming nil and printing it as ""? If so I don't know how to do that.

The question you linked to, Lisp format directive that interprets nil argument to empty string instead of "NIL", does include an answer that shows how you can do this, but doesn't cite any of the documentation. Since you're generating English text, there are also a few other options that you might want to consider as well.
First, with ~@[consequent~], you can process the consequent format directive only when the argument is non-nil. In that case ~@[ doesn't consume the argument, so it's still available to the consequent. In general,
22.3.7.2 Tilde Left-Bracket: Conditional Expression describes lots of options, but about ~@[ it says:
~@[consequent~] tests the argument. If it is true, then the argument
is not used up by the ~[ command but remains as the next one to be
processed, and the one clause consequent is processed. If the arg is
false, then the argument is used up, and the clause is not processed.
The clause therefore should normally use exactly one argument, and may
expect it to be non-nil.
You can use this as follows:
(defun test (day name n-apples)
  (format nil "Today is ~a. Hello~@[ ~a~], you have ~a apples."
          day name n-apples))
CL-USER> (test 'monday 'adam 2)
"Today is MONDAY. Hello ADAM, you have 2 apples."
CL-USER> (test 'tuesday nil 42)
"Today is TUESDAY. Hello, you have 42 apples."
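To make this even more robust, consider using ~P for pluralization, so that you get "1 apple" and "3 apples". The ~:P variant used below first backs up one argument, so it tests n-apples, the number that ~a has just printed.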
(defun test (day name n-apples)
  (format nil "Today is ~a. Hello~@[ ~a~], you have ~a apple~:P."
          day name n-apples))
CL-USER> (test 'monday 'john 2)
"Today is MONDAY. Hello JOHN, you have 2 apples."
CL-USER> (test 'tuesday 'john 1)
"Today is TUESDAY. Hello JOHN, you have 1 apple."
CL-USER> (test 'wednesday nil 0)
"Today is WEDNESDAY. Hello, you have 0 apples."
Finally, since you're generating text, you might appreciate some case normalization (e.g., print proper nouns with initial capitals), and writing the numbers in text:
(defun test (day name n-apples)
  (format nil "Today is ~:(~a~). Hello~@[ ~:(~a~)~], you have ~r apple~:P."
          day name n-apples))
CL-USER> (list
(test 'monday 'adam 4)
(test 'tuesday 'john 1)
(test 'wednesday 'mary\ sue 42)
(test 'thursday 'jim-bob 0))
("Today is Monday. Hello Adam, you have four apples."
"Today is Tuesday. Hello John, you have one apple."
"Today is Wednesday. Hello Mary Sue, you have forty-two apples."
"Today is Thursday. Hello Jim-Bob, you have zero apples.")

Related

Why does `puts(nil or 4)` fail in Ruby?

When I do:
puts(nil or 4)
Ruby complains:
SyntaxError: syntax error, unexpected keyword_or, expecting ')'
Why is that? puts(nil || 4) does work, but I wonder why or doesn't. I thought the difference between the two was only in their operator precedence.
(I know the expression nil or 4 doesn't seem useful, as it always returns 4. It's just an example, for simplicity's sake. My actual expression is Integer(ENV['WD'] or 4).)
Short answer
Because that's how Ruby syntax is.
Longer answer
The "and"/"or" keywords were designed for control-flow constructs. Consider this example:
def die(msg)
  puts "Exited with: #{msg}"
end
def do_something_with(arg)
  puts arg
end
do_something_with 'foo' or die 'unknown error'
# >> foo
# >> Exited with: unknown error
Here "or" works nicely with Ruby's optional parentheses, because of Ruby's parsing rules (pseudo-BNF).
In short, an argument list (CALL_ARGS) is a list of ARGs separated by commas. Almost anything is an ARG (class definitions, for example, by being a PRIMARY), but an unadorned EXPR is not. If you surround an expression with parentheses, it matches the rule for a "compound statement" and is therefore a PRIMARY, which is an ARG. This means that:
puts( (nil or 4) ) # will work, compound statement as first argument
puts (nil or 4) # same as above, omitted optional method call parentheses
puts(nil or 4) # will not work, EXPR can't be an argument
puts nil or 4 # will work as `puts(nil) or 4`
You can read the grammar referenced above to understand exactly how it works.
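This sketch ties the grammar back to the question's real expression, Integer(ENV['WD'] or 4), and shows the two forms that do parse. The method names and the env parameter are hypothetical. The parameter exists only so the fallback can be tried with a plain Hash instead of the real ENV.

```ruby
# Extra parentheses turn `env['WD'] or 4` into a compound statement (a PRIMARY),
# which is a valid ARG for Integer(...).
def wd_with_or(env = ENV)
  Integer((env['WD'] or 4))
end

# `||` is allowed directly inside an ARG, so no extra parentheses are needed.
def wd_with_pipes(env = ENV)
  Integer(env['WD'] || 4)
end
```

Both return 4 when WD is unset and the parsed integer when it is set. The `||` version is the more common idiom for this kind of default.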
BONUS: Example of class definition being a valid ARG
puts class Foo
  def bar
    puts "hello"
  end
end, 'second argument'
# >> bar # this is the "value" of the class definition
# >> second argument
It is because "or" and "and" have lower precedence than the method call. Your expression is interpreted as:
{puts(nil} or {4)}
where {} stands for grouping. The syntax error comes from the expression
puts(nil
(and the following will also raise a syntax error):
4)
If you force grouping by putting a pair of parentheses around the expression, then it will work the way you intended:
puts((nil or 4))
Notice that the outer pair of parentheses is used for the method call, not for grouping, so a single pair of parentheses does not change the grouping.
Alternatively, if you disambiguate a single pair of parentheses to be used for grouping by putting a space, then that will work too:
puts (nil or 4)
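The same low precedence also shows up outside argument lists. Here is a minimal sketch, with hypothetical method names, of how assignment interacts with "or" versus "||":

```ruby
def assign_with_or
  value = nil or 4   # parsed as (value = nil) or 4, so value stays nil
  value
end

def assign_with_pipes
  value = nil || 4   # parsed as value = (nil || 4)
  value
end

def grouped_or
  (nil or 4)         # explicit grouping: the whole `or` expression is the value
end
```

This is why "or" tends to be reserved for control flow, as in the do_something_with ... or die example.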
@Sergio Tulentsev (and @sawa) gave good answers, but I want to rephrase them so I can understand them quickly in the future:
Ruby lets us drop parentheses in function calls. That is, instead of:
func1(ARG, ARG, ARG) or func2(ARG, ARG, ARG)
We can do:
func1 ARG, ARG, ARG or func2 ARG, ARG, ARG
However, for this last line to behave like the first one, "or" can't be an operator at the top level of an ARG (otherwise that last line would be interpreted as func1(ARG, ARG, ARG or func2 ARG, ARG, ARG)). Indeed, the BNF shows that ARG doesn't directly mention "or"/"and", which means they're illegal there.
But ARG still makes it possible to use "or": by wrapping the expression in parentheses. In the BNF we see this as the PRIMARY alternative that ARG can branch to (as PRIMARY, in its turn, branches to '(' COMPSTMT ')').
Now, as to why func (1 or 2) and func((1 or 2)) work whereas func(1 or 2) doesn't:
func(1 or 2) is what the BNF calls FUNCTION, which expands to OPERATION ['(' [CALL_ARGS] ')'], which means the ARG is "1 or 2", but, as we've seen, ARG can't contain "or", so it's invalid.
func((1 or 2)) is, again, OPERATION ['(' [CALL_ARGS] ')'], but here the ARG is "(1 or 2)", which is a valid ARG (see PRIMARY mentioned above).
func (1 or 2) is what the BNF calls COMMAND, which expands to OPERATION CALL_ARGS, which means the ARG is "(1 or 2)", which is a valid ARG (see PRIMARY mentioned above).

Why does the output from my map/regex block not capitalize?

I'm working through the Test First Ruby Master problems. My code for 08/book_titles is this:
class Book
  attr_accessor :title
  def title
    if @title.include?(' ')
      correct = @title.split.each_with_index.map {|x, index| ((x =~ /^a|an|of|the|or|in|and$/) && index != 0) ? x : x.capitalize}
      correct.join(' ')
      # this is throwing a weird error, the code looks right but isn't capitalizing last word (returns 'To Kill a mockingbird')
    else
      @title.capitalize
    end
  end
end
I tested the map portion separately, and it works fine. But when the whole thing runs, it does not capitalize as it should. RSpec reports:
1) Book title should capitalize every word except... articles a
Failure/Error: expect(@book.title).to eq("To Kill a Mockingbird")
expected: "To Kill a Mockingbird"
got: "To Kill a mockingbird"
Anyone know why?
I originally didn't include ^/$ in the regex. I got the same error with a different title, and adding those anchors fixed it for that case. But then the error showed up again with this title.
Because "mockingbird" contains "in":
('mockingbird' =~ /^a|an|of|the|or|in|and$/) => 4
I think you want this regex:
/^a$|^an$|^of$|^the$|^or$|^in$|^and$/
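Dropping that regex into the question's class gives roughly the sketch below. A few liberties are assumptions, not part of the original. The pattern lives in a hypothetical LITTLE_WORDS constant. attr_writer replaces attr_accessor so the title reader isn't defined twice. The if/else on include?(' ') is gone, because a one-word title is simply capitalized at index 0.

```ruby
class Book
  # attr_writer instead of attr_accessor, since the title reader is written by hand below
  attr_writer :title

  # Anchored alternatives: each one must match the whole word
  LITTLE_WORDS = /^a$|^an$|^of$|^the$|^or$|^in$|^and$/

  def title
    @title.split.each_with_index.map { |word, index|
      # keep a little word as-is unless it is the first word
      (word =~ LITTLE_WORDS && index != 0) ? word : word.capitalize
    }.join(' ')
  end
end
```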
It is not necessary to break the string into words, modify the words and join them back into a string. In fact, doing that has the disadvantage that spacing between words may be altered. Here's one way of operating on the string directly.
wee_words = ["a", "an", "of", "the", "or", "in", "and"]
str = "a dAy in the life of waltEr mITTY"
str.capitalize.gsub(/\w+/) { |s| wee_words.include?(s) ? s : s.capitalize }
#=> "A Day in the Life of Walter Mitty"
str.capitalize upcases the first letter of the string and downcases all subsequent letters. As a result, the first word will never be treated as a wee_word, since it is capitalized (e.g., wee_words.include?("The") #=> false).
The regex is slightly incorrect. As written, it reads like this:
Match any string that
starts with 'a'
or contains 'an'
or contains 'of'
or contains 'the'
or contains 'or'
or contains 'in'
or ends in 'and'
What you really seem to want is something that reads like this:
Match any string that
only contains any of 'a', 'an', 'of', 'the', 'or', 'in', 'and'
To get this, you want your regex to be written like this:
/^(a|an|of|the|or|in|and)$/
Note the parentheses around the alternation. (Alternation is the formal term for multiple choices in a regex, where the choices are separated by '|'.)
If you're comparing against book or movie titles, this is much closer to the kind of match you'd expect. It matches correctly for titles such as "Chariots of Fire" and "Benny and Joon", and it no longer falsely matches the 'in' inside "Mockingbird" in "To Kill a Mockingbird", which is a significant improvement.
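A quick way to see the difference is to run both patterns over a few words. BROKEN and FIXED are just illustrative names for the question's regex and the grouped version:

```ruby
BROKEN = /^a|an|of|the|or|in|and$/     # question's pattern: anchors bind to only the first and last choice
FIXED  = /^(a|an|of|the|or|in|and)$/   # grouped: the anchors apply to every choice

%w[mockingbird band often in of].each do |word|
  # =~ returns the index of the first match, or nil when there is none
  puts "#{word}: broken=#{(word =~ BROKEN).inspect} fixed=#{(word =~ FIXED).inspect}"
end
```

BROKEN matches all five words, at indexes 4, 1, 0, 0 and 0. FIXED matches only "in" and "of".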
However, it still won't quite work yet on something like "Benny AND Joon", because 'AND' is uppercase in this title (assuming that incoming titles may be arbitrarily mixed case). One last change will do it:
/^(a|an|of|the|or|in|and)$/i
That last letter 'i' at the end of the regex says to 'ignore case', so that matches can occur regardless of whether the 'AND' is uppercase, lowercase, or mixed case.
This should get you close to what you're trying to achieve and handle a few bumpy use cases in the process.
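With /i, a word like "AND" is recognized but left as "AND". If you also want small words lowercased, one option is to downcase them instead of leaving them untouched. The sketch below assumes that behavior; titleize and SMALL_WORDS are hypothetical names:

```ruby
# Grouped, anchored, case-insensitive pattern from the answer above
SMALL_WORDS = /^(a|an|of|the|or|in|and)$/i

def titleize(title)
  title.split.each_with_index.map { |word, index|
    if index != 0 && word =~ SMALL_WORDS
      word.downcase      # "AND" becomes "and"
    else
      word.capitalize    # "mOCKINGBIRD" becomes "Mockingbird"
    end
  }.join(' ')
end
```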

Why does single quote in Lisp always return upper case?

I'd like to be able to set case from a single quote, but that does not seem possible.
(format nil "The value is: ~a" 'foo)
"The value is: FOO"
(format nil "The value is: ~a" 'FOO)
"The value is: FOO"
(format nil "The value is: ~a" "Foo")
"The value is: Foo"
Quoting
The quote has nothing to do with case. A quote prevents evaluation.
quoting a symbol:
CL-USER 1 > 'foo
FOO
quoting a list:
CL-USER 2 > '(1 2 3 foo)
(1 2 3 FOO)
You can put a quote in front of many things. For example in front of a string:
CL-USER 3 > '"a b c"
"a b c"
Since strings evaluate to themselves, quoting them or not makes no difference:
CL-USER 4 > "a b c"
"a b c"
Symbols are by default read as uppercase:
CL-USER 5 > 'FooBar
FOOBAR
CL-USER 6 > (symbol-name 'FooBar)
"FOOBAR"
But that has nothing to do with quoting and is a feature of the reader.
CL-USER 7 > (read-from-string "foo")
FOO
3
Downcase
If you want the string in lowercase, you need to convert the string to lowercase:
CL-USER 8 > (string-downcase (symbol-name 'FooBar))
"foobar"
Symbols with mixed case
But you can create symbols with lowercase names or mixed case. You need to escape them:
CL-USER 9 > '|This is a symbol With spaces and mixed case|
|This is a symbol With spaces and mixed case|
CL-USER 10 > 'F\o\oB\a\r
|FooBar|
Downcasing output using FORMAT
You can also tell FORMAT to print in lowercase:
CL-USER 11 > (format nil "The value is: ~(~a~)" 'foo)
"The value is: foo"
'foo means "suppress the evaluation of the symbol FOO, leaving only the symbol FOO". Common Lisp tends towards upcasing symbol names by default (so the symbols expressed as 'foo, 'Foo and 'FOO are all the same symbol, with the symbol name "FOO").
To see exactly what your implementation will do, check the readtable case of the current readtable by calling (readtable-case *readtable*); see CLHS section 23.1.2, Effect of Readtable Case on the Lisp Reader.
Some Lisp implementations start with a readtable case of :preserve.
As for whether you should use symbols or strings, it's one of those "it depends". If you're not worried about case preservation, using interned symbols gives you less storage and quicker comparison, at the (possible) price of case-mangling. But if case is important, the balance is probably further towards the "use strings throughout" end of the scale.
For understanding what's going on, see Rainer Joswig's answer. Just one thing to add: you can control the case of a printed symbol (without vertical bar syntax) with *print-case*:
CL-USER 1 > (let ((*print-case* :downcase))
(format nil "The value is: ~a" 'foo))
"The value is: foo"
CL-USER 2 > (let ((*print-case* :capitalize))
(format nil "The value is: ~a" 'foo))
"The value is: Foo"
CL-USER 3 > (let ((*print-case* :upcase)) ; default
(format nil "The value is: ~a" 'foo))
"The value is: FOO"
Lisp symbols like 'a are effectively case-insensitive when typed in, because the reader upcases unescaped symbol names by default. You can check that by doing...
(eq 'a 'A)
They are the same symbol.
If you need to differentiate case, you should use strings or characters, as appropriate.
In order to make format print symbols in a particular case, you can set the *print-case* global variable to :upcase, :downcase or :capitalize depending on your needs.
OK, this works:
(format nil "The value is: ~a" (string-downcase 'foo))
"The value is: foo"
Even better (from Rainer)
(format nil "The value is: ~(~a~)" 'foo)
I still think one shouldn't use 'foo instead of "foo" if the intent is to represent a string.

Ruby extract substring from an array of strings

I have an array of strings.
irb(main):009:0* str_arr
=> ["hello how are you?", "I am fine.What are you doing?", "Hey, I am having a haircut. See you at Hotel KingsMen at 10 am."]
And I am trying to extract some info from it. The name of Hotel and the time.
irb(main):010:0> q = str_arr[2].scan(/(.*)Hotel(.*)at(.*)\./)
=> [["Hey, I am having a haircut. See you at ", " KingsMen ", " 10 am"]]
The thing is I cannot fix the index at 2. I need something like this:
irb(main):023:0> str_arr.each { |str| $res = str.scan(/(.*)Hotel(.*)at(.*)\./) }
=> ["hello how are you?", "I am fine.What are you doing?", "Hey, I am having a haircut. See you at Hotel KingsMen at 10 am."]
irb(main):024:0> $res
=> [["Hey, I am having a haircut. See you at ", " KingsMen ", " 10 am"]]
But I don't want to use global variable. Any suggestions to improve my code?
s = ["hello how are you?", "I am fine.What are you doing?", "Hey, I am having a haircut. See you at Hotel KingsMen at 10 am."]
s.join.scan(/Hotel\s(.+)?\sat\s(.+)?\./).flatten
#=> ["KingsMen", "10 am"]
Regex description:
\s - any whitespace character,
. - any character, .+ - one or more of any character, () - capture everything inside, so (.+) - capture one or more characters
a? means zero or one of a
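Joining works. If you'd rather keep the per-string loop from the question, an early return from a method removes the need for the global. hotel_and_time is a hypothetical name. The lazy (.+?) groups replace the answer's (.+)? so that each capture stops at the first " at " and the first period:

```ruby
# Scan each string in turn and return [hotel, time] from the first match, or nil.
def hotel_and_time(strings)
  strings.each do |s|
    m = s.match(/Hotel\s(.+?)\sat\s(.+?)\./)
    return m.captures if m   # leave the method as soon as a string matches
  end
  nil
end
```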
Here is your array:
arr = ["hello how are you?",
"I am fine. What are you doing?",
"Hey, I am having a haircut. See you at Hotel KingsMen at 10 am."]
The first step is to join the elements into a string. I've chosen to use a space for the separator, but you could use something else:
str = arr.join(' ')
#=> "hello how...doing? Hey,...haircut. See you at Hotel KingsMen at 10 am."
Without loss of generality, let's suppose this string were one of the following:
str1 = "See you at Hotel KingsMen at 10 am."
str2 = "See you at 10:15am at Kingsmen hotel on Bloor Street."
Which hotel?
Let's first look at how to get the name of the hotel. We want a method that will work with both of these strings. We assume that the name of the hotel is just two words, with one of those words being "hotel", but we don't know which of the two words comes first, and we allow "hotel" to begin with a capital or lowercase letter.
We see in str1 that it could be "at Hotel" or "Hotel KingsMen", and in str2 it could be "Kingsmen hotel" or "hotel on". The correct result is obtained by making the reasonable assumption that the word other than "hotel" is capitalized.
Here's one way to do it:
def hotel(str)
str[/\b[hH]otel\s+\K[A-Z][a-zA-Z]*|[A-Z][a-zA-Z]*(?=\s[Hh]otel\b)/]
end
hotel(str1) #=> "KingsMen"
hotel(str2) #=> "Kingsmen"
Here:
\b is a (zero-width) word break
\K means match what comes before but do not include it in the match that is returned.
| means match what comes before or what comes after.
(?=\s[Hh]otel\b) is a ("zero-width") positive lookahead, which indicates what must immediately follow what comes before, but is not part of the match.
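The same pattern can be written in Ruby's extended (/x) mode. In that mode literal whitespace is ignored and # starts a comment, so each piece can carry its own explanation. The original has no literal spaces, so this should behave the same as the one-line version. HOTEL_RE is just an illustrative constant name:

```ruby
HOTEL_RE = /
  \b[hH]otel\s+       # "Hotel" or "hotel" as a whole word, then whitespace...
  \K                  # ...which is dropped from the returned match
  [A-Z][a-zA-Z]*      # the capitalized word that follows
  |                   # or
  [A-Z][a-zA-Z]*      # a capitalized word...
  (?=\s[Hh]otel\b)    # ...immediately followed by " hotel", checked but not returned
/x

def hotel(str)
  str[HOTEL_RE]
end
```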
What time?
Here we must make an assumption about the way time is expressed. Should "noon", "1100 hours" and "14:21" be possibilities? OK, this is just an exercise, so let's assume a 12-hour clock with hours and possibly minutes, but no seconds.
We could use the following regex to extract that information:
def time(str)
str[/\b(?:1[012]|[1-9])(?::[0-5]\d)?\s?(?:[ap]m?)/i]
end
time(str1) #=> "10 am"
time(str2) #=> "10:15am"
Here:
(?:...) is a non-capture group: what it matches is part of the overall match, but it isn't saved as a separate capture.
1[012]|[1-9] says to match a) 1 followed by a 0, 1 or 2, or (|) b) one digit between 1 and 9.
the second colon in (?::...) is a literal colon, so that optional group matches a colon followed by the minutes.
[0-5]\d matches the minutes: a digit between 0 and 5 followed by any digit.
[ap]m? matches "a" or "p", optionally followed by "m".
i in /i means to disregard case.
Suppose now we had:
str3 = "I'm leaving at 9:30 am, so I'll see you at Hotel KingsMen at 10 am."
We want to select "10 am" rather than "9:30 am". For that we need additional assumptions. For example, we might assume that the time is preceded by the word "at" and that "at" appears immediately after the name of the hotel:
Hotel KingsMen at 10am
or
Kingsmen hotel at 10:15 am
We could use a fairly complex regex to extract the time here, or we could first find the hotel name and its location in the string, then look for the time immediately after.
If you want to keep your current approach but avoid the global variable, you could use the reduce method:
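Here is a rough sketch of the second approach. It assumes the hotel name is one capitalized word next to "hotel" and that "at <time>" follows it directly. HOTEL_NAME_RE, TIME_AFTER_RE and time_after_hotel are hypothetical names:

```ruby
# Hotel name in either order: "Hotel KingsMen" or "Kingsmen hotel"
HOTEL_NAME_RE = /\b[hH]otel\s+[A-Z][a-zA-Z]*|[A-Z][a-zA-Z]*\s[Hh]otel\b/
# "at <time>" right at the start of the remaining text; \K drops the "at "
TIME_AFTER_RE = /\A\s+at\s+\K(?:1[012]|[1-9])(?::[0-5]\d)?\s?[ap]m?/i

def time_after_hotel(str)
  m = str.match(HOTEL_NAME_RE)   # locate the hotel name first
  return nil unless m
  m.post_match[TIME_AFTER_RE]    # then look only at the text after it
end
```

Because the time search starts after the hotel name, an earlier time such as "9:30 am" in str3 is never considered.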
str = ["hello how are you?", "I am fine.What are you doing?", "Hey, I am having a haircut. See you at Hotel KingsMen at 10 am."]
str.reduce([]) do |res, s|
  res == [] ? s.scan(/(.*)Hotel(.*)at(.*)\./) : res
end
# => [["Hey, I am having a haircut. See you at ", " KingsMen ", " 10 am"]]
IMO, this makes the temporary variable used to hold and find the result as local as possible.
You could also use the select method, like this:
[
"hello how are you?", "I am fine.What are you doing?",
"Hey, I am having a haircut. See you at Hotel KingsMen at 10 am."
].select{|str| str =~ /Hotel\s(.+)?\sat\s(.+)?\./}
#=> ["Hey, I am having a haircut. See you at Hotel KingsMen at 10 am."]

What is the % doing while handling strings in Ruby

I was going through some piece of code when I found this line
if ('%{test}' % {:test => 'replaced'} == 'replaced')
# If this works, we are all good to go.
Why is '%{test}' % {:test => 'replaced'} returning value "replaced"? What exactly is % doing over here?
That is doing "interpolation": String#% treats the string as a format specification. The value "replaced" of the key :test in the hash is inserted at the %{test} position in the original string '%{test}'.
The % can take a string, array, or hash depending on the need. When you only have a single slot in the template as in this case, it is better to use %s and pass a string like
"%s" % "replaced"
In this particular example, it is not useful.
It becomes useful when you want to replace a part of a string.
For example, if you want to generate a series of strings:
"Hello World", "Hey World", "Bye World"
you can have a template string s = "%s World" and interpolate things into it like
s % "Hello"
s % "Hey"
s % "Bye"
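Here is a runnable version of the template idea, together with the array and hash argument forms mentioned earlier. The variable names are only illustrative:

```ruby
template = "%s World"
greetings = ["Hello", "Hey", "Bye"].map { |word| template % word }

# The other argument shapes String#% accepts:
pair       = "%s and %s" % ["salt", "pepper"]     # Array: one element per %s slot
named      = "%{test}" % { test: "replaced" }     # Hash: %{name} is looked up by key
via_format = format("%{test}", test: "replaced")  # String#% behaves like Kernel#format
```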
