This question already has answers here:
what is "?" in ruby
(3 answers)
Closed 3 years ago.
I ran across a code snippet today that used the ? operator to quote the next character. I have no idea where the documentation is for this method and really no idea what it's actually doing.
I've looked at the ruby docs, but haven't found it.
?1
=> "1"
?1"23abc"
=> "123abc"
? is not a method in this case but rather a parsable syntax. ? is a character literal in this context
Docs Excerpt:
There is also a character literal notation to represent single character strings, which syntax is a question mark (?) followed by a single character or escape sequence that corresponds to a single codepoint in the script encoding:
?a #=> "a"
?abc #=> SyntaxError
?\n #=> "\n"
?\s #=> " "
?\\ #=> "\\"
?\u{41} #=> "A"
?\C-a #=> "\x01"
?\M-a #=> "\xE1"
?\M-\C-a #=> "\x81"
?\C-\M-a #=> "\x81", same as above
?あ #=> "あ"
You have also found another fun little mechanism of the parser which is 2 Strings can be concatenated together by simply placing them side by side (with or without white space). e.g.
"1" "234"
#=> "1234"
"1""234"
#=> "1234"
Related
This question already has answers here:
Why do two strings separated by space concatenate in Ruby?
(2 answers)
Closed 8 years ago.
How the following line of code concatenates a string in ruby?
2.1.0 :052 > value = "Kamesh" "Waran"
=> "KameshWaran"
I understand that '+' is a method on String class which concatenates the strings passed. How the space(' ') can be the operator/method?
Can anyone elaborate how the space(' ') concatenate strings?
The space is not an operator. This only works for string literals, and is just part of the literal syntax, like the double-quotes. If you have two string literals with nothing but whitespace between them, they get turned into a single string. It's a convention borrowed from later versions of C.
irb(main):001:0> foobar = "foo" "bar"
=> "foobar"
irb(main):002:0> foo="foo"
=> "foo"
irb(main):003:0> bar="bar"
=> "bar"
irb(main):004:0> foo bar
NoMethodError: undefined method `foo' for main:Object
from (irb):4
from /usr/local/var/rbenv/versions/2.1.3/bin/irb:11:in `<main>'
irb(main):005:0>
If you do a search on this site you get an answer.
Found on :
Why do two strings separated by space concatenate in Ruby?
Implementation details can be found in parse.y file in Ruby source
code. Specifically, here.
A Ruby string is either a tCHAR (e.g. ?q), a string1 (e.g. "q", 'q',
or %q{q}), or a recursive definition of the concatenation of string1
and string itself, which results in string expressions like "foo"
"bar", 'foo' "bar" or ?f "oo" 'bar' being concatenated.
I'm writing a Rack app to split hostnames ending with certain prefixes.
For example, the hostname (and port) hello.world.lvh.me:3000 needs to be split into tokens hello.world, .lvh.me and :3000. Additionally, the prefix (hello.world), suffix (.lvh.me) and port (:3000) are all optional.
So far, I have a (Ruby) regex that looks like /(.*)(\.lvh\.me)(\:\d+)?/.
This successfully breaks the hostname into component parts but it falls down when one or more of the optional components is missing, e.g. for hello.world:3000 or lvh.me:3000 or even plain old hello.world.
I've tried adding ? to each group to make them optional (/(.*)?(\.lvh\.me)?(\:(\d+)?/) but this invariably ends up with the first group, (.*), capturing the entire string and stopping there.
My gut feeling is that this is something which might be solved using lookaround but I'll admit this is a totally new realm of regex for me.
You can try with this pattern:
\A(?=[^:])(.+?)??((?:\.|\A)lvh\.me)?(:[0-9]+)?\z
the lookahead (?=[^:]) checks there is at least one character that is not the : (in other words, not the port alone). This means that at least hello.word or lvh.me is present.
The first group is optional and non-greedy ??, this means that it is matched only when needed.
\A and \z are anchors for the start and the end of the string (when ^ and $ are used for the line)
Note that the character class \d matches all unicode digits in Ruby, but in this case you only need ascii digits. It's better to use [0-9]
Note too that \A(?=[^:])((?>[^l:\n.]+|\.|\Bl|l(?!vh\.me\b))*)((?:\.|\A)lvh\.me)?(:[0-9]+)?\z may be more performant.
online demo
Try ^(.*?)?(\.?lvh\.me)?(\:\d+)?$
I added:
a ? to the first group making the * non-greedy
^,$ to anchor it to the start and end.
a ? to the \. before lvh because you want to match lvh.me:3000 not .lvh.me:3000
A Tokenizing Answer
Just for fun, I decided to see if there was a relatively simple way to do what you wanted without a complicated regular expression. The only regular expressions I used were for splitting and validation.
This works for me with your provided corpus, and several variations.
str = 'hello.world.lvh.me:3000'
tokens = str.split /[.:]/
port = tokens.last =~ /\A\d+\z/ ? ?: + tokens.pop : ''
domain = sprintf '.%s.%s', *tokens.pop(2)
prefix = tokens.join ?.
You'll certainly need to check for empty strings in certain cases, but it seems like it might be more straightforward and/or flexible than a pure regex solution. I find it more readable, anyway. If you truly need a single regular expression, though, I'm sure one of the other answers will help you out.
You could try splitting rather than matching,
irb(main):012:0> "hello.world.lvh.me:3000".split(/\.(?=[^.:]+\.[^:.]+(?::\d+)?$)|:/)
=> ["hello.world", "lvh.me", "3000"]
irb(main):013:0> "hello.world:3000".split(/\.(?=[^.:]+\.[^:.]+(?::\d+)?$)|:/)
=> ["hello.world", "3000"]
irb(main):014:0> "lvh.me:3000".split(/\.(?=[^.:]+\.[^:.]+(?::\d+)?$)|:/)
=> ["lvh.me", "3000"]
irb(main):015:0> "hello.world".split(/\.(?=[^.:]+\.[^:.]+(?::\d+)?$)|:/)
=> ["hello.world"]
irb(main):016:0> "hello.world.lvh.me".split(/\.(?=[^.:]+\.[^:.]+(?::\d+)?$)|:/)
=> ["hello.world", "lvh.me"]
Look, ma, no regex!
def split_up(str)
str.sub(':','.:')
.split('.')
.each_slice(2)
.map { |arr| arr.join('.') }
end
split_up("hello.world.lvh.me:3000") #=> ["hello.world", "lvh.me", ":3000"]
split_up("hello.world:3000") #=> ["hello.world", ":3000"]
split_up("hello.world.lvh.me") #=> ["hello.world", "lvh.me"]
split_up("hello.world") #=> ["hello.world"]
split_up("") #=> []
Steps:
str1 = "hello.world.lvh.me:3000" #=> "hello.world.lvh.me:3000"
str2 = str1.sub(':','.:') #=> "hello.world.lvh.me.:3000"
arr = str2.split('.') #=> ["hello", "world", "lvh", "me", ":3000"]
enum = arr.each_slice(2) #=> #<Enumerator: ["hello", "world", "lvh",
# "me", ":3000"]:each_slice(2)>
enum.to_a #=> [["hello", "world"], ["lvh", "me"],
# [":3000"]]
enum.map { |arr| arr.join('.') } #=> ["hello.world", "lvh.me", ":3000"]
Why does this work in Ruby:
"foo" "bar"
# => "foobar"
I'm unsure as to why the strings were concatenated instead of a syntax error being given.
I'm curious as to whether or not this is expected behavior and whether or not it's something the parser is responsible for wrangling (two strings without operators is considered a single string) or the language definition itself is specifying this behavior (implicit concat).
In C and C++, string literals next to each other are concatenated. As these languages influenced Ruby, I'd guess it inherits from there.
And it is documented in Ruby now: see this answer and this page in the Ruby repo which states:
Adjacent string literals are automatically concatenated by the interpreter:
"con" "cat" "en" "at" "ion" #=> "concatenation"
"This string contains "\
"no newlines." #=> "This string contains no newlines."
Any combination of adjacent single-quote, double-quote, percent strings will be concatenated as long as a percent-string is not last.
%q{a} 'b' "c" #=> "abc"
"a" 'b' %q{c} #=> NameError: uninitialized constant q
Implementation details can be found in parse.y file in Ruby source code. Specifically, here.
A Ruby string is either a tCHAR (e.g. ?q), a string1 (e.g. "q", 'q', or %q{q}), or a recursive definition of the concatenation of string1 and string itself, which results in string expressions like "foo" "bar", 'foo' "bar" or ?f "oo" 'bar' being concatenated.
What is the purpose of the question mark operator in Ruby?
Sometimes it appears like this:
assert !product.valid?
sometimes it's in an if construct.
It is a code style convention; it indicates that a method returns a boolean value (true or false) or an object to indicate a true value (or “truthy” value).
The question mark is a valid character at the end of a method name.
https://docs.ruby-lang.org/en/2.0.0/syntax/methods_rdoc.html#label-Method+Names
Also note ? along with a character acts as shorthand for a single-character string literal since Ruby 1.9.
For example:
?F # => is the same as "F"
This is referenced near the bottom of the string literals section of the ruby docs:
There is also a character literal notation to represent single
character strings, which syntax is a question mark (?) followed by a
single character or escape sequence that corresponds to a single
codepoint in the script encoding:
?a #=> "a"
?abc #=> SyntaxError
?\n #=> "\n"
?\s #=> " "
?\\ #=> "\\"
?\u{41} #=> "A"
?\C-a #=> "\x01"
?\M-a #=> "\xE1"
?\M-\C-a #=> "\x81"
?\C-\M-a #=> "\x81", same as above
?あ #=> "あ"
Prior to Ruby 1.9, this returned the ASCII character code of the character. To get the old behavior in modern Ruby, you can use the #ord method:
?F.ord # => will return 70
It's a convention in Ruby that methods that return boolean values end in a question mark. There's no more significance to it than that.
In your example it's just part of the method name. In Ruby you can also use exclamation points in method names!
Another example of question marks in Ruby would be the ternary operator.
customerName == "Fred" ? "Hello Fred" : "Who are you?"
It may be worth pointing out that ?s are only allowed in method names, not variables. In the process of learning Ruby, I assumed that ? designated a boolean return type so I tried adding them to flag variables, leading to errors. This led to me erroneously believing for a while that there was some special syntax involving ?s.
Relevant: Why can't a variable name end with `?` while a method name can?
In your example
product.valid?
Is actually a function call and calls a function named valid?. Certain types of "test for condition"/boolean functions have a question mark as part of the function name by convention.
I believe it's just a convention for things that are boolean. A bit like saying "IsValid".
It's also used in regular expressions, meaning "at most one repetition of the preceding character"
for example the regular expression /hey?/ matches with the strings "he" and "hey".
It's also a common convention to use with the first argument of the test method from Kernel#test
test ?d, "/dev" # directory exists?
# => true
test ?-, "/etc/hosts", "/etc/hosts" # are the files identical
# => true
as seen in this question here
I need to print escaped characters to a binary file using Ruby. The main problem is that slashes need the whole byte to escape correctly, and I don't know/can't create the byte in such a way.
I am creating the hex value with, basically:
'\x' + char
Where char is some 'hex' value, such as 65. In hex, \x65 is the ASCII character 'e'.
Unfortunately, when I puts this sequence to the file, I end up with this:
\\x65
How do I create a hex string with the properly escaped value? I have tried a lot of things, involving single or double quotes, pack, unpack, multiple slashes, etc. I have tried so many different combinations that I feel as though I understand the problem less now then I did when I started.
How?
You may need to set binary mode on your file, and/or use putc.
File.open("foo.tmp", "w") do |f|
f.set_encoding(Encoding::BINARY) # set_encoding is Ruby 1.9
f.binmode # only useful on Windows
f.putc "e".hex
end
Hopefully this can give you some ideas even if you have Ruby <1.9.
Okay, if you want to create a string whose first byte
has the integer value 0x65, use Array#pack
irb> [0x65].pack('U')
#=> "e"
irb> "e"[0]
#=> 101
10110 = 6516, so this works.
If you want to create a literal string whose first byte is '\',
second is 'x', third is '6', and fourth is '5', then just use interpolation:
irb> "\\x#{65}"
#=> "\\x65"
irb> "\\x65".split('')
#=> ["\\", "x", "6", "5"]
If you have the hex value and you want to create a string containing the character corresponding to that hex value, you can do:
irb(main):002:0> '65'.hex.chr
=> "e"
Another option is to use Array#pack; this can be used if you need to convert a list of numbers to a single string:
irb(main):003:0> ['65'.hex].pack("C")
=> "e"
irb(main):004:0> ['66', '6f', '6f'].map {|x| x.hex}.pack("C*")
=> "foo"