Semantic differences between percent literals and heredocs in Ruby? - ruby

Looking at some documentation, I saw a multiline string defined using a percent literal:
command %Q{
do this;
do that;
}
In the past, I've always used heredocs when I needed multiline strings:
command <<-heredoc
echo "stuff" | do stuff;
heredoc
What are the semantic differences between them? Is there any reason why I would want to use %Q and not a heredoc?

I tend to evaluate how much text is being used when deciding which to use.
I use %Q when there's not a lot of text (for example, a single line), e.g. %Q|foobar|. The value of %Q is that it lets you mix quote characters freely, e.g.
%Q|"Get a Job" ~Mom's words|
I use "heredoc"s when there is a lot of text that spans multiple lines.
For example, suppose you're pasting a lot of text into a REPL (like the content of a YAML file). Unless you traverse the whole file, you can't be certain whether or not you will have a conflict with whatever %Q separator you have chosen. With a "heredoc" you just use some really obscure piece of text that you're fairly certain will not have a conflict, e.g.
<<-BatMobilePrettyObscure
... Lots of text ...
BatMobilePrettyObscure
As far as I know, semantically, there are just a few small differences:
%Q uses a single character (or a bracket pair such as {} or ()) as its delimiter
%Q can be multi-line or single-line
"heredoc"s must be multi-line, with the closing "heredoc" identifier standing alone on its own line
%Q delimiters can be "mashed" up against their strings, e.g. %Q|foobar|
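As a rough illustration of the points above (the pasted text is made up for this sketch), both forms build ordinary strings; only the way the literal is delimited differs:

```ruby
# %Q behaves like a double-quoted string, so both kinds of quote can
# appear unescaped as long as the chosen delimiter (| here) does not.
mixed = %Q|"Get a Job" ~Mom's words|

# A heredoc ends only at a line consisting of its identifier, so the
# body may contain any single-character delimiter without conflict.
pasted = <<-BatMobilePrettyObscure
title: "Mom's words"
shell: echo a | grep {a}
BatMobilePrettyObscure

puts mixed
puts pasted
```

Note that the heredoc body keeps its trailing newline, while the %Q string contains exactly what sits between the delimiters.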

There's a funky trick that you can use with heredocs: the first line can be used as if it was a complete string. For example, all of the following examples are valid Ruby code:
puts(<<-EOS)
Hello, world!
EOS
<<-EOS.upcase
Hello, world!
EOS
puts(<<-EOS.upcase)
Hello, world!
EOS
However, you will not find that very often in the wild. Other than that, they are the same as double quoted strings or %Q{} and %{} literals, except that you can choose multi-character delimiters. This comes in handy when all of the possible percent literal delimiters may occur in the string. This especially applies to long strings.
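A small sketch of what that trick evaluates to; the variable names and the two-heredoc array are only illustrative. The method chained onto the opening identifier is applied to the whole heredoc body:

```ruby
# .upcase is written on the opener line but receives the full body.
shout = <<-EOS.upcase
Hello, world!
EOS

# Because each opener acts as a complete string expression, two heredocs
# can start on one line; their bodies follow in order.
words = [<<-ONE.strip, <<-TWO.strip]
first body
ONE
second body
TWO

puts shout
puts words.inspect
```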

There isn't really a semantic difference, and it doesn't have to do with multiline strings either. All strings can be multiline in Ruby. These are all the same string:
'a
b
'
"a
b
"
%Q{a
b
}
<<-heredoc
a
b
heredoc
The question of which to use is decided by whether you need interpolation and the convenience of escaping characters. For example:
Do you need interpolation? If not then '' or %q()
Will there be lots of quote characters to escape? Then use %Q()
Do you want to write a lot of text without thinking about escaping characters? Use heredocs.
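The checklist above can be sketched roughly as follows; the variable names are made up for the example:

```ruby
name = "Ruby"

# Single quotes: no interpolation, and \n stays as backslash + n.
plain = 'Hi #{name}\n'

# %q: same rules as single quotes, but ' and " need no escaping.
lower_q = %q(It's "#{name}")

# %Q: same rules as double quotes, so #{name} is interpolated.
upper_q = %Q(It's "#{name}")

# Heredoc: interpolates like %Q and needs no escaping of quote characters.
heredoc = <<-TEXT
It's "#{name}", with 'both' kinds of quotes.
TEXT

puts plain, lower_q, upper_q, heredoc
```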

Related

How to paste literal words in Tcl

Is there any syntax trick / feature which would allow me to paste two literal words in TCL, e.g. to concatenate a braced ({..}) word and a double-quoted ("...") word into a single one?
I'm not asking about set a {foo}; set b "bar\nquux"; set c $a$b or append a $b -- I know about them; but about something without intermediate variables or commands. Analogous to the {*}word (which turns a word into a list).
I guess that the answer is "no way", but my shallow knowledge of Tcl doesn't allow me to draw such a conclusion.
If you are using a recent Tcl version (8.6.2 or newer) you can use
set c [string cat {foo} "bar\nquux"]
For older versions, you can resort to
set c [format %s%s {foo} "bar\nquux"]
There's no way to do what you're asking for without a command, since the syntax of braced words doesn't permit anything before or afterwards, and once you have several words you need to join them with a command (because that's what commands do from the perspective of Tcl's language core: take some values and produce a value result). Not that having braces in the middle of a string is a syntax error (it isn't), but it does stop them being quote characters. To be clear:
puts a{b} prints a{b} because { is not special in that case and instead becomes part of the value.
puts {a}b is a syntax error. (The only exception to this is {*}, which started as {expand} but that was waaaay too wordy.)
Approaches that work:
Use string cat.
Use a concatenation procedure (e.g., proc strcat {a b} {return $a$b}).
Put both values inside the braces so it is a combined literal. Which only works if you have both parts being literals, of course.
Convert the braced part to non-braced (and non-double-quoted) form. This is always possible as every braced string has a non-braced equivalent, but can involve a lot of backslashes.
If your word is a valid list, you can do:
set orig {abc def}
set new [join $orig {}]

Escape characters in bash & expect script [duplicate]

I am using Tcl_StringCaseMatch function in C++ code for string pattern matching. Everything works fine until input pattern or string has [] bracket. For example, like:
str1 = pq[0]
pattern = pq[*]
Tcl_StringCaseMatch is not working, i.e., it returns false for the above inputs.
How to avoid [] in pattern matching?
The problem is [] are special characters in the pattern matching. You need to escape them using a backslash to have them treated like plain characters
pattern= "pq\\[*\\]"
I don't think this should affect the string as well. The reason for double slashing is you want to pass the backslash itself to the TCL engine.
For the casual reader:
[] have a special meaning in TCL in general, beyond the pattern matching role they take here - "run command" (like `` or $() in shells), but [number] will have no effect, and the brackets are treated normally - thus the string str1 does not need escaping here.
For extra confusion:
TCL will interpret ] with no preceding [ as a normal character by default. I feel that's getting too confusing, and would rather TCL complained about unbalanced brackets. As OP mentions though, this allows you to forgo the final two backslashes and use "pq\\[*]". I dislike this, and would rather make it obvious that both are treated as plain characters and not in the usual TCL way, but to each her/his own.

Matching an unescaped balanced pair of delimiters

How can I match a balanced pair of delimiters not escaped by backslash (that is itself not escaped by a backslash) (without the need to consider nesting)? For example with backticks, I tried this, but the escaped backtick is not working as escaped.
regex = /(?!<\\)`(.*?)(?!<\\)`/
"hello `how\` are` you"
# => $1: "how\\"
# expected "how\\` are"
And the regex above does not consider a backslash that is escaped by a backslash and is in front of a backtick, but I would like to.
How does StackOverflow do this?
The purpose of this is not much complicated. I have documentation texts, which include the backtick notation for inline code just like StackOverflow, and I want to display that in an HTML file with the inline code decorated with some span material. There would be no nesting, but escaped backticks or escaped backslashes may appear anywhere.
Lookbehind is the first thing everyone thinks of for this kind of problem, but it's the wrong tool, even in flavors like .NET that support unrestricted lookbehinds. You can hack something up, but it's going to be ugly, even in .NET. Here's a better way:
`[^`\\]*(\\.[^`\\]*)*`
The first part starts from the opening delimiter and gobbles up anything that's not the delimiter or a backslash. If the next character is a backslash, it consumes that and the character following it, whatever it may be. It could be the delimiter character, another backslash, or anything else, it doesn't matter.
It repeats those steps as many times as necessary, and when neither [^`\\] nor \\. can match, the next character must be the closing delimiter. Or the end of the string, but I'm assuming the input is well formed. But if it's not well formed, this regex will fail very quickly. I mention that because of this other approach I see a lot:
`(?:[^`\\]+|\\.)*`
This works fine on well-formed input, but what happens if you remove the last backtick from your sample input?
"hello `how\` are you"
According to RegexBuddy, after encountering the first backtick, this regex performed 9,252 distinct operations (or steps) before it could give up and report failure; mine failed in ten steps.
EDIT To extract just the part inside the delimiters, wrap that part in a capturing group. You'll still have to remove the backslashes manually.
`([^`\\]*(?:\\.[^`\\]*)*)`
I also changed the other group to non-capturing, which I should have done from the start. I don't avoid capturing religiously, but if you are using them to capture stuff, any other groups you use should be non-capturing.
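Since the question uses Ruby, here is a minimal sketch of the capturing pattern above in use. The constant and method names are hypothetical, and the unescaping rule (drop the backslash, keep the character after it) is an assumption about what "removing the backslashes manually" should mean:

```ruby
# Opening backtick, a run of ordinary characters, then any number of
# (escaped character + run of ordinary characters), then the closing backtick.
INLINE_CODE = /`([^`\\]*(?:\\.[^`\\]*)*)`/

# Returns the contents of every inline-code span, with escapes removed.
def inline_code_spans(text)
  text.scan(INLINE_CODE).map do |(body)|
    body.gsub(/\\(.)/) { $1 }   # "\x" becomes "x" for any character x
  end
end

p inline_code_spans('hello `how\` are` you')   # well-formed input
p inline_code_spans('hello `how\` are you')    # closing backtick missing
```

On the sample without a closing backtick the method simply returns an empty array.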
EDIT I think I've been reading too much into the question. On StackOverflow, if you want to include literal backticks in an inline-code segment or a comment, you use three backticks as the delimiter, not just one. Since there's no need to escape backticks, you can ignore backslashes as well. Your regex could turn out to be as simple as this:
```(.*?)```
Dealing with the possibility of false delimiters, you use the same basic technique:
```([^`]*(?:`(?!``)[^`]*)*)```
Is this what you're after?
By the way, this answer doesn't contradict @nneonneo's comment above. This answer doesn't consider the context in which the match is taking place. Is it in the source code of a program or web page? If it is, did the match occur inside a comment or a string literal? How do I even know the first backtick I found wasn't escaped? Regexes don't know anything about the context in which they operate; that's what parsers are for.
If you don't need nesting, regexes can indeed be a proper tool. Lexers of programming languages, for instance, use regexes to tokenize strings, and strings usually allow their own delimiters as an escaped content. Anything more complicated than that will probably need a full-blown parser though.
The "general formula" is to match an escaped character (\\.) or any character that's valid as content but doesn't need to be escaped ([^{list of invalid chars}]). A "naïve" solution would be joining them with or (|), but for a more efficient variant see @AlanMoore's answer.
The complete example is shown below, in two variants: the first assumes that backslashes should only be used for escaping inside the string, the second assumes that a backslash anywhere in the text escapes the next character.
`((?:\\.|[^`\\])*)`
(?:\\.|[^`\\])*`((?:\\.|[^`\\])*)`
Working examples here and here. However, as @nneonneo commented (and I endorsed), regexes are not meant to do a complete parse, so you'd better keep things simple if you want them to work out right (do you want to find a token in the text, or do you want to delimit it already knowing where it starts? The answer to that question is important to decide which strategy works best for your case).

String#split in Ruby not behaving as expected

File.open(path, 'r').each do |line|
row = line.chomp.split('\t')
puts "#{row[0]}"
end
path is the path of file having content like name, age, profession, hobby
I'm expecting output to be name only but I am getting the whole line.
Why is it so?
The question already has an accepted answer, but it's worth noting what the cause of the original problem was:
This is the problem part:
split('\t')
Ruby has several forms for quoted string, which have differences, usually useful ones.
Quoting from Ruby Programming at wikibooks.org:
...double quotes are designed to
interpret escaped characters such as
new lines and tabs so that they appear
as actual new lines and tabs when the
string is rendered for the user.
Single quotes, however, display the
actual escape sequence, for example
displaying \n instead of a new line.
Read further in the linked article to see the use of %q and %Q strings. Or Google for "ruby string delimiters", or see this SO question.
So '\t' is interpreted as "backslash+t", whereas "\t" is a tab character.
String#split will also take a Regexp, which in this case might remove the ambiguity:
split(/\t/)
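To make the difference concrete, here is a small sketch that uses an in-memory line instead of a file; the sample line assumes the fields are tab-separated, as the question implies:

```ruby
line = "name\tage\tprofession\thobby\n"

# '\t' is the two characters backslash and t, which never occur in the
# line, so split returns the whole line as a single element.
literal = line.chomp.split('\t')

# "\t" is a real tab character, so the line is split into its fields.
tabbed = line.chomp.split("\t")

# Inside a Regexp, \t always means a tab, whatever quoting surrounds it.
regexp = line.chomp.split(/\t/)

puts literal.first
puts tabbed.first
puts regexp.first
```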
Your question was not very clear:
split("\n") - if you want to split by lines
split - if you want to split by whitespace
As far as I can tell, you do not need chomp in that case, because split("\n") already removes every "\n".

Which style of Ruby string quoting do you favour?

Which style of Ruby string quoting do you favour? Up until now I've always used 'single quotes' unless the string contains certain escape sequences or interpolation, in which case I obviously have to use "double quotes".
However, is there really any reason not to just use double quoted strings everywhere?
Don't use double quotes if you have to escape them. And don't fall in "single vs double quotes" trap. Ruby has excellent support for arbitrary delimiters for string literals:
Mirror of Site - https://web.archive.org/web/20160310224440/http://rors.org/2008/10/26/dont-escape-in-strings
Original Site -
http://rors.org/2008/10/26/dont-escape-in-strings
I always use single quotes unless I need interpolation.
Why? It looks nicer. When you have a ton of stuff on the screen, lots of single quotes give you less "visual clutter" than lots of double quotes.
I'd like to note that this isn't something I deliberately decided to do, just something that I've 'evolved' over time in trying to achieve nicer looking code.
Occasionally I'll use %q or %Q if I need in-line quotes. I've only ever used heredocs maybe once or twice.
Like many programmers, I try to be as specific as is practical. This means that I try to make the compiler do as little work as possible by having my code as simple as possible. So for strings, I use the simplest method that suffices for my needs for that string.
<<END
For strings containing multiple newlines,
particularly when the string is going to
be output to the screen (and thus formatting
matters), I use heredocs.
END
%q[Because I strongly dislike backslash quoting when unnecessary, I use %Q or %q
for strings containing ' or " characters (usually with square braces, because they
happen to be the easiest to type and least likely to appear in the text inside).]
"For strings needing interpretation, I use %s."%['double quotes']
'For the most common case, needing none of the above, I use single quotes.'
My first simple test of the quality of syntax highlighting provided by a program is to see how well it handles all methods of quoting.
I use single quotes unless I need interpolation. The argument about it being troublesome to change later when you need interpolation swings in the other direction, too: You have to change from double to single when you found that there was a # or a \ in your string that caused an escape you didn't intend.
The advantage of defaulting to single quotes is that, in a codebase which adopts this convention, the quote type acts as a visual cue as to whether to expect interpolated expressions or not. This is even more pronounced when your editor or IDE highlights the two string types differently.
I use %{.....} syntax for multi-line strings.
I usually use double quotes unless I specifically need to disable escaping/interpolation.
I see arguments for both:
For using mostly double quotes:
The GitHub Ruby style guide advocates always using double quotes.
It's easier to search for a string foobar by searching for "foobar" if you were consistent with quoting. However, I'm not. So I search for ['"]foobar['"] turning on regexps.
For using some combination of single and double quotes:
Know if you need to look for string interpolation.
Might be slightly faster (although so slight it wasn't enough to affect the GitHub style guide).
I used to use single quotes until I knew I needed interpolation. Then I found that I was wasting a lot of time when I'd go back and have to change some single-quotes to double-quotes. Performance testing showed no measurable speed impact of using double-quotes, so I advocate always using double-quotes.
The only exception is when using sub/gsub with back-references in the replacement string. Then you should use single quotes, since it's simpler.
mystring.gsub( /(fo+)bar/, '\1baz' )
mystring.gsub( /(fo+)bar/, "\\1baz" )
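A short sketch of why the quoting matters here; the sample string is made up. Inside double quotes, an unescaped "\1" is an octal character escape rather than a back-reference:

```ruby
mystring = "foobar"

# Single quotes pass the backslash through, so gsub sees \1.
single = mystring.gsub(/(fo+)bar/, '\1baz')

# In double quotes the backslash must be doubled to reach gsub.
doubled = mystring.gsub(/(fo+)bar/, "\\1baz")

# "\1" in double quotes is the character with code 1, not a back-reference.
pitfall = mystring.gsub(/(fo+)bar/, "\1baz")

p single, doubled, pitfall
```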
I use single quotes unless I need interpolation, or the string contains single quotes.
However, I just learned the arbitrary delimiter trick from Dejan's answer, and I think it's great. =)
Single quotes preserve the characters inside them literally, while double quotes process escape sequences and interpolation. See the following example:
"Welcome #{@user.name} to App!"
Results:
Welcome Bhojendra to App!
But,
'Welcome #{@user.name} to App!'
Results:
Welcome #{@user.name} to App!
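A runnable version of the same comparison, as a sketch: the User struct is a hypothetical stand-in for whatever object @user refers to in a real application.

```ruby
User = Struct.new(:name)        # hypothetical stand-in for the real user class
@user = User.new("Bhojendra")

interpolated = "Welcome #{@user.name} to App!"   # expression is evaluated
literal      = 'Welcome #{@user.name} to App!'   # text is kept as written

puts interpolated
puts literal
```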
