What is the difference between << variants in Vagrant heredoc? - ruby

I've seen these examples for Heredocs in Vagrantfiles:
$myscript1 = <<SCRIPT
echo "test <<"
SCRIPT
$myscript2 = <<-SCRIPT
echo "test <<-"
SCRIPT
$myscript3 = <<~SCRIPT
echo "test <<~"
SCRIPT
Could anyone explain with examples what is the difference between these variants?
Are there more variants for inline Heredocs?

From the heredoc documentation pointed out in the comments:
If you are writing a large block of text you may use a “here document” or “heredoc”:
expected_result = <<HEREDOC
This would contain specially formatted text.
That might span many lines
HEREDOC
The heredoc starts on the line following <<HEREDOC and ends with the next line that starts with HEREDOC. The result includes the ending newline.
You may use any identifier with a heredoc, but all-uppercase identifiers are typically used.
You may indent the ending identifier if you place a “-” after <<:
expected_result = <<-INDENTED_HEREDOC
This would contain specially formatted text.
That might span many lines
INDENTED_HEREDOC
Note that while the closing identifier may be indented, the content is always treated as if it is flush left. If you indent the content, those spaces will appear in the output.
To have indented content as well as an indented closing identifier, you can use a “squiggly” heredoc, which uses a “~” instead of a “-” after <<:
expected_result = <<~SQUIGGLY_HEREDOC
This would contain specially formatted text.
That might span many lines
SQUIGGLY_HEREDOC
The indentation of the least-indented line will be removed from each line of the content. Note that empty lines and lines consisting solely of literal tabs and spaces will be ignored for the purposes of determining indentation, but escaped tabs and spaces are considered non-indentation characters.
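To make the difference between the three forms concrete, here is a minimal sketch that places each heredoc inside an indented method body. The method names (plain, dash, squiggly) are made up for illustration, and the squiggly form assumes Ruby 2.3 or later.

```ruby
# Each method returns a heredoc written inside an indented body.

def plain
  # "<<": the terminator must start at column 0; content is kept verbatim.
  <<PLAIN
    echo "test"
PLAIN
end

def dash
  # "<<-": the terminator may be indented; content is still kept verbatim.
  <<-DASH
    echo "test"
  DASH
end

def squiggly
  # "<<~": the indentation of the least-indented line is stripped
  # from every content line.
  <<~SQUIGGLY
    echo "test"
      indented further
  SQUIGGLY
end

p plain     # leading spaces of the content are preserved
p dash      # same string as plain
p squiggly  # common indentation removed, relative indentation kept
```

In a Vagrantfile the practical effect is that a script defined with <<~ reaches the shell without the leading spaces of the Ruby source, while << and <<- pass them through unchanged.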

Related

sort -o appends newline to end of file - why?

I'm working on a small text file with a list of words in it that I want to add a new word to, and then sort. The file doesn't have a newline at the end when I start, but does after the sort. Why? Can I avoid this behavior or is there a way to strip the newline back out?
Example:
words.txt looks like
apple
cookie
salmon
I then run printf "\norange" >> words.txt; sort words.txt -o words.txt
I use printf rather than echo figuring that'll avoid the newline, but the file then reads
apple
cookie
orange
salmon
#newline here
If I just run printf "\norange" >> words.txt, orange appears at the bottom of the file with no newline, i.e.:
apple
cookie
salmon
orange
This behavior is explicitly defined in the POSIX specification for sort:
The input files shall be text files, except that the sort utility shall add a newline to the end of a file ending with an incomplete last line.
This follows from the fact that a UNIX "text file" is only valid if all lines end in newlines, as also defined in the POSIX standard:
Text file - A file that contains characters organized into zero or more lines. The lines do not contain NUL characters and none can exceed {LINE_MAX} bytes in length, including the newline character. Although POSIX.1-2008 does not distinguish between text files and binary files (see the ISO C standard), many utilities only produce predictable or meaningful output when operating on text files. The standard utilities that have such restrictions always specify "text files" in their STDIN or INPUT FILES sections.
Think about what you are asking sort to do.
You are asking it "take all the lines, and sort them in order."
You've given it a file which, simplified to three lines here, it splits into the following strings:
"salmon\n"
"cookie\n"
"orange"
It sorts these for you dutifully:
"cookie\n"
"orange"
"salmon\n"
And it then outputs them as a single string:
"cookie
orangesalmon
"
That is almost certainly exactly what you do not want.
So instead, if your file is missing the terminating newline that it should have had, the sort program understands that, most likely, you still intended that last line to be a line, rather than just a fragment of a line. It appends a \n to the string "orange", making it "orange\n". Then it can be sorted properly, without "orange" getting concatenated with whatever line happens to come immediately after it:
"cookie\n"
"orange\n"
"salmon\n"
So when it then outputs them as a single string, it looks a lot better:
"cookie
orange
salmon
"
You could strip the last character off the file, the one from the end of "salmon\n", using a range of handy tools such as awk, sed, perl, php, or even raw bash. This is covered elsewhere, in places like:
How can I remove the last character of a file in unix?
But please don't do that. You'll just cause problems for all other utilities that have to handle your files, like sort. And if you assume that there is no terminating newline in your files, then you will make your code brittle: any part of the toolchain which "fixes" your error (as sort kinda does here) will "break" your code.
Instead, treat text files the way they are meant to be treated in unix: a sequence of "lines" (strings of zero or more non-newline bytes), each followed by a newline.
So newlines are line-terminators, not line-separators.
There is a coding style where prints and echos are done with the newline leading. This is wrong for many reasons, including creating malformed text files, and causing the output of the program to be concatenated with the command prompt. printf "orange\n" is correct style, and also more readable: at a glance someone maintaining your code can tell you're printing the word "orange" and a newline, whereas printf "\norange" looks at first glance like it's printing a backslash and the phrase "no range" with a missing space.
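The split, sort, and join argument above can be reproduced in a few lines of Ruby. This is only a model of the reasoning, not of how sort is actually implemented; the variable names are illustrative.

```ruby
# Input whose last line lacks its terminating newline.
input = "salmon\ncookie\norange"

# String#lines keeps each line's terminator, so "orange" ends up without one.
naive = input.lines.sort.join

# Repair the incomplete last line first, as sort does, then sort.
repaired = input.end_with?("\n") ? input : input + "\n"
fixed = repaired.lines.sort.join

p naive  # => "cookie\norangesalmon\n"
p fixed  # => "cookie\norange\nsalmon\n"
```

The naive result glues "orange" onto "salmon", which is exactly the outcome the added newline prevents.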

Bash - Removing white space from indented multiline strings

This may be a more general question so sorry in advance. I am creating a script and thought it would be good to use multi-line strings instead of using multiple printf or echo statements. Say I have the following:
while :
do
    printf "line 1
    line 2
    line 3"
done
The second and third lines would be printed with spaces in front because of the indentation in the file.
line 1
    line 2
    line 3
Is there a way to prevent that aside from removing the indentation in the code? Also, is it considered better practice to just use multiple printf/echo statements if you need to output information that spans multiple lines?
Indent with tabs (shown here as plain whitespace) and use a heredoc with <<-, which strips leading tabs from each line:
cat <<- EOF
line 1
line 2
line 3
EOF
Multi-line strings will always look a bit bad, or have some other downsides, I'm afraid. The most legible way to embed them in bash code is probably the here-doc, which shows the string (almost) exactly like it will look when output. As an extra trick, you can use extra punctuation to make the here-doc delimiter stand out from the string itself too, like so:
if true
then
    some commands
    cat <<"____EndOfTextBlock____"
This text here
spans multiple
lines.
____EndOfTextBlock____
    some other commands
    even more commands
fi

Where can I read more on this use of STRING in ruby `s = <<-STRING ` [duplicate]

I recently used the <<- operator to output a multi-line string, like this:
<<-form
<h1>Name to say hi!</h1>
<form method="post">
<input type="text" name="name">
<input type="submit" value="send">
</form>
form
I stole the <<- operator from some open source code, but I didn't find any documentation on it.
I kinda figured out that it works the same as in bash:
$ cat <<EOF >> form.html
> <h1>Name to say hi!</h1>
> <form method="post">
> <input type="text" name="name">
> <input type="submit" value="send">
> </form>
> EOF
Does it work that way? I just wanna find documentation on it.
From The Ruby Programming Language:
Here Documents
For long string literals, there may be no single character delimiter that can be used without worrying about remembering to escape characters within the literal. Ruby's solution to this problem is to allow you to specify an arbitrary sequence of characters to serve as the delimiter for the string. This kind of literal is borrowed from Unix shell syntax and is historically known as a here document. (Because the document is right here in the source code rather than in an external file.)
Here documents begin with << or <<-. These are followed immediately (no space is allowed, to prevent ambiguity with the left-shift operator) by an identifier or string that specifies the ending delimiter. The text of the string literal begins on the next line and continues until the text of the delimiter appears on a line by itself. For example:
document = <<HERE # This is how we begin a here document
This is a string literal.
It has two lines and abruptly ends...
HERE
The Ruby interpreter gets the contents of a string literal by reading a line at a time from its input. This does not mean, however, that the << must be the last thing on its own line. In fact, after reading the content of a here document, the Ruby interpreter goes back to the line it was on and continues parsing it. The following Ruby code, for example, creates a string by concatenating two here documents and a regular double-quoted string:
greeting = <<HERE + <<THERE + "World"
Hello
HERE
There
THERE
The <<HERE on line 1 causes the interpreter to read lines 2 and 3. And the <<THERE causes the interpreter to read lines 4 and 5. After these lines have been read, the three string literals are concatenated into one.
The ending delimiter of a here document really must appear on a line by itself: no comment may follow the delimiter. If the here document begins with <<, then the delimiter must start at the beginning of the line. If the literal begins with <<- instead, then the delimiter may have whitespace in front of it. The newline at the beginning of a here document is not part of the literal, but the newline at the end of the document is. Therefore, every here document ends with a line terminator, except for an empty here document, which is the same as "":
empty = <<END
END
If you use an unquoted identifier as the terminator, as in the previous examples, then the here document behaves like a double-quoted string for the purposes of interpreting backslash escapes and the # character. If you want to be very, very literal, allowing no escape characters whatsoever, place the delimiter in single quotes. Doing this also allows you to use spaces in your delimiter:
document = <<'THIS IS THE END, MY ONLY FRIEND, THE END'
.
. lots and lots of text goes here
. with no escaping at all.
.
THIS IS THE END, MY ONLY FRIEND, THE END
The single quotes around the delimiter hint that this string literal is like a single-quoted string. In fact, this kind of here document is even stricter. Because the single quote is not a delimiter, there is never a need to escape a single quote with a backslash. And because the backslash is never needed as an escape character, there is never a need to escape the backslash itself. In this kind of here document, therefore, backslashes are simply part of the string literal.
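The effect of quoting the delimiter is easy to check side by side. In this small sketch the variable names and the sample text are invented for illustration; only the heredoc syntax comes from the passage above.

```ruby
name = "world"

# Unquoted delimiter: behaves like a double-quoted string, so #{...} is
# interpolated and \t becomes a tab character.
interpolated = <<EOS
Hello, #{name}\tdone
EOS

# Single-quoted delimiter: nothing is interpreted; #{name} and \t stay
# in the string exactly as typed.
raw = <<'EOS'
Hello, #{name}\tdone
EOS

puts interpolated
puts raw
```

The first puts prints the greeting with the name filled in and a real tab; the second prints the source text unchanged, backslash included.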
You may also use a double-quoted string literal as the delimiter for a here document. This is the same as using a single identifier, except that it allows spaces within the delimiter:
document = <<-"# # #" # This is the only place we can put a comment
<html><head><title>#{title}</title></head>
<body>
<h1>#{title}</h1>
#{body}
</body>
</html>
# # #
Note that there is no way to include a comment within a here document except on the first line after the << token and before the start of the literal. Of all the # characters in this code, one introduces a comment, three interpolate expressions into the literal, and the rest are the delimiter.
http://www.ruby-doc.org/docs/ruby-doc-bundle/Manual/man-1.4/syntax.html#here_doc
This is the Ruby "here document" or heredoc syntax. The added - lets the closing delimiter be indented.
The reason why you cannot find any documentation on the <<- operator is because it isn't an operator. It's literal syntax, like ' or ".
Specifically, it's the here document syntax which is one of the many syntactic forms of string literals in Ruby. Ruby here documents are similar to POSIX sh here documents, but handling of whitespace removal is different: in POSIX sh here documents delimited by <<-, only leading tabs are removed, but they are removed from the contents of the string, whereas in Ruby all leading whitespace is removed, but only from the delimiter.
This post will tell you everything you need to know about the "heredoc" string syntax. In addition, you can view the rubydoc page for string syntax.

Yaml - multi line syntax without delimiter

Is it possible in Yaml to have multi-line syntax for strings without an additional character generated between newlines?
Folded (>) syntax puts spaces, literal syntax (|) puts newlines between lines.
The summary in "In YAML, how do I break a string over multiple lines?" does not give a solution.
E.g.
>-
line1_
line2
generates line1<space>line2 - I would like to have line1_line2 without additional token.
Use a double-quoted string:
"line1_\
line2"
By escaping the newline character, it is completely removed instead of being translated into a space. It is not possible to do this with block scalars because they have no escape sequences.
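A quick way to compare the two behaviours is to load both forms with Ruby's bundled YAML library (Psych). This is a sketch under the assumption of a one-key document; the key name value is made up for the example.

```ruby
require "yaml"

# Folded block scalar: the line break between the two lines becomes a space.
folded = YAML.load(<<~DOC)
  value: >-
    line1_
    line2
DOC

# Double-quoted scalar: a backslash at the end of the line removes the break.
# The heredoc delimiter is single-quoted so Ruby leaves the backslash alone.
escaped = YAML.load(<<~'DOC')
  value: "line1_\
    line2"
DOC

p folded["value"]   # => "line1_ line2"
p escaped["value"]  # => "line1_line2"
```

Leading whitespace on the continuation line is dropped along with the escaped line break, so the second line can still be indented for readability.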

Regex - How can I remove specific characters between strings/delimiters?

This is related to cleaning files before parsing them elsewhere, namely, malformed/ugly CSV. I see plenty of examples for removing/matching all characters between certain strings/characters/delimiters, but I cannot find any for specific strings. Example portion of line would look something like:
","Should now be allowed by rule above "Server - Access" added by Rich"\r
To be clear, this is not the entire line, but the entire line is enclosed in quotes, separated by ",", and ends in ^M (Windows newline/carriage return). The 'columns' preceding this would be enclosed on each side by ",". I would probably use this too to remove cruft that appears earlier in the line.
What I am trying to get to is the removal of all double quotes between "," and "\r ("Server - Access" - these ones) without removing the delimiters. Alternatively, I may just find and replace them with \" to delimit them for the Ruby CSV library. So far I have this:
(?<=",").*?(?="\\r)
Which basically matches everything between the delimiters. If I replace .*? with anything, be that a letter, double quotes etc, I get zero matches. What am I doing wrong?
Note: This should be Ruby compatible please.
If I understand you correctly, you can use negative lookahead and lookbehind:
text = '","Should now be allowed by rule above "Server - Access" added by Rich"\r'
puts text.gsub(/(?<!,)"(?![,\\r])/, '\"')
# ","Should now be allowed by rule above \"Server - Access\" added by Rich"\r
Of course, this won't work if the values themselves can contain commas and newlines...
