How does gsub get its receiver when invoked on the command line in Ruby?

I'm currently reading Programming Ruby 1.9 & 2.0 by Dave Thomas, which contains the following command-line script:
$ ruby -pi.bak -e "gsub(/Perl/, 'Ruby')" *.txt
I know from the book that the -p option places the program code inside the loop while gets; ...; print; end, and that regular expressions match against $_ within -e scripts. I also found that gsub is identical to $_.gsub within the -e script. But how does gsub get its receiver object? Is there an explicit rule describing this?

There is no explicit rule describing it, because it works the same way as everywhere else in Ruby and has nothing to do with the -p flag.
gsub gets sent to the main object, because main is the default receiver here and, as you noted, no explicit receiver is given.
Ruby has two different gsub methods: the one in String, which you were probably thinking of, and the one in Kernel, which is the answer to your question. Kernel is included by Object, and main is an instance of Object.
From the Kernel#gsub documentation:
Equivalent to $_.gsub..., except that $_ will be updated if substitution occurs. Available only when -p/-n command line option specified.
$_ is "the last string read by gets or readline in the current scope."
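Putting the pieces together, the one-liner behaves roughly like the following explicit script (just a sketch; the in-place editing done by -i.bak is left out):
while gets                       # gets stores each line it reads in $_
  $_ = $_.gsub(/Perl/, 'Ruby')   # this is what Kernel#gsub does for you under -p/-n
  print                          # Kernel#print with no arguments writes $_
end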

Related

Combine two expressions in Bash

I did check the ABS (Advanced Bash-Scripting Guide), but it was hard to find a reference to my problem/question there.
Here it is. Consider the following code (which extracts the first character of OtherVar and then converts it to uppercase):
OtherVar=foobar
MyChar=${OtherVar:0:1} # get first character of OtherVar string variable
MyChar=${MyChar^} # first character to upper case
Could I somehow condense the second and third lines into one statement?
P.S.: As was pointed out below, it does not need to have a named variable. I should add that I would like to avoid sub-shells and the like, though I would also accept a somewhat hacky way to achieve the desired result.
P.P.S.: The question is purely educational.
You could do it all-in-one without forking a sub-shell or running an external command:
printf -v MyChar %.1s "${OtherVar^}"    # the precision .1 keeps only the first character
Or:
read -n1 MyChar <<<"${OtherVar^}"    # read exactly one character of the uppercased string
Another option:
declare -u MyChar=${OtherVar:0:1}    # -u upper-cases the value on assignment (bash 4+)
But I can't see the point of such optimization in a bash script.
There are more suitable text-processing interpreters, such as awk, sed, or even perl or python, if performance matters.
You could use the cut command and put it in a complex expression to get it onto one line, but I'm not sure it makes the code much clearer:
OtherVar=foobar
MyChar=$(echo "${OtherVar^}" | cut -c1) # uppercase the string's first character, then cut out only that character

How does : <<'END' work in bash to create a multi-line comment block?

I found a great answer for how to comment in a bash script (by @sunny256):
#!/bin/bash
echo before comment
: <<'END'
bla bla
blurfl
END
echo after comment
The quotes (') around the END delimiter are important; otherwise, things inside the block, for example $(command), would be parsed and executed.
This may be ugly, but it works and I'm keen to know what it means. Can anybody explain it simply? I did already find an explanation for : (that it is a no-op, or true), but it still does not make sense to me to call no-op or true here anyway...
I'm afraid this explanation is less "simple" and more "thorough", but here we go.
The goal of a comment is to be text that is not interpreted or executed as code.
Originally, the UNIX shell did not have a comment syntax per se. It did, however, have the null command : (once an actual binary program on disk, /bin/:), which ignores its arguments and does nothing but indicate successful execution to the calling shell. Effectively, it's a synonym for true that looks like punctuation instead of a word, so you could put a line like this in your script:
: This is a comment
It's not quite a traditional comment; it's still an actual command that the shell executes. But since the command doesn't do anything, surely it's close enough: mission accomplished! Right?
The problem is that the line is still treated as a command beyond simply being run as one. Most importantly, lexical analysis - parameter substitution, word splitting, and such - still takes place on those destined-to-be-ignored arguments. Such processing means you run the risk of a syntax error in a "comment" crashing your whole script:
: Now let's see what happens next
echo "Hello, world!"
#=> hello.sh: line 1: unexpected EOF while looking for matching `''
That problem led to the introduction of a genuine comment syntax: the now-familiar # (which was first introduced in the C shell created at BSD). Everything from # to the end of the line is completely ignored by the shell, so you can put anything you like there without worrying about syntactic validity:
# Now let's see what happens next
echo "Hello, world!"
#=> Hello, world!
And that's How The Shell Got Its Comment Syntax.
However, you were looking for a multi-line (block) comment, of the sort introduced by /* (and terminated by */) in C or Java. Unfortunately, the shell simply does not have such a syntax. The normal way to comment out a block of consecutive lines - and the one I recommend - is simply to put a # in front of each one. But that is admittedly not a particularly "multi-line" approach.
Since the shell supports multi-line string-literals, you could just use : with such a string as an argument:
: 'So
this is all
a "comment"
'
But that has all the same problems as single-line :. You could also use backslashes at the end of each line to build a long command line with multiple arguments instead of one long string, but that's even more annoying than putting a # at the front, and more fragile since trailing whitespace breaks the line-continuation.
The solution you found uses what is called a here-document. The syntax some-command <<whatever causes the following lines of text - from the line immediately after the command, up to but not including the next line containing only the text whatever - to be read and fed as standard input to some-command. Here's an alternate shell implementation of "Hello, world" which takes advantage of this feature:
cat <<EOF
Hello, world
EOF
If you replace cat with our old friend :, you'll find that it ignores not only its arguments but also its input: you can feed whatever you want to it, and it will still do nothing (and still indicate that it did that nothing successfully).
However, the contents of a here-document do undergo string processing. So just as with the single-line : comment, the here-document version runs the risk of syntax errors inside what is not meant to be executable code:
#!/bin/sh -e
: <<EOF
(This is a backtick: `)
EOF
echo 'In modern shells, $(...) is preferred over backticks.'
#=> ./demo.sh: line 2: bad substitution: no closing "`" in `
The solution, as seen in the code you found, is to quote the end-of-document "sentinel" (the EOF or END or whatever) on the line introducing the here document (e.g. <<'EOF'). Doing this causes the entire body of the here-document to be treated as literal text - no parameter expansion or other processing occurs. Instead, the text is fed to the command unchanged, just as if it were being read from a file. So, other than a line consisting of nothing but the sentinel, the here-document can contain any characters at all:
#!/bin/sh -e
: <<'EOF'
(This is a backtick: `)
EOF
echo 'In modern shells, $(...) is preferred over backticks.'
#=> In modern shells, $(...) is preferred over backticks.
(It is worth noting that the way you quote the sentinel doesn't matter - you can use <<'EOF', <<E"OF", or even <<EO\F; all have the same result. This is different from the way here-documents work in some other languages, such as Perl and Ruby, where the content is treated differently depending on the way the sentinel is quoted.)
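For a quick Ruby illustration of that difference (a small sketch; the variable name is arbitrary): in Ruby the quoting style of the sentinel decides whether interpolation happens, whereas in bash any quoting of the sentinel suppresses it.
name = "world"
puts <<"EOF"     # double-quoted sentinel: #{...} is interpolated
Hello, #{name}!
EOF
puts <<'EOF'     # single-quoted sentinel: the body is taken literally
Hello, #{name}!
EOF
# prints "Hello, world!" and then the literal text "Hello, #{name}!"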
Notwithstanding any of the above, I strongly recommend that you instead just put a # at the front of each line you want to comment out. Any decent code editor will make that operation easy - even plain old vi - and the benefit is that nobody reading your code will have to spend energy figuring out what's going on with something that is, after all, intended to be documentation for their benefit.
It is called a here document. It is a construct that lets you send a block of text, such as a list of commands, to another command or program.
The string following the << is the marker that determines the end of the block. If you send the commands to the no-op :, nothing happens, which is why you can use this as a comment block.
That's heredoc syntax. It's a way of defining multi-line string literals.
As the answer at your link explains, the single quotes around the END disable interpolation, similar to the way single-quoted strings disable interpolation in regular bash strings.

Sed or Perl: One file with regex instructions, one instruction per line, executed on another file

I'm setting up a regex learning environment purely in bash/tmux, with a pane for the file containing a regex, a pane for a text file to process, and a pane for the bash shell. I'm at the start of the regex chapter of "The Bastards Book of Ruby".
The book shows an example of a 'negative lookahead' regex (perfect, let's learn), where perl is recommended over sed. As I'm going for a CLI approach, the bash command is: $ perl -p file_with_regex.pl test.txt
(This prints the lines from test.txt with the intended substitutions)
Question: How would I add a second regex (on a new line) to the regex.pl file, and have perl execute both the first and then the second instruction when processing the text file?
# regex.pl
s/^(?!Mr)/Ms./g
s/Ms./Mrs./g
(Adding the second regex results in "Execution of regex.pl aborted due to compilation errors.")
The overall aim here is to progress in Ruby, while testing Regular Expressions as concisely as possible. Picking up a bare minimum of sed/perl while doing so would be a plus, as a proper dive into perl would take time from Ruby (and when it's time for the perl dive, I'll have had some time with the basics). The more I look at this the more it seems necessary to just do it in Ruby, if there isn't a perl switch that would enable a command-line-with-files approach.
The basic answer is that you need a semicolon after each line.
Paraphrased from perlrun: -p reads all lines of input, runs the commands you specified, and then prints out the value of $_ (the implicit variable your substitution commands operate on in this script).
So, removing the magic, -p transformed your code into:
LINE:
  while (<>) {
      # regex.pl
      s/^(?!Mr)/Ms./g
      s/Ms./Mrs./g
  } continue {
      print or die "-p destination: $!\n";
  }
Perl requires a semicolon between statements (though a terminating semicolon at the end of a block is optional), hence the error: each s/// line in regex.pl needs a trailing ;.
I personally would recommend writing the whole script above into the file instead of using -p because it is far less magical, but you're welcome to do it either way.
If you were going to write the whole script, I would recommend something more like the following:
use strict;
use warnings;
while ( my $line = <ARGV> ) {
    $line =~ s/^(?!Mr)/Ms./g;
    print "After first subst: $line";
    $line =~ s/Ms./Mrs./g;
    print "After second subst: $line";
}
use strict and use warnings are the boilerplate you want at the top of any perl script (to catch typos and other common mistakes), and explicitly naming the variable $line gives you a better understanding of how the script works ($_ is very magical for beginners and the source of many errors IMO, but great when you know what's what).
If you're wondering about <> vs. <ARGV>, they are the same thing and mean "read through all the lines of the files provided as command-line arguments to this script, or standard input if no files are provided".
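Since the stated aim is to progress in Ruby, it is worth noting that Ruby's own -p switch supports the same file-of-substitutions workflow; a rough sketch (the file name ruby_regex.rb is just an example) could look like this:
# ruby_regex.rb -- run with:  ruby -p ruby_regex.rb test.txt
gsub(/^(?!Mr)/, 'Ms.')   # Kernel#gsub rewrites $_ (the current line) under -p/-n
gsub(/Ms\./, 'Mrs.')     # note the escaped dot; an unescaped . would match any character
Each line is printed after both substitutions have run, just as with perl -p.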

Ruby: Why do I get the warning "regex literal in condition" here?

A simple Ruby program, which works well (using Ruby 2.0.0):
#!/usr/bin/ruby
while gets
  print if /foo/../bar/
end
However, Ruby also outputs the warning "warning: regex literal in condition". It seems that Ruby considers my flip-flop expression /foo/../bar/ dangerous.
My question: where is the danger in this program? And can I turn this warning off (ideally only for this statement, keeping other warnings active)?
BTW, I found several discussions of this kind of code on the net, also mentioning the warning, but I never found a good explanation of why we get warned.
You can avoid the warning by using an explicit match:
while gets
  print if ~/foo/..~/bar/
end
Regexp#~ matches against $_.
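As a quick illustration of what that returns (the string here is just an example):
$_ = "foo bar baz"
pos = ~/bar/   # Regexp#~ matches the receiver against $_
puts pos       # => 4, the index where the match starts; nil when there is no match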
I don't know why the warning is shown (to be honest, I wasn't even aware that Ruby matches regexp literals against $_ implicitly), but according to the parser's source code, it is shown unless you provide the -e option when invoking Ruby, i.e. passing the script as an argument:
$ ruby -e "while gets; print if /foo/../bar/ end"
I would avoid using $_ as an implicit parameter and instead use something like:
while line = gets
  print line if line=~/foo/..line=~/bar/
end
I think Neil Slater is right: it looks like a bug in the parser. If I change the code to
#!/usr/bin/ruby
while gets
  print if $_=~/foo/..$_=~/bar/
end
the warning disappears.
I'll file a bug report.

Bash command-line parsing with arguments containing whitespace

I have to parse a command line argument in a shell script as follows:
cmd --a=hello world good bye --b=this is bash script
I need to parse the arguments of "a", i.e. "hello world ...", which are separated by whitespace, into an array.
I.e. the a_input array should contain "hello", "world", "good" and "bye".
Similarly for the "b" arguments as well.
I tried it as follows:
--a=*)
a_input={1:4}
a_input=$#
for var in $a_input
#keep parsing until next --b or other argument is seen
done
But the above method is crude. Is there any other workaround? I cannot use getopts.
The simplest solution is to get your users to quote the arguments correctly in the first place.
Barring that, you can manually loop until you get to the end of the arguments or hit the next --argument (but that means you can't include a word that starts with -- in your argument value... unless you also do valid-option testing on those words, in which case slightly fewer -- words are ruled out).
Adding to Etan Reisner's answer, which is absolutely correct:
I personally find bash a bit cumbersome when array/string processing gets more complex. If you really have the strange requirement that the caller should not be required to use quotes, I would write an intermediate script in, say, Ruby or Perl, which just collects the parameters in a proper way, wraps quoting around them, and passes them on to the script that was originally supposed to be called, even if this costs an additional process.
For example, a Ruby one-liner such as
system("your_bash_script_here.sh '" + ARGV.join(' ').split(' --').select { |s| s.size > 0 }.join("' '") + "'")
would do this sanitizing and then invoke your script.
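Written out as a small standalone script rather than a one-liner, the same idea might look like this (the script names and the --key=value convention are assumptions for illustration):
#!/usr/bin/ruby
# wrap.rb -- collect raw, unquoted arguments into whole --key=value groups
# and hand them to the real bash script as properly separated arguments.
groups = []
ARGV.each do |arg|
  if arg.start_with?('--') || groups.empty?
    groups << arg.dup          # start a new group (or keep a stray leading word)
  else
    groups[-1] << ' ' << arg   # append the word to the current group
  end
end
# Passing multiple arguments to system bypasses the shell, so no extra quoting is needed.
system('your_bash_script_here.sh', *groups)
Called as ruby wrap.rb --a=hello world good bye --b=this is bash script, the bash script would receive exactly two arguments: "--a=hello world good bye" and "--b=this is bash script".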
