using grep in a ruby script - ruby

I have a simple script that scrapes a webpage and puts the lines of content to the screen, then I simply pipe it to grep to output what I want then pipe that to less.
myscript.rb scrape-term | grep argument | less
I changed the script to use the following, instead of having the extra arguments on the command line:
%x[ #{my-text-output} | grep argument | less ]
but now I get the error:
sh: 0: command not found
I've tried the other variants found here but nothing works!

What you need to do is open a handle to grep and write your output to it, not jam the output into the command line, where the shell will try to execute it as a command (hence the "command not found" error). The hack solution is to dump your output into a Tempfile and read it back through grep, but really that's a mess.
Ideally what you do is only emit output that matches your pattern by filtering:
while (...)
  # ...
  if output.match(/#{argument}/)
    puts output  # print the matching line, not the pattern
  end
end
Then this can be channelled through to less if required.
Remember there's also a grep method in Ruby, available on any Enumerable (including Array). For example, if you've created an array called output with the lines in it:
output.grep(/#{argument}/).each do |line|
  puts line
end

Since you're going to run it in a shell anyway (because of the less command), maybe try this:
%x[ echo #{my-text-output} | grep argument | less ]
You can also try echo -e, which interprets escapes such as \n as a newline.
(Ideally, your text should have a \n char at the end of each line already.)
Edit:
I forgot the double quotes around the variable. No more "command not found" errors, although less still doesn't want to cooperate. I have this script.rb now:
text = "foo
foobar
foobaraski
bar
barski"
a = %x[ echo "#{text}" | grep foo]
print a
And it gives me:
shell> ruby script.rb
foo
foobar
foobaraski
Edit 2:
And now it works:
text = "foo
foobar
foobaraski
bar
barski"
system "echo '#{text}' | grep foo | less"
Opens up less with three lines, just as the bash command would.

Related

bash for finding line that contains string and including that as part of command

I have a command that will print out three lines:
1-foo-1
1-bar-1
1-baz-1
I would like to include the result of this as part of a command where I search for the line that contains the string "bar" and then include that entire line as part of a command as follows:
vi 1-bar-1
I was wondering what combination of bash, awk and/or grep would achieve this. Thank you.
I had tried the following but I'm getting the entire output. For example, I'd have a file rows.txt with this content:
1-foo-1
1-bar-1
1-baz-1
and then I'd run echo $(cat rows.txt | awk /^1-baz.*$/) and I'd get 1-foo-1 1-bar-1 1-baz-1 as a result when I'm looking for just 1-baz-1. Thank you.
vi $(echo -e "1-foo-1\n1-bar-1\n1-baz-1\n" | grep bar | awk -F'-' '{print $2}')
The above command is equivalent to vi bar.
P.S.
echo -e "1-foo-1\n1-bar-1\n1-baz-1\n" is a demo to mimic your command output.
P.S.
You updated the question... Now your goal becomes:
I'm looking for just 1-baz-1.
Then, the solution would be just
cat rows.txt | grep baz
I search for the line that contains the string "bar":
A naive approach would be to just use
vi $(grep -F bar rows.txt)
However, you have to keep in mind a few things:
If your file contains several lines with bar, say
1-bar-1
2-bar-2
the editor will open both files. This may or may not be what you want.
Another point to consider: If your file contains a line
1-foobar-1
this would be chosen as well. If you don't want this to happen, use
vi $(grep -Fw bar rows.txt)
The -w option requires that the pattern must be delimited by word boundaries.
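A small demo of the difference (rows_demo.txt is a made-up file name):

```shell
printf '1-bar-1\n1-foobar-1\n' > rows_demo.txt
grep -F bar rows_demo.txt    # matches both lines
grep -Fw bar rows_demo.txt   # matches only 1-bar-1: "foobar" has no word boundary before "bar"
```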

sed - replace pattern with content of file, while the name of the file is the pattern itself

Starting from the previous question, I have another one. If I make this work, I can just delete several lines of script :D
I want to transform this line:
sed -i -r -e "/$(basename "$token_file")/{r $token_file" -e "d}" "$out_dir_rug"/rug.frag
into this line:
sed -i -r -e "/(##_[_a-zA-Z0-9]+_##)/{r $out_dir_frags_rug/\1" -e "d}" "$out_dir_rug"/rug.frag
The idea is the following. Originally (the first line), I searched for some patterns, and then replaced those patterns with their associated files. The names of the files are the patterns themselves.
Example:
Pattern: ##_foo_##
File name: ##_foo_##
Content of file ##_foo_##:
first line of foo text
second line of foo text
so the text
bar
##_foo_##
bar
would become
bar
first line of foo text
second line of foo text
bar
In my second attempt, I used sed for both locating the patterns, and for the actual replacement.
The result is that the patterns are found, but replaced with pretty much nothing.
Is sed supposed to be able to do the replacement I want? If yes, how should I change my command?
Note: a file usually has several different patterns (I call them tokens), and the same pattern may appear more than one time.
So an input file might look like:
bar
bar
##_foo_##
bar
##_haa_##
bar
##_foo_##
and so on
I already tried to replace the / in the address with ,, to no useful result. Escaping the / in the path to \/ also does not help.
I verified that the path to the replacement files is good by adding the next line, just before the sed:
echo "$out_dir_frags_rug"
The names of the files are the patterns themselves.
If you need anything "dynamic", then sed is not enough for it. As sed can't do "eval" - it can't reinterpret the content of the pattern space or hold space as commands (that would be amazing!) - you can't use the line as part of the command.*
You can use bash, untested, written here:
while IFS= read -r line; do
  if [[ "$line" =~ ^##_[_a-zA-Z0-9]+_##$ ]]; then
    cat "./$line"   # the file is named after the token itself
  else
    printf "%s\n" "$line"
  fi
done < inputfile
but that would be slow - bash is slow at reading lines. A similar design could work in a POSIX shell with POSIX tools, by replacing the [[ bash extension with case or some grep + sed or awk.
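A minimal POSIX sketch of the same idea, using case instead of the [[ =~ ]] bashism (the inline demo files and names are stand-ins; the token file is assumed to be named exactly like the token, e.g. a file literally called ##_foo_##):

```shell
# demo setup: a token file named like the token, plus an input file using it
printf 'first line of foo text\nsecond line of foo text\n' > '##_foo_##'
printf 'bar\n##_foo_##\nbar\n' > inputfile

while IFS= read -r line; do
  case $line in
    '##_'*'_##') cat "./$line" ;;     # token line: emit the file's content instead
    *) printf '%s\n' "$line" ;;
  esac
done < inputfile
```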
An awk solution would be way faster, something along these lines, also untested:
awk '/^##_[_a-zA-Z0-9]+_##$/ {
    file = "./" $0                             # the file is named after the token itself
    while ((getline tmp < file) > 0) print tmp
    close(file)
    next
}
{ print }
' inputfile
That said, for your specific problem, instead of reinventing the wheel and writing yet another templating and preprocessing tool, I would advise concentrating on researching existing solutions. A simple file with the following content can be preprocessed with the C preprocessor (cpp):
bar
bar
#include "foo"
bar
#include "haa"
bar
#include "foo"
and so on
It's clear to anyone what it means, it has a very standardized format, and you also get all the #ifdef conditional expressions, macros and macro functions that you can use - but you can't start lines with #, dunno if that's important. For endless ultimate templating power, I could recommend m4 from the standard unix commands.
* With GNU sed, however, you can execute the content of the replacement string of an s command in the shell, via the e flag. I forgot about it when writing this answer, as it's rarely used, and I would strongly advise against using the e flag - finding the proper quoting for the subshell is hard (impossible?) and it's very easy to abuse. Anyway, the following could work:
sed -n '/^##_\(.*\)_##$/!{p;n;}; s//cat \1/ep'
but with the following input it may cause harm on your system:
some input file
##_$(rm /)_##
^^^^^^^ - will be executed in subshell and remove all your files
I think proper quoting would be something along (untested):
sed -n '/^##_\(.*\)_##$/!{p;n;}; s//\1/; '"s/'/'\\\\''/g; p; s/.*/cat '&'/ep"
but I would go with existing tools like cpp or m4 anyway.
With sed
Yes, this is possible with GNU sed.
With this input file input.txt:
= bar =
##_foo_##
= bar2 =
##_foo_##
= bar3 =
And the ##_foo_## file you gave in your question, the command
sed -E '
/^##_[_a-zA-Z0-9]+_##$/ {
s|^|cat ./|
e
}
' input.txt
... will yield:
= bar =
first line of foo text
second line of foo text
= bar2 =
first line of foo text
second line of foo text
= bar3 =
This command can also be shortened to this one-liner:
sed -E '/^##_[_a-zA-Z0-9]+_##$/ s|^|cat ./|e' input.txt
Explanation
GNU sed has a special command e that executes the command found in pattern space and then replaces the content of the pattern space with the output of the command.
When the above program encounters a line matching your pattern ##_file_##, it prepends cat ./ to the pattern space and executes it with e.
The s/.../.../e command is a shortened version that does exactly the same, the command being executed only if a successful substitution occurred.
Contrary to what KamilCuk says in their answer, both sed commands above are perfectly safe and don't need any escaping/quoting because they are executed on a known harmless pattern that cannot be tricked to execute anything else than the expected cat.
Of course, this is designed to work with that ##_file_## pattern you gave in your question. Allowing spaces or other fancy characters in your pattern may break things since they might be interpreted by the shell.
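For reference, here is the one-liner run end to end on inline demo files (requires GNU sed for the e flag; the file contents mirror the example from the question):

```shell
# the token file is named exactly like the token
printf 'first line of foo text\nsecond line of foo text\n' > './##_foo_##'
printf '= bar =\n##_foo_##\n= bar3 =\n' > input.txt
sed -E '/^##_[_a-zA-Z0-9]+_##$/ s|^|cat ./|e' input.txt
```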
With awk
Here is the equivalent with awk:
awk '
/^##_[_a-zA-Z0-9]+_##$/ {
system("cat ./" $0)
next
}
1
' input.txt
This command can also be shortened to this one-liner:
awk '! /^##_[_a-zA-Z0-9]+_##$/ || system("cat ./" $0)' input.txt
Explanation
This is very similar to the sed commands above: when awk meets the pattern ##_file_## it builds the corresponding cat command and executes it with system() then it skips to the next input line with next. Lines that don't match the pattern are printed as is (the 1 line).
Of course, since the command is interpreted by the shell, the same caveat applies here: both awk commands are perfectly safe and don't need any escaping/quoting as long as your pattern stays that simple.

What is the use of "< " in bash

I can't differentiate between these two lines of code, as the output of each command is the same:
cat volcanoes.txt
cat < volcanoes.txt
< is a redirection: the shell opens the file named on the right-hand side and connects it to the STDIN of the command on the left-hand side.
cat takes input and outputs it to STDOUT.
If you provide an argument to cat, it takes input from there. Otherwise, it takes it from STDIN.
It isn't usually useful to use < in conjunction with cat.
cat volcanoes.txt passes volcanoes.txt as an argument, which cat will attempt to locate on disk and open.
cat < volcanoes.txt runs cat with no arguments, and the interpreter opens volcanoes.txt as cat's stdin.
For a clearer example, try testing with multiple files:
echo 1 > a
echo 2 > b
now you can see the difference by comparing
grep . a b
vs
cat a b | grep .
In the first one, a & b are passed as arguments, so grep opens them itself and knows the source of each line of data, so it tells you which file each line came from.
$: grep . a b
a:1
b:2
Done the second way, cat reads both files and puts the content on grep's stdin as a single anonymized stream, much the same as you did with a single file when you said cat < volcanoes.txt. This way, grep only knows data is coming on stdin, and it can't give you the additional info.
$: cat a b | grep .
1
2
For cat, it's functionally the same because of what cat is and does, but it's still mechanically different, and for some programs the difference could be crippling, or at least relevant.
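One command where the mechanical difference shows up directly is wc, which only reports a file name when it opened the file itself (demo.txt is a throwaway example file):

```shell
printf 'a\nb\n' > demo.txt
wc -l demo.txt       # argument: wc opened the file itself and reports its name
wc -l < demo.txt     # redirection: wc sees only an anonymous stream, no name
```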

wc output differs inside/outside vim

I'm working on a text file that contains normal text with LaTeX-style comments (lines starting with a %). To determine the non-comment word count of the file, I was running this command in Bash:
grep -v "^%" filename | wc -w
which returns about the number of words I would expect. However, if from within vim I run this command:
:r! grep -v "^%" filename | wc -w
It outputs the word count which includes the comments, but I cannot figure out why.
For example, with this file:
%This is a comment.
This is not a comment.
Running the command from outside vim returns 5, but opening the file in vim and running the similar command prints 9.
I also was having issues getting vim to prepend a "%" to the command's output, but if the output is wrong anyways, that issue becomes irrelevant.
The % character is special in vi. It gets substituted for the filename of the current file.
Try this:
:r! grep -v "^\%" filename | wc -w
Same as before but backslash-escaping the %. In my testing just now, your example :r! command printed 9 as it did for you, and the above printed 5.
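To reproduce the expected behaviour outside vim (filename.txt is a stand-in for the real file):

```shell
# %% in printf produces a literal %, giving the two-line example file
printf '%%This is a comment.\nThis is not a comment.\n' > filename.txt
grep -v "^%" filename.txt | wc -w    # counts only the non-comment line: 5 words
```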

How do you pipe shell output to ruby -e?

Say I was typing something in my terminal like:
ls | grep phrase
and after doing so I realize I want to delete all these files.
I want to use Ruby to do so, but can't quite figure out what to pass into it.
ls | grep phrase | ruby -e "what do I put in here to go through each line by line?"
Use this as a starting point:
ls ~ | ruby -ne 'print $_ if $_[/^D/]'
Which returns:
Desktop
Documents
Downloads
Dropbox
The -n flag means "loop over all incoming lines" and stores them in the "default" variable $_. We don't see that variable used much, partly as a knee-jerk reaction to Perl's overuse of it, but it has its useful moments in Rubydom.
These are the commonly used flags:
-e 'command' one line of script. Several -e's allowed. Omit [programfile]
-n assume 'while gets(); ... end' loop around your script
-p assume loop like -n but print line also like sed
ARGF will save your bacon.
ls | grep phrase | ruby -e "ARGF.read.each_line { |file| puts file }"
=> phrase_file
file_phrase
stuff_in_front_of_phrase
phrase_stuff_behind
ARGF is not literally an array but an IO-like stream: it reads from the files named on the command line or, when there are none, from whatever was piped into your script.
You can read more about ARGF here:
http://www.ruby-doc.org/core-1.9.3/ARGF.html
For more uses check out this talk on Ruby Forum:
http://www.ruby-forum.com/topic/85528
