Bash script - grep output - bash

I need an output for multiple grep commands.
patterns: ([^"#]+)
wget -q -O - http://www.site1.com | grep -o -E -m 1 'site1content = "([^"#]+)"'
wget -q -O - http://www.site2.com | grep -o -E -m 1 'site2content"([^"#]+)"'
.........
Output file:
http://www.site1.com***pattern
http://www.site2.com***pattern

Just redirect the output of your commands to a file.
wget -q -O - http://www.site1.com | grep -o -E -m 1 'site1content = "([^"#]+)"' > output.txt
wget -q -O - http://www.site2.com | grep -o -E -m 1 'site2content"([^"#]+)"' >> output.txt
> overwrites old content and >> appends to the end of the file.
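The difference between the two operators can be checked with a scratch file. This is only a minimal sketch: the temporary file stands in for output.txt, and the echoed words are placeholders rather than real grep output.

```shell
# Scratch file standing in for output.txt (hypothetical).
out=$(mktemp)
echo first  > "$out"    # creates or truncates the file
echo second > "$out"    # truncates again, so "first" is gone
echo third >> "$out"    # appends after "second"
content=$(cat "$out")
rm -f "$out"
printf '%s\n' "$content"
```

Only the last `>` write and everything appended after it survive.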
Edit:
Not pretty, but a quick and dirty solution might be:
echo "http://www.site1.com***$(wget -q -O - http://www.site1.com | grep -o -E -m 1 'site1content = "([^"#]+)"')" > output.txt
(untested)
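The same url***pattern format could also come from a small function that reads a page on stdin. This is a sketch under assumptions: the name label_match is hypothetical, and the sample page is a stand-in so the example runs without network access. In real use the printf would be replaced by wget -q -O - with the site's URL.

```shell
# Hypothetical helper: print "<label>***<first match>" for the page on stdin.
label_match() {
    local label=$1 pattern=$2 match
    match=$(grep -o -E -m 1 -- "$pattern") || return 1
    printf '%s***%s\n' "$label" "$match"
}

# Stand-in for: wget -q -O - http://www.site1.com
sample_page='<script>site1content = "hello world"</script>'

result=$(printf '%s\n' "$sample_page" \
    | label_match 'http://www.site1.com' 'site1content = "([^"#]+)"')
printf '%s\n' "$result"
```

Calling the function once per site and appending with >> output.txt would build the file line by line.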

As written, the output of the above commands consists only of the matched text, because of the -o option:
http://explainshell.com/explain?cmd=grep+-o
I suggest using the above site for an explanation.
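The effect of -o can also be seen locally by running the same pattern with and without it. The input line below is an invented sample, not content from either site.

```shell
# Invented sample line; only the quoted assignment matches the pattern.
line='var site1content = "hello"; // trailing text'

# Without -o, grep prints the whole matching line.
whole=$(printf '%s\n' "$line" | grep -E 'site1content = "([^"#]+)"')

# With -o, grep prints only the part of the line that matched.
only=$(printf '%s\n' "$line" | grep -o -E 'site1content = "([^"#]+)"')

printf 'without -o: %s\n' "$whole"
printf 'with -o:    %s\n' "$only"
```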

Related

how to pipe multi commands to bash?

I want to check some files on a remote website.
Here is the bash command I use to generate the commands that calculate each file's md5:
[root]# head -n 3 zrcpathAll | awk '{print $3}' | xargs -I {} echo wget -q -O - -i {}e \| md5sum\;
wget -q -O - -i https://example.com/zrc/3d2f0e76e04444f4ec456ef9f11289ec.zrce | md5sum;
wget -q -O - -i https://example.com/zrc/e1bd7171263adb95fb6f732864ceb556.zrce | md5sum;
wget -q -O - -i https://example.com/zrc/5300b80d194f677226c4dc6e17ba3b85.zrce | md5sum;
Then I piped the generated commands to bash, but only the first command was executed.
[root]# head -n 3 zrcpathAll | awk '{print $3}' | xargs -I {} echo wget -q -O - -i {}e \| md5sum\; | bash -v
wget -q -O - -i https://example.com/zrc/3d2f0e76e04444f4ec456ef9f11289ec.zrce | md5sum;
3d2f0e76e04444f4ec456ef9f11289ec -
[root]#
Would you please try the following instead:
while read -r _ _ url _; do
wget -q -O - "$url"e | md5sum
done < <(head -n 3 zrcpathAll)
Note that we should not put -i in front of "$url" here.
[Explanation about -i option]
Manpage of wget says:
-i file
--input-file=file
Read URLs from a local or external file. [snip]
If this function is used, no URLs need be present on the command line. [snip]
If the file is an external one, the document will be automatically treated as html if the Content-Type matches text/html.
Furthermore, the file's location will be implicitly used as base
href if none was specified.
where the file contains lines of URLs such as:
https://example.com/zrc/3d2f0e76e04444f4ec456ef9f11289ec.zrce
https://example.com/zrc/e1bd7171263adb95fb6f732864ceb556.zrce
https://example.com/zrc/5300b80d194f677226c4dc6e17ba3b85.zrce
Whereas if we use the option as -i url, wget first
downloads the url as a file that is expected to contain lines of URLs
as above. In our case the url is itself the download target,
not a list of URLs, so wget fails with the error: No URLs found in url.
Even if wget fails, why does the command output just one line rather than
three md5sum results?
This seems to be because the head command immediately flushes the remaining
lines when the piped subprocess fails.
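The field splitting done by read -r _ _ url _ in the loop above can be checked without any network access. In this sketch the two input lines are an invented stand-in for zrcpathAll (whose real layout is not shown here, beyond the URL being the third field), and printf replaces the wget | md5sum step.

```shell
# Invented stand-in for zrcpathAll; the URL is the third whitespace-separated field.
sample_list='1 a https://example.com/zrc/aaa.zrc extra
2 b https://example.com/zrc/bbb.zrc extra'

# "_" receives fields that are not needed; the last "_" swallows the rest of the line.
urls=$(printf '%s\n' "$sample_list" | while read -r _ _ url _; do
    printf '%s\n' "${url}e"    # append the trailing "e", as in the loop above
done)

printf '%s\n' "$urls"
```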

Ansible script module adds leading newline character in stdout, probably when sudo is used. Why?

My script is as follows:
#!/bin/bash
if lscpu | grep 'CPU max MHz' >/dev/null 2>&1; then
cpu_speed=$(lscpu | grep 'CPU max MHz' | cut -d':' -f2 | awk '{print $1}')
else
cpu_speed=$(lscpu | grep 'CPU MHz' | cut -d':' -f2 | awk '{print $1}')
fi
echo -ne "${cpu_speed}"
It adds a newline to the output as follows:
"stdout": "\r\n4000.0000",
Even if my script file is empty, it still adds a newline to stdout as follows:
"stdout": "\r\n"
My module is as follows:
- name: Get Processor Speed
script: files/get_cpu_speed.sh
register: cpu_speed_output
become: yes
So, as you can see, it uses sudo.
I tried to find references online on whether a leading newline is common with the Ansible script module, but found none.
So, what could be causing the leading newline in my case?
Debug output is as follows:
ssh -C -o ControlMaster=auto -o ControlPersist=60s -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=testuser -o ConnectTimeout=10 -o ControlPath=/root/.ansible/cp/c4ee167dc2 -tt myserver1 '/bin/sh -c '"'"'sudo -H -S -p "[sudo via ansible, key=epollvusfsfrfsfsfsfmrxdcida] password: " -u root /bin/sh -c '"'"'"'"'"'"'"'"'echo BECOME-SUCCESS-epollvusfsfrfsfsfsfmrxdcida; /home/testuser/.ansible/tmp/ansible-tmp-1557164925.73-152013190475693/get_cpu_speed.sh'"'"'"'"'"'"'"'"' && sleep 0'"'"''
I am unable to reproduce that behavior. If I run your code exactly as presented, I don't see any leading newline. In any case, there is an easy workaround available. Instead of referring to:
cpu_speed_output.stdout
You can refer to:
cpu_speed_output.stdout.strip()
The strip() method will strip any leading or trailing whitespace.

Pass multiline argument with xargs

I want to pass file content as a quoted program argument with xargs, to avoid creating a temporary file.
With a temp file I can do it like this:
myprogram > /tmp/lld.json
zabbix_sender -z 127.0.0.1 -s testhost -k llditem -o "`cat /tmp/lld.json`"
rm /tmp/lld.json
But I don't want these extra steps with /tmp/lld.json.
So I tried to use xargs like this:
myprogram |
xargs -e -I'{}' zabbix_sender -z 127.0.0.1 -s testhost -k llditem -o "'{}'"
following the xargs manpage:
-I replace-str
-e[eof-str] ... If eof-str is omitted, there is no end of file string..
http://man7.org/linux/man-pages/man1/xargs.1.html
But xargs executes zabbix_sender many times, once for each line.
I guess that -I and -e are mutually exclusive options. But I may also be misinterpreting the xargs manual.
Would this work?
zabbix_sender -z 127.0.0.1 -s testhost -k llditem -o "`myprogram`"
If you insist on using xargs to do exactly that, then use -0:
myprogram | xargs -0 -I{} zabbix_sender -z 127.0.0.1 -s testhost -k llditem -o "'{}'"
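The difference -0 makes can be observed with stand-ins for both programs. In this sketch, which is an assumption-laden illustration rather than the real setup, printf plays the role of myprogram and a second printf replaces zabbix_sender so the received arguments become visible.

```shell
# Stand-in for myprogram: two lines of output.
produce() { printf 'line one\nline two\n'; }

# Without -0, -I runs the command once per input line.
per_line=$(produce | xargs -I{} printf '[%s]' {})

# With -0 and no NUL byte in the input, the whole input is a single item.
as_one=$(produce | xargs -0 -I{} printf '[%s]' {})

printf 'without -0: %s\n' "$per_line"
printf 'with -0:    %s\n' "$as_one"
```

With -0 the trailing newline of the input remains part of the argument, whereas the command substitution form strips it.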

Pipe grep response to a second command?

Here's the command I'm currently running:
curl 'http://test.com/?id=12345' | grep -o -P '(?<=content="2;url=).*?(?=")'
The response from this command is a URL, like this:
$ curl 'http://test.com/?id=12345' | grep -o -P '(?<=content="2;url=).*?(?=")'
http://google.com
I want to use whatever that URL is to essentially do this:
curl 'http://test.com/?id=12345' | grep -o -P '(?<=content="2;url=).*?(?=")' | curl 'http://google.com'
Is there any simple way to do this all in one line?
Use xargs with a placeholder for the input from stdin via the -I{} flag, as below. The -r flag ensures the curl command is not invoked when the preceding grep produces no output.
curl 'http://test.com/?id=12345' | grep -o -P '(?<=content="2;url=).*?(?=")' | xargs -r -I{} curl {}
A short description of the -I and -r flags, from the GNU xargs man page:
-I replace-str
Replace occurrences of replace-str in the initial-arguments with
names read from standard input.
-r, --no-run-if-empty
If the standard input does not contain any nonblanks, do not run
the command. Normally, the command is run once even if there is
no input. This option is a GNU extension
Or, if you are looking for a bash approach without other tools:
curl 'http://test.com/?id=12345' | grep -o -P '(?<=content="2;url=).*?(?=")' | while IFS= read -r line; do [ -n "$line" ] && curl "$line"; done
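The xargs form can be dry-run locally before pointing it at a real site. This sketch assumes GNU grep (for -P) and GNU xargs (for -r). The HTML snippets are invented samples, and echo replaces the second curl so nothing is fetched.

```shell
pattern='(?<=content="2;url=).*?(?=")'

# Invented page containing a meta refresh redirect.
with_redirect='<meta http-equiv="refresh" content="2;url=http://google.com">'
invoked=$(printf '%s\n' "$with_redirect" \
    | grep -o -P "$pattern" \
    | xargs -r -I{} echo "would fetch {}")

# Invented page without a redirect: grep prints nothing and echo is never run.
skipped=$(printf 'no redirect here\n' \
    | grep -o -P "$pattern" \
    | xargs -r -I{} echo "would fetch {}")

printf 'match:    %s\n' "$invoked"
printf 'no match: [%s]\n' "$skipped"
```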

Wget page title

Is it possible to Wget a page's title from the command line?
input:
$ wget http://bit.ly/rQyhG5
output:
If it’s broke, fix it right - Keeping it Real Estate. Home
This script would give you what you need:
wget --quiet -O - http://bit.ly/rQyhG5 \
| sed -n -e 's!.*<title>\(.*\)</title>.*!\1!p'
But there are lots of situations where it breaks, including if there is a <title>...</title> in the body of the page, or if the title is on more than one line.
This might be a little better:
wget --quiet -O - http://bit.ly/rQyhG5 \
| paste -s -d " " \
| sed -e 's!.*<head>\(.*\)</head>.*!\1!' \
| sed -e 's!.*<title>\(.*\)</title>.*!\1!'
but it does not fit your case as your page contains the following head opening:
<head profile="http://gmpg.org/xfn/11">
Again, this might be better:
wget --quiet -O - http://bit.ly/rQyhG5 \
| paste -s -d " " \
| sed -e 's!.*<head[^>]*>\(.*\)</head>.*!\1!' \
| sed -e 's!.*<title>\(.*\)</title>.*!\1!'
but there are still ways to break it, including no head/title in the page.
Again, a better solution might be:
wget --quiet -O - http://bit.ly/rQyhG5 \
| paste -s -d " " \
| sed -n -e 's!.*<head[^>]*>\(.*\)</head>.*!\1!p' \
| sed -n -e 's!.*<title>\(.*\)</title>.*!\1!p'
but I am sure we can find a way to break it. This is why a true XML parser is the right solution, but as your question is tagged shell, the above is the best I can come up with.
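The paste and sed pipeline above can be tried on a local sample before running it against a live page. The HTML below is invented for illustration: it has an attribute on the head tag, a title split across two lines, and a stray title in the body. The - operand to paste is an addition so that it reads stdin on implementations that require an explicit operand.

```shell
# Invented sample page, standing in for the wget output.
sample_page='<html>
<head profile="http://gmpg.org/xfn/11">
<title>Example
Title</title>
</head>
<body><title>not this one</title></body>
</html>'

title=$(printf '%s\n' "$sample_page" \
    | paste -s -d " " - \
    | sed -n -e 's!.*<head[^>]*>\(.*\)</head>.*!\1!p' \
    | sed -n -e 's!.*<title>\(.*\)</title>.*!\1!p')

printf '%s\n' "$title"
```

Because paste joins the lines with spaces, the line break inside the title becomes a single space.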
The paste and the two sed commands can be merged into a single sed, though the result is less readable. However, this version has the advantage of working on multi-line titles:
wget --quiet -O - http://bit.ly/rQyhG5 \
| sed -n -e 'H;${x;s!.*<head[^>]*>\(.*\)</head>.*!\1!;T;s!.*<title>\(.*\)</title>.*!\1!p}'
Update:
As explained in the comments, the last sed above uses the T command, which is a GNU extension. If you do not have a compatible version, you can use:
wget --quiet -O - http://bit.ly/rQyhG5 \
| sed -n -e 'H;${x;s!.*<head[^>]*>\(.*\)</head>.*!\1!;tnext;b;:next;s!.*<title>\(.*\)</title>.*!\1!p}'
Update 2:
As the above still does not work on Mac, try:
wget --quiet -O - http://bit.ly/rQyhG5 \
| sed -n -e 'H;${x;s!.*<head[^>]*>\(.*\)</head>.*!\1!;tnext};b;:next;s!.*<title>\(.*\)</title>.*!\1!p'
and/or
cat << EOF > script
H
\$x
\$s!.*<head[^>]*>\(.*\)</head>.*!\1!
\$tnext
b
:next
s!.*<title>\(.*\)</title>.*!\1!p
EOF
wget --quiet -O - http://bit.ly/rQyhG5 \
| sed -n -f script
(Note the \ before the $ to avoid variable expansion.)
It seems that the :next label does not like being prefixed by a $, which could be a problem in some sed versions.
The following will pull whatever lynx thinks the title of the page is, saving you from all of the regex nonsense. Assuming the page you are retrieving is standards compliant enough for lynx, this should not break.
lynx -dump example.com | sed '2q;d'

Resources