Pipe grep response to a second command? - shell

Here's the command I'm currently running:
curl 'http://test.com/?id=12345' | grep -o -P '(?<=content="2;url=).*?(?=")'
The response from this command is a URL, like this:
$ curl 'http://test.com/?id=12345' | grep -o -P '(?<=content="2;url=).*?(?=")'
http://google.com
I want to use whatever that URL is to essentially do this:
curl 'http://test.com/?id=12345' | grep -o -P '(?<=content="2;url=).*?(?=")' | curl 'http://google.com'
Is there any simple way to do this all in one line?

Use xargs with a place holder for the output from stdin with the -I{} flag as below. The -r flag is to ensure the curl command is not invoked on a empty output from previous grep output.
curl 'http://test.com/?id=12345' | grep -o -P '(?<=content="2;url=).*?(?=")' | xargs -r -I{} curl {}
A small description about the flags, -I and -r from the GNU xargs man page,
-I replace-str
Replace occurrences of replace-str in the initial-arguments with
names read from standard input.
-r, --no-run-if-empty
If the standard input does not contain any nonblanks, do not run
the command. Normally, the command is run once even if there is
no input. This option is a GNU extension
(or) if you are looking for a bash approach without other tools,
curl 'http://test.com/?id=12345' | grep -o -P '(?<=content="2;url=).*?(?=")' | while read line; do [ ! -z "$line" ] && curl "$line"; done

Related

how to pipe multi commands to bash?

I want to check some file on the remote website.
Here is bash command to generate commands that calculate the file md5
[root]# head -n 3 zrcpathAll | awk '{print $3}' | xargs -I {} echo wget -q -O - -i {}e \| md5sum\;
wget -q -O - -i https://example.com/zrc/3d2f0e76e04444f4ec456ef9f11289ec.zrce | md5sum;
wget -q -O - -i https://example.com/zrc/e1bd7171263adb95fb6f732864ceb556.zrce | md5sum;
wget -q -O - -i https://example.com/zrc/5300b80d194f677226c4dc6e17ba3b85.zrce | md5sum;
Then I pipe the outputed commands to bash, but only the first command was executed.
[root]# head -n 3 zrcpathAll | awk '{print $3}' | xargs -I {} echo wget -q -O - -i {}e \| md5sum\; | bash -v
wget -q -O - -i https://example.com/zrc/3d2f0e76e04444f4ec456ef9f11289ec.zrce | md5sum;
3d2f0e76e04444f4ec456ef9f11289ec -
[root]#
Would you please try the following instead:
while read -r _ _ url _; do
wget -q -O - "$url"e | md5sum
done < <(head -n 3 zrcpathAll)
we should not put -i in front of "$url" here.
[Explanation about -i option]
Manpage of wget says:
-i file
--input-file=file
Read URLs from a local or external file. [snip]
If this function is used, no URLs need be present on the command line. [snip]
If the file is an external one, the document will be automatically treated as html if the Content-Type matches text/html.
Furthermore, the file's location will be implicitly used as base
href if none was specified.
where the file will contain line(s) of url such as:
https://example.com/zrc/3d2f0e76e04444f4ec456ef9f11289ec.zrce
https://example.com/zrc/e1bd7171263adb95fb6f732864ceb556.zrce
https://example.com/zrc/5300b80d194f677226c4dc6e17ba3b85.zrce
Whereas if we use the option as -i url, wget first
downloads the url as a file which contains the lines of urls
as above. In our case, the url is the target to download itself,
not the list of urls, wget causes an error: No URLs found in url.
Even if the wget fails, why the command outputs just one line, not
three lines as the result of md5sum?
This seems to be because the head command immediately flushes the remaining
lines when the piped subprocess fails.

Sending grep data iteratively using curl for output of tail -f

I am using grep to extract data from a log file. The log file is dynamically updating new rows. and I need to send the grepped data to a REST endpoint using curl. This can be done easily for a static file but cannot find a solution fo a running log file. How can I realize this situation?
eg: tail -f | grep "<string>" > ~/<fileName>.log
The above can put the data in a file. Need to send it using a POST curl.
Maybe using a function like
send_data(){
curl -s -k -X POST –header Content-Type: application/json’ \
–header ‘Accept: application/json’ \
“http://${HOST}${PORT}/v1/notify” \
-d $1
}
If tail -f | grep "<string>" > ~/<fileName>.log is working for you then you could do:
tail -f file | stdbuf -i0 -o0 -e0 grep "<string>" | xargs -n 1 -d $'\n' curl ...
or:
while IFS= read -r line; do
curl ... "$line"
done < <(tail -f file | stdbuf -i0 -o0 -e0 grep "<string>")

Pass multiline argument with xargs

I want to pass file content as quoted programm argument with xargs to skip temporary file creation.
With temp file I can do like this:
myprogram > /tmp/lld.json
zabbix_sender -z 127.0.0.1 -s testhost -k llditem -o "`cat /tmp/lld.json`"
rm /tmp/lld.json
But I don't want this extra actions with /tmp/lld.json.
So I try to use xargs like this:
myprogram |
xargs -e -I'{}' zabbix_sender -z 127.0.0.1 -s testhost -k llditem -o "'{}'"
guiding with xargs manpage:
-I replace-str
-e[eof-str] ... If eof-str is omitted, there is no end of file string..
http://man7.org/linux/man-pages/man1/xargs.1.html
But xargz executes zabbix-sender many times with each of the lines.
I guess that -I and -e options are mutually exclusive options. But I also assume that I misinterpret the xargs manual..
Would this work?
zabbix_sender -z 127.0.0.1 -s testhost -k llditem -o "`myprogram`"
If you insist on using xargs to do exactly that, then use -0:
myprogram | xargs -0 -I{} zabbix_sender -z 127.0.0.1 -s testhost -k llditem -o "'{}'"

curl complex usage with pattern

I'm trying to get 2 files using curl based on some pattern but that doesn't seem to work:
Files:
SystemOut_15.04.01_21.12.36.log
SystemOut_15.04.01_15.54.05.log
curl -f -k -u "login:password" https://myserver/cgi-bin/logviewer/index.cgi?getlogfile=SystemOut_15.04.01_21.12.36.log'&'server=qwerty123.com'&'numlines=100000000'&'appenv=MBL%20-%20PROD'&'directory=/app/WAS/was85/profiles/node/logs/mbl-server1
I know there is -A key but it doesn't work since my file is inside the link.
How can I extract those 2 files using a pattern?
Did that myself. One curl gets the list of logs on the webpage. Another downloads those files.
The code looks like:
for file in $(curl -f -k -u "user:pwd" https://selfservice.pwj.com/cgi-bin/logviewer/index.cgi?listdirectory=/app/smx_client_mob/data/log'&'appenv=MBL%20-%20PROD'&'server=xshembl04pap.she.pwj.com | grep href | sed 's/.*href="//' | sed 's/".*//' | sed 's/javascript:getLog//g' | sed "s/['();]//g" | grep -i 'service' | grep '^[a-zA-Z].*'); do
curl -o $file -f -k -u "user:pwd" https://selfservice.pwj.com/cgi-bin/logviewer/index.cgi?getlogfile="$file"'&'server=xshembl04pap.she.pwj.com'&'numlines=100000000'&'appenv=MBL%20-%20PROD'&'directory=/app/smx_client_mob/data/log; done

Command composition in bash

So I have the equivalent of a list of files being output by another command, and it looks something like this:
http://somewhere.com/foo1.xml.gz
http://somewhere.com/foo2.xml.gz
...
I need to run the XML in each file through xmlstarlet, so I'm doing ... | xargs gzip -d | xmlstarlet ..., except I want xmlstarlet to be called once for each line going into gzip, not on all of the xml documents appended to each other. Is it possible to compose 'gzip -d' 'xmlstarlet ...', so that xargs will supply one argument to each of their composite functions?
Why not read your file and process each line separately in the shell? i.e.
fileList=/path/to/my/xmlFileList.txt
cat ${fileList} \
| while read fName ; do
gzip -d ${fName} | xmlstartlet > ${fName}.new
done
I hope this helps.
Although the right answer is the one suggested by shelter (+1), here is a one-liner "divertimento" providing that the input is the proposed by Andrey (a command that generates the list of urls) :-)
~$ eval $(command | awk '{a=a "wget -O - "$0" | gzip -d | xmlstartlet > $(basename "$0" .gz ).new; " } END {print a}')
It just generates a multi command line that does wget http://foo.xml.gz | gzip -d | xmlstartlet > $(basenname foo.xml.gz .gz).new for each of the urls in the input; after the resulting command is evaluated
Use GNU Parallel:
cat filelist | parallel 'zcat {} | xmlstarlet >{.}.out'
or if you want to include the fetching of urls:
cat urls | parallel 'wget -O - {} | zcat | xmlstarlet >{.}.out'
It is easy to read and you get the added benefit of having on job per CPU run in parallel. Watch the intro video to learn more: http://www.youtube.com/watch?v=OpaiGYxkSuQ
If xmlstarlet can operate on stdin instead of having to pass it a filename, then:
some command | xargs -i -n1 sh -c 'zcat "{}" | xmlstarlet options ...'
The xargs option -i means you can use the "{}" placeholder to indicate where the filename should go. Use -n 1 to indicate xargs should only one line at a time from its input.

Resources