AWK: execute CURL on each line and parse result - shell

Given an input stream with the following lines:
123
456
789
098
...
I would like to call
curl -s http://foo.bar/some.php?id=xxx
with xxx being the number from each line, and each time let an awk script fetch some information from the curl output, which is written to the output stream. I am wondering if this is possible without using the awk "system()" call, in the following way:
cat lines | grep "^[0-9]*$" | awk '
{
system("curl -s " $0 \
" | awk \'{ #parsing; print }\'")
}'

You can use bash and avoid the awk system() call:
grep "^[0-9]*$" lines | while read line; do
curl -s "http://foo.bar/some.php?id=$line" | awk 'do your parsing ...'
done
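If you would rather stay entirely inside awk, a minimal sketch that reads the command's output with getline on a command pipe instead of system() (the grep already restricts the input to digits, so the constructed command needs no further quoting):
grep "^[0-9]*$" lines | awk '{
cmd = "curl -s \"http://foo.bar/some.php?id=" $0 "\""
# read the curl output line by line; do your parsing here instead of the plain print
while ((cmd | getline line) > 0) print line
close(cmd)
}'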

A shell loop would achieve a similar result, as follows:
#!/bin/bash
for f in $(cat lines|grep "^[0-9]*$"); do
curl -s "http://foo.bar/some.php?id=$f" | awk '{....}'
done
Alternative methods for doing similar tasks include using Perl or Python with an HTTP client.

If the ids are appended to your file dynamically, you can daemonize a small while loop to keep checking the file for more data, like this:
while IFS= read -d $'\n' -r a || sleep 1; do [[ -n "$a" ]] && curl -s "http://foo.bar/some.php?id=${a}"; done < lines.txt
Otherwise, if the file is static, you can change the sleep 1 to break; the loop will then read the file and quit when there is no data left, which is useful to know how to do.
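For the static case, that would be, for example:
while IFS= read -d $'\n' -r a || break; do [[ -n "$a" ]] && curl -s "http://foo.bar/some.php?id=${a}"; done < lines.txt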

Related

awk exec command for every line and keep columns

I have a large dataset file with two columns like
AS জীৱবিজ্ঞানবিভাগ
AS চেতনাদাস
AS বৈকল্পিক
and I want to run my command on the second column, store the result and get the output with the same column formatting:
AS jibvigyanvibhag
AS chetanadas
AS baikalpik
where my command is this pipe:
echo "$0" | indictrans -s asm -t eng --ml --build-lookup
So I'm doing this:
awk -v OFS="\t" '{ print "echo "$2" | indictrans -s asm -t eng --ml --build-lookup" | "/bin/sh"}' in.txt > out.txt
but this does not preserve the columns; it just prints out the translated second column, like this
jibvigyanvibhag
chetanadas
baikalpik
My solution was the following
awk -v OFS="\t" '{ "echo "$2" | indictrans -s asm -t eng --ml --build-lookup" | getline RES; print $1,$2,RES}' in.txt > out.txt
that will print out
AS জীৱবিজ্ঞানবিভাগ jibvigyanvibhag
AS চেতনাদাস chetanadas
AS বৈকল্পিক baikalpik
Now I want to parametrize the command, but the escaping looks odd here:
"echo "$0" | indictrans -s $SOURCE -t $TARGET --ml --build-lookup"
and it does not work. How do I correctly exec this command and escape the parameters?
[UPDATE]
This is a partial solution I came up with, inspired by the suggested one:
#!/bin/bash
SOURCE=asm
TARGET=eng
IN=$2
OUT=$3
awk -v OFS="\t" '{
CMD = "echo "$2" | indictrans -s asm -t eng --ml --build-lookup"
CMD | getline RES
print $1,RES
close(CMD)
}' $IN > $OUT
I still cannot get rid of the hard-coded values; it seems that I cannot define them with -v as usual, like
awk -v OFS="\t" -v source=$SOURCE -v target=$TARGET '{
CMD = "echo "$2" | indictrans -s source -t target --ml --build-lookup"
...
NOTES.
The indictrans process reads stdin and writes to stdout in this way:
for line in ifp:
    tline = trn.convert(line)
    ofp.write(tline)
# close files
ifp.close()
ofp.close()
where
ifp = codecs.getreader('utf8')(sys.stdin)
ofp = codecs.getwriter('utf8')(sys.stdout)
so it takes one line from stdin, processes the data with some library trn.convert and writes the results to stdout without any parallelism.
For this reason (lack of parallelism in terms of multiline input), the performance is bound by the size of the dataset (number of rows).
An example two-column input dataset (1K rows) is available here. A sample:
KN ಐಕ್ಯತೆ ಕ್ಷೇಮಾಭಿವೃದ್ಧಿ ಸಂಸ್ಥೆ ವಿಜಯಪುರ
KN ಹೊರಗಿನ ಸಂಪರ್ಕಗಳು
KN ಮಕ್ಕಳ ಸಾಹಿತ್ಯ ಮತ್ತು ಸಾಂಸ್ಖ್ರುತಿಕ ಕ್ಷೇತ್ರದಲ್ಲಿ ಸೇವೆ ಸಲ್ಲಿಸುತ್ತಿರುವ ಸಂಸ್ಠೆ ಮಕ್ಕಳ ಲೋಕ
while the example script based on the last accepted answer is here
Don't invoke shells from awk. The shell itself avoids treating data as if it were code unless explicitly instructed otherwise, but when you use system() or popen(), as the awk code is doing here, everything passed as an argument is parsed in a context where data can escape its quoting and be treated as code. For example, if a second-column value happened to contain a semicolon or backquotes, whatever followed them would be run by /bin/sh as a separate command.
Simple approach: One indictrans per line
If you need a separate copy of indictrans for each line to be executed, use:
while read -r col1 rest; do
printf '%s\t%s\n' "$col1" "$(indictrans -s asm -t eng --ml --build-lookup <<<"$rest")"
done <in.txt >out.txt
Fast approach: One indictrans processing all lines
If indictrans generates one line of output per line of input, you can do even better by pasting together one stream with all the first columns and a second stream with the translations of the remainder of each line, thus requiring only one copy of indictrans to be run:
#!/usr/bin/env bash
# ^^^^- not compatible with /bin/sh
paste <(<in.txt awk '{print $1}') \
<(<in.txt sed -E 's/^[^[:space:]]*[[:space:]]//' \
| indictrans -s asm -t eng --ml --build-lookup) \
>out.txt
You can pipe column 2 to your command and replace it with the command's output in awk, like below.
{
cmd = "echo "$2" | indictrans -s asm -t eng --ml --build-lookup"
cmd | getline $2
close(cmd)
} 1
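Saved to a file (the name replace.awk is used here just for illustration), it could be invoked, for example, like this, keeping the tab-separated output:
awk -v OFS='\t' -f replace.awk in.txt > out.txt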
If SOURCE and TARGET are awk variables:
{
cmd = "echo "$2" | indictrans -s "SOURCE" -t "TARGET" --ml --build-lookup"
cmd | getline $2
close(cmd)
} 1
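Assuming SOURCE and TARGET are shell variables as in the update above, they can be passed into awk with -v; the key point is that the awk variables are concatenated into the command string rather than expanded inside it. A sketch:
awk -v OFS='\t' -v SOURCE="$SOURCE" -v TARGET="$TARGET" '{
cmd = "echo "$2" | indictrans -s "SOURCE" -t "TARGET" --ml --build-lookup"
cmd | getline $2
close(cmd)
} 1' "$IN" > "$OUT"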

Read multiple variables from file

I need to read a file that has lines like
user=username1
pass=password1
How can I read multiple lines like this into separate variables like username and password?
Would I use awk or grep? I have found ways to read lines into variables with grep but would I need to read the file for each individual item?
The end result is to use these variables to access a database via the command line. So I need to be able to read, store and use these values in other commands.
If the process which generates the file is trusted and the file uses shell syntax, just source the file:
. ./file
Otherwise the file can be processed first to add quotes:
perl -ne 'if (/^([A-Za-z_]\w*)=(.*)/) {$k=$1;$v=$2;$v=~s/\x27/\x27\\\x27\x27/g;print "$k=\x27$v\x27\n";}' <file >file2
. ./file2
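For example, with a made-up value containing a single quote and a space, the rewrite looks like this:
$ echo "pass=pa'ss word" | perl -ne 'if (/^([A-Za-z_]\w*)=(.*)/) {$k=$1;$v=$2;$v=~s/\x27/\x27\\\x27\x27/g;print "$k=\x27$v\x27\n";}'
pass='pa'\''ss word'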
If you want to use awk then
Input
$ cat file
user=username1
pass=password1
Reading
$ user=$(awk -F= '$1=="user"{print $2;exit}' file)
$ pass=$(awk -F= '$1=="pass"{print $2;exit}' file)
Output
$ echo $user
username1
$ echo $pass
password1
You could use a loop for your file perhaps, but this is probably the functionality you're looking for.
$ echo 'user=username1' | awk -F= '{print $2}'
username1
Using the -F flag sets the delimiter to = and we select the 2nd item from the row.
file.txt:
user=username1
pass=password1
user=username2
pass=password2
user=username3
pass=password3
To avoid reading the file file.txt several times:
#!/usr/bin/env bash
func () {
echo "user:$1 pass:$2"
}
i=0
while IFS='' read -r line; do
if [ $i -eq 0 ]; then
i=1
user=$(echo ${line} | cut -f2 -d'=')
else
i=0
pass=$(echo ${line} | cut -f2 -d'=')
func "$user" "$pass"
fi
done < file.txt
Output:
user:username1 pass:password1
user:username2 pass:password2
user:username3 pass:password3

curl in bash script vs curl one liner

This code outputs an HTTP status of 000, which seems to indicate something didn't connect properly, but when I run this curl outside of the bash script it works fine and produces a 200, so something with this code is off... any guidance?
#!/bin/bash
URLs=$(< test.txt | grep Url | awk -F\ ' { print $2 } ')
# printf "Preparing to check $URLs \n"
for line in $URLs
do curl -L -s -w "%{http_code} %{url_effective}\\n" $line
done
http://beerpla.net/2010/06/10/how-to-display-just-the-http-response-code-in-cli-curl/
Your script works on my terminal.
I added a couple of debugging lines; this may help you to see where any metacharacters are getting in, as I would have to agree with the posted comments.
I've written the lines from the for loop out to a file, which is then printed with od.
I have amended the curl line to grab the last line, just to get the response code.
#!/bin/bash
echo -n > $HOME/Desktop/urltstfile # truncate urltstfile
URLs=$(cat testurl.txt | grep Url | awk -F\ ' { print $2 } ')
# printf "Preparing to check $URLs \n"
for line in $URLs
do echo $line >> $HOME/Desktop/urltstfile;
echo line:$line:
curl -IL -s -w "%{http_code}\n" $line | tail -1
done
od -c $HOME/Desktop/urltstfile
#do curl -L -s -w "%{http_code} %{url_effective}\\n" "$line\n"

how do I run concurrent background processes from a shell script?

I tried the following:
#!/bin/bash
while read device; do
name=$(echo "$device" | awk '{ print $1 }')
ip=$(echo "$device" | awk '{ print $2 }')
while read creds; do
community=$(echo "$creds" | awk '{ print $1 }')
version=$(echo "$creds" | awk '{ print $2 }')
mkdir -p walks/$name;
`echo -e "snmpwalk -v$version -c \x27$community\x27 $ip system > walks/$name/$community-$version.txt
done < <(##MySQL query that returns tuples in form: (snmp_ro,(1,2c,3))##")
done < <(cat devices.txt)
exit 0
This is meant to go through and find the snmp string and version of each device.
devices.txt is a list of devices in form: hostname ip
It doesn't create the file walks/$name/$community-$version.txt, and it only seems to run through the walks one at a time, which I don't want.
Use & to put the contents you want backgrounded in, well, the background.
pids=( )
while read -r -u 3 name ip _; do
while read -r -u 4 community version _; do
mkdir -p "walks/$name"
snmpwalk -v"$version" -c "$community" "$ip" system \
</dev/null >"walks/$name/$community-$version.txt" & pids+=( "$!" )
done 4< <(: get data for "$name" and "$ip")
done 3<devices.txt
wait "${pids[@]}"
Other items of note:
read can already split fields into their own variables; using awk for this is silly. (A short illustration follows these notes.)
The _ in read -r foo bar _ ensures that if more than two columns exist in the input file, the third column and onward are discarded (actually, put into a variable named _, but this is considered discard by convention) rather than appended to bar.
Make a habit of quoting expansions unless you have a specific and compelling reason to do otherwise; otherwise, you get string-splitting and glob expansion of string contents.
This example puts each input stream on its own file descriptor, and redirects each read to its own FD. This prevents any other content within your loop from consuming stdin.
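A minimal illustration of those points, with made-up data:
# read splits on whitespace, extra columns land in _, and -u 3 reads from FD 3 instead of stdin
while read -r -u 3 name ip _; do
printf 'name=%s ip=%s\n' "$name" "$ip"
done 3< <(printf '%s\n' 'router1 10.0.0.1 some extra fields')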

how to get the last login time for all users in one line for different shells

I can make the following line work on ksh
for user in $( awk -F: '{ print $1}' /etc/passwd); do last $user | head -1 ; done | tr -s "\n" |sort
But I'd like to make it work on UNIX sh and UNIX csh. (In Linux sh it runs fine, but Linux is not UNIX...)
I know there are limitations for this since it seems that each UNIX(*) has its own variations on the syntax.
update: sorry, there are some restrictions here:
I can't write on the disk, so I can't save scripts.
How do I write this in csh?
This awk script seems to be the equivalent of your loop above:
{
cmd = "last "$1
cmd | getline result
close(cmd)
print result
}
use it like this:
awk -F: -f script_above.awk /etc/passwd
Pipe the output to sort
As a one-liner:
$ awk -F: '{cmd = "last "$1; cmd | getline result; close(cmd); print result}' /etc/passwd
This might do the trick for you; it should be POSIX compliant:
last | awk 'FNR==NR{split($0,f,/:/);a[f[1]];next}($1 in a)&&++b[$1]==1' /etc/passwd - | sort
You don't really need Awk for this.
while IFS=: read user _; do
last "$user" | head -n 1
done </etc/passwd # | grep .
Instead of reinventing it in csh, how about
sh -c 'while IFS=: read user _; do last "$user" | head -n 1; done </etc/passwd'
You will get empty output for users who have not logged in since wtmp was rotated; maybe add a | grep . to weed those out. (I added it commented out above.)
To reiterate, IFS=: sets the shell's internal field separator to a colon, so that read will split the password file on that.
Just use the simple command:
lastlog
