I wrote a script that takes a CSV file and replaces the third column with the hash of the second column concatenated with some string (the key).
After 256 rows, I get an error:
awk: cmd. line:3: (FILENAME=C:/hanatest/test.csv FNR=257) fatal:
cannot create child process for `echo -n
E5360712819A7EF1584E2FDA06287379FF5CC3E0A5M7J6PiQMaSBut52ZQhVlS4 |
openssl ripemd160 | cut -f2 -d" "' (fork: Resource temporarily
unavailable)
I changed the CSV file and always got the same error after 256 rows.
Here is my code:
awk -F "," -v env_var="$key" '{
tmp="echo -n "$2env_var" | openssl ripemd160 | cut -f2 -d\" \""
tmp | getline cksum
$3=toupper(cksum)
print
}' //test/source.csv > //ziel.csv
Can you please help me?
Here is my sample input:
25,XXXXXXXXXXXXXXXXXX,?
44,YYYYYYYYYYYYYYYYYY,?
84,ZZZZZZZZZZZZZZZZZZ,?
and here is my expected output:
25,XXXXXXXXXXXXXXXXXX,301E2A8BF32A7046F65E48DF32CF933F6CAEC529
44,YYYYYYYYYYYYYYYYYY,301E2A8BF32A7046F65E48EF32CF933F6CAEC529
84,ZZZZZZZZZZZZZZZZZZ,301E2A8BF32A7046F65E48EF33CF933F6CAEC529
Thanks in advance
Let's make your code more robust first:
awk -F "," -v env_var="$key" '{
tmp="echo -n \047" $2 env_var "\047 | openssl ripemd160 | cut -f2 -d\047 \047"
if ( (tmp | getline cksum) > 0 ) {
$3 = toupper(cksum)
}
close(tmp)
print
}' /test/source.csv > /ziel.csv
Now - do you still have a problem? If you're considering using getline make sure to read and fully understand the correct uses and all of the caveats discussed at http://awk.freeshell.org/AllAboutGetline.
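To see why the close() matters: every distinct command string opens a new pipe, and awk keeps it open until you close it, so a loop over many rows can run out of processes or file descriptors (which would fit a failure at row 257). Here is a minimal sketch of the pattern, assuming echo as a stand-in for the openssl pipeline and an arbitrary 300 iterations:

```shell
# echo stands in for the openssl pipeline; 300 is an arbitrary count above 256.
result=$(awk 'BEGIN {
    for (i = 1; i <= 300; i++) {
        cmd = "echo " i               # a different command string on every iteration
        if ((cmd | getline out) > 0)  # only count reads that succeeded
            n++
        close(cmd)                    # release the pipe before opening the next one
    }
    print n
}')
echo "$result"
```

If you remove the close(cmd) line, the loop may fail partway through on systems with a low limit on child processes or open files.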
I have a file with multiple rows like the one below:
test1| 1234 | test2 | test3
For each line, I want to extract the second column (1234) and run a command with it as input.
Let's say the command outputs X.
Then print each line with the second column replaced, as below:
test1 | X | test2 | test3
I'd prefer a one-liner, but I'm open to ideas.
I am able to extract string using awk, but I am not sure how I can still preserve the initial output and replace it in the output. Below is what I tested
cat file.txt | awk -F '|' '{newVar=system("command "$2); print newVar $4}'
Sample command output, where we extract the "name"
openstack show 36a6c06e-5e97-4a53-bb42
+----------------------------+-----------------------------------+
| Property | Value |
+----------------------------+-----------------------------------+
| id | 36a6c06e-5e97-4a53-bb42 |
| name | testVM1 |
+----------------------------+-----------------------------------+
Perl to the rescue!
perl -lF'/\|/' -ne 'chomp( $F[1] = qx{ command $F[1] }); print join "|", @F' < file.txt
-n reads the input line by line
-l removes newlines from input and adds them to prints
-F specifies how to split each input line into the @F array
$F[1] corresponds to the second column, we replace it with the output of the command
chomp removes the trailing newline from the command output
join glues the array back to one line
Using awk:
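awk -F ' *[|] *' -v OFS=' | ' '{("command "$2) | getline $2}1' file.txt
e.g.
$ awk -F ' *[|] *' -v OFS=' | ' '{("date -d @"$2) | getline $2}1' file.txt
test1 | Thu 01 Jan 1970 05:50:34 AM IST | test2 | test3
I changed the field separator from | to ' *[|] *' to absorb the spaces surrounding the fields. The | has to sit inside a bracket expression because a bare | in a regex means alternation, and OFS is set so the rebuilt line keeps its separators. You can adjust both based on your actual input.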
This finally did the trick:
awk -F' *[|] *' -v OFS=' | ' '{
cmd = "openstack show \047" $2 "\047"
while ( (cmd | getline line) > 0 ) {
if ( line ~ /name/ ) {
split(line,flds,/ *[|] */)
$2 = flds[3]
break
}
}
close(cmd)
print
}' file
If command can take the whole list of values once and generate the converted list as output (e.g. tr 'a-z' 'A-Z') then you'd want to do something like this to avoid spawning a shell once per input line (which is extremely slow):
awk -F' *[|] *' '{print $2}' file |
command |
awk -F' *[|] *' -v OFS=' | ' 'NR==FNR{a[FNR]=$0; next} {$2=a[FNR]} 1' - file
otherwise if command needs to be called with one value at a time (e.g. echo) or you just don't care about execution speed then you'd do:
awk -F' *[|] *' -v OFS=' | ' '{
cmd = "command \047" $2 "\047"
if ( (cmd | getline line) > 0 ) {
$2 = line
}
close(cmd)
print
}' file
The \047s will produce single quotes around $2 when it's passed to command and so shield it from shell interpretation (see https://mywiki.wooledge.org/Quotes) and the test on the result of getline will protect you from silently overwriting the current $2 with the output of an earlier command execution in the event of a failure (see http://awk.freeshell.org/AllAboutGetline). The close() ensures that you don't end up with a "too many open files" error or other cryptic problem if the pipe isn't being closed properly, e.g. if command is generating multiple lines and you're just reading the first one.
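To make the quoting point concrete, here is a small sketch. The input line is made up, and `tr a-z A-Z` is only a stand-in for command. The field deliberately contains `$HOME`, which the shell would expand if it weren't single-quoted:

```shell
# Hypothetical input line; "tr a-z A-Z" stands in for the real command.
line='test1 | a $HOME b | test2 | test3'
result=$(printf '%s\n' "$line" | awk -F' *[|] *' -v OFS=' | ' '{
    cmd = "echo \047" $2 "\047 | tr a-z A-Z"   # \047 wraps $2 in single quotes for the shell
    if ((cmd | getline out) > 0)               # only overwrite $2 when the read succeeded
        $2 = out
    close(cmd)
    print
}')
echo "$result"
```

The `$HOME` comes through literally because of the single quotes. Note that a field which itself contains a single quote would still break the command string, so this is protection against most shell metacharacters, not all input.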
Given your comment below, if you're going with the 2nd approach above then you'd write something like:
awk -F' *[|] *' -v OFS=' | ' '{
cmd = "openstack show \047" $2 "\047"
while ( (cmd | getline line) > 0 ) {
split(line,flds)
if ( flds[2] == "name" ) {
$2 = flds[3]
break
}
}
close(cmd)
print
}' file
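For comparison, the first approach from this answer (feeding all values to a single invocation of the command) can be tried end to end like this. It's only a sketch: the two input rows are invented, and `tr 'a-z' 'A-Z'` plays the role of a command that converts a whole list at once:

```shell
# Invented sample rows written to a temporary file.
tmpfile=$(mktemp)
printf '%s\n' 'test1 | abc | test2 | test3' 'test4 | def | test5 | test6' > "$tmpfile"

result=$(
    awk -F' *[|] *' '{print $2}' "$tmpfile" |   # extract column 2 of every row
    tr 'a-z' 'A-Z' |                            # one process converts the whole list
    awk -F' *[|] *' -v OFS=' | ' '
        NR==FNR { a[FNR] = $0; next }           # first input (stdin): remember converted values
        { $2 = a[FNR] } 1                       # second input (the file): substitute and print
    ' - "$tmpfile"
)
rm -f "$tmpfile"
printf '%s\n' "$result"
```

This only works when the command emits exactly one output line per input line, in the same order.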
Hey guys, so I have this sample data from uniq -c:
100 c.m milk
99 c.s milk
45 cat food
30 beef
desired output:
beef,30
c.m milk,100
c.s milk,99
cat food,45
What I have tried so far:
awk -F " " '{print $2" " $3 " " $4 " " $5 "," $1}' stock.txt |sort>stock2.csv
i got :
beef ,30
cat food
,45
c.m milk
,100
c.s milk
,99
I think it's because some items don't have fields 2, 3, 4 and 5 while I still print the " " separators, and because sort in Unix doesn't order the dot first, unlike SQL. However, I'm not sure how to fix it.
To obtain your desired output you could sort first your current input and then try to swap the columns.
Using awk, please give a try to this:
$ sort -k2 stock.txt | awk '{t=$1; sub($1 FS,""); print $0"," t}'
It will output:
beef,30
c.m milk,100
c.s milk,99
cat food,45
I think you can solve it in bash using some simple commands, if the format of the file is as you posted it.
prova.txt is your file.
Then do:
cat prova.txt | cut -d" " -f2,3 > first_col
cat prova.txt | cut -d" " -f1 > second_col
paste -d "," first_col second_col | sort -u > output.csv
rm first_col second_col
In output.csv you have your desired output in CSV format!
EDIT:
After reading and applying PesaThe's comment, the code is much simpler:
paste -d, <(cut -d' ' -f2- prova.txt) <(cut -d' ' -f1 prova.txt) | sort -u > output.csv
Combining additional information from this thread with awk, the following script is a possible solution:
awk ' { printf "%s", $2; if ($3) printf " %s", $3; printf ",%d\n", $1; } ' stock.txt | LC_ALL=C sort > stock2.csv
It works well in my case. Nevertheless, I would prefer nbari's solution because it is shorter.
$ awk '{$0=$0","$1; sub(/^[^[:space:]]+[[:space:]]+/,"")} 1' file | LC_ALL=C sort
beef,30
c.m milk,100
c.s milk,99
cat food,45
You can use sed + sort:
sed -E 's/^([^[:blank:]]+)[[:blank:]]+(.+)/\2,\1/' file | LC_ALL=C sort
beef,30
c.m milk,100
c.s milk,99
cat food,45
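Several of the answers above prefix sort with LC_ALL=C. The reason, as far as I can tell, is that many locales give punctuation little or no weight when collating, which would explain "cat food" sorting before "c.m milk" in the question. In the C locale, sort compares raw byte values, and "." (0x2E) comes before "a" (0x61). A quick check on the already-swapped sample lines:

```shell
# The sample lines from the question, already in "item,count" form.
result=$(printf '%s\n' 'cat food,45' 'c.m milk,100' 'c.s milk,99' 'beef,30' | LC_ALL=C sort)
printf '%s\n' "$result"
```

In the C locale this gives beef, c.m milk, c.s milk, cat food, matching the desired output. What you get without LC_ALL=C depends on your system's locale settings.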
I'm currently using awk to replicate the function uniq -c with commas as delimiters.
This gives correct output:
$ cut --delimiter=, -s -f2 wordlist.csv | awk '{ cnts[$0] += 1 } END { for (v in cnts) print cnts[v], v}' OFS="," | head
2,laecherlichen
111,doctrine
1,cremonas
1,embedding
1,conincks
2,similiter
1,mitgesellen
1,hysnelement
1,geringem
1,aquarian
However, if I reverse the awk command print cnts[v], v into print v, cnts[v], I get a messed up output:
$ cut --delimiter=, -s -f2 wordlist.csv | awk '{ cnts[$0] += 1 } END { for (v in cnts) print v, cnts[v]}' OFS="," | head
,2echerlichen
,111rine
,1emonas
,1bedding
,1nincks
,2militer
,1tgesellen
,1snelement
,1ringem
,1uarian
I'm confused by this output, because I'm expecting something like word,1 as output. What is the problem?
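Most likely your file has DOS line endings, i.e. a carriage return \r before the newline \n at the end of each line. When the word is printed first, the \r sends the cursor back to the start of the line and the count overwrites the beginning of the word. You can set the RS variable in awk so that the \r is consumed as part of the record separator:
cut --delimiter=, -s -f2 wordlist.csv | awk -v RS='\r?\n' '{
cnts[$0] += 1 } END { for (v in cnts) print v, cnts[v]}' OFS="," | head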
However if you show your csv file I believe even cut and head can be removed from above commands.
PS: Thanks to @Bammar you can also run:
dos2unix file.csv
to convert your csv file to unix compatible file.
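If you can't convert the file and your awk doesn't support a regex RS, another option is to strip the trailing \r inside the script. A minimal sketch with three made-up CRLF-terminated lines (the final sort is only there to make the output order predictable, since for-in order is unspecified):

```shell
# Three invented words with DOS line endings (\r\n).
result=$(printf 'doctrine\r\ndoctrine\r\ncremonas\r\n' |
    awk -v OFS=, '
        { sub(/\r$/, ""); cnts[$0]++ }            # drop the carriage return before counting
        END { for (v in cnts) print v, cnts[v] }  # word first, count second
    ' | LC_ALL=C sort)
printf '%s\n' "$result"
```

With the \r removed, the word,count order prints cleanly.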
I'm on OS X.
I have a large list of records and I need to md5 (or any other hash function) nth column and add it to a new column.
There was something that almost worked except that it did not:
awk '{
tmp="echo " $3 " | openssl md5 | cut -f1 -d\" \""
tmp | getline cksum
$2=","cksum
print
}' < file.csv
Thanks for help.
EDIT:
My CSV:
fname,lname,email,cpid,mcssid
tester,testurion,test@test.org,Campaign2014,12345
tester,testuci,test@test.com,Campaign2014,123456
Results:
dzh:Desktop dzh$ awk '{
tmp="echo "$0" | openssl md5 | cut -f5 -d\" \""
tmp | getline cksum
$2=","cksum
print
}'< testfile.csv
fname,lname,email,cpid,mcssid ,60a0c14d2af1ac9b429d5323092d46e4
tester,testurion,test@test.org,Campaign2014,12345 ,01ef8935ad33c1a419d5a935f2eced69
tester,testuci,test@test.com,Campaign2014,123456 ,536f1e8583e3e2e1666cf9cda92664db
dzh:Desktop dzh$ md5 -s test@test.com
MD5 ("test@test.com") = b642b4217b34b1e8d3bd915fc65c4452
dzh:Desktop dzh$ md5 -s testuci
MD5 ("testuci") = c9e9ffe7eb5c77a59b77e897ff56b33c
dzh:Desktop dzh$ md5 -s Campaign2014
MD5 ("Campaign2014") = e9d6e2c2752c3d228783e0fa8134c545
dzh:Desktop dzh$ md5 -s 123456
MD5 ("123456") = e10adc3949ba59abbe56e057f20f883e
Here's something that seems to work:
awk -F "," '{
cmd="md5 -q -s "$3
cmd|getline cksum
close(cmd)
printf "%s,%s,%s,%s,%s,%s\n", $1, $2, $3, $4, $5, cksum
}' < testfile.csv
You need to close the pipe, and I'm using md5 in quiet mode rather than openssl to get the sum. Also, you need -F "," to tell awk that the fields are separated by commas.
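One more thing to watch when comparing against `md5 -s`: echo appends a newline, so `echo value | some-hash` hashes one more byte than the string itself, while `printf %s` does not. The sketch below shows the difference. It assumes cksum as a stand-in for the hash command (only because it is available everywhere and prints the byte count), and hash_field3 is a hypothetical helper name:

```shell
# Sample row from the question; cksum stands in for md5/openssl so this runs anywhere.
row='tester,testuci,test@test.com,Campaign2014,123456'

# hash_field3 is a hypothetical helper: $1 is the command used to emit field 3.
hash_field3() {
    printf '%s\n' "$row" | awk -F, -v emit="$1" '{
        cmd = emit " \047" $3 "\047 | cksum"   # e.g. echo (single-quoted field) | cksum
        if ((cmd | getline sum) > 0)
            print sum                           # cksum prints "<crc> <byte count>"
        close(cmd)
    }'
}

with_newline=$(hash_field3 'echo')          # hashes the field plus a trailing newline
without_newline=$(hash_field3 'printf %s')  # hashes exactly the field
echo "echo:   $with_newline"
echo "printf: $without_newline"
```

The byte counts differ by one (14 versus 13 for this address), so the checksums differ as well. The same would apply to any real hash command you put in place of cksum.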
I have maillog file with below parameters
relay=mx3.xyz.com
relay=mx3.xyz.com
relay=mx1.xyz.com
relay=mx1.xyz.com
relay=mx2.xyz.com
relay=home.xyz.abc.com
relay=127.0.0.1
I want to count all relay except 127.0.0.1
Output should like this
total relay= 6
mx3.xyz.com = 2
mx1.xyz.com = 2
mx2.xyz.com = 1
home.xyz.abc.com = 1
If you don't mind using awk:
awk -F= '$2 != "127.0.0.1" && /relay/ {count[$2]++; total++}
END { print "total relay = "total;
for (k in count) { print k" = " count[k]}
}' maillog
And you could also make do with just uniq and grep, though you won't get your total this way:
grep relay maillog | cut -d= -f2 | grep -v 127.0.0.1 | sort | uniq -c
And if you don't hate perl:
perl -ne '/relay=(.*)/ and $1 ne "127.0.0.1" and ++$t and $h{$1}++;
END {print "total = $t\n";
print "$_ = $h{$_}\n" foreach keys %h;
}' maillog
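If you like the uniq pipeline but still want the total, you can let a small awk at the end add up the counts. This is just a sketch: the sample lines from the question are written to a temporary file standing in for maillog, and the hosts come out in sorted order rather than in the order shown in the question:

```shell
# Sample lines from the question, written to a temporary stand-in for maillog.
maillog=$(mktemp)
printf 'relay=%s\n' mx3.xyz.com mx3.xyz.com mx1.xyz.com mx1.xyz.com \
    mx2.xyz.com home.xyz.abc.com 127.0.0.1 > "$maillog"

result=$(grep '^relay=' "$maillog" | cut -d= -f2 |
    grep -vxF '127.0.0.1' |          # drop lines that are exactly 127.0.0.1
    LC_ALL=C sort | uniq -c |        # uniq -c only merges adjacent duplicates, hence the sort
    awk '{ n[NR] = $1; host[NR] = $2; total += $1 }
         END {
             print "total relay = " total
             for (i = 1; i <= NR; i++) print host[i] " = " n[i]
         }')
rm -f "$maillog"
printf '%s\n' "$result"
```

Indexing the arrays by NR keeps the sorted order, which a for (k in count) loop would not guarantee.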
here you go:
awk -F= '$2!="127.0.0.1"&&$2{t++;a[$2]++} END{print "total relay="t; for(x in a)print x"="a[x]}' yourfile
the output would be:
total relay=6
mx2.xyz.com=1
mx1.xyz.com=2
mx3.xyz.com=2
home.xyz.abc.com=1
I would definitely use awk for this (@Faiz's answer). However, I worked out this excruciating pipeline:
cut -d= -f2 filename | grep -v -e '^[[:space:]]*$' -e 127.0.0.1 | sort | uniq -c | tee >(echo "$(bc <<< $(sed -e 's#[[:alpha:]].\+$#+#' -e '$a0')) total") | sed 's/^ *\([0-9]\+\) \(.*\)/\2 = \1/' | tac
outputs
total = 6
mx3.xyz.com = 2
mx2.xyz.com = 1
mx1.xyz.com = 2
home.xyz.abc.com = 1
Please do not upvote this answer ;)