MD5 checksum of CSV file column rows in terminal - bash

I'm on OS X.
I have a large list of records and I need to MD5 (or any other hash function) the nth column and append the result as a new column.
I found something that almost worked, except that it didn't:
awk '{
tmp="echo " $3 " | openssl md5 | cut -f1 -d\" \""
tmp | getline cksum
$2=","cksum
print
}' < file.csv
Thanks for help.
EDIT:
My CSV:
fname,lname,email,cpid,mcssid
tester,testurion,test#test.org,Campaign2014,12345
tester,testuci,test#test.com,Campaign2014,123456
Results:
dzh:Desktop dzh$ awk '{
tmp="echo "$0" | openssl md5 | cut -f5 -d\" \""
tmp | getline cksum
$2=","cksum
print
}'< testfile.csv
fname,lname,email,cpid,mcssid ,60a0c14d2af1ac9b429d5323092d46e4
tester,testurion,test#test.org,Campaign2014,12345 ,01ef8935ad33c1a419d5a935f2eced69
tester,testuci,test#test.com,Campaign2014,123456 ,536f1e8583e3e2e1666cf9cda92664db
dzh:Desktop dzh$ md5 -s test#test.com
MD5 ("test#test.com") = b642b4217b34b1e8d3bd915fc65c4452
dzh:Desktop dzh$ md5 -s testuci
MD5 ("testuci") = c9e9ffe7eb5c77a59b77e897ff56b33c
dzh:Desktop dzh$ md5 -s Campaign2014
MD5 ("Campaign2014") = e9d6e2c2752c3d228783e0fa8134c545
dzh:Desktop dzh$ md5 -s 123456
MD5 ("123456") = e10adc3949ba59abbe56e057f20f883e

Here's something that seems to work:
awk -F "," '{
cmd="md5 -q -s "$3
cmd|getline cksum
close(cmd)
printf "%s,%s,%s,%s,%s,%s\n", $1, $2, $3, $4, $5, cksum
}' < testfile.csv
You need to close the pipe, and I'm using md5 in quiet mode (rather than openssl) to get just the sum. You also need -F "," to tell awk that the fields are comma-separated.

Related

CGI output Shell - md5sum wrong value

I need to split an IPv4 address into octets, calculate the MD5 hash of each and print as a CGI output:
IP1=$(echo ${REMOTE_ADDR} | tr "." " " | awk '{print $1'} | md5sum | cut -c1-32)
printf $IP1
In this example, REMOTE_ADDR = 192.168.20.100
But for 192 I get the wrong MD5, IP1=6be7de648baa9067fa3087928d5ab0b4, while it should be 58a2fc6ed39fd083f55d4182bf88826d
If I do this:
cat /tmp/test.txt | md5sum | cut -c1-32
where test.txt contains 192,
I get the correct MD5 hash, i.e 58a2fc6ed39fd083f55d4182bf88826d
What am I doing wrong?
Your awk's print is adding a newline so you're computing the md5 of "192\n", not "192". Use
IP1=$(printf "%s" "${REMOTE_ADDR%.*.*.*}" | md5sum | cut -c1-32)
instead, which uses shell parameter expansion to remove all but the first octet of the IP address, and printf to write it without the newline.
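The newline difference is easy to see directly; these are the two digests from the question:

```shell
# md5 of "192" with and without a trailing newline
printf '%s' 192 | md5sum | cut -c1-32    # 58a2fc6ed39fd083f55d4182bf88826d
printf '%s\n' 192 | md5sum | cut -c1-32  # 6be7de648baa9067fa3087928d5ab0b4
```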
As @Shawn said, the problem was awk's print adding a newline.
Adding tr -d '\n' solved the problem.
Now it is working correctly; for the other octets I change print $1 to print $2, etc. in awk:
IP1=$(echo ${REMOTE_ADDR} | awk -F. '{print $1'} | tr -d '\n' | md5sum | cut -c1-32 )
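To handle all four octets in one pass, the split and the newline stripping can both happen inside awk; a sketch (the IP1..IP4 names mirror the question):

```shell
REMOTE_ADDR=192.168.20.100   # normally set by the CGI environment
echo "$REMOTE_ADDR" | awk -F. '{
  for (i = 1; i <= NF; i++) {
    # printf writes the octet without a newline, so the hash matches the text
    cmd = "printf %s " $i " | md5sum | cut -c1-32"
    cmd | getline h
    close(cmd)
    print "IP" i "=" h
  }
}'
# first line printed: IP1=58a2fc6ed39fd083f55d4182bf88826d
```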

Unique ending lines for csv with bash

I would like to add a unique link to each line in a csv file in the following form
data1,name1,date1
data2,name2,date2
and afterward, it should look like
data1,name1,date1,somedomain.com/test-ZmQwZTdiNzIyZGExYTc1Njg1YjJjMWE2
data2,name2,date2,somedomain.com/test-ZTdmYjY4N2M5MjM0NzcxYjJjNGE0N2I5
whereby I was thinking to generate the unique strings with
date +%s | sha256sum | base64 | head -c 32 ; echo
I found approaches for part of it but I am not sure how to put it together.
You can use awk with the built-in getline command to call an external command and append the result to the end of each line.
Assuming your date is in the last field, $NF:
awk -F "," '{
cmd = "date -d "$NF" +%s | sha256sum | base64 | head -c 32"
cmd | getline hash
print $0 FS hash
close(cmd)
}' file.csv
Input
data1,name1,2017-11-01
data2,name2,2017-11-02
Output
data1,name1,2017-11-01,YTRiYWNmYmExMmM0NjJhYjAzNzU4ZGIx
data2,name2,2017-11-02,MTBjYjNlZTc5ZmNlMTU2NWFiY2Q2NmJk
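Note that date -d ... +%s depends on the machine's timezone, so the tokens above aren't reproducible across hosts. If the link only needs to be unique per line, a plain shell loop that hashes the line itself is deterministic; a sketch (somedomain.com/test- is just the example prefix from the question):

```shell
# Derive a 32-char token from each full line instead of the date's epoch.
# head -c 32 truncates the base64 text, as in the question's one-liner.
while IFS= read -r line; do
  tok=$(printf %s "$line" | sha256sum | base64 | head -c 32)
  printf '%s,somedomain.com/test-%s\n' "$line" "$tok"
done < file.csv
```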

"Resource temporarily unavailable" when using Awk and Fork

I wrote a script that takes a CSV file and replaces the third column with the hash of the second column concatenated with some string (a key).
After 256 rows, I get this error:
awk: cmd. line:3: (FILENAME=C:/hanatest/test.csv FNR=257) fatal:
cannot create child process for `echo -n
E5360712819A7EF1584E2FDA06287379FF5CC3E0A5M7J6PiQMaSBut52ZQhVlS4 |
openssl ripemd160 | cut -f2 -d" "' (fork: Resource temporarily
unavailable)
I changed the CSV file and always got the same error after 256 rows.
Here is my code:
awk -F "," -v env_var="$key" '{
tmp="echo -n "$2env_var" | openssl ripemd160 | cut -f2 -d\" \""
tmp | getline cksum
$3=toupper(cksum)
print
}' //test/source.csv > //ziel.csv
Can you please help me ?
Here my sample input:
25,XXXXXXXXXXXXXXXXXX,?
44,YYYYYYYYYYYYYYYYYY,?
84,ZZZZZZZZZZZZZZZZZZ,?
and here my expected output:
25,XXXXXXXXXXXXXXXXXX,301E2A8BF32A7046F65E48DF32CF933F6CAEC529
44,YYYYYYYYYYYYYYYYYY,301E2A8BF32A7046F65E48EF32CF933F6CAEC529
84,ZZZZZZZZZZZZZZZZZZ,301E2A8BF32A7046F65E48EF33CF933F6CAEC529
Thanks in advance
Let's make your code more robust first:
awk -F "," -v env_var="$key" '{
tmp="echo -n \047" $2 env_var "\047 | openssl ripemd160 | cut -f2 -d\047 \047"
if ( (tmp | getline cksum) > 0 ) {
$3 = toupper(cksum)
}
close(tmp)
print
}' /test/source.csv > /ziel.csv
Now - do you still have a problem? If you're considering using getline make sure to read and fully understand the correct uses and all of the caveats discussed at http://awk.freeshell.org/AllAboutGetline.
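The 256-row limit comes from every cmd | getline without a matching close(cmd) leaving its pipe and child process open, until a per-process limit (256 on the asker's system) is exhausted and fork fails. A minimal demonstration of why the close(tmp) matters:

```shell
# 300 rows: without close() this would die around row 256 with
# "fork: Resource temporarily unavailable"; with it, all rows succeed.
seq 1 300 | awk '{
  cmd = "printf %s " $1 " | md5sum | cut -c1-32"
  cmd | getline h
  close(cmd)    # comment this out to reproduce the error
  n++
} END { print n " rows hashed" }'
# prints: 300 rows hashed
```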

Unix: Get the latest entry from the file

I have a file containing names and dates, and I want to keep only the entry with the latest date for each name. How do I do it?
for example:
>cat user.txt
"a","03-May-13
"b","13-May-13
"a","13-Aug-13
"a","13-May-13
I am using command sort -u user.txt. It is giving the following output:
"a","11-May-13
"a","13-Aug-13
"a","13-May-13
"b","13-May-13
but I want the following output.
"a","13-Aug-13
"b","13-May-13
Can someone help?
Thanks.
Try this:
sort -t, -k2 user.txt | awk -F, '{a[$1]=$2}END{for(e in a){print e, a[e]}}' OFS=","
Explanation:
Sort the entries by the date field in ascending order, pipe the sorted result to awk, which simply uses the first field as a key, so only the last entry of the entries with an identical key will be kept and finally output.
EDIT
Okay, so the entries can't be sorted lexicographically; the date needs to be converted to a timestamp so it can be compared numerically. Use the following:
awk -F",\"" '{ cmd=" date --date " $2 " +%s "; cmd | getline ts; close(cmd); print ts, $0, $2}' user.txt | sort -k1 | awk -F"[, ]" '{a[$2]=$3}END{for(e in a){print e, a[e]}}' OFS=","
If you are using macOS, use gdate instead:
awk -F",\"" '{ cmd=" gdate --date " $2 " +%s "; cmd | getline ts; close(cmd); print ts, $0, $2}' user.txt | sort -k1 | awk -F"[, ]" '{a[$2]=$3}END{for(e in a){print e, a[e]}}' OFS=","
I think you need to sort by year, month and day.
Can you try this?
awk -F"\"" '{print $2"-"$4}' data.txt | sort -t- -k4 -k3M -k2 | awk -F- '{kv[$1]=$2"-"$3"-"$4}END{for(k in kv){print k,kv[k]}}'
For me this is doing the job. I am sorting on the month and then applying the logic that @neevek used. So far I have been unable to find a case that breaks it, but I am not sure if it is a foolproof solution.
sort -t- -k2 -M user1.txt | awk -F, '{a[$1]=$2}END{for(e in a){print e, a[e]}}' OFS=","
Can someone tell me if this solution has any issues?
How about this?
grep `cut -d'"' -f4 user.txt | sort -t- -k 3 -k 2M -k 1n | tail -1` user.txt
Explaining: extract the date (fourth field when cutting on double quotes), sort it by year, month and day as you have done, take the latest entry with tail -1, and then grep for that date in the file.
edit: fixed to sort by month.
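If GNU date and sort -M aren't available, the month name can be mapped to a number inside awk and the dates compared as plain integers; a portable sketch that keeps the maximum per name (assuming DD-Mon-YY dates exactly as in the question):

```shell
awk -F'","' '{
  split($2, d, "-")     # d[1]=day, d[2]=month name, d[3]=2-digit year
  m = (index("JanFebMarAprMayJunJulAugSepOctNovDec", d[2]) + 2) / 3
  key = d[3] * 10000 + m * 100 + d[1]   # e.g. 13-Aug-13 -> 130813
  if (key > best[$1]) { best[$1] = key; line[$1] = $0 }
} END { for (k in line) print line[k] }' user.txt
```

This runs in a single process with no external date calls, at the cost of hard-coding the date layout.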

Using awk and df (disk free) to show only mount name and space used

What would be the correct command-line sequence to execute df -h and print out only the mount name and used space (percentage)? I'm trying to write a scripted report for our servers.
I tried
df -h | awk '{print $1 $4}'
which spits out
$df -h | awk '{print $1 $4}'
FilesystemAvail
/dev/sda164G
udev3.9G
tmpfs1.6G
none5.0M
none3.9G
none100M
/home/richard/.Private64G
How would you change this to add spacing? Am I selecting the right columns?
Try this:
df -h | awk '{if ($1 != "Filesystem") print $1 " " $5}'
Or just
df -h | awk '{print $1 " " $5}'
if you want to keep the headers.
You are almost there:
df -h | awk 'NR>1{print $1, $5}'
The issues with your code are which input lines to process, and how to format the output.
As an example, this awk selects records that have a % symbol at the end of field five, and puts a space between the two output fields.
df -h | awk '$5 ~ /\%$/ {print $1 " " $5 }'
Everything else is just refining those two things.
