I would like to add a unique link to each line in a csv file in the following form
data1,name1,date1
data2,name2,date2
and afterward, it should look like
data1,name1,date1,somedomain.com/test-ZmQwZTdiNzIyZGExYTc1Njg1YjJjMWE2
data2,name2,date2,somedomain.com/test-ZTdmYjY4N2M5MjM0NzcxYjJjNGE0N2I5
whereby I was thinking to generate the unique strings with
date +%s | sha256sum | base64 | head -c 32 ; echo
I found approaches for part of it but I am not sure how to put it together.
You can use awk with the built-in getline command to call an external command and append the result to the end of each line.
Assuming your date is on the last field $NF
awk -F "," '{
cmd = "date -d "$NF" +%s | sha256sum | base64 | head -c 32"
cmd | getline hash
print $0 FS hash
close(cmd)
}' file.csv
Input
data1,name1,2017-11-01
data2,name2,2017-11-02
Output
data1,name1,2017-11-01,YTRiYWNmYmExMmM0NjJhYjAzNzU4ZGIx
data2,name2,2017-11-02,MTBjYjNlZTc5ZmNlMTU2NWFiY2Q2NmJk
Related
Suppose there is one file.txt in which below content text is written:
ABC/xyz
ABC/xyz/rst
EFG/ghi
I need to write a shell script that can extract the first unique word before the first /.
So as output, I want ABC and EFG to be written in one file.
You can extract the first word with cut (slash as delimiter), then pipe to sort with the -u (for "unique") option:
$ cut -d '/' -f 1 file.txt | sort -u
ABC
EFG
To get the output into a file, just redirect by appending > filename to the command. (Or pipe to tee filename to see the output and get it in a file.)
Try this :
cat file.txt | tr -s "/" ' ' | awk -F " " '{print $1}' | sort | uniq > outfile.txt
Another interesting variation:
awk -F'/' '{print $1 |" sort -u" }' file.txt > outfile.txt
Not that it matters here, but being able to pipe and redirect within awk can be very handy.
Another easy way:
cut -d"/" -f1 file.txt|uniq > out.txt
You can use a mix of cut and sort like so:
cut -d '/' -f 1 file.txt | sort -u > newfile.txt
The first line grabs any string until a slash / and outputs it into newfile.txt.
The second line sorts the text, removing any duplicate strings you might have.
Content of the file is
Feb-01-2014 one two
Mar-02-2001 three four
I'd like to format the first field (the date) to %Y%m%d format
I'm trying to use a combination of awk and date command, but somehow this is failing even though i got the feeling i'm almost there:
cat infile | awk -F"\t" '{$1=system("date -d " $1 " +%Y%m%d");print $1"\t"$2"\t"$3}' > test
this prints out date's usage pages which makes me think that the date command is triggered properly, but there is something wrong with the argument, do you see the issue somewhere?
i'm not that familiar with awk,
You don't need date for this, its simply rearranging the date string:
$ awk 'BEGIN{FS=OFS="\t"} {
split($1,t,/-/)
$1 = sprintf("%s%02d%s", t[3], (match("JanFebMarAprMayJunJulAugSepOctNovDec",t[1])+2)/3, t[2])
}1' file
20140201 one two
20010302 three four
You can use:
while read -r a _; do
date -d "$a" '+%Y%m%d'
done < file
20140201
20010302
system() returns the exit code of the command.
Instead:
cat infile | awk -F"\t" '{"date -d " $1 " +%Y%m%d" | getline d;print d"\t"$2"\t"$3}'
$ awk '{var=system("date -d "$1" +%Y%m%d | tr -d \"\\n\"");printf "%s\t%s\t%s\n", var, $2, $3}' file
201402010 one two
200103020 three four
I am trying to run a shell command from within awk for each line of a file, and the shell command needs one input argument. I tried to use system(), but it didn't recognize the input argument.
Each line of this file is an address of a file, and I want to run a command to process that file. So, for a simple example I want to use 'wc' command for each line and pass $1to wc.
awk '{system("wc $1")}' myfile
you are close. you have to concatenate the command line with awk variables:
awk '{system("wc "$1)}' myfile
You cannot grab the output of an awk system() call, you can only get the exit status. Use the getline/pipe or getline/variable/pipe constructs
awk '{
cmd = "your_command " $1
while (cmd | getline line) {
do_something_with(line)
}
close(cmd)
}' file
FYI here's how to use awk to process files whose names are stored in a file (providing wc-like functionality in this example):
gawk '
NR==FNR { ARGV[ARGC++]=$0; next }
{ nW+=NF; nC+=(length($0) + 1) }
ENDFILE { print FILENAME, FNR, nW, nC; nW=nC=0 }
' file
The above uses GNU awk for ENDFILE. With other awks just store the values in an array and print in a loop in the END section.
I would suggest another solution:
awk '{print $1}' myfile | xargs wc
the difference is that it executes wc once with multiple arguments. It often works (for example, with kill command)
Or use the pipe | as in bash then retrive the output in a variable with awk's getline, like this
zcat /var/log/fail2ban.log* | gawk '/.*Ban.*/ {print $7};' | sort | uniq -c | sort | gawk '{ "geoiplookup " $2 "| cut -f2 -d: " | getline geoip; print $2 "\t\t" $1 " " geoip}'
That line will print all the banned IPs from your server along with their origin (country) using the geoip-bin package.
The last part of that one-liner is the one that affects us :
gawk '{ "geoiplookup " $2 "| cut -f2 -d: " | getline geoip; print $2 "\t\t" $1 " " geoip}'
It simply says : run the command "geoiplookup 182.193.192.4 | -f2 -d:" ($2 gets substituted as you may guess) and put the result of that command in geoip (the | getline geoip bit). Next, print something something and anything inside the geoip variable.
The complete example and the results can be found here, an article I wrote.
I have a Text string like below
"/path/to/log/file/LOG_FILE.log.2013-10-02-15:2013-10-02 15:46:57.809 INFO - TTT005|Receive|0000293|N~0000284~YOS~TTT005~ ~000~YC~|YOS TYOS-YCUPDT1-H 20131002154657669284YCARR TTT005 Y0TD04 |1|0150520106050|001|051052020603|003|015030010101502702060510520101|000||000|| "
Here "|" is repeated several times within the string and I need to get the index of 4th occurrence of "|" character using shell-script (BASH) command. I tried to find a way using grep command's options.
Thanks.
Using awk you can do:
awk -F '|' '{print index($0, $5)-1}' file
This will print character position of fourth pipe in the file.
grep can print the byte-offset; when used with -o it prints the byte-offset of the matching part.
$ string="/path/to/log/file/LOG_FILE.log.2013-10-02-15:2013-10-02 15:46:57.809 INFO - TTT005|Receive|0000293|N~0000284~YOS~TTT005~ ~000~YC~|YOS TYOS-YCUPDT1-H 20131002154657669284YCARR TTT005 Y0TD04 |1|0150520106050|001|051052020603|003|015030010101502702060510520101|000||000||"
$ grep -ob "[^|]*" <<< "${string}" | sed '5!d' | cut -d: -f1
132
Alternatively, without using grep:
$ newstring=$(echo "${string}" | cut -d\| -f5-)
$ echo $(( ${#string} - ${#newstring} ))
132
I am really new a bash and I was trying to search a csv file column for a value and then add a counter. I found this online but it prints it and I have been trying to count how many times an R shows up and not print the whole thing.
awk -F "\"*,\"*" '{print $2}' $file
The csv file is like:
12345,R,N,N,Y,N,N,N,Bob Builder
I am looking for R in column 2. Can anybody point me in the right direction?
The following should do what you want (where file.csv is your csv file):
Case sensitive version:
cut -f 2 -d , file.csv | grep -c R
Case insensitive version:
cut -f 2 -d , file.csv | grep -ic R
Explanation
cut -f 2 -d , file.csv
This takes each line of file.csv and extracts the specified fields. The -f 2 option means extract field 2 and the -d , means use a ',' as the field delimiter. The output of this is then piped to grep.
grep -c R This looks for lines containing 'R'. Since it is passed the contents of the previous cut command, it is looking for an 'R' in field two. The -c option means count the number of matching lines.
Using awk only:
awk -F "\",\"" '{if ($2 == "R") cnt++} END{print cnt}' file
For a fun - perl only - this count everything.
perl -F, -anle 'map{$cnt{$_}{$F[$_]}++}0..$#F;END{print $cnt{1}{R}}'