pick unique lowest row in unix - bash

I have two fields in the file below.
Field 1 is a name and field 2 is a transaction date.
I want each unique name with its lowest transaction date.
cat abc.lst
John_20130201
David_20130202
Scott_20130203
Li_20130201
John_20130202
Scott_20130201
David_20130201
Li_20130204
Torres_20121231
Desired output:
John_20130201
Li_20130201
Scott_20130201
David_20130201
Torres_20121231

sort -t_ -nk2 abc.lst | awk -F_ '!a[$1]++'
Or save a pipe and do this:
awk -F_ '!a[$1]++' <(sort -t_ -nk2 abc.lst)

This one-liner selects the right entries, but doesn't sort the output:
awk -F_ '$1 in a{a[$1]=$2>a[$1]?a[$1]:$2;next}{a[$1]=$2}
END{for(x in a)print x"_"a[x]}' file

awk -F_ '{ l = lowest[$1]; if (!l || $2 < l) { lowest[$1] = $2 } }
END { for (name in lowest) print name FS lowest[name] }'
(if output order doesn't matter)
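If the order does matter, a simple option is to pipe the END output through sort, e.g. by name (the dates are fixed-width YYYYMMDD, so string comparisons behave like numeric ones throughout):
awk -F_ '{ l = lowest[$1]; if (!l || $2 < l) { lowest[$1] = $2 } }
END { for (name in lowest) print name FS lowest[name] }' abc.lst |
sort -t_ -k1,1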

A pure bash solution (bash 4+, for the associative array; output not ordered by name):
declare -A hash
while IFS=_ read -r nam val; do
[[ -z "${hash[$nam]}" || $val < "${hash[$nam]}" ]] && hash[$nam]=$val
done <abc.lst
for i in "${!hash[@]}"; do echo "${i}_${hash[$i]}"; done
Output:
David_20130201
Li_20130201
John_20130201
Scott_20130201
Torres_20121231
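Bash doesn't define an iteration order for associative arrays, hence the scrambled order above. To order the output by name, sort the loop's output:
for i in "${!hash[@]}"; do echo "${i}_${hash[$i]}"; done | sort -t_ -k1,1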

Use the sort command:
sort -u -t_ -k1,1 abc.lst
and read about it in man sort. Beware, though: with -u alone, which line is kept for each name is unspecified, so this does not guarantee the lowest date.
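To make a sort-only approach reliable, pre-sort by name and date, then keep the first line of each name with a second, stable pass (a sketch assuming GNU sort, where -s makes the unique pass keep the first line of each run):
sort -t_ -k1,1 -k2,2n abc.lst | sort -t_ -s -u -k1,1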

Related

Problem replacing numbers with words from a file

I have two files:
In the first one (champions.csv) I have the number and the name of some LoL champions:
1,Annie
2,Olaf
3,Galio
4,Twisted Fate
5,Xin Zhao
6,Urgot
7,LeBlanc
8,Vladimir
9,Fiddlesticks
10,Kayle
11,Master Yi
In the second one (top.csv) I have pairs of champions (first and second columns) and the number of matches won by them (third column):
2,1,3
3,1,5
4,1,6
5,1,1
6,1,10
7,1,9
8,1,11
10,4,12
7,5,2
3,3,6
I need to substitute the numbers of the second file with the respective names of the first file.
I tried using awk, storing the names in an array, but it didn't work:
lengthChampions=`cat champions.csv | wc -l`
for i in `seq 1 $length`; do
name=`cat champions.csv | head -$i | tail -1 | awk -F',' '{print $2}'`
champions[$i]=$name
done
for i in `seq 1 10`; do
champion1=${champions[`cat top.csv | head -$i | tail -1 | awk -F',' '{print $1}'`]}
champion2=${champions[`cat top.csv | head -$i | tail -1 | awk -F',' '{print $2}'`]}
awk -F',' 'NR=='$i' {$1='$champion1'} {$2='$champion2'} {print $1","$2","$3}' top.csv > tmptop.csv && mv tmptop.csv top.csv
done
I would like a solution for this problem, ideally with less code than the above. The result should be something like this (not the actual result for my files):
Ahri,Ashe,1502
Camille,Ezreal,892
Ekko,Dr. Mundo,777
Fizz,Caitlyn,650
Gnar,Ezreal,578
Fiora,Irelia,452
Janna,Graves,321
Jax,Jinx,245
Ashe,Corki,151
Katarina,Lee Sin,102
This can be accomplished in a single awk call: associate the numbers with the champions in an array while reading the first file, then use it to replace the numbers in the second file.
awk 'BEGIN{FS=OFS=","} NR==FNR{a[$1]=$2;next} {$1=a[$1];$2=a[$2]} 1' champions.csv top.csv
Olaf,Annie,3
Galio,Annie,5
Twisted Fate,Annie,6
Xin Zhao,Annie,1
Urgot,Annie,10
LeBlanc,Annie,9
Vladimir,Annie,11
Kayle,Twisted Fate,12
LeBlanc,Xin Zhao,2
Galio,Galio,6
In case some numbers in top.csv don't exist in champions.csv, use the following instead, so that those numbers are kept rather than replaced by empty strings:
awk 'BEGIN{FS=OFS=","} NR==FNR{a[$1]=$2;next} ($1 in a){$1=a[$1]} ($2 in a){$2=a[$2]} 1' champions.csv top.csv
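The same program, expanded with comments (this is the guarded variant, so unknown numbers pass through unchanged):
awk 'BEGIN { FS = OFS = "," }   # read and write comma-separated fields
NR == FNR {                     # first file only (champions.csv):
    a[$1] = $2                  #   map number -> champion name
    next
}
$1 in a { $1 = a[$1] }          # second file (top.csv): replace field 1 if known
$2 in a { $2 = a[$2] }          # ... and likewise field 2
1                               # print every (possibly modified) line
' champions.csv top.csv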
Assuming that champions.csv isn't too huge to hold in a bash array, and that the champion numbers are simply the line numbers (as in the sample), then using bash and cut:
readarray -t -O 1 c < <(cut -d, -f2 champions.csv)
while IFS=, read x y z; do
printf '%s,%s,%s\n' "${c[$x]}" "${c[$y]}" "$z"
done < top.csv
Output:
Olaf,Annie,3
Galio,Annie,5
Twisted Fate,Annie,6
Xin Zhao,Annie,1
Urgot,Annie,10
LeBlanc,Annie,9
Vladimir,Annie,11
Kayle,Twisted Fate,12
LeBlanc,Xin Zhao,2
Galio,Galio,6

awk or shell command to count occurrences of values in the 1st column based on values in the 4th column

I have a large file with records like below:
jon,1,2,apple
jon,1,2,oranges
jon,1,2,pineaaple
fred,1,2,apple
tom,1,2,apple
tom,1,2,oranges
mary,1,2,apple
I want to find the number of persons (names in column 1) that have both apple and oranges. The command should take as little memory as possible and be fast. Any help appreciated!
Output:
awk/sed file => 2 (jon and tom)
Using awk is pretty easy:
awk -F, \
'$4 == "apple" { apple[$1]++ }
$4 == "oranges" { orange[$1]++ }
END { for (name in apple) if (orange[name]) print name }' data
It produces the required output on the sample data file:
jon
tom
Yes, you could squish all the code onto a single line, and shorten the names, and otherwise obfuscate the code.
Another way to do this avoids the END block:
awk -F, \
'$4 == "apple" { if (apple[$1]++ == 0 && orange[$1]) print $1 }
$4 == "oranges" { if (orange[$1]++ == 0 && apple[$1]) print $1 }' data
When it encounters an apple entry for the first time for a given name, it checks to see if the name also (already) has an entry for oranges and prints it if it has; likewise and symmetrically, if it encounters an orange entry for the first time for a given name, it checks to see if the name also has an entry for apple and prints it if it has.
As noted by Sundeep in a comment, it could use the in operator:
awk -F, \
'$4 == "apple" { if (apple[$1]++ == 0 && $1 in orange) print $1 }
$4 == "oranges" { if (orange[$1]++ == 0 && $1 in apple) print $1 }' data
The first answer could also use in in the END loop.
Note that all these solutions could be embedded in a script that accepts data from standard input (a pipe or a redirected file), since they have no need to read the input file twice. You'd replace data with "$@" to process file names if they're given, or standard input if no file names are specified. This flexibility is worth preserving when possible.
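For example, a minimal sketch of such a wrapper (the script name both.sh is an assumption):
#!/bin/sh
# both.sh - print names having both apple and oranges entries;
# reads the file(s) named on the command line, or stdin if none.
awk -F, '
$4 == "apple"   { apple[$1]++ }
$4 == "oranges" { orange[$1]++ }
END { for (name in apple) if (name in orange) print name }
' "$@"
Run it as sh both.sh data, or pipe data in with no file arguments.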
With awk
$ awk -F, 'NR==FNR{if($NF=="apple") a[$1]; next}
$NF=="oranges" && ($1 in a){print $1}' ip.txt ip.txt
jon
tom
This processes the input twice:
In the first pass, add the key to an array if the last field is apple (-F, sets , as the input field separator).
In the second pass, check whether the last field is oranges and the first field is a key of array a.
To print only the number of matches:
$ awk -F, 'NR==FNR{if($NF=="apple") a[$1]; next}
$NF=="oranges" && ($1 in a){c++} END{print c}' ip.txt ip.txt
2
Further reading: idiomatic awk for details on two file processing and awk idioms
I did a workaround using only grep, cut, sort and comm:
grep "apple" file | cut -d"," -f1 | sort > file1
grep "orange" file | cut -d"," -f1 | sort > file2
comm -12 file1 file2 > "names.having.both.apple&orange"
comm -12 shows only the names common to the 2 files. (The quotes around the output file name are needed: an unquoted & would background the command and try to run orange as a command.)
Solution from Jonathan also worked.
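The same idea fits on one line with process substitution, avoiding the temporary files; this sketch also anchors the match to the whole fourth field and deduplicates, in case a person has the same fruit twice:
comm -12 <(grep ',apple$' file | cut -d, -f1 | sort -u) \
         <(grep ',oranges$' file | cut -d, -f1 | sort -u)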
For the input:
jon,1,2,apple
jon,1,2,oranges
jon,1,2,pineaaple
fred,1,2,apple
tom,1,2,apple
tom,1,2,oranges
mary,1,2,apple
the command:
sed -n "/apple\|oranges/p" inputfile | cut -d"," -f1 | uniq -d
will output a list of people with both apples and oranges:
jon
tom
Edit after comment: for an input file where lines are not ordered by the 1st column and where each person can have the same fruit two or more times, like:
jon,1,2,apple
fred,1,2,apple
fred,1,2,apple
jon,1,2,oranges
jon,1,2,pineaaple
jon,1,2,oranges
tom,1,2,apple
mary,1,2,apple
tom,1,2,oranges
This command will work:
sed -n "/\(apple\|oranges\)$/ s/,.*,/,/p" inputfile | sort -u | cut -d, -f1 | uniq -d

How do I read a file into a matrix in bash?

I have a text file like this
A;green;3
B;blue;2
A;red;4
C;red;2
C;blue;3
B;green;3
I have to write a script that, when started with parameter "B", gives me the color of the row with the biggest number (among the rows starting with B). In this case it would be the last line, so the output would be "green".
How do I separate the elements by ";"-s and newlines and store them into a matrix so I can work with it? Do I even need to do that, or is there an easier solution?
Thanks in advance!
awk + sort solution:
awk -v param="B" -F';' '$1==param{ print $2; exit }' <(sort -t';' -k1,1 -k3nr file.txt)
The output:
green
Or, in addition to @William Pursell's answer, to extract only the color value:
awk -F';' '/^B/ && $3>m{ m=$3; c=$2 }END{ print c }' file.txt
green
Via bash script:
get_max_color.sh script:
#!/bin/bash
awk -F';' -v p="$1" '$0~"^"p && $3>m{ m=$3; c=$2 }END{ print c }' "$2"
Usage:
bash get_max_color.sh B "file.txt"
green
You just need to filter out the appropriate lines and keep the row with the max value seen so far (remembering to update the max as you go). The obvious solution is:
awk '/^B/ && $3 > m {m=$3; s=$0} END {print s}' FS=\; input
To use a parameter, do
awk "/^$1/"' && $3 > m {m=$3; s=$0} END {print s}' FS=\; input
A non-awk solution, possibly less elegant and slower than the solutions already proposed:
sort -r -t\; -k1,1 -k3 file | uniq -w1 | grep "B" | cut -f2 -d\;
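Broken down (note this relies on GNU uniq's -w option and on the names being a single character; anchoring the grep with ^ is slightly safer than the bare "B" above):
sort -r -t\; -k1,1 -k3 file |  # sort names and numbers, both descending
uniq -w1 |                     # keep the first (max-number) line per one-char name
grep "^B" |                    # select the requested name
cut -f2 -d\;                   # print its color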
awk to the rescue!
I may not have fully understood what you want to achieve, but
awk -v key="$c" -F\; 'm[$1]<$3{m[$1]=$3; c[$1]=$2} END{print c[key]}' file
will pick the color with the highest number from the file for the given key.
A rather wasteful usage pattern, invoking awk once per key:
$ for c in A B C;
do
echo $c "->" $(awk -v key="$c" -F\; 'm[$1]<$3 {m[$1]=$3; c[$1]=$2}
END {print c[key]}' file);
done;
A -> red
B -> green
C -> blue
you can probably implement the rest of the script in awk and do this process once.
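For instance, a sketch that computes the top color for every key in a single pass:
awk -F\; '
m[$1] < $3 { m[$1] = $3; c[$1] = $2 }      # track the max number and its color per key
END { for (k in c) print k " -> " c[k] }   # then report each key once
' file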
Or, if you want an associative array, it can be done as below:
$ declare -A colors;
while IFS=\; read k c _ ;
do
colors[$k]=$c;
done < <(sort -t\; -k1,1 -k3nr file | uniq -w1)
$ echo ${colors[A]}
red

sed: to merge contents of one CSV to another based on variable match

Attempting to correlate GEO IP data from one CSV to an access log of another.
Sample lines of data:
CSV1
Bob,App1,8-Jan-15,8.8.8.8
April,App3,2-Jan-15,5.5.5.5
George,App2,1-Feb-15,8.8.8.8
CSV2
8.8.8.8,US,United States,CA,California,Mountain View,94040,America/Los_Angeles
5.5.5.5,US,United States,FL,Florida,Miami
I want to search CSV1 for any IP listed in CSV2 and, when the IP matches, append the country name and region fields from CSV2 to the CSV1 line.
So far I have the following, but I'm getting errors, I believe in the sed portion.
#!/bin/bash
for LINE in $( cat CSV2 | awk -F',' '{print $1 "," $2 "," $4}' )
do
$IP = $( echo $LINE | cut -d, -f1 )
sed -i.bak "s/"$IP/\""$LINE\"" CSV1
done
Desired Output:
Bob,App1,8-Jan-15,8.8.8.8,United States,CA
April,App3,2-Jan-15,5.5.5.5,United States,FL
George,App2,1-Feb-15,8.8.8.8,United States,CA
Using the join command:
$ join -t , -1 4 -2 1 -o 1.1,1.2,1.3,1.4,2.3,2.4 <(sort -t, -k4,4 CSV1) <(sort -t, CSV2)
Bob,App1,8-Jan-15,8.8.8.8,United States,CA
Using sort is overkill for this tiny sample, but in general join requires both files to be sorted on the join key.
With awk
$ awk -F, -v OFS=, 'NR == FNR {a[$1] = $3 OFS $4; next} $4 in a {print $0, a[$4]}' CSV2 CSV1
Bob,App1,8-Jan-15,8.8.8.8,United States,CA
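If CSV1 can contain IPs with no match in CSV2 and those lines should still be printed (an assumption; the question doesn't say), drop the guard and append the extra fields only when they exist:
awk -F, -v OFS=, 'NR == FNR {a[$1] = $3 OFS $4; next}
                  {print $0 ($4 in a ? OFS a[$4] : "")}' CSV2 CSV1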

get Nth line in file after parsing another file

I have a large file like this:
foo:43:sdfasd:daasf
bar:51:werrwr:asdfa
qux:34:werdfs:asdfa
foo:234:dfasdf:dasf
qux:345:dsfasd:erwe
...............
Here the 1st column (foo, bar, qux, ...) holds file names and the 2nd column (43, 51, 34, ...) holds line numbers. For each file named in the 1st column, I want to print the line whose number is given in the 2nd column.
How can I automate this in the Unix shell?
The file above is actually generated during compilation, and I want to print the offending warning lines in the code.
Thanks!
while IFS=: read -r name line rest
do
head -n "$line" "$name" | tail -n 1
done < input.txt
while IFS=: read file line message; do
echo "$file:$line - $message:"
sed -n "${line}p" "$file"
done <yourfilehere
awk 'NR==4 {print}' yourfilename
or
cat yourfilename | awk 'NR==4 {print}'
The above will print the 4th line of your file; change the number as per your requirement.
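To tie this back to the question, a sketch that drives the same idea from the control file (assuming it is named input.txt), passing each line number in as an awk variable:
while IFS=: read -r file line _; do
awk -v n="$line" 'NR == n { print; exit }' "$file"
done < input.txt
The exit stops awk as soon as the wanted line is printed, so each file is only read as far as necessary.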
Just in awk, but probably worse performance than the answers by @kev or @MarkReed.
However, it does process each file just once. Requires GNU awk (for asort):
gawk -F: '
BEGIN {OFS=FS}
{
files[$1] = 1
lines[$1] = lines[$1] " " $2
msgs[$1, $2] = $3
}
END {
for (file in files) {
split(lines[file], l, " ")
n = asort(l)
count = 0
for (i=1; i<=n; i++) {
while (++count <= l[i])
getline line < file
print file, l[i], msgs[file, l[i]]
print line
}
close(file)
}
}
'
This might work for you (it builds one sed command per control line, groups and merges the commands per file, then pipes them to sh; note the control file is colon-separated, hence the colons in the first sed):
sed 's/^\([^:]*\):\([^:]*\).*/sed -n "\2p" \1/' file |
sort -k4,4 |
sed ':a;$!N;s/^\(.*\)\(".*\)\n.*"\(.*\)\2/\1;\3\2/;ta;P;D' |
sh
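For the sample control file, the generated commands look something like this before sh runs them (illustrative; the order of the merged line numbers may vary and doesn't affect sed's output order):
sed -n "51p" bar
sed -n "234p;43p" foo
sed -n "345p;34p" qux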
sed -nr '3{s/^([^:]*):([^:]*):.*$/\1 \2/;p}' namesNnumbers.txt
qux 34
-n: no output by default
-r: extended regular expressions (simplifies using the parens)
3{...;p}: on line 3, apply the commands in braces and print at the end
s: substitute the whole foo:bar:... line with just foo bar
So to work with the values:
fnUln=$(sed -nr '3{s/^([^:]*):([^:]*):.*$/\1 \2/;p}' namesNnumbers.txt)
fn=$(echo ${fnUln/ */})
ln=$(echo ${fnUln/* /})
sed -n "${ln}p" "$fn"
