Problem replacing numbers with words from a file - shell

I have two files:
In the first one (champions.csv) I have the number and the name of some LoL champions:
1,Annie
2,Olaf
3,Galio
4,Twisted Fate
5,Xin Zhao
6,Urgot
7,LeBlanc
8,Vladimir
9,Fiddlesticks
10,Kayle
11,Master Yi
In the second one (top.csv) I have pairs of champions (first and second columns) and the number of matches won by that pair (third column):
2,1,3
3,1,5
4,1,6
5,1,1
6,1,10
7,1,9
8,1,11
10,4,12
7,5,2
3,3,6
I need to substitute the numbers in the second file with the corresponding names from the first file.
I tried using awk and storing the names in an array, but it didn't work:
lengthChampions=`cat champions.csv | wc -l`
for i in `seq 1 $length`; do
    name=`cat champions.csv | head -$i | tail -1 | awk -F',' '{print $2}'`
    champions[$i]=$name
done

for i in `seq 1 10`; do
    champion1=${champions[`cat top.csv | head -$i | tail -1 | awk -F',' '{print $1}'`]}
    champion2=${champions[`cat top.csv | head -$i | tail -1 | awk -F',' '{print $2}'`]}
    awk -F',' 'NR=='$i' {$1='$champion1'} {$2='$champion2'} {print $1","$2","$3}' top.csv > tmptop.csv && mv tmptop.csv top.csv
done
I would like a solution to this problem, ideally with less code than this. The result should look something like this (not the actual result for my files):
Ahri,Ashe,1502
Camille,Ezreal,892
Ekko,Dr. Mundo,777
Fizz,Caitlyn,650
Gnar,Ezreal,578
Fiora,Irelia,452
Janna,Graves,321
Jax,Jinx,245
Ashe,Corki,151
Katarina,Lee Sin,102

This can be accomplished in a single awk call: associate the numbers with the champion names in an array while reading the first file, then use that array to replace the numbers in the second file.
awk 'BEGIN{FS=OFS=","} NR==FNR{a[$1]=$2;next} {$1=a[$1];$2=a[$2]} 1' champions.csv top.csv
Olaf,Annie,3
Galio,Annie,5
Twisted Fate,Annie,6
Xin Zhao,Annie,1
Urgot,Annie,10
LeBlanc,Annie,9
Vladimir,Annie,11
Kayle,Twisted Fate,12
LeBlanc,Xin Zhao,2
Galio,Galio,6
In case some numbers in top.csv don't exist in champions.csv, use the following instead, to prevent those numbers from being replaced with empty strings:
awk 'BEGIN{FS=OFS=","} NR==FNR{a[$1]=$2;next} ($1 in a){$1=a[$1]} ($2 in a){$2=a[$2]} 1' champions.csv top.csv
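For readers new to the two-file idiom, the same program can be spelled out with comments (a sketch equivalent to the one-liners above):
awk 'BEGIN { FS = OFS = "," }     # read and write comma-separated fields
NR == FNR {                       # true only while reading the first file (champions.csv)
    a[$1] = $2                    # map number -> champion name
    next                          # skip the replacement block for this file
}
{                                 # now reading top.csv
    if ($1 in a) $1 = a[$1]       # replace the first number if it is known
    if ($2 in a) $2 = a[$2]       # replace the second number if it is known
}
1                                 # a true pattern with no action prints the line
' champions.csv top.csv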

Assuming that champions.csv isn't too huge (i.e. its 2nd column fits comfortably in the bash array c), using bash and cut:
readarray -t -O 1 c < <(cut -d, -f2 champions.csv)
while IFS=, read x y z; do
printf '%s,%s,%s\n' "${c[$x]}" "${c[$y]}" "$z"
done < top.csv
Output:
Olaf,Annie,3
Galio,Annie,5
Twisted Fate,Annie,6
Xin Zhao,Annie,1
Urgot,Annie,10
LeBlanc,Annie,9
Vladimir,Annie,11
Kayle,Twisted Fate,12
LeBlanc,Xin Zhao,2
Galio,Galio,6


uniq sort parsing

I have one file with fields separated by ";", like this:
test;group;10.10.10.10;action2
test2;group;10.10.13.11;action1
test3;group3;10.10.10.10;action3
tes4;group;10.10.10.10;action4
test5;group2;10.10.10.12;action5
test6;group4;10.10.13.11;action8
I would like to identify all non-unique IP addresses (3rd column). With the example the extract should be:
test;group;10.10.10.10;action2
test3;group3;10.10.10.10;action3
tes4;group;10.10.10.10;action4
test2;group;10.10.13.11;action1
test6;group4;10.10.13.11;action8
Sorted by IP address (3rd column).
Using simple commands like cat, uniq, sort, awk (not Perl, not Python, only shell).
Any idea?
$ awk -F';' 'NR==FNR{a[$3]++;next}a[$3]>1' file file|sort -t";" -k3
test;group;10.10.10.10;action2
test3;group3;10.10.10.10;action3
tes4;group;10.10.10.10;action4
test2;group;10.10.13.11;action1
test6;group4;10.10.13.11;action8
awk picks all lines whose 3rd field ($3) occurs more than once
sort then sorts them by IP
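With the sample data, the first pass over file builds counts like these; the second pass then keeps only the lines whose IP has a count greater than 1:
a["10.10.10.10"] = 3
a["10.10.13.11"] = 2
a["10.10.10.12"] = 1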
You can also try this solution using grep, cut, sort, uniq, and a casual process substitution in the middle.
grep -f <(cut -d ';' -f3 file | sort | uniq -d) file | sort -t ';' -k3
It is not really elegant (I actually prefer the awk answer given above), but I think it is worth sharing, since it accomplishes what you want.
Here is another awk-assisted pipeline:
$ awk -F';' '{print $0 "\t" $3}' file | sort -sk2 | uniq -Df1 | cut -f1
test;group;10.10.10.10;action2
test3;group3;10.10.10.10;action3
tes4;group;10.10.10.10;action4
test2;group;10.10.13.11;action1
test6;group4;10.10.13.11;action8
A single pass through awk, so no special caching is needed; it also keeps the original order of lines within each IP group (stable sort). Assumes a tab doesn't appear in the fields.
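To see why this works: the first awk stage decorates each line with a copy of the IP (original line, a tab, then field 3), producing lines like the ones below; sort -sk2 then groups them stably by that appended key, uniq -Df1 keeps only the groups occurring more than once, and cut -f1 strips the key again.
test;group;10.10.10.10;action2	10.10.10.10
test2;group;10.10.13.11;action1	10.10.13.11
test3;group3;10.10.10.10;action3	10.10.10.10
(and so on for the remaining lines)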
This is very similar to Kent's answer, but with a single pass through the file. The tradeoff is memory: you need to store the lines to keep. This uses GNU awk for the PROCINFO variable.
awk -F';' '
{count[$3]++; lines[$3] = lines[$3] $0 ORS}
END {
PROCINFO["sorted_in"] = "#ind_str_asc"
for (key in count)
if (count[key] > 1)
printf "%s", lines[key]
}
' file
The equivalent perl:
perl -F';' -lane '
  $count{$F[2]}++; push @{$lines{$F[2]}}, $_
  } END {
  print join $/, @{$lines{$_}}
      for sort grep {$count{$_} > 1} keys %count
' file
awk + sort + uniq + cut:
$ awk -F ';' '{print $0,$3}' <file> | sort -k2 | uniq -D -f1 | cut -d' ' -f1
sort + awk
$ sort -t';' -k3,3 <file> | awk -F ';' '($3==k){c++;b=b"\n"$0}($3!=k){if (c>1) print b;c=1;k=$3;b=$0}END{if(c>1)print b}'
awk
$ awk -F ';' '{b[$3"_"++k[$3]]=$0; }
END{for (i in k) if(k[i]>1) for(j=1;j<=k[i];j++) print b[i"_"j] }' <file>
This buffers the full file (just as sort does) and keeps track of how many times each key appears. At the end, if a key appears more than once, the full set of its lines is printed.
test2;group;10.10.13.11;action1
test6;group4;10.10.13.11;action8
test;group;10.10.10.10;action2
test3;group3;10.10.10.10;action3
tes4;group;10.10.10.10;action4
If you want the output sorted by key (asorti requires GNU awk):
$ awk -F ';' '{b[$3"_"++k[$3]]=$0; }
END{ n=asorti(k,l);
for (i=1;i<=n;i++) if(k[l[i]]>1) for(j=1;j<=k[l[i]];j++) print b[l[i]"_"j] }' <file>

How do I read a file into a matrix in bash?

I have a text file like this
A;green;3
B;blue;2
A;red;4
C;red;2
C;blue;3
B;green;3
I have to write a script that, if started with the parameter "B", gives me the color of the row with the biggest number (among the rows starting with B). In this case it would be the last line, so the output would be "green".
How do I separate the elements by ";"-s and newlines and store them into a matrix so I can work with it? Do I even need to do that, or is there an easier solution?
Thanks in advance!
awk + sort solution:
awk -v param="B" -F';' '$1==param{ print $2; exit }' <(sort -t';' -k1,1 -k3nr file.txt)
The output:
green
Or, in addition to @William Pursell's answer - to extract only the color value:
awk -F';' '/^B/ && $3>m{ m=$3; c=$2 }END{ print c }' file.txt
green
Via bash script:
get_max_color.sh script:
#!/bin/bash
awk -F';' -v p="$1" '$0~"^"p && $3>m{ m=$3; c=$2 }END{ print c }' "$2"
Usage:
bash get_max_color.sh B "file.txt"
green
You just need to filter out the appropriate lines and store the one with the max value seen. The obvious solution is:
awk '/^B/ && $3 > m{m=$3; s=$0} END { print s}' FS=\; input
To use a parameter, do
awk "/^$1/"' && $3 > m{m=$3; s=$0} END { print s}' FS=\; input
A non-awk solution, possibly less elegant and slower than the already proposed solution:
sort -r -t\; -k1,1 -k3 file | uniq -w1 | grep "B" | cut -f2 -d\;
awk to the rescue!
I probably haven't fully understood what you want to achieve, but
awk -v key="$c" -F\; 'm[$1]<$3{m[$1]=$3; c[$1]=$2} END{print c[key]}' file
will pick the color with the highest number from the file for the given key.
A sample (admittedly inefficient) usage pattern:
$ for c in A B C;
do
echo $c "->" $(awk -v key="$c" -F\; 'm[$1]<$3 {m[$1]=$3; c[$1]=$2}
END {print c[key]}' file);
done;
A -> red
B -> green
C -> blue
You can probably implement the rest of the script in awk and do this process only once.
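As a sketch of that idea (hypothetical, but reusing the same m[] and c[] arrays as above), a single awk pass can report the top color for every key at once; note that the output order of for (k in c) is unspecified:
awk -F\; 'm[$1]<$3 { m[$1]=$3; c[$1]=$2 }           # remember the highest number and its color per key
          END { for (k in c) print k, "->", c[k] }  # report every key after one pass
' file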
Or, if you want an associative array, it can be done as below:
$ declare -A colors;
while IFS=\; read k c _ ;
do
colors[$k]=$c;
done < <(sort -t\; -k1,1 -k3nr file | uniq -w1)
$ echo ${colors[A]}
red
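And for the parameter from the original question:
$ echo ${colors[B]}
green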

How to count duplicates in Bash Shell

Hello guys, I want to count how many duplicates there are in a column of a file and put the number next to them. I use awk and sort like this:
awk -F '|' '{print $2}' FILE | sort | uniq -c
but the count (from the uniq -c) appears at the left side of the duplicates.
Is there any way to put the count on the right side instead of the left, using my code?
Thanks for your time!
Ideally you should show us your Input_file so that we could come up with a single command for this requirement; since you haven't shown the Input_file, here is an attempt built on top of your own command.
awk -F '|' '{print $2}' FILE | sort | uniq -c | awk '{for(i=2;i<=NF;i++){printf("%s ",$i)};printf("%s%s",$1,RS)}'
You can just use awk to reverse the output like below:
awk -F '|' '{print $2}' FILE | sort | uniq -c | awk '{print $2" "$1}'
awk -F '|' '{print $2}' FILE | sort | uniq -c| awk '{a=$1; $1=""; gsub(/^ /,"",$0);print $0,a}'
You can use awk to calculate the number of duplicates itself, so your command can be simplified as follows:
awk -F '|' '{a[$2]++}END{for(i in a) print i,a[i]}' FILE | sort
Check this command:
awk -F '|' '{c[$2]++} END{for (i in c) print i, c[i]}' FILE | sort
Using awk to do the counting is enough. If you do not want the output sorted by the column value, remove the pipe and the sort.

How to obtain the value for the 3rd one from the bottom in bash?

I have a line like this
3672975 3672978 3672979
awk '{print $1}' will return the first number 3672975
If I still want the first number, but indicating it is the 3rd one from the bottom, how should I adjust awk '{print $-3}'?
The reason is, I have hundreds of numbers, and I always want to obtain the 3rd one from the bottom.
Can I use awk to obtain the total number of items first, then do the subtraction?
$NF is the last field, $(NF-1) is the one before the last etc., so:
$ awk '{print $(NF-2)}'
for example:
$ echo 3672975 3672978 3672979 | awk '{print $(NF-2)}'
3672975
Edit:
$ echo 1 10 100 | awk '{print $(NF-2)}'
1
Or with cut and rev:
echo 1 2 3 4 | rev | cut -d' ' -f 3 | rev
2

Awk and head not identifying columns properly

Here is my code that I want to use to separate 3 columns from hist.txt into 2 separate files: hist1.dat with the first and second columns, and hist2.dat with the first and third columns. The columns in hist.txt may be separated by more than one space. I want to save in histogram1.dat and histogram2.dat only the lines up to the last nonzero value.
The script creates histogram1.dat correctly, but histogram2.dat contains all the lines from hist2.dat.
hist.txt is like :
http://pastebin.com/JqgSKZrP
#!/bin/bash
sed 's/\t/ /g' hist.txt | awk '{print $1 " " $2;}' > hist1.dat
sed 's/\t/ /g' hist.txt | awk '{print $1 " " $3;}' > hist2.dat
head -n $( awk 'BEGIN {last=1}; {if($2!=0) last=NR};END {print last}' hist1.dat) hist1.dat > histogram1.dat
head -n $( awk 'BEGIN {last=1}; {if($2!=0) last=NR};END {print last}' hist2.dat) hist2.dat > histogram2.dat
What is the cause of this problem? Might it be due to some special restriction with head?
Thanks.
For your first histogram, try
awk '$2 ~ /000000/{exit}{print $1, $2}' hist.txt
and for your second:
awk '$3 ~ /000000/{exit}{print $1, $3}' hist.txt
Hope I understood you correctly...
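If the goal really is to keep every line up to the last nonzero value (what the original head-based script attempts), a two-pass sketch along these lines should also work, assuming whitespace-separated columns and that reading hist.txt twice is acceptable:
# Pass 1: remember the last record whose column of interest is nonzero.
# Pass 2: print the wanted columns up to and including that record.
awk 'NR==FNR { if ($2 != 0) last = FNR; next } FNR <= last { print $1, $2 }' hist.txt hist.txt > histogram1.dat
awk 'NR==FNR { if ($3 != 0) last = FNR; next } FNR <= last { print $1, $3 }' hist.txt hist.txt > histogram2.dat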
