Not getting the latest date against each unique id with awk - bash
65 20150427 000000000
8 20120930 000000000
18 20130626 000000000
6 20140505 000000000
1 20150603 000000000
18 20140712 000000000
65 20150502 000000000
10 20150113 000000000
92 20140707 000001000
20 20130530 000000000
11 20141231 000000000
15 20140516 000000000
1 20150523 000000000
18 20130620 120014000
7 20140505 000000000
Above is the file. The first column is the unique id, followed by the date and time. Whenever I run the command below I am not getting the desired result:
awk '{a[$1]=$2}END{for(i in a) print i,a[i]}' file
It shows an arbitrary date, but I want the latest date displayed against each unique id, with no repetition.
Please suggest a way forward.
You can use this awk instead; it stores the date the first time an id is seen and overwrites it whenever a later date turns up for that id:
awk '!a[$1] || $2>a[$1]{a[$1]=$2} END{for (i in a) print i, a[i]}' file
1 20150603
6 20140505
7 20140505
8 20120930
10 20150113
11 20141231
15 20140516
18 20140712
20 20130530
65 20150502
92 20140707
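The comparison only looks at the date in $2. If two rows for the same id could share a date and you also want the time column kept as a tie-breaker, a small variant of the same idea (my sketch, not part of the answer above; it works because the zero-padded date and time compare correctly as strings):
awk '!($1 in a) || ($2" "$3) > a[$1] {a[$1] = $2" "$3} END{for (i in a) print i, a[i]}' file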
You could also do it with GNU sort. The reverse numeric sort puts the latest line first within each id (equal numeric keys fall back to a whole-line comparison), and the second sort -n -u then outputs only the first of each run of lines with the same id:
$ sort -rn datetime.txt | sort -n -u
1 20150603 000000000
6 20140505 000000000
7 20140505 000000000
8 20120930 000000000
10 20150113 000000000
11 20141231 000000000
15 20140516 000000000
18 20140712 000000000
20 20130530 000000000
65 20150502 000000000
92 20140707 000001000
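If you only want the id and the date, as in the awk output above, you can simply trim the time column off afterwards, for example:
$ sort -rn datetime.txt | sort -n -u | cut -d' ' -f1,2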
Related
Bash command "Head" is not showing certain columns of my bed/csv file
I have a bed file named coverage.bed. When I execute head coverage.bed, this is the beginning of what it outputs:
chr start end . . strand length CG CA CT CC TG AG GG
chr1 3000380 3000440 . . + 172 0 2 9 2
chr1 3000492 3000552 . . + 172 0 1 9 1
chr1 3000593 3000653 . . + 1055 0 4 7 4
However, when I view the file using gedit coverage.bed, I see that these are the correct first 3 lines:
chr start end . . strand length CG CA CT CC TG AG GG
chr1 3000380 3000440 . . + 172 0 2 9 1 3 5 2
chr1 3000492 3000552 . . + 172 0 1 9 2 8 1 1
chr1 3000593 3000653 . . + 1055 0 4 7 3 6 5 4
Why is this happening? A python script outputted this file -- could it be possible that there is something wrong with the code that would lead to this error?
Edit: the output of sed -n 2p coverage.bed | hexdump -C is:
00000000  63 68 72 31 09 33 30 30  30 33 38 30 09 33 30 30  |chr1.3000380.300|
00000010  30 34 34 30 09 2e 09 2e  09 2b 09 31 37 32 09 30  |0440.....+.172.0|
00000020  09 32 09 39 09 31 09 33  09 35 09 32 0d 0a        |.2.9.1.3.5.2..|
0000002e
Count occurrences in a text line
Is there any way to count how often a value occurs in a line? My input is a tab delimited .txt file. It looks something like this (but with thousands of lines):
#N/A    14 13   #N/A    15 13   #N/A    14 13   13 15   14 13   15 14   14 15
24 26   #N/A    24 22   #N/A    24 26   #N/A    24 26   24 22   24 22   24 26
45 43   45 43   #N/A    #N/A    #N/A    43 45   45 43   #N/A    47 45   45 43
I would like an output like this or similar:
#N/A(3) 14 13(3) 15 13(1) 13 15(1) 15 14(1) 14 15(1)
24 26(4) #N/A(3) 24 22(3)
45 43(4) #N/A(4) 43 45(1) 47 45(1)
Perl solution:
perl -laF'/\t/' -ne 'chomp; my %h; $h{$_}++ for @F; print join "\t", map "$_ ($h{$_})", keys %h' < input
-a splits each line on -F (\t means tab) into the @F array
-l adds newlines to prints
-n reads the input line by line
chomp removes the final newline
%h is a hash table, the keys are the members of @F, the values are the counts
awk to the rescue!
$ awk -F'\t' -v OFS=' ' '{for(i=1;i<=NF;i++) if($i!="") a[$i]++; for(k in a) printf "%s", k"("a[k]")" OFS; delete a; print ""}' file
#N/A(3) 14 13(3) 13 15(1) 15 13(1) 14 15(1) 15 14(1)
#N/A(3) 24 22(3) 24 26(4)
#N/A(4) 43 45(1) 45 43(4) 47 45(1)
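Note that both answers print the grouped counts in whatever order the hash happens to store its keys, which is why their outputs list the values in different orders. If the counts should follow the order the values first appear on each line, a rough sketch in the same spirit (mine, not from either answer; it assumes an awk that can delete a whole array, such as GNU awk):
awk -F'\t' '{
  n = 0; delete cnt; delete order
  for (i = 1; i <= NF; i++) {
    if ($i == "") continue
    if (!($i in cnt)) order[++n] = $i   # remember first-appearance order
    cnt[$i]++
  }
  line = ""
  for (j = 1; j <= n; j++) line = line order[j] "(" cnt[order[j]] ")" (j < n ? " " : "")
  print line
}' file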
Get the average of the selected cells line by line in a file?
I have a single file with multiple columns. I want to select a few of them, take the average of the selected cells in each line, and output that average as a new column. For example:
Month Low.temp Max.temp Pressure Wind Rain
JAN 17 36 120 5 0
FEB 10 34 110 15 3
MAR 13 30 115 25 5
APR 14 33 105 10 4
.......
How do I get average temperature (Avg.temp) and humidity (Hum) as columns, where
Avg.temp = (Low.temp+Max.temp)/2
Hum = Wind * Rain
so that the output looks like this?
Month Low.temp Max.temp Pressure Wind Rain Avg.temp Hum
JAN 17 36 120 5 0 26.5 0
FEB 10 34 110 15 3 22 45
MAR 13 30 115 25 5 21.5 125
APR 14 33 105 10 4 23.5 40
.......
I don't want to do it in Excel. Is there any simple shell command to do this?
I would use awk like this:
awk 'NR==1 {print $0, "Avg.temp", "Hum"; next} {$(NF+1)=($2+$3)/2; $(NF+1)=$5*$6}1' file
or:
awk 'NR==1 {print $0, "Avg.temp", "Hum"; next} {print $0, ($2+$3)/2, $5*$6}' file
This consists in doing the calculations and appending them to the original values. Let's see it in action, piping to column -t for a nice output:
$ awk 'NR==1 {print $0, "Avg.temp", "Hum"; next} {$(NF+1)=($2+$3)/2; $(NF+1)=$5*$6}1' file | column -t
Month  Low.temp  Max.temp  Pressure  Wind  Rain  Avg.temp  Hum
JAN    17        36        120       5     0     26.5      0
FEB    10        34        110       15    3     22        45
MAR    13        30        115       25    5     21.5      125
APR    14        33        105       10    4     23.5      40
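The averages above are printed with awk's default number format, so 22 shows up without a decimal while 26.5 has one. If you prefer a fixed number of decimal places, a small variant of the same answer using printf (my tweak; every average then prints with one decimal, e.g. 22.0):
awk 'NR==1 {print $0, "Avg.temp", "Hum"; next} {printf "%s %.1f %d\n", $0, ($2+$3)/2, $5*$6}' file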
convert comma separated list in text file into columns in bash
I've managed to extract data (from an html page) that goes into a table, and I've isolated the columns of said table into a text file that contains the lines below:
[30,30,32,35,34,43,52,68,88,97,105,107,107,105,101,93,88,80,69,55],
[28,6,6,50,58,56,64,87,99,110,116,119,120,117,114,113,103,82,6,47],
[-7,,,43,71,30,23,28,13,13,10,11,12,11,13,22,17,3,,-15,-20,,38,71],
[0,,,3,5,1.5,1,1.5,0.5,0.5,0,0.5,0.5,0.5,0.5,1,0.5,0,-0.5,-0.5,2.5]
Each bracketed list of numbers represents a column. What I'd like to do is turn these lists into actual columns that I can work with in different data formats. I'd also like to be sure to include the blank parts of these lists too (i.e., "[,,,]"). This is basically what I'm trying to accomplish:
30 28 -7 0
30 6
32 6
35 50 43 3
34 58 71 5
43 56 30 1.5
52 64 23 1
.  .  .  .
.  .  .  .
.  .  .  .
I'm parsing data from a web page, and ultimately planning to make the process as automated as possible so I can easily work with the data after I output it to a nice format. Anyone know how to do this, or have any suggestions or thoughts on scripting this?
Since you have your lists in python, just do it in python:
l=[["30", "30", "32"], ["28","6","6"], ["-7", "", ""], ["0", "", ""]]
for i in zip(*l):
    print "\t".join(i)
produces
30  28  -7  0
30  6
32  6
awk based solution:
awk -F, '{gsub(/\[|\]/, ""); for (i=1; i<=NF; i++) a[i]=a[i] ? a[i] OFS $i: $i} END {for (i=1; i<=NF; i++) print a[i]}' file
30 28 -7 0
30 6
32 6
35 50 43 3
34 58 71 5
43 56 30 1.5
52 64 23 1
..........
..........
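One caveat: the END loop reuses NF, which by then holds the field count of the last line read, so when the bracketed lists have different lengths the longer columns get cut short. A hedged tweak that remembers the widest line instead (same approach, just an extra max variable):
awk -F, '{gsub(/\[|\]/, ""); if (NF > max) max = NF; for (i = 1; i <= NF; i++) a[i] = (i in a) ? a[i] OFS $i : $i} END {for (i = 1; i <= max; i++) print a[i]}' file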
Another solution, but it works only for a file with 4 lines:
$ paste \
    <(sed -n '1{s,\[,,g;s,\],,g;s|,|\n|g;p}' t) \
    <(sed -n '2{s,\[,,g;s,\],,g;s|,|\n|g;p}' t) \
    <(sed -n '3{s,\[,,g;s,\],,g;s|,|\n|g;p}' t) \
    <(sed -n '4{s,\[,,g;s,\],,g;s|,|\n|g;p}' t)
30      28      -7      0
30      6
32      6
35      50      43      3
34      58      71      5
43      56      30      1.5
52      64      23      1
68      87      28      1.5
88      99      13      0.5
97      110     13      0.5
105     116     10      0
107     119     11      0.5
107     120     12      0.5
105     117     11      0.5
101     114     13      0.5
93      113     22      1
88      103     17      0.5
80      82      3       0
69      6               -0.5
55      47      -15     -0.5
                -20     2.5

                38
                71
Updated: or another version with preprocessing:
$ sed 's|\[||;s|\][,]\?||' t >t2
$ paste \
    <(sed -n '1{s|,|\n|g;p}' t2) \
    <(sed -n '2{s|,|\n|g;p}' t2) \
    <(sed -n '3{s|,|\n|g;p}' t2) \
    <(sed -n '4{s|,|\n|g;p}' t2)
If a file named data contains the data given in the problem (exactly as defined above), then the following bash command line will produce the output requested:
$ sed -e 's/\[//' -e 's/\]//' -e 's/,/ /g' <data | rs -T
Example:
$ cat data
[30,30,32,35,34,43,52,68,88,97,105,107,107,105,101,93,88,80,69,55],
[28,6,6,50,58,56,64,87,99,110,116,119,120,117,114,113,103,82,6,47],
[-7,,,43,71,30,23,28,13,13,10,11,12,11,13,22,17,3,,-15,-20,,38,71],
[0,,,3,5,1.5,1,1.5,0.5,0.5,0,0.5,0.5,0.5,0.5,1,0.5,0,-0.5,-0.5,2.5]
$ sed -e 's/\[//' -e 's/\]//' -e 's/,/ /g' <data | rs -T
30   28   -7    0
30   6    43    3
32   6    71    5
35   50   30    1.5
34   58   23    1
43   56   28    1.5
52   64   13    0.5
68   87   13    0.5
88   99   10    0
97   110  11    0.5
105  116  12    0.5
107  119  11    0.5
107  120  13    0.5
105  117  22    1
101  114  17    0.5
93   113  3     0
88   103  -15   -0.5
80   82   -20   -0.5
69   6    38    2.5
55   47   71
Using bash to read elements on a diagonal on a matrix and redirecting it to another file
So, currently I have created a code to do this, as shown below. This code works and does what it is supposed to do after I echo the variables:
a=`awk 'NR==2 {print $1}' $coor`
b=`awk 'NR==3 {print $2}' $coor`
c=`awk 'NR==4 {print $3}' $coor`
....but I have to do this for many more lines and I want a more general expression. So I have attempted to create a loop, shown below. Syntax wise I don't think anything is wrong with the code, but it is not outputting anything to the file "Cmain". I was wondering if anyone could help me; I'm kinda new at scripting. If it helps any, I can also post what I am trying to read.
for (( i=1; i <= 4 ; i++ )); do
    for (( j=0; j <= 3 ; j++ )); do
        B="`grep -n "cell" "$coor" | awk 'NR=="$i" {print $j}'`"
    done
done
echo "$B" >> Cmain
You can replace your lines of awk with this one:
awk '{ for (i=1; i<=NF; i++) if (NR >= 2 && NR == i) print $(i - 1) }' file.txt
Tested input:
1 2 3 4 5 6 7 8 9 10
11 12 13 14 15 16 17 18 19 20
21 22 23 24 25 26 27 28 29 30
31 32 33 34 35 36 37 38 39 40
41 42 43 44 45 46 47 48 49 50
51 52 53 54 55 56 57 58 59 60
61 62 63 64 65 66 67 68 69 70
71 72 73 74 75 76 77 78 79 80
Output:
11
22
33
44
55
66
77
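For input like this the inner loop isn't strictly needed, since each row past the first contributes exactly one element; an equivalent shorter form would be:
awk 'NR >= 2 {print $(NR - 1)}' file.txt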
awk 'BEGIN {f=1} {print $f; f=f+1}' infile > outfile
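The counter can also be dropped, since NR already tracks the row number: awk '{print $NR}' infile > outfile does the same walk (note that, like this answer, it starts the diagonal at row 1, column 1, whereas the previous answer starts at row 2, column 1).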
An alternative using sed and coreutils, assuming space separated input is in infile:
n=$(wc -l infile | cut -d' ' -f1)
for i in $(seq 1 $n); do
    sed -n "${i} {p; q}" infile | cut -d' ' -f$i
done