Not getting the latest date against each unique id with awk - bash
65 20150427 000000000
8 20120930 000000000
18 20130626 000000000
6 20140505 000000000
1 20150603 000000000
18 20140712 000000000
65 20150502 000000000
10 20150113 000000000
92 20140707 000001000
20 20130530 000000000
11 20141231 000000000
15 20140516 000000000
1 20150523 000000000
18 20130620 120014000
7 20140505 000000000
Above is the file. The first column is the unique id, followed by the date and time. Whenever I run the command below I am not getting the desired result:
awk '{a[$1]=$2}END{for(i in a) print i,a[i]}' file
It shows an arbitrary date, but I want the latest date displayed against each unique id, with no repetition.
Please suggest a way forward.
You can use this awk instead; it stores the date the first time an id is seen and overwrites it whenever a later date turns up for that id:
awk '!a[$1] || $2>a[$1]{a[$1]=$2} END{for (i in a) print i, a[i]}' file
1 20150603
6 20140505
7 20140505
8 20120930
10 20150113
11 20141231
15 20140516
18 20140712
20 20130530
65 20150502
92 20140707
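The comparison only looks at the date in $2. If two rows for the same id could share a date and you also want the time column kept as a tie-breaker, a small variant of the same idea (my sketch, not part of the answer above; it works because the zero-padded date and time compare correctly as strings):
awk '!($1 in a) || ($2" "$3) > a[$1] {a[$1] = $2" "$3} END{for (i in a) print i, a[i]}' file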
You could also do it with GNU sort. The reverse numeric sort puts the latest line first within each id (equal numeric keys fall back to a whole-line comparison), and the second sort -n -u then outputs only the first of each run of lines with the same id:
$ sort -rn datetime.txt | sort -n -u
1 20150603 000000000
6 20140505 000000000
7 20140505 000000000
8 20120930 000000000
10 20150113 000000000
11 20141231 000000000
15 20140516 000000000
18 20140712 000000000
20 20130530 000000000
65 20150502 000000000
92 20140707 000001000
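If you only want the id and the date, as in the awk output above, you can simply trim the time column off afterwards, for example:
$ sort -rn datetime.txt | sort -n -u | cut -d' ' -f1,2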
Related
Bash command "Head" is not showing certain columns of my bed/csv file
I have a bed file named coverage.bed. When I execute head coverage.bed, this is the beginning of what it outputs:
chr start end . . strand length CG CA CT CC TG AG GG
chr1 3000380 3000440 . . + 172 0 2 9 2
chr1 3000492 3000552 . . + 172 0 1 9 1
chr1 3000593 3000653 . . + 1055 0 4 7 4
However, when I view the file using gedit coverage.bed, I see that these are the correct first 3 lines:
chr start end . . strand length CG CA CT CC TG AG GG
chr1 3000380 3000440 . . + 172 0 2 9 1 3 5 2
chr1 3000492 3000552 . . + 172 0 1 9 2 8 1 1
chr1 3000593 3000653 . . + 1055 0 4 7 3 6 5 4
Why is this happening? A python script outputted this file -- could it be possible that there is something wrong with the code that would lead to this error?
Edit: the output of sed -n 2p coverage.bed | hexdump -C is:
00000000  63 68 72 31 09 33 30 30  30 33 38 30 09 33 30 30  |chr1.3000380.300|
00000010  30 34 34 30 09 2e 09 2e  09 2b 09 31 37 32 09 30  |0440.....+.172.0|
00000020  09 32 09 39 09 31 09 33  09 35 09 32 0d 0a        |.2.9.1.3.5.2..|
0000002e
Count occurrences in a text line
Is there any way to count how often a value occurs in a line? My input is a tab delimited .txt file. It looks something like this (but with thousands of lines):
#N/A    14 13   #N/A    15 13   #N/A    14 13   13 15   14 13   15 14   14 15
24 26   #N/A    24 22   #N/A    24 26   #N/A    24 26   24 22   24 22   24 26
45 43   45 43   #N/A    #N/A    #N/A    43 45   45 43   #N/A    47 45   45 43
I would like an output like this or similar:
#N/A(3) 14 13(3) 15 13(1) 13 15(1) 15 14(1) 14 15(1)
24 26(4) #N/A(3) 24 22(3)
45 43(4) #N/A(4) 43 45(1) 47 45(1)
Perl solution:
perl -laF'/\t/' -ne 'chomp; my %h; $h{$_}++ for @F; print join "\t", map "$_ ($h{$_})", keys %h' < input
-a splits each line on -F (\t means tab) into the @F array
-l adds newlines to prints
-n reads the input line by line
chomp removes the final newline
%h is a hash table, the keys are the members of @F, the values are the counts
awk to the rescue!
$ awk -F'\t' -v OFS=' ' '{for(i=1;i<=NF;i++) if($i!="") a[$i]++; for(k in a) printf "%s", k"("a[k]")" OFS; delete a; print ""}' file
#N/A(3) 14 13(3) 13 15(1) 15 13(1) 14 15(1) 15 14(1)
#N/A(3) 24 22(3) 24 26(4)
#N/A(4) 43 45(1) 45 43(4) 47 45(1)
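Note that both answers print the grouped counts in whatever order the hash happens to store its keys, which is why their outputs list the values in different orders. If the counts should follow the order the values first appear on each line, a rough sketch in the same spirit (mine, not from either answer; it assumes an awk that can delete a whole array, such as GNU awk):
awk -F'\t' '{
  n = 0; delete cnt; delete order
  for (i = 1; i <= NF; i++) {
    if ($i == "") continue
    if (!($i in cnt)) order[++n] = $i   # remember first-appearance order
    cnt[$i]++
  }
  line = ""
  for (j = 1; j <= n; j++) line = line order[j] "(" cnt[order[j]] ")" (j < n ? " " : "")
  print line
}' file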
Get the average of the selected cells line by line in a file?
I have a single file with multiple columns. I want to select a few of them, take the average of the selected cells in each line, and output that average as a new column. For example:
Month Low.temp Max.temp Pressure Wind Rain
JAN 17 36 120 5 0
FEB 10 34 110 15 3
MAR 13 30 115 25 5
APR 14 33 105 10 4
.......
How do I get average temperature (Avg.temp) and humidity (Hum) as columns, where
Avg.temp = (Low.temp+Max.temp)/2
Hum = Wind * Rain
so that the output looks like this?
Month Low.temp Max.temp Pressure Wind Rain Avg.temp Hum
JAN 17 36 120 5 0 26.5 0
FEB 10 34 110 15 3 22 45
MAR 13 30 115 25 5 21.5 125
APR 14 33 105 10 4 23.5 40
.......
I don't want to do it in Excel. Is there any simple shell command to do this?
I would use awk like this:
awk 'NR==1 {print $0, "Avg.temp", "Hum"; next} {$(NF+1)=($2+$3)/2; $(NF+1)=$5*$6}1' file
or:
awk 'NR==1 {print $0, "Avg.temp", "Hum"; next} {print $0, ($2+$3)/2, $5*$6}' file
This consists in doing the calculations and appending them to the original values. Let's see it in action, piping to column -t for a nice output:
$ awk 'NR==1 {print $0, "Avg.temp", "Hum"; next} {$(NF+1)=($2+$3)/2; $(NF+1)=$5*$6}1' file | column -t
Month  Low.temp  Max.temp  Pressure  Wind  Rain  Avg.temp  Hum
JAN    17        36        120       5     0     26.5      0
FEB    10        34        110       15    3     22        45
MAR    13        30        115       25    5     21.5      125
APR    14        33        105       10    4     23.5      40
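The averages above are printed with awk's default number format, so 22 shows up without a decimal while 26.5 has one. If you prefer a fixed number of decimal places, a small variant of the same answer using printf (my tweak; every average then prints with one decimal, e.g. 22.0):
awk 'NR==1 {print $0, "Avg.temp", "Hum"; next} {printf "%s %.1f %d\n", $0, ($2+$3)/2, $5*$6}' file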
convert comma separated list in text file into columns in bash
I've managed to extract data (from an html page) that goes into a table, and I've isolated the columns of said table into a text file that contains the lines below:
[30,30,32,35,34,43,52,68,88,97,105,107,107,105,101,93,88,80,69,55],
[28,6,6,50,58,56,64,87,99,110,116,119,120,117,114,113,103,82,6,47],
[-7,,,43,71,30,23,28,13,13,10,11,12,11,13,22,17,3,,-15,-20,,38,71],
[0,,,3,5,1.5,1,1.5,0.5,0.5,0,0.5,0.5,0.5,0.5,1,0.5,0,-0.5,-0.5,2.5]
Each bracketed list of numbers represents a column. What I'd like to do is turn these lists into actual columns that I can work with in different data formats. I'd also like to be sure to include the blank parts of these lists too (i.e., "[,,,]"). This is basically what I'm trying to accomplish:
30 28 -7 0
30 6
32 6
35 50 43 3
34 58 71 5
43 56 30 1.5
52 64 23 1
.  .  .  .
.  .  .  .
.  .  .  .
I'm parsing data from a web page, and ultimately planning to make the process as automated as possible so I can easily work with the data after I output it to a nice format. Anyone know how to do this, or have any suggestions or thoughts on scripting this?
Since you have your lists in python, just do it in python:
l=[["30", "30", "32"], ["28","6","6"], ["-7", "", ""], ["0", "", ""]]
for i in zip(*l):
    print "\t".join(i)
produces
30  28  -7  0
30  6
32  6
awk based solution:
awk -F, '{gsub(/\[|\]/, ""); for (i=1; i<=NF; i++) a[i]=a[i] ? a[i] OFS $i: $i} END {for (i=1; i<=NF; i++) print a[i]}' file
30 28 -7 0
30 6
32 6
35 50 43 3
34 58 71 5
43 56 30 1.5
52 64 23 1
..........
..........
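One caveat: the END loop reuses NF, which by then holds the field count of the last line read, so when the bracketed lists have different lengths the longer columns get cut short. A hedged tweak that remembers the widest line instead (same approach, just an extra max variable):
awk -F, '{gsub(/\[|\]/, ""); if (NF > max) max = NF; for (i = 1; i <= NF; i++) a[i] = (i in a) ? a[i] OFS $i : $i} END {for (i = 1; i <= max; i++) print a[i]}' file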
Another solution, but it works only for a file with 4 lines:
$ paste \
    <(sed -n '1{s,\[,,g;s,\],,g;s|,|\n|g;p}' t) \
    <(sed -n '2{s,\[,,g;s,\],,g;s|,|\n|g;p}' t) \
    <(sed -n '3{s,\[,,g;s,\],,g;s|,|\n|g;p}' t) \
    <(sed -n '4{s,\[,,g;s,\],,g;s|,|\n|g;p}' t)
30      28      -7      0
30      6
32      6
35      50      43      3
34      58      71      5
43      56      30      1.5
52      64      23      1
68      87      28      1.5
88      99      13      0.5
97      110     13      0.5
105     116     10      0
107     119     11      0.5
107     120     12      0.5
105     117     11      0.5
101     114     13      0.5
93      113     22      1
88      103     17      0.5
80      82      3       0
69      6               -0.5
55      47      -15     -0.5
                -20     2.5

                38
                71
Updated: or another version with preprocessing:
$ sed 's|\[||;s|\][,]\?||' t >t2
$ paste \
    <(sed -n '1{s|,|\n|g;p}' t2) \
    <(sed -n '2{s|,|\n|g;p}' t2) \
    <(sed -n '3{s|,|\n|g;p}' t2) \
    <(sed -n '4{s|,|\n|g;p}' t2)
If a file named data contains the data given in the problem (exactly as defined above), then the following bash command line will produce the output requested:
$ sed -e 's/\[//' -e 's/\]//' -e 's/,/ /g' <data | rs -T
Example:
$ cat data
[30,30,32,35,34,43,52,68,88,97,105,107,107,105,101,93,88,80,69,55],
[28,6,6,50,58,56,64,87,99,110,116,119,120,117,114,113,103,82,6,47],
[-7,,,43,71,30,23,28,13,13,10,11,12,11,13,22,17,3,,-15,-20,,38,71],
[0,,,3,5,1.5,1,1.5,0.5,0.5,0,0.5,0.5,0.5,0.5,1,0.5,0,-0.5,-0.5,2.5]
$ sed -e 's/\[//' -e 's/\]//' -e 's/,/ /g' <data | rs -T
30   28   -7    0
30   6    43    3
32   6    71    5
35   50   30    1.5
34   58   23    1
43   56   28    1.5
52   64   13    0.5
68   87   13    0.5
88   99   10    0
97   110  11    0.5
105  116  12    0.5
107  119  11    0.5
107  120  13    0.5
105  117  22    1
101  114  17    0.5
93   113  3     0
88   103  -15   -0.5
80   82   -20   -0.5
69   6    38    2.5
55   47   71
Using bash to read elements on a diagonal on a matrix and redirecting it to another file
So, currently I have created a code to do this, as shown below. This code works and does what it is supposed to do after I echo the variables:
a=`awk 'NR==2 {print $1}' $coor`
b=`awk 'NR==3 {print $2}' $coor`
c=`awk 'NR==4 {print $3}' $coor`
....but I have to do this for many more lines and I want a more general expression. So I have attempted to create a loop, shown below. Syntax wise I don't think anything is wrong with the code, but it is not outputting anything to the file "Cmain". I was wondering if anyone could help me; I'm kinda new at scripting. If it helps any, I can also post what I am trying to read.
for (( i=1; i <= 4 ; i++ )); do
    for (( j=0; j <= 3 ; j++ )); do
        B="`grep -n "cell" "$coor" | awk 'NR=="$i" {print $j}'`"
    done
done
echo "$B" >> Cmain
You can replace your lines of awk with this one:
awk '{ for (i=1; i<=NF; i++) if (NR >= 2 && NR == i) print $(i - 1) }' file.txt
Tested input:
1 2 3 4 5 6 7 8 9 10
11 12 13 14 15 16 17 18 19 20
21 22 23 24 25 26 27 28 29 30
31 32 33 34 35 36 37 38 39 40
41 42 43 44 45 46 47 48 49 50
51 52 53 54 55 56 57 58 59 60
61 62 63 64 65 66 67 68 69 70
71 72 73 74 75 76 77 78 79 80
Output:
11
22
33
44
55
66
77
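For input like this the inner loop isn't strictly needed, since each row past the first contributes exactly one element; an equivalent shorter form would be:
awk 'NR >= 2 {print $(NR - 1)}' file.txt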
awk 'BEGIN {f=1} {print $f; f=f+1}' infile > outfile
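The counter can also be dropped, since NR already tracks the row number: awk '{print $NR}' infile > outfile does the same walk (note that, like this answer, it starts the diagonal at row 1, column 1, whereas the previous answer starts at row 2, column 1).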
An alternative using sed and coreutils, assuming space separated input is in infile:
n=$(wc -l infile | cut -d' ' -f1)
for i in $(seq 1 $n); do
    sed -n "${i} {p; q}" infile | cut -d' ' -f$i
done