Count occurrences in a text line - bash

Is there any way to count how often a value occurs in a line? My input is a tab-delimited .txt file. It looks something like this (but with thousands of lines):
#N/A 14 13 #N/A 15 13 #N/A 14 13 13 15 14 13 15 14 14 15
24 26 #N/A 24 22 #N/A 24 26 #N/A 24 26 24 22 24 22 24 26
45 43 45 43 #N/A #N/A #N/A 43 45 45 43 #N/A 47 45 45 43
I would like an output like this or similar.
#N/A(3) 14 13(3) 15 13(1) 13 15(1) 15 14(1) 14 15(1)
24 26(4) #N/A(3) 24 22(3)
45 43(4) #N/A(4) 43 45(1) 47 45(1)

Perl solution:
perl -laF'/\t/' -ne '
chomp; my %h;
$h{$_}++ for @F;
print join "\t", map "$_ ($h{$_})", keys %h
' < input
-a splits each line on -F (\t means tab) into the @F array
-l adds newlines to prints
-n reads the input line by line
chomp removes the final newline
%h is a hash table, the keys are the members of @F, the values are the counts

awk to the rescue!
$ awk -F'\t' -v OFS=' ' '{for(i=1;i<=NF;i++) if($i!="")a[$i]++;
for(k in a) printf "%s", k"("a[k]")" OFS; delete a; print ""}' file
#N/A(3) 14 13(3) 13 15(1) 15 13(1) 14 15(1) 15 14(1)
#N/A(3) 24 22(3) 24 26(4)
#N/A(4) 43 45(1) 45 43(4) 47 45(1)
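The order in which for (k in a) visits the keys is unspecified in awk, so the counts can come out in any order. If first-seen order matters, a minimal sketch along these lines might help. It assumes tab-separated input as described in the question, and count_fields is a hypothetical helper name, not part of either answer:

```shell
# count_fields: hypothetical helper; reads tab-separated lines on stdin
count_fields() {
  awk -F'\t' '{
    n = 0
    for (i = 1; i <= NF; i++) {
      if ($i == "") continue             # skip empty fields
      if (!($i in cnt)) order[++n] = $i  # remember the first appearance
      cnt[$i]++
    }
    line = ""; sep = ""
    for (j = 1; j <= n; j++) {
      line = line sep order[j] "(" cnt[order[j]] ")"
      sep = "\t"
    }
    print line
    split("", cnt)                       # clear the counts for the next line
  }'
}

# Illustrative input, not the original file
printf '#N/A\t14 13\t#N/A\t14 13\t15 13\n' | count_fields
```

Each output line lists the distinct fields in the order they first appear, separated by tabs.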

in bash split a variable into an array with each array value containing n values from the list

So I'm issuing a query to MySQL and it returns, say, 1,000 rows, but each iteration of the program could return a different number of rows. I need to break this result set up (without using a MySQL LIMIT) into chunks of 100 rows that I can then iterate through programmatically.
So
MySQLOutPut='1 2 3 4 ... 10,000'
I need to turn that into an array that looks like
array[1]="1 2 3 ... 100"
array[2]="101 102 103 ... 200"
etc.
I have no clue how to accomplish this elegantly.
Using Charles' data generation:
MySQLOutput=$(seq 1 10000 | tr '\n' ' ')
# the sed command will add a newline after every 100 words
# and the mapfile command will read the lines into an array
mapfile -t MySQLOutSplit < <(
sed -r 's/([^[:blank:]]+ ){100}/&\n/g; $s/\n$//' <<< "$MySQLOutput"
)
echo "${#MySQLOutSplit[@]}"
# 100
echo "${MySQLOutSplit[0]}"
# 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100
echo "${MySQLOutSplit[99]}"
# 9901 9902 9903 9904 9905 9906 9907 9908 9909 9910 9911 9912 9913 9914 9915 9916 9917 9918 9919 9920 9921 9922 9923 9924 9925 9926 9927 9928 9929 9930 9931 9932 9933 9934 9935 9936 9937 9938 9939 9940 9941 9942 9943 9944 9945 9946 9947 9948 9949 9950 9951 9952 9953 9954 9955 9956 9957 9958 9959 9960 9961 9962 9963 9964 9965 9966 9967 9968 9969 9970 9971 9972 9973 9974 9975 9976 9977 9978 9979 9980 9981 9982 9983 9984 9985 9986 9987 9988 9989 9990 9991 9992 9993 9994 9995 9996 9997 9998 9999 10000
Something like this:
# generate content
MySQLOutput=$(seq 1 10000 | tr '\n' ' ') # seq is awful, don't use in real life
# split into a large array, each item stored individually
read -r -a MySQLoutArr <<<"$MySQLOutput"
# add each batch of 100 items into a new array entry
batchSize=100
MySQLoutSplit=( )
for ((i=0; i<${#MySQLoutArr[@]}; i+=batchSize)); do
MySQLoutSplit+=( "${MySQLoutArr[*]:i:batchSize}" )
done
To explain some of the finer points:
read -r -a foo reads contents into an array named foo, split on IFS, up to the next character specified by read -d (none given here, thus reading only a single line). If you wanted each line to be a new array entry, consider IFS=$'\n' read -r -d '' -a foo, which will read each line into an array, terminated at the first NUL in the input stream.
"${foo[*]:i:batchSize}" expands to a list of items in array foo, starting at index i, and taking the next batchSize items, concatenated into a single string with the first character in $IFS used as a separator.

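The slicing and joining behaviour of "${foo[*]:i:batchSize}" is easier to inspect on a small array. The following is a minimal sketch that requires bash; the array contents and the names items and chunks are made up for illustration:

```shell
#!/usr/bin/env bash
# Sample data (illustrative only): 7 items split into batches of 3.
items=(a b c d e f g)
batchSize=3
chunks=()
for ((i = 0; i < ${#items[@]}; i += batchSize)); do
  # [*] joins the slice with the first character of IFS (a space by default)
  chunks+=("${items[*]:i:batchSize}")
done
# One chunk per line; the last chunk holds the leftover item
printf '%s\n' "${chunks[@]}"
```

The last chunk is simply shorter when the item count is not a multiple of the batch size, so no special handling is needed for result sets of varying length.
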
what's wrong in this awk print statement?

I have a file, test.txt, shown below. Each line contains one value, and the values come in sets of 6 in the order current1, voltage1, current2, voltage2, current3, voltage3.
11
12
13
14
15
16
21
22
23
24
25
26
31
32
33
34
35
36
41
42
43
44
45
46
Using awk, I want to print it in the format below (one set per line).
11 12 13 14 15 16
21 22 23 24 25 26
31 32 33 34 35 36
41 42 43 44 45 46
So I wrote a simple awk script like the one below. I run a modular counter that goes from 1 to 6 and, according to the value of cnt, I store the input value in i1, v1, i2, v2, i3, v3 respectively. When cnt is 6 (when all the values in a set have been collected), I print the values.
BEGIN{cnt=1}
cnt == 1{i1 = $0}
cnt == 2{v1 = $0}
cnt == 3{i2 = $0}
cnt == 4{v2 = $0}
cnt == 5{i3 = $0}
cnt == 6{v3 = $0}
{if (cnt==6) {cnt = 1; print i1 v1 i2 v2 i3 v3} else cnt = cnt + 1}
The result, shown below, is weird. It's been a while since I used awk, so I can't easily figure out what is wrong with the script.
awk -f div.awk test.txt
16
26
36
46
What is the problem?
Use the modulo operator. It should be:
awk 'NR%6{printf "%s ",$0}!(NR%6){print}' file
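By the way, it looks like your file uses Windows (CRLF) line endings, which leads to the output you reported: each value ends in a carriage return, so the terminal jumps back to the start of the line and only the last value of each set stays visible. Convert the line endings to UNIX before using awk, for example: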
sed 's/\r//' file | awk 'NR%6{printf "%s ",$0}!(NR%6){print}'
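If converting the file first is inconvenient, the carriage return can also be stripped inside awk itself. A minimal sketch, assuming sets of 6 values as in the question; join6 is a hypothetical helper name:

```shell
# join6: hypothetical helper; joins every 6 input lines into one output line
join6() {
  awk '{ sub(/\r$/, "") }                  # drop a trailing CR, if any
       NR % 6 { printf "%s ", $0; next }   # lines 1-5 of a set: no newline yet
       { print }'                          # 6th line of a set: end the row
}

# Illustrative input with Windows line endings
printf '11\r\n12\r\n13\r\n14\r\n15\r\n16\r\n' | join6
```

The same function works unchanged on files that already have UNIX line endings, because the sub call does nothing when there is no trailing carriage return.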

Get the average of the selected cells line by line in a file?

I have a single file with multiple columns. I want to select a few of them, compute values such as an average from the selected cells in each line, and output the results as new columns.
For example:
Month Low.temp Max.temp Pressure Wind Rain
JAN 17 36 120 5 0
FEB 10 34 110 15 3
MAR 13 30 115 25 5
APR 14 33 105 10 4
.......
How can I get the average temperature (Avg.temp) and humidity (Hum) as columns?
Avg.temp = (Low.temp+Max.temp)/2
Hum = Wind * Rain
The desired output would be:
Month Low.temp Max.temp Pressure Wind Rain Avg.temp Hum
JAN 17 36 120 5 0 26.5 0
FEB 10 34 110 15 3 22 45
MAR 13 30 115 25 5 21.5 125
APR 14 33 105 10 4 23.5 40
.......
I don't want to do it in Excel. Is there any simple shell command to do this?
I would use awk like this:
awk 'NR==1 {print $0, "Avg.temp", "Hum"; next} {$(NF+1)=($2+$3)/2; $(NF+1)=$5*$6}1' file
or:
awk 'NR==1 {print $0, "Avg.temp", "Hum"; next} {print $0, ($2+$3)/2, $5*$6}' file
Both versions do the calculations and append the results to the original fields.
Let's see it in action, piping to column -t for a nice output:
$ awk 'NR==1 {print $0, "Avg.temp", "Hum"; next} {$(NF+1)=($2+$3)/2; $(NF+1)=$5*$6}1' file | column -t
Month Low.temp Max.temp Pressure Wind Rain Avg.temp Hum
JAN 17 36 120 5 0 26.5 0
FEB 10 34 110 15 3 22 45
MAR 13 30 115 25 5 21.5 125
APR 14 33 105 10 4 23.5 40
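With print, awk writes whole numbers without decimals and other values in its default number format, which is why 22 and 21.5 appear side by side above. If a fixed number of decimals is wanted, printf can be used instead. A minimal sketch, assuming the same six-column layout; add_columns is a hypothetical helper name:

```shell
# add_columns: hypothetical helper; appends Avg.temp and Hum to each data row
add_columns() {
  awk 'NR == 1 { print $0, "Avg.temp", "Hum"; next }   # header row
       { printf "%s %.1f %d\n", $0, ($2 + $3) / 2, $5 * $6 }'
}

# Illustrative input taken from the first rows of the question
printf 'Month Low.temp Max.temp Pressure Wind Rain\nJAN 17 36 120 5 0\nFEB 10 34 110 15 3\n' | add_columns
```

Here the average is always printed with one decimal place, so FEB shows 22.0 rather than 22.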

Using bash to read elements on a diagonal on a matrix and redirecting it to another file

So far I have written the code shown below. It works and does what it is supposed to do when I echo the variables:
a=`awk 'NR==2 {print $1}' $coor`
b=`awk 'NR==3 {print $2}' $coor`
c=`awk 'NR==4 {print $3}' $coor`
....but I have to do this for many more lines and I want a more general expression, so I have attempted to create the loop shown below. Syntax-wise I don't think anything is wrong with the code, but it does not output anything to the file "Cmain".
I was wondering if anyone could help me; I'm fairly new to scripting.
If it helps, I can also post what I am trying to read.
for (( i=1; i <= 4 ; i++ )); do
for (( j=0; j <= 3 ; j++ )); do
B="`grep -n "cell" "$coor" | awk 'NR=="$i" {print $j}'`"
done
done
echo "$B" >> Cmain
You can replace your lines of awk with this one:
awk '{ for (i=1; i<=NF; i++) if (NR >= 2 && NR == i) print $(i - 1) }' file.txt
Tested input:
1 2 3 4 5 6 7 8 9 10
11 12 13 14 15 16 17 18 19 20
21 22 23 24 25 26 27 28 29 30
31 32 33 34 35 36 37 38 39 40
41 42 43 44 45 46 47 48 49 50
51 52 53 54 55 56 57 58 59 60
61 62 63 64 65 66 67 68 69 70
71 72 73 74 75 76 77 78 79 80
Output:
11
22
33
44
55
66
77
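Since the wanted field number depends only on the line number, the loop over the fields can probably be dropped. A minimal sketch, assuming space-separated input on stdin; diagonal is a hypothetical helper name, and unlike the loop version it also prints the last column when a line reaches it:

```shell
# diagonal: hypothetical helper; from line 2 onward, prints field (line number - 1)
diagonal() {
  awk 'NR >= 2 && NR - 1 <= NF { print $(NR - 1) }'
}

# Illustrative 3x3 input
printf '1 2 3\n4 5 6\n7 8 9\n' | diagonal
```

The NR - 1 <= NF guard keeps awk from printing empty lines once the line number runs past the number of columns.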
awk 'BEGIN {f=1} {print $f; f=f+1}' infile > outfile
An alternative using sed and coreutils, assuming space separated input is in infile:
n=$(wc -l < infile)
for i in $(seq 1 $n); do
sed -n "${i} {p; q}" infile | cut -d' ' -f$i
done

using sort command in shell scripting

I execute the following code:
for i in {1..12}; do printf "%s %s\n" "${edate1[$i]}" "${etime1[$i]}"; done
(I retrieve the values of edate1 and etime1 from my database and store them in arrays, which works fine.)
I receive the output:
97 16
97 16
97 12
107 16
97 16
97 16
97 16
97 16
97 16
97 16
97 16
100 15
I need to sort the first column using the sort command.
Expected output:
107 16
100 16
97 12
97 16
97 16
97 16
97 16
97 16
97 16
97 16
97 16
97 15
This is what I did to find your solution:
Copy your original input to in.txt
Run this code, which uses awk, sort, and paste.
awk '{print $1}' in.txt | sort -g -r -s > tmp.txt
paste tmp.txt in.txt | awk '{print $1 " " $3}' > out.txt
Then out.txt matches the expected output in your original post.
To see how it works, look at this:
$ paste tmp.txt in.txt
107 97 16
100 97 16
97 97 12
97 107 16
97 97 16
97 97 16
97 97 16
97 97 16
97 97 16
97 97 16
97 97 16
97 100 15
So you're getting the first column sorted, then the original columns in place.
Awk makes it easy to print out the columns (fields) you're interested in, i.e., the first and third.
A simpler way is to pipe your output straight into sort, numerically and in reverse order on the first column:
<OUTPUT> | sort -nrk1
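Note that -k1 makes the sort key run from the first field to the end of the line, and that lines with equal keys are then ordered by a final whole-line comparison. If only the first column should decide the order and ties should stay in input order, a sketch like the following may be closer. It assumes a sort that supports -s (GNU and BSD sort do), and sort_by_first is a hypothetical helper name:

```shell
# sort_by_first: hypothetical helper; numeric, descending, first field only.
# -s keeps lines with equal keys in their input order (not in POSIX sort).
sort_by_first() {
  sort -s -k1,1nr
}

# Illustrative input based on the values in the question
printf '97 16\n97 12\n107 16\n100 15\n' | sort_by_first
```

With this input the two lines starting with 97 keep their original relative order, 97 16 before 97 12.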
Refer to the sort manual page (man sort) to learn more about what sort can do.
