Perform arithmetic operations on all "cells" in tab-delimited file - shell

I have a tab delimited file of n by m (where n is number of rows and m is number of columns).
I want to perform a mathematical operation on values present in the file (say adding 5 to value present in each column and then dividing it by 12).
Is there a one-line command (regex, awk, or a mix of tools) that does this? Any help is appreciated.
Thank you in advance.

awk '{
# add all numbers on a line
tot=0
for (i=1;i<=NF;i++) tot+=$i
# print detail
print "LineNo=" NR "\ttot=" tot "\tavg=" tot/12 "\tdata=" $0
gTot+=tot
}
END {
print "Number of Lines =" NR "\n" \
"GrandTotal=\t" gTot
}
' yourFile
You'll want to work through this excellent awk tutorial to really understand what is happening.
I hope this helps.
P.S. as you appear to be a new user, if you get an answer that helps you please remember to mark it as accepted, and/or give it a + (or -) as a useful answer. Note that you can 'accept' only one answer (with a check mark) and you can vote for up to 30 answers each day.

Example using awk:
gawk '{for (i = 1; i <= NF; i += 1) {printf "%f\t", ($i + 5) / 12;} printf "\n"}'
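A minimal sketch of the same idea that avoids the trailing tab: assigning to each field makes awk rebuild the line with OFS, so the tab-delimited layout is kept. The sample values and the variable names cells_file and cells_out are made up for illustration; this is one possible variant, not the answerer's command.

```shell
# Build a small tab-delimited sample (hypothetical data).
cells_file=$(mktemp)
printf '7\t19\n31\t-5\n' > "$cells_file"

# Replace every field with (value + 5) / 12. Assigning to $i makes awk
# rebuild the record using OFS, so fields stay tab-separated.
cells_out=$(awk 'BEGIN { FS = OFS = "\t" }
{
    for (i = 1; i <= NF; i++) $i = ($i + 5) / 12
    print
}' "$cells_file")

printf '%s\n' "$cells_out"
rm -f "$cells_file"
```

With this sample every result is a whole number, so the output is easy to check by hand; non-integer results are printed using awk's default number formatting.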

Try sed or awk (awk is very good); they were designed for this kind of task.

Related

How $0 is used in awk, how it works?

read n
awk '
BEGIN {sum=0;}{if( $0%2==0 ){sum+=$0;
}
}
END { print sum}'
Here I sum the even numbers. What I want is to first give a count as input, followed by the numbers to be checked for evenness and added.
eg)
3
6
7
8
output is : 14
Here 3 is the count, followed by the numbers I want to check. The code runs and the output is correct, but I want to know how $0 skipped the count value (3) and processed only the remaining numbers.
Please update your question to be meaningful: There is no relationship between $0 and the Unix operating system, as choroba already pointed out in his comment. You obviously want to know the meaning of $0 in the awk programming language. From the awk man-page in the section about Fields:
$0 is the whole record, including leading and trailing whitespace.
You're reading the count but not using it in the script.
A rewrite could be:
$ awk 'NR==1 {n=$1; next}   # read the count from the first line, then move on to the next line
!($1%2) {sum+=$1}           # add up even numbers
NR>n {print sum; exit}' file   # done once the line number passes the count
In awk, $0 is the whole record (here, the line), and $i is field i, for i=1,2,3...
An even number is one that leaves remainder 0 when divided by 2. NR is the current line number.
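To see why the count never reaches awk in the original script, here is a small sketch. The shell's read takes the first line off standard input, and awk, which shares the same input, only receives what is left. The function name sum_even_after_count is hypothetical, and the input is the sample from the question.

```shell
# Hypothetical helper: read consumes the first line (the count),
# so the awk that follows only sees the remaining lines.
sum_even_after_count() {
    read -r n
    awk '$0 % 2 == 0 { sum += $0 } END { print sum + 0 }'
}

even_sum=$(printf '3\n6\n7\n8\n' | sum_even_after_count)
printf 'sum of even numbers: %s\n' "$even_sum"
```

Here $0 is each remaining line in turn (6, 7, 8), which is why the count is never added to the sum.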

bash command for group by count

I have a file in the following format
abc|1
def|2
abc|8
def|3
abc|5
xyz|3
I need to group by these words in the first column and sum the value of the second column. For instance, the output of this file should be
abc|14
def|5
xyz|3
Explanation: the corresponding values for word "abc" are 1, 8, and 5. By adding these numbers, the sum comes out to be 14 and the output becomes "abc|14". Similarly, for word "def", the corresponding values are 2 and 3. Summing up these, the final output comes out to be "def|5".
Thank you very much for the help :)
I tried the following command
awk -F "|" '{arr[$1]+=$2} END {for (i in arr) {print i"|"arr[i]}}' filename
another command which I found was
awk -F "," 'BEGIN { FS=OFS=SUBSEP=","}{arr[$1]+=$2 }END {for (i in arr) print i,arr[i]}' filename
Both didn't show me the intended results. Although I'm also in doubt of the working of these commands as well.
Short GNU datamash solution:
datamash -s -t\| -g1 sum 2 < filename
The output:
abc|14
def|5
xyz|3
-t\| - field separator
-g1 - group by the 1st column
sum 2 - sum up values of the 2nd column
I will just add an answer to fix the sorting issue you had. With your Awk logic, you don't need to pipe the output of Awk to sort/uniq; you can do the ordering in Awk itself.
Referring to the GNU Awk manual section "Using Predefined Array Scanning Orders with gawk", you can use the PROCINFO["sorted_in"] variable (gawk-specific) to control the order in which Awk traverses the array for your final output.
Referring to the section below,
@ind_str_asc
Order by indices in ascending order compared as strings; this is the most basic sort. (Internally, array indices are always strings, so with a[2*5] = 1 the index is "10" rather than numeric 10.)
So, to use this for your requirement, in the END clause just do:
END{PROCINFO["sorted_in"]="@ind_str_asc"; for (i in unique) print i,unique[i]}
with your full command being,
awk '
BEGIN{FS=OFS="|"}{
unique[$1]+=$2;
next
}
END{
PROCINFO["sorted_in"]="@ind_str_asc";
for (i in unique)
print i,unique[i]
}' file
awk -F\| '{ arry[$1]+=$2 } END { n=asorti(arry,arry2); for (i=1;i<=n;i++) { print arry2[i]"|"arry[arry2[i]] } }' filename
Your initial solution should work apart from the issue with sorting. Use the asorti function (gawk-specific) to sort the indices of arry into arry2, then loop over arry2 by numeric position so the sorted order is kept.
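If gawk is not available, a portable sketch is to sum in awk and let sort order the result. This assumes the pipe-separated sample from the question; the names group_file and group_sums are made up for illustration.

```shell
# Recreate the sample input from the question in a temporary file.
group_file=$(mktemp)
printf 'abc|1\ndef|2\nabc|8\ndef|3\nabc|5\nxyz|3\n' > "$group_file"

# Sum column 2 per key in column 1, then sort the keys outside awk,
# because plain "for (key in total)" has no guaranteed order.
group_sums=$(awk -F'|' '
    { total[$1] += $2 }
    END { for (key in total) print key "|" total[key] }
' "$group_file" | sort)

printf '%s\n' "$group_sums"
rm -f "$group_file"
```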

Average of first ten numbers of text file using bash

I have a file of two columns. The first column is dates and the second contains a corresponding number. The two columns are separated by a comma. I want to take the average of the first three numbers and print it to a new file. Then do the same for the 2nd-4th numbers, then the 3rd-5th, and so on. For example:
File1
date1,1
date2,1
date3,4
date4,1
date5,7
Output file
2
2
4
Is there any way to do this using awk or some other tool?
Input
akshay@db-3325:/tmp$ cat file.txt
date1,1
date2,1
date3,4
date4,1
date5,7
akshay@db-3325:/tmp$ awk -v n=3 -v FS=, '{
x = $2;
i = NR % n;
ma += (x - q[i]) / n;
q[i] = x;
if(NR>=n)print ma;
}' file.txt
2
2
4
OR below one useful for plotting and keeping reference axis (in your case date) at center of average point
Script
akshay@db-3325:/tmp$ cat avg.awk
BEGIN {
m=int((n+1)/2)
}
{L[NR]=$2; sum+=$2}
NR>=m {d[++i]=$1}
NR>n {sum-=L[NR-n]}
NR>=n{
a[++k]=sum/n
}
END {
for (j=1; j<=k; j++)
print d[j],a[j] # remove d[j], if you just want values only
}
Output
akshay@db-3325:/tmp$ awk -v n=3 -v FS=, -v OFS=, -f avg.awk file.txt
date2,2
date3,2
date4,4
$ awk -F, '{a[NR%3]=$2} (NR>=3){print (a[0]+a[1]+a[2])/3}' file
2
2
4
A little math trick is used here: each record stores $2 in a[NR%3], so the three array elements are overwritten cyclically. The sum of a[0], a[1], and a[2] is therefore always the sum of the last 3 numbers.
updated based on the changes made due to the helpful feedback from Ed Morton
Here's a quick and dirty script to do what you've asked for. It doesn't have much flexibility, but you can easily figure out how to extend it.
To run it, save it into a file and execute it as an awk script, either with a shebang line or by calling awk -f.
// {
Numbers[NR]=$2;
if ( NR >= 3 ) {
printf("%i\n", (Numbers[NR] + Numbers[NR-1] + Numbers[NR-2])/3)
}
}
BEGIN {
FS=","
}
Explanation:
Line 1: Match all lines. "//" is an empty regular-expression match, which means "do this thing on every line".
Line 2: Use the record number (NR) as the key and store the value from column 2.
Line 3: Check whether 3 or more values have been read from the file.
Line 4: Do the maths and print the result as an integer.
BEGIN block: Change the field separator to a comma ",".

AWK array parsing issue

My two input files are pipe separated.
File 1 :
a|b|c|d|1|44
File 2 :
44|ab|cd|1
I want to store all my values of first file in array.
awk -F\| 'FNR==NR {a[$6]=$0;next}'
If I store it this way, is it possible to interpret the array? Say I want to know $3 of File 1: how can I get that from a[]?
Also will I be able to access array values if I come out of that awk?
Thanks
I'll answer the question as it is stated, but I have to wonder whether it is complete. You state that you have a second input file, but it doesn't play a role in your actual question.
1) It would probably be most sensible to store the fields individually, as in
awk -F \| '{ for(i = 1; i < NF; ++i) a[$NF,i] = $i } END { print a[44,3] }' filename
See here for details on multidimensional arrays in awk. You could also use the split function:
awk -F \| '{ a[$NF] = $0 } END { split(a[44], fields); print fields[3] }'
but I don't see the sense in it here.
2) No. At most you can print the data in a way that the surrounding shell understands and use command substitution to build a shell array from it, but POSIX shell doesn't know arrays at all, and bash only knows one-dimensional arrays. If you require that sort of functionality, you should probably use a more powerful scripting language such as Perl or Python.
If, and I'm wildly guessing here, you want to use the array built from the first file while processing the second, you don't have to quit awk for this. A common pattern is
awk -F \| 'FNR == NR { for(i = 1; i < NF; ++i) { a[$NF,i] = $i }; next } { code for the second file here }' file1 file2
Here FNR == NR is a condition that is only true when the first file is processed (the number of the record in the current file is the same as the number of the record overall; this is only true in the first file).
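A concrete sketch of that two-file pattern, using the sample lines from the question. It assumes, as a guess, that the first field of File 2 is the key that matches the last field of File 1; the names join_dir and joined are made up for illustration.

```shell
# Recreate the two sample files from the question.
join_dir=$(mktemp -d)
printf 'a|b|c|d|1|44\n' > "$join_dir/file1"
printf '44|ab|cd|1\n' > "$join_dir/file2"

# First file: store each field under (key, position), keyed by the last field.
# Second file: look up field 3 of the file1 record whose key equals $1.
joined=$(awk -F'|' '
    FNR == NR { for (i = 1; i < NF; ++i) a[$NF, i] = $i; next }
    ($1, 3) in a { print $1, a[$1, 3] }
' "$join_dir/file1" "$join_dir/file2")

printf '%s\n' "$joined"
rm -rf "$join_dir"
```

The `(key, position) in a` test avoids creating empty array entries for keys that only appear in the second file.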
To keep it simple, you can reach your goal of storing (and accessing) values in array without using awk:
arr=($(tr "|" " " < yourFilename)) # store in an array named arr
# accessing individual elements
echo "${arr[0]}"
echo "${arr[4]}"
# ...or accessing all elements
for n in "${arr[@]}"
do
echo "$n"
done
...even though I wonder if that's what you are looking for. The initial question is not really clear.

Hi, trying to obtain the mean from the array values using awk?

I'm new to bash programming. Here I'm trying to obtain the mean of the array values.
Here's what I'm trying:
${GfieldList[@]} | awk '{ sum += $1; n++ } END { if (n > 0) print "mean: " sum / n; }';
Using $1 I'm not able to get all the values. Please help me out with this.
For each non-empty line of input, this will sum everything on the line and print the mean:
$ echo 21 20 22 | awk 'NF {sum=0;for (i=1;i<=NF;i++)sum+=$i; print "mean=" sum / NF; }'
mean=21
How it works
NF
This serves as a condition: the statements which follow will only be executed if the number of fields on this line, NF, evaluates to true, meaning non-zero.
sum=0
This initializes sum to zero. This is only needed if there is more than one line.
for (i=1;i<=NF;i++)sum+=$i
This sums all the fields on this line.
print "mean=" sum / NF
This prints the sum of the fields divided by the number of fields.
The bare
${GfieldList[@]}
will not print the array to the screen. You want this:
printf "%s\n" "${GfieldList[@]}"
All those quotes are definitely needed.
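Putting the two pieces together, a sketch of the full pipeline might look like this. It requires bash, since arrays are not part of POSIX sh. GfieldList is the array name from the question, and the values are made up for illustration.

```shell
# Requires bash. Sample values are hypothetical.
GfieldList=(21 20 22)

# printf emits one array element per line, so each value arrives in awk as $1.
mean=$(printf '%s\n' "${GfieldList[@]}" |
    awk '{ sum += $1; n++ } END { if (n > 0) print "mean: " sum / n }')

printf '%s\n' "$mean"
```

Because every element is on its own line, the original `sum += $1` logic from the question works unchanged.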
