I'm trying to create a table based on the ASCII file below. What I need is to arrange the numbers from the 2nd column into a matrix. The first and third columns of the ASCII file give the column and row positions in the new matrix. The new matrix needs to be fully populated, so missing positions must be filled with NA (or -999).
This is what I have:
$ cat infile.txt
1 68 2
1 182 3
1 797 4
2 4 1
2 70 2
2 339 3
2 1396 4
3 12 1
3 355 3
3 1854 4
4 7 1
4 85 2
4 333 3
5 9 1
5 68 2
5 182 3
5 922 4
6 10 1
6 70 2
and what I would like to have:
NA 4 12 7 9 10
68 70 NA 85 68 70
182 339 355 333 182 NA
797 1396 1854 NA 922 NA
I can only use standard UNIX commands (e.g. awk, sed, grep, etc).
So, what I have so far... I can mimic a 2D array in bash:
irows=($(awk '{print $1}' infile.txt))   # row positions
jcols=($(awk '{print $3}' infile.txt))   # column positions
values=($(awk '{print $2}' infile.txt))  # values
declare -A matrix                        # the new matrix
nrows=$(sort -k3 -n infile.txt | tail -1 | awk '{print $3}')   # number of rows
ncols=$(sort -k1 -n infile.txt | tail -1 | awk '{print $1}')   # number of columns
nelem=${#values[@]}                      # number of elements I want to pass to the new matrix
# Creating a matrix (i,j) filled with -999
for ((i=0; i<=nrows-1; i++)); do
    for ((j=0; j<=ncols-1; j++)); do
        matrix[$i,$j]=-999
    done
done
and even print it on the screen:
for ((i=0; i<=nrows-1; i++)); do
    for ((j=0; j<=ncols-1; j++)); do
        printf " %i" ${matrix[$i,$j]}
    done
    echo
done
But when I try to assign the elements, something goes wrong:
for ((i=0; i<=nelem-1; i++)); do
    matrix[${irows[$i]},${jcols[$i]}]=${values[$i]}
done
Thanks in advance for any help with this, really.
A solution in plain bash, simulating a 2D array with an associative array, could look something like this. (Note that the row and column counts are not hard-coded, and the code works with any permutation of input lines, provided each line has the format specified in the question.)
$ cat printmat
#!/bin/bash
declare -A mat
nrow=0
ncol=0
while read -r col elem row; do
    mat[$row,$col]=$elem
    if ((row > nrow)); then nrow=$row; fi
    if ((col > ncol)); then ncol=$col; fi
done
for ((row = 1; row <= nrow; ++row)); do
    for ((col = 1; col <= ncol; ++col)); do
        elem=${mat[$row,$col]}
        if [[ -z $elem ]]; then elem=NA; fi
        if ((col == ncol)); then elem+=$'\n'; else elem+=$'\t'; fi
        printf "%s" "$elem"
    done
done
Running
$ ./printmat < infile.txt
prints out:
NA 4 12 7 9 10
68 70 NA 85 68 70
182 339 355 333 182 NA
797 1396 1854 NA 922 NA
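Since the script separates the fields with tab characters, the output can be aligned further with column -t if desired:
$ ./printmat < infile.txt | column -t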
Any time you find yourself writing a loop in shell just to manipulate text, you have the wrong approach. See why-is-using-a-shell-loop-to-process-text-considered-bad-practice for many of the reasons why.
Using any awk in any shell on every UNIX box:
$ cat tst.awk
{
    vals[$3,$1] = $2
    numRows = ($3 > numRows ? $3 : numRows)
    numCols = $1
}
END {
    OFS = "\t"
    for (rowNr=1; rowNr<=numRows; rowNr++) {
        for (colNr=1; colNr<=numCols; colNr++) {
            val = ((rowNr,colNr) in vals ? vals[rowNr,colNr] : "NA")
            printf "%s%s", val, (colNr < numCols ? OFS : ORS)
        }
    }
}
$ awk -f tst.awk infile.txt
NA 4 12 7 9 10
68 70 NA 85 68 70
182 339 355 333 182 NA
797 1396 1854 NA 922 NA
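If -999 is preferred over NA, which the question allows, only the fallback string in the ternary needs to change, e.g.:
val = ((rowNr,colNr) in vals ? vals[rowNr,colNr] : "-999")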
Here is one way to get you started. Note that this is not intended to be "the" answer, but to encourage you to try to learn the toolkit.
$ join -a1 -e NA -o2.2 <(printf "%s\n" {1..4}"_"{1..6}) \
<(awk '{print $3"_"$1,$2}' file | sort -n) |
pr -6at
NA 4 12 7 9 10
68 70 NA 85 68 70
182 339 355 333 182 NA
797 1396 1854 NA 922 NA
This works; however, the row and column counts are hard-coded, which is not the proper way to do it. The preferred solution would be to fill an awk 2D array with the data and print it in matrix form at the end, as the awk answer above does.
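That said, if one wanted to keep the join/pr approach, here is a rough sketch (not part of any answer above, assuming the same infile.txt) that derives the dimensions from the file instead of hard-coding them:
nrow=$(awk '$3 > m { m = $3 } END { print m }' infile.txt)   # max of column 3 = number of rows
ncol=$(awk '$1 > m { m = $1 } END { print m }' infile.txt)   # max of column 1 = number of columns
join -a1 -e NA -o 2.2 \
     <(awk -v nr="$nrow" -v nc="$ncol" 'BEGIN { for (r = 1; r <= nr; r++) for (c = 1; c <= nc; c++) print r "_" c }') \
     <(awk '{print $3 "_" $1, $2}' infile.txt | sort -n) |
pr -"$ncol"at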
Related
This is an update to a question I posted before. I've gotten a little farther into this but need help with a new problem.
I'm working on a shell script right now. I need to loop through a text file, grab the text from it, and find the average, max, and min from each line of numbers, then print them in a chart with the name of each line. This is the text file:
Experiment1 9 8 1 2 9 0 2 3 4 5
collect1 83 39 84 2 1 3 0 9
jump1 82 -1 9 26 8 9
exp2 22 0 7 1 0 7 3 2
jump2 88 7 6 5
taker1 5 5 44 2 3
This is my code so far. It should be working, but it won't do any of the calculations. The first loop grabs the line of text, the second loop separates the name from the numbers; these two work. The third loop takes the numbers and does the calculations. It keeps giving me an error saying "expr: non-integer argument". Why is it doing that? I shouldn't be getting it.
#!/bin/bash
while read line
do
    echo $line | while read first second
    do
        echo $first
        echo $second
        sum=0
        max=0
        min=0
        len=0
        for arg in $second
        do
            sum=`expr $sum + $arg`
            if [ $min > $arg ]
            then
                set min=$arg
            fi
            if [ $max < $arg ]
            then
                set max=$arg
            fi
            len=`expr $len + 1`
        done
        avg=`expr $sum / $len`
        echo $avg
        echo $min
        echo $max
    done
done < mystats.txt
This is the desired output when you type "bash statcalc.sh -s name mystats.txt"
Experiment Name Average Max Min
collect1 27 84 0
exp2 5 22 0
Experiment1 3 9 0
jump1 21 82 -1
jump2 31 88 5
taker1 13 44 2
Using awk:
awk '{if (NR==1) print "Experiment Name Average Max Min"; max=$2; min=$2; for(i=2;i<=NF;i++) {sum[$1]+=$i; if ($i>max) max=$i; if ($i<min) min=$i} print $1, int(sum[$1]/(NF-1)), max, min}' file.txt
Demo:
$ awk '{if (NR==1) print "Experiment Name Average Max Min"; max=$2; min=$2; for(i=2;i<=NF;i++) {sum[$1]+=$i; if ($i>max) max=$i; if ($i<min) min=$i} print $1, int(sum[$1]/(NF-1)), max, min}' file.txt | column -t
Experiment Name Average Max Min
Experiment1 4 9 0
collect1 27 84 0
jump1 22 82 -1
exp2 5 22 0
jump2 26 88 5
taker1 11 44 2
$ cat file.txt
Experiment1 9 8 1 2 9 0 2 3 4 5
collect1 83 39 84 2 1 3 0 9
jump1 82 -1 9 26 8 9
exp2 22 0 7 1 0 7 3 2
jump2 88 7 6 5
taker1 5 5 44 2 3
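For readability, the same logic can also be written as a multi-line script. This is just a sketch of the one-liner above, with the rows additionally sorted by experiment name (case-insensitively, as in the desired output):
awk '{
    sum = 0; max = $2; min = $2
    for (i = 2; i <= NF; i++) {
        sum += $i
        if ($i > max) max = $i
        if ($i < min) min = $i
    }
    print $1, int(sum / (NF - 1)), max, min
}' file.txt | sort -f | { echo "Experiment Name Average Max Min"; cat; } | column -t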
I have a data.txt file; its columns are numbered 1 to 6:
cat data.txt
17 245 1323 17.7777 10.2222 61.1111
19 232 1232 19.9999 19.9999 68.8888
13 133 1233 13.3333 13.3333 63.3333
17 177 1678 17.7777 17.7777 69.9999
12 122 2325 12.2222 11.333 64.4444
18 245 1323 18.8888 12.4444 68.8888
12 222 1222 12.2222 19.9999 61.1111
14 245 1323 14.4444 13.5555 68.8888
I would like to find all the values in column 4, sequentially from 12.2222 to 18.8888.
Answer:
echo ${minValsCol4[#]}
12.2222 13.3333 14.4444 17.7777 18.8888
And the values in column 6, sequentially from 63.3333 to 68.8888.
Answer:
echo ${minValsCol6[#]}
63.3333 64.4444 68.8888
Any solution in awk?
Thanks.
Using awk and sort -nu:
awk -v col=4 -v start=12.2222 -v end=18.8888 '$col>=start && $col<=end {print $col}' data.txt | sort -nu
12.2222
13.3333
14.4444
17.7777
18.8888
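If the values are also needed in bash arrays, as the minValsCol4/minValsCol6 names in the question suggest, one option (a sketch, assuming bash 4+ for mapfile) is to capture the same pipeline and rerun it with the parameters for column 6:
# assumes bash 4+ for mapfile; the awk command mirrors the one above
mapfile -t minValsCol4 < <(awk -v col=4 -v start=12.2222 -v end=18.8888 \
    '$col>=start && $col<=end {print $col}' data.txt | sort -nu)
mapfile -t minValsCol6 < <(awk -v col=6 -v start=63.3333 -v end=68.8888 \
    '$col>=start && $col<=end {print $col}' data.txt | sort -nu)
echo "${minValsCol4[@]}"   # 12.2222 13.3333 14.4444 17.7777 18.8888
echo "${minValsCol6[@]}"   # 63.3333 64.4444 68.8888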
These are my two input files:
file1.txt
1 34
2 55
3 44
6 77
file2.txt
1 12
2 7
5 32
And I wish my output to be:
1 34 12
2 55 0
3 44 0
5 0 32
6 77 0
I need to do this in awk, and although I was able to merge the files, I do not know how to do it without losing info...
awk -F"\t" 'NR==FNR {h[$1] = $2; next }{print $1,$2,h[$2]}' file1.txt file2.txt > try.txt
awk '{ if ($3 !="") print $1,$2,$3; else print $1,$2,"0";}' try.txt > output.txt
And the output I get is:
1 34 12
2 55 7
3 44 0
6 77 0
Sorry, I know this must be very easy, but I am quite new in this world! Please I need help!!! Thanks in advance!!
This command gives you the desired output:
awk 'NR==FNR {a[$1]=$2; next}
     {if ($1 in a) {print $0, a[$1]; delete a[$1]}
      else print $0, "0"}
     END {for (x in a) print x, "0", a[x]}' file2 file1 | sort -n | column -t
Note that I used sort and column to sort and format the output.
Output (note: I guess the 2 55 0 was a typo in your expected output):
1 34 12
2 55 7
3 44 0
5 0 32
6 77 0
Here is another way using join and awk:
join -a1 -a2 -o 1.1,2.1,1.2,2.2 -e 0 file1 file2 | awk '{print ($1?$1:$2),$3,$4}' OFS='\t'
1 34 12
2 55 7
3 44 0
5 0 32
6 77 0
The -a switch includes unpairable lines from the given file.
-o builds our output format.
-e specifies what should be printed for fields that do not exist.
awk just completes the final formatting.
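One caveat: join expects both input files to be sorted on the join field. The sample files here already are; if yours are not, they could be sorted on the fly, for example:
join -a1 -a2 -o 1.1,2.1,1.2,2.2 -e 0 <(sort -k1,1 file1) <(sort -k1,1 file2) |
awk '{print ($1?$1:$2),$3,$4}' OFS='\t'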
I wish you all a very happy New Year.
I have a file that looks like this (example). There is no header, and the file has about 10000 such rows:
123 345 676 58 1
464 222 0 0 1
555 22 888 555 1
777 333 676 0 1
555 444 0 58 1
PROBLEM: I only want those rows where both fields 3 and 4 have a non-zero value, i.e. in the above example rows 1 and 3 should be included and the rest excluded. How can I do this?
The output should look like this:
123 345 676 58 1
555 22 888 555 1
Thanks.
awk is perfect for this kind of stuff:
awk '$3 && $4' input.txt
This will give you the output that you want.
$3 && $4 is a filter. $3 is the value of the 3rd field, $4 is the value of the 4th. Zero values are evaluated as false, anything else as true. If there can be negative values, then you need to write it more precisely:
awk '$3 > 0 && $4 > 0' input.txt
I have a text file with data in the following format.
1 0 0
2 512 6
3 992 12
4 1536 18
5 2016 24
6 2560 29
7 3040 35
8 3552 41
9 4064 47
10 4576 53
11 5088 59
12 5600 65
13 6080 71
14 6592 77
15 7104 83
I want to print all the lines where $1 > 1000.
awk 'BEGIN {$1 > 1000} {print " " $1 " "$2 " "$3}' graph_data_tmp.txt
This doesn't seem to give the output that I am expecting. What am I doing wrong?
You can do this:
awk '$1>1000 {print $0}' graph_data_tmp.txt
print $0 prints the whole content of the line. The problem with your version is that the condition sits inside a BEGIN block, which runs once before any input is read, so it never filters anything and the second block then prints every line.
If you instead want to print the lines after the 1000th line/row, you can do the same thing by replacing $1 with NR; NR holds the current record (row) number.
awk 'NR>1000 {print $0}' graph_data_tmp.txt
All you need is:
awk '$1>1000' file
A condition with no action uses awk's default action, which is to print the line.