printing selected rows from a file using awk - bash

I have a text file with data in the following format.
1 0 0
2 512 6
3 992 12
4 1536 18
5 2016 24
6 2560 29
7 3040 35
8 3552 41
9 4064 47
10 4576 53
11 5088 59
12 5600 65
13 6080 71
14 6592 77
15 7104 83
I want to print all the lines where $1 > 1000.
awk 'BEGIN {$1 > 1000} {print " " $1 " "$2 " "$3}' graph_data_tmp.txt
This doesn't seem to give the output that I am expecting. What am I doing wrong?

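A BEGIN block runs once, before any input is read, so the comparison inside it never filters anything. Use the condition as the pattern instead: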
awk '$1>1000 {print $0}' graph_data_tmp.txt
print $0 prints the entire line.
If you want to print the lines after the 1000th line (row) instead, replace $1 with NR. NR holds the number of the current record, i.e. the line number.
awk 'NR>1000 {print $0}' graph_data_tmp.txt

All you need is:
awk '$1>1000' file
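As a quick check of the pattern-only form, here is a minimal sketch on a few sample rows. It assumes the intended test is on the second column, since the first column in the sample never exceeds 15; the variable names data and result are made up for illustration.

```shell
# Hypothetical sample: a few rows in the same three-column layout.
data='1 0 0
2 512 6
3 992 12
4 1536 18
5 2016 24'

# The condition is the pattern; with no action, awk prints matching lines.
# Assumption: the filter is meant for column 2, not column 1.
result=$(printf '%s\n' "$data" | awk '$2 > 1000')
printf '%s\n' "$result"
```

Only the rows whose second field exceeds 1000 should come out, unchanged.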

Related

Reshape table and complete voids with NA (or -999) using bash

I'm trying to create a table based on the ASCII below. What I need is to arrange the numbers from the 2nd column in a matrix. The first and third columns of the ASCII give the columns and rows in the new matrix. The new matrix needs to be fully populated, so missing positions in the new table must be filled with NA (or -999).
This is what I have
$ cat infile.txt
1 68 2
1 182 3
1 797 4
2 4 1
2 70 2
2 339 3
2 1396 4
3 12 1
3 355 3
3 1854 4
4 7 1
4 85 2
4 333 3
5 9 1
5 68 2
5 182 3
5 922 4
6 10 1
6 70 2
and what I would like to have:
NA 4 12 7 9 10
68 70 NA 85 68 70
182 339 355 333 182 NA
797 1396 1854 NA 922 NA
I can only use standard UNIX commands (e.g. awk, sed, grep, etc).
Here is what I have so far.
I can mimic a 2d array in bash
irows=(`awk '{print $1 }' infile.txt`) # rows positions
jcols=(`awk '{print $3 }' infile.txt`) # columns positions
values=(`awk '{print $2 }' infile.txt`) # values
declare -A matrix # the new matrix
nrows=(`sort -k3 -n infile.txt | tail -1 | awk '{print $3}'`) # numbers of rows
ncols=(`sort -k1 -n infile.txt | tail -1 | awk '{print $1}'`) # numbers of columns
nelem=(`echo "${#values[@]}"`) # number of elements I want to pass to the new matrix
# Creating a matrix (i,j) with -999
for ((i=0;i<=$((nrows-1));i++)) do
for ((j=0;j<=$((ncols-1));j++)) do
matrix[$i,$j]=-999
done
done
and even print on the screen
for ((i=0;i<=$((nrows-1));i++)) do
for ((j=0;j<=$((ncols-1));j++)) do
printf " %i" ${matrix[$i,$j]}
done
echo
done
But when I try to assign the elements, something goes wrong:
for ((i=0;i<=$((nelem-1));i++)) do
matrix[${irows[$i]},${jcols[$i]}]=${values[$i]}
done
Thanks in advance for any help with this, really.
A solution in plain bash that simulates a 2D array with an associative array could look like this. Note that the row and column counts are not hard-coded, and the code works with any permutation of the input lines, provided that each line has the format specified in the question:
$ cat printmat
#!/bin/bash
declare -A mat
nrow=0
ncol=0
while read -r col elem row; do
mat[$row,$col]=$elem
if ((row > nrow)); then nrow=$row; fi
if ((col > ncol)); then ncol=$col; fi
done
for ((row = 1; row <= nrow; ++row)); do
for ((col = 1; col <= ncol; ++col)); do
elem=${mat[$row,$col]}
if [[ -z $elem ]]; then elem=NA; fi
if ((col == ncol)); then elem+=$'\n'; else elem+=$'\t'; fi
printf "%s" "$elem"
done
done
Running ./printmat < infile.txt prints:
NA 4 12 7 9 10
68 70 NA 85 68 70
182 339 355 333 182 NA
797 1396 1854 NA 922 NA
Any time you find yourself writing a loop in shell just to manipulate text, you have the wrong approach. See why-is-using-a-shell-loop-to-process-text-considered-bad-practice for many of the reasons why.
Using any awk in any shell on every UNIX box:
$ cat tst.awk
{
vals[$3,$1] = $2
numRows = ($3 > numRows ? $3 : numRows)
numCols = ($1 > numCols ? $1 : numCols)
}
END {
OFS = "\t"
for (rowNr=1; rowNr<=numRows; rowNr++) {
for (colNr=1; colNr<=numCols; colNr++) {
val = ((rowNr,colNr) in vals ? vals[rowNr,colNr] : "NA")
printf "%s%s", val, (colNr < numCols ? OFS : ORS)
}
}
}
$ awk -f tst.awk infile.txt
NA 4 12 7 9 10
68 70 NA 85 68 70
182 339 355 333 182 NA
797 1396 1854 NA 922 NA
Here is one way to get you started. Note that this is not intended to be "the" answer but to encourage you to learn the toolkit.
$ join -a1 -e NA -o2.2 <(printf "%s\n" {1..4}"_"{1..6}) \
<(awk '{print $3"_"$1,$2}' file | sort -n) |
pr -6at
NA 4 12 7 9 10
68 70 NA 85 68 70
182 339 355 333 182 NA
797 1396 1854 NA 922 NA
This works; however, the row and column counts are hard-coded, which is not the proper way to do it.
The preferred solution would fill an awk 2D array with the data and print it in matrix form at the end.
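Since the question allows either NA or -999 for the missing cells, the awk 2D-array idea can be sketched with the placeholder passed in as a variable. This is only an illustration under assumptions: the variable name fill, the shell variables input and matrix, and the small unsorted sample are all made up here, and the input is taken to be in "column value row" order as in the question.

```shell
# Hypothetical input in "col value row" order, deliberately unsorted.
input='2 70 2
1 68 2
3 12 1
2 4 1'

# fill is an assumed variable name; pass -v fill=-999 for a numeric marker.
matrix=$(printf '%s\n' "$input" | awk -v fill=NA '
{
    vals[$3,$1] = $2
    if ($3 > numRows) numRows = $3   # track the largest row index seen
    if ($1 > numCols) numCols = $1   # track the largest column index seen
}
END {
    for (r = 1; r <= numRows; r++)
        for (c = 1; c <= numCols; c++)
            printf "%s%s", ((r,c) in vals ? vals[r,c] : fill), (c < numCols ? "\t" : "\n")
}')
printf '%s\n' "$matrix"
```

Because both maxima are tracked explicitly, the order of the input lines does not matter.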

Count occurrences in a text line

Is there any way to count how often a value occurs in a line? My input is a tab-delimited .txt file. It looks something like this (but with thousands of lines):
#N/A 14 13 #N/A 15 13 #N/A 14 13 13 15 14 13 15 14 14 15
24 26 #N/A 24 22 #N/A 24 26 #N/A 24 26 24 22 24 22 24 26
45 43 45 43 #N/A #N/A #N/A 43 45 45 43 #N/A 47 45 45 43
I would like an output like this or similar.
#N/A(3) 14 13(3) 15 13(1) 13 15(1) 15 14(1) 14 15 (1)
24 26(4) #N/A(3) 24 22(3)
45 45(4) #N/A(4) 43 45(1) 47 45(1)
Perl solution:
perl -laF'/\t/' -ne '
chomp; my %h;
$h{$_}++ for @F;
print join "\t", map "$_ ($h{$_})", keys %h
' < input
-a splits each line on -F (\t means tab) into the @F array
-l adds newlines to prints
-n reads the input line by line
chomp removes the final newline
%h is a hash table, the keys are the members of @F, the values are the counts
awk to the rescue!
$ awk -F' +' -v OFS=' ' '{for(i=1;i<=NF;i++) if($i!="")a[$i]++;
for(k in a) printf "%s", k"("a[k]")" OFS; delete a; print ""}' file
#N/A(3) 14 13(3) 13 15(1) 15 13(1) 14 15(1) 15 14(1)
#N/A(3) 24 22(3) 24 26(4)
#N/A(4) 43 45(1) 45 43(4) 47 45(1)
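The order of "for (k in a)" is unspecified in awk, so the counts above can come out in any order. If a stable order matters, one option is to record each value the first time it appears. The sketch below assumes the fields really are tab-separated (so "14 13" is a single field); the sample line and the names line, counts, seen and order are made up for illustration.

```shell
# Hypothetical tab-delimited line; "14 13" is one field containing a space.
line=$(printf '#N/A\t14 13\t#N/A\t15 13\t14 13')

counts=$(printf '%s\n' "$line" | awk -F'\t' '
{
    n = 0
    split("", seen)                          # portable way to clear an array
    for (i = 1; i <= NF; i++) {
        if (!($i in seen)) order[++n] = $i   # remember first-seen order
        seen[$i]++
    }
    out = ""
    for (j = 1; j <= n; j++)
        out = out (j > 1 ? "\t" : "") order[j] "(" seen[order[j]] ")"
    print out
}')
printf '%s\n' "$counts"
```

Each distinct field is printed once, in order of first appearance, followed by its count in parentheses.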

what's wrong in this awk print statement?

I have a file test.txt, shown below. Each line contains one value, and the values form repeating sequences of 6 in the order current1, voltage1, current2, voltage2, current3, voltage3.
11
12
13
14
15
16
21
22
23
24
25
26
31
32
33
34
35
36
41
42
43
44
45
46
Using awk, I want to print it in the format below (one set per line).
11 12 13 14 15 16
21 22 23 24 25 26
31 32 33 34 35 36
41 42 43 44 45 46
So I wrote a simple awk script like the one below. I run a modular counter that goes from 1 to 6 and, according to the value of cnt, I store the input value in i1, v1, i2, v2, i3, v3 respectively. When cnt is 6 (when all the values in a set have been collected), I print the values.
BEGIN{cnt=1}
cnt == 1{i1 = $0}
cnt == 2{v1 = $0}
cnt == 3{i2 = $0}
cnt == 4{v2 = $0}
cnt == 5{i3 = $0}
cnt == 6{v3 = $0}
{if (cnt==6) {cnt = 1; print i1 v1 i2 v2 i3 v3} else cnt = cnt + 1}
The result, shown below, is weird. It's been a while since I used awk, so I can't easily figure out what is wrong with the script.
awk -f div.awk test.txt
16
26
36
46
What is the problem?
Use the modulo operator. It should be:
awk 'NR%6{printf "%s ",$0}!(NR%6){print}' file
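By the way, it looks like your file uses Windows (CRLF) line endings, which causes the output you reported: each value ends in a carriage return, so the terminal jumps back to the start of the line and every value overwrites the previous one. Convert the line endings to UNIX before using awk, for example: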
sed 's/\r//' file | awk 'NR%6{printf "%s ",$0}!(NR%6){print}'
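The carriage-return cleanup can also be folded into the awk program itself, which avoids the extra sed process. The following is a minimal sketch on a made-up 6-line CRLF input; the variable name joined is only for illustration.

```shell
# Feed a hypothetical 6-line input with Windows (CRLF) line endings to awk.
joined=$(printf '11\r\n12\r\n13\r\n14\r\n15\r\n16\r\n' |
    awk '{ sub(/\r$/, "") }   # drop a trailing carriage return, if any
         { printf "%s%s", $0, (NR % 6 ? " " : "\n") }')
printf '%s\n' "$joined"
```

The second rule prints a space after every value except each sixth one, which gets a newline instead.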

awk merge two columns by key, joining values

These are my two input files:
file1.txt
1 34
2 55
3 44
6 77
file2.txt
1 12
2 7
5 32
And I wish my output to be:
1 34 12
2 55 0
3 44 0
5 0 32
6 77 0
I need to do this in awk and although I was able to merge files, I do not know how to do it without losing info...
awk -F"\t" 'NR==FNR {h[$1] = $2; next }{print $1,$2,h[$2]}' file1.txt file2.txt > try.txt
awk '{ if ($3 !="") print $1,$2,$3; else print $1,$2,"0";}' try.txt > output.txt
And the output is:
1 34 12
2 55 7
3 44 0
6 77 0
Sorry, I know this must be very easy, but I am quite new in this world! Please I need help!!! Thanks in advance!!
This command gives you the desired output:
awk 'NR==FNR{a[$1]=$2;next}
{if($1 in a){print $0,a[$1];delete a[$1]}
else print $0,"0"}
END{for(x in a)print x,"0",a[x]}' file2 file1|sort -n|column -t
Note that I used sort and column to sort and format the output.
Output (I guess the 2 55 0 was a typo in your expected output):
1 34 12
2 55 7
3 44 0
5 0 32
6 77 0
Here is another way using join and awk:
join -a1 -a2 -o1.1 2.1 1.2 2.2 -e0 file1 file2 | awk '{print ($1?$1:$2),$3,$4}' OFS='\t'
1 34 12
2 55 7
3 44 0
5 0 32
6 77 0
-a also prints the unpairable lines from the given file (here, from both files).
-o builds our output format.
-e specifies what should be printed for fields that do not exist.
awk just completes the final formatting.
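For comparison, the same full outer join can be sketched in awk alone by remembering every key and the value each file supplies for it. This is an illustration, not a replacement for the answers above: the temporary directory and the names fileNo, keys, val and merged are made up, and sort -n is still used because awk's "for (k in keys)" order is unspecified.

```shell
# Recreate the two sample files from the question in a temporary directory.
tmpdir=$(mktemp -d)
printf '1 34\n2 55\n3 44\n6 77\n' > "$tmpdir/file1.txt"
printf '1 12\n2 7\n5 32\n'        > "$tmpdir/file2.txt"

merged=$(awk '
FNR == 1 { fileNo++ }                # 1 while reading file1, 2 while reading file2
{ keys[$1]; val[fileNo, $1] = $2 }   # remember every key and its value per file
END {
    for (k in keys)
        print k, ((1,k) in val ? val[1,k] : 0), ((2,k) in val ? val[2,k] : 0)
}' "$tmpdir/file1.txt" "$tmpdir/file2.txt" | sort -n)
printf '%s\n' "$merged"
rm -r "$tmpdir"
```

Keys missing from either file get 0 in that file's column, matching the join -e0 behaviour.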

Using bash to read elements on a diagonal on a matrix and redirecting it to another file

Currently I have written the code shown below. It works and does what it is supposed to do, as I confirmed by echoing the variables:
a=`awk 'NR==2 {print $1}' $coor`
b=`awk 'NR==3 {print $2}' $coor`
c=`awk 'NR==4 {print $3}' $coor`
...but I have to do this for many more lines, and I want a more general expression. So I attempted to write the loop shown below. I don't think anything is wrong with the syntax, but it does not output anything to the file "Cmain".
I was wondering if anyone could help me; I'm kind of new to scripting.
If it helps, I can also post the file I am trying to read.
for (( i=1; i <= 4 ; i++ )); do
for (( j=0; j <= 3 ; j++ )); do
B="`grep -n "cell" "$coor" | awk 'NR=="$i" {print $j}'`"
done
done
echo "$B" >> Cmain
You can replace your lines of awk with this one:
awk '{ for (i=1; i<=NF; i++) if (NR >= 2 && NR == i) print $(i - 1) }' file.txt
Tested input:
1 2 3 4 5 6 7 8 9 10
11 12 13 14 15 16 17 18 19 20
21 22 23 24 25 26 27 28 29 30
31 32 33 34 35 36 37 38 39 40
41 42 43 44 45 46 47 48 49 50
51 52 53 54 55 56 57 58 59 60
61 62 63 64 65 66 67 68 69 70
71 72 73 74 75 76 77 78 79 80
Output:
11
22
33
44
55
66
77
awk 'BEGIN {f=1} {print $f; f=f+1}' infile > outfile
An alternative using sed and coreutils, assuming space separated input is in infile:
n=$(wc -l < infile)
for i in $(seq 1 $n); do
sed -n "${i} {p; q}" infile | cut -d' ' -f$i
done
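The diagonal can also be expressed directly in terms of NR, with the number of leading rows to skip passed in as a variable. This is a sketch under assumptions: the name skip, the shell variables grid and diag, and the small sample grid are made up; skip=1 reproduces the NR==2 -> $1, NR==3 -> $2 pattern from the question.

```shell
# Hypothetical space-separated grid; the first row is a header row to skip.
grid='1 2 3
11 12 13
21 22 23
31 32 33'

# skip is an assumed name: how many leading rows to ignore before the diagonal.
# The NF check stops once the diagonal runs past the last column.
diag=$(printf '%s\n' "$grid" |
    awk -v skip=1 'NR > skip && NR - skip <= NF { print $(NR - skip) }')
printf '%s\n' "$diag"
```

With skip=0 the same command prints the main diagonal starting from the first row.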
