I need to extract the values of the 2nd column from a file, starting at the row where $1 = 2 and stopping before the row where $1 = 3. As an example, from the file
1 | 2.158e+06
| 2.31e+06
| 5.008e+06
2 | 693000
| 718000
| 725000
3 | 2.739e+06
| 2.852e+06
| 2.865e+06
| 2.874e+06
4 | 4.033e+06
| 4.052e+06
| 4.059e+06
I would like to extract values of the 2nd column from $1=2 until $1=3
693000
718000
725000
I tried using awk, but I have only figured out how to extract the values from $1=1 until $1=2
awk -F "|" '{if ($1>1) exit; else print $2}' foo.txt
Output
2.158e+06
2.31e+06
5.008e+06
I also tried this
awk -F "|" '{i=2; do {print $2; i++} while ($4); if ($1>i) exit}' foo.txt
But it gives me the whole 2nd column
2.158e+06
2.31e+06
5.008e+06
693000
718000
725000
2.739e+06
2.852e+06
2.865e+06
2.874e+06
4.033e+06
Does anyone know how to do this using awk or another command?
Thanks
A range pattern could work nicely here. The pattern $1==2,$1==3 will start executing the action when the first column is 2 and stop when it is 3. (Since the range is inclusive we need to check that the first column is not 3 before printing the second column in this case.)
$ awk -F\| '$1==2,$1==3 { if ($1 != 3) print $2 }' foo.txt
693000
718000
725000
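To see why the guard on $1 != 3 is needed, here is a minimal sketch that runs the range pattern with and without it. The temporary file and its shortened contents are made up for illustration and only mimic the layout of foo.txt; the + 0 is added only to strip the leading space from the printed value.

```shell
# Hypothetical sample in the same layout as foo.txt.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
1 | 2.158e+06
  | 2.31e+06
2 | 693000
  | 718000
3 | 2.739e+06
  | 2.852e+06
EOF

# Without the guard, the line that closes the range (the "3" row) is printed too.
inclusive=$(awk -F'|' '$1==2,$1==3 { print $2 + 0 }' "$tmp")

# With the guard, the closing line is skipped.
exclusive=$(awk -F'|' '$1==2,$1==3 { if ($1 != 3) print $2 + 0 }' "$tmp")

printf 'inclusive:\n%s\n' "$inclusive"
printf 'exclusive:\n%s\n' "$exclusive"
rm -f "$tmp"
```

The first result has one extra line, the value from the row where the first column is 3.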
hzhang#dell-work ~ $ cat sample.csv
1 | 2.158e+06
| 2.31e+06
| 5.008e+06
2 | 693000
| 718000
| 725000
3 | 2.739e+06
| 2.852e+06
| 2.865e+06
| 2.874e+06
4 | 4.033e+06
| 4.052e+06
| 4.059e+06
hzhang#dell-work ~ $ awk -F"|" 'BEGIN{c=0}{if($1>=3){c=0} if(c==1 ||($1>=2 && $1<3)){c = 1;print $2}}' sample.csv
693000
718000
725000
I set a flag c. When $1 reaches 3 or more, the flag is reset to 0; when $1 is at least 2 and less than 3, it is set to 1. While the flag is 1, $2 is printed.
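A variation on the flag idea, sketched here as an alternative rather than as what the answer above does: remember the most recent group number seen in column 1 and print rows whose remembered number matches. The variable names key and want and the temporary file are assumptions made for this example.

```shell
# Hypothetical sample in the same layout as sample.csv.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
1 | 2.158e+06
  | 2.31e+06
  | 5.008e+06
2 | 693000
  | 718000
  | 725000
3 | 2.739e+06
  | 2.852e+06
4 | 4.033e+06
EOF

result=$(awk -F'|' -v want=2 '
    $1 ~ /[0-9]/ { key = $1 + 0 }   # row carries a group number: remember it
    key == want  { print $2 + 0 }   # rows inherit the last group number seen
' "$tmp")

printf '%s\n' "$result"
rm -f "$tmp"
```

Changing want selects a different group without touching the program. The + 0 forces numeric output, so drop it if the second column can contain non-numeric text.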
This is what I came up with:
awk -F "|" '{if ($1==3) exit} /^2/,EOF {print $2}' file
1) /^2/,EOF {print $2} prints the second column from the first row that begins with a 2 through the end of the file. EOF is not a special keyword here; it is just an unset variable, so the end condition of the range is never true.
2) {if ($1==3) exit} stops processing as soon as the first column is 3
Output
693000
718000
725000
Using the getline statement in awk:
awk -v FS=" [|] " '$1=="2"{print $2;getline;while(($1==" "||$1==2)){print $2;$0="";getline>0}}' my_file
Here is another awk
awk -F\| '/^2/ {f=1} /^3/ {f=0} f {print $2+0}' file
693000
718000
725000
-F\| sets the field separator to |.
/^2/ if the line starts with 2, set flag f to true.
/^3/ if the line starts with 3, set flag f to false.
f {print $2+0} if flag f is true, print field 2.
$2+0 forces a numeric conversion, which removes the space in front of the number. Remove it if the field contains letters.
Just so you don't have to read the entire file, exit when you see a '3':
$ awk -F\| '/^2\s+/ {f=1} /^3\s+/ {exit} f {print $2+0}' file
693000
718000
725000
Related
I have a line like this
3672975 3672978 3672979
awk '{print $1}' will return the first number 3672975
If I still want the first number, but want to refer to it as the 3rd field from the end, how should I adjust awk '{print $-3}'?
The reason is that I have hundreds of numbers, and I always want to obtain the 3rd one from the end.
Can I use awk to obtain the total number of items first, then do the subtraction?
$NF is the last field, $(NF-1) is the one before the last etc., so:
$ awk '{print $(NF-2)}'
for example:
$ echo 3672975 3672978 3672979 | awk '{print $(NF-2)}'
3672975
Edit:
$ echo 1 10 100 | awk '{print $(NF-2)}'
1
or with cut and rev
echo 1 2 3 4 | rev | cut -d' ' -f 3 | rev
2
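One caveat with $(NF-2): on a line with exactly two fields the index is 0, so the whole line is printed, and with fewer fields the index is negative, which an implementation may reject as an error. A minimal sketch that skips such lines, using made-up input:

```shell
# Three hypothetical input lines: 3 fields, 2 fields, 4 fields.
out=$(printf '%s\n' '3672975 3672978 3672979' '10 20' 'a b c d' |
    awk 'NF >= 3 { print $(NF-2) }')   # only lines with at least 3 fields

printf '%s\n' "$out"
```

The two-field line is skipped, and the other two lines yield their third field from the end.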
I want to match columns 1 and 3 of fileA against columns 2 and 4 of fileB.
fileA
111111,22222,555555
xxxxxx,555555,yyyyy
fileB
xxxxxx,111111,oooooo,555555
yyyyyy,222222,111111,555555
output
111111,22222,555555 | xxxxxx,111111,oooooo,555555
So far, the code below only matches column 1 of fileA against column 2 of fileB.
awk -F, '{print $1}' fileA | grep "$(awk -F, '{print $2}' fileB)"
Use a single awk script to process the files. E.g:
awk -F, 'NR==FNR{a[$1,$3]=$0;next}($2,$4)in a{print a[$2,$4]" | "$0}' fileA fileB
111111,22222,555555 | xxxxxx,111111,oooooo,555555
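The one-liner above is compact, so here is the same program spread over several lines with comments, as a sketch run against the sample data from the question. The temporary directory is only there to make the example self-contained.

```shell
# Recreate the two sample files from the question in a scratch directory.
dir=$(mktemp -d)
printf '%s\n' '111111,22222,555555' 'xxxxxx,555555,yyyyy' > "$dir/fileA"
printf '%s\n' 'xxxxxx,111111,oooooo,555555' 'yyyyyy,222222,111111,555555' > "$dir/fileB"

joined=$(awk -F, '
    # NR counts records across all files, FNR restarts for each file,
    # so NR == FNR holds only while the first file (fileA) is being read.
    NR == FNR { a[$1, $3] = $0; next }

    # Second file: print when the (col 2, col 4) pair was saved from fileA.
    ($2, $4) in a { print a[$2, $4] " | " $0 }
' "$dir/fileA" "$dir/fileB")

printf '%s\n' "$joined"
rm -rf "$dir"
```

Only the first line of fileB matches, because its columns 2 and 4 equal columns 1 and 3 of the first line of fileA.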
I would like to zero one column of a csv file. Let's assume the csv file looks like this:
col1|col2|col3
v | 54| t
a | 25| f
d | 53| f
s | 04| t
Using awk this way gives me almost what I want.
Command:
awk -F'|' -v OFS='|' '$2=0.0;7' input.csv > output.csv
the result
col1|0|col3
v |0| t
a |0| f
d |0| f
s |0| t
But notice that the column header has also been zeroed, which is something I am trying to avoid. So I tried to skip the first line with the awk command, but I am getting an empty file
awk -F'|' -v OFS='|' 'NR<1 {exit} {$5=0.0;7}' input.csv > output.csv
What am I missing?
Just apply the rule from the 2nd line on with a NR>1 {}:
$ awk -F'|' -v OFS='|' 'NR>1{$2=0.0}7' file
col1|col2|col3
v |0| t
a |0| f
d |0| f
s |0| t
Why wasn't your approach awk -F'|' -v OFS='|' 'NR<1 {exit} {$5=0.0;7}' working?
The expression NR<1{exit} is never true because NR is always at least 1.
This means that the second block {$5=0.0;7} runs for every line. The $5=0.0 assignment is fine, but the 7 does not print anything: inside an action block it is just an expression with no effect, and an explicit action replaces the default print. It would work if you moved the 7 outside the braces, where it acts as an always-true pattern whose default action prints the record: awk -F'|' -v OFS='|' 'NR<1 {exit} {$5=0.0}7'.
But this wouldn't do what you want. Instead, you may want to say NR==1 {next} to skip the first line. However, this will prevent it from being printed:
$ awk -F'|' -v OFS='|' 'NR==1{next} $2=0.0;7' file
v |0| t
a |0| f
d |0| f
s |0| t
Change exit to next (with the condition NR==1 rather than NR<1) to skip the remaining actions for that first line.
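Putting the pieces together, a minimal sketch that uses next for the header but prints it first, so it is kept unchanged. The shortened sample file is made up for the example.

```shell
# Hypothetical input with the same layout as input.csv.
tmp=$(mktemp)
printf '%s\n' 'col1|col2|col3' 'v | 54| t' 'a | 25| f' > "$tmp"

zeroed=$(awk -F'|' -v OFS='|' '
    NR == 1 { print; next }   # header: print as is, skip the rule below
    { $2 = 0; print }         # data rows: overwrite column 2, then print
' "$tmp")

printf '%s\n' "$zeroed"
rm -f "$tmp"
```

This has the same effect as NR>1{$2=0.0}7, with the print written out explicitly instead of relying on the always-true pattern.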
Here is my code that I want to use to separate 3 columns from hist.txt into 2 separate files, hist1.dat with first and second column and hist2.dat with first and third column. The columns in hist.txt may be separated with more than one space. I want to save in histogram1.dat and histogram2.dat the first n lines until the last nonzero value.
The script creates histogram1.dat correctly, but histogram2.dat contains all the lines from hist2.dat.
hist.txt looks like this:
http://pastebin.com/JqgSKZrP
#!/bin/bash
sed 's/\t/ /g' hist.txt | awk '{print $1 " " $2;}' > hist1.dat
sed 's/\t/ /g' hist.txt | awk '{print $1 " " $3;}' > hist2.dat
head -n $( awk 'BEGIN {last=1}; {if($2!=0) last=NR};END {print last}' hist1.dat) hist1.dat > histogram1.dat
head -n $( awk 'BEGIN {last=1}; {if($2!=0) last=NR};END {print last}' hist2.dat) hist2.dat > histogram2.dat
What is the cause of this problem? Might it be due to some special restriction with head?
Thanks.
For your first histogram, try
awk '$2 ~ /000000/{exit}{print $1, $2}' hist.txt
and for your second:
awk '$3 ~ /000000/{exit}{print $1, $3}' hist.txt
Hope I understood you correctly...
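If the goal is to keep everything up to the last nonzero value, rather than stopping at the first zero, one option is to buffer rows and flush the buffer each time a nonzero value appears. This is a sketch under assumptions: the sample data is invented (the real hist.txt is only available via the pastebin link), and trim_col is a hypothetical helper name.

```shell
# Invented sample: column 1 is the bin, columns 2 and 3 are the histograms.
tmp=$(mktemp)
printf '%s\n' \
    '0.5  0.000000  3.0' \
    '1.5  2.000000  0.0' \
    '2.5  0.000000  0.0' \
    '3.5  1.000000  0.0' \
    '4.5  0.000000  0.0' > "$tmp"

# trim_col COL FILE: print columns 1 and COL up to the last nonzero COL value.
trim_col() {
    awk -v col="$1" '
        { pending = pending $1 " " $col "\n" }             # buffer every row
        $col != 0 { printf "%s", pending; pending = "" }   # nonzero: flush buffer
    ' "$2"
}

h1=$(trim_col 2 "$tmp")
h2=$(trim_col 3 "$tmp")
printf 'histogram1:\n%s\nhistogram2:\n%s\n' "$h1" "$h2"
rm -f "$tmp"
```

Rows after the last nonzero value stay in the buffer and are never printed. The default field splitting in awk already treats runs of spaces and tabs as one separator, so the sed step is not needed in this version.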
I'm trying to use AWK to print the second column of the last line of this command's output (the total disk space):
df --total
The command I'm using is:
df --total | awk 'FNR == NF {print $2}'
But it does not get it right.
Is there another way to do it?
You're using the awk variable NF, which is the number of fields in the current line. You might have meant NR, the number of records (lines) read so far, but it's easier to just use END:
df --total | awk 'END {print $2}'
You can use tail first then use awk:
df --total | tail -1 | awk '{print $2}'
One way to do it is with a tail/awk combination, the former to get just the last line, the latter print the second column:
df --total | tail -n 1 | awk '{print $2}'
A pure-awk solution is to simply store the second column of every line and print it out at the end:
df --total | awk '{store = $2} END {print store}'
Or, since the fields of the last line are still available in the END block, simply:
df --total | awk 'END {print $2}'
awk has no concept of "this is the last line". sed does though:
df --total | sed -n '$s/[^[:space:]]\+[[:space:]]\+\([[:digit:]]\+\).*/\1/p'