match specific columns to another file - shell

I want to match columns 1 and 3 of fileA against columns 2 and 4 of fileB.
fileA
111111,22222,555555
xxxxxx,555555,yyyyy
fileB
xxxxxx,111111,oooooo,555555
yyyyyy,222222,111111,555555
output
111111,22222,555555 | xxxxxx,111111,oooooo,555555
So far, the code below only matches column 1 of fileA against column 2 of fileB.
awk -F, '{print $1}' fileA | grep "$(awk -F, '{print $2}' fileB)"

Use a single awk script to process both files. E.g.:
awk -F, 'NR==FNR{a[$1,$3]=$0;next}($2,$4)in a{print a[$2,$4]" | "$0}' fileA fileB
111111,22222,555555 | xxxxxx,111111,oooooo,555555
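Written out with comments, the same idea looks like this (a sketch, using the fileA/fileB names from the question):
awk -F, '
    NR == FNR {                   # still reading the first file (fileA)
        a[$1,$3] = $0             # remember the whole line, keyed on columns 1 and 3
        next
    }
    ($2,$4) in a {                # second file (fileB): does its (col2,col4) key exist?
        print a[$2,$4] " | " $0   # print the matching fileA line, a separator, then the fileB line
    }
' fileA fileB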

Append 2 column variables in unix

I have a file as follows.
file1.csv
H,2 A:B,pq
D,34 C:B,wq
D,64 F:B,rq
D,6 R:B,tq
I want to format the 2nd column as follows
H,02 0A:0B,pq
D,34 0C:0B,wq
D,64 0F:0B,rq
D,06 0R:0B,tq
I am able to separate the column and format it, but I cannot merge it back.
I use the following commands:
formated_nums=`awk -F"," '{print $2}' file1.csv | awk '{print $1}' | awk '{if(length($1)!=2){$1="0"$1}}1'`
formated_letters=`awk -F"," '{print $2}' file1.csv | awk '{print $2}' | awk -F":" '{if(length($1)!=2){$1="0"$1}; if(length($2)!=2){$2="0"$2}}1' | awk '{print $1":"$2}'`
Now I want to merge formated_nums and formated_letters with a space in between.
I tried echo "${formated_nums} ${formated_letters}", but that prints the whole of one variable followed by the whole of the other instead of pairing the values line by line.
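If you only need to pair the two variables line by line, one option (a minimal sketch, assuming bash so that process substitution is available) is paste:
paste -d' ' <(printf '%s\n' "$formated_nums") <(printf '%s\n' "$formated_letters")
That said, the answers below do the whole reformatting in a single awk call, which avoids the intermediate variables entirely.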
The simplest approach I found in awk is to use a field separator that also covers the space and ':' and then rebuild the layout with printf. The only really tricky part is the number, which sometimes needs a leading 0, but that is trivial to format because the numbers are never longer than 2 digits (here):
awk -F '[[:blank:],:]' '{printf("%s,%02d 0%s:0%s,%s\n", $1, $2, $3, $4, $5)}' YourFile
This assumes your data keep the same format (no last field containing a space or another "separator").
An alternative awk solution, using GNU awk:
awk -F"[, :]" '{sub($2,sprintf("%02d",$2));sub($3,"0" $3);sub($4,"0" $4)}1' file1
H,02 0A:0B,pq
D,34 0C:0B,wq
D,64 0F:0B,rq
D,06 0R:0B,tq
It sounds like this is what you're really looking for:
$ awk '
BEGIN { FS=OFS=","; p=2 }
{ split($2,t,/[ :]/); for (i in t) {n=length(t[i]); t[i] = (n<p ? sprintf("%0*s",p-n,0) : "") t[i]; $2=t[1]" "t[2]":"t[3]} }
1
' file
H,02 0A:0B,pq
D,34 0C:0B,wq
D,64 0F:0B,rq
D,06 0R:0B,tq

How to extract year from date format and find all years greater than

I need to print all years greater than 1900 in a field. I have used awk to get the field with the date, but I cannot figure out how to use awk to pull only the years that are >= 1900.
So far I have pulled the field by entering this:
awk -F',' '{print $5}' presidents.csv
This gives me these dates:
4/03/1893
4/03/1897
14/9/1901
4/3/1909
4/03/1913
4/03/1921
2/8/1923
4/03/1929
4/03/1933
12/4/1945
20/01/1953
20/01/1961
22/11/1963
20/1/1969
9/8/1974
20/01/1977
20/01/1981
20/01/1989
20/01/1993
20/01/2001
20/01/2009
Incumbent
awk -F',' '{print $5}' presidents.csv | awk -F '/' '$3 > 1900 { print $1"/"$2"/"$3 }'
Try this
#instead of presidents.csv
echo "
f1,f2,f3,f4,4/03/1893
f1,f2,f3,f4,4/03/1897
f1,f2,f3,f4,14/9/1901
f1,f2,f3,f4,4/3/1909
f1,f2,f3,f4,4/03/1913
f1,f2,f3,f4,4/03/1921" \
| awk -F',' '{print $5}' \
| awk '{split($0,lineArr,"/");if (lineArr[3] > 1900) print $0}'
14/9/1901
4/3/1909
4/03/1913
4/03/1921
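Both steps can also be done in one awk invocation (a sketch, assuming the date is always the 5th comma-separated field; use > instead of >= if you want strictly greater):
awk -F',' '{ split($5, d, "/"); if (d[3] >= 1900) print $5 }' presidents.csv
Non-date values such as Incumbent have no third /-separated part, so d[3] compares as 0 and they are skipped automatically.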

Print column using while loop in awk

I need to extract the values of the 2nd column from a file, from the row where $1 = 2 until the row where $1 = 3. As an example, from the file
1 | 2.158e+06
| 2.31e+06
| 5.008e+06
2 | 693000
| 718000
| 725000
3 | 2.739e+06
| 2.852e+06
| 2.865e+06
| 2.874e+06
4 | 4.033e+06
| 4.052e+06
| 4.059e+06
I would like to extract values of the 2nd column from $1=2 until $1=3
693000
718000
725000
I tried using awk, but I have only figured out how to extract the values from $1=1 until $1=2
awk -F "|" '{if ($1>1) exit; else print $2}' foo.txt
Output
2.158e+06
2.31e+06
5.008e+06
I also tried this
awk -F "|" '{i=2; do {print $2; i++} while ($4); if ($1>i) exit}' foo.txt
But it gives me the whole 2nd column
2.158e+06
2.31e+06
5.008e+06
693000
718000
725000
2.739e+06
2.852e+06
2.865e+06
2.874e+06
4.033e+06
Does anyone know how to do this using awk or another command?
Thanks
A range pattern could work nicely here. The pattern $1==2,$1==3 will start executing the action when the first column is 2 and stop when it is 3. (Since the range is inclusive we need to check that the first column is not 3 before printing the second column in this case.)
$ awk -F\| '$1==2,$1==3 { if ($1 != 3) print $2 }' foo.txt
693000
718000
725000
hzhang@dell-work ~ $ cat sample.csv
1 | 2.158e+06
| 2.31e+06
| 5.008e+06
2 | 693000
| 718000
| 725000
3 | 2.739e+06
| 2.852e+06
| 2.865e+06
| 2.874e+06
4 | 4.033e+06
| 4.052e+06
| 4.059e+06
hzhang@dell-work ~ $ awk -F"|" 'BEGIN{c=0}{if($1>=3){c=0} if(c==1 ||($1>=2 && $1<3)){c = 1;print $2}}' sample.csv
693000
718000
725000
I set a flag c. If $1 is not between 2 and 3, the flag is set to 0; otherwise it is 1, which means we can print $2.
This is what I came up with:
awk -F "|" '{if ($1==3) exit} /^2/,EOF {print $2}' file
1) /^2/,EOF {print $2} prints the second column from the first row that begins with a 2 through to the end of the file (EOF here is just an unset variable, so the range's end condition never triggers and the range stays open)
2) {if ($1==3) exit} stops the script once the first column is 3
Output
693000
718000
725000
Using the getline statement in awk tactically:
awk -v FS=" [|] " '$1=="2"{print $2;getline;while(($1==" "||$1==2)){print $2;$0="";getline>0}}' my_file
Here is another awk
awk -F\| '/^2/ {f=1} /^3/ {f=0} f {print $2+0}' file
693000
718000
725000
-F\| sets the field separator to |
/^2/ if the line starts with 2, set flag f to true.
/^3/ if the line starts with 3, set flag f to false.
f {print $2+0} if flag f is true, print field 2.
$2+0 this is used to strip the space in front of the number. Remove the +0 if the field contains letters.
Just so you don't have to read the entire file, exit when you see a '3':
$ awk -F\| '/^2\s+/ {f=1} /^3\s+/ {exit} f {print $2+0}' file
693000
718000
725000
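The flag approach also generalizes if you pass the boundary markers in as variables (a sketch, assuming the markers are numeric and only ever appear in column 1):
awk -F'|' -v start=2 -v stop=3 '$1+0 == start {f=1} $1+0 == stop {f=0} f {print $2+0}' foo.txt
The $1+0 converts the padded "2 " field to a number before comparing it with the marker values.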

Awk and head not identifying columns properly

Here is the code I want to use to split 3 columns from hist.txt into 2 separate files: hist1.dat with the first and second columns, and hist2.dat with the first and third columns. The columns in hist.txt may be separated by more than one space. In histogram1.dat and histogram2.dat I want to keep only the first n lines, up to the last nonzero value.
The script creates histogram1.dat correctly, but histogram2.dat contains all the lines from hist2.dat.
hist.txt is like :
http://pastebin.com/JqgSKZrP
#!/bin/bash
sed 's/\t/ /g' hist.txt | awk '{print $1 " " $2;}' > hist1.dat
sed 's/\t/ /g' hist.txt | awk '{print $1 " " $3;}' > hist2.dat
head -n $( awk 'BEGIN {last=1}; {if($2!=0) last=NR};END {print last}' hist1.dat) hist1.dat > histogram1.dat
head -n $( awk 'BEGIN {last=1}; {if($2!=0) last=NR};END {print last}' hist2.dat) hist2.dat > histogram2.dat
What is the cause of this problem? Might it be due to some special restriction with head?
Thanks.
For your first histogram, try
awk '$2 ~ /000000/{exit}{print $1, $2}' hist.txt
and for your second:
awk '$3 ~ /000000/{exit}{print $1, $3}' hist.txt
Hope I understood you correctly...
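If you really need every line up to the last nonzero value, rather than stopping at the first zero, one option is to read the file twice (a sketch, assuming whitespace-separated columns and that "zero" means the numeric value 0):
# pass 1: remember the last record whose value column is nonzero; pass 2: print up to that record
awk 'NR == FNR { if ($2 + 0 != 0) last = FNR; next } FNR <= last { print $1, $2 }' hist.txt hist.txt > histogram1.dat
awk 'NR == FNR { if ($3 + 0 != 0) last = FNR; next } FNR <= last { print $1, $3 }' hist.txt hist.txt > histogram2.dat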

AWK Print Second Column of Last Line

I'm trying to use AWK to print the second column of the last line of this command's output (the total disk space):
df --total
The command I'm using is:
df --total | awk 'FNR == NF {print $2}'
But it does not get it right.
Is there another way to do it?
You're using the awk variable NF, which is Number of Fields. You might have meant NR, Number of Records, but it's easier to just use END:
df --total | awk 'END {print $2}'
You can use tail first then use awk:
df --total | tail -1 | awk '{print $2}'
One way to do it is with a tail/awk combination, the former to get just the last line, the latter to print the second column:
df --total | tail -n 1 | awk '{print $2}'
A pure-awk solution is to simply store the second column of every line and print it out at the end:
df --total | awk '{store = $2} END {print store}'
Or, since in most awk implementations the fields from the last line are still available in the END block, simply:
df --total | awk 'END {print $2}'
awk has no concept of "this is the last line". sed does though:
df --total | sed -n '$s/[^[:space:]]\+[[:space:]]\+\([[:digit:]]\+\).*/\1/p'
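For use in a script, the END form is easy to capture into a variable (a sketch; with GNU df, --total appends a summary line and the second column is the size in 1K blocks by default):
total_kb=$(df --total | awk 'END {print $2}')
echo "Total disk space: ${total_kb} 1K-blocks"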
