Pattern matching in unix shell script - shell

I have two files like below:
File 1:
id1
hftujdbbd
bdurijtbr
grhjend
Ghent
id2
fu Rubens
hejdnnd
bdudndn
id3
gjbfbd
vhrjend
rjndnd
.
.
.
File 2:
id1
id2
I need to find the ids in file1 that are matching with ids in file2 and print all the lines related to that matched id. Please let me know how to implement this.
So the expected output is as below:
hftujdbbd
bdurijtbr
grhjend
Ghent
fu Rubens
hejdnnd
bdudndn

Using awk:
awk -F "/" 'NR==FNR { indx[$0]=1;next } indx[$2]==1 && ($1 == 2 || $1 == 3) { print }' file2 file1
Explanation:
awk -F "/" 'NR==FNR { # Set the field separator to "/" and then process the first file (file2) (NR==FNR)
indx[$0]=1; # Set the index of an array (indx) to the line
next
}
indx[$2]==1 && ($1 == 2 || $1 == 3) { # When processing the second file (file1), if there is an entry in indx for the second "/"-delimited field and the first field is 2 or 3, print the line
print
}' file2 file1

Taking file1 and file2 as above
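while IFS= read -r id; do
    # Only process ids that appear as a whole line in file1.txt
    if grep -qx -- "$id" file1.txt; then
        # Start printing after the matching id line, stop at the next id line
        awk -v id="$id" '$0 == id {flag=1; next} /^id[0-9]+$/ {flag=0} flag' file1.txt
    fi
done < file2.txt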
output:
abhishekphukan$ bash script.sh
hftujdbbd
bdurijtbr
grhjend
Ghent
fu Rubens
hejdnnd
bdudndn
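The loop above rescans file1 once per id. A single-pass alternative is sketched below. It assumes every header line in file1 consists solely of "id" followed by digits, and it creates shortened demo files in a scratch directory only so that the snippet runs on its own (the array name want and the variable show are illustrative).

```shell
# Demo data in a scratch directory (file names mirror the question).
cd "$(mktemp -d)" || exit 1
printf '%s\n' id1 hftujdbbd bdurijtbr id2 hejdnnd bdudndn id3 gjbfbd > file1
printf '%s\n' id1 id2 > file2

# First file (file2): remember the wanted ids.
# Second file (file1): a header line switches printing on or off;
# every other line is printed only while printing is on.
out=$(awk '
    NR == FNR    { want[$0]; next }
    /^id[0-9]+$/ { show = ($0 in want); next }
    show
' file2 file1)
printf '%s\n' "$out"
```

Because file1 is read only once, the cost no longer grows with the number of ids in file2.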

Related

Write specific columns of a file into another file: who can give me a more concise solution?

I have a problem writing specific columns of one file into another. Given file1 below, I need to take the first column, excluding the first row, and write it to file2 as a single line separated by '|'. I have a partial solution using sed and awk, but it is missing the last step of inserting the line at the top of file2. Given how powerful awk, sed, etc. are, I believe there should be a more concise solution. Can anyone offer a more concise script?
sed '1d;s/ .//' ./file1 | awk '{printf "%s|", $1; }' | awk '{if (NR != 0) {print substr($1, 1, length($1) - 1)}}'
file1:
col_name data_type comment
aaa string null
bbb int null
ccc int null
file2:
xxx ccc(whatever is this)
The result of file2 should be this :
aaa|bbb|ccc
xxx ccc(whatever is this)
Assuming there's no whitespace in the column 1 data, in increasing length:
sed -i "1i$(awk 'NR > 1 {print $1}' file1 | paste -sd '|')" file2
or
ed file2 <<END
1i
$(awk 'NR > 1 {print $1}' file1 | paste -sd '|')
.
wq
END
or
{ awk 'NR > 1 {print $1}' file1 | paste -sd '|'; cat file2; } | sponge file2
or
mapfile -t lines < <(tail -n +2 file1)
col1=( "${lines[@]%%[[:blank:]]*}" )
new=$(IFS='|'; echo "${col1[*]}"; cat file2)
echo "$new" > file2
This might work for you (GNU sed):
sed -z 's/[^\n]*\n//;s/\(\S*\).*/\1/mg;y/\n/|/;s/|$/\n/;r file2' file1
Process file1 "wholemeal" by using the -z command line option.
Remove the first line.
Remove all columns other than the first.
Replace newlines by |'s
Replace the last | by a newline.
Append file2.
Alternative using just command line utils:
tail -n +2 file1 | cut -d' ' -f1 | paste -s -d'|' | cat - file2
Tail file1 from line 2 onwards.
Using the results from the tail command, isolate the first column using a space as the column delimiter.
Using the results from the cut command, serialize the lines into one, delimited by '|'s.
Using the results from the paste, append file2 using the cat command.
I'm learning awk at the moment.
awk 'BEGIN{a=""} {if(NR>1) a = a $1 "|"} END{a=substr(a, 1, length(a)-1); print a}' file1
Edit: Here's another version that uses an array:
awk 'NR > 1 {a[++n]=$1} END{for(i=1; i<=n; ++i){if(i>1) printf("|"); printf("%s", a[i])} printf("\n")}' file1
Here is a simple Awk script to merge the files as per your spec.
awk '# From the first file, merge all lines except the first
NR == FNR { if (FNR > 1) { printf "%s%s", sep, $1; sep = "|"; } next }
# We are in the second file; add a newline after data from first file
FNR == 1 { printf "\n" }
# Simply print all lines from file2
1' file1 file2
The NR==FNR condition is true while we are reading the first input file: the overall line number NR is equal to the line number within the current file, FNR. The final 1 is a common idiom for printing all input lines which make it this far into the script (the next in the first block prevents lines from the first file from reaching this far).
For conciseness, you can remove the comments.
awk 'NR == FNR { if (FNR > 1) { printf "%s%s", sep, $1; sep = "|"; } next }
FNR == 1 { printf "\n" } 1' file1 file2
Generally speaking, Awk can do everything sed can do, so piping sed into Awk (or vice versa) is nearly always a useless use of sed.
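The merge script above writes to standard output, while the question asks for file2 itself to be updated. One way to do that is sketched below, assuming a temporary file is acceptable; the demo files are created in a scratch directory only so that the snippet runs on its own.

```shell
# Demo data in a scratch directory (contents taken from the question).
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'col_name data_type comment' 'aaa string null' \
              'bbb int null' 'ccc int null' > file1
printf '%s\n' 'xxx ccc(whatever is this)' > file2

# Write the merged result to a temporary file, then replace file2
# only if awk succeeded. Redirecting straight to file2 would
# truncate it before awk had a chance to read it.
tmp=$(mktemp) || exit 1
awk 'NR == FNR { if (FNR > 1) { printf "%s%s", sep, $1; sep = "|" } next }
     FNR == 1  { printf "\n" } 1' file1 file2 > "$tmp" && mv "$tmp" file2
cat file2
```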

Complex csv question: how to generate a final csv after comparing multiple csvs (following manner) using shell scripting?

assume
file1.csv
Schemaname.tablename.columns
exam1
exam2
filetomatch.csv
exam1
exam2
exam4
exam5
exam6
I used
awk 'NR==FNR{a[$1];next} ($1) in a' file1.csv filetomatch.csv >> result.csv (each time one csv is produced)
result
exam 1
exam 2
to match the results.
I have n files to compare to filetomatch.csv.
I need the output to be as follows:
file matchedcolumns
file1 exam 1
exam 2
file2 exam 4
.
.
.
filen exam 2
exam 3
and so on..
How can I concatenate the result.csv files each time, with the file name as the first field?
Also, is there a way to show the null columns as well?
How can I add null values using this?
Example
File1 Column1
File1 Column1
File2 null
File3 column3
and so on
>> result.csv should be doing the concatenation for you.
for example, create test files
$ for i in {1..4}; do echo $i > file$i.txt; done
$ head file?.txt
==> file1.txt <==
1
==> file2.txt <==
2
==> file3.txt <==
3
==> file4.txt <==
4
run some awk script on all files, print the filename part of output and concatenate the results
$ for f in file{1..4}.txt; do awk '{print FILENAME, $0}' "$f" >> results.csv; done
$ cat results.csv
file1.txt 1
file2.txt 2
file3.txt 3
file4.txt 4
I found these two useful:
awk 'NR==FNR{a[$1];next}($1) in a{ print FILENAME, ($1) }' file1.csv filetomatch.csv
Merge the common values in a column:
awk -F, '{ if (f == $1) { for (c=0; c <length($1) ; c++) printf " "; print FS $2 FS $3 } else { print $0 } } { f = $1 }' file.csv
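Putting the pieces together, the per-file report with a "null" row for files without matches might be sketched as follows. The assumptions here are not from the original posts: the input files are named file1.csv, file2.csv and so on, each starts with one header line, and the demo contents (including exam9) are invented for illustration.

```shell
# Demo data in a scratch directory.
cd "$(mktemp -d)" || exit 1
printf '%s\n' exam1 exam2 exam4 > filetomatch.csv
printf '%s\n' Schemaname.tablename.columns exam1 exam2 > file1.csv
printf '%s\n' Schemaname.tablename.columns exam9 > file2.csv

# The glob file[0-9]*.csv deliberately excludes filetomatch.csv.
for f in file[0-9]*.csv; do
    awk -v name="${f%.csv}" '
        NR == FNR               { want[$1]; next }          # load the reference list
        FNR > 1 && ($1 in want) { print name, $1; hit = 1 } # skip header, report matches
        END                     { if (!hit) print name, "null" }
    ' filetomatch.csv "$f"
done > result.csv
cat result.csv
```

Redirecting the whole loop once, rather than appending with >> inside it, avoids piling up stale rows when the script is run again.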

awk - Compare columns from two files and replace text in first file

I have two files. The first has 1 column and the second has 3 columns. I want to compare the first columns of both files. If there is a match, replace columns 2 and 3 with specific values; if not, print the line unchanged.
File 1:
$ cat file1
26
28
30
File 2:
$ cat file2
1,a,0
2,a,0
22,a,0
23,a,0
24,a,0
25,a,0
26,r,1510139756
27,a,0
28,r,1510244156
29,a,0
30,r,1510157364
31,a,0
32,a,0
33,r,1510276164
34,a,0
40,a,0
Desired output:
$ cat file2
1,a,0
2,a,0
22,a,0
23,a,0
24,a,0
25,a,0
26,a,0
27,a,0
28,a,0
29,a,0
30,a,0
31,a,0
32,a,0
33,r,1510276164
34,a,0
40,a,0
I am using gawk to do this (it's inside a shell script and I am using Solaris) but I can't get the output right. It only prints the lines that match:
fuente="file2"
gawk -v fuente="$fuente" 'FNR==NR{a[FNR]=$1; next}{print $1,$2="a",$3="0" }' $fuente file1 > file3
The output I got:
$ cat file3
26 a 0
28 a 0
30 a 0
awk one-liner:
awk 'NR==FNR{ a[$1]; next }$1 in a{ $2="a"; $3=0 }1' file1 FS=',' OFS=',' file2
The output:
1,a,0
2,a,0
22,a,0
23,a,0
24,a,0
25,a,0
26,a,0
27,a,0
28,a,0
29,a,0
30,a,0
31,a,0
32,a,0
33,r,1510276164
34,a,0
40,a,0
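The one-liner sets FS and OFS as operands between the two file names, so they only take effect once file2 is opened. Since file1 contains no commas, the separators can also be set up front, which some readers find easier to follow. A minimal sketch on a shortened copy of the data (the array name seen is illustrative):

```shell
# Demo data in a scratch directory (a subset of the question's files).
cd "$(mktemp -d)" || exit 1
printf '%s\n' 26 28 > file1
printf '%s\n' '25,a,0' '26,r,1510139756' '28,r,1510244156' '33,r,1510276164' > file2

out=$(awk -F, -v OFS=, '
    NR == FNR  { seen[$1]; next }       # file1: collect the keys
    $1 in seen { $2 = "a"; $3 = 0 }     # file2: reset matching rows
    1                                   # print every file2 line
' file1 file2)
printf '%s\n' "$out"
```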
Really spread out for clarity; called (fuente.awk) like so:
awk -F \, -v fuente=file1 -f fuente.awk file2 # -F sets the input field separator (FS)
BEGIN {
OFS="," # set OFS to make printing easier
while ((getline x < fuente) > 0) # safe way; parenthesized so "> 0" tests getline's return value
{
a[++i]=x # stuff indexed array
}
}
{ # For each line in file2
for (k=1 ; k<=i ; k++) # Loop over array (elements in file1)
{
if (($1==a[k]) && (! flag))
{
print($1,"a",0) # Found print new line
flag=1 # print only once
}
}
if (! flag) # Not found
{
print($0) # print original
}
flag=0 # reset flag
}
END { }

Consolidate two tables awk

I have two files (all tab delimited):
database.txt
MAR001;string1;H
MAR002;string2;G
MAR003;string3;H
data.txt
data1;MAR002
data2;MAR003
And I want to consolidate these two tables using the MAR### column. Expected output (tab-delimited):
data1;MAR002;string2;G
data2;MAR003;string3;H
I want to use awk; this is my attempt:
awk 'BEGIN{FS=OFS="\t"} FNR == NR { a[$2] = $1; next } $2 in a { print $0, a[$1] }' data.txt database.txt
but this fails...
I would just use the join command. It's very easy:
join -t \; -1 1 -2 2 database.txt data.txt
MAR002;string2;G;data1
MAR003;string3;H;data2
You can specify output column order using -o. For example:
join -t \; -1 1 -2 2 -o 2.1,2.2,1.2,1.3 database.txt data.txt
data1;MAR002;string2;G
data2;MAR003;string3;H
P.S. I did assume your files are "semicolon separated" and not "tab separated". Also, your files need to be sorted by the key column.
awk -F '\t' 'FNR==1 && NR == 1 { strt=1 } FNR==1 && NR != 1 { strt=0} strt==1 {dat[$1]=$2";"$3 } strt==0 { if ( dat[$2] != "" ) { print $1";"$2";"dat[$2] } }' database.txt data.txt
Read database.txt in first and read the data into an array dat. Then when we encounter the data.txt file, check for entries in the dat array and print the required data if there is one.
Output:
data1;MAR002;string2;G
data2;MAR003;string3;H
First of all, ; and \t are different characters. If your real input files are tab delimited, here is the fix for your code.
Change it to:
awk '....... $1 in a { print a[$1], $0 }' data.txt database.txt
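For the semicolon-separated sample data as posted, a complete awk version that needs no sorting could look like the sketch below. It assumes ";" really is the separator, and the array name db is illustrative.

```shell
# Demo data in a scratch directory (contents taken from the question).
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'MAR001;string1;H' 'MAR002;string2;G' 'MAR003;string3;H' > database.txt
printf '%s\n' 'data1;MAR002' 'data2;MAR003' > data.txt

out=$(awk 'BEGIN { FS = OFS = ";" }
    NR == FNR { db[$1] = $2 OFS $3; next }   # database.txt: key -> remaining fields
    $2 in db  { print $0, db[$2] }           # data.txt: append the looked-up fields
' database.txt data.txt)
printf '%s\n' "$out"
```

Reading database.txt first means the lookup key is its column 1, while data.txt is matched on its column 2.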

using awk how to merge 2 files, say A & B and do a left outer join function and include all columns in both files

I have multiple files with different numbers of columns. I need to merge the first and second files as a left outer join in awk, keyed on the first file, and print all columns from both files wherever the first columns match.
I have tried the code below to get close to my output, but I can't print the empty "," placeholders where no matching number is found in the second file. join needs sorted input and takes more time than awk, and my files are big, around 30 million records.
awk -F ',' '{
if (NR==FNR){ r[$1]=$0}
else{ if($1 in r)
r[$1]=r[$1]gensub($1,"",1)}
}END{for(i in r){print r[i]}}' file1 file2
file1
number,column1,column2,..columnN
File2
numbr,column1,column2,..columnN
Output
number,file1.column1,file1.column2,..file1.columnN,file2.column1,file2.column3...,file2.columnN
file1
1,a,b,c
2,a,b,c
3,a,b,c
5,a,b,c
file2
1,x,y
2,x,y
5,x,y
6,x,y
7,x,y
desired output
1,a,b,c,x,y
2,a,b,c,x,y
3,a,b,c,,,
5,a,b,c,x,y
$ cat tst.awk
BEGIN { FS=OFS="," }
NR==FNR {
tail = gensub(/[^,]*,/,"",1)
if ( FNR == 1 ) {
empty = gensub(/[^,]/,"","g",tail)
}
file2[$1] = tail
next
}
{ print $0, ($1 in file2 ? file2[$1] : empty) }
$ awk -f tst.awk file2 file1
1,a,b,c,x,y
2,a,b,c,x,y
3,a,b,c,,
5,a,b,c,x,y
The above uses GNU awk for gensub(), with other awks it's just one more step to do [g]sub() on the appropriate variable after initially assigning it.
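That extra step could look like the following sketch for awks without gensub(). It copies the value into a variable first and then applies sub() or gsub() to it; the demo files are created in a scratch directory only so that the snippet runs on its own.

```shell
# Demo data in a scratch directory (contents taken from the question).
cd "$(mktemp -d)" || exit 1
printf '%s\n' '1,a,b,c' '2,a,b,c' '3,a,b,c' '5,a,b,c' > file1
printf '%s\n' '1,x,y' '2,x,y' '5,x,y' '6,x,y' '7,x,y' > file2

out=$(awk 'BEGIN { FS = OFS = "," }
NR == FNR {
    tail = $0
    sub(/^[^,]*,/, "", tail)        # drop the key and its comma
    if (FNR == 1) {
        empty = tail
        gsub(/[^,]/, "", empty)     # keep only the commas
    }
    file2[$1] = tail
    next
}
{ print $0, ($1 in file2 ? file2[$1] : empty) }' file2 file1)
printf '%s\n' "$out"
```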
An interesting (to me at least!) alternative you might want to test for a performance difference is:
$ cat tst.awk
BEGIN { FS=OFS="," }
NR==FNR {
tail = gensub(/[^,]*,/,"",1)
idx[$1] = NR
file2[NR] = tail
if ( FNR == 1 ) {
file2[""] = gensub(/[^,]/,"","g",tail)
}
next
}
{ print $0, file2[idx[$1]] }
$ awk -f tst.awk file2 file1
1,a,b,c,x,y
2,a,b,c,x,y
3,a,b,c,,
5,a,b,c,x,y
but I don't really expect it to be any faster and it MAY even be slower.
you can try,
awk 'BEGIN{FS=OFS=","}
FNR==NR{d[$1]=substr($0,index($0,",")+1); next}
{print $0, ($1 in d?d[$1]:",")}' file2 file1
you get,
1,a,b,c,x,y
2,a,b,c,x,y
3,a,b,c,,
5,a,b,c,x,y
join to the rescue:
$ join -t $',' -a 1 -e '' -o 0,1.2,1.3,1.4,2.2,2.3 file1.txt file2.txt
Explanation:
-t $',': Field separator token.
-a 1: Do not discard records from file 1 if not present in file 2.
-e '': Missing records will be treated as an empty field.
-o: Output format.
file1.txt
1,a,b,c
2,a,b,c
3,a,b,c
5,a,b,c
file2.txt
1,x,y
2,x,y
5,x,y
6,x,y
7,x,y
Output
1,a,b,c,x,y
2,a,b,c,x,y
3,a,b,c,,
5,a,b,c,x,y
