Concatenate files in different subfolders into a single file and have each file name in the first column

I am trying to concatenate a few thousand files that are in different subfolders into a single file and also have the name of each concatenated file inserted as the first column so that I know which file each data row came from. Essentially starting with something like this:
EDIT: I neglected to mention that each file has the same header so I updated the request accordingly.
Folder1
file1.txt
A B C
123 010 ...
456 020 ...
789 030 ...
Folder2
file2.txt
A B C
abc 100 ...
efg 200 ...
hij 300 ...
and outputting this:
CombinedFile.txt
A B C
file1 123 010 ...
file1 456 020 ...
file1 789 030 ...
file2 abc 100 ...
file2 efg 200 ...
file2 hij 300 ...
After reading this post, I have tried the following code, but end up with a syntax error (apologies, I'm super new to awk!)
shopt -s globstar
for filename in path/**/*.txt; do
awk '{print FILENAME "\t" $0}' *.txt > CombinedFile.txt
done
Thanks for your help!

This single awk should be able to do it without any looping:
shopt -s globstar
awk 'FNR == 1 {
    f = FILENAME
    gsub(/^.*\/|\.[^.]+$/, "", f)   # strip the directory part and the extension
    if (NR == 1)                    # keep the header from the first file only
        print
    next
}
{
    print f, $0
}' path/**/*.txt > CombinedFile.txt
cat CombinedFile.txt
A B C
file1 123 010
file1 456 020
file1 789 030
file2 abc 100
file2 efg 200
file2 hij 300
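If globstar isn't available (older bash or a different shell), the same idea can be sketched with find instead. This assumes GNU sort (for -z) and that the file list is small enough for xargs to run a single awk invocation; if xargs splits it into several batches, the NR == 1 header check only holds for the first batch.

```shell
# Sketch without globstar: collect the files with find (sorted for a
# deterministic order) and feed them all to one awk invocation.
find path -name '*.txt' -print0 | sort -z |
  xargs -0 awk '
    FNR == 1 {
        f = FILENAME
        gsub(/^.*\/|\.[^.]+$/, "", f)   # strip directories and extension
        if (NR == 1) print              # keep the header from the first file only
        next
    }
    { print f, $0 }
  ' > CombinedFile.txt
```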

Related

How to find text files with nothing but a pattern on a single line using bash

I am trying to find text files using bash that contain nothing but a specific pattern on 1 line of the file.
For example, I have the following textfile:
1234123 123412341 0000 23423 23422
121231 123123 12312 12312 1231
567 567 43 234 12
0000
929 020 040 040 0000
This file contains a line (line 4), that exclusively has pattern 0000.
I tried ls | grep 0000; however, that also returns the files in which the pattern is located elsewhere in the file and not necessarily 'solo' on a line.
How do you find a pattern using bash that is exclusively present on a single line of the file?
Assuming we have four input files:
$ head file*
==> file1 <==
0000
0000
==> file2 <==
abcd
0000
abcd
==> file3 <==
0000x
==> file4 <==
abcd
file4 doesn't contain the pattern at all, file3 contains the pattern, but it's not on a line on its own, file1 has multiple lines that contain just the pattern, and file2 has exactly one line with just the pattern.
To get all files that contain the pattern anywhere:
$ grep -l '0000' file*
file1
file2
file3
To get all files that contain lines with nothing but the pattern:
$ grep -lx '0000' file*
file1
file2
And if you wanted only files that contain exactly one line with nothing but the pattern, you could use -c to get a count first:
$ grep -xc '0000' file*
file1:2
file2:1
file3:0
file4:0
and then use awk to print only the files with exactly one match:
$ grep -xc '0000' file* | awk -F: '$2==1 {print $1}'
file2
With GNU awk, you could also do this directly:
$ awk 'BEGINFILE {c=0} /^0000$/ {++c} ENDFILE {if (c==1) print FILENAME}' file*
file2
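Without GNU awk's BEGINFILE/ENDFILE, the same one-pass idea can be sketched in portable awk by resetting a counter when a new file starts and flushing the previous file's verdict at that point (and at END). This sketch assumes none of the input files is empty, since FNR == 1 never fires for an empty file.

```shell
# Portable sketch: count exact-match lines per file; report a file when
# its count is exactly one. prev holds the previous file's name.
awk '
FNR == 1 { if (c == 1) print prev; prev = FILENAME; c = 0 }
/^0000$/ { ++c }
END      { if (c == 1) print prev }
' file*
```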

Replace number in one file with the number in other file

I have a problem. I have two files (file1 and file2). Both contain a number (with different values) characterizing the same variable, obtained from different estimations. In file1 this number1 is in field $3 of the row beginning with the name var1; in file2 this number2 is in field $2 of the row beginning with the name var2. I want to take number1 from file1 and replace number2 in file2 with it. I tried the following script, but it is not working; in the output nothing is changed compared to the original file2:
#! /bin/bash
Var1=$(cat file1 | grep 'var1' | awk '{printf "%s", $3}' )
Var2=$(cat file2 | grep 'var2' | awk '{printf "%s", $2}' )
cat file2 | awk '{gsub(/'$Var2'/,'$Var1'); print}'
Thanks in advance!
Addition: For example, in file1 I have:
Tomato 2.154 3.789
Apple 1.458 3.578
Orange 2.487 4.045
In file2:
Banana 2.892
Apple 1.687
Mango 2.083
I want to change file2 so, that it would be:
Banana 2.892
Apple 3.578
Mango 2.083
Using this as file1:
var1 junk 101
var2 junk 102
var3 junk 103
And this as file2:
var1 201
var2 202
var3 203
This will extract field 3 from file1 where field 1 is var1:
awk '$1=="var1"{print $3}' file1
101
This will replace field 2 in file2 with x (101) where the first field is var2:
awk -v x=101 '$1=="var2"{$2=x}1' file2
var1 201
var2 101
var3 203
And combining them, you get:
awk -v x=$(awk '$1=="var1"{print $3}' file1) '$1=="var2"{$2=x}1' file2
var1 201
var2 101
var3 203
Assuming you want to overwrite file2 in place, you can do a conditional mv that runs only when things worked:
awk -v x=$(awk '$1=="var1"{print $3}' file1) '$1=="var2"{$2=x}1' file2 > /tmp/a && mv /tmp/a file2
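The command substitution (and its quoting pitfalls) can also be avoided by letting a single awk read both files; a sketch of the same var1/var2 logic:

```shell
# Sketch: the first pass over file1 grabs field 3 of the var1 row into x;
# the second pass over file2 rewrites field 2 of the var2 row. The bare 1
# at the end prints every line of file2, modified or not.
awk 'NR == FNR { if ($1 == "var1") x = $3; next }
     $1 == "var2" { $2 = x }
     1' file1 file2
```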

print the full line of the file if a string matched from another file in unix shell scripting

File1 id.txt
101
102
103
File2 emp_details.txt
101 john USA
103 Alex USA
104 Nike UK
105 phil UK
If an id from id.txt matches the first column of emp_details.txt, then output the full line to a new file matched.txt. If it does not match, then output only the id to a new file notmatched.txt.
example:
matched.txt
101 john USA
103 Alex USA
unmatched.txt (assumed by the editor)
102
grep -f f1 f2 > matched
grep -vf <(awk '{print $1}' matched) f1 > not_matched
Explanation:
use file1 as pattern to search in file2 and store matched results in matched file
use matched file's column1 as pattern to search in file1 and store non-matches in not_matched file
-v means "invert the match" in grep
Output :
$ cat matched
101 john USA
103 Alex USA
$ cat not_matched
102
Using awk:
One-liner:
awk 'FNR==NR{ arr[$1]; next }($1 in arr){ print >"matched.txt"; delete arr[$1] }END{for(i in arr)print i >"unmatched.txt"}' file1 file2
Better Readable:
awk '
FNR==NR{
arr[$1];
next
}
($1 in arr){
print >"matched.txt";
delete arr[$1]
}
END{
for(i in arr)
print i >"unmatched.txt"
}
' file1 file2
Test Results:
$ cat file1
101
102
103
$ cat file2
101 john USA
103 Alex USA
104 Nike UK
105 phil UK
$ awk 'FNR==NR{arr[$1];next }($1 in arr){print >"matched.txt";delete arr[$1]}END{for(i in arr)print i >"unmatched.txt"}' file1 file2
$ cat matched.txt
101 john USA
103 Alex USA
$ cat unmatched.txt
102
Usually we expect you to explain what you have tried and where you are stuck; we usually don't provide complete answers on this site. As it's just a few lines, I hacked up a not very efficient version. Simply loop over the id file and use egrep to find the matched and unmatched lines.
#!/bin/bash
while read -r p; do
    egrep "^$p" emp_details.txt >> matched.txt
done < id.txt
while read -r p; do
    if ! egrep -q "^$p" emp_details.txt; then
        echo "$p" >> unmatched.txt
    fi
done < id.txt
Here is another thought compared to Akshay Hegde's answer: build a map from $1 to the whole line ($0) of emp_details.txt in array a.
awk 'NR==FNR{a[$1]=$0;next} {if($1 in a){print a[$1]>>"matched.txt"}else{print $1 >> "unmatched.txt"}}' emp_details.txt id.txt
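If both files happen to be sorted on the id column (as the sample data is), coreutils join offers another sketch of the same matched/unmatched split:

```shell
# Sketch with join (both inputs must be sorted on the join field):
join id.txt emp_details.txt > matched.txt         # ids found in both files
join -v 1 id.txt emp_details.txt > unmatched.txt  # ids only in id.txt
```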

how to extract lines between two patterns only with awk?

$ awk '/abc/{flag=1; next} /edf/{flag=0} flag' file
flag will print $0 for every abc..edf block, but I only need the lines between the first pair of patterns.
input:
abc
111
222
edf
333
444
abc
555
666
edf
output:
111
222
So I'm assuming you want to print the matching lines only for the 1st occurrence.
For that you can just use an additional variable and set it once flag goes back to 0:
$ cat file
abc
111
222
edf
333
444
abc
555
666
edf
$ awk '/abc/{flag=1; next} /edf/{if(flag) got1stoccurence=1; flag=0} flag && !got1stoccurence' file
111
222
If you only want the first set of output, then:
awk '/abc/{flag=1; next} /edf/{if (flag == 1) exit} flag' file
Or:
awk '/abc/{flag++; next} /edf/{if (flag == 1) flag++} flag == 1' file
There are other ways to do it too, no doubt. The first is simple and to the point. The second is more flexible if you also want to process the first group of lines appearing between another pair of patterns.
Note that if the input file contains:
xyz
edf
pqr
abc
111
222
edf
It is important not to do anything about the first edf; it is an uninteresting line because no abc line has been read yet.
Using getline with while:
$ awk '/abc/ { while(getline==1 && $0!="edf") print; exit }' file
111
222
Look for /abc/ and, once it is found, records are output in the while loop until edf is found.
$ awk '/edf/{exit} f; /abc/{f=1}' file
111
222
If it was possible for edf to appear before abc in your input then it'd be:
$ awk 'f{if (/edf/) exit; print} /abc/{f=1}' file
111
222
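For comparison, the first-block extraction can also be sketched with sed: inside the first abc..edf range it drops the abc line itself, quits on edf, and prints everything in between. Like the awk versions above, this is safe when edf appears before any abc, because the range only opens on abc.

```shell
# Sketch: -n suppresses default output; within the /abc/,/edf/ range,
# delete the opening abc line, quit at the closing edf, print the rest.
sed -n '/abc/,/edf/{/abc/d;/edf/q;p;}' file
```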

Compare one to one lines in 2 different files using shell scripting

I have 2 files:
File1        File2
abc          abc
cde          cde,xyz,efg,hij,...,n
efg          lmn,opq,weq,...n
Now I want to compare File1 line1 with File2 line1, line2 with line2, and so on...
However, in file2 a single line can have multiple entries separated by commas.
Now, if the entry in file1 matches any of the entries on the corresponding line in file2 -> result ok.
Else show the diff...
For example:
FILE1        FILE2
cde          cde,xyz,efg,hij,opt
the result should be ok because cde exist in both files.
Can you please help me write a shell script for this?
sdiff also gave me the differences in the extra entries.
Consider these two test files:
$ cat file1
abc
cde
efg
$ cat file2
abc
cde,xyz,efg,hij,n
lmn,opq,weq,n
Consider the command:
$ awk -F, 'FNR==NR{a[NR]=$1;next} {f=0;for (i=1;i<=NF;i++)if($i==a[FNR])f=1;if(f)print "OK";else print a[FNR]" -----> " $0}' file1 file2
OK
OK
efg -----> lmn,opq,weq,n
This prints OK on every line for which the key in file1 is found anywhere on the corresponding line in file2. If it is not, it prints both lines as shown.
Another example
From the comments, consider these two files in which all lines have a match:
$ cat f1
abc
cde
mno
$ cat f2
abc
efg,cde,hkl
mno
$ awk -F, 'FNR==NR{a[NR]=$1;next} {f=0;for (i=1;i<=NF;i++)if($i==a[FNR])f=1;if(f)print "OK";else print a[FNR]" -----> " $0}' f1 f2
OK
OK
OK
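The same check can be sketched without awk by pasting the two files together and testing membership in the shell; this sketch assumes bash, tab-free lines, equal line counts, and keys without glob characters.

```shell
# Sketch: paste joins corresponding lines with a tab; surrounding both the
# key and the list with commas turns membership into a simple glob match.
paste file1 file2 | while IFS=$'\t' read -r key list; do
    case ",$list," in
        *",$key,"*) echo "OK" ;;
        *)          echo "$key -----> $list" ;;
    esac
done
```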
