How to fill empty lines from one file with corresponding lines from another file, in BASH? - bash

I have two files, file1.txt and file2.txt. Each has an identical number of lines, but some of the lines in file1.txt are empty. This is easiest to see when the content of the two files is displayed in parallel:
file1.txt    file2.txt
cat          bear
fish         eagle
spider       leopard
             snail
catfish      rainbow trout
             snake
             koala
rabbit       fish
I need to assemble these files together, such that the empty lines in file1.txt are filled with the data found in the lines (of the same line number) from file2.txt. The result in file3.txt would look like this:
cat
fish
spider
snail
catfish
snake
koala
rabbit
The best I can do so far, is create a while read -r line loop, create a counter that counts how many times the while loop has looped, then use an if-conditional to check if $line is empty, then use cut to obtain the line number from file2.txt according to the number on the counter. This method seems really inefficient.
Sometimes file2.txt might contain some empty lines. If file1.txt has an empty line and file2.txt also has an empty line in the same place, the result is an empty line in file3.txt.
How can I fill the empty lines in one file with corresponding lines from another file?

paste file1.txt file2.txt | awk -F '\t' '$1 { print $1 ; next } { print $2 }'

Here is the way to handle these files with awk:
awk 'FNR==NR {a[NR]=$0;next} {print (NF?$0:a[FNR])}' file2 file1
cat
fish
spider
snail
catfish
snake
koala
rabbit
First, it stores every line of file2 in array a, using the record number as the index.
Then it reads file1 and tests whether each record contains any data.
If the record has data, it is printed; if not, the line with the same number from file2 is printed instead.
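The three steps above can be tried against the sample data from the question. This is a minimal sketch: the function name fill_blanks and the temporary directory are assumptions made for illustration, and it assumes file2 is not empty (otherwise the FNR==NR test also matches file1).

```bash
# fill_blanks FILE1 FILE2 (hypothetical helper name)
# Prints FILE1 with each blank line replaced by the same-numbered line of FILE2.
fill_blanks() {
    awk '
        FNR == NR { a[FNR] = $0; next }   # first operand (FILE2): remember every line
        { print (NF ? $0 : a[FNR]) }      # second operand (FILE1): keep lines that have fields
    ' "$2" "$1"
}

# Sample data from the question, written to a throwaway directory.
demo_dir=$(mktemp -d)
printf '%s\n' cat fish spider '' catfish '' '' rabbit > "$demo_dir/file1.txt"
printf '%s\n' bear eagle leopard snail 'rainbow trout' snake koala fish > "$demo_dir/file2.txt"

fill_blanks "$demo_dir/file1.txt" "$demo_dir/file2.txt" > "$demo_dir/file3.txt"
cat "$demo_dir/file3.txt"
```

Because the test is NF rather than a comparison with the empty string, a line in file1 that contains only whitespace is also treated as empty.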

One with getline (harmless in this case) :
awk '{getline p<f; print NF?$0:p; p=x}' f=file2 file1

Just for fun:
paste file1.txt file2.txt | sed -E 's/^\t//' | cut -f1
This deletes tabs that are at the beginning of a line (those missing from file1) and then takes the first column.
(For OSX, \t doesn't work in sed, so to get the TAB character, you type ctrl-V then Tab)

a solution without awk :
paste -d"#" file1 file2 | sed 's/^#\(.*\)/\1/' | cut -d"#" -f1

Here is a Bash only solution.
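for i in 1 2; do
    # IFS= and -r keep leading whitespace and backslashes intact
    while IFS= read -r line; do
        if [ "$i" -eq 1 ]; then
            arr1+=("$line")
        else
            arr2+=("$line")
        fi
    done < "file${i}.txt"
done
# Walk the indices of arr1; fall back to arr2 when the entry is empty
for r in "${!arr1[@]}"; do
    if [[ -n ${arr1[$r]} ]]; then
        printf '%s\n' "${arr1[$r]}"
    else
        printf '%s\n' "${arr2[$r]}"
    fi
done > file3.txt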
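A shell-only variant that avoids arrays is to read both files in lockstep, one line from each per iteration, through two extra file descriptors. This is a sketch, not part of the answer above: the function name merge_lockstep and the descriptor numbers 3 and 4 are arbitrary choices, and both files are assumed to end with a newline.

```bash
# merge_lockstep FILE1 FILE2 (hypothetical helper name)
# Prints each line of FILE1, or the same-numbered line of FILE2 when it is empty.
merge_lockstep() {
    # fd 3 reads FILE1, fd 4 reads FILE2; the loop stops when either file ends
    while IFS= read -r from_one <&3 && IFS= read -r from_two <&4; do
        if [ -n "$from_one" ]; then
            printf '%s\n' "$from_one"
        else
            printf '%s\n' "$from_two"
        fi
    done 3< "$1" 4< "$2"
}

# Sample data from the question, written to a throwaway directory.
lock_dir=$(mktemp -d)
printf '%s\n' cat fish spider '' catfish '' '' rabbit > "$lock_dir/file1.txt"
printf '%s\n' bear eagle leopard snail 'rainbow trout' snake koala fish > "$lock_dir/file2.txt"

merge_lockstep "$lock_dir/file1.txt" "$lock_dir/file2.txt" > "$lock_dir/file3.txt"
cat "$lock_dir/file3.txt"
```

Each file is read exactly once, so no counter and no per-line lookup into file2.txt is needed.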

Related

replace different text in different lines using sed

I need to do the following:
I have two files, the first one contains only the lines that are going to be modified:
1
2
3
and the second contains the text that is going to be replaced in original file (final_output.txt)
13e
19f
16a
the original file is
wire1: 0x'd318
wire2: 0x'd415
wire3: 0x'd362
I want to get the following:
wire1: 0x13e
wire2: 0x19f
wire3: 0x16a
This is only a part of final_output.txt; the file can contain 100 lines or more. I intend to do it with a for loop, but I don't know how to implement it.
awk to the rescue!
assuming the part after the single quote will be replaced.
$ awk -v q="'" 'NR==FNR {a[$1]=$2;next}
FNR in a {sub(q".*",a[FNR])}1' <(paste index rep) file
index is the index file, rep is the replacement file, and file is the original data file.
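To see what the command does, it can be run on the data from the question. The sketch below is an illustration under a few assumptions: the files live in a temporary directory, a fourth line (wire4) is added only to show that lines not listed in index are left untouched, and the paste output is piped in as "-" instead of using process substitution.

```bash
wire_dir=$(mktemp -d)
printf '%s\n' 1 2 3 > "$wire_dir/index"
printf '%s\n' 13e 19f 16a > "$wire_dir/rep"
# wire4 is an added, illustrative line that is not listed in index
printf '%s\n' "wire1: 0x'd318" "wire2: 0x'd415" "wire3: 0x'd362" "wire4: 0x'd999" > "$wire_dir/file"

# paste joins index and rep into "line-number<TAB>replacement" pairs on stdin ("-")
paste "$wire_dir/index" "$wire_dir/rep" |
awk -v q="'" '
    NR == FNR { a[$1] = $2; next }        # stdin: map line number to replacement text
    FNR in a  { sub(q ".*", a[FNR]) }     # data file: replace from the quote to end of line
    { print }
' - "$wire_dir/file" > "$wire_dir/out"
cat "$wire_dir/out"
```

Note that an & in a replacement string would be interpreted by sub() as "the matched text", so this sketch assumes the replacement values are plain alphanumerics as in the question.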
Another solution where file1 contains only the lines, file2 contains the text that is going to be replaced in original file and final_output.txt contains your original text.
for ((i=1;i<=$(wc -l < file1);i++)); do sed -i "$(sed -n "${i}p" file1)s#$(sed -n "$(sed -n "${i}p" file1)p" final_output.txt | grep -oP "'.*")#$(sed -n "${i}p" file2)#g" final_output.txt; done
Output
darby@Debian:~/Scrivania$ cat final_output.txt
wire1: 0x13e
wire2: 0x19f
wire3: 0x16a
darby@Debian:~/Scrivania$

Unix-Read File Line by line.Check if a string exists on another file and do required operation

I need some assistance on the below.
File1.txt
aaa:/path/to/aaa:777
bob:/path/to/bbb:700
ccc:/path/to/ccc:600
File2.txt
aaa:/path/to/aaa:700
bbb:/path/to/bbb:700
ccc:/path/to/ccc:644
I should iterate over File2.txt, and if aaa exists in File1.txt, then I should compare the file permissions. If the permission for aaa is the same in both files, ignore it.
If they are different, write the line to output.txt.
So in above case
Output.txt
aaa:/path/to/aaa:700
ccc:/path/to/ccc:644
How can i achieve this in unix shell script? Please suggest
I agree with the comment of @Marc that you should try something before asking here.
However, the following answer is difficult to find if you have never seen these constructions, so I will give you something to study.
When you want to parse line by line, you can start with
while IFS=: read -r file path mode; do
comparewith=$(grep "^${file}:${path}:" File2.txt | cut -d: -f3)
# compare and output
done < File1.txt
For large files that will become very slow.
You can first filter the lines you want to compare from File2.txt.
You want to grep strings like aaa:/path/to/aaa:, including the last :. With cut -d: -f1-2 you might be fine with your inputfile, but maybe it is better to remove the last three characters:
sed 's/...$//' File1.txt.
You can let grep use the output as a file with expressions using <():
grep -f <(sed 's/...$//' File1.txt) File2.txt
Your example files don't show the situation when both files have identical lines (that you want to skip), you will need another process substitution to get that working:
grep -v -f File1.txt <(grep -f <(sed 's/...$//' File1.txt ) File2.txt )
Another solution, worth trying yourself, is using awk (see What is "NR==FNR" in awk? for accessing 2 files).
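One possible shape of that awk approach is sketched below. It is an illustration rather than the answer's own code: the function name compare_modes is hypothetical, and it assumes that a record is identified by its first two colon-separated fields (name and path), as in the grep pattern used earlier.

```bash
# compare_modes FILE1 FILE2 (hypothetical helper name)
# Prints the FILE2 lines whose name:path also occurs in FILE1 with a different mode.
compare_modes() {
    awk -F: '
        NR == FNR { mode[$1 FS $2] = $3; next }   # FILE1: remember the mode per name:path
        { key = $1 FS $2 }                        # FILE2: build the same key
        key in mode && (mode[key] "") != ($3 "") { print }   # compare modes as strings
    ' "$1" "$2"
}

# Sample data from the question, written to a throwaway directory.
perm_dir=$(mktemp -d)
printf '%s\n' 'aaa:/path/to/aaa:777' 'bob:/path/to/bbb:700' 'ccc:/path/to/ccc:600' > "$perm_dir/File1.txt"
printf '%s\n' 'aaa:/path/to/aaa:700' 'bbb:/path/to/bbb:700' 'ccc:/path/to/ccc:644' > "$perm_dir/File2.txt"

compare_modes "$perm_dir/File1.txt" "$perm_dir/File2.txt" > "$perm_dir/Output.txt"
cat "$perm_dir/Output.txt"
```

Both files are read once, so this avoids running grep for every line of File1.txt.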
comm - compare two sorted files line by line
According to manual, comm -13 <file1> <file2> must print only lines unique to <file2>:
$ ls
File1.txt File2.txt
$ cat File1.txt
aaa:/path/to/aaa:777
bbb:/path/to/bbb:700
ccc:/path/to/ccc:600
$ cat File2.txt
aaa:/path/to/aaa:700
bbb:/path/to/bbb:700
ccc:/path/to/ccc:644
$ comm -13 File1.txt File2.txt
aaa:/path/to/aaa:700
ccc:/path/to/ccc:644
$ # Nice!
But it doesn't check for lines in <file1> that are "similar" to corresponding lines of <file2>. I.e., it won't work as you want if File1.txt has the line BOB:/path/to/BOB:700 and File2.txt has BBB:/path/to/BBB:700, since it will print the latter (while you want it not to be printed).
It also won't do what you want if strings bbb:/path/to/bbb:700 and bbb:/another/path/to/bbb:700 are supposed to be "identical".

Creating a script that checks to see if each word in a file

I am pretty new to Bash and scripting in general and could use some help. Each word in the first file is separated by \n while the second file could contain anything. If the string in the first file is not found in the second file, I want to output it. Pretty much "check if these words are in these words and tell me the ones that are not"
File1.txt contains something like:
dog
cat
fish
rat
file2.txt contains something like:
dog
bear
catfish
magic ->rat
I know I want to use grep (or do I?) and the command would be (to my best understanding):
$ foo.sh file1.txt file2.txt
Now for the script...
I have no idea...
grep -iv $1 $2
Give this a try. This is straightforward and not optimized, but it does the trick (I think):
while read line ; do
fgrep -q "$line" file2.txt || echo "$line"
done < file1.txt
There is a fun variant below that runs 4 fgrep processes in parallel and collects the results in an additional result.txt file.
> result.txt
nb_parallel=4
while read -r line ; do
    # Wait while more than nb_parallel background jobs are still listed
    while [ $(jobs | wc -l) -gt "$nb_parallel" ]; do sleep 1; done
    fgrep -q "$line" file2.txt || echo "$line" >> result.txt &
done < file1.txt
wait
cat result.txt
You can increase the value 4 to run more fgrep processes in parallel, depending on the number of CPUs and cores and the available IOPS.
With the -f flag you can tell grep to use a file.
grep -vf file2.txt file1.txt
To get a good match on complete lines, use
grep -vFxf file2.txt file1.txt
As @anubhava commented, this will not match substrings. To fix that, we will use the result of grep -Fof file1.txt file2.txt (all the relevant keywords).
Combining these will give
grep -vFxf <(grep -Fof file1.txt file2.txt) file1.txt
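The two stages of that combination can be looked at separately. The sketch below assumes GNU grep, writes the intermediate keyword list to a file (found.txt, a name chosen here) instead of using process substitution, and adds one word, bird, to the question's file1.txt so that there is something left to report.

```bash
kw_dir=$(mktemp -d)
# "bird" is an added, illustrative word; the rest is the data from the question
printf '%s\n' dog cat fish rat bird > "$kw_dir/file1.txt"
printf '%s\n' dog bear catfish 'magic ->rat' > "$kw_dir/file2.txt"

# Step 1: every keyword from file1 that occurs anywhere in file2, one per line.
grep -Fof "$kw_dir/file1.txt" "$kw_dir/file2.txt" > "$kw_dir/found.txt"

# Step 2: drop the found keywords from file1 (whole-line, fixed-string match);
# what remains was never seen in file2.
grep -vFxf "$kw_dir/found.txt" "$kw_dir/file1.txt" > "$kw_dir/missing.txt"
cat "$kw_dir/missing.txt"
```

If every keyword is found, step 2 prints nothing and exits with status 1, which is worth remembering when the pipeline runs under set -e.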
Using awk you can do:
awk 'FNR==NR{a[$0]; next} {for (i in a) if (index(i, $0)) next} 1' file2 file1
rat
You can simply do the following:
comm -2 -3 file1.txt file2.txt
and also:
diff -u file1.txt file2.txt
I know you were looking for a script, but I don't think there is any reason to write one. If you still want a script, you can just run these commands from it.
similar awk
$ awk 'NR==FNR{a[$0];next} {for(k in a) if(k~$0) next}1' file2 file1
rat

grep "output of cat command - every line" in a different file

Sorry, the title of this question is a little confusing, but I couldn't think of anything else.
I am trying to do something like this
cat fileA.txt | grep `awk '{print $1}'` fileB.txt
fileA contains 100 lines while fileB contains 100 million lines.
What I want is get id from fileA, grep that id in a different file-fileB and print that line.
e.g fileA.txt
1234
1233
e.g.fileB.txt
1234|asdf|2012-12-12
5555|asdd|2012-11-12
1233|fvdf|2012-12-11
Expected output is
1234|asdf|2012-12-12
1233|fvdf|2012-12-11
Getting rid of cat and awk altogether:
grep -f fileA.txt fileB.txt
awk alone can do that job well:
awk -F'|' 'NR==FNR{a[$0];next;}$1 in a' fileA fileB
see the test:
kent$ head a b
==> a <==
1234
1233
==> b <==
1234|asdf|2012-12-12
5555|asdd|2012-11-12
1233|fvdf|2012-12-11
kent$ awk -F'|' 'NR==FNR{a[$0];next;}$1 in a' a b
1234|asdf|2012-12-12
1233|fvdf|2012-12-11
EDIT
add explanation:
-F'|' #| as field separator (fileA)
'NR==FNR{a[$0];next;} #save lines in fileA in array a
$1 in a #if $1(the 1st field) in fileB in array a, print the current line from FileB
I cannot explain the further details here, sorry (for example, how awk handles two files, or what NR and FNR are). I suggest trying this awk line in case the accepted answer didn't work for you. If you want to dig a little deeper, read some awk tutorials.
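One practical difference between the awk line and a plain grep -f is where the id is allowed to match. The sketch below illustrates this under stated assumptions: the files live in a temporary directory, and the last line of fileB.txt is invented for the illustration so that an id from fileA.txt appears in the second field.

```bash
id_dir=$(mktemp -d)
printf '%s\n' 1234 1233 > "$id_dir/fileA.txt"
# The last line is illustrative: id 1234 appears in the second field, not the first
printf '%s\n' '1234|asdf|2012-12-12' '5555|asdd|2012-11-12' \
              '1233|fvdf|2012-12-11' '7777|1234|2012-10-10' > "$id_dir/fileB.txt"

# Exact comparison against the first field only
awk -F'|' 'NR == FNR { ids[$0]; next } $1 in ids' \
    "$id_dir/fileA.txt" "$id_dir/fileB.txt" > "$id_dir/by_field.txt"

# Pattern match anywhere in the line
grep -f "$id_dir/fileA.txt" "$id_dir/fileB.txt" > "$id_dir/anywhere.txt"

echo "awk, first field only:"; cat "$id_dir/by_field.txt"
echo "grep -f, anywhere in the line:"; cat "$id_dir/anywhere.txt"
```

With 100 ids and 100 million lines, the awk version also keeps only the small id list in memory and does one hash lookup per line of fileB.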
If the id's are on distinct lines you could use the -f option in grep as such:
cut -d "|" -f1 < fileB.txt | grep -F -f fileA.txt
The cut command will ensure that only the first field is searched for in the pattern searching using grep.
From the man page:
-f FILE, --file=FILE
Obtain patterns from FILE, one per line.
The empty file contains zero patterns, and therefore matches nothing.
(-f is specified by POSIX.)

"while read LINE do" and grep problems

I have two files.
file1.txt:
Afghans
Africans
Alaskans
...
where file2.txt contains the output from a wget on a webpage, so it's a big sloppy mess, but does contain many of the words from the first list.
Bashscript:
cat file1.txt | while read LINE; do grep $LINE file2.txt; done
This did not work as expected. I wondered why, so I echoed out the $LINE variable inside the loop and added a sleep 1, so i could see what was happening:
cat file1.txt | while read LINE; do echo $LINE; sleep 1; grep $LINE file2.txt; done
The output in the terminal looks something like this:
Afghans
Africans
Alaskans
Albanians
Americans
grep: Chinese: No such file or directory
: No such file or directory
Arabians
Arabs
Arabs/East Indians
: No such file or directory
Argentinans
Armenians
Asian
Asian Indians
: No such file or directory
file2.txt: Asian Naruto
...
So you can see it did finally find the word "Asian". But why does it say:
No such file or directory
?
Is there something weird going on or am I missing something here?
What about
grep -f file1.txt file2.txt
@OP, first, use dos2unix as advised. Then use awk:
awk 'FNR==NR{a[$1];next}{ for(i=1;i<=NF;i++){ if($i in a) {print $i} } } ' file1 file2_wget
Note: using while loop and grep inside the loop is not efficient, since for every iteration, you need to invoke grep on the file2.
@OP, crude explanation:
For the meaning of FNR and NR, please refer to the gawk manual. FNR==NR{a[$1];next} means storing the contents of file1 in array a. When FNR is not equal to NR (which means the 2nd file is being read), it checks whether each word in the line is in array a. If it is, the word is printed. (The for loop is used to iterate over each word.)
Use more quotes and use less cat
while IFS= read -r LINE; do
grep "$LINE" file2.txt
done < file1.txt
As well as the quoting issue, the file you've downloaded contains CRLF line endings which are throwing read off. Use dos2unix to convert file1.txt before iterating over it.
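If converting the file with dos2unix is not an option, the carriage return can also be stripped inside the loop. The following is a sketch under assumptions: the function name search_words is hypothetical, the sample files are invented for the illustration, and the words are treated as fixed strings (grep -F) rather than regular expressions.

```bash
# search_words WORDFILE DATAFILE (hypothetical helper name)
# Prints the DATAFILE lines that contain any word from WORDFILE,
# tolerating CRLF line endings in WORDFILE.
search_words() {
    cr=$(printf '\r')
    while IFS= read -r word || [ -n "$word" ]; do
        word=${word%"$cr"}              # drop one trailing carriage return, if present
        [ -n "$word" ] || continue      # an empty pattern would match every line
        grep -F -- "$word" "$2"
    done < "$1"
}

# Illustrative data: a word list with CRLF endings and a small "web page"
crlf_dir=$(mktemp -d)
printf 'Afghans\r\nAsian Indians\r\nAsian\r\n' > "$crlf_dir/file1.txt"
printf '%s\n' 'page text mentioning Asian Naruto' 'an unrelated line' > "$crlf_dir/file2.txt"

search_words "$crlf_dir/file1.txt" "$crlf_dir/file2.txt"
```

Quoting "$word" is what keeps an entry such as "Asian Indians" from being split into a pattern and a file name, which is the likely source of the "No such file or directory" messages in the question.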
Although using awk is faster, grep produces a lot more detail with less effort. So, after running dos2unix, use:
grep -F -i -n -f <file_containing_pattern> <file_containing_data_blob>
You will have all the matches + line numbers (case insensitive)
At minimum this will suffice to find all the words from file_containing_pattern:
grep -F -f <file_containing_pattern> <file_containing_data_blob>
