find and replace words in files - bash

I have two files: file1 and file2.
Any match in file2 should append "-W" to the word in file1.
File1:
Verb=Applaud,Beg,Deliver
Adjective=Bitter,Salty,Minty
Adverb=Quickly,Truthfully,Firmly
file2:
Gate
Salty
Explain
Quickly
Hook
Deliver
Earn
Jones
Applaud
Take
Output:
Verb=Applaud-W,Beg,Deliver-W
Adjective=Bitter,Salty-W,Minty
Adverb=Quickly-W,Truthfully,Firmly
Tried but not working and may take too long:
for i in `cat file2` ; do
nawk -v DEE="$i" '{gsub(DEE, DEE"-W")}1' file1 > newfile
mv newfile file1
done

This should work:
sed 's=^=s/\\b=;s=$=\\b/\&-W/g=' file2 | sed -f- file1
Output:
Verb=Applaud-W,Beg,Deliver-W
Adjective=Bitter,Salty-W,Minty
Adverb=Quickly-W,Truthfully,Firmly
To make changes in place:
sed 's=^=s/\\b=;s=$=\\b/\&-W/g=' file2 | sed --in-place -f- file1

Your approach was not that bad but I would prefer sed here, since it has an in place option.
while read i
do
sed -i "s/$i/$i-W/g" file1
done < file2

Here is one using pure bash:
#!/bin/bash
while read line
do
while read word
do
if [[ $line =~ $word ]]; then
line="${line//$word/$word-W}"
fi
done < file2
echo $line
done < file1

An awk:
awk 'BEGIN{FS=OFS=",";RS="=|\n"}
NR==FNR{a[$1]++;next}
{
for (i=1;i<=NF;i++){
$i=($i in a) ? $i"-W":$i
}
printf("%s%s",$0,FNR%2?"=":"\n")
}' file2 file1
Results
Verb=Applaud-W,Beg,Deliver-W
Adjective=Bitter,Salty-W,Minty
Adverb=Quickly-W,Truthfully,Firmly

Related

Estimate number of lines in a file and insert that value as first line

I have many files for which I have to estimate the number of lines in each file and add that value as first line. To estimate that, I used something like this:
wc -l 000600.txt | awk '{ print $1 }'
However, no success on how to do it for all files and then to add the value corresponding to each file as first line.
An example:
a.txt b.txt c.txt
>>print a
15
>> print b
22
>>print c
56
Then 15, 22 and 56 should be added respectively to: a.txt b.txt and c.txt
I appreciate the help.
You can add a pattern for example (LINENUM) in first line of file and then use the following script.
wc -l a.txt | awk 'BEGIN {FS =" ";} { print $1;}' | xargs -I {} sed -i 's/LINENUM/LINENUM:{}/' a.txt
or just use from this script:
wc -l a.txt | awk 'BEGIN {FS =" ";} { print $1;}' | xargs -I {} sed -i '1s/^/LINENUM:{}\n/' a.txt
This way you can add the line number as the first line for all *.txt files in current directory. Also using that group command here would be faster than inplace editing commands, in case of large files. Do not modify spaces or semicolons into the grouping.
for f in *.txt; do
{ wc -l < "$f"; cat "$f"; } > "${f}.tmp" && mv "${f}.tmp" "$f"
done
For iterate over the all file you can add use from this script.
for f in `ls *` ; do if [ -f $f ]; then wc -l $f | awk 'BEGIN {FS =" ";} { print $1;}' | xargs -I {} sed -i '1s/^/LINENUM:{}\n/' $f ; fi; done
This might work for you (GNU sed):
sed -i '1h;1!H;$!d;=;x' file1 file2 file3 etc ...
Store each file in memory and insert the last lines line number as the file size.
Alternative:
sed -i ':a;$!{N;ba};=' file?

bash to get information from two files

I have file1:
NM_000014 A2M
NM_000015 NAT2
NM_000016 ACADM
NM_000017 ACADS
NM_000018 ACADVL
NM_000019 ACAT1
NM_000020 ACVRL1
NM_000021 PSEN1
NM_000022 ADA
And file2:
NM_000019
NM_000020
NM_000020
NM_12345
I need to get information from my file1 and put it to file2 - so create file3:
NM_000019 ACAT1
NM_000020 ACVRL1
NM_000020 ACVRL1
NM_12345 NO
Note - I can not change a original sort order (so not use comm and diff). I have duplication line in file2 - this I need keep (wc -l file2 == wc -l file3). If there is no match - print NO
I have about 70K rows and I do not need fastest solution.
My code is able just compare and print the same results.
code:
#!/bin/bash
while read -r c; do
grep $c file1 | uniq
done < file2 > file3
Using awk:
$ awk 'NR==FNR{a[$1]=$2;next} {print ($1 in a?$1 OFS a[$1]:$1 OFS "NO")}' file1 file2
NM_000019 ACAT1
NM_000020 ACVRL1
NM_000020 ACVRL1
NM_12345 NO
Explained:
NR==FNR{ # process the first file
a[$1]=$2 # hash records to a, $1 as key
next # skip to next record
}
{ # process the second file
print ($1 in a?$1 OFS a[$1]:$1 OFS "NO") # print hashed value if found or NO
# if($1 in a) # another way of saying above
# print $1, a[$1]
# else
# print $1, "NO"
}
So basically you have one file with patterns, and a second one that you want to search using those patterns:
#!/bin/bash
for PATTERN in $(cat $2); do
TMP=$(egrep $PATTERN $1)
if [ ! -z "$TMP" ]; then
echo "$TMP"
else
echo "$PATTERN NO"
fi
done
and a quick test:
$ bash filter.sh file1 file2
NM_000019 ACAT1
NM_000020 ACVRL1
NM_000020 ACVRL1
NM_12345 NO
Try with this if sentence added to your code:
if ! grep -q $i fileone ; then
echo -e $i " NO"
fi
For example:
#!/bin/bash
while read -r c; do grep $c fileone | uniq; done < filetwo
for i in $(cat filetwo)
do
if ! grep -q $i fileone ; then
echo -e $i " NO"
fi
done
It will print NO in case of no matches of a line of file2 in file1.
Try paste command. This is less noble form than awk. I prefer awk but paste command should help you.
paste file1 file2 file3... etc ..fileN
You can redirect command output to a file as usual.
paste file1 file2 file3... etc ..fileN > fileN+1 (or whatever)
Thats read files line by line and paralelize output sequentially way.
That's it. It is not very elegant but sometimes it is very useful until you find a different way to get the results you are looking for.
Hope that helps

Print all lines in "file2" which have line number stored in "file1" $2

File1:
count line_num
xy 55
ab 67
File2:
a|b|c
d|e|f
I want to print 55, 67 line numbers of file2
am trying:
#!/usr/bin/ksh
while read file_name; do
line_num=`echo $file_name | awk '{print $2}'`
awk 'NR==$line_num{print;exit}' file2 >> file3.txt
done < file1
but it's not working!
Using awk you can do:
awk 'NR==FNR{line[$2]; next} FNR in line' file1 file2
We iterate the first file and store second column in a map called line (we could ignore the first line which is the header by doing NR>1 but since it doesn't contain numbers we don't need to). Once the first file is loaded in map, we iterate the second file and print out lines that are in our map. NR and FNR are awk variables that remembers the line numbers.
You can use awk to read the line numbers in a loop and sed to print out the specific lines:
while read a; do sed -n ${a}p f2.txt; done < <(awk 'NR>1{print$2}' f1.txt)
If you have a bigger file, performance can be an issue as Ed pointed out, in that case you can use awk alone:
awk 'NR==FNR{if(NR>1)l[$2]=1;next}{if(l[FNR])print $0}' f1.txt f2.txt
Another way, is to use xargs:
awk 'NR>1{print $2}' f1.txt | xargs -n1 -I {} sed -n {}p f2.txt
Use sed to construct a sed one-liner (in the case of file1 it'd output and run sed -n "55p;67p;" file2):
sed -n "$(sed -n '2~1{s/.* //;s/.*/&p/p}' file1)" file2
A good advertisement for awk, alas!

delete all lines from a file using string from another file with sed

I have a file1:
hello
world
And file2:
hello A
hello X
world B
byebye C
And I want to delete all lines that match the string in file1 from file2, to get this output:
byebye C
I am a beginner programming so I could only come up with this:
for i in {1..2}
do p=`sed -n ${i}p file1`
sed '/$p\t\w/d' file2 > file2.tmp && mv file.tmp file2
done
Thanks for the help!
Using awk:
awk 'FNR==NR{a[$0];next} !($1 in a)' f1 f2
byebye C
Using grep -vf
grep -vFf f1 f2
byebye C
You can use this grep command
grep -vf file1 file2 > file2
Your coding:
for i in {1..2}
do
p=`sed -n ${i}p file1`
sed -i "/^$p/d" file2
done
Why didnt you code work
sed does not understand \w
solution use -r flag for extended regex
'/$p\t\w/d' the variables are not expanded in single quotes
solution use double quotes instead
Corrected
$for i in {1..2} ;
do
p=`sed -n ${i}p file1`
sed -r "/$p\t\w/d" file2 > file2.tmp && mv file2.tmp file2
done
$ cat file2
byebye C
This will provide output as
byebye C
Note
Offcourse this provides the expected output, But there is always some other easy ways of accomplishing the task as in other answers

How can I print the lines of four files together?

I have four files and i want to print the 1st line of file1, file2, file3, file4 , then the second line of file1,file2,file3,file4, and then the 3rd line of each file and so on
I tried the following code but it gave me an error:
for i in $(cat $file1)
do
for j in $(cat $file2)
do
for k in $(cat $file3)
do
for l in $(cat $file4)
echo "${i}"
echo "${j}"
echo "${k}"
echo "${l}"
done
done
done
done
so what can i use other than echo ?
There is s tool for that already.
paste "$file1" "$file2" "$file3" "$file4"
Use paste -d $'\n' if you don't want columnar output. (Thanks, #AnsgarWiechers!)
Use paste.
paste file1 file2 file3 file4
Will this do it for you?
paste -d '\n' file1 file2 file3 ...
If you want the contents the files on one line:
paste file1 file2 file3 ...

Resources