add ### at the beginning of a line if it matches the content of a list of strings in another file - shell

I have a file with some strings; I need to grep for these strings in another file and, if they match, add ### at the beginning of each matching line.
Assuming this file (1.txt) the file with strings:
123
456
789
and this one the file (2.txt) where to perform the add of the ###:
mko 123 nhy
zaq rte vfr
cde nbv 456
789 bbb aaa
ooo www qqq
I'm expecting this output:
###mko 123 nhy
zaq rte vfr
###cde nbv 456
###789 bbb aaa
ooo www qqq
I've already tried the following without success:
cat 1.txt |while read line ; do sed '/^$line/s/./###&/' 2.txt >2.txt.out; done
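Two things defeat that attempt: the single quotes keep $line from expanding inside the sed script, and the >2.txt.out redirection truncates the output file on every iteration, so only the last string's result survives. A corrected sketch of the same loop idea (assuming GNU sed for -i):

```shell
# Sample files from the question
printf '%s\n' 123 456 789 > 1.txt
printf '%s\n' 'mko 123 nhy' 'zaq rte vfr' 'cde nbv 456' '789 bbb aaa' 'ooo www qqq' > 2.txt

# Work on a copy so 2.txt itself is untouched
cp 2.txt 2.txt.out
while read -r line; do
  # Double quotes let $line expand; -i (GNU sed) edits the copy in
  # place, so marks added by earlier iterations are preserved
  sed -i "/$line/s/^/###/" 2.txt.out
done < 1.txt
cat 2.txt.out
```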

With your shown samples, please try the following awk code.
awk '
FNR==NR{
arr[$0]
next
}
{
for(i=1;i<=NF;i++){
if($i in arr){
$0="###" $0
break
}
}
}
1
' 1.txt 2.txt
Explanation: Adding detailed explanation here.
awk ' ##Starting awk program from here.
FNR==NR{ ##Checking condition when 1.txt is being read.
arr[$0] ##Creating array arr with the current line as its index.
next ##next will skip all further statements from here.
}
{
for(i=1;i<=NF;i++){ ##Traversing through all fields from here.
if($i in arr){ ##Checking if current field is present in arr then do following.
$0="###" $0 ##Adding ### before current line.
break;
}
}
}
1 ##Printing current edited/non-edited line here.
' 1.txt 2.txt ##Mentioning Input_file names here.

This might work for you (GNU sed):
sed 's#.*#/&/s/^#*/###/#' file1 | sed -f - file2
Create a sed script from file1 and run it against file2.

$ while read -r line; do sed -i "/\<$line\>/s/^/###/" 2.txt; done < 1.txt
$ cat 2.txt
###mko 123 nhy
zaq rte vfr
###cde nbv 456
###789 bbb aaa
ooo www qqq

Searching a string and replacing another string above the searched string

I have a file with the lines below
123
456
123
789
abc
efg
xyz
I need to search for abc and replace the nearest 123 above it with 111. The requirement: abc occurs only once in the file, but 123 can occur multiple times and can be at any position above abc.
Please help me.
I have tried with below sed command
sed -i.bak "/abc/!{x;1!p;d;};x;s/123/111/" filename
With the above command, it is only replacing 123, if 123 is just above abc, if 123 is 2 lines above abc then replace is failing.
There's more than one way to do it. Here's one:
sed -i.bak '1{h;d;};/123/{x;p;d;};/abc/{x;s/123/111/;p;d;};H;${x;p;};d' filename
ed comes in handy for complex editing of files in scripts:
ed -s file <<EOF
/^abc$/;?^123$?;.c
111
.
w
EOF
This: Sets the current line to the first one matching abc (/^abc$/;). Then changes the first line before that point that matches 123 to 111 (?XXX? searches backwards for a matching regular expression, and ?^123$?;. selects that single line for c to change) and finally saves the modified file.
This is a classic case where you keep track of your previous line and change it depending on conditions satisfied by the current line. Generally, such an awk program looks like this:
awk '(FNR==1){prev=$0; next}
(condition_on_$0) { action_on_prev }
{ print prev; prev = $0 }
END { print $0 }'
So in the OP's case, when the 123 to replace sits immediately above abc, this would read:
awk '(FNR==1){prev=$0; next}
$0 == "abc" { if (prev == "123") prev = "111" }
{ print prev; prev = $0 }
END { print $0 }'
This might work for you (GNU sed):
sed -Ez 's/(.*)\n123(.*\nabc)/\1\n111\2/' file
This slurps the file into memory and replaces the last occurrence of 123 before abc with 111.
A less memory intensive solution:
sed -E '/^123$/{:a;N;/\n123$/{h;s///p;g;s/.*\n//;ba};/\nabc$/!ba;s/^123/111/}' file
This gathers up lines following a line containing 123. If another line containing 123 is encountered, it offloads all lines before it and begins gathering again. If it finds a line containing abc, it replaces the 123 at the front of the gathered lines with 111.
Another alternative:
sed '/abc/{x;/./{s/^/111\n/p;z};x;b};/123/{x;/./p;x;h;$!d;b};x;/./{x;H;$!d};x' file
Reverse the file with tac, so the problem becomes "replace the first 123 after abc", then reverse back:
$ tac file | awk 'f && sub(/123/,"111"){f=0} /abc/{f=1} 1' | tac
123
456
111
789
abc
efg
xyz

apply dictionary mapping to the column of a file with awk

I have a text file file.txt with several columns (tab separated), and the first column can contain indexes such as 1, 2, and 3. I want to update the first column so that 1 becomes "one", 2 becomes "two", and 3 becomes "three". I created a bash file a.sh containing:
declare -A DICO=( [1]="one" [2]="two" [3]="three" )
awk '{ $1 = ${DICO[$1]}; print }'
But now when I run cat file.txt | ./a.sh I get:
awk: cmd. line:1: { $1 = ${DICO[$1]}; print }
awk: cmd. line:1: ^ syntax error
I'm not able to fix the syntax. Any ideas? Also there is maybe a better way to do this with bash, but I could not think of another simple approach.
For instance, if the input is a file containing:
2 xxx
2 yyy
1 zzz
3 000
4 bla
The expected output would be:
two xxx
two yyy
one zzz
three 000
UNKNOWN bla
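The syntax error comes from ${DICO[$1]} being shell syntax that awk cannot parse: awk never sees shell variables directly. If you do want to keep the mapping in a bash associative array as in a.sh, one possible sketch (the DICO name and sample rows are from the question; the bridging code is just one approach) is to expand the array into an awk BEGIN block from the shell:

```shell
declare -A DICO=( [1]="one" [2]="two" [3]="three" )

# Build an awk program whose BEGIN block copies DICO into awk's own
# array m, then map field 1 through it (UNKNOWN for missing keys)
prog='BEGIN{'
for k in "${!DICO[@]}"; do
  prog+="m[\"$k\"]=\"${DICO[$k]}\";"
done
prog+='} {$1=($1 in m)?m[$1]:"UNKNOWN"} 1'

printf '2 xxx\n1 zzz\n4 bla\n' | awk "$prog"
```

Note that assigning to $1 rebuilds the line with the default OFS (a space); add OFS="\t" as in the answers below if the tabs must be preserved.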
EDIT: Since the OP has now added samples, I have changed the solution accordingly.
awk 'BEGIN{split("one,two,three",array,",")} {$1=$1 in array?array[$1]:"UNKNOWN"} 1' OFS="\t" Input_file
Explanation: Adding explanation for above code too now.
awk '
BEGIN{ ##Starting BEGIN block of awk code here.
split("one,two,three",array,",") ##Creating an array named array whose values are string one two three with delimiter as comma.
}
{
$1=$1 in array?array[$1]:"UNKNOWN" ##Re-creating the first column: if $1 is an index in array then its value becomes array[$1], else it becomes the string UNKNOWN.
}
1 ##Mentioning 1 here. awk works on method of condition then action, so making condition is TRUE here and not mentioning any action so by default print of current line will happen.
' Input_file ##mentioning Input_file name here.
Since you hadn't shown samples I couldn't test completely; could you please try the following and let me know if this helps.
awk 'function check(value){gsub(value,array[value],$1)} BEGIN{split("one,two,three",array,",")} check(1) check(2) check(3); 1' Input_file
Adding a non-one-liner form of the solution too:
awk '
function check(value){
gsub(value,array[value],$1)
}
BEGIN{
split("one,two,three",array,",")
}
check(1)
check(2)
check(3);
1' OFS="\t" Input_file
I tested the code as follows. Let's say we have the following Input_file:
cat Input_file
1213121312111122243434onetwothree wguwvrwvrwvbvrwvrvr
vkewjvrkmvr13232424
Then, after running the code, the following will be the output:
onetwoonethreeonetwoonethreeonetwooneoneoneonetwotwotwo4three4three4onetwothree wguwvrwvrwvbvrwvrvr
vkewjvrkmvronethreetwothreetwo4two4
Given a dico file containing this:
$ cat dico
1 one
2 two
3 three
You could use this awk script:
awk 'NR==FNR{a[$1]=$2;next}($1 in a){$1=a[$1]}1' dico file.txt
This fills the array a with the content of the dico file and replaces the first field of each line of file.txt if that field is present in the array.
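A quick run against the question's sample. Note one difference from the split-based answer above: keys missing from dico, like 4, are left untouched rather than mapped to UNKNOWN:

```shell
printf '1 one\n2 two\n3 three\n' > dico
printf '2 xxx\n2 yyy\n1 zzz\n3 000\n4 bla\n' > file.txt
# First pass (NR==FNR) loads dico into a; second pass rewrites $1
awk 'NR==FNR{a[$1]=$2;next}($1 in a){$1=a[$1]}1' dico file.txt
```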

bash shell - How to replace a matched pattern in a file with content from another file

>cat file1.txt
aa bb
ccc dd
ee fff
>cat file2.txt
1
2
3
I want to get the result like below:
aa1bb
ccc2dd
ee3fff
The space in file1.txt will be replaced by number in file2.txt.
paste + awk approach:
paste file1.txt file2.txt | awk '{ print $1$3$2 }'
The output:
aa1bb
ccc2dd
ee3fff
A straightforward way with awk:
$ awk 'NR==FNR{a[NR]=$0;next}{sub(/ /,a[FNR])}1' file2 file1
aa1bb
ccc2dd
ee3fff
Brief explanation:
NR==FNR{a[NR]=$0;next}: store each record of file2 in array a
sub(/ /,a[FNR]): substitute the first space in the current line of file1 with a[FNR], where FNR is the record number within file1
The appended 1 prints each processed line of file1
This works:
$ paste <(cut -d " " -f1 file1.txt) file2.txt <(cut -d " " -f2 file1.txt) | tr -d $'\t'
aa1bb
ccc2dd
ee3fff
with a bash while-read loop
while read -u3 a b; read -u4 n; do
echo "$a$n$b"
done 3<file1.txt 4<file2.txt
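A quick run of the two-descriptor loop on the sample files (bash-specific: read -u reads from a numbered file descriptor):

```shell
printf '%s\n' 'aa bb' 'ccc dd' 'ee fff' > file1.txt
printf '%s\n' 1 2 3 > file2.txt

# read -u3 takes a line from file1.txt (split into a and b);
# read -u4 takes the matching line from file2.txt
while read -u3 a b; read -u4 n; do
  echo "$a$n$b"
done 3<file1.txt 4<file2.txt
```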

Copy files containing all lines of an input file

I want to copy files in a directory which contain all the lines of an inputFile. Here is an example:
inputFile
Line3
Line1
LineX
Line4
LineB
file1
Line1
Line2
LineX
LineB
file2
Line100
Line10
LineB
Line4
LineX
Line3
Line1
Line4
Line1
The script is expected to copy only file2 to a destination directory since all lines of the inputFile are found in file2 but not in file1.
I could compare each individual file with inputFile as discussed partly here, and copy files manually if the script produced no output. That is:
awk 'NR==FNR{a[$0];next}!($0 in a)' file1 inputFile
Line3
Line4
awk 'NR==FNR{a[$0];next}!($0 in a)' file2 inputFile
indicating there is no need to copy file1; running the same command with file2 produces no output, indicating all lines of inputFile are found in file2, so do cp file2 ../distDir/.
This is time-consuming, and I hope there is some way to do it in a for loop. I am not particular about awk; any bash scripting tool can be used.
Thank you,
Assuming the following:
All the files you need to check are in the current directory
The base file is also in the current directory and named inputFile
The target path is ../distDir/
You may run a bash script like the following, which loops over all the files, compares them against the base file, and copies them if required.
#!/bin/bash
inputFile="./inputFile"
targetDir="../distDir/"
for file in *; do
dif=$(awk 'NR==FNR{a[$0];next}!($0 in a)' "$file" "$inputFile")
if [ -z "$dif" ]; then
# File contains all lines, copy
cp "$file" "$targetDir"
fi
done
bash (with comm + wc commands) solution:
#!/bin/bash
n=$(wc -l inputFile | cut -d' ' -f1) # number of lines of inputFile
for f in /yourdir/file*
do
if [[ $n == $(comm -12 <(sort inputFile) <(sort "$f") | wc -l | cut -d' ' -f1) ]]
then
cp "$f" "/dest/${f##*/}"
fi
done
comm -12 FILE1 FILE2 - output only lines that appear in both files
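To see what comm -12 contributes, compare the question's inputFile against file1 (comm requires sorted input, so sorted copies are made first):

```shell
printf '%s\n' Line3 Line1 LineX Line4 LineB > inputFile
printf '%s\n' Line1 Line2 LineX LineB > file1

sort inputFile > in.sorted
sort file1 > f1.sorted
# Lines common to both: only 3 of inputFile's 5 lines appear in
# file1, so the counts differ and file1 is not copied
comm -12 in.sorted f1.sorted
```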
Could you please try following and let me know if this helps you.
I have written system("echo cp " val " destination_path") here, so once you are happy with the echo result (which simply prints, e.g., cp file2 destination_path), remove the echo and put in the actual destination path.
awk 'function check(array,val,count){
if(length(array)==count){
system("echo cp " val " destination_path")
}
}
FNR==NR{
a[$0];
next
}
val!=FILENAME{
check(a,val,count)
}
FNR==1{
val=FILENAME;
count=total="";
delete b
}
($1 in a) && !b[$1]++{
count++
}
END{
check(a,val,count)
}
' Input_file file1 file2
EDIT1: As per the OP, the files to be compared against Input_file could be named anything, so I changed the code to match that request.
find -type f -exec awk 'function check(array,val,count){
if(length(array)==count){
system("echo cp " val " destination_path")
}
}
FNR==NR{
a[$0];
next
}
val!=FILENAME{
check(a,val,count)
}
FNR==1{
val=FILENAME;
count=total="";
delete b
}
($1 in a) && !b[$1]++{
count++
}
END{
check(a,val,count)
}
' Input_file {} +
Explanation: Adding explanation too as follows.
find -type f -exec awk 'function check(array,val,count){ ##Using find to get only the files in a directory and passing them to awk via -exec. The awk code starts here: creating a function named check, with parameters array, val and count passed in whenever a call is made to it.
if(length(array)==count){ ##Checking here if the length of the array equals the variable count; if yes, do the following.
system("echo cp " val " destination_path") ##Using awk'"'"'s system function, by which we can execute shell commands from an awk script. The echo is there for checking purposes: it prints the copy command whenever all of a file'"'"'s lines match Input_file; the OP should remove the echo once happy with the result.
}
}
FNR==NR{ ##FNR==NR condition will be only TRUE when very first file named Input_file is being read.
a[$0]; ##creating an array named a whose index is current line.
next ##using next keyword will skip all further statements.
}
val!=FILENAME{ ##checking here when variable val is not having same value as current file name then perform following actions.
check(a,val,count) ##calling check function with passing arguments of array a,val,count.
}
FNR==1{ ##Checking if FNR==1, which will be true whenever a new files first line is being read.
val=FILENAME; ##creating variable named val whose value is current Input_file filename.
count=total=""; ##Nullifying variables count and total now.
delete b ##Deleting array b here.
}
($1 in a) && !b[$1]++{ ##Checking if first field of file is in array a and it is not present more than 1 time in array b then do following
count++ ##incrementing variable named count value to 1 each time cursor comes inside here.
}
END{ ##starting awk END block here.
check(a,val,count) ##Calling function named check with arguments array a,val and count in it.
}
' Input_file {} + ##Mentioning Input_file here
PS: I tested/written this in GNU awk.

In a text file replace patterns with other patterns according to another file

Example 1
Let's say I have in file1.txt this:
line 1
line 45
line 3
line 2
line 24
line 1
And in file2.txt this instead:
line 1,WWWW
line 2,EEE
line 3,RRR
What I would like is something that looks into file2.txt, searches for all the terms before the , and replaces them in file1.txt with the terms after the , . I want all the lines not present in file2.txt to be ignored and the order preserved.
So, the expected output should be, file1.txt:
WWWW
line 45
RRR
EEE
line 24
WWWW
Example 2
Now, another example to a different need:
file1.txt:
line1 1
line22 78
line32 65
line3 3
line2 2
line2 2
file2.txt:
line1 1,SONG1 playing: X | NAME1
line2 2,SONG2 playing: Y | NAME2
line3 3,SONG3 playing: Z | NAME3
Expected output should be:
SONG1 playing: X | NAME1
line22 78
line32 65
SONG3 playing: Z | NAME3
SONG2 playing: Y | NAME2
SONG2 playing: Y | NAME2
And keep in mind that the files contain hundreds of thousands of lines (5+ MB worth of text).
EDIT2: Since the OP added another scenario to the question, adding this code to cover it now.
awk 'FNR==NR{val=$1;$1="";sub(/^ +/,"");a[val]=$0;next} $0 in a{$0=a[$0]} 1' FS=',' file2.txt FS=' ' file1.txt
Output will be as follows.
SONG1 playing: X | NAME1
line22 78
line32 65
SONG3 playing: Z | NAME3
SONG2 playing: Y | NAME2
SONG2 playing: Y | NAME2
EDIT: Since the OP changed the sample Input_file and expected output, adding this solution now.
awk '
FNR==NR{
a[$1 OFS $2]=$NF
next
}
a[$0]{
$0=a[$0]
}
1
' FS='[, ]' file2.txt FS=" " file1.txt
Running the above code against the first example's files, the output will be:
WWWW
line 45
RRR
EEE
line 24
WWWW
Could you please try following.
awk '
FNR==NR{
a[$1 OFS $2]=$NF
next
}
a[$0]{
print a[$0]
}
' FS='[, ]' file2.txt FS=" " file1.txt > temp && mv temp file1.txt
Explanation: Adding explanation for above code.
awk ' ##Starting awk program from here.
FNR==NR{ ##Checking condition FNR==NR which will be TRUE when file2.txt is being read.
a[$1 OFS $2]=$NF ##Creating an array named a whose index is $1 OFS $2 and value is $NF.
next ##next will skip further statements from here.
}
a[$0]{ ##Checking condition if a[$0] array a whose index $0 is NOT NULL then do following.
print a[$0] ##Printing value of array a with index $0.
}
' FS='[, ]' file2.txt FS=" " file1.txt > temp && mv temp file1.txt ##Setting FS as comma OR space for file2.txt AND setting FS as space for file1.txt
join -t , file1.txt file2.txt
Close! Specify the output format.
join -t , -o 2.2 file1.txt file2.txt
Remember that input files need to be sorted for join. If the files are not sorted, we can use process substitution in bash to easily sort them on the first field:
join -t, -o2.2 <(sort -t, -k1 file1.txt) <(sort -t, -k1 file2.txt)
Tested on repl.
If you want to preserve the order of the files, then it gets a little harder. You have to number the lines, sort on the joining field, join, re-sort on the line numbers, and remove the line numbers. Let's preserve the sorting order of file2.txt below:
# number lines in file with comma as a separator
nl -w1 -s, file2.txt |
# sort the file on second field
sort -t, -k2 |
# join files on first field from file1.txt, but now on second field from file2
# output only the first and third field from file2.txt
join -11 -22 -t, -o2.1,2.3 <(sort -t, -k1 file1.txt) - |
# re-sort on the initial order from file2.txt
sort -t, -k1 |
# remove the line numbers
cut -d, -f2-
Which is not really nice. That's why an awk solution is usually preferred in non-extreme cases.
What if there were multiple lines in between not present in file2.txt?
join outputs the lines that match; non-matched lines are not output. You can change this behavior with the -a, -e, or -v options of join.
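A sketch of those options on a cut-down version of Example 1 (assuming GNU join; -a1 keeps unpairable lines from file1, and -e supplies the text printed when the requested output field 2.2 is missing):

```shell
printf '%s\n' 'line 1' 'line 45' 'line 3' > file1.txt
printf '%s\n' 'line 1,WWWW' 'line 3,RRR' > file2.txt

# Sorted copies, since join requires sorted input
sort file1.txt > f1.sorted
sort -t, file2.txt > f2.sorted

# Unpaired file1 lines (line 45) are kept and their missing
# field 2.2 is printed as NO-MATCH
join -t, -a1 -e 'NO-MATCH' -o 2.2 f1.sorted f2.sorted
```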
