I have two files:
abc
ghi
and the second (aka database file)
abc 123
def 456
ghi 789
and I want to query the database file to print the second column into the second column of the first file if there is a match
So my output would be
abc 123
ghi 789
Logically, I understand what I have to do, but I don't know the bash commands for it.
My attempt was to use join with the -1 option, but I do not understand how to apply it.
What's wrong with join?
$ cat 1
abc
ghi
$ cat 2
abc 123
def 456
ghi 789
$ join 1 2
abc 123
ghi 789
then if you want to store it somewhere just redirect the stdout.
join is a little overkill here (as it requires sorting) because file1 has just one column. Can you not use grep -f?
grep -Fwf file1 file2
-F treats the content of file1 as strings, not patterns
-w looks for the whole word to match
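One caveat with the grep approach: -w matches the word anywhere on the line, so a key that also shows up in the second column of the database file would match too. A minimal awk sketch that compares only the first column, assuming whitespace-separated files (the temporary files and the extra "xyz abc" row are made up for illustration):

```shell
#!/bin/sh
# Hypothetical sample data; the "xyz abc" row is added to show the difference.
dir=$(mktemp -d)
printf '%s\n' abc ghi > "$dir/file1"
printf '%s\n' 'abc 123' 'def 456' 'ghi 789' 'xyz abc' > "$dir/file2"

# grep matches the key anywhere on the line, so "xyz abc" is printed as well.
grep_out=$(grep -Fwf "$dir/file1" "$dir/file2")

# awk: the first pass stores the keys of file1, the second pass prints the
# lines of file2 whose first field is one of those keys.
awk_out=$(awk 'NR==FNR { keys[$1]; next } $1 in keys' "$dir/file1" "$dir/file2")

printf 'grep:\n%s\nawk:\n%s\n' "$grep_out" "$awk_out"
rm -rf "$dir"
```

Unlike join, this does not need either file to be sorted.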
I have the following file and I would like to swap the values that have the form digits (up to 3 digits)#digits (up to 4 digits), followed by # or a space/end of line.
If # is followed by non digits then it shouldn't interchange them.
Sample Input
cat file1
xyz xyz xyz 123#456#1#34#123#2
xyz xyz xyz xyz xyz
test test
123#456#1#34#123#212#3#456#1#34#123#2#123#xyzxyz xyz
xyz xyz xyz
Sample output:
xyz xyz xyz 456#123#34#1#2#123
xyz xyz xyz xyz xyz
test test
456#123#34#1#212#123#456#3#34#1#2#123#123#xyzxyz xyz
xyz xyz xyz
I have tried the following logic. It seems split is required in order to interchange the values, but I am not able to check the condition or to save the result back into the same field.
awk '{for(a=1;a<=NF;a++){if($a~/#/){split($a,b,"[##]");val1=b[1];val2=b[2];print val1,val2}}}' file1
123 456
123 456
This simple gnu sed should be able to do the job:
sed -E 's/\<([0-9]{1,3})#([0-9]{1,4})(#|$)/\2#\1\3/g' file
xyz xyz xyz 456#123#34#1#2#123
xyz xyz xyz xyz xyz
test test
456#123#34#1#212#123#456#3#34#1#2#123#123#xyzxyz xyz
xyz xyz xyz
Here, \< is used for word boundary.
Note that on BSD sed you have to use [[:<:]] for word boundary:
sed -E 's/[[:<:]]([0-9]{1,3})#([0-9]{1,4})(#|$)/\2#\1\3/g' file
Explanation:
\<: Word boundary
([0-9]{1,3}): Match 1 to 3 digits
#: Match a #
([0-9]{1,4}): Match 1 to 4 digits
(#|$): Match a # or end of line
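To see the pieces of the pattern at work, here is a small sketch that runs the same substitution on two made-up lines: one where both pairs qualify, and one where nothing does (a # followed by non-digits, and a first number with more than 3 digits). It assumes GNU sed, since \< is used; the swap function name is only for illustration.

```shell
#!/bin/sh
# Requires GNU sed (for \<); the input lines are invented for illustration.
swap() {
  sed -E 's/\<([0-9]{1,3})#([0-9]{1,4})(#|$)/\2#\1\3/g'
}

# Both pairs are followed by # or end of line, so both are swapped.
swapped=$(printf '%s\n' 'xyz 123#456#1#34' | swap)

# "123#xyz" has no digits after #, and "1234" is longer than 3 digits,
# so this line comes back unchanged.
kept=$(printf '%s\n' '123#xyz 1234#5' | swap)

printf '%s\n%s\n' "$swapped" "$kept"
```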
With your shown samples, could you please try following. Written and tested in GNU awk.
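awk -v RS='([0-9]{1,3}#[0-9]{1,4}#)+[0-9]{1,3}#[0-9]{1,4}' '
{
  val=""
  delete arr
  # RT holds the text matched by RS; split it into the individual numbers
  num=split(RT,arr,"#")
  # walk the numbers two at a time and emit each pair swapped
  for(i=1;i<num;i+=2){
    val=(val=="" ? "" : val "#") arr[i+1] "#" arr[i]
  }
  # print the swapped text in place of the matched separator
  ORS=val
}
1
' Input_file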
Using GNU sed with \b for the word boundary:
sed -E 's/\b([[:digit:]]{1,3})#([[:digit:]]{1,4})(#|[[:blank:]]*|[[:blank:]]*$)/\2#\1\3/g' infile
Input:
xyz123#456#1#34#1234#2
0123#456# 123#456#
123#456#1#34#123#212#3#456#1#34#123#2#123#xyzx
5678#124 111#110# 002#001 01#010 1111#000
1111#000
Output:
xyz123#456#34#1#1234#2
0123#456# 456#123#
456#123#34#1#212#123#456#3#34#1#2#123#123#xyzx
5678#124 110#111# 001#002 010#01 1111#000
1111#000
I have a file with the lines below
123
456
123
789
abc
efg
xyz
I need to search for abc and replace the nearest 123 above it with 111. abc occurs only once in the file, but 123 can occur multiple times and can be at any position above abc.
Please help me.
I have tried with below sed command
sed -i.bak "/abc/!{x;1!p;d;};x;s/123/1111" filename
With the above command, 123 is replaced only if it is directly above abc; if 123 is two lines above abc, the replacement fails.
There's more than one way to do it. Here's one:
sed -i.bak '1{h;d;};/123/{x;p;d;};/abc/{x;s/123/111/;p;d;};H;${x;p;};d' filename
ed comes in handy for complex editing of files in scripts:
ed -s file <<EOF
/^abc$/;?^123$?;.c
111
.
w
EOF
This: Sets the current line to the first one matching abc (/^abc$/;). Then changes the first line before that point that matches 123 to 111 (?XXX? searches backwards for a matching regular expression, and ?^123$?;. selects that single line for c to change) and finally saves the modified file.
This is a classic case where you keep track of the previous line and change it depending on a condition on the current line. Generally, such an awk program looks like this:
awk '(FNR==1){prev=$0; next}
(condition_on_$0) { action_on_prev }
{ print prev; prev = $0 }
END { print prev }'
So in the case of the OP, this would read:
awk '(FNR==1){prev=$0; next}
$0 == "abc" { if (prev == "123") prev = "111" }
{ print prev; prev = $0 }
END { print prev }' file
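Note that the previous-line pattern only changes a 123 that sits directly above abc. For the stated requirement (the nearest 123 anywhere above abc), one possible sketch is to buffer the lines and remember where the most recent 123 was seen. It assumes the file fits in memory and that lines are compared as whole strings; the sample file is recreated in a temporary location, and the variable names are arbitrary.

```shell
#!/bin/sh
tmp=$(mktemp)
printf '%s\n' 123 456 123 789 abc efg xyz > "$tmp"

result=$(awk '
  { line[NR] = $0 }                       # keep every line in memory
  !seen && $0 == "123" { last = NR }      # most recent 123 before abc
  !seen && $0 == "abc" {                  # first abc: patch the remembered line
    if (last) line[last] = "111"
    seen = 1
  }
  END { for (i = 1; i <= NR; i++) print line[i] }
' "$tmp")

printf '%s\n' "$result"
rm -f "$tmp"
```

With the sample input, only the second 123 (the one closest above abc) becomes 111.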
This might work for you (GNU sed):
sed -Ez 's/(.*)(\n123.*\nabc)/\1\n111\2/' file
This slurps the file into memory and inserts 111 in front of the last occurrence of 123 before abc.
A less memory intensive solution:
sed -E '/^123$/{:a;N;/\n123$/{h;s///p;g;s/.*\n//;ba};/\nabc$/!ba;s/^/111\n/}' file
This gathers up lines following a line containing 123. If another line containing 123 is encountered it offloads all lines before it and begins gathering lines again. If it finds a line containing abc it inserts 111 at the front of the lines gathered so far.
Another alternative:
sed '/abc/{x;/./{s/^/111\n/p;z};x;b};/123/{x;/./p;x;h;$!d;b};x;/./{x;H;$!d};x' file
$ tac file | awk 'f && sub(/123/,"111"){f=0} /abc/{f=1} 1' | tac
123
456
111
789
abc
efg
xyz
File A:
abc
bcd
def
ghi
jkl
File B:
bcd
def
klm
Desired output:
abc
bcd
def
klm
ghi
jkl
Give this awk one-liner a try:
awk '!a[$0]++' fileA fileB > output
It works for your example files.
cat A B | sort -u will remove the repeated lines and sort the result. @Kent's answer is more elegant, but its output still doesn't match the order in your description.
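To make the difference between the two concrete: awk '!a[$0]++' prints each line the first time it is seen, so its output follows the order of the input files, while sort -u reorders everything. Neither produces the interleaved order asked for. A small sketch using the sample files (recreated in a temporary directory, with B given first to make the effect visible):

```shell
#!/bin/sh
dir=$(mktemp -d)
printf '%s\n' abc bcd def ghi jkl > "$dir/A"
printf '%s\n' bcd def klm > "$dir/B"

# First-seen order: the lines of B come first, then the lines only in A.
awk_out=$(awk '!a[$0]++' "$dir/B" "$dir/A")

# Sorted order, regardless of which file is listed first.
sort_out=$(cat "$dir/B" "$dir/A" | LC_ALL=C sort -u)

printf 'awk:\n%s\nsort -u:\n%s\n' "$awk_out" "$sort_out"
rm -rf "$dir"
```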
I am given a file. If a line has "xxx" as its third word then I need to replace it with "yyy". My final output must have all the original lines with the modified lines.
The input file is-
abc xyz mno
xxx xyz abc
abc xyz xxx
abc xxx xxx xxx
The required output file should be-
abc xyz mno
xxx xyz abc
abc xyz yyy
abc xxx yyy xxx
I have tried-
grep "\bxxx\b" file.txt | awk '{if ($3=="xxx") print $0;}' | sed -e 's/[^ ]*[^ ]/yyy/3'
but this gives the output as-
abc xyz yyy
abc xxx yyy xxx
The following simple awk may help you here.
awk '$3=="xxx"{$3="yyy"} 1' Input_file
Output will be as follows.
abc xyz mno
xxx xyz abc
abc xyz yyy
abc xxx yyy xxx
Explanation: If the third field ($3) is equal to the string xxx, set $3 to the string yyy. awk works on condition/action pairs; the trailing 1 is a condition that is always true, and since no action is given for it, the default action (printing the current line) runs for every line, whether or not its third field was changed.
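One side effect worth knowing: assigning to a field makes awk rebuild the line using the output field separator (a single space by default), so runs of spaces or tabs on modified lines are collapsed, while untouched lines are printed exactly as they were. A quick illustration with made-up input:

```shell
#!/bin/sh
# Made-up input: both lines use several spaces (and a tab) between fields.
out=$(printf 'abc   xyz\txxx\nabc   xyz   mno\n' |
  awk '$3=="xxx"{$3="yyy"} 1')

# The first line is rebuilt with single spaces; the second is left as is.
printf '%s\n' "$out"
```

If the original spacing matters, a sed substitution that only touches the third word may be a better fit.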
sed solution:
sed -E 's/^(([^[:space:]]+[[:space:]]+){2})xxx\>/\1yyy/' file
The output:
abc xyz mno
xxx xyz abc
abc xyz yyy
abc xxx yyy xxx
To modify the file in place, add the -i option: sed -Ei ....
In general the awk command may look like
awk '{command set 1}condition{command set 2}' file
The command set 1 would be executed for every line while command set 2 will be executed if the condition preceding that is true.
My final output must have all the original lines with the modified
lines
In your case
awk 'BEGIN{print "Original File";i=1}
{print}
$3=="xxx"{$3="yyy"}
{rec[i++]=$0}
END{print "Modified File";for(i=1;i<=NR;i++)print rec[i]}' file
should solve that.
Explanation
$3 is the third whitespace-delimited field in awk. If it equals "xxx", it is replaced. The unmodified lines are printed first, while the modified lines are stored in an array. At the end, the modified lines are printed. The BEGIN and END blocks are executed only once, at the beginning and at the end respectively. NR is the awk built-in variable that holds the number of records processed so far. Since it is used in the END block, it gives the total number of records.
All good :-)
Ravinder has already provided you with the shortest awk solution possible.
In sed, the following would work:
sed -E 's/^(([^ ]+ ){2})xxx( |$)/\1yyy\3/'
Or if your sed doesn't include -E, you can use the more painful BRE notation:
sed 's/^\(\([^ ][^ ]* \)\{2\}\)xxx\( .*\)\{0,1\}$/\1yyy\3/'
And if you're in the mood to handle this in bash alone, something like this might work:
while read -r line; do
read -r -a a <<<"$line"
[[ "${a[2]}" == "xxx" ]] && a[2]="yyy"
printf '%s ' "${a[@]}"
printf '\n'
done < input.txt
Suppose I have a vi file as the following:
cat file1
abc 123 pqr
lmn 234 rst
jkl 100 mon
I want to take the 2nd field of each line (viz, in this case is 123, 234 and 100) and append it to the end of that same line.
How will I do that?
The output should look like the following:
abc 123 pqr 123
lmn 234 rst 234
jkl 100 mon 100
With awk:
$ awk '{NF=NF+1; $NF=$2}1' file
abc 123 pqr 123
lmn 234 rst 234
jkl 100 mon 100
It increases the number of fields by one and sets the new last field to the value of the second. The trailing 1 is a true condition, which triggers the default awk action: {print $0}.
Or also
awk '{print $0, $2}' file
It prints the full line plus the second field.
Or even shorter, thanks Håkon Hægland!:
awk '{$(NF+1)=$2}1' file
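The same idea carries over to other delimiters if FS and OFS are set together. A sketch assuming comma-separated input (the data is invented for illustration):

```shell
#!/bin/sh
# Invented comma-separated sample; FS and OFS are both set to "," so the
# appended field is joined with a comma as well.
out=$(printf '%s\n' 'abc,123,pqr' 'lmn,234,rst' |
  awk 'BEGIN { FS = OFS = "," } { $(NF+1) = $2 } 1')

printf '%s\n' "$out"
```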
You have many ways to do that in Vi(m). This is the simplest that comes to my mind:
:%norm 0f<space>yaw$p
Explanation:
:{range}norm command executes normal mode command on each line in {range}
% is a shortcut range meaning "all lines in the buffer" so we will execute what follows on every line in the buffer
0 puts the cursor on the first column on the current line (not strictly necessary but good practice)
f<space> jumps the cursor on the first <space> after the cursor on the current line
yaw yanks the word and the <space> under the cursor
$ jumps to the end of the line
p pastes the previously yanked text
From the : prompt, you can also do it in vi with a substitution:
:%s/\( [^ ]*\)\(.*\)/\1\2\1/
Another way, Using sed
sed -r 's/( [^ ]*)(.*)/\1\2\1/' file