How can I replace lines in a text file with lines from another file based on matching key fields? - bash

input.txt
1,Ram,Fail
2,John,Fail
3,Ron,Success
param.txt (New Input)
1,Sam,Success
2,John,Success
Now I want to replace whole lines in input.txt with the matching lines from param.txt.
The 1st column acts as the primary key.
Output.txt
1,Sam,Success
2,John,Success
3,Ron,Success
I tried:
awk 'FNR==NR{a[$1]=$2 FS $3;next}{ print $0, a[$1]}' input.txt param.txt > Output.txt
But it is merging the file contents instead of replacing the lines.

This might work for you (GNU sed):
sed 's|^\([^,]*,\).*|/^\1/c\\&|' param.txt | sed -f - input.txt
Explanation:
Convert param.txt into a sed script, using the first field as an address for a c (change) command that replaces the matching line in input.txt: s|^\([^,]*,\).*|/^\1/c\\&|
Run the generated script against input.txt: sed -f - input.txt
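For the sample param.txt, the first sed produces this script (GNU sed accepts the one-line form of the c command):
/^1,/c\1,Sam,Success
/^2,/c\2,John,Success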

This can be done with one call to sort:
sort -t, -k1,1n -us param.txt input.txt
Use a stable (-s), unique (-u) numerical sort on the first comma-delimited field, and list param.txt before input.txt so that its newer lines are the ones kept when duplicate keys are eliminated.
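With the sample files this should yield the desired output:
$ sort -t, -k1,1n -us param.txt input.txt
1,Sam,Success
2,John,Success
3,Ron,Success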

You could use join(1) to make this work:
$ join -t, -a1 -j1 Input.txt param.txt | sed -E 's/,.*?,.*?(,.*?,.*?)/\1/'
1,Sam,Success
2,John,Success
3,Ron,Success
The sed at the end of the pipe strips the Input.txt fields out of the replaced lines, reducing e.g. the joined line 1,Ram,Fail,Sam,Success to 1,Sam,Success. (Note that sed's ERE has no lazy .*? quantifier, but the pattern still matches as intended here: only replaced lines contain the four commas it requires, so unmatched lines pass through untouched.)
This will work only if both input files are sorted by first field.

Pure awk isn't really the right tool for the job. If you must use only awk, https://stackoverflow.com/a/5467806/1301972 is a good starting point for your efforts.
However, Unix provides some tools that will help with feeding awk the right input for what you're trying to do.
$ join -a1 -t, <(sort -n input.txt) <(sort -n param.txt) |
awk -F, 'NF > 3 {print $1 "," $4 "," $5; next}; {print}'
Basically, you're feeding awk a single file whose lines are joined on the keys from input.txt. awk can then pick out the fields you want for display or for redirection to your output file.
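For reference, the intermediate join output looks like this; matched keys carry five fields while unmatched keys keep their original three, which is exactly what the NF > 3 test distinguishes:
1,Ram,Fail,Sam,Success
2,John,Fail,John,Success
3,Ron,Success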

This should work in awk; it caches each param.txt line under its key, then prints the cached line, if there is one, in place of the original:
awk -F"," 'NR==FNR{a[$1]=$0;next} ($1 in a){ print a[$1]; next}1' param.txt input.txt
Test:
$ cat input.txt
1,Ram,Fail
2,John,Fail
3,Ron,Success
$ cat param.txt
1,Sam,Success
2,John,Success
$ awk -F"," 'NR==FNR{a[$1]=$0;next} ($1 in a){ print a[$1]; next}1' param.txt input.txt
1,Sam,Success
2,John,Success
3,Ron,Success

Related

Move lines in file using awk/sed

Hi, my files look like:
>ID.1
GGAACACGACATCCTGCAGGGTTAAAAAAGAAAAAATCAGTAAAAGTACTGGA
>ID.2
GGAATACCACATCCCGCAGGGTTAAAAAAGAAAAAATCAGTAACAGTACTGGA
and I want to move the lines so that line 1 swaps with line 3, and line 2 swaps with line 4:
>ID.2
GGAATACCACATCCCGCAGGGTTAAAAAAGAAAAAATCAGTAACAGTACTGGA
>ID.1
GGAACACGACATCCTGCAGGGTTAAAAAAGAAAAAATCAGTAAAAGTACTGGA
I have thought about using cut to send the lines into other files, and then bringing them all back in the desired order using paste, but is there a solution using awk/sed?
EDIT: The file always has 4 lines (2 fasta entries), no more.
For such a simple case, as @Ed_Morton mentioned, you can just swap the two-line halves with the head and tail commands:
$ tail -2 test.txt; head -2 test.txt
>ID.2
GGAATACCACATCCCGCAGGGTTAAAAAAGAAAAAATCAGTAACAGTACTGGA
>ID.1
GGAACACGACATCCTGCAGGGTTAAAAAAGAAAAAATCAGTAAAAGTACTGGA
Generic solution with GNU tac to reverse contents:
$ tac -bs'>' ip.txt
>ID.2
GGAATACCACATCCCGCAGGGTTAAAAAAGAAAAAATCAGTAACAGTACTGGA
>ID.1
GGAACACGACATCCTGCAGGGTTAAAAAAGAAAAAATCAGTAAAAGTACTGGA
By default tac reverses the input line-wise, but you can customize the record separator.
Here, I'm assuming > can be safely used as a unique separator (supplied via the -s option). The -b option puts the separator before its record in the output.
Using ed (in-place editing):
# move 3rd to 4th lines to the top
printf '3,4m0\nwq\n' | ed -s ip.txt
# move the last two lines to the top
printf -- '-1,$m0\nwq\n' | ed -s ip.txt
Using sed:
sed '1h;2H;1,2d;4G'
Store the first line in the hold space;
Add the second line to the hold space;
Don't print the first two lines;
Before printing the fourth line, append the hold space to it (i.e. append the 1st and 2nd line).
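A quick check with numbered input lines:
$ seq 4 | sed '1h;2H;1,2d;4G'
3
4
1
2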
The GNU AWK manual has an example of swapping two lines using getline. As you know that
the file always has 4 lines (2 fasta entries), no more,
you only need to handle the case where the number of lines is evenly divisible by 4, and can use getline the following way. Let file.txt content be
>ID.1
GGAACACGACATCCTGCAGGGTTAAAAAAGAAAAAATCAGTAAAAGTACTGGA
>ID.2
GGAATACCACATCCCGCAGGGTTAAAAAAGAAAAAATCAGTAACAGTACTGGA
then
awk '{line1=$0;getline line2;getline line3;getline line4;printf "%s\n%s\n%s\n%s\n",line3,line4,line1,line2}' file.txt
gives output
>ID.2
GGAATACCACATCCCGCAGGGTTAAAAAAGAAAAAATCAGTAACAGTACTGGA
>ID.1
GGAACACGACATCCTGCAGGGTTAAAAAAGAAAAAATCAGTAAAAGTACTGGA
Explanation: store the current line ($0) in variable line1, then read the next line into line2, the next into line3, and the next into line4; finally use printf with 4 placeholders (%s), each followed by a newline (\n), filled in the required order.
(tested in GNU Awk 5.0.1)
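Because the script's main block runs again for each following group of four lines, files whose length is a multiple of four are handled block by block, e.g.:
$ seq 8 | awk '{line1=$0;getline line2;getline line3;getline line4;printf "%s\n%s\n%s\n%s\n",line3,line4,line1,line2}'
3
4
1
2
7
8
5
6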
GNU sed:
sed -zE 's/([^\n]*\n[^\n]*\n)([^\n]*\n[^\n]*\n?)/\2\1/' file
(With -z the whole file sits in the pattern space and sed's . also matches newlines, so the two-line groups have to be spelled out with [^\n]*.)
A Perl:
perl -0777 -pe 's/(.*\R.*\R)(.*\R.*\R?)/\2\1/' file
A Ruby:
ruby -ne 'BEGIN{lines=[]}
lines<<$_
END{puts lines[2...4]+lines[0...2] }' file
Paste and awk:
paste -s file | awk -F'\t' '{print $3, $4, $1, $2}' OFS='\n'
A POSIX pipe:
paste -sd'\t\n' file | nl | sort -nr | cut -f 2- | tr '\t' '\n'
paste joins consecutive pairs of lines with a tab, nl numbers the pairs, sort -nr reverses their order, cut drops the numbers again, and tr splits the pairs back into separate lines.
This seems to work:
awk -F'\n' '{print $3, $4, $1, $2}' OFS='\n' RS= ORS='\n\n' file.txt

Cut fields from one file and grep -v from another file

File1
Ada
Billy
Charles
Delta
Eight
File2
Ada,User,xxx
Beba,User,xxx
Charles,Admin,xxx
I am executing the following:
Acc=`cut -d',' -f1 $PATH/File2
for account in `cat $File1 |grep -v Acc`
do
cat......
sed....
How can I correct this?
Expected output (the File2 accounts that exist in File1):
Ada
Charles
This awk should work for you; it collects the first field of each file2 line, then prints the file1 lines whose content was seen:
awk -F, 'FNR == NR {seen[$1]; next} $1 in seen' file2 file1
Ada
Charles
If this is not the output you're looking for then edit your question and add your expected output.
Your grep command filters out lines containing the literal string Acc; the variable is never expanded. You need the -f flag, which makes grep read its patterns from a file, something like this:
tmpf=/tmp/$$
cut -d',' -f1 File2 >"$tmpf"
for account in $(grep -f "$tmpf" File1)
do
...
done
rm -f "$tmpf"
Depending on your data, you may also want grep -Fx so the patterns match whole lines literally rather than as substrings.

Using grep to pull a series of random numbers from a known line

I have a SimpleScalar output file producing lines like...
bpred_2lev.ras_rate.PP 0.9413 # RAS prediction rate (i.e., RAS hits/used RAS)
Once I use grep to find this line in output.txt, is there a way I can directly grab the "0.9413" portion? I am attempting to make a CSV file and just need whatever value is generated.
Thanks in advance.
There are several ways to combine finding and extracting into a single command:
awk (POSIX-compliant)
awk '$1 == "bpred_2lev.ras_rate.PP" { print $2 }' file
sed (GNU sed or BSD/OSX sed)
sed -En 's/^bpred_2lev\.ras_rate\.PP +([^ ]+).*$/\1/p' file
GNU grep
grep -Po '^bpred_2lev\.ras_rate\.PP +\K[^ ]+' file
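Each of these prints just the value, ready for your CSV:
$ awk '$1 == "bpred_2lev.ras_rate.PP" { print $2 }' file
0.9413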
You can use awk like this:
grep <your_search_criteria> output.txt | awk '{ print $2 }'

Awk adding constant values

I have a text file with multiple lines of data like val1,val2,
and I want to change each line to 1,val1,val2,0,0,1.
I tried to add the constants with a print statement in awk (on Solaris), but it didn't work.
What is the correct way to do it?
(From the comments) This is what I tried
awk -F, '{print "%s","1,"$1","$2"0,0,1"}' test.txt
Based on the command you posted, a small change fixes it:
$ awk -F, 'BEGIN{OFS=FS} {print 1,$1,$2,0,0,1}' file
1,val1,val2,0,0,1
OR using printf (though I prefer print); note that printf needs an explicit trailing \n:
$ awk -F, '{printf "1,%s,%s,0,0,1\n", $1, $2}' file
1,val1,val2,0,0,1
To prepend the constant 1 to every line and append 0,0,1, simply do:
$ awk '{print 1,$0,0,0,1}' OFS=, file
1,val1,val2,0,0,1
An idiomatic way would be (the assignment leaves $0 non-empty, so it acts as a true pattern and triggers the default print):
$ awk '$0="1,"$0",0,0,1"' file
1,val1,val2,0,0,1
Using sed:
sed 's/.*/1,&,0,0,1/' inputfile
Example:
$ echo val1,val2 | sed 's/.*/1,&,0,0,1/'
1,val1,val2,0,0,1

Searching for Strings

I would like to have a shell script that searches two files and returns a list of strings:
File A contains just a list of unique alphanumeric strings, one per line, like this:
accc_34343
GH_HF_223232
cwww_34343
jej_222
File B contains a list of SOME of those strings (sometimes more than once), and a second column of information, like this:
accc_34343 dog
accc_34343 cat
jej_222 cat
jej_222 horse
I would like to create a third file that contains the strings from File A that are NOT in File B.
I've tried using some loops with grep -v, but that doesn't work. So, in the above example, the new file would have this as its contents:
GH_HF_223232
cwww_34343
Any help is greatly appreciated!
Here's what you can do:
grep -v -f <(awk '{print $1}' file_b) file_a > file_c
Explanation:
grep -v : Use -v option to grep to invert the matching
-f : Use -f option to grep to specify that the patterns are from file
<(awk '{print $1}' file_b) : Extract the first column values from file_b without using a temp file; the <( ... ) syntax is process substitution, available in bash/zsh/ksh
file_a : Tell grep that the file to be searched is file_a
> file_c : Output to be written to file_c
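With the sample files:
$ grep -v -f <(awk '{print $1}' file_b) file_a
GH_HF_223232
cwww_34343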
comm finds intersections and differences between sorted files:
comm -23 <(sort fileA) <(cut -d' ' -f1 fileB | sort -u)
result:
GH_HF_223232
cwww_34343
A pure awk alternative (read fileB first to collect its keys, then print the fileA lines that are absent from them):
awk 'FNR==NR{a[$1];next} !($0 in a)' fileB fileA
