Substituting values of one column from a list of corresponding values - bash

I want to replace the entries in one column of file A.txt with the values listed in B.txt, in corresponding order.
For example:
A.txt is tab-delimited, but within one of its columns the values are comma-separated. I need to change one entry of that column, namely the P= value:
1 X y Z Q=Alpha,P=beta,O=Theta
2 x a b Q=Alpha,P=beta,O=Theta
3 y b c Q=Alpha,P=beta,O=Theta
4 a b c Q=Alpha,P=beta,O=Theta
5 x y z Q=Alpha,P=beta,O=Theta
B.txt is
1 gamma
2 alpha
3 alpha
4 gamma
5 alpha
Now, for each line of A.txt, replace the P= value with the value from the corresponding line of B.txt.
Output:
1 X y Z Q=Alpha,P=gamma,O=Theta
2 x a b Q=Alpha,P=alpha,O=Theta
3 y b c Q=Alpha,P=alpha,O=Theta
4 a b c Q=Alpha,P=gamma,O=Theta
5 x y z Q=Alpha,P=alpha,O=Theta
Thanks in advance!!!

Assuming A.txt and B.txt are sorted on the first column, you can first join both files and then perform the replacement within a specified field using sed:
For example:
join -t $'\t' -j 1 A.txt B.txt | sed 's/,P=.*,\(.*\)\t\(.*\)/,P=\2,\1/g'
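As a sanity check, the pipeline can be run against a couple of the sample rows (a minimal sketch; it assumes both files are genuinely tab-delimited and sorted on column 1, and relies on GNU sed understanding \t in the pattern):

```shell
# Recreate two rows of the sample data with real tabs.
printf '1\tX\ty\tZ\tQ=Alpha,P=beta,O=Theta\n2\tx\ta\tb\tQ=Alpha,P=beta,O=Theta\n' > A.txt
printf '1\tgamma\n2\talpha\n' > B.txt

# join appends B's value as a final tab-separated field; sed then moves
# that value into the P= entry and drops the now-redundant last field.
join -t $'\t' -j 1 A.txt B.txt | sed 's/,P=.*,\(.*\)\t\(.*\)/,P=\2,\1/g'
```

The greedy .* between P= and the last comma is what keeps everything after the P= entry intact.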

You could have sed write you a sed script, e.g.:
sed 's:^:/^:; s: :\\b/s/P=[^,]+/P=:; s:$:/:' B.txt
Output:
/^1\b/s/P=[^,]+/P=gamma/
/^2\b/s/P=[^,]+/P=alpha/
/^3\b/s/P=[^,]+/P=alpha/
/^4\b/s/P=[^,]+/P=gamma/
/^5\b/s/P=[^,]+/P=alpha/
Pipe it into a second sed:
sed 's:^:/^:; s: :\\b/s/P=[^,]+/P=:; s:$:/:' B.txt | sed -r -f - A.txt
Output:
1 X y Z Q=Alpha,P=gamma,O=Theta
2 x a b Q=Alpha,P=alpha,O=Theta
3 y b c Q=Alpha,P=alpha,O=Theta
4 a b c Q=Alpha,P=gamma,O=Theta
5 x y z Q=Alpha,P=alpha,O=Theta

Another solution:
awk '{getline b < "B.txt"; split(b, a, " "); sub(/P=[^,]*/, "P=" a[2]); print}' A.txt
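A self-contained run of the getline idea, written out as a sketch with explicit semicolons (it assumes B.txt has exactly one line per line of A.txt, in the same order):

```shell
printf '1 X y Z Q=Alpha,P=beta,O=Theta\n2 x a b Q=Alpha,P=beta,O=Theta\n' > A.txt
printf '1 gamma\n2 alpha\n' > B.txt

# For each line of A.txt, read the matching line of B.txt and splice
# its second field into the P= entry.
awk '{getline b < "B.txt"; split(b, a, " "); sub(/P=[^,]*/, "P=" a[2]); print}' A.txt
```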

Related

Add a specific string at the end of each line

I have a mainfile with 4 columns, such as:
a b c d
e f g h
i j k l
In another file, I have one line of text corresponding to each line of the mainfile, which I want to add as a new column to the mainfile, such as:
a b c d x
e f g h y
i j k l z
Is this possible in bash? I can only add the same string to the end of each line.
Two ways you can do this:
1) paste file1 file2
2) Iterate over both files, combining them line by line, and write the result to a new file
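Option 1 is a one-liner; note that plain paste joins lines with a tab, so to get the space-separated output shown in the question, pass -d' ':

```shell
printf 'a b c d\ne f g h\ni j k l\n' > file1
printf 'x\ny\nz\n' > file2

# -d' ' replaces paste's default tab delimiter with a space.
paste -d' ' file1 file2
```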
You could use GNU parallel for that:
fe-laptop-m:test fe$ cat first
a b c d
e f g h
i j k l
fe-laptop-m:test fe$ cat second
x
y
z
fe-laptop-m:test fe$ parallel echo ::::+ first second
a b c d x
e f g h y
i j k l z
Did I understand correctly what you are trying to achieve?
This might work for you (GNU sed):
sed -E 's#(^.*) .*#/^\1/s/$/ &/#' file2 | sed -f - file1
Create a sed script from file2 that uses a regexp to match a line in file1 and, if it matches, appends the contents of that file2 line to the matched line.
N.B. This is independent of the order and length of file1.
You can try using pr
pr -mts' ' file1 file2
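Here -m merges the files side by side, -t suppresses pr's page headers, and -s' ' makes the column separator a single space. A quick check on the same sample data:

```shell
printf 'a b c d\ne f g h\ni j k l\n' > file1
printf 'x\ny\nz\n' > file2
pr -mts' ' file1 file2
```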

Combining 2 lines together but "interlaced"

I have 2 lines from an output as follow:
a b c
x y z
I would like to pipe both lines from the last command into a script that would combine them "interlaced", like this:
a x b y c z
The solution should work for a random number of columns from the output, such as:
a b c d e
x y z x y
Should result in:
a x b y c z d x e y
So far, I have tried awk, perl, sed, etc., but without success. All I can do is put the output onto one line, but it won't be "interlaced":
$ echo -e 'a b c\nx y z' | tr '\n' ' ' | sed 's/$/\n/'
a b c x y z
Keep the fields of odd-numbered records in an array, and update the fields of even-numbered records using it. This interlaces each pair of successive lines in the input.
prog | awk 'NR%2{split($0,a);next} {for(i in a)$i=(a[i] OFS $i)} 1'
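A quick check of the one-liner, with printf standing in for prog:

```shell
# Odd lines are buffered in a[]; on even lines each field is prefixed
# with its buffered counterpart, and the trailing 1 prints the rebuilt record.
printf 'a b c\nx y z\n' | awk 'NR%2{split($0,a);next} {for(i in a)$i=(a[i] OFS $i)} 1'
```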
Here's a 3 step solution:
$ # get one argument per line
$ printf 'a b c\nx y z' | xargs -n1
a
b
c
x
y
z
$ # split numbers of lines by 2 and combine them side by side
$ printf 'a b c\nx y z' | xargs -n1 | pr -2ts' '
a x
b y
c z
$ # combine all input lines into single line
$ printf 'a b c\nx y z' | xargs -n1 | pr -2ts' ' | paste -sd' '
a x b y c z
$ printf 'a b c d e\nx y z 1 2' | xargs -n1 | pr -2ts' ' | paste -sd' '
a x b y c z d 1 e 2
Could you please try the following; it joins every pair of successive lines in "interlaced" fashion:
awk '
FNR%2!=0 && FNR>1{
    for(j=1;j<=NF;j++){
        printf("%s%s",a[j],j==NF?ORS:OFS)
    }
    delete a
}
{
    for(i=1;i<=NF;i++){
        a[i]=(a[i]?a[i] OFS:"")$i
    }
}
END{
    for(j=1;j<=NF;j++){
        printf("%s%s",a[j],j==NF?ORS:OFS)
    }
}' Input_file
Here is a simple awk script
script.awk
NR == 1 {split($0,inArr1)}   # read fields from 1st line into inArr1
NR == 2 {split($0,inArr2);   # read fields from 2nd line into inArr2
    for (i = 1; i <= NF; i++) printf("%s%s%s%s", inArr1[i], OFS, inArr2[i], OFS); # output interlaced fields from inArr1 and inArr2
    print "";                # terminate the output line
}
input.txt
a b c d e
x y z x y
running:
awk -f script.awk input.txt
output:
a x b y c z d x e y
Multiline awk solution:
interlaced.awk
{
a[NR] = $0
}
END {
split(a[1], b)
split(a[2], c)
for (i in b) {
printf "%s%s %s", i==1?"":OFS, b[i], c[i]
}
print ""
}
Run it like this:
foo_program | awk -f interlaced.awk
Perl will do the job. It was invented for this type of task.
echo -e 'a b c\nx y z' | \
perl -MList::MoreUtils=mesh -e \
'@f=mesh @{[split " ", <>]}, @{[split " ", <>]}; print "@f"'

a x b y c z
You can of course print out the meshed output any way you want.
Check out http://metacpan.org/pod/List::MoreUtils#mesh
You could even make it into a shell function for easy use:
function meshy {
    perl -MList::MoreUtils=mesh -e \
    '@f=mesh @{[split " ", <>]}, @{[split " ", <>]}; print "@f"'
}
$ echo -e 'X Y Z W\nx y z w' |meshy
X x Y y Z z W w
$
Ain't Perl grand?
This might work for you (GNU sed):
sed -E 'N;H;x;:a;s/\n(\S+\s+)(.*\n)(\S+\s+)/\1\3\n\2/;ta;s/\n//;s// /;h;z;x' file
Process two lines at a time. Append the two lines in the pattern space to the hold space, which introduces a newline at the front of the two lines. Using pattern matching and back references, nibble away at the front of each of the two lines and place the pairs at the front. Eventually the pattern matching fails; then remove the first newline and replace the second with a space. Copy the amended line to the hold space, clean up the pattern space ready for the next couple of lines (if any), and print.

match columns in 2 tab-delimited text files

I have two tab-delimited .txt files
file1 has 20 million lines and the following structure
col1 col2 col3 col4 col5
1 x x A x
2 y y A x
3 z z A x
4 x x B x
5 x y B x
6 x y E x
7 x z F x
file2 has 3000 lines and the following structure
col1
A
B
C
D
Now I want to extract from file1 the lines where col4 of file1 matches col1 of file2.
So the new file3 should look like this:
col1 col2 col3 col4 col5
1 x x A x
2 y y A x
3 z z A x
4 x x B x
5 x y B x
How can I do this with perl or bash?
You can use a standard awk idiom to join the 2 files:
awk 'BEGIN{FS=OFS="\t"} FNR == NR { a[$1]; next } $4 in a' file2 file1
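A quick check on a tab-separated sample. Note that as written, file1's header row only survives if its col4 value happens to appear in file2; the sketch below adds FNR==1 to keep the header unconditionally:

```shell
printf 'col1\tcol2\tcol3\tcol4\tcol5\n1\tx\tx\tA\tx\n6\tx\ty\tE\tx\n' > file1
printf 'col1\nA\nB\n' > file2

# First pass hashes file2's keys; second pass keeps file1's header
# plus every row whose col4 is one of those keys.
awk 'BEGIN{FS=OFS="\t"} FNR==NR{a[$1]; next} FNR==1 || $4 in a' file2 file1
```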
try this -
awk -F'[ ]+' 'NR==FNR {a[$1]++;next} $4 in a{print $0}' f2 f1
1 x x A x
2 y y A x
3 z z A x
4 x x B x
5 x y B x
Since you also asked about Perl, here's a reusable Perl solution. First read file2 and build a lookup hash of its values, then read file1, printing any line whose column 4 appears in that hash. Something like this might work:
#!/usr/bin/perl
use strict;
use warnings;
use autodie;

my $key_file = shift;
open(my $fh, "<", $key_file);
my $header = <$fh>; # read the header line into '$header'
my %keys = map { chomp; $_ => 1 } <$fh>;
close $fh;

my $query_file = shift;
open(my $q_fh, "<", $query_file);
print scalar <$q_fh>; # pass the query file's header line through
while (<$q_fh>) {
    my @fields = split;
    print if $keys{$fields[3]};
}
close $q_fh;
You can run this as table_combine.pl <file2> <file1>.

Awk to multiply consecutive lines

I have a file with a single column with N numbers:
a
b
c
d
e
And I would like to use awk to multiply the first by the second, the second by the third, and so on, and then add all of these up, i.e.:
(a*b)+(b*c)+(c*d)+...
Any suggestions?
I would use the following command:
awk 'NR>1{t+=l*$0}{l=$0}END{print t}' input.txt
Having this input:
1
2
3
4
5
it will output:
40
which equals 1*2+2*3+3*4+4*5
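The same run, reproduced end to end:

```shell
printf '1\n2\n3\n4\n5\n' > input.txt

# For every line after the first, add previous*current to the total;
# l always holds the previous line's value.
awk 'NR>1{t+=l*$0}{l=$0}END{print t}' input.txt
```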

complex line copying&modifying on-the-fly with grep or sed

Is there a way to do the following with either grep or sed: read each line of a file, copy it twice, and modify each copy?
Original line:
X Y Z
A B C
New lines:
Y M X
Y M Z
B M A
B M C
where X, Y, Z, M are all integers, and M is a fixed integer (e.g. 2) that we inject while copying! I suppose a solution (if any) will be so complex that people (including me) will start bleeding after seeing it!
$ awk -v M=2 '{print $2,M,$1; print $2,M,$3;}' file
Y 2 X
Y 2 Z
B 2 A
B 2 C
How it works
-v M=2
This defines the variable M to have value 2.
print $2,M,$1
This prints the second column, followed by M, followed by the first column.
print $2,M,$3
This prints the second column, followed by M, followed by the third column.
Extended Version
Suppose that we want to handle an arbitrary number of columns in which we print all columns between first and last, followed by M, followed by the first, and then print all columns between first and last, followed by M, followed by the last. In this case, use:
awk -v M=2 '{for (i=2;i<NF;i++)printf "%s ",$i; print M,$1; for (i=2;i<NF;i++)printf "%s ",$i; print M,$NF;}' file
As an example, consider this input file:
$ cat file2
X Y1 Y2 Z
A B1 B2 C
The above produces:
$ awk -v M=2 '{for (i=2;i<NF;i++)printf "%s ",$i; print M,$1; for (i=2;i<NF;i++)printf "%s ",$i; print M,$NF;}' file2
Y1 Y2 2 X
Y1 Y2 2 Z
B1 B2 2 A
B1 B2 2 C
The key change to the code is the addition of the following command:
for (i=2;i<NF;i++) printf "%s ",$i
This command prints all columns from i=2, the column after the first, through i=NF-1, the column before the last. The code is otherwise similar.
Sure; you can write (with M standing for whatever integer you want to inject):
sed 's/\(.*\) \(.*\) \(.*\)/\2 M \1\n\2 M \3/'
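With the fixed integer written in place of the literal M (this relies on GNU sed treating \n in the replacement as a newline):

```shell
# Each input line "X Y Z" becomes the two lines "Y 2 X" and "Y 2 Z".
printf 'X Y Z\nA B C\n' | sed 's/\(.*\) \(.*\) \(.*\)/\2 2 \1\n\2 2 \3/'
```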
With bash builtin commands:
m=2; while read a b c; do echo "$b $m $a"; echo "$b $m $c"; done < file
Output:
Y 2 X
Y 2 Z
B 2 A
B 2 C
