complex line copying&modifying on-the-fly with grep or sed - bash

Is there a way to do the following with either grep or sed: read each line of a file, copy it twice, and modify each copy?
Original line:
X Y Z
A B C
New lines:
Y M X
Y M Z
B M A
B M C
where X, Y, Z, and M are all integers, and M is a fixed integer (e.g. 2) that we inject while copying. I suppose a solution (if any) will be so complex that people (including me) will start bleeding after seeing it!

$ awk -v M=2 '{print $2,M,$1; print $2,M,$3;}' file
Y 2 X
Y 2 Z
B 2 A
B 2 C
How it works
-v M=2
This defines the variable M to have value 2.
print $2,M,$1
This prints the second column, followed by M, followed by the first column.
print $2,M,$3
This prints the second column, followed by M, followed by the third column.
Extended Version
Suppose that we want to handle an arbitrary number of columns in which we print all columns between first and last, followed by M, followed by the first, and then print all columns between first and last, followed by M, followed by the last. In this case, use:
awk -v M=2 '{for (i=2;i<NF;i++)printf "%s ",$i; print M,$1; for (i=2;i<NF;i++)printf "%s ",$i; print M,$NF;}' file
As an example, consider this input file:
$ cat file2
X Y1 Y2 Z
A B1 B2 C
The above produces:
$ awk -v M=2 '{for (i=2;i<NF;i++)printf "%s ",$i; print M,$1; for (i=2;i<NF;i++)printf "%s ",$i; print M,$NF;}' file2
Y1 Y2 2 X
Y1 Y2 2 Z
B1 B2 2 A
B1 B2 2 C
The key change to the code is the addition of the following command:
for (i=2;i<NF;i++)printf "%s "
This loop prints every column from i=2 (the column after the first) through i=NF-1 (the column before the last), each followed by a space. The code is otherwise the same as before.
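Since both output lines share the same middle columns, the loop only needs to run once per input line. The following is a minimal sketch of that idea, not part of the original answer; the function name expand_line and the variable mid are hypothetical, and M is assumed to be a plain integer passed as the first argument.

```shell
# Hypothetical helper: expand_line M < input
expand_line() {
  awk -v M="$1" '{
    mid = ""                                   # columns 2..NF-1, each followed by OFS
    for (i = 2; i < NF; i++) mid = mid $i OFS
    print mid M, $1                            # middle columns, M, first column
    print mid M, $NF                           # middle columns, M, last column
  }'
}

printf 'X Y1 Y2 Z\nA B C\n' | expand_line 2
# Y1 Y2 2 X
# Y1 Y2 2 Z
# B 2 A
# B 2 C
```

With exactly three columns the loop runs once, so this reduces to the output of the simple version.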

Sure; you can write:
sed 's/\(.*\) \(.*\) \(.*\)/\2 M \1\n\2 M \3/'
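Note that the command above prints the letter M literally, and that \n in the replacement is understood by GNU sed but not by every sed. A hedged sketch of a variant that injects a shell value and uses a backslash followed by a real newline instead; the function name inject_m is hypothetical, M is assumed to be an integer (so it contains nothing special to sed), and each line is assumed to hold exactly three space-separated fields:

```shell
# Hypothetical helper: inject_m M < input
inject_m() {
  m=$1
  # [^ ]* keeps each group to a single field; the backslash-newline in the
  # replacement is the portable way to emit a line break.
  sed "s/^\([^ ]*\) \([^ ]*\) \([^ ]*\)\$/\2 $m \1\\
\2 $m \3/"
}

printf 'X Y Z\nA B C\n' | inject_m 2
# Y 2 X
# Y 2 Z
# B 2 A
# B 2 C
```

The second line of the sed script starts in column 0 on purpose: any indentation there would end up in the output.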

With bash builtin commands:
m=2; while read -r a b c; do echo "$b $m $a"; echo "$b $m $c"; done < file
Output:
Y 2 X
Y 2 Z
B 2 A
B 2 C

Related

Combining 2 lines together but "interlaced"

I have 2 lines from an output as follow:
a b c
x y z
I would like to pipe both lines from the last command into a script that would combine them "interlaced", like this:
a x b y c z
The solution should work for a random number of columns from the output, such as:
a b c d e
x y z x y
Should result in:
a x b y c z d x e y
So far, I have tried using awk, perl, sed, etc... but without success. All I can do, is to put the output into one line, but it won't be "interlaced":
$ echo -e 'a b c\nx y z' | tr '\n' ' ' | sed 's/$/\n/'
a b c x y z
Keep fields of odd numbered records in an array, and update the fields of even numbered records using it. This will interlace each pair of successive lines in input.
prog | awk 'NR%2{split($0,a);next} {for(i in a)$i=(a[i] OFS $i)} 1'
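The one-liner can be hard to read at first, so here is an expanded sketch of the same technique with comments. The function name interlace is hypothetical, and it assumes both lines of a pair have the same number of whitespace-separated fields.

```shell
# Hypothetical helper: some_command | interlace
interlace() {
  awk '
    NR % 2 {                  # odd record: remember its fields, print nothing yet
      n = split($0, a)
      next
    }
    {                         # even record: prefix each field with its partner
      for (i = 1; i <= n; i++) $i = a[i] OFS $i
      print
    }'
}

printf 'a b c\nx y z\n' | interlace
# a x b y c z
```

Assigning to $i makes awk rebuild the record with OFS between fields, which is why a plain print is enough.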
Here's a 3 step solution:
$ # get one argument per line
$ printf 'a b c\nx y z' | xargs -n1
a
b
c
x
y
z
$ # split numbers of lines by 2 and combine them side by side
$ printf 'a b c\nx y z' | xargs -n1 | pr -2ts' '
a x
b y
c z
$ # combine all input lines into single line
$ printf 'a b c\nx y z' | xargs -n1 | pr -2ts' ' | paste -sd' '
a x b y c z
$ printf 'a b c d e\nx y z 1 2' | xargs -n1 | pr -2ts' ' | paste -sd' '
a x b y c z d 1 e 2
Could you please try the following; it joins every two lines in "interlaced" fashion.
awk '
FNR%2!=0 && FNR>1{
for(j=1;j<=NF;j++){
printf("%s%s",a[j],j==NF?ORS:OFS)
}
delete a # clear the array only after the whole pair has been printed
}
{
for(i=1;i<=NF;i++){
a[i]=(a[i]?a[i] OFS:"")$i}
}
END{
for(j=1;j<=NF;j++){
printf("%s%s",a[j],j==NF?ORS:OFS)
}
}' Input_file
Here is a simple awk script
script.awk
NR == 1 {split($0,inArr1)} # read fields from the 1st line into inArr1
NR == 2 {split($0,inArr2); # read fields from the 2nd line into inArr2
for (i = 1; i <= NF; i++) printf("%s%s%s%s", inArr1[i], OFS, inArr2[i], (i < NF ? OFS : ORS)); # output interlaced fields; ORS terminates the line
}
input.txt
a b c d e
x y z x y
running:
awk -f script.awk input.txt
output:
a x b y c z d x e y
Multiline awk solution:
interlaced.awk
{
a[NR] = $0
}
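END {
n = split(a[1], b)
split(a[2], c)
for (i = 1; i <= n; i++) { # a counted loop keeps the fields in order; "for (i in b)" does not guarantee it
printf "%s%s %s", (i==1 ? "" : OFS), b[i], c[i]
}
print "" # end the line with a single ORS
}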
Run it like this:
foo_program | awk -f interlaced.awk
Perl will do the job. It was invented for this type of task.
echo -e 'a b c\nx y z' | \
perl -MList::MoreUtils=mesh -e \
'@f=mesh @{[split " ", <>]}, @{[split " ", <>]}; print "@f"'
 
a x b y c z
You can of course print out the meshed output any way you want.
Check out http://metacpan.org/pod/List::MoreUtils#mesh
You could even make it into a shell function for easy use:
function meshy {
perl -MList::MoreUtils=mesh -e \
'@f=mesh @{[split " ", <>]}, @{[split " ", <>]}; print "@f"'
}
$ echo -e 'X Y Z W\nx y z w' |meshy
X x Y y Z z W w
$
Ain't Perl grand?
This might work for you (GNU sed):
sed -E 'N;H;x;:a;s/\n(\S+\s+)(.*\n)(\S+\s+)/\1\3\n\2/;ta;s/\n//;s// /;h;z;x' file
Process two lines at a time. Append the two lines in the pattern space to the hold space, which introduces a newline in front of them. Using pattern matching and back references, nibble away at the front of each of the two lines and place the pairs at the front. When the pattern matching eventually fails, remove the first newline and replace the second with a space. Copy the amended line to the hold space, clear the pattern space ready for the next pair of lines (if any), and print.

bash command for splitting cell content by delimiter into multiple rows in the cell column

To illustrate the task: I have a dataframe:
x y1;y2;y3 z1;z2;z3
a b1;b2 c1;c2
I need:
x y1 z1
x y2 z2
x y3 z3
a b1 c1
a b2 c2
Column 1 always has a single value. The number of values in a cell can range from one to many, but it is always the same in columns 2 and 3. Thanks.
In awk:
$ awk -F"(\t|;)" '{
for(i=2;i<=4;i++)
if($i!="")
print $1, $i, $(i+3)
}' file
x y1 z1
x y2 z2
x y3 z3
a b1 c1
a b2 c2
Edit: Another version:
$ awk -F"(\t+|;)" '{ # FS tabs or semicolon
for(i=2;i<=int(NF/2)+1;i++)
print $1,$i,$(i+int(NF/2))
}' file
x y1 z1
x y2 z2
x y3 z3
a b1 c1
a b2 c2
Something like this should make it:
declare -a cols=() # array for individual columns (line fields)
IFS=' ;' # fields separators
while read -a cols; do
n=${#cols[@]} # number of fields in current line
if (( n < 3 || n % 2 != 1 )); then # skip invalid lines
printf "skipping invalid line: %s\n" "${cols[*]}"
continue
fi
for (( i = 1; i <= n / 2; i += 1 )); do # loop over pairs of fields
# printf line
printf "%s %s %s\n" "${cols[0]}" "${cols[i]}" "${cols[n/2+i]}"
done
done < data.txt
Explanations:
IFS is the list of characters used by read to split a line in fields. In your case spaces and ; seem to be the separators.
read -a cols assigns the fields of the read line to the cols array, starting at cell 0.
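To see the effect of IFS on read in isolation, here is a small sketch. It uses named variables instead of the bash-specific read -a so that it also runs in a plain POSIX shell; the function name show_fields and the variable names are hypothetical.

```shell
# Hypothetical demo: split each input line on spaces and semicolons.
show_fields() {
  while IFS=' ;' read -r key first second rest; do
    # The last variable receives whatever is left, separators included.
    printf 'key=%s first=%s second=%s rest=%s\n' "$key" "$first" "$second" "$rest"
  done
}

printf 'a b1;b2 c1;c2\n' | show_fields
# key=a first=b1 second=b2 rest=c1;c2
```

Because the number of fields varies from line to line, reading into an array as the script above does is the more convenient choice in bash.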
Example of run:
$ cat data.txt
x y1;y2;y3 z1;z2;z3
a b1;b2 c1;c2
$ ./foo.sh
x y1 z1
x y2 z2
x y3 z3
a b1 c1
a b2 c2

Extract lines having same second column but different third column

I have a file having strings in 3 columns as below.
a b x
a b y
a b z
a c x
a d y
I want to extract all the lines having same second column but different third column. The output I am expecting for the above example is
a b x
a b y
a b z
I tried uniq -f2 and sort -u -k2, but neither works as I expect. Any suggestions, please?
awk '
seen[$2]++ {
if (!seen[$2,$3]++) {
printf "%s%s\n", first[$2], $0
}
delete first[$2]
next
}
{ first[$2] = $0 ORS }
' file
a b x
a b y
a b z
Note that the above will work in any awk, for any values in your input file, does not retain the whole of the input file in memory, doesn't rely on any external tools for pre/post processing, and will produce the output lines in exactly the same order they appeared in the input.
awk to the rescue!
Need to make sure all records are unique first
$ sort file | uniq |
awk '{c[$2]++; a[$2]=a[$2]?a[$2]RS$0:$0}
END{for(k in a) if(c[k]>1) print a[k]}'
a b x
a b y
a b z
Explanation: keep the counter of second field occurrences and aggregate the records. At the end print the records for which the counter is greater than one.
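If sorting the input is undesirable, the same counting idea can be done in two passes over the file: the first pass counts distinct third-column values per second-column value, the second prints the qualifying lines in their original order. This is only a sketch; the function name same_second_diff_third is hypothetical, and it assumes the input is a regular, non-empty file (it is read twice, so a pipe will not work).

```shell
# Hypothetical helper: same_second_diff_third FILE
same_second_diff_third() {
  awk '
    NR == FNR {                      # pass 1: count distinct $3 values per $2
      if (!seen[$2, $3]++) n[$2]++
      next
    }
    n[$2] > 1                        # pass 2: print lines whose $2 has several $3 values
  ' "$1" "$1"
}
```

Unlike a plain occurrence counter, exact duplicate lines do not count as "different third column" here.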

How to repeat lines in bash and paste with different columns?

Is there a short way in bash to repeat each line of a file as often as needed to paste it with another file, in a Kronecker-product fashion (for the mathematicians among you)?
What I mean is, I have a file A:
a
b
c
and a file B:
x
y
z
and I want to merge them as follows:
a x
a y
a z
b x
b y
b z
c x
c y
c z
I could probably write a script, read the files line by line and loop over them, but I am wondering if there a short one-line command that could do the same job. I can't think of one and as you can see, I am also lacking some keywords to search for. :-D
Thanks in advance.
You can use this one-liner awk command:
awk 'FNR==NR{a[++n]=$0; next} {for(i=1; i<=n; i++) print $0, a[i]}' file2 file1
a x
a y
a z
b x
b y
b z
c x
c y
c z
Breakup:
NR == FNR { # While processing the first file in the list
a[++n]=$0 # store the row in array 'a' under an incrementing index
next # move to next record
}
{ # while processing the second file
for(i=1; i<=n; i++) # iterate over the array a
print $0, a[i] # print current row and array element
}
alternative to awk
join <(sed 's/^/_\t/' file1) <(sed 's/^/_\t/' file2) | cut -d' ' -f2-
add a fake key for join to have all records of file1 to match all records of file2, trim afterwards
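For comparison, the plain "loop over both files" approach that the question mentions can be sketched in a few lines of shell. The function name cross is hypothetical; the inner file is re-read once per line of the outer file, so this is likely to be slower than the awk or join versions on large inputs.

```shell
# Hypothetical helper: cross OUTER_FILE INNER_FILE
cross() {
  while IFS= read -r left; do
    while IFS= read -r right; do
      printf '%s %s\n' "$left" "$right"   # one output line per (left, right) pair
    done < "$2"
  done < "$1"
}
```

Called as cross A B with the files from the question, this should produce the nine lines shown there.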

Substituting values of one column from a list of corresponding values

I want to replace the entries in one column of the input file A.txt with the values listed in B.txt, in corresponding order.
For example:
A.txt is tab-delimited, but within one column the values are separated by commas.
I need to change one of the entries in that column, say P=:
1 X y Z Q=Alpha,P=beta,O=Theta
2 x a b Q=Alpha,P=beta,O=Theta
3 y b c Q=Alpha,P=beta,O=Theta
4 a b c Q=Alpha,P=beta,O=Theta
5 x y z Q=Alpha,P=beta,O=Theta
B.txt is
1 gamma
2 alpha
3 alpha
4 gamma
5 alpha
Now, for each entry in A.txt, replace the value of P= with the value on the corresponding line of B.txt.
Output:
1 X y Z Q=Alpha,P=gamma,O=Theta
2 x a b Q=Alpha,P=alpha,O=Theta
3 y b c Q=Alpha,P=alpha,O=Theta
4 a b c Q=Alpha,P=gamma,O=Theta
5 x y z Q=Alpha,P=alpha,O=Theta
Thanks in advance!!!
Assuming A.txt and B.txt are sorted on the first column, you can first join both files and then perform the replacement within a specified field using sed:
For example:
join -t $'\t' -j 1 A.txt B.txt | sed 's/,P=.*,\(.*\)\t\(.*\)/,P=\2,\1/g'
You could have sed write you a sed script, e.g.:
sed 's:^:/^:; s: :\\b/s/P=[^,]+/P=:; s:$:/:' B.txt
Output:
/^1\b/s/P=[^,]+/P=gamma/
/^2\b/s/P=[^,]+/P=alpha/
/^3\b/s/P=[^,]+/P=alpha/
/^4\b/s/P=[^,]+/P=gamma/
/^5\b/s/P=[^,]+/P=alpha/
Pipe it into a second sed:
sed 's:^:/^:; s: :\\b/s/P=[^,]+/P=:; s:$:/:' B.txt | sed -r -f - A.txt
Output:
1 X y Z Q=Alpha,P=gamma,O=Theta
2 x a b Q=Alpha,P=alpha,O=Theta
3 y b c Q=Alpha,P=alpha,O=Theta
4 a b c Q=Alpha,P=gamma,O=Theta
5 x y z Q=Alpha,P=alpha,O=Theta
Another solution:
awk '{if ((getline b < "B.txt") > 0) {split(b, a); sub(/beta/, a[2])} print}' A.txt
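A variant that does not depend on the two files being in the same line order is to load the second file into an awk array keyed by the first column and look each line up. This is a sketch under assumptions: the function name replace_p is hypothetical, the key is assumed to be the first whitespace-separated field of both files, the second file is assumed to be non-empty, and replacement values are assumed not to contain & or backslashes (which are special in sub).

```shell
# Hypothetical helper: replace_p A_FILE B_FILE
replace_p() {
  awk '
    NR == FNR { val[$1] = $2; next }                     # first file read (B): key -> new value
    ($1 in val) { sub(/P=[^,]*/, "P=" val[$1]) }         # replace whatever follows P= up to the comma
    { print }
  ' "$2" "$1"
}
```

Since sub operates on the whole record without reassigning fields, the original tab separators of the input are left untouched.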

Resources