replace a particular row and column value of one file with another - shell

I have a file containing
a b c d
g h i j
d e f f
and another file containing
1 2 3 4
5 6 7 8
9 1 0 1
I know that I can extract a particular row and column using
awk 'FNR == 2 {print $3}' fit_detail.txt
But I need to replace the value at the 2nd column and 3rd row of the first file with the value at the 2nd row and 3rd column of the second file. How can I do this and save the result into another file?
Finally, my output should look like
a b c d
g h i j
d 1 f f

$ awk 'NR==FNR && NR==3 {a=$2} NR==FNR {next} FNR==3 {$2=a} {print}' file2 file1
a b c d
g h i j
d 1 f f
Explanation:
NR==FNR && NR==3 {a=$2}
In awk, NR is the number of records (lines) that have been read in total and FNR is the number of records (lines) that have been read in from the current file. So, when NR==FNR, then we know that we are working on the first file named on the command line. For that file, we select only the third row (NR==3) and save the value of its second column in the variable a.
NR==FNR {next}
If we are processing the first named file on the command line, skip to next line.
FNR==3 {$2=a}
Because of the preceding next statement, it is only possible to get to this command if we are now working on the second named file. For this file, if we are on the third row, change the 2nd column to the value a.
{print}
All lines from the second named file are printed.
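For reference, the steps above can be reproduced end to end with the sample data from the question. This is a minimal sketch: the scratch directory and the name out.txt are assumptions, and the final redirect is what saves the result into another file.

```shell
# Recreate the two sample files in a scratch directory (names are illustrative).
tmp=$(mktemp -d)
printf '%s\n' 'a b c d' 'g h i j' 'd e f f' > "$tmp/file1"
printf '%s\n' '1 2 3 4' '5 6 7 8' '9 1 0 1' > "$tmp/file2"

# First pass (file2): remember row 3, column 2.
# Second pass (file1): overwrite row 3, column 2 with the remembered value.
awk 'NR==FNR && NR==3 {a=$2} NR==FNR {next} FNR==3 {$2=a} {print}' \
    "$tmp/file2" "$tmp/file1" > "$tmp/out.txt"

cat "$tmp/out.txt"
```

To pick a different row or column, only the numbers in NR==3, FNR==3 and $2 need to change.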
Controlling the output format
By default, awk separates output fields with a space. If another output field separator, such as a tab, is desired, it can be specified as follows:
$ awk -v OFS="\t" 'NR==FNR && NR==3 {a=$2} NR==FNR {next} {$2=$2} FNR==3 {$2=a} {print}' file2 file1
a b c d
g h i j
d 1 f f
To accomplish this, we made two changes:
The output field separator (OFS) was specified as a tab with the -v option: -v OFS="\t"
When using a simple print statement, such as {print}, awk will normally apply the new output field separator only if the line has been changed in some way. That is accomplished here with the statement $2=$2, which assigns the second field to itself. Even though this leaves the second field unchanged, it is enough to make awk rebuild the line and replace the old field separators with the new ones on output.
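The effect of $2=$2 can be seen in isolation. A small sketch with a made-up one-line input, comparing the output with and without the self-assignment:

```shell
# Without any field assignment, awk prints the record untouched, so OFS is ignored.
unchanged=$(printf 'a b c\n' | awk -v OFS="\t" '{print}')

# Assigning any field (even to itself) makes awk rebuild $0 using OFS.
rebuilt=$(printf 'a b c\n' | awk -v OFS="\t" '{$2=$2} {print}')

printf '%s\n%s\n' "$unchanged" "$rebuilt"
```

The first line printed keeps its single spaces; the second is tab-separated.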

Related

Bash: how to put each line of a column below the same-row-line of another column?

I'm working with some data using bash and I need to turn this kind of input file:
Col1 Col2
A B
C D
E F
G H
into this output file:
Col1
A
B
C
D
E
F
G
H
I tried with some commands but they didn't work. Any kind of suggestions will be very appreciated!
As with many problems, there are many solutions. Here is one using awk:
awk 'NR > 1 {print $1; print $2}' inputfile.txt
The NR > 1 expression says to execute the following block for all line numbers greater than one. (NR is the current record number which is the same as line number by default.)
The {print $1; print $2} code block says to print the first field, then print the second field. The advantage of using awk in this case is that it doesn't matter if the fields are separated by space characters, tabs, or a combination; the fields just have to be separated by some number of whitespace characters.
If the field values on each line are only separated by a single space character, then this would work:
tail -n +2 inputfile.txt | tr ' ' '\n'
In this solution, tail -n +2 is used to print all lines starting with the second line and tr ' ' '\n' is used to replace all the space characters with newlines, as suggested previously.
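The desired output in the question keeps Col1 as a heading, which both commands above drop along with the rest of the header line. One possible variation, assuming the heading should be the first header field and that data rows may have any number of whitespace-separated fields:

```shell
# Sample input from the question, written to a scratch file (path is illustrative).
input=$(mktemp)
printf '%s\n' 'Col1 Col2' 'A B' 'C D' 'E F' 'G H' > "$input"

# Print the first header field once, then every field of each data row on its own line.
result=$(awk 'NR == 1 {print $1; next} {for (i = 1; i <= NF; i++) print $i}' "$input")

printf '%s\n' "$result"
rm -f "$input"
```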

Sort & Uniq the values on specific column

I have data delimited by ":"
AA:w_c;w_c;r_c:1;3
BB:sync;sync:4
CC:t_wak;t_wak:6;7;8
I need column 2 to contain each value only once. If a line has more than one distinct value in column 2, it needs to be printed to another file.
I tried this:
#!/bin/bash
sort -u -t : -k2,2 file >> txt
awk -F: '{gsub(";"," ",$3)}1' txt
Output:
BB:sync;sync:4
CC t_wak;t_wak 6 7 8
AA w_c;w_c;r_c 1 3
I am trying to sort and uniq the values in column 2 and copy that output to another file called "txt". Then I use awk to replace the ; with a space in column 3, but the above code does not seem to work.
Desired Output 1:
BB:sync:4
CC:t_wak:6 7 8
The two lines above are the output we need to print, because column 2 contains only one value in each of them.
The line below needs to be printed to another file, because its column 2 contains more than one value.
Desired output 2:
AA:w_c;r_c:1;3
w_c
r_c
Column 2 should have only one value; if there is more than one, the line needs to be printed to another file, with the values listed as shown above.
This quick solution should work for the example:
awk 'BEGIN{FS=OFS=":"}
{
    split($2, a, ";")
    v=""; delete u
    for(i=1;i<=length(a);i++){
        if( ++u[a[i]]<2)
            v=v (i==1?"":";") a[i]
    }
    $2=v
    if(length(u)>1){
        print > "output2.txt"
        next
    }
}7' input
Let's do a test:
kent$ awk 'BEGIN{FS=OFS=":"}
{
    split($2, a, ";")
    v=""; delete u
    for(i=1;i<=length(a);i++){
        if( ++u[a[i]]<2)
            v=v (i==1?"":";") a[i]
    }
    $2=v
    if(length(u)>1){
        print > "output2.txt"
        next
    }
}7' f
BB:sync:4
CC:t_wak:6;7;8
kent$ cat output2.txt
AA:w_c;r_c:1;3
If you want to have each value in col2 in the output2.txt:
awk 'BEGIN{FS=OFS=":";out2="output2.txt"}
{
    split($2, a, ";")
    v=""; delete u
    for(i=1;i<=length(a);i++){
        if( ++u[a[i]]<2)
            v=v (i==1?"":";") a[i]
    }
    $2=v
    if(length(u)>1){
        print > out2
        for(x in u)
            print x > out2
        next
    }
}7' input
Then you'll get:
kent$ cat output2.txt
AA:w_c;r_c:1;3
w_c
r_c
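Two details of the scripts above depend on the awk implementation: length(a) on an array and delete u on a whole array are not available in every older awk, and for (x in u) visits keys in an unspecified order, so w_c and r_c may come out swapped. A more conservative sketch, assuming the same input format; the file names single.txt and multi.txt are placeholders, and the gsub on column 3 is added only because the question's first desired output shows 6 7 8 rather than 6;7;8:

```shell
tmp=$(mktemp -d)
printf '%s\n' 'AA:w_c;w_c;r_c:1;3' 'BB:sync;sync:4' 'CC:t_wak;t_wak:6;7;8' > "$tmp/input"

awk -v multi="$tmp/multi.txt" 'BEGIN { FS = OFS = ":" }
{
    n = split($2, a, ";")            # n is the element count, no length(array) needed
    split("", seen)                  # portable way to empty an array
    k = 0
    for (i = 1; i <= n; i++)         # collect distinct values in first-seen order
        if (!(a[i] in seen)) { seen[a[i]]; uniq[++k] = a[i] }
    v = uniq[1]
    for (i = 2; i <= k; i++) v = v ";" uniq[i]
    $2 = v
    if (k > 1) {                     # several distinct values: write to the second file
        print > multi
        for (i = 1; i <= k; i++) print uniq[i] > multi
        next
    }
    gsub(";", " ", $3)               # single value: also turn 6;7;8 into 6 7 8
    print
}' "$tmp/input" > "$tmp/single.txt"

cat "$tmp/single.txt" "$tmp/multi.txt"
```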

bash cycle - output according to string from file

How can I name the output file after the string in the 4th column of the output (that is, the 4th column of the i-th row of the input)?
I tried:
for i in {1..321}; do
awk '(FNR==i) {outfile = $4 print $0 >> outfile}' RV1_phase;
done
or
for i in {1..321}; do
awk '(FNR==i) {outfile = $4; print $0}' RV1_phase > "$outfile";
done
input file:
1 2 2 a
4 5 6 f
4 4 5 f
....
....
desired output for i=1
name: a
1 2 2 a
The aim: I have data that I plotted in gnuplot, and I would like to plot a set of figures named after the string, so that I know which point comes from which file. The points will be coloured. I need files for plotting in gnuplot, so I would like to create them using the loop from my question.
Simply
for i in {1..321}; do
awk -v i="$i" 'FNR==i {print $0 >> $4}' RV1_phase;
done
The problem with your first attempt was that you didn't use a ; to separate the assignment to outfile from the print command. The separate variable isn't necessary, though. Note that awk cannot see the shell variable i on its own, so it is passed in with -v.
You don't need a bash loop, either:
awk '1 <= FNR && FNR <= 321 {print $0 >> $4}' RV1_phase;
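If the fourth column takes many distinct values, some awk implementations can run out of open file handles. A hedged sketch that parenthesizes the redirection target and closes each file after writing to it; the sample rows mirror the question and the scratch directory is an assumption:

```shell
tmp=$(mktemp -d)
printf '%s\n' '1 2 2 a' '4 5 6 f' '4 4 5 f' > "$tmp/RV1_phase"

# Run inside the scratch directory so the per-name files (a, f) are created there.
(
    cd "$tmp" || exit 1
    # Append each row to the file named by its 4th column, then release the handle.
    awk '{ print $0 >> ($4); close($4) }' RV1_phase
)

cat "$tmp/a"   # rows whose 4th column is "a"
cat "$tmp/f"   # rows whose 4th column is "f"
```

Because the redirection uses >>, reopening a file after close appends to it rather than truncating it.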

Awk results not working

Hi I have a text file with values
A VAL|1|2|3|
C VAL|2|2|3|
D VAL|1|2|3|
[No space between lines]
I want to replace the values above depending on the first column, i.e. A VAL, C VAL, D VAL, so I want to:
1. replace 3 from A VAL row
2. replace 2 value from C VAL row.
3. replace 1 value from D VAL row.
Basically, I want to modify the above values using AWK, since AWK helps with handling CSV and pipe-delimited files.
So I tried the following AWK command:
awk 'BEGIN {OFS=FS="|"} {if ($1="A") sub($4,"A1") ;elseif ($1="C") sub($2,"B1"); print }' myval.txt
But I am getting wrong results:
C|B1|2|A1|B1C
C|B1|2|A1|B1C
C|B1|2|3|B1C
The first column itself is getting replaced and the substitution is at the wrong position.
Expected output is:
A VAL|1|2|A1|
C VAL|2|2|B1|
D VAL|1|2|3|
You can try this awk:
awk 'BEGIN{OFS=FS="|"} $1 ~ /^A/{$(NF-1)="A1"} $1 ~ /^C/{$(NF-1)="B1"} 1' file.csv
A VAL|1|2|A1|
C VAL|2|2|B1|
D VAL|1|2|3|
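For context on why the original attempt overwrote the first column: in awk, = assigns and == compares, so if ($1="A") sets $1 to A and is always true. A small sketch with a single row from the question (the variable names are illustrative):

```shell
row='A VAL|1|2|3|'

# Assignment inside the condition: $1 is overwritten and the test always succeeds.
assigned=$(printf '%s\n' "$row" | awk 'BEGIN{OFS=FS="|"} { if ($1 = "A") print }')

# A prefix match leaves $1 alone; field 4 is then set directly.
fixed=$(printf '%s\n' "$row" | awk 'BEGIN{OFS=FS="|"} $1 ~ /^A/ { $4 = "A1" } 1')

printf '%s\n%s\n' "$assigned" "$fixed"
```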
awk 'BEGIN{OFS=FS="|"}{if(substr($1,1,1)=="A")$4="A1";else if(substr($1,1,1)=="C")$4="B1";else if(substr($1,1,1)=="D")$4="3";print }' inputtext.txt > outtext.txt
This is working fine.

Remove observations from text file based on values listed in separate file

I have a ~15,000,000 line text file (File A) with the following columns:
1 1:693731 0 693731 G A
1 1:706992 0 706992 T C
1 1:707014 0 707014 C A
1 1:715142 0 715142 T G
1 1:724721 0 724721 A C
1 1:729679 0 729679 C G
...
In a separate file (File B), I have a list of ~80,000 observations I want to delete from File A:
1:706992
1:715142
1:729679
...
I want to delete rows from File A based on the value in column 2 (listed in File B) and print the output. So, the output file should look like this:
1 1:693731 0 693731 G A
1 1:707014 0 707014 C A
1 1:724721 0 724721 A C
Any input would be greatly appreciated.
A single-pass awk solution:
awk 'NR==FNR { xclude[$0]++; next } !xclude[$2]' fileB fileA
NR==FNR { xclude[$0]++; next } processes rows from the 1st input file ( fileB) only and stores its rows ($0) as the keys of associative array xclude with associated nonzero values (by virtue of ++).
NR (the overall row index) is only equal to FNR (the input file-specific row index) for the first input file; next skips the remainder of the script and proceeds to the next input line.
!xclude[$2] is therefore only evaluated for rows from the 2nd input file (fileA), and only prints rows whose 2nd column value ($2) is not (!) contained in the array of exclusions, xclude.
Note that pattern !xclude[$2] evaluating to true implicitly prints the row at hand, because that is Awk's default action in the absence of an associated action ({...}).
In a comment, karakfa suggests the following variation, which bypasses the need for ++:
awk 'NR==FNR { xclude[$0]; next } !($2 in xclude)' fileB fileA
Simply referencing an array element causes Awk to create it implicitly, so xclude[$0], despite not assigning a value, creates an element whose key is the value of $0.
$2 in xclude then simply tests the existence of key $2 in array xclude with operator in (without testing the value, which would be empty in this case).
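The difference between the two variants can be made visible by counting array elements at the end. This is only an illustrative sketch on a few rows from the question: testing !xclude[$2] creates an element for every new column-2 value seen in fileA, while $2 in xclude does not, which may matter for memory with millions of rows.

```shell
tmp=$(mktemp -d)
printf '%s\n' '1 1:693731 0 693731 G A' '1 1:706992 0 706992 T C' \
              '1 1:707014 0 707014 C A' '1 1:715142 0 715142 T G' > "$tmp/fileA"
printf '%s\n' '1:706992' '1:715142' > "$tmp/fileB"

# Value test: every lookup of a missing key adds that key to the array.
by_value=$(awk 'NR==FNR { xclude[$0]++; next } !xclude[$2] { kept++ }
                END { n = 0; for (k in xclude) n++; print n }' "$tmp/fileB" "$tmp/fileA")

# Membership test: the array keeps only the keys loaded from fileB.
by_key=$(awk 'NR==FNR { xclude[$0]; next } !($2 in xclude) { kept++ }
              END { n = 0; for (k in xclude) n++; print n }' "$tmp/fileB" "$tmp/fileA")

echo "elements after value test: $by_value, after membership test: $by_key"
```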
With grep:
$ grep -vwFf fileB fileA
1 1:693731 0 693731 G A
1 1:707014 0 707014 C A
1 1:724721 0 724721 A C
With these options:
-v inverted matching: exclude lines that match
-w word matching: only matches that form whole words to avoid substring matching
-F fixed strings: don't interpret search strings as regex
-f read from file: use fileB as list of strings to search for
More verbose, better for readability:
grep --invert-match --word-regexp --fixed-strings --file=fileB fileA
Notice that this is not a generally applicable solution, but might work for this dataset, assuming that the second column always is the only one to contain a colon.
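To illustrate the caveat with a made-up row (not from the dataset): if an excluded ID also appears as a whole word in another column, grep drops the row, while a column-aware awk filter keeps it.

```shell
tmp=$(mktemp -d)
# The second row is hypothetical: its 2nd column is not listed, but its 4th column is.
printf '%s\n' '1 1:693731 0 693731 G A' '1 1:999999 0 1:706992 T C' > "$tmp/fileA"
printf '%s\n' '1:706992' > "$tmp/fileB"

# grep looks at the whole line; awk only compares column 2.
grep_kept=$(grep -vwFf "$tmp/fileB" "$tmp/fileA" | wc -l)
awk_kept=$(awk 'NR==FNR { xclude[$0]; next } !($2 in xclude)' "$tmp/fileB" "$tmp/fileA" | wc -l)

echo "grep keeps $grep_kept row(s), awk keeps $awk_kept row(s)"
```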
If the files are sorted on the key, as in the sample, you can use join:
$ join -v 1 -1 2 fileA fileB | awk -v OFS='\t' '{t=$2;$2=$1;$1=t}1'
1 1:693731 0 693731 G A
1 1:707014 0 707014 C A
1 1:724721 0 724721 A C
You can do the column ordering with the -o option as well.
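A sketch of the -o variant, assuming both files are sorted on the join key and using a few rows from the question. The output format list names fileA's six fields in their original order, so no awk post-processing is needed; fields come out space-separated.

```shell
tmp=$(mktemp -d)
printf '%s\n' '1 1:693731 0 693731 G A' '1 1:706992 0 706992 T C' \
              '1 1:707014 0 707014 C A' > "$tmp/fileA"
printf '%s\n' '1:706992' > "$tmp/fileB"

# -v 1: print only unpairable lines from fileA
# -1 2: join fileA on its 2nd field (fileB uses its 1st field by default)
# -o:   emit fileA's fields in their original order
joined=$(join -v 1 -1 2 -o 1.1,1.2,1.3,1.4,1.5,1.6 "$tmp/fileA" "$tmp/fileB")

printf '%s\n' "$joined"
```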
