Select first two columns from tab-delimited text file and and substitute with '_' character - shell

I have a sample input file as follows
RF00001 1c2x C 3 118 77.20 1.6e-20 1 119 f29242
RF00001 1ffk 9 1 121 77.40 1.4e-20 1 119 8e2511
RF00001 1jj2 9 1 121 77.40 1.4e-20 1 119 f29242
RF00001 1k73 B 1 121 77.40 1.4e-20 1 119 8484c0
RF00001 1k8a B 1 121 77.40 1.4e-20 1 119 93c090
RF00001 1k9m B 1 121 77.40 1.4e-20 1 119 ebeb30
RF00001 1kc8 B 1 121 77.40 1.4e-20 1 119 bdc000
I need to extract the second and third columns from the text file and substitute the tab with '_'
Desired output file :
1c2x_C
1ffk_9
1jj2_9
1k73_B
1k8a_B
1k9m_B
1kc8_B
I am able to print the two columns by :
awk -F" " '{ print $2,$3 }' input.txt
but unable to substitute the tab with '_' with the following command
awk -F" " '{ print $2,'_',$3 }' input.txt

Could you please try following.
awk '{print $2"_"$3}' Input_file
2nd solution:
awk 'BEGIN{OFS="_"} {print $2,$3}' Input_file
3rd solution: Adding a sed solution.
sed -E 's/[^ ]* +([^ ]*) +([^ ]*).*/\1_\2/' Input_file

Related

Insert rows using awk

How can I insert a row using awk?
My file looks as:
1 43
2 34
3 65
4 75
I would like to insert three rows with "?" So my desire file looks as:
1 ?
2 ?
3 ?
4 43
5 34
6 65
7 75
I am trying with the below script.
awk '{if(NR<=3){print "NR ?"}} {printf" " NR $2}' file.txt
Here's one way to do it:
$ awk 'BEGIN{s=" "; for(c=1; c<4; c++) print c s "?"}
{print c s $2; c++}' ip.txt
1 ?
2 ?
3 ?
4 43
5 34
6 65
7 75
$ awk 'BEGIN {printf "1 ?\n2 ?\n3 ?\n"} {printf "%d", $1 + 3; printf " %s\n", $2}' file.txt
1 ?
2 ?
3 ?
4 43
5 34
6 65
7 75
You could also add the 3 lines before awk, e.g.:
{ seq 3; cat file.txt; } | awk 'NR <= 3 { $2 = "?" } $1 = NR' OFS='\t'
Output:
1 ?
2 ?
3 ?
4 43
5 34
6 65
7 75
I would do it following way using GNU AWK, let file.txt content be
1 43
2 34
3 65
4 75
then
awk 'BEGIN{OFS=" "}NR==1{print 1,"?";print 2,"?";print 3,"?"}{print NR+3,$2}' file.txt
output
1 ?
2 ?
3 ?
4 43
5 34
6 65
7 75
Explanation: I set output field separator (OFS) to 7 spaces. For 1st row I do print three lines which consisting of subsequent number and ? sheared by output field separator. You might elect to do this using for loop, especially if you expect that requirement might change here. For every line I print number of row plus 4 (to keep order) and 2nd column ($2). Thanks to use of OFS, you would need to make only one change if requirement regarding number of spaces will be altered. Note that construct like
{if(condition){dosomething}}
might be written in GNU AWK in more concise manner as
(condition){dosomething}
(tested in gawk 4.2.1)

shell script for extracting line of file using awk

I want, the selected lines of file to be print in output file side by side separated by space. Here what I have did so far,
for file in SAC*
do
awk 'FNR==2 {print $4}' $file >>exp
awk 'FNR==3 {print $4}' $file >>exp
awk 'FNR==4 {print $4}' $file >>exp
awk 'FNR==5 {print $4}' $file >>exp
awk 'FNR==7 {print $4}' $file >>exp
awk 'FNR==8 {print $4}' $file >>exp
awk 'FNR==24 {print $0}' $file >>exp
done
My output is:
XV
AMPY
BHZ
2012-08-15T08:00:00
2013-12-31T23:59:59
I want output should be
XV AMPY BHZ 2012-08-15T08:00:00 2013-12-31T23:59:59
First the test data (only 9 rows, tho):
$ cat file
1 2 3 14
1 2 3 24
1 2 3 34
1 2 3 44
1 2 3 54
1 2 3 64
1 2 3 74
1 2 3 84
1 2 3 94
Then the awk. No need for that for loop in shell, awk can handle multiple files:
$ awk '
BEGIN {
ORS=" "
a[2];a[3];a[4];a[5];a[7];a[8] # list of records for which $4 should be outputed
}
FNR in a { print $4 } # output the $4s
FNR==9 { printf "%s\n",$0 } # replace 9 with 24
' file file # ... # the files you want to process (SAC*)
24 34 44 54 74 84 1 2 3 94
24 34 44 54 74 84 1 2 3 94

get subset of table based on unique column values

H- I am looking for a bash/awk/sed solution to get subsets of a table based on unique column values. For example if I have:
chrom1 333
chrom1 343
chrom2 380
chrom2 501
chrom1 342
chrom3 102
I want to be able to split this table into 3:
chrom1 333
chrom1 343
chrom1 342
chrom2 380
chrom2 501
chrom3 102
I know how to do this in R using the split command, but I am specifically looking for a bash/awk/sed solution.
Thanks
I don’t know if this awk is of any use but it will create 3 separate file based on the unique column values:
awk '{print >> $1; close($1)}' file
alternative awk which keeps the original order of records within each block
$ awk '{a[$1]=a[$1]?a[$1] ORS $0:$0}
END{for(k in a) print a[k] ORS ORS}' file
generates
chrom1 333
chrom1 343
chrom1 342
chrom2 380
chrom2 501
chrom3 102
there are 2 trailing empty lines at the end but not displayed in the formatted output.
Using sort and awk:
sort -k1,1 file | awk 'NR>1 && p != $1{print ORS} {p=$1} 1'
EDIT: If you want to keep original order of records from input file then use:
awk -v ORS='\n\n' '!($1 in a){a[$1]=$0; ind[++i]=$1; next}
{a[$1]=a[$1] RS $0}
END{for(k=1; k<=i; k++) print a[ind[k]]}' file
create input list file.txt
(
cat << EOF
chrom1 333
chrom1 343
chrom2 380
chrom2 501
chrom1 342
chrom3 102
EOF
) > file.txt
transfomation
cat file.txt | cut -d" " -f1 | sort -u | while read c
do
cat file.txt | grep "^$c" | sort
echo
done

retrieve and add two numbers of files

In my file I have following structure :-
A | 12 | 10
B | 90 | 112
C | 54 | 34
What I have to do is I have to add column 2 and column 3 and print the result with column 1.
output:-
A | 22
B | 202
C | 88
I retrieve the two columns but dont know how to add
What I did is :-
cut -d ' | ' -f3,5 myfile.txt
How to add those columns and display.
A Bash solution:
#!/bin/bash
while IFS="|" read f1 f2 f3
do
echo $f1 "|" $((f2+f3))
done < file
You can do this easily with awk.
awk '{print $1," | ",($3+$5)'} myfile.txt
wil work perhaps.
You can do this with awk:
awk 'BEGIN{FS="|"; OFS="| "} {print $1 OFS $2+$3}' input_filename
Input:
A | 12 | 10
B | 90 | 112
C | 54 | 34
Output:
A | 22
B | 202
C | 88
Explanation:
awk: invoke the awk tool
BEGIN{...}: do things before starting to read lines from the file
FS="|": FS stands for Field Separator. Think of it as the delimiter that separates each line of your file into fields
OFS="| ": OFS stands for Output Field Separator. Same idea as above, but for output. FS =/= OFS in this case due to formatting
{print $1 OFS $2+$3}: For each line that awk reads, print the first field (the letter), followed by a delimiter specified by OFS, then the sum of field 2 and field 3.
input_filename: awk accepts the input file name as an argument here.

search for a string , and add if it matches

I have a file that has 2 columns as given below....
101 6
102 23
103 45
109 36
101 42
108 21
102 24
109 67
and so on......
I want to write a script that adds the values from 2nd column if their corresponding first column matches
for example add all 2nd column values if it's 1st column is 101
add all 2nd column values if it's 1st colummn is 102
add all 2nd column values if it's 1st colummn is 103 and so on ...
i wrote my script like this , but i'm not getting the correct result
awk '{print $1}' data.txt > col1.txt
while read line
do
awk ' if [$1 == $line] sum+=$2; END {print "Sum for time stamp", $line"=", sum}; sum=0' data.txt
done < col1.txt
awk '{array[$1]+=$2} END { for (i in array) {print "Sum for time stamp",i,"=", array[i]}}' data.txt
Pure Bash :
declare -a sum
while read -a line ; do
(( sum[${line[0]}] += line[1] ))
done < "$infile"
for index in ${!sum[#]}; do
echo -e "$index ${sum[$index]}"
done
The output:
101 48
102 47
103 45
108 21
109 103

Resources