awk: csv split works, but ignores the last field in the row - shell

I have a sample file that looks like:
Sample.csv
Data_1,0,289,292,293,300,306
Data_2,0,294,3,306
Data_3,0,294,305,306
Data_4,0,294,305,306
And I'm running awk on it:
scr.sh:
awk -F ',' -v tId="$1" '{for(i=3; i<NF; i++){if($i==tId) print}}' $2
By calling
./scr.sh 300 Sample.csv
That works fine and returns exactly the one row that matches:
Data_1,0,289,292,293,300,306
Original Problem statement: From the 3rd column onwards, if any of the column data matches the number given, then the line should get printed.
But if I call:
./scr.sh 306 Sample.csv
That returns me NOTHING!
I've double checked the lines in Sample.csv and confirmed that there are NO trailing spaces on any of the lines.
Any clues? Thanks.

This awk will do what you're looking for:
awk -F ',' -v tId="$1" '$0 ~ "(^|,)" tId "(,|$)"' file
Alternatively this egrep will also do the job:
egrep '(^|,)306(,|$)' file
UPDATE: Based on your comments below you can use:
awk -v tId="$1" 'BEGIN{FS=OFS=","} {p=$0; $1=$2=""} $0 ~ "(^|,)" tId "(,|$)"{print p}' file

Here is a simple solution to your problem.
Let's say your argument is stored in a variable named var,
i.e. var=$1
Then run the following command to find the occurrences in your file:
grep -E "^${var},|,${var},|,${var}$" yourfilename
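As for why the loop in the question prints nothing for 306: the condition i<NF stops one field short, so the last field $NF is never compared. A minimal sketch of the same loop with i<=NF follows; the function name find_id and the temporary file are illustrative assumptions, not part of the original script.

```shell
# Sample data from the question, written to a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Data_1,0,289,292,293,300,306
Data_2,0,294,3,306
Data_3,0,294,305,306
Data_4,0,294,305,306
EOF

# find_id ID FILE: print rows where any field from the 3rd onward equals ID.
find_id() {
    awk -F ',' -v tId="$1" '{
        for (i = 3; i <= NF; i++)             # <= so that the last field is tested too
            if ($i == tId) { print; break }   # break avoids printing a row twice
    }' "$2"
}

find_id 300 "$sample"   # prints the Data_1 row
find_id 306 "$sample"   # prints all four rows
```

Unlike the regex-based commands, this compares whole fields, so the ID is never interpreted as a regular expression and columns 1 and 2 are never searched.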

Related

How to replace only a column and for the rows contains specific values

I have a |-delimited file and am trying to apply the logic below.
cat list.txt
101
102
103
LIST=`cat list.txt`
Input file
1|Anand|101|1001
2|Raj|103|1002
3|Unix|110|101
Expected result
1|Anand|101|1001
2|Raj|103|1002
3|Unix|UNKNOWN|101
I tried two methods:
1. Using fgrep with list.txt as the pattern file to split the input into two files, one with the matching lines and one with the non-matching lines, and then using awk and gsub on the non-matching file to replace the 3rd column with UNKNOWN. The problem is that the 4th column of the 3rd row contains a value that is in list.txt, so I don't get the expected result.
2. Using a one-liner awk, passing the list in with -v VAR. This produces no changes in the output.
awk -F"|" -v VAR="$LIST" '{if($3 !~ $VAR) {{{gsub(/.*/,"UNKNOWN", $3)1} else { print 0}' input_file
Can you please suggest how to get the expected result?
There is no need to use cat to read complete file in a variable.
You may just use this awk:
awk 'BEGIN {FS=OFS="|"}
FNR==NR {a[$1]; next}
!($3 in a) {$3 = "UNKNOWN"} 1' list.txt input_file
1|Anand|101|1001
2|Raj|103|1002
3|Unix|UNKNOWN|101
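The two-file idiom used here can be tried end to end with the sample data from the question. This is only a sketch: the temporary directory and the array name allowed are assumptions made for illustration.

```shell
workdir=$(mktemp -d)
printf '%s\n' 101 102 103 > "$workdir/list.txt"
cat > "$workdir/input_file" <<'EOF'
1|Anand|101|1001
2|Raj|103|1002
3|Unix|110|101
EOF

result=$(awk 'BEGIN { FS = OFS = "|" }
FNR == NR        { allowed[$1]; next }   # first file: remember each listed value
!($3 in allowed) { $3 = "UNKNOWN" }      # second file: only column 3 is checked
{ print }' "$workdir/list.txt" "$workdir/input_file")

printf '%s\n' "$result"
```

Because only $3 is tested, the 101 in the 4th column of the last row has no effect, which is what broke the fgrep approach.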

Remove hyphen from duration format time

I need to remove the hyphens from a duration-format time, and I didn't succeed with the sed command I intended to use.
original output:
00:0-26:0-8
00:0-28:0-30
00:0-28:0-4
00:0-28:0-28
00:0-27:0-54
00:0-27:0-19
Expected output:
00:26:08
00:28:30
00:28:04
00:28:28
00:27:54
00:27:19
I tried this command but I am stuck:
sed 's/;/ /g' temp_file.txt | awk '{print $8}' | grep - | sed 's/-//g;s/00:0/0:/g'
Using sed:
sed 's/\<[0-9]\>/0&/g;s/:00-/:/g' file
The first command, s/\<[0-9]\>/0&/g, prepends a zero to every single-digit number, so 0-8 becomes 00-08.
The second command, s/:00-/:/g, then removes the 00- that now precedes each number.
For the sample shown, the following awk may also help:
awk -F":" '{for(i=1;i<=NF;i++){sub(/0-/,"",$i);$i=length($i)==1?0$i:$i}} 1' OFS=":" Input_file
If you want to save the output into Input_file itself, append > temp_file && mv temp_file Input_file to the command above.
For the given example, this one-liner does the job:
awk -F':0-' '{printf "%02d:%02d:%02d\n",$1,$2,$3}' file
What if I have the output below, which has two time columns? When I try one of your regexps above, it also adds a "0" to the first time column (the timestamp), which I don't want. Only column $7 (duration_time), with ; as the separator, should be modified.
01;12May2018 8:20:36;192.168.1.111;78787;192.168.1.111;78787;80:25:0-49;2018-05-12_111111;RO
02;14May2018 2:43:16;192.168.1.132;78787;192.168.1.111;78787;36:10:0-10;2018-05-12_111111;RO
03;15May2018 7:40:01;192.168.131.1;78787;192.168.1.111;78787;18:39:0-44;2018-05-12_111111;RO
04;15May2018 12:37:46;192.168.1.201;78787;192.168.1.111;78787;12:51:0-14;2018-05-12_111111;RO
Here is the output:
root#root> sed 's/\<[0-9]\>/0&/g;s/:00-/:/g' temp_file
01;12May2018 08:20:36;192.168.01.111;78787;192.168.01.111;78787;80:25:49;2018-05-12_111111;RO
02;14May2018 02:43:16;192.168.01.132;78787;192.168.01.111;78787;36:10:10;2018-05-12_111111;RO
03;15May2018 07:40:01;192.168.131.01;78787;192.168.01.111;78787;18:39:44;2018-05-12_111111;RO
04;15May2018 12:37:46;192.168.01.201;78787;192.168.01.111;78787;12:51:14;2018-05-12_111111;RO
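One way to restrict the change to the 7th ;-separated field is to split only that field on : and rebuild it. The following is a sketch under the assumption that every record has the duration in field 7; the function name fix_duration is made up for illustration.

```shell
# fix_duration: read records on stdin, clean up only the 7th ;-separated field.
fix_duration() {
    awk 'BEGIN { FS = OFS = ";" }
    NF >= 7 {
        n = split($7, part, ":")
        out = ""
        for (i = 1; i <= n; i++) {
            sub(/^0-/, "", part[i])                          # drop the stray "0-" prefix
            if (length(part[i]) == 1) part[i] = "0" part[i]  # pad single digits
            out = out (i > 1 ? ":" : "") part[i]
        }
        $7 = out                                             # only field 7 is rewritten
    }
    { print }'
}

fix_duration <<'EOF'
01;12May2018 8:20:36;192.168.1.111;78787;192.168.1.111;78787;80:25:0-49;2018-05-12_111111;RO
02;14May2018 2:43:16;192.168.1.132;78787;192.168.1.111;78787;36:10:0-10;2018-05-12_111111;RO
EOF
```

The timestamp and the IP addresses pass through unchanged because no substitution is ever applied to them.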

Using a value stored in a different file in awk

I have a value stored in a file named cutoff1
If I cat cutoff1 it will look like
0.34722
I want to use the value stored in cutoff1 inside an awk script. Something like following
awk '{ if ($1 >= 'cat cutoff1' print $1 }' hist1.dat >hist_oc1.dat
I think I am making some mistakes. If I do manually it will look like
awk '{ if ($1 >= 0.34722) print $1 }' hist1.dat >hist_oc1.dat
How can I use the value stored in cutoff1 file inside the above mentioned awk script?
The easiest ways to achieve this are
awk -v cutoff="$(cat cutoff1)" '($1 >= cutoff){print $1}' hist.dat
awk -v cutoff="$(< cutoff1)" '($1 >= cutoff){print $1}' hist.dat
or
awk '(NR==FNR){cutoff=$1;next}($1 >= cutoff){print $1}' cutoff1 hist.dat
or
awk '($1 >= cutoff){print $1}' cutoff="$(cat cutoff1)" hist.dat
awk '($1 >= cutoff){print $1}' cutoff="$(< cutoff1)" hist.dat
Note: thanks to Glenn Jackman for pointing to:
man bash Command substitution: Bash performs the expansion by executing command and replacing the command substitution with the
standard output of the command, with any trailing newlines deleted.
Embedded newlines are not deleted, but they may be removed during word
splitting. The command substitution $(cat file) can be replaced by
the equivalent but faster $(< file).
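To see the -v form work end to end, here is a small self-contained sketch. The cutoff value is the one from the question; the contents of hist1.dat are made up for illustration.

```shell
workdir=$(mktemp -d)
printf '0.34722\n' > "$workdir/cutoff1"
printf '%s\n' 0.1 0.34722 0.5 0.2 0.9 > "$workdir/hist1.dat"   # made-up histogram values

# The command substitution strips the trailing newline, so awk receives a clean number.
above=$(awk -v cutoff="$(cat "$workdir/cutoff1")" '$1 >= cutoff { print $1 }' "$workdir/hist1.dat")
printf '%s\n' "$above"
```

Since both $1 and a numeric-looking -v value are treated as numeric strings, the comparison is numeric rather than lexical.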
Since awk can read multiple files, just add the cutoff filename before your data file and treat the first line specially. There is no need for an external variable declaration.
awk 'NR==1{cutoff=$1; next} $1>=cutoff{print $1}' cutoff data
PS: Just noticed that this is similar to @kvantour's second answer, but I'm keeping it here as a different flavor.
You could use getline to read a value from another file at your convenience. First the main file to process:
$ cat > file
wait
wait
did you see that
nothing more to see here
And cutoff:
$ cat cutoff
0.34722
An awk script that reads a line from cutoff when it meets the string see in a record:
$ awk '/see/{if((getline val < "cutoff") > 0) print val}1' file
wait
wait
0.34722
did you see that
nothing more to see here
Explained:
$ awk '
/see/ { # when string see is in the line
if((getline val < "cutoff") > 0) # read a value from cutoff if there are any available
print val # and output the value from cutoff
}1' file # output records from file
As there was only one value in cutoff, it was printed only once even though see was seen twice.
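Applied to the original cutoff problem, getline could be called once in a BEGIN block. This is a sketch, not the answer's own command: the variable name cfile and the sample values in hist1.dat are assumptions.

```shell
workdir=$(mktemp -d)
printf '0.34722\n' > "$workdir/cutoff1"
printf '%s\n' 0.1 0.5 0.2 0.9 > "$workdir/hist1.dat"    # made-up values

kept=$(awk -v cfile="$workdir/cutoff1" '
BEGIN {
    if ((getline cutoff < cfile) <= 0) {   # -1: file unreadable, 0: file empty
        print "cannot read a cutoff from " cfile > "/dev/stderr"
        exit 1
    }
    close(cfile)
    cutoff += 0                            # force a numeric comparison below
}
$1 >= cutoff { print $1 }' "$workdir/hist1.dat")
printf '%s\n' "$kept"
```

Checking the return value of getline matters here, because a missing or empty cutoff file would otherwise leave cutoff unset and every line would pass the test.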

Extract the last three columns from a text file with awk

I have a .txt file like this:
ENST00000000442 64073050 64074640 64073208 64074651 ESRRA
ENST00000000233 127228399 127228552 ARF5
ENST00000003100 91763679 91763844 CYP51A1
I want to get only the last 3 columns of each line.
As you can see, there are sometimes empty lines between two lines, which must be ignored. Here is the output that I want:
64073208 64074651 ESRRA
127228399 127228552 ARF5
91763679 91763844 CYP51A1
awk  '/a/ {print $1- "\t" $-2 "\t" $-3}'  file.txt.
It does not return what I want. Do you know how to correct the command?
The following awk may help:
awk 'NF{print $(NF-2),$(NF-1),$NF}' OFS="\t" Input_file
Output will be as follows.
64073208 64074651 ESRRA
127228399 127228552 ARF5
91763679 91763844 CYP51A1
EDIT: Adding an explanation of the command. (NOTE: the following is for explanation only; run the command above to get the results.)
awk 'NF ### NF is a built-in awk variable holding the number of fields in the current line.
### Used as a condition it is true only when the line has at least one field, so empty lines are skipped.
{
print $(NF-2),$(NF-1),$NF ### Print the 3rd-last field $(NF-2), the 2nd-last field $(NF-1) and the last field $NF.
}
' OFS="\t" Input_file ### Set OFS (the output field separator) to TAB and name the input file.
You can use sed too
sed -E '/^$/d;s/.*\t(([^\t]*[\t|$]){2})/\1/' infile
With some piping:
$ cat file | tr -s '\n' | rev | cut -f 1-3 | rev
64073208 64074651 ESRRA
127228399 127228552 ARF5
91763679 91763844 CYP51A1
First, cat the file to tr to squeeze out repeated \ns and get rid of the empty lines. Then reverse the lines, cut the first three fields and reverse again. You could replace the useless cat by giving the file to the first rev.

awk load one file into array, test against another file

I have two files:
seqs.fa:
>seq000007;size=72768;
ACTGTGAG
>seq000010;size=53132;
GTAAGATC
GAATTCTT
>seq00045;size=40321;
ACCCATTT
...
numbers.txt
72768
53132
my desired output would be the lines from the first file that match a number from the second file:
>seq000007;size=72768;
>seq000010;size=53132;
I attempted to use awk, but it only returns lines matching the first number:
awk -F"\n" -v RS=">" 'NR==FNR{for(i=1;i<=NF;i++) A[$i]; next} END {for (header in A) {if ( match(header,$1) ) {print header}}}' seqs.fa numbers.txt
seq000007;size=72768;
seq072768;size=1;
Why is awk only looping through the "header" array for the first line in numbers.txt? And, if this is an XY problem, is there a better way to accomplish this goal?
after fixing the typo in your numbers file
$ awk -F'=|;' 'NR==FNR{a[$1]; next}; $3 in a' numbers.txt seqs.fa
>seq000007;size=72768;
>seq000010;size=53132;
In this special case you can use GNU grep like this:
grep -F -f numbers.txt seqs.fa
The option -f filename uses all the patterns found in filename for the search. The option -F tells grep that the patterns are simple fixed strings.
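Plain fixed-string matching also hits a number that appears elsewhere in a header, such as the seq072768;size=1; line from the question's output. A possible refinement, sketched here with a hypothetical patterns.txt file, is to wrap each number as size=<number>; before handing the patterns to grep.

```shell
workdir=$(mktemp -d)
cat > "$workdir/seqs.fa" <<'EOF'
>seq000007;size=72768;
ACTGTGAG
>seq000010;size=53132;
GTAAGATC
GAATTCTT
>seq072768;size=1;
ACCCATTT
EOF
printf '%s\n' 72768 53132 > "$workdir/numbers.txt"

# Turn each number into the fixed string "size=<number>;" so it can only match the size field.
sed 's/.*/size=&;/' "$workdir/numbers.txt" > "$workdir/patterns.txt"
matches=$(grep -F -f "$workdir/patterns.txt" "$workdir/seqs.fa")
printf '%s\n' "$matches"
```

This keeps the speed of grep -F while avoiding the false match on the sequence name.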
