How not to get expanded variables in AWK - bash

Good day,
I was wondering how not to get expanded variables in AWK.
Variable to pass: achi
But, when I try with:
awk -F, -v var1="achi" '$(NF-1)~var1' file
It just does not work: it prints every line whose field merely contains achi. I'd appreciate some insight into how to do this properly.
Input
achi, francia
nachi, peru
universidad achi, japon
achito, suecia
Expected Output
achi, francia

You seem to be trying to test equality with the pattern matching operator ~, which actually performs a regex (substring) match. The proper operator to test equality is ==.
awk -F, -v var1="achi" '$(NF-1)==var1' file
If you are expecting more fields, you should take into account that your values are separated by a comma and a space; this can be handled by using ", " as the field separator.
awk -F", " -v var1="achi" '$(NF-1)==var1'

Related

AWK match exact string inside square brackets

I have a file similar to the below-illustrated data.
https://www.test.example.com [503]
https://www.tst.example.com [403]
https://www.tt.example.com [302]
I want to fetch lines that match with the second column. For example, lines matching [403] should print only https://www.tst.example.com.
I tried escaping the square brackets with the below command, which gave me a warning.
$ awk -F "$2 == '\[403]\'" file.txt
awk: warning: escape sequence `\[' treated as plain `['
awk: warning: escape sequence `\'' treated as plain `''
You are mixing regular expressions and plain strings. [ is a regex special character, but you are not using a regex here, just a literal string comparison. You don't need any escaping at all (though you might want to reverse the usage of single and double quotes for simplicity, unless you are actually using Windows).
awk '$2 == "[403]"' file.txt
In basically all the Unix shells, the double quotes you used don't protect dollar signs, so $2 would be substituted by the shell, probably with nothing, or else with some unrelated string (whatever got passed in as the second command-line argument to the shell).
The -F option, if present, requires an argument; but based on your example data, the default field separator - any sequence of whitespace - should work fine. If you want to force it to e.g. a single space, try -F ' '.
Could you please try the following, written and tested with the shown samples in GNU awk.
awk -F'([[:space:]]*)?\\[|\\]([[:space:]]*)?' '$2=="403"{print $1}' Input_file
Explanation: the field separator is set to either optional spaces followed by [, or ] followed by optional spaces, for all lines. Then, if the 2nd field is 403, the first field is printed, as per the OP's request.
The following will do what you want, with the benefit of allowing you to pass the desired code as an argument, rather than having it hardcoded into the awk script.
awk -v http_code=403 '$2 == "["http_code"]"' file.txt
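This works because awk concatenates adjacent strings, so "["http_code"]" evaluates to the string [403]. For example, to fetch the 503 line from the same sample:
awk -v http_code=503 '$2 == "["http_code"]"' file.txt
https://www.test.example.com [503]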

Unable to remove last field CSV file

I have a CSV file containing data like the below; I need to get all fields as they are, except the last one.
"one","two","this has comment section1"
"one","two","this has comment section2 and ( anything ) can come here ( ok!!!"
gawk 'BEGIN {FS=",";OFS=","}{sub(FS $NF, x)}1'
gives the error:
fatal: Unmatched ( or \(:
I know that removing '(' from the second line solves the problem, but I cannot remove anything from the comment section.
With any awk you could try:
awk 'BEGIN{FS=",";OFS=","}{$NF="";sub(/,$/,"")}1' Input_file
Or with GNU awk try:
awk 'BEGIN{FS=",";OFS=","}NF{--NF};1' Input_file
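For the two sample lines above, either command outputs:
"one","two"
"one","two"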
Since you mention that everything can come here, you might also have a line that looks like:
"one","two","comment with a , comma"
So it is a bit hard to just use the <comma>-character as a field separator.
The following two posts are now very handy:
What's the most robust way to efficiently parse CSV using awk?
[U&L] How to delete the last column of a file in Linux (Note: this is only for GNU awk)
Since you work with GNU awk, you can thus do any of the following:
$ awk -v FPAT='[^,]*|"[^"]+"' -v OFS="," 'NF{NF--}1'
$ awk 'BEGIN{FPAT="[^,]*|\"[^\"]+\"";OFS=","}NF{NF--}1'
$ awk 'BEGIN{FPAT="[^,]*|\042[^\042]+\042";OFS=","}NF{NF--}1'
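For example, on the comma-containing line above, the quoted field stays intact (assuming GNU awk 4.0 or later, which introduced FPAT):
$ echo '"one","two","comment with a , comma"' | awk -v FPAT='[^,]*|"[^"]+"' -v OFS="," 'NF{NF--}1'
"one","two"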
Why is your command failing: awk's sub(ere, repl, in) function treats its first argument ere as an extended regular expression; hence the bracket has a special meaning. If you want to replace fields which are known and unique, you should not use sub, but just redefine the field:
$ awk '{$NF=""}1'
If you want to replace a literal string matching a field, you can do it with index and substr, which perform no regex interpretation:
s=$n; while (i = index($0, s)) { $0 = substr($0, 1, i-1) "repl" substr($0, i+length(s)) }
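As a complete command, deleting the last field as a literal string might look like this (a sketch; it assumes the last field's text does not also occur earlier on the line):
awk -F',' '{ s = "," $NF; i = index($0, s); if (i) $0 = substr($0, 1, i-1) } 1' Input_file
Since index does a plain string search, the ( in the comment field causes no regex error.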

Parameter expansion not working when used inside Awk on one of the column entries

System: Linux. Bash 4.
I have the following file, which will be read into a script as a variable:
/path/sample_A.bam A 1
/path/sample_B.bam B 1
/path/sample_C1.bam C 1
/path/sample_C2.bam C 2
I want to append "_string" at the end of the filename of the first column, but before the extension (.bam). It's a bit trickier because of containing the path at the beginning of the name.
Desired output:
/path/sample_A_string.bam A 1
/path/sample_B_string.bam B 1
/path/sample_C1_string.bam C 1
/path/sample_C2_string.bam C 2
My attempt:
I did the following script (I ran: bash script.sh):
List=${1};
awk -F'\t' -vOFS='\t' '{ $1 = "${1%.bam}" "_string.bam" }1' < ${List} ;
And its output was:
${1%.bam}_string.bam
${1%.bam}_string.bam
${1%.bam}_string.bam
${1%.bam}_string.bam
Problem:
I followed the idea of using awk for this substitution as in this thread: https://unix.stackexchange.com/questions/148114/how-to-add-words-to-an-existing-column , but the parameter expansion ${1%.bam} is clearly not being recognised by AWK as I intended. Does someone know the correct syntax for that part of the code? It was meant to mean "the first field, minus the trailing .bam". I used ${1%.bam} because it works in Bash, but AWK is another language and this probably differs. Thank you!
Note that the parameter expansion you applied to $1 won't happen inside awk, because the entire body of the awk command is passed in single quotes, which the shell leaves untouched without applying any expansion. Hence the string "${1%.bam}" is assigned literally to the first column.
You can do this completely in Awk
awk -F'\t' 'BEGIN { OFS = FS }{ n=split($1, arr, "."); $1 = arr[1]"_string."arr[2] }1' file
The code splits the content of $1 on the delimiter . into an array arr inside Awk. The part of the string up to the first . is stored in arr[1], and the subsequent split fields are stored in the next array indices. We then reconstruct the filename by concatenating the array entries, with _string appended to the extension-less name.
If I understood your requirement correctly, could you please try the following.
val="_string"
awk -v value="$val" '{sub(".bam",value"&")} 1' Input_file
Brief explanation: -v value="$val" passes the shell variable named val to the awk variable named value. The sub function of awk then substitutes the string .bam with value followed by the matched text (which & denotes). The trailing 1 means print the edited or non-edited line.
Why OP's attempt didn't work: in awk, shell variables can't be used directly without being passed in; what you wrote is not taken as an awk variable but as a literal string, and is printed as-is. The explanation above shows how to pass shell variables into awk.
NOTE: In case you have multiple occurrences of .bam, change sub to gsub in the above code. Also, in case your Input_file is TAB-delimited, use awk -F'\t' in the above code.
sed -i 's/\.bam/_string\.bam/g' myfile.txt
It's a single line with sed: just replace .bam with _string.bam.
You can try this way with awk :
awk -v a='_string' 'BEGIN{FS=OFS="."}{$1=$1 a}1' infile
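If you would rather keep the Bash parameter expansion from your attempt, a minimal sketch that skips awk entirely (assuming the columns are tab-separated, as your -F'\t' suggests):
while IFS=$'\t' read -r file rest; do
    printf '%s\t%s\n' "${file%.bam}_string.bam" "$rest"
done < "${List}"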

How to replace multiple " between two commas with the help of sed or awk

In a CSV file, if there are more than two " present between two commas, I want to replace them with only two " using a shell script.
Example
If the csv file contains, for example, """any word"" between two commas, it should get replaced with "any word"; if it contains [any number of "], it should get replaced with "".
FYI: " is a double quote, not two single quotes.
The [] are not actually present in the data; I wrote them for clarity.
awk solution:
sample testfile contents:
sdsdf,"""hello"",sdsdf
asdasd,[asdasd asdasd]",sdfsdf
sdf,"[asdasd]",asdasd
The job:
awk -F, '{ for(i=1;i<=NF;i++) if($i~/"{2,}/) gsub(/"+/,"\"",$i);
else if($i~/^[^"]*"{1,}[^"]*$/) $i="\"\""; }1' OFS=',' testfile
The output:
sdsdf,"hello",sdsdf
asdasd,"",sdfsdf
sdf,"[asdasd]",asdasd
Trying with Roman's file, hard-coding the two patterns:
awk -F, '{gsub(/"""hello""/,"\42hello\42",$2); gsub(/\[asdasd asdasd\]/,"\42")}1' OFS=, file
sdsdf,"hello",sdsdf
asdasd,"",sdfsdf
sdf,"[asdasd]",asdasd
Here's a sed solution which, as the OP specified, works between commas, but which doesn't work if there are commas between the quotation marks:
sed ':a;s/\(,"[^,"]*\|^"[^,"]*\)"\([^,]\)/\1\2/;ta' testfile
Using Roman's test file my output is:
sdsdf,"hello",sdsdf
asdasd,[asdasd asdasd]",sdfsdf
sdf,"[asdasd]",asdasd
Note that the second field of the second line is different in my version, as I'm not sure what behavior OP wants in that case or if fields like that even exist.

An error with asterisk in if statement

I'm having problem with the following code:
nawk -F "," '{if($2<=2)&&($9!=45)&&($11==2348*)) print $2}' abc12* | wc -l
The error is in ($11==2348*). I tried to put this number in variable x and do ($11==$x*).
If you mean a regex match, change it to
$ awk -F, '$2<=2 && $9!=45 && $11~/^2348/ {c++; print $2} END{print c}' abc12*
Note that you can incorporate the line count in the script as well (the END{print c} replaces the wc -l).
If you want an equality check, $11=="2348*" would do. It checks that the field is literally 2348*, with no special meaning for *.
Looks like you intend to use a regexp?
$11==2348*
should give you a syntax error, as
2348*
is an incomplete multiplication.
For a regular expression match you would have to use
$11 ~ /2348*/
if you intend to have zero or more "8"s, or
$11 ~ /2348.*/ or maybe $11 ~ /2348[0-9]*/
if the initial intent is having any character, or only digits, after "2348".
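To see the difference, a hypothetical one-line test (the 11th field is 23480; 1 means true, 0 false):
$ echo 'x,1,x,x,x,x,x,x,40,x,23480' | awk -F, '{print ($11 ~ /^2348/), ($11 == "2348*")}'
1 0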
I think your code would work just fine if you hadn't added one more ")" than expected. If you count them you have 7, so ($11==2348*)) should actually be ($11==2348*).
