Awk print is not working inside bash shell script

When I use the awk print command outside my script it works perfectly. Below is the content of the file (sample.txt), which is comma separated.
IROG,1245,OUTO,OTUG,USUK
After executing the command below outside the script, I get IROG as output.
cat sample.txt | awk -F, '{print $1}' > data.txt
Below is what I have inside the script:
my $HOME ='home/tmp/stephen';
my $DATA ="$HOME/data.txt";
my $SAMPLE ="$HOME/sample.txt";
`cat $SAMPLE | awk -F, '{print $1}' > $DATA`;
But here I get the same content as in the original file instead of the 1st column.
output is IROG,1245,OUTO,OTUG,USUK
but I expect only IROG. Can someone advise where I am wrong here?

The $1 inside your backticks expression is being expanded by perl before being executed by the shell. Presumably it has no value, so your awk command is simply {print }, which prints the whole record. You should escape the $ to prevent this from happening:
`awk -F, '{print \$1}' "$SAMPLE" > "$DATA"`;
Note that I have quoted your variables and also removed your useless use of cat.
If you mean to use a shell script, as opposed to a perl one (which is what you've currently got), you can do this:
home=home/tmp/stephen
data="$home/data.txt"
sample="$home/sample.txt"
awk -F, '{print $1}' "$sample" > "$data"
In the shell, there must be no spaces in variable assignments. Also, it is considered bad practice to use UPPERCASE variable names, as you risk overwriting the ones used internally by the shell. Furthermore, it is considered good practice to use double quotes around variable expansions to prevent problems related to word splitting and glob expansion.
There are a few ways that you could trim the leading whitespace from your first field. One would be to use sub to remove it:
awk -F, '{sub(/^ */, ""); print $1}'
This removes any space characters from the start of the line. Again, remember to escape the $ if doing this within backticks in perl.
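As a runnable sketch of that trimming step (the leading space in the sample line is invented for illustration, since the original file isn't shown with one):

```shell
# Hypothetical sample with a leading space before the first field.
printf ' IROG,1245,OUTO,OTUG,USUK\n' > sample.txt

# Strip leading spaces, then print the first field.
awk -F, '{sub(/^ */, ""); print $1}' sample.txt   # prints: IROG

rm -f sample.txt
```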

Related

Does awk support using a different character than ' in print?

Does awk support using a different delimiter character than ' in print? e.g. Instead of awk '{print $1}', something like awk -d # #{print $1}#
I was actually looking at the C source code and it's a pretty short program; is there an alternative version that allow that?
It can't: The ' isn't passed to awk; instead, it's understood by the shell itself. Thus, when you run awk '{print $1}', what you're actually calling at the OS level is something like:
/* this is C syntax, so the double-quotes are C quotes; only their contents are literal */
char *args[] = { "awk", "{print $1}", NULL };
execvp("awk", args);
Notably, the single-quotes aren't there at all any more -- they were parsed out by the shell when it understood them as instructions for how it should break the command into an argument list.
To see it in another way, consider if you put your script in a separate file and called
awk -f my.awk
The contents of my.awk would simply be
{print $1}
not
'{print $1}'
The single quotes are only used by the shell to ensure that the script is passed literally to awk, rather than being subject to any particular shell processing that could change the script before awk could read it.
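You can watch the shell's quote removal directly with printf, which prints each argument exactly as a called program such as awk would receive it (a small illustration, not from the original answer):

```shell
# The shell removes the single quotes before the program runs;
# printf receives the bare script text, just as awk would.
printf '%s\n' '{print $1}'    # prints: {print $1}
```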

Why Does Running Awk With Double Quotes Break But Works With Single Quotes?

I noticed when running a command that this statement doesn't recognize the delimiter
awk -F',' "{print $4}" wtd.csv
However, this one does.
awk -F',' '{print $4}' wtd.csv
Any reason why? I'm sure this is part of some general bash rule I'm forgetting.
If you're using double quotes, $4 will get replaced by Bash (probably with the empty string). You'd need to escape the $ to use it in double quotes.
Example where this also is happening:
[thom@lethe ~]$ echo '$4'
$4
[thom@lethe ~]$ echo "$4"

[thom@lethe ~]$ echo "\$4"
$4
You are forgetting that double quotes allow bash variable interpolation. In this case it tries to replace $4 with the fourth positional argument to the shell, which is usually empty.
The single-quotes prevent bash interpolation and passes the literal $4 to awk.
You'll have identical results with:
awk -F',' '{print $4}' wtd.csv
awk -F',' "{print \$4}" wtd.csv
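A quick check with a hypothetical one-line CSV confirms the two forms behave identically:

```shell
# Hypothetical sample row; field 4 is "d".
printf 'a,b,c,d\n' > wtd.csv

awk -F',' '{print $4}' wtd.csv      # single quotes: prints d
awk -F',' "{print \$4}" wtd.csv     # double quotes, escaped $: prints d

rm -f wtd.csv
```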

How to use awk variable in search?

Name="jony"
awk -v name="$Name" '/name/ {print $0}' file
This will search for the literal string name, not for the value of $Name, which is jony.
Correct, awk won't recognize variables in / /. You can do:
Name="jony"
awk -v name="$Name" '$0 ~ name' file
Since print is awk's default behavior we can avoid using it here.
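Here is a minimal sketch with an invented two-line file; only the matching line is printed:

```shell
# Hypothetical input: only the first line contains "jony".
printf 'jony smith\nmary jones\n' > names.txt

Name="jony"
awk -v name="$Name" '$0 ~ name' names.txt   # prints: jony smith

rm -f names.txt
```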
Hope I understood the problem correctly. Why don't you try the following:
awk '/'"$Name"'/ { print } ' testfile
When writing an AWK one-liner, you can quote the script with either single quotes or double quotes. In the latter case the shell does all the substitution directly, so you do not need to pass the variable into the script via the -v option:
Name="jony"
awk "/$Name/" file
# this works. after shell has performed substitutions, the line looks like
awk "/jony/" file
Or even, though this is bad practice, without quotes if the name does not contain spaces:
awk /$Name/ file
All the simplicity vanishes as soon as you want to use $ in the script, including awk's field variables $0, $1, etc., because you will have to escape the dollar sign to prevent shell variable expansion.
awk "/$Name/ {print \$0}"
In addition you will have to escape the double quotes to add literal text to the script. Looks clumsy:
awk "/$Name/ {print \"Found in: \" \$0}"
To crown it all, negating regular expression with double quotes will cause a shell error:
awk "!/$Name/"
#error> ... event not found ...
The error occurs in interactive shells because ! triggers history expansion inside double quotes, whether it appears in the script itself or in the value of $Name. This makes using double quotes unreliable.
So, to be on the safe side, prefer single quotes :)
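To see the escaping burden side by side, here is a sketch using an invented file:

```shell
printf 'jony was here\nnothing here\n' > file

Name="jony"
# Double quotes: every awk $ must be escaped from the shell.
awk "/$Name/ {print \$0}" file              # prints: jony was here
# Single quotes: no escaping needed; pass the variable with -v.
awk -v name="$Name" '$0 ~ name' file        # prints: jony was here

rm -f file
```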

how to pre-construct awk statement to pass to awk on command line?

I have a shell script that constructs an awk program as a string, then passes that string to awk. This is because I want to use values of shell variables in the awk program.
My code looks like this:
awk_prog="'{if (\$4~/$shell_var/) print \$1,\$2}'"
echo $awk_prog
awk $awk_prog $FILENAME
However, when I pass the string to awk, I always get the error:
'{if ($4~/regex/) print $1,$2}'
awk: '{if
awk: ^ invalid char ''' in expression
What does that error message mean? I tried the -F: switch but it does not help. How can I settle this issue?
Thank you.
This is caused by shell quoting. The following will work:
awk_prog="{ if (\$4 ~ /$shell_var/) print \$1, \$2 }"
echo "$awk_prog"
awk "$awk_prog" $FILENAME
When you run awk '{ print }' foo from the command line, the shell interprets and removes the quotes around the program so awk receives two arguments - the first is the program text and the second is the filename foo. Your example was sending awk the program text '{if ...}' which is invalid syntax as far as awk is concerned. The outer quotes should not be present.
In the snippet that I gave above, the shell uses the quotes in the awk_prog= line to group the contents of the string into a single value and then assigns it to the variable awk_prog. When it executes the awk "$awk_prog"... line, you have to quote the expansion of $awk_prog so awk receives the program text as a single argument.
There's another way to get your shell variable into awk -- use awk's -v option:
awk -v pattern="$shell_var" '$4 ~ pattern {print $1, $2}' "$FILENAME"
Use -v multiple times if you have several variables to pass to awk.
If you truly want to hold your awk program in a shell variable, build it up using printf:
awk_script="$( printf '$4 ~ /%s/ {print $1, $2}' "$shell_var" )"
awk "$awk_script" "$FILENAME"
Note the use of quotes in the printf command: single quotes around the template to protect the dollar signs you want awk to interpret, double quotes for shell variables.
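Putting the -v and printf approaches together on a made-up input (field 4 matches the pattern):

```shell
shell_var="foo"
printf 'x y z foo\n' > data.txt   # hypothetical input file

# -v approach:
awk -v pattern="$shell_var" '$4 ~ pattern {print $1, $2}' data.txt  # prints: x y

# printf-built script:
awk_script="$( printf '$4 ~ /%s/ {print $1, $2}' "$shell_var" )"
awk "$awk_script" data.txt                                          # prints: x y

rm -f data.txt
```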
Another (IMO simpler) solution which (I think) addresses what you are intuitively trying to do is simply to use eval. You want the shell to behave as if you had literally typed:
awk '{if ($4~/foo/) print $1,$2}' path
(where foo and path are the literal contents of $shell_var and $FILENAME). To make that happen, just slap an eval on the front of your last line (and perhaps quotes for good measure, but they aren't necessary in this case) so that your last line is:
eval "awk $awk_prog $FILENAME"
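A runnable sketch of the eval approach, using the quoting from the question (the file name, contents, and pattern are invented for the demonstration):

```shell
shell_var="foo"
FILENAME=sample.txt
printf 'a b c foo\n' > "$FILENAME"   # hypothetical input; field 4 matches

# Same construction as in the question, outer single quotes included:
awk_prog="'{if (\$4~/$shell_var/) print \$1,\$2}'"
# eval re-parses the command line, so the quotes are honored this time:
eval "awk $awk_prog $FILENAME"       # prints: a b

rm -f "$FILENAME"
```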

How do I print a field from a pipe-separated file?

I have a file with fields separated by pipe characters and I want to print only the second field. This attempt fails:
$ cat file | awk -F| '{print $2}'
awk: syntax error near line 1
awk: bailing out near line 1
bash: {print $2}: command not found
Is there a way to do this?
Or just use one command:
cut -d '|' -f FIELDNUMBER
The key point here is that the pipe character (|) must be escaped from the shell. Use \| or '|' to protect it from shell interpretation and allow it to be passed to awk on the command line.
Reading the comments I see that the original poster presents a simplified version of the original problem which involved filtering file before selecting and printing the fields. A pass through grep was used and the result piped into awk for field selection. That accounts for the wholly unnecessary cat file that appears in the question (it replaces the grep <pattern> file).
Fine, that will work. However, awk is largely a pattern matching tool on its own, and can be trusted to find and work on the matching lines without needing to invoke grep. Use something like:
awk -F\| '/<pattern>/{print $2;}{next;}' file
The /<pattern>/ bit tells awk to perform the action that follows on lines that match <pattern>.
The lost-looking {next;} is a default action skipping to the next line in the input. It does not seem to be necessary, but I have this habit from long ago...
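For instance, with an invented two-line file and the placeholder pattern replaced by keep:

```shell
printf 'keep|one\ndrop|two\n' > file

# Filter and select in a single awk invocation, no grep needed:
awk -F\| '/keep/{print $2;}{next;}' file   # prints: one

rm -f file
```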
The pipe character needs to be escaped so that the shell doesn't interpret it. A simple solution:
$ awk -F\| '{print $2}' file
Another choice would be to quote the character:
$ awk -F'|' '{print $2}' file
Another way using awk
awk 'BEGIN { FS = "|" } ; { print $2 }'
Note that as written this reads from standard input rather than from 'file', so it prints nothing here. You should either pipe the input in with 'cat file' or simply list the file after the awk program.
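With the file listed explicitly, the BEGIN form works the same as the other variants (a sketch with invented data):

```shell
printf 'a|b|c\n' > file

awk 'BEGIN { FS = "|" } { print $2 }' file   # prints: b

rm -f file
```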
