When using awk to parse a CSV file, why does it ignore empty cells? - bash

I have some scripts which use awk to parse a CSV file. I have noticed that, if a cell is empty, awk simply moves to the next cell. This means, if I ask it to read column 4, but that cell is empty, it prints the data from column 5, e.g.:
echo "1#2#3##5" | awk -F "#*" '{print $4}'
My expected result is that it will print nothing, because column 4 is empty.
Why is awk skipping column 4?
How can I get awk to not ignore empty columns?

The problem is not what you think. awk is not ignoring empty cells; it is parsing that line as 4 fields instead of 5.
[me#home]$ echo "1#2#3##5" | awk -F "#*" '{print NF}'
4
That's because you're using #* as your field separator. As a regular expression it matches any run of consecutive # characters, so #, ##, ###, ... are all valid field separators, and the ## between 3 and 5 is treated as a single separator.
Try using -F "#" instead.
[me#home]$ echo "1#2#3##5" | awk -F "#" '{print NF}'
5
[me#home]$ echo "1#2#3##5" | awk -F "#" '{print $4}'
[me#home]$ echo "1#2#3##5" | awk -F "#" '{print $5}'
5
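As a minimal sketch of the difference, the snippet below compares a literal single-character separator with a regex that matches runs of #. The variable names (line, single, merged) are illustrative only, and #+ is used instead of #* because it shows the same merging of ## without relying on how a particular awk treats a pattern that can match the empty string.

```shell
line='1#2#3##5'

# A single-character separator keeps the empty fourth field.
single=$(printf '%s\n' "$line" | awk -F '#' '{ print NF ":" $4 ":" $5 }')

# A regex matching runs of "#" swallows the empty field.
merged=$(printf '%s\n' "$line" | awk -F '#+' '{ print NF ":" $4 }')

echo "$single"   # 5::5
echo "$merged"   # 4:5
```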

Related

xargs and cut: getting `cut` fields of a csv to bash variable

I am using xargs in conjunction with cut, but I am unsure how to get the output of cut into variables that I can use for further processing.
So, I have a text file like so:
test.txt:
/some/path/to/dir,filename.jpg
/some/path/to/dir2,filename2.jpg
...
I do this:
cat test.txt | xargs -L1 | cut -d, -f 1,2
/some/path/to/dir,filename.jpg
but what I'd like to do is:
cat test.txt | xargs -L1 | cut -d, -f 1,2 | echo $1 $2
where $1 and $2 are /some/path/to/dir and filename.jpg
I am stumped that I cannot seem to be able to achieve this.
You may want to say something like:
#!/bin/bash
while IFS=, read -r f1 f2; do
echo ./mypgm -i "$f1" -o "$f2"
done < test.txt
IFS=, read -r f1 f2 reads test.txt one line at a time,
splits each line on the comma, and assigns the fields to the
variables f1 and f2.
The echo line is for demonstration purposes only. Replace it
with your desired command using $f1 and $f2.
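A self-contained sketch of the same loop, assuming two sample rows written to a temporary file in place of test.txt. The variable names dir, name, and out are illustrative, and joining the two fields into one path is just an example of "further processing":

```shell
# Hypothetical sample data standing in for test.txt.
tmp=$(mktemp)
printf '%s\n' '/some/path/to/dir,filename.jpg' \
              '/some/path/to/dir2,filename2.jpg' > "$tmp"

out=$(
  while IFS=, read -r dir name; do
    # Each comma-separated field is now an ordinary shell variable.
    printf '%s/%s\n' "$dir" "$name"
  done < "$tmp"
)
rm -f "$tmp"

echo "$out"
# /some/path/to/dir/filename.jpg
# /some/path/to/dir2/filename2.jpg
```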
Try this:
cat test.txt | awk -F, '{print $1, $2}'
From man xargs:
xargs [-L number] [utility [argument ...]]
-L number
Call utility for every number non-empty lines read.
From man awk:
Awk scans each input file for lines that match any of a set of patterns specified literally in prog or in one or more files specified as -f progfile.
So you don't have to use xargs -L1 as you don't pass the utility to call.
Also from man awk:
The -F fs option defines the input field separator to be the regular expression fs.
So awk -F, can replace the cut -d, part.
The fields are denoted $1, $2, ..., while $0 refers to the entire line.
So $1 is for the first column, $2 is for the second one.
An action is a sequence of statements. A statement can be one of the following:
print [ expression-list ] [ > expression ]
An empty expression-list stands for $0.
The print statement prints its argument on the standard output (or on a file if > file or >> file is present or on a pipe if | cmd is present), separated by the current output field separator, and terminated by the output record separator.
Putting all of this together, cat test.txt | awk -F, '{print $1, $2}' would achieve what you want.
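To make the role of the output field separator visible, here is a small sketch that sets OFS explicitly. The separator string " | " and the variable name pairs are arbitrary choices for illustration:

```shell
# The comma in "print $1, $2" is replaced by OFS on output.
pairs=$(printf '%s\n' '/some/path/to/dir,filename.jpg' |
  awk -F, -v OFS=' | ' '{ print $1, $2 }')

echo "$pairs"   # /some/path/to/dir | filename.jpg
```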

AWK -F with print all but last record

/Home/in/test_file.txt
echo /Home/in/test_file.txt | awk -F'/' '{ print $2,$3 }'
Gives the result as:
Home in
But I need /Home/in/ as the result. I have to get everything except test_file.txt.
How can I achieve this?
$ echo '/Home/in/test_file.txt' | awk '{sub("/[^/]+$","")} 1'
/Home/in
$ echo '/Home/in/test_file.txt' | awk '{sub("[^/]+$","")} 1'
/Home/in/
$ echo '/Home/in/test_file.txt' | sed 's:/[^/]*$::'
/Home/in
$ echo '/Home/in/test_file.txt' | sed 's:[^/]*$::'
/Home/in/
$ dirname '/Home/in/test_file.txt'
/Home/in
Your attempt awk -F'/' '{ print $2,$3 }' didn't do what you wanted as -F'/' is telling awk to split the input into fields at every / and then print $2,$3 is telling awk to print the 2nd and 3rd fields separated by a blank char (the default value for OFS). You could do:
$ echo '/Home/in/test_file.txt' | awk 'BEGIN{FS=OFS="/"} { print "",$2,$3,"" }'
/Home/in/
to get the expected output, but it'd be the wrong approach: it removes the field you don't want AND removes the input separators AND then adds new output separators, which happen to have the same value as the input separators, rather than simply removing the field you don't want like the other solutions above do.
echo /Home/in/test_file.txt | awk -F'/[^/]*$' '{ print $1 }'
...will print everything up to, but not including, the last slash.
There are several ways to achieve this:
Using dirname:
$ dirname /home/in/test_file.txt
/home/in
Using Shell substitution:
$ var="/home/in/test_file.txt"
$ echo "${var%/*}"
/home/in
Using sed: (See Ed Morton)
Using AWK:
$ echo "/home/in/test_file.txt" | awk -F'/' '{OFS=FS;$NF=""}1'
/home/in/
Remark: all these work since you can't have a filename with a forward slash (Is it possible to use "/" in a filename?)
Note: all but dirname will misbehave if you just have a single file name without a path. dirname foo returns . (the current directory), whereas the shell substitution returns foo unchanged and the awk command prints an empty line.
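A quick sketch of that edge case, comparing dirname with the ${var%/*} expansion on a full path and on a bare file name. The variable names are illustrative only:

```shell
path='/home/in/test_file.txt'
bare='test_file.txt'

d1=$(dirname "$path"); e1=${path%/*}
d2=$(dirname "$bare"); e2=${bare%/*}

echo "$d1 $e1"   # /home/in /home/in
echo "$d2 $e2"   # . test_file.txt
```

With no slash in the value, the expansion has nothing to strip, so it leaves the name as it is.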
awk behaves as it should.
When you define slash / as a separator, the fields in your expression become the content between the separators.
If you need the separator to be printed as well, you need to do it explicitly, like:
echo /Home/in/test_file.txt | awk -F'/' '{ printf "%s/%s/",$2,$3 }'
Alternatively, replace the last field with an empty string and
put the slash back in as the (built-in) Output Field Separator (OFS):
echo /Home/in/test_file.txt | awk -F'/' -v OFS='/' '{$NF="";print}'
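The reason OFS matters here is that assigning to any field makes awk rebuild the record, joining the fields with OFS. A minimal sketch, with illustrative variable names, showing the result with and without OFS set:

```shell
path='/Home/in/test_file.txt'

# OFS set to "/": the rebuilt record keeps its slashes.
with_ofs=$(printf '%s\n' "$path" | awk -F'/' -v OFS='/' '{ $NF = ""; print }')

# OFS left at its default (a single space).
default_ofs=$(printf '%s\n' "$path" | awk -F'/' '{ $NF = ""; print }')

echo "[$with_ofs]"      # [/Home/in/]
echo "[$default_ofs]"   # [ Home in ]
```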

Unable to get second column using awk

I have a file that contains three columns separated by four spaces
1234    567    q
1902    190    r
I'm trying to get the second column by searching for the first column string
i=`grep $str $file | awk -F "[ ]" '{print $2 }'`
j=`grep $str $file | awk -F "[ ]" '{print $3 }'`
echo second_col=$i
echo third_col=$j
I modified the file and used tab and comma as separators but I'm still unable to print the second or third column values for a particular string.
What am I doing wrong?
I'm trying to get the second column by searching for the first column string
If you don't have spaces in your columns then you can just use awk for this:
awk -v str="$str" '$1 ~ str { print $2 }' "$file"
By default, awk splits fields on runs of whitespace.
In case you have spaces in your column value then use:
awk -F ' {4}' -v str="$str" '$1 ~ str { print $2 }' "$file"
' {4}' is a regex that makes exactly four spaces the input field separator.
Reference: Effective AWK Programming
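Some awk implementations do not support interval expressions such as {4}. As a hedged alternative for that case, the sketch below writes the separator as four literal spaces, which awk treats as a regex matching exactly that string. The sample rows are hypothetical, and the match uses == for an exact comparison where the command above uses ~:

```shell
# Hypothetical sample rows: columns are separated by exactly four spaces,
# and a column may itself contain a single space.
data='12 34    5 67    q
19 02    1 90    r'
str='19 02'

second=$(printf '%s\n' "$data" | awk -F '    ' -v str="$str" '$1 == str { print $2 }')
third=$(printf '%s\n' "$data" | awk -F '    ' -v str="$str" '$1 == str { print $3 }')

echo "second_col=$second"   # second_col=1 90
echo "third_col=$third"     # third_col=r
```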
If you have a broken awk, try this solution with sed:
sed -nE 's/^1234\s+(\S+).*/\1/p' "$file"
It finds the pattern at the beginning of the line and prints the next non-space field. If your fields include spaces, this approach is not going to work.

what does this bash script line of code mean

I am new to shell scripting and I found following line of code in a given script.
Could someone explain me with an example what the following line of code means
Path=`echo $line | awk -F '|' '{print $1}'`
echo $line will print the value of the variable $line, the | symbol means that the output of this will be passed (or piped) to another program/command/script. I will not attempt to explain awk here, but what is done above is that the output from the echo $line is taken and processed with it.
The -F option, as per the awk man page, means:
-F fs Use fs for the input field separator
so the string after it will be used to split the input string given to awk into different fields. For example, if your variable $line has the value a|b, it will be split into two fields, a and b. What is to be done with them is specified within the '{}' expression.
Again, what can be done in there is next to infinite, here the only thing that is done is to print the first field which can be accessed with $1, or a in the above example ($2 would be b as can be guessed).
Finally, the output of this whole operation is then stored in the variable Path.
to summarize:
line="a|b"
echo $line | awk -F '|' '{print $1}'
> a
Path=`echo $line | awk -F '|' '{print $1}'`
echo $Path
> a
echo $line | awk -F '|' '{print $1}'
Explanation:
echo -> display a line of text
$line -> parameter expansion: substitutes the value of the variable line
| -> A pipeline is a sequence of one or more commands separated by one of the control operators |
awk -> Invoke awk program
-F '|' -> Field separator as | for the data feed
'{print $1}' -> Print the first field
Example
echo 'a|b|c' | awk -F '|' '{print $1}'
will print a
I think this is just a complicated way to express
echo ${line%%|*}
i.e. write to stdout the part of the content of the variable line which goes up to - but not including - the first vertical bar.
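A small sketch comparing the two forms side by side. The variable names are illustrative, and the second input is an assumed example of a value that contains no vertical bar at all:

```shell
line='a|b|c'
via_awk=$(printf '%s\n' "$line" | awk -F '|' '{ print $1 }')
via_expansion=${line%%|*}

# With no "|" present, both forms return the whole value.
plain='nobar'
plain_awk=$(printf '%s\n' "$plain" | awk -F '|' '{ print $1 }')
plain_expansion=${plain%%|*}

echo "$via_awk $via_expansion"       # a a
echo "$plain_awk $plain_expansion"   # nobar nobar
```

The expansion avoids starting two extra processes, which can matter inside a loop.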
Path=`echo $line | awk -F '|' '{print $1}'`
^     ^                ^      ^
|     |                |      |
|     |                |      print 1st column
|     |                |
|     |                input field separator
|     |
|     echo variable line
|
variable Path
-F'|' - by default awk splits each record (line) into columns on runs of whitespace, but with -F'|' it splits on the pipe character
The above can also be written as
Path=$( awk -F '|' '{ print $1 }' <<< "$line" )
Suppose
$ line="1|2|3"
$ Path=$( awk -F '|' '{ print $1 }' <<< "$line" )
$ echo $Path; # you get first column
1
Same as
$ Path=$( cut -d'|' -f1 <<< "$line" )
$ echo $Path;
1
The default field separator is whitespace; -F '|' changes the separator to |.

How to: In bash print a value from a key/value pair

I need to print only the 900 in this line: auth required pam_faillock.so preauth silent deny=3 unlock_time=604800 fail_interval=900
However, this line will not always be in this order.
I need to find out how to print the value after the =.
I will need to do this for unlock_time and fail_interval
I have been searching all night for something that will work exactly for me and cannot find it. I have been toying around with sed and awk and have not nailed this down yet.
Let's define your string:
s='auth required pam_faillock.so preauth silent deny=3 unlock_time=604800 fail_interval=900'
Using awk:
$ printf %s "$s" | awk -F= '$1=="fail_interval"{print $2}' RS=' '
900
Or:
$ printf %s "$s" | awk -F= '$1=="unlock_time"{print $2}' RS=' '
604800
How it works
Awk divides its input into records. We tell it to use a space as the record separator. Each record is divided into fields. We tell awk to use = as the field separator. In more detail:
printf %s "$s"
This prints the string. printf is safer than echo in cases where the string might begin with -.
-F=
This tells awk to use = as the field separator.
$1=="fail_interval" {print $2}
If the first field is fail_interval, then we tell awk to print the second field.
RS=' '
This tells awk to use a space as the record separator.
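The same idea can be wrapped in a small function so that the key is a parameter. This is only a sketch: get_opt is a hypothetical helper name, and RS is passed with -v here instead of as a trailing assignment. The second test string is an assumed reordering of the line, to show that the order of the options does not matter:

```shell
# get_opt KEY LINE: print the value of KEY=value found in LINE.
get_opt() {
  printf '%s' "$2" | awk -F= -v RS=' ' -v key="$1" '$1 == key { print $2 }'
}

s='auth required pam_faillock.so preauth silent deny=3 unlock_time=604800 fail_interval=900'
reordered='auth required pam_faillock.so fail_interval=900 preauth unlock_time=604800 deny=3'

get_opt unlock_time "$s"              # 604800
get_opt fail_interval "$s"            # 900
get_opt unlock_time "$reordered"      # 604800
```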
You may use sed for this
Command
echo "...stuff.... unlock_time=604800 fail_interval=900" | sed -E '
s/^.*unlock_time=([[:digit:]]*).*fail_interval=([[:digit:]]*).*$/\1 \2/'
Output
604800 900
Notes
The (...) in sed creates capture groups.
[[:digit:]]* (or [0-9]*) matches any number of digits.
\1 and \2 in the replacement stand for the captured groups, in order.
This pattern assumes that unlock_time appears before fail_interval on the line.
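Since the question says the options may appear in any order, a possible variation is to extract one key per sed call. The sketch below uses basic regular expressions with -n and the p flag so that nothing is printed when the key is absent. The variable names and the reordered sample line are assumptions for illustration:

```shell
input='auth required pam_faillock.so fail_interval=900 preauth unlock_time=604800 deny=3'

# One capture group per call, so the order of the options is irrelevant.
unlock=$(printf '%s\n' "$input" | sed -n 's/.*unlock_time=\([0-9]*\).*/\1/p')
interval=$(printf '%s\n' "$input" | sed -n 's/.*fail_interval=\([0-9]*\).*/\1/p')

echo "$unlock $interval"   # 604800 900
```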
Given an input variable:
input="auth required pam_faillock.so preauth silent deny=3 unlock_time=604800 fail_interval=900"
With GNU grep:
$ grep -oP 'fail_interval=\K([0-9]*)' <<< "$input"
900
$ grep -oP 'unlock_time=\K([0-9]*)' <<< "$input"
604800
Try using this.
unlock_time=$(echo "auth required pam_faillock.so preauth silent deny=3 unlock_time=604800 fail_interval=900" | awk -F'unlock_time=' '{print $2}' | awk '{print $1}')
echo "$unlock_time"
fail_interval=$(echo "auth required pam_faillock.so preauth silent deny=3 unlock_time=604800 fail_interval=900" | awk -F'fail_interval=' '{print $2}' | awk '{print $1}')
echo "$fail_interval"

Resources