Unable to get second column using awk - shell

I have a file that contains three columns separated by four spaces:
1234 567 q
1902 190 r
I'm trying to get the second column by searching for the first column string:
i=`grep $str $file | awk -F "[ ]" '{print $2 }'`
j=`grep $str $file | awk -F "[ ]" '{print $3 }'`
echo second_col=$i
echo third_col=$j
I modified the file and used tab and comma as separators but I'm still unable to print the second or third column values for a particular string.
What am I doing wrong?

I'm trying to get the second column by searching for the first column string
If you don't have spaces in your columns then you can just use awk for this:
awk -v str="$str" '$1 ~ str { print $2 }' "$file"
awk automatically splits fields on whitespace.
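For example, with the two sample lines above in "$file" and a search string of 1234 (a hypothetical value for illustration):
$ str=1234
$ awk -v str="$str" '$1 ~ str { print $2 }' "$file"
567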
If you have spaces in your column values, then use:
awk -F ' {4}' -v str="$str" '$1 ~ str { print $2 }' "$file"
' {4}' is a regex that makes exactly four spaces the input field separator.
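As a quick check against the sample file (this assumes the columns really are separated by exactly four spaces, and an awk such as gawk that supports the {4} interval notation in FS):
$ awk -F ' {4}' '{ print $2 }' "$file"
567
190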
Reference: Effective AWK Programming

If you have a broken awk, try this sed solution:
sed -nE 's/^1234\s+(\S+).*/\1/p' "$file"
It finds the pattern at the beginning of the line and prints the next non-space field. If your fields can contain spaces, this approach will not work.
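A minimal run against the sample data (GNU sed, since \s and \S are GNU extensions):
$ printf '1234 567 q\n1902 190 r\n' | sed -nE 's/^1234\s+(\S+).*/\1/p'
567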

AWK -F with print all but last record

/Home/in/test_file.txt
echo /Home/in/test_file.txt | awk -F'/' '{ print $2,$3 }'
Gives the result as:
Home in
But I need /Home/in/ as the result; I have to get everything except test_file.txt.
How to achieve this?
$ echo '/Home/in/test_file.txt' | awk '{sub("/[^/]+$","")} 1'
/Home/in
$ echo '/Home/in/test_file.txt' | awk '{sub("[^/]+$","")} 1'
/Home/in/
$ echo '/Home/in/test_file.txt' | sed 's:/[^/]*$::'
/Home/in
$ echo '/Home/in/test_file.txt' | sed 's:[^/]*$::'
/Home/in/
$ dirname '/Home/in/test_file.txt'
/Home/in
Your attempt awk -F'/' '{ print $2,$3 }' didn't do what you wanted because -F'/' tells awk to split the input into fields at every /, and print $2,$3 then prints the 2nd and 3rd fields separated by a blank char (the default value of OFS). You could do:
$ echo '/Home/in/test_file.txt' | awk 'BEGIN{FS=OFS="/"} { print "",$2,$3,"" }'
/Home/in/
to get the expected output, but it'd be the wrong approach: it removes the field you don't want AND removes the input separators AND then adds new output separators which happen to have the same value as the input separators, rather than simply removing the unwanted field like the other solutions above do.
echo /Home/in/test_file.txt | awk -F'/[^/]*$' '{ print $1 }'
...will print everything up to, but not including, the final slash.
There are several ways to achieve this:
Using dirname:
$ dirname /home/in/test_file.txt
/home/in
Using shell parameter expansion (${var%/*} removes the shortest trailing match of /*, i.e. the final slash and everything after it):
$ var="/home/in/test_file.txt"
$ echo "${var%/*}"
/home/in
Using sed: (see Ed Morton's answer above)
Using AWK:
$ echo "/home/in/test_file.txt" | awk -F'/' '{OFS=FS;$NF=""}1'
/home/in/
Remark: all these work since you can't have a filename with a forward slash (Is it possible to use "/" in a filename?)
Note: all but dirname will fail if you just have a single file_name without a path. dirname foo will return ., while the others will return foo unchanged or produce an empty string.
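To see that edge case (foo being a hypothetical bare file name):
$ dirname foo
.
$ var=foo; echo "${var%/*}"
foo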
awk behaves as it should.
When you define slash / as a separator, the fields in your expression become the content between the separators.
If you need the separators printed as well, you have to print them explicitly, like:
echo /Home/in/test_file.txt | awk -F'/' '{ printf "/%s/%s/",$2,$3 }'
Alternatively, replace the last field with an empty string and let the (built-in) output field separator (OFS) put the slashes back:
echo /Home/in/test_file.txt | awk -F'/' -v OFS='/' '{$NF="";print}'
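which, for the sample path, prints:
/Home/in/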

How do I check for blank fields on a delimited line with sed or awk

I'm parsing source input files using a bash script. I'm generating delimited output in a file. I need a way to check that each field of the delimited output is populated. For example AA,BB,3,4,5,6,7,8 would be good and AA,,3,4,5,6,,8 would be bad. How do I check if there are blank fields on a line using sed/awk or some other tool I can put in a bash script? Thanks in advance!
With bash, using a regex that matches a leading comma, two adjacent commas, or a trailing comma (any of which indicates an empty field):
string='AA,,3,4,5,6,,8'
if [[ $string =~ ^,|,,|,$ ]]; then
echo "error"
else
echo "okay"
fi
Output:
error
You can print the lines with at least one empty field using:
awk -F, '{for (i=1;i<=NF;i++) if ($i=="") {print; next}}'
-F, sets the field delimiter as ,
for (i=1;i<=NF;i++) iterates over the fields
if ($i=="") {print; next} prints the record if the field being tested is empty and goes to the next record
Example:
% cat file.txt
AA,BB,3,4,5,6,7,8
AA,,3,4,5,6,,8
% awk -F, '{for (i=1;i<=NF;i++) if ($i=="") {print; next}}' file.txt
AA,,3,4,5,6,,8
You can test with a regular expression that uses a repeating group to match your requirement:
grep -E '^([^,]+,)*[^,]+$' <<< 'AA,,3,4,5,6,,8'
Test code:
for str in "AA,BB,3,4,5,6,7,8" "AA,,3,4,5,6,,8" ; do
echo "==========="
echo "Testing >>>${str}<<<"
grep -Eq '^([^,]+,)*[^,]+$' <<< "${str}" || echo "String incorrect"
done
You can grep the incorrect lines from a file using:
grep -vE '^([^,]+,)*[^,]+$' inputfile
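For example, with the file.txt shown earlier, only the bad line is returned:
% grep -vE '^([^,]+,)*[^,]+$' file.txt
AA,,3,4,5,6,,8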

Unix shell command for removing each word between space and comma

I have a string variable:
columns="name string,age int,address string,dob timestamp"
I want to remove the data types, i.e. the word coming after the space in each comma-separated item. The output should be:
name,age,address,dob
Assuming bash and that the extglob shell option is available (see the bash pattern matching manual):
$ columns='name string,age int,address string,dob timestamp'
$ echo "${columns// +([^,])/}"
name,age,address,dob
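Note: if the pattern is echoed back unchanged, extglob is probably disabled in your shell; enable it first with:
$ shopt -s extglob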
With sed
$ echo "$columns" | sed 's/ [^,]*//g'
name,age,address,dob
With awk to process fields separated by ,
$ echo "$columns" | awk -F, -v OFS="," '{for(i=1; i<=NF; i++){split($i,n," "); $i=n[1]}} 1'
name,age,address,dob
If every column contains exactly two words separated by a space, you can use space or comma as the delimiter and print only the wanted fields:
$ echo "$columns" | awk -F' |,' -v OFS=',' '{print $1,$3,$5,$7}'
name,age,address,dob

printing results in one line separated by commas in bash

How can I print all the text file locations separated by commas on one line? Can I do this in a for loop?
Here is an example of the files:
/data/home/files/txt_files_1/file1.txt
/data/home/files/txt_files_1/file2.txt
/data/home/files/txt_files_1/file3.txt
/data/home/files/txt_files_2/file1.txt
/data/home/files/txt_files_2/file2.txt
/data/home/files/txt_files_2/file3.txt
The output would look like:
/data/home/files/txt_files_1/file1.txt,/data/home/files/txt_files_1/file2.txt,/data/home/files/txt_files_1/file3.txt \
/data/home/files/txt_files_2/file1.txt,/data/home/files/txt_files_2/file2.txt,/data/home/files/txt_files_2/file3.txt
Thanks
Here is the corrected code:
#!/bin/bash
delim=""
for i in /data/home/files/txt_files_1/file*
do
    printf "%s%s" "$delim" "$i"
    delim=","
done
printf ' \\\n'
delim=""
for i in /data/home/files/txt_files_2/file*
do
    printf "%s%s" "$delim" "$i"
    delim=","
done
printf "\n"
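With the sample directories above, this prints:
/data/home/files/txt_files_1/file1.txt,/data/home/files/txt_files_1/file2.txt,/data/home/files/txt_files_1/file3.txt \
/data/home/files/txt_files_2/file1.txt,/data/home/files/txt_files_2/file2.txt,/data/home/files/txt_files_2/file3.txt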
If the paths are instead in a single file, with the groups separated by blank lines:
awk -v OFS=, -v RS= 'NF { $1 = $1; print }' file
Output:
/data/home/files/txt_files_1/file1.txt,/data/home/files/txt_files_1/file2.txt,/data/home/files/txt_files_1/file3.txt
/data/home/files/txt_files_2/file1.txt,/data/home/files/txt_files_2/file2.txt,/data/home/files/txt_files_2/file3.txt
Here RS= puts awk in paragraph mode: each blank-line-separated block is one record whose fields are the individual paths, and the $1 = $1 assignment forces awk to rebuild the record with OFS. Or, to also print a blank line between the groups:
awk -v OFS=, -v RS= -v ORS='\n\n' 'NF { $1 = $1; print }' file
Output:
/data/home/files/txt_files_1/file1.txt,/data/home/files/txt_files_1/file2.txt,/data/home/files/txt_files_1/file3.txt

/data/home/files/txt_files_2/file1.txt,/data/home/files/txt_files_2/file2.txt,/data/home/files/txt_files_2/file3.txt
You can use printf "%s," "$file" to print several names into a single line. To get the delimiters right, I use this trick:
delim=""
for file in /data/home/files/txt_files_1/file*; do
    printf "%s%s" "$delim" "$file"
    delim=","
done
printf "\n"
<command to generate lines of paths> | tr '\n' ','
example:
echo "/data/home/files/txt_files_1/file1.txt
/data/home/files/txt_files_1/file2.txt
/data/home/files/txt_files_1/file3.txt

/data/home/files/txt_files_2/file1.txt
/data/home/files/txt_files_2/file2.txt" | tr '\n' ','
outputs:
/data/home/files/txt_files_1/file1.txt,/data/home/files/txt_files_1/file2.txt,/data/home/files/txt_files_1/file3.txt,,/data/home/files/txt_files_2/file1.txt,/data/home/files/txt_files_2/file2.txt,
(the blank input line produces the double comma, and echo's final newline becomes the trailing comma)
Assuming your input is in a file called list, this Perl one-liner does the job:
perl -F'\n' -00 -ane 'push @a, join(",", @F) }{ print(join(" \\\n\n", @a), "\n")' list
explanation
-00, in combination with -n, reads the file one block (paragraph) at a time.
The -a switch in combination with -F'\n' auto-splits the text on each newline. The result goes into the array @F.
An array @a is built, each element containing the comma-separated list of the elements in @F.
Once the file has been processed, all the elements of @a are printed, joined together as you specified. The additional "\n" on the end is optional.
Output:
/data/home/files/txt_files_1/file1.txt,/data/home/files/txt_files_1/file2.txt,/data/home/files/txt_files_1/file3.txt \

/data/home/files/txt_files_2/file1.txt,/data/home/files/txt_files_2/file2.txt,/data/home/files/txt_files_2/file3.txt

When using awk to parse a CSV file, why does it ignore empty cells?

I have some scripts which use awk to parse a CSV file. I have noticed that, if a cell is empty, awk simply moves to the next cell. This means, if I ask it to read column 4, but that cell is empty, it prints the data from column 5, e.g.:
echo "1#2#3##5" | awk -F "#*" '{print $4}'
My expected result is that it will print nothing, because column 4 is empty.
Why is awk skipping column 4?
How can I get awk to not ignore empty columns?
The problem is not what you think. awk is not ignoring empty cells; it is parsing that line as 4 fields instead of 5.
[me@home]$ echo "1#2#3##5" | awk -F "#*" '{print NF}'
4
That's because you're using #* as your field separator. This regex matches any run of consecutive # characters, so #, ##, ###, ... each act as a single separator; strictly speaking #* can also match the empty string, which is a known dark corner of awk field splitting.
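To see exactly which fields result here (gawk shown; other awks may differ precisely because the FS can match the empty string):
[me@home]$ echo "1#2#3##5" | awk -F "#*" '{ for (i=1; i<=NF; i++) print i": "$i }'
1: 1
2: 2
3: 3
4: 5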
Try using -F "#" instead.
[me@home]$ echo "1#2#3##5" | awk -F "#" '{print NF}'
5
[me@home]$ echo "1#2#3##5" | awk -F "#" '{print $4}'

[me@home]$ echo "1#2#3##5" | awk -F "#" '{print $5}'
5
