behavior of awk in read line - bash

$ cat file
11 asasaw121
12 saasks122
13 sasjaks22
$ cat no
while read line
do
var=$(awk '{print $1}' $line)
echo $var
done<file
$ cat yes
while read line
do
var=$(echo $line | awk '{print $1}')
echo $var
done<file
$ sh no
awk: can't open file 11
source line number 1
awk: can't open file 12
source line number 1
awk: can't open file 13
source line number 1
$ sh yes
11
12
13
Why doesn't the first one work? What does awk expect to find in $1 in it? I think understanding this will help me avoid numerous scripting problems.

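awk treats every argument after the program text as the name of an input file; it reads standard input only when no file is given.
In the following, $line is a string, not a file:
var=$(awk '{print $1}' $line)
You could instead say (note the double quotes around the variable; <<< is a bash here-string):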
var=$(awk '{print $1}' <<<"$line")

Why doesn't the first one work?
Because of this line:
var=$(awk '{print $1}' $line)
Which assumes $line is a file.
You can make it:
var=$(echo "$line" | awk '{print $1}')
OR
var=$(awk '{print $1}' <<< "$line")

awk '{print $1}' $line
^^ awk expects to see a file path or list of file paths here
what it is getting from you is the actual file line
What you want to do is pipe the line into awk as you do in your second example.
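A minimal sketch of the difference, using one line from the question's file (the variable names first and status are illustrative, not from the original post). With no file operand awk reads standard input; with an unquoted $line the shell splits the value into two words and awk tries to open each of them as a file.

```shell
line='11 asasaw121'

# No file operand: awk reads the line from standard input.
first=$(printf '%s\n' "$line" | awk '{print $1}')
echo "$first"

# Unquoted $line expands to two operands, "11" and "asasaw121".
# awk tries to open each one as a file and fails (assuming no such
# files exist in the current directory).
if ! awk '{print $1}' $line 2>/dev/null; then
    status="awk could not open the operands as files"
fi
echo "$status"
```

The printf pipeline is the POSIX equivalent of the here-string form and should also work when the script is run with sh.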

You got the answers to your specific questions but I'm not sure it's clear that you would never actually do any of the above.
To print the first field from a file you'd either do this:
while read -r first rest
do
printf "%s\n" "$first"
done < file
or this:
awk '{print $1}' file
or this:
cut -d ' ' -f1 <file
The shell loop would NOT be recommended.

Related

How to grab fields in inverted commas

I have a text file which contains the following lines:
"user","password_last_changed","expires_in"
"jeffrey","2021-09-21 12:54:26","90 days"
"root","2021-09-21 11:06:57","0 days"
How can I grab two fields jeffrey and 90 days from inverted commas and save in a variable.
If awk is an option, you could save an array and then save the elements as individual variables.
$ IFS="\"" read -ra var <<< $(awk -F, '/jeffrey/{ print $1, $NF }' input_file)
$ var2="${var[3]}"
$ echo "$var2"
90 days
$ var1="${var[1]}"
$ echo "$var1"
jeffrey
while read -r line; do # read in line by line
name=$(echo "$line" | awk -F, '{print $1}' | sed 's/"//g') # grab the first column and strip the quotes
expire=$(echo "$line" | awk -F, '{print $3}' | sed 's/"//g') # grab the third column and strip the quotes
echo "$name" "$expire" # do your business
done < yourfile.txt
IFS=","
arr=( $(cat txt | head -2 | tail -1 | cut -d, -f 1,3 | tr -d '"') )
echo "${arr[0]}"
echo "${arr[1]}"
The result is stored in an array; you can access its elements by index.
Maybe the method below, using sed and awk, will help (tr strips the surrounding double quotes so the output matches what is shown):
#!/bin/sh
username=$(sed -n '/jeffrey/p' demo.txt | awk -F',' '{print $1}' | tr -d '"')
echo "$username"
expires_in=$(sed -n '/jeffrey/p' demo.txt | awk -F',' '{print $3}' | tr -d '"')
echo "$expires_in"
Output :
jeffrey
90 days
Note:
This method works only if the username appears once in the file.
As far as I know, usernames are not duplicated.
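Another option, sketched here on the assumption that no field contains an embedded double quote: use the double quote itself as the awk field separator, so the quoted values land in the even-numbered fields and nothing has to be stripped afterwards. The sample data is taken from the question; the temporary file and the variable names are illustrative.

```shell
# Recreate the sample data in a temporary file so the sketch is self-contained.
input_file=$(mktemp)
cat > "$input_file" <<'EOF'
"user","password_last_changed","expires_in"
"jeffrey","2021-09-21 12:54:26","90 days"
"root","2021-09-21 11:06:57","0 days"
EOF

# With '"' as the separator, $2 is the first quoted value and $6 the third.
# Comparing $2 exactly avoids matching "jeffrey" elsewhere in a line.
name=$(awk -F'"' '$2 == "jeffrey" {print $2}' "$input_file")
expires=$(awk -F'"' '$2 == "jeffrey" {print $6}' "$input_file")

echo "$name"
echo "$expires"
rm -f "$input_file"
```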

shortening headers using awk

I have headers like
>XX|6226515|new|xx_000000.1| XXXXXXX
in a text file which I am trying shorten to
>XX6226515
using awk. I tried
awk -F"|" '/>/{$0=">"$1}1' input.txt > output.txt
but it yields the following instead
>XX|6226515|new|
awk -F"|" '{print $1$2}' input.txt > output.txt
Output:
>XX6226515
sed solution:
sed -e 's/|//' -e 's/|.*//'
The first substitution removes the first vertical bar, the second one removes the second one and anything after it.
$ awk -F'|' '$0=$1$2' <<< ">XX|6226515|new|xx_000000.1| XXXXXXX"
>XX6226515
cut can also do it (--output-delimiter is a GNU extension):
cut -d"|" --output-delimiter="" -f-2
See output:
$ echo ">XX|6226515|new|xx_000000.1| XXXXXXX" | cut -d"|" --output-delimiter="" -f-2
>XX6226515
-d"|" sets | as field delimiter.
--output-delimiter="" indicates that the output delimiter has to be empty.
-f-2 indicates that it has to print all fields up to the 2nd (inclusive).
Also with just bash:
while IFS="|" read a b _
do
echo "$a$b"
done <<< ">XX|6226515|new|xx_000000.1| XXXXXXX"
See output:
$ while IFS="|" read a b _; do echo "$a$b"; done <<< ">XX|6226515|new|xx_000000.1| XXXXXXX"
>XX6226515
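If the file also contains lines that are not headers (sequence data, for instance, which is an assumption about the input that the question does not state), a sketch that rewrites only the lines starting with > and passes everything else through unchanged could look like this. The second input line and the variable names are made up for illustration.

```shell
input=$(mktemp)
cat > "$input" <<'EOF'
>XX|6226515|new|xx_000000.1| XXXXXXX
ACGTACGT
EOF

# Header lines: print fields 1 and 2 joined together, then skip to the next line.
# All other lines: print unchanged.
result=$(awk -F'|' '/^>/ {print $1 $2; next} {print}' "$input")
echo "$result"
rm -f "$input"
```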

get the file name from the path

I have a file file.txt having the following structure:-
./a/b/c/sdsd.c
./sdf/sdf/wer/saf/poi.c
./asd/wer/asdf/kljl.c
./wer/asdfo/wer/asf/asdf/hj.c
How can I get only the c file names from the path.
i.e., my output will be
sdsd.c
poi.c
kljl.c
hj.c
You can do this simply using awk.
Set the field separator FS="/" and $NF will print the last field of every record.
awk 'BEGIN{FS="/"} {print $NF}' file.txt
or
awk -F/ '{print $NF}' file.txt
Or you can do it with cut and the Unix command rev, like this:
rev file.txt | cut -d '/' -f1 | rev
You can use basename command:
basename /a/b/c/sdsd.c
will give you sdsd.c
For a list of files in file.txt, this will do:
while IFS= read -r line; do basename "$line"; done < file.txt
Using sed:
$ sed 's|.*/||g' file
sdsd.c
poi.c
kljl.c
hj.c
The most simple one ($NF is the last column of current line):
awk -F/ '{print $NF}' file.txt
or using bash & parameter expansion:
while read file; do echo "${file##*/}"; done < file.txt
or bash with basename :
while read file; do basename "$file"; done < file.txt
OUTPUT
sdsd.c
poi.c
kljl.c
hj.c
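For reference, a small sketch of how the parameter expansion behaves on one of the sample paths. ${path##*/} removes the longest prefix matching */, and its counterpart ${path%/*} removes the shortest suffix matching /*, which leaves the directory part. The variable names are illustrative.

```shell
path=./wer/asdfo/wer/asf/asdf/hj.c

file_name=${path##*/}   # strip everything up to and including the last slash
dir_name=${path%/*}     # strip the last slash and everything after it

echo "$file_name"
echo "$dir_name"
```

Both expansions are POSIX, so they do not require bash and do not start an external process per line the way basename does.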
Perl solution:
perl -F/ -ane 'print $F[-1]' your_file
Also you can use sed:
sed 's/.*[/]//g' your_file

Setting multiple field to awk variables at once

I am trying to set several variables from several awk fields at once.
Right now I can only set the variables one by one.
for line in `cat file.txt`;do
var1=`echo $line | awk -F, '{print $1}'`
var2=`echo $line | awk -F, '{print $2}'`
var3=`echo $line | awk -F, '{print $3}'`
#Some complex code....
done
I think this is costly because it parses the shell variable several times. Is there a special syntax to set the variables at once? I know that awk has BEGIN and END blocks, but I am trying to avoid them so that I don't end up with nested awk.
I plan to place another loop and awk code in the #Some complex code.... part.
for line in `cat file.txt`;do
var1=`echo $line | awk -F, '{print $1}'`
var2=`echo $line | awk -F, '{print $2}'`
var3=`echo $line | awk -F, '{print $3}'`
for line2 in `cat file_old.txt`;do
vara=`echo $line2 | awk -F, '{print $1}'`
varb=`echo $line2 | awk -F, '{print $2}'`
# Do comparison of $var1,var2 and $vara,$varb , then do something with either
done
done
You can use the IFS internal field separator to use a comma (instead of whitespace) and do the assignments in a while loop:
SAVEIFS=$IFS;
IFS=',';
while read line; do
set -- $line;
var1=$1;
var2=$2;
var3=$3;
...
done < file.txt
IFS=$SAVEIFS;
This saves a copy of your current IFS, changes it to a comma, and then iterates over each line in your file. The line set -- $line; splits the line on commas and assigns the pieces to the positional parameters ($1, $2, etc.). You can either use these parameters directly or assign them to other (more meaningful) variable names.
Alternatively, you could use IFS with the answer provided by William:
IFS=',';
while read var1 var2 var3; do
...
done < file.txt
They are functionally identical and it just comes down to whether or not you want to explicitly set var1=$1 or have it defined in the while-loop's head.
Why are you using awk at all?
while IFS=, read var1 var2 var3; do
...
done < file.txt
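A sketch of how this extends to the nested comparison from the question, with no awk at all. The file contents, the temporary files, and the matches variable are made up for illustration; only the variable names var1..var3, vara and varb follow the question. Prefixing IFS=, to read limits the change to that one command, so IFS does not need to be saved and restored.

```shell
# Illustrative stand-ins for file.txt and file_old.txt.
new=$(mktemp)
old=$(mktemp)
printf '%s\n' 'a,1,x' 'b,2,y' > "$new"
printf '%s\n' 'b,2' 'c,3' > "$old"

matches=""
while IFS=, read -r var1 var2 var3; do
    # The inner loop has its own redirection, so it reads the old file
    # from the start for every line of the new file.
    while IFS=, read -r vara varb; do
        if [ "$var1" = "$vara" ] && [ "$var2" = "$varb" ]; then
            matches="$matches$var1,$var2,$var3"
        fi
    done < "$old"
done < "$new"

echo "$matches"
rm -f "$new" "$old"
```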
#!/bin/bash
FILE="/tmp/values.txt"
function parse_csv() {
local lines=$lines;
> $FILE
OLDIFS=$IFS;
IFS=","
i=0
for val in ${lines}
do
i=$((++i))
eval var${i}="${val}"
done
IFS=$OLDIFS;
for ((j=1;j<=i;++j))
do
name="var${j}"
echo ${!name} >> $FILE
done
}
for lines in `cat file_old.txt`;do
parse_csv;
done
The problem you described has only 3 values; is there a chance that the number of values may vary and be 4, 5, or undefined?
If so, the above parses the CSV line by line and writes each value on its own line to a file called /tmp/values.txt.
Feel free to modify it to match your requirements; it is far more dynamic than defining 3 values.

Why I can't split the string?

I want to read a file by shell script, and process it line by line. I would like to extract 2 fields from each line. Here is my code:
#!/bin/bsh
mlist=`ls *.log.2011-11-1* | grep -v error`
for log in $mlist
do
while read line
do
echo ${line} | awk -F"/" '{print $4}' #This produces nothing
echo ${line} #This works and prints each line
done < $log | grep "java.lang.Exception"
done
This is a sample line from the input file:
<ERROR> LimitFilter.WebContainer : 4 11-14-2011 21:56:55 - java.lang.Exception: File - /AAA/BBB/CCC/DDDDDDDD.PDF does not exist
If I don't use bsh, I can use ksh, and the result is the same. We have no bash here.
It's because you are passing the output of your while loop through grep "java.lang.Exception".
The output of echo $line | awk -F"/" '{print $4}' is CCC. When this is piped through grep, nothing is printed because CCC does not match the search pattern.
Try removing | grep "java.lang.Exception" and you will see the output of your loop come out correctly.
An alternative approach to take might be to remove the while loop and instead just use:
grep "java.lang.Exception" $log | awk -F"/" '{print $4}'
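The filtering and the field extraction can also be done in a single awk call, since awk can match a pattern before running an action. A minimal sketch, using the sample line from the question plus one made-up line that should be filtered out (the temporary file and the field variable are illustrative):

```shell
log=$(mktemp)
cat > "$log" <<'EOF'
<ERROR> LimitFilter.WebContainer : 4 11-14-2011 21:56:55 - java.lang.Exception: File - /AAA/BBB/CCC/DDDDDDDD.PDF does not exist
<INFO> hypothetical unrelated line /x/y/z/w
EOF

# Only lines matching the pattern reach the action; $4 is the fourth
# slash-separated field of those lines.
field=$(awk -F/ '/java\.lang\.Exception/ {print $4}' "$log")
echo "$field"
rm -f "$log"
```

mktemp is used here only to make the sketch self-contained; in the original script the awk command would be run on "$log" inside the for loop.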
