Passing two awk columns into a while read command - bash

SampleFile:
Two Words,Extra
My code:
cat SampleFile | awk -F "," '{print $1" "$2}' | while read var1 var2;
do
echo $var1
done
This will print out only Two and var2 will take Words. Is there a way so that I can pass Two Words into var1?

You don't have to use awk for this. Bash has a built-in variable to determine where words are split:
while IFS=, read -r var1 var2; do
echo "$var1"
done < SampleFile
IFS is set to a comma (,), so word splitting takes place at commas.
Instead of piping to the while loop, I use redirection, which has the advantage of not spawning a subshell.
A remark: you don't need cat in this context. Awk can take a file name as an argument, so
cat SampleFile | awk -F "," '{print $1" "$2}'
becomes
awk -F "," '{print $1, $2}' SampleFile
Also, when using print, you don't need to explicitly introduce spaces: if you comma-separate your fields, awk will replace the comma by the value of the OFS (output field separator) variable, which defaults to a space.
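A minimal sketch of both points, assuming a throwaway file created with mktemp stands in for SampleFile (the names sample, first and joined are illustrative and not part of the question):

```shell
# Create a throwaway copy of the sample data (the mktemp path is an assumption).
sample=$(mktemp)
printf 'Two Words,Extra\n' > "$sample"

# IFS=, applies only to this read, so the first field keeps its space.
# The loop is fed by redirection, so it runs in the current shell and
# the variable "first" is still set after the loop ends.
while IFS=, read -r var1 var2; do
    first=$var1
    echo "$var1"
done < "$sample"

# A comma inside print inserts OFS, which defaults to a single space.
joined=$(awk -F, '{print $1, $2}' "$sample")
echo "$joined"

rm -f "$sample"
```

The loop prints Two Words, and the awk command prints Two Words Extra.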

...| while read -r var1; do echo "$var1"; done


Assign bash value from value in specific line

I have a file that looks like:
>ref_frame=1
TPGIRYQYNVLPQGWKGSPAIFQSSMTKILEPFRKQNPDIVIYQYMDDLYVGSD
>ref_frame=2
HQGLDISTMCFHRDGKDHQQYSKVA*QKS*SLLENKIQT*LSINTWMICM*DLT
>ref_frame=3
TRD*ISVQCASTGMERITSNIPK*HDKNLRAF*KTKSRHSYLSIHG*FVCRI*
>test_3_2960_3_frame=1
TPGIRYQYNVLPQGWKGSPAIFQSSMTKILEPSRKQNPDIVIYQYMDDLYVGSD
I want to assign a bash variable so that echo $variable gives test_3_2960
The line/row that I want to assign the variable to will always be line 7. How can I accomplish this using bash?
so far I have:
variable=`cat file.txt | awk 'NR==7'`
echo "$variable"
>test_3_2960_3_frame=1
Using sed
$ variable=$(sed -En '7s/>(([^_]*_){2}[0-9]+).*/\1/p' input_file)
$ echo "$variable"
test_3_2960
No pipes needed here...
$: variable=$(awk -F'[>_]' 'NR==7{ OFS="_"; print $2, $3, $4; exit; }' file)
$: echo $variable
test_3_2960
-F is using either > or _ as field separators, so your data starts in field 2.
OFS="_" sets the Output Field Separator, but you could also just use "_" instead of commas.
exit keeps it from wasting time bothering to read beyond line 7.
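To see how the fields fall out, here is a small self-contained sketch. It assumes the question's sample data is written to a temporary file (the name input is illustrative), and it uses the literal "_" variant mentioned above instead of setting OFS:

```shell
# Recreate the question's sample input in a temporary file (path is an assumption).
input=$(mktemp)
cat > "$input" <<'EOF'
>ref_frame=1
TPGIRYQYNVLPQGWKGSPAIFQSSMTKILEPFRKQNPDIVIYQYMDDLYVGSD
>ref_frame=2
HQGLDISTMCFHRDGKDHQQYSKVA*QKS*SLLENKIQT*LSINTWMICM*DLT
>ref_frame=3
TRD*ISVQCASTGMERITSNIPK*HDKNLRAF*KTKSRHSYLSIHG*FVCRI*
>test_3_2960_3_frame=1
TPGIRYQYNVLPQGWKGSPAIFQSSMTKILEPSRKQNPDIVIYQYMDDLYVGSD
EOF

# On line 7 the fields are:
#   $1=""  $2="test"  $3="3"  $4="2960"  $5="3"  $6="frame=1"
# $1 is empty because the line starts with the separator ">".
variable=$(awk -F'[>_]' 'NR==7 { print $2 "_" $3 "_" $4; exit }' "$input")
echo "$variable"

rm -f "$input"
```

This prints test_3_2960.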
If you wish to continue with awk
$ variable=$(awk 'NR==7' file.txt | awk -F "[>_]" '{print $2"_"$3"_"$4}')
$ echo $variable
test_3_2960

Removing newlines in a txt file

I have a txt file in a format like this:
test1
test2
test3
How can I bring it into a format like this using bash?
test1,test2,test3
Assuming that “using Bash” means “without any external processes”:
if IFS= read -r line; then
printf '%s' "$line"
while IFS= read -r line; do
printf ',%s' "$line"
done
echo
fi
TL;DR:
paste -sd "," "export.txt"
Another pure bash implementation that avoids explicit loops:
#!/usr/bin/env bash
file2csv() {
local -a lines
readarray -t lines <"$1"
local IFS=,
printf "%s\n" "${lines[*]}"
}
file2csv input.txt
You can use awk. If the file name is test.txt then
awk '{print $1}' ORS=',' test.txt | awk '{print substr($1, 1, length($1)-1)}'
The first awk command joins the three lines with commas (test1,test2,test3,).
The second awk command just deletes the last comma from the string.
Use tool 'tr' (translate) and sed to remove last comma:
tr '\n' , < "$source_file" | sed 's/,$//'
If you want to save the output into a variable:
var="$( tr '\n' , < "$source_file" | sed 's/,$//' )"
Using sed:
$ sed ':a;N;$!ba;s/\n/,/g' file
Output:
test1,test2,test3
I think this is where I originally picked it up.
If you don't want a terminating newline:
$ awk '{printf "%s%s", sep, $0; sep=","}' file
test1,test2,test3
or if you do:
awk '{printf "%s%s", sep, $0; sep=","} END{print ""}' file
test1,test2,test3
Another loopless pure Bash solution:
contents=$(< input.txt)
printf '%s\n' "${contents//$'\n'/,}"
contents=$(< input.txt) is equivalent to contents=$(cat input.txt). It puts the contents of the input.txt file (with trailing newlines automatically removed) into the variable contents.
"${contents//$'\n'/,}" replaces all occurrences of the newline character ($'\n') in contents with the comma character. See Parameter expansion [Bash Hackers Wiki].
See the accepted, and excellent, answer to Why is printf better than echo? for an explanation of why printf '%s\n' is used instead of echo.
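The trailing-newline behaviour described above can be checked with a short sketch. It assumes bash and a temporary file in place of input.txt; the extra blank line at the end of the sample data is added on purpose:

```shell
# Temporary stand-in for input.txt (the path and contents are assumptions).
input=$(mktemp)
printf 'test1\ntest2\ntest3\n\n' > "$input"   # note the extra blank line at the end

# Command substitution drops every trailing newline, so no trailing comma appears.
contents=$(< "$input")

# Replace each remaining newline with a comma.
joined=${contents//$'\n'/,}
printf '%s\n' "$joined"

rm -f "$input"
```

This prints test1,test2,test3. Blank lines in the middle of the file would still produce empty fields (consecutive commas).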

Save output of awk to two different variables

Okay. I am kind of lost and google search isn't helping me much.
I have a command like:
filesize_filename=$(echo $line | awk ' ''{print $5":"$9}')
echo $filesize_filename
1024:/home/test
Now this one saves the two returns or awk'ed items into one variable. I'd like to achieve something like this:
filesize,filename=$(echo $line | awk ' ''{print $5":"$9}')
So I can access them individually like
echo $filesize
1024
echo $filename
/home/test
How do I achieve this?
Thanks.
Populate a shell array with the awk output and then do whatever you like with it:
$ fileInfo=( $(echo "foo 1024 bar /home/test" | awk '{print $2, $4}') )
$ echo "${fileInfo[0]}"
1024
$ echo "${fileInfo[1]}"
/home/test
If the file name can contain spaces then you'll have to adjust the FS and OFS in awk and the IFS in shell appropriately.
You may not need awk at all of course:
$ line="foo 1024 bar /home/test"
$ fileInfo=( $line )
$ echo "${fileInfo[1]}"
1024
$ echo "${fileInfo[3]}"
/home/test
but beware of globbing chars in $line matching on local file names in that last case. I expect there's a more robust way to populate a shell array from a shell variable but off the top of my head I can't think of it.
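One possible way to sidestep the globbing concern, sketched here as a suggestion rather than as the answerer's method, is bash's read -a, which splits on IFS but does not perform pathname expansion. The trailing * in the sample line is added only to demonstrate that:

```shell
# Sample line with a glob character appended for demonstration (an assumption).
line="foo 1024 bar /home/test *"

# read -a splits on IFS like an unquoted expansion would, but the literal *
# stays a literal * instead of matching file names in the current directory.
read -r -a fileInfo <<< "$line"

echo "${fileInfo[1]}"
echo "${fileInfo[3]}"
echo "${fileInfo[4]}"
```

This prints 1024, /home/test and * on separate lines.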
Use bash's read for that:
read -r size name <<< "$(awk '{print $5, $9}' <<< "$line")"
# Now you can output them separately
echo "$size"
echo "$name"
You can use process substitution on awk's output:
read filesize filename < <(echo "$line" | awk '{print $5,$9}')
You can totally avoid awk by doing:
read _ _ _ _ filesize _ _ _ filename _ <<< "$line"
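A side effect worth knowing: the last variable given to read receives the rest of the line, so dropping the final _ keeps a file name with spaces in one piece. The sketch below assumes $line looks like one line of ls -l output, which the question does not state explicitly:

```shell
# Hypothetical ls -l style line; the file name contains a space.
line='-rw-r--r-- 1 user group 1024 Jan 1 12:00 /home/test dir/file'

# Fields 1-4 and 6-8 are discarded into _. With no trailing _, the last
# variable (filename) takes everything that remains on the line.
read -r _ _ _ _ filesize _ _ _ filename <<< "$line"

echo "$filesize"
echo "$filename"
```

This prints 1024 and then /home/test dir/file.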

Awk: Drop last record separator in one-liner

I have a simple command (part of a bash script) that I'm piping through awk but can't seem to suppress the final record separator without then piping to sed. (Yes, I have many choices and mine is sed.) Is there a simpler way without needing the last pipe?
dolls=$(egrep -o 'alpha|echo|november|sierra|victor|whiskey' /etc/passwd \
| uniq | awk '{IRS="\n"; ORS=","; print}'| sed s/,$//);
Without the sed, this produces output like echo,sierra,victor, and I'm just trying to drop the last comma.
You don't need awk, try:
egrep -o ....uniq|paste -d, -s
Here is another example:
kent$ echo "a
b
c"|paste -d, -s
a,b,c
Also, I think your chained command could be simplified. awk could do everything in a one-liner.
Instead of egrep, uniq, awk, sed etc, all this can be done in one single awk command:
awk -F":" '!($1 in a){l=l $1 ","; a[$1]} END{sub(/,$/, "", l); print l}' /etc/passwd
Here is a small and quite straightforward one-liner in awk that suppresses the final record separator:
echo -e "alpha\necho\nnovember" | awk 'y {print s} {s=$0;y=1} END {ORS=""; print s}' ORS=","
Gives:
alpha,echo,november
So, your example becomes:
dolls=$(egrep -o 'alpha|echo|november|sierra|victor|whiskey' /etc/passwd | uniq | awk 'y {print s} {s=$0;y=1} END {ORS=""; print s}' ORS=",")
The benefit of using awk over paste or tr is that this also works with a multi-character ORS.
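As an illustration of the multi-character case, here is the same one-liner with a two-character separator (comma plus space). The input words come from printf instead of /etc/passwd, and the variable name joined is illustrative:

```shell
# Each record is printed one step late, so the last record can be printed
# in END with an empty ORS and therefore without a trailing separator.
joined=$(printf 'alpha\necho\nnovember\n' |
    awk 'y {print s} {s=$0; y=1} END {ORS=""; print s}' ORS=", ")
printf '%s\n' "$joined"
```

This prints alpha, echo, november.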
Since you tagged it bash here is one way of doing it:
#!/bin/bash
# Read the /etc/passwd file in to an array called names
while IFS=':' read -r name _; do
names+=("$name");
done < /etc/passwd
# Assign the content of the array to a variable
dolls=$( IFS=, ; echo "${names[*]}")
# Display the value of the variable
echo "$dolls"
echo "a
b
c" |
mawk 'NF-= _==$NF' FS='\n' OFS=, RS=
a,b,c

Setting multiple field to awk variables at once

I am trying to set several variables from awk fields at once.
Right now I can only set the variables one by one.
for line in `cat file.txt`;do
var1=`echo $line | awk -F, '{print $1}'`
var2=`echo $line | awk -F, '{print $2}'`
var3=`echo $line | awk -F, '{print $3}'`
#Some complex code....
done
I think this is costly because it parses the shell variable several times. Is there a special syntax to set the variables at once? I know that awk has BEGIN and END blocks, but I am trying to avoid them so that I don't end up with nested awk.
I plan to place another loop and awk code in the #Some complex code.... part.
for line in `cat file.txt`;do
var1=`echo $line | awk -F, '{print $1}'`
var2=`echo $line | awk -F, '{print $2}'`
var3=`echo $line | awk -F, '{print $3}'`
for line2 in `cat file_old.txt`;do
vara=`echo $line2 | awk -F, '{print $1}'`
varb=`echo $line2 | awk -F, '{print $2}'`
# Do comparison of $var1,var2 and $vara,$varb , then do something with either
done
done
You can use the IFS internal field separator to use a comma (instead of whitespace) and do the assignments in a while loop:
SAVEIFS=$IFS;
IFS=',';
while read line; do
set -- $line;
var1=$1;
var2=$2;
var3=$3;
...
done < file.txt
IFS=$SAVEIFS;
This will save a copy of your current IFS, change it to a , character, and then iterate over each line in your file. The line set -- $line; splits the line at each comma and assigns the pieces to the positional parameters ($1, $2, etc.). You can either use these parameters directly, or assign them to other (more meaningful) variable names.
Alternatively, you could use IFS with the answer provided by William:
IFS=',';
while read var1 var2 var3; do
...
done < file.txt
They are functionally identical and it just comes down to whether or not you want to explicitly set var1=$1 or have it defined in the while-loop's head.
Why are you using awk at all?
while IFS=, read var1 var2 var3; do
...
done < file.txt
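For the nested comparison described in the question, the same idea can be applied to both loops. This is a rough sketch only: the two temporary files stand in for file.txt and file_old.txt, their contents are made up, and the variable matches is illustrative:

```shell
# Throwaway stand-ins for file.txt and file_old.txt (contents are assumptions).
new=$(mktemp); old=$(mktemp)
printf 'a,1,x\nb,2,y\n' > "$new"
printf 'b,2\nc,3\n' > "$old"

matches=""
while IFS=, read -r var1 var2 var3; do
    # The inner loop re-reads the old file for every line of the new file.
    while IFS=, read -r vara varb; do
        if [ "$var1" = "$vara" ] && [ "$var2" = "$varb" ]; then
            matches="$matches$var1,$var2 "
        fi
    done < "$old"
done < "$new"

echo "matches: $matches"
rm -f "$new" "$old"
```

Because IFS=, is given as a prefix to read, it affects only that command, so there is no need to save and restore the global IFS. No awk process is spawned per line.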
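#!/bin/bash
FILE="/tmp/values.txt"
function parse_csv() {
local lines=$1
local OLDIFS=$IFS
local i=0 j name val
IFS=","
for val in ${lines}
do
i=$((i+1))
# printf -v assigns var1, var2, ... without needing eval
printf -v "var${i}" '%s' "$val"
done
IFS=$OLDIFS
for ((j=1;j<=i;++j))
do
name="var${j}"
echo "${!name}" >> "$FILE"
done
}
# Truncate the output file once, before the loop, so every line's values are kept
> "$FILE"
for lines in `cat file_old.txt`;do
parse_csv "$lines"
done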
The problem you described has only 3 values. Is there a chance that the number of values may differ (4, 5, or unknown)?
If so, the script above parses the CSV line by line and writes each value on its own line in a file called /tmp/values.txt.
Feel free to modify it to match your requirements; it is far more dynamic than defining 3 values.
