Using AWK to read line from file and create a variable - bash

I have a text file with a list of filenames. I would like to create a variable from a specific line number using AWK. I get the correct output using:
awk "NR==\$Line" /myPath/fileList.txt
I want to assign this output to a variable and from documentation I found I expected the following to work:
INFILE=$(awk "NR==\$Line" /myPath/fileList.txt)
or
INFILE=`awk "NR==\$Line" /myPath/fileList.txt`
However,
echo "\$INFILE"
is blank. I am new to bash scripting and would appreciate any pointers.

The output of the AWK command is assigned to the variable. To see the contents of the variable, do this:
echo "$INFILE"
You should use single quotes for your AWK command so you don't have to escape the literal dollar sign (the literal string should be quoted, see below if you want to substitute a shell variable instead):
awk 'NR == "$Line"' /myPath/fileList.txt
The $() form is much preferred over the backtick form (I don't understand why you have the backticks escaped, by the way). Also, you should habitually use lowercase or mixed case variable names to avoid name collision with shell or environment variables.
infile=$(awk 'NR == "$Line"' /myPath/fileList.txt)
echo "$infile"
If your intention is that the value of a variable named $Line should be substituted rather than the literal string "$Line" being used, then you should use AWK's -v variable passing feature:
infile=$(awk -v "line=$Line" 'NR == line' /myPath/fileList.txt)
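For anyone who wants to try the -v approach without touching real files, here is a minimal sketch. The temporary file, its three entries, and the value of Line are made-up stand-ins for /myPath/fileList.txt and your own line number; the awk variable name n is arbitrary.

```shell
# Hypothetical stand-in for /myPath/fileList.txt
list_file=$(mktemp)
printf '%s\n' alpha.txt beta.txt gamma.txt > "$list_file"

Line=2   # assumed line number, normally set elsewhere in the script

# -v copies the shell value into the awk variable "n";
# "exit" stops awk from reading the rest of the file after the match
infile=$(awk -v n="$Line" 'NR == n { print; exit }' "$list_file")
rm -f "$list_file"

printf '%s\n' "$infile"   # prints: beta.txt
```

The exit is optional, it only saves reading the remainder of a long list.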

Don't mask the dollar sign.
wrong:
echo "\$INFILE"
right:
echo $INFILE
echo ${INFILE}
echo "$INFILE"
echo "${INFILE}"
The ${x} construct is useful if you want to glue text together.
echo $INFILE0
will look for a variable named INFILE0. If $INFILE is 5, it will not produce "50".
echo ${INFILE}0
This will produce 50, if INFILE is 5.
The double quotes are useful if your variable contains whitespace or more or less unpredictable text.
If your rownumber is a parameter:
#!/bin/bash
Line=$1
INFILE=$(awk "NR==$Line" ./demo.txt)
echo "$INFILE"
If INFILE contains multiple spaces or tabs, echo $INFILE would condense them to single spaces, while "$INFILE" preserves them.
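A small sketch of the quoting and brace points above, using a made-up sample value (the variable names unquoted, quoted, and glued are only for illustration):

```shell
INFILE="one   two    three"   # sample value with runs of spaces

unquoted=$(echo $INFILE)      # word splitting collapses the runs of spaces
quoted=$(echo "$INFILE")      # quoting keeps the value intact
glued="${INFILE}0"            # braces mark where the variable name ends

printf '[%s]\n' "$unquoted" "$quoted" "$glued"
```

The first line of output shows single spaces only, the other two keep the original spacing.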

Related

echo inside a for loop lists files matching the output pattern

I have a problem with the following for loop:
X="*back* OLD"
for P in $X
do
echo "-$P"
done
I need it to output just:
-*back*
-OLD
However, it lists all files in the current directory matching the *back* pattern. For example it gives the following:
-backup.bkp
-backup_new.bkp
-backup_X
-OLD
How to force it to output the exact pattern?
Use an array, as unquoted parameter expansions are still subject to globbing.
X=( "*back*" OLD )
for P in "${X[@]}"; do
printf '%s\n' "-$P"
done
(Use printf, as echo could try to interpret an argument as an option, for example, if you had -n in the value of X.)
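To see why printf is the safer choice, here is a quick sketch with a value that happens to look like an echo option. The variable names are made up for the demonstration.

```shell
value="-n"                              # a value that looks like an echo option

with_echo=$(echo "$value")              # bash's echo consumes -n and prints nothing
with_printf=$(printf '%s\n' "$value")   # printf prints the value as data

printf 'echo gave [%s], printf gave [%s]\n' "$with_echo" "$with_printf"
```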
Use set -o noglob before your loop and set +o noglob after to disable and enable globbing.
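A minimal sketch of the noglob approach, collecting the results in a variable (out is just an illustrative name) so the effect is easy to check:

```shell
X="*back* OLD"
out=""

set -o noglob            # same effect as set -f: turn off filename expansion
for P in $X; do          # still split on whitespace, but no longer globbed
    out="${out}-${P} "
done
set +o noglob            # restore globbing for the rest of the script

printf '%s\n' "$out"
```

Remember to turn globbing back on, otherwise later wildcards in the same script will silently stop expanding.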
To prevent filename expansion you could read in the string as a Here String.
To iterate over the items, you could turn them into lines using parameter expansion and read them linewise using read. In order to be able to put a - sign as the first character, use printf instead of echo.
X="*back* OLD"
while read -r x
do printf -- '-%s\n' "$x"
done <<< "${X/ /$'\n'}"
Another way could be to use tr to transform the string into lines, then use paste with the - sign as delimiter and "nothing" from /dev/null as first column.
X="*back* OLD"
tr ' ' '\n' <<< "$X" | paste -d- /dev/null -
Both should output:
-*back*
-OLD

POSIX/Bash pad variable with trailing newlines

I have a variable with some lines in it and I would like to pad it with a number of newlines defined in another variable. However it seems that the subshell may be stripping the trailing newlines. I cannot just use '\n' with echo -e as the lines may already contain escaped chars which need to be printed as is.
I have found I can print an arbitrary number of newlines using this.
n=5
yes '' | sed -n "1,${n}p;${n}q"
But if I run this in a subshell to store it in the variable, the subshell appears to strip the trailing newlines.
I can approximate the functionality but it's clumsy and due to the way I am using it I would much rather be able to just call echo "$var" or even use $var itself for things like string concatenation. This approximation runs into the same issue with subshells as soon as the last (filler) line of the variable is removed.
This is my approximation
n=5
var="test"
#I could also just set n=6
cmd="1,$((n+1))p;$((n+1))q"
var="$var$(yes '' | sed -n $cmd; echo .)"
#Now I can use it with
echo "$var" | head -n -1
Essentially I need a good way of appending a number of newlines to a variable which can then be printed with echo.
I would like to keep this POSIX compliant if at all possible, but at this stage a bash solution would also be acceptable. I am also using this as part of a tool for which I have set a challenge of minimizing line and character count while maintaining readability, but I can work that out once I have a workable solution.
Command substitutions with either $( ) or backticks will trim trailing newlines. So don't use them; use the shell's built-in string manipulation:
n=5
var="test"
while [ "$n" -gt 0 ]; do
var="$var
"
n=$((n-1))
done
Note that there must be nothing after the var="$var (before the newline), and nothing before the " on the next line (no indentation!).
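A short sketch comparing the two behaviours, assuming n=3 and made-up variable names (captured, padded). Note the closing quote sitting at the very start of its line.

```shell
captured=$(printf 'test\n\n\n')   # command substitution drops all trailing newlines

n=3
padded="test"
while [ "$n" -gt 0 ]; do
    padded="$padded
"
    n=$((n-1))
done

# Lengths differ by exactly the three newlines that survived in "padded"
printf 'captured: %s chars, padded: %s chars\n' "${#captured}" "${#padded}"
```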
A sequence of n newlines:
printf -v spaces "%*s" $n ""
newlines=${spaces// /$'\n'}
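This one is bash-only (printf -v and the ${var//a/b} substitution are not POSIX). A quick sketch of how it could be used for padding, with var and line_count as illustrative names:

```shell
n=5
printf -v spaces '%*s' "$n" ''    # n spaces, stored directly without a subshell
newlines=${spaces// /$'\n'}       # turn every space into a newline
var="test${newlines}"

# Count the newlines actually stored in var; $(( )) strips any padding wc adds
line_count=$(( $(printf '%s' "$var" | wc -l) ))
printf '%s\n' "$line_count"
```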

Assign part of a file name to bash variable?

I have a file and its name looks like:
12U12345._L001_R1_001.fastq.gz
I want to assign to a variable just the 12U12345 part.
So far I have:
variable=`basename $fastq | sed {s'/_S[0-9]*_L001_R1_001.fastq.gz//'}`
Note: $fastq is a variable with the full path to the file in it.
This solution currently returns the full file name, any ideas how to get this right?
Just use the built-in parameter expansion provided by the shell, instead of spawning a separate process
fastq="12U12345._L001_R1_001.fastq.gz"
printf '%s\n' "${fastq%%.*}"
12U12345
or use printf() itself to store to a new variable in one-shot
printf -v numericPart '%s' "${fastq%%.*}"
printf '%s\n' "${numericPart}"
Also bash has a built-in regular expression comparison operator, represented by =~ using which you could do
fastq="12U12345._L001_R1_001.fastq.gz"
regex='^([[:alnum:]]+)\.(.*)'
if [[ $fastq =~ $regex ]]; then
numericPart="${BASH_REMATCH[1]}"
printf '%s\n' "${numericPart}"
fi
You could use cut:
$> fastq="/path/to/12U12345._L001_R1_001.fastq.gz"
$> variable=$(basename "$fastq" | cut -d '.' -f 1)
$> echo "$variable"
12U12345
Also, please note that:
It's better to wrap your variable inside quotes. Otherwise your command won't work with filenames that contain space(s).
You should use $() instead of the backticks.
Using Bash Parameter Expansion to extract the basename and then extract the portion of the filename you want:
fastq="/path/to/12U12345._L001_R1_001.fastq.gz"
file="${fastq##*/}" # gives 12U12345._L001_R1_001.fastq.gz
string="${file%%.*}" # gives 12U12345
Note that Bash doesn't allow us to nest the parameter expansion. Otherwise, we could have combined statements 2 and 3 above.
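If you need this in more than one place, the two steps can be wrapped in a small function. This is only a sketch; sample_id is a hypothetical helper name, not something from the question.

```shell
# sample_id is a hypothetical helper name
sample_id() {
    local file=${1##*/}            # drop everything up to the last slash
    printf '%s\n' "${file%%.*}"    # drop everything from the first dot onward
}

id=$(sample_id "/path/to/12U12345._L001_R1_001.fastq.gz")
printf '%s\n' "$id"
```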

How to do string separation using keyword?

I have the STRING given below. There is no specific separator between the keys. The only way to identify the keys is by the keyword "key_1", "key_2", etc.
All keys begin with "key_" and can never appear in the value of another:
STRING="key_1=mislanious_string1 key_2=miscellaneous_string2"
I want the output as below.
echo $STRING1 should print:
key_1=mislanious_string1
echo $STRING2 should print:
key_2=mislanious_string2
e.g:
If STRING="key_1=foobarzkey_2=bash", then the output should look like STRING1=key_1=foobarz and STRING2=key_2=bash.
There may be more keys, like key_1, key_2, key_3, etc. Each key starts with "key_" and can never appear in the value of another.
How to do this in UNIX bash shell?
Using grep -P (PCRE) to support multiple key-value pairs in input:
STRING="key_1=mislanious_string1key_2=miscellaneous_string2key_3=fookey_4=BASH"
grep -oP 'key_[^=]+=.*?(?=key_|$)' <<< "$STRING"
key_1=mislanious_string1
key_2=miscellaneous_string2
key_3=foo
key_4=BASH
To store them into BASH array you can use:
read -d '' -ra arr < <(grep -oP 'key_[^=]+=.*?(?=key_|$)' <<< "$STRING")
printf "%s\n" "${arr[@]}"
key_1=mislanious_string1
key_2=miscellaneous_string2
key_3=foo
key_4=BASH
declare -p arr
declare -a arr='([0]="key_1=mislanious_string1" [1]="key_2=miscellaneous_string2" [2]="key_3=foo" [3]="key_4=BASH")'
UPDATE:: Here is a pure BASH (non-gnu) way of splitting these strings. We first insert an invisible character before every occurrence of key_ string and then use that for splitting the string:
STRING="key_1=mislanious_string1key_2=miscellaneous_string2key_3=fookey_4=BASH"
c=$'\x06'
s="${STRING//key_/${c}key_}"
arr=()
while [[ "$s" =~ ${c}(key_[^=]+=[^${c}]+)(.*) ]]; do
arr+=( "${BASH_REMATCH[1]}" )
s="${BASH_REMATCH[2]}"
done
Then to test:
printf "<%s>\n" "${arr[@]}"
<key_1=mislanious_string1>
<key_2=miscellaneous_string2>
<key_3=foo>
<key_4=BASH>
I like anubhava's grep -oP solution best. Here's an awk solution:
STRING="key_15=foobarzkey_3=bash"
awk -v RS="key_" 'NR>1{split($0, a, /=/); print "STRING" a[1] "=" RS $0}' <<< "$STRING"
STRING15=key_15=foobarz
STRING3=key_3=bash
So, to create that output as shell variables
eval $(awk -v RS="key_" 'NR>1{split($0, a, /=/); print "STRING" a[1] "=" RS $0}' <<< "$STRING")
echo $STRING3 # => key_3=bash
echo $STRING15 # => key_15=foobarz
This answer originally didn't recognize keys not preceded by whitespace. This has been fixed. In its current form this answer provides value as a portable solution. If you disagree, please let me know.
The answers provided by Glenn Jackman and anubhava are helpful, but use GNU extensions not available on all platforms (grep -P, awk with a multi-char. RS value).
Here's a POSIX-compliant sed solution that should work on most platforms, using either bash, ksh, or zsh as the shell:
str='key_1=mislanious_string1 key_2=miscellaneous_string2key_3=last'
while read -r varDef; do
[[ -n $varDef ]] && typeset "$varDef"
done < <(sed 's/\(key_\([0-9]\{1,\}\)=\)/\'$'\n''string\2=\1/g' <<<"$str")
#'# Print the variables created ($string1, $string2, $string3).
typeset -p ${!string@}
Note that lowercase variable names (string1, ...) are used so as to prevent potential conflicts with environment variables.
sed is used to split the string into key-value tokens each on their own line, preceded by the desired target variable name and =, effectively outputting shell variable assignments; e.g., for key_1, the sed command passes out:
string1=key_1=mislanious_string1 
The while loop then reads each output line and uses typeset to declare and assign the variable (note that typeset was chosen for ksh compatibility - while typeset also works in bash and zsh you'd typically use declare there); [[ -n $varDef ]] ignores the empty line that the sed output starts with.
Note: This solution trims trailing whitespace from values, consistent with the example in the question. This trimming happens due to use of read with the default $IFS value (internal field separators) - to preserve trailing whitespace, simply use IFS= read instead of just read.
Also note that using process substitution to provide input (while ... < <(sed ...)), as opposed to a pipeline (sed ... | while ...), is required to ensure that the variables are defined in the current shell (rather than in a subshell, which would result in variables not visible to the current shell).
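The subshell effect is easy to reproduce in isolation. Here is a minimal bash sketch with a counter instead of typeset, assuming the default settings (the lastpipe option is off); the variable names are made up.

```shell
count=0
printf '%s\n' a b c | while read -r _; do count=$((count+1)); done
pipeline_count=$count     # the loop ran in a subshell, so its changes are lost

count=0
while read -r _; do count=$((count+1)); done < <(printf '%s\n' a b c)
procsub_count=$count      # the loop ran in the current shell

printf 'pipeline: %s, process substitution: %s\n' "$pipeline_count" "$procsub_count"
```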
Some background info on what makes the above sed command POSIX-compliant:
POSIX only mandates basic regular expressions for sed, which takes away many features (e.g., quantifiers ? and +, alternation (|)) and makes escaping more cumbersome (e.g., ( and ) must be \-escaped).
POSIX sed also doesn't support escape sequences such as \n in replacement strings passed to s, so ANSI-C quoting is used to splice an \-escaped actual newline into the replacement string using $'\n'.
As an example of how useful the non-POSIX GNU sed extensions are, here's an equivalent command taking full advantage of GNU sed's features (extended regular expressions, support for \n), resulting in a shorter and more readable command:
sed -r 's/(key_([0-9]+)=)/\nstring\2=\1/g' <<<"$str"
Sometimes the simplest solution can be overlooked:
STRING="key_1=mislanious_string1key_2=miscellaneous_string2"
read STRING1 STRING2<<<${STRING//key_/ key_}
echo $STRING1
echo $STRING2
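If the number of keys is not known in advance, the same trick might be combined with read -a to fill an array instead of fixed variable names. This is a sketch that assumes the values contain no whitespace; parts is an illustrative name.

```shell
STRING="key_1=mislanious_string1key_2=miscellaneous_string2key_3=foo"

# Put a space in front of every key, then let read split on whitespace
read -r -a parts <<< "${STRING//key_/ key_}"

printf '%s\n' "${parts[@]}"
```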

Saving backslash-escaped characters to variable in bash

I've just written a bash script that takes some info from the mysql database and reads it line by line, extracting tab-separated columns into separate variables, something like this:
oldifs=$IFS
result=result.txt
$mysql -e "SELECT id,foo,bar,baz FROM $db.$table" -u $user --password=$pass -h $server > $result
cat $result | grep -e ^[0-9].*$ | while IFS=$'\t' read id foo bar baz
do
# some code
done
IFS=$oldifs
Now, while this works OK and I'm satisfied with the result (especially since I'm going to move the query to another script and let cron regenerate the result.txt file contents once a week or so, since I'm dealing with a table that changes maybe once or twice a year), I'm curious about the possibility of putting the query's result in a variable instead of a file.
I have noticed that in order to echo out backslash-escaped characters, I need to tell the command explicitly to interpret such characters as special chars:
echo -e "some\tstring\n"
But, being a bash noob that I am, I have no idea how to place the backslash escaped characters (the tabs and newlines from the query) inside a variable and just work with it the same way I'm working with the external file (just changing the cat with echo -e). I tried this:
result=`$mysql -e "SELECT id,foo,bar,baz FROM $db.$table" -u $user --password=$pass -h $server`
but the backslash escaped characters are converted into spaces this way :(. How can I make it work?
To get the output of a command, use $(...). To avoid wordsplitting and other bash processing you will need to quote. Single quotes ('$(...)') will not work as the quoting is too strong.
Note that once the output is in your variable, you will probably need to (double) quote it wherever you use it if you need to preserve anything that's in $IFS.
$ listing="$(ls -l)"
$ echo "$listing"
Could you try to set double quotes around $result - thus echo -e "$result"?
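Putting the suggestions above together, a sketch of the whole loop working from a variable instead of result.txt could look like this. The printf line is a made-up stand-in for the mysql call (a header row plus two tab-separated rows), and ids is just there to collect something visible.

```shell
# Stand-in for the mysql output: header row plus two tab-separated rows (made-up data)
result=$(printf 'id\tfoo\tbar\tbaz\n1\ta\tb\tc\n2\td\te\tf\n')

ids=""
while IFS=$'\t' read -r id foo bar baz; do
    case $id in
        [0-9]*) ids="$ids$id:$baz " ;;   # keep only rows that start with a digit
    esac
done <<< "$result"                        # quoted, so tabs and newlines survive

printf '%s\n' "$ids"
```

The tabs and newlines are real characters in the captured output, so no echo -e is needed; only the quoting matters. One caveat: tab counts as IFS whitespace, so consecutive tabs (empty columns) are treated as a single separator by read.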
% awk '/^[0-9]/ { print $2, $3, $4, $5 }' <<SQL | set -- -
> $("${mysql}" -e "SELECT id,foo,bar,baz FROM $db.$table" -u $user --password=$pass -h $server)
> SQL
% printf '%s\t' "${@}"
<id> <foo> <bar> <baz>
You might get some use out of this. The heredoc should obviate any escaping issues, awk will separate on tabs by default, and set accepts the input as a builtin argv array. printf isn't necessary, but it's better than echo - especially when working with escape characters.
You could also use read as you did above - but to better handle backslashes use the -r argument if you go that route. The above method would work best as a function and you could then iterate over your variables with shift and similar.
-Mike
