What does the "done < $var" at the end of a loop do? - bash

Just a simple question - I'm wondering what the following code is doing:
nlwd="$PWD/NLWD.txt"
cat /dev/null > $nlwd
echo "Enter filename to process:"
read name
while read line
do
uid="$(echo $line | cut -d, -f1)"
echo "$uid" | grep [0-9] >> $nlwd
done < $name
In particular, I'm wondering what the done < $name is doing.

It's taking a file name, reading that file line-by-line, and doing stuff with each line.
< is an input redirect, which means that the loop is taking its input from $name.
For example:
while read LINE
do
echo $LINE
done < $name
...is essentially the same as:
cat $name
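One detail worth adding: quote the expansion, or a $name containing spaces will break the redirection (bash reports an "ambiguous redirect"):
done < "$name"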
In response to your comment: the cat /dev/null > $nlwd just empties the file. cat reads from /dev/null (Linux's "black hole" device, which always reads as empty), and the > output redirection writes that emptiness into the file named by the $nlwd variable, truncating it. Here's a simpler example:
$> echo "something" > something.txt
$> cat something.txt
something
$> cat /dev/null > something.txt
$> cat something.txt
$>
Further reading: http://en.wikipedia.org/wiki//dev/null
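As an aside, cat /dev/null is not the only way to empty a file; a bare redirection does the same thing. Two common idioms (quoted to be safe with odd filenames):
> "$nlwd"     # truncate (or create) the file
: > "$nlwd"   # same effect; the no-op ":" makes it portable to more shells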

It's an input redirection. The while loop (and thus each command in the while loop, specifically read) will take its standard input from the file named by $name.
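A small demonstration that the whole loop body shares that redirected stdin, not just the read (a sketch, assuming $name holds a readable file of several lines):
while read -r first; do
    echo "read got: $first"
    head -n 1    # this reads from the same redirected stdin ($name), not the keyboard
done < "$name"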

How to iterate two variables in bash script?

I have files like these:
file6543_015.bam
subreadset_15.xml
file6543_024.bam
subreadset_24.xml
file6543_027.bam
subreadset_27.xml
I would like to run something like this:
for i in *bam && l in *xml
do
my_script $i $l > output_file
done
In my command, the first bam file goes with the first xml file; each bam/xml combination produces its own output file.
Like this, using bash arrays (this assumes both globs expand, in sorted order, to matching pairs):
bam=( *.bam )
xml=( *.xml )
for ((i=0; i<${#bam[@]}; i++)); do
my_script "${bam[i]}" "${xml[i]}"
done
Assuming you have a way to uniquely name your output_file for each specific output,
here is one way:
#!/bin/bash
ls file*.bam | while read i
do
CMD=`echo -n "my_script $i "`
CMD="$CMD `echo $i | sed -e 's/file.*_0/subreadset_/' -e 's/.bam/.xml/'`"
$CMD >> output_file
done
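An alternative that avoids parsing ls output and avoids keeping two lists in sync is to derive each xml name from the bam name with parameter expansion. A sketch, assuming the naming pattern shown in the question (one leading zero after the underscore, so 015 maps to 15):
for bam in file*_0*.bam; do
    n=${bam##*_0}     # "file6543_015.bam" -> "15.bam"
    n=${n%.bam}       # "15.bam" -> "15"
    my_script "$bam" "subreadset_${n}.xml" > "output_${n}"
done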

In a bash pipe, take the output of the previous command as a variable to the next command (Eg. if statement)

I wanted to write a command to compare the hash of a file, so I wrote the single-line command below. I want to understand how, in a pipe, I can take the output of the previous command and use it as a variable in the current command.
E.g., in the command below, I want to compare the output of the first command (the calculated hash) to the original hash. In the last command, how do I refer to the output of the previous command inside the if statement (instead of $0)?
sha256sum abc.txt | awk '{print $1}' | if [ "$0" = "8237491082roieuwr0r9812734iur" ]; then
echo "match"
fi
Following your narrow request, that would look like:
sha256sum abc.txt |
awk '{print $1}' |
if [ "$(cat)" = "8237491082roieuwr0r9812734iur" ]; then echo "match"; fi
...as cat with no arguments reads the command's stdin, and in a pipeline, content generated by prior stages is streamed into their successors.
Alternately:
sha256sum abc.txt |
awk '{print $1}' |
if read -r line && [ "$line" = "8237491082roieuwr0r9812734iur" ]; then echo "match"; fi
...wherein we read only a single line from stdin instead of using cat. (To instead loop over all lines given on stdin, see BashFAQ #1).
However, I would strongly suggest writing this instead as:
if [ "$(sha256sum abc.txt | awk '{print $1}')" = "8237491082roieuwr0r9812734iur" ]; then
echo "match"
fi
...which, among other things, keeps your logic outside the pipeline, so your if statement can set variables that remain set after the pipeline exits. See BashFAQ #24 for more details on the problems inherent in running code in pipelines.
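To see why that matters, here is a minimal illustration of the pitfall, assuming bash's default behavior (each pipeline stage runs in a subshell; shells with shopt -s lastpipe behave differently):
#!/bin/bash
echo data | { read -r var; echo "inside: $var"; }   # prints "inside: data"
echo "outside: ${var:-unset}"                       # prints "outside: unset" -- the subshell's variable is gone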
Consider using sha256sum's check mode. If you save the output of sha256sum to a file, you can check it with sha256sum -c.
$ echo foo > file
$ sha256sum file > hash.txt
$ cat hash.txt
b5bb9d8014a0f9b1d61e21e796d78dccdf1352f23cd32812f4850b878ae4944c file
$ sha256sum -c hash.txt
file: OK
$ if sha256sum -c --quiet hash.txt; then echo "match"; fi
If you don't want to save the hashes to a file you could pass them in via a here-string:
if sha256sum -c --quiet <<< 'b5bb9d8014a0f9b1d61e21e796d78dccdf1352f23cd32812f4850b878ae4944c file'; then
echo "match"
fi

Read from a file and stdin in Bash

I would like to know if I can write a shell script that accepts two inputs simultaneously, one from a file and the other from stdin. Could you give an example, please?
I tried
while read line
do
echo "$line"
done < "${1}" < "{/dev/stdin}"
But this does not work.
You can use cat - or cat /dev/stdin:
while read line; do
# your code
done < <(cat "$1" -)
or
while read line; do
# your code
done < <(cat "$1" /dev/stdin)
or, if you want to read from all files passed through command line as well as stdin, you could do this:
while read line; do
# your code
done < <(cat "$#" /dev/stdin)
See also:
How to read from a file or stdin in Bash?
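For example, if the process-substitution version above were saved as a (hypothetical) readboth.sh, the file's lines would come out first, then whatever arrived on stdin:
$ printf 'from a pipe\n' | ./readboth.sh file.txt
...file.txt's lines...
from a pipe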
This pattern also seems helpful here:
{ cat "$1"; cat; } | while read line
do
echo "$line"
done
Or just
cat "$1"
cat
if all you're doing is printing the content

Pipe input into a script

I have written a shell script in ksh to convert a CSV file into a Spreadsheet XML file. It takes an existing CSV file (the path to which is a variable in the script) and creates a new .xls output file. The script has no positional parameters; the file name of the CSV is currently hardcoded into the script.
I would like to amend the script so it can take the input CSV data from a pipe, and so that the .xls output data can also be piped or redirected (>) to a file on the command line.
How is this achieved?
I am struggling to find documentation on how to write a shell script that takes input from a pipe. It appears that read is only used for standard input from the keyboard.
Thanks.
Edit: script below for info (now amended to take input from a pipe via cat, as per the answer to the question).
#!/bin/ksh
#Script to convert a .csv data to "Spreadsheet ML" XML format - the XML scheme for Excel 2003
#
# Take CSV data as standard input
# Out XLS data as standard output
#
DATE=`date +%Y%m%d`
#define tmp files
INPUT=tmp.csv
IN_FILE=in_file.csv
#take standard input and save as $INPUT (tmp.csv)
cat > $INPUT
#clean input data and save as $IN_FILE (in_file.csv)
grep '.' $INPUT | sed 's/ *,/,/g' | sed 's/, */,/g' > $IN_FILE
#delete original $INPUT file (tmp.csv)
rm $INPUT
#detect the number of columns and rows in the input file
ROWS=`wc -l < $IN_FILE | sed 's/ //g' `
COLS=`awk -F',' '{print NF; exit}' $IN_FILE`
#echo "Total columns is $COLS"
#echo "Total rows is $ROWS"
#create start of Excel File
echo "<?xml version=\"1.0\"?>
<?mso-application progid=\"Excel.Sheet\"?>
<Workbook xmlns=\"urn:schemas-microsoft-com:office:spreadsheet\"
xmlns:o=\"urn:schemas-microsoft-com:office:office\"
xmlns:x=\"urn:schemas-microsoft-com:office:excel\"
xmlns:ss=\"urn:schemas-microsoft-com:office:spreadsheet\"
xmlns:html=\"http://www.w3.org/TR/REC-html40\">
<DocumentProperties xmlns=\"urn:schemas-microsoft-com:office:office\">
<Author>Ben Hamilton</Author>
<LastAuthor>Ben Hamilton</LastAuthor>
<Created>${DATE}</Created>
<Company>MCC</Company>
<Version>10.2625</Version>
</DocumentProperties>
<ExcelWorkbook xmlns=\"urn:schemas-microsoft-com:office:excel\">
<WindowHeight>6135</WindowHeight>
<WindowWidth>8445</WindowWidth>
<WindowTopX>240</WindowTopX>
<WindowTopY>120</WindowTopY>
<ProtectStructure>False</ProtectStructure>
<ProtectWindows>False</ProtectWindows>
</ExcelWorkbook>
<Styles>
<Style ss:ID=\"Default\" ss:Name=\"Normal\">
<Alignment ss:Vertical=\"Bottom\" />
<Borders />
<Font />
<Interior />
<NumberFormat />
<Protection />
</Style>
<Style ss:ID=\"AcadDate\">
<NumberFormat ss:Format=\"Short Date\"/>
</Style>
</Styles>
<Worksheet ss:Name=\"Sheet 1\">
<Table>
<Column ss:AutoFitWidth=\"1\" />"
#for each row in turn, create the XML elements for row/column
r=1
while (( r <= $ROWS ))
do
echo "<Row>\n"
c=1
while (( c <= $COLS ))
do
DATA=`sed -n "${r}p" $IN_FILE | cut -d "," -f $c `
if [[ "${DATA}" == [0-9][0-9]\.[0-9][0-9]\.[0-9][0-9][0-9][0-9] ]]; then
DD=`echo $DATA | cut -d "." -f 1`
MM=`echo $DATA | cut -d "." -f 2`
YYYY=`echo $DATA | cut -d "." -f 3`
echo "<Cell ss:StyleID=\"AcadDate\"><Data ss:Type=\"DateTime\">${YYYY}-${MM}-${DD}T00:00:00.000</Data></Cell>"
else
echo "<Cell><Data ss:Type=\"String\">${DATA}</Data></Cell>"
fi
(( c+=1 ))
done
echo "</Row>"
(( r+=1 ))
done
echo "</Table>\n</Worksheet>\n</Workbook>"
rm $IN_FILE > /dev/null
exit 0
Commands inherit their standard input from the process that starts them. In your case, your script provides its standard input for each command that it runs. A simple example script:
#!/bin/bash
cat > foo.txt
Piping data into your shell script causes cat to read that data, since cat inherits its standard input from your script.
$ echo "Hello world" | myscript.sh
$ cat foo.txt
Hello world
If you don't have another command to read or process your script's standard input, the shell provides the read command for reading text from standard input into a shell variable.
#!/bin/bash
read foo
echo "You entered '$foo'"
$ echo bob | myscript.sh
You entered 'bob'
There is one problem here: if you run the script without first checking that there is input on stdin, it will hang until something is typed.
To get around this, you can check whether stdin has data first, and if not, fall back to a command-line argument if one was given.
Create a script called "testPipe.sh"
#!/bin/bash
# Check to see if a pipe exists on stdin.
if [ -p /dev/stdin ]; then
echo "Data was piped to this script!"
# If we want to read the input line by line
while IFS= read -r line; do
echo "Line: ${line}"
done
# Or if we want to simply grab all the data, we can simply use cat instead
# cat
else
echo "No input was found on stdin, skipping!"
# Checking to ensure a filename was specified and that it exists
if [ -f "$1" ]; then
echo "Filename specified: ${1}"
echo "Doing things now.."
else
echo "No input given!"
fi
fi
Then to test:
Let's add some stuff to a test.txt file and then pipe the output to our script.
printf "stuff\nmore stuff\n" > test.txt
cat test.txt | ./testPipe.sh
Output:
Data was piped to this script!
Line: stuff
Line: more stuff
Now let's test if not providing any input:
./testPipe.sh
Output:
No input was found on stdin, skipping!
No input given!
Now let's test if providing a valid filename:
./testPipe.sh test.txt
Output:
No input was found on stdin, skipping!
Filename specified: test.txt
Doing things now..
And finally, let's test using an invalid filename:
./testPipe.sh invalidFile.txt
Output:
No input was found on stdin, skipping!
No input given!
Explanation:
Programs like read and cat consume the script's stdin when data is available there; otherwise they wait for input. Note that [ -p /dev/stdin ] tests specifically for a pipe, so input redirected from a regular file (< file) would not be detected this way; [ ! -t 0 ] ("stdin is not a terminal") is a common broader check.
Credit goes to Mike from this page in his answer showing how to check for stdin input: https://unix.stackexchange.com/questions/33049/check-if-pipe-is-empty-and-run-a-command-on-the-data-if-it-isnt?newreg=fb5b291531dd4100837b12bc1836456f
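A minimal variant of the same idea, if you only care whether stdin is interactive rather than specifically a pipe:
#!/bin/bash
if [ ! -t 0 ]; then
    # stdin is not a terminal: data was piped or redirected in
    cat
else
    echo "No piped/redirected input; expecting a filename argument instead"
fi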
If the external program (that you are scripting) already takes input from stdin, your script does not need to do anything. For example, awk reads from stdin, so a short script to count words per line:
#!/bin/sh
awk '{print NF}'
Then
./myscript.sh <<END
one
one two
one two three
END
outputs
1
2
3

Skip line in text file which starts with '#' via KornShell (ksh)

I am trying to write a script which reads a text file and saves each line to a string. I would also like the script to skip any lines which start with a hash symbol. Any suggestions?
You don't need to do the skipping in ksh itself; let grep filter out the comment lines. E.g. do this:
grep -v '^#' INPUTFILE | while IFS="" read -r line ; do echo "$line" ; done
And instead of the echo part do whatever you want.
Or if ksh does not support this syntax:
grep -v '^#' INPUTFILE > tmpfile
while IFS="" read line ; do echo $line ; done < tmpfile
rm tmpfile
while read -r line; do
[[ "$line" = *( )#* ]] && continue
# do something with "$line"
done < filename
Look for "File Name Patterns" or "File Name Generation" in the ksh man page.
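A sketch extending that loop to also skip blank lines, using the same ksh pattern syntax (*( ) matches zero or more spaces, so the pattern matches comments with leading whitespace):
while IFS= read -r line; do
    [[ -z "$line" || "$line" = *( )#* ]] && continue
    print -r -- "$line"
done < filename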
