Some tips to improve a bash script for counting fastq files - bash

Hi guys, I have this bash one-liner that I would like to turn into a script:
for i in `ls *.fastq.gz`; do echo $(zcat ${i} | wc -l)/4|bc; done
I would like the script to read from a data dir and print each result along with the name of the file.
I tried putting the dir in front of the pattern, as in 'data/*.fastq.gz', but got an error saying that no such dir exists.
I would like something like this:
name1.fastq.gz 1898516
name2.fastq.gz 2467421
namen.fastq.gz 1234532
I am not experienced in bash.
Could you guys give me some help?
Thanks

Take the dir as an argument, but default to the current dir if it's not set.
dir="${1-.}"
Then put it in the glob: "$dir"/*.fastq.gz
As well:
Quote variables and command expansions.
Don't parse ls.
Don't trust echo with arbitrary data (filenames). Use printf instead.
Use an end-of-options flag -- when giving filenames to commands.
I prefer to not have any inline command expansions, but that's just personal preference
Putting it together:
#!/bin/bash
dir="${1-.}"
for file in "$dir"/*.fastq.gz; do
printf '%s ' "$file"
lines="$(zcat -- "$file" | wc -l)"
bc <<< "$lines/4" # Using a here-string (Bash feature)
done

There is no need to shell out to bc for integer math (dividing by 4), or to use ls to enumerate the files. The original version will do with minor changes:
#!/bin/bash
dir="${1-.}"
for i in "$dir"/*.fastq.gz; do
lines=$(zcat "${i}" | wc -l)
printf '%s %d\n' "$i" "$((lines/4))"
done
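Both scripts print the path exactly as the glob produced it (for example data/name1.fastq.gz), and when nothing matches, the loop runs once with the unexpanded pattern. A minimal sketch that addresses both, assuming the files are gzip-compressed FASTQ with four lines per read; the function name count_fastq_reads is made up for illustration:

```shell
# count_fastq_reads DIR
# Print "<file name> <read count>" for every *.fastq.gz in DIR (default: .).
# Hypothetical helper, not part of the answers above.
count_fastq_reads() {
    local dir="${1-.}" file lines
    for file in "$dir"/*.fastq.gz; do
        # An unmatched glob is left as a literal string; skip it.
        [ -e "$file" ] || continue
        lines=$(gzip -dc -- "$file" | wc -l)
        # ${file##*/} strips the directory part, leaving only the file name.
        printf '%s %d\n' "${file##*/}" "$((lines / 4))"
    done
}

# Usage: count_fastq_reads data
```

Here gzip -dc stands in for zcat; the two are equivalent on most Linux systems, but zcat behaves differently on some other platforms.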

Related

How to recall a string in shell script

I made a script like this:
#! /usr/bin/bash
a=`ls ../wrfprd/wrfout_d0${i}* | cut -c22-25`
b=`ls ../wrfprd/wrfout_d0${i}* | cut -c27-28`
c=`ls ../wrfprd/wrfout_d0${i}* | cut -c30-31`
d=`ls ../wrfprd/wrfout_d0${i}* | cut -c33-34`
f=$a$b$c$d
echo $f
sed "s/.* startdate=.*/export startdate=${f}/g" ./post_process > post_process2
The echo command works and prints 2008042118, which is what I want, but the line in post_process2 comes out as export startdate= with the value of f missing. I want to produce a line like export startdate=2008042118
First -- don't use ls here -- it's both expensive in terms of performance (compared to globbing, which is performed internal to the shell without starting any external programs), and doesn't guarantee useful output for the full range of possible filenames, making its use in this context inherently bug-prone. A better way to retrieve pieces from a filename, assuming a ksh-derived shell such as bash or zsh, would look like this:
#!/bin/bash
# this is an array, but we're only going to use the first element
file=( "../wrfprd/wrfout_d0${i}"* )
[[ -e $file ]] || { echo "No file found" >&2; exit 1; }
# cut -c counts characters from 1, but ${var:offset:length} counts from 0
f=${file:21:4}${file:26:2}${file:29:2}${file:32:2}
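As an alternative sketch, the offsets can be taken relative to the file name rather than the whole path, so they do not shift if the directory changes. The path below is hypothetical, chosen to be consistent with the 2008042118 result reported in the question:

```shell
# Hypothetical path; the real file name layout is an assumption.
file='../wrfprd/wrfout_d01_2008-04-21_18:00:00'

name=${file##*/}           # strip the directory part
stamp=${name#wrfout_d??_}  # strip the prefix, leaving 2008-04-21_18:00:00

# Year, month, day and hour, using 0-based offsets into $stamp (Bash feature).
f=${stamp:0:4}${stamp:5:2}${stamp:8:2}${stamp:11:2}
printf '%s\n' "$f"
```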
Second, don't use sed to modify code -- doing so requires that your runtime user have permission to modify its own code, and moreover invites injection vulnerabilities. Just write your content out to a data file:
printf '%s\n' "$f" >startdate.txt
...and, in your second script, to read in the value from that file:
# if the shebang is #!/bin/bash
startdate=$(<startdate.txt)
# if the shebang is #!/bin/sh
startdate=$(cat startdate.txt)

list in script shell bash

I did this script
#!/bin/bash
liste=`ls -l`
for i in $liste
do
echo $i
done
The problem is I want the script displays each result line by line, but it displays word by word :
I have :
my_name
etud
4096
Oct
8
10:13
and I want to have :
my_name etud 4096 Oct 8 10:13
The final aim of the script is to analyze each line ; it is the reason I want to be able to recover the entire line. Maybe the list is not the best solution but I don't know how to recover the lines.
To start, we'll assume that none of your filenames ever contain newlines:
ls -l | while IFS= read -r line; do
echo "$line"
# Do whatever else you want with $line
done
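Since the aim is to analyze each line, read can also split a line into named fields when it is given several variable names; the last variable receives the rest of the line. A small sketch, using made-up sample text in place of real ls -l output and a hypothetical function name:

```shell
# Sample text standing in for `ls -l` output; the values are made up.
sample='-rw-r--r-- 1 my_name etud 4096 Oct 8 10:13 notes.txt
drwxr-xr-x 2 my_name etud 4096 Oct 9 11:02 src'

# summarize: read ls -l style lines on stdin and report owner, name and size.
summarize() {
    local perms links owner group size month day time name
    while read -r perms links owner group size month day time name; do
        printf '%s owns %s (%s bytes)\n' "$owner" "$name" "$size"
    done
}

printf '%s\n' "$sample" | summarize
```

The same caveat applies as above: this only holds while the column layout of the input is known and no file name contains a newline.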
If your filenames could contain newlines, things get tricky. In this case, it's better (although slower) to use stat to retrieve the desired metadata from each file individually. Consult man stat for details about how your local variety of stat works, as it is unfortunately not very standardized.
for f in *; do
line=$(stat -c "%U %n %s %y" "$f") # One possibility
# Work with $line as if it came from ls -l
done
You can replace
echo $i
with
echo -n "$i "
echo -n outputs to console without newline.
Another way to do it, with a while loop and without a pipe:
#!/bin/bash
while IFS= read -r line
do
echo "line: $line"
done < <(ls -l)
First, I hope that you aren't genuinely using ls in your real code, but only using it as an example. If you want a list of files, ls is the wrong tool; see http://mywiki.wooledge.org/ParsingLs for details.
Second, modern versions of bash have a builtin called readarray.
Try this:
readarray -t my_array < <(ls -l)
for entry in "${my_array[@]}"; do
read -a pieces <<<"$entry"
printf '<%s> ' "${pieces[@]}"; echo
done
First, it creates an array (called my_array) with all the output from the command being run.
Then, for each line in that output, it creates an array called pieces, and emits each piece with angle brackets around it.
If you want to read a line at a time, rather than reading the entire file at once, see http://mywiki.wooledge.org/BashFAQ/001 ("How can I read a file (data stream, variable) line-by-line (and/or field-by-field)?")
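When only the file names are needed, rather than the ls -l columns, a glob can fill the array directly. A minimal sketch, assuming Bash; the function name list_entries is hypothetical:

```shell
# list_entries DIR: print each entry of DIR in angle brackets, one per line.
list_entries() {
    local dir=$1 entries entry
    shopt -s nullglob      # an empty directory yields an empty array
    entries=( "$dir"/* )
    shopt -u nullglob      # note: this also clears nullglob if the caller had set it
    for entry in "${entries[@]}"; do
        printf '<%s>\n' "${entry##*/}"
    done
}
```

Each array element is one complete file name, so names containing spaces stay in one piece.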
Combining the previous answers with the need to store the list of files in a variable, you can do this:
printf '%s\n' "$list" | while IFS= read -r lin
do
echo "$lin"
done

overwrite a file then append

I have a loop in my script that appends a list of email addresses to a file "$CRN". If this script is executed again, it appends to the old list. I want it to overwrite the file with the new list rather than appending to the old one. I can submit my whole script if needed. I know I could test whether "$CRN" exists and then remove the file, but I'm interested in other suggestions. Thanks.
for arg in "$@"; do
if ls /students | grep -q "$arg"; then
echo "${arg}@mail.ccsf.edu">>$CRN
((students++))
elif ls /users | grep -q "$arg$"; then
echo "${arg}@ccsf.edu">>$CRN
((faculty++))
fi
done
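Better to do it this way; :> "$CRN" truncates the file (or creates it empty) before the loop, so each run starts from an empty list: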
CRN="/path/to/file"
:> "$CRN"
for arg; do
if printf '%s\n' /students/* | grep -q "$arg"; then
echo "${arg}@mail.ccsf.edu" >> "$CRN"
((students++))
elif printf '%s\n' /users/* | grep -q "${arg}$"; then
echo "${arg}@ccsf.edu" >> "$CRN"
((faculty++))
fi
done
Don't parse ls output! Use a bash glob instead. ls is a tool for interactively looking at file information. Its output is formatted for humans and will cause bugs in scripts. Use globs or find instead. To understand why, see http://mywiki.wooledge.org/ParsingLs
"Double quote" every expansion, and anything that could contain a special character, e.g. "$var", "$@", "${array[@]}", "$(command)". See http://mywiki.wooledge.org/Quotes http://mywiki.wooledge.org/Arguments and http://wiki.bash-hackers.org/syntax/words
Watch out for false positives: with arg=foo, a file named foobar will also match. Use grep -qw if you want to match whole words only. It's up to you.
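Another way to get the overwrite behaviour is to redirect the whole loop once with >, instead of appending inside it. The sketch below assumes that each entry in the two directories is named exactly after the user, which the question does not confirm, and so it swaps the grep match for an exact -e test; the function name and its parameters are hypothetical:

```shell
# write_addresses OUT STUDENTS_DIR USERS_DIR NAME...
# Write one address per known NAME to OUT, replacing any previous content.
write_addresses() {
    local out=$1 students_dir=$2 users_dir=$3 arg
    shift 3
    for arg; do
        if [ -e "$students_dir/$arg" ]; then
            printf '%s@mail.ccsf.edu\n' "$arg"
        elif [ -e "$users_dir/$arg" ]; then
            printf '%s@ccsf.edu\n' "$arg"
        fi
    done > "$out"   # opened and truncated once, before the loop body runs
}

# Usage: write_addresses "$CRN" /students /users "$@"
```

Because the file is opened a single time, running the script again replaces the old list rather than extending it.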

Check execute command after cheking file type

I am working on a bash script which executes a command depending on the file type. I want to use the "file" command and not the file extension to determine the type, but I am bloody new to this scripting stuff, so if someone can help me I would be very thankful! - Thanks!
Here the script I want to include the function:
#!/bin/bash
export PrintQueue="/root/xxx";
IFS=$'\n'
for PrintFile in $(/bin/ls -1 ${PrintQueue}); do
lpr -r ${PrintQueue}/${PrintFile};
done
The point is, all files which are PDFs should be printed with the lpr command, all others with ooffice -p
You are going through a lot of extra work. Here's the idiomatic code, I'll let the man page provide the explanation of the pieces:
#!/bin/sh
for path in /root/xxx/* ; do
case `file --brief "$path"` in
PDF*) cmd="lpr -r" ;;
*) cmd="ooffice -p" ;;
esac
# $cmd is left unquoted on purpose so that it splits into command and option
$cmd "$path"
done
Some notable points:
using sh instead of bash increases portability and narrows the choices of how to do things
don't use ls when a glob pattern will do the same job with less hassle
the case statement has surprising power
First, two general shell programming issues:
Do not parse the output of ls. It's unreliable and completely useless. Use wildcards, they're easy and robust.
Always put double quotes around variable substitutions, e.g. "$PrintQueue/$PrintFile", not $PrintQueue/$PrintFile. If you leave the double quotes out, the shell performs wildcard expansion and word splitting on the value of the variable. Unless you know that's what you want, use double quotes. The same goes for command substitutions $(command).
Historically, implementations of file have had different output formats, intended for humans rather than parsing. Most modern implementations have an option to output a MIME type, which is easily parseable.
#!/bin/bash
print_queue="/root/xxx"
for file_to_print in "$print_queue"/*; do
case "$(file --brief --mime "$file_to_print")" in # --brief omits the leading "filename:" so the patterns match
application/pdf\;*|application/postscript\;*)
lpr -r "$file_to_print";;
application/vnd.oasis.opendocument.*)
ooffice -p "$file_to_print" &&
rm "$file_to_print";;
# and so on
*) echo 1>&2 "Warning: $file_to_print has an unrecognized format and was not printed";;
esac
done
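To see which branch a given file would take without actually printing anything, the decision can be isolated in a small function. This is only a sketch: it assumes a file implementation that supports --mime-type, and the function name print_command_for is made up:

```shell
# print_command_for FILE: echo the print command that would be chosen for FILE.
# Nothing is printed on paper; this only reports the decision.
print_command_for() {
    case "$(file --brief --mime-type -- "$1")" in
        application/pdf|application/postscript) echo "lpr -r" ;;
        *) echo "ooffice -p" ;;
    esac
}

# Usage: print_command_for /root/xxx/somefile
```

With --mime-type the charset suffix is not included in the output, so the patterns need no trailing wildcard.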
#!/bin/bash
PRINTQ="/root/docs"
# The glob yields full paths, so file, lpr and ooffice find each file
# regardless of the current directory, and no IFS changes are needed.
for file in "$PRINTQ"/*
do
type=$(file --brief "$file" | awk '{print $1}')
if [ "$type" = "PDF" ]
then
echo "[*] printing $file with LPR"
lpr "$file"
else
echo "[*] printing $file with OPEN-OFFICE"
ooffice -p "$file"
fi
done

How can I use bash to parse out only a section of a variable with different delimiters?

I have a loop in a bash file to show me all of the files in a directory, each as its own variable. I need to take that variable (filename) and parse out only a section of it.
Example:
92378478234978ehbWHATIWANT#98712398712398723
Now, assuming "ehb" and the pound symbol never change, how can I just capture WHATIWANT into its own variable?
So far I have:
#!/bin/bash
for FILENAME in `dir -d *` ; do
done
You can use sed to edit out the parts you don't want.
want=$(echo "$FILENAME" | sed -e 's/.*ehb\(.*\)#.*/\1/')
Or you can use Bash's parameter expansion to strip out the tail and head.
want=${FILENAME%#*}; want=${want#*ehb}
One possibility:
for i in '92378478234978ehbWHATIWANT#98712398712398723' ; do
j=$(echo $i | sed -e 's/^.*ehb//' -e 's/#.*$//')
echo $j
done
produces:
WHATIWANT
Using only the bash shell, with no need for external tools:
$ string=92378478234978ehbWHATIWANT#98712398712398723
$ echo ${string#*ehb}
WHATIWANT#98712398712398723
$ string=${string#*ehb}
$ echo ${string%#*}
WHATIWANT
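Bash's [[ =~ ]] operator offers one more way to do this in a single step, with the captured group available in BASH_REMATCH. A minimal sketch, assuming Bash; the function name extract_middle is hypothetical:

```shell
# extract_middle STRING: print the text between "ehb" and the last "#".
# Returns non-zero if STRING does not contain that pattern.
extract_middle() {
    local re='ehb(.*)#'
    [[ $1 =~ $re ]] && printf '%s\n' "${BASH_REMATCH[1]}"
}

extract_middle '92378478234978ehbWHATIWANT#98712398712398723'
```

Unlike the two-step parameter expansion, this reports failure through its exit status when the markers are missing, instead of returning the input unchanged.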
