Calling a bash script from another bash script

I have written two programs and I'm trying to call one from the other, but this is what appears on my screen:
cp: cannot stat ‘PerShip/.csv’: No such file or directory
cp: target ‘tmpship.csv’ is not a directory
I don't know what to do. Here are the programs. Could somebody help me, please?
#!/bin/bash
shipname=$1
imo=$(grep "$shipname" shipsNAME-IMO.txt | cut -d "," -f 2)
cp PerShip/$imo'.csv' tmpship.csv
dist=$(octave -q ShipDistance.m 2>/dev/null)
grep "$shipname" shipsNAME-IMO.txt | cut -d "," -f 2 > IMO.txt
idnumber=$(cut -b 4-10 IMO.txt)
echo $idnumber,$dist
#!/bin/bash
rm -f shipsdist.csv
for ship in $(cat shipsNAME-IMO.txt | cut -d "," -f 1)
do
./FindShipDistance "$ship" >> shipsdist.csv
done
cat shipsdist.csv | sort | head -n 1

The code and error messages presented suggest that the second script is calling the first with an empty command-line argument. That would certainly happen if input file shipsNAME-IMO.txt contained any empty lines or otherwise any lines with an empty first field. An empty line at the beginning or end would do it.
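Each of the two cp errors can be reproduced in isolation; here is a minimal sketch, assuming a hypothetical two-ship data file. With an empty pattern, grep matches every line, so the unquoted $imo expands to several words and cp receives more than two operands; with a failed lookup, $imo is empty and the stray PerShip/.csv operand appears:
printf 'Aurora,IMO123\nBoreas,IMO456\n' > shipsNAME-IMO.txt
imo=$(grep "" shipsNAME-IMO.txt | cut -d "," -f 2)    # empty pattern matches every line
echo cp PerShip/$imo'.csv' tmpship.csv                # word splitting hands cp three operands
# cp PerShip/IMO123 IMO456.csv tmpship.csv            -> "target 'tmpship.csv' is not a directory"
imo=$(grep "NoSuchShip" shipsNAME-IMO.txt | cut -d "," -f 2)   # no match: imo is empty
echo cp PerShip/$imo'.csv' tmpship.csv
# cp PerShip/.csv tmpship.csv                         -> "cannot stat 'PerShip/.csv'"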
I suggest
using the read command to read the data, and manipulating IFS to parse out comma-delimited fields
validating your inputs and other data early and often
making your scripts behave more pleasantly in the event of predictable failures
More generally, using internal Bash features instead of external programs where the former are reasonably natural.
For example:
#!/bin/bash
# Validate one command-line argument
[[ -n "$1" ]] || { echo empty ship name 1>&2; exit 1; }
# Read and validate an IMO corresponding to the argument
IFS=, read -r dummy imo tail < <(grep -F -- "$1" shipsNAME-IMO.txt)
[[ -f PerShip/"${imo}.csv" ]] || { echo no data for "'$imo'" 1>&2; exit 1; }
# Perform the distance calculation and output the result
cp PerShip/"${imo}.csv" tmpship.csv
dist=$(octave -q ShipDistance.m 2>/dev/null) ||
{ echo "failed to compute ship distance for '${imo}'" 1>&2; exit 1; }
echo "${imo:3:7},${dist}"
and
#!/bin/bash
# Note: the original shipsdist.csv will be clobbered
while IFS=, read -r ship tail; do
# Ignore any empty ship name, however it might arise
[[ -n "$ship" ]] && ./FindShipDistance "$ship"
done < shipsNAME-IMO.txt |
tee shipsdist.csv |
sort |
head -n 1
Note that making the while loop in the second script part of a pipeline will cause it to run in a subshell. That is sometimes a gotcha, but it won't cause any problem in this case.
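For illustration, here is the kind of state loss the subshell can cause; this sketch tries to update a counter inside a pipeline loop for use afterwards:
count=0
printf '%s\n' a b c |
while read -r line; do
count=$((count + 1))    # increments a copy inside the pipeline's subshell
done
echo "$count"           # still prints 0 in bash's default configuration (no shopt -s lastpipe)
The loop above only writes to stdout, so nothing of the sort is lost in this case.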


More "random" alternative to shuf for selecting files in a directory

I put together the following Bash function (in my .bashrc) to open a "random" image from a given folder, one at a time until the user types N, after which it exits. The script works fine aside from the actual randomness of the images generated - in a quick test of 10 runs, only 4 images are unique.
Is this simply unavoidable due to the limited number of images in the directory (20), or is there an alternative to the shuf command that will yield more random results?
If it is unavoidable, what's the best way to adapt the function to avoid repeats (i.e. discard images that have already been selected)?
function generate_image() {
while true; do
command cd "D:\Users\Hashim\Pictures\Data" &&
image="$(find . -type f -exec file --mime-type {} \+ | awk -F: '{if ($2 ~/image\//) print $1}' | shuf -n1)" &&
echo "Opening $image" &&
cygstart "$image"
read -p "Open another random image? [Y/n]"$'\n' -n 1 -r
echo
if [[ $REPLY =~ ^[Nn]$ ]]
then exit
fi
done
}
One way to handle this is by searching the filesystem and creating an array with a list of files in randomized order, and going through everything in that list before searching again.
Because you go through everything from one batch of shuf output before starting the next batch of shuf output, there's no longer a risk of repeats until everything has been seen.
refresh_image_list() {
# respect prior image_dir value if set before the function is called
image_dir=${image_dir:-'D:/Users/Hashim/Pictures/Data'}
readarray -d '' image_list < <(
find "$image_dir" -type f -exec file -0 --mime-type -- {} + \
| while IFS= read -r -d '' filename && IFS= read -r desc; do
[[ $desc = *image* ]] && printf '%s\0' "$filename"
done \
| shuf -z
)
}
generate_image() {
while true; do
(( ${#image_list[@]} )) || refresh_image_list # if list is empty, recreate
set -- "${image_list[@]}" # set argument list from image list
while (( $# )); do # argument list isn't empty?
echo "Opening $1" # ...try the first item on it
cygstart "$1"
shift # ...and then discard that item
read -p $'Open another random image? [Y/n]\n' -n 1 -r
echo
if [[ $REPLY = [Nn] ]]; then # user wants to quit?
image_list=( "$#" ) # store unused images back to list
return 0
fi
done
done
}
We can simplify this if we're willing to just stop after the user has seen every image once, instead of generating a new batch, and don't need persistence across invocations:
generate_image() {
while IFS= read -r -d '' filename <&3; do
echo "Opening $filename"
cygstart "$filename"
read -p $'Open another random image? [Y/n]\n' -n 1 -r
echo
[[ $REPLY = [Nn] ]] && return 0
done 3< <(
find "$image_dir" -type f -exec file -0 --mime-type -- {} + \
| while IFS= read -r -d '' filename && IFS= read -r desc; do
[[ $desc = *image* ]] && printf '%s\0' "$filename"
done \
| shuf -z
)
}
File listings are rarely so gigantic that they can't fit into RAM for awk:
find … -print0 |
mawk 'BEGIN { FS = "\0"
_^= RS = "^$"
} END { printf("%*s", srand()*!_, $(int(rand()*(NF-_))+_)) }'
That'll randomly print out the filename for one of the image files found, with no trailing byte of either \0 or \n, without having to perform any sorting or shuffling.
The NF - 1 is because find prints a final \0, so the NF count is always one more than the number of files found.
This also protects against empty input: rather than referencing a negative field number, it simply prints nothing at all.
From there, you can decide whether you want to open the chosen image file.
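If the golfed mawk is hard to read, here is a spelled-out sketch of the same idea (slurp the whole NUL-delimited listing as one record, then print one random field); it assumes gawk or mawk, where a multi-character RS is treated as a regular expression:
find . -type f -print0 |
awk '
  BEGIN {
    FS = "\0"    # fields are NUL-delimited filenames
    RS = "^$"    # a separator that never matches: read all input as one record
  }
  END {
    srand()                                      # seed the generator from the current time
    if (NF > 1)                                  # trailing NUL makes NF one more than the file count
      printf "%s", $(int(rand() * (NF - 1)) + 1)
  }'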
Charles' answer is definitely the superior answer here, but for completeness I thought I would also add a middle-ground solution that I stumbled across while experimenting earlier on.
I learnt that shuf can be seeded with an external source of randomness, so by seeding it with /dev/urandom - the randomness generator device available on all UNIX-like systems - it can be made more random:
shuf -n1 --random-source=/dev/urandom
From my tests this appears to result in significantly fewer repeats than a standard shuf command, and could be an ideal solution if you want a little more randomness but can tolerate the occasional repeat.
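In the context of the original function, that amounts to changing only the final stage of the pipeline (everything else stays as it was):
image="$(find . -type f -exec file --mime-type {} \+ |
awk -F: '{if ($2 ~/image\//) print $1}' |
shuf -n1 --random-source=/dev/urandom)"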

Shell: Add a string to the end of each line that matches a pattern; filenames are given in another file

I'm still new to the shell and need some help.
I have a file stapel_old.
Also I have in the same directory files like english_old_sync, math_old_sync and vocabulary_old_sync.
The content of stapel_old is:
english
math
vocabulary
The content of e.g. english is:
basic_grammar.md
spelling.md
orthography.md
I want to manipulate all files which are given in stapel_old like in this example:
take the first line of stapel_old 'english', (after that math, and so on)
convert in this case english to english_old_sync, (or after that what is given in second line, e.g. math to math_old_sync)
search in english_old_sync line by line for the pattern '.md'
and append :::#a1 after .md on each such line
The result should be e.g. of english_old_sync:
basic_grammar.md:::#a1
spelling.md:::#a1
orthography.md:::#a1
of math_old_sync:
geometry.md:::#a1
fractions.md:::#a1
and so on. stapel_old should stay unchanged.
How can I realize that?
I tried sed -n and a while loop (while read -r line), and I feel it's somehow the right way, but after 4 hours of inspecting and reading I still get errors rather than the expected result.
Thank you!
EDIT
Here is the working code (The files are stored in folder 'olddata'):
clear
echo -e "$(tput setaf 1)$(tput setab 7)Learning directories:$(tput sgr 0)\n"
# put here directories which should not become flashcards, command: | grep -v 'name_of_directory_which_not_to_learn1' | grep -v 'directory2'
ls ../ | grep -v 00_gliederungsverweise | grep -v 0_weiter | grep -v bibliothek | grep -v notizen | grep -v Obsidian | grep -v z_nicht_uni | tee olddata/stapel_old
# count folders
echo -ne "\nHow many different folders: " && wc -l olddata/stapel_old | cut -d' ' -f1 | tee -a olddata/stapel_old
echo -e "Are these learning directories correct? [j OR y]--> yes; [Other]-->no\n"
read lernvz_korrekt
if [ "$lernvz_korrekt" = j ] || [ "$lernvz_korrekt" = y ];
then
read -n 1 -s -r -p "Learning directories correct. Press any key to continue..."
else
read -n 1 -s -r -p "Learning directories not correct, please change in line 4. Press any key to continue..."
exit
fi
echo -e "\n_____________________________\n$(tput setaf 6)$(tput setab 5)Found cards:$(tput sgr 0)$(tput setaf 6)\n"
#GET && WRITE FOLDER NAMES into olddata/stapel_old
anzahl_zeilen=$(cat olddata/stapel_old |& tail -1)
#GET NAMES of .md files of every stapel and write All to 'stapelname'_old_sync
i=0
name="var_$i"
for (( num=1; num <= $anzahl_zeilen; num++ ))
do
i="$((i + 1))"
name="var_$i"
name=$(cat olddata/stapel_old | sed -n "$num"p)
find ../$name/ -name '*.md' | grep -v trash | grep -v Obsidian | rev | cut -d'/' -f1 | rev | tee olddata/$name"_old_sync"
done
(tput sgr 0)
I tried to add:
input="olddata/stapel_old"
while IFS= read -r line
do
sed -n "$line"p olddata/stapel_old
done < "$input"
The code to change only the english_old_sync is:
lines=$(wc -l olddata/english_old_sync | cut -d' ' -f1)
for ((num=1; num <= $lines; num++))
do
content=$(sed -n "$num"p olddata/english_old_sync)
sed -i "s/"$content"/""$content":::#a1/g"" olddata/english_old_sync
done
So now, this needs to be an inner for-loop inside an outer for-loop which holds the variable for english, right?
stapel_old should stay unchanged.
You could try a while + read loop and embed sed inside the loop.
#!/usr/bin/env bash
while IFS= read -r files; do
echo cp -v "$files" "${files}_old_sync" &&
echo sed '/^.*\.md$/s/$/:::#a1/' "${files}_old_sync"
done < olddata/stapel_old
convert in this case english to english_old_sync, (or after that what is given in second line, e.g. math to math_old_sync)
cp copies the file under a new name; if the goal is to rename the original files listed in stapel_old, then change cp to mv.
The -n and -i flags of sed were omitted; include them if needed.
The script also assumes that there are no empty/blank lines in stapel_old. If there are, add an additional test after the line with the do:
[[ -n $files ]] || continue
It also assumes that the entries in stapel_old are existing files. Just in case, add an additional test:
[[ -e $files ]] || { printf >&2 '%s no such file or directory.\n' "$files"; continue; }
Or an if statement.
if [[ ! -e $files ]]; then
printf >&2 '%s no such file or directory\n' "$files"
continue
fi
See also help test
See also help continue
Combining them all together should be something like:
#!/usr/bin/env bash
while IFS= read -r files; do
[[ -n $files ]] || continue
[[ -e $files ]] || {
printf >&2 '%s no such file or directory.\n' "$files"
continue
}
echo cp -v "$files" "${files}_old_sync" &&
echo sed '/^.*\.md$/s/$/:::#a1/' "${files}_old_sync"
done < olddata/stapel_old
Remove the echos if you're satisfied with the output, so that the script can actually copy/rename and edit the files.

snakemake rule calls a shell script but exits after first command

I have a shell script that works well if I just run it from command line. When I call it from a rule within snakemake it fails.
The script runs a for loop over a file of identifiers and uses those to grep the sequences from a fastq file followed by multiple sequence alignment and makes a consensus.
Here is the script. I placed some echo statements in there and for some reason it doesn't call the commands. It stops at the grep statement.
I have tried adding set +o pipefail; in the rule but that doesn't work either.
#!/bin/bash
function Usage(){
echo -e "\
Usage: $(basename $0) -r|--read2 -l|--umi-list -f|--outfile \n\
where: ... \n\
" >&2
exit 1
}
# Check argument count
[[ "$#" -lt 2 ]] && Usage
# parse arguments
while [[ "$#" -gt 1 ]];do
case "$1" in
-r|--read2)
READ2="$2"
shift
;;
-l|--umi-list)
UMI="$2"
shift
;;
-f|--outfile)
OUTFILE="$2"
shift
;;
*)
Usage
;;
esac
shift
done
# Set defaults
# Check arguments
[[ -f "${READ2}" ]] || (echo "Cannot find input file ${READ2}, exiting..." >&2; exit 1)
[[ -f "${UMI}" ]] || (echo "Cannot find input file ${UMI}, exiting..." >&2; exit 1)
#Create output directory
OUTDIR=$(dirname "${OUTFILE}")
[[ -d "${OUTDIR}" ]] || (set -x; mkdir -p "${OUTDIR}")
# Make temporary directories
TEMP_DIR="${OUTDIR}/temp"
[[ -d "${TEMP_DIR}" ]] || (set -x; mkdir -p "${TEMP_DIR}")
#RUN consensus script
for f in $( more "${UMI}" | cut -f1);do
NAME=$(echo $f)
grep "${NAME}" "${READ2}" | cut -f1 -d ' ' | sed 's/#M/M/' > "${TEMP_DIR}/${NAME}.name"
echo subsetting reads
seqtk subseq "${READ2}" "${TEMP_DIR}/${NAME}.name" | seqtk seq -A > "${TEMP_DIR}/${NAME}.fasta"
~/software/muscle3.8.31_i86linux64 -msf -in "${TEMP_DIR}/${NAME}.fasta" -out "${TEMP_DIR}/${NAME}.muscle.fasta"
echo make consensus
~/software/EMBOSS-6.6.0/emboss/cons -sequence "${TEMP_DIR}/${NAME}.muscle.fasta" -outseq "${TEMP_DIR}/${NAME}.cons.fasta"
sed -i 's/n//g' "${TEMP_DIR}/${NAME}.cons.fasta"
sed -i "s/EMBOSS_001/${NAME}.cons/" "${TEMP_DIR}/${NAME}.cons.fasta"
done
cat "${TEMP_DIR}/*.cons.fasta" > "${OUTFILE}"
Snakemake rule:
rule make_consensus:
input:
r2=get_extracted,
lst="{prefix}/{sample}/reads/cell_barcode_umi.count"
output:
fasta="{prefix}/{sample}/reads/fasta/{sample}.R2.consensus.fa"
shell:
"sh ./scripts/make_consensus.sh -r {input.r2} -l {input.lst} -f {output.fasta}"
Edit: Snakemake error messages (I changed some of the paths to a neutral filepath):
RuleException:
CalledProcessError in line 29 of ~/user/scripts/consensus.smk:
Command ' set -euo pipefail; sh ./scripts/make_consensus.sh -r ~/user/file.extracted.fastq -l ~/user/cell_barcode_umi.count -f ~/user/file.consensus.fa ' returned non-zero exit status 1.
File "~/user/scripts/consensus.smk", line 29, in __rule_make_consensus
File "~/user/miniconda3/lib/python3.6/concurrent/futures/thread.py", line 56, in run
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
If there are better ways to do this than using a shell for loop please let me know!
thanks!
Edit
Script ran as standalone: first grep
grep AGGCCGTTCT_TGTGGATG R_extracted/wgs_5_OL_debug.R2.extracted.fastq | cut -f1 -d ' ' | sed 's/#M/M/' > ./fasta/temp/AGGCCGTTCT_TGTGGATG.name
Script ran through snakemake: first 2 grep statements
grep :::::::::::::: R_extracted/wgs_5_OL_debug.R2.extracted.fastq | cut -f1 -d ' ' | sed 's/#M/M/' > ./fasta/temp/::::::::::::::.name
I'm now trying to figure out where those :::: in snakemake are coming from. All ideas welcome
It stops at the grep statement
My guess is that the grep command in make_consensus.sh doesn't capture anything. grep returns exit code 1 in such cases and the non-zero exit status propagates to snakemake. (see also Handling SIGPIPE error in snakemake)
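If "no matches" is expected and acceptable for some identifiers, one possible workaround (a sketch against the loop in make_consensus.sh, not tested on the actual data) is to treat grep's exit status 1 as success while leaving real errors (status greater than 1) fatal:
# "no matches" (status 1) becomes success; read errors (status >1) still fail
{ grep "${NAME}" "${READ2}" || test $? -eq 1; } |
cut -f1 -d ' ' | sed 's/#M/M/' > "${TEMP_DIR}/${NAME}.name"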
Loosely related... There is an inconsistency between the shebang of make_consensus.sh, which says the script should be executed with bash (#!/bin/bash), and the actual execution using sh (sh ./scripts/make_consensus.sh). (In practice it shouldn't make any difference, since sh is most likely a link to bash anyway.)
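A minimal fix for that inconsistency on the snakemake side would be to invoke the script with bash (or make it executable and call it directly):
shell:
"bash ./scripts/make_consensus.sh -r {input.r2} -l {input.lst} -f {output.fasta}"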

Getting the path to the newest file in a directory with f=$(cd dir | ls -t | head) not honoring "dir"

I would like to get a file (a zip file) from a path with this piece of code: file=$(cd '/path_to_zip_file' | ls -t | head -1). Instead of that, I get my .sh file from the directory where I am running this script.
Why can't I get the file from /path_to_zip_file?
Below is my code in .sh file
file=$(cd '/path_to_zip_file' | ls -t | head -1)
last_modified=`stat -c "%Y" $file`;
current=`date +%s`
echo $file
if [ $(($current-$last_modified)) -gt 86400 ]; then
echo 'Mail'
else
echo 'No Mail'
fi;
If you were going to use ls -t | head -1 (which you shouldn't), the cd would need to be corrected to be a prior command (happening before ls takes place), not a pipeline component (running in parallel with ls, with its stdout connected to ls's stdin):
set -o pipefail # otherwise, a failure of ls is ignored so long as head succeeds
file=$(cd '/path_to_zip_file' && ls -t | head -1)
A better-practice approach might look like:
newest_file() {
local result=$1; shift # first, treat our first arg as latest
while (( $# )); do # as long as we have more args...
[[ $1 -nt $result ]] && result=$1 # replace "result" if they're newer
shift # then take them off the argument list
done
[[ -e $result || -L $result ]] || return 1 # fail if no file found
printf '%s\n' "$result" # more reliable than echo
}
newest=$(newest_file /path/to/zip/file/*)
newest=${newest##*/} ## trim the path to get only the filename
printf 'Newest file is: %s\n' "$newest"
To understand the ${newest##*/} syntax, see the bash-hackers' wiki on parameter expansion.
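In brief, ${var##pattern} removes the longest prefix matching pattern; a quick illustration:
path=/path_to_zip_file/archive.zip
echo "${path##*/}"   # strip everything through the last slash: archive.zip
echo "${path%/*}"    # the complementary expansion keeps the directory: /path_to_zip_file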
For more on why using ls in scripts (except for output displayed to humans) is dangerous, see ParsingLs.
Both BashFAQ #99 (How do I get the latest (or oldest) file from a directory?) and BashFAQ #3 (How can I sort or compare files based on some metadata attribute (newest / oldest modification time, size, etc.)?) have useful discussion of the larger context in which this question was asked.

bash for loop with same order as GNU "ls -v" ("version-number" sort)

In a bash script I want to do a typical "for file in somedir" but I want the files to be processed in the same order that "ls -v" returns them. I know the pitfalls of parsing the output of "ls". Is there some way to replicate "-v" without using "ls"? Thanks.
Assuming that this is "version number" sort order, this is also implemented by GNU sort. Thus, on a GNU platform:
somedir=/foo
while IFS= read -r -d '' filename; do
printf 'Processing file: %q\n' "$filename"
done < <(set -- "$somedir"/*; [[ -e $1 || -L $1 ]] && printf '%s\0' "$@" | sort -z -V)
If you really want to use a for loop rather than a while loop, parse into an array and iterate over that:
files=( )
while IFS= read -r -d '' filename; do
files+=( "$filename" )
done < <(set -- "$somedir"/*; [[ -e $1 || -L $1 ]] && printf '%s\0' "$@" | sort -z -V)
for filename in "${files[@]}"; do
printf 'Processing file: %q\n' "$filename"
done
To explain some of the magic above:
In < <(...), <(...) is a process substitution. It's replaced with a filename which, when read from, will return the output of the code enclosed. Thus, < <(...) will put that process substitution's output as the input to the while read loop. This loop form is described in BashFAQ #1. The reasons to use this kind of redirection instead of piping into the loop are given in BashFAQ #24.
set -- "$somedir"/* replaces the argument list within the current context (that context being the subshell running the process substitution!) with the results of "$somedir"/*; thus, (non-hidden, by default) contents of the directory named in the variable somedir.
[[ -e $1 || -L $1 ]] is true only if that glob expanded to at least one item; if it remained * (and no actual filesystem object exists by that name), gating output on this condition prevents the process substitution from emitting any output.
sort -z tells sort to delimit elements in both input and output with NULs -- a character that isn't allowed to exist in filenames.
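As a quick sanity check of the version-number ordering (file-1.10 sorts after file-1.9, unlike plain lexical order):
printf '%s\0' file-1.10 file-1.9 file-1.2 | sort -z -V | tr '\0' '\n'
# file-1.2
# file-1.9
# file-1.10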
