I am currently running a program that rearranges genomes to create the best alignment to a reference genome, and as it does so it generates a number of folders like alignment#.
I have no way of knowing how many iterations this program will run through before it stops, but the final alignment is the best one (it could be anything from alignment5 to alignment35). The file inside it will have a predictable name, though the folder name will change from run to run.
I need a bash script that will look inside a directory, identify the highest-numbered subdirectory, and store it in a variable (or similar) that could ideally be passed to an additional program.
I just wanted to add that my scripting is very basic. If you could explain your answers as thoroughly as possible or provide links to user-friendly resources, that would be much appreciated.
A concept script here:
#!/bin/bash
shopt -s extglob || exit
DIR="/parent/dir" highest=
for a in "$DIR"/alignment+([[:digit:]]); do
b=${a##*/alignment}
[[ -z $highest || b -gt highest ]] && highest=$b
done
[[ -n $highest ]] && echo "Highest: $DIR/alignment${highest}"
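If you then need that directory in a variable to hand to another program, you could replace the final echo with something along these lines (the program and file names below are placeholders, not something from your setup):
best_dir="$DIR/alignment${highest}"
some_downstream_program "$best_dir/final_alignment.fasta"   # hypothetical program and file name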
ls -1 "$directory" | sort --numeric
This assumes a consistent prefix to the file names.
Otherwise, you can use "sort -k N --numeric" (where N is the field containing the number); see "man sort" for details.
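If your sort has version sort (GNU coreutils' -V), a rough one-liner to capture the highest-numbered directory in a variable could look like this (an untested sketch, reusing the $DIR variable from the script above and assuming the alignment* names contain no newlines):
highest_dir=$(printf '%s\n' "$DIR"/alignment* | sort -V | tail -n 1)
Version sort orders alignment5 before alignment35, which a plain lexical sort would not.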
I've got a .sql job which creates files depending on certain criteria. It writes these with a prefix of TEMP_, as we then have an adaptor that picks up the files and we don't want them picked up before writing is complete.
I need to put a post job in place which renames these files. I have it set up with a number of other jobs, but they all create the files each time they run; this job only creates the files on some runs, depending on the data in the system.
I need to check whether the file exists and exit if no file exists.
I've found a few examples but they all seem to fail. This is where I have got to, which I thought was checking whether there is no file and exiting if so, but it fails and displays:
"syntax error at line 16: `TEMP_SUBCON*.csv' unexpected"
This is what I have currently, with line 16 being the first line (everything above that is just comments):
if [[ ! -f $OUT_DIR -name TEMP_SUBCON*.csv ]] ; then
exit $?
fi
TEMP_DATA_FILE=$(find $OUT_DIR -name TEMP_SUBCON_QTY_output*.csv)
DATA_FILE=$(basename $TEMP_DATA_FILE | cut -c6-)
echo $TEMP_DATA_FILE
echo $DATA_FILE
## Rename the file name, remove the phrase TEMP_, so that EAI can pick the file ##
mv $TEMP_DATA_FILE $OUT_DIR/$DATA_FILE
Can you help me see what I've done incorrectly?
Thanks
If I understand it right, you want to find the files with the TEMP_ prefix in your $OUT_DIR and, if there are any, rename them without the prefix. Then this should do the trick:
for file in "$OUT_DIR"/TEMP_SUBCON_*.csv; do
    if [[ -e $file ]]; then
        base=${file##*/}                     # just the file name, no directory part
        mv "$file" "$OUT_DIR/${base#TEMP_}"  # drop the TEMP_ prefix
    fi
done
exit
It will go through the directory, find each TEMP_ file, and rename it without the prefix. If there are none, it won't do anything.
That syntax is not valid for the [[ ... ]] test.
Why not instead use the result of the subsequent find command to check whether there were any matching files in the specified directory, and quit if no files are returned (in other words, quit if the result variable is empty)?
Example:
TEMP_DATA_FILE=$(find "$OUT_DIR" -name "TEMP_SUBCON_QTY_output*.csv")
if [[ -z ${TEMP_DATA_FILE:=""} ]] ; then
    exit 1    # no matching file, nothing to do
fi
Note 1: you should quote the pattern argument for the find command as shown.
Note 2: it is useful to use set -u in your ksh scripts so that ksh aborts if variables are uninitialized when used (often the cause of errors), instead of silently using a default value. However, if you do use set -u, then in any test you should explicitly give your own default value. That is the reason for using ${TEMP_DATA_FILE:=""} instead of ${TEMP_DATA_FILE}: to support the often very useful set -u option. Even when you do not use set -u, writing ${TEMP_DATA_FILE:=""} inside tests makes it explicit what should happen, instead of relying on implicit behaviour.
Note 3: you should use set -x when debugging and study every line of the output; it will show you exactly which commands ran, with which arguments, and what their results were. This helps you learn how to code in ksh and similar shells.
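Putting those notes together, a minimal sketch of the whole post job (reusing your paths and pattern, and assuming find returns at most one match, as your current script already does) might look like:
set -u                                               # abort on uninitialized variables (Note 2)
TEMP_DATA_FILE=$(find "$OUT_DIR" -name "TEMP_SUBCON_QTY_output*.csv")
if [[ -z ${TEMP_DATA_FILE:=""} ]] ; then
    exit 0                                           # no file was produced this run, nothing to do
fi
DATA_FILE=$(basename "$TEMP_DATA_FILE" | cut -c6-)   # drop the leading TEMP_ (5 characters)
mv "$TEMP_DATA_FILE" "$OUT_DIR/$DATA_FILE"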
I am looking for a bash script (or one-liner) to accomplish the following:
Check to see if there is more than one file containing the substring "slurm-"
If so, remove all of the files containing the substring except for the newest one
Any help would be greatly appreciated, thank you.
The following isn't exceptionally efficient with a very long list of files, but (1) it's fast with a short list (low constant-time startup cost), and (2) it's very explicit about how it operates (easy to read and understand).
shopt -s nullglob
candidates=( slurm-* )
(( ${#candidates[@]} < 2 )) && exit 0 ## nothing to do if <2 files exist
latest=${candidates[0]} ## populate latest variable w/ first
for candidate in "${candidates[#]}"; do ## loop through the whole set
if [[ $candidate -nt $latest ]]; then ## and if one is newer, call it "latest"
latest=$candidate
fi
done
for candidate in "${candidates[#]}"; do ## iterate through the whole set
if [[ $candidate != "$latest" ]]; then ## and for everything but the latest file
rm -f -- "$candidate" ## run a deletion
fi
done
Answering the XY problem: you might find it a better course of action to add #SBATCH -o output.txt to your submission file so that Slurm overwrites its output file every time, if your intent is to keep a clean working directory while repeatedly submitting the same job until it runs properly.
I'm studying the bash shell and have lately realised that I'm not getting recursive calls involving file searching right. I know find is made for this, but I've recently been asked to implement a certain search this way.
I wrote the following script:
#!/bin/bash
function rec_search {
for file in `ls $1`; do
echo ${1}/${item}
if[[ -d $item ]]; then
rec ${1}/${item}
fi
done
}
rec $1
The script takes a file as its argument and looks for it recursively.
I find it a poor solution of mine and have a few questions about improving it:
how to find files that contain spaces in their names
can I efficiently use the pwd command to print absolute paths (I tried, but unsuccessfully)
any other reasonable improvement to the code
Your script currently cannot work:
The function is defined as rec_search, but then it seems you mistakenly call rec
You need to put a space after the "if" in if[[
There are some other serious issues with it too:
for file in `ls $1` goes against the recommendation to "never parse the output of ls" and won't work for paths with spaces or other whitespace characters
The loop variable is named file, but the body uses $item, which is never set
You should indent the body of if and for to make it easier to read
The script could be fixed like this:
rec() {
for path; do
echo "$path"
if [[ -d "$path" ]]; then
rec "$path"/*
fi
done
}
But it's best to not reinvent the wheel and use the find command instead.
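For comparison, the find equivalent (which copes with spaces in names for free) would be something like:
find "$1" -print                      # list everything under the given directory
find "$1" -name "some file.txt"       # or look for one particular (possibly space-containing) name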
If you are using bash 4 or later (which is likely unless you are running this under Mac OS X), you can use the ** operator.
rec () {
shopt -s globstar
for file in "$1"/**/*; do
echo "$file"
done
}
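Called with a directory, for example (the path is only an illustration):
rec /path/to/dir
it prints every file and directory below that point, since ** with globstar matches recursively.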
I'm trying to learn how to batch edit files and extract information from them. I've begun by trying to create some trial files and edit their names. I tried to search but couldn't find the problem I'm having anywhere.
If it's already answered, I'd be happy to be directed to that link.
So, I wrote the following code:
#!/bin/bash
mkdir -p ./trialscript
echo $1
i=1
while [ $i -le $1 ]
do
touch ./trialscript/testfile$i.dat
i=$(($i+1))
done
for f in ./trialscript/*.dat
do
echo $f
mv "$f" "$fhello.dat"
done
This doesn't seem to work, and I think it's because the echo output is like:
4
./trialscript/testfile1.dat
./trialscript/testfile2.dat
./trialscript/testfile3.dat
./trialscript/testfile4.dat
I just need the filename in f, not the complete path, and then to rename it.
Can someone suggest what is wrong in my code, and what's the correct way to do what I'm trying to do?
If you want to move the file, you have to use the path, too, otherwise mv wouldn't be able to find it.
The target specification for the mv command is more problematic, though. You're using
"$fhello.dat"
which, in fact, means "content of the $fhello variable plus the string .dat". How should the poor shell know where the seam is? Use
"${f}hello.dat"
to disambiguate.
Also, to extract parts of strings, see Parameter expansion in man bash. You can use ${f%/*} to only get the path, or ${f##*/} to only get the filename.
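Assuming the intent is to turn testfile1.dat into testfile1hello.dat (my reading of the goal, not something stated explicitly), the rename loop could be written as:
for f in ./trialscript/*.dat
do
    mv "$f" "${f%.dat}hello.dat"    # strip the .dat suffix, then append hello.dat
done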
I've been handed a project that consists of several dozen (probably over 100, I haven't counted) bash scripts. Most of the scripts make at least one call to another one of the scripts. I'd like to get the equivalent of a call graph where the nodes are the scripts instead of functions.
Is there any existing software to do this?
If not, does anybody have clever ideas for how to do this?
The best plan I could come up with was to enumerate the scripts and check whether the basenames are unique (they span multiple directories). If there are duplicate basenames, then cry, because the script paths are usually held in variables, so you may not be able to disambiguate. If they are unique, then grep for the names in the scripts and use those results to build up a graph. Use some tool (suggestions?) to visualize the graph.
Suggestions?
Wrap the shell itself with your own implementation, log who called your wrapper, and exec the original shell.
Yes, you have to actually run the scripts in order to identify which scripts are really used. Otherwise you would need a tool with the same knowledge as the shell engine itself to handle all the variable expansion, PATH lookups, etc. -- I have never heard of such a tool.
To visualize the calling graph, use GraphViz's dot format.
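A minimal sketch of such a wrapper (the .orig name and log path are assumptions, and swapping out /bin/bash like this needs great care):
#!/bin/bash.orig
# Hypothetical wrapper installed as /bin/bash after moving the real shell to /bin/bash.orig.
echo "$(date '+%F %T') ppid=$PPID invoked_as=$0 args=$*" >> /tmp/shell_calls.log
exec /bin/bash.orig "$@"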
Here's how I wound up doing it (disclaimer: a lot of this is hack-ish, so you may want to clean up if you're going to use it long-term)...
Assumptions:
- Current directory contains all scripts/binaries in question.
- Files for building the graph go in subdir call_graph.
Created the script call_graph/make_tgf.sh:
#!/bin/bash
# Run from dir with scripts and subdir call_graph
# Parameters:
# $1 = sources (default is call_graph/sources.txt)
# $2 = targets (default is call_graph/targets.txt)
SOURCES=$1
if [ "$SOURCES" == "" ]; then SOURCES=call_graph/sources.txt; fi
TARGETS=$2
if [ "$TARGETS" == "" ]; then TARGETS=call_graph/targets.txt; fi
if [ ! -d call_graph ]; then echo "Run from parent dir of call_graph" >&2; exit 1; fi
(
# cat call_graph/targets.txt
for file in `cat $SOURCES `
do
for target in `grep -v -E '^ *#' $file | grep -o -F -w -f $TARGETS | grep -v -w $file | sort | uniq`
do echo $file $target
done
done
)
Then, I ran the following (I wound up doing the scripts-only version):
cat /dev/null | tee call_graph/sources.txt > call_graph/targets.txt
for file in *
do
if [ -d "$file" ]; then continue; fi
echo $file >> call_graph/targets.txt
if file $file | grep text >/dev/null; then echo $file >> call_graph/sources.txt; fi
done
# For scripts only:
bash call_graph/make_tgf.sh call_graph/sources.txt call_graph/sources.txt > call_graph/scripts.tgf
# For scripts + binaries (binaries will be leaf nodes):
bash call_graph/make_tgf.sh > call_graph/scripts_and_bin.tgf
I then opened the resulting tgf file in yEd, and had yEd do the layout (Layout -> Hierarchical). I saved as graphml to separate the manually-editable file from the automatically-generated one.
I found that there were certain nodes that were not helpful to have in the graph, such as utility scripts/binaries that were called all over the place. So, I removed these from the sources/targets files and regenerated as necessary until I liked the node set.
Hope this helps somebody...
Insert a line at the beginning of each shell script, after the #! line, which logs a timestamp, the full pathname of the script, and the argument list.
Over time, you can mine this log to identify likely candidates, i.e. two lines logged very close together have a high probability of the first script calling the second.
This also allows you to focus on the scripts which are still actually in use.
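The inserted line could be as simple as (the log path is only an example):
echo "$(date '+%Y-%m-%d %H:%M:%S') $0 $*" >> /var/tmp/script_calls.log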
You could use an ed script
1a
log blah blah blah
.
wq
and run it like so:
find / -perm -u+x -exec ed {} \; < edscript
Make sure you test the find command with -print instead of the -exec clause. And / is probably not the path that you want to use. If you have to include bin directories, then you will probably need to switch to grep to identify the pathnames to include; once you have a file full of the right names, use xargs instead of find to run the script.
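A rough sketch of that grep/xargs variant (GNU xargs assumed; the filter and file names are only illustrations):
find / -perm -u+x -print > candidates.txt                  # collect executable paths first
grep '/opt/myscripts/' candidates.txt > wanted.txt         # hypothetical filter for the directories you care about
xargs -a wanted.txt -I{} sh -c 'ed "$1" < edscript' _ {}   # apply the ed script to each remaining file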