I have a script that was kicking off ~200 jobs, one for each sub-analysis. I realized that a job array would probably be much better for this, for several reasons. It seems simple enough, but it is not quite working for me. My input files are not numbered, so, following examples I've seen, I do this first:
INFILE=`sed -n ${SGE_TASK_ID}p <pathto/listOfFiles.txt`
My qsub command takes quite a few variables, as it is both pulling from and outputting to different directories. $res does not change, but $INFILE is what I am looping over.
qsub -q test.q -t 1-200 -V -sync y -wd ${res} -b y perl -I /master/lib/ myanalysis.pl -c ${res}/${INFILE}/configFile-${INFILE}.txt -o ${res}/${INFILE}/
Since this was not working, I was curious what exactly was being passed, so I echoed the command and saw that it only seems to expand up to the first time $INFILE is used. I get:
perl -I /master/lib/ myanalysis.pl -c mydirectory/fileABC/
instead of:
perl -I /master/lib/ myanalysis.pl -c mydirectory/fileABC/configFile-fileABC.txt -o mydirectory/fileABC/
Hoping for some clarity on this and welcome all suggestions. Thanks in advance!
UPDATE: It doesn't look like $SGE_TASK_ID is set on the cluster. I looked for any variable that could be used as an array task ID and couldn't find anything. If I find anything else I will update again.
Assuming you are using a grid engine variant, SGE_TASK_ID should be set within the job. It looks like you are expecting it to be set to some useful value before you run qsub, but it only exists in the environment of each running array task. Submitting a script like this would do roughly what you appear to be trying to do:
#!/bin/bash
# SGE_TASK_ID is set by the scheduler inside each array task (1..200),
# so the lookup into the file list has to happen here, not at submit time.
INFILE=$(sed -n "${SGE_TASK_ID}p" <pathto/listOfFiles.txt)
exec perl -I /master/lib/ myanalysis.pl -c "${res}/${INFILE}/configFile-${INFILE}.txt" -o "${res}/${INFILE}/"
Then submit this script with
res=${res} qsub -q test.q -t 1-200 -V -sync y -wd ${res} myscript.sh
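If you want to confirm that SGE_TASK_ID really only appears inside running tasks, a quick throwaway array should show it. This is a sketch assuming a standard grid engine setup: the queue name is reused from your command, -j y merges stderr into stdout, and -o /tmp is just an illustrative output path:

qsub -q test.q -t 1-3 -b y -j y -o /tmp bash -c 'echo "I am task $SGE_TASK_ID"'

Note the single quotes: the variable must survive until the job runs rather than being expanded by your submitting shell.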
I have a bash script that looks like the one below. $TOOL is another script which runs twice with different inputs (VAR1 and VAR2).
#Iteration 1
${TOOL} -ip1 ${VAR1} -ip2 ${FINAL_PML}/$1$2.txt -p ${IP} -output_format ${MODE} -o ${FINAL_MODE_DIR1}
rename mods mode_c_ ${FINAL_MODE_DIR1}/*.xml
#Iteration 2
${TOOL} -ip1 ${VAR2} -ip2 ${FINAL_PML}/$1$2.txt -p ${IP} -output_format ${MODE} -o ${FINAL_MODE_DIR2}
rename mods mode_c_ ${FINAL_MODE_DIR2}/*.xml
Can I make these 2 iterations run in parallel inside a bash script, without submitting it to a queue?
If I read this right, what you want is to run them in the background.
cf. https://linuxize.com/post/how-to-run-linux-commands-in-background/
More importantly, if you are going to be writing scripts, PLEASE read the following closely:
https://www.gnu.org/software/bash/manual/html_node/index.html#SEC_Contents
https://mywiki.wooledge.org/BashFAQ/001
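A minimal sketch using the lines from your own script: grouping each iteration with { ... } & runs it in the background, and wait blocks until both have finished.

#!/bin/bash
# Iteration 1, in the background
{
    ${TOOL} -ip1 ${VAR1} -ip2 ${FINAL_PML}/$1$2.txt -p ${IP} -output_format ${MODE} -o ${FINAL_MODE_DIR1}
    rename mods mode_c_ ${FINAL_MODE_DIR1}/*.xml
} &

# Iteration 2, in the background
{
    ${TOOL} -ip1 ${VAR2} -ip2 ${FINAL_PML}/$1$2.txt -p ${IP} -output_format ${MODE} -o ${FINAL_MODE_DIR2}
    rename mods mode_c_ ${FINAL_MODE_DIR2}/*.xml
} &

# Block until both background jobs complete
wait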
I know how to check if a script is already running (if pidof -o %PPID -x "scriptname.sh"; then...). But now I have a script that accepts inputs as flags, so it can be used in several different scenarios, many of which will probably run at the same time.
Example:
/opt/scripts/backup/tar.sh -d /directory1 -b /backup/dir -c /config/dir
and
/opt/scripts/backup/tar.sh -d /directory2 -b /backup/dir -c /config/dir
The above runs a backup script that I wrote; the flags are the parameters for the script: the directory being backed up, the backup location, and the configuration location. The two examples above are two different backups (directory1 and directory2) and therefore should be allowed to run simultaneously.
Is there any way for a script to check if it is being run and check if the running version is using the exact same parameters/flags?
The ps -Af command will show you all the processes running on your OS, along with the command line used to start them.
One solution:
# The [o] in the pattern stops grep from matching its own command line.
if ps auxwww | grep '/[o]pt/scripts/backup/tar.*/directory2'; then
    echo "running"
else
    echo "NOT running"
fi
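A different sketch, if you'd rather have the script guard itself and match the exact parameter set: take a non-blocking lock on a file named after a hash of all arguments. The /var/lock path and the hashing scheme here are illustrative assumptions, not part of the ps approach above.

#!/bin/bash
# Illustrative per-parameter-set guard: identical invocations hash to the
# same lock file and collide; different parameters get different locks.
lock="/var/lock/tar-backup.$(printf '%s\n' "$@" | md5sum | cut -d' ' -f1)"
exec 9>"$lock"            # open the lock file on file descriptor 9
if ! flock -n 9; then     # fail fast instead of waiting for the lock
    echo "an instance with these exact parameters is already running" >&2
    exit 1
fi
# ... backup work goes here; the lock is held until the script exits ...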
I have large pcapng files, and I want to split them based on my desired Wireshark filters. I want to split the files with a bash script using pcapsplitter, but when I use a loop, it always gives me the same file.
I have written a small script:
#!/bin/bash
for i in {57201..57206}
do
    mkdir destination/$i
done

tcp="tcp port "
for i in {57201..57206}
do
    tcp="$tcp$i"
    pcapsplitter -f file.pcapng -o destination/$i -m bpf-filter -p $tcp
done
The question is: can I use bash for my goal or not?
If yes, why does it not work?
Definitely, this is something Bash can do.
Regarding your script, the first thing that stands out is this line:
pcapsplitter -f file.pcapng -o destination/$i -m bpf-filter -p $tcp
where the value of $tcp is tcp port 57201 (and the following port numbers on later rounds). However, without quotes, word splitting applies, so you're actually passing only tcp to the -p parameter.
It should work better after you've changed this line to:
pcapsplitter -f file.pcapng -o destination/$i -m bpf-filter -p "$tcp"
NB: as a general rule, it's safer to double-quote variable expansions in Bash.
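You can see the word-splitting issue in isolation with printf, which prints one <...> per argument it receives:

tcp="tcp port 57201"
printf '<%s>\n' $tcp     # unquoted: three separate arguments <tcp> <port> <57201>
printf '<%s>\n' "$tcp"   # quoted: a single argument <tcp port 57201>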
NB2: you don't need those two for loops. (Also note that tcp="$tcp$i" keeps appending, so from the second round onward your filter would be a malformed string like tcp port 5720157202; the rewrite below builds it fresh each time.) Here is how I'd rewrite your script:
#!/bin/bash
for portNumber in {57201..57206}; do
    destinationDirectory="destination/$portNumber"
    mkdir "$destinationDirectory"
    thePparameter="tcp port $portNumber"
    pcapsplitter -f 'file.pcapng' -o "$destinationDirectory" -m bpf-filter -p "$thePparameter"
done
I'm trying to use ts/tsp (task spooler) to schedule idle tasks that I need done from time to time; it's OK if they don't complete due to a crash.
So far, I'm trying with a script like this:
the_args=(--long-arg /usr/share/lib --long-arg2 -j $j -o "'$path_o/'" -i "'$path_i'")
tsp -m -L "$jobname" bash -c '
echo task "$@"
cgexec -g cpu,freezer:execting exector "$@"
' "${the_args[@]}"
I want to run executor with the args given by the_args.
I've tried many alternatives, including:
tsp -m -L "$jobname" bash -c "
echo task ${the_args[@]}
cgexec -g cpu,freezer:execting exector ${the_args[@]}
"
I've also tried heredocs in various configurations; none of them worked.
Unfortunately, none of these approaches let me call the command with all the args. Some pass only the first element of the list (the ones shown above); others don't work at all.
What is the correct way to pass parameters into the deferred script?
The first argument following -c's script argument is used to set $0 in the shell; it is not included in "$@". You need to provide some dummy argument, since you probably don't care what $0 actually is.
Your quoting inside the_args also needs to be simplified.
the_args=(
    --long-arg /usr/share/lib
    --long-arg2
    -j "$j"
    -o "$path_o/"
    -i "$path_i"
)
tsp -m -L "$jobname" bash -c '
echo task "$@"
cgexec -g cpu,freezer:execting exector "$@"
' "" "${the_args[@]}"   # the empty "" lands in $0; the array fills $@
What I want to do: I want to find all the products (URLs) which are not redirected.
To get the final URL after redirection I'm using curl command as follows:
curl -Ls -o /dev/null -w %{url_effective} "$URL"
This works fine. Now I want to iterate over the URLs to find which ones are not redirected and display them as the output of the program. I have the following code:
result=""
productIds=$1
for productId in $(echo $productIds | sed "s/,/ /g")
do
echo "Checking product: $productId";
URL="http://localhost/?go=$productId";
updatedURL=`curl -Ls -o /dev/null -w %{url_effective} "$URL"`
echo "URL : $URL, UpdatedUrl: $updatedURL";
if [ "$URL" == "$updatedURL" ]
then
result="$result$productId,";
fi
done
The curl command works only for the first product. From the 2nd product to the last, URL and updatedURL always come out the same. I can't understand why: the productId changes on every iteration, so I don't think it can be anything related to caching.
I've also tried the following variants of the curl command:
updatedURL=$(curl -Ls -o /dev/null -w %{url_effective} "$URL")
updatedURL="$(curl -Ls -o /dev/null -w %{url_effective} "$URL")"
Edit: After trying debug mode and a lot of different approaches, I noticed a pattern: if I manually run the following in a terminal:
curl -Ls -o /dev/null -w %{url_effective} "http://localhost/?go=32123"
then that URL works fine from the shell script afterwards. But if I don't run it manually first, curl does not work for those products via the shell script either.
Just add #!/bin/bash as the first line of the script; it then produces the required output (most likely because the == inside [ ... ] is a bashism that a plain POSIX sh does not support). Invocation should then look like this: bash file.sh 123,456,789,221
Invocation via the Bourne shell, sh file.sh 123,456,789,221, would require code changes. Do let me know if you need that too :)
I would suggest changing your loop to something like this:
IFS=, read -ra productIds <<<"$1"
for productId in "${productIds[@]}"; do
    url=http://localhost/?go=$productId
    num_redirects=$(curl -Ls -o /dev/null -w %{num_redirects} "$url")
    if [ "$num_redirects" -eq 0 ]; then
        printf 'productId %s has no redirects\n' "$productId"
    fi
done
This splits the first argument passed to the script into an array, using a comma as the delimiter. The number of redirects is stored in a variable, and when that number is zero, the message is printed.
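For instance, the splitting step on its own behaves like this:

IFS=, read -ra productIds <<<"123,456,789"
printf '%s\n' "${productIds[@]}"   # prints 123, 456 and 789, one per line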
I have to admit that I can't see anything inherently broken with your original approach so it's possible that there is something extra going on that we're not aware of. If you could provide a reproducible test case then we would be able to help you more effectively.