cannot chdir to /path/to/job_submit_dir/ in SGE cluster

I use qsub to submit a job to the SGE cluster. In the job file, the following are defined:
#!/bin/bash
#
#$ -V
#$ -cwd
#$ -j y
#$ -S /bin/bash
#
The -cwd flag tells SGE to run the job in the directory from which it was submitted. All job files contain the settings above.
Some of the jobs run correctly after submission, but others end up in the Eqw state in qstat. Using qstat -j job_id to show the detailed status reports:
failed changing into working directory because:
error: can't chdir to /path/to/job_submit_dir
But sometimes, if I cd into the directory and resubmit the job, it works.
I've searched on Google, and this site provided a solution, but it doesn't work for my setup.
Could anyone give some advice, please?

It appears that this instance of the error may be caused by excessive writes to network-mounted storage:
https://www.icts.uiowa.edu/confluence/display/ICTSit/Best+practices+for+high+throughput+jobs
To work around it, try redirecting output to local storage on each execution node, or to /dev/null.
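For example, a job script along these lines (a minimal sketch; $TMPDIR is the node-local scratch directory SGE provides per job, $SGE_O_WORKDIR is the submit directory, and my_program stands in for the real command) keeps the heavy writes off the network mount:
#!/bin/bash
#$ -V
#$ -cwd
#$ -j y
#$ -S /bin/bash
# run and write output on node-local scratch instead of the NFS-mounted submit dir
cd "$TMPDIR"
my_program > output.log 2>&1
# copy the results back to the shared filesystem once, at the end
cp output.log "$SGE_O_WORKDIR/"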

Related

How to execute gcloud command in bash script from crontab -e

I am trying to execute some gcloud commands in a bash script from crontab. The script executes successfully from the command shell but not from the cron job.
I have tried with:
Setting the full path to gcloud, like:
/etc/bash_completion.d/gcloud
/home/Arturo/.config/gcloud
/usr/bin/gcloud
/usr/lib/google-cloud-sdk/bin/gcloud
Setting at the beginning of the script:
/bin/bash -l
Setting in the crontab:
51 21 30 5 6 CLOUDSDK_PYTHON=/usr/bin/python2.7;
/home/myuser/folder1/myscript.sh param1 param2 param3 -f >>
/home/myuser/folder1/mylog.txt
Setting inside the script:
export CLOUDSDK_PYTHON=/usr/bin/python2.7
Setting inside the script:
sudo ln -s /home/myuser/google-cloud-sdk/bin/gcloud /usr/bin/gcloud
Version: Ubuntu 18.04.3 LTS
Command to execute: gcloud config set project myproject
but nothing is working; maybe I am doing something wrong. I hope you can help me.
You need to set your user in your crontab for it to run the gcloud command. As explained in this other post here, you need to modify your crontab to fetch the data in your Cloud SDK for the execution to occur properly - it doesn't seem that you have made this configuration.
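As a sketch of that crontab configuration (the SDK bin path is an assumption about where it is installed; the schedule and script paths come from the question):
CLOUDSDK_PYTHON=/usr/bin/python2.7
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/home/myuser/google-cloud-sdk/bin
51 21 30 5 6 /home/myuser/folder1/myscript.sh param1 param2 param3 -f >> /home/myuser/folder1/mylog.txt 2>&1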
Another option I would recommend trying is Cloud Scheduler, which can run your gcloud commands for you. This way, you can use gcloud for your cron jobs in a more integrated and easy way. You can find more information about this option here: Creating and configuring cron jobs
Let me know if the information helped you!
I found my error. The problem was only with the command "gcloud dns record-sets transaction start"; the other commands were executing successfully but logging nothing, which made me think none of them were running. This command creates a temp file (e.g. transaction.yaml), and that file could not be created in gcloud's default path (snap/bin), but the log simply didn't record anything. I had to specify the path and name for that file with the flag --transaction-file=mytransaction.yaml. Thanks for your support and ideas.
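In other words, the failing command needed an explicit, writable location for its transaction file. A sketch of the corrected call (the zone name and file path are placeholders):
gcloud dns record-sets transaction start \
    --zone=myzone \
    --transaction-file=/home/myuser/mytransaction.yaml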
I have run into the same issue before. I fixed it by forcing the profile to load in my script.sh, which loads the gcloud environment variables with it. Example below:
#!/bin/bash
source /etc/profile
gcloud config set project myproject
echo "Project set to myproject."
I hope this can help others with similar issues in the future, as it also helped me when scaling GKE nodes between 0 and 4 on a schedule.
Adding the line below to the shell script fixed my issue:
#Execute user profile
source /root/.bash_profile

unrecognized arguments when executing script via crontab

I have my crontab set up as follows (this is inside a Docker container).
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
SHELL=/bin/bash
5 * * * * bash /usr/local/bin/process-logs > /proc/1/fd/1 2>/proc/1/fd/2
The /usr/local/bin/process-logs is designed to expose some MongoDB logs using mtools to a simple web server.
The problematic part of the script is fairly simple. raw_name is archive_name without the file extension.
aws s3 cp "s3://${s3_bucket}/${file_name}" "${archive_name}"
gunzip "${archive_name}"
mlogvis --no-browser "${raw_name}"
If I manually run the command as specified in the crontab config above
bash /usr/local/bin/process-logs > /proc/1/fd/1 2>/proc/1/fd/2
It all works as expected (this is the expected output from mlogvis)
...
copying /usr/local/lib/python3.5/dist-packages/mtools/data/index.html to /some/path/mongod.log-20190313-1552456862.html
...
When the script gets triggered via crontab it throws the following error
usage: mlogvis [-h] [--version] [--no-progressbar] [--no-browser] [--out OUT]
[--line-max LINE_MAX]
mlogvis: error: unrecognized arguments: mongod.log-20190313-1552460462
The mlogvis command that caused the above error (actual values, not parameters):
mlogvis --no-browser "mongod.log-20190313-1552460462"
Again if I run this command myself it all works as expected.
mlogvis: http://blog.rueckstiess.com/mtools/mlogvis.html
I don't believe this is an issue with the file not having the correct permissions or not existing, as mlogvis produces a different error in those conditions. I've also tested removing the '-' characters from the file name, thinking it might be parsing the parts as arguments, but it made no difference.
I know cron's execution environment isn't the same as that of the user I tested the script as. I've set PATH to match the user's, and when the container starts up I execute env >> /etc/environment so all the environment variables are properly set.
Does anyone know of a way to debug this, or has anyone encountered something similar? All other components of the script work; mlogvis is the only failure, and it is core to the purpose of this job.
Summary of what I've tried as a fix:
Set environment and PATH for cron execution to be the same as the user I tested the script as
Replace - in file name(s) to see if it was parsing the parts as arguments
Hardcode a filename with full permissions to see if it was permissions-related
Manually run the script -> this works
Manually run the mlogvis command in isolation -> this works
Try loading /home/user/.bash_profile before executing the script and try again. I suspect you have a missing PATH entry or some other environment variable that is not set.
source /home/user/.bash_profile
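A related trick to debug further (a sketch; the script path comes from the question): capture the environment cron actually uses, then replay the script under it from an interactive shell.
# in the crontab, dump cron's environment once (remove after debugging)
* * * * * env > /tmp/cron-env.txt
# then replay the script under that exact environment; this assumes
# none of the environment values contain spaces
env -i $(xargs < /tmp/cron-env.txt) bash /usr/local/bin/process-logs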
Please post your complete script, because while executing via crontab, you have to be sure your raw_name variable was properly created. As it seems to depend on archive_name, posting some more context can help us to help you.
In any case, if you are using bash, you can try something like :
aws s3 cp "s3://${s3_bucket}/${file_name}" "${archive_name}"
gunzip "${archive_name}"
# here you have to be sure that archive_name is correct
raw_name_2=${archive_name%.*}   # strip only the last extension (.gz); %% would cut at the first dot
mlogvis --no-browser "${raw_name_2}"
It is not going to solve your issue, but it will probably take you closer to the right path.

AMBER16: running in parallel through job submission not working

I am trying to run AMBER16 on a cluster, but it is not working when the job is submitted through the scheduler using the qsub command. However, the job does work when run locally on the front node. I have all of the PATHs set correctly in the .bashrc file. The following is my code:
#!/bin/bash
#PBS -N testAmber
#PBS -l nodes=1:ppn=12
#PBS -l walltime=05:00:00
cd working_directory
export AMBERHOME=/state/partition1/apps/amber16
source $AMBERHOME/amber.sh
mpirun -np 12 $AMBERHOME/bin/sander.MPI -O -i ...etc...
When this is submitted, I get the following error messages:
.../.bashrc: line 46: /state/partition1/apps/amber16/amber.sh: No such file or directory
/var/spool/torque/mom_priv/jobs/...: line 16: /state/partition1/apps/amber16/amber.sh: No such file or directory
mpirun was unable to launch the specified application as it could not access
or execute an executable:
Executable: /state/partition1/apps/amber16/bin/sander.MPI
Node: compute-0-8.local
while attempting to start process rank 0.
I've been trying to find a solution for hours, but am stuck. Please help :(

Cron job does not start [duplicate]

I have a cron job that I want to execute every 5 minutes:
0,5,10,15,20,25,30,35,40,45,50,55 * * * * /scr_temp/scheduleSpider.sh
In /var/spool/cron/crontabs/root
The cron should execute a shell script:
#!/bin/sh
if [ ! -f "sync.txt" ]; then
touch "sync.txt"
chmod 777 /scr_temp
curl someLink
fi
That works fine from the command line, but not from cron. The cron job itself is started, but the script does not run.
I read about the path problem, but I don't really understand it. I set up a cron job that writes some env data to a file. This is the output:
HOME=/root
LOGNAME=root
PATH=/usr/bin:/bin
SHELL=/bin/sh
If I execute the env command on the command line, I get the following output for PATH:
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games
What path do I have to set in my shell script?
Your $PATH is fine; leave it alone. On Ubuntu, all the commands you're invoking (touch, chmod, curl) are in /bin and/or /usr/bin.
How did you set up the cron job? Did you run crontab some-file as root?
It seems that /etc/crontab is the usual mechanism for running cron commands as root. On my Ubuntu system, sudo crontab -l says no crontab for root. Running crontab as root, as you would for any non-root account, should be ok, but you might consider using /etc/crontab instead. Note that it uses a different syntax than an ordinary crontab, as explained in the comments at the top of /etc/crontab:
$ head -5 /etc/crontab
# /etc/crontab: system-wide crontab
# Unlike any other crontab you don't have to run the `crontab'
# command to install the new version when you edit this file
# and files in /etc/cron.d. These files also have username fields,
# that none of the other crontabs do.
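An /etc/crontab entry for this job would therefore carry the extra user field (a sketch, assuming it should run as root):
*/5 * * * * root /scr_temp/scheduleSpider.sh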
Run sudo crontab -l. Does it show your command?
Temporarily modify your script so it always produces some visible output. For example, add the following right after the #!/bin/sh:
echo "Running scheduleSpider.sh at \`date\`" >> /tmp/scheduleSpider.sh.log
and see what's in /tmp/scheduleSpider.sh.log after a few minutes. (You can set the command to run every minute so you don't have to wait as long for results.) If that works (it should), you can add more echo commands to your script to see in detail what it's doing.
It looks like your script is designed to run only once; it creates the sync.txt file to prevent it from running again. That could be the root (ahem) of your problem. Was that your intent? Did you mean to delete sync.txt after running the command, and just forgot to do it?
root's home directory on Ubuntu is /root. The first time your script runs, it should create /root/sync.txt. Does that file exist? If so, how old is it?
Note that curl someLink (assuming someLink is a valid URL) will just dump the content from the specified link to standard output. Was that your intent (it will show up as e-mail to root)? Or did you just not show us the entire command?
First: you can substitute the first field with */5 (see man 5 crontab)
Second: have cron mail the output to your email address by entering MAILTO=your@email.address in your crontab. If the script has any output, it'll be mailed. Alternatively, you may have a local mailbox in which you can find the cron output (usually $MAIL).
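Combining both suggestions, the crontab could look like this (the address is a placeholder):
MAILTO=you@example.com
*/5 * * * * /scr_temp/scheduleSpider.sh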
A better syntax for your cron entry is:
*/5 * * * * /scr_temp/scheduleSpider.sh
Also, check the permissions on your scheduleSpider.sh file. Cron runs under a different user than the one you are likely using to execute the program interactively, so it may be that cron lacks permission. Try chmod 777 for now, just to check.
I suggest to:
check that /scr_temp/scheduleSpider.sh has its executable bit set
set PATH properly inside your script, or use absolute paths to commands (/bin/touch instead of touch)
specify an absolute path to the sync.txt file (or compute it relative to the script); see the sketch below
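A version of the script applying these suggestions (a sketch; someLink and the /scr_temp paths come from the question):
#!/bin/sh
# work from an absolute directory so relative paths are unambiguous
cd /scr_temp || exit 1
if [ ! -f /scr_temp/sync.txt ]; then
    /bin/touch /scr_temp/sync.txt
    /bin/chmod 777 /scr_temp
    /usr/bin/curl someLink
fi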
Have you added the command via crontab -e, or just by editing the crontab file? You should use crontab -e to get it correctly installed.
Set the working directory in the cron script; it probably doesn't execute things where you think it does.
You should add /bin/sh before the absolute path of your script.
*/5 * * * * /bin/sh /scr_temp/scheduleSpider.sh

why my svn backup shell script, works fine in terminal, but fails in crontab?

I have an svn backup script on a Red Hat Linux machine; let's call it svnbackup.sh.
It works fine when I run it in a terminal.
But when I put it into crontab, it does not bring svnserve back up, even though the data is backed up correctly.
What is going wrong?
killall svnserve
tar -zcf /backup/svndir.tar.gz /svndir
svnserve -d -r /svndir
Usually, 'environment' is the problem in a cron job that works when run 'at the terminal' but not when run by cron. Most probably, your PATH is not set to include the directory where you keep svnserve.
Either use an absolute pathname for svnserve or set PATH appropriately in the script.
You can debug, in part, by adding a line such as:
env > /tmp/cron.job.env
to your script to see exactly how little environment is set when your cron job is run.
If you are trying to backup a live version of a repository, you probably should be using svnadmin hotcopy. That said, here are a few possibilities that come to mind as to what might be wrong:
You've put each of those statements as separate entries in your crontab (can't tell from the Q).
The svnserve command takes a password, which cron, in turn, cannot supply.
The svnserve command blocks or hangs indefinitely and gets killed by cron.
The command svnserve is not in your PATH in cron.
Assuming that svnserve does not take a password, this might fix the problem:
#!/bin/bash
# backup_and_restart_svnserve.sh
export PATH=/bin:/sbin:/usr/bin:/usr/local/bin  # set up your path here
# note: tar takes the archive name first, then the directory to back up
killall svnserve && \
tar -zcf /backup/svndir.tar.gz /svndir && \
svnserve -d -r /svndir >/dev/null 2>&1 &
Now, use "backup_and_restart_svnserve.sh" as the script to execute. Since it runs in the background, it should hopefully continue running even when cron executes the next task.
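For completeness, a minimal sketch of the svnadmin hotcopy approach mentioned above (the destination path is an assumption and must not already exist):
svnadmin hotcopy /svndir /backup/svndir-hotcopy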
