Using Sun Grid Engine, is there a way to run a command on the master node from within the qsub submission script? If I run /bin/hostname from within a qsub script, I'm already on one of the queue computers, not the master node. In short, I want to run qstat on the job I just submitted, automatically. If I try to run qstat from one of the worker nodes, I get an error message telling me the worker node is neither a submit nor an admin host.
I realize I can do this from outside of the qsub script, but the script defines many useful variables, such as the job name and the SGE job ID.
If your aim is simply to get details on the submitted job, you may be better off using the environment variables provided by the submitting client, i.e., available within the job script; a short example follows the excerpt below. See the ENVIRONMENT VARIABLES section of the qsub manual page (man qsub):
ENVIRONMENTAL VARIABLES
SGE_ROOT Specifies the location of the Sun Grid Engine
standard configuration files.
SGE_CELL If set, specifies the default Sun Grid Engine
cell. To address a Sun Grid Engine cell qsub,
qsh, qlogin or qalter use (in the order of
precedence):
The name of the cell specified in the
environment variable SGE_CELL, if it is
set.
The name of the default cell, i.e.
default.
SGE_DEBUG_LEVEL
If set, specifies that debug information
should be written to stderr. In addition the
level of detail in which debug information is
generated is defined.
SGE_QMASTER_PORT
If set, specifies the tcp port on which
sge_qmaster(8) is expected to listen for communication requests. Most installations will
use a services map entry for the service
"sge_qmaster" instead to define that port.
DISPLAY For qsh jobs the DISPLAY has to be specified
at job submission. If the DISPLAY is not set
by using the -display or the -v switch, the
contents of the DISPLAY environment variable
are used as default.
In addition to those environment variables specified to be
exported to the job via the -v or the -V option (see above)
qsub, qsh, and qlogin add the following variables with the
indicated values to the variable list:
SGE_O_HOME the home directory of the submitting client.
SGE_O_HOST the name of the host on which the submitting
client is running.
SGE_O_LOGNAME the LOGNAME of the submitting client.
SGE_O_MAIL the MAIL of the submitting client. This is
the mail directory of the submitting client.
SGE_O_PATH the executable search path of the submitting
client.
SGE_O_SHELL the SHELL of the submitting client.
SGE_O_TZ the time zone of the submitting client.
SGE_O_WORKDIR the absolute path of the current working
directory of the submitting client.
Furthermore, Sun Grid Engine sets additional variables into
the job's environment, as listed below.
ARC
SGE_ARCH The Sun Grid Engine architecture name of the
node on which the job is running. The name is
compiled into the sge_execd(8) binary.
SGE_CKPT_ENV Specifies the checkpointing environment (as
selected with the -ckpt option) under which a
checkpointing job executes. Only set for
checkpointing jobs.
SGE_CKPT_DIR Only set for checkpointing jobs. Contains
path ckpt_dir (see checkpoint(5) ) of the
checkpoint interface.
SGE_STDERR_PATH
the pathname of the file to which the standard error stream of the job is diverted.
Commonly used for enhancing the output with
error messages from prolog, epilog, parallel
environment start/stop or checkpointing
scripts.
SGE_STDOUT_PATH
the pathname of the file to which the standard output stream of the job is diverted.
Commonly used for enhancing the output with
messages from prolog, epilog, parallel
environment start/stop or checkpointing
scripts.
SGE_STDIN_PATH the pathname of the file from which the standard input stream of the job is taken. This
variable might be used in combination with
SGE_O_HOST in prolog/epilog scripts to
transfer the input file from the submit to
the execution host.
SGE_JOB_SPOOL_DIR
The directory used by sge_shepherd(8) to
store job related data during job execution.
This directory is owned by root or by a Sun
Grid Engine administrative account and commonly is not open for read or write access to
regular users.
SGE_TASK_ID The index number of the current array job
task (see -t option above). This is a unique
number in each array job and can be used to
reference different input data records, for
example. This environment variable is set to
"undefined" for non-array jobs. It is possible
to change the predefined value of this
variable with -v or -V (see options above).
SGE_TASK_FIRST The index number of the first array job task
(see -t option above). It is possible to
change the predefined value of this variable
with -v or -V (see options above).
SGE_TASK_LAST The index number of the last array job task
(see -t option above). It is possible to
change the predefined value of this variable
with -v or -V (see options above).
SGE_TASK_STEPSIZE
The step size of the array job specification
(see -t option above). It is possible to
change the predefined value of this variable
with -v or -V (see options above).
ENVIRONMENT The ENVIRONMENT variable is set to BATCH to
identify that the job is being executed under
Sun Grid Engine control.
HOME The user's home directory path from the
passwd(5) file.
HOSTNAME The hostname of the node on which the job is
running.
JOB_ID A unique identifier assigned by the
sge_qmaster(8) when the job was submitted.
The job ID is a decimal integer in the range
1 to 99999.
JOB_NAME The job name. For batch jobs or jobs submitted
by qrsh with a command, the job name is
built as the basename of the qsub script filename
or of the qrsh command, respectively. For interactive jobs
it is set to `INTERACTIVE' for qsh jobs,
`QLOGIN' for qlogin jobs and `QRLOGIN' for
qrsh jobs without a command.
This default may be overwritten by the -N option.
JOB_SCRIPT The path to the job script which is executed.
The value can not be overwritten by the -v or
-V option.
LOGNAME The user's login name from the passwd(5)
file.
NHOSTS The number of hosts in use by a parallel job.
NQUEUES The number of queues allocated for the job
(always 1 for serial jobs).
NSLOTS The number of queue slots in use by a parallel job.
PATH A default shell search path of:
/usr/local/bin:/usr/ucb:/bin:/usr/bin
SGE_BINARY_PATH
The path where the Sun Grid Engine binaries
are installed. The value is the concatenation
of the cluster configuration value
binary_path and the architecture name from the
$SGE_ARCH environment variable.
PE The parallel environment under which the job
executes (for parallel jobs only).
PE_HOSTFILE The path of a file containing the definition
of the virtual parallel machine assigned to a
parallel job by Sun Grid Engine. See the
description of the $pe_hostfile parameter in
sge_pe(5) for details on the format of this
file. The environment variable is only available for parallel jobs.
QUEUE The name of the cluster queue in which the
job is running.
REQUEST Available for batch jobs only.
The request name of a job as specified with
the -N switch (see above) or taken as the
name of the job script file.
RESTARTED This variable is set to 1 if a job was restarted
either after a system crash or after a
migration in case of a checkpointing job. The
variable has the value 0 otherwise.
SHELL The user's login shell from the passwd(5)
file. Note: This is not necessarily the shell
in use for the job.
TMPDIR The absolute path to the job's temporary
working directory.
TMP The same as TMPDIR; provided for compatibility with NQS.
TZ The time zone variable imported from
sge_execd(8) if set.
USER The user's login name from the passwd(5)
file.
SGE_JSV_TIMEOUT
If the response time of the client JSV is
greater than this timeout value, then the JSV
will attempt to be re-started. The default
value is 10 seconds, and this value must be
greater than 0. If the timeout has been
reached, the JSV will only try to re-start
once, if the timeout is reached again an
error will occur.
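For instance, a submission script can read these variables directly; here's a minimal sketch (the log path is a placeholder):
#!/bin/sh
#$ -S /bin/sh
# Record which job this is, using variables SGE sets for every job.
echo "job $JOB_ID ($JOB_NAME) started on $HOSTNAME, submitted from $SGE_O_HOST" >> /tmp/job-trace.log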
The client commands must be accessible from the node where the job runs. You can try supplying the full path to qstat, which may be the same as where it resides on the head node; a sketch follows below. If it isn't found there, you'll have to install it on the compute nodes (or ask the admin to do that).
Edit: some admins don't like to allow this, since "qstat spam" may overload the server on a busy enough system. If you can call it, do so mindfully: be polite and don't call it every few seconds.
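If the binaries live in the same place on the compute nodes, something like this inside the job script may work (it uses the $SGE_BINARY_PATH variable from the table above; the node still has to be allowed to talk to the qmaster):
# Query the job we are currently running as; this fails unless the
# node is permitted to contact sge_qmaster.
"$SGE_BINARY_PATH/qstat" -j "$JOB_ID"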
Related
I would like to run a command in my pipeline and then save the result in a variable to be used later on in the pipeline. The command I want to run is
gh release view | head -n 1 | cut -f 1
I can log into GitHub and everything else, so that is not a problem. My only issue is saving the result to a variable and using that variable.
How can I do this?
Unfortunately not. You must write the contents of the variable to a file and use inputs and outputs to communicate between tasks. If you need to use the output between jobs, you'll also need a resource, as described in this excerpt from https://docs.concourse.farm/power-tutorial/00-core-concepts:
When inputs are passed between steps within a job they can remain just
that: inputs/outputs. For passing inputs/outputs between jobs, you
must use resources. A resource is an input/output set whose state is
retrieved/stored externally by a job, e.g. a git repo or an S3 object.
Of course, once a task receives an input from the previous task, it can then be read into a variable.
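As a rough sketch (the release-info output directory name is hypothetical; each task's declared inputs/outputs are what make the file visible across the task boundary):
# task 1: its task config declares an output named "release-info"
gh release view | head -n 1 | cut -f 1 > release-info/tag
# task 2: its task config declares "release-info" as an input
tag=$(cat release-info/tag)
echo "working with release $tag"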
Given a project that consists of a large number of bash scripts launched periodically from crontab, how can one track the execution time of each script?
There is a straightforward approach: edit each of those files by adding date calls.
But what I really want is some kind of daemon that could track execution time and submit results somewhere several times a day.
So the question is:
Is it possible to gather information about the execution time of 200 bash scripts without editing each of them?
The time command is considered a fallback solution, if nothing better can be found.
Depending on your system's cron implementation, you may define the log level of the cron daemon. For Ubuntu's default vixie-cron, setting the log level will log the start and end of each job execution, which can then be analyzed.
On current LTS Ubuntu this works by defining the log level in /etc/init/cron,
appending the -L 3 option to the exec line so that it looks like:
exec cron -L 3
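With that in place, each job's start and end appear as timestamped lines in the cron log, and the duration is the difference between the two. As a minimal starting point (the log path and message markers vary by distribution, so both are assumptions here):
# pull out the start/end lines for one script, then pair up the timestamps
grep 'CRON\[' /var/log/syslog | grep 'myscript.sh'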
You could change your cron to run your scripts under time:
time scriptname
And pipe the output to your logs.
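For example, a crontab entry using the standalone /usr/bin/time (the -o/-a output-file options belong to GNU time, not the shell builtin; the paths here are placeholders) could look like:
# m h dom mon dow command
0 * * * * /usr/bin/time -a -o /var/log/script-times.log /path/to/scriptname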
I need to create some sort of fail-safe in one of my scripts to prevent it from being re-executed immediately after failure. Typically, when a script fails, our support team reruns it using a third-party tool, which is usually OK, but it should not happen for this particular script.
I was going to echo a timestamp into the log, and then add a condition: if the current timestamp is less than 2 hours past the one in the log, the script will exit itself. I'm sure this idea will work. However, this got me curious to see whether there is a way to pull the last run time of the script from the system itself, or whether there is an alternate method of preventing the script from being immediately rerun.
It's a SunOS Unix system, using the ksh shell.
Just do it as you proposed: save the date to some file and check it at the script start (a ksh sketch follows below). You can:
check the last line (as a date string itself)
or check the last modification time of the file (i.e., when the last date command modified it)
Another common method is to create a designated lock file or PID file, such as /var/run/script.pid. Its content is usually the PID (and hostname, if needed) of the process that created it. The file's modification time tells you when it was created, and from its content you can check the recorded PID. If that PID no longer exists (e.g., the previous process died) and the file's modification time is older than X minutes, you can start the script again.
This method is good mainly because you can simply use cron plus some script_starter.sh that periodically checks the script's running status and restarts it when needed.
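A minimal ksh sketch of the timestamp approach (the stamp-file path and the two-hour window are placeholders, and it assumes a date that supports %s, which older SunOS releases may lack):
#!/bin/ksh
# Refuse to start if the stamp file was written less than 2 hours ago.
STAMP=/var/tmp/myscript.laststamp
now=$(date +%s)                     # seconds since the epoch
if [ -f "$STAMP" ]; then
    last=$(cat "$STAMP")
    if [ $(( now - last )) -lt 7200 ]; then
        echo "last run was less than 2 hours ago; exiting" >&2
        exit 1
    fi
fi
echo "$now" > "$STAMP"
# ... rest of the script ...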
If you want to use system facilities (and have root access), you can use accton + lastcomm.
I don't know SunOS, but it probably has those programs. accton starts system-wide accounting of all programs (it needs root), and lastcomm command_name | tail -n 1 shows when command_name was last executed.
Check man lastcomm for the command-line switches.
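A quick sketch (the accounting-file path is an assumption and varies by system):
# as root: start process accounting, writing records to this file
accton /var/account/pacct
# later: show the most recent execution record for the script
# (-f points lastcomm at the same accounting file)
lastcomm -f /var/account/pacct myscript.sh | tail -n 1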
I'm having trouble passing command parameters remotely to a "ForceCommand" program in ssh.
In my remote server I have this configuration in sshd_config:
Match User user_1
ForceCommand /home/user_1/user_1_shell
The user_1_shell program limits the commands the user can execute; in this case, I only allow the user to execute "set_new_mode param1 param2". Any other commands will be ignored.
So I expect that when a client logs in via ssh such as this one:
ssh user_1@remotehost "set_new_mode param1 param2"
The user_1_shell program seems to be executed, but the parameter string doesn't seem to be passed.
Maybe, I should be asking, does ForceCommand actually support this?
If yes, any suggestions on how I could make it work?
Thanks.
I found the answer. The remote server captures the parameter string and saves it in the "$SSH_ORIGINAL_COMMAND" environment variable.
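A minimal sketch of how user_1_shell might use it (the allowed command comes from the question; note the security caveat in the next answer before relying on anything like this):
#!/bin/sh
# Only run the one permitted command; refuse everything else.
case "$SSH_ORIGINAL_COMMAND" in
    "set_new_mode "*)
        # Word-splitting turns the string into command + arguments.
        exec $SSH_ORIGINAL_COMMAND
        ;;
    *)
        echo "command not allowed" >&2
        exit 1
        ;;
esac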
As already answered, the command line sent from the ssh client is put into the SSH_ORIGINAL_COMMAND environment variable; only the ForceCommand is executed.
If you use the information in SSH_ORIGINAL_COMMAND in your ForceCommand, you must take care of the security implications: an attacker can augment your command with arbitrary additional commands by sending, e.g., ; rm -rf / at the end of the command line.
This article shows a generic script which can be used to lock down allowed parameters. It also contains links to relevant information.
The described method (the 'only' script) works as follows:
Make the 'only' script the ForceCommand, and give it the allowed
command as its parameter. Note that more than one allowed command may be used.
Put a .onlyrules file into the home directory of user_1 and fill it with rules (regular expressions) which are matched against the
command line sent by the ssh client.
Your example would look like:
Match User user_1
ForceCommand /usr/local/bin/only /home/user_1/user_1_shell
and if, for example, you want to allow as parameters only 'set_new_mode' with exactly two arbitrary alphanumeric parameters, the .onlyrules file would look like this:
\:^/home/user_1/user_1_shell set_new_mode [[:alnum:]]\{1,\} [[:alnum:]]\{1,\}$:{p;q}
Note that when sending the command to the server you must use the whole command line:
/home/user_1/user_1_shell set_new_mode param1 param2
'only' looks up the command on the server and uses its name for matching the rules. If any of these checks fail, the command is not run.
[Disclosure: I wrote sshdo which is described below]
There's a program called sshdo for doing this. It controls which commands may be executed via incoming ssh connections. It's available for download at:
http://raf.org/sshdo/ (read manual pages here)
https://github.com/raforg/sshdo/
It has a learning mode to allow all commands that are attempted, and a --learn option to produce the configuration needed to allow learned commands permanently. Then learning mode can be turned off and any other commands will not be executed.
It also has an --unlearn option to stop allowing commands that are no longer in use so as to maintain strict least privilege as requirements change over time.
It can also be configured manually.
It is very fussy about what it allows. It won't allow a command with arbitrary arguments; only complete shell commands can be allowed. But it does support simple patterns to represent similar commands that vary only in the digits that appear on the command line (e.g. sequence numbers or date/time stamps).
It's like a firewall or whitelisting control for ssh commands.
I've got a shell script that runs genReport.sh in order to create a .pdf formatted report, and it works perfectly when it's run from the command line. The data source for the report is a ClearQuest database.
When it's run from a cron job, the .pdf file is created, except that only the various report and column headers are displayed and the data of the report is missing. There are no errors reported to STDERR during the execution of the script.
This screams "environment variable" to me.
Currently, the shell script is defining the following:
CQ_HOME
BIRT_HOME
ODBCINI
ODBCINST
LD_LIBRARY_PATH
If it's an environmental thing, what part of the environment am I missing?
Without seeing the scripts, it's only guesswork. It could be a quoting issue, or something having to do with a relative path to a file or executable that should be absolute. Often the problem is that the directories listed in $PATH differ between cron's environment and the user's. One thing you can do to aid in the diagnosis is add this line to your script:
env > /tmp/someoutputfilename.$$
and run the script from the command line and from cron and compare.
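Then compare the two snapshots side by side; for instance (the filenames stand in for whatever the two runs of the line above produced, and process substitution is bash syntax):
diff <(sort /tmp/env_from_cli) <(sort /tmp/env_from_cron)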
The magic for making this run turned out to be evaluating the output of the clearquest -dumpsh command, which in turn required that the TZ variable be set. That command outputs a dozen or so variables.
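In script form that might look like the following (the TZ value is a placeholder; pick your site's zone):
# TZ must be set before clearquest -dumpsh will run correctly
TZ=US/Eastern
export TZ
# evaluate the dozen-or-so variable assignments that -dumpsh emits
eval "$(clearquest -dumpsh)"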