TL;DR
Trying to pass a computation of the form $(($LSB_JOBINDEX-1)) to a cluster call, but getting an error:
$((2-1)): syntax error: operand expected (error token is "$((2-1))")
How do I escape this correctly, or what alternative command can I use so that this works?
Detailed:
To automate my workflow, I am currently trying to write a script that automatically issues bsub commands in a predefined order.
Some of these commands are array jobs, each of which is supposed to work on one file.
If done without the cluster calls, it would look something like this:
samplearray=(sample0.fasta sample1.fasta) # array of input files
for s in "${samplearray[@]}"; do
    echo "$s" # some command on $s
done
For the cluster call I want to use an array job; the command looks like this:
bsub -J test[1-2] 'samplearray=(sample0.fastq sample1.fastq)' echo '${samplearray[$(($LSB_JOBINDEX-1))]}'
which launches two jobs with LSB_JOBINDEX set to 1 or 2, respectively, which is why I need to subtract 1 to index the array correctly.
The problem is in the $((...)) part: what is actually executed on the node is ${samplearray[$\(\($LSB_JOBINDEX-1\)\)]}, which does not trigger the computation but instead throws an error:
$((2-1)): syntax error: operand expected (error token is "$((2-1))")
What am I doing wrong here? I have tried other ways of escaping and quoting, but this is the closest I have gotten to a correct solution.
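One approach that may sidestep the escaping problem entirely (a sketch, not a verified LSF recipe) is to hand the node a single single-quoted command string and let bash on the node do all of the expansion:

# Sketch: the whole command is one single-quoted string, so the submitting
# shell passes $((...)) through untouched and bash on the node expands it.
bsub -J "test[1-2]" /bin/bash -c 'samplearray=(sample0.fastq sample1.fastq); echo "${samplearray[$((LSB_JOBINDEX-1))]}"'

Since LSB_JOBINDEX is only set once the job runs on the node, nothing in the string needs to be escaped for the submitting shell.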
Related
I'm trying to only run a Semaphore CI job when it's being run by the automated scheduler.
For this purpose, the docs provide the environment variable SEMAPHORE_WORKFLOW_TRIGGERED_BY_SCHEDULE.
I'm trying to use this in the when clause, e.g.:
dependencies: [System Tests]
run:
  when: "SEMAPHORE_WORKFLOW_TRIGGERED_BY_SCHEDULE = true"
This does not work; I get the error:
Unprocessable YAML file.
Error: "Invalid 'when' condition on path '#/blocks/5/run/when':
Syntax error on line 1. - Invalid expression on the left of '=' operator."
Referring to the env variable with a $ doesn't work either; apparently $ is an invalid character in the when clause.
How can I only run the CI job when it's scheduled?
when doesn't support environment variables. It only supports the notation specified here: https://docs.semaphoreci.com/reference/conditions-reference/.
Let me get your use case straight. Do you want to run specific blocks based on whether the pipeline is executed from the scheduler or not? If that's the case, when will not help you, unfortunately.
However, I think I can provide a workaround using an IF statement. You could wrap all of the job's commands in an IF statement that checks SEMAPHORE_WORKFLOW_TRIGGERED_BY_SCHEDULE. That way, if the variable matches, you run the commands; if not, you skip them all and the job ends successfully right there, effectively "skipping" the job with almost no execution time. Something like this (please double-check the syntax):
if [ "$SEMAPHORE_WORKFLOW_TRIGGERED_BY_SCHEDULE" = "true" ]; then run commands; else echo "skip commands"; fi
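Spelled out over multiple lines, that might look like this (the command here is a placeholder, not from the original job):

# Run the job's real work only when the scheduler triggered the workflow.
if [ "$SEMAPHORE_WORKFLOW_TRIGGERED_BY_SCHEDULE" = "true" ]; then
    ./run-system-tests.sh   # placeholder for the job's actual commands
else
    echo "Not triggered by the scheduler; skipping."
fi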
I'm using bash 4.4.19(1)-release.
At the start of my program I read customer configuration values from the command line, configuration file(s), and the environment (in decreasing order of precedence). I validate these configuration values against internal definitions, failing out if required values are missing or if the customer values don't match against accepted regular expressions. This approach is a hard requirement and I'm stuck using BASH for this.
The whole configuration process involves the parsing of several YAML files and takes about a second to complete. I'd like to only have to do this once in order to preserve performance. Upon completion, all of the configured values are placed in a global associative array declared as follows:
declare -gA CONFIG_VALUES
A basic helper function has been written for accessing this array:
# A wrapper for accessing the CONFIG_VALUES array.
function get_config_value {
    local key="${1^^}"
    local output
    output="${CONFIG_VALUES[${key}]}"
    echo "$output"
}
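For illustration (the key and value here are examples, not from the original post), the helper is used like this:

CONFIG_VALUES[SOME_FILE]="path/to/some/file"   # populated during the config parse
value="$(get_config_value some_file)"          # ${1^^} upcases the key to SOME_FILE
echo "$value"                                  # -> path/to/some/file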
This works perfectly fine when all of the commands are run within the same shell. This even works when the get_config_value function is called from a child process. Where this breaks down is when it's called from a child process and the value in the array contains slashes. This leads to errors such as the following (line 156 is "output="${CONFIG_VALUES[${key}]}"):
config.sh: line 156: path/to/some/file: syntax error: operand expected (error token is "/to/some/file")
This is particularly obnoxious because it seems to be reading the value "path/to/some/file" just fine. It simply decides to announce a syntax error after doing so and falls over dead instead of echoing the value.
I've been trying to circumvent this by running the array lookup in a subshell, capturing the syntax failure, and grepping it for the value I need:
# A wrapper for accessing the CONFIG_VALUES array.
function get_config_value {
    local key="${1^^}"
    local output
    if output="$(echo "${CONFIG_VALUES[${key}]}" 2>&1)"; then
        echo "$output"
    else
        grep -oP "(?<=: ).*(?=: syntax error: operand expected)" <<< "$output"
    fi
}
Unfortunately, it seems that BASH won't let me ignore the "syntax error" like that. I'm not sure where to go from here (well... Python, but I don't get to make that decision).
Any ideas?
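For what it's worth, bash only emits that "operand expected" wording from its arithmetic parser, so something in the child process appears to be evaluating the value as an arithmetic expression. One plausible cause (an assumption, since the calling code isn't shown): declare -gA doesn't survive into child processes because bash cannot export arrays, so the lookup there is no longer operating on an associative array. The message itself is easy to reproduce by feeding a slash-leading string to arithmetic evaluation:

echo $((/to/some/file))
# bash: /to/some/file: syntax error: operand expected (error token is "/to/some/file")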
I'm trying to serialize an array in bash and then export it:
function serialize
{
    for i in ${1[@]}; do
        ret+=$i" "
    done
    return ${ret::-1}
}
MEASUREMENT_OUTPUT_FILES=( t1DQ.txt t1CS.txt t2RXe.txt t2e.txt )
export MEASUREMENT_OUTPUT_FILES=${serialize MEASUREMENT_OUTPUT_FILES[@]}
The code produces the following error:
MEASUREMENT_OUTPUT_FILES=${serialize MEASUREMENT_OUTPUT_FILES[@]}: bad substitution
Any idea what the correct syntax would be? (The error is in the last line, the one starting with export.)
I believe you want:
export MEASUREMENT_OUTPUT_FILES=$(serialize "${MEASUREMENT_OUTPUT_FILES[@]}")
(where $(...) is a notation for command substitution).
That said, your command is actually equivalent to
export MEASUREMENT_OUTPUT_FILES="${MEASUREMENT_OUTPUT_FILES[*]}"
so you don't need the serialize function unless you want to improve your serialization logic. (Which you should consider doing, IMHO: just joining with a space is error-prone, because what if one of the arguments includes a space?)
Edited to add: Also, I don't know how we all missed this before, but this:
return ${ret::-1}
actually needs to be this:
echo "${ret::-1}"
or this:
printf %s "${ret::-1}"
since return is for setting the exit status of a function, which must be an integer. (It's intended for indicating success, zero, vs. failure, nonzero, though some commands assign special meanings to multiple nonzero values.) What you want is for your function to "print" the files, so you can capture them.
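If you do want a space-safe serializer, here's a sketch (my illustration, not part of the original answer) using printf %q, which shell-quotes each element:

# Space-safe sketch: %q shell-quotes each element, so values containing
# spaces survive the round trip.
function serialize {
    local out="" i
    for i in "$@"; do
        out+="$(printf '%q' "$i") "
    done
    printf '%s' "${out% }"
}

export MEASUREMENT_OUTPUT_FILES=$(serialize "${MEASUREMENT_OUTPUT_FILES[@]}")
# Later, to deserialize: eval "restored=( $MEASUREMENT_OUTPUT_FILES )"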
I want to share a variable across rundeck job steps.
Initialized a job option "target_files"
Set the variable on STEP 1.
RD_OPTION_TARGET_FILES=$(some bash command)
echo $RD_OPTION_TARGET_FILES
The value is printed here.
Read the variable from STEP 2.
echo $RD_OPTION_TARGET_FILES
STEP 2 doesn't recognize the variable set in STEP 1.
What's a good way of doing this on rundeck other than using environment variables?
The detailed procedure from RUNDECK 2.9+:
1) set the values - three methods:
1.a) use a "global variable" workflow step type
e.g. fill in: Group:="export", Name:="varOne", Value:="hello"
1.b) add to the workflow a "global log filter" (the Data Capture Plugin cited by 'Amos' here) which takes a regular expression that is evaluated on job step log outputs. For instance with a job step command like:
echo "CaptureThis:varTwo=world"
and a global log filter pattern like:
"CaptureThis:(.*?)=(.*)"
('Name Data' field not needed unless you supply a single capturing group in the pattern)
1.c) use a workflow Data Step to define multiple variables explicitly. Example contents:
varThree=foo
varFour=bar
2) get the values back:
you must use the syntax ${ctx.name} in command strings and args, and #ctx.name# within INLINE scripts. In our example, with a job step command or inline script line like:
echo "values : #export.varOne#, #data.varTwo#, #stub.varThree#, #stub.varFour#"
you'll echo the four values.
The context is implicitly 'data' for method 1.b and 'stub' for method 1.c.
Note that a Data Step is quite limited! It only lets you use the #stub.name# notation within inline scripts. Value substitution is not performed in remote script files, and notations like ${stub.name} are not available in job step command strings or arguments.
Since Rundeck 2.9, there has been a Data Capture Plugin that allows passing data between job steps.
The plugin is contained in the Rundeck application by default.
Data capture plugin to match a regular expression in a step’s log output and pass the values to later steps
For details, see Data Capture/Data Passing between steps (published 03 Aug 2017).
For inline job scripts there are really only two ways: (1) export the value to the environment, or (2) have step 1 write the value to a temporary file and have step 2 read it from there; a sketch of the second option follows below.
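A minimal sketch of option 2 (the file path is just an example):

# STEP 1 inline script: persist the value for later steps.
echo "$RD_OPTION_TARGET_FILES" > /tmp/target_files.txt

# STEP 2 inline script: read it back.
RD_OPTION_TARGET_FILES=$(cat /tmp/target_files.txt)
echo "$RD_OPTION_TARGET_FILES"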
If you are using the "Scriptfile or URL" method, you may be able to execute the step 2 script from within script 1 as a workaround, like this:
Script1
#!/bin/bash
. ./script2
In the above case script2 will execute in the same session as script1, so the variables and values are preserved.
EDIT
Earlier there were no such options, but plugins have since become available, so check Amos's answer.
I know something like this is possible:
out = `echo 1`
$?.to_i == 0 or raise "Failed"
Yet I'm unable to merge these two statements so that the output is captured into a variable and the command fails (also printing the captured output) if the shell command returns an error.
Preferably in one line, if possible. Something like
out = `echo 1` && $?.to_i == 0 or raise "Failed. Output: " + out
only prettier.
Look at the Open3 class. It has a number of methods that will let you do what you want.
In particular, capture2 is the closest to what you're doing. From the docs:
::capture2 captures the standard output of a command.
stdout_str, status = Open3.capture2([env,] cmd... [, opts])
Pay attention to that optional env parameter. Without that your called application will have no environment information so you might want to consider passing in the ENV hash, allowing the child to have the same environment settings as the running code. If you want to restrict what is passed you can selectively add key/value pairs to a hash, or use ENV.dup then delete selected key/value pairs.
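For instance, a minimal version of the one-liner from the question using capture2 (a sketch; echo 1 stands in for the real command):

require 'open3'

# capture2 returns the command's stdout and a Process::Status object.
out, status = Open3.capture2('echo 1')
raise "Failed. Output: #{out}" unless status.success?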