CircleCI: comma separated files for parallel tests - arguments

It's very easy to set up parallel tests for RSpec or Cucumber on CircleCI:
test:
  override:
    - bundle exec rspec:
        parallel: true
        files:
          - spec/unit/sample.rb # can be a direct path to file
          - spec/**/*.rb        # or a glob (ruby globs)
However, I'm trying to split Protractor tests, and Protractor takes comma-separated files as a command-line argument instead of space-separated files. How can I achieve this without too much work?

You can try adding the following to your circle.yml:
test:
  override:
    - run () { echo "$@" | tr ' ' ',' | xargs protractor; }; run:
        parallel: true
        files: ..
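Outside CircleCI, you can sanity-check the comma-joining part of that trick on its own. A minimal sketch (the function name, the file names, and the --specs invocation are illustrative, not taken from the question):

join_with_commas () { echo "$@" | tr ' ' ','; }   # join the arguments with commas

join_with_commas spec/a_spec.js spec/b_spec.js
# prints: spec/a_spec.js,spec/b_spec.js

# Protractor accepts a comma-separated spec list, e.g.:
# protractor --specs="$(join_with_commas spec/*_spec.js)"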


How to prevent yq removing comments and empty lines?

In Edit yaml objects in array with yq. Speed up Terminalizer's terminal cast (record) I asked how to edit YAML with yq and received a good answer. But by default yq removes comments and empty lines. How can I prevent this behavior?
input.yml
# Specify a command to be executed
# like `/bin/bash -l`, `ls`, or any other commands
# the default is bash for Linux
# or powershell.exe for Windows
command: fish -l

# Specify the current working directory path
# the default is the current working directory path
cwd: null

# Export additional ENV variables
env:
  recording: true

# Explicitly set the number of columns
# or use `auto` to take the current
# number of columns of your shell
cols: 110
execute
yq -y . input.yml
result
command: fish -l
cwd: null
env:
  recording: true
cols: 110
In some limited cases you could use diff/patch along with yq.
For example, if input.yml contains your input text, the commands
$ yq -y . input.yml > input.yml.1
$ yq -y .env.recording=false input.yml > input.yml.2
$ diff input.yml.1 input.yml.2 > input.yml.diff
$ patch -o input.yml.new input.yml < input.yml.diff
create a file input.yml.new with comments preserved but recording changed to false:
# Specify a command to be executed
# like `/bin/bash -l`, `ls`, or any other commands
# the default is bash for Linux
# or powershell.exe for Windows
command: fish -l

# Specify the current working directory path
# the default is the current working directory path
cwd: null

# Export additional ENV variables
env:
  recording: false

# Explicitly set the number of columns
# or use `auto` to take the current
# number of columns of your shell
cols: 110
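For reuse, the edit/diff/patch round trip above can be wrapped in a small helper. This is only a sketch (the function name and the temporary file naming are made up) and it assumes the kislyuk yq (the -y flag) plus GNU diff and patch:

yq_edit_keep_comments () {
  local filter="$1" in_file="$2" out_file="$3"
  # identity dump and edited dump (both lose comments and blank lines)
  yq -y . "$in_file" > "$in_file.1"
  yq -y "$filter" "$in_file" > "$in_file.2"
  # the diff between the two dumps contains only the intended change,
  # so patching the original file keeps its comments and blank lines
  diff "$in_file.1" "$in_file.2" | patch -s -o "$out_file" "$in_file"
  rm -f "$in_file.1" "$in_file.2"
}

# Usage (hypothetical): write input.yml.new with recording set to false
# yq_edit_keep_comments '.env.recording=false' input.yml input.yml.new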
This is an improvement over the diff/patch approach in the answer above.
In my case, diff -B and diff -wB were not enough: they still do not keep blank lines, and they generate the entire file difference as a single chunk instead of many small chunks.
Here is an example of the input (test.yml):
# This file is automatically generated
#

content-index:

  timestamp: 1970-01-01T00:00:00Z

  entries:

    - dirs:

        - dir: dir-1/dir-2

          files:

            - file: file-1.dat
              md5-hash:
              timestamp: 1970-01-01T00:00:00Z

            - file: file-2.dat
              md5-hash:
              timestamp:

            - file: file-3.dat
              md5-hash:
              timestamp:

        - dir: dir-1/dir-2/dir-3

          files:

            - file: file-1.dat
              md5-hash:
              timestamp:

            - file: file-2.dat
              md5-hash:
              timestamp:
If you try to edit a field and generate the difference file:
diff -B test.yml <(yq -y ".\"content-index\".timestamp=\"2022-01-01T00:00:00Z\"" test.yml)
It keeps removing blank lines:
5,7c2
<
< timestamp: 1970-01-01T00:00:00Z
<
---
> timestamp: '2022-01-01T00:00:00Z'
It also puts null everywhere in place of empty fields and changes the format of the remaining timestamp fields (which means you would have to quote them as '...' strings to retain them as-is):
17,19c8,9
< md5-hash:
< timestamp: 1970-01-01T00:00:00Z
<
---
> md5-hash: null
> timestamp: '1970-01-01T00:00:00+00:00'
The -wB flags change the difference file from a single chunk into multiple chunks, but blank lines are still removed.
Here is a mention of that diff issue: https://unix.stackexchange.com/questions/423186/diff-how-to-ignore-empty-lines/423188#423188
To fix that, you have to combine it with grep:
diff -wB <(grep -vE '^\s*$' test.yml) <(yq -y ".\"content-index\".timestamp=\"2022-01-01T00:00:00Z\"" test.yml)
Nevertheless, it still removes comments:
1,2d0
< # This file is automatically generated
< #
Here is a solution for that: https://unix.stackexchange.com/questions/17040/how-to-diff-files-ignoring-comments-lines-starting-with/17044#17044
So the complete one-liner is:
diff -wB <(grep -vE '^\s*(#|$)' test.yml) <(yq -y ".\"content-index\".timestamp=\"2022-01-01T00:00:00Z\"" test.yml) | patch -o - test.yml 2>/dev/null
Where 2>/dev/null serves to suppress patch warnings like:
Hunk #1 succeeded at 6 (offset 4 lines).
To avoid it in real code, you can use the -s flag instead:
... | patch -s -o ...
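As a rough sketch, the complete one-liner can also be wrapped in a function (the name and arguments are invented here) so the yq expression and the file name are not repeated:

yq_edit_keep_comments_and_blanks () {
  local expr="$1" file="$2"
  # compare the comment/blank-stripped original with the yq-edited dump,
  # then patch the untouched original so comments and blank lines survive
  diff -wB <(grep -vE '^\s*(#|$)' "$file") <(yq -y "$expr" "$file") \
    | patch -s -o - "$file"
}

# Usage (hypothetical): prints the patched YAML to stdout
# yq_edit_keep_comments_and_blanks '."content-index".timestamp="2022-01-01T00:00:00Z"' test.yml > test-patched.yml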
Update:
CAUTION:
This is the previous implementation; it has an issue with line additions to the YAML file and is left here only as an example. See the Update 2 section for a more reliable implementation.
There is a better implementation, written as a shell script for a GitHub Actions composite action.
GitHub Composite action: https://github.com/andry81-devops/gh-action--accum-content
Bash scripts (previous implementation):
Implementation: https://github.com/andry81-devops/gh-workflow/blob/ee5d2d5b6bf59299e39baa16bb85357cf34a8561/bash/github/init-yq-workflow.sh
Example of usage: https://github.com/andry81-devops/gh-workflow/blob/9b9d01a9b60a65d6c3c29f5b4b200409fc6a0aed/bash/cache/accum-content.sh
The implementation can use either of 2 yq implementations:
https://github.com/kislyuk/yq - a jq wrapper (default Cygwin distribution)
https://github.com/mikefarah/yq - Go implementation (default Ubuntu 20.04 distribution)
Search for: yq_edit, yq_diff, yq_patch functions
Update 2:
There is another discussion with some more reliable workarounds:
yq write strips completely blank lines from the output : https://github.com/mikefarah/yq/issues/515
Bash scripts (new implementation):
Implementation: https://github.com/andry81-devops/gh-workflow/blob/master/bash/github/init-yq-workflow.sh
Example of usage: https://github.com/andry81-devops/gh-workflow/blob/master/bash/cache/accum-content.sh
# Usage example:
#
yq_edit "<prefix-name>" "<suffix-name>" "<input-yaml>" "$TEMP_DIR/<output-yaml-edited>" \
  <list-of-yq-eval-strings> && \
yq_diff "$TEMP_DIR/<output-yaml-edited>" "<input-yaml>" "$TEMP_DIR/<output-diff-edited>" && \
yq_restore_edited_uniform_diff "$TEMP_DIR/<output-diff-edited>" "$TEMP_DIR/<output-diff-edited-restored>" && \
yq_patch "$TEMP_DIR/<output-yaml-edited>" "$TEMP_DIR/<output-diff-edited-restored>" "$TEMP_DIR/<output-yaml-edited-restored>" "<output-yaml>"
#
# , where:
#
# <prefix-name> - prefix name part for files in the temporary directory
# <suffix-name> - suffix name part for files in the temporary directory
#
# <input-yaml> - input yaml file path
# <output-yaml> - output yaml file path
#
# <output-yaml-edited> - output file name of edited yaml
# <output-diff-edited> - output file name of difference file generated from edited yaml
# <output-diff-edited-restored> - output file name of restored difference file generated from original difference file
# <output-yaml-edited-restored> - output file name of restored yaml file stored as intermediate temporary file
Example with test.yml from above:
export GH_WORKFLOW_ROOT='<path-to-gh-workflow-root>' # https://github.com/andry81-devops/gh-workflow
source "$GH_WORKFLOW_ROOT/bash/github/init-yq-workflow.sh"
[[ -d "./temp" ]] || mkdir "./temp"
export TEMP_DIR="./temp"
yq_edit 'content-index' 'edit' "test.yml" "$TEMP_DIR/test-edited.yml" \
".\"content-index\".timestamp=\"2022-01-01T00:00:00Z\"" && \
yq_diff "$TEMP_DIR/test-edited.yml" "test.yml" "$TEMP_DIR/test-edited.diff" && \
yq_restore_edited_uniform_diff "$TEMP_DIR/test-edited.diff" "$TEMP_DIR/test-edited-restored.diff" && \
yq_patch "$TEMP_DIR/test-edited.yml" "$TEMP_DIR/test-edited-restored.diff" "$TEMP_DIR/test.yml" "test-patched.yml" || exit $?
PROs:
Can restore blank lines together with standalone comment lines: # ...
Can restore line end comments: key: value # ...
Can detect line removals, changes and additions altogether.
CONs:
Because it relies on guess logic, it may leave artifacts or make invalid corrections.
Does not restore line end comments on lines where the YAML data has changed.

Using a wildcard to generate output file, not just input file, failing with sh

Assume this file tree:
$PWD
____dir1
________file.one
________file.two
____dir2
________file.one
________file.two
I want to replace contents of file.one in each directory, with contents of corresponding file.two in that same directory.
I use the following code to accomplish this simple task:
cat ./*1/*.one > ./*1/*.two
cat ./*2/*.one > ./*2/*.two
It works as intended, BUT when I try to execute it like this:
/bin/sh -c 'cat ./*1/*.one > ./*1/*.two'
the following error occurs:
/bin/sh: ./*1/*.two: No such file or directory
NOTE: When I use Bash instead of sh, everything works, even with the -c flag.
The code in question is extremely fragile. If you generate your output name from your input file name, and avoid assuming that it already exists (and that there's exactly one match), you're on much firmer ground:
set -- ./*1/*.one                # replace "$@" with the list of matches for ./*1/*.one
for arg do                       # this is shorthand for: for arg in "$@"; do
  [ -e "$arg" ] || continue      # ignore files that don't exist (e.g. failed matches)
  cat "$arg" >"${arg%.one}.two"  # use a PE to replace .one with .two in the output name
done
...which is to say, the following works as-intended on all POSIX-compliant shells:
sh -c 'set -- ./*1/*.one; for arg do [ -e "$arg" ] || continue; cat "$arg" >"${arg%.one}.two"; done'
See the bash-hackers' wiki page on parameter expansion for details on the ${arg%.one} syntax.
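For example, the ${arg%.one} expansion simply strips the .one suffix, so the output path is derived directly from the input path; a quick check with a made-up name:

arg=./dir1/file.one
echo "${arg%.one}.two"   # prints: ./dir1/file.two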

Append a command in a Bash script using a colon and an optional positional parameter?

I have a small Bash script that runs a command that is essentially one long string containing environment variables, ending in a path to a file.
function ios-run-test() {
  thing="DEVICE_TARGET=abcde12345 DEVICE_ENDPOINT=http://192.168.1.1:37265 BUNDLE_ID='com.app.iPhoneEnterprise' DISABLE_ADS=1 env=$1 DISABLE_LOGIN_INTERSTITIALS=1 bundle exec cucumber --tags ~#wip --tags ~#ignore --tags ~#android ~/Automation/ios-automation/features/$2.feature"
  if [[ $3 ]]; then
    add_this=$3
    thing="${thing:$add_this}"
  fi
  echo ${thing}
  eval universal-variables
  eval ${thing}
}
Sometimes that command may end with a :some_integer, such as DEVICE_TARGET=abcde12345 DEVICE_ENDPOINT=http://192.168.1.1:37265 BUNDLE_ID='com.app.iPhoneEnterprise' DISABLE_ADS=1 env=production DISABLE_LOGIN_INTERSTITIALS=1 bundle exec cucumber --tags ~#wip --tags ~#ignore --tags ~#android ~/Automation/ios-automation/features/login.feature:5. This is where my problem lies. I discovered Substring Extraction, which is pretty neat, but it is causing this if statement to fail:
if [[ $3 ]]; then
  add_this=$3
  thing="${thing:$add_this}"
fi
Instead of appending ":$3" to $thing, this removes the first $3 characters of $thing. Is there some other way to take an optional positional parameter and append it to the command?
If you just want to append :$3, then change this line:
thing="${thing:$add_this}"
To this:
thing="${thing}:$add_this"
Appending values in Bash works by simply writing them one after the other.
The braces are optional in this example,
so simply thing="$thing:$add_this" is equivalent.
Inside ${...} you can perform various advanced operations based on a variable,
but none of that is necessary or relevant for your use case.
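A quick side-by-side of the two expansions involved (the values are made up for the demo):

thing="run tests"
n=3
echo "${thing:$n}"   # substring from offset 3 -> " tests"
echo "${thing}:$n"   # append a colon and $n   -> "run tests:3"
echo "$thing:$n"     # braces optional here    -> "run tests:3"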

How to use gnu-parallel for processing a script with two inputs?

I am trying to run a Python script that takes two inputs, as follows. I have ~300 of these input pairs, so I wonder if somebody could advise how to run them with parallel.
The single run looks like:
python stable.py KOG_1.fan KOG_1.fasta > KOG_1.stable
My test with parallel which is not working:
ls *.fan; ls *.fasta | parallel python stable.py {} {} > {.}.stable
but how do I specify that it has to run with _1.fan and _1.fasta, then _2.fan and _2.fasta, and so on, until _300.fan and _300.fasta?
This is not really a Python question, it's a question about GNU parallel. You could try this if all files are prefixed with "KOG_":
seq 1 300 | parallel python stable.py KOG_{}.fan KOG_{}.fasta ">" KOG_{.}.stable
The quotes around the redirect (">") are important, unless you want all of the output in one file.
To handle generic prefixes:
ls *fan *fasta | parallel --max-lines=2 python stable.py {1} {2} ">" {1.}.stable
This uses the --max-lines option to take 2 lines per command. Of course this works only if the *.fan and *.fasta files match up, i.e. there must be the same number of each, and the numbers need to match up; otherwise you'll end up pairing files that shouldn't be paired. If that is a problem, you can figure out a command that will more robustly feed pairs to parallel.
Try:
parallel python stable.py {} {.}.fasta '>' {.}.stable ::: *fan
I recommend splitting this task into two steps:
Create a jobs file containing all the commands you want to run with parallel.
You need to create a text file jobs.txt similar to the one presented below:
python stable.py KOG_1.fan KOG_1.fasta > KOG_1.stable
python stable.py KOG_2.fan KOG_2.fasta > KOG_2.stable
python stable.py KOG_3.fan KOG_3.fasta > KOG_3.stable
python stable.py KOG_4.fan KOG_4.fasta > KOG_4.stable
...
python stable.py KOG_300.fan KOG_300.fasta > KOG_300.stable
If all your files are prefixed with KOG, you can build up this file this way:
for I in `seq 300`; do echo "python stable.py KOG_$I.fan KOG_$I.fasta > KOG_$I.stable" >> jobs.txt; done;
Run parallel using the jobs file
Once you have the jobs file, you just need to run the following command:
parallel -j4 < jobs.txt
Note that -j4 indicates that at most 4 commands from your jobs file will be running in parallel. You can adjust that according to the number of cores available on your computer.
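If you want to verify the pairing before launching ~300 real runs, GNU parallel's --dry-run option prints the commands instead of executing them; a small sketch using only the first 5 pairs (the file names are assumed, not checked):

for I in `seq 5`; do echo "python stable.py KOG_$I.fan KOG_$I.fasta > KOG_$I.stable"; done > jobs.txt
parallel --dry-run -j4 < jobs.txt   # prints the 5 commands without running them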

BASH Set multiple variables depending on number of lines then pass them as command arguments

I have this output after running a command:
$some_command
id
274725
275050
275065
277560
277801
277814
277817
277862
277863
I need to ignore the first line and set all the numbers as variables:
var1=274725
var2=275050
...
...
var9=277863
then I want to run command with those variables as parameters:
$some_cmd $var1 $var2 ...$varN
With my very limited knowledge, I have an idea of how to figure out the number of variables:
varN1=`some_command | wc -l`
varsN=`expr $varN1 - 1`
But this won't be the easiest solution, and I don't know how to write the loop that sets the variables or how to construct the command. Perhaps something like this?:
for i in (1...$varsN)
do
var$i=.... ?
It looks like you're putting them into variables simply so you can pass them to another function/program. That's not actually necessary.
xargs, the program that runs multiple child tasks on groups of input parameters, has an option to limit the number of parameters it passes to each of its tasks:
pax> echo 'id
274725
275050
275065
277560
277801
277814
277817
277862
277863' | awk 'NR>1{print}' | xargs -n 4 echo runcommand
runcommand 274725 275050 275065 277560
runcommand 277801 277814 277817 277862
runcommand 277863
The awk command simply strips out the first line; then xargs -n 4 runs the child task (echo runcommand here) for each group of up to that size.
To run a command with all the numbers in your files as separate arguments, you can use
yourcommand $(some_command | sed 1d)
Note that this assumes your lines are all numbers, and not strings with spaces or other ASCII characters that you want to preserve.
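As a quick illustration of how the unquoted command substitution splits the numbers into separate arguments (some_command is faked here for the demo):

some_command () { printf '%s\n' id 274725 275050 275065; }   # stand-in for the real command

printf 'arg: %s\n' $(some_command | sed 1d)
# arg: 274725
# arg: 275050
# arg: 275065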
