I am using GNU Parallel to create a file of Python jobs. I would like the file to look like this:
job_num, args
001, -a 1 -b 2
002, -a 1 -b 4
003, -a 2 -b 2
004, -a 2 -b 4
The idea is that each group of args can be configured at file-generation time while keeping zero-padded job numbers.
Here is one thing I tried:
parallel --rpl '{0#} $_=sprintf("%02d",$job->seq())' echo {0#}, -a {1} -b {2} ::: 1 2 ::: 2 4
This results in:
01 01, -a 1 -b 2
02 02, -a 1 -b 4
03 03, -a 2 -b 2
04 04, -a 2 -b 4
What I did not expect was the doubled job number (with 3 input sources it prints 3 copies of the job number). Any thoughts on how to make this work?
Tested on versions 20170322 & 20210222.
Related posts (tried the contents of each):
GNU Parallel with sequence number `{#}` and `-n` option
Linux shell script to add leading zeros to file names
Something like this:
parallel --rpl '{0#} 1 $f=1+int((log(total_jobs())/log(10))); $_=sprintf("%0${f}d",seq())' echo '{0#}, -a {1} -b {2}' ::: {1..9} ::: {1..12}
1+int((log(total_jobs())/log(10))) computes the width of total_jobs() in digits.
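Putting that together for the original two-source example, a minimal sketch that also writes the header line might look like the following (the fixed three-digit width, the -k for ordered output, and the jobs.csv file name are illustrative choices, not part of the question):
echo 'job_num, args' > jobs.csv
parallel -k --rpl '{0#} $_=sprintf("%03d",seq())' echo '{0#}, -a {1} -b {2}' ::: 1 2 ::: 2 4 >> jobs.csv
# jobs.csv should then contain lines from "001, -a 1 -b 2" through "004, -a 2 -b 4"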
I have a directory with >500 multifasta files. I want to use the same program (cd-hit-est) to cluster sequences in each of the files and then save the output in another directory. I want each output file to have the same name as the original input file.
for file in /dir/*.fasta;
do
echo "$file";
cd-hit-est -i $file -o /anotherdir/${file} -c 0.98 -n 9 -d 0 -M 120000 -T 32;
done
I get partial output and then an error:
...
^M# comparing sequences from 33876 to 33910
.................---------- new table with 34 representatives
^M# comparing sequences from 33910 to 33943
.................---------- new table with 33 representatives
^M# comparing sequences from 33943 to 33975
................---------- new table with 32 representatives
^M# comparing sequences from 33975 to 34006
................---------- new table with 31 representatives
^M# comparing sequences from 34006 to 34036
...............---------- new table with 30 representatives
^M# comparing sequences from 34036 to 34066
...............---------- new table with 30 representatives
^M# comparing sequences from 34066 to 35059
.....................
Fatal Error:
file opening failed
Program halted !!
---------- new table with 993 representatives
35059 finished 34719 clusters
No output file was produced. Could anyone help me understand where I am making a mistake?
doit() {
  file="$1"
  echo "$file"
  cd-hit-est -i "$file" -o "/anotherdir/$(basename "$file")" -c 0.98 -n 9 -d 0 -M 120000 -T 32
}
env_parallel doit ::: /dir/*.fasta
OK, it seems that I have an answer now; posting it in case somebody is looking for something similar.
for file in /dir/*.fasta;
do
echo "$file";
cd-hit-est -i "$file" -o /anotherdir/$(basename "$transcriptome") -c 0.98 -n 9 -d 0 -M 120000 -T 32;
done
Naming the output file with basename, rather than with the full input path, did the trick.
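For reference, the broken version reused the full input path inside the output path, which points at a directory that does not exist under /anotherdir; a quick sketch with a hypothetical file name shows the difference:
file=/dir/sample.fasta
echo /anotherdir/${file}               # prints /anotherdir//dir/sample.fasta -- no such directory, so cd-hit-est cannot open the output file
echo /anotherdir/$(basename "$file")   # prints /anotherdir/sample.fasta -- what we actually want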
I have a file with 100s of filenames:
ax
bx
cx
...
...
...
112z
I want to split this file into files with 10 filenames each.
TRIAL 1 does not work: split -a 2 -d l 10 MASTERLIST
TRIAL 2 works: split -a 2 -d -l 10 MASTERLIST LIST_
But I want the file numbering to start from 01 instead of 00. How can I do this? I know I have to use this:
-d, --numeric-suffixes[=FROM] use numeric suffixes instead of alphabetic.
FROM changes the start value (default 0).
But I am not sure how to use the FROM syntax.
Link: http://man7.org/linux/man-pages/man1/split.1.html
split -a 2 --numeric-suffixes=1 -l 10 MASTERLIST
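With the LIST_ prefix from TRIAL 2 added back, this numbers the output files from 01:
split -a 2 --numeric-suffixes=1 -l 10 MASTERLIST LIST_
# creates LIST_01, LIST_02, LIST_03, ... each holding 10 lines of MASTERLIST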
I am writing a bash script named safeDel.sh with base functionalities including:
file [file1, file2, file3...]
-l
-t
-d
-m
-k
-r arg
For the single-letter arguments I am using the built-in getopts, which works fine. The issue I'm having now is with the 'file' argument. The 'file' argument should take a list of files to be moved to a directory, like this:
$ ./safeDel.sh file file1.txt file2.txt file3.txt
The following is a snippet of the start of my program :
#! /bin/bash
files=("$#")
arg="$1"
echo "arguments: $arg $files"
The echo statement shows the following:
$ arguments : file file
How can I split up the file argument from the files that have to be moved to the directory?
Assuming that the options processed by getopts have been shifted off the command line arguments list, and that a check has been done to ensure that at least two arguments remain, this code should do what is needed:
arg=$1
files=( "${#:2}" )
echo "arguments: $arg ${files[*]}"
files=( "${#:2}" ) puts all the command line arguments after the first into an array called files. See Handling positional parameters [Bash Hackers Wiki] for more information.
${files[*]} expands to the list of files in the files array inside the argument to echo. To safely expand the list in files for looping, or to pass to a command, use "${files[@]}". See Arrays [Bash Hackers Wiki].
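A minimal sketch of how the parsed pieces might then be used (the ~/.trash destination, mkdir and mv steps are illustrative assumptions, not part of the original script):
#!/bin/bash
arg=$1
files=( "${@:2}" )
if [[ $arg == file ]]; then
    mkdir -p ~/.trash                  # assumed destination directory
    for f in "${files[@]}"; do
        mv -- "$f" ~/.trash/           # move each listed file, safely quoted
    done
fi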
This is one way you can achieve what you need:
#!/bin/bash
declare -a files="$#"
for fileToManage in ${files}; do
echo "Managing ... $fileToManage"
done
But it only works if there are no spaces in your file names; otherwise you need to do some additional work.
Let me know if you need further help.
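If file names may contain spaces, a sketch of that additional work: keep the arguments in a real array and quote the expansion so each name stays a single word:
#!/bin/bash
files=( "$@" )
for fileToManage in "${files[@]}"; do
    echo "Managing ... $fileToManage"
done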
function getting_arguments {
  # using windows powershell
  echo @($args).GetType()
  echo @($args).length
  echo "@($args)[0]"
  echo @($args)[0]
  echo "@($args)[1..(@($args).length)]"
  echo @($args)[1..(@($args).length)]
  echo "debug: $(@($args)[0])" @($args)[1..(@($args).length)]
}
OUTPUT
PS C:\monkey> getting_arguments 1 2 3 4 5 6
IsPublic IsSerial Name BaseType
-------- -------- ---- --------
True True Object[] System.Array
6
@(1 2 3 4 5 6)[0]
1
@(1 2 3 4 5 6)[1..(@(1 2 3 4 5 6).length)]
2
3
4
5
6
debug: 1
2
3
4
5
6
Consider a three-line input file containing four unique numbers (1, 2, 3, 4) such that each line represents the position of one number relative to another number.
So for example in the following input set, 4 is next to 2, 2 is next to 3, and 1 is next to 4.
42
23
14
So given that, how would a script assemble all four numbers in such a way that it maintains each number's known relationship?
In other words, there are two answers, 1423 and 3241, but how can one arrive at them programmatically?
Not very sensible or efficient, but fun (for me, at least) :-)
This will echo all the permutations using GNU Parallel:
parallel echo {1}{2}{3}{4} ::: {1..4} ::: {1..4} ::: {1..4} ::: {1..4}
And add some grepping on the end:
parallel echo {1}{2}{3}{4} ::: {1..4} ::: {1..4} ::: {1..4} ::: {1..4} | grep -E "42|24" | grep -E "23|32" | grep -E "14|41"
Output
1423
3241
Brute forcing the luck:
for (( ; ; ))
do
res=($(echo "42
23
14" | shuf))
if ((${res[0]}%10 == ${res[1]}/10 && ${res[1]}%10 == ${res[2]}/10))
then
echo "success: ${res[#]}"
break
fi
echo "fail: ${res[#]}"
done
fail: 42 14 23
fail: 42 23 14
fail: 42 14 23
success: 14 42 23
For 3 numbers, this approach is acceptable.
shuf shuffles the input lines and fills the array res with the numbers.
Then we take two consecutive numbers and test whether the last digit of the first matches the first digit of the next, and likewise for the 2nd and 3rd numbers.
If so, we break with a success message. For debugging, a failure message is better than a silent endless loop.
For longer chains of numbers, testing permutations systematically would be better, together with a function that checks two consecutive numbers, called by index or, better, from a loop; see the sketch below.
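A sketch of that idea, with the pairwise test factored into a function and a loop over indices (the function name is illustrative; the numbers are assumed to sit in the res array as above):
# succeed only if every pair of consecutive numbers in res chains together,
# i.e. the last digit of res[i] equals the first digit of res[i+1]
chain_ok() {
    local i
    for ((i = 0; i < ${#res[@]} - 1; i++)); do
        (( res[i] % 10 == res[i+1] / 10 )) || return 1
    done
    return 0
}
res=(14 42 23); chain_ok && echo "success: ${res[@]}"
res=(42 14 23); chain_ok || echo "fail: ${res[@]}"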
I have a string with 3000 elements (NOT in series) in bash,
sections='1 2 4 ... 3000'
I am trying to split this string into x chunks of length n.
I want x to be typically between 3 and 10. The chunks need not all be the same length.
Each chunk is the input to a job.
Looking at https://unix.stackexchange.com/questions/122499/bash-split-a-list-of-files
and using bash arrays, my first attempt looks like this:
#! /bin/bash
nArgs=10
nChunkSize=10
z="0 1 2 .. 1--"
zs=(${z// / })
echo ${zs[@]}
for i in $nArgs; do
echo "Creating argument: "$i
startItem=$i*$nChunkSize
zArg[$i] = ${zs[@]:($startItem:$chunkSize}
done
echo "Resulting args"
for i in $nArgs; do
echo "Argument"${zArgs[$1]}
done
The above is far from working, I'm afraid. Any pointers on the ${zs[@]:($startItem:$chunkSize} syntax?
For an input of 13 elements:
z='0 1 2 3 4 5 6 7 8 10 11 12 15'
nChunks=3
and nArgs=4
I would like to obtain an array with 3 elements, zs with content
zs[0] = '0 1 2 3'
zs[1] = '4 5 6 7'
zs[2] = '8 10 11 12 15'
Each zs will be used as arguments to subsequent jobs.
First note: This is a bad idea. It won't work reliably with arbitrary (non-numeric) contents, as bash doesn't have support for nested arrays.
output=( )
sections_str='1 2 4 5 6 7 8 9 10 11 12 13 14 15 16 3000'
batch_size=4
read -r -a sections <<<"$sections_str"
for ((i=0; i<${#sections[@]}; i+=batch_size)); do
  current_pieces=( "${sections[@]:i:batch_size}" )
  output+=( "${current_pieces[*]}" )
done
declare -p output # to view your output
Notes:
zs=( $z ) is buggy. For example, any * inside your list will be replaced with a list of filenames in the current directory. Use read -a to read into an array in a reliable way that doesn't depend on shell configuration other than IFS (which can be controlled, scoped to just that one line, with IFS=' ' read -r -a).
${array[@]:start:count} expands to up to count items from your array, starting at position start.
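For reference, with the sample sections_str above, the final declare -p output should print something like this (exact formatting can vary slightly between bash versions):
declare -a output=([0]="1 2 4 5" [1]="6 7 8 9" [2]="10 11 12 13" [3]="14 15 16 3000")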