I use HBase Indexer with Lily Solr (Lucene 4.4) on my CDH4 system.
I have 2 servers running lily-solr and 5 indexer configurations to index different data types. I also use lily-server for storing data in HBase.
When I add an indexer configuration and request a batch build, it initially indexes all the data, but then it gradually deletes all the indexes.
I add the indexers as follows:
asset_idx01
/usr/lib/hbase-indexer/bin/hbase-indexer add-indexer -n asset_idx01 -z spotlight-prod-cluster01 -cp solr.mode=classic -cp solr.shard.1=http://spotlight-prod-cluster01:7575/solr -r org.lilyproject.indexer.hbase.mapper.LilyIndexerComponentFactory -cp solr.sharder=org.lilyproject.indexer.hbase.mapper.LilySharder -c asset_indexconfig.xml -cp lily.zk=spotlight-prod-cluster01
/usr/lib/hbase-indexer/bin/hbase-indexer update-indexer -n asset_idx01 --batch BUILD_REQUESTED
asset_idx02
/usr/lib/hbase-indexer/bin/hbase-indexer add-indexer -n asset_idx02 -z spotlight-prod-cluster02 -cp solr.mode=classic -cp solr.shard.1=http://spotlight-prod-cluster02:7575/solr -r org.lilyproject.indexer.hbase.mapper.LilyIndexerComponentFactory -cp solr.sharder=org.lilyproject.indexer.hbase.mapper.LilySharder -c asset_indexconfig.xml -cp lily.zk=spotlight-prod-cluster02
/usr/lib/hbase-indexer/bin/hbase-indexer update-indexer -n asset_idx02 --batch BUILD_REQUESTED
brightcove_idx01
/usr/lib/hbase-indexer/bin/hbase-indexer add-indexer -n brightcove_idx01 -z spotlight-prod-cluster01 -cp solr.mode=classic -cp solr.shard.1=http://spotlight-prod-cluster01:7575/solr -r org.lilyproject.indexer.hbase.mapper.LilyIndexerComponentFactory -cp solr.sharder=org.lilyproject.indexer.hbase.mapper.LilySharder -c brightcove_indexconfig.xml -cp lily.zk=spotlight-prod-cluster01
/usr/lib/hbase-indexer/bin/hbase-indexer update-indexer -n brightcove_idx01 --batch BUILD_REQUESTED
brightcove_idx02
/usr/lib/hbase-indexer/bin/hbase-indexer add-indexer -n brightcove_idx02 -z spotlight-prod-cluster02 -cp solr.mode=classic -cp solr.shard.1=http://spotlight-prod-cluster02:7575/solr -r org.lilyproject.indexer.hbase.mapper.LilyIndexerComponentFactory -cp solr.sharder=org.lilyproject.indexer.hbase.mapper.LilySharder -c brightcove_indexconfig.xml -cp lily.zk=spotlight-prod-cluster02
/usr/lib/hbase-indexer/bin/hbase-indexer update-indexer -n brightcove_idx02 --batch BUILD_REQUESTED
fortune_idx01
/usr/lib/hbase-indexer/bin/hbase-indexer add-indexer -n fortune_idx01 -z spotlight-prod-cluster01 -cp solr.mode=classic -cp solr.shard.1=http://spotlight-prod-cluster01:7575/solr -r org.lilyproject.indexer.hbase.mapper.LilyIndexerComponentFactory -cp solr.sharder=org.lilyproject.indexer.hbase.mapper.LilySharder -c fortune_indexconfig.xml -cp lily.zk=spotlight-prod-cluster01
/usr/lib/hbase-indexer/bin/hbase-indexer update-indexer -n fortune_idx01 --batch BUILD_REQUESTED
fortune_idx02
/usr/lib/hbase-indexer/bin/hbase-indexer add-indexer -n fortune_idx01 -z spotlight-prod-cluster02 -cp solr.mode=classic -cp solr.shard.1=http://spotlight-prod-cluster02:7575/solr -r org.lilyproject.indexer.hbase.mapper.LilyIndexerComponentFactory -cp solr.sharder=org.lilyproject.indexer.hbase.mapper.LilySharder -c fortune_indexconfig.xml -cp lily.zk=spotlight-prod-cluster02
/usr/lib/hbase-indexer/bin/hbase-indexer update-indexer -n fortune_idx02 --batch BUILD_REQUESTED
populist_idx01
/usr/lib/hbase-indexer/bin/hbase-indexer add-indexer -n populist_idx01 -z spotlight-prod-cluster01 -cp solr.mode=classic -cp solr.shard.1=http://spotlight-prod-cluster01:7575/solr -r org.lilyproject.indexer.hbase.mapper.LilyIndexerComponentFactory -cp solr.sharder=org.lilyproject.indexer.hbase.mapper.LilySharder -c indexconfig.xml -cp lily.zk=spotlight-prod-cluster01
/usr/lib/hbase-indexer/bin/hbase-indexer update-indexer -n populist_idx01 --batch BUILD_REQUESTED
populist_idx02
/usr/lib/hbase-indexer/bin/hbase-indexer add-indexer -n populist_idx02 -z spotlight-prod-cluster02 -cp solr.mode=classic -cp solr.shard.1=http://spotlight-prod-cluster01:7575/solr -r org.lilyproject.indexer.hbase.mapper.LilyIndexerComponentFactory -cp solr.sharder=org.lilyproject.indexer.hbase.mapper.LilySharder -c indexconfig.xml -cp lily.zk=spotlight-prod-cluster02
/usr/lib/hbase-indexer/bin/hbase-indexer update-indexer -n populist_idx02 --batch BUILD_REQUESTED
twitter_stats_idx01
/usr/lib/hbase-indexer/bin/hbase-indexer add-indexer -n twitter_stats_idx01 -z spotlight-prod-cluster01 -cp solr.mode=classic -cp solr.shard.1=http://spotlight-prod-cluster01:7575/solr -r org.lilyproject.indexer.hbase.mapper.LilyIndexerComponentFactory -cp solr.sharder=org.lilyproject.indexer.hbase.mapper.LilySharder -c twitter_stats_indexconfig.xml -cp lily.zk=spotlight-prod-cluster01
/usr/lib/hbase-indexer/bin/hbase-indexer update-indexer -n twitter_stats_idx01 --batch BUILD_REQUESTED
twitter_stats_idx02
/usr/lib/hbase-indexer/bin/hbase-indexer add-indexer -n twitter_stats_idx02 -z spotlight-prod-cluster02 -cp solr.mode=classic -cp solr.shard.1=http://spotlight-prod-cluster02:7575/solr -r org.lilyproject.indexer.hbase.mapper.LilyIndexerComponentFactory -cp solr.sharder=org.lilyproject.indexer.hbase.mapper.LilySharder -c twitter_stats_indexconfig.xml -cp lily.zk=spotlight-prod-cluster02
/usr/lib/hbase-indexer/bin/hbase-indexer update-indexer -n twitter_stats_idx02 --batch BUILD_REQUESTED
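For reference, all of the add-indexer/update-indexer pairs above follow the same pattern, differing only in the indexer name, the target cluster host, and the index configuration file. A minimal sketch (assuming each indexer is meant to point at its matching cluster host and config file, as above) that issues them in a loop:
#!/bin/bash
# Sketch only: name, cluster suffix and config file are taken from the commands above.
while read -r name host config; do
  zk="spotlight-prod-cluster${host}"
  /usr/lib/hbase-indexer/bin/hbase-indexer add-indexer -n "$name" -z "$zk" \
    -cp solr.mode=classic -cp "solr.shard.1=http://${zk}:7575/solr" \
    -r org.lilyproject.indexer.hbase.mapper.LilyIndexerComponentFactory \
    -cp solr.sharder=org.lilyproject.indexer.hbase.mapper.LilySharder \
    -c "$config" -cp "lily.zk=${zk}"
  /usr/lib/hbase-indexer/bin/hbase-indexer update-indexer -n "$name" --batch BUILD_REQUESTED
done <<END
asset_idx01 01 asset_indexconfig.xml
asset_idx02 02 asset_indexconfig.xml
brightcove_idx01 01 brightcove_indexconfig.xml
brightcove_idx02 02 brightcove_indexconfig.xml
fortune_idx01 01 fortune_indexconfig.xml
fortune_idx02 02 fortune_indexconfig.xml
populist_idx01 01 indexconfig.xml
populist_idx02 02 indexconfig.xml
twitter_stats_idx01 01 twitter_stats_indexconfig.xml
twitter_stats_idx02 02 twitter_stats_indexconfig.xml
END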
Can anyone please help me identify the possible root causes, so that I can rectify this?
I follow these steps:
Import data to HBase
Start the Lily server
Start the HBase Indexer
Add the indexer configurations to HBase
Request the batch builds as mentioned above.
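After requesting the batch builds, the registered indexers and their state can be checked with the list-indexers subcommand of the same CLI, and the Solr document counts can be watched over time to see when documents start disappearing (the exact core path depends on your Solr layout):
/usr/lib/hbase-indexer/bin/hbase-indexer list-indexers -z spotlight-prod-cluster01
# Document count per Solr instance; adjust the core path if /solr/select does not resolve
curl 'http://spotlight-prod-cluster01:7575/solr/select?q=*:*&rows=0&wt=json'
curl 'http://spotlight-prod-cluster02:7575/solr/select?q=*:*&rows=0&wt=json'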
I have a bash script:
#!/bin/bash
time srun -p my_partition -c 1 --mem=4G my_code -i my_file_1 -o my_output_file_1
time srun -p my_partition -c 1 --mem=4G my_code -i my_file_2 -o my_output_file_2
time srun -p my_partition -c 1 --mem=4G my_code -i my_file_3 -o my_output_file_3
time srun -p my_partition -c 1 --mem=4G my_code -i my_file_4 -o my_output_file_4
I want to know the average memory usage for each step (printed after the real/user/sys time) while the script is running.
I have tried
#!/bin/bash
time srun -p my_partition -c 1 --mem=4G --job-name="my_job" my_code -i my_file_1 -o my_output_file_1
time srun -p my_partition -c 1 --mem=4G --job-name="my_job" my_code -i my_file_2 -o my_output_file_2
time srun -p my_partition -c 1 --mem=4G --job-name="my_job" my_code -i my_file_3 -o my_output_file_3
time srun -p my_partition -c 1 --mem=4G --job-name="my_job" my_code -i my_file_4 -o my_output_file_4
sstat -a -j my_job --format=JobName,AveRSS,MaxRSS
You can try
#!/bin/bash
time srun -p my_partition -c 1 --mem=4G --job-name="my_job" my_code -i my_file_1 -o my_output_file_1
sstat -j ${SLURM_JOB_ID}.1 --format=JobName,AveRSS,MaxRSS
time srun -p my_partition -c 1 --mem=4G --job-name="my_job" my_code -i my_file_2 -o my_output_file_2
sstat -j ${SLURM_JOB_ID}.2 --format=JobName,AveRSS,MaxRSS
time srun -p my_partition -c 1 --mem=4G --job-name="my_job" my_code -i my_file_3 -o my_output_file_3
sstat -j ${SLURM_JOB_ID}.3 --format=JobName,AveRSS,MaxRSS
time srun -p my_partition -c 1 --mem=4G --job-name="my_job" my_code -i my_file_4 -o my_output_file_4
sstat -j ${SLURM_JOB_ID}.4 --format=JobName,AveRSS,MaxRSS
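Note that sstat only reports on job steps that are still running. Once the steps have finished, the same fields are available from the accounting database (provided job accounting is enabled on the cluster) via sacct, for example:
# After the steps have completed, query per-step memory usage from accounting
sacct -j "$SLURM_JOB_ID" --format=JobID,JobName,AveRSS,MaxRSS,Elapsed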
I am very new to writing code, so this might be a silly question, but an answer is highly appreciated to help my learning. I have written a simple bash script, shown below. How can I optimize this code by using a loop or an array? I understand that if I use loops, I can make the code shorter. Please help:
#!/bin/bash
zs=10.0.3.10
zb=/usr/local/bin/zabbix_sender
zh=zabbix
# ql1 = queue link
ql1=https://sqs.us-west-2.amazonaws.com/843390035802/testService1
val1=$(aws sqs get-queue-attributes --queue-url $ql1 --attribute-names ApproximateNumberOfMessages --region us-west-2 --output text | awk '{print $2}')
echo "$ql1 count is $val1"
$zb -z $zs -s $zh -k testService1 -o val1 >/dev/null 2>&1
ql2=https://sqs.us-west-2.amazonaws.com/853390078801/testService2
val2=$(aws sqs get-queue-attributes --queue-url $ql2 --attribute-names ApproximateNumberOfMessages --region us-west-2 --output text | awk '{print $2}')
echo "$ql2 count is $val2"
$zb -z $zs -s $zh -k testService2 -o val2 >/dev/null 2>&1
ql3=https://sqs.us-west-2.amazonaws.com/843393305801/testService3
val3=$(aws sqs get-queue-attributes --queue-url $ql3 --attribute-names ApproximateNumberOfMessages --region us-west-2 --output text | awk '{print $2}')
echo "$ql3 count is $val3"
$zb -z $zs -s $zh -k testService3 -o val3 >/dev/null 2>&1
ql4=https://sqs.us-west-2.amazonaws.com/875660005801/testService4
val4=$(aws sqs get-queue-attributes --queue-url $ql4 --attribute-names ApproximateNumberOfMessages --region us-west-2 --output text | awk '{print $2}')
echo "$ql4 count is $val4"
$zb -z $zs -s $zh -k testService4 -o val4 >/dev/null 2>&1
ql5=https://sqs.us-west-2.amazonaws.com/843390635802/testService5
val5=$(aws sqs get-queue-attributes --queue-url $ql5 --attribute-names ApproximateNumberOfMessages --region us-west-2 --output text | awk '{print $2}')
echo "$ql5 count is $val5"
$zb -z $zs -s $zh -k testService2 -o val5 >/dev/null 2>&1
In the above code, at this step:
$zb -z $zs -s $zh -k testService2 -o val5 >/dev/null 2>&1
I use 5 different values for -k. So how can I arrange this in a loop so that the code works the same as above?
One loop is sufficient to eliminate the code duplication, and we don't need an array: we can read one queue link after the other in the loop. The argument to the -k option can be extracted from the queue link by removing the URL part up to the last / with the shell parameter expansion ${parameter##word}.
zs=10.0.3.10
zb=/usr/local/bin/zabbix_sender
zh=zabbix
# ql = queue link
while read ql
do
val=$(aws sqs get-queue-attributes --queue-url $ql --attribute-names ApproximateNumberOfMessages --region us-west-2 --output text | awk '{print $2}')
echo "$ql count is $val"
$zb -z $zs -s $zh -k ${ql##*/} -o $val >/dev/null 2>&1
done <<END
https://sqs.us-west-2.amazonaws.com/843390035802/testService1
https://sqs.us-west-2.amazonaws.com/853390078801/testService2
https://sqs.us-west-2.amazonaws.com/843393305801/testService3
https://sqs.us-west-2.amazonaws.com/875660005801/testService4
https://sqs.us-west-2.amazonaws.com/843390635802/testService5
END
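To illustrate the ${parameter##word} expansion that derives the -k key from the queue link:
ql=https://sqs.us-west-2.amazonaws.com/843390035802/testService1
echo "${ql##*/}"    # removes everything up to the last "/": prints testService1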
I need to build a bash command in a script from a mix of quoted and regular parameters. For example:
BAYES)
class="weka.classifiers.bayes.BayesNet"
A="-D -Q weka.classifiers.bayes.net.search.local.K2 -- -P 1 -S BAYES -E"
B="weka.classifiers.bayes.net.estimate.SimpleEstimator -- -A 0.5" ;;
LOGISTIC)
class="weka.classifiers.functions.Logistic"
A="-R 1.0E-8 -M -1 -num-decimal-places 4" ;;
SIMPLELOG)
class="weka.classifiers.functions.SimpleLogistic"
A="-I 0 -M 500 -H 50 -W 0.0" ;;
SMO)
class="weka.classifiers.functions.SMO"
A="-C 1.0 -L 0.001 -P 1.0E-12 -N 0 -V -1 -W 1 -K"
A1="weka.classifiers.functions.supportVector.PolyKernel -C 250007 -E 1.0" ;;
IBK)
class="weka.classifiers.lazy.IBk"
A="-K 1 -W 0 -A "
A1="weka.core.neighboursearch.LinearNNSearch -A"
A2="weka.core.EuclideanDistance -R first-last" ;;
KSTAR)
class="weka.classifiers.lazy.KStar"
A="-B 20 -M a" ;;
...
java -Xmx"$mem"m -cp "$WEKA_INSTALL_DIR/weka.jar" $class -s $i -t "$file" $A "$A1" $B "$B1"
However, my problem is that in some cases, when $A1 is empty, the "$A1" parameter is not valid. The same happens with "$B1". And the parameters can appear in any combination ($A1 with $B1, $A1 without $B1, ...).
I've also tried including $A1 in $A, as follows:
A="-C 1.0 -L 0.001 -P 1.0E-12 -N 0 -V -1 -W 1 -K \"weka.classifiers.functions.supportVector.PolyKernel -C 250007 -E 1.0\""
and execute:
java -Xmx"$mem"m -cp "$WEKA_INSTALL_DIR/weka.jar" $class -s $i -t "$file" $A
but this doesn't work.
You cannot safely and reliably store multiple arguments in a single string; you need to use arrays; this is their intended use case. Make sure to initialize any arrays that won't be used, so that they "disappear" when expanded.
# If A is set to the empty string (A=""), "${A[@]}" expands to one empty argument.
# But if A=(), then "${A[@]}" simply disappears from the command line.
A=()
B=()
B1=()
A1=()
A2=()
case $something in
BAYES)
class="weka.classifiers.bayes.BayesNet"
A=(-D -Q weka.classifiers.bayes.net.search.local.K2 -- -P 1 -S BAYES -E)
B=(weka.classifiers.bayes.net.estimate.SimpleEstimator -- -A 0.5);;
LOGISTIC)
class="weka.classifiers.functions.Logistic"
A=(-R 1.0E-8 -M -1 -num-decimal-places 4);;
SIMPLELOG)
class="weka.classifiers.functions.SimpleLogistic"
A=(-I 0 -M 500 -H 50 -W 0.0) ;;
SMO)
class="weka.classifiers.functions.SMO"
A=(-C 1.0 -L 0.001 -P 1.0E-12 -N 0 -V -1 -W 1 -K)
A1=(weka.classifiers.functions.supportVector.PolyKernel -C 250007 -E 1.0) ;;
IBK)
class="weka.classifiers.lazy.IBk"
A=(-K 1 -W 0 -A)
A1=(weka.core.neighboursearch.LinearNNSearch -A)
A2=(weka.core.EuclideanDistance -R first-last);;
KSTAR)
class="weka.classifiers.lazy.KStar"
A=(-B 20 -M a) ;;
esac
and always quote parameter expansions.
java -Xmx"$mem"m -cp "$WEKA_INSTALL_DIR/weka.jar" \
"$class" -s "$i" -t "$file" "${A[#]}" "${A1[#]}" "${B[#]}" "${B1[#]}"
SOLUTION:
I solved all my problems by using only the parameter A, like this:
BAYES)
class="weka.classifiers.bayes.BayesNet"
A=(-D -Q weka.classifiers.bayes.net.search.local.K2 -- -P 1 -S BAYES -E weka.classifiers.bayes.net.estimate.SimpleEstimator -- -A 0.5);;
SMO)
class="weka.classifiers.functions.SMO"
A=(-C 1.0 -L 0.001 -P 1.0E-12 -N 0 -V -1 -W 1 -K "weka.classifiers.functions.supportVector.PolyKernel -C 250007 -E 1.0");;
java -Xmx"$mem"m -cp "$WEKA_INSTALL_DIR/weka.jar" $class -s $i -t "$file" "${A[#]}"
From your question, I did:
Initialized the variables
Completed the case statement
Removed some unneeded double quotes
Defined some variables for which you did not provide values
Backslash-escaped double quotes where they must appear in the java command
If you need double quotes for certain variables, put them inside the variables. This way you will not have "" in your java command when the variable is empty. I did this for A1 in the IBK case.
This will get you started, modify as required:
#!/bin/bash
#
mem="512"
WEKA_INSTALL_DIR='/opt/weka'
class=""
i="value-of-i"
A=""
A1=""
B=""
B1=""
file="SOMEFILE"
case $1 in
'BAYES')
class="weka.classifiers.bayes.BayesNet"
A="-D -Q weka.classifiers.bayes.net.search.local.K2 -- -P 1 -S BAYES -E"
B="weka.classifiers.bayes.net.estimate.SimpleEstimator -- -A 0.5"
;;
'LOGISTIC')
class="weka.classifiers.functions.Logistic"
A="-R 1.0E-8 -M -1 -num-decimal-places 4"
;;
'SIMPLELOG')
class="weka.classifiers.functions.SimpleLogistic"
A="-I 0 -M 500 -H 50 -W 0.0"
;;
'SMO')
class="weka.classifiers.functions.SMO"
A="-C 1.0 -L 0.001 -P 1.0E-12 -N 0 -V -1 -W 1 -K"
A1="weka.classifiers.functions.supportVector.PolyKernel -C 250007 -E 1.0"
;;
'IBK')
class="weka.classifiers.lazy.IBk"
A="-K 1 -W 0 -A "
A1="\"weka.core.neighboursearch.LinearNNSearch -A\""
A2="weka.core.EuclideanDistance -R first-last"
;;
'KSTAR')
class="weka.classifiers.lazy.KStar"
A="-B 20 -M a"
;;
*)
# default options
;;
esac
echo java -Xmx${mem}m -cp $WEKA_INSTALL_DIR/weka.jar $class -s $i -t $file $A $A1 $B $B1
Example:
./test.bash LOGISTIC
java -Xmx512m -cp /opt/weka/weka.jar weka.classifiers.functions.Logistic -s value-of-i -t SOMEFILE -R 1.0E-8 -M -1 -num-decimal-places 4
./test.bash IBK
java -Xmx512m -cp /opt/weka/weka.jar weka.classifiers.lazy.IBk -s value-of-i -t SOMEFILE -K 1 -W 0 -A "weka.core.neighboursearch.LinearNNSearch -A"
I'm new to SLURM. I want to process a list of files, assembled_reads/*.sorted.bam, in parallel. With the code below, however, only one process is used over and over again.
#!/bin/bash
#
#SBATCH --job-name=****
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=24
#SBATCH --partition=short
#SBATCH --time=12:00:00
#SBATCH --array=1-100
#SBATCH --mem-per-cpu=16000
#SBATCH --mail-type=FAIL
#SBATCH --mail-user=****@***.edu
srun hostname
for FILE in assembled_reads/*.sorted.bam; do
echo ${FILE}
OUTFILE=$(basename ${FILE} .sorted.bam).raw.snps.indels.g.vcf
PLDY=$(awk -F "," '$1=="$FILE"{print $4}' metadata.csv)
PLDYNUM=$( [[$PLDY = "haploid" ]] && echo "1" || echo "2")
srun java -Djava.io.tmpdir="tmp" -jar GenomeAnalysisTK.jar \
-R scaffs_HAPSgracilaria92_50REF.fasta \
-T HaplotypeCaller \
-I ${${SLURM_ARRAY_TASK_ID}} \
--emitRefConfidence GVCF \
-ploidy $PLDYNUM \
-nt 1 \
-nct 24 \
-o $OUTFILE
sleep 1 # pause to be kind to the scheduler
done
You are creating a job array but you are not using it. You should replace the for-loop with indexing into the list of files based on the Slurm job array ID:
#!/bin/bash
#
#SBATCH --job-name=****
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=24
#SBATCH --partition=short
#SBATCH --time=12:00:00
#SBATCH --array=0-99
#SBATCH --mem-per-cpu=16000
#SBATCH --mail-type=FAIL
#SBATCH --mail-user=****@***.edu
srun hostname
FILES=(assembled_reads/*.sorted.bam)
FILE=${FILES[$SLURM_ARRAY_TASK_ID]}
echo ${FILE}
OUTFILE=$(basename ${FILE} .sorted.bam).raw.snps.indels.g.vcf
PLDY=$(awk -F "," -v f="$FILE" '$1==f {print $4}' metadata.csv)
PLDYNUM=$( [[ $PLDY = "haploid" ]] && echo "1" || echo "2")
srun java -Djava.io.tmpdir="tmp" -jar GenomeAnalysisTK.jar \
-R scaffs_HAPSgracilaria92_50REF.fasta \
-T HaplotypeCaller \
-I "${FILE}" \
--emitRefConfidence GVCF \
-ploidy $PLDYNUM \
-nt 1 \
-nct 24 \
-o $OUTFILE
Just make sure to adapt the value of --array to match the number of files to process (indices 0 to N-1 for N files).
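If the number of files changes between runs, the array range can also be set at submission time, since command-line options passed to sbatch override the #SBATCH directives in the script. A small sketch, assuming the batch script above is saved under the hypothetical name my_array_job.sh:
# Size the array to the number of input files (array indices start at 0)
N=$(ls assembled_reads/*.sorted.bam | wc -l)
sbatch --array=0-$((N - 1)) my_array_job.sh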
I'm working with a big script which produces a new input.sh file consisting of a step-by-step sequence of commands used as input for a program, called here pmemd.
Here I'd like to focus on how I obtain this input.sh. Using printf I have the following:
printf "#setenv CUDA_VISIBLE_DEVICES "1"\nmpirun -np 8 pmemd.MPI -O -i ../in/minim.in -o minim.out -p ./protein.parm7 -c ./protein.inpcrd -r minim.restrt\npmemd.cuda -O -i ../in/equil.in -o equil.out -p ./protein.parm7 -c ./minim.restrt -r equil.restrt -x equil.nc -ref ./minim.restrt\npmemd.cuda -O -i ../in/equil2.in -o equil2.out -p ./protein.parm7 -c equil.restrt -r equil2.restrt -x equil2.nc -ref equil.restrt\npmemd.cuda -O -i ../in/equil3.in -o equil3.out -p ./protein.parm7 -c equil2.restrt -r equil3.restrt -x equil3.nc -ref equil2.restrt\npmemd.cuda -O -i ../in/equil4.in -o equil4.out -p ./protein.parm7 -c equil3.restrt -r equil4.restrt -x equil4.nc -ref equil3.restrt\npmemd.cuda -O -i ../in/equil5.in -o equil5_1.out -p ./protein.parm7 -c equil4.restrt -r equil5_1.restrt -x equil5.nc -ref equil4.restrt\npmemd.cuda -O -i ../in/equil5.in -o equil5_2.out -p ./protein.parm7 -c equil5_1.restrt -r equil5_2.restrt -x equil5.nc -ref equil5_1.restrt\npmemd.cuda -O -i ../in/equil5.in -o equil5_3.out -p ./protein.parm7 -c equil5_2.restrt -r equil5_3.restrt -x equil5.nc -ref equil5_2.restrt\npmemd.cuda -O -i ../in/equil5.in -o equil5_4.out -p ./protein.parm7 -c equil5_3.restrt -r equil5_4.restrt -x equil5.nc -ref equil5_3.restrt\npmemd.cuda -O -i ../in/equil5.in -o equil5_5.out -p ./protein.parm7 -c equil5_4.restrt -r equil5_5.restrt -x equil5.nc -ref equil5_4.restrt\npmemd.cuda -O -i ../in/md.in -o md_${P}_${com_n}.out -p ./protein.parm7 -c equil5_5.restrt -r md.restrt -x md.nc" > ./input.sh
which results in an input.sh that eventually consists of:
#setenv CUDA_VISIBLE_DEVICES 1
mpirun -np 8 pmemd.MPI -O -i ../in/minim.in -o minim.out -p ./protein.parm7 -c ./protein.inpcrd -r minim.restrt
pmemd.cuda -O -i ../in/equil.in -o equil.out -p ./protein.parm7 -c ./minim.restrt -r equil.restrt -x equil.nc -ref ./minim.restrt
pmemd.cuda -O -i ../in/equil2.in -o equil2.out -p ./protein.parm7 -c equil.restrt -r equil2.restrt -x equil2.nc -ref equil.restrt
pmemd.cuda -O -i ../in/equil3.in -o equil3.out -p ./protein.parm7 -c equil2.restrt -r equil3.restrt -x equil3.nc -ref equil2.restrt
pmemd.cuda -O -i ../in/equil4.in -o equil4.out -p ./protein.parm7 -c equil3.restrt -r equil4.restrt -x equil4.nc -ref equil3.restrt
pmemd.cuda -O -i ../in/equil5.in -o equil5_1.out -p ./protein.parm7 -c equil4.restrt -r equil5_1.restrt -x equil5.nc -ref equil4.restrt
pmemd.cuda -O -i ../in/equil5.in -o equil5_2.out -p ./protein.parm7 -c equil5_1.restrt -r equil5_2.restrt -x equil5.nc -ref equil5_1.restrt
pmemd.cuda -O -i ../in/equil5.in -o equil5_3.out -p ./protein.parm7 -c equil5_2.restrt -r equil5_3.restrt -x equil5.nc -ref equil5_2.restrt
pmemd.cuda -O -i ../in/equil5.in -o equil5_4.out -p ./protein.parm7 -c equil5_3.restrt -r equil5_4.restrt -x equil5.nc -ref equil5_3.restrt
pmemd.cuda -O -i ../in/equil5.in -o equil5_5.out -p ./protein.parm7 -c equil5_4.restrt -r equil5_5.restrt -x equil5.nc -ref equil5_4.restrt
pmemd.cuda -O -i ../in/md.in -o md_OR5P3_plus-carvone.out -p ./protein.parm7 -c equil5_5.restrt -r md.restrt -x md.nc
As you can see, this method is not very convenient, because it's not easy to edit the content of input.sh within the initial script when I need to modify the future input.sh. Also, as you may notice, the file contains a fragment which I should be able to generate with some looping, because it consists of several steps like 5_1, 5_2, etc. (each step uses the previous one as its input):
pmemd.cuda -O -i ../in/equil5.in -o equil5_1.out -p ./protein.parm7 -c equil4.restrt -r equil5_1.restrt -x equil5.nc -ref equil4.restrt
pmemd.cuda -O -i ../in/equil5.in -o equil5_2.out -p ./protein.parm7 -c equil5_1.restrt -r equil5_2.restrt -x equil5.nc -ref equil5_1.restrt
pmemd.cuda -O -i ../in/equil5.in -o equil5_3.out -p ./protein.parm7 -c equil5_2.restrt -r equil5_3.restrt -x equil5.nc -ref equil5_2.restrt
pmemd.cuda -O -i ../in/equil5.in -o equil5_4.out -p ./protein.parm7 -c equil5_3.restrt -r equil5_4.restrt -x equil5.nc -ref equil5_3.restrt
pmemd.cuda -O -i ../in/equil5.in -o equil5_5.out -p ./protein.parm7 -c equil5_4.restrt -r equil5_5.restrt -x equil5.nc -ref equil5_4.restrt
So I'd be thankful if someone could provide me with ideas on how to add a for or while loop within such a printf statement, or alternatives to it.
Thanks for the help!!
Gleb
Here's a shell script that generates most of the lines in your post:
prev=minim
for i in '' $(seq 2 4); do
cur=equil$i
printf 'pmemd.cuda -O -i ../in/%s.in -o %s.out -p ./protein.parm7 -c %s.restrt -r %s.restrt -x %s.nc -ref %s.restrt\n' $cur $cur $prev $cur $cur $prev
prev=$cur
done
for i in $(seq 5); do
cur=equil5_$i
printf 'pmemd.cuda -O -i ../in/equil5.in -o %s.out -p ./protein.parm7 -c %s.restrt -r %s.restrt -x equil5.nc -ref %s.restrt\n' $cur $prev $cur $prev
prev=$cur
done
This generates:
pmemd.cuda -O -i ../in/equil.in -o equil.out -p ./protein.parm7 -c minim.restrt -r equil.restrt -x equil.nc -ref minim.restrt
pmemd.cuda -O -i ../in/equil2.in -o equil2.out -p ./protein.parm7 -c equil.restrt -r equil2.restrt -x equil2.nc -ref equil.restrt
pmemd.cuda -O -i ../in/equil3.in -o equil3.out -p ./protein.parm7 -c equil2.restrt -r equil3.restrt -x equil3.nc -ref equil2.restrt
pmemd.cuda -O -i ../in/equil4.in -o equil4.out -p ./protein.parm7 -c equil3.restrt -r equil4.restrt -x equil4.nc -ref equil3.restrt
pmemd.cuda -O -i ../in/equil5.in -o equil5_1.out -p ./protein.parm7 -c equil4.restrt -r equil5_1.restrt -x equil5.nc -ref equil4.restrt
pmemd.cuda -O -i ../in/equil5.in -o equil5_2.out -p ./protein.parm7 -c equil5_1.restrt -r equil5_2.restrt -x equil5.nc -ref equil5_1.restrt
pmemd.cuda -O -i ../in/equil5.in -o equil5_3.out -p ./protein.parm7 -c equil5_2.restrt -r equil5_3.restrt -x equil5.nc -ref equil5_2.restrt
pmemd.cuda -O -i ../in/equil5.in -o equil5_4.out -p ./protein.parm7 -c equil5_3.restrt -r equil5_4.restrt -x equil5.nc -ref equil5_3.restrt
pmemd.cuda -O -i ../in/equil5.in -o equil5_5.out -p ./protein.parm7 -c equil5_4.restrt -r equil5_5.restrt -x equil5.nc -ref equil5_4.restrt
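To write these lines into input.sh rather than printing them, the fixed header lines and the two loops can be grouped and redirected once. A sketch based on the script above, assuming $P and $com_n are set by the enclosing script as in the original printf:
{
printf '#setenv CUDA_VISIBLE_DEVICES "1"\n'
printf 'mpirun -np 8 pmemd.MPI -O -i ../in/minim.in -o minim.out -p ./protein.parm7 -c ./protein.inpcrd -r minim.restrt\n'
prev=minim
for i in '' $(seq 2 4); do
cur=equil$i
printf 'pmemd.cuda -O -i ../in/%s.in -o %s.out -p ./protein.parm7 -c %s.restrt -r %s.restrt -x %s.nc -ref %s.restrt\n' $cur $cur $prev $cur $cur $prev
prev=$cur
done
for i in $(seq 5); do
cur=equil5_$i
printf 'pmemd.cuda -O -i ../in/equil5.in -o %s.out -p ./protein.parm7 -c %s.restrt -r %s.restrt -x equil5.nc -ref %s.restrt\n' $cur $prev $cur $prev
prev=$cur
done
printf 'pmemd.cuda -O -i ../in/md.in -o md_%s_%s.out -p ./protein.parm7 -c equil5_5.restrt -r md.restrt -x md.nc\n' "$P" "$com_n"
} > ./input.sh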
I might be misunderstanding your question, but it seems that you're not comfortable with such a giant printf line. So why not just divide it up? You can use >> to append to a file. This way, each line in your input.sh comes from a separate printf line:
printf "#setenv CUDA_VISIBLE_DEVICES \"1\"\n" > input.sh
printf "mpirun -np 8 pmemd.MPI -O -i ../in/minim.in -o minim.out -p ./protein.parm7 -c ./protein.inpcrd -r minim.restrt\n" >> input.sh
printf "pmemd.cuda -O -i ../in/equil.in -o equil.out -p ./protein.parm7 -c ./minim.restrt -r equil.restrt -x equil.nc -ref ./minim.restrt\n" >> input.sh
printf "pmemd.cuda -O -i ../in/equil2.in -o equil2.out -p ./protein.parm7 -c equil.restrt -r equil2.restrt -x equil2.nc -ref equil.restrt\n" >> input.sh
printf "pmemd.cuda -O -i ../in/equil3.in -o equil3.out -p ./protein.parm7 -c equil2.restrt -r equil3.restrt -x equil3.nc -ref equil2.restrt\n" >> input.sh
printf "pmemd.cuda -O -i ../in/equil4.in -o equil4.out -p ./protein.parm7 -c equil3.restrt -r equil4.restrt -x equil4.nc -ref equil3.restrt\n" >> input.sh
printf "pmemd.cuda -O -i ../in/equil5.in -o equil5_1.out -p ./protein.parm7 -c equil4.restrt -r equil5_1.restrt -x equil5.nc -ref equil4.restrt\n" >> input.sh
printf "pmemd.cuda -O -i ../in/equil5.in -o equil5_2.out -p ./protein.parm7 -c equil5_1.restrt -r equil5_2.restrt -x equil5.nc -ref equil5_1.restrt\n >> input.sh
printf "pmemd.cuda -O -i ../in/equil5.in -o equil5_3.out -p ./protein.parm7 -c equil5_2.restrt -r equil5_3.restrt -x equil5.nc -ref equil5_2.restrt\n" >> input.sh
printf "pmemd.cuda -O -i ../in/equil5.in -o equil5_4.out -p ./protein.parm7 -c equil5_3.restrt -r equil5_4.restrt -x equil5.nc -ref equil5_3.restrt\n" >> input.sh
printf "pmemd.cuda -O -i ../in/equil5.in -o equil5_5.out -p ./protein.parm7 -c equil5_4.restrt -r equil5_5.restrt -x equil5.nc -ref equil5_4.restrt\n" >> input.sh
printf "pmemd.cuda -O -i ../in/md.in -o md_${P}_${com_n}.out -p ./protein.parm7 -c equil5_5.restrt -r md.restrt -x md.nc\n" >> ./input.sh
Since printf doesn't automatically add a newline at the end, you could even split up the individual input.sh lines for better readability:
printf " pmemd.cuda -O" >> input.sh
printf " -i ../in/equil.in" >> input.sh
printf " -o equil.out" >> input.sh
printf " -p ./protein.parm7" >> input.sh
printf " -c ./minim.restrt" >> input.sh
printf " -r equil.restrt" >> input.sh
printf " -x equil.nc" >> input.sh
printf " -ref ./minim.restrt" >> input.sh
printf "\n" >> input.sh
Now, if something is changed, you can quickly find the parameter and modify it.
You could also combine that with Chris Jester-Young's suggestion of using a for loop to increment your values:
prev=minim
for i in '' $(seq 2 4)
do
cur=equil$i
printf ' pmemd.cuda -O' >> input.sh
printf ' -i ../in/%s.in' "$cur" >> input.sh
printf ' -o %s.out' "$cur" >> input.sh
printf ' -p ./protein.parm7' >> input.sh
printf ' -c %s.restrt' "$prev" >> input.sh
printf ' -r %s.restrt' "$cur" >> input.sh
printf ' -x %s.nc' "$cur" >> input.sh
printf ' -ref %s.restrt\n' "$prev" >> input.sh
prev=$cur
done
Note how the spacing on each line makes it easier to read and understand the command.
Finally, if you don't mind writing out the entire file, you can use a here document.
The << starts the here document, and EOF is the delimiter that tells the here document where it ends. It could be anything; for example, it could be INPUT_SH. If the delimiter is put in single quotes, the shell won't interpolate variables inside the here document.
cat <<'EOF' > input.sh
#setenv CUDA_VISIBLE_DEVICES "1"
mpirun -np 8 pmemd.MPI -O -i ../in/minim.in -o minim.out -p ./protein.parm7 -c ./protein.inpcrd -r minim.restrt
pmemd.cuda -O -i ../in/equil.in -o equil.out -p ./protein.parm7 -c ./minim.restrt -r equil.restrt -x equil.nc -ref ./minim.restrt
pmemd.cuda -O -i ../in/equil2.in -o equil2.out -p ./protein.parm7 -c equil.restrt -r equil2.restrt -x equil2.nc -ref equil.restrt
pmemd.cuda -O -i ../in/equil3.in -o equil3.out -p ./protein.parm7 -c equil2.restrt -r equil3.restrt -x equil3.nc -ref equil2.restrt
pmemd.cuda -O -i ../in/equil4.in -o equil4.out -p ./protein.parm7 -c equil3.restrt -r equil4.restrt -x equil4.nc -ref equil3.restrt
pmemd.cuda -O -i ../in/equil5.in -o equil5_1.out -p ./protein.parm7 -c equil4.restrt -r equil5_1.restrt -x equil5.nc -ref equil4.restrt
pmemd.cuda -O -i ../in/equil5.in -o equil5_2.out -p ./protein.parm7 -c equil5_1.restrt -r equil5_2.restrt -x equil5.nc -ref equil5_1.restrt
pmemd.cuda -O -i ../in/equil5.in -o equil5_3.out -p ./protein.parm7 -c equil5_2.restrt -r equil5_3.restrt -x equil5.nc -ref equil5_2.restrt
pmemd.cuda -O -i ../in/equil5.in -o equil5_4.out -p ./protein.parm7 -c equil5_3.restrt -r equil5_4.restrt -x equil5.nc -ref equil5_3.restrt
pmemd.cuda -O -i ../in/equil5.in -o equil5_5.out -p ./protein.parm7 -c equil5_4.restrt -r equil5_5.restrt -x equil5.nc -ref equil5_4.restrt
pmemd.cuda -O -i ../in/md.in -o md_${P}_${com_n}.out -p ./protein.parm7 -c equil5_5.restrt -r md.restrt -x md.nc
EOF
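One caveat with the quoted 'EOF' above: quoting the delimiter means ${P} and ${com_n} are written into input.sh literally instead of being expanded as they were in the printf version. If they should be expanded by the enclosing script, leave the delimiter unquoted, for example appending just the last line:
cat <<EOF >> input.sh
pmemd.cuda -O -i ../in/md.in -o md_${P}_${com_n}.out -p ./protein.parm7 -c equil5_5.restrt -r md.restrt -x md.nc
EOF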