I want to use a python3 script to generate a 'self-extracting' bash script which has an embedded .tar.gz archive as payload.
In bash I would simply do something like this:
printf "#!/bin/bash
PAYLOAD_LINE=\`awk '/^__PAYLOAD_BELOW__/ {print NR + 1; exit 0; }' \$0\`
tail -n+\$PAYLOAD_LINE \$0 | tar -xvz
#script that does something with the unpacked payload
exit 0
__PAYLOAD_BELOW__\n" > "tmpfile"
cat "tmpfile" "payload.tar.gz" > "myscript.sh"
What I tried in python is this:
with open('myscript.sh', 'wb') as script:
    for line in open('payload.tar.gz', 'rb'):
        script.write(line)
I can untar the resulting file manually with cat myscript.sh | tar -xvz
To prepend the bash script part (PAYLOAD_LINE= .... __PAYLOAD_BELOW__\n), is there a more elegant way than opening myscript.sh a second time in text rather than binary ('wb') mode?
Figured it out.
I just have to convert the string to bytes before writing it to the file, so I don't have to open the file twice.
# Note: no backslash escapes are needed inside a Python string (they would
# end up literally in the generated script), and the header must end with
# the __PAYLOAD_BELOW__ marker so awk can find where the payload starts.
script_header = """#!/bin/bash
PAYLOAD_LINE=`awk '/^__PAYLOAD_BELOW__/ {print NR + 1; exit 0; }' $0`
tail -n+$PAYLOAD_LINE $0 | tar -xvz
exit 0
__PAYLOAD_BELOW__"""
with open('myscript.sh', 'wb') as script:
    script.write(bytes(script_header + '\n', 'utf-8'))
    with open('payload.tar.gz', 'rb') as payload:
        for line in payload:
            script.write(line)
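A quick way to smoke-test the result (the payload contents and the generator's file name, build_script.py, are assumptions, not part of the question):
tar -czf payload.tar.gz some_directory/
python3 build_script.py
chmod +x myscript.sh
./myscript.sh          # should list and extract the payload into the cwd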
Let's say I have a file with patterns to match into another file:
file_names.txt
pfg022G
pfg022T
pfg068T
pfg130T
pfg181G
pfg181T
pfg424G
pfg424T
I would like to use file_names.txt and a sed command to edit example.conf:
example.conf
{
"ExomeGermlineSingleSample.sample_and_unmapped_bams": {
"flowcell_unmapped_bams": ["/groups/cgsd/alexandre/gatk-workflows/src/ubam/pfg022G.unmapped.bam"],
"unmapped_bam_suffix": ".unmapped.bam",
"sample_name": "pfg022G",
"base_file_name": "pfg022G.GRCh38DH.target",
"final_gvcf_base_name": "pfg022G.GRCh38DH.target"
},
The sed command would replace pfg022G in example.conf with pfg022T, which is the next item in file_names.txt (sed 's/pfg022G/pfg022T/'). At this point example.conf should look like this:
{
"ExomeGermlineSingleSample.sample_and_unmapped_bams": {
"flowcell_unmapped_bams": ["/groups/cgsd/alexandre/gatk-workflows/src/ubam/pfg022T.unmapped.bam"],
"unmapped_bam_suffix": ".unmapped.bam",
"sample_name": "pfg022T",
"base_file_name": "pfg022T.GRCh38DH.target",
"final_gvcf_base_name": "pfg022T.GRCh38DH.target"
},
After 15 minutes the substitution should be pfg022T to pfg068T and so on until all the items in file_names.txt are exhausted.
The following crontab would run your script every 15 minutes:
# Example of job definition:
# .---------------- minute (0 - 59)
# | .------------- hour (0 - 23)
# | | .---------- day of month (1 - 31)
# | | | .------- month (1 - 12) OR jan,feb,mar,apr ...
# | | | | .---- day of week (0 - 6) (Sunday=0 or 7)
# | | | | |
# * * * * * command to be executed
*/15 * * * * /path/to/script
With the script reading:
#!/usr/bin/env sh
file1="file_names.txt"
file2="example.conf"
sed -i -e "$(awk '(NR>1){print "s/"p"/"$1"/g"}{p=$1}' "$file1" | tac)" "$file2"
The trick we use here is reverse substitution. The file example.conf always contains exactly one string that also appears in file_names.txt, so if you apply the substitutions from the last pair to the first, only a single substitution can take place.
We use awk to build a sed script and tac to reverse it so that there is only a single match:
$ awk '(NR>1){print "s/"p"/"$1"/g"}{p=$1}' file_names.txt
s/pfg022G/pfg022T/g
s/pfg022T/pfg068T/g
s/pfg068T/pfg130T/g
s/pfg130T/pfg181G/g
s/pfg181G/pfg181T/g
s/pfg181T/pfg424G/g
s/pfg424G/pfg424T/g
If we applied the script in its original order, we would always end up with pfg424T (the last entry): suppose the file currently contains the third entry, pfg068T; sed would substitute it and then keep matching each later rule in turn. When we reverse the order (using tac), sed can only find a single match.
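To see the cascade concretely, here is a minimal demo with echo (not part of the original answer). Applied in forward order the rules keep firing; reversed, at most one fires:
$ echo pfg068T | sed 's/pfg068T/pfg130T/g; s/pfg130T/pfg181G/g; s/pfg181G/pfg181T/g; s/pfg181T/pfg424G/g; s/pfg424G/pfg424T/g'
pfg424T
$ echo pfg068T | sed 's/pfg424G/pfg424T/g; s/pfg181T/pfg424G/g; s/pfg181G/pfg181T/g; s/pfg130T/pfg181G/g; s/pfg068T/pfg130T/g'
pfg130T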
Here is the logic for how I think this would work:
Create a cron job, or if your server shuts down periodically an anacron job, to run a bash script every 15 minutes.
In the bash script, use an if statement with grep to test which line of file_names.txt currently exists in example.conf, and then move on to the next line in file_names.txt. If you are at the last string in file_names.txt, the script should stop with the exit command.
Then run the sed command to replace the string, as sketched below.
If you have to reload a service for the amended configuration to take effect, add that step afterwards as well.
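A minimal sketch of that logic (file names taken from the question; the service reload line is a placeholder):
#!/bin/bash
# Walk file_names.txt, find which name is currently in example.conf,
# and substitute it with the next one; stop at the last name.
mapfile -t names < file_names.txt
last=$(( ${#names[@]} - 1 ))
for i in "${!names[@]}"; do
    if grep -q "${names[$i]}" example.conf; then
        [ "$i" -eq "$last" ] && exit 0           # last string: stop here
        sed -i "s/${names[$i]}/${names[$i+1]}/g" example.conf
        # systemctl reload your_service          # placeholder, if a reload is needed
        exit 0
    fi
done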
It might be easier to create a daemon/background process as opposed to a periodic cron job.
while read str;
do
sleep 900;
sed -ri "s#(^\s*\"flowcell_unmapped_bams.*gatk-workflows/src/ubam/)(.*)(\.unmapped\.bam\"\],.*$)#\1$str\3#;s/(^\s*\"sample_name.*: \")(.*)(\",.*$)/\1$str\3/;s/(^\s*\"(base_file_name|final_gvcf_base_name).*: \")(.*)(\.GRCh38DH.*$)/\1$str\4/" example.conf;
done < file_names.txt &
Read the contents of file_names.txt line by line in a while loop, storing each line in the variable str. Sleep 900 seconds, then use the str variable in the sed commands. In all of them, extended regular expressions are enabled with -r (or -E), and each pattern splits the matched line into a prefix, the old name, and a suffix; the substitution rebuilds the line as the prefix, the variable str, and the suffix. The & at the end sends the whole loop to the background.
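If the loop should survive the end of your shell session, put it in a small script (the name rotate_names.sh is hypothetical) and detach it with nohup:
nohup ./rotate_names.sh >/dev/null 2>&1 &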
I would perhaps just generate all the files in advance in a queue directory, and have the cron job pick up the next one on each invocation.
awk 'NR==FNR { a[++n] = $0; next }
     { file = $1 ".conf"
       for (i = 1; i <= n; i++) {
           l = a[i]; sub(/\{\{name\}\}/, $0, l)
           print l > file
       }
       close(file)
     }' - file_names.txt <<\____
{
"ExomeGermlineSingleSample.sample_and_unmapped_bams": {
"flowcell_unmapped_bams": ["/groups/cgsd/alexandre/gatk-workflows/src/ubam/{{name}}.unmapped.bam"],
"unmapped_bam_suffix": ".unmapped.bam",
"sample_name": "{{name}}",
"base_file_name": "{{name}}.GRCh38DH.target",
"final_gvcf_base_name": "{{name}}.GRCh38DH.target"
},
____
Running this on your sample file_names.txt creates the following files:
pfg022G.conf pfg068T.conf pfg181G.conf pfg424G.conf
pfg022T.conf pfg130T.conf pfg181T.conf pfg424T.conf
with contents like you would expect; here's pfg022G.conf:
{
"ExomeGermlineSingleSample.sample_and_unmapped_bams": {
"flowcell_unmapped_bams": ["/groups/cgsd/alexandre/gatk-workflows/src/ubam/pfg022G.unmapped.bam"],
"unmapped_bam_suffix": ".unmapped.bam",
"sample_name": "pfg022G",
"base_file_name": "pfg022G.GRCh38DH.target",
"final_gvcf_base_name": "pfg022G.GRCh38DH.target"
},
Now, your cron job just needs to move one of these to example.conf and process it. When the directory with the files is empty, you are done.
#!/bin/sh
for f in confdir/*.conf; do
if [ -e "$f" ]; then
# Safeguard against clobbering previous run
if [ -e ./example.conf ]; then
echo "$0: example.conf is still there -- skipping this run" >&2
exit 63
fi
mv "$f" ./example.conf
exec your_main_script_or_whatever
# Should never fall through to here, but whatever
break
else
echo "$0: directory empty -- aborting" >&2
fi
done
The safeguard avoids a race condition: if the previous cron job is still running, or failed for some reason, we don't want to clobber its input file. This requires your_main_script_or_whatever to remove example.conf when it completes. If you don't care about this, you can simply remove the safeguard condition from the above script.
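For completeness, a sketch of the consumer side; your_main_script_or_whatever is the placeholder from the script above, and process_conf is a hypothetical stand-in for the real work:
#!/bin/sh
# Process the staged configuration, then remove it so the next
# cron invocation may move the following file into place.
process_conf ./example.conf || exit 1
rm -f ./example.conf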
I have a bash script that parses log files - aggregating data in an AWK array - that takes part of the file path as a parameter. It runs fine, I can run multiple instances in the background manually. The trouble is I can't figure out how to avoid invoking the script manually for each parameter in my list.
Depending on where I've put the & it either runs the instances serially or tries to run all the jobs at once (I don't want to see a load average of 9999 again).
script.sh param1 &
script.sh param2 & ... #works fine
script.sh < params.txt & ... #runs serially
Placing & at various places within the script had some undesirable outcomes.
hub=$1
while read date; do
zgrep ^1 /logarchive/http/${hub}pr*/$date*.gz|\
awk -F'[ ,]' '{print$34,$(NF-6),$6,$(NF-7)}'|\
awk 'NR>1{bytesDown[$1 " " $2] += $3; bytesUp[$1 " " $2] += $4} END {for (i in bytesDown) print i, bytesDown[i], bytesUp[i]}'\
> ${hub}.$date.txt
done < dates.txt
I'd like to run an instance in the background for each parameter in a file.
Use export -f to export a function, and then you can call it in parallel from shells started by xargs -P; in the example below, numjobs indicates how many dates you want to run concurrently.
myfunc() {
date=$1
zgrep ^1 /logarchive/http/"${hub}"pr*/"$date"*.gz | \
awk -F'[ ,]' '{print$34,$(NF-6),$6,$(NF-7)}' | \
awk '
NR>1{
bytesDown[$1 " " $2] += $3
bytesUp[$1 " " $2] += $4
}
END {
for (i in bytesDown) print i, bytesDown[i], bytesUp[i]
}
' >"${hub}.$date.txt"
}
export -f myfunc
hub=$1
export hub   # myfunc reads $hub from the calling environment
numjobs=8
xargs -P "$numjobs" -n 1 bash -c 'myfunc "$@"' _ < dates.txt
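The script is then invoked as before, e.g. (the hub argument here is hypothetical):
./script.sh hub1   # processes all dates in dates.txt, at most 8 concurrently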
I am fairly new to bash scripting... I have an issue with a cron job where I get too many emails when the "ntpq: read: Connection refused" error comes up. I want to create a conditional so that when this error shows up, the email is NOT sent.
However, I can't seem to parse the output from "ntpq -nc peers". I did try to redirect the output of the cron job to a test.txt file and then create another cron job that parses that file, but I feel like there is a better solution.
Thanks for your help!
Here is my code for the cronjob
#!/bin/bash
limit=10101010101010101010000 # Set your limit in milliseconds here
offsets=$(/usr/sbin/ntpq -nc peers | /usr/bin/tail -n +3 | awk 'BEGIN { FS = " " } ; { print $9 }' | /usr/bin/tr -d '-')
for offset in ${offsets}; do
if echo $offset $limit | awk '{exit $1>$2?0:1}'
then
echo "NTPD offset $offset > $limit. Please investigate"
exit 1
fi
done
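One way to do this without a second cron job (a sketch, assuming the error text arrives on ntpq's stderr): capture stdout and stderr together and bail out silently when the connection is refused, so cron has no output to mail.
#!/bin/bash
limit=10101010101010101010000 # Set your limit in milliseconds here

# Capture output and errors together; exit quietly on "Connection refused"
# so that cron produces no mail for that case.
output=$(/usr/sbin/ntpq -nc peers 2>&1)
if echo "$output" | /bin/grep -q 'Connection refused'; then
    exit 0
fi

offsets=$(echo "$output" | /usr/bin/tail -n +3 | awk '{ print $9 }' | /usr/bin/tr -d '-')
for offset in ${offsets}; do
    if echo $offset $limit | awk '{exit $1>$2?0:1}'
    then
        echo "NTPD offset $offset > $limit. Please investigate"
        exit 1
    fi
done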
I'm having an issue when I try to port my bash script to Nagios. The script works fine when I run it in a console, but when I run it from Nagios I get the message "(null)". In the Nagios debug log I can see that it parses the script correctly, yet it returns that error message.
I'm not very good at scripting, so I guess I'll need some help.
The objective of the script is to check the *.ear versions on some servers, md5 them and compare the outputs to see whether the versions match.
To do that, I have a JSON on these servers that prints the name of each *.ear and its md5.
So the first part of the script gets that info from the JSON with curl and stores just the md5 sum in a .tempfile; then it compares both temp files, and if they match I get the $STATE_OK message. If they don't, it creates a .datetmp file with the date (the objective of this is to print a message after 48 hours of inconsistency). Then I take the difference between the .datetmp timestamp and the threshold: if the result is less than 48 hours it prints $STATE_WARNING, and if it is more than 48 hours it prints $STATE_CRITICAL.
The syntax of the script is: $ sh script.sh nameoftheear.ear server1 server2
Thanks in advance
#/bin/bash
#Variables For Nagios
cont=$1
bas1=$2
bas2=$3
## Here you set the servers hostname
svr1= curl -s "http://$bas1.domain.com:7877/apps.json" | grep -Po '"EAR File":.*? [^\\]",' | grep $cont | awk '{ print $5 }' > .$cont-tmpsvr1
svr2= curl -s "http://$bas2.domain.com:7877/apps.json" | grep -Po '"EAR File":.*? [^\\]",' | grep $cont | awk '{ print $5 }' > .$cont-tmpsvr2
file1=.$cont-tmpsvr1
file2=.$cont-tmpsvr2
md51=$(head -n 1 .$cont-tmpsvr1)
md52=$(head -n 1 .$cont-tmpsvr2)
datenow=$(date +%s)
#Error Msg
ERR_WAR="Not updated $bas1: $cont $md51 --- $bas2: $cont $md52 "
ERR_CRI="48 hs un-updated $bas1: $cont $md51 --- $bas2: $cont $md52 "
OK_MSG="Is up to date $bas1: $cont $md51 --- $bas2: $cont $md52 "
STATE_OK=0
STATE_WARNING=1
STATE_CRITICAL=2
##Matching md5 Files
if cmp -s "$file1" "$file2"
then
echo $STATE_OK
echo $OK_MSG
# I do the rm to delete the date tmp file so i can get the $STATE_OK or $STATE_WARNING
rm .$cont-datetmp
exit 0
elif
echo $datenow >> .$cont-datetmp
#Vars to set modification date
datetmp=$(head -n 1 .$cont-datetmp)
diffdate=$(( ($datenow - $datetmp) /60 ))
#This var is to set the time of the critical ERR
days=$((48*60))
[ $diffdate -lt $days ]
then
echo $STATE_WARNING
echo $ERR_WAR
exit 1
else
echo $STATE_CRITICAL
echo $ERR_CRI
exit 2
fi
I am guessing some kind of permission problem - more specifically, I don't think the nagios user can write to its own home directory. You either fix those permissions or write to a file in /tmp (and consider using mktemp?).
...but ideally you'd skip writing all those files; as far as I can see, all of those comparisons could be kept in memory.
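A minimal in-memory sketch of that idea (the curl/grep pipeline is copied from the question; only the OK/WARNING paths are shown, since the 48-hour escalation still needs one small timestamp file or another persistent store):
#!/bin/bash
cont=$1
bas1=$2
bas2=$3

# Fetch the md5 of the given EAR from each server, keeping it in a variable
md51=$(curl -s "http://$bas1.domain.com:7877/apps.json" | grep -Po '"EAR File":.*? [^\\]",' | grep "$cont" | awk '{ print $5 }')
md52=$(curl -s "http://$bas2.domain.com:7877/apps.json" | grep -Po '"EAR File":.*? [^\\]",' | grep "$cont" | awk '{ print $5 }')

if [ "$md51" = "$md52" ]; then
    echo "Is up to date $bas1: $cont $md51 --- $bas2: $cont $md52"
    exit 0   # STATE_OK
else
    echo "Not updated $bas1: $cont $md51 --- $bas2: $cont $md52"
    exit 1   # STATE_WARNING
fi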
UPDATE
Looked at your script again - I see some obvious errors you can look into:
You are printing out the exit value before you print the message.
You print the exit value rather than exit with the exit value.
...so this:
echo $STATE_WARNING
echo $ERR_WAR
exit 1
Should rather be:
echo $ERR_WAR
exit $STATE_WARNING
Also, I am wondering if this is really the script or if you missed something when pasting. An 'if' seems to be missing, and there is a superfluous line break in your last piece of code. It should rather be:
if [ $diffdate -lt $days ]
then
...
else
...
fi
I'm currently working on a maths project and just ran into a bit of a brick wall with programming in bash.
Currently I have a directory containing 800 text files, and what I want to do is run a loop to cat the first 80 files (_01 through to _80) into a new file and save it elsewhere, then the next 80 (_81 to _160), and so on.
All the files in the directory are named like so: ath_01, ath_02, ath_03 etc.
Can anyone help?
So far I have:
#!/bin/bash
for file in /dir/*
do
echo ${file}
done
which simply lists my files. I know I need to use cat file1 file2 > newfile.txt somehow, but the numbered suffixes (_01, _02 etc.) are confusing me.
Would it help if I changed the file names to use something other than an underscore, like ath.01 etc.?
Cheers,
Since you know ahead of time how many files you have and how they are numbered, it may be easier to "unroll the loop", so to speak, and use copy-and-paste and a little hand-tweaking to write a script that uses brace expansion.
#!/bin/bash
cat ath_{001..080} > file1.txt
cat ath_{081..160} > file2.txt
cat ath_{161..240} > file3.txt
cat ath_{241..320} > file4.txt
cat ath_{321..400} > file5.txt
cat ath_{401..480} > file6.txt
cat ath_{481..560} > file7.txt
cat ath_{561..640} > file8.txt
cat ath_{641..720} > file9.txt
cat ath_{721..800} > file10.txt
Or else, use nested for-loops and the seq command
N=800
B=80
for n in $( seq 1 $B $N ); do
for i in $( seq $n $((n+B - 1)) ); do
cat ath_$i
done > file$((n/B + 1)).txt
done
The outer loop iterates n through 1, 81, 161, etc. The inner loop iterates i over 1 through 80, then 81 through 160, etc. The body of the inner loop just dumps the contents of the ith file to standard output, but the aggregated output of the inner loop is stored in file1.txt, then file2.txt, etc.
You could try something like this:
cat "$file" >> "concat_$(( (10#${file#/dir/ath_} - 1) / 80 ))"
with ${file#/dir/ath_} you remove the prefix /dir/ath_ from the filename
with $(( (10#suffix - 1) / 80 )) you divide the zero-based suffix by 80 (integer division); the 10# prefix forces base 10, so suffixes with leading zeros such as 08 are not misread as octal
Also change the loop to
for file in /dir/ath_*
So you only get the files you need
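Put together, the whole loop would look like this (a sketch assuming the /dir prefix from your question):
#!/bin/bash
for file in /dir/ath_*
do
    # files 1-80 go to concat_0, 81-160 to concat_1, and so on
    cat "$file" >> "concat_$(( (10#${file#/dir/ath_} - 1) / 80 ))"
done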
If you want groups of 80 files, you'd do best to ensure that the names are sortable; that's why leading zeroes were often used. Assuming that you only have one underscore in the file names, and no newlines in the names, then:
SOURCE="/path/to/dir"
TARGET="/path/to/other/directory"
(
cd $SOURCE || exit 1
ls |
sort -t _ -k2,2n |
awk -v target="$TARGET" \
'{  file[n++] = $1
    if (n >= 80)
    {
        printf "cat"
        for (i = 0; i < 80; i++)
            printf(" %s", file[i])
        printf(" >%s/%s.%.2d\n", target, "newfile", ++number)
        n = 0
    }
 }
 END {
    if (n > 0)
    {
        printf "cat"
        for (i = 0; i < n; i++)
            printf(" %s", file[i])
        printf(" >%s/%s.%.2d\n", target, "newfile", ++number)
    }
 }' |
sh -x
)
The two directories are specified (where the files are and where the summaries should go); the command changes directory to the source directory (where the 800 files are). It lists the names (you could specify a glob pattern if you needed to) and sorts them numerically. The output is fed into awk which generates a shell script on the fly. It collects 80 names at a time, then generates a cat command that will copy those files to a single destination file such as "newfile.01"; tweak the printf() command to suit your own naming/numbering conventions. The shell commands are then passed to a shell for execution.
While testing, replace the sh -x with nothing, or sh -vn or something similar. Only add an active shell when you're sure it will do what you want. Remember, the shell script is in the source directory as it is running.
Superficially, the xargs command would be nice to use; the difficulty is coordinating the output file number. There might be a way to do that with the -n 80 option to group 80 files at a time and some fancy way to generate the invocation number, but I'm not aware of it.
Another option is to use xargs -n to execute a shell script that can deduce the correct output file number by listing what's already in the target directory. This would be cleaner in many ways:
SOURCE="/path/to/dir"
TARGET="/path/to/other/directory"
(
cd $SOURCE || exit 1
ls |
sort -t _ -k2,2n |
xargs -n 80 cpfiles "$TARGET"
)
Where cpfiles looks like:
#!/bin/sh
TARGET="$1"
shift
if [ $# -gt 0 ]
then
    old=$(ls -r "$TARGET"/newfile.?? 2>/dev/null | sed -n -e 's/.*newfile\.//p; 1q')
    new=$(printf "%.2d" $((10#${old:-0} + 1)))
    cat "$@" > "$TARGET/newfile.$new"
fi
The test for zero arguments avoids trouble with xargs executing the command once with zero arguments. On the whole, I prefer this solution to the one using awk.
Here's a macro for #chepner's first solution, using GNU Make as the templating language:
SHELL := /bin/bash
N = 800
B = 80
fileNums = $(shell seq 1 $$((${N}/${B})) )
files = ${fileNums:%=file%.txt}
all: ${files}
file%.txt : start = $(shell echo $$(( ($*-1)*${B}+1 )) )
file%.txt : end = $(shell echo $$(( $* * ${B} )) )
file%.txt:
	cat ath_{${start}..${end}} > $@
To use:
$ make -n all
cat ath_{1..80} > file1.txt
cat ath_{81..160} > file2.txt
cat ath_{161..240} > file3.txt
cat ath_{241..320} > file4.txt
cat ath_{321..400} > file5.txt
cat ath_{401..480} > file6.txt
cat ath_{481..560} > file7.txt
cat ath_{561..640} > file8.txt
cat ath_{641..720} > file9.txt
cat ath_{721..800} > file10.txt