Partial creation of a csv file when run from crontab - bash

I have a problem automating the generation of a csv file. The bash code that produces the csv runs in parallel on 3 cores to reduce the run time; several csv files are produced first and then combined into a single csv file. The core of the code is this loop:
...
waitevery=3
for j in `seq 1 24`; do
    if ((j==1)); then
        printf '%s\n' A B C D E | paste -sd ',' >> code${namefile}01${rr}.csv
    fi
    j=$(printf "%02d" $j)
    ../src/thunderstorm --mask-file=mask.grib const_${namefile}$j${rr}.grib surf_${namefile}$j${rr}.grib ua_${namefile}$j${rr}.grib hl_const.grib out &
    if ! ((c % waitevery)); then
        wait
    fi
    c=$((c+1))
done
...
where ../src/thunderstorm is a Fortran (.F90) program that produces the second and subsequent files.
If I run this code manually it produces the right csv file, but when it runs from a scheduled crontab entry it generates a csv file containing only the header A B C D E.
Any suggestions?
Thanks!

cron runs your script in an environment that often does not match your expectations.
Check that the PATH is correct and that the script is called from the correct location: ../src is obviously relative, but relative to what?
I find cron scripts to be much more reliable when they use full paths for input, output, and programs.
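For example, a crontab entry along these lines (the paths and script name here are only illustrative) makes both the PATH and the working directory explicit:
PATH=/usr/local/bin:/usr/bin:/bin
# m h dom mon dow   command
0 * * * * cd /home/myhome/project/run && /bin/bash ./generate_csv.sh >> /home/myhome/project/run/cron.log 2>&1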

As @umläute points out, cron runs your scripts but does not run the typical initializations that happen when you open a terminal session. This means you should make no assumptions about your environment.
For scripts that may be invoked from the shell and may be invoked from cron I usually add at the beginning something like this:
BIN_DIR=/home/myhome/bin
PATH=$PATH:$BIN_DIR
Also, make sure you do not use relative paths to executables like ../src/thunderstorm. The working directory of a script invoked by cron may not be what you think. You can use $BIN_DIR/../src/thunderstorm instead. If you want to save typing, add the relevant directories to the PATH.
The same logic goes for all other shell variables.
Doing a good initialization at the beginning of your script will allow you to run it from the shell for testing (or manual execution) and then run it as a cron job too.
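A minimal sketch of such a preamble, assuming a layout under /home/myhome (the directory names are placeholders, not your actual setup):
#!/bin/bash
# Cron-safe preamble: absolute paths only, no reliance on the login environment.
BIN_DIR=/home/myhome/bin
SRC_DIR=/home/myhome/src
WORK_DIR=/home/myhome/project/run
export PATH=$PATH:$BIN_DIR

cd "$WORK_DIR" || exit 1    # fail fast if the expected working directory is missing

"$SRC_DIR/thunderstorm" --mask-file="$WORK_DIR/mask.grib" ...    # and so on, with full paths everywhere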

Related

Define an increment variable in a shell script that increments on every cron job

I have searched the forum and couldn't find an answer. Can we define a variable that increments on every cron job run?
For example:
I have a script that runs every 5 minutes, so I need a variable that increments on each cron run.
Say the job has been running every 5 minutes for 30 minutes, so the script has executed 6 times; my counter variable should then be 6.
I'm expecting this in bash/shell.
Apologies if this is a duplicate question.
I tried:
((count+1))
You can do it this way:
create two scripts: counter.sh and increment_counter.sh
add execution of increment_counter.sh in your cron job
add . /path/to/counter.sh into /etc/profile or /etc/bash.bashrc or wherever you need
counter.sh
declare -i COUNTER
COUNTER=1
export COUNTER
increment_counter.sh
#!/bin/bash
# counter.sh declares COUNTER as an integer, so every appended line below
# re-evaluates COUNTER=$COUNTER+1 arithmetically when counter.sh is sourced.
echo "COUNTER=\$COUNTER+1" >> /path/to/counter.sh
The shell that you've run the command in has exited; any variables it has set have gone away. You can't use variables for this purpose.
What you need is some sort of permanent data store. This could be a database, or a remote network service, or a variety of things, but by far the simplest solution is to store the value in a file somewhere on disk. Read the file in when the script starts and write out the incremented value afterwards.
You should think about what to do if the file is missing and what happens if multiple copies of the script are run at the same time, and decide whether those are situations you care about at all. If they are, you'll need to add appropriate error handling and locking, respectively, in your script.
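For example, here is a minimal sketch of the file-based approach, using flock for the locking; the file locations are arbitrary choices:
#!/bin/bash
# Sketch of a file-backed counter; COUNTER_FILE and LOCK_FILE are arbitrary paths.
COUNTER_FILE=/var/tmp/myjob.counter
LOCK_FILE=/var/tmp/myjob.lock

(
    flock -x 9                                    # serialize concurrent cron runs
    count=0
    [ -f "$COUNTER_FILE" ] && count=$(cat "$COUNTER_FILE")
    count=$((count + 1))
    echo "$count" > "$COUNTER_FILE"
    echo "this is run number $count"
) 9>"$LOCK_FILE"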
Wouldn't this be a better solution?
...define a file under /tmp, such that a command like:
echo -n "." >> $MyCounterFilename
tracks the number of times something is invoked (note the append: each run adds one dot). In my particular application:
#!/bin/bash
xterm [ Options ] -T "$(cat $MyCounterFilename | wc -c)" &
echo -n "." >> $MyCounterFilename
I had to modify the way xterm is invoked for my purposes, and I found that with many of these open concurrently, one wastes less time when each window is labelled with its number (instead of cycling through them with alt+tab and inspecting each one by eye).
NOTE: /etc/profile, or better ~/.profile or ~/.bash_profile, only needs an environment variable defined that contains the full path to your counter file.
Anyway, if you don't like the idea above, you could experiment to determine a) when /etc/profile is first executed after the machine is powered on and the system boots, and b) whether /etc/profile is executed again and how many times (each time an xterm is opened, for instance), and then run the same sort of tests for the less general per-user files.

Ignore but remember malformed data: iterate over and process a folder with a bash script + .jar

There is a folder full of files, each of which contains some data that I need to convert into a single output file.
I've built a conversion script; it can be run like so:
java -jar tableGenerator.jar -inputfile more-adzuna-jobs-type-9.rdf -skillNames skillNames.ttl -countries countries_europe.rdf -outputcsv out.csv
The problem is that some of the files contain characters that my .jar regards as invalid. Is there a way to create a bash script that runs this command over a folder full of these files (many hundreds) and, for each one that generates an error:
ignores it, i.e. does not let it halt the process
remembers it, so that it can be dealt with appropriately later
It seems like this would be possible, but my bash-fu is quite weak. What would be a logical way to accomplish this?
If your Java program in fact exits with an error status then it should be fairly easy to write a bash script that processes all the files in a folder and tracks which had errors. I emphasize that the Java program must exit with an error (non-zero) status for this to be easy. For example, it should terminate execution by invoking System.exit(1).
If your program does report its success or failure to the system via its exit status, then you might do something like this:
#!/bin/bash
# The name of the directory to process is expected as the first argument.
if [ $# -lt 1 ]; then
    echo "usage: $0 directory"
    exit 1
fi
# The first argument to the script is $1
if [ -e failures.txt ]; then
    rm failures.txt
fi
touch failures.txt
for f in "$1"/*; do
    if ! java -jar /path/to/tableGenerator.jar \
            -inputfile "$f" \
            -skillNames /path/to/skillNames.ttl \
            -countries /path/to/countries_europe.rdf \
            -outputcsv "$f.out.csv"
    then
        echo "$f" >> failures.txt
    fi
done
That iterates over all the files in the directory named by the first script argument, assigning each path in turn to the shell variable $f, and runs your Java program on each one, passing the path as the argument following -inputfile. Whenever the program exits with a non-zero status, the script appends the name of the failing file to failures.txt in the script's current working directory (unrelated to the data directory passed to it) and continues.
Note that it does not run the command simultaneously on all the files, but instead iteratively. I am uncertain whether that was a key component of your request. Inasmuch as the system you run this on is unlikely to have a separate core it can dedicate to each of hundreds of instances of your program, and inasmuch as the storage medium on which the files reside probably has only one data channel, you cannot effectively run the command hundreds of times simultaneously, anyway.
If you do want to run multiple jobs in parallel then bash has ways to do that, but I recommend getting the serial script working first. If processing the files serially is not good enough then you can explore ways to achieve some parallelism. However, to the extent that Java VM startup time may present an issue with starting up hundreds of JVMs, you might be better off building multiple-file handling directly into your Java program, so that you can process all the files in the same VM.
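For reference, once the serial script works, a rough sketch of limited parallelism in plain bash might look like this; MAXJOBS and the batch-wise wait are my own choices, and the concurrent appends to failures.txt assume each write is short enough to land intact:
#!/bin/bash
# Sketch: process a directory with up to MAXJOBS concurrent java invocations.
MAXJOBS=4
count=0
: > failures.txt
for f in "$1"/*; do
    {
        java -jar /path/to/tableGenerator.jar \
            -inputfile "$f" \
            -skillNames /path/to/skillNames.ttl \
            -countries /path/to/countries_europe.rdf \
            -outputcsv "$f.out.csv" || echo "$f" >> failures.txt
    } &
    count=$((count + 1))
    if ! ((count % MAXJOBS)); then
        wait    # let the current batch finish before starting the next
    fi
done
wait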

Is there a way to use qsub and source together?

I wrote a shell script to process a bunch of files separately, like this
#!/bin/bash
#set parameters (I have many)
...
#find the files and do iteratively
for File in $FileList; do
source MyProcessing.sh
done
MyProcessing.sh is the sourced script, and it uses variables and functions from the main script.
Now I'd like to move my shell script to a cluster and use qsub in the iteration. I tried
#find the files and do iteratively
for File in $FileList; do
echo "source MyProcessing.sh" | qsub
done
But it does not work this way. Can anyone help? Thank you in advance.
Variables and functions are local to a script. This means source MyProcessing.sh will work but bash MyProcessing.sh won't: the second form starts a new shell in a separate Unix process, and Unix processes are isolated from one another.
The same is true for qsub, since you invoke it via a pipe: bash creates a new qsub process and sets its stdin to the text source MyProcessing.sh. Only those 23 bytes are passed to qsub, nothing else.
If you want this to work, then you will have to write a new script that is 100% independent of the main script (i.e. it must not use any variables or functions). Then you must read the documentation of qsub to find out how to set it up. Usually, tools like that only work after you distributed a copy of MyProcessing.sh on every node of the cluster.
Also, the tool probably won't try to figure out what other data the script needs, so you will have to copy the files to the cluster nodes as well (probably by putting them on a shared file system).
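As a rough sketch of that approach (the file names, paths, and shared locations below are assumptions, not part of the original setup), make the processing script fully self-contained, taking the file to work on as its first argument:
#!/bin/bash
# standalone_processing.sh -- hypothetical self-contained replacement for MyProcessing.sh.
# Every parameter it needs is defined here; nothing is inherited from the submitting shell.
File="$1"
OutDir=/shared/results        # assumed to be on a filesystem visible to all cluster nodes
# ... the actual processing of "$File" goes here ...
Then submit one job per file:
for File in $FileList; do
    echo "bash /shared/scripts/standalone_processing.sh $File" | qsub
done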
Use:
(set; echo "source MyProcessing.sh") | qsub
You need to recreate your current variables in the shell that qsub starts: the bare set prints all of the current shell's variables, so the job re-creates them from its stdin before sourcing MyProcessing.sh.

How do I write a shell script that repeats a java program a specific number of times?

Essentially I am looking to write a shell script, likely using a for loop, that would allow me to repeat a program call multiple times without having to do it by hand (I don't know exactly how to explain this, but I want to run the java TestFile.java command in the cmd window multiple times without doing it by hand).
I am trying to write it for the UNIX shell in bash, if that helps at all.
My program outputs a set of numbers that I want to analyze for end behavior, so I need to perform many tests with many different inputs, and I want to streamline the process. I have a pretty basic understanding of shell scripting. I tried to teach myself today, but I couldn't really understand the syntax of the for loop or of calling a .java file; with a little help I should be able to write them in a shell script.
This will do:
#!/bin/bash
javac TestFile.java    # compile the program
for ((i = 1; i <= 50; i++))
do
    echo "Output of Iteration $i" >> outfile
    java TestFile >> outfile
done
This will compile your Java program and run it 50 times, storing the output in a file named outfile. Change the 50 to however many iterations you want.
#!/bin/bash
for i in {1..10}
do
    #insert file run command here
done
#!/bin/bash
LOOPS=50
for ((i = 1; i <= LOOPS; i++))
do
    java TestFile >> out.log
done
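Since the goal is to analyze the output for many different inputs, a nested-loop variant may also help; this sketch assumes TestFile reads its input value from its first command-line argument, which may not match your program:
#!/bin/bash
# Sketch: repeat TestFile 50 times for each of several input values.
javac TestFile.java
for input in 10 100 1000; do
    for ((i = 1; i <= 50; i++)); do
        echo "== input $input, iteration $i ==" >> "out_${input}.log"
        java TestFile "$input" >> "out_${input}.log"
    done
done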

Can a shell script indicate that its lines be loaded into memory initially?

UPDATE: this is a repost of How to make shell scripts robust to source being changed as they run
This is a little thing that bothers me every now and then:
I write a shell script (bash) for a quick and dirty job
I run the script, and it runs for quite a while
While it's running, I edit a few lines in the script, configuring it for a different job
But the first process is still reading the same script file and gets all screwed up.
Apparently, the script is interpreted by loading each line from the file as it is needed. Is there some way that I can have the script indicate to the shell that the entire script file should be read into memory all at once? For example, Perl scripts seem to do this: editing the code file does not affect a process that's currently interpreting it (because it's initially parsed/compiled?).
I understand that there are many ways I could get around this problem. For example, I could try something like:
cat script.sh | sh
or
sh -c "`cat script.sh`"
... although those might not work correctly if the script file is large and there are limits on the size of stream buffers and command-line arguments. I could also write an auxiliary wrapper that copies a script file to a locked temporary file and then executes it, but that doesn't seem very portable.
So I was hoping for the simplest solution that would involve modifications only to the script, not the way in which it is invoked. Can I just add a line or two at the start of the script? I don't know if such a solution exists, but I'm guessing it might make use of the $0 variable...
The best answer I've found is a very slight variation on the solutions offered to How to make shell scripts robust to source being changed as they run. Thanks to camh for noting the repost!
#!/bin/sh
{
# Your stuff goes here
exit
}
This ensures that all of your code is parsed initially; note that the 'exit' is critical to ensuring that the file isn't accessed later to see if there are additional lines to interpret. Also, as noted on the previous post, this isn't a guarantee that other scripts called by your script will be safe.
Thanks everyone for the help!
Use an editor that doesn't modify the existing file, and instead creates a new file then replaces the old file. For example, using :set writebackup backupcopy=no in Vim.
How about a solution on the editing side?
If the script is running, before editing it, do this:
mv script script-old
cp script-old script
rm script-old
Since the shell keeps the file open, everything will work okay as long as you don't change the contents of the open inode.
The above works because mv preserves the old inode while cp creates a new one. Since a file's contents are not actually removed while it is still open, you can remove it right away and it will be cleaned up once the shell closes the file.
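If you do this often, the dance can be wrapped in a small helper function; the name and editor fallback here are just illustrative:
# edit_safely: give a script a fresh inode before editing it, so a running
# instance keeps reading the old, now-replaced file.
edit_safely() {
    local script="$1"
    mv "$script" "$script.old"    # the running shell still holds the old inode open
    cp "$script.old" "$script"    # create a new inode for future edits
    rm "$script.old"
    "${EDITOR:-vi}" "$script"
}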
According to the bash documentation, if instead of
#!/bin/bash
body of script
you try
#!/bin/bash
script=$(cat <<'SETVAR'
body of script
SETVAR
)
eval "$script"
then I think you will be in business.
Consider creating a new bang path for your quick-and-dirty jobs. If you start your scripts with:
#!/usr/local/fastbash
or something, then you can write a fastbash wrapper that uses one of the methods you mentioned. For portability, one can just create a symlink from fastbash to bash, or have a comment in the script saying one can replace fastbash with bash.
If you use Emacs, try M-x customize-variable break-hardlink-on-save. Setting this variable will tell Emacs to write to a temp file and then rename the temp file over the original instead of editing the original file directly. This should allow the running instance to keep its unmodified version while you save the new version.
Presumably, other semi-intelligent editors would have similar options.
A self contained way to make a script resistant to this problem is to have the script copy and re-execute itself like this:
#!/bin/bash
if [[ $0 != /tmp/copy-* ]]; then
    rm -f /tmp/copy-$$
    cp "$0" /tmp/copy-$$
    exec /tmp/copy-$$ "$@"
    echo "error copying and execing script"
    exit 1
fi
rm "$0"    # we are now running from the copy, so delete it (it stays open until the script exits)
# rest of script...
(This will not work if the original script's path begins with /tmp/copy-.)
(This is inspired by R Samuel Klatchko's answer)
