I am trying to use a shell script to generate data for my Kafka topic.
First, I wrote a shell script, run_producer.sh:
#!/bin/sh
./bin/kafka-avro-console-producer --broker-list localhost:9092 --topic AAATest2 \
--property "parse.key=true" \
--property "key.separator=:" \
--property key.schema='{"type":"string"}' \
--property value.schema='{"type":"record","name":"myrecord","fields":[{"name":"measurement","type":"string"},{"name":"id","type":"int"}]}'
When run_producer.sh is executed, it expects you to type records like "key1":{"measurement": "INFO", "id": 1} on the command line, and you can enter as many as you want.
I wrote another script, add_data.sh:
#!/bin/sh
s="\"key1\":{\"measurement\": \"INFO\", \"id\": 1}"
printf "${s}\n${s}\n" | ./run_producer.sh
It feeds the string in twice, and more repetitions can be added by appending "${s}\n" to the printf format string, but that is limited and clumsy.
I want it to keep feeding in the string endlessly until I stop it. How can I do that with a shell script?
I would also be grateful if you could tell me how to vary the string (send different data) along the way.
You could use yes "$s" to produce endless input for your script. But what do you mean by "make the string differently"? Would an infinite loop with some random data be enough, like
while true; do s="\"key1\":{\"measurement\": \"INFO\", \"id\": $RANDOM}"; echo "$s"; done
or do you need to modify it in some other way?
man bash:
RANDOM Each time this parameter is referenced, a random integer between 0 and 32767 is generated. The sequence of random numbers may be initialized by assigning a value to RANDOM. If RANDOM is unset, it loses its special properties, even if it is subsequently reset.
You could combine it with anything else, like: "key_${RANDOM}"
Or choose any other method like https://gist.github.com/earthgecko/3089509 or https://unix.stackexchange.com/questions/230673/how-to-generate-a-random-string
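For example, a minimal sketch (assuming run_producer.sh reads records on standard input, as above) that feeds randomized records until you press Ctrl-C:
#!/bin/bash
# Generate records endlessly, varying both the key and the id with $RANDOM;
# stop with Ctrl-C (the loop also ends via SIGPIPE if the producer exits).
while true; do
    printf '"key_%s":{"measurement": "INFO", "id": %s}\n' "$RANDOM" "$RANDOM"
done | ./run_producer.sh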
I'm writing a bash script to call functions of the Veeam Backup CLI.
In this script I have a function to configure a new backup job.
function configureBackupJob() {
jobname="$1"
reponame="$2"
objects="$3"
advancedOptions="$4"
scheduleOptions="$5"
activeFullBackupOptions="$6"
indexingOptions="$7"
command="veeamconfig job create filelevel --name ${jobname} --reponame ${reponame} --includedirs ${objects} ${advancedOptions} ${scheduleOptions} ${activeFullBackupOptions} ${indexingOptions} --nosnap"
echo "${command}"
veeamconfig job create filelevel --name "$jobname" --reponame "$reponame" --includedirs "$objects" "$advancedOptions" "$scheduleOptions" "$activeFullBackupOptions" "$indexingOptions" --nosnap
}
When calling the script I use a case to determine which function shall be called:
case $command in
# More cases before and after this one
configureBackupJob)
configureBackupJob "$2" "$3" "$4" "$5" "$6" "$7" "$8"
;;
*)
showHelp
;;
esac
I call the script like this:
sudo ./script.sh configureBackupJob "TheJobsName" "RepositoryName" "/path/FoldertoBeBackedUpByVeeam" "--daily --at 12:15" "--weekdays-full Monday,Wednesday" "--indexall"
I used this page from the Veeam help center to look up the arguments: Veeam Help Center: Creating File-Level Backup Job
Calling the script results in an error message
Unknown argument: [--daily --at 12:15].
If I call veeamconfig manually the command that my echo shows works fine.
Why can I call the command directly but not from within the script? I tried calling the function without the double quotation marks but that doesn't work.
I can't hardcode all arguments like the
--includedirs
so I need to find a way to pass arguments like the
--daily --at 12:15
The basic problem is that you're passing "--daily --at 12:15" to the veeamconfig command as a single argument, rather than as three separate arguments ("--daily", "--at", and "12:15"). This confuses veeamconfig. It looks OK when you echo it because echoing loses the distinction between spaces between arguments and spaces within arguments.
The best way to handle this depends on a couple of things: First, does your script care which options have to do with scheduling vs indexing vs full backups vs whatever, or is it ok if there's just a list of options to be given to the veeamconfig command? Second, is it possible that the paths/options/whatever might contain spaces (or filename wildcards) in them, as well as between them? (Note: I mostly use macOS, where paths with spaces are very common.)
If the script doesn't have to differentiate between the different types of options, it's pretty simple: just pass all of the options as separate arguments all the way through, and where necessary store them as an array rather than as separate strings (and use bash printf's %q option to properly quote/escape the command options for printing):
function configureBackupJob() {
jobname="$1"
reponame="$2"
object="$3"
allOptions=("${#:4}") # This stores all arguments starting with $4 in an array
printf -v command '%q ' veeamconfig job create filelevel --name "$jobname" --reponame "$reponame" --includedirs "$object" "${allOptions[#]}" --nosnap
echo "${command}"
veeamconfig job create filelevel --name "$jobname" --reponame "$reponame" --includedirs "$object" "${allOptions[@]}" --nosnap
}
And call it like this:
case $command in
# More cases before and after this one
configureBackupJob)
configureBackupJob "${#:2}"
...
And run the overall script like this:
sudo ./script.sh configureBackupJob "TheJobsName" "RepositoryName" "/path/FoldertoBeBackedUpByVeeam" --daily --at 12:15 --weekdays-full Monday,Wednesday --indexall
If the script needs to tell the different types of option apart, things get messier. If there's no possibility of spaces or wildcard-like characters in the options, you could leave those variables unquoted when you pass them to the veeamconfig command, and let the shell's word splitting break them up into individual arguments:
veeamconfig job create filelevel --name "$jobname" --reponame "$reponame" --includedirs "$objects" $advancedOptions $scheduleOptions $activeFullBackupOptions $indexingOptions --nosnap
Note that if you go this route, you need to keep them safely double-quoted at all other points in the process, especially when passing them to the configureBackupJob function. If word-splitting happens too early, it'll just make a mess.
If you need to keep types of options separate and also allow spaces and/or funny characters in the options, it's even more difficult. You might be tempted to put quotes and/or escapes within the options to control this, but word splitting doesn't respect those, so it doesn't work. I think I'll just refer you to this question and hope this doesn't apply.
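If the option groups can be built inside the script rather than arriving as single strings, bash arrays keep the categories separate while surviving embedded spaces. A minimal sketch (with hypothetical option values, not tied to your exact job):
# One array per option group; the quoted "${array[@]}" expansion keeps each
# element as its own argument even when it contains spaces.
advancedOptions=(--daily --at "12:15")
scheduleOptions=(--weekdays-full Monday,Wednesday)
indexingOptions=(--indexall)
veeamconfig job create filelevel --name "$jobname" --reponame "$reponame" \
    --includedirs "$objects" "${advancedOptions[@]}" "${scheduleOptions[@]}" \
    "${indexingOptions[@]}" --nosnap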
hadoop jar cc-jar-with-dependencies.jar com.coupang.pz.cc.merge.Merge_Run \
    ${IDF_OUT} \
    ${IG_OUT} \
    ${PROB_OUT} \
    ${MERGE_OUT} \
    1.00 \
    0.000001 \
    0.0001
There is a piece of shell code, and I know hadoop will run cc-jar-with-dependencies.jar on HDFS. But what do the other parameters, from the second line onwards, mean? Are they the parameters needed by the jar package?
${...} is a path on HDFS, like ${IDF_OUT} and so on.
The usage of ${WORD} is the basic case of parameter expansion in bash and other shells:
$PARAMETER
${PARAMETER}
The easiest form is to just use a parameter's name within braces. This is identical to using $FOO like you see it everywhere, but has the advantage that it can be immediately followed by characters that would be interpreted as part of the parameter name otherwise.
For example,
word="car"
echo "The plural of $word is most likely $words"
echo "The plural of $word is most likely ${word}s"
produces this output:
The plural of car is most likely
The plural of car is most likely cars
Note that the first line does not contain "cars" as expected, because the shell interpreted $words as a single (unset) variable name; only in ${word}s is the name's boundary explicit.
Coming back to your example,
hadoop jar cc-jar-with-dependencies.jar com.coupang.pz.cc.merge.Merge_Run \
    ${IDF_OUT} \
    ${IG_OUT} \
    ${PROB_OUT} \
    ${MERGE_OUT} \
    1.00 \
    0.000001 \
    0.0001
From the second line onwards, the variables ${IDF_OUT}, ${IG_OUT}, ${PROB_OUT} and ${MERGE_OUT} are in all likelihood shell variables (possibly holding HDFS paths) which get expanded to their values when the command is run.
While I have explained what the ${WORD} syntax is, the actual purposes of these variables are not really a shell question.
Those parameters are passed to the hadoop command, so you would need to read the documentation for that command.
However, it might be interesting for you to find out the values these parameters contain when your script runs. You can do that by modifying the code as shown below:
echo >&2 \
hadoop jar cc-jar-with-dependencies.jar com.coupang.pz.cc.merge.Merge_Run \
    ${IDF_OUT} \
    ${IG_OUT} \
    ${PROB_OUT} \
    ${MERGE_OUT} \
    1.00 \
    0.000001 \
    0.0001
This change causes the whole command to be printed rather than executed, and the >&2 redirects the output to standard error (which may help get it onto the terminal if standard output is being captured somewhere). Please note that this change is for debugging/curiosity only: it makes your script skip execution of the command.
Once you know the values, the whole command will likely be easier to make sense of.
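As an alternative that still executes the command, bash's xtrace mode prints each command, with all expansions done, before running it; a small sketch:
# set -x makes the shell echo every command (after variable expansion)
# to standard error before executing it; set +x turns tracing back off.
set -x
hadoop jar cc-jar-with-dependencies.jar com.coupang.pz.cc.merge.Merge_Run \
    "${IDF_OUT}" "${IG_OUT}" "${PROB_OUT}" "${MERGE_OUT}" 1.00 0.000001 0.0001
set +x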
I know something like this is possible
out = `echo 1`
$?.to_i == 0 or raise "Failed"
Yet I'm unable to merge these two statements so that the output is captured into a variable and the command fails (also printing the captured output) if the shell command returns an error.
Preferably in one line, if possible. Something like
out = `echo 1` && $?.to_i == 0 or raise "Failed. Output: " + out
only prettier.
Look at the Open3 class. It has a number of methods that will let you do what you want.
In particular, capture2 is the closest to what you're doing. From the docs:
::capture2 captures the standard output of a command.
stdout_str, status = Open3.capture2([env,] cmd... [, opts])
Pay attention to that optional env parameter. It controls the environment the child process sees: by default the child inherits the caller's environment, but you can pass a hash to add or override variables, or pass a pruned copy (for example ENV.to_h with selected key/value pairs deleted) to restrict what the child gets.
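A minimal sketch of the one-liner the question is after, using capture2 (assuming the command writes only to standard output):
require 'open3'

# Capture stdout and the exit status; raise, including the output, on failure.
out, status = Open3.capture2('echo 1')
raise "Failed. Output: #{out}" unless status.success?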
I would like to write a script to execute the steps outlined below. If someone can provide simple examples of how to modify files and search through folders using a script (not necessarily solving my problem below), I will greatly appreciate it.
submit job MyJob in currentDirectory using myJobShellFile.sh to a queue
upon completion of MyJob, go to currentDirectory/myJobDataFolder.
In myJobDataFolder, there are folders
myJobData.0000 myJobData.0001 myJobData.0002 myJobData.0003
I want to find the maximum number maxIteration of all the listed folders. Here it would be maxIteration=0003.
In the file myJobShellFile.sh, the last line says
mpiexec ./main input myJobDataFolder
I want to extend this line to
'mpiexec ./main input myJobDataFolder 0003'
I want to submit MyJob to the queue while maxIteration < 10.
Upon completion of MyJob, find the new maxIteration, change this number in myJobShellFile.sh, and go to step 4.
I think people typically write python scripts to do this kind of thing, but I am having a hard time finding out how. I probably don't know the correct terminology for this procedure. I am also aware that the script will vary slightly depending on the queuing system, but any help will be greatly appreciated.
Quite a few aspects of your question are unclear, such as the meaning of "submit job MyJob in currentDirectory using myJobShellFile.sh to a queue", "extend this line to 'mpiexec ./main input myJobDataFolder 0003'", how you detect when a job is done, relevant parts of myJobShellFile.sh, and some other details. If you can list the specific shell commands you use in each iteration of job submission, then you can post a better question, with a bash tag instead of python.
In the following script, I put a ### at the end of any line where I am guessing what you are talking about. Lines ending with ### may be irrelevant to whatever you actually do, or may be pseudocode. Anyway, the general idea is that the script is supposed to do the things you listed in your items 1 to 5. This script assumes that you have modified myJobShellFile.sh to say
mpiexec ./main input $1 $2
instead of
mpiexec ./main input
because it is simpler to use parameters to modify what you tell mpiexec than it is to keep modifying a shell script. Also, it seems to me you would want to increment maxIter before submitting the next job, instead of after. If so, remove the # from the t=$((1$maxIter+1)); maxIter=${t#1} line. Note, see the "Parameter Expansion" section of man bash regarding expansion of the ${var#txt} form, and the "Arithmetic Expansion" section regarding the $((expression)) form. The 1$maxIter and similar forms are used to change text like 0018 (which is not a valid bash number because 8 is not an octal digit) to 10018.
#!/bin/bash
./myJobShellFile.sh MyJob ###
maxIter=0
while true; do
    waitforjobcompletion ###
    cd ./myJobDataFolder
    maxFile=$(ls myJobData* | tail -1)
    maxIter=${maxFile#myJobData.} # Get max extension
    # If you want to increment maxIter, uncomment next line
    # t=$((1$maxIter+1)); maxIter=${t#1}
    cd ..
    if [[ 1$maxIter -lt 11000 ]]; then
        ./myJobShellFile.sh MyJobDataFolder $maxIter
    else
        break
    fi
done
Notes: (1) To test with smaller runs than 1000 submissions, replace 11000 by 10000+n; for example, to do 123 runs, replace it with 10123. (2) In writing the above script, I assumed that not-previously-known numbers of output files appear in the output directory from time to time. If instead exactly one output file appears per run, and you just want to do one run per value for the values 0000, 0001, 0002, ..., 0999, 1000, then use a script like the following. (For testing with a smaller number than 1000, replace 1000 with e.g. 0020. The leading zeroes in these numbers tell bash to pad the generated numbers with leading zeroes.)
#!/bin/bash
for iter in {0000..1000}; do
    ./myJobShellFile.sh MyJobDataFolder $iter
    waitforjobcompletion ###
done
(3) If the system has a command that sleeps while it waits for a job to complete on the supercomputing resource, it is reasonable to use that command in place of waitforjobcompletion in the above scripts. Otherwise, if the system has a command jobisrunning that returns true if a job is still running, replace waitforjobcompletion with something like the following:
while jobisrunning ; do sleep 15; done
This will run the jobisrunning command; if it returns true, the shell will sleep for 15 seconds and then retest. Here is an example that illustrates waiting for a file to appear and then for it to go away:
while [ ! -f abc ]; do sleep 3; echo no abc; done
while ls abc >/dev/null 2>&1; do sleep 3; echo an abc; done
The second line's test could be [ -f abc ] instead; I showed a longer example to illustrate how to suppress output and error messages by routing them to /dev/null. (4) To reverse the sense of a while statement's test, replace the word while with until. For example, while [ ! -f abc ]; ... is equivalent to until [ -f abc ]; ....
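For instance, if the cluster happened to use a PBS-style scheduler, the waitforjobcompletion placeholder might look like this (a hypothetical sketch; qsub/qstat details vary between queueing systems):
# qsub prints the new job's id; poll qstat until the job leaves the queue.
jobid=$(qsub myJobShellFile.sh)
while qstat "$jobid" >/dev/null 2>&1; do
    sleep 15
done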
I have a problem when using Arduino to post data to Pachube. The Arduino is configured to return JSON data for the temperature when you send a 't' and return JSON data for the light level when you send an 'l'. This works perfectly through the Arduino Serial Monitor. I then created two bash scripts. One regularly sends the 't' and 'l' commands to Arduino and waits 10 seconds in between each request.
while true; do
    echo -n t > /dev/ttyACM0
    echo "$(date): Queried Arduino for temperature."
    sleep 10
    echo -n l > /dev/ttyACM0
    echo "$(date): Queried Arduino for light."
    sleep 10
done
This works fine. I get an echo message every 10 seconds. The other script reads the generated JSON from the serial port (I basically copied it from some Web page).
ARDUINO_PORT=/dev/ttyACM0
ARDUINO_SPEED=9600
API_KEY='MY_PACHUBE_KEY'
FEED_ID='MY_FEED_ID'
# Set speed for usb
stty -F $ARDUINO_PORT ispeed $ARDUINO_SPEED ospeed $ARDUINO_SPEED raw
exec 6<$ARDUINO_PORT
# Read data from Arduino
while read -u 6 f; do
    # Remove trailing carriage return character added
    # by println to satisfy stupid MS-DOS computers
    f=${f:0:${#f} - 1}
    curl --request PUT --header "X-PachubeApiKey: $API_KEY" --data-binary "{ \"version\":\"1.0.0\", \"datastreams\":[ $f ] }" "http://api.pachube.com/v2/feeds/$FEED_ID"
    echo "$(date) $f was read."
done
Unfortunately, this script goes crazy, echoing several times per 10 seconds that it posted data to Pachube, although it should only do so every 10 seconds (whenever the first script tells the Arduino to create a JSON message). I thought it might be an issue with buffered messages on the Arduino, but the problem remains even after switching it off and on again. Any thoughts? Thanks in advance.
I am completely unfamiliar with Arduino and a handful of other things you're doing here, but here are a few general things I see:
Bash is almost entirely incapable of handling binary data reliably; there is no way to store a NUL byte in a Bash string. It looks like you're trying to pull some trickery to make arbitrary data readable. Hopefully you're sending nothing but character data into read, otherwise this isn't likely to work.
read reads newline-delimited input (or the given value of -d if your bash is new enough). I don't know the format the while loop is reading, but it has to be a newline-delimited string of characters.
Use read -r unless you want escape sequences interpreted. (You almost always want -r with read.)
Unconditionally stripping a character off the end of each string isn't the greatest. I'd use f=${f%%+($'\r')}, which removes one or more adjacent \r's from the end of f. Remember to shopt -s extglob at the top of your script, since extglob is not on by default in scripts.
This shouldn't be actually causing an issue, but I prefer not using exec unless it's really required - which it isn't here. Just put done <$ARDUINO_PORT to terminate the while loop and remove the -u 6 argument from read (unless something inside the loop is specifically reading from stdin and can't conflict, which doesn't appear to be the case). The open FD will automatically close when exiting the loop.
Don't create your own all-caps variable names in scripts because they are reserved and can conflict with variables from the environment. Use at least one lower-case letter. This of course doesn't apply if those variables are set by something in your system and you're only using or modifying them.
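Putting those suggestions together, the reader script might look something like this (a sketch applying the advice above, untested against real hardware):
#!/bin/bash
shopt -s extglob    # needed for the +($'\r') pattern below

arduino_port=/dev/ttyACM0
arduino_speed=9600
api_key='MY_PACHUBE_KEY'
feed_id='MY_FEED_ID'

# Set speed for the USB serial port
stty -F "$arduino_port" ispeed "$arduino_speed" ospeed "$arduino_speed" raw

# Read data from the Arduino; the redirection on done replaces the exec/fd-6 dance
while read -r f; do
    f=${f%%+($'\r')}    # strip any trailing carriage returns
    curl --request PUT --header "X-PachubeApiKey: $api_key" \
         --data-binary "{ \"version\":\"1.0.0\", \"datastreams\":[ $f ] }" \
         "http://api.pachube.com/v2/feeds/$feed_id"
    echo "$(date) $f was read."
done < "$arduino_port"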