Random seed in a bash script - bash

I want to create a random seed in a bash script but somehow know how to calculate the same random seed again later on.
Suppose I have server IDs 1 to 10. I want to randomly select a server to run a test on. I can do that with the RANDOM variable and print the value:
echo $((1 + RANDOM % 10))
6
Then run the test on server id 6.
I do this in a loop for 5 times.
Is there a way to recalculate the values (the server IDs) later on to see which servers the tests were run on? I should mention that I do not wish to store the IDs in an array.
Or is there a way to accomplish this other than with RANDOM?

Assuming you want a replayable "random" sequence, you could use the shuf command:
$ printf '%s\n' server{1..10} | shuf --random-source file
server3
server6
server5
server9
server1
server2
server4
server8
server7
server10
As long as you use the same file as random source, the sequence will stay the same and could be replayed in the same order.
For info, you can also use shuf -e server{1..10} --random-source file if you want to get rid of the printf ... | part.
Use shuf's -i option instead if you have a consecutive range of numbers to shuffle, for example shuf -i 1-10 --random-source file.
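If you would rather stay with RANDOM, as in the question, here is a minimal sketch of a different technique from the shuf answer above: assigning a value to RANDOM seeds bash's generator, so the same seed replays the same picks. The function name pick_servers and the seed 42 are made up for illustration, and the sequence produced for a given seed may differ between bash versions, so replay on the same machine.

```shell
#!/bin/bash
# Hypothetical helper: print COUNT server ids (1-10) derived from SEED.
pick_servers() {
    local seed=$1 count=$2 i
    RANDOM=$seed                    # assigning to RANDOM seeds the generator
    for ((i = 0; i < count; i++)); do
        echo $((1 + RANDOM % 10))
    done
}

first_run=$(pick_servers 42 5)      # the ids the tests ran on
replay=$(pick_servers 42 5)         # recomputed later from the same seed

printf 'first run: %s\n' "$(echo $first_run)"
printf 'replay:    %s\n' "$(echo $replay)"
```

Only the seed has to be remembered (or derived from something stable, such as a date), not the ids themselves.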

Related

How do I test the speed between my site and a proxy server?

I'm getting complaints from employees in the field that our site is slow. When I check it -- the speed is acceptable. They are all going through a proxy server that is not controlled by me.
I'd like to run a continuous ping to the proxy server, but I haven't found anything to do that.
How do I check the speed from my site to a proxy server?
You can set up a cronjob to ping a site of your choice, at the frequency you choose. Here I ping google.com every 15 minutes. I can adjust the number of times I ping with the flag -c count and the time between pings with -i interval. This time is in seconds, I can use shorter intervals if required, for example 0.5.
I then pipe to tail -n 1 to keep only the last line, which holds the results. At this stage my output is as follows:
rtt min/avg/max/mdev = 12.771/17.448/23.203/4.022 ms
We then use awk to take only the 4th field and tr to replace the slashes with commas. Finally, we append the result to a CSV file.
Here is the whole line in crontab:
*/15 * * * * ping -c 5 -i 1 google.com | tail -n 1 | awk '{ print $4 }' | tr "/" "," >> /home/john/pingLog.csv
It is important to run this as root. To do so we edit the crontab using sudo:
sudo crontab -e
The end result is a comma separated file that you can open in Excel or equivalent, or process as you wish.
As noted in the ping output the 4 figures are min/avg/max/mdev.
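To see what the pipeline does without waiting for cron, here is a small sketch that runs the same tail/awk/tr steps on the sample summary line shown above. The function name rtt_to_csv is made up, and the timestamp column is an addition that is not part of the crontab line; it is only there so each row can be placed in time.

```shell
#!/bin/bash
# Hypothetical helper: reduce ping output on stdin to "min,avg,max,mdev".
rtt_to_csv() {
    tail -n 1 | awk '{ print $4 }' | tr '/' ','
}

# Sample summary line copied from the answer above.
sample='rtt min/avg/max/mdev = 12.771/17.448/23.203/4.022 ms'
row=$(printf '%s\n' "$sample" | rtt_to_csv)

# Assumption: prefixing a timestamp makes the CSV easier to chart later.
printf '%s,%s\n' "$(date +%Y-%m-%dT%H:%M:%S)" "$row"
```

In the real job, ping -c 5 -i 1 google.com would take the place of the printf that feeds the sample line.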
Here is a version for Windows. The result is not as refined as in the Linux version, but we're still getting the essentials. You could put it in a .bat file and run it with a scheduled task, or put it directly in the scheduled task.
ping google.com | findstr Minimum >> TotalPings.txt
Which adds the following line every time it is run:
Minimum = 23ms, Maximum = 23ms, Moyenne = 23ms
You can change the server pinged to suit your needs.

parallel computing in multiple cores for data which is independently run with the program

I have a simulation program in Fortran that takes its input from a .dat file. This file has 100,000 lines and takes a really long time to run. The program takes the first line, runs all the simulations, writes the result to a .out file and moves on to the next line. I have a computer with 16 CPUs, so how can I split my data into 16 parts and run each part separately on its own CPU? The machine runs Ubuntu. Each line is totally independent of the others.
For example, my data is HeadData10000.dat, and I have a file simulation.ini containing the name of the input data (in this case HeadData10000.dat) and the name of the output data. So the file simulation.ini looks like this:
HeadData10000.dat
outputdata.out
Now I have two computers, so I split my HeadData10000.dat into two files, make a simulation.ini for each input file, and run it like this on each computer: ./simulation.exe<./simulation.ini.
Assuming your list of 100,000 jobs is called "jobs.txt" and looks like this:
JobA
JobB
JobC
JobD
You could run this:
parallel 'printf "{}\n{.}.out" | ./simulation.exe' < jobs.txt
If you want to do a dry run to see what that would do without doing anything:
parallel --dry-run 'printf "{}\n{.}.out" | ./simulation.exe' < jobs.txt
Sample Output
printf "JobA\nJobA.out" | ./simulation.exe
printf "JobB\nJobB.out" | ./simulation.exe
printf "JobC\nJobC.out" | ./simulation.exe
printf "JobD\nJobD.out" | ./simulation.exe
If you have multiple servers available, look at using the -S parameter to GNU Parallel to spread the jobs across the machines. Also, look at the --eta and --bar parameters for getting progress reports.
I used printf "line1 \n line2" to generate two lines of input in order to avoid having to create, and later delete 100,000 files.
By default, GNU Parallel will keep 1 job per CPU core running, so there will always be 16 jobs running on your 16-core machine, but you can change that to, say, 8 if you want to with parallel -j 8. You can also specify the number of jobs to run on your second (and subsequent) machines.
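If you prefer to split the one big .dat file, as the question describes, a rough sketch with GNU split could look like the following. It is an alternative to the job-list approach above, not a replacement for it. The file names input.dat and chunk.NN are placeholders, the demo uses 4 pieces of a 100-line stand-in file instead of 16 pieces of HeadData10000.dat, and it assumes the program accepts the paths written into each .ini.

```shell
#!/bin/bash
# Demo in a scratch directory; a stand-in data file replaces HeadData10000.dat.
workdir=$(mktemp -d)
seq 1 100 > "$workdir/input.dat"
chunks=4        # would be 16 on the machine in the question

# Cut the file into $chunks pieces on line boundaries: chunk.00, chunk.01, ...
split -n "l/$chunks" -d "$workdir/input.dat" "$workdir/chunk."

# Write one .ini per piece: input name on line 1, output name on line 2.
for f in "$workdir"/chunk.[0-9][0-9]; do
    printf '%s\n%s.out\n' "$f" "$f" > "$f.ini"
done

# Then run one simulation per .ini, for example with GNU Parallel:
#   parallel './simulation.exe < {}' ::: "$workdir"/chunk.*.ini
ls "$workdir"
```

The l/N form of split -n keeps whole lines together, which matters here because each line is an independent simulation input.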

Shell Scripting to compare the value of current iteration with that of the previous iteration

I have an infinite loop which uses the AWS CLI to get the microservice names and their parameters, such as desired tasks and number of running tasks, for an environment.
There are hundreds of microservices running in an environment. I need to compare the value of the AWS ECS running-task metric for a particular microservice in the current loop with its value in the previous loop.
Say a microservice named X has 5 running tasks. As it is an infinite loop, after some time the loop comes back to microservice X. Now, let's assume the number of running tasks is 4. I want to compare the running-task value for the current loop, which is 4, with the value from the previous run, which was 5.
If you are asking a generic question of how to keep a previous value around so it can be compared to the current value, just store it in a variable. You can use the following as a starting point:
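#!/bin/bash
# Value reported as "previous" for the first line read.
previousValue=0
# Read one value per line from standard input.
while read -r v; do
    echo "Previous value=${previousValue}; Current value=${v}"
    previousValue=${v}
done
exit 0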
If the above script is called testval.sh and you have an input file called test.in with the following values:
2
1
4
6
3
0
5
Then running
./testval.sh <test.in
will generate the following output:
Previous value=0; Current value=2
Previous value=2; Current value=1
Previous value=1; Current value=4
Previous value=4; Current value=6
Previous value=6; Current value=3
Previous value=3; Current value=0
Previous value=0; Current value=5
If the skeleton script works for you, feel free to modify it for however you need to do comparisons.
Hope this helps.
I don't know exactly what your input looks like, but something like this might be useful for you:
The script
#!/bin/bash
# Remember the last task count seen for each app (needs bash 4 or later).
declare -A app_stats
while read -r app tasks
do
    # Report only when the app was seen before and its count differs.
    if [[ -n ${app_stats[$app]} && ${app_stats[$app]} -ne $tasks ]]
    then
        echo "Number of tasks for $app has changed from ${app_stats[$app]} to $tasks"
    fi
    app_stats[$app]=$tasks
done < input.txt
The input
App1 2
App2 5
App3 6
App1 6
The output
Number of tasks for App1 has changed from 2 to 6
Regards!
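For the infinite loop in the question, the same idea can be wrapped in a function that is called once per microservice on every pass. This is only a sketch: report_change and last_running are made-up names, and the AWS CLI call that would supply the name and running count is left out.

```shell
#!/bin/bash
# Last running-task count seen per microservice (needs bash 4 or later).
declare -A last_running

# Hypothetical helper: report_change NAME RUNNING
# Prints a message when RUNNING differs from the value stored on the
# previous call for NAME, then stores the new value.
report_change() {
    local name=$1 running=$2
    local previous=${last_running[$name]-}
    if [[ -n $previous && $previous -ne $running ]]; then
        echo "$name: running tasks changed from $previous to $running"
    fi
    last_running[$name]=$running
}

# Three passes of the loop for microservice X, with made-up counts.
report_change X 5
report_change X 4
report_change X 4
```

Call the function directly in the loop body rather than inside $( ... ) or on the right-hand side of a pipe, since either would update the array in a subshell and lose the stored value.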

bash asynchronous variable setting (dns lookup)

Let's say we had a loop that we want to have run as quickly as possible. Let's say something was being done to a list of hosts inside that loop; just for the sake of argument, let's say it was a redis query. Let's say that the list of hosts may change occasionally due to hosts being added/removed from a pool (not load balanced); however, the list is predictable (e.g., they all start with “foo” and end with 2 digits). So we want to run this occasionally; say, once every 15 minutes:
listOfHosts=$(dig +noall +ans foo{00..99}.domain | while read -r n rest; do printf '%s\n' ${n%.}; done)
to get the list of hosts. Let's say our loop looked something like this:
while :; do
    for i in $listOfHosts; do
        redis-cli -h "$i" llen something
    done
    (( $(date +%s) % (60 * 15) == 0 )) && callFunctionThatSetslistOfHosts
done
(now obviously there's some things missing, like testing to see if we've already run callFunctionThatSetslistOfHosts in the current minute and only running it once, and doing something with the redis output, and maybe the list of hosts should be an array, but basically this is it.)
How can we run callFunctionThatSetslistOfHosts asynchronously so that it doesn't slow down the loop? I.e., have it running in the background setting listOfHosts occasionally (e.g. once every 15 minutes), so that the next time the inner loop is run it gets a potentially different set of hosts to run the redis query on?
My major problem seems to be that in order to set listOfHosts in a loop, that loop has to be a subshell, and listOfHosts is local to that subshell, and setting it doesn't affect the global listOfHosts.
I may resort to pipes, but will have to poll the reader before generating a new list — not that that's terribly bad if I poll slowly, but I thought I'd present this as a problem.
Thanks.
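One possible approach, sketched here under assumptions rather than taken from an answer: let the background job write the list to a file and have the main loop re-read that file at the top of each pass. A background job cannot set a variable in its parent shell, but both can see the same file. The names hosts_file, lookup_hosts and refresh_hosts are made up, and lookup_hosts is a stub standing in for the dig pipeline from the question.

```shell
#!/bin/bash
hosts_file=$(mktemp)

# Stub standing in for the dig pipeline in the question.
lookup_hosts() { printf '%s\n' foo01.domain foo02.domain; }

# Write to a temporary file first, then rename it into place, so the
# reader never sees a half-written list.
refresh_hosts() {
    local tmp
    tmp=$(mktemp "${hosts_file}.XXXXXX") || return
    lookup_hosts > "$tmp" && mv "$tmp" "$hosts_file"
}

refresh_hosts &     # runs in the background; the main loop is not blocked
wait                # only for this demo; a real loop would carry on

# At the top of each pass of the main loop, pick up the latest list.
mapfile -t listOfHosts < "$hosts_file"
printf 'host: %s\n' "${listOfHosts[@]}"
```

Reading a small file once per pass is cheap compared with the redis queries, and it avoids having to poll a pipe.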

fastest hashing in a unix environment?

I need to examine the output of a certain script 1000s of times on a unix platform and check if any of it has changed from before.
I've been doing this:
(script_stuff) | md5sum
and storing this value. I don't actually need "md5", just a simple hash function whose result I can compare against a stored value to see if it's changed. It's okay if there is an occasional false positive.
Is there anything better than md5sum that works faster and generates a fairly usable hash value? The script itself generates a few lines of text - maybe 10-20 on average to max 100 or so.
I had a look at fast md5sum on millions of strings in bash/ubuntu - that's wonderful, but I can't compile a new program. Need a system utility... :(
Additional "background" details:
I've been asked to monitor the DNS record of a set of 1000 or so domains and immediately call certain other scripts if there has been any change. I intend to do a dig xyz +short statement and hash its output and store that, and then check it against a previously stored value. Any change will trigger the other script, otherwise it just goes on. Right now, we're planning on using cron for a set of these 1000, but can think completely differently for "seriously heavy" usage - ~20,000 or so.
I have no idea what the use of such a system would be, I'm just doing this as a job for someone else...
The cksum utility calculates a non-cryptographic CRC checksum.
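As a quick illustration of using cksum in place of md5sum, the sketch below checksums three strings and compares the results. The strings are stand-ins for the script output; in practice the left-hand side of the pipe would be (script_stuff).

```shell
#!/bin/bash
# cksum prints "<crc> <byte count>" for data read from standard input.
old_sum=$(printf 'hello\n' | cksum)
new_sum=$(printf 'hello\n' | cksum)
other_sum=$(printf 'goodbye\n' | cksum)

if [[ $old_sum == "$new_sum" ]]; then
    echo "same input: checksums match"
fi
if [[ $old_sum != "$other_sum" ]]; then
    echo "different input: checksums differ"
fi
```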
How big is the output you're checking? A hundred lines max. I'd just save the entire original file then use cmp to see if it's changed. Given that a hash calculation will have to read every byte anyway, the only way you'll get an advantage from a checksum type calculation is if the cost of doing it is less than reading two files of that size.
And cmp won't give you any false positives or negatives :-)
pax> echo hello >qq1.txt
pax> echo goodbye >qq2.txt
pax> cp qq1.txt qq3.txt
pax> cmp qq1.txt qq2.txt >/dev/null
pax> echo $?
1
pax> cmp qq1.txt qq3.txt >/dev/null
pax> echo $?
0
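The cmp approach can be wrapped in a small function that keeps one file per monitored item. This is a sketch under assumptions: output_changed and state_dir are made-up names, the first sighting of a key is treated as a change, and the literal strings stand in for the output of dig xyz +short.

```shell
#!/bin/bash
state_dir=$(mktemp -d)

# Hypothetical helper: output_changed KEY TEXT
# Returns 0 when TEXT differs from the copy stored for KEY (or when no
# copy exists yet) and stores TEXT; returns 1 when nothing changed.
output_changed() {
    local key=$1 new=$2 old="$state_dir/$1"
    if [[ -f $old ]] && printf '%s\n' "$new" | cmp -s - "$old"; then
        return 1
    fi
    printf '%s\n' "$new" > "$old"
    return 0
}

output_changed example.com "192.0.2.1" && echo "first sighting counts as a change"
output_changed example.com "192.0.2.1" || echo "no change"
output_changed example.com "192.0.2.7" && echo "changed"
```

Passing - as the first operand makes cmp read standard input, so the new output never needs its own temporary file.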
Based on your question update:
I've been asked to monitor the DNS record of a set of 1000 or so domains and immediately call certain other scripts if there has been any change. I intend to do a dig xyz +short statement and hash its output and store that, and then check it against a previously stored value. Any change will trigger the other script, otherwise it just goes on. Right now, we're planning on using cron for a set of these 1000, but can think completely differently for "seriously heavy" usage - ~20,000 or so.
I'm not sure you need to worry too much about the file I/O. The following script executed dig microsoft.com +short 5000 times first with file I/O then with output to /dev/null (by changing the comments).
#!/bin/bash
rm -rf qqtemp
mkdir qqtemp
((i = 0))
while [[ $i -ne 5000 ]] ; do
#dig microsoft.com +short >qqtemp/microsoft.com.$i
dig microsoft.com +short >/dev/null
((i = i + 1))
done
The elapsed times at 5 runs each are:
File I/O | /dev/null
----------+-----------
3:09 | 1:52
2:54 | 2:33
2:43 | 3:04
2:49 | 2:38
2:33 | 3:08
After removing the outliers and averaging, the results are 2:49 for the file I/O and 2:45 for the /dev/null. The time difference is four seconds for 5000 iterations, only 1/1250th of a second per item.
However, since an iteration over the 5000 takes up to three minutes, that's how long it will take maximum to detect a problem (a minute and a half on average). If that's not acceptable, you need to move away from bash to another tool.
Given that a single dig only takes about 0.012 seconds, you should theoretically do 5000 in sixty seconds assuming your checking tool takes no time at all. You may be better off doing something like this in Perl and using an associative array to store the output from dig.
Perl's semi-compiled nature means that it will probably run substantially faster than a bash script and Perl's fancy stuff will make the job a lot easier. However, you're unlikely to get that 60-second time much lower just because that's how long it takes to run the dig commands.
