Optimised random number generation in bash


I'd like to generate a lot of integers between 0 and 1 using bash.
I tried shuf, but the generation is very slow. Is there another way to generate these numbers?

This will output an infinite stream of random bytes, written in binary and separated by spaces:
cat /dev/urandom | xxd -b | cut -d" " -f 2-7 | tr "\n" " "
Example output:
10100010 10001101 10101110 11111000 10011001 01111011 11001010 00011010 11101001 01111101 10100111 00111011 10100110 01010110 11101110 01000011 00101011 10111000 01010110 10011101 01000011 00000010 10100001 11000110 11101100 11001011 10011100 10010001 01000111 01000010 01001011 11001101 11000111 11110111 00101011 00111011 10110000 01110101 01001111 01101000 01100000 11011101 11111111 11110001 10001011 11100001 11100110 10101100 11011001 11010100 10011010 00010001 00111001 01011010 00100101 00100100 00000101 10101010 00001011 10101101 11000001 10001111 10010111 01000111 11011000 01111011 10010110 00111100 11010000 11110000 11111011 00000110 00011011 11110110 00011011 11000111 11101100 11111001 10000110 11011101 01000000 00010000 00111111 11111011 01001101 10001001 00000010 10010000 00000001 10010101 11001011 00001101 00101110 01010101 11110101 10111011 01011100 00110111 10001001 00100100 01111001 01101101 10011011 00100001 01101101 01001111 01101000 00100001 10100011 00011000 01000001 00100100 10001101 10110110 11111000 01110111 10110111 11001000 00101000 01101000 01001100 10000001 11011000 11101110 11001010 10001101 00010011^C
If you don't want spaces between bytes (thanks @Chris), this variant also stops after the first 10 lines of xxd output:
cat /dev/urandom | xxd -b | head | cut -d" " -f 2-7 | tr -d "\n "
1000110001000101011111000010011011011111111001000000011000000100111101000001110110011011000000001101111111011000000100101001001110110001111000010100100100010110110000100111111110111011111100101000011000010010111010010001001001111000010101000110010010011011110000000011100110000000100111010001110000000011001011010101111001

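tr -dc '01' < /dev/urandom is a quick and dirty way to do this: it deletes every byte except the characters 0 and 1.
If you're on OS X, tr can choke on the raw random bytes (typically with an "Illegal byte sequence" error), so you can use perl instead: perl -pe 'tr/01//dc' < /dev/urandom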
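A minimal sketch of the tr approach that stops after a fixed number of digits. random_bits is a hypothetical helper name. Setting LC_ALL=C for tr only is a commonly used workaround for the OS X byte-sequence complaint, and head -c is assumed to be available (GNU and BSD head both have it). Because only the bytes '0' and '1' survive the filter, each digit is equally likely.

```shell
#!/usr/bin/env bash
# random_bits N: print N random binary digits on one line (hypothetical helper).
random_bits() {
  local n=$1
  # Keep only the bytes '0' and '1' from /dev/urandom; LC_ALL=C makes tr
  # treat the input as raw bytes instead of decoding it as text.
  LC_ALL=C tr -dc '01' < /dev/urandom | head -c "$n"
  printf '\n'
}

random_bits 32
```

Each output digit costs about 128 input bytes on average, since only 2 of the 256 byte values are kept, which is still fast for most uses.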

Just for fun --
A native-bash function to print a specified number of random bits, extracted from the smallest possible number of evaluations of $RANDOM:
randbits() {
  local x x_bits num_bits
  num_bits=$1
  while (( num_bits > 0 )); do
    x=$RANDOM    # 0..32767, i.e. 15 random bits
    # Write out the 15 bits of x, least significant bit first.
    x_bits="$(( x % 2 ))$(( x / 2 % 2 ))$(( x / 4 % 2 ))$(( x / 8 % 2 ))$(( x / 16 % 2 ))$(( x / 32 % 2 ))$(( x / 64 % 2 ))$(( x / 128 % 2 ))$(( x / 256 % 2 ))$(( x / 512 % 2 ))$(( x / 1024 % 2 ))$(( x / 2048 % 2 ))$(( x / 4096 % 2 ))$(( x / 8192 % 2 ))$(( x / 16384 % 2 ))"
    if (( ${#x_bits} < num_bits )); then
      printf '%s' "$x_bits"          # use all 15 bits, more are needed
      (( num_bits -= ${#x_bits} ))
    else
      printf '%s' "${x_bits:0:num_bits}"   # use only the bits still needed
      break
    fi
  done
  printf '\n'
}
Usage:
$ randbits 64
1011010001010011010110010110101010101010101011101100011101010010
Because this uses $RANDOM, its behavior can be made reproducible by assigning a seed value to $RANDOM before invoking it. This can be handy if you want to be able to reproduce bugs in software that uses "random" inputs.
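The same bit-extraction idea can be sketched with a loop instead of one long arithmetic expansion. randbits_loop is a hypothetical name; it emits bits in the same least-significant-first order as randbits, so for the same seed it should print the same bits. The seeded calls run in the current shell rather than inside $(...), on the assumption that bash may reseed RANDOM in a subshell.

```shell
#!/usr/bin/env bash
# randbits_loop: hypothetical loop-based variant of randbits.
# Each $RANDOM evaluation yields 0..32767, i.e. 15 random bits.
randbits_loop() {
  local num_bits=$1 out='' x i
  while (( ${#out} < num_bits )); do
    x=$RANDOM
    for (( i = 0; i < 15; i++ )); do
      out+=$(( (x >> i) & 1 ))   # bit i of x, least significant first
    done
  done
  printf '%s\n' "${out:0:num_bits}"
}

# Same seed, same bits: both lines below print identical output.
RANDOM=42; randbits_loop 64
RANDOM=42; randbits_loop 64
```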

Reading the question as asking for random numbers between 0 and 1, here is a very fast method. A good one-liner for sure:
echo "0.$(printf $(date +'%N') | md5sum | tr -d '[:alpha:][:punct:][:space:]')"
When run 10 times in a for loop, this command gives output similar to this:
0.97238535471032972041395
0.8642459339189067551494
0.18109959700829495487820
0.39135471514800072505703651
0.624084503017958530984255
0.41997456791539740171
0.689027289676627803
0.22698852059605560195614
0.037745437519184791498537
0.428629619193662260133
And if you need random strings of 1's and 0's, as others have assumed, you can make a slight change to the command:
printf $(date +'%N') | sha512sum | tr -d '[2-9][:alpha:][:punct:]'
When run 10 times in a for loop, this yields strings of random 0's and 1's of varying length, similar to these:
011101001110
001110011011
0010100010111111
0000001101101001111011111111
1110101100
00010110100
1100101101110010
101100110101100
1100010100
0000111101100010001001
To my knowledge, and from what I have found online, this is the closest to true randomness we can get in bash. I have even made a dice game (with a 10-sided die, faces 0-9) to test the randomness, using this method to generate a single number from 0 to 9. Out of 100 throws, each side lands almost exactly 10 times. Out of 1000 throws, each side lands around 89-110 times, and the spread does not change much beyond 1000 throws. So this method seems well suited to the job, at least among bash tools for generating pseudo-random numbers.
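A sketch of the dice test described above, assuming GNU date (whose +%N prints nanoseconds) and md5sum. The dice_test name and the choice of the first decimal digit of the digest as the face are assumptions; the answer does not say exactly how the digit was picked.

```shell
#!/usr/bin/env bash
# dice_test N: throw a 10-sided die N times and tally each face (hypothetical).
dice_test() {
  local throws=$1 total=0 face digits
  local -a counts=(0 0 0 0 0 0 0 0 0 0)
  while (( total < throws )); do
    # Hash the current nanoseconds and keep only the decimal digits of the
    # digest (drops the letters a-f, the spaces and the trailing "-").
    digits=$(printf '%s' "$(date +%N)" | md5sum | tr -dc '0-9')
    [[ -n $digits ]] || continue      # digest with no digits: throw again
    face=${digits:0:1}
    counts[face]=$(( counts[face] + 1 ))
    total=$(( total + 1 ))
  done
  for face in {0..9}; do
    printf '%d: %d\n' "$face" "${counts[face]}"
  done
}

dice_test 100
```

Running it with larger counts, for example dice_test 1000, lets you compare the spread of the tallies yourself.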
And if you want the output scrambled even further, the md5sum command can be chained many times and still be very fast. For example:
printf $(date +'%N') | md5sum | md5sum | md5sum | tr -d '[:punct:][:space:]'
This starts from a not-so-random number, the nanoseconds printed by date +'%N', and pipes it into md5sum. That hash is piped into md5sum again, and the result is hashed one last time. Chaining the hashes scrambles the output further, but it adds no randomness: the result is fully determined by the nanosecond value. You can then use tools like awk, sed, grep and tr to control what gets printed.
Hope this helps.

Related

bash: conserve tab with spaces for alignment with column

I am trying to display .tsv files aligned nicely as columns, while limiting the display to the current screen width. The following works in general, but it fails if the input itself contains the character (#) that I substitute for tabs and pass to column as the separator:
bash$ cat sample.tsv | tr '\t' '#' | column -n -t -s # | cut -c-`tput cols`
I tried using the tab character itself directly but could not make it work. With column's default options, any whitespace (not just tabs) separates columns, so that does not work for me either. I would be thankful for any better alternative to the above.
PS:
A sample is shown below
bash:~$ cat sample.tsv
Sl Name Number Status
1 W Jhon +1 234 4454 y
2 M Walter +2 232 453 n
3 S M Ray +1 343 453 y
bash:~$ cat sample.tsv | tr '\t' '#' | column -n -t -s # | cut -c-`tput cols`
Sl Name Number Status
1 W Jhon +1 234 4454 y
2 M Walter +2 232 453 n
3 S M Ray +1 343 453 y
bash:~$ cat sample.tsv | column -n -t | cut -c-`tput cols`
Sl Name Number Status
1 W Jhon +1 234 4454 y
2 M Walter +2 232 453 n
3 S M Ray +1 343 453 y
bash:~$

You can tell column to use the tab character as the column delimiter with -s; in bash, $'\t' passes a literal tab:
column -t -s $'\t' -n sample.tsv
Sl Name Number Status
1 W Jhon +1 234 4454 y
2 M Walter +2 232 453 n
3 S M Ray +1 343 453 y

How to find values 2 exponential in shell?

Is there a way to find which power of 2 a value is in bash?
For example, if I input 512, the output should be 9, meaning 2^9 is 512.
Any help here is immensely appreciated - Thanks

When I read the question, 512 is the input and 9 is the output. It is possible that what is being asked here is log_base_2(512), which is 9. If so, then maybe this would help:
$ echo "l(512) / l(2)" | bc -l
9.00000000000000000008
The explanation of the math can be found here:
How do I calculate the log of a number using bc?

Using awk.
$ echo 512 | awk '{print log($1)/log(2)}'
9
Put that into a script (expo.sh):
#!/bin/bash
_num="$1"
expon=$(awk -v a="$_num" 'BEGIN{print log(a)/log(2)}')
if [[ $expon =~ ^[0-9]+\.[0-9]*$ ]]; then  # Result has a fractional part,
  echo "$_num is not a power of 2"         # so the input is not 2^n
else
  echo "$_num = 2^${expon}"                # Whole number: print the exponent
fi
Run:
$ ./expo.sh 512
512 = 2^9
$ ./expo.sh 21
21 is not a power of 2
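The awk check above relies on awk's default output format, which prints only 6 significant digits, so an input such as 2^30 + 1 = 1073741825 would print as 30 and be reported as a power of 2. A sketch of an exact integer check instead, assuming the input fits in bash's 64-bit integers; pow2_exponent is a hypothetical name:

```shell
#!/usr/bin/env bash
# pow2_exponent N: print e if N == 2^e, otherwise return 1 (hypothetical).
pow2_exponent() {
  local n=$1 e=0
  # A power of two has exactly one bit set, so n & (n - 1) is 0.
  (( n > 0 && (n & (n - 1)) == 0 )) || return 1
  while (( n > 1 )); do
    n=$(( n >> 1 ))   # shift the single set bit down one place
    e=$(( e + 1 ))
  done
  echo "$e"
}

for num in 512 21 1073741825; do
  if e=$(pow2_exponent "$num"); then
    echo "$num = 2^$e"
  else
    echo "$num is not a power of 2"
  fi
done
```

This prints 512 = 2^9, followed by "is not a power of 2" for both 21 and 1073741825.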

A fast way to check whether a number x is a power of 2 is to check that the bitwise AND of x and x-1 is 0, and to exclude 0 with x>0:
((x>0 && ( x & x-1 ) == 0 )) && echo $x is a power of 2
To compute log2 itself, you can use the algorithm from the question "Fast computing of log2 for 64-bit integers" (its 32-bit version):
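tab32=(  0  9  1 10 13 21  2 29
        11 14 16 18 22 25  3 30
         8 12 20 28 15 17 24  7
        19 27 23  6 26  5  4 31 )
log2_32() {
  local value=$1
  # Set every bit below the highest set bit, so value becomes 2^(k+1) - 1.
  (( value |= value >> 1 ))
  (( value |= value >> 2 ))
  (( value |= value >> 4 ))
  (( value |= value >> 8 ))
  (( value |= value >> 16 ))
  # Multiply by the magic constant, keep the low 32 bits and use the top 5
  # as an index into tab32. The result goes into the global variable log2_32.
  log2_32=${tab32[(value * 16#7C4ACDD & 16#ffffffff) >> 27]}
}
log2_32 262144
echo "$log2_32"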

Grep variable in for loop

I want to grep a specific line in each iteration of a for loop. I've already searched online for an answer to my problem and tried what I found, but it doesn't seem to work for me, and I can't find what I'm doing wrong.
Here is the code:
for n in 2 4 6 8 10 12 14 ; do
for U in 1 10 100 ; do
for L in 2 4 6 8 ; do
i=0
cat results/output_iteration/occ_"$L"_"$n"_"$U"_it"$i".dat
for k in $(seq 1 1 $L) ; do
${'var'.$k}=`grep " $k " results/output_iteration/occ_"$L"_"$n"_"$U"_it"$i".dat | tail -n 1`
done
which gives me :
%
%
% site density double occupancy
1 0.49791021 0.03866179
2 0.49891438 0.06077808
3 0.50426102 0.05718336
4 0.49891438 0.06077808
./run_deviation_functionL.sh: line 109: ${'var'.$k}=`grep " $k " results/output_iteration/occ_"$L"_"$n"_"$U"_it"$i".dat | tail -n 1`: bad substitution
Then, I would like to take only the density number, with something like:
${'density'.$k}=`echo "${'var'.$k:10:10}" | bc -l`
Does anyone know why it fails?

Use declare to create variable names from variables:
declare density$k="`...`"
Use variable indirection to retrieve them:
var=var$k
echo "${!var:10:10}"
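A runnable sketch of the declare plus indirection approach, on data shaped like the question's occ_*.dat output. The temporary file and L=4 stand in for the real data file and loop bound. Splitting the line into fields with read, instead of fixed character offsets such as ${!var:10:10}, is an added choice: offsets break as soon as the column widths change.

```shell
#!/usr/bin/env bash
# Stand-in for the question's data file (sample lines copied from its output).
file=$(mktemp)
printf '%s\n' '%' '%' '% site density double occupancy' \
  '1 0.49791021 0.03866179' '2 0.49891438 0.06077808' \
  '3 0.50426102 0.05718336' '4 0.49891438 0.06077808' > "$file"

L=4
for (( k = 1; k <= L; k++ )); do
  # Last line for site k (leading blanks allowed), stored as var1, var2, ...
  declare "var$k=$(grep -E "^[[:space:]]*$k[[:space:]]" "$file" | tail -n 1)"
done

for (( k = 1; k <= L; k++ )); do
  name=var$k
  # ${!name} expands to the value of the variable whose name is in $name;
  # the second field of that line is the density.
  read -r _ dens _ <<< "${!name}"
  declare "density$k=$dens"
done

echo "$density1 $density3"   # prints 0.49791021 0.50426102
rm -f "$file"
```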

How do I split the input into chunks of six entries each using bash?

This is the script I run to produce the raw data in data_tripwire.sh:
#!/bin/sh
LOG=/var/log/syslog-ng/svrs/sec2tes1
for count in 6 5 4 3 2 1 0
do
MONTH=`date -d"$count month ago" +"%Y-%m"`
CBS=`bzcat $LOG/$MONTH*.log.bz2|grep 10.55.22.41 |sort|uniq | wc -l`
echo $CBS >> /home/secmgr/attmrms1/data_tripwire1.sh
done
for count in 6 5 4 3 2 1 0
do
MONTH=`date -d"$count month ago" +"%Y-%m"`
GFS=`bzcat $LOG/$MONTH*.log.bz2|grep 10.55.22.31 |sort|uniq | wc -l`
echo $GFS >> /home/secmgr/attmrms1/data_tripwire1.sh
done
for count in 6 5 4 3 2 1 0
do
MONTH=`date -d"$count month ago" +"%Y-%m"`
HR1=`bzcat $LOG/$MONTH*.log.bz2|grep 10.55.10.1 |sort|uniq | wc -l `
echo $HR1 >> /home/secmgr/attmrms1/data_tripwire1.sh
done
for count in 6 5 4 3 2 1 0
do
MONTH=`date -d"$count month ago" +"%Y-%m"`
HR2=`bzcat $LOG/$MONTH*.log.bz2|grep 10.55.21.12 |sort|uniq | wc -l`
echo $HR2 >> /home/secmgr/attmrms1/data_tripwire1.sh
done
for count in 6 5 4 3 2 1 0
do
MONTH=`date -d"$count month ago" +"%Y-%m"`
PAYROLL=`bzcat $LOG/$MONTH*.log.bz2|grep 10.55.21.18 |sort|uniq | wc -l`
echo $PAYROLL >> /home/secmgr/attmrms1/data_tripwire1.sh
done
for count in 6 5 4 3 2 1 0
do
MONTH=`date -d"$count month ago" +"%Y-%m"`
INCV=`bzcat $LOG/$MONTH*.log.bz2|grep 10.55.22.71 |sort|uniq | wc -l`
echo $INCV >> /home/secmgr/attmrms1/data_tripwire1.sh
done
data_tripwire.sh
91
58
54
108
52
18
8
81
103
110
129
137
84
15
14
18
11
17
12
6
1
28
6
14
8
8
0
0
28
24
25
23
21
13
9
4
18
17
18
30
13
3
I want to process the first 6 entries (91, 58, 54, 108, 52, 18) from the output above and then break out of the loop. After that, it should continue with the next 6 entries and break out of the loop again, and so on.
The problem now is that it reads all 42 numbers without breaking out of the loop.
This is the output of the table:
Tripwire
Month CBS GFS HR HR Payroll INCV
cb2db1 gfs2db1 hr2web1 hrm2db1 hrm2db1a incv2svr1
2013-07 85 76 12 28 26 4
2013-08 58 103 18 6 24 18
2013-09 54 110 11 14 25 17
2013-10 108 129 17 8 23 18
2013-11 52 137 12 8 21 30
2013-12 18 84 6 0 13 13
2014-01 8 16 1 0 9 3
The problem now is that it reads all 42 numbers, from 85 to 3.
I want a loop that runs from July to January for one server and then calculates the mean and standard deviation (that calculation is already done below).
After that, it should continue with the next cycle of 6 numbers for the next server and do the same as in the first cycle. I need help with a for loop that uses break and continue, or with anything simpler.
This is my standard deviation calculation:
count=0 # Number of data points; global.
SC=3 # Scale to be used by bc. three decimal places.
E_DATAFILE=90 # Data file error
## ----------------- Set data file ---------------------
if [ ! -z "$1" ] # Specify filename as cmd-line arg?
then
datafile="$1" # ASCII text file,
else #+ one (numerical) data point per line!
datafile=/home/secmgr/attmrms1/data_tripwire1.sh
fi # See example data file, below.
if [ ! -e "$datafile" ]
then
echo "\""$datafile"\" does not exist!"
exit $E_DATAFILE
fi
Calculate the mean
arith_mean ()
{
local rt=0 # Running total.
local am=0 # Arithmetic mean.
local ct=0 # Number of data points.
while read value # Read one data point at a time.
do
rt=$(echo "scale=$SC; $rt + $value" | bc)
(( ct++ ))
done
am=$(echo "scale=$SC; $rt / $ct" | bc)
echo $am; return $ct # This function "returns" TWO values!
# Caution: This little trick will not work if $ct > 255!
# To handle a larger number of data points,
#+ simply comment out the "return $ct" above.
} <"$datafile" # Feed in data file.
sd ()
{
mean1=$1 # Arithmetic mean (passed to function).
n=$2 # How many data points.
sum2=0 # Sum of squared differences ("variance").
avg2=0 # Average of $sum2.
sdev=0 # Standard Deviation.
while read value # Read one line at a time.
do
diff=$(echo "scale=$SC; $mean1 - $value" | bc)
# Difference between arith. mean and data point.
dif2=$(echo "scale=$SC; $diff * $diff" | bc) # Squared.
sum2=$(echo "scale=$SC; $sum2 + $dif2" | bc) # Sum of squares.
done
avg2=$(echo "scale=$SC; $sum2 / $n" | bc) # Avg. of sum of squares.
sdev=$(echo "scale=$SC; sqrt($avg2)" | bc) # Square root =
echo $sdev # Standard Deviation.
} <"$datafile" # Rewinds data file.
Showing the output
mean=$(arith_mean); count=$? # Two returns from function!
std_dev=$(sd $mean $count)
echo
echo "<tr><th>Servers</th><th>"Number of data points in \"$datafile"\"</th> <th>Arithmetic mean (average)</th><th>Standard Deviation</th></tr>" >> $HTML
echo "<tr><td>cb2db1<td>$count<td>$mean<td>$std_dev</tr>" >> $HTML
echo "<tr><td>gfs2db1<td>$count<td>$mean<td>$std_dev</tr>" >> $HTML
echo "<tr><td>hr2web1<td>$count<td>$mean<td>$std_dev</tr>" >> $HTML
echo "<tr><td>hrm2db1<td>$count<td>$mean<td>$std_dev</tr>" >> $HTML
echo "<tr><td>hrm2db1a<td>$count<td>$mean<td>$std_dev</tr>" >> $HTML
echo "<tr><td>incv21svr1<td>$count<td>$mean<td>$std_dev</tr>" >> $HTML
echo
I want to split the input into chunks of six entries each, with the arithmetic mean and the standard deviation of entries 1..6, then of entries 7..12, then of 13..18, and so on.
This is the output table I want:
Tripwire
Month CBS GFS HR HR Payroll INCV
cb2db1 gfs2db1 hr2web1 hrm2db1 hrm2db1a incv2svr1
2013-07 85 76 12 28 26 4
2013-08 58 103 18 6 24 18
2013-09 54 110 11 14 25 17
2013-10 108 129 17 8 23 18
2013-11 52 137 12 8 21 30
2013-12 18 84 6 0 13 13
2014-01 8 16 1 0 9 3
*Standard
deviation
(7mths) 31.172 35.559 5.248 8.935 5.799 8.580
* Mean
(7mths) 54.428 94.285 11.142 9.142 20.285 14.714

paste - - - - - - < data_tripwire.sh | while read -a values; do
  # values is an array with 6 values
  # ${values[0]} .. ${values[5]}
  arith_mean "${values[@]}"
done
This means you have to rewrite your functions so they don't use read: change
while read value
to
for value in "$@"
@Matt, yes, change both functions to iterate over their arguments instead of reading from stdin. Then pass the data file (now called "data_tripwire1.sh", a terrible file extension for data; use .txt or .dat) into paste to reformat the data so that the first 6 values form the first row. Read each line into the array values (using read -a values) and invoke the functions:
arith_mean () {
  local sum=$(IFS=+; echo "$*")        # join the arguments with +
  echo "scale=$SC; ($sum)/$#" | bc
}
sd () {
  local mean=$1
  shift
  local sum2=0
  for i in "$@"; do
    sum2=$(echo "scale=$SC; $sum2 + ($mean-$i)^2" | bc)
  done
  echo "scale=$SC; sqrt($sum2/$#)"|bc
}
paste - - - - - - < data_tripwire1.sh | while read -a values; do
  mean=$(arith_mean "${values[@]}")
  sd=$(sd $mean "${values[@]}")
  echo "${values[@]} $mean $sd"
done | column -t
91 58 54 108 52 18 63.500 29.038
8 81 103 110 129 137 94.666 42.765
84 15 14 18 11 17 26.500 25.811
12 6 1 28 6 14 11.166 8.648
8 8 0 0 28 24 11.333 10.934
25 23 21 13 9 4 15.833 7.711
18 17 18 30 13 3 16.500 7.973
Note you don't need to return a fancy value from the functions: you know how many points you pass in.
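As an alternative to calling bc once per value, one awk process can compute both statistics for every block of six lines. This is a swapped-in technique rather than what the answer above does, and chunk_stats is a hypothetical name. It uses the population standard deviation (dividing by n), like the sd function above, but rounds with %.3f where bc truncates, so the last digit can differ (29.039 here versus 29.038 in the table above).

```shell
#!/usr/bin/env bash
# chunk_stats [file...]: mean and population standard deviation of each
# block of 6 numbers (one number per line), printed as "mean sd".
chunk_stats() {
  awk -v size=6 '
    { v[++n] = $1 }                          # collect the current block
    n == size {
      sum = 0; for (i = 1; i <= n; i++) sum += v[i]
      mean = sum / n
      ss = 0;  for (i = 1; i <= n; i++) ss += (v[i] - mean) ^ 2
      printf "%.3f %.3f\n", mean, sqrt(ss / n)
      n = 0                                  # start the next block
    }
  ' "$@"
}

# First line of output: 63.500 29.039
printf '%s\n' 91 58 54 108 52 18 8 81 103 110 129 137 | chunk_stats
```

A trailing block with fewer than six values is silently ignored.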

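Based on Glenn's answer, I propose this, which needs very few changes to the original functions. First remove the <"$datafile" redirection after the closing brace of both arith_mean and sd: a redirection attached to a function definition is performed every time the function is called, so it would override the pipe. Then run:
paste - - - - - - < data_tripwire.sh | while read -a values
do
  mean=$(printf '%s\n' "${values[@]}" | arith_mean); count=$?
  std_dev=$(printf '%s\n' "${values[@]}" | sd "$mean" "$count")
  echo "$mean $std_dev"
done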
Once the functions are defined, you can type (or copy and paste) this code directly into an interactive shell. If you intend to use it often, put the code into a text file with #!/bin/bash as its first line, make the file executable, and run it as a shell script.
Credit to Glenn Jackman for the use of paste - - - - - -, which I'd say is the real solution.

With process substitution, each function reads only the current block of 6 items (block number $i) from the data file:
arith_mean ()
{
local rt=0 # Running total.
local am=0 # Arithmetic mean.
local ct=0 # Number of data points.
while read value # Read one data point at a time.
do
rt=$(echo "scale=$SC; $rt + $value" | bc)
(( ct++ ))
done
am=$(echo "scale=$SC; $rt / $ct" | bc)
echo $am; return $ct # This function "returns" TWO values!
# Caution: This little trick will not work if $ct > 255!
# To handle a larger number of data points,
#+ simply comment out the "return $ct" above.
} < <(awk -v block="$i" 'NR > (6 * (block - 1)) && NR < (6 * block + 1) {print}' "$datafile") # Feed in the current block.
sd ()
{
mean1=$1 # Arithmetic mean (passed to function).
n=$2 # How many data points.
sum2=0 # Sum of squared differences ("variance").
avg2=0 # Average of $sum2.
sdev=0 # Standard Deviation.
while read value # Read one line at a time.
do
diff=$(echo "scale=$SC; $mean1 - $value" | bc)
# Difference between arith. mean and data point.
dif2=$(echo "scale=$SC; $diff * $diff" | bc) # Squared.
sum2=$(echo "scale=$SC; $sum2 + $dif2" | bc) # Sum of squares.
done
avg2=$(echo "scale=$SC; $sum2 / $n" | bc) # Avg. of sum of squares.
sdev=$(echo "scale=$SC; sqrt($avg2)" | bc) # Square root =
echo $sdev # Standard Deviation.
} < <(awk -v block="$i" 'NR > (6 * (block - 1)) && NR < (6 * block + 1) {print}' "$datafile") # Re-reads the current block.
In the main part of the script, loop over the blocks:
for (( i = 1; i <= $(wc -l < "$datafile") / 6; i++ ))
do
  mean=$(arith_mean); count=$?   # Two returns from function!
  std_dev=$(sd $mean $count)
  echo "$mean $std_dev"
done
Of course it is better to move the wc -l outside of the loop for faster execution. But you get the idea.
Note the syntax: < <(...) is an ordinary input redirection (<) from a process substitution (<(...)). Inside the process substitution there must be no space between < and (:
cat <(awk -F: '{print $1}' /etc/passwd) works.
cat < (awk -F: '{print $1}' /etc/passwd) fails with: syntax error near unexpected token `('
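A small self-contained sketch of the < <(...) form attached to a function definition. Because the redirection is performed each time the function runs, the awk command sees the current value of block on every call. print_block is a hypothetical name, and seq 1 12 stands in for the data file.

```shell
#!/usr/bin/env bash
# print_block: sum the numbers in block number $block (6 lines per block).
print_block() {
  local sum=0 value
  while read -r value; do
    (( sum += value ))
  done
  echo "block $block sum $sum"
} < <(seq 1 12 | awk -v block="$block" 'NR > 6 * (block - 1) && NR <= 6 * block')

for block in 1 2; do
  print_block   # block 1 sum 21, then block 2 sum 57
done
```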

How to group the information respectively with comma separator in the input?

My file name is myInfo.txt under the current directory: DIR="$(pwd)";
inside it has:
1000 at num 2049 28 2068100
1000 at num 2049 28 2623200
1000 at num 2049 28 2833000
1000 at num 2049 28 3499700
1000 at num 2051 28 2453500
1000 at num 2051 28 2969400
1000 at num 2051 28 3071300
1000 at num 2051 28 3838200
Now I used the bash script sequentially:
DIR="$(pwd)";
array=(2049 2151);
for k in "${array[@]}"; do
grep "at num ${k}" myInfo.txt | cut -d' ' -f 6 > ${DIR}/Info/nums/${k}.out
done
and group the 6th column of each row (2068100, 2623200, ...) into the files 2049.out and 2051.out respectively, under the folder ${DIR}/Info/nums/.
My question is: can I use a comma separator, as follows, to get the same result as before?
for k in "${array[@]}"; do
grep "at num ${k}" myInfo.txt | cut -d',' -f 6 > ${DIR}/Info/nums/${k}.out
done
I tried to regenerate myInfo.txt to suit the above command:
1000,at num 2049,28,2068100
1000,at num 2049,28,2623200
1000,at num 2049,28,2833000
1000,at num 2049,28,3499700
1000,at num 2051 28,2453500
1000,at num 2051 28,2969400
1000,at num 2051 28,3071300
1000,at num 2051 28,3838200
and tried to group the information the same way as before. But it seems that cut -d',' -f 6 does not do the same thing as cut -d' ' -f 6.
Is cut -d',' -f 6 valid? If it is, what format should I regenerate myInfo.txt in? Thank you.

You can fix your problem in (at least) two ways:
Either you replace every space in myInfo.txt with a comma, not just some of them, or you use the 4th column instead (because with , as the delimiter, columns are separated only by commas).
In any case, you should fix up your file so that the comma separation is consistent across all lines (right now some lines have 3 commas and some have 2).

If your input record is structured like this:
1000,at num 2049,28,2068100
Then you need
cut -d',' -f 4
to extract the 4th column.
However if you want to use:
cut -d',' -f 6
then the input record should be formatted like this:
1000,at,num,2049,28,2068100
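A minimal sketch of the -f 4 fix, assuming every line has been regenerated consistently as 1000,at num NNNN,28,VALUE so that the value is the 4th comma-separated field. A temporary directory stands in for ${DIR}, filled with a few of the question's sample lines. Matching "at num ${k}," with the trailing comma is an extra precaution so that 2049 cannot also match a value such as 20490.

```shell
#!/usr/bin/env bash
# Temporary stand-in for ${DIR}, with consistently comma-separated lines.
DIR=$(mktemp -d)
printf '%s\n' '1000,at num 2049,28,2068100' '1000,at num 2049,28,2623200' \
  '1000,at num 2051,28,2453500' '1000,at num 2051,28,2969400' > "$DIR/myInfo.txt"

mkdir -p "$DIR/Info/nums"
array=(2049 2051)
for k in "${array[@]}"; do
  # Field 4 of "1000,at num 2049,28,2068100" is the value to collect.
  grep "at num ${k}," "$DIR/myInfo.txt" | cut -d',' -f 4 > "$DIR/Info/nums/${k}.out"
done

cat "$DIR/Info/nums/2049.out"   # 2068100 and 2623200, one per line
```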
