With a shell script, I wish to generate five files, and put a different random number in the range 50000 to 150000 in each file. I tried something like the following,
for i in 01 02 03 04 05; do
A=$((50000+100000))
B=$(($B%$A))
cat > ${i}.dat << EOF
AArandom=$A
EOF
done
But this does not work. How can I generate random numbers and print one into each file?
Each time you read the value of the variable $RANDOM,
it gives you a random number between 0 and 2^15 - 1,
that is, between 0 and 32767. So a single $RANDOM doesn't give you enough range.
You can combine two $RANDOM values as two digits in base 2^15 (32768),
and then take an appropriate modulo and normalize to the desired range.
Here's the logic wrapped in a function:
randrange() {
    local min=$1 max=$2
    local range maxrand limit r num
    (( range = max - min ))                    # size of the half-open range [min, max)
    (( maxrand = 2**30 ))                      # two 15-bit RANDOMs give 30 random bits
    (( limit = maxrand - maxrand % range ))    # largest cutoff that avoids modulo bias
    while true; do
        (( r = RANDOM * 2**15 + RANDOM ))      # combine two draws into one 30-bit value
        (( r < limit )) && break               # otherwise re-roll
    done
    (( num = min + r % range ))
    echo "$num"
}
And then you can generate the files in a loop like this:
for i in 01 02 03 04 05; do
echo "AArandom=$(randrange 50000 150000)" > "$i.dat"
done
Note that there is a caveat in the implementation of randrange:
the re-roll loop discards values that would introduce modulo bias,
and in theory that loop could keep the function from terminating.
In practice, that's extremely unlikely, but it deserves a mention.
shuf is probably what you want:
$ shuf -i 50000-150000 -n 1
148495
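The same one-liner slots straight into the loop from the question; a minimal sketch:

```shell
#!/bin/bash
# Sketch: generate the five .dat files from the question with shuf.
# shuf -i LO-HI -n 1 prints one integer from the inclusive range LO..HI.
for i in 01 02 03 04 05; do
    printf 'AArandom=%s\n' "$(shuf -i 50000-150000 -n 1)" > "$i.dat"
done
```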
I have these variables; maybe there can be up to 100 of them (minimum two variables, maximum +50):
1=10
2=21
3=44
4=36
...
and I need to find which variables sum up to 57.
In this case it is variables 4 + 2.
Or maybe the result is 90, and in this case it is 1+3+4.
I think I need some random code, maybe something like this.
#!/bin/bash
array[0]=10
array[1]=21
array[2]=44
array[3]=36
Next add random values until this fits the result.
But if I have 100 variables and need to find a result, is it possible?
I have read some links about randomizing, but I have never seen anything like this.
This recursive Bash function tries to find and print sums using a brute-force approach that checks all possible sums:
function print_sums
{
local -r target=$1 # Number for which sums are to be found
local -r pre_sum=$2 # Sum built up for an outer target
local -r values=( "${@:3}" ) # Values available to use in sums
if (( target == 0 )) ; then
printf '%s\n' "$pre_sum"
elif (( ${#values[*]} == 0 )) ; then
:
else
# Print any sums that include the first in the list of values
local first_value=${values[0]}
if (( first_value <= target )) ; then
local new_pre_sum
[[ -z $pre_sum ]] && new_pre_sum=$first_value \
|| new_pre_sum="$pre_sum+$first_value"
local new_target=$((target - first_value))
print_sums "$new_target" "$new_pre_sum" "${values[@]:1}"
fi
# Print any sums that don't include the first in the list of values
print_sums "$target" "$pre_sum" "${values[@]:1}"
fi
return 0
}
Example usage, with an extended list of possible values to use in sums, is:
values=(10 21 44 36 85 61 69 81 76 39 95 22 30 4 29 47 80 18 40 44 )
print_sums 90 '' "${values[@]}"
This prints:
10+21+30+29
10+44+36
10+36+22+4+18
10+36+4+40
10+36+44
10+76+4
10+22+18+40
10+4+29+47
10+80
21+36+4+29
21+69
21+39+30
21+22+29+18
21+22+47
21+4+47+18
21+29+40
61+29
39+22+29
39+4+29+18
39+4+47
It takes less than a second to do this on an oldish Linux machine. However, the exponential explosion (each addition to the list of values doubles the potential number of sums to try) means that it is not a practical solution for significantly larger numbers of values. I haven't tried 50, but it's hopeless unless the target value is small so you get a lot of early returns.
The question asked for the indices of the values in the sum to be printed, not the values themselves. That can be done with minor modifications to the code (which are left as an exercise for anybody who is interested!).
I want to generate a random decimal number from 0 to 3, the result should look like this:
0.2
1.5
2.9
The only command I know is:
echo "0.$(( ($RANDOM%500) + 500))"
but this always generates 0.xxx. How do I do that?
Bash has no support for non-integers. The snippet you have just generates a random number between 500 and 999 and then prints it after "0." to make it look like a real number.
There are lots of ways to do something similar in bash (generating the integer and decimal parts separately). To ensure a maximally even distribution, I would just decide how many digits you want after the decimal and pick a random integer with the same precision, then print the digits out with the decimal in the right place. For example, if you just want one digit after the decimal in the half-open range [0,3), you can generate an integer between 0 and 30 and then print out the tens and ones separated by a period:
(( n = RANDOM % 30 ))
printf '%s.%s\n' $(( n / 10 )) $(( n % 10 ))
If you want two digits after the decimal, use % 300 in the RANDOM assignment and 100 in the two expressions on the printf. And so on.
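For instance, a two-digit variant might look like this (a sketch; note the %02d zero-padding, without which 0.05 would print as 0.5):

```shell
#!/bin/bash
# Two digits after the decimal in the half-open range [0,3).
# Zero-padding the fractional part with %02d matters: without it,
# n = 5 would print as "0.5" instead of "0.05".
(( n = RANDOM % 300 ))
printf '%d.%02d\n' $(( n / 100 )) $(( n % 100 ))
```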
Alternatively, see the answer below for a number of solutions using other tools that aren't bash builtins:
https://stackoverflow.com/a/50359816/2836621
$RANDOM gives random integers in the range 0..32767
Knowing this, you have many options. Here are two:
Using bc:
$ bc <<< "scale=3; 3 * $RANDOM / 32767"
2.681
Constructing a number from two $RANDOMs (the fractional part needs zero-padding, otherwise 0.05 would print as 0.5):
$ printf '%d.%03d\n' $(( RANDOM % 3 )) $(( RANDOM % 1000 ))
0.921
I have limited the precision to 3 decimal digits. Increasing/decreasing it should be trivial.
Using a "while" loop, how can I display the sum of the following list of numbers: 1 8 4 3 6 5 7 2?
I must create a sum variable that collects the sum of each value as the numbers are processed by the loop. (bash script)
If the numbers are stored in a file called list_of_numbers in the current directory (your question does not state where the numbers come from), then you could calculate and output the sum like this:
sum=0
while read -r num; do
    echo "$sum + $num = $((sum + num))"
    (( sum += num ))
done < ./list_of_numbers
echo $sum
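For example, with the numbers from the question written to list_of_numbers one per line (an assumption; the question does not say how they are stored), a full run looks like this:

```shell
#!/bin/bash
# Build the sample input from the question (assumed layout: one number per line).
printf '%s\n' 1 8 4 3 6 5 7 2 > list_of_numbers

sum=0
while read -r num; do
    echo "$sum + $num = $((sum + num))"   # show the running total
    (( sum += num ))
done < ./list_of_numbers
echo $sum                                 # final sum of all eight numbers
```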
I have the following code that is addressing the Project Euler problem below:
2520 is the smallest number that can be divided by each of the numbers from 1 to 10 without any remainder.
What is the smallest positive number that is evenly divisible by all of the numbers from 1 to 20?
My script works fine and generates 2520 as it should for 1-10; I also have an answer of 12252240 for 1-17. It looks like this:
#!/bin/bash
for ((i=1; i<10000000000; i++))
do
if (( i%2 == 0 )) && (( i%3 == 0 )) && (( i%4 == 0 )) && (( i%5 == 0 )) &&
(( i%6 == 0 )) && (( i%7 == 0 )) && (( i%8 == 0 )) && (( i%9 == 0 )) &&
(( i%10 == 0 )) && (( i%11 == 0 )) && (( i%12 == 0 )) && (( i%13 == 0 )) &&
(( i%14 == 0 )) && (( i%15 == 0 )) && (( i%16 == 0 )) && (( i%17 == 0 )); then
# remaining terms to factor: && (( i%18 == 0 )) && (( i%19 == 0 )) && (( i%20 == 0 )); then
int=$i
fi
if [[ $int ]]; then
echo "Lowest integer = '$int'"
break
else
continue
fi
done
However, the jump in computational time from factoring around 12 terms (about three quarters of a second real time) to factoring 17 (6 minutes real time) is huge.
I've yet to let the full 20 factors run, but all Project Euler problems are supposed to be solvable in a few minutes on medium power home computers.
So my question is twofold: 1) am I on the right track in terms of how I approached programming this, and 2) how else could/should I have done it to make it as efficient as possible?
Without abandoning the brute-force approach, running the inner loop in reverse order roughly halves the running time.
for ((i=1; i<100000000; ++i)); do
for ((j=17; j>1; --j)); do
(( i%j != 0 )) && break
done
((j==1)) && echo "$i" && break
done
Informally speaking, almost no numbers are divisible by 17, and out of those, almost no numbers are divisible by 16. Thus, running the inner loop in reverse order removes 16 iterations of the inner loop for most numbers, and 15 for most of the rest.
Additional optimizations are obvious; for example, the inner loop could end at 4, because 2, 3, and 4 are already covered by their respective squares (all numbers which are divisible by 9 are also divisible by 3, etc). However, that's small potatoes compared to the main optimization.
(You did not have an explicit inner loop, and in fact, unrolling the loop like you did probably achieves a small performance gain. I rolled it into an explicit loop mainly out of laziness as well as for aesthetic reasons.)
So my question is twofold:
1) Am I on the right track in terms of how I approached programming this, and
I'm afraid you're not. You're using the wrong tool, namely a shell scripting language, to solve mathematical problems, and wondering why it doesn't perform well. "Being solvable in a couple of minutes on a home computer" doesn't mean your solution is supposed to run like that regardless of how unusual your choice of tool is.
2) how else could/should I have done it to make it as efficient as possible?
Don't use bash's arithmetic. Bash is a shell, which means it's an interpreter to its core, which means it spends very little time calculating and a great deal of time figuring out what it should do. To illustrate: your complicated formula first has to be parsed into a tree that tells bash in which order to execute things; then those things have to be identified; then bash needs to walk that tree and save all the results for the next level of the tree. The few arithmetic instructions it actually executes cost next to no computational time.
Have a look at numpy, which is a python module for mathematics; it does things faster. If you're not afraid to compile your stuff, look at C++ or C, both for which very very fast math libraries exist.
Arithmetic conditions support logical operators. The speed gain is not huge, but there's some:
if (( i % 2 == 0 && i % 3 == 0 && ... ))
Also note that testing i % 10 == 0 when you already know that i % 2 == 0 and i % 5 == 0 is not needed.
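Following that observation, the 1-to-17 check can be reduced to the maximal prime powers 16, 9, 5, 7, 11, 13, and 17, since each of those implies every smaller divisor it covers (16 covers 8, 4, 2; 9 covers 3; and the composites 6, 10, 12, 14, 15 follow from combinations). A sketch, stepping by 17 since the answer must be a multiple of it:

```shell
#!/bin/bash
# Reduced brute force for 1..17: only check the maximal prime powers.
# Stepping i by 17 makes the i % 17 test unnecessary.
for ((i = 17; ; i += 17)); do
    if (( i % 16 == 0 && i % 9 == 0 && i % 5 == 0 && i % 7 == 0 &&
          i % 11 == 0 && i % 13 == 0 )); then
        echo "$i"   # 12252240, matching the question's 1-17 answer
        break
    fi
done
```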
There's a much faster way to get the number without iterating over all the numbers.
The answer is not a faster programming language. The answer is a more clever algorithm.
You know your end answer has to be divisible by all of the numbers, so start with your largest number and only check multiples of it. Find the smallest number that is a multiple of your two biggest numbers, and then check only multiples of that for the next number.
Let's look at how this works for 1 to 10:
10 // not divisible by 9, keep adding 10's until divisible by 9
20
30
40
50
60
70
80
90 // divisible by 9, move on to 8, not divisible by 8, keep adding 90's
180
270
360 // divisible by 8, not divisible by 7, keep adding 360's
720
1080
1440
1800
2160
2520 // divisible by 7, 6, 5, 4, 3, 2, 1 so you're done!
So in only 17 steps, you have your answer.
This algorithm implemented in Ruby (not known for its speed) found the answer for 1-5000 in 4.5 seconds on a moderately fast laptop.
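The algorithm described above translates almost directly into bash (a sketch; the variable names are my own):

```shell
#!/bin/bash
# Sketch of the "multiples" algorithm described above: keep a running multiple,
# and for each divisor walk upward in steps of the current value until it fits.
# Each pass turns lcm into the least common multiple of lcm and n.
lcm=20
for ((n = 19; n >= 2; n--)); do
    step=$lcm
    while (( lcm % n != 0 )); do
        (( lcm += step ))    # stays divisible by all divisors handled so far
    done
done
echo "$lcm"   # 232792560, the answer for 1..20
```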
I have 100 datafiles, each with 1000 rows, and they all look something like this:
0 0 0 0
1 0 1 0
2 0 1 -1
3 0 1 -2
4 1 1 -2
5 1 1 -3
6 1 0 -3
7 2 0 -3
8 2 0 -4
9 3 0 -4
10 4 0 -4
.
.
.
999 1 47 -21
1000 2 47 -21
I have developed a script which is supposed to square each value in columns 2, 3, and 4, then sum the squares and take the square root.
Like so:
temp = ($t1*$t1) + ($t2*$t2) + ($t3*$t3)
calc = $calc + sqrt ($temp)
It then calculates the square of that value, and averages these numbers over every data-file to output the average "calc" for each row and average "fluc" for each row.
The meaning of these numbers is this:
The first number is the step number, the next three are coordinates on the x, y and z axis respectively. I am trying to find the distance the "steps" have taken me from the origin, this is calculated with the formula r = sqrt(x^2 + y^2 + z^2). Next I need the fluctuation of r, which is calculated as f = r^4 or f = (r^2)^2.
These must be averages over the 100 data files, which leads me to:
r = r + sqrt(x^2 + y^2 + z^2)
avg = r/s
and similarly for f where s is the number of read data files which I figure out using sum=$(ls -l *.data | wc -l).
Finally, my last calculation is the deviation between the expected r and the average r, which is calculated as stddev = sqrt(fluc - (r^2)^2) outside of the loop using final values.
The script I created is:
#!/bin/bash
sum=$(ls -l *.data | wc -l)
paste -d"\t" *.data | nawk -v s="$sum" '{
for(i=0;i<=s-1;i++)
{
t1 = 2+(i*4)
t2 = 3+(i*4)
t3 = 4+(i*4)
temp = ($t1*$t1) + ($t2*$t2) + ($t3*$t3)
calc = $calc + sqrt ($temp)
fluc = $fluc + ($calc*$calc)
}
stddev = sqrt(($calc^2) - ($fluc))
print $1" "calc/s" "fluc/s" "stddev
temp=0
calc=0
stddev=0
}'
Unfortunately, part way through I receive an error:
nawk: cmd. line:9: (FILENAME=- FNR=3) fatal: attempt to access field -1
I am not experienced enough with awk to be able to figure out exactly where I am going wrong, could someone point me in the right direction or give me a better script?
The expected output is one file with:
0 0 0 0
1 (calc for all 1's) (fluc for all 1's) (stddev for all 1's)
2 (calc for all 2's) (fluc for all 2's) (stddev for all 2's)
.
.
.
The following script should do what you want. The only thing that might not work yet is the choice of delimiters: in your original script you seem to have tabs, while my solution assumes spaces. Changing that should not be a problem.
It simply pipes the data of all files sequentially into nawk without counting the files first; I understand that this is not required. Instead of trying to keep track of positions in the file, it uses arrays to store separate statistical data for each step. At the end it iterates over all step indexes found and outputs them. Since that iteration is not sorted, there is a final pipe into a Unix sort call which handles the ordering.
#!/bin/bash
# pipe the data of all files into the nawk processor
cat *.data | nawk '
BEGIN {
FS=" " # set the delimiter for the columns
}
{
step = $1 # step is in column 1
temp = $2*$2 + $3*$3 + $4*$4
# use arrays indexed by step to store data
calc[step] = calc[step] + sqrt (temp)
fluc[step] = fluc[step] + calc[step]*calc[step]
count[step] = count[step] + 1 # count the number of samples seen for a step
}
END {
# iterate over all existing steps (this is not sorted!)
for (i in count) {
stddev = sqrt(fluc[i] - calc[i] * calc[i])
print i" "calc[i]/count[i]" "fluc[i]/count[i]" "stddev
}
}' | sort -n -k 1 # that's why we sort here: first column (-k 1) and numerically (-n)
EDIT
As suggested by @edmorton, awk can take care of loading the files itself. The following enhanced version removes the call to cat and instead passes the file pattern as a parameter to nawk. Also, as suggested by @NictraSavios, the new version introduces special handling for the output of the last step's statistics. Note that the statistics are still gathered for all steps; it's difficult to suppress this while reading the data, because at that point we don't yet know which step will be the last. Although that could be done with some extra effort, you would probably lose a lot of robustness in your data handling, since right now the script makes no assumptions about:
the number of files provided,
the order of the files processed,
the number of steps in each file,
the order of the steps in a file,
the completeness of steps as a range without "holes".
Enhanced script:
#!/bin/bash
nawk '
BEGIN {
FS=" " # set the delimiter for the columns (not really required for space which is the default)
maxstep = -1
}
{
step = $1 # step is in column 1
temp = $2*$2 + $3*$3 + $4*$4
# remember maximum step for selected output
if (step > maxstep)
maxstep = step
# use arrays indexed by step to store data
calc[step] = calc[step] + sqrt (temp)
fluc[step] = fluc[step] + calc[step]*calc[step]
count[step] = count[step] + 1 # count the number of samples seen for a step
}
END {
# iterate over all existing steps (this is not sorted!)
for (i in count) {
stddev = sqrt(fluc[i] - calc[i] * calc[i])
if (i == maxstep)
# handle the last step in a special way
print i" "calc[i]/count[i]" "fluc[i]/count[i]" "stddev
else
# this is the normal handling
print i" "calc[i]/count[i]
}
}' *.data | sort -n -k 1 # that's why we sort here: first column (-k 1) and numerically (-n)
You could also use:
awk -f c.awk *.data
where c.awk is
{
j=FNR
temp=$2*$2+$3*$3+$4*$4
calc[j]=calc[j]+sqrt(temp)
fluc[j]=fluc[j]+calc[j]*calc[j]
}
END {
N=ARGIND # number of files processed; ARGIND is a gawk extension
for (i=1; i<=FNR; i++) {
stdev=sqrt(fluc[i]-calc[i]*calc[i])
print i-1,calc[i]/N,fluc[i]/N,stdev
}
}