Byte operations in Ruby - ruby

I would like to get a clue on how I can get Ruby to work with byte arrays
The below code is C#:
int t = (GetTime() / 60) //t is time in seconds divided by 60s (1 min)
byte[] myArray = new byte[64];
myArray[0] = (byte)(t >> 24);
myArray[1] = (byte)(t >> 16);
Any idea how I can get this to work in Ruby?

One way would be to work with arrays of integers and use Array#pack to pack the result into a binary string. E.g.
[65, 66, 67].pack('C*')
Returns ABC
Another way would be to manipulate a string directly when the encoding is set to "ASCII-8BIT"

Ruby can do bitwise operations and you can use a normal array, so I don't see a problem.
I don't use C# at the moment so I can't check if the results are the same.
t = Time.now.to_i / 60 #t is time in seconds divided by 60s (1 min)
myArray = []
myArray[0] = t >> 24
myArray[1] = t >> 16
p myArray #=>[1, 360]

Related

How can I convert a string ("longint") to float if the number of decimals are known?

Given a lat/lon combination as string:
lat = "514525865"
lon = "54892584"
I wish to convert them to floats:
lat = 51.4525865
lon = 5.4892584
As you can see the number of decimals is KNOWN and given to be 7.
I have tried to do conversion to char-array then adding a . char then merging the char-array but that feels super irrational
def pos_to_float(stringpos)
chars = stringpos.chars
chars.insert(-8,'.')
outstring = chars.join('')
return outstring.to_f
end
lat = "514525865"
floatlat = pos_to_float(lat)
puts floatlat
> 51.4525865
no error as this works but it feels stupid.. any better functions?
You can convert to float then divide by 10^7
p lat.to_f / 10 ** 7
#=> 51.4525865

How do I split an integer into 2 byte binary in Ruby?

Have ready C# code to split integer into 2 bytes as you can see below, Needs to re-write same in Ruby-
int seat2 = 65000;
// Split into two bytes
byte seats = (byte)(seat2 & 0xFF); // lower byte
byte options = (byte)((seat2 >> 8) & 0xFF); // upper byte
Below is the output above
Output Seats => 232
options => 253
// Merge back into integer
seat2 = (options << 8) | seats;
Please suggest anyone has any solution to rewrite the above in Ruby
The code you wrote would work well in Ruby with very few modifications.
You could simply try:
seat2 = 65000
seat2 & 0xFF
# => 232
(seat2 >> 8) & 0xFF
# => 253
An alternative would be to use pack and unpack:
[65000].pack('S').unpack('CC')
# => [232, 253]
[232, 253].pack('CC').unpack('S')
# => [65000]
I believe the most idiomatic way for binary transformations in Ruby is Array#pack and String#unpack (like in Eric's answer).
Also you have an option to use Numeric#divmod with 256(2^8, byte size):
> upper, lower = 65000.divmod(256)
# => [253, 232]
> upper
# => 253
> lower
# => 232
In this case, to have correct bytes, your Integer should not exceed 65535 (2^16-1).
Another one:
lower, upper = 65000.digits(256)

Number crunching in Ruby (optimisation needed)

Ruby may not be the optimal language for this but I'm sort of comfortable working with this in my terminal so that's what I'm going with.
I need to process the numbers from 1 to 666666 so I pin out all the numbers that contain 6 but doesn't contain 7, 8 or 9. The first number will be 6, the next 16, then 26 and so forth.
Then I needed it printed like this (6=6) (16=6) (26=6) and when I have ranges like 60 to 66 I need it printed like (60 THRU 66=6) (SPSS syntax).
I have this code and it works but it's neither beautiful nor very efficient so how could I optimize it?
(silly code may follow)
class Array
def to_ranges
array = self.compact.uniq.sort
ranges = []
if !array.empty?
# Initialize the left and right endpoints of the range
left, right = array.first, nil
array.each do |obj|
# If the right endpoint is set and obj is not equal to right's successor
# then we need to create a range.
if right && obj != right.succ
ranges << Range.new(left,right)
left = obj
end
right = obj
end
ranges << Range.new(left,right) unless left == right
end
ranges
end
end
write = ""
numbers = (1..666666).to_a
# split each number in an array containing it's ciphers
numbers = numbers.map { |i| i.to_s.split(//) }
# delete the arrays that doesn't contain 6 and the ones that contains 6 but also 8, 7 and 9
numbers = numbers.delete_if { |i| !i.include?('6') }
numbers = numbers.delete_if { |i| i.include?('7') }
numbers = numbers.delete_if { |i| i.include?('8') }
numbers = numbers.delete_if { |i| i.include?('9') }
# join the ciphers back into the original numbers
numbers = numbers.map { |i| i.join }
numbers = numbers.map { |i| i = Integer(i) }
# rangify consecutive numbers
numbers = numbers.to_ranges
# edit the ranges that go from 1..1 into just 1
numbers = numbers.map do |i|
if i.first == i.last
i = i.first
else
i = i
end
end
# string stuff
numbers = numbers.map { |i| i.to_s.gsub(".."," thru ") }
numbers = numbers.map { |i| "(" + i.to_s + "=6)"}
numbers.each { |i| write << " " + i }
File.open('numbers.txt','w') { |f| f.write(write) }
As I said it works for numbers even in the millions - but I'd like some advice on how to make prettier and more efficient.
I deleted my earlier attempt to parlez-vous-ruby? and made up for that. I know have an optimized version of x3ro's excellent example.
$,="\n"
puts ["(0=6)", "(6=6)", *(1.."66666".to_i(7)).collect {|i| i.to_s 7}.collect do |s|
s.include?('6')? "(#{s}0 THRU #{s}6=6)" : "(#{s}6=6)"
end ]
Compared to x3ro's version
... It is down to three lines
... 204.2 x faster (to 66666666)
... has byte-identical output
It uses all my ideas for optimization
gen numbers based on modulo 7 digits (so base-7 numbers)
generate the last digit 'smart': this is what compresses the ranges
So... what are the timings? This was testing with 8 digits (to 66666666, or 823544 lines of output):
$ time ./x3ro.rb > /dev/null
real 8m37.749s
user 8m36.700s
sys 0m0.976s
$ time ./my.rb > /dev/null
real 0m2.535s
user 0m2.460s
sys 0m0.072s
Even though the performance is actually good, it isn't even close to the C optimized version I posted before: I couldn't run my.rb to 6666666666 (6x10) because of OutOfMemory. When running to 9 digits, this is the comparative result:
sehe#meerkat:/tmp$ time ./my.rb > /dev/null
real 0m21.764s
user 0m21.289s
sys 0m0.476s
sehe#meerkat:/tmp$ time ./t2 > /dev/null
real 0m1.424s
user 0m1.408s
sys 0m0.012s
The C version is still some 15x faster... which is only fair considering that it runs on the bare metal.
Hope you enjoyed it, and can I please have your votes if only for learning Ruby for the purpose :)
(Can you tell I'm proud? This is my first encounter with ruby; I started the ruby koans 2 hours ago...)
Edit by #johndouthat:
Very nice! The use of base7 is very clever and this a great job for your first ruby trial :)
Here's a slight modification of your snippet that will let you test 10+ digits without getting an OutOfMemory error:
puts ["(0=6)", "(6=6)"]
(1.."66666666".to_i(7)).each do |i|
s = i.to_s(7)
puts s.include?('6') ? "(#{s}0 THRU #{s}6=6)" : "(#{s}6=6)"
end
# before:
real 0m26.714s
user 0m23.368s
sys 0m2.865s
# after
real 0m15.894s
user 0m13.258s
sys 0m1.724s
Exploiting patterns in the numbers, you can short-circuit lots of the loops, like this:
If you define a prefix as the 100s place and everything before it,
and define the suffix as everything in the 10s and 1s place, then, looping
through each possible prefix:
If the prefix is blank (i.e. you're testing 0-99), then there are 13 possible matches
elsif the prefix contains a 7, 8, or 9, there are no possible matches.
elsif the prefix contains a 6, there are 49 possible matches (a 7x7 grid)
else, there are 13 possible matches. (see the image below)
(the code doesn't yet exclude numbers that aren't specifically in the range, but it's pretty close)
number_range = (1..666_666)
prefix_range = ((number_range.first / 100)..(number_range.last / 100))
for p in prefix_range
ps = p.to_s
# TODO: if p == prefix_range.last or p == prefix_range.first,
# TODO: test to see if number_range.include?("#{ps}6".to_i), etc...
if ps == '0'
puts "(6=6) (16=6) (26=6) (36=6) (46=6) (56=6) (60 thru 66) "
elsif ps =~ /7|8|9/
# there are no candidate suffixes if the prefix contains 7, 8, or 9.
elsif ps =~ /6/
# If the prefix contains a 6, then there are 49 candidate suffixes
for i in (0..6)
print "(#{ps}#{i}0 thru #{ps}#{i}6) "
end
puts
else
# If the prefix doesn't contain 6, 7, 8, or 9, then there are only 13 candidate suffixes.
puts "(#{ps}06=6) (#{ps}16=6) (#{ps}26=6) (#{ps}36=6) (#{ps}46=6) (#{ps}56=6) (#{ps}60 thru #{ps}66) "
end
end
Which prints out the following:
(6=6) (16=6) (26=6) (36=6) (46=6) (56=6) (60 thru 66)
(106=6) (116=6) (126=6) (136=6) (146=6) (156=6) (160 thru 166)
(206=6) (216=6) (226=6) (236=6) (246=6) (256=6) (260 thru 266)
(306=6) (316=6) (326=6) (336=6) (346=6) (356=6) (360 thru 366)
(406=6) (416=6) (426=6) (436=6) (446=6) (456=6) (460 thru 466)
(506=6) (516=6) (526=6) (536=6) (546=6) (556=6) (560 thru 566)
(600 thru 606) (610 thru 616) (620 thru 626) (630 thru 636) (640 thru 646) (650 thru 656) (660 thru 666)
(1006=6) (1016=6) (1026=6) (1036=6) (1046=6) (1056=6) (1060 thru 1066)
(1106=6) (1116=6) (1126=6) (1136=6) (1146=6) (1156=6) (1160 thru 1166)
(1206=6) (1216=6) (1226=6) (1236=6) (1246=6) (1256=6) (1260 thru 1266)
(1306=6) (1316=6) (1326=6) (1336=6) (1346=6) (1356=6) (1360 thru 1366)
(1406=6) (1416=6) (1426=6) (1436=6) (1446=6) (1456=6) (1460 thru 1466)
(1506=6) (1516=6) (1526=6) (1536=6) (1546=6) (1556=6) (1560 thru 1566)
(1600 thru 1606) (1610 thru 1616) (1620 thru 1626) (1630 thru 1636) (1640 thru 1646) (1650 thru 1656) (1660 thru 1666)
etc...
Note I don't speak ruby, but I intend to dohave done a ruby version later just for speed comparison :)
If you just iterate all numbers from 0 to 117648 (ruby <<< 'print "666666".to_i(7)') and print them in base-7 notation, you'll at least have discarded any numbers containing 7,8,9. This includes the optimization suggestion by MrE, apart from lifting the problem to simple int arithmetic instead of char-sequence manipulations.
All that remains, is to check for the presence of at least one 6. This would make the algorithm skip at most 6 items in a row, so I deem it less unimportant (the average number of skippable items on the total range is 40%).
Simple benchmark to 6666666666
(Note that this means outputting 222,009,073 (222M) lines of 6-y numbers)
Staying close to this idea, I wrote this quite highly optimized C code (I don't speak ruby) to demonstrate the idea. I ran it to 282475248 (congruent to 6666666666 (mod 7)) so it was more of a benchmark to measure: 0m26.5s
#include <stdio.h>
static char buf[11];
char* const bufend = buf+10;
char* genbase7(int n)
{
char* it = bufend; int has6 = 0;
do
{
has6 |= 6 == (*--it = n%7);
n/=7;
} while(n);
return has6? it : 0;
}
void asciify(char* rawdigits)
{
do { *rawdigits += '0'; }
while (++rawdigits != bufend);
}
int main()
{
*bufend = 0; // init
long i;
for (i=6; i<=282475248; i++)
{
char* b7 = genbase7(i);
if (b7)
{
asciify(b7);
puts(b7);
}
}
}
I also benchmarked another approach, which unsurprisingly ran in less than half the time because
this version directly manipulates the results in ascii string form, ready for display
this version shortcuts the has6 flag for deeper recursion levels
this version also optimizes the 'twiddling' of the last digit when it is required to be '6'
the code is simply shorter...
Running time: 0m12.8s
#include <stdio.h>
#include <string.h>
inline void recursive_permute2(char* const b, char* const m, char* const e, int has6)
{
if (m<e)
for (*m = '0'; *m<'7'; (*m)++)
recursive_permute2(b, m+1, e, has6 || (*m=='6'));
else
if (has6)
for (*e = '0'; *e<'7'; (*e)++)
puts(b);
else /* optimize for last digit must be 6 */
puts((*e='6', b));
}
inline void recursive_permute(char* const b, char* const e)
{
recursive_permute2(b, b, e-1, 0);
}
int main()
{
char buf[] = "0000000000";
recursive_permute(buf, buf+sizeof(buf)/sizeof(*buf)-1);
}
Benchmarks measured with:
gcc -O4 t6.c -o t6
time ./t6 > /dev/null
$range_start = -1
$range_end = -1
$f = File.open('numbers.txt','w')
def output_number(i)
if $range_end == i-1
$range_end = i
elsif $range_start < $range_end
$f.puts "(#{$range_start} thru #{$range_end})"
$range_start = $range_end = i
else
$f.puts "(#{$range_start}=6)" if $range_start > 0 # no range, print out previous number
$range_start = $range_end = i
end
end
'1'.upto('666') do |n|
next unless n =~ /6/ # keep only numbers that contain 6
next if n =~ /[789]/ # remove nubmers that contain 7, 8 or 9
output_number n.to_i
end
if $range_start < $range_end
$f.puts "(#{$range_start} thru #{$range_end})"
end
$f.close
puts "Ruby is beautiful :)"
I came up with this piece of code, which I tried to keep more or less in FP-styling. Probably not much more efficient (as it has been said, with basic number logic you will be able to increase performance, for example by skipping from 19xx to 2000 directly, but that I will leave up to you :)
def check(n)
n = n.to_s
n.include?('6') and
not n.include?('7') and
not n.include?('8') and
not n.include?('9')
end
def spss(ranges)
ranges.each do |range|
if range.first === range.last
puts "(" + range.first.to_s + "=6)"
else
puts "(" + range.first.to_s + " THRU " + range.last.to_s + "=6)"
end
end
end
range = (1..666666)
range = range.select { |n| check(n) }
range = range.inject([0..0]) do |ranges, n|
temp = ranges.last
if temp.last + 1 === n
ranges.pop
ranges.push(temp.first..n)
else
ranges.push(n..n)
end
end
spss(range)
My first answer was trying to be too clever. Here is a much simpler version
class MutablePrintingCandidateRange < Struct.new(:first, :last)
def to_s
if self.first == nil and self.last == nil
''
elsif self.first == self.last
"(#{self.first}=6)"
else
"(#{self.first} thru #{self.last})"
end
end
def <<(x)
if self.first == nil and self.last == nil
self.first = self.last = x
elsif self.last == x - 1
self.last = x
else
puts(self) # print the candidates
self.first = self.last = x # reset the range
end
end
end
and how to use it:
numer_range = (1..666_666)
current_range = MutablePrintingCandidateRange.new
for i in numer_range
candidate = i.to_s
if candidate =~ /6/ and candidate !~ /7|8|9/
# number contains a 6, but not a 7, 8, or 9
current_range << i
end
end
puts current_range
Basic observation: If the current number is (say) 1900 you know that you can safely skip up to at least 2000...
(I didn't bother updating my C solution for formatting. Instead I went with x3ro's excellent ruby version and optimized that)
Undeleted:
I still am not sure whether the changed range-notation behaviour isn't actually what the OP wants: This version changes the behaviour of breaking up ranges that are actually contiguous modulo 6; I wouldn't be surprised the OP actually expected
.
....
(555536=6)
(555546=6)
(555556 THRU 666666=6)
instead of
....
(666640 THRU 666646=6)
(666650 THRU 666656=6)
(666660 THRU 666666=6)
I'll let the OP decide, and here is the modified version, which runs in 18% of the time as x3ro's version (3.2s instead of 17.0s when generating up to 6666666 (7x6)).
def check(n)
n.to_s(7).include?('6')
end
def spss(ranges)
ranges.each do |range|
if range.first === range.last
puts "(" + range.first.to_s(7) + "=6)"
else
puts "(" + range.first.to_s(7) + " THRU " + range.last.to_s(7) + "=6)"
end
end
end
range = (1..117648)
range = range.select { |n| check(n) }
range = range.inject([0..0]) do |ranges, n|
temp = ranges.last
if temp.last + 1 === n
ranges.pop
ranges.push(temp.first..n)
else
ranges.push(n..n)
end
end
spss(range)
My answer below is not complete, but just to show a path (I might come back and continue the answer):
There are only two cases:
1) All the digits besides the lowest one is either absent or not 6
6, 16, ...
2) At least one digit besides the lowest one includes 6
60--66, 160--166, 600--606, ...
Cases in (1) do not include any continuous numbers because they all have 6 in the lowest digit, and are different from one another. Cases in (2) all appear as continuous ranges where the lowest digit continues from 0 to 6. Any single continuation in (2) is not continuous with another one in (2) or with anything from (1) because a number one less than xxxxx0 will be xxxxy9, and a number one more than xxxxxx6 will be xxxxxx7, and hence be excluded.
Therefore, the question reduces to the following:
3)
Get all strings between "" to "66666" that do not include "6"
For each of them ("xxx"), output the string "(xxx6=6)"
4)
Get all strings between "" to "66666" that include at least one "6"
For each of them ("xxx"), output the string "(xxx0 THRU xxx6=6)"
The killer here is
numbers = (1..666666).to_a
Range supports iterations so you would be better off by going over the whole range and accumulating numbers that include your segments in blocks. When one block is finished and supplanted by another you could write it out.

How do I generate a random string of up to a certain length?

I would like to generate a random string (or a series of random strings, repetitions allowed) of length between 1 and n characters from some (finite) alphabet. Each string should be equally likely (in other words, the strings should be uniformly distributed).
The uniformity requirement means that an algorithm like this doesn't work:
alphabet = "abcdefghijklmnopqrstuvwxyz"
len = rand(1, n)
s = ""
for(i = 0; i < len; ++i)
s = s + alphabet[rand(0, 25)]
(pseudo code, rand(a, b) returns a integer between a and b, inclusively, each integer equally likely)
This algorithm generates strings with uniformly distributed lengths, but the actual distribution should be weighted toward longer strings (there are 26 times as many strings with length 2 as there are with length 1, and so on.) How can I achieve this?
What you need to do is generate your length and then your string as two distinct steps. You will need to first chose the length using a weighted approach. You can calculate the number of strings of a given length l for an alphabet of k symbols as k^l. Sum those up and then you have the total number of strings of any length, your first step is to generate a random number between 1 and that value and then bin it accordingly. Modulo off by one errors you would break at 26, 26^2, 26^3, 26^4 and so on. The logarithm based on the number of symbols would be useful for this task.
Once you have you length then you can generate the string as you have above.
Okay, there are 26 possibilities for a 1-character string, 262 for a 2-character string, and so on up to 2626 possibilities for a 26-character string.
That means there are 26 times as many possibilities for an (N)-character string than there are for an (N-1)-character string. You can use that fact to select your length:
def getlen(maxlen):
sz = maxlen
while sz != 1:
if rnd(27) != 1:
return sz
sz--;
return 1
I use 27 in the above code since the total sample space for selecting strings from "ab" is the 26 1-character possibilities and the 262 2-character possibilities. In other words, the ratio is 1:26 so 1-character has a probability of 1/27 (rather than 1/26 as I first answered).
This solution isn't perfect since you're calling rnd multiple times and it would be better to call it once with an possible range of 26N+26N-1+261 and select the length based on where your returned number falls within there but it may be difficult to find a random number generator that'll work on numbers that large (10 characters gives you a possible range of 2610+...+261 which, unless I've done the math wrong, is 146,813,779,479,510).
If you can limit the maximum size so that your rnd function will work in the range, something like this should be workable:
def getlen(chars,maxlen):
assert maxlen >= 1
range = chars
sampspace = 0
for i in 1 .. maxlen:
sampspace = sampspace + range
range = range * chars
range = range / chars
val = rnd(sampspace)
sz = maxlen
while val < sampspace - range:
sampspace = sampspace - range
range = range / chars
sz = sz - 1
return sz
Once you have the length, I would then use your current algorithm to choose the actual characters to populate the string.
Explaining it further:
Let's say our alphabet only consists of "ab". The possible sets up to length 3 are [ab] (2), [ab][ab] (4) and [ab][ab][ab] (8). So there is a 8/14 chance of getting a length of 3, 4/14 of length 2 and 2/14 of length 1.
The 14 is the magic figure: it's the sum of all 2n for n = 1 to the maximum length. So, testing that pseudo-code above with chars = 2 and maxlen = 3:
assert maxlen >= 1 [okay]
range = chars [2]
sampspace = 0
for i in 1 .. 3:
i = 1:
sampspace = sampspace + range [0 + 2 = 2]
range = range * chars [2 * 2 = 4]
i = 2:
sampspace = sampspace + range [2 + 4 = 6]
range = range * chars [4 * 2 = 8]
i = 3:
sampspace = sampspace + range [6 + 8 = 14]
range = range * chars [8 * 2 = 16]
range = range / chars [16 / 2 = 8]
val = rnd(sampspace) [number from 0 to 13 inclusive]
sz = maxlen [3]
while val < sampspace - range: [see below]
sampspace = sampspace - range
range = range / chars
sz = sz - 1
return sz
So, from that code, the first iteration of the final loop will exit with sz = 3 if val is greater than or equal to sampspace - range [14 - 8 = 6]. In other words, for the values 6 through 13 inclusive, 8 of the 14 possibilities.
Otherwise, sampspace becomes sampspace - range [14 - 8 = 6] and range becomes range / chars [8 / 2 = 4].
Then the second iteration of the final loop will exit with sz = 2 if val is greater than or equal to sampspace - range [6 - 4 = 2]. In other words, for the values 2 through 5 inclusive, 4 of the 14 possibilities.
Otherwise, sampspace becomes sampspace - range [6 - 4 = 2] and range becomes range / chars [4 / 2 = 2].
Then the third iteration of the final loop will exit with sz = 1 if val is greater than or equal to sampspace - range [2 - 2 = 0]. In other words, for the values 0 through 1 inclusive, 2 of the 14 possibilities (this iteration will always exit since the value must be greater than or equal to zero.
In retrospect, that second solution is a bit of a nightmare. In my personal opinion, I'd go for the first solution for its simplicity and to avoid the possibility of rather large numbers.
Building on my comment posted as a reply to the OP:
I'd consider it an exercise in base
conversion. You're simply generating a
"random number" in "base 26", where
a=0 and z=25. For a random string of
length n, generate a number between 1
and 26^n. Convert from base 10 to base
26, using symbols from your chosen
alphabet.
Here's a PHP implementation. I won't guaranty that there isn't an off-by-one error or two in here, but any such error should be minor:
<?php
$n = 5;
var_dump(randstr($n));
function randstr($maxlen) {
$dict = 'abcdefghijklmnopqrstuvwxyz';
$rand = rand(0, pow(strlen($dict), $maxlen));
$str = base_convert($rand, 10, 26);
//base convert returns base 26 using 0-9 and 15 letters a-p(?)
//we must convert those to our own set of symbols
return strtr($str, '1234567890abcdefghijklmnopqrstuvwxyz', $dict);
}
Instead of picking a length with uniform distribution, weight it according to how many strings are a given length. If your alphabet is size m, there are mx strings of size x, and (1-mn+1)/(1-m) strings of length n or less. The probability of choosing a string of length x should be mx*(1-m)/(1-mn+1).
Edit:
Regarding overflow - using floating point instead of integers will expand the range, so for a 26-character alphabet and single-precision floats, direct weight calculation shouldn't overflow for n<26.
A more robust approach is to deal with it iteratively. This should also minimize the effects of underflow:
int randomLength() {
for(int i = n; i > 0; i--) {
double d = Math.random();
if(d > (m - 1) / (m - Math.pow(m, -i))) {
return i;
}
}
return 0;
}
To make this more efficient by calculating fewer random numbers, we can reuse them by splitting intervals in more than one place:
int randomLength() {
for(int i = n; i > 0; i -= 5) {
double d = Math.random();
double c = (m - 1) / (m - Math.pow(m, -i))
for(int j = 0; j < 5; j++) {
if(d > c) {
return i - j;
}
c /= m;
}
}
for(int i = n % 0; i > 0; i--) {
double d = Math.random();
if(d > (m - 1) / (m - Math.pow(m, -i))) {
return i;
}
}
return 0;
}
Edit: This answer isn't quite right. See the bottom for a disproof. I'll leave it up for now in the hope someone can come up with a variant that fixes it.
It's possible to do this without calculating the length separately - which, as others have pointed out, requires raising a number to a large power, and generally seems like a messy solution to me.
Proving that this is correct is a little tough, and I'm not sure I trust my expository powers to make it clear, but bear with me. For the purposes of the explanation, we're generating strings of length at most n from an alphabet a of |a| characters.
First, imagine you have a maximum length of n, and you've already decided you're generating a string of at least length n-1. It should be obvious that there are |a|+1 equally likely possibilities: we can generate any of the |a| characters from the alphabet, or we can choose to terminate with n-1 characters. To decide, we simply pick a random number x between 0 and |a| (inclusive); if x is |a|, we terminate at n-1 characters; otherwise, we append the xth character of a to the string. Here's a simple implementation of this procedure in Python:
def pick_character(alphabet):
x = random.randrange(len(alphabet) + 1)
if x == len(alphabet):
return ''
else:
return alphabet[x]
Now, we can apply this recursively. To generate the kth character of the string, we first attempt to generate the characters after k. If our recursive invocation returns anything, then we know the string should be at least length k, and we generate a character of our own from the alphabet and return it. If, however, the recursive invocation returns nothing, we know the string is no longer than k, and we use the above routine to select either the final character or no character. Here's an implementation of this in Python:
def uniform_random_string(alphabet, max_len):
if max_len == 1:
return pick_character(alphabet)
suffix = uniform_random_string(alphabet, max_len - 1)
if suffix:
# String contains characters after ours
return random.choice(alphabet) + suffix
else:
# String contains no characters after our own
return pick_character(alphabet)
If you doubt the uniformity of this function, you can attempt to disprove it: suggest a string for which there are two distinct ways to generate it, or none. If there are no such strings - and alas, I do not have a robust proof of this fact, though I'm fairly certain it's true - and given that the individual selections are uniform, then the result must also select any string with uniform probability.
As promised, and unlike every other solution posted thus far, no raising of numbers to large powers is required; no arbitrary length integers or floating point numbers are needed to store the result, and the validity, at least to my eyes, is fairly easy to demonstrate. It's also shorter than any fully-specified solution thus far. ;)
If anyone wants to chip in with a robust proof of the function's uniformity, I'd be extremely grateful.
Edit: Disproof, provided by a friend:
dato: so imagine alphabet = 'abc' and n = 2
dato: you have 9 strings of length 2, 3 of length 1, 1 of length 0
dato: that's 13 in total
dato: so probability of getting a length 2 string should be 9/13
dato: and probability of getting a length 1 or a length 0 should be 4/13
dato: now if you call uniform_random_string('abc', 2)
dato: that transforms itself into a call to uniform_random_string('abc', 1)
dato: which is an uniform distribution over ['a', 'b', 'c', '']
dato: the first three of those yield all the 2 length strings
dato: and the latter produce all the 1 length strings and the empty strings
dato: but 0.75 > 9/13
dato: and 0.25 < 4/13
// Note space as an available char
alphabet = "abcdefghijklmnopqrstuvwxyz "
result_string = ""
for( ;; )
{
s = ""
for( i = 0; i < n; i++ )
s += alphabet[rand(0, 26)]
first_space = n;
for( i = 0; i < n; i++ )
if( s[ i ] == ' ' )
{
first_space = i;
break;
}
ok = true;
// Reject "duplicate" shorter strings
for( i = first_space + 1; i < n; i++ )
if( s[ i ] != ' ' )
{
ok = false;
break;
}
if( !ok )
continue;
// Extract the short version of the string
for( i = 0; i < first_space; i++ )
result_string += s[ i ];
break;
}
Edit: I forgot to disallow 0-length strings, that will take a bit more code which I don't have time to add now.
Edit: After considering how my answer doesn't scale to large n (takes too long to get lucky and find an accepted string), I like paxdiablo's answer much better. Less code too.
Personally I'd do it like this:
Let's say your alphabet has Z characters. Then the number of possible strings for each length L is:
L | Z
--------------------------
1 | 26
2 | 676 (= 26 * 26)
3 | 17576 (= 26 * 26 * 26)
...and so on.
Now let's say your maximum desired length is N. Then the total number of possible strings from length 1 to N that your function could generate would be the sum of a geometric sequence:
(1 - (Z ^ (N + 1))) / (1 - Z)
Let's call this value S. Then the probability of generating a string of any length L should be:
(Z ^ L) / S
OK, fine. This is all well and good; but how do we generate a random number given a non-uniform probability distribution?
The short answer is: you don't. Get a library to do that for you. I develop mainly in .NET, so one I might turn to would be Math.NET.
That said, it's really not so hard to come up with a rudimentary approach to doing this on your own.
Here's one way: take a generator that gives you a random value within a known uniform distribution, and assign ranges within that distribution of sizes dependent on your desired distribution. Then interpret the random value provided by the generator by determining which range it falls into.
Here's an example in C# of one way you could implement this idea (scroll to the bottom for example output):
RandomStringGenerator class
public class RandomStringGenerator
{
private readonly Random _random;
private readonly char[] _alphabet;
public RandomStringGenerator(string alphabet)
{
if (string.IsNullOrEmpty(alphabet))
throw new ArgumentException("alphabet");
_random = new Random();
_alphabet = alphabet.Distinct().ToArray();
}
public string NextString(int maxLength)
{
// Get a value randomly distributed between 0.0 and 1.0 --
// this is approximately what the System.Random class provides.
double value = _random.NextDouble();
// This is where the magic happens: we "translate" the above number
// to a length based on our computed probability distribution for the given
// alphabet and the desired maximum string length.
int length = GetLengthFromRandomValue(value, _alphabet.Length, maxLength);
// The rest is easy: allocate a char array of the length determined above...
char[] chars = new char[length];
// ...populate it with a bunch of random values from the alphabet...
for (int i = 0; i < length; ++i)
{
chars[i] = _alphabet[_random.Next(0, _alphabet.Length)];
}
// ...and return a newly constructed string.
return new string(chars);
}
static int GetLengthFromRandomValue(double value, int alphabetSize, int maxLength)
{
// Looping really might not be the smartest way to do this,
// but it's the most obvious way that immediately springs to my mind.
for (int length = 1; length <= maxLength; ++length)
{
Range r = GetRangeForLength(length, alphabetSize, maxLength);
if (r.Contains(value))
return length;
}
return maxLength;
}
static Range GetRangeForLength(int length, int alphabetSize, int maxLength)
{
int L = length;
int Z = alphabetSize;
int N = maxLength;
double possibleStrings = (1 - (Math.Pow(Z, N + 1)) / (1 - Z));
double stringsOfGivenLength = Math.Pow(Z, L);
double possibleSmallerStrings = (1 - Math.Pow(Z, L)) / (1 - Z);
double probabilityOfGivenLength = ((double)stringsOfGivenLength / possibleStrings);
double probabilityOfShorterLength = ((double)possibleSmallerStrings / possibleStrings);
double startPoint = probabilityOfShorterLength;
double endPoint = probabilityOfShorterLength + probabilityOfGivenLength;
return new Range(startPoint, endPoint);
}
}
Range struct
public struct Range
{
public readonly double StartPoint;
public readonly double EndPoint;
public Range(double startPoint, double endPoint)
: this()
{
this.StartPoint = startPoint;
this.EndPoint = endPoint;
}
public bool Contains(double value)
{
return this.StartPoint <= value && value <= this.EndPoint;
}
}
Test
static void Main(string[] args)
{
const int N = 5;
const string alphabet = "acegikmoqstvwy";
int Z = alphabet.Length;
var rand = new RandomStringGenerator(alphabet);
var strings = new List<string>();
for (int i = 0; i < 100000; ++i)
{
strings.Add(rand.NextString(N));
}
Console.WriteLine("First 10 results:");
for (int i = 0; i < 10; ++i)
{
Console.WriteLine(strings[i]);
}
// sanity check
double sumOfProbabilities = 0.0;
for (int i = 1; i <= N; ++i)
{
double probability = Math.Pow(Z, i) / ((1 - (Math.Pow(Z, N + 1))) / (1 - Z));
int numStrings = strings.Count(str => str.Length == i);
Console.WriteLine("# strings of length {0}: {1} (probability = {2:0.00%})", i, numStrings, probability);
sumOfProbabilities += probability;
}
Console.WriteLine("Probabilities sum to {0:0.00%}.", sumOfProbabilities);
Console.ReadLine();
}
Output:
First 10 results:
wmkyw
qqowc
ackai
tokmo
eeiyw
cakgg
vceec
qwqyq
aiomt
qkyav
# strings of length 1: 1 (probability = 0.00%)
# strings of length 2: 38 (probability = 0.03%)
# strings of length 3: 475 (probability = 0.47%)
# strings of length 4: 6633 (probability = 6.63%)
# strings of length 5: 92853 (probability = 92.86%)
Probabilities sum to 100.00%.
My idea regarding this is like:
you have 1-n length string.there 26 possible 1 length string,26*26 2 length string and so on.
you can find out the percentage of each length string of the total possible strings.for example percentage of single length string is like
((26/(TOTAL_POSSIBLE_STRINGS_OF_ALL_LENGTH))*100).
similarly you can find out the percentage of other length strings.
Mark them on a number line between 1 to 100.ie suppose percentage of single length string is 3 and double length string is 6 then number line single length string lies between 0-3 while double length string lies between 3-9 and so on.
Now take a random number between 1 to 100.find out the range in which this number lies.I mean suppose for examplethe number you have randomly chosen is 2.Now this number lies between 0-3 so go 1 length string or if the random number chosen is 7 then go for double length string.
In this fashion you can see that length of each string choosen will be proportional to the percentage of the total number of that length string contribute to the all possible strings.
Hope I am clear.
Disclaimer: I have not gone through above solution except one or two.So if it matches with some one solution it will be purely a chance.
Also,I will welcome all the advice and positive criticism and correct me if I am wrong.
Thanks and regard
Mawia
Matthieu: Your idea doesn't work because strings with blanks are still more likely to be generated. In your case, with n=4, you could have the string 'ab' generated as 'a' + 'b' + '' + '' or '' + 'a' + 'b' + '', or other combinations. Thus not all the strings have the same chance of appearing.

Ruby max integer

I need to be able to determine a systems maximum integer in Ruby. Anybody know how, or if it's possible?
FIXNUM_MAX = (2**(0.size * 8 -2) -1)
FIXNUM_MIN = -(2**(0.size * 8 -2))
Ruby automatically converts integers to a large integer class when they overflow, so there's (practically) no limit to how big they can be.
If you are looking for the machine's size, i.e. 64- or 32-bit, I found this trick at ruby-forum.com:
machine_bytes = ['foo'].pack('p').size
machine_bits = machine_bytes * 8
machine_max_signed = 2**(machine_bits-1) - 1
machine_max_unsigned = 2**machine_bits - 1
If you are looking for the size of Fixnum objects (integers small enough to store in a single machine word), you can call 0.size to get the number of bytes. I would guess it should be 4 on 32-bit builds, but I can't test that right now. Also, the largest Fixnum is apparently 2**30 - 1 (or 2**62 - 1), because one bit is used to mark it as an integer instead of an object reference.
Reading the friendly manual? Who'd want to do that?
start = Time.now
largest_known_fixnum = 1
smallest_known_bignum = nil
until smallest_known_bignum == largest_known_fixnum + 1
if smallest_known_bignum.nil?
next_number_to_try = largest_known_fixnum * 1000
else
next_number_to_try = (smallest_known_bignum + largest_known_fixnum) / 2 # Geometric mean would be more efficient, but more risky
end
if next_number_to_try <= largest_known_fixnum ||
smallest_known_bignum && next_number_to_try >= smallest_known_bignum
raise "Can't happen case"
end
case next_number_to_try
when Bignum then smallest_known_bignum = next_number_to_try
when Fixnum then largest_known_fixnum = next_number_to_try
else raise "Can't happen case"
end
end
finish = Time.now
puts "The largest fixnum is #{largest_known_fixnum}"
puts "The smallest bignum is #{smallest_known_bignum}"
puts "Calculation took #{finish - start} seconds"
In ruby Fixnums are automatically converted to Bignums.
To find the highest possible Fixnum you could do something like this:
class Fixnum
N_BYTES = [42].pack('i').size
N_BITS = N_BYTES * 8
MAX = 2 ** (N_BITS - 2) - 1
MIN = -MAX - 1
end
p(Fixnum::MAX)
Shamelessly ripped from a ruby-talk discussion. Look there for more details.
There is no maximum since Ruby 2.4, as Bignum and Fixnum got unified into Integer. see Feature #12005
> (2 << 1000).is_a? Fixnum
(irb):322: warning: constant ::Fixnum is deprecated
=> true
> 1.is_a? Bignum
(irb):314: warning: constant ::Bignum is deprecated
=> true
> (2 << 1000).class
=> Integer
There won't be any overflow, what would happen is an out of memory.
as #Jörg W Mittag pointed out: in jruby, fix num size is always 8 bytes long. This code snippet shows the truth:
fmax = ->{
if RUBY_PLATFORM == 'java'
2**63 - 1
else
2**(0.size * 8 - 2) - 1
end
}.call
p fmax.class # Fixnum
fmax = fmax + 1
p fmax.class #Bignum

Resources