All possible permutations with a condition - ruby

I'm wondering if there is an elegant way in Ruby to come up with all permutations (with repetition) of some integers, with the requirements that (1) integers must be introduced in ascending order from left to right and (2) zero is exempt from this rule.
Below is a subset of the output for three digits and the integers 0,1,2,3,4,5,6,7,8,9. It is only a subset of the total answer; specifically, it is the subset which starts with 5. I've included notes on a couple of the entries.
500 - Zero is used twice
505 - 5 is used twice. Note that 504 is not included because 5 was introduced on the left and 4 < 5
506
507
508
509
550
555
556
557
558
559
560
565 - Though 5 < 6, 5 can be used twice because 5 was introduced to the left of 6.
566
567
568
569
570
575
577
578
579
580
585
588
589
590
595
599
I need to be able to do it for arbitrarily long output lengths (not just 3, like this example), and I need to be able to do it for specific sets of integers. However, zero will always be the integer to which the ordering rule does not apply.

This would work:
class Foo
  include Comparable
  attr :digits

  def initialize(digits)
    @digits = digits.dup
  end

  def increment(i)
    if i == -1 # [9,9] => [1,0,0]
      @digits.unshift 1
    else
      succ = @digits[i] + 1
      if succ == 10 # [8,9] => [9,0]
        @digits[i] = 0
        increment(i - 1)
      else
        @digits[i] = @digits[0, i].sort.detect { |e| e >= succ } || succ
      end
    end
    self
  end

  def succ
    Foo.new(@digits).increment(@digits.length - 1)
  end

  def <=>(other)
    @digits <=> other.digits
  end

  def to_s
    digits.join
  end

  def inspect
    to_s
  end
end
range = Foo.new([5,0,0])..Foo.new([5,9,9])
range.to_a
#=> [500, 505, 506, 507, 508, 509, 550, 555, 556, 557, 558, 559, 560, 565, 566, 567, 568, 569, 570, 575, 577, 578, 579, 580, 585, 588, 589, 590, 595, 599]
The main rule for incrementing a digit is:
@digits[i] = @digits[0,i].sort.detect { |e| e >= succ } || succ
This sorts the digits to the left of the current digit (the ones "introduced to the left") and detects the first element that's equal to or larger than the successor. If none is found, the successor itself is used.
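A couple of worked steps, using the class and the subset shown above:

Foo.new([5,6,0]).succ.to_s #=> "565"  (the final 0's successor is 1; the smallest digit already introduced that is >= 1 is 5)
Foo.new([5,6,9]).succ.to_s #=> "570"  (the 9 rolls over to 0, and the 6 becomes 7 because no digit to its left is >= 7)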

In case you need this as an executable:
#!/usr/bin/env ruby -w
def output(start, stop)
  (start..stop).select do |num|
    digits = num.to_s.split('').to_a
    digits.map! { |d| d.to_i }
    checks = []
    while digit = digits.shift
      next if digit == 0
      next if checks.find { |d| break true if digit == d }
      break false if checks.find { |d| break true if digit < d }
      checks << digit
    end != false
  end
end
p output(*$*[0..1].map { |a| a.to_i })
$ ./test.rb 560 570
[560, 565, 566, 567, 568, 569, 570]

This is some C#/pseudocode. It definitely won't compile. The implementation is not linear, but I note where you can add a simple optimization to make it more efficient. The algorithm itself is quite simple and seems reasonably performant: it is linear with respect to the output, and since the output presumably grows exponentially, the algorithm is also exponential, but with a tight constant.
// Note: I've never used BigInteger before. I don't even know what the
// APIs are. Basically you could use strings, but hopefully the arbitrary
// precision arithmetic class/struct would be more efficient. You
// mentioned that you intend to add more than just 10 digits. In
// that case you pretty much have to use a string, unless you roll
// your own special class. Perhaps C# has an arbitrary-precision arithmetic
// type which handles arbitrary bases as well?
// Note: We assume that possibleDigits is sorted in increasing order. But you
// could easily sort. Also we assume that it doesn't contain 0. Again, an easy fix.
public List<BigInteger> GenSequences(int numDigits, List<int> possibleDigits)
{
    // We have special cases to get rid of things like 000050000...
    // hard to explain, but should be obvious if you look at it
    // carefully
    if (numDigits <= 0)
    {
        return new List<BigInteger>();
    }

    // Starts with all of the valid 1 digit (except 0)
    var sequences = new Queue<BigInteger>(possibleDigits);

    // Special case if numDigits == 1
    if (numDigits == 1)
    {
        sequences.Enqueue(new BigInteger(0));
        return sequences;
    }

    // Now the general case. We have all valid sequences of length 1
    // (except 0 because no valid sequence of length greater than 1
    // will start with 0)
    for (int length = 1; length <= numDigits; length++)
    {
        // Naming is a bit weird. A 'sequence' is just a BigInteger
        var sequence = sequences.Dequeue();
        while (sequence.Length == length)
        {
            // 0 always works
            var temp = sequence * 10;
            sequences.Enqueue(temp);

            // Now do all of the other possible last digits
            var largestDigitIndex = FindLargestDigitIndex(sequence, possibleDigits);
            for (int lastDigitIndex = largestDigitIndex;
                 lastDigitIndex < possibleDigits.Length;
                 lastDigitIndex++)
            {
                temp = sequence * 10 + possibleDigits[lastDigitIndex];
                sequences.Enqueue(temp);
            }
            sequence = sequences.Dequeue();
        }
    }
}
// TODO: This is the slow part of the algorithm. Instead, keep track of
// the max digit of a given sequence. Meaning 5705 => 7. Keep a 1-to-1
// mapping from sequences to largestDigitsInSequences. That's linear
// overhead in memory and reduces time complexity to linear _with respect to the
// output_. So if the output is like `O(k^n)` where `k` is the number of possible
// digits and `n` is the number of digits in the output sequences, then it's
// exponential
private int FindLargestDigitIndex(BigInteger number,
                                  List<int> possibleDigits)
{
    // Just iterate over the digits of number and find the maximum
    // digit. Then return the index of that digit in the
    // possibleDigits list
}
I prove why the algorithm works in the comments above (mostly, at least). It's an inductive argument: for general n > 1, take any valid sequence; its first n-1 digits (from the left) must themselves form a valid sequence (by contradiction). Using induction and then checking the logic in the innermost loop, we can see that our desired sequence will be output. For this specific implementation you'd also need some proofs around termination and the like. For example, the point of the Queue is that we want to process the sequences of length n while we are adding the sequences of length n+1 to the same Queue. The ordering of the Queue allows that inner while loop to terminate, because we'll go through all sequences of length n before we get to the length-n+1 sequences.
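For reference, here is a rough Ruby sketch of the same level-by-level (queue-like) construction - not the author's C#, and it additionally re-admits digits that are already present in the sequence (so that e.g. 565 is produced), which the inner loop above, taken literally, would skip. The names gen_sequences/digits are just illustrative.

def gen_sequences(num_digits, digits)
  return [] if num_digits <= 0
  return [0] + digits if num_digits == 1

  sequences = digits.map { |d| [d] } # longer sequences never start with 0
  (num_digits - 1).times do
    sequences = sequences.flat_map do |seq|
      # a new digit may be 0, a repeat of one already present, or >= the current max
      allowed = [0] + digits.select { |d| d >= seq.max || seq.include?(d) }
      allowed.map { |d| seq + [d] }
    end
  end
  sequences.map { |seq| seq.join.to_i }
end

gen_sequences(3, (1..9).to_a).select { |n| n.between?(500, 599) }
#=> the 30 numbers listed in the question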

Note: Three solutions are shown; look for the splits.
Describe a valid number, then (1..INFINITE).select{|n| valid(n)}.take(1)
So what's valid? Well, let's take some advantage here:
class Fixnum
  def to_a
    to_s.split('').collect { |d| d.to_i }
  end
end
123.to_a == [1,2,3]
Alright, so, now: Each digit can be a digit already present or zero, or a digit greater than the prior value, and the first digit is always valid.
PS - I use i not i-1 because the loop's index is one less than set's, since I lopped the first element off.
def valid num
  # Ignore zeros:
  set = num.to_a.select { |d| d != 0 }
  # First digit is always valid:
  set[1..-1].each_with_index { |d, i|
    if d > set[i]
      # puts "Increasing digit"
    elsif set[0..i].include? d
      # puts "Repeat digit"
    else
      # puts "Digit does not pass"
      return false
    end
  }
  return true
end
so then, hurrah for lazy:
(1..Float::INFINITY).lazy.select{|n| valid n}.take(100).force
#=> [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 22, 23, 24,
# 25, 26, 27, 28, 29, 30, 33, 34, 35, 36, 37, 38, 39, 40, 44, 45, 46, 47, 48, 49, 50, 55,
# 56, 57, 58, 59, 60, 66, 67, 68, 69, 70, 77, 78, 79, 80, 88, 89, 90, 99, 100, 101, 102,
# 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120,
# 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 133, 134, 135, 136]
Now that we have it, let's make it succinct:
def valid2 num
  set = num.to_a.select { |d| d != 0 }
  set[1..-1].each_with_index { |d, i|
    return false unless (d > set[i]) || (set[0..i].include? d)
  }
  return true
end
check:
(1..Float::INFINITY).lazy.select{|n| valid n}.take(100).force - (1..Float::INFINITY).lazy.select{|n| valid2 n}.take(100).force
#=> []
all together now:
def valid num
  set = num.to_s.split('').collect { |d| d.to_i }.select { |d| d != 0 }
  set[1..-1].each_with_index { |d, i|
    return false unless (d > set[i]) || (set[0..i].include? d)
  }
  return true
end
Edit:
If you want a particular subset of the results, just change the range. Your original subset (three digits starting with 5) would be:
(500..599).select{|n| valid n}
Edit2: To generate the range for a given number of digits n:
((Array.new(n-1, 0).unshift(1).join('').to_i)..(Array.new(n, 0).unshift(1).join('').to_i))
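For example, with n = 3 the expression above evaluates to 100..1000 (note the upper bound is 10**n itself, which happens to be valid anyway):

n = 3
(Array.new(n-1, 0).unshift(1).join('').to_i)..(Array.new(n, 0).unshift(1).join('').to_i)
#=> 100..1000
(100..1000).select { |num| valid num }.first(5)
#=> [100, 101, 102, 103, 104]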
Edit3: Interesting alternative method - recursively remove digits as they become valid.
def _rvalid set
  return true if set.size < 2
  return false if set[1] < set[0]
  return _rvalid set.select { |d| d != set[0] }
end

def rvalid num
  return _rvalid num.to_s.split('').collect { |d| d.to_i }.select { |d| d != 0 }
end
(1..Float::INFINITY).lazy.select{|n| rvalid n}.take(100).force
Edit 4: Positive generation method
def _rgen set, target
  return set if set.size == target
  ((set.max..9).to_a + set.uniq).collect { |d|
    _rgen((set + [d]), target)
  }
end

def rgen target
  sets = (0..9).collect { |d|
    _rgen [d], target
  }
  # This method has an array problem that I'm not going to figure out right now
  while sets.first.is_a? Array
    sets = sets.flatten
  end
  sets.each_slice(target).to_a.collect { |set| set.join('').to_i }
end

This doesn't seem too complex. Write a refinement of a base-N increment, with the change that when a digit is incremented from zero it goes straight to the smallest of the digits to its left.
Update I misread the spec and my initial take on this didn't quite work. Depending on the actual dataset the uniq.sort may be too costly, but it is fine when the items in the sequence have only a few digits. The right way would be to maintain a second, sorted copy of the digits, but I'm leaving it like this until I know it's too inefficient.
Note that the values of 0..N here are intended to be used as indices into a sorted list of the actual values each digit can take. A call to map will generate the real elements of the sequence.
This program dumps the same section of the sequence as you have shown yourself (everything beginning with five).
def inc!(seq, limit)
  (seq.length - 1).downto(0) do |i|
    if seq[i] == limit
      seq[i] = 0
    else
      valid = seq.first(i).uniq.sort
      valid += ((valid.last || 0).next .. limit).to_a
      seq[i] = valid.find { |v| v > seq[i] }
      break
    end
  end
end

seq = Array.new(3, 0)
loop do
  puts seq.join if seq[0] == 5
  inc!(seq, 9)
  break if seq == [0, 0, 0]
end
output
500
505
506
507
508
509
550
555
556
557
558
559
560
565
566
567
568
569
570
575
577
578
579
580
585
588
589
590
595
599

Related

Unique random string with alphanumeric required in Ruby

I'm using the following code to generate a unique 10-character random string of [A-Z a-z 0-9] in Ruby:
random_code = [*('a'..'z'),*('0'..'9'),*('A'..'Z')].shuffle[0, 10].join
However, sometimes this random string does not contain a number or an uppercase character. Could you help me write a method that generates a unique random string that contains at least one number, one uppercase and one lowercase character?
down = ('a'..'z').to_a
up = ('A'..'Z').to_a
digits = ('0'..'9').to_a
all = down + up + digits
[down.sample, up.sample, digits.sample].
  concat(7.times.map { all.sample }).
  shuffle.
  join
#=> "TioS8TYw0F"
Edit: The above reflects a misunderstanding of the question. I'll leave it, however. To have no characters appear more than once:
def rnd_str
  down = ('a'..'z').to_a
  up = ('A'..'Z').to_a
  digits = ('0'..'9').to_a
  [extract1(down), extract1(up), extract1(digits)].
    concat((down + up + digits).sample(7)).shuffle.join
end

def extract1(arr)
  i = arr.size.times.to_a.sample
  c = arr[i]
  arr.delete_at(i)
  c
end

rnd_str #=> "YTLe0WGoa1"
rnd_str #=> "NrBmAnE9bT"
down.sample.shift (etc.) would have been more compact than extract1, but the inefficiency was just too much to bear.
If you do not want to repeat random strings, simply keep a list of the ones you generate; if you generate another that is in the list, discard it and generate another (a small Set-based sketch follows the probability calculation below). It's pretty unlikely you'll have to generate any extra ones, however. If, for example, you generate 100 random strings (satisfying the requirement of at least one lowercase letter, uppercase letter and digit), the chances that there will be one or more duplicate strings is about one in 700,000:
t = 107_518_933_731
n = t+1
t = t.to_f
(1.0 - 100.times.reduce(1.0) { |prod,_| prod * (n -= 1)/t }).round(10)
#=> 1.39e-07
where t = C(62,10) and C(62,10) is defined below.
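As a side note, the "keep a list and retry" idea mentioned above is a couple of lines with a Set; rnd_str here is assumed to be one of the generators defined in this answer:

require 'set'

seen = Set.new
generate_unique = lambda do
  loop do
    candidate = rnd_str
    break candidate if seen.add?(candidate) # add? returns nil if already present
  end
end

tokens = Array.new(100) { generate_unique.call }
tokens.uniq.size #=> 100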
An alternative
There is a really simple way to do this that turns out to be pretty efficient: just sample without replacement until a sample is found that meets the requirement of at least one lowercase letter, one uppercase letter and one digit. We can do that as follows:
DOWN = ('a'..'z').to_a
UP = ('A'..'Z').to_a
DIGITS = ('0'..'9').to_a
ALL = DOWN + UP + DIGITS

def rnd_str
  loop do
    arr = ALL.sample(10)
    break arr.shuffle.join unless (DOWN & arr).empty? || (UP & arr).empty? ||
      (DIGITS & arr).empty?
  end
end

rnd_str #=> "3jRkHcP7Ge"
rnd_str #=> "B0s81x4Jto"
How many samples must we reject, on average, before finding a "good" one? It turns out (see below if you are really, really interested) that the probability of getting a "bad" string (i.e., a sample of 10 characters drawn at random, without replacement, from the 62 elements of ALL that has no lowercase letters, no uppercase letters or no digits) is only about 0.15 (15%). That means that 85% of the time no bad samples will be rejected before a good one is found.
It turns out that the expected number of bad strings that will be sampled, before a good string is sampled, is:
0.15/0.85 =~ 0.17
The following shows how the above probability was derived, should anyone be interested.
Let n_down be the number of ways a sample of 10 can be drawn that has no lowercase letters:
n_down = C(36,10) = 36!/(10!*(36-10)!)
where (the binomial coefficient) C(36,10) equals the number of combinations of 36 "things" that can be "taken" 10 at a time, and equals:
C(36,10) = 36!/(10!*(36-10)!) #=> 254_186_856
Similarly,
n_up = n_down #=> 254_186_856
and
n_digits = C(52,10) #=> 15_820_024_220
We can add these three numbers together to obtain:
n_down + n_up + n_digits #=> 16_328_397_932
This is almost, but not quite, the number of ways to draw 10 characters, without replacement, that contain no lowercase letters, no uppercase letters or no digits. "Not quite" because there is a bit of double-counting going on: the all-uppercase and all-lowercase draws are each counted twice, as is the single all-digits draw. The necessary adjustment is as follows:
n_down + n_up + n_digits - 2*C(26,10) - 1
#=> 16_317_774_461
To obtain the probability of drawing a sample of 10 from a population of 62, without replacement, that has no lowercase letter, no uppercase letter or no digit, we divide this number by the total number of ways 10 characters can be drawn from 62 without replacement:
(16_317_774_461.0/C(62,10)).round(2)
#=> 0.15
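As a quick sanity check, the whole calculation can be reproduced in a few lines of Ruby (c is just a local helper for the binomial coefficient, not part of the answer's code):

def c(n, k)
  (1..k).reduce(1) { |acc, i| acc * (n - k + i) / i } # exact integer arithmetic
end

bad = 2 * c(36, 10) + c(52, 10) - 2 * c(26, 10) - 1
(bad.to_f / c(62, 10)).round(2)
#=> 0.15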
If you want a script to generate just some small number of tokens (like 2, 5, 10, 100, 1000, 10 000, etc.), then the best way would be to simply keep the already generated tokens in memory and retry until a new one is generated (statistically speaking, this won't take long). If this is not the case - keep reading.
After thinking about it, this problem turned out to be in fact very interesting. For brevity, I will not mention the requirement to have at least one number, one capital and one lower case letter, but it will be included in the final solution. Also let all = [*'1'..'9', *'a'..'z', *'A'..'Z'].
To sum it up, we want to generate k-permutations of n elements with repetition randomly with uniqueness constraint.
k = 10, n = 61 (all.size)
Ruby just so happens to have such method, it's Array#repeated_permutation. So everything is great, we can just use:
all.repeated_permutation(10).to_a.map(&:join).shuffle
and pop the resulting strings one by one, right? Wrong! The problem is that the number of possibilities happens to be:
n^k = 713_342_911_662_882_601 (61**10).
Even if you had an infinitely fast processor, you still couldn't hold that amount of data, no matter whether it is a count of complex objects or of simple bits.
The opposite would be to generate random permutations, keep the already generated ones in a set and check for inclusion before returning the next element. This is just delaying the inevitable - not only would you still have to hold the same amount of information at some point, but as the number of generated permutations grows, the number of tries required to generate a new permutation diverges to infinity.
As you might have thought, the root of the problem is that randomness and uniqueness hardly go hand in hand.
Firstly, we would have to define what we consider random. Judging by the amount of nerdy comics on the subject, you could deduce that this isn't black and white either.
An intuitive definition for a random program would be one that doesn't generate the tokens in the same order with each execution. Great, so now we can just take the first n permutations (where n = rand(100)), put them at the end and enumerate everything in order? You can sense where this is going. In order for a random generation to be considered good, the generated outputs of consecutive runs should be equally distributed. In simpler terms, the probability of getting any possible output should be equal to 1 / #__all_possible_outputs__.
Now lets explore the boundaries of our problem a little:
The number of possible k-permutations of n elements without repetition is:
n!/(n-k)! = 327_234_915_316_108_800 ((61 - 10 + 1).upto(61).reduce(:*))
Still out of reach. Same goes for
The number of possible full permutations of n elements without repetition:
n! = 507_580_213_877_224_798_800_856_812_176_625_227_226_004_528_988_036_003_099_405_939_480_985_600_000_000_000_000 (1.upto(61).reduce(:*))
The number of possible k-combinations of n elements without repetition:
n!/k!(n-k)! = 90_177_170_226 ((61 - 10 + 1).upto(61).reduce(:*)/1.upto(10).reduce(:*))
Finally, where we might have a breakthrough: full permutations of k elements without repetition:
k! = 3_628_800 (1.upto(10).reduce(:*))
~3.5m isn't nothing, but at least it's reasonably computable. On my personal laptop k_permutations = 0.upto(9).to_a.permutation.to_a took 2.008337 seconds on average. As computing time goes, this is a lot; however, assuming that you will be running this on an actual server and only once per application startup, this is nothing. In fact, it would even be reasonable to create some seeds. A single k_permutations.shuffle took 0.154134 seconds, therefore in about a minute we can acquire 61 random permutations: k_seeds = 61.times.map { k_permutations.shuffle }.to_a.
Now let's try to convert the problem of k-permutations of n elements without repetition into solving full k-permutations without repetition multiple times.
A cool trick for generating permutations is using numbers and bitmaps. The idea is to generate all numbers from 0 to 2^61 - 1 and look at the bits. If there is a 1 at position i, we will use the all[i] element, otherwise we will skip it. We still haven't escaped the issue, as 2^61 = 2305843009213693952 (2**61), which we can't keep in memory.
Fortunately, another cool trick comes to the rescue, this time from number theory:
Any m consecutive numbers, raised to the power of a prime number and reduced modulo m, give the numbers from 0 to m - 1.
In other words:
5.upto(65).map { |number| number**17 % 61 }.sort # => [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60]
5.upto(65).map { |number| number**17 % 61 } # => [36, 31, 51, 28, 20, 59, 11, 22, 47, 48, 42, 12, 54, 26, 5, 34, 29, 57, 24, 53, 15, 55, 3, 38, 21, 18, 43, 40, 23, 58, 6, 46, 8, 37, 4, 32, 27, 56, 35, 7, 49, 19, 13, 14, 39, 50, 2, 41, 33, 10, 30, 25, 16, 9, 17, 60, 0, 1, 44, 52, 45]
Now actually, how random is that? As it turns out, the more common divisors shared by m and the selected m numbers, the less evenly distributed the sequence is. But we are in luck here - 2^61 - 1 is a prime number (a so-called Mersenne prime). Therefore, the only divisors it can share are 1 and 2^61 - 1 itself. This means that no matter what power we choose, the positions of the numbers 0 and 1 will be fixed. That is not perfect, but the other 2^61 - 3 numbers can be found at any position. And guess what - we don't care about 0 and 1 anyway, because they don't have ten 1s in their binary representation!
Unfortunately, a bottleneck for our randomness is the fact that the bigger the prime number we want to generate, the harder it gets. This is the best I can come up with when it comes to generating all the numbers in a range in a shuffled order without keeping them in memory simultaneously.
So to put everything in use:
We generate seeds of full permutations of 10 elements.
We generate a random prime number.
We randomly choose if we want to generate permutations for the next number in the sequence or a number that we already started (up to a finite number of started numbers).
We use bitmaps of the generated numbers to get said permutations.
Note that this will solve only the problem of k-permutations of n elements without repetition. I still haven't thought of a way to add repetition.
DISCLAIMER: The following code comes with no guarantees of any kind, explicit or implied. Its point is to further express the author's ideas, rather than be a production ready solution:
require 'prime'

class TokenGenerator
  NUMBERS_UPPER_BOUND = 2**61 - 1
  HAS_NUMBER_MASK = ('1' * 9 + '0' * (61 - 9)).reverse.to_i(2)
  HAS_LOWER_CASE_MASK = ('0' * 9 + '1' * 26 + '0' * 26).reverse.to_i(2)
  HAS_UPPER_CASE_MASK = ('0' * (9 + 26) + '1' * 26).reverse.to_i(2)
  ALL_CHARACTERS = [*'1'..'9', *'a'..'z', *'A'..'Z']
  K_PERMUTATIONS = 0.upto(9).to_a.permutation.to_a # give it a couple of seconds

  def initialize
    random_prime = Prime.take(10_000).drop(100).sample
    @all_numbers_generator = 1.upto(NUMBERS_UPPER_BOUND).lazy.map do |number|
      number**random_prime % NUMBERS_UPPER_BOUND
    end.select do |number|
      !(number & HAS_NUMBER_MASK).zero? and
        !(number & HAS_LOWER_CASE_MASK).zero? and
        !(number & HAS_UPPER_CASE_MASK).zero? and
        number.to_s(2).chars.count('1') == 10
    end
    @k_permutation_seeds = 61.times.map { K_PERMUTATIONS.shuffle }.to_a # this will take a minute
    @numbers_in_iteration = { go_fish: nil }
  end

  def next
    raise StopIteration if @numbers_in_iteration.empty?
    number_generator = @numbers_in_iteration.keys.sample
    if number_generator == :go_fish
      add_next_number if @numbers_in_iteration.size < 1_000_000
      self.next
    else
      next_permutation(number_generator)
    end
  end

  private

  def add_next_number
    @numbers_in_iteration[@all_numbers_generator.next] = @k_permutation_seeds.sample.to_enum
  rescue StopIteration # lol, you actually managed to traverse all 2^61 numbers!
    @numbers_in_iteration.delete(:go_fish)
  end

  def next_permutation(number)
    fetch_permutation(number, @numbers_in_iteration[number].next)
  rescue StopIteration # all k permutations for this number were already generated
    @numbers_in_iteration.delete(number)
    self.next
  end

  def fetch_permutation(number_mask, k_permutation)
    k_from_n_indices = number_mask.to_s(2).chars.reverse.map.with_index { |bit, index| index if bit == '1' }.compact
    k_permutation.each_with_object([]) { |order_index, k_from_n_values| k_from_n_values << ALL_CHARACTERS[k_from_n_indices[order_index]] }
  end
end
EDIT: it turned out that our constraints eliminate too many possibilities. This causes @all_numbers_generator to take too much time testing and skipping numbers. I will try to think of a better generator, but everything else remains valid.
The old version that generates tokens with uniqueness constraint on the containing characters:
numbers = ('0'..'9').to_a
downcase_letters = ('a'..'z').to_a
upcase_letters = downcase_letters.map(&:upcase)
all = [numbers, downcase_letters, upcase_letters]
one_of_each_set = all.map(&:sample)
random_code = (one_of_each_set + (all.flatten - one_of_each_set).sample(7)).shuffle.join
Use the 'SafeRandom' gem (GitHub link).
It provides an easy way to generate random values and is compatible with Rails 2, 3, 4 and 5.
Here you can use the strong_string method to generate a strong string (i.e. a combination of uppercase and lowercase letters, numbers, and symbols):
# => Strong string: the minimum length should be greater than 5, otherwise an 8-character string is returned by default.
require 'safe_random'
puts SafeRandom.strong_string # => 4skgSy93zaCUZZCoF9WiJF4z3IDCGk%Y
puts SafeRandom.strong_string(3) # => P4eUbcK%
puts SafeRandom.strong_string(5) # => 5$Rkdo

General algorithm: random sort array, so the distance between objects is maxed

Given an array of random not-unique numbers
[221,44,12,334,63,842,112,12]
What would be the best approach to randomly sort the values while also trying to maximize the distance |A-B| between neighbouring numbers?
You could try a suboptimal greedy algorithm:
1. sortedArr <- sort the input array
2. resultArr <- initialize an empty array
3. for i = 0 to size of sortedArr
   a. if i is even
      resultArr[i] = sortedArr[i/2]
   b. else
      resultArr[i] = sortedArr[sortedArr.size - (i+1)/2]
This puts numbers in the result alternating from the left and right of the sorted input. For example if the sorted input is:
12, 12, 44, 63, 112, 221, 334, 842
Then the output would be:
12, 842, 12, 334, 44, 221, 63, 112
This might not be optimal but it probably gets pretty close and works in O(nlogn). On your example the optimal is obtained by:
63, 221, 12, 334, 12, 842, 44, 112
Which yields a sum of 2707. My algorithm yields a sum of 2656. I'm pretty sure that you won't be able to find the optimal in polynomial time.
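For completeness, the same greedy scheme is only a few lines of Ruby (a sketch of the pseudocode above; spread is just an illustrative name):

def spread(arr)
  sorted = arr.sort
  arr.each_index.map do |i|
    # even positions take from the low end, odd positions from the high end
    i.even? ? sorted[i / 2] : sorted[sorted.size - (i + 1) / 2]
  end
end

spread([221, 44, 12, 334, 63, 842, 112, 12])
#=> [12, 842, 12, 334, 44, 221, 63, 112]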
A brute force solution in Python would look like:
import itertools

maxSum = 0
maxl = []
for l in itertools.permutations([221, 44, 12, 334, 63, 842, 112, 12]):
    total = 0
    for i in range(len(l) - 1):
        total += abs(l[i] - l[i + 1])
    if total > maxSum:
        print(maxSum)  # progress: previous best
        print(maxl)
        maxSum = total
        maxl = l
print(maxSum)
print(maxl)

Ruby Project Euler 35-Circular Primes Wrong Answer

I tried Project Euler question 35 in Ruby (I am quite new to it) and got the wrong answer.
The problem:
The number, 197, is called a circular prime because all rotations of
the digits: 197, 971, and 719, are themselves prime.
There are thirteen such primes below 100: 2, 3, 5, 7, 11, 13, 17, 31,
37, 71, 73, 79, and 97.
How many circular primes are there below one million?
My code:
require 'prime'

def primes(n)
  nums = [nil, nil, *2..n]
  (2..Math.sqrt(n)).each do |i|
    (i**2..n).step(i) { |m| nums[m] = nil } if nums[i]
  end
  nums.compact
end

prime = primes(1000000)
circularPrimes = 0

prime.each do |j|
  puts "\n"
  puts j
  flag = false
  j = j.to_s()
  for k in (0..j.length)
    temp = j[k..-1] + j[0..k]
    temp = temp.to_i()
    a = Prime.prime?(temp)
    if a == false then
      flag = true
      break
    end
  end
  if flag == false then
    circularPrimes += 1
  end
end

puts "\n\n\n\n\n"
puts circularPrimes
I can't figure out the problem in the code (which I think is fine).
As Patru mentioned, your rotation was not right. I am not sure about your primes method either, though I did not try to fix it. Since you are not opposed to using the Prime class, I instead used that for a solution which is easier on the eyes, and correct as far as I can tell. It does seem to perform quite badly, though; perhaps it can be optimized. It will return an answer for 1_000_000, but it takes about 70 seconds, which seems awfully long.
I suppose instead of going through all numbers I should at least skip rotations I have already processed and determined to be circular primes or not. Anyway, now you'll have some optimizing to do.
require 'prime'

def circular_prime?(n)
  rotations(n).all? { |r| Prime.prime? r }
end

def rotations(n)
  str = n.to_s
  (0...str.length).map do |i|
    (str[i..-1] + str[0...i]).to_i
  end
end

(2..100).select { |n| circular_prime?(n) }
# => [2, 3, 5, 7, 11, 13, 17, 31, 37, 71, 73, 79, 97]
Incorporating your primes method, you can change the circular prime generation to
primes(1_000_000).select { |prime| circular_prime? prime }
The behavior is equivalent to your code in that it first selects all primes up to a million and then selects the circular primes from there. A slight optimization would be to remove the original number from the rotations to be checked, since we already know it is prime.
The single timing I did yielded 50 seconds for this variant so this at least seems faster than my original (~70 sec), which is not really surprising since I went through all rotations of all numbers between 2 and 1 million, whereas by first selecting primes the input to rotations is significantly reduced.
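One way to realize the "skip rotations already processed" idea is to remember every rotation we have seen; this is just a sketch (untimed), reusing the primes and rotations methods above:

require 'prime'
require 'set'

seen = Set.new
count = 0
primes(1_000_000).each do |p|
  next if seen.include?(p) # already handled as a rotation of an earlier prime
  rots = rotations(p)
  seen.merge(rots)
  count += rots.uniq.size if rots.all? { |r| Prime.prime?(r) }
end
count #=> 55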
I think your rotation is off by 1, trying
j="123456"
j[1..-1] + j[0..1] # that is k=1 from the above code
yields
"2345612"
which would not be a rotation. Could be fixed through
temp = j[k..-1] + j[0...k]
You are re-inventing two things:
generating primes up to n (use Prime.each(n))
rotating the digits (use Array#rotate!)
Probably the problem is somewhere in there. I would do it this way:
require 'prime'

def rotations(x)
  digits = x.to_s.chars
  digits.map do
    digits.rotate!.join.to_i
  end
end

circular_primes = Prime.each(1_000_000).select do |p|
  rotations(p).all? { |r| Prime.prime?(r) }
end

puts circular_primes.count

Determining if two lists contain the same numeric items without sorting

I have two lists and I need to determine if they contain the same values without sorting (i.e. the order of values is irrelevant). I know sorting them would work, but this is part of a performance-critical section.
Item values fall within the range [-2, 63] and we're always comparing equal size lists, but the list sizes range from [1, 8].
Example lists:
A = (0, 0, 4, 23, 10)
B = (23, 10, 0, 4, 0)
C = (0, 0, 4, 27, 10)
A == B is true
A == C is false
I think a possible solution would be to compare the products of the two lists (multiply all values together), but there are problems with this solution: what to do with zero and negative numbers? A workaround would be adding 4 to every value before multiplying. Here's the code I have so far.
bool equal(int A[], int B[], int size)
{
    int sumA = 1;
    int sumB = 1;
    for (int i = 0; i < size; i++) {
        sumA *= A[i] + 4;
        sumB *= B[i] + 4;
    }
    return (sumA == sumB);
}
But would this always work, no matter what the order/contents of the lists were? In other words, is the following mathematically true? So what I'm really asking is the following (unless there's another way to solve the problem):
Given 2 equal sized lists. If the products (multiplying all values together) of the lists are equal then the lists contain the same values, so long as the values are integers greater than 0.
Assuming you know the range ahead of time, you can use a variation on counting sort. Just scan through each array and keep track of how many times each integer occurs.
Procedure Compare-Lists(A, B, min, max)
    domain := max - min + 1
    Count := new int[domain]
    for i in A:
        Count[i - min] += 1
    for i in B:
        Count[i - min] -= 1
        if Count[i - min] < 0:
            // Something was in B but not A
            return "Different"
    for i in A:
        if Count[i - min] > 0:
            // Something was in A but not B
            return "Different"
    return "Same"
This is linear in O(len(A) + len(B))
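The same idea in Ruby, assuming the value range from the question ([-2, 63]); same_items? is just an illustrative name:

def same_items?(a, b, min = -2, max = 63)
  counts = Array.new(max - min + 1, 0)
  a.each { |x| counts[x - min] += 1 }
  b.each do |x|
    counts[x - min] -= 1
    return false if counts[x - min] < 0 # something in B that A lacks
  end
  counts.all?(&:zero?)                  # leftovers mean something in A that B lacks
end

same_items?([0, 0, 4, 23, 10], [23, 10, 0, 4, 0]) #=> true
same_items?([0, 0, 4, 23, 10], [0, 0, 4, 27, 10]) #=> false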
You could do this with primes. Keep a prime table for the first 66 primes and use the elements of your arrays (offset by +2) to index into the prime table.
The identity of an array is then just the product of the primes represented by the elements in the array.
Unfortunately, the product must be represented with at least 67 bits:
The 66th prime is 317, and 317^8 = 101,970,394,089,246,452,641
log2(101,970,394,089,246,452,641) = 66.47; rounded up, that is 67 bits
Example pseudocode for doing this (assuming the existence of an int128 data type):
int primes[] =
{
      2,   3,   5,   7,  11,  13,  17,  19,  23,  29,
     31,  37,  41,  43,  47,  53,  59,  61,  67,  71,
     73,  79,  83,  89,  97, 101, 103, 107, 109, 113,
    127, 131, 137, 139, 149, 151, 157, 163, 167, 173,
    179, 181, 191, 193, 197, 199, 211, 223, 227, 229,
    233, 239, 241, 251, 257, 263, 269, 271, 277, 281,
    283, 293, 307, 311, 313, 317
};

// Assumes:
//   Each xs[i] is [-2, 63]
//   length is [1, 8]
int128 identity(int xs[], int length)
{
    int128 product = 1;
    for (int i = 0; i < length; ++i)
    {
        product *= primes[xs[i] + 2];
    }
    return product;
}

bool equal(int a[], int b[], int size)
{
    return identity(a, size) == identity(b, size);
}
You might be able to use a long double on GCC to store the product since it is defined as an 80-bit data type, but I'm not sure if the floating-point multiplication error would cause collisions between lists. I haven't verified this.
My previous solution below does not work, see the comments below.
For each list:
Compute the sum of all elements
Compute the product of all elements
Store the length of the list (in your case, since the length is guaranteed to be the same for two lists, you can ignore it entirely)
As you compute the sum and product, each element needs to be adjusted by +3, so your range is now [1, 66].
The (sum, product, length) tuple is the identity for your list. Any lists with the same identity are equal.
You can fit this (sum, product, length) tuple into a single 64-bit number:
For the product: 66^8 = 360,040,606,269,696, and log2(360,040,606,269,696) = 48.36; rounded up, that is 49 bits
For the sum: 66 * 8 = 528, log2(528) = 9.04 (rounded up) is 10 bits
Length is in the range [1, 8], log2(8) = 3 bits
49 + 10 + 3 = 62 bits for representing the identity
Then, you can do direct 64-bit comparisons to determine equality.
Running-time is linear in the size of the arrays with a single pass over each. Memory usage is O(1).
Example code:
#include <cstdint>
#include <cstdio>
#include <stdlib.h>

// Assumes:
//   Each xs[i] is [-2, 63]
//   length is [1, 8]
uint64_t identity(int xs[], int length)
{
    uint64_t product = 1;
    uint64_t sum = 0;
    for (int i = 0; i < length; ++i)
    {
        int element = xs[i] + 3;
        product *= element;
        sum += element;
    }
    return ((uint64_t)length << 59) | (sum << 49) | product;
}

bool equal(int a[], int b[], int size)
{
    return identity(a, size) == identity(b, size);
}

int main()
{
    int a[] = { 23, 0, -2, 6, 3, 23, -1 };
    int b[] = { 0, -1, 6, 23, 23, -2, 3 };
    printf("%d\n", equal(a, b, _countof(a))); // _countof is MSVC-specific
    return 0;
}
Since you only have 66 possible numbers, you can create a bit vector (3 32-bit words or 2 64-bit words) and compare those. You can do it all with just shifts and adds. Since there are no comparisons required until the end (to find out if they are equal), it can run fast because there won't be many branches.
Make a copy of the first list. Then loop through the second and, as you do, remove each item from the copy. If you get all the way through the second list and found all elements in the copy, then the lists have the same elements. This is a lot of looping, but with only a maximum of 8 elements per list, you won't get a performance gain by using a different type of collection.
If you had a lot more items, then use a Dictionary/Hashtable for the copy, keyed by value with a count of how many times each value was found in the first list. This will give you a performance boost on larger lists.
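In Ruby the dictionary-of-counts variant is essentially a one-liner (Enumerable#tally needs Ruby 2.7+; on older versions build the hash with each_with_object):

def same_counts?(a, b)
  a.tally == b.tally # equal hashes: same values with the same multiplicities
end

same_counts?([0, 0, 4, 23, 10], [23, 10, 0, 4, 0]) #=> true
same_counts?([0, 0, 4, 23, 10], [0, 0, 4, 27, 10]) #=> false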
Given 2 equal sized lists. If the products (multiplying all values together) of the lists are equal then the lists contain the same values, so long as the values are integers greater than 0.
No. Consider the following lists
(9, 9)
(3, 27)
They are the same size and the product of the elements are the same.
How fast do you need to process 8 integers? Sorting 8 things in any modern processor is going to take almost no time.
The easy thing is to just use an array of size 66 where index 0 represents value -2. Then you just increment counts across both arrays, and then you just iterate across them afterwards.
If your list has only 8 items then sorting is hardly a performance hit. If you want to do this without sorting you can do so using a hashmap.
Iterate over the first array and for each value N in the array Hash(N) = 1.
Iterate over the second array and for each value M, Hash(M) = Hash(M) + 1.
Iterate over the hash and find all keys K for which Hash(K) = 2.

Non-repeating pseudo random number stream with 'clumping'

I'm looking for a method to generate a pseudorandom stream with a somewhat odd property - I want clumps of nearby numbers.
The tricky part is, I can only keep a limited amount of state no matter how large the range is. There are algorithms that give a sequence of results with minimal state (linear congruence?)
Clumping means that there's a higher probability that the next number will be close rather than far.
Example of a desirable sequence (mod 10): 1 3 9 8 2 7 5 6 4
I suspect this would be more obvious with a larger stream, but difficult to enter by hand.
Update:
I don't understand why it's impossible, but yes, I am looking for, as Welbog summarized:
Non-repeating
Non-Tracking
"Clumped"
Cascade a few LFSRs with periods smaller than you need, combining them to get a result such that the fastest-changing register controls the least significant values. So if you have L1 with period 3, L2 with period 15 and L3 with some larger period, N = L1(n) + 3 * L2(n/3) + 45 * L3(n/45). This will obviously generate 3 clumped values, then jump and generate another 3 clumped values. Use something other than multiplication (such as mixing some of the bits of the higher-period registers) or different periods to make the clump spread wider than the period of the first register. It won't be particularly smoothly random, but it will be clumpy and non-repeating.
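A toy sketch of that cascade in Ruby, with small shuffled lookup tables standing in for the LFSRs (an assumption for brevity): the fast table drives the low values, so consecutive outputs come in clumps of three, and the full period of 3 * 15 * 20 = 900 contains no repeats.

l1 = (0...3).to_a.shuffle  # period 3 (fast, low values)
l2 = (0...15).to_a.shuffle # period 15
l3 = (0...20).to_a.shuffle # period 20 (slowest)

sequence = (0...900).map do |n|
  l1[n % 3] + 3 * l2[(n / 3) % 15] + 45 * l3[(n / 45) % 20]
end

sequence.uniq.size #=> 900 (non-repeating over the full period)
sequence.first(9)  # three clumps of three nearby values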
For the record, I'm in the "non-repeating, non-random, non-tracking is a lethal combination" camp, and I hope some simple thought experiments will shed some light. This is not a formal proof by any means. Perhaps someone will shore it up.
So, I can generate a sequence that has some randomness easily:
Given x_i, x_(i+1) ~ U(x_i, r), where r > x_i.
For example:
if x_i = 6, x_(i+1) is a random choice from (6 + epsilon, some_other_real > 6). This guarantees non-repetition, but at the cost that the sequence is monotonically increasing.
Without some condition (like monotonicity), inherent to the sequence of generated numbers themselves, how else can you guarantee uniqueness without carrying state?
Edit: So after researching RBarryYoung's claim about "Linear Congruential Generators" (not differentiators... is this what RBY meant), clearly, I was wrong! These sequences exist, and by necessity, any PRNG whose next number depends only on the current number and some global, non-changing state can't have repeats within a cycle (after some initial burn-in period).
By defining the "clumping features" in terms of the probability distribution of its size, and the probability distribution of its range, you can then use simple random generators with the underlying distribution and produce the sequences.
One way to get "clumpy" numbers would be to use a normal distribution.
You start the random list with your "initial" random value, then you generate a random number with the mean of the previous random value and a constant variance, and repeat as necessary. The overall variance of your entire list of random numbers should be approximately constant, but the "running average" of your numbers will drift randomly with no particular bias.
>>> import random
>>> r = [1]
>>> for x in range(20):
...     r.append(random.normalvariate(r[-1], 1))
...
>>> r
[1, 0.84583267252801408, 0.18585962715584259, 0.063850022580489857, 1.2892164299497422,
0.019381814281494991, 0.16043424295472472, 0.78446377124854461, 0.064401889591144235,
0.91845494342245126, 0.20196939102054179, -1.6521524237203531, -1.5373703928440983,
-2.1442902977248215, 0.27655425357702956, 0.44417440706703393, 1.3128647361934616,
2.7402744740729705, 5.1420432435119352, 5.9326297626477125, 5.1547981880261782]
I know it's hard to tell by looking at the numbers, but you can sort of see that the numbers clump together a little bit - the 5.X's at the end, and the 0.X's on the second row.
If you need only integers, you can just use a very large mean and variance, and truncate/divide to obtain integer output. Normal distributions by definition are a continuous distribution, meaning all real numbers are potential output - it is not restricted to integers.
Here's a quick scatter plot in Excel of 200 numbers generated this way (starting at 0, constant variance of 1):
scatter data http://img178.imageshack.us/img178/8677/48855312.png
Ah, I just read that you want non-repeating numbers. No guarantee of that in a normal distribution, so you might have to take into account some of the other approaches others have mentioned.
I don't know of an existing algorithm that would do this, but it doesn't seem difficult to roll your own (depending on how stringent the "limited amount of state" requirement is). For example:
RANGE = 1..1000
CLUMP_ODDS = 0.5
CLUMP_DIST = 10

last = rand(RANGE)
while still_want_numbers
  if rand < CLUMP_ODDS # clump!
    nxt = last + rand(CLUMP_DIST) - (CLUMP_DIST / 2) # do some boundary checking here
  else # don't clump!
    nxt = rand(RANGE)
  end
  print nxt # 'nxt' rather than 'next', which is a reserved word in Ruby
  last = nxt
end
It's a little rudimentary, but would something like that suit your needs?
In the range [0, 10] the following should give a uniform distribution. random() yields a (pseudo) random number r with 0 <= r < 1.
x(n + 1) = (x(n) + 5 * (2 * random() - 1)) mod 10
You can get your desired behavior by delinearizing random() - for example, random()^k will be skewed towards small numbers for k > 1. A possible function could be the following, but you will have to try some exponents to find your desired distribution. And keep the exponent odd if you use the following function... ;)
x(n + 1) = (x(n) + 5 * (2 * random() - 1)^3) mod 10
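In Ruby, the recurrence is just a couple of lines (a sketch only; it clumps but does not guarantee non-repetition):

x = rand * 10
stream = 20.times.map do
  x = (x + 5 * (2 * rand - 1)**3) % 10 # cubed term skews steps toward small moves
end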
How about (pseudocode):
// clumpiness is static, i.e. its value is retained between calls
static float clumpiness = 0.0f; // from 0 to 1.0

method getNextValue(int lastValue)
    float r = rand(); // float from 0 to 1
    int change = MAXCHANGE * (r - 0.5) * (1 - clumpiness);
    clumpiness += 0.1 * rand();
    if (clumpiness >= 1.0) clumpiness -= 1.0;
    return Round(lastValue + change);
Perhaps you could generate a random sequence, and then do some strategic element swapping to get the desired property.
For example, if you find 3 values a,b,c in the sequence such that a>b and a>c, then with some probability you could swap elements a and b or elements a and c.
EDIT in response to comment:
Yes, you could have a buffer on the stream that is whatever size you are comfortable with. Your swapping rules could be deterministic, or based on another known, reproducible pseudo-random sequence.
Does a sequence like 0, 94, 5, 1, 3, 4, 14, 8, 10, 9, 11, 6, 12, 7, 16, 15, 17, 19, 22, 21, 20, 13, 18, 25, 24, 26, 29, 28, 31, 23, 36, 27, 42, 41, 30, 33, 34, 37, 35, 32, 39, 47, 44, 46, 40, 38, 50, 43, 45, 48, 52, 49, 55, 54, 57, 56, 64, 51, 60, 53, 59, 62, 61, 69, 68, 63, 58, 65, 71, 70, 66, 73, 67, 72, 79, 74, 81, 77, 76, 75, 78, 83, 82, 85, 80, 87, 84, 90, 89, 86, 96, 93, 98, 88, 92, 99, 95, 97, 2, 91 (mod 100) look good to you?
This is the output of a small ruby program (explanations below):
#!/usr/bin/env ruby
require 'digest/md5'

$seed = 'Kind of a password'
$n = 100 # size of sequence
$k = 10  # mixing factor (higher means less clumping)

def pseudo_random_bit(k, n)
  # lowest bit of the character code of the last hex digit of the MD5
  Digest::MD5.hexdigest($seed + "#{k}|#{n}")[-1].ord & 1
end

def sequence(x)
  h = $n / 2
  $k.times do |k|
    # maybe exchange 1st with 2nd, 3rd with 4th, etc
    x ^= pseudo_random_bit(k, x >> 1) if x < 2 * h
    # maybe exchange 1st with last
    if [0, $n - 1].include? x
      x ^= ($n - 1) * pseudo_random_bit(k, 2 * h)
    end
    # move 1st to end
    x = (x - 1) % $n
    # maybe exchange 1st with 2nd, 3rd with 4th, etc
    # (corresponds to 2nd with 3rd, 4th with 5th, etc)
    x ^= pseudo_random_bit(k, h + (x >> 1)) if x < 2 * (($n - 1) / 2)
    # move 1st to front
    x = (x + 1) % $n
  end
  x
end

puts (0..99).map { |x| sequence(x) }.join(', ')
The idea is basically to start with the sequence 0..n-1 and disturb the order by passing k times over the sequence (more passes means less clumping). In each pass one first looks at the pairs of numbers at positions 0 and 1, 2 and 3, 4 and 5 etc (general: 2i and 2i+1) and flips a coin for each pair. Heads (=1) means exchange the numbers in the pair, tails (=0) means don't exchange them. Then one does the same for the pairs at positions 1 and 2, 3 and 4, etc (general: 2i+1 and 2i+2). As you mentioned that your sequence is mod 10, I additionally exchanged positions 0 and n-1 if the coin for this pair dictates it.
A single number x can be mapped modulo n after k passes to any number of the interval [x-k, x+k] and is approximately binomial distributed around x. Pairs (x, x+1) of numbers are not independently modified.
As pseudo-random generator I used only the last of the 128 output bits of the hash function MD5, choose whatever function you want instead. Thanks to the clumping one won't get a "secure" (= unpredictable) random sequence.
Maybe you can chain together 2 or more LCGs in a manner similar to the one described for the LFSRs above. Increment the least-significant LCG with its seed; on a full cycle, increment the next LCG. You only need to store a seed for each LCG. You could then weight each part and sum the parts together. To avoid repetitions in the 'clumped' least-significant part you can randomly reseed that LCG on each full cycle.
