Linq list subset removal


I have a collection whose items are themselves collections of integers. I want to remove from the top-level list every item that is a superset (an extension) of another item.
See the following list as an example:
Item 1: 42, 40, 38, 32, 50, 28, 30, 51, 1
Item 2: 42, 38, 32, 50, 28, 30, 51, 1
Item 3: 42, 50, 28, 30, 51, 1
Item 4: 42, 51, 1
When I execute the code, all items except Item 4 should be removed from the list because they are extensions of Item 4.
The code below does the job correctly but takes longer than I would expect. Note that I have lots of items in the collection.
Can I use LINQ or Collections.Set to achieve the same result faster?
Code currently used:
Public Sub RemoveExtended()
If Me.Count < 1 Then Exit Sub
Dim endTime As DateTime
Dim start As DateTime
Debug.Print("Processing:" & Me.Count - 1.ToString)
start = Now
For shortestIndex As Integer = 0 To Me.Count - 1
For index As Integer = Me.Count - 1 To shortestIndex + 1 Step -1
If ContainsAll(Me(shortestIndex), Me(index)) Then
Me.RemoveAt(index)
End If
Next
Next
endTime = Now
Debug.Print("removing time:" & endTime.Subtract(start).ToString)
Debug.Print("result :" & Me.Count)
End Sub
Private Function ContainsAll(ByVal shortest As Generic.List(Of Integer), ByVal current As Generic.List(Of Integer)) As Boolean
'slower
'Return shortest.All(Function(x) current.Contains(x))
For Each Item As Integer In shortest
If Not current.Contains(Item) Then
Return False
End If
Next
Return True
End Function

You can replace the ContainsAll() call with a LINQ check for a subset:
If Not Me(shortestIndex).Except(Me(index)).Any() Then
Me.RemoveAt(index)
End If
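For comparison, the same idea can be sketched with set operations. This is a minimal Ruby sketch rather than VB.NET, and the method name remove_extended is an assumption made for illustration. It drops every list that is a proper superset of some other list, regardless of the order of the top-level list.

```ruby
require 'set'

# Keep only the lists that are not a proper superset of another list.
# (remove_extended is a hypothetical name, not part of the original code.)
def remove_extended(lists)
  sets = lists.map(&:to_set)
  lists.reject.with_index do |_, i|
    sets.each_with_index.any? do |other, j|
      i != j && other.proper_subset?(sets[i])
    end
  end
end

items = [
  [42, 40, 38, 32, 50, 28, 30, 51, 1],
  [42, 38, 32, 50, 28, 30, 51, 1],
  [42, 50, 28, 30, 51, 1],
  [42, 51, 1]
]
result = remove_extended(items)
```

With the example data from the question, only Item 4 remains. The comparison is still quadratic in the number of lists; the gain comes from each subset test using hashed lookups instead of a linear Contains scan.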

Related

Function which increases fast and slows down reaching predefined maximum

I am creating a count up algorithm where I increase the number with bigger increments and then the increments get smaller over time, ideally reaching zero or 1. The final sum value should be predefined. The number of steps should be an input parameter and can vary. It seems like it is a logarithmic function with a maximum value. However, the logarithmic function grows until infinity.
The best I've found is square root of logarithm:
val log = (1..10).map { Math.sqrt(Math.log(it * 1.0)) }
val max = log.max()
val rounded = log.map { it * 1000 / max!! }.map { Math.round(it) }
rounded.forEachIndexed { i, l ->
if (i + 1 < rounded.size)
println("${rounded[i + 1] - rounded[i]}")
}
However, I still do not get very small increments at the end.
If range is from zero to 10:
549, 142, 85, 60, 46, 37, 31, 27, 23
If the range is 20:
481, 125, 74, 53, 40, 33, 27, 23, 21, 18, 16, 14, 14, 12, 11, 10, 10, 9, 9
What algorithm should I use to get down to 1 at the end?
Update:
Based on Patrick's formula, I made this solution:
val N = 18981.0
val log = (1..50).map { N - N/it }
val max = log.max()
log.map { print("$it, ") }
val rounded = log.map { it * N / max!! }.map { Math.round(it) }
It is important that N is a Double and not an integer; otherwise N/it is evaluated with integer division.

The square root of the logarithm also grows to infinity. Try
f(n) = N - N/n
This has the value 0 at n = 1 and tends towards N as n grows without bound. If you need finer granularity, add some coefficients and play around with them until you get something reasonable. For instance, you can replace n with [1 + (n/1000)] and get similar but much slower growth. You can also use exp(-n) or any function with a horizontal asymptote and get similar behavior:
f(n) = N - N*exp(-n)
Again, add some coefficients and see how the function changes.
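As a rough illustration of the f(n) = N - N/n idea, the following Ruby sketch produces cumulative values for a given number of steps and rescales them so that the last value equals the target exactly. The method name count_up and the rescaling by (steps + 1)/steps are assumptions made for this sketch; Rational arithmetic is used only to avoid floating-point rounding surprises.

```ruby
# Cumulative count-up values based on f(n) = N - N/n for n = 1..steps + 1,
# rescaled so that the final value is exactly `total`.
# Assumes steps >= 1. (count_up is a hypothetical helper name.)
def count_up(total, steps)
  (1..steps + 1).map do |n|
    Rational(total * (n - 1) * (steps + 1), n * steps).round
  end
end

values = count_up(1000, 4)
# Differences between consecutive values, i.e. the increments.
increments = values.each_cons(2).map { |a, b| b - a }
```

The values start at 0 and end at the target, and the increments shrink with every step. How small the last increment gets depends on the ratio between the target and the number of steps, so some tuning of the formula may still be needed to end at exactly 1.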

Unique random string with alphanumeric required in Ruby

I'm using the following code to generate a unique 10-character random string of [A-Z a-z 0-9] in Ruby:
random_code = [*('a'..'z'),*('0'..'9'),*('A'..'Z')].shuffle[0, 10].join
However, sometimes this random string does not contain a number or an uppercase character. Could you help me write a method that generates a unique random string with at least one number, one uppercase and one lowercase character?

down = ('a'..'z').to_a
up = ('A'..'Z').to_a
digits = ('0'..'9').to_a
all = down + up + digits
[down.sample, up.sample, digits.sample].
concat(7.times.map { all.sample }).
shuffle.
join
#=> "TioS8TYw0F"
Edit: The above reflects a misunderstanding of the question. I'll leave it, however. To have no characters appear more than once:
def rnd_str
down = ('a'..'z').to_a
up = ('A'..'Z').to_a
digits = ('0'..'9').to_a
[extract1(down), extract1(up), extract1(digits)].
concat(((down+up+digits).sample(7))).shuffle.join
end
def extract1(arr)
i = arr.size.times.to_a.sample
c = arr[i]
arr.delete_at(i)
c
end
rnd_str #=> "YTLe0WGoa1"
rnd_str #=> "NrBmAnE9bT"
down.shuffle!.shift (etc.) would have been more compact than extract1, but the inefficiency was just too much to bear.
If you do not want to repeat random strings, simply keep a list of the ones you generate. If you generate another that is in the list, discard it and generate another. It's pretty unlikely you'll have to generate any extra ones, however. If, for example, you generate 100 random strings (satisfying the requirement of at least one lowercase letter, uppercase letter and digit), the chances that there will be one or more duplicate strings is about one in 700,000:
t = 107_518_933_731
n = t+1
t = t.to_f
(1.0 - 100.times.reduce(1.0) { |prod,_| prod * (n -= 1)/t }).round(10)
#=> 1.39e-07
where t = C(62,10) and C(62,10) is defined below.
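The "keep a list and retry" approach described above can be sketched as follows. The class name UniqueCodes and the method name next_code are assumptions made for this illustration; a Set is used instead of a list so that the duplicate check stays fast.

```ruby
require 'set'

# Hands out 10-character codes with no repeated characters, at least one
# lowercase letter, one uppercase letter and one digit, never returning
# the same code twice. (UniqueCodes is a hypothetical name.)
class UniqueCodes
  CHARS = [*'a'..'z', *'A'..'Z', *'0'..'9'].freeze

  def initialize
    @seen = Set.new
  end

  def next_code
    loop do
      code = CHARS.sample(10).join
      # Reject candidates that miss one of the three character classes.
      next unless code.match?(/[a-z]/) && code.match?(/[A-Z]/) && code.match?(/\d/)
      # Set#add? returns nil when the code was already handed out.
      return code if @seen.add?(code)
    end
  end
end

generator = UniqueCodes.new
codes = Array.new(50) { generator.next_code }
```

The set only lives as long as the generator object, so codes that must stay unique across runs would need some persistent store instead.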
An alternative
There is a really simple way to do this that turns out to be pretty efficient: just sample without replacement until a sample is found that meets the requirement of at least one lowercase letter, one uppercase letter and one digit. We can do that as follows:
DOWN = ('a'..'z').to_a
UP = ('A'..'Z').to_a
DIGITS = ('0'..'9').to_a
ALL = DOWN + UP + DIGITS
def rnd_str
  loop do
    arr = ALL.sample(10)
    # Array#& is set intersection; an empty result means that class is missing
    break arr.shuffle.join unless (DOWN & arr).empty? || (UP & arr).empty? ||
      (DIGITS & arr).empty?
  end
end
rnd_str #=> "3jRkHcP7Ge"
rnd_str #=> "B0s81x4Jto"
How many samples must we reject, on average, before finding a "good" one? It turns out (see below if you are really, really interested) that the probability of getting a "bad" string (i.e., a selection of 10 characters drawn at random, without replacement, from the 62 elements of ALL that has no lowercase letters, no uppercase letters or no digits) is only about 0.15 (15%). That means that 85% of the time no bad samples will be rejected before a good one is found.
It turns out that the expected number of bad strings that will be sampled before a good string is sampled is:
0.15/0.85 ≈ 0.18
The following shows how the above probability was derived, should anyone be interested.
Let n_down be the number of ways a sample of 10 can be drawn that has no lowercase letters:
n_down = C(36,10) = 36!/(10!*(36-10)!)
where (the binomial coefficient) C(36,10) equals the number of combinations of 36 "things" that can be "taken" 10 at a time, and equals:
C(36,10) = 36!/(10!*(36-10)!) #=> 254_186_856
Similarly,
n_up = n_down #=> 254_186_856
and
n_digits = C(52,10) #=> 15_820_024_220
We can add these three numbers together to obtain:
n_down + n_up + n_digits #=> 16_328_397_932
This is almost, but not quite, the number of ways to draw 10 characters, without replacement, such that the sample contains no lowercase letters, no uppercase letters or no digits. "Not quite" because there is a bit of double-counting going on: samples made up only of uppercase letters, or only of lowercase letters, are each counted twice (C(26,10) ways apiece), and so is the single sample made up of all ten digits. The necessary adjustment is as follows:
n_down + n_up + n_digits - 2*C(26,10) - 1
#=> 16_317_774_461
To obtain the probability of drawing a sample of 10 from a population of 62, without replacement, that has no lowercase letter, no uppercase letter or no digit, we divide this number by the total number of ways 10 characters can be drawn from 62 without replacement:
(16_317_774_461.0/C(62,10)).round(2)
#=> 0.15
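The count can be checked with a few lines of Ruby. This is a minimal sketch in which the helper name choose and the variable names are assumptions; it applies inclusion-exclusion to the three "missing class" cases (no sample can miss all three classes at once, so there is no triple term).

```ruby
# Binomial coefficient C(n, k), computed with exact integer arithmetic.
# (choose is a hypothetical helper name.)
def choose(n, k)
  (0...k).reduce(1) { |acc, i| acc * (n - i) / (i + 1) }
end

total      = choose(62, 10) # all samples of 10 from 62 characters
no_lower   = choose(36, 10) # drawn from uppercase letters and digits only
no_upper   = choose(36, 10) # drawn from lowercase letters and digits only
no_digit   = choose(52, 10) # drawn from letters only
only_upper = choose(26, 10) # misses both lowercase letters and digits
only_lower = choose(26, 10) # misses both uppercase letters and digits
only_digit = choose(10, 10) # misses both kinds of letters

# Inclusion-exclusion: subtract the samples that were counted twice.
bad = no_lower + no_upper + no_digit - only_upper - only_lower - only_digit
p_bad = bad.to_f / total
expected_rejections = p_bad / (1 - p_bad)
```

Here p_bad rounds to 0.15, and expected_rejections is the mean number of rejected samples per accepted string under the rejection-sampling approach.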

If you want a script to generate just a small number of tokens (like 2, 5, 10, 100, 1000, 10 000, etc.), then the best way would be to simply keep the already generated tokens in memory and retry until a new one is generated (statistically speaking, this won't take long). If this is not the case, keep reading.
After thinking about it, this problem turned out to be in fact very interesting. For brevity, I will not mention the requirement to have at least one number, one capital and one lowercase letter, but it will be included in the final solution. Also let all = [*'1'..'9', *'a'..'z', *'A'..'Z'].
To sum it up, we want to generate k-permutations of n elements with repetition randomly with uniqueness constraint.
k = 10, n = 61 (all.size)
Ruby just so happens to have such a method: Array#repeated_permutation. So everything is great, we can just use:
all.repeated_permutation(10).to_a.map(&:join).shuffle
and pop the resulting strings one by one, right? Wrong! The problem is that the number of possibilities happens to be:
n^k = 61**10, which is roughly 7.1 * 10^17.
Even if you had an infinitely fast processor, you still couldn't hold that amount of data, no matter whether it was a count of complex objects or of simple bits.
The opposite would be to generate random permutations, keep the already generated ones in a set and check for inclusion before returning the next element. This just delays the inevitable: not only would you still have to hold the same amount of information at some point, but as the number of generated permutations grows, the number of tries required to generate a new permutation grows without bound.
As you might have thought, the root of the problem is that randomness and uniqueness hardly go hand in hand.
Firstly, we would have to define what we would consider as random. Judging by the amount of nerdy comics on the subject, you could deduce that this isn't that black and white either.
An intuitive definition for a random program would be one that doesn't generate the tokens in the same order with each execution. Great, so now we can just take the first n permutations (where n = rand(100)), put them at the end and enumerate everything in order? You can sense where this is going. In order for a random generation to be considered good, the generated outputs of consecutive runs should be equally distributed. In simpler terms, the probability of getting any possible output should be equal to 1 / (number of all possible outputs).
Now let's explore the boundaries of our problem a little:
The number of possible k-permutations of n elements without repetition is:
n!/(n-k)! = 327_234_915_316_108_800 ((61 - 10 + 1).upto(61).reduce(:*))
Still out of reach. Same goes for
The number of possible full permutations of n elements without repetition:
n! = 507_580_213_877_224_798_800_856_812_176_625_227_226_004_528_988_036_003_099_405_939_480_985_600_000_000_000_000 (1.upto(61).reduce(:*))
The number of possible k-combinations of n elements without repetition:
n!/(k!(n-k)!) = 90_177_170_226 ((61 - 10 + 1).upto(61).reduce(:*)/1.upto(10).reduce(:*))
Finally, where we might have a breakthrough is with the full permutations of k elements without repetition:
k! = 3_628_800 (1.upto(10).reduce(:*))
~3.6 million isn't nothing, but at least it's reasonably computable. On my personal laptop k_permutations = 0.upto(9).to_a.permutation.to_a took 2.008337 seconds on average. Generally, as computing time goes, this is a lot. However, assuming that you will be running this on an actual server and only once per application startup, this is nothing. In fact, it would even be reasonable to create some seeds. A single k_permutations.shuffle took 0.154134 seconds, therefore in about a minute we can acquire 61 random permutations: k_seeds = 61.times.map { k_permutations.shuffle }.to_a.
Now let's try to convert the problem of k-permutations of n elements without repetition into repeatedly solving full permutations of k elements without repetition.
A cool trick for choosing which k of the n elements to use is numbers and bitmaps. The idea is to generate all numbers from 0 to 2^61 - 1 and look at the bits. If there is a 1 at position i, we use the all[i] element; otherwise we skip it. We still haven't escaped the issue, as that is 2^61 = 2305843009213693952 (2**61) numbers, which we can't keep in memory.
Fortunately, another cool trick comes to the rescue, this time from number theory.
If m is prime and the exponent shares no common factor with m - 1, then any m consecutive numbers raised to that exponent modulo m give every number from 0 to m - 1 exactly once.
In other words:
5.upto(65).map { |number| number**17 % 61 }.sort # => [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60]
5.upto(65).map { |number| number**17 % 61 } # => [36, 31, 51, 28, 20, 59, 11, 22, 47, 48, 42, 12, 54, 26, 5, 34, 29, 57, 24, 53, 15, 55, 3, 38, 21, 18, 43, 40, 23, 58, 6, 46, 8, 37, 4, 32, 27, 56, 35, 7, 49, 19, 13, 14, 39, 50, 2, 41, 33, 10, 30, 25, 16, 9, 17, 60, 0, 1, 44, 52, 45]
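Whether a given exponent really produces a full shuffle can be checked directly for a small modulus. The following is a minimal sketch with a hypothetical helper name, power_permutation?; it relies on the standard fact that, for a prime modulus m, the map x -> x**e % m hits every residue exactly once only when e and m - 1 have no common factor.

```ruby
# True when x -> x**exponent % modulus maps 0...modulus onto itself
# without collisions. (power_permutation? is a hypothetical helper name.)
def power_permutation?(exponent, modulus)
  (0...modulus).map { |x| x.pow(exponent, modulus) }.uniq.size == modulus
end

good = power_permutation?(17, 61) # 17 and 60 share no common factor
bad  = power_permutation?(3, 61)  # 3 divides 60, so values collide
```

Integer#pow with a second argument performs modular exponentiation without building the huge intermediate power, which matters once the modulus is as large as 2**61 - 1.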
Now, how random is that? As it turns out, the more common divisors shared by m and the selected m numbers, the less evenly distributed the sequence is. But we are in luck here: 2^61 - 1 is a prime number (a so-called Mersenne prime). Therefore, the only divisors it can share are 1 and 2^61 - 1. This means that no matter what power we choose, the positions of the numbers 0 and 1 will be fixed. That is not perfect, but the other 2^61 - 3 numbers can be found at any position. And guess what: we don't care about 0 and 1 anyway, because they don't have ten 1s in their binary representation!
Unfortunately, a bottleneck for our randomness is the fact that the bigger the prime number we want to generate, the harder it gets. This is the best I can come up with when it comes to generating all the numbers in a range in a shuffled order, without keeping them in memory simultaneously.
So to put everything in use:
We generate seeds of full permutations of 10 elements.
We generate a random prime number.
We randomly choose if we want to generate permutations for the next number in the sequence or a number that we already started (up to a finite number of started numbers).
We use bitmaps of the generated numbers to get said permutations.
Note that this will solve only the problem of k-permutations of n elements without repetition. I still haven't thought of a way to add repetition.
DISCLAIMER: The following code comes with no guarantees of any kind, explicit or implied. Its point is to further express the author's ideas, rather than be a production ready solution:
require 'prime'
class TokenGenerator
  NUMBERS_UPPER_BOUND = 2**61 - 1
  HAS_NUMBER_MASK = ('1' * 9 + '0' * (61 - 9)).reverse.to_i(2)
  HAS_LOWER_CASE_MASK = ('0' * 9 + '1' * 26 + '0' * 26).reverse.to_i(2)
  HAS_UPPER_CASE_MASK = ('0' * (9 + 26) + '1' * 26).reverse.to_i(2)
  ALL_CHARACTERS = [*'1'..'9', *'a'..'z', *'A'..'Z']
  K_PERMUTATIONS = 0.upto(9).to_a.permutation.to_a # give it a couple of seconds
  def initialize
    random_prime = Prime.take(10_000).drop(100).sample
    @all_numbers_generator = 1.upto(NUMBERS_UPPER_BOUND).lazy.map do |number|
      number**random_prime % NUMBERS_UPPER_BOUND
    end.select do |number|
      !(number & HAS_NUMBER_MASK).zero? and
        !(number & HAS_LOWER_CASE_MASK).zero? and
        !(number & HAS_UPPER_CASE_MASK).zero? and
        number.to_s(2).chars.count('1') == 10
    end
    @k_permutation_seeds = 61.times.map { K_PERMUTATIONS.shuffle }.to_a # this will take a minute
    @numbers_in_iteration = {go_fish: nil}
  end
  def next
    raise StopIteration if @numbers_in_iteration.empty?
    number_generator = @numbers_in_iteration.keys.sample
    if number_generator == :go_fish
      add_next_number if @numbers_in_iteration.size < 1_000_000
      self.next
    else
      next_permutation(number_generator)
    end
  end
  private
  def add_next_number
    @numbers_in_iteration[@all_numbers_generator.next] = @k_permutation_seeds.sample.to_enum
  rescue StopIteration # lol, you actually managed to traverse all 2^61 numbers!
    @numbers_in_iteration.delete(:go_fish)
  end
  def next_permutation(number)
    fetch_permutation(number, @numbers_in_iteration[number].next)
  rescue StopIteration # all k permutations for this number were already generated
    @numbers_in_iteration.delete(number)
    self.next
  end
  def fetch_permutation(number_mask, k_permutation)
    k_from_n_indices = number_mask.to_s(2).chars.reverse.map.with_index { |bit, index| index if bit == '1' }.compact
    k_permutation.each_with_object([]) { |order_index, k_from_n_values| k_from_n_values << ALL_CHARACTERS[k_from_n_indices[order_index]] }
  end
end
EDIT: it turned out that our constraints eliminate too many possibilities. This causes @all_numbers_generator to take too much time testing and skipping numbers. I will try to think of a better generator, but everything else remains valid.
The old version that generates tokens with uniqueness constraint on the containing characters:
numbers = ('0'..'9').to_a
downcase_letters = ('a'..'z').to_a
upcase_letters = downcase_letters.map(&:upcase)
all = [numbers, downcase_letters, upcase_letters]
one_of_each_set = all.map(&:sample)
random_code = (one_of_each_set + (all.flatten - one_of_each_set).sample(7)).shuffle.join

Use the 'SafeRandom' gem (GitHub link).
It provides an easy way to generate random values and is compatible with Rails 2, Rails 3, Rails 4 and Rails 5.
Here you can use the strong_string method to generate a strong string (i.e. a combination of uppercase and lowercase letters, numbers and symbols).
# => Strong string: Minimum number should be greater than 5 otherwise by default 8 character string.
require 'safe_random'
puts SafeRandom.strong_string # => 4skgSy93zaCUZZCoF9WiJF4z3IDCGk%Y
puts SafeRandom.strong_string(3) # => P4eUbcK%
puts SafeRandom.strong_string(5) # => 5$Rkdo

All possible permutations with a condition

I'm wondering if there is an elegant way in Ruby to come up with all permutations (with repetitions) of some integers with the requirements that 1) Integers introduced must be in ascending order from left to right 2) Zero is exempt from this rule.
Below, I have a subset of the output for three digits and the integers 0,1,2,3,4,5,6,7,8,9. This is only a subset of the total answer, and specifically it is the subset which starts with 5. I've included notes on a couple of them
500 - Zero is used twice
505 - 5 is used twice. Note that 504 is not included because 5 was introduced on the left and 4 < 5
506
507
508
509
550
555
556
557
558
559
560
565 - Though 5 < 6, 5 can be used twice because 5 was introduced to the left of 6.
566
567
568
569
570
575
577
578
579
580
585
588
589
590
595
599
I need to be able to do it for arbitrarily long output lengths (not just 3, like this example), and I need to be able to do it for specific sets of integers. However, zero will always be the integer to which the ordering rule does not apply.

This would work:
class Foo
  include Comparable
  attr :digits
  def initialize(digits)
    @digits = digits.dup
  end
  def increment(i)
    if i == -1 # [9,9] => [1,0,0]
      @digits.unshift 1
    else
      succ = @digits[i] + 1
      if succ == 10 # [8,9] => [9,0]
        @digits[i] = 0
        increment(i-1)
      else
        @digits[i] = @digits[0,i].sort.detect { |e| e >= succ } || succ
      end
    end
    self
  end
  def succ
    Foo.new(@digits).increment(@digits.length-1)
  end
  def <=>(other)
    @digits <=> other.digits
  end
  def to_s
    digits.join
  end
  def inspect
    to_s
  end
end
range = Foo.new([5,0,0])..Foo.new([5,9,9])
range.to_a
#=> [500, 505, 506, 507, 508, 509, 550, 555, 556, 557, 558, 559, 560, 565, 566, 567, 568, 569, 570, 575, 577, 578, 579, 580, 585, 588, 589, 590, 595, 599]
The main rule for incrementing a digit is:
@digits[i] = @digits[0,i].sort.detect { |e| e >= succ } || succ
This sorts the digits to the left of the current digit (the ones "introduced to the left") and detects the first element that is equal to or larger than the successor. If none is found, the successor itself is used.

In case you need this as an executable:
#!/usr/bin/env ruby -w
def output(start, stop)
(start..stop).select do |num|
digits = num.to_s.split('').to_a
digits.map! { |d| d.to_i }
checks = []
while digit = digits.shift
next if digit == 0
next if checks.find { |d| break true if digit == d }
break false if checks.find { |d| break true if digit < d }
checks << digit
end != false
end
end
p output(*$*[0..1].map { |a| a.to_i })
$ ./test.rb 560 570
[560, 565, 566, 567, 568, 569, 570]

This is some C#/pseudocode. It definitely won't compile. The implementation is not linear, but I note where you can add a simple optimization to make it more efficient. The algorithm is quite simple, but it seems to be pretty reasonably performant (it's linear with respect to the output. I'm guessing the output grows exponentially... so this algorithm is also exponential. But with a tight constant).
// Note: I've never used BigInteger before. I don't even know what the
// APIs are. Basically you can use strings but hopefully the arbitrary
// precision arithmetic class/struct would be more efficient. You
// mentioned that you intend to add more than just 10 digits. In
// that case you pretty much have to use a string without rolling out
// your own special class. Perhaps C# has an arbitrary precision arithmetic
// which handles arbitrary base as well?
// Note: We assume that possibleDigits is sorted in increasing order. But you
// could easily sort. Also we assume that it doesn't contain 0. Again easy fix.
public List<BigInteger> GenSequences(int numDigits, List<int> possibleDigits)
{
// We have special cases to get rid of things like 000050000...
// hard to explain, but should be obvious if you look at it
// carefully
if (numDigits <= 0)
{
return new List<BigInteger>();
}
// Starts with all of the valid 1 digit (except 0)
var sequences = new Queue<BigInteger>(possibleDigits);
// Special case if numDigits == 1
if (numDigits == 1)
{
sequences.Enqueue(new BigInteger(0));
return sequences;
}
// Now the general case. We have all valid sequences of length 1
// (except 0 because no valid sequence of length greater than 1
// will start with 0)
for (int length = 1; length <= numDigits; length++)
{
// Naming is a bit weird. A 'sequence' is just a BigInteger
var sequence = sequences.Dequeue();
while (sequence.Length == length)
{
// 0 always works
var temp = sequence * 10;
sequences.Enqueue(temp);
// Now do all of the other possible last digits
var largestDigitIndex = FindLargestDigitIndex(sequence, possibleDigits);
for (int lastDigitIndex = largestDigitIndex;
lastDigitIndex < possibleDigits.Length;
lastDigitIndex++)
{
temp = sequence * 10 + possibleDigits[lastDigitIndex];
sequences.Enqueue(temp);
}
sequence = sequences.Dequeue();
}
}
}
// TODO: This is the slow part of the algorithm. Instead, keep track of
// the max digit of a given sequence Meaning 5705 => 7. Keep a 1-to-1
// mapping from sequences to largestDigitsInSequences. That's linear
// overhead in memory and reduces time complexity to linear _with respect to the
// output_. So if the output is like `O(k^n)` where `k` is the number of possible
// digits and `n` is the number of digits in the output sequences, then it's
// exponential
private int FindLargestDigitIndex(BigInteger number,
List<int> possibleDigits)
{
// Just iterate over the digits of number and find the maximum
// digit. Then return the index of that digit in the
// possibleDigits list
}
I prove why the algorithm works in the comments above (mostly, at least). It's an inductive argument. For general n > 1 you can take any possible sequence. The first n-1 digits (starting from left) must form a sequence that is also valid (by contradiction). Using induction and then checking the logic in the innermost loop we can see that our desired sequence will be output. This specific implementation you'd also need some proofs around termination and such. For example, the point of the Queue is that we want to process the sequences of length n while we are adding the sequences of length n+1 to the same Queue. The ordering of the Queue allows that innermost while loop to terminate (because we'll go through all sequences of length n before we get to the n+1 sequences).
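The prefix-extension idea can also be written compactly in Ruby. This is a minimal sketch, not a translation of the pseudocode above: the method name sequences is an assumption, and unlike the pseudocode it also lets a digit be reused once it has appeared further left (so 565 is produced), which is what the question's sample output requires. It assumes the allowed digits do not include 0 and that a sequence never starts with 0.

```ruby
# Builds all valid sequences of the given length, one position at a time.
# A prefix may be extended with 0, with any digit already used in it, or
# with a digit larger than every digit used so far.
# (sequences is a hypothetical helper name.)
def sequences(length, digits = (1..9).to_a)
  prefixes = digits.sort.map { |d| [d] }
  (length - 1).times do
    prefixes = prefixes.flat_map do |prefix|
      seen = prefix.reject(&:zero?)
      largest = seen.max
      allowed = [0] + digits.select { |d| seen.include?(d) || d > largest }
      allowed.sort.map { |d| prefix + [d] }
    end
  end
  prefixes.map(&:join)
end

starting_with_five = sequences(3).select { |s| s.start_with?('5') }
```

For three digits, the entries starting with 5 are the thirty values listed in the question, from 500 to 599. Because every level is kept in memory, this grows with the size of the output, just as the queue-based version does.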

Note: Three solutions are shown; look for the splits.
Describe a valid number, then (1..INFINITE).select{|n| valid(n)}.take(1)
So what's valid? Well, let's take some advantage here:
class Integer # Fixnum was merged into Integer in Ruby 2.4 and removed in 3.2
def to_a
to_s.split('').collect{|d| d.to_i}
end
end
123.to_a == [1,2,3]
Alright, so, now: Each digit can be a digit already present or zero, or a digit greater than the prior value, and the first digit is always valid.
PS - I use i not i-1 because the loop's index is one less than set's, since I lopped the first element off.
def valid num
#Ignore zeros:
set = num.to_a.select{|d| d != 0 }
#First digit is always valid:
set[1..-1].each_with_index{ |d, i|
if d > set[i]
# puts "Increasing digit"
elsif set[0..i].include? d
# puts "Repeat digit"
else
# puts "Digit does not pass"
return false
end
}
return true
end
so then, hurrah for lazy:
(1..Float::INFINITY).lazy.select{|n| valid n}.take(100).force
#=> [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 22, 23, 24,
# 25, 26, 27, 28, 29, 30, 33, 34, 35, 36, 37, 38, 39, 40, 44, 45, 46, 47, 48, 49, 50, 55,
# 56, 57, 58, 59, 60, 66, 67, 68, 69, 70, 77, 78, 79, 80, 88, 89, 90, 99, 100, 101, 102,
# 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120,
# 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 133, 134, 135, 136]
Now that we have it, let's make it succinct:
def valid2 num
set = num.to_a.select{|d| d != 0 }
set[1..-1].each_with_index{ |d, i|
return false unless (d > set[i]) || (set[0..(i)].include? d)
}
return true
end
check:
(1..Float::INFINITY).lazy.select{|n| valid n}.take(100).force - (1..Float::INFINITY).lazy.select{|n| valid2 n}.take(100).force
#=> []
all together now:
def valid num
set = num.to_s.split('').collect{|d| d.to_i}.select{|d| d != 0 }
set[1..-1].each_with_index{ |d, i|
return false unless (d > set[i]) || (set[0..(i)].include? d)
}
return true
end
Edit:
If you want a particular subset of the set, just change the range. Your original would be:
(500..1000).select{|n| valid n}
Edit2: To generate the range for a given number of digits n:
((Array.new(n-1, 0).unshift(1).join('').to_i)..(Array.new(n, 0).unshift(1).join('').to_i))
Edit3: Interesting alternative method - recursively remove digits as they become valid.
def _rvalid set
return true if set.size < 2
return false if set[1] < set[0]
return _rvalid set.select{|d| d != set[0]}
end
def rvalid num
return _rvalid num.to_s.split('').collect{|d| d.to_i}.select{|d| d != 0 }
end
(1..Float::INFINITY).lazy.select{|n| rvalid n}.take(100).force
Edit 4: Positive generation method
def _rgen set, target
return set if set.size == target
((set.max..9).to_a + set.uniq).collect{ |d|
_rgen((set + [d]), target)
}
end
def rgen target
sets = (0..9).collect{|d|
_rgen [d], target
}
# This method has an array problem that I'm not going to figure out right now
while sets.first.is_a? Array
sets = sets.flatten
end
sets.each_slice(target).to_a.collect{|set| set.join('').to_i}
end

This doesn't seem too complex. Write a refinement of a base N increment, with the change that when a digit is incremented from zero it goes straight to the largest of the digits to its left.
Update: I misread the spec and my initial take on this didn't quite perform. Depending on the actual dataset, the uniq.sort may be too costly, but it is fine when the items in the sequence have only a few digits. The right way would be to maintain a second, sorted copy of the digits, but I'm leaving it like this until I know it's too inefficient.
Note that the values of 0..N here are intended to be used as indices into a sorted list of the actual values each digit can take. A call to map will generate the real elements of the sequence.
This program dumps the same section of the sequence as you have shown yourself (everything beginning with five).
def inc!(seq, limit)
(seq.length-1).downto(0) do |i|
if seq[i] == limit
seq[i] = 0
else
valid = seq.first(i).uniq.sort
valid += ((valid.last || 0).next .. limit).to_a
seq[i] = valid.find { |v| v > seq[i] }
break
end
end
end
seq = Array.new(3,0)
loop do
puts seq.join if seq[0] == 5
inc!(seq, 9)
break if seq == [0,0,0]
end
output
500
505
506
507
508
509
550
555
556
557
558
559
560
565
566
567
568
569
570
575
577
578
579
580
585
588
589
590
595
599

Algorithm to spread selection over a fixed size array

It's not specifically a ruby problem: more of a general question about algorithms. But there might be some ruby-specific array methods which are helpful.
I have an array with 30 items. I ask for a number of items between 15 and 30, and I want to select a given number of items from the whole array as evenly distributed as possible. The selection needs to be non-random, returning the same result every time.
Let's say someone asks for 16 items. If I return the first 16, that would be a massive fail. Instead, I could return all the odd numbered ones plus the last one; If I had the numbers 1 to 30 stored in the array, I could give back
myArr.spread(16)
=> [1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,30]
If someone asks for 20 items, it's a bit trickier: I can't immediately think of a nice programmatic way of doing this. I feel like it must have been solved already by someone. Any suggestions?

I ended up doing this, inspired by Alex D: I step through n-1 times and then always add the last element to the end.
class Array
def spread(n)
step = self.length.to_f / (n -1)
(0..(n-2)).to_a.collect{|i| self[i * step]} + [self.last]
end
end
> (1..30).to_a.spread(3)
=> [1, 16, 30]
> (1..30).to_a.spread(4)
=> [1, 11, 21, 30]
> (1..30).to_a.spread(5)
=> [1, 8, 16, 23, 30]
> (1..30).to_a.spread(15)
=> [1, 3, 5, 7, 9, 11, 13, 16, 18, 20, 22, 24, 26, 28, 30]

Having recently implemented this method—although I called it keep—for use in a backup retention application, I thought I'd share my solution. It's similar to Alex D's answer with two major differences in the algorithm:
The "stride" is calculated using (length + (length / n) - 1).to_f / n where n is the number of items desired. Calculating an offset in terms of the number of times n goes into length ensures that the last item is always included.
It uses a modulo operation instead of incrementing: If the element's index divided by the "stride" has a remainder between 0 and 1 (inclusive of 0, exclusive of 1), the element is included in the result. The fact that 0 % x is always 0 ensures that the first element is always returned.
Edge cases, such as when the number of elements is less than the number desired, are accounted for.
class Array
def keep(n)
if n < 1
[]
elsif length <= n
self.clone
else
stride = (length + (length / n) - 1).to_f / n
select.with_index do |_, i|
remainder = i % stride
(0 <= remainder && remainder < 1)
end
end
end
end

Divide the size of the array by the number of items you want to select (DON'T use truncating division) -- this will be your "stride" as you walk over the array, selecting items. Keep adding the "stride" to a running total until it equals or exceeds the size of the array. Each time you add the "stride", take the integral part and use it as an index into the array to select an item.
Say you have 100 items and you want to select 30. Then your "stride" will be 3.3333... so you start with a "running total" of 3.3333, and select item 3. Then 6.66666 -- so you select item 6. Next is 10.0 -- so you select item 10. And so on...
Test to make sure you don't get "off by one" errors, and also that you don't divide by zero if the array size or number of items to select is zero. Also use a guard clause to ensure that the number of items to select is not greater than the number in the array.
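A minimal Ruby sketch of this stride approach follows. The method name spread_by_stride is an assumption, and two details differ slightly from the description: the running total starts at 0 so that the first element is always selected, and the index is computed as i * length / count with integer arithmetic, which gives the integral part of i times the stride without accumulating floating-point error.

```ruby
# Picks `count` items spread evenly over `items`, deterministically.
# (spread_by_stride is a hypothetical helper name.)
def spread_by_stride(items, count)
  # Guard clauses: nothing to select, or more requested than available.
  return [] if count <= 0 || items.empty?
  return items.dup if count >= items.length

  # i * length / count is the integral part of i * stride.
  (0...count).map { |i| items[i * items.length / count] }
end

twenty = spread_by_stride((1..30).to_a, 20)
thirty_of_hundred = spread_by_stride((1..100).to_a, 30)
```

With 100 items and 30 requested, the selected indices are 0, 3, 6, 10 and so on, matching the 3.333 stride described above. Note that this variant always includes the first element but not necessarily the last one.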

There was a similar question for this here, but the solution is in python.
In Ruby, it would be:
class Array
def spread( count)
length = self.length
result = Array.new
0.upto(count-1) do |i|
result << self[(i * length.to_f / count).ceil]
end
return result
end
end
arr = Array(1..30)
p arr.spread(20)
#=> [1, 3, 4, 6, 7, 9, 10, 12, 13, 15, 16, 18, 19, 21, 22, 24, 25, 27, 28, 30]

You could try to use a Random (doc) with a fixed seed:
with the Random object, you can pick the elements of the array randomly
the fixed seed ensures that every call to the function generates the same list of random numbers.
For example, with Array#sample:
def spread(arr, count)
  arr.sample(count, random: Random.new(0))
end

Determining if two lists contain the same numeric items without sorting

I have two lists and I need to determine if they contain the same values without sorting (i.e. the order of values is irrelevant). I know sorting them would work, but this is part of a performance-critical section.
Item values fall within the range [-2, 63] and we're always comparing equal-size lists, but the list sizes range from 1 to 8.
Example lists:
A = (0, 0, 4, 23, 10)
B = (23, 10, 0, 4, 0)
C = (0, 0, 4, 27, 10)
A == B is true
A == C is false
I think a possible solution would be to compare the products of the two lists (multiply all values together), but zero and negative values are a problem. A workaround would be adding 4 to every value before multiplying. Here's the code I have so far.
bool equal(int A[], int B[], int size)
{
    int sumA = 1;
    int sumB = 1;
    for (int i = 0; i < size; i++) {
        sumA *= A[i] + 4;
        sumB *= B[i] + 4;
    }
    return (sumA == sumB);
}
But would this always work, no matter what the order or contents of the lists were? What I'm really asking is whether the following is mathematically true (unless there's another way to solve the problem):
Given 2 equal sized lists. If the products (multiplying all values together) of the lists are equal then the lists contain the same values, so long as the values are integers greater than 0.

Assuming you know the range ahead of time, you can use a variation on counting sort. Just scan through each array and keep track of how many times each integer occurs.
Procedure Compare-Lists(A, B, min, max)
    // The range [min, max] is inclusive, so it holds max - min + 1 values
    domain := max - min + 1
    Count := new int[domain]   // all entries start at 0
    for i in A:
        Count[i - min] += 1
    for i in B:
        Count[i - min] -= 1
        if Count[i - min] < 0:
            // Something was in B but not A
            return "Different"
    for i in A:
        if Count[i - min] > 0:
            // Something was in A but not B
            return "Different"
    return "Same"
This runs in O(len(A) + len(B)) time.
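The counting procedure above could be sketched in C++ roughly as follows. This is a minimal sketch, not a tuned implementation: the names `sameItems`, `kMin` and `kMax` are made up for illustration, and it assumes, as the question states, that every value lies in [-2, 63] and that both lists have the same size.

```cpp
#include <array>
#include <cstddef>

// Assumption (taken from the question): all values lie in [kMin, kMax].
constexpr int kMin = -2;
constexpr int kMax = 63;

// Hypothetical helper: true when a and b hold the same values with the
// same multiplicities. Both arrays must have exactly `size` elements.
bool sameItems(const int* a, const int* b, std::size_t size) {
    // One zero-initialized counter per possible value (66 in total).
    std::array<int, kMax - kMin + 1> count{};
    for (std::size_t i = 0; i < size; ++i) {
        ++count[a[i] - kMin];
    }
    for (std::size_t i = 0; i < size; ++i) {
        // A counter dropping below zero means b holds a value more
        // often than a does.
        if (--count[b[i] - kMin] < 0) {
            return false;
        }
    }
    // The sizes are equal and no counter went negative, so every
    // counter is back at zero.
    return true;
}
```

Because the lists are the same size, the final pass over A from the pseudocode is not needed here: if nothing in B overshoots its count, nothing in A can be left over.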

You could do this with primes. Keep a prime table for the first 66 primes and use the elements of your arrays (offset by +2) to index into the prime table.
The identity of an array is then just the product of the primes represented by the elements in the array.
Unfortunately, the product must be represented with at least 67 bits:
The 66th prime is 317, and 317^8 = 101,970,394,089,246,452,641
log2(101,970,394,089,246,452,641) = 66.47, which rounds up to 67 bits
Example pseudocode for doing this (assuming the existence of an int128 data type):
int primes[] =
{
    2, 3, 5, 7, 11, 13, 17, 19, 23, 29,
    31, 37, 41, 43, 47, 53, 59, 61, 67, 71,
    73, 79, 83, 89, 97, 101, 103, 107, 109, 113,
    127, 131, 137, 139, 149, 151, 157, 163, 167, 173,
    179, 181, 191, 193, 197, 199, 211, 223, 227, 229,
    233, 239, 241, 251, 257, 263, 269, 271, 277, 281,
    283, 293, 307, 311, 313, 317
};
// Assumes:
// Each xs[i] is [-2, 63]
// length is [1, 8]
int128 identity(int xs[], int length)
{
    int128 product = 1;
    for (int i = 0; i < length; ++i)
    {
        product *= primes[xs[i] + 2];
    }
    return product;
}
bool equal(int a[], int b[], int size)
{
    return identity(a, size) == identity(b, size);
}
You might be able to use a long double on GCC to store the product since it is defined as an 80-bit data type, but I'm not sure if the floating-point multiplication error would cause collisions between lists. I haven't verified this.
My previous solution below does not work, see the comments below.
For each list:
Compute the sum of all elements
Compute the product of all elements
Store the length of the list (in your case, since the length is guaranteed to be the same for two lists, you can ignore it entirely)
As you compute the sum and product, each element needs to be adjusted by +3, so your range is now [1, 66].
The (sum, product, length) tuple is the identity for your list. Any lists with the same identity are equal.
You can fit this (sum, product, length) tuple into a single 64-bit number:
For the product: 66^8 = 360,040,606,269,696, log2(360,040,606,269,696) = 48.36 (rounded up) is 49 bits
For the sum: 66 * 8 = 528, log2(528) = 9.04 (rounded up) is 10 bits
Length is in the range [1, 8], log2(8) = 3 bits
49 + 10 + 3 = 62 bits for representing the identity
Then, you can do direct 64-bit comparisons to determine equality.
Running-time is linear in the size of the arrays with a single pass over each. Memory usage is O(1).
Example code:
#include <cstdint>
#include <cstdio>
// Assumes:
// Each xs[i] is [-2, 63]
// length is [1, 8]
uint64_t identity(int xs[], int length)
{
    uint64_t product = 1;
    uint64_t sum = 0;
    for (int i = 0; i < length; ++i)
    {
        int element = xs[i] + 3;
        product *= element;
        sum += element;
    }
    // Pack as: length from bit 59 up, sum in bits 49-58, product in bits 0-48
    return ((uint64_t)length << 59) | (sum << 49) | product;
}
bool equal(int a[], int b[], int size)
{
    return identity(a, size) == identity(b, size);
}
int main()
{
    int a[] = { 23, 0, -2, 6, 3, 23, -1 };
    int b[] = { 0, -1, 6, 23, 23, -2, 3 };
    int size = sizeof(a) / sizeof(a[0]);
    printf("%d\n", equal(a, b, size));
    return 0;
}

Since you only have 66 possible numbers, you can create a bit vector (3 32-bit words or 2 64-bit words) and compare those. You can do it all with just shifts and adds. Since there are no comparisons required until the end (to find out if they are equal), it can run fast because there won't be many branches.
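One way to read the bit-vector idea is sketched below in C++, using two 64-bit words with one bit per possible value. The names `Mask66`, `presenceMask` and `sameValueSet` are hypothetical. Note the assumption this sketch makes: a single bit per value records which values occur, not how many times, so equal masks only prove equality when a list has no repeated values. With repeats it can still serve as a cheap pre-check, since different masks always mean different lists.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical 66-bit presence mask: bit (v + 2) is set when value v occurs.
struct Mask66 {
    std::uint64_t word[2] = {0, 0};
};

// Assumption (from the question): every xs[i] lies in [-2, 63].
Mask66 presenceMask(const int* xs, std::size_t size) {
    Mask66 m;
    for (std::size_t i = 0; i < size; ++i) {
        unsigned bit = static_cast<unsigned>(xs[i] + 2);  // 0..65
        // Select the word with a shift and the bit with a mask, so the
        // loop body needs no comparison.
        m.word[bit >> 6] |= std::uint64_t{1} << (bit & 63u);
    }
    return m;
}

// True when both arrays contain the same set of distinct values.
// Multiplicities are NOT compared.
bool sameValueSet(const int* a, const int* b, std::size_t size) {
    const Mask66 ma = presenceMask(a, size);
    const Mask66 mb = presenceMask(b, size);
    return ma.word[0] == mb.word[0] && ma.word[1] == mb.word[1];
}
```

For example, (0, 0, 4) and (0, 4, 4) produce the same mask even though they are different lists, so an exact test for lists with duplicates would need a few bits of count per value rather than one.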

Make a copy of the first list. Then loop through the second and as you do remove each item from the copy. If you get all the way through the second list and found all elements in the copy, then the lists have the same elements. This is a lot of looping, but with only max 8 elements in the list, you won't get a performance gain by using a different type of collection.
If you had a lot more items, then use a Dictionary/Hashtable for the copy. Key it by value and store a count of how many times each value was found in the first list. This will give you a performance boost on larger lists.
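The copy-and-remove approach for short lists might look like this in C++. It is a sketch under the assumption that the lists are held in `std::vector<int>`; the function name `sameItemsByRemoval` is invented for illustration.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: true when both lists hold the same values with
// the same multiplicities, regardless of order.
bool sameItemsByRemoval(const std::vector<int>& first,
                        const std::vector<int>& second) {
    if (first.size() != second.size()) {
        return false;
    }
    std::vector<int> remaining = first;  // working copy of the first list
    for (int value : second) {
        auto it = std::find(remaining.begin(), remaining.end(), value);
        if (it == remaining.end()) {
            return false;  // value missing, or already used up
        }
        // Remove the match by overwriting it with the last element,
        // which avoids shifting the rest of the vector.
        *it = remaining.back();
        remaining.pop_back();
    }
    return true;  // every element of second was matched exactly once
}
```

The nested search makes this quadratic in the list length, which is negligible at 8 elements but is the reason to switch to a keyed count for much longer lists.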

Given 2 equal sized lists. If the products (multiplying all values together) of the lists are equal then the lists contain the same values, so long as the values are integers greater than 0.
No. Consider the following lists:
(9, 9)
(3, 27)
They are the same size and the product of the elements is the same (81 in both cases), yet they contain different values.

How fast do you need to process 8 integers? Sorting 8 things in any modern processor is going to take almost no time.
The easy thing is to just use an array of size 66 where index 0 represents value -2. Then you just increment counts across both arrays, and then you just iterate across them afterwards.

If your list has only 8 items then sorting is hardly a performance hit. If you want to do this without sorting you can do so using a hashmap.
Iterate over the first array and for each value N in the array Hash(N) = 1.
Iterate over the second array and for each value M, Hash(M) = Hash(M) + 1.
Iterate over the hash and find all keys K for which Hash(K) = 2.
