Find all subsets of sequences (including overlaps) whose total size is not greater than K - algorithm

I need advice (keywords are enough for me) on finding an optimal solution (O(M+N) would be ideal) to this problem for large inputs. I am thinking about combining cycle detection, but with these special cases I don't know whether this is the right path.
Given a list of circular arrays and an integer value K.
The order within each circular array must be preserved left to right. For example, 1a-3c-4b can be rotated to 3c-4b-1a or 4b-1a-3c, but not reordered as 4b-3c-1a.
Find all possible subsets of sequences whose total size, with overlapping parts counted only once, is not greater than K.
For example:
list of circulars:
30, 26, 44, 32, 3c (1)
5a, 3c, 12, 1e, 4d (2)
1e, 5a, 12 (3)
4d, 5a, 1e (4)
32, 51, 2f, 49, 55, 42 (5)
K = 12
The output should be the following subsets:
(1), (2)
(1), (3)
(1), (4)
(1), (5)
(2), (3)
(2), (4)
(2), (5)
(3), (4)
(3), (5)
(1), (2), (4)
(1), (3), (4)
(2), (3), (4)
(2), (4), (5)
(3), (4), (5)
In the example above:
(1), (2), (4): circulars (2) and (4) (with (4) rotated to 1e-4d-5a) intersect at 1e-4d-5a. Therefore the size of this subset is reduced from 5 + 5 + 3 = 13 to 10, which is smaller than K (12).
The size of (1), (3), (4) is still 11 (5 + 3 + 3) because there is no intersection in this subset: 1e-5a in (3) cannot be reordered to match 5a-1e in (4).
(2), (3), (4) and (2), (4), (5) qualify for the same reason as (1), (2), (4).
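The overlap rule is not fully pinned down by the question, so here is a brute-force baseline to test faster ideas against. It is a sketch under one assumption that matches the (1), (2), (4) example: a circular counts as overlapping only when some rotation of it appears as a contiguous, left-to-right run inside another circular of the same subset, in which case its size is not counted again. The function names are hypothetical, and enumerating subsets is exponential in the number of circulars, so this is a correctness reference, not the O(M+N) solution being asked for.

```python
from itertools import combinations


def contains_rotation(outer, inner):
    """True if some rotation of `inner` is a contiguous run of circular `outer`."""
    n, m = len(outer), len(inner)
    if m > n:
        return False
    doubled = outer + outer  # every rotation of `outer` is a window of this list
    for shift in range(m):
        rotated = inner[shift:] + inner[:shift]
        if any(doubled[start:start + m] == rotated for start in range(n)):
            return True
    return False


def subset_size(circulars, members):
    """Sum of sizes, skipping members contained in another member (assumed rule)."""
    total = 0
    for i in members:
        covered = any(
            j != i
            and contains_rotation(circulars[j], circulars[i])
            # for two identical circulars, only the one with the lower index counts
            and (len(circulars[j]) > len(circulars[i]) or j < i)
            for j in members
        )
        if not covered:
            total += len(circulars[i])
    return total


def find_subsets(circulars, k):
    """All subsets of two or more circulars (0-based indices) with size <= k."""
    return [
        members
        for r in range(2, len(circulars) + 1)
        for members in combinations(range(len(circulars)), r)
        if subset_size(circulars, members) <= k
    ]


circulars = [
    ["30", "26", "44", "32", "3c"],
    ["5a", "3c", "12", "1e", "4d"],
    ["1e", "5a", "12"],
    ["4d", "5a", "1e"],
    ["32", "51", "2f", "49", "55", "42"],
]
for members in find_subsets(circulars, 12):
    print(", ".join(f"({i + 1})" for i in members))
```

Under this assumption the sketch prints the 14 subsets listed above plus (4), (5). That pair has only 3 + 6 = 9 elements before any overlap is removed, so it fits within K = 12 under any overlap rule and appears to be missing from the expected output.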

Related

Integer division in an arbitrary base, without conversion to base 10

Integers are represented as lists of ints (digits, most significant first), for example:
[13, 2] (D2 in base 16)
[1, 3, 6] (136 in base 10)
We may only use the digits individually; they cannot be converted to base-10 integers.
I wanted to implement division the 'paper and pencil' (long division) way, but it turns out that this requires the prefixes of the dividend and the divisor to be base-10 integers in order to perform base-10 integer division.
For example, to find 11131 / 117, we first need to find 1113 / 117.
Is there any way to decompose a division like 1113 / 117 into smaller steps that only require divisions whose divisor lies in the closed range [1, base - 1]?
Is there any other way to perform this division, other than the 'paper and pencil' way?
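One observation that removes the need for multi-digit division: in long division every quotient digit lies in [0, base - 1], so each step only has to find how many times the divisor fits into the running remainder. That can be done by repeated subtraction on digit lists (at most base - 1 subtractions per quotient digit), using only digit-by-digit comparison and subtraction with borrow. Below is a minimal sketch assuming digits are stored most significant first, as in the examples; the helper names are made up for illustration. For large bases the usual refinement is to estimate each quotient digit from the leading digits and then correct it, as in Knuth's Algorithm D (The Art of Computer Programming, Vol. 2, Section 4.3.1).

```python
def strip(d):
    """Remove leading zeros, keeping at least one digit."""
    i = 0
    while i < len(d) - 1 and d[i] == 0:
        i += 1
    return d[i:]


def compare(a, b):
    """Compare two digit lists (most significant first): returns -1, 0 or 1."""
    a, b = strip(a), strip(b)
    if len(a) != len(b):
        return -1 if len(a) < len(b) else 1
    for x, y in zip(a, b):
        if x != y:
            return -1 if x < y else 1
    return 0


def subtract(a, b, base):
    """a - b for digit lists, assuming a >= b (schoolbook subtraction with borrow)."""
    a, b = strip(a), strip(b)
    b = [0] * (len(a) - len(b)) + b
    result, borrow = [], 0
    for x, y in zip(reversed(a), reversed(b)):
        x -= y + borrow
        borrow = 1 if x < 0 else 0
        result.append(x + base if x < 0 else x)
    return strip(result[::-1])


def long_divide(dividend, divisor, base):
    """Long division on digit lists; returns (quotient, remainder) as digit lists."""
    if compare(divisor, [0]) == 0:
        raise ZeroDivisionError("division by zero")
    quotient, remainder = [], [0]
    for digit in dividend:
        remainder = strip(remainder + [digit])  # shift left, bring down next digit
        q = 0  # the quotient digit, always in [0, base - 1]
        while compare(remainder, divisor) >= 0:
            remainder = subtract(remainder, divisor, base)
            q += 1
        quotient.append(q)
    return strip(quotient), remainder


print(long_divide([1, 1, 1, 3, 1], [1, 1, 7], 10))  # ([9, 5], [1, 6])
print(long_divide([13, 2], [3], 16))                # ([4, 6], [0])
```

The 1113 / 117 step from the question becomes "subtract 117 from 1113 until it no longer fits" (9 times, leaving 60), after which 601 is handled the same way.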

Get permutation count

I am searching for an algorithm that gives me the number of permutations of the elements 1...n that have given cycle lengths.
For example, n := 4:
<Set of cycle lengths> -> permutation count
1,1,1,1 -> 1: four cycles of length 1 give 1 permutation: 1,2,3,4
1,1,2 -> 6: two cycles of length 1 and one cycle of length 2 give 6 permutations: 1,2,4,3; 1,4,3,2; 1,3,2,4; 2,1,3,4; 3,2,1,4; 4,2,3,1
2,2 -> 3: two cycles of length 2 give 3 permutations: 2,1,4,3; 3,4,1,2; 4,3,2,1
1,3 -> 8: one cycle of length 1 and one cycle of length 3 give 8 permutations: 1,3,4,2; 1,4,2,3; 2,3,1,4; 2,4,3,1; 3,1,2,4; 3,2,4,1; 4,1,3,2; 4,2,1,3
4 -> 6: one cycle of length 4 gives 6 permutations: 2,3,4,1; 2,4,1,3; 3,1,4,2; 3,4,2,1; 4,1,2,3; 4,3,1,2
How can I compute the permutation count for a given set of cycle lengths? Iterating through all permutations is not an option.
For a given cycle type, we can produce a permutation with that cycle type by writing down a permutation of the list 1, ..., n and then bracketing it appropriately, according to the lengths in the cycle type, to get a permutation written in cycle notation.
For example, if we want cycle type (3, 2, 2), then the permutation 1, 2, 3, 4, 5, 6, 7 is bracketed as (1 2 3)(4 5)(6 7), while 5, 1, 6, 2, 4, 3, 7 gives (5 1 6)(2 4)(3 7).
It's clear that we get all permutations of cycle type (3, 2, 2) this way, but it's also clear that we can get each permutation in multiple different ways. There are two causes of overcounting: first, we can make a cyclic shift for any of the cycles: (5 1 6)(2 4)(3 7) is the same permutation as (1 6 5)(2 4)(3 7) or (6 5 1)(2 4)(3 7). Second, cycles of the same length can be permuted arbitrarily: (5 1 6)(2 4)(3 7) is the same permutation as (5 1 6)(3 7)(2 4). A bit of thought should convince you that these are the only possible causes of overcounting.
To account for both causes of overcounting, we divide the total number of permutations by (a) the product of the cycle lengths, and also (b) the factorial of the number of cycles for any given cycle length. In the (3, 2, 2) case: we divide by 3 × 2 × 2 for (a), and 2! for (b), because there are two cycles of length 2.
Since this is Stack Overflow, here's some Python code:
```python
from collections import Counter
from math import factorial

def count_cycle_type(p):
    """Number of permutations with a given cycle type."""
    count = factorial(sum(p))
    for cycle_length, ncycles in Counter(p).items():
        count //= cycle_length ** ncycles * factorial(ncycles)
    return count
```
Example:
```python
>>> count_cycle_type((2, 2))
3
>>> count_cycle_type((3, 2, 2))
210
```
To double check correctness, we can add the counts for all cycle types of a given length n, and check that we get n!. The cycle types are the partitions of n. We can compute those fairly simply by a recursive algorithm. Here's some code to do that. partitions is the function we want; bounded_partitions is a helper.
```python
def bounded_partitions(n, k):
    """Generate partitions of n with largest element <= k."""
    if k == 0:
        if n == 0:
            yield ()
    else:
        if n >= k:
            for c in bounded_partitions(n - k, k):
                yield (k,) + c
        yield from bounded_partitions(n, k - 1)

def partitions(n):
    """Generate partitions of n."""
    return bounded_partitions(n, n)
```
Example:
```python
>>> for partition in partitions(5): print(partition)
...
(5,)
(4, 1)
(3, 2)
(3, 1, 1)
(2, 2, 1)
(2, 1, 1, 1)
(1, 1, 1, 1, 1)
```
And here's the double check: the sum of all the cycle type counts, for total lengths 5, 6, 7 and 20. We get the expected results of 5!, 6!, 7! and 20!.
```python
>>> sum(count_cycle_type(p) for p in partitions(5))
120
>>> sum(count_cycle_type(p) for p in partitions(6))
720
>>> sum(count_cycle_type(p) for p in partitions(7))
5040
>>> sum(count_cycle_type(p) for p in partitions(20))
2432902008176640000
>>> factorial(20)
2432902008176640000
```
This can be broken down into:
1. The number of ways to partition the elements into buckets, one per distinct cycle length, each holding the required number of elements. For bucket sizes $s_1, \ldots, s_k$, that works out to $n!/(s_1! \cdots s_k!)$.
2. Multiplied by, for each distinct cycle length, the number of ways to split its bucket evenly into the required number of cycles. A bucket containing $m$ elements that must be split into $c$ cycles can be split in $m!/\left(((m/c)!)^c \, c!\right)$ ways.
3. Multiplied by, for each cycle, the number of distinct cyclic orderings. A cycle containing $m$ elements has $(m-1)!$ distinct cyclic orderings, which is just 1 when $m = 1$.
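The three factors above translate directly into code. Below is a short sketch (the function name is hypothetical). It should agree with count_cycle_type from the previous answer: the bucket factorials cancel, and what remains is n! divided by, for each cycle length, length^c · c!.

```python
from collections import Counter
from math import factorial


def count_by_buckets(cycle_lengths):
    """Permutation count for a cycle type, computed with the three factors above."""
    n = sum(cycle_lengths)
    counts = Counter(cycle_lengths)  # cycle length -> number of cycles c
    # Factor 1: split the n elements into one bucket per distinct cycle length.
    total = factorial(n)
    for length, c in counts.items():
        total //= factorial(length * c)
    # Factor 2: split each bucket of m = length * c elements into c unordered groups.
    for length, c in counts.items():
        m = length * c
        total *= factorial(m) // (factorial(length) ** c * factorial(c))
    # Factor 3: each group of `length` elements forms (length - 1)! distinct cycles.
    for length, c in counts.items():
        total *= factorial(length - 1) ** c
    return total


print([count_by_buckets(p) for p in [(1, 1, 1, 1), (1, 1, 2), (2, 2), (1, 3), (4,)]])
# [1, 6, 3, 8, 6]
```

For n = 4 the five counts add up to 24 = 4!, as they must.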

Modified knapsack with branch and bound method

So the problem I have is the following: there is a set of N categories of objects, and in each category there are M objects, each with a specified value and weight. We have to pick one object from each category so that the total weight is <= some given capacity W and the total value is maximal. The task has to be solved using the branch and bound method. I struggle to understand how this method is supposed to work in this situation. Could you please explain it to me?
A short example of what the algorithm should do.
Let's say you have 4 items, given as (weight, value) pairs, and the maximum weight is 16.
First sort them by value per weight: [(1, 2), (12, 20), (4, 5), (9, 10)].
Starting from the first item, build a tree where at each level you either add or drop one item.
At each level of the tree, calculate the weight, the value, and the value still left in the tree (the total value of the items not yet decided). If the value left plus the value in a branch is less than the best value found so far, close that branch. Also close a branch if its weight is more than allowed.
Below a schematic representation of how it should work.
```
item      choice   value   weight   value left
start                 0       0        37
(1,2)     add         2       1        35
 (12,20)  add        22      13        15
  (4,5)   add        27      17        10    **
  (4,5)   drop       22      13        10
   (9,10) add        32      22         0    **
   (9,10) drop       22      13         0
 (12,20)  drop        2       1        15    *
(1,2)     drop        0       0        35
 (12,20)  add        20      12        15
  (4,5)   add        25      16        10
   (9,10) add        35      25         0    **
   (9,10) drop       25      16         0    win
  (4,5)   drop       20      12        10
   (9,10) add        30      21         0    **
   (9,10) drop       20      12         0
 (12,20)  drop        0       0        15    *
```
* branch is closed because value + value left is less than the best value found so far
** branch is closed because the weight is more than the allowed weight
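The add/drop tree above can be sketched as a recursive depth-first search. It explores the same branches as the diagram, only depth-first instead of level by level. This is a sketch assuming positive weights; it closes a branch when value + value left cannot beat the best value found so far (using ≤ rather than <, which never loses the optimum).

```python
def knapsack_bb(items, capacity):
    """0/1 knapsack by branch and bound, following the add/drop tree.

    items: list of (weight, value) pairs with positive weights.
    Returns the best total value that fits within capacity.
    """
    # Highest value per weight first, as the answer recommends.
    items = sorted(items, key=lambda wv: wv[1] / wv[0], reverse=True)
    # value_left[i] = total value of items i, i+1, ... (the "value left" column).
    value_left = [0] * (len(items) + 1)
    for i in range(len(items) - 1, -1, -1):
        value_left[i] = value_left[i + 1] + items[i][1]

    best = 0

    def branch(i, weight, value):
        nonlocal best
        if weight > capacity:  # ** too heavy: close the branch
            return
        best = max(best, value)  # dropping all remaining items is always feasible
        if i == len(items) or value + value_left[i] <= best:
            return  # leaf, or * the remaining items cannot beat the best so far
        w, v = items[i]
        branch(i + 1, weight + w, value + v)  # add item i
        branch(i + 1, weight, value)  # drop item i

    branch(0, 0, 0)
    return best


print(knapsack_bb([(1, 2), (12, 20), (4, 5), (9, 10)], 16))  # 25
```

The result 25 is the "win" node: items (12, 20) and (4, 5), total weight 16.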
The benefit of this method is that it reduces computation compared to brute force. By starting with the items that have the highest value per weight, you are likely to find good solutions early and close off branches quickly, which reduces computation time.
Hopefully this helps.
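The question itself asks for the one-object-per-category variant. Below is a sketch adapting the same idea, where each tree level is a category and each branch picks one of its M objects. The bounds used here (largest possible remaining value, smallest possible remaining weight) are my adaptation of the answer's "value left" rule, not something the answer spells out; every category is assumed non-empty, all weights positive, and the function name is hypothetical.

```python
def best_pick(categories, capacity):
    """Pick one (weight, value) object per category, maximising value within capacity.

    Returns (best_value, chosen_indices), or (None, None) if no selection fits.
    """
    n = len(categories)
    # Optimistic bounds for categories i, i+1, ... (suffix sums).
    max_value_left = [0] * (n + 1)
    min_weight_left = [0] * (n + 1)
    for i in range(n - 1, -1, -1):
        max_value_left[i] = max_value_left[i + 1] + max(v for _, v in categories[i])
        min_weight_left[i] = min_weight_left[i + 1] + min(w for w, _ in categories[i])

    # Try objects with high value per weight first.
    order = [
        sorted(range(len(cat)), key=lambda j, cat=cat: cat[j][1] / cat[j][0], reverse=True)
        for cat in categories
    ]

    best_value, best_choice = None, None
    choice = []

    def branch(i, weight, value):
        nonlocal best_value, best_choice
        if i == n:  # one object chosen from every category
            if best_value is None or value > best_value:
                best_value, best_choice = value, list(choice)
            return
        for j in order[i]:
            w, v = categories[i][j]
            # Close the branch if even the lightest completion is too heavy ...
            if weight + w + min_weight_left[i + 1] > capacity:
                continue
            # ... or if even the most valuable completion cannot beat the best so far.
            if best_value is not None and value + v + max_value_left[i + 1] <= best_value:
                continue
            choice.append(j)
            branch(i + 1, weight + w, value + v)
            choice.pop()

    branch(0, 0, 0)
    return best_value, best_choice


# A made-up instance: two categories of two objects each, capacity 10.
print(best_pick([[(3, 5), (8, 10)], [(1, 2), (4, 5)]], 10))  # (12, [1, 0])
```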

Length of binary encoding in Huffman algorithm?

What is the length of the longest binary encoding that occurs when using Huffman’s algorithm with the weights 10, 10, 10, 10, 15, 15, 50?
Is there a fast way to do this, or do I have to build the tree and then calculate the average number of bits, which I guess would be total length / number of bits?
This is the tree I generated: [tree image not included]
The length of the longest binary encoding that occurs when using Huffman’s algorithm is equal to the height of the tree, which in this case is 4. So the longest code is 4 bits.
It is easy to see that if you assign 0 to the left branch and 1 to the right branch (or vice versa), the codes are:
50: 0
10: 1000
10: 1001
10: 1010
10: 1011
15: 110
15: 111
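You still have to simulate the merges, but you don't need to draw the tree: it is enough to track how many merges each symbol takes part in, which is its code length. Below is a sketch using heapq (the counter is only a tie-breaker so the heap never compares the index lists; the function name is hypothetical). With tied weights, different tie-breaking can in general give a different longest code with the same total cost; for these weights the merges are effectively forced.

```python
import heapq
from itertools import count


def huffman_code_lengths(weights):
    """Code length (in bits) of each symbol, in the same order as weights."""
    if len(weights) == 1:
        return [1]  # convention: a lone symbol still needs one bit
    tie = count()  # tie-breaker so heapq never has to compare the index lists
    heap = [(w, next(tie), [i]) for i, w in enumerate(weights)]
    heapq.heapify(heap)
    lengths = [0] * len(weights)
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)
        w2, _, b = heapq.heappop(heap)
        for i in a + b:  # every symbol under the new node moves one level deeper
            lengths[i] += 1
        heapq.heappush(heap, (w1 + w2, next(tie), a + b))
    return lengths


weights = [10, 10, 10, 10, 15, 15, 50]
lengths = huffman_code_lengths(weights)  # [4, 4, 4, 4, 3, 3, 1]
longest = max(lengths)  # 4
average = sum(w * l for w, l in zip(weights, lengths)) / sum(weights)  # 2.5
print(lengths, longest, average)
```

The average is weighted by frequency: (4 · 10 · 4 + 2 · 15 · 3 + 50 · 1) / 120 = 300 / 120 = 2.5 bits per symbol.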

Fastest gap sequence for shell sort?

According to Marcin Ciura's Optimal (best known) sequence of increments for shell sort algorithm,
the best sequence for shellsort is 1, 4, 10, 23, 57, 132, 301, 701...,
but how can I generate such a sequence?
In Marcin Ciura's paper, he said:
Both Knuth’s and Hibbard’s sequences are relatively bad, because they are defined by simple linear recurrences.
But most algorithm books I found tend to use Knuth’s sequence, k = 3k + 1, because it's easy to generate. What's your way of generating a shellsort sequence?
Ciura's paper generates the sequence empirically: he tried many combinations, and this was the one that worked best. Generating an optimal shellsort sequence has proven to be tricky, and the problem has so far resisted analysis.
The best known increment is Sedgewick's, which you can read about here (see p. 7).
If your data set has a definite upper bound in size, then you can hardcode the step sequence. You should probably only worry about generality if your data set is likely to grow without an upper bound.
The sequence shown seems to grow roughly exponentially, albeit with quirks. Only a couple of its terms are prime (23 and 701), and I don't see an obvious generation formula.
A valid question, assuming you must deal with arbitrarily large sets, is whether you need to emphasise worst-case performance, average-case performance, or almost-sorted performance. If the latter, a plain insertion sort using a binary search for the insertion step might be better than a shellsort. If you need good worst-case performance, then Sedgewick's sequence appears to be favoured. The sequence you mention is optimised for average-case performance, where the number of comparisons outweighs the number of moves.
I would not be ashamed to take the advice given in Wikipedia's Shellsort article:
With respect to the average number of comparisons, the best known gap sequences are 1, 4, 10, 23, 57, 132, 301, 701 and similar, with gaps found experimentally. Optimal gaps beyond 701 remain unknown, but good results can be obtained by extending the above sequence according to the recursive formula $h_k = \lfloor 2.25\, h_{k-1} \rfloor$.
Tokuda's sequence [1, 4, 9, 20, 46, 103, ...], defined by the simple formula $h_k = \lceil h'_k \rceil$, where $h'_k = 2.25\, h'_{k-1} + 1$, $h'_1 = 1$, can be recommended for practical applications.
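Tokuda's recurrence is easy to turn into a generator. Below is a sketch (the function name is hypothetical) that uses exact fractions so that h'_k carries no floating-point rounding before the ceiling is taken. It yields the gaps in increasing order; a shellsort would use them largest first.

```python
from fractions import Fraction
from math import ceil


def tokuda_gaps(limit):
    """Tokuda's gaps h_k = ceil(h'_k), h'_k = 2.25 * h'_{k-1} + 1, h'_1 = 1, up to limit."""
    gaps = []
    h = Fraction(1)  # exact h'_k, no rounding drift
    while ceil(h) <= limit:
        gaps.append(ceil(h))
        h = Fraction(9, 4) * h + 1
    return gaps


print(tokuda_gaps(103))  # [1, 4, 9, 20, 46, 103]
```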
Guessing from the pseudonym, it seems Marcin Ciura edited the Wikipedia article himself.
The sequence is 1, 4, 10, 23, 57, 132, 301, 701, 1750. Each number after 1750 is the previous number multiplied by 2.25 and rounded down.
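Hardcoding the list and extending it this way is straightforward. Below is a minimal sketch (names are hypothetical) that builds the gaps smaller than the array length and uses them, largest first, in ordinary gapped insertion-sort passes. Multiplying by 9 and floor-dividing by 4 gives floor(2.25 · g) exactly, with no floating point involved.

```python
CIURA_GAPS = [1, 4, 10, 23, 57, 132, 301, 701, 1750]


def ciura_gaps(n):
    """Gaps smaller than n, largest first: Ciura's list extended by floor(2.25 * previous)."""
    gaps = list(CIURA_GAPS)
    while gaps[-1] < n:
        gaps.append(gaps[-1] * 9 // 4)  # floor(2.25 * g) in exact integer arithmetic
    return [g for g in reversed(gaps) if g < n] or [1]


def shellsort(a):
    """Sort the list a in place with gapped insertion-sort passes, and return it."""
    for gap in ciura_gaps(len(a)):
        for i in range(gap, len(a)):
            x, j = a[i], i
            while j >= gap and a[j - gap] > x:
                a[j] = a[j - gap]  # shift larger elements one gap to the right
                j -= gap
            a[j] = x
    return a


print(ciura_gaps(5000)[:3])  # [3937, 1750, 701]
print(shellsort([5, 2, 9, 1, 7, 3]))  # [1, 2, 3, 5, 7, 9]
```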
Sedgewick observes that coprimality is good. This rings true: if there are separate ‘streams’ not much cross-compared until the gap is small, and one stream contains mostly smalls and one mostly larges, then the small gap might need to move elements far. Coprimality maximises cross-stream comparison.
Gonnet and Baeza-Yates advise growth by a factor of about 2.2; Tokuda by 2.25. It is well known that if there is a mathematical constant between 2⅕ and 2¼ then it must† be precisely √5 ≈ 2.236.
So start {1, 3}, and then each subsequent term is the integer closest to previous·√5 that is coprime to all previous terms except 1. This sequence can be pre-calculated and embedded in code. There follow the values up to 2⁶⁴ ≈ eighteen quintillion.
{1, 3, 7, 16, 37, 83, 187, 419, 937, 2099, 4693, 10499, 23479, 52501, 117391, 262495, 586961, 1312481, 2934793, 6562397, 14673961, 32811973, 73369801, 164059859, 366848983, 820299269, 1834244921, 4101496331, 9171224603, 20507481647, 45856123009, 102537408229, 229280615033, 512687041133, 1146403075157, 2563435205663, 5732015375783, 12817176028331, 28660076878933, 64085880141667, 143300384394667, 320429400708323, 716501921973329, 1602147003541613, 3582509609866643, 8010735017708063, 17912548049333207, 40053675088540303, 89562740246666023, 200268375442701509, 447813701233330109, 1001341877213507537, 2239068506166650537, 5006709386067537661, 11195342530833252689}
(Obviously, omit those that would overflow the relevant array index type. So if that is a signed long long, omit the last.)
On average these have ≈1.96 distinct prime factors and ≈2.07 non-distinct prime factors; 19/55 ≈ 35% are prime; and all but three are square-free (2⁴, 13·19² = 4693, 3291992692409·23³ ≈ 4.0·10¹⁶).
I would welcome formal reasoning about this sequence.
† There’s a little mischief in this “well known … must”. Choosing an irrational constant (∉ ℚ) guarantees that the closest coprime number cannot be a tie, but a rational constant with an odd denominator would achieve the same. And I like the simplicity of √5, though other possibilities include e^⅘, 11^⅓, π/√2, and √π divided by the Chow-Robbins constant. Simplicity favours √5.
I've found this sequence, which is similar to Marcin Ciura's sequence:
1, 4, 9, 23, 57, 138, 326, 749, 1695, 3785, 8359, 18298, 39744, etc.
For comparison, Ciura's sequence is:
1, 4, 10, 23, 57, 132, 301, 701, 1750
Each term is the mean of the first 2^i prime numbers, rounded down. Python code to compute these means:
```python
import numpy as np

def isprime(n):
    ''' Check if integer n is a prime '''
    n = abs(int(n))  # work with a non-negative integer
    if n < 2:  # 0 and 1 are not primes
        return False
    if n == 2:  # 2 is the only even prime number
        return True
    if not n & 1:  # all other even numbers are not primes
        return False
    # Range starts with 3 and only needs to go up to the square root
    # of n, for odd numbers only
    for x in range(3, int(n**0.5)+1, 2):
        if n % x == 0:
            return False
    return True

# To apply a function to a numpy array, one has to vectorize the function
vectorized_isprime = np.vectorize(isprime)
a = np.arange(10000000)
primes = a[vectorized_isprime(a)]
#print(primes)
for i in range(2,20):
    print(primes[0:2**i].mean())
```
The output is:
```
4.25
9.625
23.8125
57.84375
138.953125
326.1015625
749.04296875
1695.60742188
3785.09082031
8359.52587891
18298.4733887
39744.887085
85764.6216431
184011.130096
392925.738174
835387.635033
1769455.40302
3735498.24225
```
The ratio between successive terms slowly decreases from about 2.5 toward 2.
Maybe this association could improve the Shellsort in the future.
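The numpy version above calls a Python trial-division test ten million times, which is slow. Below is a sketch of the same computation with a plain Sieve of Eratosthenes (pure Python, no numpy; names are hypothetical). It takes the floor of each mean, which is how the sequence 1, 4, 9, 23, 57, 138, ... relates to the printed output.

```python
from math import isqrt


def primes_up_to(limit):
    """Sieve of Eratosthenes: list of all primes <= limit (limit >= 2)."""
    is_prime = bytearray([1]) * (limit + 1)
    is_prime[0] = is_prime[1] = 0
    for p in range(2, isqrt(limit) + 1):
        if is_prime[p]:
            # cross off p*p, p*p + p, ... in one slice assignment
            is_prime[p * p :: p] = bytes(len(range(p * p, limit + 1, p)))
    return [i for i, flag in enumerate(is_prime) if flag]


def prime_mean_gaps(count, limit=10_000_000):
    """1 followed by floor(mean of the first 2**i primes) for i = 2 .. count + 1."""
    primes = primes_up_to(limit)
    gaps = [1]
    for i in range(2, count + 2):
        first = primes[: 2 ** i]
        if len(first) < 2 ** i:
            raise ValueError("limit too small for that many primes")
        gaps.append(sum(first) // len(first))  # floor of the mean
    return gaps


print(prime_mean_gaps(4, limit=200))  # [1, 4, 9, 23, 57]
```

prime_mean_gaps(12, limit=100_000) should give the thirteen values 1, 4, 9, ..., 39744 listed above, since there are 9592 primes below 100,000.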
I discussed this question here yesterday, including the gap sequences I have found to work best for a specific (low) n.
In the middle, I wrote:
A nasty side-effect of shellsort is that when using a set of random combinations of n entries (to save processing/evaluation time) to test gaps, you may end up with either the best gaps for n entries or the best gaps for your set of combinations, most likely the latter.
The problem lies in testing the proposed gaps such that valid conclusions can be drawn. Obviously, testing the gaps against all n! orderings of n unique values is infeasible. Testing in this manner for n=16, for example, means that 20,922,789,888,000 different orderings of n values must be sorted to determine the exact average, worst and reverse-sorted cases, just to test one set of gaps, and that set might not be the best. 2^(16-2) sets of gaps are possible for n=16, the first being {1} and the last {15,14,13,12,11,10,9,8,7,6,5,4,3,2,1}.
To illustrate how using random orderings might give incorrect results, take n=3, which has six orderings: 012, 021, 102, 120, 201 and 210. You produce a set of two random orderings to test the two possible gap sets, {1} and {2,1}. Assume that these turn out to be 021 and 201. For {1}, 021 can be sorted with three comparisons (02, 21 and 01) and 201 with three (20, 21, 01), giving a total of six comparisons; divide by two and voilà, an average of 3 and a worst case of 3. Using {2,1} gives four comparisons (01, 02, 21 and 01) for 021 and three (21, 10 and 12) for 201: seven comparisons, with a worst case of 4 and an average of 3.5. The actual average and worst case for {1} are 8/3 and 3, respectively; for {2,1} they are 10/3 and 4. The averages were too high in both cases, while the worst cases were correct. Had 012 been one of the samples, {1} would have given an average of 2.5, which is too low.
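For very small n the exact figures can be obtained by brute force over all n! orderings, which is a handy way to check the numbers in the example above. Below is a sketch (names are hypothetical) that counts one comparison per key comparison inside each gapped insertion pass, the same convention the example uses.

```python
from fractions import Fraction
from itertools import permutations


def shellsort_comparisons(seq, gaps):
    """Shellsort a copy of seq with the given gaps (largest first); count comparisons."""
    a = list(seq)
    comparisons = 0
    for gap in gaps:
        for i in range(gap, len(a)):
            x, j = a[i], i
            while j >= gap:
                comparisons += 1
                if a[j - gap] <= x:
                    break
                a[j] = a[j - gap]  # shift the larger element one gap to the right
                j -= gap
            a[j] = x
    return comparisons


def exact_stats(n, gaps):
    """Exact average and worst-case comparison counts over all n! orderings."""
    counts = [shellsort_comparisons(p, gaps) for p in permutations(range(n))]
    return Fraction(sum(counts), len(counts)), max(counts)


print(exact_stats(3, [1]))     # (Fraction(8, 3), 3)
print(exact_stats(3, [2, 1]))  # (Fraction(10, 3), 4)
```

The n! growth is exactly the problem described above: this is instant for n = 3 but already hopeless well before n = 16.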
Now extend this to finding a set of random sequences for n=16 such that no set of gaps tested will be favored in comparison with the others and the result close (or equal) to the true values, all the while keeping processing to a minimum. Can it be done? Possibly. After all, everything is possible - but is it probable? I think that for this problem random is the wrong approach. Selecting the sequences according to some system may be less bad and might even be good.
More information regarding jdaw1's post:
Gonnet and Baeza-Yates advise growth by a factor of about 2.2; Tokuda by 2.25. It is well known that if there is a mathematical constant between 2⅕ and 2¼ then it must† be precisely √5 ≈ 2.236.
Since √5 · √5 = 5, I think every other term should grow by a factor of five. So the first term is 1 (plain insertion sort), the second is 3, and every second term after that is five times the one two places before it. There follow the values up to 2⁶⁴ ≈ eighteen quintillion.
{1, 3,, 15,, 75,, 375,, 1 875,, 9 375,, 46 875,, 234 375,, 1 171 875,, 5 859 375,, 29 296 875,, 146 484 375,, 732 421 875,, 3 662 109 375,, 18 310 546 875,, 91 552 734 375,, 457 763 671 875,, 2 288 818 359 375,, 11 444 091 796 875,, 57 220 458 984 375,, 286 102 294 921 875,, 1 430 511 474 609 375,, 7 152 557 373 046 875,, 35 762 786 865 234 375,, 178 813 934 326 171 875,, 894 069 671 630 859 375,, 4 470 348 358 154 296 875,}
The values in the gaps can be calculated by multiplying the term before by √5 and rounding to the nearest whole number (using 2.2360679775 * 5 ^ n * 3), giving the resulting array:
{1, 3, 7, 15, 34, 75, 168, 375, 839, 1 875, 4 193, 9 375, 20 963, 46 875, 104 816, 234 375, 524 078, 1 171 875, 2 620 392, 5 859 375, 13 101 961, 29 296 875, 65 509 804, 146 484 375, 327 549 020, 732 421 875, 1 637 745 101, 3 662 109 375, 8 188 725 504, 18 310 546 875, 40 943 627 518, 91 552 734 375, 204 718 137 589, 457 763 671 875, 1 023 590 687 943, 2 288 818 359 375, 5 117 953 439 713, 11 444 091 796 875, 25 589 767 198 563, 57 220 458 984 375, 127 948 835 992 813, 286 102 294 921 875, 639 744 179 964 066, 1 430 511 474 609 375, 3 198 720 899 820 328, 7 152 557 373 046 875, 15 993 604 499 101 639, 35 762 786 865 234 375, 79 968 022 495 508 194, 178 813 934 326 171 875, 399 840 112 477 540 970, 894 069 671 630 859 375, 1 999 200 562 387 704 849, 4 470 348 358 154 296 875, 9 996 002 811 938 524 246}
(Obviously, omit those that would overflow the relevant array index type. So if that is a signed long long, omit the last.)
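The interleaved list can also be generated with exact integer arithmetic instead of a 10-digit approximation of √5. Below is a sketch (names are hypothetical): x·√5 is rounded by comparing squares, so the result does not depend on floating-point precision. Because 2.2360679775 is slightly larger than √5, the larger in-between values in the list above may differ from what this produces.

```python
from math import isqrt


def round_times_sqrt5(x):
    """Nearest integer to x * sqrt(5), computed exactly for an integer x >= 1."""
    f = isqrt(5 * x * x)  # floor(x * sqrt(5))
    return f if 20 * x * x < (2 * f + 1) ** 2 else f + 1


def five_power_gaps(limit):
    """1, then 3 * 5**n interleaved with round(3 * 5**n * sqrt(5)), up to limit."""
    gaps = [1]
    base = 3
    while base <= limit:
        gaps.append(base)
        between = round_times_sqrt5(base)
        if between <= limit:
            gaps.append(between)
        base *= 5
    return gaps


print(five_power_gaps(20963))
# [1, 3, 7, 15, 34, 75, 168, 375, 839, 1875, 4193, 9375, 20963]
```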
