Generating rows of combinations - algorithm

I'm not sure how to formulate this question, but I hope I can explain what I want to achieve.
I have a set of characters [A, B, C].
I want to generate the minimal number of rows of length N needed to contain all possible combinations of [A, B, C].
Example: when N = 4, it generates something like this, with 9 rows of length N (1 column = 1 row):
AAABBBCCC
ABCCABBCA
BACACBCBA
ABCABCABC
For example, the first row [A, A, B, A] contains the following combinations (1 column = 1 combination). Notice how the combinations can wrap around the row.
A, A, A, B
A, A, B, A
A, B, A, A
B, A, A, A
However, a combination is allowed to appear more than once among the generated rows, as long as such repeats are kept to a minimum.
How should I go about this programmatically?

Are you counting in base 3, with the digits 1, 2, 3 instead of 0, 1, 2?
If you don't know how to count in base three, let's do it now.
0, 1, 2, 10, 11, 12, 20, 21, 22, 100
If you want to know the number of rows with N digits, the answer is 3^N.
If you want to know which sequence of a given length sits at a given position in the sorted list (the first position is zero), you can use the following function:
def row(k, N):
    d = []
    assert(k < 3**N)
    for _ in range(N):
        k,r = divmod(k, 3)
        d.append(r+1)
    return ''.join(str(di) for di in d[::-1])
An easy one to verify: row(1, 3)='112' is the second term among the rows of length 3.
One that is not so easy to verify: the billionth term of length 25 is given by row(10**9-1, 25)='1111113231311311132121111'.
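As a cross-check of the counting argument, here is a minimal sketch that compares row with itertools.product, which enumerates the same strings in the same sorted order. The names via_product and via_row are only illustrative, and row is repeated so the snippet runs on its own.

```python
from itertools import product

def row(k, N):
    """k-th (0-based) length-N string over the digits 1, 2, 3, in sorted order."""
    assert 0 <= k < 3**N
    d = []
    for _ in range(N):
        k, r = divmod(k, 3)
        d.append(r + 1)
    return ''.join(str(di) for di in d[::-1])

N = 3
# All 3**N strings, generated directly in lexicographic order.
via_product = [''.join(t) for t in product('123', repeat=N)]
# The same list, built one index at a time with row.
via_row = [row(k, N) for k in range(3**N)]

print(len(via_product))        # number of strings of length N
print(via_product == via_row)  # whether both enumerations agree
```

If only a few rows are needed, row avoids generating the whole list; product is handier when you want to iterate over all of them.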
Using generic symbols
If you want to return a list of arbitrary objects (not necessarily three of them), you only need to change how the output is mapped.
def row(k, symbols, N):
    d = []
    B = len(symbols)
    assert(k < B**N)
    for _ in range(N):
        k,r = divmod(k, B)
        d.append(symbols[r])
    return d[::-1]
Using it
print(row(1, ['red', 'green', 'blue'], 6))
print(row(100, ['red', 'green', 'blue'], 6))
> ['red', 'red', 'red', 'red', 'red', 'green']
> ['red', 'green', 'red', 'blue', 'red', 'green']

Logistic Regression: How to maximize function parameters?

I have a Python function MyFunction (a, b, c, d, e, f, g, h, i, j) which takes several parameters, then processes some real data and returns a numerical value x. To be more specific, the function processes a data table with 150000 rows and counts how many rows fulfill certain conditions based on the inputs.
For example, MyFunction (1, 1, 1, 1, 1, 1, 2, 1, 1, 2) returns 79107, MyFunction (1, 3, -1.5545, 7, 3, 1, 3, 15, 1.785, -2.5454) returns 68758, and so on.
How can I find which combination of those 10 parameters a, b, c, d, e, f, g, h, i, j gives the maximum possible value of x? The parameters can be any numbers (float or integer) in any range. x is always in the range 0-150000.
EDIT: Here's the code with data I use if somebody wants to take a look. Colab
I solved my case using the scipy.optimize.minimize function. The calculation was very fast and took less than a minute; I'm very surprised at how efficient it is. I had to try different calculation methods, though: only method='Powell' worked like a charm in my case.
from scipy.optimize import minimize

def MyFunction(params):
    # minimize passes all parameters as one array, so unpack them here
    a, b, c, d, e, f, g, h, i, j = params
    # do something that computes x
    return -x  # minimize looks for a minimum, so negating x maximizes it

StartValues = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
res = minimize(MyFunction, StartValues, method='Powell', tol=0.01)
print(res)

Biggest non-contiguous submatrix with all ones

I'm tackling the problem of finding a non-contiguous submatrix of a boolean matrix with maximum size such that all of its cells are ones.
As an example, consider the following matrix:
M = [[1, 0, 1, 1],
     [0, 0, 1, 0],
     [1, 1, 1, 1]]
A non-contiguous submatrix of M is specified as a set of rows R and a set of columns C. The submatrix is formed by all the cells that are in some row in R and in some column in C (the intersections of R and C). Note that a non-contiguous submatrix is a generalization of a submatrix, so any (contiguous) submatrix is also a non-contiguous submatrix.
There is one maximum non-contiguous submatrix of M that has a one in all of its cells. This submatrix is defined by R={1, 3} and C={1, 3, 4}, which yields:
M[1, 3][1, 3, 4] = [[1, 1, 1],
                    [1, 1, 1]]
I'm having difficulties finding existing literature about this problem. I'm looking for efficient algorithms that don't necessarily need to be optimal (so I can relax the problem to finding maximal size submatrices). Of course, this can be modeled with integer linear programming, but I want to consider other alternatives.
In particular, I want to know if this problem is already known and covered by the literature, whether my definition of a non-contiguous submatrix makes sense, and whether a different name for it already exists.
Thanks!
Since per your response to Josef Wittmann's comment you want to find the Rectangle Covering Number, my suggestion would be to construct the Lovász–Saks graph and apply a graph coloring algorithm.
The Lovász–Saks graph has a vertex for each 1 entry in the matrix and an edge between each pair of vertices (in different rows and different columns) whose spanned 2x2 submatrix contains a zero. In your example,
[[1, 0, 1, 1],
 [0, 0, 1, 0],
 [1, 1, 1, 1]]
we can label the 1s with letters:
[[a, 0, b, c],
 [0, 0, d, 0],
 [e, f, g, h]]
and then get edges
a--d  a--f  b--f  c--d  c--f  d--e  d--f  d--h
a b   a 0   0 b   b c   0 c   0 d   0 d   d 0
0 d   e f   f g   d 0   f h   e g   f g   g h
I think an optimal coloring is
{a, b, c, e, g, h} -> 1
{d} -> 2
{f} -> 3.
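To make the construction concrete, here is a minimal Python sketch that builds the graph as described and colours it greedily. The greedy largest-degree-first colouring is my own stand-in for "a graph coloring algorithm"; it only gives an upper bound on the number of colours and is not guaranteed to be optimal. The function names are illustrative.

```python
from itertools import combinations

def lovasz_saks_graph(M):
    """Vertices are the (row, col) positions of the 1 entries. Two vertices in
    different rows and columns are adjacent when the 2x2 submatrix they span
    contains a zero."""
    ones = [(r, c) for r, line in enumerate(M) for c, v in enumerate(line) if v == 1]
    adj = {p: set() for p in ones}
    for (r1, c1), (r2, c2) in combinations(ones, 2):
        if r1 != r2 and c1 != c2 and (M[r1][c2] == 0 or M[r2][c1] == 0):
            adj[(r1, c1)].add((r2, c2))
            adj[(r2, c2)].add((r1, c1))
    return adj

def greedy_coloring(adj):
    """Heuristic (an assumption, not part of the answer): visit vertices by
    decreasing degree and give each the smallest colour unused by neighbours."""
    color = {}
    for v in sorted(adj, key=lambda p: (-len(adj[p]), p)):
        used = {color[u] for u in adj[v] if u in color}
        color[v] = next(c for c in range(1, len(adj) + 1) if c not in used)
    return color

M = [[1, 0, 1, 1],
     [0, 0, 1, 0],
     [1, 1, 1, 1]]
adj = lovasz_saks_graph(M)
coloring = greedy_coloring(adj)
print(sum(len(n) for n in adj.values()) // 2)  # number of edges
print(max(coloring.values()))                  # number of colours used
```

Each colour class is a set of pairwise compatible 1 entries, so the rows and columns it touches span an all-ones rectangle.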

Lua - Choose a random value from a range (or table) excluding the values of a (or another) table

A range, 1, 2, 3, 4, 5, 6, 7, 8 (it can populate a Lua table if it makes it easier)
table = {1, 4, 3}
The possible random choice should be among 2, 5, 6, 7, 8.
In Python I have used this to get it:
possibleChoices = random.choice([i for i in range(1, 9) if i not in table])
Any ideas how to achieve the same in Lua?
Lua has a very minimal library, so you will have to write your own functions to do some tasks that are automatically provided in many other languages.
A good way to go about this is to write small functions that solve part of your problem, and to incorporate those into a final solution. Here it would be nice to have a range of numbers, with certain of those numbers excluded, from which to randomly draw a number. A range can be obtained by using a range function:
-- Returns a sequence containing the range [a, b].
function range (a, b)
  local r = {}
  for i = a, b do
    r[#r + 1] = i
  end
  return r
end
To get a sequence with some numbers excluded, a seq_diff function can be written; this version makes use of a member function:
-- Returns true if x is a value in the table t.
function member (x, t)
  for k, v in pairs(t) do
    if v == x then
      return true
    end
  end
  return false
end

-- Returns the sequence u - v.
function seq_diff (u, v)
  local result = {}
  for _, x in ipairs(u) do
    if not member(x, v) then
      result[#result + 1] = x
    end
  end
  return result
end
Finally these smaller functions can be combined into a solution:
-- Returns a random number from the range [a, b],
-- excluding numbers in the sequence seq.
function random_from_diff_range (a, b, seq)
  local selections = seq_diff(range(a, b), seq)
  return selections[math.random(#selections)]
end
Sample interaction:
> for i = 1, 20 do
>> print(random_from_diff_range(1, 8, {1, 4, 3}))
>> end
8
6
8
5
5
8
6
7
8
5
2
5
5
7
2
8
7
2
6
5

How many times a number appears as a leaf node?

Suppose you have an array of n elements
A = {1,2,3,4,5}
A total of 5! binary search trees are possible (not necessarily distinct), one per insertion order. My question is: in how many of these trees does 1 appear as a leaf node, in how many does 2 appear as a leaf node, and so on?
What I have tried:
For A = {1,2,3} I have found:
2 appears 6/3 = 2 times
1 appears 2+1 = 3 times
3 appears 2+1 = 3 times
Can I generalise that and say that,
if A = {1,2,3,4}
2 = 24/4 = 6 times
3 = 24/4 = 6 times
1 = 6+1 = 7 times
4 = 6+1 = 7 times
We can generalize, but not in that way.
You can permute the array and build the BST for every permutation. A brute-force approach that returns the answer in a map/dictionary shouldn't be that hard. First write a function that, given one of the permuted arrays, finds all leaves. It takes the first element as the root, sends all elements less than the root to the left and all greater ones to the right, and calls itself recursively on both parts. It then returns the combined leaves of the two parts.
In the end, combine the values for all possible permutations.
A possible approach in Python:
from itertools import permutations

def func(arr):
    # Returns the set of leaves of the BST built by inserting arr in order.
    if not arr: return set()
    if len(arr)==1: return {arr[0]}
    ans = set()
    left = func([v for v in arr[1:] if v<arr[0]])
    right = func([v for v in arr[1:] if v>=arr[0]])
    ans.update(left)
    ans.update(right)
    return ans

arr = [1,2,3,4]
ans = {i:0 for i in arr}
for a in permutations(arr):
    dic = func(a)
    print(a,":",dic)
    for k in dic:
        ans[k]+=1
print(ans)
For [1,2,3] it outputs:
(1, 2, 3) : {3}
(1, 3, 2) : {2}
(2, 1, 3) : {1, 3}
(2, 3, 1) : {1, 3}
(3, 1, 2) : {2}
(3, 2, 1) : {1}
{1: 3, 2: 2, 3: 3}
For [1,2,3,4], showing only the last line (i.e. the answer):
{1: 12, 2: 8, 3: 8, 4: 12}
For [1,2,3,4,5], it is:
{1: 60, 2: 40, 3: 40, 4: 40, 5: 60}
Can you see the pattern? One last example: for [1,2,3,4,5,6] it is:
{1: 360, 2: 240, 3: 240, 4: 240, 5: 240, 6: 360}
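One way to read these outputs: the smallest and largest values are leaves in n!/2 of the permutations, and every other value in n!/3. That fits the observation that a value ends up as a leaf exactly when it is inserted after each of its immediate neighbours in sorted order. The sketch below only checks this reading against brute force for small n (n >= 2); the function names are illustrative, and the closed form is stated here as a conjecture to test, not something the code proves.

```python
from itertools import permutations
from math import factorial

def leaves(seq):
    """Leaves of the BST built by inserting seq in order (distinct values)."""
    if len(seq) <= 1:
        return set(seq)
    root, rest = seq[0], seq[1:]
    return (leaves([v for v in rest if v < root]) |
            leaves([v for v in rest if v > root]))

def brute_force(n):
    """Count, over all n! insertion orders, how often each value is a leaf."""
    counts = {k: 0 for k in range(1, n + 1)}
    for p in permutations(range(1, n + 1)):
        for k in leaves(list(p)):
            counts[k] += 1
    return counts

def conjectured(n):
    """Conjectured closed form for n >= 2: n!/2 for the two ends, n!/3 otherwise."""
    f = factorial(n)
    return {k: f // 2 if k in (1, n) else f // 3 for k in range(1, n + 1)}

for n in range(2, 7):
    print(n, brute_force(n) == conjectured(n))
```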

Find objects with the most correspondences to a reference object

Reference object: { 1, 5, 6, 9, 10, 11 }
Other objects:
A { 2, 4, 5, 6, 8, 10, 11 }
B { 5, 7, 9, 10 }
C { 2, 5, 6, 7, 9, 12 }
D { 1, 3, 4, 5, 6, 8, 9, 10 }
E { 6, 8 }
F { 1, 2, 3, 4, 7, 8, 9, 13, 15 }
... { ... }
Difficulty: It should be faster than O(n*m)
Result should be:
Array
(
[D] => 5
[A] => 4
[C] => 3
[B] => 3
[F] => 2
[E] => 1
)
Slow solution:
ref = array(1, 5, 6, 9, 10, 11);
foreach (A, B, C, D,.. AS row)
{
    foreach (row AS col)
    {
        if ( exist(col, ref) )
        {
            result[row] += 1;
        }
    }
}
sort (result)
.. this is a solution, but it's far too slow.
Is there another way, like pattern recognition, hopefully in O(log n)?
It is possible to save each object in another notation, for example:
ref = "15691011"
A = "245681011"
But I don't know if this helps.
If all the data in your objects is sorted, you can speed this up by stepping through each row and the reference together instead of looking up every value individually.
foreach (A, B, C, D,.. AS row)
{
    for (i = 0, j = 0; i < row.length && j < ref.length; )
    {
        if (row[i] < ref[j]) i++;
        elseif (row[i] > ref[j]) j++;
        else {
            result[row] += 1;
            i++; j++;
        }
    }
}
In this case you pass over your reference only once for each row, but the algorithm needs all your data to be sorted already.
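For reference, the same two-pointer walk as a runnable Python sketch, using the data from the question. The function name count_common_sorted and the objects dictionary are illustrative; the lists are assumed to be sorted and free of duplicates.

```python
def count_common_sorted(row, ref):
    """Count values present in both sorted, duplicate-free lists."""
    i = j = count = 0
    while i < len(row) and j < len(ref):
        if row[i] < ref[j]:
            i += 1          # row value too small, advance in row
        elif row[i] > ref[j]:
            j += 1          # ref value too small, advance in ref
        else:
            count += 1      # match, advance in both
            i += 1
            j += 1
    return count

ref = [1, 5, 6, 9, 10, 11]
objects = {
    'A': [2, 4, 5, 6, 8, 10, 11],
    'B': [5, 7, 9, 10],
    'C': [2, 5, 6, 7, 9, 12],
    'D': [1, 3, 4, 5, 6, 8, 9, 10],
    'E': [6, 8],
    'F': [1, 2, 3, 4, 7, 8, 9, 13, 15],
}
result = {name: count_common_sorted(row, ref) for name, row in objects.items()}
# Highest number of matches first.
ranked = sorted(result.items(), key=lambda kv: -kv[1])
print(ranked)
```

Each row costs at most len(row) + len(ref) steps, so the reference is still re-read once per row.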
You could start with the largest sequence (it has the largest chance of containing many references).
When you find, for example, 4 refs, you can safely skip all sequences with fewer than 4 elements.
Another early exit is to abort checking a sequence when it can no longer surpass the current max. For example: your current max is 6 elements. You are processing a list of size 7, but the first two elements are not in the reference. The highest count reachable for this list is 5, which is lower than 6, so abort the sequence.
The problem in both cases is that you cannot construct a complete array of results.
Assumptions:
There are m lists apart from the reference object.
The lists are sorted initially.
There is no repetition of elements in any list.
Scan all the lists and find the maximum element across all of them. You only need to check the last element of each list. Call it MAX.
For each of the m + 1 lists, make a corresponding Boolean array with MAX elements and initialize its values to zero.
Scan all the lists and set the entries at the corresponding indices of their Boolean arrays to 1.
For example, the corresponding array for the example reference object { 1, 5, 6, 9, 10, 11 } shall look like:
{1,0,0,0,1,1,0,0,1,1,1,0,0,...}
Now for every pair-wise combination, you can just check the corresponding indices and increment the count if both are 1.
The above algorithm can be done in linear time complexity with regards to the total number of elements in the data.
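A minimal Python sketch of the Boolean-array idea, using an arbitrary-precision integer as the bit array. That representation is a swapped-in convenience, not something the steps above prescribe, and the names to_bits, common_count and objects are illustrative.

```python
def to_bits(values):
    """Integer whose bit v is set for every value v (stands in for the Boolean array)."""
    bits = 0
    for v in values:
        bits |= 1 << v
    return bits

def common_count(bits_a, bits_b):
    """Number of positions where both bit arrays hold a 1."""
    return bin(bits_a & bits_b).count('1')

ref = [1, 5, 6, 9, 10, 11]
objects = {
    'A': [2, 4, 5, 6, 8, 10, 11],
    'B': [5, 7, 9, 10],
    'C': [2, 5, 6, 7, 9, 12],
    'D': [1, 3, 4, 5, 6, 8, 9, 10],
    'E': [6, 8],
    'F': [1, 2, 3, 4, 7, 8, 9, 13, 15],
}
ref_bits = to_bits(ref)
counts = {name: common_count(to_bits(vals), ref_bits)
          for name, vals in objects.items()}
print(counts)
```

Unlike the sorted walk, this does not need the lists to be sorted, but the size of each bit array grows with MAX.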
You could use a technique from search engines (an inverted index). For each number, keep a sorted list of the objects that contain it. In your case, for the numbers of the reference object:
1 -> {D, F}
5 -> {A, B, C, D}
6 -> {A, C, D, E}
9 -> {B, C, D, F}
10 -> {A, B, D}
11 -> {A}
By merging these lists you can count how many values each object shares with the reference object:
A -> 4
B -> 3
C -> 3
D -> 5
E -> 1
F -> 2
After sorting, you get the needed result. If you only need the top k elements, you should use a priority queue.
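A minimal Python sketch of this approach, assuming the objects are available as a dictionary of lists (the variable names are illustrative). The index is built once; each query then only touches the lists belonging to the reference values, and heapq.nlargest plays the role of the priority queue.

```python
from collections import Counter, defaultdict
import heapq

objects = {
    'A': [2, 4, 5, 6, 8, 10, 11],
    'B': [5, 7, 9, 10],
    'C': [2, 5, 6, 7, 9, 12],
    'D': [1, 3, 4, 5, 6, 8, 9, 10],
    'E': [6, 8],
    'F': [1, 2, 3, 4, 7, 8, 9, 13, 15],
}
ref = [1, 5, 6, 9, 10, 11]

# Build the inverted index once: value -> names of the objects containing it.
index = defaultdict(list)
for name, values in objects.items():
    for v in values:
        index[v].append(name)

# Query: visit only the lists of the reference values and tally the names.
counts = Counter()
for v in ref:
    counts.update(index.get(v, []))

# Top k without sorting everything.
top3 = heapq.nlargest(3, counts.items(), key=lambda kv: kv[1])
print(dict(counts))
print(top3)
```

The query cost depends on the total length of the visited lists, not on the total size of all objects, which is where the saving over the nested loop comes from.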
