How many times a number appears as a leaf node? - algorithm

Suppose you have an array of n elements
A = {1,2,3,4,5}
A total of 5! binary search trees are possible (not necessarily distinct). Now my question is: in how many of these trees does 1 appear as a leaf node, in how many does 2 appear as a leaf node, and so on?
What I have tried:
I've seen that for A = {1,2,3}:
2 appears 6/3 = 2 times
1 appears 2+1 = 3 times
3 appears 2+1 = 3 times
Can I generalise that and say that,
if A= {1,2,3,4}
2 = 24/4 = 6 times
3 = 24/4 = 6 times
1 = 6+1 = 7 times
4 = 6+1 = 7 times

We can generalize, but not in that way.
You can try to permute the array and produce all possible BSTs. A brute-force approach that returns the answer in a map/dictionary shouldn't be that hard. First write a function that, given one of the permuted arrays, finds all leaves of its BST. It takes the first element as the root, sends all elements less than the root to the left and all greater ones to the right, and calls itself recursively on both parts. It then returns the union of the two results.
In the end, combine values for all possible permutations.
A possible approach in Python:
from itertools import permutations

def func(arr):
    # Return the set of leaves of the BST built by inserting arr in order
    if not arr:
        return set()
    if len(arr) == 1:
        return {arr[0]}
    ans = set()
    left = func([v for v in arr[1:] if v < arr[0]])
    right = func([v for v in arr[1:] if v >= arr[0]])
    ans.update(left)
    ans.update(right)
    return ans

arr = [1,2,3,4]
ans = {i: 0 for i in arr}
for a in permutations(arr):
    dic = func(a)
    print(a, ":", dic)
    for k in dic:
        ans[k] += 1
print(ans)
For [1,2,3] it outputs:
(1, 2, 3) : {3}
(1, 3, 2) : {2}
(2, 1, 3) : {1, 3}
(2, 3, 1) : {1, 3}
(3, 1, 2) : {2}
(3, 2, 1) : {1}
{1: 3, 2: 2, 3: 3}
For [1,2,3,4], only the last line, i.e. the answer, is:
{1: 12, 2: 8, 3: 8, 4: 12}
For [1,2,3,4,5], it is:
{1: 60, 2: 40, 3: 40, 4: 40, 5: 60}
Can you see the pattern? Well, one last example. For [1,2,3,4,5,6] it is:
{1: 360, 2: 240, 3: 240, 4: 240, 5: 240, 6: 360}
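The numbers above fit a simple closed form for n >= 2: the smallest and largest values are leaves in n!/2 of the trees, and every other value in n!/3. The reason is that two adjacent values k and k+1 always end up in an ancestor/descendant relation, so k is a leaf exactly when its existing neighbours k-1 and k+1 were both inserted before it (probability 1/2 at the ends, 1/3 in the middle). The sketch below checks that formula against a compact version of the brute force; `leaves`, `leaf_counts` and `closed_form` are hypothetical names.

```python
from itertools import permutations
from math import factorial


def leaves(arr):
    """Leaves of the BST built by inserting arr[0], arr[1], ... in that order."""
    if not arr:
        return set()
    if len(arr) == 1:
        return {arr[0]}
    root = arr[0]
    smaller = [v for v in arr[1:] if v < root]
    larger = [v for v in arr[1:] if v > root]
    return leaves(smaller) | leaves(larger)


def leaf_counts(n):
    """Brute force: in how many insertion orders of 1..n is each value a leaf?"""
    counts = {k: 0 for k in range(1, n + 1)}
    for perm in permutations(range(1, n + 1)):
        for k in leaves(perm):
            counts[k] += 1
    return counts


def closed_form(n):
    """Closed form for n >= 2: ends are leaves n!/2 times, interior values n!/3 times."""
    return {k: factorial(n) // (2 if k in (1, n) else 3) for k in range(1, n + 1)}


for n in range(2, 7):
    assert leaf_counts(n) == closed_form(n)
print(closed_form(6))  # {1: 360, 2: 240, 3: 240, 4: 240, 5: 240, 6: 360}
```

The brute force is only practical for small n, since it visits all n! permutations; the closed form needs no enumeration at all.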


Detect outlier in repeating sequence

I have a repeating sequence of, say, 0~9 (but it may start and stop at any of these numbers), e.g.:
3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2
And it has outliers at random locations, including the first and last ones, e.g.:
9,4,5,6,7,8,9,0,1,2,3,4,8,6,7,0,9,0,1,2,3,4,1,6,7,8,9,0,1,6
I need to find and correct the outliers: in the above example, I need to correct the first "9" into "3", the "8" into "5", etc.
What I came up with is to construct an outlier-free sequence of the desired length, but since I don't know which number the sequence starts with, I'd have to construct 10 sequences, starting from "0", "1", "2" ... "9" respectively. Then I can compare these 10 sequences with the given sequence and pick the one that matches it the most. However, this is very inefficient when the repeating pattern gets large (say, if the repeating pattern is 0~99, I'd need to create 100 sequences to compare).
Assuming there won't be consecutive outliers, is there a way to find & correct these outliers efficiently?
edit: added some explanation and added the algorithm tag. Hopefully it is more appropriate now.
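For reference, the brute-force idea from the question (build one candidate sequence per possible starting value and keep the one that agrees with the input most often) can be sketched as below. `correct_brute_force` is a hypothetical name, and ties are silently resolved in favour of the smallest starting value. It needs O(len(numbers) × mod) comparisons, which is exactly the cost the question wants to avoid.

```python
def correct_brute_force(numbers, mod):
    def agreement(start):
        # How many positions already match the clean sequence start, start+1, ...
        return sum(x == (start + i) % mod for i, x in enumerate(numbers))

    # max() returns the first best start, so ties go to the smallest value
    best = max(range(mod), key=agreement)
    return [(best + i) % mod for i in range(len(numbers))]


noisy = [9,4,5,6,7,8,9,0,1,2,3,4,8,6,7,0,9,0,1,2,3,4,1,6,7,8,9,0,1,6]
print(correct_brute_force(noisy, 10))
```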
I'm going to propose a variation of @trincot's fine answer. Like that one, it doesn't care how many outliers there may be in a row, but unlike that one, it also doesn't care how many non-outliers there are in a row.
The base idea is just to let each sequence element "vote" on what the first sequence element "should be". Whichever gets the most votes wins. By construction, this maximizes the number of elements left unchanged: after the 1-liner loop ends, votes[i] is the number of elements left unchanged if i is picked as the starting point.
def correct(numbers, mod=None):
    # this part copied from @trincot's program
    if mod is None:  # if argument is not provided:
        # Make a guess what the range is of the values
        mod = max(numbers) + 1
    votes = [0] * mod
    for i, x in enumerate(numbers):
        # which initial number would make x correct?
        votes[(x - i) % mod] += 1
    winning_count = max(votes)
    winning_numbers = [i for i, v in enumerate(votes)
                       if v == winning_count]
    if len(winning_numbers) > 1:
        raise ValueError("ambiguous!", winning_numbers)
    winning_number = winning_numbers[0]
    for i in range(len(numbers)):
        numbers[i] = (winning_number + i) % mod
    return numbers
Then, e.g.,
>>> correct([9,4,5,6,7,8,9,0,1,2,3,4,8,6,7,0,9,0,1,2,3,4,1,6,7,8,9,0,1,6])
[3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2]
but
>>> correct([1, 5, 3, 7, 5, 9])
...
ValueError: ('ambiguous!', [1, 4])
That is, it's impossible to guess whether you want [1, 2, 3, 4, 5, 6] or [4, 5, 6, 7, 8, 9]. Both have 3 numbers "right", even though neither of them has two adjacent outliers.
I would do a first scan of the list to find the longest sublist in the input that maintains the right order. We will then assume that those values are all correct, and calculate backwards what the first value would have to be to produce those values in that sublist.
Here is how that would look in Python:
def correct(numbers, mod=None):
    if mod is None:  # if argument is not provided:
        # Make a guess what the range is of the values
        mod = max(numbers) + 1
    # Find the longest slice in the list that maintains order
    start = 0
    longeststart = 0
    longest = 1
    expected = -1
    for last in range(len(numbers)):
        if numbers[last] != expected:
            start = last
        elif last - start >= longest:
            longest = last - start + 1
            longeststart = start
        expected = (numbers[last] + 1) % mod
    # Get from that longest slice what the starting value should be
    val = (numbers[longeststart] - longeststart) % mod
    # Repopulate the list starting from that value
    for i in range(len(numbers)):
        numbers[i] = val
        val = (val + 1) % mod

# demo use
numbers = [9,4,5,6,7,8,9,0,1,2,3,4,8,6,7,0,9,0,1,2,3,4,1,6,7,8,9,0,1,6]
correct(numbers, 10)  # for 0..9 provide 10 as argument, ...etc
print(numbers)
The advantage of this method is that it would even give a good result if there were errors in two consecutive values, provided, of course, that there are enough correct values in the list.
Still, this runs in linear time.
Here is another way using groupby and count from Python's itertools module:
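from itertools import count, groupby

def correct(lst, mod=10):
    # mod: size of the repeating range (10 for 0~9)
    # Split into runs of consecutive values: inside a run, value - index is constant
    groupped = [list(v) for _, v in groupby(lst, lambda a, b=count(): a - next(b))]
    # Check if all groups are singletons
    if all(len(k) == 1 for k in groupped):
        raise ValueError('All groups are singletons!')
    # A single run needs no correction (and would leave v undefined below)
    if len(groupped) == 1:
        yield from groupped[0]
        return
    for k, v in zip(groupped, groupped[1:]):
        if len(k) < 2:
            # A singleton is treated as an outlier: it should be one less than
            # the first value of the next group (wrapping 0 back to mod - 1)
            yield (v[0] - 1) % mod
        else:
            yield from k
    # check last element of the groupped list
    if len(v) < 2:
        yield (k[-1] + 1) % mod
    else:
        yield from v

lst = "9,4,5,6,7,8,9,0,1,2,3,4,8,6,7,0,9,0,1,2,3,4,1,6,7,8,9,0,1,6"
lst = [int(k) for k in lst.split(',')]
out = list(correct(lst))
print(out)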
Output:
[3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2]
Edit:
For the case of [1, 5, 3, 7, 5, 9], this solution cannot return an accurate result, because there is no way to tell which values you want to modify. That's why it checks for this case and raises a ValueError when all groups are singletons.
Like this?
numbers = [9,4,5,6,7,8,9,0,1,2,3,4,8,6,7,0,9,0,1,2,3,4,1,6,7,8,9,0,1,6]
i = 0
for n in numbers[:-1]:
    i += 1
    if n > numbers[i] and n > 0:
        numbers[i-1] = numbers[i]-1
    elif n > numbers[i] and n == 0:
        numbers[i - 1] = 9
n = numbers[-1]
if n > numbers[0] and n > 0:
    numbers[-1] = numbers[0] - 1
elif n > numbers[0] and n == 0:
    numbers[-1] = 9
print(numbers)

Computing number of sequences

I saw the following problem that I was unable to solve. What kind of algorithm will solve it?
We have been given a positive integer n. Let A be the set of all possible strings of length n whose characters are from the set {1,2,3,4,5,6}, i.e. the results of a die thrown n times. How many elements of A contain at least one of the following strings as a substring:
1, 2, 3, 4, 5, 6
1, 1, 2, 2, 3, 3
4, 4, 5, 5, 6, 6
1, 1, 1, 2, 2, 2
3, 3, 3, 4, 4, 4
5, 5, 5, 6, 6, 6
1, 1, 1, 1, 1, 1
2, 2, 2, 2, 2, 2
3, 3, 3, 3, 3, 3
4, 4, 4, 4, 4, 4
5, 5, 5, 5, 5, 5
6, 6, 6, 6, 6, 6
I was thinking of some kind of recursive approach, but I only got a mess when I tried to solve the problem.
I suggest reading up on the Aho-Corasick algorithm. This constructs a finite state machine based on a set of strings. (If your list of strings is fixed, you could even do this by hand.)
Once you have a finite state machine (with around 70 states), you should add an extra absorbing state to mark when any of the strings has been detected.
Now your problem is reduced to finding how many of the 6**n strings end up in the absorbing state after being pushed through the state machine.
You can do this by expressing the state machine as a matrix M. Entry M[i,j] tells the number of ways of getting to state i from state j when one letter is added.
Finally, you compute the matrix raised to the power n, applied to an input vector that is all zeros except for a 1 in the position corresponding to the initial state. The number in the absorbing state position will tell you the total number of strings.
(You can use the standard matrix exponentiation algorithm, i.e. repeated squaring, to get this answer with O(log n) matrix multiplications.)
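A minimal Python sketch of the state-machine approach, assuming the twelve strings are written without commas (e.g. "123456"). `build_automaton` and `count_containing` are hypothetical names. Instead of matrix exponentiation it applies the one-letter transition n times, which is the same as multiplying the state vector by M n times (O(n) steps); for very large n you would switch to repeated squaring of M. Python integers are exact, so the counts never overflow.

```python
from collections import deque

# The twelve strings from the question, written without commas
PATTERNS = ["123456", "112233", "445566", "111222", "333444", "555666",
            "111111", "222222", "333333", "444444", "555555", "666666"]
ALPHABET = "123456"


def build_automaton(patterns, alphabet):
    """Aho-Corasick: return (delta, accepting) for a complete DFA over alphabet."""
    goto = [{}]                  # trie edges; state 0 is the root
    accepting = [False]
    for p in patterns:
        s = 0
        for ch in p:
            if ch not in goto[s]:
                goto.append({})
                accepting.append(False)
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        accepting[s] = True

    fail = [0] * len(goto)
    delta = [{} for _ in goto]
    queue = deque()
    for ch in alphabet:          # depth-1 states fail back to the root
        child = goto[0].get(ch, 0)
        delta[0][ch] = child
        if child:
            queue.append(child)
    while queue:                 # BFS, so shallower states are finished first
        s = queue.popleft()
        accepting[s] = accepting[s] or accepting[fail[s]]
        for ch in alphabet:
            if ch in goto[s]:
                child = goto[s][ch]
                fail[child] = delta[fail[s]][ch]
                delta[s][ch] = child
                queue.append(child)
            else:
                delta[s][ch] = delta[fail[s]][ch]
    return delta, accepting


def count_containing(n, patterns=PATTERNS, alphabet=ALPHABET):
    delta, accepting = build_automaton(patterns, alphabet)
    ways = [0] * len(delta)      # ways[s]: prefixes in state s with no match yet
    ways[0] = 1
    absorbed = 0                 # the absorbing state: prefixes that already matched
    for _ in range(n):
        nxt = [0] * len(delta)
        absorbed *= len(alphabet)  # every extension of a matched prefix still matches
        for s, w in enumerate(ways):
            if w:
                for ch in alphabet:
                    t = delta[s][ch]
                    if accepting[t]:
                        absorbed += w
                    else:
                        nxt[t] += w
        ways = nxt
    return absorbed


print(count_containing(6))  # 12: exactly the twelve strings themselves
```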
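What's wrong with your recursive approach? Can you elaborate on that? Anyway, this can be solved with plain recursion in O(6^n), and it can be optimized with DP (memoization), using the fact that only the last five characters matter when deciding whether the next character completes one of the strings. There are at most 6^5 such windows per step, so the DP does about O(6 * 6^5 * n) work.
from functools import lru_cache

# list described in the question
PATTERNS = {"123456", "112233", "445566", "111222", "333444", "555666",
            "111111", "222222", "333333", "444444", "555555", "666666"}

def count_strings(n):
    @lru_cache(maxsize=None)
    def rec(cur, step):
        # cur holds (at most) the last 5 characters chosen so far
        if step == n:
            return 0
        ans = 0
        for c in "123456":
            window = cur + c  # the last (up to) 6 characters
            if window in PATTERNS:
                # a match: every way of filling the remaining positions counts once
                ans += 6 ** (n - step - 1)
            else:
                ans += rec(window[-5:], step + 1)
        return ans
    return rec("", 0)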

Algorithm to generate Diagonal Latin Square matrix

For a given N, I need to create an N*N matrix which has no repetitions in its rows, columns, or minor and major diagonals, and whose values are 1, 2, 3, ..., N.
For N = 4 one of matrices is the following:
1 2 3 4
3 4 1 2
4 3 2 1
2 1 4 3
Problem overview
The math structure you described is a Diagonal Latin Square. Constructing one is more a mathematical problem than an algorithmic or programming one.
To understand correctly what it is and how to create one, you should read the following articles:
Latin squares definition
Magic squares definition
Diagonal Latin square construction <-- p.2 answers your question, with a proof and other interesting properties
Short answer
One of the possible ways to construct Diagonal Latin Square:
Let N be the order (size) of the required matrix L.
If there exist numbers A and B in the range [0; N-1] that satisfy the following properties:
A is relatively prime to N
B is relatively prime to N
(A + B) is relatively prime to N
(A - B) is relatively prime to N
Then you can create the required matrix with the following rule (rows and columns indexed from 0; add 1 to every entry to get the values 1..N):
L[i][j] = (A * i + B * j) mod N
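A minimal sketch of that rule in Python, with hypothetical function names. It searches for A and B directly, builds L[i][j] = (A*i + B*j) mod N, and shifts every entry by 1 so the values run over 1..N. Note that suitable A and B exist only when N is coprime to 6: for even N, A and B must both be odd, which makes A + B even; and if 3 divides N, either A - B or A + B is a multiple of 3. So the N = 4 example from the question cannot be produced this way, and a search like the answers below is needed there.

```python
from math import gcd


def modular_diagonal_latin_square(n):
    """Return an n x n diagonal Latin square with entries 1..n, or None."""
    for a in range(n):
        for b in range(n):
            # A, B, A + B and A - B must all be coprime to N
            if all(gcd(x % n, n) == 1 for x in (a, b, a + b, a - b)):
                # L[i][j] = (A*i + B*j) mod N, shifted by 1 to use values 1..N
                return [[(a * i + b * j) % n + 1 for j in range(n)] for i in range(n)]
    return None  # no suitable A, B: this happens exactly when gcd(n, 6) != 1


def is_diagonal_latin(square):
    n = len(square)
    target = set(range(1, n + 1))
    lines = [set(row) for row in square]                              # rows
    lines += [{square[i][j] for i in range(n)} for j in range(n)]     # columns
    lines.append({square[i][i] for i in range(n)})                    # main diagonal
    lines.append({square[i][n - 1 - i] for i in range(n)})            # anti-diagonal
    return all(line == target for line in lines)


for row in modular_diagonal_latin_square(5):
    print(row)
```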
It would be nice to do this mathematically, but I'll propose the simplest algorithm that I can think of - brute force.
At a high level:
We can represent a matrix as an array of arrays.
For a given N, construct S, a set of arrays which contains every permutation of [1..N]. There will be N! of these.
Using a recursive and iterative selection process (e.g. a search tree), search through all orderings of these arrays, backing out of a branch as soon as one of the 'uniqueness' rules is broken.
For example, in your N = 4 problem, I'd construct
S = [
[1,2,3,4], [1,2,4,3],
[1,3,2,4], [1,3,4,2],
[1,4,2,3], [1,4,3,2],
[2,1,3,4], [2,1,4,3],
[2,3,1,4], [2,3,4,1],
[2,4,1,3], [2,4,3,1],
[3,1,2,4], [3,1,4,2],
// etc.
]
R = new int[4][4]
Then the algorithm is something like:
1. If R is 'full', you're done.
2. Evaluate whether the next row from S fits into R:
   if yes, insert it into R, reset the iterator on S, and go to 1;
   if no, increment the iterator on S.
3. If there are more rows to check in S, go to 2.
4. Otherwise you've iterated across S and none of the rows fit, so remove the most recent row added to R, resume the iteration on S just after that row, and go to 1. In other words, explore another branch.
To improve the efficiency of this algorithm, use a better data structure. Rather than a flat array of all permutations, use a prefix tree / Trie of some sort to both reduce the storage size of the 'options' and reduce the search area within each iteration.
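A minimal Python sketch of the row-by-row backtracking described above, using a flat list of permutations for S (no Trie). `row_backtracking_square` is a hypothetical name, and recursion replaces the explicit "go to" steps and iterator resets: each recursion level keeps its own position in S, which is what "explore another branch" needs. Every candidate row is a permutation, so the row rule holds automatically; only columns and the two diagonals are checked.

```python
from itertools import permutations


def row_backtracking_square(n):
    """Build a diagonal Latin square row by row, or return None if none exists."""
    S = list(permutations(range(1, n + 1)))   # every candidate row: n! of them
    R = []                                    # rows placed so far

    def fits(row):
        i = len(R)                            # index the new row would take
        # Column rule: the new row must differ from every placed row in every position
        if any(a == b for r in R for a, b in zip(r, row)):
            return False
        # Main diagonal uses R[k][k]; anti-diagonal uses R[k][n - 1 - k]
        if row[i] in {R[k][k] for k in range(i)}:
            return False
        if row[n - 1 - i] in {R[k][n - 1 - k] for k in range(i)}:
            return False
        return True

    def search():
        if len(R) == n:                       # R is full: done
            return True
        for row in S:                         # iterate over S
            if fits(row):
                R.append(row)
                if search():
                    return True
                R.pop()                       # dead end: drop the row, try the next one
        return False                          # no row fits: backtrack

    return [list(r) for r in R] if search() else None


for row in row_backtracking_square(4):
    print(row)
```

Since S grows as N!, this is only practical for small N, which is why the Trie suggestion above matters.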
Here's a method which is fast for N <= 9 (Python):
import random

def generate(n):
    a = [[0] * n for _ in range(n)]

    def rec(i, j):
        if i == n - 1 and j == n:
            return True
        if j == n:
            return rec(i + 1, 0)
        candidate = set(range(1, n + 1))
        for k in range(i):
            candidate.discard(a[k][j])
        for k in range(j):
            candidate.discard(a[i][k])
        if i == j:
            for k in range(i):
                candidate.discard(a[k][k])
        if i + j == n - 1:
            for k in range(i):
                candidate.discard(a[k][n - 1 - k])
        candidate_list = list(candidate)
        random.shuffle(candidate_list)
        for e in candidate_list:
            a[i][j] = e
            if rec(i, j + 1):
                return True
        a[i][j] = 0
        return False

    rec(0, 0)
    return a

for row in generate(9):
    print(row)
Output:
[8, 5, 4, 7, 1, 6, 2, 9, 3]
[2, 7, 5, 8, 4, 1, 3, 6, 9]
[9, 1, 2, 3, 6, 4, 8, 7, 5]
[3, 9, 7, 6, 2, 5, 1, 4, 8]
[5, 8, 3, 1, 9, 7, 6, 2, 4]
[4, 6, 9, 2, 8, 3, 5, 1, 7]
[6, 3, 1, 5, 7, 9, 4, 8, 2]
[1, 4, 8, 9, 3, 2, 7, 5, 6]
[7, 2, 6, 4, 5, 8, 9, 3, 1]

Pseudocode to find the longest run within an array

I know that a run is a sequence of adjacent repeated values. How would you write pseudocode for computing the length of the longest run in an array?
E.g., in this array of integers, the run of 5s would be the longest:
1 2 4 4 3 1 2 4 3 5 5 5 5 3 6 5 5 6 3 1
Any idea would be helpful.
def longest_run(array):
    result = None
    prev = None
    size = 0
    max_size = 0
    for element in array:
        if (element == prev):
            size += 1
            if size > max_size:
                result = element
                max_size = size
        else:
            size = 0
        prev = element
    return result
EDIT
Wow. Just wow! This pseudocode is actually working:
>>> longest_run([1,2,4,4,3,1,2,4,3,5,5,5,5,3,6,5,5,6,3,1])
5
max_run_length = 0;
current_run_length = 0;
loop through the array, storing the current index's value and the previous index's value {
    if the value is the same as the previous one: current_run_length++;
    otherwise: current_run_length = 1;
    if current_run_length > max_run_length: max_run_length = current_run_length;
}
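A Python rendering of that run-length idea, as a sketch (`longest_run_length` is a hypothetical name). It returns the length of the longest run rather than the repeated value, and it updates the maximum on every step, so a run that reaches the very end of the array is counted too.

```python
def longest_run_length(values):
    max_run_length = 0
    current_run_length = 0
    previous = object()          # sentinel: never equal to a real element
    for value in values:
        if value == previous:
            current_run_length += 1
        else:
            current_run_length = 1
        # Update on every step so a run at the very end is counted too
        max_run_length = max(max_run_length, current_run_length)
        previous = value
    return max_run_length


print(longest_run_length([1, 2, 4, 4, 3, 1, 2, 4, 3, 5, 5, 5, 5, 3, 6, 5, 5, 6, 3, 1]))  # 4
```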
Here is a different, functional approach in Python (Python looks like pseudocode). The generator finishes with a plain return; don't replace it with raise StopIteration, because since Python 3.7 (PEP 479) a StopIteration raised inside a generator is turned into a RuntimeError.
I'm using a generator to yield tuples of (count, element). It's more general: you can also use it on infinite sequences. If you want to get the longest repeated element, however, the sequence must be finite.
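def group_same(iterable):
    iterator = iter(iterable)
    try:
        last = next(iterator)
    except StopIteration:
        return  # empty input: nothing to group
    counter = 1
    while True:
        try:
            element = next(iterator)
        except StopIteration:
            # input exhausted: emit the final group and stop
            yield (counter, last)
            return
        if element == last:  # compare by value, not identity
            counter += 1
        else:
            yield (counter, last)
            counter = 1
            last = element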
If you have a list like this:
li = [0, 0, 2, 1, 1, 1, 1, 1, 5, 5, 6, 7, 7, 7, 12, 'Text', 'Text', 'Text2']
Then you can make a new list of it:
list(group_same(li))
Then you'll get a new list:
[(2, 0),
(1, 2),
(5, 1),
(2, 5),
(1, 6),
(3, 7),
(1, 12),
(2, 'Text'),
(1, 'Text2')]
To get longest repeated element, you can use the max function.
gen = group_same(li) # Generator, does nothing until iterating over it
grouped_elements = list(gen) # iterate over the generator until it's exhausted
longest = max(grouped_elements, key=lambda x: x[0])
Or as a one liner:
max(list(group_same(li)), key=lambda x: x[0])
The function max gives us the biggest element in a list. Here the elements are tuples, so the key argument makes max compare them by their first item (the count); you still get back the whole tuple.
In : max(list(group_same(li)), key=lambda x: x[0])
Out: (5, 1)
The element 1 occurred 5 times in a row.
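#include <iostream>

int main()
{
    int a[20] = {1, 2, 4, 4, 3, 1, 2, 4, 3, 5, 5, 5, 5, 3, 6, 5, 5, 6, 3, 1};
    int current = 1;  // length of the run ending at a[i+1]
    int longest = 1;  // longest run seen so far
    for (int i = 0; i < 19; i++)
    {
        if (a[i] == a[i+1])
            current++;
        else
            current = 1;
        if (current > longest)
            longest = current;
    }
    std::cout << longest;
    return 0;
}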

Mutation for pipeline network optimization

I'm working on pipeline network optimization, and I'm representing the chromosomes as a string of numbers, as follows.
Example:
chromosome [1] = 3 4 7 2 8 9 6 5
where each number refers to a well, and the distances between wells are defined. The wells cannot be duplicated within one chromosome, for example:
chromosome [1]' = 3 4 7 2 7 9 6 5 (not acceptable)
What is the best mutation that can deal with a representation like that? Thanks in advance.
Can't say "best" but one model that I've used for graph-like problems is: For each node (well number), calculate the set of adjacent nodes / wells from the entire population. e.g.,
population = [[1,2,3,4], [1,2,3,5], [1,2,3,6], [1,2,6,5], [1,2,6,7]]
adjacencies = {
1 : [2] , #In the entire population, 1 is always only near 2
2 : [1, 3, 6] , #2 is adjacent to 1, 3, and 6 in various individuals
3 : [2, 4, 5, 6], #...etc...
4 : [3] ,
5 : [3, 6] ,
6 : [3, 2, 5, 7],
7 : [6]
}
choose_from_subset = [1,2,3,4,5,6,7] #At first, entire population
Then create a new individual / network by:
choose_next_individual(adjacencies, choose_from_subset):
1. Sort adjacencies by the size of their associated sets.
2. From the choices in choose_from_subset, choose the well with the highest number of adjacent possibilities (e.g., either 3 or 6, both of which have 4 possibilities).
3. If there is a tie (as there is with 3 and 6), choose among them randomly (let's say "3").
4. Place the chosen well as the next element of the individual / network ([3]).
5. fewerAdjacencies = remove the chosen well from the set of adjacencies (see below).
6. new_choose_from_subset = the adjacencies of your just-chosen well (i.e., 3 : [2,4,5,6]).
7. Recurse: choose_next_individual(fewerAdjacencies, new_choose_from_subset).
The idea is that nodes with high numbers of adjacencies are ripe for recombination (since the population hasn't converged on, e.g., 1->2), a lower "adjacency count" (but non-zero) implies convergence, and a zero adjacency count is (basically) a mutation.
Just to show a sample run:
#Recurse: After removing "3" from the population
new_graph = [3]
new_choose_from_subset = [2,4,5,6] #from 3 : [2,4,5,6]
adjacencies = {
1: [2],
2: [1, 6],
4: [],
5: [6],
6: [2, 5, 7],
7: [6]
}
#Recurse: "6" has most adjacencies in new_choose_from_subset, so choose and remove
new_graph = [3, 6]
new_choose_from_subset = [2, 5, 7]
adjacencies = {
1: [2],
2: [1],
4: [],
5: [],
7: []
}
#Recurse: Amongst [2,5,7], 2 has the most adjacencies
new_graph = [3, 6, 2]
new_choose_from_subset = [1]
adjacencies = {
1: [],
4: [],
5: [],
7: []
}
#new_choose_from_subset contains only 1, so that's your next...
new_graph = [3,6,2,1]
new_choose_from_subset = []
adjacencies = {
4: [],
5: [],
7: []
}
#From here on out, you'd be choosing randomly between the rest, so you might end up with:
new_graph = [3, 6, 2, 1, 5, 7, 4]
Sanity-check? 3->6 occurs 1x in original, 6->2 appears 2x, 2->1 appears 5x, 1->5 appears 0, 5->7 appears 0, 7->4 appears 0. So you've preserved the most-common adjacency (2->1) and two other "perhaps significant" adjacencies. Otherwise, you're trying out new adjacencies in the solution space.
UPDATE: Originally I'd forgotten the critical point that when recursing, you choose the most-connected to the just-chosen node. That's critical to preserving high-fitness chains! I've updated the description.
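A minimal Python sketch of the construction described above, assuming individuals are lists of well IDs; `build_adjacencies` and `choose_individual` are hypothetical names. Picking the candidate with the largest adjacency set plays the role of the "sort adjacencies" step, ties are broken with a random choice, and when the just-chosen well has no unused neighbours left, it falls back to any unused well, as in the last step of the sample run.

```python
import random


def build_adjacencies(population):
    """Map each well to the set of wells it is next to in any individual."""
    adj = {}
    for individual in population:
        for well in individual:
            adj.setdefault(well, set())
        for a, b in zip(individual, individual[1:]):
            adj[a].add(b)
            adj[b].add(a)
    return adj


def choose_individual(population, rng=random):
    """Build one new individual following the steps described above."""
    adj = build_adjacencies(population)
    remaining = set(adj)        # wells not yet placed
    subset = set(adj)           # at first, any well may be chosen
    child = []
    while remaining:
        # Fall back to any unused well when the just-chosen one has no unused neighbours
        candidates = (subset & remaining) or remaining
        best = max(len(adj[w]) for w in candidates)
        pick = rng.choice(sorted(w for w in candidates if len(adj[w]) == best))
        child.append(pick)
        remaining.discard(pick)
        subset = set(adj[pick])  # the next pick comes from the chosen well's neighbours
        for neighbours in adj.values():
            neighbours.discard(pick)  # remove the chosen well from every adjacency set
    return child


population = [[1, 2, 3, 4], [1, 2, 3, 5], [1, 2, 3, 6], [1, 2, 6, 5], [1, 2, 6, 7]]
print(choose_individual(population, random.Random(0)))
```

Because every well is placed exactly once, the result is always a valid permutation, so no repair step is needed for duplicated wells.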
