Algorithm - Check if there is a single type with the given word - algorithm

Given a dictionary list and a input word, return true if the input word has a single typo with the same length as the vocabulary in the dictionary.
dictionary = ["apple", "testing", "computer"];
singleType(dictionary, "adple") // true
singleType(dictionary, "addle") // false
singleType(dictionary, "apple") // false
singleType(dictionary, "apples") // false
I proposed a solution that runs in linear time, if we ignore the pre-process time needed for the hashmap.
O(k*26) => O(k), where k = length of the input word
My linear solution goes like, convert the dictionary list into a hash-map, where the key is the word and the value is a boolean, then loop through every character in the input word, and replace every character with 1 of the 26 alphabet and check if it maps to the hash-map.
But they say I could do better than O(k*26), but how?

You could extend the dictionary with all the variants of the word containing a single typo, but instead of the actual typo, you just put some "wildcard" character like ? or * in that place. Then, you can check whether (a) the word is not in the set of correctly spelled words, and (b) replacing any of the letters in the word with the same wildcard symbol, the word can be found in the set of words having one typo.
Example in Python:
>>> dictionary = ["apple", "testing", "computer"]
>>> wildcard = lambda w: [w[:i]+"?"+w[i+1:] for i in range(len(w))]
>>> onetypo = {x for w in dictionary for x in wildcard(w)}
>>> correct = {w for w in dictionary}
>>> word = "apxle"
>>> word not in correct and any(w in onetypo for w in wildcard(word))
True
This reduces the complexity of lookup to just O(k), i.e. still linear in the number of letters, but without the high constant factor. It does, however, greatly blow up the dictionary by a factor equal to the average number of letters in the words.

For a single lookup, I would filter the dictionary by word length, and then iterate the words, counting the errors, and bail out of each word, as soon as the error count is > 1.
val dictionary = List ("affen", "ample", "apple", "appse", "ipple", "appl", "pple", "mapple", "apples")
#annotation.tailrec
def oneError (w1: String, w2:String, err: Int) : Boolean = w1.length match {
case 0 => err == 1
case _ => if (err > 1) false else {
if (w1(0) == w2(0)) oneError (w1.substring (1), w2.substring (1), err) else
oneError (w1.substring (1), w2.substring (1), err + 1)
}
}
scala> dictionary.filter (_.length == 5).filter (s => oneError ("appxe", s, 0))
res5: List[String] = List(apple, appse)
For processing a longer text, I would preprocess the dictionary and split it into Maps (word.length -> List (words)).
For natural language, which is highly redundant, I would build a Set of unique words from the text, to lookup every word just once.
For a single word lookup, the worst case is n calls to the initial function, with n=max (dictionary.groupBy (w.length)).
Each word lookup (of words longer 1) will take at least 2 steps until failure, but most words, supposed no pathological input and dictionary, are only visited for 2 steps. From the remaining ones, most are excluded after 3 steps and so on.
Here is a version, which shows how deep it looks:
def oneError (word: String) : Array[String] = {
#tailrec
def oneError (w1: String, w2:String, steps: Int, err: Int) : Boolean = w1.length match {
case 0 => {print (s"($steps) "); err == 1}
case _ => if (err > 1) {print (s"$steps "); false } else {
if (w1(0) == w2(0)) oneError (w1.substring (1), w2.substring (1), steps +1, err) else
oneError (w1.substring (1), w2.substring (1), steps + 1, err + 1)
}
}
val d = dict (word.length)
println (s"Info: ${d.length} words of same length")
d.filter (entry => oneError (word, entry, 0, 0))
}
Sample output, redacted:
scala> oneError ("fuck")
Info: 3352 words of same length
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
2 2 2 2 2 2 2 2 (4) 3 3 3 3 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
2 2 2 2 3 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 (4) (4) 3 3 3 3
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
2 2 2 2 2 2 2 2 2 2 2 2 3 3 (4) (4) 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
3 3 3 3 3 3 (4) 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
3 (4) (4) (4) (4) (4) (4) (4) (4) (4) (4) (4) (4) (4) (4) 3 2 2 2 2 2
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
2 2 2 2 2 2 2 2 2 2 2 2 (4) (4) 3 3 3 3 3 3 3 3 3 3 3 3 3 2 2 2 2 2 2
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
3 3 3 (4) 3 3 2 2 2 3 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
2 2 2 2 2 2 (4) 3 3 3 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
res53: Array[String] = Array(Buck, Huck, Puck, buck, duck, funk, luck, muck, puck, suck, tuck, yuck)

It sounds like you are looking for the edit distance of 1 of your pattern with respect to the dictionary entry. For example, an edit distance of 1 would result if the pattern is "adple" and your dictionary entry is "apple". You have an additional constraint that the pattern is the same length as the dictionary entry, but this is easy to implement.

Related

Find unrelated partitions of a complete binary tree

I have a complete binary tree of height 'h'.
How do I find 'h' number of unrelated partitions for this ?
NOTE:
Unrelated partition means no child can be present with its immediate parent.
There is a constraint on the number of nodes in each partition.
The difference of the maximum number nodes in a partition and the minimum number of nodes in the partition can either be 0 or 1.
Also, root is excluded from including in the partitions.
Who devised the problem probably had a more elegant solution in mind, but the following works.
Let's say we have h partitions numbered 1 to h, and that the nodes of partition n have value n. The root node has value 0, and does not participate in the partitions. Let's call a partition even if nis even, and odd if n is odd. Let's also number the levels of the complete binary tree, ignoring the root and starting from level 1 with 2 nodes. Level n has 2n nodes, and the complete tree has 2h+1-1 nodes, but only P=2h+1-2 nodes belong to the partitions (because the root is excluded). Each partition consists of p=⌊P/h⌋ or p=⌈P/h⌉ nodes, such that ∑ᵢpᵢ=P.
If the height h of the tree is even, put all even partitions into the even levels of the left subtree and the odd levels of the right subtee, and put all odd partitions into the odd levels of the left subtree and the even levels of the right subtree.
If h is odd, distribute all partitions up to partition h-1 like in the even case, but distribute partition h evenly into the last level of the left and right subtrees.
This is the result for h up to 7 (I wrote a tiny Python library to print binary trees to the terminal in a compact way for this purpose):
0
1 1
0
1 2
2 2 1 1
0
1 2
2 2 1 1
1 1 3 3 2 2 3 3
0
1 2
2 2 1 1
1 1 1 1 2 2 2 2
2 4 4 4 4 4 4 4 1 3 3 3 3 3 3 3
0
1 2
2 2 1 1
1 1 1 1 2 2 2 2
2 2 2 2 2 2 4 4 1 1 1 1 1 1 3 3
3 3 3 3 3 3 3 3 3 3 5 5 5 5 5 5 4 4 4 4 4 4 4 4 4 4 5 5 5 5 5 5
0
1 2
2 2 1 1
1 1 1 1 2 2 2 2
2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1
1 1 1 1 1 1 3 3 3 3 3 3 3 3 3 3 2 2 2 2 2 2 4 4 4 4 4 4 4 4 4 4
4 4 4 4 4 4 4 4 4 4 4 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 3 3 3 3 3 3 3 3 3 3 3 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
0
1 2
2 2 1 1
1 1 1 1 2 2 2 2
2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
2 2 2 2 2 2 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 1 1 1 1 1 1 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
3 3 3 3 3 3 3 3 3 3 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 4 4 4 4 4 4 4 4 4 4 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7
And this is the code that generates it:
from basicbintree import Node
for h in range(1, 7 + 1):
root = Node(0)
P = 2 ** (h + 1) - 2 # nodes in partitions
p = P // h # partition size (may be p or p + 1)
if h & 1: # odd height
t = (p + 1) // 2 # subtree tail nodes from split partition
n = (h - 1) // 2 # odd or even partitions in subtrees except tail
else: # even height
t = 0 # no subtree tail nodes from split partition
n = h // 2 # odd or even partitions in subtrees
s = P // 2 - t # subtree nodes excluding tail
r = s - n * p # partitions of size p + 1 in subtrees
x = [p + 1] * r + [p] * (n - r) # nodes indexed by subtree partition - 1
odd = [1 + 2 * i for i, c in enumerate(x) for _ in range(c)] + [h] * t
even = [2 + 2 * i for i, c in enumerate(x) for _ in range(c)] + [h] * t
for g in range(1, h + 1):
start = 2 ** (g - 1) - 1
stop = 2 ** g - 1
if g & 1: # odd level
root.set_level(odd[start:stop] + even[start:stop])
else: # even level
root.set_level(even[start:stop] + odd[start:stop])
print('```none')
root.print_tree()
print('```')
All trees produced up to height 27 have been programmatically confirmed to meet the specifications.
Some parts of the algorithm would need a proof, like, e.g., that it's always possible to choose an even size for the split partition in the odd height case, but this and other proofs are left as an exercise to the reader ;-)

Maximum k such that A[0]<A[k], A[1]<A[k+1], ..., A[k-1]<A[2*k-1], after sorting each k-sized window

I need the efficient algorithm for this problem (time comlexity less than O(n^2)), please help me:
a[i..j] is called a[i..j] < b[i..j] if a[i]<b[i], a[i+1]<b[i+1], ..., a[j]<b[j] after sorting these 2 arrays.
Given array A[1..n], (n<= 10^5, a[i]<= 1000). Find the maximum of k that A[1..k] < A[k+1..2k]
For example, n=10: 2 2 1 4 3 2 5 4 2 3
the answer is 4
Easily to see that k <= n/2. So we can use brute-forces (k from n/2 to 1), but not binary search.
And I don't know what to do with a[i] <= 1000. Maybe using map???
Use a Fenwick tree with range updates. Each index in the tree represents the count of how many numbers in window A are smaller than it. For the windows to be valid, each element in B (the window on the right) must have a partner in A (the window on the left). When we shift a number x into A, we add 1 to the range, [x+1, 1000] in the tree. For the element shifted from B to A, add 1 in its tree index. For each new element in B, add -1 to its index in the tree. If an index drops below zero, the window is invalid.
For the example, we have:
2 2 1 4 3 2 5 4 2 3
2 2
|
Tree:
add 1 to [3, 1000]
add -1 to 2
idx 1 2 3 4 5
val 0 -1 1 1 1 (invalid)
2 2 1 4 3 2 5 4 2 3
2 2 1 4
|
Tree:
add 1 to [3, 1000]
add 1 to 2 (remove 2 from B)
add -1 to 1
add -1 to 4
idx 1 2 3 4 5
val -1 0 2 1 2 (invalid)
2 2 1 4 3 2 5 4 2 3
2 2 1 4 3 2
|
Tree:
add 1 to [2, 1000]
add 1 to 1 (remove 1 from B)
add -1 to 3
add -1 to 2
idx 1 2 3 4 5
val 0 0 2 2 3 (valid)
2 2 1 4 3 2 5 4 2 3
2 2 1 4 3 2 5 4
|
Tree:
add 1 to [5, 1000]
add 1 to 4 (remove 4 from B)
add -1 to 5
add -1 to 4
idx 1 2 3 4 5
val 0 0 2 2 3 (valid)
2 2 1 4 3 2 5 4 2 3
2 2 1 4 3 2 5 4 2 3
|
Tree:
add 1 to [4, 1000]
add 1 to 3 (remove 3 from B)
add -1 to 2
add -1 to 3
idx 1 2 3 4 5
val 0 -1 2 3 4 (invalid)

MATLAB: How to re-label a matrix based on another matrix?

Say I have matrix A:
A = [1 1 1 2 2 3 3 3;
1 1 1 2 2 3 3 3;
1 1 1 2 2 4 4 5;
2 2 2 2 2 5 5 5]
and matrix B with the same labels, just in different positions and not always with the same elements in each cluster:
B = [3 3 3 3 5 1 1 1:
3 3 3 3 5 1 1 1;
3 3 3 3 5 2 2 4:
5 5 5 5 5 4 4 4]
and I want matrix C to look like this
C = [1 1 1 1 2 3 3 3;
1 1 1 1 2 3 3 3;
1 1 1 1 2 4 4 5;
2 2 2 2 2 5 5 5]
Basically, I want the clusters in B that have a similar position to A to also have the same label as A, even if the clusters in B don't have the same exact amount of elements as the clusters in A. This is just a basic example because what I'm really working on are two images that have different labellings.
example of the image I'm working on

Tree searching algorithm: how to determine quickly if A has a sure-to-win strategy

The original question goes like this: There are 99 stones, A and B are playing a game that, each one take some stones in turn, and each turn one can only take 1, 2, 4, or 6 stones, the one take the last stone wins. If A is the first one to take stones, how many stones shall A take in the first turn?
This seems a quite complex tree searching quiz, listing out all the branches, then work it bottom up: the leaf with A taking the last stone is marked as "win"; for the intermediate node that whatever strategies B might take, if A always has a way to reach a node marked as "win", this node is also marked as "win".
But this approach is quite time consuming. Is there any smart algorithm to check out if A has a "guaranteed to win" strategy?
O(n) solution
If we start with 1, 2, 4 or 6 stones, A will always win, because he'll just take them all in the first move.
If we start with 3, A will lose no matter what he does, because regardless of whether he takes 1 or 2, B will take 2 or 1 next and win.
If we start with 5, A will win by taking 2 first, thus sending B to the case above, where he starts with 3 stones.
If we start with 7, A will win by taking 4, sending B to the same case with 3.
If we start with 8, A will lose no matter what he does: whatever he takes, he will send B to a winning position.
If we start with 9, A can take 1 and send B to the situation with 8, causing him to lose.
If we start with 10, A can take 2 and send B to the situation with 8 again, causing him to lose.
By now, it should become quite obvious how you can incrementally build an O(n) solution: let win[i] = true if i stones are winnable for the first person to move
We have:
win[1] = win[2] = win[4] = win[5] = win[6] = true, win[3] = false
win[x > 6] = not (win[x - 6] and win[x - 4] and win[x - 2] and win[x - 1])
For example:
win[7] = not (win[1] and win[3] and win[5] and win[6])
= not (true and false and true and true)
= not false
= true
Compute this up until the number you're interested in and that's it. No trees involved.
O(1) solution
By looking carefully at the above solution, we can derive a simple constant time solution: note that A can only lose if he sends B to a winning position no matter what he does, so if k - 6, k - 4, k - 2, k - 1 are all winning positions.
If you compute win for a few values, the pattern becomes obvious:
win[k] = false if k = 3, 8, 11, 16, 19, 24, 27, 32...
=> win[k] = false iff k mod 8 == 3 or k mod 8 == 0
For 99, 99 mod 8 = 3, so A does not have a sure winning strategy.
OK, so we can see that:
Every turn, number of stones can be taken is less than 7, so the result should be related to modulus 7.
So, for n < 1000, I have printed out the sequence of number of stones that makes the first person win, modulus 7, and it is a truly repeated cycle.
1 2 4 5 6 0 2 3 5 6 0 1 3 4 6 0 1 2 4 5 0 1 2 3 5 6 1 2 3 4 6 0 2 3 4 5 0 1 3 4 5 6 1 2 4 5 6 0 2 3 5 6 0 1 3 4 6 0 1 2 4 5 0 1 2 3 5 6 1 2 3 4 6 0 2 3 4 5 0 1 3 4 5 6 1 2 4 5 6 0 2 3 5 6 0 1 3 4 6 0 1 2 4 5 0 1 2 3 5 6 1 2 3 4 6 0 2 3 4 5 0 1 3 4 5 6 1 2 4 5 6 0 2 3 5 6 0 1 3 4 6 0 1 2 4 5 0 1 2 3 5 6 1 2 3 4 6 0 2 3 4 5 0 1 3 4 5 6 1 2 4 5 6 0 2 3 5 6 0 1 3 4 6 0 1 2 4 5 0 1 2 3 5 6 1 2 3 4 6 0 2 3 4 5 0 1 3 4 5 6 1 2 4 5 6 0 2 3 5 6 0 1 3 4 6 0 1 2 4 5 0 1 2 3 5 6 1 2 3 4 6 0 2 3 4 5 0 1 3 4 5 6 1 2 4 5 6 0 2 3 5 6 0 1 3 4 6 0 1 2 4 5 0 1 2 3 5 6 1 2 3 4 6 0 2 3 4 5 0 1 3 4 5 6 1 2 4 5 6 0 2 3 5 6 0 1 3 4 6 0 1 2 4 5 0 1 2 3 5 6 1 2 3 4 6 0 2 3 4 5 0 1 3 4 5 6 1 2 4 5 6 0 2 3 5 6 0 1 3 4 6 0 1 2 4 5 0 1 2 3 5 6 1 2 3 4 6 0 2 3 4 5 0 1 3 4 5 6 1 2 4 5 6 0 2 3 5 6 0 1 3 4 6 0 1 2 4 5 0 1 2 3 5 6 1 2 3 4 6 0 2 3 4 5 0 1 3 4 5 6 1 2 4 5 6 0 2 3 5 6 0 1 3 4 6 0 1 2 4 5 0 1 2 3 5 6 1 2 3 4 6 0 2 3 4 5 0 1 3 4 5 6 1 2 4 5 6 0 2 3 5 6 0 1 3 4 6 0 1 2 4 5 0 1 2 3 5 6 1 2 3 4 6 0 2 3 4 5 0 1 3 4 5 6 1 2 4 5 6 0 2 3 5 6 0 1 3 4 6 0 1 2 4 5 0 1 2 3 5 6 1 2 3 4 6 0 2 3 4 5 0 1 3 4 5 6 1 2 4 5 6 0 2 3 5 6 0 1 3 4 6 0 1 2 4 5 0 1 2 3 5 6 1 2 3 4 6 0 2 3 4 5 0 1 3 4 5 6 1 2 4 5 6 0 2 3 5 6 0 1 3 4 6 0 1 2 4 5 0 1 2 3 5 6 1 2 3 4 6 0 2 3 4 5 0 1 3 4 5 6 1 2 4 5 6 0 2 3 5 6 0 1 3 4 6 0 1 2 4 5 0 1 2 3 5 6 1 2 3 4 6 0 2 3 4 5 0 1 3 4 5 6 1 2 4 5 6 0 2 3 5 6 0 1 3 4 6 0 1 2 4 5 0 1 2 3 5 6 1 2 3 4 6 0 2 3 4 5 0 1 3 4 5 6 1 2 4 5 6 0 2 3 5 6 0 1 3 4 6 0 1 2 4 5 0 1 2 3 5 6 1 2 3 4 6 0 2 3 4 5
This cycle has the length is 56, so the problem can be solved in O(1) by finding the result of first 56 numbers.

Find multiple based on a single criterium (arrayfun)

I am trying to recieve all values from a variable (b) when using a criterium based on another variable (a) (it's like the =IF function in excel). like this:
Example:
(a): 1 2 2 2 3 3 3 3
(b): 3 6 3 5 6 4 5 4
my criteria is
(a) = 2
my reply has to be:
(b) = 6 3 5
I tried to find a solution using arrayfun, like this:
arrayfun(#(x) b(find(a == x, 1, 'first')), 2)
obviously, it only answers the 6, the first number that matches the criterium. Can I somehow formulate arrayfun correctly? Or do I need a whole other function?
Thanks!
Don't you just want:
a = [ 1 2 2 2 3 3 3 3]
b = [3 6 3 5 6 4 5 4]
b(a == 2)
ans =
6 3 5
If a was a matrix then:
a = [ 1 2 2 2 3 3 3 3; ...
1 1 1 2 2 3 4 4; ]
b = [3 6 3 5 6 4 5 4]
b(a(1,:)==2)
ans =
6 3 5

Resources