Related
I am new to programming I want to know how many times iteration takes place in the following function.I just want to how the number of iteration depends on a and b
where A and B are binary numbers .For example A=101010 and b=1011010.
The following is algorithm for adding two numbers without using + operator
while B is greater than 0:
U = A XOR B
V = A AND B
A = U
B = V * 2
I'm going to assume the usual convention that values that are non-zero are considered True and that 0 evaluates to False.
Since you didn't specify the initial value of A, let's consider both cases (I'm assuming that B==True at init otherwise there would be no iterations):
CASE 1: A==False => (U==True and V==False) => B=0 => iterations stop
CASE 2:A==True => (U==False and V==True) => B=True, A=False => this is now CASE 1
So, depending on the initial value of A and asuming B==True at the start, there will be either 1 or 2 iterations.
A group contains a set of entities and each entity has a value.
Each entity can be a part of more than one group.
Problem: Find largest N groups where each entity appears no more than once in the result. An entity can be excluded from a group if necessary.
Example:
Entities with values:
A = 2
B = 2
C = 2
D = 3
E = 3
Groups
1: (A,B,C) total value: 2+2+2 = 6
2: (B,D) total value: 2 + 3 = 5
3: (C,E) total value: 2 + 3 = 5
4: (D) total value: 3
5: (E) total value: 3
**Answers**:
Largest 1 group is obviously (A,B,C) with total value 6
Largest 2 groups are (B,D), (C,E) with total value 10
Largest 3 groups are either {(A,B,C),(D),(E)}, {(A,B),(C,E),(D)} or {(A,C), (B,D), (E)} with total value 12
The input data to the algorithm should be:
A set of entities with values
Groups containing one or more of the entities
The amount of groups in the result
If there are multiple answers then finding one of them is sufficient.
I included the example to try to make the problem clear, the amount of entities in practise should be less than about 50, and amount of groups should be less than the amount of entities. The amount of N groups to find will be between 1 and 10.
I am currently solving this problem by generating all possible combinations of N groups, excluding the results that contains duplicate entities and then picking the combination with largest total value. This is of course extremely inefficient but i cant get my head around how to obtain a general result in a more efficient way.
My question is if it's possible to solve this in a more efficient way, and if so, how? Any hints or answers are greatly appreciated.
edit
To be clear, in my solution i generate "fake" groups where duplicate entities are excluded from "real" groups. In the example entities (B, C, D, E) are duplicates (exist in more than one group. Then for group 1 (A,B,C) i add the fake groups (A,B),(A,C),(A) to the list of groups that I generate combinations for.
This problem can be formulated as a linear integer program. Although the integer programming is not super efficient in terms of complexity, it works very quick with this number of variables.
Here is how we turn this problem into an integer program.
Let v be a vector of size K representing the entity values.
Let G be a K x M binary matrix that defines the groups: G(i,j)=1 means that the entity i belongs to the group j and G(i,j)=0 otherwise.
Let x be a binary vector of size M, which represents the choice of groups: x[j]=1 indicates we pick the group j.
Let y be a binary vector of size K, which represents the inclusion of entities: y[i]=1 means that the entity i is included in the outcome.
Our goal is to choose x and y so as to maximize sum(v*y) under the following conditions:
G x >= y ... all included entities must belong to at least one of chosen groups
sum(x) = N ... we choose exactly N groups.
Below is an implementation in R. It uses lpSolve library, an interface to lpsolve.
library(lpSolve)
solver <- function(values, groups, N)
{
n_group <- ncol(groups)
n_entity <- length(values)
object <- c(rep(0, n_group), values)
lhs1 <- cbind(groups, -diag(n_entity))
rhs1 <- rep(0, n_entity)
dir1 <- rep(">=", n_entity)
lhs2 <- matrix(c(rep(1, n_group), rep(0, n_entity)), nrow=1)
rhs2 <- N
dir2 <- "="
lhs <- rbind(lhs1, lhs2)
rhs <- c(rhs1, rhs2)
direc <- c(dir1, dir2)
lp("max", object, lhs, direc, rhs, all.bin=TRUE)
}
values <- c(A=2, B=2, C=2, D=3, E=3)
groups <- matrix(c(1,1,1,0,0,
0,1,0,1,0,
0,0,1,0,1,
0,0,0,1,0,
0,0,0,0,1),
nrow=5, ncol=5)
rownames(groups) <- c("A", "B", "C", "D", "E")
ans <- solver(values, groups, 1)
print(ans)
names(values)[tail(ans$solution, length(values))==1]
# Success: the objective function is 6
# [1] "A" "B" "C"
ans <- solver(values, groups, 2)
print(ans)
names(values)[tail(ans$solution, length(values))==1]
# Success: the objective function is 10
# [1] "B" "C" "D" "E"
ans <- solver(values, groups, 3)
print(ans)
names(values)[tail(ans$solution, length(values))==1]
# Success: the objective function is 12
# [1] "A" "B" "C" "D" "E"
Below is to see how this can work with large problem. It finishes in one second.
# how does it scale?
n_entity <- 50
n_group <- 50
N <- 10
entity_names <- paste("X", 1:n_entity, sep="")
values <- sample(1:10, n_entity, replace=TRUE)
names(values) <- entity_names
groups <- matrix(sample(c(0,1), n_entity*n_group,
replace=TRUE, prob=c(0.99, 0.01)),
nrow=n_entity, ncol=n_group)
rownames(groups) <- entity_names
ans <- solver(values, groups, N)
print(ans)
names(values)[tail(ans$solution, length(values))==1]
if the entity values are always positive, I think you can get a solution without generating all combinations:
sort the groups by their largest element, 2nd largest element, nth largest element. in this case you would have 3 copies since the largest group has 3 elements.
for each copy, make one pass from the largest to the smallest adding the group to the solution only if it doesn't contain an element you've already added. this yields 3 results, take the largest. there shouldn't be a larger possible solution unless weights could be negative.
here's an implementation in C#
var entities = new Dictionary<char, int>() { { 'A', 2 }, { 'B', 2 }, { 'C', 2 }, { 'D', 3 }, { 'E', 3 } };
var groups = new List<string>() { "ABC", "BD", "CE", "D", "E" };
var solutions = new List<Tuple<List<string>, int>>();
for(int i = 0; i < groups.Max(x => x.Length); i++)
{
var solution = new List<string>();
foreach (var group in groups.OrderByDescending(x => x.Length > i ? entities[x[i]] : -1))
if (!group.ToCharArray().Any(c => solution.Any(g => g.Contains(c))))
solution.Add(group);
solutions.Add(new Tuple<List<string>, int>(solution, solution.Sum(g => g.ToCharArray().Sum(c => entities[c]))));
}
solutions.Dump();
solutions.OrderByDescending(x => x.Item2).First().Dump();
output:
I was asked this question during an interview and I couldn't come up with a satisfactory solution for it. Would appreciate if anybody could give some pointers.
Given a mapping like
mapping = {"A" => 1, "B" => 2, "C" => 3..... "Z" => 26}
encode("A") == "1"
encode("BA") == "21"
encode("ABC") == "123"
encode("") == ""
decode("1") == ["A"] -> 1
decode("21") == ["BA", "V"] -> 2
decode("123") == ["ABC", "JC", "AX"] -> 3
decode("012") == [] -> 0
decode("") == [""] -> 1
decode("102") == ["JB"] -> 1
numDecode(X) == len(decode(X))
numDecode("1") == 1
numDecode("21") == 2
numDecode("123") == 3
numDecode("") == 1
numDecode("102") == 1
numDecode("012") == 0
We need a numDecode method which gives the length of unique solution array.
Updated :
Given a mapping like :
mapping = {"A" => 1, "B" => 2, "C" => 3..... "Z" => 26}
Suppose we are given a string as "A" the it can be encoded as : "1"
encode("A") should return "1"
encode("BA") should return "21" as if mapping is a hash then B has a value of 2, A has a value of 1.
encode("ABC") should return "123" as mapping["A" is 1, mapping["B"] is 2, and mapping["C"] is 3.
encode("") should return "" as it is not in mapping.
Now if decode("1") is called then it should return an array with one element i.e. ["A"] as key matching with 1 as value in mapping is "A".
decode("") should return an array with empty string i.e. [""].
decode("21") should return an array ["BA", "U"] as 2 is "B", 1 is "A" and "U" is 21 in mapping.
decode("012") should return an empty array as string starts with "0" which is not in mapping keys.
decode("102") should return an array as ["JB"] as "10" is J and "2" is B.
And finally numDecode should return the count of unique decoded strings in array. So,
numDecode(X) == len(decode(X))
numDecode("1") == 1
numDecode("21") == 2
numDecode("123") == 3
numDecode("") == 1
numDecode("102") == 1
numDecode("012") == 0
This is an interesting question, and the interview technique that goes with it is most likely to see how far the critical thinking goes. A good interviewer would probably not expect a single canonically correct answer.
If you get as far as a recursive decode solution that you then enumerate, then you are doing well IMO (at least I'd hire most candidates who could demonstrate clearly thinking through a piece of recursive code at interview!)
Having said that, one key hint is that the question asks for a num_decode function, not necessarily for implementations of encode and decode.
There is a deeper understanding and structure accessible here, that can be gained from analysing the permutations and combinations. It allows you to write a num_decode function that can handle long strings with millions of possible answers, without filling memory or taking hours to enumerate all possibilities.
First note that any set of separate ambiguous encoding multiply the number of possibilities for the whole string:
1920 -> 19 is ambiguous 'AI' or 'S' -> 'AIT' or 'ST'
192011 -> 11 is also ambiguous 'AA' or 'K' -> 'AITAA', 'AITK', 'STAA', 'STK'
Here 19 has two possible interpretations, and 11 also has two. A string with both of these separate instances of ambiguous codings has 2 * 2 == 4 valid combinations.
Each independent section of ambiguous coding multiplies the size of the whole set of decode values by the number of possibilities that it represents.
Next how to deal with longer ambiguous sections. What happens when you add an ambiguous digit to an ambiguous sequence:
11 -> 'AA' or 'K' -> 2
111 -> 'AAA', 'AK', 'KA' -> 3
1111 -> 'AAAA', 'AAK', 'AKA', 'KAA', 'KK' -> 5
11111 -> 'AAAAA', 'AAAK', 'AAKA', 'AKAA', 'AKK', 'KAAA', 'KAK', 'KKA' -> 8
2,3,5,8 should look familiar, it is the Fibonacci sequence, what's going on? The answer is that adding one digit to the sequence allows all the previous combinations plus those of the sub-sequence before that. By adding a digit 1 to the sequence 1111 you can either interpret it as 1111(1) or 111(11) - so you can add together the number of possibilities in 1111 and 111 to get the number of possibilities in 11111. That is, N(i) = N(i-1) + N(i-2) which is how to construct the Fibonacci series.
So, if we can detect ambiguous coding sequences, and get their length, we can now calculate the number of possible decodes, without actually doing the decode:
# A caching Fibonacci sequence generator
def fib n
#fibcache ||= []
return #fibcache[n] if #fibcache[n]
a = b = 1
n.times do |i|
a, b = b, a + b
#fibcache[i+1] = a
end
#fibcache[n]
end
def num_decode encoded
# Check that we don't have invalid sequences, raising here, but you
# could technically return 0 and be correct according to question
if encoded.match(/[^0-9]/) || encoded.match(/(?<![12])0/)
raise ArgumentError, "Not a valid encoded sequence"
end
# The look-ahead assertion ensures we don't match
# a '1' or '2' that is needed by a '10' or '20'
ambiguous = encoded.scan(/[12]*1[789]|[12]+[123456](?![0])/)
ambiguous.inject(1) { |n,s| n * fib(s.length) }
end
# A few examples:
num_decode('') # => 1
num_decode('1') # => 1
num_decode('12') # => 2
num_decode('120') # => 1
num_decode('12121212') # => 34
num_decode('1212121212121212121212121211212121212121') # => 165580141
It is relatively short strings like the last one which foil attempts to enumerate
the possibilities directly by decoding.
The regex in the scan took a little experimentation to get right. Adding 7,8 or 9 is ambiguous after a 1, but not after a 2. You also want to avoid counting a 1 or 2 directly before a 0 as part of an ambiguous sequence because 10 or 20 have no other interpretations. I think I made about a dozen attempts at the regex before settling on the current version (which I believe to be correct, but I did keep finding exceptions to correct values most times I tested the first versions).
Finally, as an exercise, it should be possible to use this code as the basis from which to write a decoder that directly output the Nth possible decoding (or even one that enumerated them lazily from any starting point, without requiring excessive memory or CPU time).
Here's a recursive solution:
$mapping = Hash[(0..25).map { |i| [('A'.ord+i).chr,i+1] }]
$itoa = Hash[$mapping.to_a.map { |pair| pair.reverse.map(&:to_s) }]
def decode( str )
return [''] if str.empty?
return $itoa.key?(str) ? [$itoa[str]] : nil if str.length == 1
retval = []
0.upto(str.length-1) do |i|
word = $itoa[str[0..i]] or next
tails = decode(str[i+1..-1]) or next
retval.concat tails.map { |tail| word + tail }
end
return retval
end
Some sample output:
p decode('1') # ["A"]
p decode('21') # ["BA", "U"]
p decode('123') # ["ABC", "AW", "LC"]
p decode('012') # []
p decode('') # [""]
p decode('102') # ["JB"]
p decode('12345') # ["ABCDE", "AWDE", "LCDE"]
Note differences between this output and the question. E.g. The 21st letter of the alphabet is "U", not "V". etc.
#he = Hash[("A".."Z").to_a.zip((1..26).to_a.map(&:to_s))]
# => {"A"=>"1", "B"=>"2",...,"Z"=>"26"}
#hd = #he.invert # => {"1"=>"A", "2"=>"B",.., "26"=>"Z"}
def decode(str, comb='', arr=[])
return arr << s if str.empty?
# Return if the first character of str is not a key of #hd
return arr unless (c = #hd[str[0]])
# Recurse with str less the first char, s with c appended and arr
arr = decode(str[1..-1], s+c, arr)
# If the first two chars of str are a key of #hd (with value c),
# recurse with str less the first two chars, s with c appended and arr
arr = decode(str[2..-1], s+c, arr) if str.size > 1 && (c = #hd[str[0..1]])
arr
end
def num_decode(str) decode(str).size end
decode('1') # => ["A"]
decode('') # => [""].
decode('21') # => ["BA", "U"]
decode('012') # => [""]
decode('102') # => ["JB"]
decode('123') # => ["ABC", "AW", "LC"]
decode('12345') # => ["ABCDE", "AWDE", "LCDE"]
decode('120345') # => ["ATCDE"]
decode('12720132') # => ["ABGTACB", "ABGTMB", "LGTACB", "LGTMB"]
Any more? Yes, I see a hand back there. The gentleman with the red hat wants to see '12121212':
decode('12121212')
# => ["ABABABAB", "ABABABL", "ABABAUB", "ABABLAB", "ABABLL", "ABAUBAB",
"ABAUBL", "ABAUUB", "ABLABAB", "ABLABL", "ABLAUB", "ABLLAB",
"ABLLL", "AUBABAB", "AUBABL", "AUBAUB", "AUBLAB", "AUBLL",
"AUUBAB", "AUUBL", "AUUUB", "LABABAB", "LABABL", "LABAUB",
"LABLAB", "LABLL", "LAUBAB", "LAUBL", "LAUUB", "LLABAB",
"LLABL", "LLAUB", "LLLAB", "LLLL"]
num_decode('1') # => 1
num_decode('21') # => 2
num_decode('12121212') # => 34
num_decode('12912912') # => 8
This looks like a combinatorics problem, but it's also a parsing problem.
(You asked for pointers, so I'm doing this in English rather than dusting off my Ruby.)
I would do something like this:
If X is an empty string, return 1
If X is not a string composed of digits starting with a nonzero digit, return 0
If X contains no 1's or 2's, return 1 (there's only one possible parsing)
If X contains 1's or 2's, it gets a bit more complicated:
Every 1 that is not the last character in X matches both "A" and the first digit of one of the letters "J" through "S".
Every 2 that is not the last character in X and is followed by a digit less than 7 matches both "B" and the first digit of one of the letters.
Count up your 1's and 2's that meet those criteria. Let that number be Y. You have 2^Y combinations of those, so the answer should be 2^Y but you have to subtract 1 for every time you have a 1 and 2 next to each other.
So, if you haven't returned by Step 4 above, count up your 1's that aren't the last character in X, and the 2's that both aren't the last character in X and aren't followed by a 7,8,9, or 10. Let the sum of those counts be called Y.
Now count every instance that those 1's and 2's are neighbors; let that sum be called Z.
The number of possible parsings is (2^Y) - Z.
In the spirit of giving “some pointers”, instead of writing an actually implementation for numDecode let me say that the most logically straightforward way to tackle this problem is with recursion. If the string passed to numDecode is longer than one character then look at the beginning of the string and based on what you see use one or two (or zero) recursive calls to find the correct value.
And the risk of revealing too much, numDecode("1122") should make recursive calls to numDecode("122") and numDecode("22").
# just look for all singles and double as you go down and keep repeating this.. if you get to the end where the string would be 1 or 2 digets long you count 1
# IE
# 121
# 1 that's good 2 that's good 1 that's good if all good then count + 1
# 12 that's good 1 that's good ... no more doubles if all good then count + 1
# 1 that's good 21 that's good if all good then count + 1
# test this on other cases
$str = "2022"
$strlength = $str.length
$count = 0
def decode(str)
if str[0].to_i >= 1 and str[0].to_i <= 9
$count += 1 if str.length == 1
decode(str[1..-1])
end
if str[0..1].to_i >= 10 and str[0..1].to_i <= 26
$count += 1 if str.length == 2
p str.length
decode(str[2..-1])
end
end
decode($str)
p " count is #{$count}"
My problem is very simple but I haven't found an efficient implementation yet.
Suppose there is a matrix A like this:
0 0 0 0 0 0 0
4 4 2 2 2 0 0
4 4 2 2 2 0 0
0 0 2 2 2 1 1
0 0 0 0 0 1 1
Now I want to find all starting positions of rectangular areas in this matrix which have a given size. An area is a subset of A where all numbers are the same.
Let's say width=2 and height=3. There are 3 areas which have this size:
2 2 2 2 0 0
2 2 2 2 0 0
2 2 2 2 0 0
The result of the function call would be a list of starting positions (x,y starting with 0) of those areas.
List((2,1),(3,1),(5,0))
The following is my current implementation. "Areas" are called "surfaces" here.
case class Dimension2D(width: Int, height: Int)
case class Position2D(x: Int, y: Int)
def findFlatSurfaces(matrix: Array[Array[Int]], surfaceSize: Dimension2D): List[Position2D] = {
val matrixWidth = matrix.length
val matrixHeight = matrix(0).length
var resultPositions: List[Position2D] = Nil
for (y <- 0 to matrixHeight - surfaceSize.height) {
var x = 0
while (x <= matrixWidth - surfaceSize.width) {
val topLeft = matrix(x)(y)
val topRight = matrix(x + surfaceSize.width - 1)(y)
val bottomLeft = matrix(x)(y + surfaceSize.height - 1)
val bottomRight = matrix(x + surfaceSize.width - 1)(y + surfaceSize.height - 1)
// investigate further if corners are equal
if (topLeft == bottomLeft && topLeft == topRight && topLeft == bottomRight) {
breakable {
for (sx <- x until x + surfaceSize.width;
sy <- y until y + surfaceSize.height) {
if (matrix(sx)(sy) != topLeft) {
x = if (x == sx) sx + 1 else sx
break
}
}
// found one!
resultPositions ::= Position2D(x, y)
x += 1
}
} else if (topRight != bottomRight) {
// can skip x a bit as there won't be a valid match in current row in this area
x += surfaceSize.width
} else {
x += 1
}
}
}
return resultPositions
}
I already tried to include some optimizations in it but I am sure that there are far better solutions. Is there a matlab function existing for it which I could port? I'm also wondering whether this problem has its own name as I didn't exactly know what to google for.
Thanks for thinking about it! I'm excited to see your proposals or solutions :)
EDIT: Matrix dimensions in my application range from 300x300 to 3000x3000 approximately. Also, the algorithm will only be called once for the same matrix. The reason is that the matrix will always be changed afterwards (approx. 1-20% of it).
RESULTS
I implemented the algorithms of Kevin, Nikita and Daniel and benchmarked them in my application environment, i.e. no isolated synthetic benchmark here, but special care was taken to integrate all algorithms in their most performant way which was especially important for Kevin's approach as it uses generics (see below).
First, the raw results, using Scala 2.8 and jdk 1.6.0_23. The algorithms were executed several hundred times as part of solving an application-specific problem. "Duration" denotes the total time needed until the application algorithm finished (of course without jvm startup etc.). My machine is a 2.8GHz Core 2 Duo with 2 cores and 2gig of memory, -Xmx800M were given to the JVM.
IMPORTANT NOTE: I think my benchmark setup is not really fair for parallelized algorithms like the one from Daniel. This is because the application is already calculating multi-threaded. So the results here probably only show an equivalent to single-threaded speed.
Matrix size 233x587:
duration | JVM memory | avg CPU utilization
original O(n^4) | 3000s 30M 100%
original/-server| 840s 270M 100%
Nikita O(n^2) | 5-6s 34M 70-80%
Nikita/-server | 1-2s 300M 100%
Kevin/-server | 7400s 800M 96-98%
Kevin/-server** | 4900s 800M 96-99%
Daniel/-server | 240s 360M 96-99%
** with #specialized, to make generics faster by avoiding type erasure
Matrix size 2000x3000:
duration | JVM memory | avg CPU utilization
original O(n^4) | too long 100M 100%
Nikita O(n^2) | 150s 760M 70%
Nikita/-server | 295s (!) 780M 100%
Kevin/-server | too long, didn't try
First, a small note on memory. The -server JVM option uses considerably more memory at the advantage of more optimizations and in general faster execution. As you can see from the 2nd table Nikita's algorithm is slower with the -server option which is obviously due to hitting the memory limit. I assume that this also slows down Kevin's algorithm even for the small matrix as the functional approach is using much more memory anyways. To eliminate the memory factor I also tried it once with a 50x50 matrix and then Kevin's took 5secs and Nikita's 0secs (well, nearly 0). So in any case it's still slower and not just because of memory.
As you can see from the numbers, I will obviously use Nikita's algorithm because it's damn fast and this is absolutely necessary in my case. It can also be parallelized easily as Daniel pointed out. The only downside is that it's not really the scala-way.
At the moment Kevin's algorithm is probably in general a bit too complex and therefore slow, but I'm sure there are more optimizations possible (see last comments in his answer).
With the goal of directly transforming Nikita's algorithm to functional style Daniel came up with a solution which is already quite fast and as he says would even be faster if he could use scanRight (see last comments in his answer).
What's next?
At the technological side: waiting for Scala 2.9, ScalaCL, and doing synthetic benchmarks to get raw speeds.
My goal in all this is to have functional code, BUT only if it's not sacrificing too much speed.
Choice of answer:
As for choosing an answer, I would want to mark Nikita's and Daniel's algorithms as answers but I have to choose one. The title of my question included "most efficiently", and one is the fastest in imperative and the other in functional style. Although this question is tagged Scala I chose Nikita's imperative algorithm as 2s vs. 240s is still too much difference for me to accept. I'm sure the difference can still be pushed down a bit, any ideas?
So, thank you all very very much! Although I won't use the functional algorithms yet, I got many new insights into Scala and I think I slowly get an understanding of all the functional crazyness and its potential. (of course, even without doing much functional programming, Scala is much more pleasing than Java... that's another reason to learn it)
First, a couple of helper functions:
//count the number of elements matching the head
def runLength[T](xs:List[T]) = xs.takeWhile(_ == xs.head).size
//pair each element with the number of subsequent occurrences
def runLengths[T](row:List[T]) : List[(T,Int)] = row match {
case Nil => Nil
case h :: t => (h, runLength(row)) :: runLengths(t)
}
//should be optimised for tail-call, but easier to understand this way
//sample input: 1,1,2,2,2,3,4,4,4,4,5,5,6
//output: (1,2), (1,1), (2,3), (2,2), (2,1), (3,1), (4,4), (4,3), (4,2), (4,1), (5,2), (5,1), (6,1)
This can then be used against each row in the grid:
val grid = List(
List(0,0,0,0),
List(0,1,1,0),
List(0,1,1,0),
List(0,0,0,0))
val stage1 = grid map runLengths
// returns stage1: List[List[(Int, Int)]] =
// 0,4 0,3 0,2 0,1
// 0,1 1,2 1,1 0,1
// 0,1 1,2 1,1 0,1
// 0,4 0,3 0,2 0,1
Then having done the horizontal, the rows, we now perform exactly the same operation on the columns. This uses the transpose method available in the Scala standard collection library to exchange rows<->columns, as per the mathematical matrix operation of the same name. We also transpose back once this is done.
val stage2 = (stage1.transpose map runLengths).transpose
// returns stage2: List[List[((Int, Int), Int)]] =
// (0,4),1 (0,3),1 (0,2),1 (0,1),4
// (0,1),2 (1,2),2 (1,1),2 (0,1),3
// (0,1),1 (1,2),1 (1,1),1 (0,1),2
// (0,4),1 (0,3),1 (0,2),1 (0,1),1
What does this mean? Taking one element: (1,2),2, it means that that cell contains the value 1, and scanning to the right that there are 2 adjacent cells in the row containing 1. Scanning down, there are two adjacent cells with the same property of containing the value 1 and having the same number of equal values to their right.
It's a little clearer after tidying up, converting nested tuples of the form ((a,b),c) to (a,(b,c)):
val stage3 = stage2 map { _.map {case ((a,b),c) => a->(b,c) } }
//returns stage3: List[List[(Int, (Int, Int))]] =
// 0,(4,1) 0,(3,1) 0,(2,1) 0,(1,4)
// 0,(1,2) 1,(2,2) 1,(1,2) 0,(1,3)
// 0,(1,1) 1,(2,1) 1,(1,1) 0,(1,2)
// 0,(4,1) 0,(3,1) 0,(2,1) 0,(1,1)
Where 1,(2,2) refers to a cell containing the value 1, and being at the top-left corner of a 2x2 rectangle of similarly-valued cells.
From here, it's trivial to spot rectangles of the same size. Though a little more work is necessary if you also want to exclude areas that are the subset of a larger rectangle.
UPDATE: As written, the algorithm doesn't work well for cases like the cell at (0,0), which belongs to two distinct rectangles (1x4 and 4x1). If needed, this is also solvable using the same techniques. (do one pass using map/transpose/map/transpose and a second with transpose/map/transpose/map, then zip and flatten the results).
It would also need modifying if the input might contain adjoining rectangles containing cells of the same value, such as:
0 0 0 0 0 0 0 0
0 0 1 1 1 0 0 0
0 0 1 1 1 0 0 0
0 0 1 1 1 1 1 0
0 0 1 1 1 1 1 0
0 0 1 1 1 1 1 0
0 0 0 0 0 0 0 0
Putting it all together, and cleaning up a little:
type Grid[T] = List[List[T]]
def runLengths[T](row:List[T]) : List[(T,Int)] = row match {
case Nil => Nil
case h :: t => (h -> row.takeWhile(_ == h).size) :: runLengths(t)
}
def findRectangles[T](grid: Grid[T]) = {
val step1 = (grid map runLengths)
val step2 = (step1.transpose map runLengths).transpose
step2 map { _ map { case ((a,b),c) => (a,(b,c)) } }
}
UPDATE2
Hold onto your hats, this is a big one...
Before writing a single line of new functionality, we'll first refactor a bit, pulling some of the methods into Ops classes with implicit conversions, so they can be used as though they were methods defined on the underlying collection types:
import annotation.tailrec
class RowOps[T](row: List[T]) {
def withRunLengths[U](func: (T,Int)=>U) : List[U] = {
#tailrec def recurse(row:List[T], acc:List[U]): List[U] = row match {
case Nil => acc
case head :: tail =>
recurse(
tail,
func(head, row.takeWhile(head==).size) :: acc)
}
recurse(row, Nil).reverse
}
def mapRange(start: Int, len: Int)(func: T=>T) =
row.splitAt(start) match {
case (l,r) => l ::: r.take(len).map(func) ::: r.drop(len)
}
}
implicit def rowToOps[T](row: List[T]) = new RowOps(row)
This adds a withRunLengths method to lists. One notable difference here is that instead of returning a List of (value, length) pairs the method accepts, as a parameter, a function to create an output value for each such pair. This will come in handy later...
type Grid[T] = List[List[T]]
class GridOps[T](grid: Grid[T]) {
def deepZip[U](other: Grid[U]) = (grid zip other) map { case (g,o) => g zip o}
def deepMap[U](f: (T)=>U) = grid map { _ map f}
def mapCols[U](f: List[T]=>List[U]) = (grid.transpose map f).transpose
def height = grid.size
def width = grid.head.size
def coords = List.tabulate(height,width){ case (y,x) => (x,y) }
def zipWithCoords = deepZip(coords)
def deepMapRange(x: Int, y: Int, w: Int, h: Int)(func: T=>T) =
grid mapRange (y,h){ _.mapRange(x,w)(func) }
}
implicit def gridToOps[T](grid: Grid[T]) = new GridOps(grid)
There shouldn't be any surprises here. The deepXXX methods avoid having to write constructs of the form list map { _ map { ... } }. tabulate may also be new to you, but its purpose is hopefully obvious from the use.
Using these, it becomes trivial to define functions for finding the horizontal and vertical run lengths over a whole grid:
def withRowRunLengths[T,U](grid: Grid[T])(func: (T,Int)=>U) =
grid map { _.withRunLengths(func) }
def withColRunLengths[T,U](grid: Grid[T])(func: (T,Int)=>U) =
grid mapCols { _.withRunLengths(func) }
Why 2 parameter blocks and not one? I'll explain shortly.
I could have defined these as methods in GridOps, but they don't seem appropriate for general purpose use. Feel free to disagree with me here :)
Next, define some input...
def parseIntGrid(rows: String*): Grid[Int] =
rows.toList map { _ map {_.toString.toInt} toList }
val input: Grid[Int] = parseIntGrid("0000","0110","0110","0000")
...another useful helper type...
case class Rect(w: Int, h: Int)
object Rect { def empty = Rect(0,0) }
as an alternative to a tuple, this really helps with debugging. Deeply nested parenthesis are not easy on the eyes. (sorry Lisp fans!)
...and use the functions defined above:
val stage1w = withRowRunLengths(input) {
case (cell,width) => (cell,width)
}
val stage2w = withColRunLengths(stage1w) {
case ((cell,width),height) => Rect(width,height)
}
val stage1t = withColRunLengths(input) {
case (cell,height) => (cell,height)
}
val stage2t = withRowRunLengths(stage1t) {
case ((cell,height),width) => Rect(width,height)
}
All of the above blocks should be one-liners, but I reformatted for StackOverflow.
The outputs at this stage are just grids of Rectangles, I've intentionally dropped any mention of the actual value that the rectangle is comprised of. That's absolutely fine, it's easy enough to find from its co-ordinates in the grid, and we'll be recombining the data in just a brief moment.
Remember me explaining that RowOps#withRunLengths takes a function as a parameter? Well, this is where it's being used. case (cell,width) => (cell,width) is actually a function that takes the cell value and run length (calling them cell and width) then returns the (cell,width) Tuple.
This is also why I used two parameter blocks in defining the functions, so the second param can be passed in { braces }, and makes the whole thing all nice and DSL-like.
Another very important principle illustrated here is that the type inferencer works on parameter blocks in succession, so for this (remember it?):
def withRowRunLengths[T,U](grid: Grid[T])(func: (T,Int)=>U)
The type T will be determined by the supplied grid. The compiler then knows the input type for the function supplied as the second argument, - Int in this case, as the first param was a Grid[Int] - which is why I was able to the write case (cell,width) => (cell,width) and not have to explicitly state anywhere that cell and width are integers. In the second usage, a Grid[(Int,Int)] is supplied, this fits the closure case ((cell,width),height) => Rect(width,height) and again, it just works.
If that closure had contained anything that wouldn't work for the underlying type of the grid then the compiler would have complained, this is what type safety is all about...
Having calculated all the possible rectangles, all that remains is to gather up the data and present it in a format that's more practical for analysing. Because the nesting at this stage could get very messy, I defined another datatype:
case class Cell[T](
value: T,
coords: (Int,Int) = (0,0),
widest: Rect = Rect.empty,
tallest: Rect = Rect.empty
)
Nothing special here, just a case class with named/default parameters. I'm also glad I had the foresight to define Rect.empty above :)
Now mix the 4 datasets (input vals, coords, widest rects, tallest rects), gradually fold into the cells, stir gently, and serve:
val cellsWithCoords = input.zipWithCoords deepMap {
case (v,(x,y)) => Cell(value=v, coords=(x,y))
}
val cellsWithWidest = cellsWithCoords deepZip stage2w deepMap {
case (cell,rect) => cell.copy(widest=rect)
}
val cellsWithWidestAndTallest = cellsWithWidest deepZip stage2t deepMap {
case (cell,rect) => cell.copy(tallest=rect)
}
val results = (cellsWithWidestAndTallest deepMap {
case Cell(value, coords, widest, tallest) =>
List((widest, value, coords), (tallest, value, coords))
}
).flatten.flatten
The last stage is interesting there, it converts each cell to a size-2 list of (rectangle, value, coords) tuples (size 2 because we have both widest and tallest rects for each cell). Calling flatten twice then takes the resulting List[List[List[_]]] back down to a single List. There's no need to retain the 2D structure any more, as the necessary coordinates are already embedded in the results.
Voila!
It's now up to you what you now do with this List. The next stage is probably some form of sorting and duplicate removal...
You can do it in O(n^2) relatively easy.
First, some-precalculations. For each cell in matrix, calculate how many consecutive cells below it have the same number.
For your example, resulting matrix a (couldn't think of better name :/) will look like this
0 0 0 0 0 2 2
1 1 2 2 2 1 1
0 0 1 1 1 0 0
1 1 0 0 0 1 1
0 0 0 0 0 0 0
It can be produced in O(n^2) easily.
Now, for each row i, let's find all rectangles with top side in row i (and bottom side in row i + height - 1).
Here's an illustration for i = 1
0 0 0 0 0 0 0
-------------
4 4 2 2 2 0 0
4 4 2 2 2 0 0
0 0 2 2 2 1 1
-------------
0 0 0 0 0 1 1
Now, the main idea
int current_width = 0;
for (int j = 0; j < matrix.width; ++j) {
if (a[i][j] < height - 1) {
// this column has different numbers in it, no game
current_width = 0;
continue;
}
if (current_width > 0) {
// this column should consist of the same numbers as the one before
if (matrix[i][j] != matrix[i][j - 1]) {
current_width = 1; // start streak anew, from the current column
continue;
}
}
++current_width;
if (current_width >= width) {
// we've found a rectangle!
}
}
In the example above (i = 1) current_width after each iteration will be 0, 0, 1, 2, 3, 0, 0.
Now, we need to iterate over all possible i and we have a solution.
I'll play the devil's advocate here. I'll show Nikita's exact algorithm written in a functional style. I'll also parallelize it, just to show it can be done.
First, computing consecutive cells with the same number below a cell. I made a slight change by returning all values plus one compared to Nikita's proposed output, to avoid a - 1 in some other part of the code.
def computeHeights(column: Array[Int]) = (
column
.reverse
.sliding(2)
.map(pair => pair(0) == pair(1))
.foldLeft(List(1)) (
(list, flag) => (if (flag) list.head + 1 else 1) :: list
)
)
I'd rather use .view before reversing, but that doesn't work with the present Scala version. If it did, it would save repeated array creations, which ought to speed up the code a lot, for memory locality and bandwidth reasons if no other.
Now, all columns at the same time:
import scala.actors.Futures.future
def getGridHeights(grid: Array[Array[Int]]) = (
grid
.transpose
.map(column => future(computeHeights(column)))
.map(_())
.toList
.transpose
)
I'm not sure if the parallelization overhead will pay off here or not, but this is the first algorithm on Stack Overflow where it actually has a chance, given the non-trivial effort in computing a column. Here's another way of writing it, using the upcoming 2.9 features (it might work on Scala 2.8.1 -- not sure what :
def getGridHeights(grid: Array[Array[Int]]) = (
grid
.transpose
.toParSeq
.map(computeHeights)
.toList
.transpose
)
Now, the meat of Nikita's algorithm:
def computeWidths(height: Int, row: Array[Int], heightRow: List[Int]) = (
row
.sliding(2)
.zip(heightRow.iterator)
.toSeq
.reverse
.foldLeft(List(1)) { case (widths # (currWidth :: _), (Array(prev, curr), currHeight)) =>
(
if (currHeight >= height && currWidth > 0 && prev == curr) currWidth + 1
else 1
) :: widths
}
.toArray
)
I'm using pattern matching in this bit of code, though I'm concerned with its speed, because with all the sliding, and zipping and folding there's two many things being juggled here. And, speaking of performance, I'm using Array instead of IndexedSeq because Array is the only type in the JVM that is not erased, resulting in much better performance with Int. And, then, there's the .toSeq I'm not particular happy about either, because of memory locality and bandwidth.
Also, I'm doing this from right to left instead of Nikita's left to right, just so I can find the upper left corner.
However, is the same thing as the code in Nikita's answer, aside from the fact that I'm still adding one to current width compared to his code, and not printing the result right here. There's a bunch of things without clear origins here, though -- row, heightRow, height... Let's see this code in context -- and parallelized! -- to get the overall picture.
def getGridWidths(height: Int, grid: Array[Array[Int]]) = (
grid
.zip(getGridHeights(grid))
.map { case (row, heightsRow) => future(computeWidths(height, row, heightsRow)) }
.map(_())
)
And the 2.9 version:
def getGridWidths(height: Int, grid: Array[Array[Int]]) = (
grid
.toParSeq
.zip(getGridHeights(grid))
.map { case (row, heightsRow) => computeWidths(height, row, heightsRow) }
)
And, for the gran finale,
def findRectangles(height: Int, width: Int, grid: Array[Array[Int]]) = {
val gridWidths = getGridWidths(height, grid)
for {
y <- gridWidths.indices
x <- gridWidths(y).indices
if gridWidths(y)(x) >= width
} yield (x, y)
}
So... I have no doubt that the imperative version of Nikita's algorithm is faster -- it only uses Array, which is much faster with primitives than any other type, and it avoids massive creation of temporary collections -- though Scala could have done better here. Also, no closures -- as much as they help, they are slower than code without closures. At least until JVM grow something to help with them.
Also, Nikita's code can be easily parallelized with threads -- of all things! -- with little trouble.
But my point here is that Nikita's code is not particularly bad just because it uses arrays and a mutable variable here and there. The algorithm translates cleanly to a more functional style.
EDIT
So, I decided to take a shot at making an efficient functional version. It's not really fully functional because I use Iterator, which is mutable, but it's close enough. Unfortunately, it won't work on Scala 2.8.1, because it lacks scanLeft on Iterator.
There are two other unfortunate things here. First, I gave up on parallelization of grid heights, since I couldn't get around having at least one transpose, with all the collection copying that entails. There's still at least one copying of the data, though (see toArray call to understand where).
Also, since I'm working with Iterable I loose the ability to use the parallel collections. I do wonder if the code couldn't be made better by having grid be a parallel collection of parallel collections from the beginning.
I have no clue if this is more efficient than the previous version of not. It's an interesting question...
def getGridHeights(grid: Array[Array[Int]]) = (
grid
.sliding(2)
.scanLeft(Array.fill(grid.head.size)(1)) { case (currHeightArray, Array(prevRow, nextRow)) =>
(prevRow, nextRow, currHeightArray)
.zipped
.map { case (x, y, currHeight) => if (x == y) currHeight + 1 else 1 }
}
)
def computeWidths(height: Int, row: Array[Int], heightRow: Array[Int]) = (
row
.sliding(2)
.map { case Array(x, y) => x == y }
.zip(heightRow.iterator)
.scanLeft(1) { case (currWidth , (isConsecutive, currHeight)) =>
if (currHeight >= height && currWidth > 0 && isConsecutive) currWidth + 1
else 1
}
.toArray
)
import scala.actors.Futures.future
def getGridWidths(height: Int, grid: Array[Array[Int]]) = (
grid
.iterator
.zip(getGridHeights(grid))
.map { case (row, heightsRow) => future(computeWidths(height, row, heightsRow)) }
.map(_())
.toArray
)
def findRectangles(height: Int, width: Int, grid: Array[Array[Int]]) = {
val gridWidths = getGridWidths(height, grid)
for {
y <- gridWidths.indices
x <- gridWidths(y).indices
if gridWidths(y)(x) >= width
} yield (x - width + 1, y - height + 1)
}
I'm trying to write a piece of code that will do the following:
Take the numbers 0 to 9 and assign one or more letters to this number. For example:
0 = N,
1 = L,
2 = T,
3 = D,
4 = R,
5 = V or F,
6 = B or P,
7 = Z,
8 = H or CH or J,
9 = G
When I have a code like 0123, it's an easy job to encode it. It will obviously make up the code NLTD. When a number like 5,6 or 8 is introduced, things get different. A number like 051 would result in more than one possibility:
NVL and NFL
It should be obvious that this gets even "worse" with longer numbers that include several digits like 5,6 or 8.
Being pretty bad at mathematics, I have not yet been able to come up with a decent solution that will allow me to feed the program a bunch of numbers and have it spit out all the possible letter combinations. So I'd love some help with it, 'cause I can't seem to figure it out. Dug up some information about permutations and combinations, but no luck.
Thanks for any suggestions/clues. The language I need to write the code in is PHP, but any general hints would be highly appreciated.
Update:
Some more background: (and thanks a lot for the quick responses!)
The idea behind my question is to build a script that will help people to easily convert numbers they want to remember to words that are far more easily remembered. This is sometimes referred to as "pseudo-numerology".
I want the script to give me all the possible combinations that are then held against a database of stripped words. These stripped words just come from a dictionary and have all the letters I mentioned in my question stripped out of them. That way, the number to be encoded can usually easily be related to a one or more database records. And when that happens, you end up with a list of words that you can use to remember the number you wanted to remember.
It can be done easily recursively.
The idea is that to handle the whole code of size n, you must handle first the n - 1 digits.
Once you have all answers for n-1 digits, the answers for the whole are deduced by appending to them the correct(s) char(s) for the last one.
There's actually a much better solution than enumerating all the possible translations of a number and looking them up: Simply do the reverse computation on every word in your dictionary, and store the string of digits in another field. So if your mapping is:
0 = N,
1 = L,
2 = T,
3 = D,
4 = R,
5 = V or F,
6 = B or P,
7 = Z,
8 = H or CH or J,
9 = G
your reverse mapping is:
N = 0,
L = 1,
T = 2,
D = 3,
R = 4,
V = 5,
F = 5,
B = 6,
P = 6,
Z = 7,
H = 8,
J = 8,
G = 9
Note there's no mapping for 'ch', because the 'c' will be dropped, and the 'h' will be converted to 8 anyway.
Then, all you have to do is iterate through each letter in the dictionary word, output the appropriate digit if there's a match, and do nothing if there isn't.
Store all the generated digit strings as another field in the database. When you want to look something up, just perform a simple query for the number entered, instead of having to do tens (or hundreds, or thousands) of lookups of potential words.
The general structure you want to hold your number -> letter assignments is an array or arrays, similar to:
// 0 = N, 1 = L, 2 = T, 3 = D, 4 = R, 5 = V or F, 6 = B or P, 7 = Z,
// 8 = H or CH or J, 9 = G
$numberMap = new Array (
0 => new Array("N"),
1 => new Array("L"),
2 => new Array("T"),
3 => new Array("D"),
4 => new Array("R"),
5 => new Array("V", "F"),
6 => new Array("B", "P"),
7 => new Array("Z"),
8 => new Array("H", "CH", "J"),
9 => new Array("G"),
);
Then, a bit of recursive logic gives us a function similar to:
function GetEncoding($number) {
$ret = new Array();
for ($i = 0; $i < strlen($number); $i++) {
// We're just translating here, nothing special.
// $var + 0 is a cheap way of forcing a variable to be numeric
$ret[] = $numberMap[$number[$i]+0];
}
}
function PrintEncoding($enc, $string = "") {
// If we're at the end of the line, then print!
if (count($enc) === 0) {
print $string."\n";
return;
}
// Otherwise, soldier on through the possible values.
// Grab the next 'letter' and cycle through the possibilities for it.
foreach ($enc[0] as $letter) {
// And call this function again with it!
PrintEncoding(array_slice($enc, 1), $string.$letter);
}
}
Three cheers for recursion! This would be used via:
PrintEncoding(GetEncoding("052384"));
And if you really want it as an array, play with output buffering and explode using "\n" as your split string.
This kind of problem are usually resolved with recursion. In ruby, one (quick and dirty) solution would be
#values = Hash.new([])
#values["0"] = ["N"]
#values["1"] = ["L"]
#values["2"] = ["T"]
#values["3"] = ["D"]
#values["4"] = ["R"]
#values["5"] = ["V","F"]
#values["6"] = ["B","P"]
#values["7"] = ["Z"]
#values["8"] = ["H","CH","J"]
#values["9"] = ["G"]
def find_valid_combinations(buffer,number)
first_char = number.shift
#values[first_char].each do |key|
if(number.length == 0) then
puts buffer + key
else
find_valid_combinations(buffer + key,number.dup)
end
end
end
find_valid_combinations("",ARGV[0].split(""))
And if you run this from the command line you will get:
$ ruby r.rb 051
NVL
NFL
This is related to brute-force search and backtracking
Here is a recursive solution in Python.
#!/usr/bin/env/python
import sys
ENCODING = {'0':['N'],
'1':['L'],
'2':['T'],
'3':['D'],
'4':['R'],
'5':['V', 'F'],
'6':['B', 'P'],
'7':['Z'],
'8':['H', 'CH', 'J'],
'9':['G']
}
def decode(str):
if len(str) == 0:
return ''
elif len(str) == 1:
return ENCODING[str]
else:
result = []
for prefix in ENCODING[str[0]]:
result.extend([prefix + suffix for suffix in decode(str[1:])])
return result
if __name__ == '__main__':
print decode(sys.argv[1])
Example output:
$ ./demo 1
['L']
$ ./demo 051
['NVL', 'NFL']
$ ./demo 0518
['NVLH', 'NVLCH', 'NVLJ', 'NFLH', 'NFLCH', 'NFLJ']
Could you do the following:
Create a results array.
Create an item in the array with value ""
Loop through the numbers, say 051 analyzing each one individually.
Each time a 1 to 1 match between a number is found add the correct value to all items in the results array.
So "" becomes N.
Each time a 1 to many match is found, add new rows to the results array with one option, and update the existing results with the other option.
So N becomes NV and a new item is created NF
Then the last number is a 1 to 1 match so the items in the results array become
NVL and NFL
To produce the results loop through the results array, printing them, or whatever.
Let pn be a list of all possible letter combinations of a given number string s up to the nth digit.
Then, the following algorithm will generate pn+1:
digit = s[n+1];
foreach(letter l that digit maps to)
{
foreach(entry e in p(n))
{
newEntry = append l to e;
add newEntry to p(n+1);
}
}
The first iteration is somewhat of a special case, since p-1 is undefined. You can simply initialize p0 as the list of all possible characters for the first character.
So, your 051 example:
Iteration 0:
p(0) = {N}
Iteration 1:
digit = 5
foreach({V, F})
{
foreach(p(0) = {N})
{
newEntry = N + V or N + F
p(1) = {NV, NF}
}
}
Iteration 2:
digit = 1
foreach({L})
{
foreach(p(1) = {NV, NF})
{
newEntry = NV + L or NF + L
p(2) = {NVL, NFL}
}
}
The form you want is probably something like:
function combinations( $str ){
$l = len( $str );
$results = array( );
if ($l == 0) { return $results; }
if ($l == 1)
{
foreach( $codes[ $str[0] ] as $code )
{
$results[] = $code;
}
return $results;
}
$cur = $str[0];
$combs = combinations( substr( $str, 1, $l ) );
foreach ($codes[ $cur ] as $code)
{
foreach ($combs as $comb)
{
$results[] = $code.$comb;
}
}
return $results;}
This is ugly, pidgin-php so please verify it first. The basic idea is to generate every combination of the string from [1..n] and then prepend to the front of all those combinations each possible code for str[0]. Bear in mind that in the worst case this will have performance exponential in the length of your string, because that much ambiguity is actually present in your coding scheme.
The trick is not only to generate all possible letter combinations that match a given number, but to select the letter sequence that is most easy to remember. A suggestion would be to run the soundex algorithm on each of the sequence and try to match against an English language dictionary such as Wordnet to find the most 'real-word-sounding' sequences.