Scala Filter and Collect is slow - performance

I am just beginning with Scala development and am trying to filter out unnecessary lines from an iterator using filter and collect. But the operation seems to be too slow.
val src = Source.fromFile("/home/Documents/1987.csv") // 1.2 Million
val iter = src.getLines().map(_.split(":"))
val iter250 = iter.take(250000) // Only interested in the first 250,000
val intrestedIndices = range(1, 100000, 3).toSeq // This could be any order
val slicedData = iter250.zipWithIndex
// Takes 3 minutes
val firstCase = slicedData.collect { case (x, i) if intrestedIndices.contains(i) => x }.size
// Takes 3 minutes
val secondCase = slicedData.filter(x => intrestedIndices.contains(x._2)).size
// Takes 1 second
val thirdCase = slicedData.collect { case (x,i ) if i % 3 == 0 => x}.size
It appears the intrestedIndices.contains(_) part is slowing down the program in the first and second cases. Is there an alternative way to speed this process up?

This answer helped solve the problem.
You iterate over all interestedIndices in first two cases in linear time. Use Set instead of Seq to improve performance – Sergey Lagutin
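A minimal sketch of the suggestion above, assuming the indices fit in memory as a Set. The names IndexFilter and keepIndices are made up for illustration; they are not from the question.

```scala
object IndexFilter {
  // Keep only the elements whose position is in `wanted`.
  // `wanted` is converted to a Set once, so each membership test is
  // effectively constant time instead of a linear scan of a Seq.
  def keepIndices[T](it: Iterator[T], wanted: Seq[Int]): Iterator[T] = {
    val wantedSet = wanted.toSet
    it.zipWithIndex.collect { case (x, i) if wantedSet.contains(i) => x }
  }
}
```

For example, IndexFilter.keepIndices(Iterator("a", "b", "c", "d", "e"), Seq(3, 0)).toList gives List(a, d). Note also that an Iterator can only be traversed once, so each of the three cases in the question would need a fresh iterator to be measured fairly.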

For the record, here's a method to filter with an (ordered) Seq of indices, not necessarily equidistant, without scanning the indices at each step:
def filterInteresting[T](it: Iterator[T], indices: Seq[Int]): Iterator[T] =
  it.zipWithIndex.scanLeft((indices, None: Option[T])) {
    case ((remaining, _), (elem, index)) => remaining match {
      // `+:` matches any Seq; `::` would only match a List
      case h +: t if h == index => (t, Some(elem))
      case l => (l, None)
    }
  }.map(_._2).flatten

Related

Why my Binary Search implementation in Scala is so slow?

Recently, I implemented this Binary Search, which is supposed to run under 6 seconds for Scala, yet it runs for 12-13 seconds on the machine that checks the assignments.
Note before you read the code: the input consists of two lines: first - list of numbers to search in, and second - list of "search terms" to search in the list of numbers. Expected output just lists the indexes of each term in the list of numbers. Each input can be maximum of length 10^5 and each number maximum of size 10^9.
For example:
Input:
5 1 5 8 12 13 // note that the first number, 5, indicates the length of the following sequence
5 8 1 23 1 11 // note that the first number, 5, indicates the length of the following sequence
Output:
2 0 -1 0 -1 // index of each term in the input array
My solution:
import scala.annotation.tailrec
object BinarySearch extends App {
  val n_items = readLine().split(" ").map(BigInt(_))
  val n = n_items(0)
  val items = n_items.drop(1)
  val k :: terms = readLine().split(" ").map(BigInt(_)).toList
  println(search(terms, items).mkString(" "))
  def search(terms: List[BigInt], items: Array[BigInt]): Array[BigInt] = {
    @tailrec
    def go(terms: List[BigInt], results: Array[BigInt]): Array[BigInt] = terms match {
      case List() => results
      case head :: tail => go(tail, results :+ find(head))
    }
    def find(term: BigInt): BigInt = {
      @tailrec
      def go(left: BigInt, right: BigInt): BigInt = {
        if (left > right) { -1 }
        else {
          val middle = left + (right - left) / 2
          val middle_val = items(middle.toInt)
          middle_val match {
            case m if m == term => middle
            case m if m <= term => go(middle + 1, right)
            case m if m > term => go(left, middle - 1)
          }
        }
      }
      go(0, n - 1)
    }
    go(terms, Array())
  }
}
What makes this code so slow? Thank you
I am worried about the complexity of
results :+ find(head)
Appending an item to an Array of length L with :+ copies the whole array, so it is O(L). If you have n results to compute, the total complexity will be O(n*n).
Try using a mutable ArrayBuffer instead of an Array to accumulate the results, or simply mapping the input terms through the find function.
In other words replace
go(terms, Array())
with
terms.map( x => find(x) ).toArray
By the way, the limits on the problem are small enough that using BigInt is overkill and probably making the code significantly slower. Normal ints should be large enough for this problem.
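Putting both suggestions together, a sketch could look like the following. It assumes the items are already sorted and fit in Int (the stated limit of 10^9 is below Int.MaxValue); FastSearch is a hypothetical name, and input parsing is left out.

```scala
object FastSearch {
  // Classic binary search over a sorted Array[Int]; returns the index of
  // `term`, or -1 when it is absent.
  def find(items: Array[Int], term: Int): Int = {
    var left = 0
    var right = items.length - 1
    while (left <= right) {
      val middle = left + (right - left) / 2
      if (items(middle) == term) return middle
      else if (items(middle) < term) left = middle + 1
      else right = middle - 1
    }
    -1
  }

  // One pass over the terms; no repeated array copying.
  def search(terms: Array[Int], items: Array[Int]): Array[Int] =
    terms.map(find(items, _))
}
```

With the sample input, FastSearch.search(Array(8, 1, 23, 1, 11), Array(1, 5, 8, 12, 13)).mkString(" ") yields 2 0 -1 0 -1.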

Scala best way to find pairs in a collection [duplicate]

I'm trying to find the most optimal way of finding pairs in a Scala collection. For example,
val list = List(1,2,3)
should produce these pairs
(1,2) (1,3) (2,1) (2,3) (3,1) (3,2)
My current implementation seems quite expensive. How can I further optimize this?
val pairs = list.flatMap { currentElement =>
  val clonedList: mutable.ListBuffer[Int] = list.to[ListBuffer]
  val currentIndex = list.indexOf(currentElement)
  val removedValue = clonedList.remove(currentIndex)
  clonedList.map { y =>
    (currentElement, y)
  }
}
val l = Array(1,2,3,4)
val result = scala.collection.mutable.HashSet[(Int, Int)]()
for(i <- 0 until l.size) {
  for(j<- (i+1) until l.size) {
    result += l(i)->l(j)
    result += l(j)->l(i)
  }
}
Several optimizations here. First, with the second loop, we only traverse the list from the current element to the end, dividing the number of iterations by two. Then we limit the number of object creations to the minimum (Only tuples are created and added to a mutable hashset). Finally, with the hashset you handle the duplicates for free. An additional optimization would be to check if the set already contains the tuple to avoid creating an object for nothing.
For 1,000 elements, it takes less than 1s on my laptop. 7s for 10k distinct elements.
Note that recursively, you could do it this way:
def combi(s : Seq[Int]) : Seq[(Int, Int)] =
  if(s.isEmpty)
    Seq()
  else
    s.tail.flatMap(x=> Seq(s.head -> x, x -> s.head)) ++ combi(s.tail)
It takes a little bit more than 1s for 1000 elements.
"Most optimal" can be read in different ways (most of the time I take it to mean whatever lets me be most productive), so I suggest the following approach:
val originalList = (1 to 1000) toStream
def orderedPairs[T](list: Stream[T]) = list.combinations(2).map( p => (p(0), p(1)) ).toStream
val pairs = orderedPairs(originalList) ++ orderedPairs(originalList.reverse)
println(pairs.slice(0, 1000).toList)
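For comparison, here is a short sketch that pairs elements by position with a for-comprehension. It is only one possible reading of the question: unlike the HashSet version it keeps duplicate pairs when the input contains repeated values, and the name orderedPairs inside the hypothetical object Pairs is not from any answer above.

```scala
object Pairs {
  // All ordered pairs of elements taken from two distinct positions.
  def orderedPairs[T](xs: Seq[T]): Seq[(T, T)] = {
    val v = xs.toVector // indexed access in effectively constant time
    for {
      i <- v.indices
      j <- v.indices
      if i != j
    } yield (v(i), v(j))
  }
}
```

Pairs.orderedPairs(List(1, 2, 3)) produces (1,2) (1,3) (2,1) (2,3) (3,1) (3,2), in the order listed in the question.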

scala version of swap algorithm for null models

The problem I am having is with trying to find an efficient way to find swappable elements in a matrix in order to implement a swap algorithm for null model creation.
The matrix consists of 0's and 1's and the idea is that elements can be switched between columns so that the row and column totals of the matrix remain the same.
For example, given the following matrix:
     c1 c2 c3 c4
r1    0  1  0  0  = 1
r2    1  0  0  1  = 2
r3    0  0  0  0  = 0
r4    1  1  1  1  = 4
     ------------
      2  2  1  2
columns c2 and c4 in r1 and r2 can each be swapped in such a way that totals are not altered i.e.:
     c1 c2 c3 c4
r1    0  0  0  1  = 1
r2    1  1  0  0  = 2
r3    0  0  0  0  = 0
r4    1  1  1  1  = 4
     ------------
      2  2  1  2
This all needs to be done randomly so as not to introduce any bias.
I have one solution that works. I randomly select a row and two columns. If they yield a 10 or 01 pattern then I randomly select another row and check the same columns to see if they yield the opposite pattern. If either of them fail I start over and select a new element.
This method works but I only "hit" the correct patterns about 10% of the time. In a large matrix or in one with few 1's in the rows I waste a lot of time "missing". I figured that there had to be a more intelligent way of choosing elements in the matrix but still doing it randomly.
The code for the working method is:
def isSwappable(matrix: Matrix): Tuple2[Tuple2[Int, Int], Tuple2[Int, Int]] = {
  val indices = getRowAndColIndices(matrix)
  (matrix(indices._1._1)(indices._2._1), matrix(indices._1._1)(indices._2._2)) match {
    case (1, 0) => {
      if (matrix(indices._1._2)(indices._2._1) == 0 & matrix(indices._1._2)(indices._2._2) == 1) {
        indices
      }
      else {
        isSwappable(matrix)
      }
    }
    case (0, 1) => {
      if (matrix(indices._1._2)(indices._2._1) == 1 & matrix(indices._1._2)(indices._2._2) == 0) {
        indices
      }
      else {
        isSwappable(matrix)
      }
    }
    case _ => {
      isSwappable(matrix)
    }
  }
}
def getRowAndColIndices(matrix: Matrix): Tuple2[Tuple2[Int, Int], Tuple2[Int, Int]] = {
  (getNextIndex(rnd.nextInt(matrix.size), matrix.size), getNextIndex(rnd.nextInt(matrix(0).size), matrix(0).size))
}
def getNextIndex(i: Int, constraint: Int): Tuple2[Int, Int] = {
  val newIndex = rnd.nextInt(constraint)
  newIndex match {
    case `i` => getNextIndex(i, constraint)
    case _ => (i, newIndex)
  }
}
I figured a more efficient way to handle this was to remove any rows that could not be used (all 1's or all 0's) and then choose an element randomly. From there I could filter out any columns in the row that had the same value and then choose from the remaining columns.
Once the first row and column are chosen I then filter out the rows that can not provide the required pattern and then choose from the remaining rows.
This works for the most part but the problem that I can't figure out how to deal with is what happens when there are no columns or rows to choose from? I don't want to loop infinitely trying to find the pattern I need and I need a way of starting over if I do get an empty list of rows or columns to choose from.
The code that I have so far that sort of works (until I get an empty list) is:
def getInformativeRowIndices(matrix: Matrix) = (
  matrix
    .zipWithIndex
    .filter(_._1.distinct.size > 1)
    .map(_._2)
    .toList
)
def getRowsWithOppositeValueInColumn(col: Int, value: Int, matrix: Matrix) = (
  matrix
    .zipWithIndex
    .filter(_._1(col) != value)
    .map(_._2)
    .toList
)
def getColsWithOppositeValueInSameRow(row: Int, value: Int, matrix: Matrix) = (
  matrix(row)
    .zipWithIndex
    .filter(_._1 != value)
    .map(_._2)
    .toList
)
def process(matrix: Matrix): Tuple2[Tuple2[Int, Int], Tuple2[Int, Int]] = {
  val row1Indices = getInformativeRowIndices(matrix)
  if (row1Indices.isEmpty) sys.error("No informative rows")
  val row1 = row1Indices(rnd.nextInt(row1Indices.size))
  val col1 = rnd.nextInt(matrix(0).size)
  val colIndices = getColsWithOppositeValueInSameRow(row1, matrix(row1)(col1), matrix)
  if (colIndices.isEmpty) process(matrix)
  val col2 = colIndices(rnd.nextInt(colIndices.size))
  val row2Indices = getRowsWithOppositeValueInColumn(col1, matrix(row1)(col1), matrix)
    .intersect(getRowsWithOppositeValueInColumn(col2, matrix(row1)(col2), matrix))
  println(row2Indices)
  if (row2Indices.isEmpty) process(matrix)
  val row2 = row2Indices(rnd.nextInt(row2Indices.size))
  ((row1, row2), (col1, col2))
}
I think the recursive methods are wrong and don't really work here. Also, I am really just trying to improve the speed of cell selection so any ideas or suggestions would be greatly appreciated.
EDIT:
I have had a chance to play with this a little more and have come up with another solution, but it does not seem to be much faster than just randomly choosing cells in the matrix. Also, I should add that the matrix needs to be swapped about 30000 times in succession in order for it to be considered randomised, and I need to generate 5000 random matrices for each test, of which I have at least another 5000 to do, so performance is kind of important.
The current solution (besides random cell selection) is:
Randomly select 2 rows from the matrix
subtract one row from the other and put it in an Array
if the new Array contains both a 1 and -1 then we can swap
The logic of the subtraction looks like this:
    0  1  0  0
-   1  0  0  1
---------------
   -1  1  0 -1
The method that does this looks like this:
def findSwaps(matrix: Matrix, iterations: Int): Boolean = {
  var result = false
  val mtxLength = matrix.length
  val row1 = rnd.nextInt(mtxLength)
  val row2 = getNextIndex(row1, mtxLength)
  val difference = subRows(matrix(row1), matrix(row2))
  if (difference.min == -1 & difference.max == 1) {
    val zeroOne = difference.zipWithIndex.filter(_._1 == -1).map(_._2)
    val oneZero = difference.zipWithIndex.filter(_._1 == 1).map(_._2)
    val col1 = zeroOne(rnd.nextInt(zeroOne.length))
    val col2 = oneZero(rnd.nextInt(oneZero.length))
    swap(matrix, row1, row2, col1, col2)
    result = true
  }
  result
}
The matrix row subtraction looks like this:
def subRows(a: Array[Int], b: Array[Int]): Array[Int] = (a, b).zipped.map(_ - _)
And the actual swap looks like this:
def swap(matrix: Matrix, row1: Int, row2: Int, col1: Int, col2: Int) = {
  val temp = (matrix(row1)(col1), matrix(row1)(col2))
  matrix(row1)(col1) = matrix(row2)(col1)
  matrix(row1)(col2) = matrix(row2)(col2)
  matrix(row2)(col1) = temp._1
  matrix(row2)(col2) = temp._2
  matrix
}
This works much better than before in that I get between 80% and 90% success for an attempted swap (it was only about 10% with the random cell selection). However, it is still taking about 2.5 minutes to generate 1000 randomised matrices.
Any ideas on how to improve the speed?
I'm going to assume the matrices are big so that storage of the order of (matrix size squared) is not viable (for reasons of either speed or memory).
If you have a sparse matrix, you can enter the index of each 1 in each column in a set (here I show the compact way to do things, but you may wish to iterate with while loops for speed):
val mtx = Array(Array(0,1,0,0),Array(1,0,0,1),Array(0,0,0,0),Array(1,1,1,1))
val cols = mtx.transpose.map(x => x.zipWithIndex.filter(_._1==1).map(_._2).toSet)
Now for each column, a later column contains compatible pairs (at least one) if and only if both of the following two sets are nonempty:
def xorish(a: Set[Int], b: Set[Int]) = (a--b, b--a)
So the answer will involve computing these sets and testing whether they're both nonempty.
Now the question is what you mean by "sample randomly". Randomly sampling single 1,0 pairs is not the same as randomly sampling possible swaps. To see this, consider the following:
1 0 1 0
1 0 1 0
1 0 1 0
0 1 1 0
0 1 1 0
0 1 0 1
The two columns on the left have nine possible swaps. The two on the right have only five possible swaps. But if you are looking for (1,0) patterns, you will sample only three times on the left vs. five on the right; if you are looking for either (1,0) or (0,1), you will sample six and six, which again distorts the probabilities. The only way to fix this is either to not be clever, and randomly sample a second time (which in the first case will work out with a usable swap 3/5 of the time, while only 1/5 in the second), or to basically compute every possible pair for swapping (or at least how many pairs there are) and select from that predefined set.
If we want to do the latter, we note that for each pair of nonidentical columns, we can compute the two sets to swap among, and we know the sizes and the product is the total number of possibilities. In order to avoid instantiating all the possibilities, we can create
val poss = {
  for (i<-cols.indices; j <- (i+1) until cols.length) yield
    (i, j, (cols(i)--cols(j)).toArray, (cols(j)--cols(i)).toArray)
}.filter{ case (_,_,a,b) => a.length>0 && b.length>0 }
and then count how many there are:
val cuml = poss.map{ case (_,_,a,b) => a.size*b.size }.scanLeft(0)(_ + _).toArray
Now to pick a number at random, we pick a number between 0 and cuml.last and pick out which bucket this is and which item within the bucket:
def pickItem(cuml: Array[Int], poss: Seq[(Int,Int,Array[Int],Array[Int])]) = {
  val n = util.Random.nextInt(cuml.last)
  val k = {
    val i = java.util.Arrays.binarySearch(cuml,n)
    if (i<0) -i-2 else i
  }
  val j = n - cuml(k)
  val bucket = poss(k)
  (
    bucket._1, bucket._2,
    bucket._3(j % bucket._3.size), bucket._4(j / bucket._3.size)
  )
}
This ends up returning (c1,c2,r1,r2) selected randomly.
Now that you have the coordinates, you can create the new matrix however you wish. (Most efficient is probably to do an in-place swap of the entries, and then swap back when you want to try again.)
Note that this is only sensible for a large number of independent swaps from the same starting matrix. If you instead want to do this iteratively and maintain independence, you are probably best off doing this randomly after all unless the matrices are extremely sparse, at which point it's worth simply storing the matrices in some standard sparse matrix format (i.e. by index of nonzero entries) and doing your manipulation on those (probably with mutable sets and an update strategy, since the consequences of a single swap are confined to about n of the entries in an n*n matrix).
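The swap counts quoted for the six-row example in this answer (nine for the two left columns, five for the two right ones) can be checked with a small sketch. It assumes a dense Array[Array[Int]] matrix; SwapCount, swaps and example are hypothetical names used only here.

```scala
object SwapCount {
  // Number of valid swaps between columns c1 and c2: every row showing
  // (1,0) can be paired with every row showing (0,1).
  def swaps(m: Array[Array[Int]], c1: Int, c2: Int): Int = {
    val oneZero = m.count(r => r(c1) == 1 && r(c2) == 0)
    val zeroOne = m.count(r => r(c1) == 0 && r(c2) == 1)
    oneZero * zeroOne
  }

  // The six-row matrix from the answer.
  val example: Array[Array[Int]] = Array(
    Array(1, 0, 1, 0),
    Array(1, 0, 1, 0),
    Array(1, 0, 1, 0),
    Array(0, 1, 1, 0),
    Array(0, 1, 1, 0),
    Array(0, 1, 0, 1)
  )
}
```

SwapCount.swaps(SwapCount.example, 0, 1) is 9 and SwapCount.swaps(SwapCount.example, 2, 3) is 5, which are the same products a.size*b.size that feed the cumulative array cuml.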

Algorithm for combining different age groups together based on their values

Let's say we have an array of age groups and an array of the number of people in each age group
For example:
Ages = ("1-13", "14-20", "21-30", "31-40", "41-50", "51+")
People = (1, 10, 21, 3, 2, 1)
I want to have an algorithm that combines these age groups with the following logic if there are fewer than 5 people in each group. The algorithm that I have so far does the following:
Start from the last element (e.g., "51+") can you combine it with the next group? (here "41-50") if yes add the numbers 1+2 and combine their labels. So we get the following
Ages = ("1-13", "14-20", "21-30", "31-40", "41+")
People = (1, 10, 21, 3, 3)
Take the last one again (here is "41+"). Can you combine it with the next group (31-40)? the answer is yes so we get:
Ages = ("1-13", "14-20", "21-30", "31+")
People = (1, 10, 21, 6)
since the group 31+ now has 6 members we cannot collapse it into the next group.
we cannot collapse "21-30" into the next one "14-20" either
"14-20" also has 10 people (>5) so we don't do anything on this either
for the first one ("1-13") since we have only one person and it is the last group we combine it with the next group "14-20" and get the following
Ages = ("1-20", "21-30", "31+")
People = (11, 21, 6)
I have an implementation of this algorithm that uses many flags to keep track of whether or not any data is changed and it makes a number of passes on the two arrays to finish this task.
My question is if you know any efficient way of doing the same thing? any data structure that can help? any algorithm that can help me do the same thing without doing too much bookkeeping would be great.
Update:
A radical example would be (5,1,5)
in the first pass it becomes (5,6) [collapsing the one on the right into the one in the middle]
then we have (5,6). We cannot touch 6 since it is larger than our threshold:5. so we go to the next one (which is element on the very left 5) since it is less than or equal to 5 and since it is the last one on the left we group it with the one on its right. so we finally get (11)
Here is an OCaml solution of a left-to-right merge algorithm:
let close_group acc cur_count cur_names =
  (List.rev cur_names, cur_count) :: acc
let merge_small_groups mini l =
  let acc, cur_count, cur_names =
    List.fold_left (
      fun (acc, cur_count, cur_names) (name, count) ->
        if cur_count <= mini || count <= mini then
          (acc, cur_count + count, name :: cur_names)
        else
          (close_group acc cur_count cur_names, count, [name])
    ) ([], 0, []) l
  in
  List.rev (close_group acc cur_count cur_names)
let input = [
  "1-13", 1;
  "14-20", 10;
  "21-30", 21;
  "31-40", 3;
  "41-50", 2;
  "51+", 1
]
let output = merge_small_groups 5 input
(* output = [(["1-13"; "14-20"], 11); (["21-30"; "31-40"; "41-50"; "51+"], 27)] *)
As you can see, the result of merging from left to right may not be what you want.
Depending on the goal, it may make more sense to merge the pair of consecutive elements whose sum is smallest and iterate until all counts are above the minimum of 5.
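The smallest-adjacent-sum idea can be sketched in Scala as follows. This is one interpretation, not the OCaml author's code: a group counts as too small when its count is at most min, ties go to the leftmost pair, and labels are kept as lists rather than rewritten into ranges. SmallestPairMerge and Group are made-up names.

```scala
import scala.annotation.tailrec

object SmallestPairMerge {
  // Each group carries the labels it was built from and its head count.
  type Group = (List[String], Int)

  // Repeatedly merge the adjacent pair with the smallest combined count
  // until every group is above `min` or only one group remains.
  @tailrec
  def merge(groups: Vector[Group], min: Int): Vector[Group] =
    if (groups.size < 2 || groups.forall(_._2 > min)) groups
    else {
      // Index of the left element of the cheapest adjacent pair.
      val i = (0 until groups.size - 1).minBy(k => groups(k)._2 + groups(k + 1)._2)
      val (la, ca) = groups(i)
      val (lb, cb) = groups(i + 1)
      val merged: Group = (la ++ lb, ca + cb)
      merge(groups.patch(i, Vector(merged), 2), min)
    }
}
```

On the question's data with min = 5 this ends with the groups (1-13, 14-20) = 11, (21-30) = 21 and (31-40, 41-50, 51+) = 6, and on the (5, 1, 5) example it ends with a single group of 11. Other inputs may of course differ from the right-to-left procedure in the question.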
Here is my Scala approach.
We start with two lists:
val people = List (1, 10, 21, 3, 2, 1)
val ages = List ("1-13", "14-20", "21-30", "31-40", "41-50", "51+")
and combine them to a kind of mapping:
val agegroup = ages.zip (people)
Define a method to merge two Strings, each describing a (possibly open-ended) interval. The first parameter is the one with the +, if any, as in "51+".
/**
  combine age-strings
  a+ b-c => b+
  a-b c-d => c-b
*/
def merge (xs: String, ys: String) = {
  val xab = xs.split ("[+-]")
  val yab = ys.split ("-")
  if (xs.contains ("+")) yab(0) + "+" else
    yab (0) + "-" + xab (1)
}
Here is the real work:
/**
  reverse the list, combine groups <= threshold.
*/
def remap (map: List [(String, Int)], threshold : Int) = {
  def remap (mappings: List [(String, Int)]) : List [(String, Int)] = mappings match {
    case Nil => Nil
    case x :: Nil => x :: Nil
    case x :: y :: xs => if (x._2 > threshold) x :: remap (y :: xs) else
      remap ((merge (x._1, y._1), x._2 + y._2) :: xs) }
  val nearly = (remap (map.reverse)).reverse
  // check for first element; <= matches the "> threshold" test above,
  // so that (5, 1, 5) collapses to (11) as in the question's update
  if (! nearly.isEmpty && nearly.length > 1 && nearly (0)._2 <= threshold) {
    val a = nearly (0)
    val b = nearly (1)
    val rest = nearly.tail.tail
    (merge (b._1, a._1), a._2 + b._2) :: rest
  } else nearly
}
and invocation
println (remap (agegroup, 5))
with result:
scala> println (remap (agegroup, 5))
List((1-20,11), (21-30,21), (31+,6))
The result is a list of pairs, age-group and membercount.
I guess the main part is easy to understand: There are 3 basic cases: an empty list, which can't be grouped, a list of one group, which is the solution itself, and more than one element.
If the first element (I reverse the list at the beginning, to start from the end) is bigger than 5 (6, whatever), yield it and proceed with the rest; if not, combine it with the second, and call remap recursively on this combined element followed by the rest.
If 2 elements get combined, the merge method for the strings is called.
The list is remapped after reversing it, and the result is reversed again. Now the first element has to be inspected and, if necessary, combined.
We're done.
I think a good data structure would be a linked list of pairs, where each pair contains the age span and the count. Using that, you can easily walk the list, and join two pairs in O(1).

Algorithm Issue: letter combinations

I'm trying to write a piece of code that will do the following:
Take the numbers 0 to 9 and assign one or more letters to this number. For example:
0 = N,
1 = L,
2 = T,
3 = D,
4 = R,
5 = V or F,
6 = B or P,
7 = Z,
8 = H or CH or J,
9 = G
When I have a code like 0123, it's an easy job to encode it. It will obviously make up the code NLTD. When a number like 5,6 or 8 is introduced, things get different. A number like 051 would result in more than one possibility:
NVL and NFL
It should be obvious that this gets even "worse" with longer numbers that include several digits like 5,6 or 8.
Being pretty bad at mathematics, I have not yet been able to come up with a decent solution that will allow me to feed the program a bunch of numbers and have it spit out all the possible letter combinations. So I'd love some help with it, 'cause I can't seem to figure it out. Dug up some information about permutations and combinations, but no luck.
Thanks for any suggestions/clues. The language I need to write the code in is PHP, but any general hints would be highly appreciated.
Update:
Some more background: (and thanks a lot for the quick responses!)
The idea behind my question is to build a script that will help people to easily convert numbers they want to remember to words that are far more easily remembered. This is sometimes referred to as "pseudo-numerology".
I want the script to give me all the possible combinations that are then held against a database of stripped words. These stripped words just come from a dictionary and have all the letters I mentioned in my question stripped out of them. That way, the number to be encoded can usually easily be related to a one or more database records. And when that happens, you end up with a list of words that you can use to remember the number you wanted to remember.
It can be done easily recursively.
The idea is that to handle the whole code of size n, you must handle first the n - 1 digits.
Once you have all answers for the first n-1 digits, the answers for the whole code are obtained by appending to each of them the possible letter(s) for the last digit.
There's actually a much better solution than enumerating all the possible translations of a number and looking them up: Simply do the reverse computation on every word in your dictionary, and store the string of digits in another field. So if your mapping is:
0 = N,
1 = L,
2 = T,
3 = D,
4 = R,
5 = V or F,
6 = B or P,
7 = Z,
8 = H or CH or J,
9 = G
your reverse mapping is:
N = 0,
L = 1,
T = 2,
D = 3,
R = 4,
V = 5,
F = 5,
B = 6,
P = 6,
Z = 7,
H = 8,
J = 8,
G = 9
Note there's no mapping for 'ch', because the 'c' will be dropped, and the 'h' will be converted to 8 anyway.
Then, all you have to do is iterate through each letter in the dictionary word, output the appropriate digit if there's a match, and do nothing if there isn't.
Store all the generated digit strings as another field in the database. When you want to look something up, just perform a simple query for the number entered, instead of having to do tens (or hundreds, or thousands) of lookups of potential words.
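The reverse computation described above might be sketched like this. The table is the reverse mapping from this answer; DigitKey and key are hypothetical names, and treating upper and lower case alike is an assumption.

```scala
object DigitKey {
  // Reverse mapping from the answer: letters that carry a digit.
  // Every other letter (vowels, 'c', 's', ...) is skipped.
  val letterToDigit: Map[Char, Char] = Map(
    'n' -> '0', 'l' -> '1', 't' -> '2', 'd' -> '3', 'r' -> '4',
    'v' -> '5', 'f' -> '5', 'b' -> '6', 'p' -> '6', 'z' -> '7',
    'h' -> '8', 'j' -> '8', 'g' -> '9'
  )

  // Digit string to store next to each dictionary word.
  def key(word: String): String =
    word.toLowerCase.iterator.collect(letterToDigit).mkString
}
```

DigitKey.key("novel") gives 051, and DigitKey.key("torch") gives 248 because the 'c' is dropped and the 'h' maps to 8. Looking up a number is then a single equality query on the stored key.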
The general structure you want to hold your number -> letter assignments is an array of arrays, similar to:
// 0 = N, 1 = L, 2 = T, 3 = D, 4 = R, 5 = V or F, 6 = B or P, 7 = Z,
// 8 = H or CH or J, 9 = G
$numberMap = array(
    0 => array("N"),
    1 => array("L"),
    2 => array("T"),
    3 => array("D"),
    4 => array("R"),
    5 => array("V", "F"),
    6 => array("B", "P"),
    7 => array("Z"),
    8 => array("H", "CH", "J"),
    9 => array("G"),
);
Then, a bit of recursive logic gives us a function similar to:
function GetEncoding($number) {
    global $numberMap; // the map is defined at file scope
    $ret = array();
    for ($i = 0; $i < strlen($number); $i++) {
        // We're just translating here, nothing special.
        // $var + 0 is a cheap way of forcing a variable to be numeric
        $ret[] = $numberMap[$number[$i]+0];
    }
    return $ret;
}
function PrintEncoding($enc, $string = "") {
    // If we're at the end of the line, then print!
    if (count($enc) === 0) {
        print $string."\n";
        return;
    }
    // Otherwise, soldier on through the possible values.
    // Grab the next 'letter' and cycle through the possibilities for it.
    foreach ($enc[0] as $letter) {
        // And call this function again with it!
        PrintEncoding(array_slice($enc, 1), $string.$letter);
    }
}
Three cheers for recursion! This would be used via:
PrintEncoding(GetEncoding("052384"));
And if you really want it as an array, play with output buffering and explode using "\n" as your split string.
This kind of problem is usually solved with recursion. In Ruby, one (quick and dirty) solution would be
@values = Hash.new([])
@values["0"] = ["N"]
@values["1"] = ["L"]
@values["2"] = ["T"]
@values["3"] = ["D"]
@values["4"] = ["R"]
@values["5"] = ["V","F"]
@values["6"] = ["B","P"]
@values["7"] = ["Z"]
@values["8"] = ["H","CH","J"]
@values["9"] = ["G"]
def find_valid_combinations(buffer,number)
  first_char = number.shift
  @values[first_char].each do |key|
    if(number.length == 0) then
      puts buffer + key
    else
      find_valid_combinations(buffer + key,number.dup)
    end
  end
end
find_valid_combinations("",ARGV[0].split(""))
And if you run this from the command line you will get:
$ ruby r.rb 051
NVL
NFL
This is related to brute-force search and backtracking
Here is a recursive solution in Python.
#!/usr/bin/env python
import sys
ENCODING = {'0':['N'],
            '1':['L'],
            '2':['T'],
            '3':['D'],
            '4':['R'],
            '5':['V', 'F'],
            '6':['B', 'P'],
            '7':['Z'],
            '8':['H', 'CH', 'J'],
            '9':['G']
            }
def decode(str):
    if len(str) == 0:
        return ''
    elif len(str) == 1:
        return ENCODING[str]
    else:
        result = []
        for prefix in ENCODING[str[0]]:
            result.extend([prefix + suffix for suffix in decode(str[1:])])
        return result
if __name__ == '__main__':
    print(decode(sys.argv[1]))
Example output:
$ ./demo 1
['L']
$ ./demo 051
['NVL', 'NFL']
$ ./demo 0518
['NVLH', 'NVLCH', 'NVLJ', 'NFLH', 'NFLCH', 'NFLJ']
Could you do the following:
Create a results array.
Create an item in the array with value ""
Loop through the numbers, say 051 analyzing each one individually.
Each time a 1 to 1 match between a number is found add the correct value to all items in the results array.
So "" becomes N.
Each time a 1 to many match is found, add new rows to the results array with one option, and update the existing results with the other option.
So N becomes NV and a new item is created NF
Then the last number is a 1 to 1 match so the items in the results array become
NVL and NFL
To produce the results loop through the results array, printing them, or whatever.
Let p(n) be a list of all possible letter combinations of a given number string s up to the nth digit.
Then, the following algorithm will generate p(n+1):
digit = s[n+1];
foreach(letter l that digit maps to)
{
    foreach(entry e in p(n))
    {
        newEntry = append l to e;
        add newEntry to p(n+1);
    }
}
The first iteration is somewhat of a special case, since p(-1) is undefined. You can simply initialize p(0) as the list of all possible characters for the first character.
So, your 051 example:
Iteration 0:
p(0) = {N}
Iteration 1:
digit = 5
foreach({V, F})
{
    foreach(p(0) = {N})
    {
        newEntry = N + V or N + F
        p(1) = {NV, NF}
    }
}
Iteration 2:
digit = 1
foreach({L})
{
    foreach(p(1) = {NV, NF})
    {
        newEntry = NV + L or NF + L
        p(2) = {NVL, NFL}
    }
}
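The same prefix-extension idea can be written as a fold. This is a sketch rather than a translation of the pseudocode: it seeds the list with one empty prefix so the first digit needs no special case, and it loops over prefixes in the outer position, which changes only the order of the results. LetterCodes and combinations are hypothetical names; a character outside 0-9 would throw.

```scala
object LetterCodes {
  val codes: Map[Char, List[String]] = Map(
    '0' -> List("N"), '1' -> List("L"), '2' -> List("T"), '3' -> List("D"),
    '4' -> List("R"), '5' -> List("V", "F"), '6' -> List("B", "P"),
    '7' -> List("Z"), '8' -> List("H", "CH", "J"), '9' -> List("G")
  )

  // Start from a single empty prefix and extend every prefix with every
  // letter of the next digit, one digit at a time.
  def combinations(number: String): List[String] =
    number.foldLeft(List("")) { (prefixes, digit) =>
      for (p <- prefixes; l <- codes(digit)) yield p + l
    }
}
```

LetterCodes.combinations("051") returns List(NVL, NFL), matching the trace above.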
The form you want is probably something like:
function combinations( $str ){
    global $codes; // digit => array of letters, defined at file scope
    $l = strlen( $str );
    $results = array( );
    if ($l == 0) { return $results; }
    if ($l == 1)
    {
        foreach( $codes[ $str[0] ] as $code )
        {
            $results[] = $code;
        }
        return $results;
    }
    $cur = $str[0];
    $combs = combinations( substr( $str, 1, $l ) );
    foreach ($codes[ $cur ] as $code)
    {
        foreach ($combs as $comb)
        {
            $results[] = $code.$comb;
        }
    }
    return $results;
}
This is ugly, pidgin-php so please verify it first. The basic idea is to generate every combination of the string from [1..n] and then prepend to the front of all those combinations each possible code for str[0]. Bear in mind that in the worst case this will have performance exponential in the length of your string, because that much ambiguity is actually present in your coding scheme.
The trick is not only to generate all possible letter combinations that match a given number, but to select the letter sequence that is easiest to remember. A suggestion would be to run the soundex algorithm on each sequence and try to match the result against an English language dictionary such as Wordnet to find the most 'real-word-sounding' sequences.

Resources