Why is my Binary Search implementation in Scala so slow? - algorithm

I recently implemented this binary search. It is supposed to run in under 6 seconds for Scala, yet it takes 12-13 seconds on the machine that checks the assignments.
Note before you read the code: the input consists of two lines. The first is the list of numbers to search in, and the second is the list of "search terms" to look up in that list. The expected output lists the index of each term in the list of numbers, or -1 if the term is absent. Each list has at most 10^5 elements, and each number is at most 10^9.
For example:
Input:
5 1 5 8 12 13 // note that the first number, 5, is the length of the sequence that follows
5 8 1 23 1 11 // note that the first number, 5, is the length of the sequence that follows
Output:
2 0 -1 0 -1 // index of each term in the input array
My solution:
import scala.annotation.tailrec
import scala.io.StdIn.readLine
object BinarySearch extends App {
  val n_items = readLine().split(" ").map(BigInt(_))
  val n = n_items(0)
  val items = n_items.drop(1)
  val k :: terms = readLine().split(" ").map(BigInt(_)).toList
  println(search(terms, items).mkString(" "))
  def search(terms: List[BigInt], items: Array[BigInt]): Array[BigInt] = {
    @tailrec
    def go(terms: List[BigInt], results: Array[BigInt]): Array[BigInt] = terms match {
      case List() => results
      case head :: tail => go(tail, results :+ find(head))
    }
    def find(term: BigInt): BigInt = {
      @tailrec
      def go(left: BigInt, right: BigInt): BigInt = {
        if (left > right) { -1 }
        else {
          val middle = left + (right - left) / 2
          val middle_val = items(middle.toInt)
          middle_val match {
            case m if m == term => middle
            case m if m <= term => go(middle + 1, right)
            case m if m > term => go(left, middle - 1)
          }
        }
      }
      go(0, n - 1)
    }
    go(terms, Array())
  }
}
What makes this code so slow? Thank you

I am worried about the complexity of
results :+ find(head)
Appending an item to an Array of length L is O(L), because :+ copies the whole array into a new one. So if you have n results to compute, the overall complexity is O(n*n).
Try using a mutable ArrayBuffer instead of an Array to accumulate the results, or simply map the input terms through the find function.
In other words replace
go(terms, Array())
with
terms.map( x => find(x) ).toArray
By the way, the limits of the problem are small enough that BigInt is overkill and is probably making the code significantly slower. Every value is at most 10^9, which fits in a 32-bit Int (maximum 2,147,483,647).
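As a rough illustration of both suggestions (plain integers instead of arbitrary-precision wrappers, and mapping each term through find rather than appending to an array one element at a time), here is a minimal Python sketch. It assumes the list of items is sorted in ascending order, as in the example input; the names find and search simply mirror the question's code.

```python
def find(items, term):
    """Iterative binary search; returns the index of term in items, or -1."""
    left, right = 0, len(items) - 1
    while left <= right:
        middle = left + (right - left) // 2
        if items[middle] == term:
            return middle
        if items[middle] < term:
            left = middle + 1
        else:
            right = middle - 1
    return -1


def search(terms, items):
    # One pass over the terms; the result list is built once,
    # so the total cost is O(k log n) for k terms and n items.
    return [find(items, term) for term in terms]


if __name__ == "__main__":
    # The example from the question, with the leading length values dropped.
    print(" ".join(map(str, search([8, 1, 23, 1, 11], [1, 5, 8, 12, 13]))))
```

With the example input from the question this prints 2 0 -1 0 -1.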

Related

Generate random numbers where the difference is always positive

I am trying to generate some numbers such that the difference is always positive. The user inputs the number of digits and the number of rows they want. For example, 3 digits and 3 rows will produce
971
888
121
I want to make sure the difference of those is always positive. Is there some kind of algorithm I can use? Right now my program just creates the numbers, subtracts them, and if the result is negative it tries again... and again. It is very slow.
I was thinking of first generating the difference and then adding to it until the desired number of rows is reached, but I ran into problems if it generates a very large number.
Here is the code I use to generate a random number with X digits, just in case it matters:
private fun createRandomNumber(digits: Int): Int {
    val numberArray = IntArray(digits)
    for (number in 0 until numberArray.size) {
        numberArray[number] = 9
    }
    val maxnumber: Int = numberArray.joinToString("").toInt()
    numberArray[0] = 1
    for (number in 1 until numberArray.size) {
        numberArray[number] = 0
    }
    val minnumber: Int = numberArray.joinToString("").toInt()
    return (minnumber..maxnumber).random()
}
Based on the suggestion by Jeff Bowman, I began by sorting an array of all the generated numbers, and that speeds everything up to an acceptable level!
Although @forpas's solution is fine, it still runs in O(n log n) because of the final sort. My solution instead generates consecutive, increasing intervals (for a uniform spread) and then maps each interval to a random number within it, which avoids sorting the final list. The complexity is O(n).
I chose to use Stream to avoid mutation and explicit recursion, but that is not mandatory.
Example
import java.lang.Math.pow
import java.util.stream.Collectors
import java.util.stream.Stream
import kotlin.random.Random
fun main(args: Array<String>) {
    val count = 20L
    val digits = 5
    val min = pow(10.0, digits.toDouble() - 1).toLong()
    val max = min*10 - 1
    // Width of each interval; every interval yields exactly one number
    val gap = (max - min)/count + 1
    val numbers =
        Stream.iterate(Pair(min, min + gap)) { (_, prev) -> Pair(prev, prev + gap) }
            .map { (start, end) -> Random.nextLong(start, end) }
            .limit(count)
            .collect(Collectors.toList())
    numbers.forEach(::println)
}
Output
11298
16284
20841
26084
31960
35538
37208
45325
46970
52918
57514
59769
67689
70135
75338
78075
84561
86652
91938
99931
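The interval idea can also be sketched in Python. This is only an illustration under stated assumptions: the function name descending_numbers is hypothetical, and the bucket boundaries are computed with integer arithmetic so that the last bucket never runs past the largest number with the requested digit count, which is a small variation on the fixed gap used above. It requires that count does not exceed the number of available values.

```python
import random


def descending_numbers(digits, count):
    """Return `count` distinct numbers with `digits` digits, largest first."""
    low = 10 ** (digits - 1)       # smallest number with that many digits
    high = 10 ** digits - 1        # largest number with that many digits
    size = high - low + 1
    if not 1 <= count <= size:
        raise ValueError("count must be between 1 and %d" % size)
    result = []
    for i in range(count):
        # Bucket i covers [start, end]; buckets are disjoint and increasing,
        # so one draw per bucket is already sorted and no sort is needed.
        start = low + i * size // count
        end = low + (i + 1) * size // count - 1
        result.append(random.randint(start, end))
    result.reverse()               # largest first, so every difference is positive
    return result


if __name__ == "__main__":
    for number in descending_numbers(3, 3):
        print(number)
```

Because each row is drawn from a lower bucket than the one before it, subtracting a row from the previous one always gives a positive result, and the whole thing runs in O(n).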
I would use this function to create a random number with a certain number of digits:
fun createRandomNumber(digits: Int) = (10f.pow(digits - 1).toInt() until 10f.pow(digits).toInt()).shuffled().first()
You will need this import:
import kotlin.math.pow
And then with this:
fun main(args: Array<String>) {
    print("how many numbers?: ")
    val numbers = readLine()!!.toInt()
    print("how many digits?: ")
    val digits = readLine()!!.toInt()
    val set = mutableSetOf<Int>()
    do {
        set.add(createRandomNumber(digits))
    } while (set.size < numbers)
    val array = set.toTypedArray().sortedArrayDescending()
    array.forEach { println(it) }
}
you get the user's input and create a set of distinct random numbers.
Calling toTypedArray().sortedArrayDescending() on the set then gives you the array in descending order.

Scala best way to find pairs in a collection [duplicate]

I'm trying to find the most efficient way of finding pairs in a Scala collection. For example,
val list = List(1,2,3)
should produce these pairs
(1,2) (1,3) (2,1) (2,3) (3,1) (3,2)
My current implementation seems quite expensive. How can I further optimize this?
val pairs = list.flatMap { currentElement =>
  val clonedList: mutable.ListBuffer[Int] = list.to[ListBuffer]
  val currentIndex = list.indexOf(currentElement)
  val removedValue = clonedList.remove(currentIndex)
  clonedList.map { y =>
    (currentElement, y)
  }
}
val l = Array(1,2,3,4)
val result = scala.collection.mutable.HashSet[(Int, Int)]()
for (i <- 0 until l.size) {
  for (j <- (i+1) until l.size) {
    result += l(i)->l(j)
    result += l(j)->l(i)
  }
}
Several optimizations here. First, the inner loop only runs from the element after the current one to the end, which halves the number of iterations. Second, object creation is kept to a minimum: only tuples are created and added to a mutable HashSet. Finally, the HashSet handles duplicates for free. An additional optimization would be to check whether the set already contains the tuple, to avoid creating an object for nothing.
For 1,000 elements it takes less than 1 s on my laptop, and 7 s for 10k distinct elements.
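For comparison, the same double loop can be sketched in Python, next to the standard library's itertools.permutations, which yields ordered pairs by position. The function name ordered_pairs is hypothetical. Note one difference in behaviour: the loop version collects the pairs in a set and therefore drops duplicates when the input contains repeated values, while permutations does not.

```python
from itertools import permutations


def ordered_pairs(values):
    """All ordered pairs (a, b) taken from two different positions, deduplicated."""
    result = set()
    for i in range(len(values)):
        # Start at i + 1 and add both orientations, halving the iterations.
        for j in range(i + 1, len(values)):
            result.add((values[i], values[j]))
            result.add((values[j], values[i]))
    return result


if __name__ == "__main__":
    print(sorted(ordered_pairs([1, 2, 3])))
    print(list(permutations([1, 2, 3], 2)))
```

Either way the output has n * (n - 1) pairs for n distinct elements, so the running time is quadratic no matter how the pairs are produced.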
Note that recursively, you could do it this way:
def combi(s: Seq[Int]): Seq[(Int, Int)] =
  if (s.isEmpty)
    Seq()
  else
    s.tail.flatMap(x => Seq(s.head -> x, x -> s.head)) ++ combi(s.tail)
It takes a little bit more than 1s for 1000 elements.
Since "most optimal way" can be interpreted in different ways (most of the time I take it to mean the one that lets me be most productive), I suggest the following approach:
val originalList = (1 to 1000) toStream
def orderedPairs[T](list: Stream[T]) = list.combinations(2).map( p => (p(0), p(1)) ).toStream
val pairs = orderedPairs(originalList) ++ orderedPairs(originalList.reverse)
println(pairs.slice(0, 1000).toList)

Scala Filter and Collect is slow

I am just beginning with Scala development and am trying to filter out unnecessary lines from an iterator using filter and collect. But the operation seems to be too slow.
val src = Source.fromFile("/home/Documents/1987.csv") // 1.2 Million
val iter = src.getLines().map(_.split(":"))
val iter250 = iter.take(250000) // Only interested in the first 250,000
val intrestedIndices = Range(1, 100000, 3).toSeq // This could be any order
val slicedData = iter250.zipWithIndex
// Takes 3 minutes
val firstCase = slicedData.collect { case (x, i) if intrestedIndices.contains(i) => x }.size
// Takes 3 minutes
val secondCase = slicedData.filter(x => intrestedIndices.contains(x._2)).size
// Takes 1 second
val thirdCase = slicedData.collect { case (x,i ) if i % 3 == 0 => x}.size
It appears that the intrestedIndices.contains(_) part is slowing down the program in the first and second cases. Is there an alternative way to speed this up?
This answer helped solve the problem.
You iterate over all interestedIndices in first two cases in linear time. Use Set instead of Seq to improve performance – Sergey Lagutin
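To make the difference concrete, here is a minimal Python sketch of the same idea: turn the indices into a set once, so each membership test is a hash lookup instead of a linear scan. The function name select_by_index is hypothetical.

```python
def select_by_index(rows, interesting_indices):
    """Keep only the rows whose position is in interesting_indices."""
    wanted = set(interesting_indices)   # built once, O(m)
    # `i in wanted` is O(1) on average; with a list it would be O(m) per row.
    return [row for i, row in enumerate(rows) if i in wanted]


if __name__ == "__main__":
    print(select_by_index(list("abcdefg"), range(1, 7, 3)))
```

With m indices and n rows, the list-based check costs O(n * m) overall, while the set-based one costs O(n + m).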
For the record, here's a method to filter with an (ordered) Seq of indices, not necessarily equidistant, without scanning the indices at each step:
def filterInteresting[T](it: Iterator[T], indices: Seq[Int]): Iterator[T] =
  it.zipWithIndex.scanLeft((indices, None: Option[T])) {
    case ((indices, _), (elem, index)) => indices match {
      // +: matches any non-empty Seq, whereas :: would only match a List
      case h +: t if h == index => (t, Some(elem))
      case l => (l, None)
    }
  }.map(_._2).flatten
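The same single-pass idea, sketched in Python as a generator. It assumes the indices are given in strictly increasing order; the name filter_interesting is hypothetical. Unlike the version above, it also stops reading the input as soon as the indices are exhausted.

```python
def filter_interesting(items, sorted_indices):
    """Yield items[i] for each i in sorted_indices, scanning items only once."""
    wanted = iter(sorted_indices)
    target = next(wanted, None)         # next index we are waiting for
    for index, item in enumerate(items):
        if target is None:
            return                      # no indices left, stop early
        if index == target:
            yield item
            target = next(wanted, None)


if __name__ == "__main__":
    print(list(filter_interesting("abcdefg", [1, 4, 5])))
```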

What's a more efficient implementation of this puzzle?

The puzzle
For every input number n (n < 10) there is an output number m such that:
m's first digit is n
m is an n digit number
every 2 digit sequence inside m must be a different prime number
The output should be m, where m is the smallest number that fulfils the conditions above. If there is no such number, the output should be -1.
Examples
n = 3 -> m = 311
n = 4 -> m = 4113 (note that this is not 4111 as that would be repeating 11)
n = 9 -> m = 971131737
My somewhat working solution
Here's my first stab at this, the "brute force" approach. I am looking for a more elegant solution as this is very inefficient as n grows larger.
public long GetM(int n)
{
    long start = n * (long)Math.Pow((double)10, (double)n - 1);
    // Exclusive upper bound: the first n-digit number that starts with n + 1
    long end = (n + 1) * (long)Math.Pow((double)10, (double)n - 1);
    for (long x = start; x < end; x++)
    {
        long xCopy = x;
        bool allDigitsPrime = true;
        List<int> allPrimeNumbers = new List<int>();
        while (xCopy >= 10)
        {
            long lastDigitsLong = xCopy % 100;
            int lastDigits = (int)lastDigitsLong;
            bool lastDigitsSame = allPrimeNumbers.Count != 0 && allPrimeNumbers.Contains(lastDigits);
            if (!IsPrime(lastDigits) || lastDigitsSame)
            {
                allDigitsPrime = false;
                break;
            }
            xCopy /= 10;
            allPrimeNumbers.Add(lastDigits);
        }
        if (n != 1 && allDigitsPrime)
        {
            return x;
        }
    }
    return -1;
}
Initial thoughts on how this could be made more efficient
So, clearly the bottleneck here is traversing through the whole list of numbers that could fulfil this condition from n.... to (n+1).... . Instead of simply incrementing the number of every iteration of the loop, there must be some clever way of skipping numbers based on the requirement that the 2 digit sequences must be prime. For instance for n = 5, there is no point going through 50000 - 50999 (50 isn't prime), 51200 - 51299 (12 isn't prime), but I wasn't quite sure how this could be implemented or if it would be enough of an optimization to make the algorithm run for n=9.
Any ideas on this approach or a different optimization approach?
You don't have to try all numbers. You can instead use a different strategy, summed up as "try appending a digit".
Which digit? Well, a digit such that
it forms a prime together with your current last digit
the prime formed has not occurred in the number before
This should be done recursively (not iteratively), because you may run out of options and then you'd have to backtrack and try a different digit earlier in the number.
This is still an exponential time algorithm, but it avoids most of the search space because it never tries any numbers that don't fit the rule that every pair of adjacent digits must form a prime number.
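A minimal Python sketch of this "append a digit, backtrack when stuck" strategy follows. The names is_prime and smallest_m are hypothetical. Candidate digits are tried in ascending order, so the first complete number found is also the smallest. For n = 1 there are no two-digit sequences to check, and this sketch assumes the answer is then simply 1; the question does not settle that case.

```python
def is_prime(k):
    """Trial division; plenty for two-digit numbers."""
    return k >= 2 and all(k % d for d in range(2, int(k ** 0.5) + 1))


TWO_DIGIT_PRIMES = {p for p in range(10, 100) if is_prime(p)}


def smallest_m(n):
    """Smallest valid m for 1 <= n <= 9, or -1 if there is none."""
    digits = [n]        # m must start with n
    used = set()        # two-digit primes already present in m

    def extend():
        if len(digits) == n:
            return True
        for d in range(10):             # ascending: first hit is the smallest
            pair = digits[-1] * 10 + d
            if pair in TWO_DIGIT_PRIMES and pair not in used:
                digits.append(d)
                used.add(pair)
                if extend():
                    return True
                digits.pop()            # dead end: undo and try the next digit
                used.remove(pair)
        return False

    if not extend():
        return -1
    return int("".join(map(str, digits)))


if __name__ == "__main__":
    for n in range(1, 10):
        print(n, smallest_m(n))
```

For the examples in the question this gives 311 for n = 3, 4113 for n = 4 and 971131737 for n = 9.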
Here's a possible solution in R, using recursion. It would be interesting to build a tree of all the possible paths.
# For every input number n (n < 10)
# there is an output number m such that:
# m's first digit is n
# m is an n digit number
# every 2 digit sequence inside m must be a different prime number
# Need to select the smallest m that meets the criteria
library('numbers')
mNumHelper <- function(cn,n,pr,cm=NULL) {
  if (cn == 1) {
    if (n==1) {
      return(1)
    }
    firstDigit <- n
  } else {
    firstDigit <- mod(cm,10)
  }
  possibleNextNumbers <- pr[floor(pr/10) == firstDigit]
  nPossible = length(possibleNextNumbers)
  if (nPossible == 1) {
    nextPrime <- possibleNextNumbers
  } else{
    # nextPrime <- sample(possibleNextNumbers,1)
    nextPrime <- min(possibleNextNumbers)
  }
  pr <- pr[which(pr!=nextPrime)]
  if (is.null(cm)) {
    cm <- nextPrime
  } else {
    cm = cm * 10 + mod(nextPrime,10)
  }
  cn = cn + 1
  if (cn < n) {
    cm = mNumHelper(cn,n,pr,cm)
  }
  return(cm)
}
mNum <- function(n) {
  pr<-Primes(10,100)
  m <- mNumHelper(1,n,pr)
}
for (i in seq(1,9)) {
  print(paste('i',i,'m',mNum(i)))
}
Sample output
[1] "i 1 m 1"
[1] "i 2 m 23"
[1] "i 3 m 311"
[1] "i 4 m 4113"
[1] "i 5 m 53113"
[1] "i 6 m 611317"
[1] "i 7 m 7113173"
[1] "i 8 m 83113717"
[1] "i 9 m 971131737"
Solution updated to select the smallest prime from the set of available primes, and remove bad path check since it's not required.
I just made a list of the two-digit prime numbers, then solved the problem by hand; it took only a few minutes. Not every problem requires a computer!

Algorithm Issue: letter combinations

I'm trying to write a piece of code that will do the following:
Take the numbers 0 to 9 and assign one or more letters to this number. For example:
0 = N,
1 = L,
2 = T,
3 = D,
4 = R,
5 = V or F,
6 = B or P,
7 = Z,
8 = H or CH or J,
9 = G
When I have a code like 0123, it's an easy job to encode it. It will obviously make up the code NLTD. When a number like 5,6 or 8 is introduced, things get different. A number like 051 would result in more than one possibility:
NVL and NFL
It should be obvious that this gets even "worse" with longer numbers that include several digits like 5,6 or 8.
Being pretty bad at mathematics, I have not yet been able to come up with a decent solution that will allow me to feed the program a bunch of numbers and have it spit out all the possible letter combinations. So I'd love some help with it, 'cause I can't seem to figure it out. Dug up some information about permutations and combinations, but no luck.
Thanks for any suggestions/clues. The language I need to write the code in is PHP, but any general hints would be highly appreciated.
Update:
Some more background: (and thanks a lot for the quick responses!)
The idea behind my question is to build a script that will help people to easily convert numbers they want to remember to words that are far more easily remembered. This is sometimes referred to as "pseudo-numerology".
I want the script to give me all the possible combinations that are then held against a database of stripped words. These stripped words just come from a dictionary and have all the letters I mentioned in my question stripped out of them. That way, the number to be encoded can usually easily be related to a one or more database records. And when that happens, you end up with a list of words that you can use to remember the number you wanted to remember.
It can easily be done recursively.
The idea is that to handle the whole code of size n, you first handle the first n - 1 digits.
Once you have all the answers for those n - 1 digits, the answers for the whole code are obtained by appending to each of them every possible letter (or letters) for the last digit.
There's actually a much better solution than enumerating all the possible translations of a number and looking them up: Simply do the reverse computation on every word in your dictionary, and store the string of digits in another field. So if your mapping is:
0 = N,
1 = L,
2 = T,
3 = D,
4 = R,
5 = V or F,
6 = B or P,
7 = Z,
8 = H or CH or J,
9 = G
your reverse mapping is:
N = 0,
L = 1,
T = 2,
D = 3,
R = 4,
V = 5,
F = 5,
B = 6,
P = 6,
Z = 7,
H = 8,
J = 8,
G = 9
Note there's no mapping for 'ch', because the 'c' will be dropped, and the 'h' will be converted to 8 anyway.
Then, all you have to do is iterate through each letter in the dictionary word, output the appropriate digit if there's a match, and do nothing if there isn't.
Store all the generated digit strings as another field in the database. When you want to look something up, just perform a simple query for the number entered, instead of having to do tens (or hundreds, or thousands) of lookups of potential words.
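A short Python sketch of that reverse computation is shown below. The names LETTER_TO_DIGIT and word_to_digits are hypothetical, and the words in the example are arbitrary; in practice the result would be stored next to each dictionary word in the database.

```python
# Reverse mapping from the answer above; letters not listed are ignored.
LETTER_TO_DIGIT = {
    "N": "0", "L": "1", "T": "2", "D": "3", "R": "4",
    "V": "5", "F": "5", "B": "6", "P": "6", "Z": "7",
    "H": "8", "J": "8", "G": "9",
}


def word_to_digits(word):
    """Translate a word into its digit string, skipping unmapped letters."""
    return "".join(
        LETTER_TO_DIGIT[letter]
        for letter in word.upper()
        if letter in LETTER_TO_DIGIT
    )


if __name__ == "__main__":
    for word in ["novel", "chain", "lighter"]:
        print(word, word_to_digits(word))
```

Here "novel" becomes 051 and "chain" becomes 80, since the 'c' is dropped and the 'h' maps to 8, as described above.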
The general structure you want to hold your number -> letter assignments is an array of arrays, similar to:
// 0 = N, 1 = L, 2 = T, 3 = D, 4 = R, 5 = V or F, 6 = B or P, 7 = Z,
// 8 = H or CH or J, 9 = G
$numberMap = array(
    0 => array("N"),
    1 => array("L"),
    2 => array("T"),
    3 => array("D"),
    4 => array("R"),
    5 => array("V", "F"),
    6 => array("B", "P"),
    7 => array("Z"),
    8 => array("H", "CH", "J"),
    9 => array("G"),
);
Then, a bit of recursive logic gives us a function similar to:
function GetEncoding($number) {
    global $numberMap; // the map is defined outside this function
    $ret = array();
    for ($i = 0; $i < strlen($number); $i++) {
        // We're just translating here, nothing special.
        // $var + 0 is a cheap way of forcing a variable to be numeric
        $ret[] = $numberMap[$number[$i]+0];
    }
    return $ret;
}
function PrintEncoding($enc, $string = "") {
    // If we're at the end of the line, then print!
    if (count($enc) === 0) {
        print $string."\n";
        return;
    }
    // Otherwise, soldier on through the possible values.
    // Grab the next 'letter' and cycle through the possibilities for it.
    foreach ($enc[0] as $letter) {
        // And call this function again with it!
        PrintEncoding(array_slice($enc, 1), $string.$letter);
    }
}
Three cheers for recursion! This would be used via:
PrintEncoding(GetEncoding("052384"));
And if you really want it as an array, play with output buffering and explode using "\n" as your split string.
This kind of problem is usually solved with recursion. In Ruby, one (quick and dirty) solution would be
@values = Hash.new([])
@values["0"] = ["N"]
@values["1"] = ["L"]
@values["2"] = ["T"]
@values["3"] = ["D"]
@values["4"] = ["R"]
@values["5"] = ["V","F"]
@values["6"] = ["B","P"]
@values["7"] = ["Z"]
@values["8"] = ["H","CH","J"]
@values["9"] = ["G"]
def find_valid_combinations(buffer,number)
  first_char = number.shift
  @values[first_char].each do |key|
    if(number.length == 0) then
      puts buffer + key
    else
      find_valid_combinations(buffer + key,number.dup)
    end
  end
end
find_valid_combinations("",ARGV[0].split(""))
And if you run this from the command line you will get:
$ ruby r.rb 051
NVL
NFL
This is related to brute-force search and backtracking
Here is a recursive solution in Python.
#!/usr/bin/env python
import sys
ENCODING = {'0':['N'],
            '1':['L'],
            '2':['T'],
            '3':['D'],
            '4':['R'],
            '5':['V', 'F'],
            '6':['B', 'P'],
            '7':['Z'],
            '8':['H', 'CH', 'J'],
            '9':['G']
            }
def decode(str):
    if len(str) == 0:
        return ''
    elif len(str) == 1:
        return ENCODING[str]
    else:
        result = []
        # Prepend every letter for the first digit to every decoding of the rest
        for prefix in ENCODING[str[0]]:
            result.extend([prefix + suffix for suffix in decode(str[1:])])
        return result
if __name__ == '__main__':
    print(decode(sys.argv[1]))
Example output:
$ ./demo 1
['L']
$ ./demo 051
['NVL', 'NFL']
$ ./demo 0518
['NVLH', 'NVLCH', 'NVLJ', 'NFLH', 'NFLCH', 'NFLJ']
Could you do the following?
Create a results array.
Create an item in the array with the value "".
Loop through the digits of the number, say 051, handling each one individually.
Each time a digit maps to exactly one letter, append that letter to every item in the results array.
So "" becomes N.
Each time a digit maps to several letters, add new rows to the results array for one option and update the existing rows with the other option.
So N becomes NV, and a new item NF is created.
The last digit is a one-to-one match again, so the items in the results array become
NVL and NFL.
To produce the output, loop through the results array and print the items, or whatever you need.
Let p(n) be the list of all possible letter combinations of a given number string s up to the nth digit.
Then the following algorithm generates p(n+1):
digit = s[n+1];
foreach(letter l that digit maps to)
{
    foreach(entry e in p(n))
    {
        newEntry = append l to e;
        add newEntry to p(n+1);
    }
}
The first iteration is somewhat of a special case, since p(-1) is undefined. You can simply initialize p(0) as the list of all possible letters for the first digit.
So, your 051 example:
Iteration 0:
p(0) = {N}
Iteration 1:
digit = 5
foreach({V, F})
{
    foreach(p(0) = {N})
    {
        newEntry = N + V or N + F
        p(1) = {NV, NF}
    }
}
Iteration 2:
digit = 1
foreach({L})
{
    foreach(p(1) = {NV, NF})
    {
        newEntry = NV + L or NF + L
        p(2) = {NVL, NFL}
    }
}
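The iterative build-up described in the last two answers can be sketched in a few lines of Python. The names CODES and encodings are hypothetical. Starting from a list holding only the empty string removes the special case for the first digit.

```python
CODES = {
    "0": ["N"], "1": ["L"], "2": ["T"], "3": ["D"], "4": ["R"],
    "5": ["V", "F"], "6": ["B", "P"], "7": ["Z"],
    "8": ["H", "CH", "J"], "9": ["G"],
}


def encodings(number):
    """All letter strings that the digit string `number` can stand for."""
    results = [""]                      # one empty prefix to start from
    for digit in number:
        # Extend every prefix found so far with every letter for this digit.
        results = [prefix + letter
                   for prefix in results
                   for letter in CODES[digit]]
    return results


if __name__ == "__main__":
    print(encodings("051"))
    print(encodings("0518"))
```

The length of the result is the product of the number of letters per digit, so it grows exponentially only in the count of ambiguous digits (5, 6 and 8).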
The form you want is probably something like:
function combinations( $str ){
    global $codes; // digit => array of letters, assumed to be defined elsewhere
    $l = strlen( $str );
    $results = array( );
    if ($l == 0) { return $results; }
    if ($l == 1)
    {
        foreach( $codes[ $str[0] ] as $code )
        {
            $results[] = $code;
        }
        return $results;
    }
    $cur = $str[0];
    $combs = combinations( substr( $str, 1, $l ) );
    foreach ($codes[ $cur ] as $code)
    {
        foreach ($combs as $comb)
        {
            $results[] = $code.$comb;
        }
    }
    return $results;
}
This is ugly, pidgin-php so please verify it first. The basic idea is to generate every combination of the string from [1..n] and then prepend to the front of all those combinations each possible code for str[0]. Bear in mind that in the worst case this will have performance exponential in the length of your string, because that much ambiguity is actually present in your coding scheme.
The trick is not only to generate all possible letter combinations that match a given number, but to select the letter sequence that is easiest to remember. One suggestion would be to run the Soundex algorithm on each of the sequences and try to match them against an English-language dictionary such as WordNet to find the most 'real-word-sounding' sequences.
