scala : better solution - performance

QUESTION:
We define super digit of an integer x using the following rules:
If x has only 1 digit, then its super digit is x.
Otherwise, the super digit of x is equal to the super digit of the digit-sum of x. Here, digit-sum of a number is defined as the sum of its digits.
For example, super digit of 9875 will be calculated as:
super-digit(9875) = super-digit(9+8+7+5)
= super-digit(29)
= super-digit(2+9)
= super-digit(11)
= super-digit(1+1)
= super-digit(2)
= 2.
You are given two numbers - n k. You have to calculate the super digit of P.
P is created when number n is concatenated k times. That is, if n = 123 and k = 3, then P = 123123123.
Input Format
Input will contain two space separated integers, n and k.
Output Format
Output the super digit of P, where P is created as described above.
Constraint
1 ≤ n < 10^100000
1 ≤ k ≤ 10^5
Sample Input
148 3
Sample Output
3
Explanation
Here n = 148 and k = 3, so P = 148148148.
super-digit(P) = super-digit(148148148)
= super-digit(1+4+8+1+4+8+1+4+8)
= super-digit(39)
= super-digit(3+9)
= super-digit(12)
= super-digit(1+2)
= super-digit(3)
= 3.
I have written the following program to solve the above problem, but how can I solve it more efficiently? Are string operations more efficient than math operations? For a few inputs it takes a long time, for example:
861568688536788 100000
object SuperDigit {
def main(args: Array[String]) {
/* Enter your code here. Read input from STDIN. Print output to STDOUT. Your class should be named Solution
*/
def generateString (no:String,re:BigInt , tot:BigInt , temp:String):String = {
if(tot-1>re) generateString(no+temp,re+1,tot,temp)
else no
}
def totalSum(no:List[Char]):BigInt = no match {
case x::xs => x.asDigit+totalSum(xs)
case Nil => '0'.asDigit
}
def tot(no:List[Char]):BigInt = no match {
case _ if no.length == 1=> no.head.asDigit
case no => tot(totalSum(no).toString.toList)
}
var list = readLine.split(" ");
var one = list.head.toString();
var two = BigInt(list(1));
//println(generateString("148",0,3,"148"))
println(tot(generateString(one,BigInt(0),two,one).toList))
}
}

One reduction is to realise that you do not have to concatenate the number, considered as a string, k times; you can instead start with the number k * qs(n), where qs is the function that maps a number to its sum of digits (i.e. qs(123) = 1+2+3). Here is an approach in a more functional programming style. I do not know whether it can be made faster than this.
object Solution {
def qs(n: BigInt): BigInt = n.toString.foldLeft(BigInt(0))((n, ch) => n + (ch - '0').toInt)
def main(args: Array[String]) {
val input = scala.io.Source.stdin.getLines
val Array(n, k) = input.next.split(" ").map(BigInt(_))
println(Stream.iterate(k * qs(n))(qs(_)).find(_ < 10).get)
}
}
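For comparison, here is a minimal Python sketch of the same reduction. The function names are made up for this illustration. The second function relies on the standard digital-root identity (for x > 0 the repeated digit sum equals 1 + (x - 1) mod 9), which removes the loop entirely; treat it as an optional shortcut rather than part of the answer above.

```python
def digit_sum(digits: str) -> int:
    # Sum the decimal digits of a number given as a string.
    return sum(int(ch) for ch in digits)

def super_digit(n: str, k: int) -> int:
    # Start from k * qs(n) instead of building the concatenated string.
    total = digit_sum(n) * k
    while total >= 10:
        total = digit_sum(str(total))
    return total

def super_digit_mod9(n: str, k: int) -> int:
    # Same result via the digital-root identity, without iterating.
    total = digit_sum(n) * k
    return 0 if total == 0 else 1 + (total - 1) % 9

print(super_digit("148", 3))                 # sample input from the question
print(super_digit("861568688536788", 100000))  # the slow input from the question
```

Reading n as a string avoids parsing a number with up to 100000 digits; only its digit sum is ever multiplied.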

Related

Compact way to produce a large sequence of strings in lexical order

I want to generate a sequence of strings with the following properties:
Lexically ordered
Theoretically infinite
Compact over a realistic range
Generated by a simple process of incrementation
Matches the regexp /\w+/
The obvious way to generate a lexically-ordered sequence is to choose a string length and pad the strings with a base value like this: 000000, 000001, etc. This approach poses a trade-off between the number of permutations and compactness: a string long enough to yield many permutations will be filled with many zeros along the way. Plus, the length I choose sets an upper bound on the total number of permutations unless I have some mechanism for expanding the string when it maxes out.
So I came up with a sequence that works like this:
Each string consists of a "head", which is a base-36 number, followed by an underscore, and then the "tail", which is also a base-36 number padded by an increasing number of zeros
The first cycle goes from 0_0 to 0_z
The second cycle goes from 1_00 to 1_zz
The third cycle goes from 2_000 to 2_zzz, and so on
Once the head has reached z and the tail consists of 36 zs, the first "supercycle" has ended. Now the whole sequence starts over, except the z remains at the beginning, so the new cycle starts with z0_0, then continues to z1_00, and so on
The second supercycle goes zz0_0, zz1_00, and so on
Although the string of zs in the head could become unwieldy over the long run, a single supercycle contains over 10^56 permutations, which is far more than I ever expect to use. The sequence is theoretically infinite but very compact within a realistic range. For instance, the trillionth permutation is a succinct 7_bqd55h8s.
I can generate the sequence relatively simply with this javascript function:
function genStr (n) {
n = BigInt(n);
let prefix = "",
cycle = 0n,
max = 36n ** (cycle + 1n);
while (n >= max) {
n -= max;
if (cycle === 35n) {
prefix += "z";
cycle = 0n;
} else {
cycle++;
}
max = 36n ** (cycle + 1n);
}
return prefix
+ cycle.toString(36)
+ "_"
+ n.toString(36).padStart(Number(cycle) + 1, 0);
}
The n parameter is a number that I increment and pass to the function to get the next member of the sequence. All I need to keep track of is a simple integer, making the sequence very easy to use.
So obviously I spent a lot of time on this and I think it's pretty good, but I'm wondering if there is a better way. Is there a good algorithm for generating a sequence along the lines of the one I'm looking for?
A close idea to yours (more refined than my first edit...).
Let our alphabet be A = {0,1,2,3}.
Let |2| mean we iterate from 0 to 2, and let |2|^2 mean we generate the Cartesian product in lexically sorted order (00, 01, 02, 10, ..., 22).
We start with
0 |3|
So we have a string of length 2. We "unshift" the digit 1 which "factorizes" since any 0|3|... is less than 1|3|^2.
1 |3|^2
Same idea: unshift 2, and make words of length 4.
2 |3|^3
Now we can continue and generate
3 |2| |3|^3
Notice |2| and not |3|. Now our maximum number becomes 32333. And as you did, we can now add the carry and start a new supercycle:
33 0|3|
This is a slight improvement, since _ can now be part of our alphabet: we don't need to reserve it as a token separator.
In our case we can represent in a supercycle:
\underbrace{n + n^2 + \dots + n^{n-1}}_{\text{geometric}} + \underbrace{(n-1) \cdot n^{n-1}}_{\text{special}}
In your case, the special part would be n^n (with the nuance that you theoretically have one char less, so replace n with n-1 everywhere).
The proposed supercycle is of length :
P = (n \sum_{k = 0}^{n-2} n^k) + (n-1) * n^(n-1)
P = (n \sum_{k = 0}^{n-3} n^k) + n^n
P = n(n^{n-2} - 1)/(n-1) + n^n
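As a sanity check on the algebra, the closed form can be compared with the direct sum. This is a small Python sketch with hypothetical function names; it assumes n >= 2.

```python
def supercycle_length(n: int) -> int:
    # Direct sum: geometric part n + n^2 + ... + n^(n-1), plus the special part.
    geometric = sum(n ** i for i in range(1, n))
    special = (n - 1) * n ** (n - 1)
    return geometric + special

def supercycle_length_closed(n: int) -> int:
    # Closed form from the last line of the derivation.
    return n * (n ** (n - 2) - 1) // (n - 1) + n ** n

for n in (2, 3, 4, 36):
    assert supercycle_length(n) == supercycle_length_closed(n)

print(supercycle_length(3))
```

For n = 3 both versions give 30, which is the supercycle length used in the worked example further down.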
Here is an example diff with alphabet A={0,1,2}
my genStr(grandinero)
,00 0_0
,01 0_1
,02 0_2
,100 1_00
,101 1_01
,102 1_02
,110 1_10
,111 1_11
,112 1_12
,120 1_20
,121 1_21
,122 1_22
,2000 2_000
,2001 2_001
,2002 2_002
,2010 2_010
,2011 2_011
,2012 2_012
,2020 2_020
,2021 2_021
,2022 2_022
,2100 2_100
,2101 2_101
,2102 2_102
,2110 2_110
,2111 2_111
,2112 2_112
,2120 2_120
,2121 2_121
,2122 2_122
22,00 2_200 <-- end of my supercycle if no '_' allowed
22,01 2_201
22,02 2_202
22,100 2_210
22,101 2_211
22,102 2_212
22,110 2_220
22,111 2_221
22,112 2_222 <-- end of yours
22,120 z0_0
That said, for a given number x, we can count how many supercycles (E(x / P)) there are, each supercycle making two leading e (e being the last char of A).
e.g: A = {0,1,2} and x = 43
e = 2
P = n(n^{n-2} - 1)/(n-1) + n^n = 3(3^1 -1)/2 + 27 = 30
// our supercycle is of length 30
E(43/30) = 1 // 43 makes one supercycle and a few more "strings"
r = x % P = 13 // this is also x - (E(43/30) * 30) (the rest of the euclidean division by P)
Then for the left over (r = x % P) two cases to consider:
either we fall in the geometric sequence,
or we fall in the (n-1) * n^(n-1) part.
1. Addressing the geometric sequence with cumulative sums (x < S_w)
Let S_i be the cumsum of n, n^2,..
S_i = n\sum_{k = 0}^{i-1} n^k
S_i = n/(n-1)*(n^i - 1)
which gives S_0 = 0, S_1 = n, S_2 = n + n^2...
So basically, if x < S_1, we get 0(x), elif x < S_2, we get 1(x-S_1)
Let S_w = S_{n-1} the count of all the numbers we can represent.
If x < S_w then we want the i such that
S_i <= x < S_{i+1} <=> n^i <= (n-1)/n * x + 1 < n^{i+1}
We can then apply some log flooring (base(n)) to get that i.
We can then associate the string: A[i] + base_n(x - S_i).
Illustration:
This time with A = {0,1,2,3}.
Let x be 17.
Our consecutive S_i are:
S_0 = 0
S_1 = 4
S_2 = S_1 + 4^2 = 20
S_3 = S_2 + 4^3 = 84
S_w = S_{4-1} = S_3 = 84
x = 17 is indeed less than 84, so we will be able to assign it to one of the S_i ranges.
In particular S_1 == 4 <= x == 17 < S_2 == 20.
We remove the strings encoded by the leading 0 (there are S_1 of those strings).
The position to encode with the leading 1 is
x - 4 = 13.
And we conclude that the string at position 13 generated with a leading 1 is base_4(13) = '31' (so the full string is '131').
Should we have had x = 21, we would have removed the count of S_2 so 21-20 = 1, which in turn gives with a leading 2 the string '2001'.
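The lookup in the geometric part can be sketched in a few lines of Python. This is an illustration only: the function names are made up, positions are 0-based, and a linear scan over the cumulative sums replaces the floored logarithm, which is simpler to check by hand.

```python
def to_base(x, alphabet, width):
    # Write x in base len(alphabet), left-padded with the first symbol.
    n = len(alphabet)
    digits = []
    while x > 0:
        digits.append(alphabet[x % n])
        x //= n
    return "".join(reversed(digits)).rjust(width, alphabet[0])

def geometric_string(x, alphabet):
    # Find i with S_i <= x < S_{i+1}, then emit A[i] + base_n(x - S_i).
    n = len(alphabet)
    s_i = 0  # S_0
    for i in range(n - 1):
        s_next = s_i + n ** (i + 1)
        if x < s_next:
            return alphabet[i] + to_base(x - s_i, alphabet, i + 1)
        s_i = s_next
    raise ValueError("x is outside the geometric part (x >= S_w)")

print(geometric_string(17, "0123"))
print(geometric_string(21, "0123"))
```

With A = {0,1,2,3} this reproduces the two cases from the illustration: x = 17 maps to '131' and x = 21 maps to '2001'.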
2. Addressing x in the special part (x >= S_w)
Let's consider the case study below:
with A = {0,1,2}
The special part is
2 |1| |2|^2
that is:
2 0 00
2 0 01
2 0 02
2 0 10
2 0 11
2 0 12
2 0 20
2 0 21
2 0 22
2 1 00
2 1 01
2 1 02
2 1 10
2 1 11
2 1 12
2 1 20
2 1 21
2 1 22
Each value of the second column (here 0 to 1, as specified by |1|) gives 3^2 combinations.
This is similar to the geometric series except that here each range is constant. We want to find the range which means we know which string to prefix.
We can represent it as the matrix
20 (00,01,02,10,11,12,20,21,22)
21 (00,01,02,10,11,12,20,21,22)
The portion in parenthesis is our matrix.
Every item in a row is simply its position base_3 (left-padded with 0).
e.g: n=7 has base_3 value '21'. (7=2*3+1).
'21' does occur in position 7 in the row.
Assuming we get some x (relative to that special part).
E(x / 3^2) gives us the row number (here E(7/9) = 0 so prefix is '20')
x % 3^2 give us the position in the row (here base_3(7%9)='21' giving us the final string '2021')
To observe this, remember that we subtracted S_w = 12 beforehand to get x = 7, so we would call the generator returned by mygen with 7 + 12 = 19.
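The row and column lookup in the special part amounts to a single divmod. Here is a minimal Python sketch, assuming x is already relative to the special part (S_w has been subtracted); the function name is hypothetical.

```python
def special_string(x, alphabet):
    # x is the 0-based position inside the special part.
    n = len(alphabet)
    row_length = n ** (n - 1)
    row, position = divmod(x, row_length)
    if row >= n - 1:
        raise ValueError("x is outside the special part")
    # Position inside the row, written in base n and padded to n - 1 symbols.
    digits = []
    while position > 0:
        digits.append(alphabet[position % n])
        position //= n
    tail = "".join(reversed(digits)).rjust(n - 1, alphabet[0])
    # Prefix: last symbol of the alphabet, then the row symbol.
    return alphabet[-1] + alphabet[row] + tail

print(special_string(7, "012"))
```

For A = {0,1,2} and x = 7 this gives row 0 and the tail '21', hence '2021' as in the example above.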
Some code
Notice the same output as long as we stand in the "geometric" range, without supercycle.
Obviously, when carry starts to appear, it depends on whether I can use '_' or not. If yes, my words get shorter otherwise longer.
// https://www.cs.sfu.ca/~ggbaker/zju/math/int-alg.html
// \w insensitive could give base64
// but also éè and other accents...
function base_n(x, n, A) {
const a = []
while (x !== 0n) {
a.push(A[Number(x % n)])
x = x / n // auto floor with bigInt
}
return a.reverse().join('')
}
function mygen (A) {
const n = A.length
const bn = BigInt(n)
const A_last = A[A.length-1]
const S = Array(n).fill(0).map((x, i) => bn * (bn ** BigInt(i) - 1n) / (bn - 1n))
const S_w = S[n-1]
const w = S_w + (bn - 1n) * bn ** (bn - 1n)
const w2 = bn ** (bn - 1n)
const flog_bn = x => {
// https://math.stackexchange.com/questions/1627914/smart-way-to-calculate-floorlogx
let L = 0
while (x >= bn) {
L++
x /= bn
}
return L
}
return function (x) {
x = BigInt(x)
let r = x % w
const q = (x - r) / w
let s
if (r < S_w) {
const i = flog_bn(r * (bn - 1n) / bn + 1n)
const r2 = r - S[i]
s = A[i] + base_n(r2, bn, A).padStart(i+1, '0')
} else {
const n2 = r - S_w
const r2 = n2 % w2
const q2 = (n2 - r2 ) / w2
s = A_last + A[q2] + base_n(r2, bn, A).padStart(n-1, '0')
}
// comma below __not__ necessary, just to ease seeing cycles
return A_last.repeat(2*Number(q)) +','+ s
}
}
function genStr (A) {
A = A.filter(x => x !== '_')
const bn_noUnderscore = BigInt(A.length)
return function (x) {
x = BigInt(x);
let prefix = "",
cycle = 0n,
max = bn_noUnderscore ** (cycle + 1n);
while (x >= max) {
x -= max;
if (cycle === bn_noUnderscore - 1n) {
prefix += "z";
cycle = 0n;
} else {
cycle++;
}
max = bn_noUnderscore ** (cycle + 1n);
}
return prefix
+ base_n(cycle, bn_noUnderscore, A)
+ "_"
+ base_n(x, bn_noUnderscore, A).padStart(Number(cycle) + 1, 0);
}
}
function test(a, b, x){
console.log(a(x), b(x))
}
{
console.log('---my supercycle is shorter if underscore not used. Plenty of room for grandinero')
const A = '0123456789abcdefghijklmnopqrstuvwxyz'.split('').sort((a,b)=>a.localeCompare(b))
let my = mygen(A)
const grandinero = genStr(A)
test(my, grandinero, 1e4)
test(my, grandinero, 1e12)
test(my, grandinero, 106471793335560744271846581685593263893929893610517909620n) // cycle ended for me (w variable value)
}
{
console.log('---\n my supercycle is greater if underscore is used in my alphabet (not grandinero since "forbidden')
// underscore used
const A = '0123456789abcdefghijklmnopqrstuvwxyz_'.split('').sort((a,b)=>a.localeCompare(b))
let my = mygen(A)
const grandinero = genStr(A)
test(my, grandinero, 1e12)
test(my, grandinero, 106471793335560744271846581685593263893929893610517909620n) // cycle ended for me (w variable value)
test(my, grandinero, 1e57) // still got some place in the supercycle
}
After considering the advice provided by #kaya3 and #grodzi and reviewing my original code, I have made some improvements. I realized a few things:
There was a bug in my original code. If one cycle ends at z_z (actually 36 z's after the underscore, but you get the idea) and the next one begins at z0_0, then lexical ordering is broken because _ comes after 0. The separator (or "neck") needs to be lower in lexical order than the lowest possible value of the head.
Though I was initially resistant to the idea of rolling a custom baseN generator so that more characters can be included, I have now come around to the idea.
I can squeeze more permutations out of a given string length by also incrementing the neck. For example, I can go from A00...A0z to A10...A1z, and so on, thus increasing the number of unique strings I can generate with A as the head before I move on to B.
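The ordering bug from the first point is easy to reproduce. This is a small Python sketch with shortened stand-in strings (one z in the tail instead of 36), relying on the fact that string comparison goes by code point: '-' is U+002D, '0' is U+0030 and '_' is U+005F.

```python
def is_lexically_ordered(strings):
    # True when every string sorts strictly before its successor.
    return all(a < b for a, b in zip(strings, strings[1:]))

# End of one supercycle followed by the start of the next one.
with_underscore = ["z_z", "z0_0"]
with_hyphen = ["z-z", "z0-0"]

print(is_lexically_ordered(with_underscore))  # underscore sorts after '0'
print(is_lexically_ordered(with_hyphen))      # hyphen sorts before '0'
```

The underscore variant fails the check and the hyphen variant passes, which is why the separator has to sort below the lowest head character.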
With that in mind, I have revised my code:
// this is the alphabet used in standard baseN conversions:
let baseAlpha = "0123456789abcdefghijklmnopqrstuvwxyz";
// this is a factory for creating a new string generator:
function sequenceGenerator (config) {
let
// alphabets for the head, neck and body:
headAlpha = config.headAlpha,
neckAlpha = config.neckAlpha,
bodyAlpha = config.bodyAlpha,
// length of the body alphabet corresponds to the
// base of the numbering system:
base = BigInt(bodyAlpha.length),
// if bodyAlpha is identical to an alphabet that
// would be used for a standard baseN conversion,
// then use the built-in method, which should be
// much faster:
convertBody = baseAlpha.startsWith(bodyAlpha)
? (n) => n.toString(bodyAlpha.length)
// otherwise, roll a custom baseN generator:
: function (n) {
let s = "";
while (n > 0n) {
let i = n % base;
s = bodyAlpha[i] + s;
n = n / base;
}
return s;
},
// n is used to cache the last iteration and is
// incremented each time you call `getNext`
// it can optionally be initialized to a value other
// than 0:
n = BigInt(config.start || 0),
// see below:
headCycles = [0n],
cycleLength = 0n;
// the length of the body increases by 1 each time the
// head increments, meaning that the total number of
// permutations increases geometrically for each
// character in headAlpha
// here we cache the maximum number of permutations for
// each length of the body
// since we know these values ahead of time, calculating
// them in advance saves time when we generate a new
// string
// more importantly, it saves us from having to do a
// reverse calculation involving Math.log, which requires
// converting BigInts to Numbers, which breaks the
// program on larger numbers:
for (let i = 0; i < headAlpha.length; i++) {
// the maximum number of permutations depends on both
// the string length (i + 1) and the number of
// characters in neckAlpha, since the string length
// remains the same while the neck increments
cycleLength += BigInt(neckAlpha.length) * base ** BigInt(i + 1);
headCycles.push(cycleLength);
}
// given a number n, this function searches through
// headCycles to find where the total number of
// permutations exceeds n
// this is how we avoid the reverse calculation with
// Math.log to determine which head cycle we are on for
// a given permutation:
function getHeadCycle (n) {
for (let i = 0; i < headCycles.length; i++) {
if (headCycles[i] > n) return i;
}
}
return {
cycleLength: cycleLength,
getString: function (n) {
let cyclesDone = Number(n / cycleLength),
headLast = headAlpha[headAlpha.length - 1],
prefix = headLast.repeat(cyclesDone),
nn = n % cycleLength,
headCycle = getHeadCycle(nn),
head = headAlpha[headCycle - 1],
nnn = nn - headCycles[headCycle - 1],
neckCycleLength = BigInt(bodyAlpha.length) ** BigInt(headCycle),
neckCycle = nnn / neckCycleLength,
neck = neckAlpha[Number(neckCycle)],
body = convertBody(nnn % neckCycleLength);
body = body.padStart(headCycle , bodyAlpha[0]);
return prefix + head + neck + body;
},
getNext: function () { return this.getString(n++); }
};
}
let bodyAlpha = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz",
getStr = sequenceGenerator({
// achieve more permutations within a supercycle
// with a larger headAlpha:
headAlpha: "123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz",
// the highest value of neckAlpha must be lower than
// the lowest value of headAlpha:
neckAlpha: "0",
bodyAlpha: bodyAlpha
});
console.log("---supercycle length:");
console.log(Number(getStr.cycleLength));
console.log("---first two values:")
console.log(getStr.getNext());
console.log(getStr.getNext());
console.log("---arbitrary large value (1e57):");
console.log(getStr.getString(BigInt(1e57)));
console.log("");
// here we use a shorter headAlpha and longer neckAlpha
// to shorten the maximum length of the body, but this also
// decreases the number of permutations in the supercycle:
getStr = sequenceGenerator({
headAlpha: "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz",
neckAlpha: "0123456789",
bodyAlpha: bodyAlpha
});
console.log("---supercycle length:");
console.log(Number(getStr.cycleLength));
console.log("---first two values:");
console.log(getStr.getNext());
console.log(getStr.getNext());
console.log("---arbitrary large value (1e57):");
console.log(getStr.getString(BigInt(1e57)));
EDIT
After further discussion with #grodzi, I have made some more improvements:
I realized that the "neck" or separator wasn't providing much value, so I have gotten rid of it. Later edit: actually, the separator is necessary. I am not sure why I thought it wasn't. Without the separator, the beginning of each new supercycle will lexically precede the end of the previous supercycle. I haven't changed my code below, but anyone using this code should include a separator. I have also realized that I was wrong to use an underscore as the separator. The separator must be a character, such as the hyphen, which lexically precedes the lowest digit used in the sequence (0).
I have taken #grodzi's suggestion to allow the length of the tail to continue growing indefinitely.
Here is the new code:
let baseAlpha = "0123456789abcdefghijklmnopqrstuvwxyz";
function sequenceGenerator (config) {
let headAlpha = config.headAlpha,
tailAlpha = config.tailAlpha,
base = BigInt(tailAlpha.length),
convertTail = baseAlpha.startsWith(tailAlpha)
? (n) => n.toString(tailAlpha.length)
: function (n) {
if (n === 0n) return "0";
let s = "";
while (n > 0n) {
let i = n % base;
s = tailAlpha[i] + s;
n = n / base;
}
return s;
},
n = BigInt(config.start || 0);
return {
getString: function (n) {
let cyclesDone = 0n,
headCycle = 0n,
initLength = 0n,
accum = 0n;
for (;; headCycle++) {
let _accum = accum + base ** (headCycle + 1n + initLength);
if (_accum > n) {
n -= accum;
break;
} else if (Number(headCycle) === headAlpha.length - 1) {
cyclesDone++;
initLength += BigInt(headAlpha.length);
headCycle = -1n;
}
accum = _accum;
}
let headLast = headAlpha[headAlpha.length - 1],
prefix = headLast.repeat(Number(cyclesDone)),
head = headAlpha[Number(headCycle)],
tail = convertTail(n),
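// the cycle for this head holds base ** (headCycle + 1 + initLength)
// strings, so the tail needs that many digits to keep a fixed width:
tailLength = Number(headCycle + 1n + initLength);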
tail = tail.padStart(tailLength, tailAlpha[0]);
return prefix + head + tail;
},
getNext: function () { return this.getString(n++); }
};
}
let alpha = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz",
genStr = sequenceGenerator({headAlpha: alpha, tailAlpha: alpha});
console.log("--- first string:");
console.log(genStr.getString(0n));
console.log("--- 1e+57");
console.log(genStr.getString(BigInt(1e+57)));
console.log("--- end of first supercycle:");
console.log(genStr.getString(63n*(1n-(63n**63n))/(1n-63n)-1n));
console.log("--- start of second supercycle:");
console.log(genStr.getString(63n*(1n-(63n**63n))/(1n-63n)));

K product array

I am working on an algorithms problem. You are given an array numbers, the size of the array t, a number number_of_elements and a number multiplication_value. You have to find any set of number_of_elements indexes of elements of the array whose product is equal to multiplication_value. It is guaranteed that such a set of indexes exists.
That problem looks like 2 sum, but I can't extrapolate it to my case.
I have tried a naive O(n) algorithm, but it fails when there is a bad first number in the array. I think there is a way to use recursion here. I guess it is a well-known problem, but I couldn't find the solution.
Example in:
t = 7
number_of_elements = 2
multiplication_value = 27
numbers = [9,1,1,27,3,27,3]
Example out:
1 3
My code ideas:
def return_index_values(numbers,multiplication_value,number_of_elements):
    cur_number = int(multiplication_value)
    list_of_indexes = []
    values = []
    for i in range(len(numbers)):
        if ((cur_number == 1) and (len(values) == number_of_elements)):
            print(values)
            #finishing if everything worked
            break
        else:
            if (cur_number % int(numbers[i]) == 0):
                if(len(values) < number_of_elements):
                    #pushing values if possible
                    values.append(int(numbers[i]))
                    list_of_indexes.append(i)
                    cur_number = int(cur_number / int(numbers[i]))
                    print(cur_number)
                else:
                    pass
            if(len(values) == number_of_elements):
                if mult_check(values,int(multiplication_value)):
                    #mult_check checks if the array's element multiplication gives a value
                    break
                else:
                    #started dealing with bad cases, but it doesn't work properly
                    values.sort()
                    val_popped = values.pop()
                    cur_number = cur_number * val_popped
Bad case for my code
numbers = [9,3,1,27,3,27,3]
Here is one implementation. Not necessarily the best solution but it gives you some sense of how it can be done.
It first sorts the numbers by value while keeping their original indices. Then it makes recursive calls.
number_of_elements = 2
multiplication_value = 27
numbers = [9,1,1,27,3,27,3]
def preprocess(numbers, multiplication_value, number_of_elements):
    # pair each value with its original index, then sort by value
    l = []
    for i, num in enumerate(numbers):
        l.append((num, i))
    return sorted(l, key = lambda tup: tup[0])
def subroutine(numbers, multiplication_value, number_of_elements, idx_start, result):
    # check for success first, so a solution that uses the last element is not rejected
    if number_of_elements == 0:
        return multiplication_value == 1
    if idx_start >= len(numbers):
        return False
    for i in range(idx_start, len(numbers)):
        num = numbers[i][0]
        if num <= multiplication_value:
            if multiplication_value % num == 0:
                idx = numbers[i][1]
                result.append(idx)
                # integer division keeps the remaining product an int in Python 3
                found = subroutine(numbers, multiplication_value // num, number_of_elements - 1, i + 1, result)
                if not found:
                    del result[-1]
                else:
                    return True
        else:
            # the list is sorted, so every later value is too large as well
            return False
    return False
result = []
processed_numbers = preprocess(numbers, multiplication_value, number_of_elements)
subroutine(processed_numbers, multiplication_value, number_of_elements, 0, result)
print(result)
You can use itertools.combinations() (https://www.geeksforgeeks.org/itertools-combinations-module-python-print-possible-combinations/) to select number_of_elements entries from your list in all possible ways, then check each whether they multiply to the required number.
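A minimal sketch of that brute-force idea, with a hypothetical function name. It uses math.prod, which needs Python 3.8 or later, and it tries every index combination, so it is only practical for small inputs.

```python
from itertools import combinations
from math import prod

def find_indexes(numbers, multiplication_value, number_of_elements):
    # Try every set of indexes of the requested size, in lexicographic order.
    for indexes in combinations(range(len(numbers)), number_of_elements):
        if prod(numbers[i] for i in indexes) == multiplication_value:
            return list(indexes)
    return None  # no such set exists

print(find_indexes([9, 1, 1, 27, 3, 27, 3], 27, 2))
print(find_indexes([9, 3, 1, 27, 3, 27, 3], 27, 2))  # the "bad case" from the question
```

Any valid set of indexes is acceptable according to the problem statement, so the first match found is returned; it need not be the same set as in the example output.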

DNA subsequence dynamic programming question

I'm trying to solve a DNA problem which is more of an improved(?) version of the LCS problem.
In the problem, there is a string and a semi-substring, which allows parts of the string to have one or no letter skipped. For example, the string "desktop" has the semi-substrings {"destop", "dek", "stop", "skop", "desk", "top"}, all of which have one or no letter skipped.
Now, I am given two DNA strings consisting of {a,t,g,c}. I'm trying to find the longest semi-substring, LSS, and if there is more than one LSS, print out the one in the fastest order.
For example, two dnas {attgcgtagcaatg, tctcaggtcgatagtgac} prints out "tctagcaatg"
and aaaattttcccc, cccgggggaatatca prints out "aattc"
I'm trying to use the common LCS algorithm but cannot solve it with tables, although I did solve the version with no letter skipped. Any advice?
This is a variation on the dynamic programming solution for LCS, written in Python.
First I'm building up a Suffix Tree for all the substrings that can be made from each string with the skip rule. Then I'm intersecting the suffix trees. Then I'm looking for the longest string that can be made from that intersection tree.
Please note that this is technically O(n^2). Its worst case is when both strings are the same character, repeated over and over again, because you wind up with a lot of what logically is something like, "an 'l' at position 42 in the one string could have matched against an 'l' at position 54 in the other". But in practice it will be O(n).
def find_subtree (text, max_skip=1):
    tree = {}
    tree_at_position = {}
    def subtree_from_position (position):
        if position not in tree_at_position:
            this_tree = {}
            if position < len(text):
                char = text[position]
                # Make sure that we've populated the further tree.
                subtree_from_position(position + 1)
                # If this char appeared later, include those possible matches.
                if char in tree:
                    for char2, subtree in tree[char].items():
                        this_tree[char2] = subtree
                # And now update the new choices.
                for skip in range(max_skip + 1, 0, -1):
                    if position + skip < len(text):
                        this_tree[text[position + skip]] = subtree_from_position(position + skip)
                tree[char] = this_tree
            tree_at_position[position] = this_tree
        return tree_at_position[position]
    subtree_from_position(0)
    return tree
def find_longest_common_semistring (text1, text2):
    tree1 = find_subtree(text1)
    tree2 = find_subtree(text2)
    answered = {}
    def find_intersection (subtree1, subtree2):
        unique = (id(subtree1), id(subtree2))
        if unique not in answered:
            answer = {}
            for k, v in subtree1.items():
                if k in subtree2:
                    answer[k] = find_intersection(v, subtree2[k])
            answered[unique] = answer
        return answered[unique]
    found_longest = {}
    def find_longest (tree):
        if id(tree) not in found_longest:
            best_candidate = ''
            for char, subtree in tree.items():
                candidate = char + find_longest(subtree)
                if len(best_candidate) < len(candidate):
                    best_candidate = candidate
            found_longest[id(tree)] = best_candidate
        return found_longest[id(tree)]
    intersection_tree = find_intersection(tree1, tree2)
    return find_longest(intersection_tree)
print(find_longest_common_semistring("attgcgtagcaatg", "tctcaggtcgatagtgac"))
Let g(c, rs, rt) represent the longest common semi-substring of strings S and T ending at rs and rt, where rs and rt are the ranked occurrences of the character c in S and T, respectively, and K is the number of skips allowed. Then we can form a recursion, which we have to evaluate for all pairs of occurrences of c in S and T.
JavaScript code:
function f(S, T, K){
// mapS maps a char to indexes of its occurrences in S
// rsS maps the index in S to that char's rank (index) in mapS
const [mapS, rsS] = mapString(S)
const [mapT, rsT] = mapString(T)
// h is used to memoize g
const h = {}
function g(c, rs, rt){
if (rs < 0 || rt < 0)
return 0
if (h.hasOwnProperty([c, rs, rt]))
return h[[c, rs, rt]]
// (We are guaranteed to be on
// a match in this state.)
let best = [1, c]
let idxS = mapS[c][rs]
let idxT = mapT[c][rt]
if (idxS == 0 || idxT == 0)
return best
for (let i=idxS-1; i>=Math.max(0, idxS - 1 - K); i--){
for (let j=idxT-1; j>=Math.max(0, idxT - 1 - K); j--){
if (S[i] == T[j]){
const [len, str] = g(S[i], rsS[i], rsT[j])
if (len + 1 >= best[0])
best = [len + 1, str + c]
}
}
}
return h[[c, rs, rt]] = best
}
let best = [0, '']
for (let c of Object.keys(mapS)){
for (let i=0; i<(mapS[c]||[]).length; i++){
for (let j=0; j<(mapT[c]||[]).length; j++){
let [len, str] = g(c, i, j)
if (len > best[0])
best = [len, str]
}
}
}
return best
}
function mapString(s){
let map = {}
let rs = []
for (let i=0; i<s.length; i++){
if (!map[s[i]]){
map[s[i]] = [i]
rs.push(0)
} else {
map[s[i]].push(i)
rs.push(map[s[i]].length - 1)
}
}
return [map, rs]
}
console.log(f('attgcgtagcaatg', 'tctcaggtcgatagtgac', 1))
console.log(f('aaaattttcccc', 'cccgggggaatatca', 1))
console.log(f('abcade', 'axe', 1))
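The same recurrence can also be filled in as a table, which is closer to the classic LCS layout the question mentions. What follows is a minimal Python sketch, not a port of the code above: the function name is hypothetical, and it assumes a semi-substring may skip at most max_skip letters between two consecutive chosen letters, independently in each string. It runs in O(|S| * |T| * (max_skip + 1)^2) time and does not attempt the tie-breaking rule from the question.

```python
def longest_common_semisubstring(s, t, max_skip=1):
    n, m = len(s), len(t)
    # length[i][j]: longest common semi-substring ending exactly at s[i] and t[j]
    length = [[0] * m for _ in range(n)]
    prev = [[None] * m for _ in range(n)]
    best_len, best_end = 0, None
    for i in range(n):
        for j in range(m):
            if s[i] != t[j]:
                continue
            length[i][j] = 1
            # Look back 1 .. max_skip + 1 positions in each string.
            for di in range(1, max_skip + 2):
                for dj in range(1, max_skip + 2):
                    pi, pj = i - di, j - dj
                    if pi >= 0 and pj >= 0 and length[pi][pj] + 1 > length[i][j]:
                        length[i][j] = length[pi][pj] + 1
                        prev[i][j] = (pi, pj)
            if length[i][j] > best_len:
                best_len, best_end = length[i][j], (i, j)
    # Walk the back-pointers to rebuild the string.
    chars = []
    cell = best_end
    while cell is not None:
        i, j = cell
        chars.append(s[i])
        cell = prev[i][j]
    return "".join(reversed(chars))

print(longest_common_semisubstring("desktop", "destop"))
print(longest_common_semisubstring("attgcgtagcaatg", "tctcaggtcgatagtgac"))
```

Cells where the characters differ stay at 0, so they can never extend a match; that is what keeps the look-back restricted to real predecessors.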

Single Number 2 Scala solution

Given an array of integers, every element appears three times except for one, which appears exactly once. Find that single one. This is what I have now. But I don't know how to break the for loop once I get the single number "b". Any solution in scala please?
for(Array(a,b) <- nums.sorted.sliding(2))
{
if (a == b){j = j+1}
else
{
if (j < 3) j =1
b
}
}
This will do it.
nums.groupBy(identity).find(_._2.length == 1).get._1
It's a bit unsafe in that it will throw if there is no single-count element. It can be made safer if a default value is returned when no single-count element is found.
nums.groupBy(identity).find(_._2.length == 1).fold(-1)(_._1)
Another way is to sum the array by adding digits of two numbers in base 3 modulo 3 (in other words XOR in base 3). The elements that appear 3 times will become zero, so the result of this sum will be the single number.
def findSingleNumber(numbers: Array[Int]) = {
def add3(a: String, b: String): String = a.zipAll(b, '0', '0').map {
case (i, j) => ((i.toInt + j.toInt) % 3 + '0').toChar
}(collection.breakOut)
val numbersInBase3 = numbers.map(n => Integer.toString(n, 3).reverse)
Integer.parseInt(numbersInBase3.fold("0")(add3).reverse, 3)
}
scala> findSingleNumber(Array(10, 20, 30, 100, 20, 100, 10, 10, 20, 100))
res1: Int = 30
Or representing base 3 numbers as digit arrays:
def findSingleNumber(numbers: Array[Int]) = {
def toBase3(int: Int): Array[Int] =
Iterator.iterate(int)(_ / 3).takeWhile(_ != 0).map(_ % 3).toArray
def toBase10(arr: Array[Int]): Int =
arr.reverseIterator.foldLeft(0)(_ * 3 + _)
def add3(a: Array[Int], b: Array[Int]): Array[Int] = a.zipAll(b, 0, 0).map {
case (i, j) => (i + j) % 3
}
toBase10(numbers.map(toBase3).fold(Array.empty[Int])(add3))
}
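The base-3 idea can also be written without building digit strings or arrays per number. Below is a small Python sketch with a hypothetical function name; it assumes all inputs are non-negative integers.

```python
def find_single_number(numbers):
    # digit_sums[p] holds the sum of all base-3 digits at position p, modulo 3.
    digit_sums = []
    for n in numbers:
        p = 0
        while n > 0:
            if p == len(digit_sums):
                digit_sums.append(0)
            digit_sums[p] = (digit_sums[p] + n % 3) % 3
            n //= 3
            p += 1
    # Digits contributed three times cancel out; what remains is the single number.
    result = 0
    for d in reversed(digit_sums):
        result = result * 3 + d
    return result

print(find_single_number([10, 20, 30, 100, 20, 100, 10, 10, 20, 100]))
```

On the array from the REPL session above this returns 30.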

Find prime numbers using Scala. Help me to improve

I wrote this code to find the prime numbers less than the given number i in scala.
def findPrime(i : Int) : List[Int] = i match {
case 2 => List(2)
case _ => {
val primeList = findPrime(i-1)
if(isPrime(i, primeList)) i :: primeList else primeList
}
}
def isPrime(num : Int, prePrimes : List[Int]) : Boolean = prePrimes.forall(num % _ != 0)
But, I got a feeling the findPrime function, especially this part:
case _ => {
val primeList = findPrime(i-1)
if(isPrime(i, primeList)) i :: primeList else primeList
}
is not quite in the functional style.
I am still learning functional programming. Can anyone please help me improve this code to make it more functional.
Many thanks.
Here's a functional implementation of the Sieve of Eratosthenes, as presented in Odersky's "Functional Programming Principles in Scala" Coursera course :
// Sieving integral numbers
def sieve(s: Stream[Int]): Stream[Int] = {
s.head #:: sieve(s.tail.filter(_ % s.head != 0))
}
// All primes as a lazy sequence
val primes = sieve(Stream.from(2))
// Dumping the first five primes
print(primes.take(5).toList) // List(2, 3, 5, 7, 11)
The style looks fine to me. Although the Sieve of Eratosthenes is a very efficient way to find prime numbers, your approach works well too, since you are only testing for division against known primes. You need to watch out however--your recursive function is not tail recursive. A tail recursive function does not modify the result of the recursive call--in your example you prepend to the result of the recursive call. This means that you will have a long call stack and so findPrime will not work for large i. Here is a tail-recursive solution.
def primesUnder(n: Int): List[Int] = {
require(n >= 2)
def rec(i: Int, primes: List[Int]): List[Int] = {
if (i >= n) primes
else if (prime(i, primes)) rec(i + 1, i :: primes)
else rec(i + 1, primes)
}
rec(2, List()).reverse
}
def prime(num: Int, factors: List[Int]): Boolean = factors.forall(num % _ != 0)
This solution isn't prettier--it's more of a detail to get your solution to work for large arguments. Since the list is built up backwards to take advantage of fast prepends, the list needs to be reversed. As an alternative, you could use an Array, Vector or a ListBuffer to append the results. With the Array, however, you would need to estimate how much memory to allocate for it. Fortunately we know that pi(n) is about equal to n / ln(n), so you can choose a reasonable size. Array and ListBuffer are also mutable data types, which goes against your desire for functional style.
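As a rough illustration of the Array option, here is a minimal sketch. The name primesUnderArray is hypothetical, and the capacity uses the known upper bound pi(n) < 1.25506 * n / ln(n) (valid for n > 1) rather than n / ln(n) itself, because the plain estimate slightly undercounts and the array would overflow. That safety factor is my addition, not part of the answer above.

```scala
// Sketch: collect the primes below n into a pre-sized Array.
// Capacity comes from the bound pi(n) < 1.25506 * n / ln(n), valid for n > 1.
def primesUnderArray(n: Int): Array[Int] = {
  require(n >= 2)
  val capacity = math.ceil(1.25506 * n / math.log(n)).toInt
  val found = new Array[Int](capacity)
  var count = 0 // number of primes stored so far
  var i = 2
  while (i < n) {
    // Primes are stored in ascending order, so we can stop once p * p > i.
    var isPrime = true
    var j = 0
    while (isPrime && j < count && found(j).toLong * found(j) <= i) {
      if (i % found(j) == 0) isPrime = false
      j += 1
    }
    if (isPrime) { found(count) = i; count += 1 }
    i += 1
  }
  found.take(count) // drop the unused tail of the array
}
```

Because the array is filled in ascending order, this variant also gets the square-root cutoff for free, at the cost of mutable state.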
Update: to get good performance out of the Sieve of Eratosthenes I think you'll need to store data in a native array, which also goes against your desire for style in functional programming. There might be a creative functional implementation though!
Update: oops! Missed it! This approach works well too if you only divide by primes no greater than the square root of the number you are testing! I missed this, and unfortunately it's not easy to adjust my solution to do this because I'm storing the primes backwards.
Update: here's a very non-functional solution that at least only checks up to the square root.
import scala.collection.mutable.ListBuffer
def primesUnder(n: Int): List[Int] = {
require(n >= 2)
val primes = ListBuffer(2)
for (i <- 3 to n) {
if (prime(i, primes.iterator)) {
primes += i
}
}
primes.toList
}
// factors must be in sorted order
def prime(num: Int, factors: Iterator[Int]): Boolean =
factors.takeWhile(_ <= math.sqrt(num).toInt) forall(num % _ != 0)
Or I could use Vectors with my original approach. Vectors are probably not the best choice, though: appending to a Vector is effectively constant time, but it does not have the fastest constant factor.
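For comparison, the Vector variant of the tail-recursive approach might look like the sketch below. The name primesUnderVector is hypothetical. Since a Vector is appended to in ascending order, the square-root cutoff becomes easy and no final reverse is needed.

```scala
import scala.annotation.tailrec

// Sketch: tail-recursive prime search that appends to an immutable Vector.
def primesUnderVector(n: Int): Vector[Int] = {
  require(n >= 2)
  @tailrec
  def rec(i: Int, primes: Vector[Int]): Vector[Int] =
    if (i >= n) primes
    // only divide by the known primes p with p * p <= i
    else if (primes.iterator.takeWhile(p => p.toLong * p <= i).forall(i % _ != 0))
      rec(i + 1, primes :+ i)
    else rec(i + 1, primes)
  rec(2, Vector.empty)
}
```

This stays purely functional, so it is closer in spirit to the original question than the ListBuffer version.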
As schmmd mentions, you want it to be tail recursive, and you also want it to be lazy. Fortunately there is a perfect data-structure for this: Stream.
This is a very efficient prime calculator implemented as a Stream, with a few optimisations:
object Prime {
def is(i: Long): Boolean =
if (i == 2) true
else if ((i & 1) == 0) false // efficient div by 2
else prime(i)
def primes: Stream[Long] = 2 #:: prime3
private val prime3: Stream[Long] = {
@annotation.tailrec
def nextPrime(i: Long): Long =
if (prime(i)) i else nextPrime(i + 2) // tail
def next(i: Long): Stream[Long] =
i #:: next(nextPrime(i + 2))
3 #:: next(5)
}
// assumes not even, check evenness before calling - perf note: must pass partially applied >= method
def prime(i: Long): Boolean =
prime3 takeWhile (math.sqrt(i).>= _) forall { i % _ != 0 }
}
Prime.is is the prime check predicate, and Prime.primes returns a Stream of all prime numbers. prime3 is where the Stream is computed, using the prime predicate to check for all prime divisors less than the square root of i.
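import scala.collection.mutable
/**
* @return BitSet p such that p(x) is true iff x is prime
*/
def sieveOfEratosthenes(n: Int) = {
val isPrime = mutable.BitSet(2 to n: _*)
// sqrt returns a Double, so convert it before building the Int range
for (p <- 2 to Math.sqrt(n).toInt if isPrime(p)) {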
isPrime --= p*p to n by p
}
isPrime.toImmutable
}
A sieve method is your best bet for small lists of numbers (up to 10-100 million or so).
see: Sieve of Eratosthenes
Even if you want to find much larger numbers, you can use the list you generate with this method as divisors for testing numbers up to n^2, where n is the limit of your list.
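A minimal sketch of that idea, assuming the sieve result is kept as an ascending Vector. The names sievePrimes and isPrimeUsing are hypothetical helpers introduced only for illustration.

```scala
import scala.collection.mutable

// Sieve of Eratosthenes returning the primes <= n in ascending order.
def sievePrimes(n: Int): Vector[Int] = {
  val isPrime = mutable.BitSet(2 to n: _*)
  for (p <- 2 to math.sqrt(n).toInt if isPrime(p)) isPrime --= (p * p to n by p)
  isPrime.toVector
}

// Trial division of m by the sieved primes.
// Only valid for 2 <= m <= limit^2, where limit is the bound the sieve was built with.
def isPrimeUsing(primes: Vector[Int], limit: Int)(m: Long): Boolean = {
  require(m >= 2 && m <= limit.toLong * limit)
  primes.iterator.takeWhile(p => p.toLong * p <= m).forall(m % _ != 0)
}
```

With a sieve up to 100, for instance, isPrimeUsing(sievePrimes(100), 100) can test any number up to 10000, because every composite in that range has a prime factor no larger than 100.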
@mfa has mentioned using a Sieve of Eratosthenes (SoE) and @Luigi Plinge has mentioned that this should be done using functional code, so @netzwerg has posted a non-SoE version; here, I post an "almost" functional version of the SoE using completely immutable state except for the contents of a mutable BitSet (mutable rather than immutable for performance) that I posted as an answer to another question:
object SoE {
def makeSoE_Primes(top: Int): Iterator[Int] = {
val topndx = (top - 3) / 2
val nonprms = new scala.collection.mutable.BitSet(topndx + 1)
def cullp(i: Int) = {
import scala.annotation.tailrec; val p = i + i + 3
@tailrec def cull(c: Int): Unit = if (c <= topndx) { nonprms += c; cull(c + p) }
cull((p * p - 3) >>> 1)
}
(0 to (Math.sqrt(top).toInt - 3) >>> 1).filterNot { nonprms }.foreach { cullp }
Iterator.single(2) ++ (0 to topndx).filterNot { nonprms }.map { i: Int => i + i + 3 }
}
}
How about this.
def getPrimeUnder(n: Int) = {
require(n >= 2)
val ol = (3 to n by 2).toList // oddList
def pn(ol: List[Int], pl: List[Int]): List[Int] = ol match {
case Nil => pl
case _ if pl.exists(ol.head % _ == 0) => pn(ol.tail, pl)
case _ => pn(ol.tail, ol.head :: pl)
}
pn(ol, List(2)).reverse
}
It's pretty fast for me: on my Mac, getting all primes under 100k takes around 2.5 seconds.
A Scala FP approach
// returns the list of primes below `number`
def primes(number: Int): List[Int] =
(2 until number).filter(isPrime).toList
// checks if a number is prime
def isPrime(number: Int): Boolean = {
number match {
case x if x < 2 => false // 0, 1 and negative numbers are not prime
case x => (2 to math.sqrt(x).toInt).forall(y => x % y != 0)
}
}
def primeNumber(range: Int): Unit = {
// keep the numbers that have no divisor in 2..sqrt(number), upper bound inclusive
val primeNumbers: IndexedSeq[Int] =
for (number <- 2 to range if !(2 to Math.sqrt(number).toInt).exists(x => number % x == 0))
yield number
for (prime <- primeNumbers) println(prime)
}
object Primes {
private lazy val notDivisibleBy2: Stream[Long] = 3L #:: notDivisibleBy2.map(_ + 2)
private lazy val notDivisibleBy2Or3: Stream[Long] = notDivisibleBy2
.grouped(3)
.map(_.slice(1, 3))
.flatten
.toStream
private lazy val notDivisibleBy2Or3Or5: Stream[Long] = notDivisibleBy2Or3
.grouped(10)
.map { g =>
g.slice(1, 7) ++ g.slice(8, 10)
}
.flatten
.toStream
lazy val primes: Stream[Long] = 2L #::
notDivisibleBy2.head #::
notDivisibleBy2Or3.head #::
notDivisibleBy2Or3Or5.filter { i =>
i < 49 || primes.takeWhile(_ <= Math.sqrt(i).toLong).forall(i % _ != 0)
}
def apply(n: Long): Stream[Long] = primes.takeWhile(_ <= n)
def getPrimeUnder(n: Long): Long = Primes(n).last
}

Resources