First N digits of a long number in constant-time? - algorithm

In a Project Euler problem I need to deal with numbers that can have hundreds of digits. And I need to perform some calculation on the first 9 digits.
My question is: what is the fastest possible way to determine the first N digits of a 100-digit integer? Last N digits are easy with modulo/remainder. For the first digits I can apply modulo 100 times to get digit by digit, or I can convert the number to String and truncate, but they all are linear time. Is there a better way?

You can count number of digits with this function:
(defn dec-digit-count [n]
(inc (if (zero? n) 0
(long (Math/floor (Math/log10 n))))))
Now we know how many digits are there, and we want to leave only first 9. What we have to is divide the number with 10^(digits-9) or in Clojure:
(defn first-digits [number digits]
(unchecked-divide number (int (Math/pow 10 digits))))
And call it like: (first-digits your-number 9) and I think it's in constant time. I'm only not sure about log10 implementation. But, it's sure a lot faster that a modulo/loop solution.
Also, there's an even easier solution. You can simply copy&paste first 9 digits from the number.

Maybe you can use not a long number, but tupple of two numbers: [first-digits, last-digits]. Perform operations on both of them, each time truncating to the required length (twice of the condition, 9 your case) the first at the right and the second at the left.
Like
222000333 * 666000555
147|852344988184|815
222111333 * 666111555
147|950925407752|815
so you can do only two small calculations: 222 * 666 = 147[852] and 333 * 555 = [184]815
But the comment about "a ha" solution is the most relevant for Project Euler :)

In Java:
public class Main {
public static void main(String[] args) throws IOException {
long N = 7812938291232L;
System.out.println(N / (int) (Math.pow(10, Math.floor(Math.log10(N)) - 8)));
N = 1234567890;
System.out.println(N / (int) (Math.pow(10, Math.floor(Math.log10(N)) - 8)));
N = 1000000000;
System.out.println(N / (int) (Math.pow(10, Math.floor(Math.log10(N)) - 8)));
}
}
yields
781293829
123456789
100000000

It may helps you first n digits of an exponentiation
and the answer of from this question
This algorithm has a compexity of O(b). But it is easy to change it to get O(log b)

Related

First number that divides all numbers (1,2,...,100)

Given that the first number to divide all (1,2,..,10) is 2520.
And given that the first number to divide all (1,2,..,20) is 232792560.
Find the first number to divide all (1,2,..,100). (all consecutive numbers from 1 to 100).
The answer should run in less than a minute.
I'm writing the solution in Java, and I'm facing two problems:
How can I compute this is the solution itself is a number so huge that cannot be handled?
I tried using "BigInteger" by I'm doing many additions and divisions and I don't know if this is increasing my time complexity.
How can I calculate this in a less than a minute? The solution I though about so far haven't even stopped.
This is my Java code (using big integers):
public static boolean solved(int start, int end, BigInteger answer) {
for (int i=start; i<=end; i++) {
if (answer.mod(new BigInteger(valueOf(i))).compareTo(new BigInteger(valueOf(0)))==0) {
return false;
}
}
return true;
}
public static void main(String[] args) {
BigInteger answer = new BigInteger("232792560");
BigInteger y = new BigInteger("232792560");
while(!solved(21,100, answer)) {
answer = answer.add(y);
}
System.out.println(answer);
}
I take advantage of the fact that I know already the solution for (1,..,20).
Currently is simply not stopping.
I though I could improve it by changing the function solved to check for only values we care about.
For example:
100 = 25,50,100
99 = 33,99
98 = 49,98
97 = 97
96 = 24,32,48,96
And so on. But, this simple calculation of identifying the smallest group of number needed has become a problem itself to which I didn't look for / found a solution. Of course, the time complexity should stay under a minute either case.
Any other ideas?
The first number that can be divided by all elements of some set (which is what you have there, despite the slightly different formulation) is also known as the Least Common Multiple of that set. LCM(a, b, c) = LCM(LCM(a, b), c) and so on, so in general, it can be computed by taking n - 1 pairwise LCMs where n is the number of items in the set. BigInteger does not have an lcm function, but the LCM can be computed via a * b / gcd(a, b) so in Java with BigInteger:
static BigInteger lcm(BigInteger a, BigInteger b) {
return a.multiply(b).divide(a.gcd(b));
}
For 1 to 20, computing the LCM in that way indeed results in 232792560. It's easy to do it up to 100 too.
Find all max prime powers in your range and take their product.
E.g. 1-10: 2^3, 3^2, 5^1, 7^1: product is 2520, which is the right answer (not 5250). You could find the primes via the sieve of Eratosthenes or just download them from a list of primes.
As 100 is small, you can work this out by producing the prime factorization of all numbers from 2 to 100 and keep the largest exponent of each prime among all factorizations. In fact, trying divisions by 2, 3, 5 and 7 will be enough to check primality up to 100, and there are just 25 primes to consider. You can implement a simple sieve to find the primes and perform the factorizations.
After you found all exponents of the prime decomposition of the lcm, you can either leave this as the answer, or perform the multiplications explicitly.

How to turn integers into Fibonacci coding efficiently?

Fibonacci sequence is obtained by starting with 0 and 1 and then adding the two last numbers to get the next one.
All positive integers can be represented as a sum of a set of Fibonacci numbers without repetition. For example: 13 can be the sum of the sets {13}, {5,8} or {2,3,8}. But, as we have seen, some numbers have more than one set whose sum is the number. If we add the constraint that the sets cannot have two consecutive Fibonacci numbers, than we have a unique representation for each number.
We will use a binary sequence (just zeros and ones) to do that. For example, 17 = 1 + 3 + 13. Then, 17 = 100101. See figure 2 for a detailed explanation.
I want to turn some integers into this representation, but the integers may be very big. How to I do this efficiently.
The problem itself is simple. You always pick the largest fibonacci number less than the remainder. You can ignore the the constraint with the consecutive numbers (since if you need both, the next one is the sum of both so you should have picked that one instead of the initial two).
So the problem remains how to quickly find the largest fibonacci number less than some number X.
There's a known trick that starting with the matrix (call it M)
1 1
1 0
You can compute fibbonacci number by matrix multiplications(the xth number is M^x). More details here: https://www.nayuki.io/page/fast-fibonacci-algorithms . The end result is that you can compute the number you're look in O(logN) matrix multiplications.
You'll need large number computations (multiplications and additions) if they don't fit into existing types.
Also store the matrices corresponding to powers of two you compute the first time, since you'll need them again for the results.
Overall this should be O((logN)^2 * large_number_multiplications/additions)).
First I want to tell you that I really liked this question, I didn't know that All positive integers can be represented as a sum of a set of Fibonacci numbers without repetition, I saw the prove by induction and it was awesome.
To respond to your question I think that we have to figure how the presentation is created. I think that the easy way to find this is that from the number we found the closest minor fibonacci item.
For example if we want to present 40:
We have Fib(9)=34 and Fib(10)=55 so the first element in the presentation is Fib(9)
since 40 - Fib(9) = 6 and (Fib(5) =5 and Fib(6) =8) the next element is Fib(5). So we have 40 = Fib(9) + Fib(5)+ Fib(2)
Allow me to write this in C#
class Program
{
static void Main(string[] args)
{
List<int> fibPresentation = new List<int>();
int numberToPresent = Convert.ToInt32(Console.ReadLine());
while (numberToPresent > 0)
{
int k =1;
while (CalculateFib(k) <= numberToPresent)
{
k++;
}
numberToPresent = numberToPresent - CalculateFib(k-1);
fibPresentation.Add(k-1);
}
}
static int CalculateFib(int n)
{
if (n == 1)
return 1;
int a = 0;
int b = 1;
// In N steps compute Fibonacci sequence iteratively.
for (int i = 0; i < n; i++)
{
int temp = a;
a = b;
b = temp + b;
}
return a;
}
}
Your result will be in fibPresentation
This encoding is more accurately called the "Zeckendorf representation": see https://en.wikipedia.org/wiki/Fibonacci_coding
A greedy approach works (see https://en.wikipedia.org/wiki/Zeckendorf%27s_theorem) and here's some Python code that converts a number to this representation. It uses the first 100 Fibonacci numbers and works correctly for all inputs up to 927372692193078999175 (and incorrectly for any larger inputs).
fibs = [0, 1]
for _ in xrange(100):
fibs.append(fibs[-2] + fibs[-1])
def zeck(n):
i = len(fibs) - 1
r = 0
while n:
if fibs[i] <= n:
r |= 1 << (i - 2)
n -= fibs[i]
i -= 1
return r
print bin(zeck(17))
The output is:
0b100101
As the greedy approach seems to work, it suffices to be able to invert the relation N=Fn.
By the Binet formula, Fn=[φ^n/√5], where the brackets denote the nearest integer. Then with n=floor(lnφ(√5N)) you are very close to the solution.
17 => n = floor(7.5599...) => F7 = 13
4 => n = floor(4.5531) => F4 = 3
1 => n = floor(1.6722) => F1 = 1
(I do not exclude that some n values can be off by one.)
I'm not sure if this is an efficient enough for you, but you could simply use Backtracking to find a(the) valid representation.
I would try to start the backtracking steps by taking the biggest possible fib number and only switch to smaller ones if the consecutive or the only once constraint is violated.

Need to devise a number crunching algorithm

I stumbled upon this question:
7 power 7 is 823543. Which higher power of 7 ends with 823543 ?
How should I go about it ? The one I came up with is very slow, it keeps on multiplying by 7 and checks last 6 digits of the result for a match.
I tried with Lou's code:
int x=1;
for (int i=3;i<=100000000;i=i+4){
x=(x*7)%1000000;
System.out.println("i="+ i+" x= "+x);
if (x==823543){
System.out.println("Ans "+i);}
}
And CPU sounds like a pressure cooker but couldn't get the answer :(
Multiply modulo 10^6. See this Lua code.
local x=1
for i=1,100000 do
x=(x*7) % 1e6
if x==823543 then print(i) end
end
You could use Euler's generalization of Fermat's little theorem which applied to your case says that for any number a that is not divisible by two or five, a to the power 400000 is equal to 1 modulo 10^6. Which means that 7^400000 is equal to one and 7^400007 is equal to 823543 modulo 10^6
There may be smaller powers of 7 that are also equal to one modulo 10^6. Any such power should be a divisor of 400000. So if you search all divisors of 400000 you should find your answer.
Brute-force solution in Python:
def check():
i = 8
while True:
if str(7**i)[-6:] == "823543":
print i, 7**i
break
i += 1
if __name__ == "__main__":
check()
Runs in a tad more then 10 seconds on my machine:
$ time python 7\*\*7.py
5007 25461638709540284156782446957365168367138070393489656084508152816071765490828583739345420574947301301356529652113030016806506783009529977928336772622054260724106711204039012806363481521302203821096274017061906820115931889920385802499836705571461280700786627503189500663279772123190279763997339608040949194040289041117811256914511855302927500076094761237077649092849658261309060277197389760351907599243227298336309204635761799394324969277220810221310805265921431367291459357151617279190810954501590069774137519833706444943573459910208627100504003480684029216932299650285683013274883359754231186580602570771682084721896446416234857382909168309309630688331305154545352580787700878011742720440707156231891841057992434850068501355342227582074144717324718396296563918284728120322255330707786227631084119636101174217518654320128390401231343058708073280898554293777842571799775647325449392944570725467462072394864457569308219294304248413378339223195121800534783732295135735588409249690562213409520783181313960347723827308102920022860541043691808218543350580271593107019737918976365348051012746817678592727727988993175444584453532474156202438866838819565827414970029052602274354173178190323239427022953097424087683011937130778414189673555875258508014323428137406618951161046883845551087123412471364400737145434714864392224194773030522352601143771552489895838728148761974811275894561985163094852437703080985644653666048979901975905667811053289029958524703063742007291722490298429637413913574845245364780928447142275001431370017543206188428912106120676556219532197108435997375879569102044435752697298456147512203108094030745606163915437604076966518127099543894645297945324345093247636119593298654296614887389164509070158924404441687810434488061150620012547321097786493748417764592151734279632949607485719050349385098350202294648324398902047614892248381794929374952059877187100434970751833289677556040879755065563758085919673107576808662549999202791489324437408075089456174056904323973798979207791446889016369166632636035638123394649891606479407561222474471530411700646266636732205895085248823824764170316644547100628119484733814900100986786082211477261114056206393554335903410036064553032366200714266053598548735147707681592574886559888869327771461046450774938490837810526377213647071217152427693219479552580138352651791476758476864761332281826701978038126122728967682552206820425685782165630494478519812498630475776384700259524274670258777572341538755828794632819515842335609785884327007667337426644594091547392441314523035569100326662245947022517857248412004291423280879791576077952474202068318934524092750814844945529148131063116233331840380254781283689084385600858175504170157015630699919186013526052643206240745256569669847298952477441594748635701081031979500954081732722211598460098426985932512920424237248250698541558227081975966598720056015879151923686438360541128221854058867910136449528237543680180470919685862102358708465872395643586424250239281775923511452769821487580471289910257908740451431952197725174728917413539539795856895884961513784804247268727165303942024508367184898248006123651950710237279288601317817391869969699767431782664773248447758526620050588927086506013616563459173620496200348863132442180734592661348887012997849309740799709045762939781801481205704629203758859772476278892928066844445088880207986848424855774325574728566649552154520262460969975214802828263093097997124519153537792591659204109087699977445745067857471581656151077039286563447099850537157044829081400190710358959493358343935904669416958301921942118288210835104022359479660409954097409669785908666166908117346073702337825511531650740900904200220658196171839969860945908503151878488455004283026700303698398069644419655035582904253655945381261383285097911378914794161551292914993411444083214513058414480129560671193659591364146612550890288116403596333209446976453193340267725222134755872075133141618388704912211996423838163706006930973361661094103734887312836613195349528793780496172839376426055357343094188450140671138356505144988151110902047791487250988374130384058324229250761311655685931891857894126054047458969174494155762486464149775147410127618088224310828566286409277000561087588768230619606746804073498788244935099280434916850444895829823543
real 0m10.779s
user 0m10.709s
sys 0m0.024s
Not so much an answer, more a hint:
Observe that the pattern of rightmost digits of powers of 7 goes 1,7,9,3,1,7,9,3,1,7,... so you only need to generate every 4th power of 7 from the 3rd. Further study might show a pattern for the two (three, four, ...) rightmost digits, but I haven't done studied them for you.
Be prepared for some very large numbers, Mathematica reports that the next power of 7 with the sought-for rightmost digits is the 5007th.
Which I guess answers your question -- a faster approach is to post on SO and wait for someone to tell you the answer ! You could even try Wolfram Alpha if you don't like the SO algorithm.
The Fermat's little theorem approach is a mathematically sensible one, and just mulitplying over and over by 7 mod 10^6 is the simplest code, but there's another approach you could take that is computationally efficient (but requires more complex code). First, note that when multiplying by 7 the last digit depends only on the last digit before (i.e. we're doing everything mod 10). We multiply repeatedly by 7 to get
7 (4)9 (6)3 (2)1 (0)7 ...
Okay, great, so if we want a 3, we start at 7^3 and go up every 7^4 from there. Now, we note that when multiplying by 7^4, the last two digits depend only on the last two digits of 7^4 and the last two digits of the previous answer. 7^4 is 2401. So in fact the last two digits will always be the same when going up by 7^4.
What about the last three? Well, 7^3 = 343 and 7^4 ends with 401, so mod 1000 we get
343 543 743 943 143 343
We've got our first three digits in column #2 (543), and we see that the the sequence repeats ever 5, so we should go up from there by 7^20.
We can play this trick over and over again: find how often the next block of digits repeats, find the right subsequence within that block, and then multiply up not by 7 but by 7^n.
What we're really doing is finding a (multiplicative) ring over the m'th digit, and then multiplying the sizes of all the rings together to get the span between successive powers that have the same N digits if we follow this method. Here's some Scala code (2.8.0 Beta1) that does just this:
def powRing(bigmod: BigInt, checkmod: BigInt, mul: BigInt) = {
val powers = Stream.iterate(1:BigInt)(i => (i*mul)%bigmod)
powers.take( 2+powers.tail.indexWhere(_ % checkmod == 1) ).toList
}
def ringSeq(digits: Int, mod: BigInt, mul: BigInt): List[(BigInt,List[BigInt])] = {
if (digits<=1) List( (10:BigInt , powRing(mod,10,mul)) )
else {
val prevSeq = ringSeq(digits-1, mod, mul)
val prevRing = prevSeq.head
val nextRing = powRing(mod,prevRing._1*10,prevRing._2.last)
(prevRing._1*10 , nextRing) :: prevSeq
}
}
def interval(digits: Int, mul: Int) = {
val ring = ringSeq(digits, List.fill(digits)(10:BigInt).reduceLeft(_*_), mul)
(1L /: ring)((p,r) => p * (r._2.length-1))
}
So, if we've found one case of the digits that we want, we can now find all of them by finding the size of the appropriate ring. In our case, with 6 digits (i.e. mod 10^6) and base 7, we find a repeat size of:
scala> interval(6,7)
res0: Long = 5000
So, we've got our answer! 7^7 is the first, 7^5007 is the second, 7^10007 is the third, etc..
Since this is generic, we can try other answers...11^11 = 285311670611 (an 8 digit number). Let's look at the interval:
scala> interval(12,11)
res1: Long = 50000000000
So, this tells us that 11^50000000007 is the next number after 11^11 with the same initial set of 12 digits. Check by hand if you're curious!
Let's also check with 3^3--what's the next power of 3 whose decimal expansion ends with 27?
scala> interval(2,3)
res2: Long = 20
Should be 3^23. Checking:
scala> List.fill(23)(3L).reduceLeft((l,r) => {println(l*r) ; l*r})
9
27
81
243
729
2187
6561
19683
59049
177147
531441
1594323
4782969
14348907
43046721
129140163
387420489
1162261467
3486784401
10460353203
31381059609
94143178827
Yup!
(Switched code in edits to use BigInt so it could handle arbitrary numbers of digits. The code doesn't detect degenerate cases, though, so make sure you use a prime for the power....)
Another hint: You are only interested in the last N digits: you can perform calculations modulo 10^N and keep the result fit nicely into an integer

What's the best way to strip leading / trailing digits from a number?

If I have a number like 12345, and I want an output of 2345, is there a mathematical algorithm that does this? The hack in me wants to convert the number to a string, and substring it. I know this will work, but I'm sure there has to be a better way, and Google is failing me.
Likewise, for 12345, if I want 1234, is there another algorithm that will do that? The best I can come up with is Floor(x / 10^(n)), where x is the input and n is the number of digits to strip, but I feel like there has to be a better way, and I just can't see it.
In the first case, don't you just want
n % 10000
i.e. the modulus wrt. 10000 ?
For your second case, if you're using integer arithmetic, just divide by 10. You might want to do this in a more 'explicit' fashion by modding with 10 to get the last digit, subtract and then divide (think of a shift in base 10).
Yes, the modulus operator (%) which is present in most languages, can return the n last digits:
12345 % 10^4 = 12345 % 10000 = 2345
Integral division (/ in C/C++/Java) can return the first n digits:
12345 / 10^4 = 12345 / 10000 = 1
Python 3.0:
>>> import math
>>> def remove_most_significant_digit(n, base=10):
... return n % (base ** int(math.log(n, base)))
...
>>> def remove_least_significant_digit(n, base=10):
... return int(n // base)
...
>>> remove_most_significant_digit(12345)
2345
>>> remove_least_significant_digit(12345)
1234
Converting to a string, and then using a substring method will ultimately be the fastest and best way, since you can just strip characters instead of doing math.
If you really don't want to do that though, you should use modulus (%), which gives the remainder of a division. 11 % 3 = 2, because 3 can only go into 11 three times (9). The remainder is then 2. 41 % 10 = 1, because 10 can go into 41 four times (40). The remainder is then 1.
For stripping the first digits, all you would have to do is mod the tens value that you want to get rid of. For stripping two digits from 12345, you should modulus by 1000. 1000 goes into 12345 twelve times, then the remainder will be 345, which is your answer. You would just need to find a way to find the tens value of the last digit you were trying to strip. Use x % (10^(n)), where x is input, and n is the lowest digit you want to strip.
For stripping the last digits, your way works just fine. What's easier than a quick formula like that?
I don't think there's any other approach than division for removing trailing numbers. It might be more efficient to do repeated integral division than to cast to a float, perform an exponent, then floor and cast back to an integer, but the basic idea remains the same.
Keep in mind that the operation is nearly identical for any base. To remove one trailing decimal digit, you do / 10. If you had 0b0111 and you wanted to remove one digit, it would have to be /2. Or you could have 0xff / 16 to get 0x0f.
You have to realize that numbers don't have digits, only strings do, and how many (and which) digits they have depends entirely on the base (which numbers don't have either). Internally, computers use what amounts to binary strings. So in general, manipulating base 10 digits requires you to convert the number to a string first - or do calculations that are the same you would do when converting it to a string. However, for your specific task of removing leading and trailing digits, these calculations (modulo and integer division) are very simple and much faster than converting the entire number.
i think that converting to string and then remove the first char wont do the trick.
i believe that the alg for converting to string is doing the div-mod routine, for optimization you might as well do the div-mod alg by yourself and manipulate it to your needs
Here is C++ code ... It's not tested.
int myPow ( int n , int k ){
int ret = 1;
for (int i=0;i<k;++i) ret*=n;
return ret;
}
int countDigits (int n ){
int count = 0;
while ( n )++count, n/=10;
return count;
}
int getLastDigits (int number , int numDigits ){
int tmp = myPow ( 10 , numDigits );
return number % tmp;
}
int getFirstDigits (int number, numDigits ){
int tmp = myPow ( 10, countDigits ( number) - numDigits );
return nuber / tmp;
}

Expressing an integer as a series of multipliers

Scroll down to see latest edit, I left all this text here just so that I don't invalidate the replies this question has received so far!
I have the following brain teaser I'd like to get a solution for, I have tried to solve this but since I'm not mathematically that much above average (that is, I think I'm very close to average) I can't seem wrap my head around this.
The problem: Given number x should be split to a serie of multipliers, where each multiplier <= y, y being a constant like 10 or 16 or whatever. In the serie (technically an array of integers) the last number should be added instead of multiplied to be able to convert the multipliers back to original number.
As an example, lets assume x=29 and y=10. In this case the expected array would be {10,2,9} meaning 10*2+9. However if y=5, it'd be {5,5,4} meaning 5*5+4 or if y=3, it'd be {3,3,3,2} which would then be 3*3*3+2.
I tried to solve this by doing something like this:
while x >= y, store y to multipliers, then x = x - y
when x < y, store x to multipliers
Obviously this didn't work, I also tried to store the "leftover" part separately and add that after everything else but that didn't work either. I believe my main problem is that I try to think this in a way too complex manner while the solution is blatantly obvious and simple.
To reiterate, these are the limits this algorithm should have:
has to work with 64bit longs
has to return an array of 32bit integers (...well, shorts are OK too)
while support for signed numbers (both + and -) would be nice, if it helps the task only unsigned numbers is a must
And while I'm doing this using Java, I'd rather take any possible code examples as pseudocode, I specifically do NOT want readily made answers, I just need a nudge (well, more of a strong kick) so that I can solve this at least partly myself. Thanks in advance.
Edit: Further clarification
To avoid some confusion, I think I should reword this a bit:
Every integer in the result array should be less or equal to y, including the last number.
Yes, the last number is just a magic number.
No, this is isn't modulus since then the second number would be larger than y in most cases.
Yes, there is multiple answers to most of the numbers available, however I'm looking for the one with least amount of math ops. As far as my logic goes, that means finding the maximum amount of as big multipliers as possible, for example x=1 000 000,y=100 is 100*100*100 even though 10*10*10*10*10*10 is equally correct answer math-wise.
I need to go through the given answers so far with some thought but if you have anything to add, please do! I do appreciate the interest you've already shown on this, thank you all for that.
Edit 2: More explanations + bounty
Okay, seems like what I was aiming for in here just can't be done the way I thought it could be. I was too ambiguous with my goal and after giving it a bit of a thought I decided to just tell you in its entirety what I'd want to do and see what you can come up with.
My goal originally was to come up with a specific method to pack 1..n large integers (aka longs) together so that their String representation is notably shorter than writing the actual number. Think multiples of ten, 10^6 and 1 000 000 are the same, however the representation's length in characters isn't.
For this I wanted to somehow combine the numbers since it is expected that the numbers are somewhat close to each other. I firsth thought that representing 100, 121, 282 as 100+21+161 could be the way to go but the saving in string length is neglible at best and really doesn't work that well if the numbers aren't very close to each other. Basically I wanted more than ~10%.
So I came up with the idea that what if I'd group the numbers by common property such as a multiplier and divide the rest of the number to individual components which I can then represent as a string. This is where this problem steps in, I thought that for example 1 000 000 and 100 000 can be expressed as 10^(5|6) but due to the context of my aimed usage this was a bit too flaky:
The context is Web. RESTful URL:s to be specific. That's why I mentioned of thinking of using 64 characters (web-safe alphanumberic non-reserved characters and then some) since then I could create seemingly random URLs which could be unpacked to a list of integers expressing a set of id numbers. At this point I thought of creating a base 64-like number system for expressing base 10/2 numbers but since I'm not a math genius I have no idea beyond this point how to do it.
The bounty
Now that I have written the whole story (sorry that it's a long one), I'm opening a bounty to this question. Everything regarding requirements for the preferred algorithm specified earlier is still valid. I also want to say that I'm already grateful for all the answers I've received so far, I enjoy being proven wrong if it's done in such a manner as you people have done.
The conclusion
Well, bounty is now given. I spread a few comments to responses mostly for future reference and myself, you can also check out my SO Uservoice suggestion about spreading bounty which is related to this question if you think we should be able to spread it among multiple answers.
Thank you all for taking time and answering!
Update
I couldn't resist trying to come up with my own solution for the first question even though it doesn't do compression. Here is a Python solution using a third party factorization algorithm called pyecm.
This solution is probably several magnitudes more efficient than Yevgeny's one. Computations take seconds instead of hours or maybe even weeks/years for reasonable values of y. For x = 2^32-1 and y = 256, it took 1.68 seconds on my core duo 1.2 ghz.
>>> import time
>>> def test():
... before = time.time()
... print factor(2**32-1, 256)
... print time.time()-before
...
>>> test()
[254, 232, 215, 113, 3, 15]
1.68499994278
>>> 254*232*215*113*3+15
4294967295L
And here is the code:
def factor(x, y):
# y should be smaller than x. If x=y then {y, 1, 0} is the best solution
assert(x > y)
best_output = []
# try all possible remainders from 0 to y
for remainder in xrange(y+1):
output = []
composite = x - remainder
factors = getFactors(composite)
# check if any factor is larger than y
bad_remainder = False
for n in factors.iterkeys():
if n > y:
bad_remainder = True
break
if bad_remainder: continue
# make the best factors
while True:
results = largestFactors(factors, y)
if results == None: break
output += [results[0]]
factors = results[1]
# store the best output
output = output + [remainder]
if len(best_output) == 0 or len(output) < len(best_output):
best_output = output
return best_output
# Heuristic
# The bigger the number the better. 8 is more compact than 2,2,2 etc...
# Find the most factors you can have below or equal to y
# output the number and unused factors that can be reinserted in this function
def largestFactors(factors, y):
assert(y > 1)
# iterate from y to 2 and see if the factors are present.
for i in xrange(y, 1, -1):
try_another_number = False
factors_below_y = getFactors(i)
for number, copies in factors_below_y.iteritems():
if number in factors:
if factors[number] < copies:
try_another_number = True
continue # not enough factors
else:
try_another_number = True
continue # a factor is not present
# Do we want to try another number, or was a solution found?
if try_another_number == True:
continue
else:
output = 1
for number, copies in factors_below_y.items():
remaining = factors[number] - copies
if remaining > 0:
factors[number] = remaining
else:
del factors[number]
output *= number ** copies
return (output, factors)
return None # failed
# Find prime factors. You can use any formula you want for this.
# I am using elliptic curve factorization from http://sourceforge.net/projects/pyecm
import pyecm, collections, copy
getFactors_cache = {}
def getFactors(n):
assert(n != 0)
# attempt to retrieve from cache. Returns a copy
try:
return copy.copy(getFactors_cache[n])
except KeyError:
pass
output = collections.defaultdict(int)
for factor in pyecm.factors(n, False, True, 10, 1):
output[factor] += 1
# cache result
getFactors_cache[n] = output
return copy.copy(output)
Answer to first question
You say you want compression of numbers, but from your examples, those sequences are longer than the undecomposed numbers. It is not possible to compress these numbers without more details to the system you left out (probability of sequences/is there a programmable client?). Could you elaborate more?
Here is a mathematical explanation as to why current answers to the first part of your problem will never solve your second problem. It has nothing to do with the knapsack problem.
This is Shannon's entropy algorithm. It tells you the theoretical minimum amount of bits you need to represent a sequence {X0, X1, X2, ..., Xn-1, Xn} where p(Xi) is the probability of seeing token Xi.
Let's say that X0 to Xn is the span of 0 to 4294967295 (the range of an integer). From what you have described, each number is as likely as another to appear. Therefore the probability of each element is 1/4294967296.
When we plug it into Shannon's algorithm, it will tell us what the minimum number of bits are required to represent the stream.
import math
def entropy():
num = 2**32
probability = 1./num
return -(num) * probability * math.log(probability, 2)
# the (num) * probability cancels out
The entropy unsurprisingly is 32. We require 32 bits to represent an integer where each number is equally likely. The only way to reduce this number, is to increase the probability of some numbers, and decrease the probability of others. You should explain the stream in more detail.
Answer to second question
The right way to do this is to use base64, when communicating with HTTP. Apparently Java does not have this in the standard library, but I found a link to a free implementation:
http://iharder.sourceforge.net/current/java/base64/
Here is the "pseudo-code" which works perfectly in Python and should not be difficult to convert to Java (my Java is rusty):
def longTo64(num):
mapping = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_"
output = ""
# special case for 0
if num == 0:
return mapping[0]
while num != 0:
output = mapping[num % 64] + output
num /= 64
return output
If you have control over your web server and web client, and can parse the entire HTTP requests without problem, you can upgrade to base85. According to wikipedia, url encoding allows for up to 85 characters. Otherwise, you may need to remove a few characters from the mapping.
Here is another code example in Python
def longTo85(num):
mapping = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_.~!*'();:#&=+$,/?%#[]"
output = ""
base = len(mapping)
# special case for 0
if num == 0:
return mapping[0]
while num != 0:
output = mapping[num % base] + output
num /= base
return output
And here is the inverse operation:
def stringToLong(string):
mapping = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_.~!*'();:#&=+$,/?%#[]"
output = 0
base = len(mapping)
place = 0
# check each digit from the lowest place
for digit in reversed(string):
# find the number the mapping of symbol to number, then multiply by base^place
output += mapping.find(digit) * (base ** place)
place += 1
return output
Here is a graph of Shannon's algorithm in different bases.
As you can see, the higher the radix, the less symbols are needed to represent a number. At base64, ~11 symbols are required to represent a long. At base85, it becomes ~10 symbols.
Edit after final explanation:
I would think base64 is the best solution, since there are standard functions that deal with it, and variants of this idea don't give much improvement. This was answered with much more detail by others here.
Regarding the original question, although the code works, it is not guaranteed to run in any reasonable time, as was answered as well as commented on this question by LFSR Consulting.
Original Answer:
You mean something like this?
Edit - corrected after a comment.
shortest_output = {}
foreach (int R = 0; R <= X; R++) {
// iteration over possible remainders
// check if the rest of X can be decomposed into multipliers
newX = X - R;
output = {};
while (newX > Y) {
int i;
for (i = Y; i > 1; i--) {
if ( newX % i == 0) { // found a divider
output.append(i);
newX = newX /i;
break;
}
}
if (i == 1) { // no dividers <= Y
break;
}
}
if (newX != 1) {
// couldn't find dividers with no remainder
output.clear();
}
else {
output.append(R);
if (output.length() < shortest_output.length()) {
shortest_output = output;
}
}
}
It sounds as though you want to compress random data -- this is impossible for information theoretic reasons. (See http://www.faqs.org/faqs/compression-faq/part1/preamble.html question 9.) Use Base64 on the concatenated binary representations of your numbers and be done with it.
The problem you're attempting to solve (you're dealing with a subset of the problem, given you're restriction of y) is called Integer Factorization and it cannot be done efficiently given any known algorithm:
In number theory, integer factorization is the breaking down of a composite number into smaller non-trivial divisors, which when multiplied together equal the original integer.
This problem is what makes a number of cryptographic functions possible (namely RSA which uses 128 bit keys - long is half of that.) The wiki page contains some good resources that should move you in the right direction with your problem.
So, your brain teaser is indeed a brain teaser... and if you solve it efficiently we can elevate your math skills to above average!
Updated after the full story
Base64 is most likely your best option. If you want a custom solution you can try implementing a Base 65+ system. Just remember that just because 10000 can be written as "10^4" doesn't mean that everything can be written as 10^n where n is an integer. Different base systems are the simplest way to write numbers and the higher the base the less digits the number requires. Plus most framework libraries contain algorithms for Base64 encoding. (What language you are using?).
One way to further pack the urls is the one you mentioned but in Base64.
int[] IDs;
IDs.sort() // So IDs[i] is always smaller or equal to IDs[i-1].
string url = Base64Encode(IDs[0]);
for (int i = 1; i < IDs.length; i++) {
url += "," + Base64Encode(IDs[i-1] - IDs[i]);
}
Note that you require some separator as the initial ID can be arbitrarily large and the difference between two IDs CAN be more than 63 in which case one Base64 digit is not enough.
Updated
Just restating that the problem is unsolvable. For Y = 64 you can't write 87681 in multipliers + remainder where each of these is below 64. In other words, you cannot write any of the numbers 87617..87681 with multipliers that are below 64. Each of these numbers has an elementary term over 64. 87616 can be written in elementary terms below 64 but then you'd need those + 65 and so the remainder will be over 64.
So if this was just a brainteaser, it's unsolvable. Was there some practical purpose for this which could be achieved in some way other than using multiplication and a remainder?
And yes, this really should be a comment but I lost my ability to comment at some point. :p
I believe the solution which comes closest is Yevgeny's. It is also easy to extend Yevgeny's solution to remove the limit for the remainder in which case it would be able to find solution where multipliers are smaller than Y and remainder as small as possible, even if greater than Y.
Old answer:
If you limit that every number in the array must be below the y then there is no solution for this. Given large enough x and small enough y, you'll end up in an impossible situation. As an example with y of 2, x of 12 you'll get 2 * 2 * 2 + 4 as 2 * 2 * 2 * 2 would be 16. Even if you allow negative numbers with abs(n) below y that wouldn't work as you'd need 2 * 2 * 2 * 2 - 4 in the above example.
And I think the problem is NP-Complete even if you limit the problem to inputs which are known to have an answer where the last term is less than y. It sounds quite much like the [Knapsack problem][1]. Of course I could be wrong there.
Edit:
Without more accurate problem description it is hard to solve the problem, but one variant could work in the following way:
set current = x
Break current to its terms
If one of the terms is greater than y the current number cannot be described in terms greater than y. Reduce one from current and repeat from 2.
Current number can be expressed in terms less than y.
Calculate remainder
Combine as many of the terms as possible.
(Yevgeny Doctor has more conscise (and working) implementation of this so to prevent confusion I've skipped the implementation.)
OP Wrote:
My goal originally was to come up with
a specific method to pack 1..n large
integers (aka longs) together so that
their String representation is notably
shorter than writing the actual
number. Think multiples of ten, 10^6
and 1 000 000 are the same, however
the representation's length in
characters isn't.
I have been down that path before, and as fun as it was to learn all the math, to save you time I will just point you to: http://en.wikipedia.org/wiki/Kolmogorov_complexity
In a nutshell some strings can be easily compressed by changing your notation:
10^9 (4 characters) = 1000000000 (10 characters)
Others cannot:
7829203478 = some random number...
This is a great great simplification of the article I linked to above, so I recommend that you read it instead of taking my explanation at face value.
Edit:
If you are trying to make RESTful urls for some set of unique data, why wouldn't you use a hash, such as MD5? Then include the hash as part of the URL, then look up the data based on the hash. Or am I missing something obvious?
The original method you chose (a * b + c * d + e) would be very difficult to find optimal solutions for simply due to the large search space of possibilities. You could factorize the number but it's that "+ e" that complicates things since you need to factorize not just that number but quite a few immediately below it.
Two methods for compression spring immediately to mind, both of which give you a much-better-than-10% saving on space from the numeric representation.
A 64-bit number ranges from (unsigned):
0 to
18,446,744,073,709,551,616
or (signed):
-9,223,372,036,854,775,808 to
9,223,372,036,854,775,807
In both cases, you need to reduce the 20-characters taken (without commas) to something a little smaller.
The first is to simply BCD-ify the number the base64 encode it (actually a slightly modified base64 since "/" would not be kosher in a URL - you should use one of the acceptable characters such as "_").
Converting it to BCD will store two digits (or a sign and a digit) into one byte, giving you an immediate 50% reduction in space (10 bytes). Encoding it base 64 (which turns every 3 bytes into 4 base64 characters) will turn the first 9 bytes into 12 characters and that tenth byte into 2 characters, for a total of 14 characters - that's a 30% saving.
The only better method is to just base64 encode the binary representation. This is better because BCD has a small amount of wastage (each digit only needs about 3.32 bits to store [log210], but BCD uses 4).
Working on the binary representation, we only need to base64 encode the 64-bit number (8 bytes). That needs 8 characters for the first 6 bytes and 3 characters for the final 2 bytes. That's 11 characters of base64 for a saving of 45%.
If you wanted maximum compression, there are 73 characters available for URL encoding:
ABCDEFGHIJKLMNOPQRSTUVWXYZ
abcdefghijklmnopqrstuvwxyz
0123456789$-_.+!*'(),
so technically you could probably encode base-73 which, from rough calculations, would still take up 11 characters, but with more complex code which isn't worth it in my opinion.
Of course, that's the maximum compression due to the maximum values. At the other end of the scale (1-digit) this encoding actually results in more data (expansion rather than compression). You can see the improvements only start for numbers over 999, where 4 digits can be turned into 3 base64 characters:
Range (bytes) Chars Base64 chars Compression ratio
------------- ----- ------------ -----------------
< 10 (1) 1 2 -100%
< 100 (1) 2 2 0%
< 1000 (2) 3 3 0%
< 10^4 (2) 4 3 25%
< 10^5 (3) 5 4 20%
< 10^6 (3) 6 4 33%
< 10^7 (3) 7 4 42%
< 10^8 (4) 8 6 25%
< 10^9 (4) 9 6 33%
< 10^10 (5) 10 7 30%
< 10^11 (5) 11 7 36%
< 10^12 (5) 12 7 41%
< 10^13 (6) 13 8 38%
< 10^14 (6) 14 8 42%
< 10^15 (7) 15 10 33%
< 10^16 (7) 16 10 37%
< 10^17 (8) 17 11 35%
< 10^18 (8) 18 11 38%
< 10^19 (8) 19 11 42%
< 2^64 (8) 20 11 45%
Update: I didn't get everything, thus I rewrote the whole thing in a more Java-Style fashion. I didn't think of the prime number case that is bigger than the divisor. This is fixed now. I leave the original code in order to get the idea.
Update 2: I now handle the case of the big prime number in another fashion . This way a result is obtained either way.
public final class PrimeNumberException extends Exception {
private final long primeNumber;
public PrimeNumberException(long x) {
primeNumber = x;
}
public long getPrimeNumber() {
return primeNumber;
}
}
public static Long[] decompose(long x, long y) {
try {
final ArrayList<Long> operands = new ArrayList<Long>(1000);
final long rest = x % y;
// Extract the rest so the reminder is divisible by y
final long newX = x - rest;
// Go into recursion, actually it's a tail recursion
recDivide(newX, y, operands);
} catch (PrimeNumberException e) {
// return new Long[0];
// or do whatever you like, for example
operands.add(e.getPrimeNumber());
} finally {
// Add the reminder to the array
operands.add(rest);
return operands.toArray(new Long[operands.size()]);
}
}
// The recursive method
private static void recDivide(long x, long y, ArrayList<Long> operands)
throws PrimeNumberException {
while ((x > y) && (y != 1)) {
if (x % y == 0) {
final long rest = x / y;
// Since y is a divisor add it to the list of operands
operands.add(y);
if (rest <= y) {
// the rest is smaller than y, we're finished
operands.add(rest);
}
// go in recursion
x = rest;
} else {
// if the value x isn't divisible by y decrement y so you'll find a
// divisor eventually
if (--y == 1) {
throw new PrimeNumberException(x);
}
}
}
}
Original: Here some recursive code I came up with. I would have preferred to code it in some functional language but it was required in Java. I didn't bother converting the numbers to integer but that shouldn't be that hard (yes, I'm lazy ;)
public static Long[] decompose(long x, long y) {
final ArrayList<Long> operands = new ArrayList<Long>();
final long rest = x % y;
// Extract the rest so the reminder is divisible by y
final long newX = x - rest;
// Go into recursion, actually it's a tail recursion
recDivide(newX, y, operands);
// Add the reminder to the array
operands.add(rest);
return operands.toArray(new Long[operands.size()]);
}
// The recursive method
private static void recDivide(long newX, long y, ArrayList<Long> operands) {
long x = newX;
if (x % y == 0) {
final long rest = x / y;
// Since y is a divisor add it to the list of operands
operands.add(y);
if (rest <= y) {
// the rest is smaller than y, we're finished
operands.add(rest);
} else {
// the rest can still be divided, go one level deeper in recursion
recDivide(rest, y, operands);
}
} else {
// if the value x isn't divisible by y decrement y so you'll find a divisor
// eventually
recDivide(x, y-1, operands);
}
}
Are you married to using Java? Python has an entire package dedicated just for this exact purpose. It'll even sanitize the encoding for you to be URL-safe.
Native Python solution
The standard module I'm recommending is base64, which converts arbitrary stings of chars into sanitized base64 format. You can use it in conjunction with the pickle module, which handles conversion from lists of longs (actually arbitrary size) to a compressed string representation.
The following code should work on any vanilla installation of Python:
import base64
import pickle
# get some long list of numbers
a = (854183415,1270335149,228790978,1610119503,1785730631,2084495271,
1180819741,1200564070,1594464081,1312769708,491733762,243961400,
655643948,1950847733,492757139,1373886707,336679529,591953597,
2007045617,1653638786)
# this gets you the url-safe string
str64 = base64.urlsafe_b64encode(pickle.dumps(a,-1))
print str64
>>> gAIoSvfN6TJKrca3S0rCEqMNSk95-F9KRxZwakqn3z58Sh3hYUZKZiePR0pRlwlfSqxGP05KAkNPHUo4jooOSixVFCdK9ZJHdEqT4F4dSvPY41FKaVIRFEq9fkgjSvEVoXdKgoaQYnRxAC4=
# this unwinds it
a64 = pickle.loads(base64.urlsafe_b64decode(str64))
print a64
>>> (854183415, 1270335149, 228790978, 1610119503, 1785730631, 2084495271, 1180819741, 1200564070, 1594464081, 1312769708, 491733762, 243961400, 655643948, 1950847733, 492757139, 1373886707, 336679529, 591953597, 2007045617, 1653638786)
Hope that helps. Using Python is probably the closest you'll get from a 1-line solution.
Wrt the original algorithm request: Is there a limit on the size of the last number (beyond that it must be stored in a 32b int)?
(The original request is all I'm able to tackle lol.)
The one that produces the shortest list is:
bool negative=(n<1)?true:false;
int j=n%y;
if(n==0 || n==1)
{
list.append(n);
return;
}
while((long64)(n-j*y)>MAX_INT && y>1) //R has to be stored in int32
{
y--;
j=n%y;
}
if(y<=1)
fail //Number has no suitable candidate factors. This shouldn't happen
int i=0;
for(;i<j;i++)
{
list.append(y);
}
list.append(n-y*j);
if(negative)
list[0]*=-1;
return;
A little simplistic compared to most answers given so far but it achieves the desired functionality of the original post... It's a little dirty but hopefully useful :)
Isn't this modulus?
Let / be integer division (whole numbers) and % be modulo.
int result[3];
result[0] = y;
result[1] = x / y;
result[2] = x % y;
Just set x:=x/n where n is the largest number that is less both than x and y. When you end up with x<=y, this is your last number in the sequence.
Like in my comment above, I'm not sure I understand exactly the question. But assuming integers (n and a given y), this should work for the cases you stated:
multipliers[0] = n / y;
multipliers[1] = y;
addedNumber = n % y;

Resources