Can be in any language or even pseudocode. I was asked this in an interview question, and was curious what you guys can come up with.
I think this is a trick question - the obvious answer of generating digits using a standard library routine is almost certainly flawed, if you want to generate every possible 10000 digit number with equal probability...
If an algorithmic random number generator maintains n bits of state, then clearly it can generate at most 2n possible different output sequences, because there are only 2n different initial configurations.
233219 < 1010000 < 233220, so if your algorithm uses less than 33220 bits of internal state, it cannot possibly generate some of the 1010000 possible 10000-digit (decimal) numbers.
Typical standard library random number generators won't use anything like this much internal state. Even the Mersenne Twister (the most frequently mentioned generator with a large state that I'm aware of) only keeps 624 32-bit words (= 19968 bits) of state.
Just one of many ways. You can pass in any string of the alphabet of characters you want to use:
public class RandomUtils
{
private static readonly Random random = new Random((int)DateTime.Now.Ticks);
public static string GenerateRandomDigitString(int length)
{
const string digits = "1234567890";
return GenerateRandomString(length, digits);
}
public static string GenerateRandomAlphaString(int length)
{
const string alpha = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
return GenerateRandomString(length, alpha);
}
public static string GenerateRandomString(int length, string alphabet)
{
int maxlen = alphabet.Length;
StringBuilder sb = new StringBuilder();
for (int i = 0; i < length; i++)
{
sb.Append(alphabet[random.Next(0, maxlen)]);
}
return sb.ToString();
}
}
Without additional requirements, this will work:
StringBuilder randomStr = new StringBuilder(10000);
Random rnd = new Random();
for(int i = 0; i<10000;i++)
{
char randomChar = rnd.AsChar();
randomStr[i] = randomChar;
}
This will result in unprintable characters and other unpleasentness. Using an ASCII encoder you can get letters, numbers and punctutaiton by sticking to the range 32 - 126. Or creating a random number between 0 and 94 and adding 32. Not sure which aspect they were looking for in the question.
BTW, No I did not know the visible range off the top of my head, I looked it up on wikipedia.
Generate a number in the range 0..9. Convert it to a digit. Stuff that into a string. Repeat 10000 times.
I always like saying Computer Random Numbers are always only pseudo-random. Anyway, your favourite language will invariably have a random library. Next what is a numeric string ? 0-9 valued for each character ? Well let's start with that assumption. So we can generate bytes between to Ascii codes of 0-9 with offset (48) and (int) random*10 (since random generators typically return floats). Then place these all in a char buffer 10000 long and convert to string.
Return a string containing 10,000 1s -- that's just as random as any other digit string of the same length.
I think the real question was to determine what the interviewer actually wanted. For example, random in what sense? Uncompressable? Random over multiple runs of the same algorithm? Etc.
You can start with a list of seed digits:
seeds = [4,9,3,1,2,5,5,4,4,8,4,3] # This should be relatively large
Then, use a counter to keep track of which digit was last used. This would be system-wide and shouldn't reset with the system:
def next_digit():
counter = 0
while True:
yield counter
counter += 1
pos_it = next_digit()
rand_it = next_digit()
Next, use an algorithm that uses modulus to determine the "next number":
def random_digit():
position = pos_it.next() % len(seeds)
digit = seeds[position] * rand_it.next()
return digit % 10
Last, generate 10,000 of those digits.
output = ""
for i in range(10000):
output = "%s%s" % (output, random_digit())
I believe that an ideal answer would use more prime numbers, but this should be pretty sufficient.
I've got a real problem (it's not homework, you can check my profile). I need to parse data whose formatting is not under my control.
The data look like this:
6,852:6,100,752
So there's first a number made of up to 9 digits, followed by a colon.
Then I know for sure that, after the colon:
there's at least one valid combination of numbers that add up to the number before the column
I know exactly how many numbers add up to the number before the colon (two in this case, but it can go as high as ten numbers)
In this case, 6852 is 6100 + 752.
My problem: I need to find these numbers (in this example, 6100 + 752).
It is unfortunate that in the data I'm forced to parse, the separator between the numbers (the comma) is also the separator used inside the number themselves (6100 is written as 6,100).
Once again: that unfortunate formatting is not under my control and, once again, this is not homework.
I need to solve this for up to 10 numbers that need to add up.
Here's an example with three numbers adding up to 6855:
6,855:360,6,175,320
I fear that there are cases where there would be two possible different solutions. HOWEVER if I get a solution that works "in most cases" it would be enough.
How do you typically solve such a problem in a C-style bracket language?
Well, I would start with the brute force approach and then apply some heuristics to prune the search space. Just split the list on the right by commas and iterate over all possible ways to group them into n terms (where n is the number of terms in the solution). You can use the following two rules to skip over invalid possibilities.
(1) You know that any group of 1 or 2 digits must begin a term.
(2) You know that no candidate term in your comma delimited list can be greater than the total on the left. (This also tells you the maximum number of digit groups that any candidate term can have.)
Recursive implementation (pseudo code):
int total; // The total read before the colon
// Takes the list of tokens as integers after the colon
// tokens is the set of tokens left to analyse,
// partialList is the partial list of numbers built so far
// sum is the sum of numbers in partialList
// Aggregate takes 2 ints XXX and YYY and returns XXX,YYY (= XXX*1000+YYY)
function getNumbers(tokens, sum, partialList) =
if isEmpty(tokens)
if sum = total return partialList
else return null // Got to the end with the wrong sum
var result1 = getNumbers(tokens[1:end], sum+token[0], Add(partialList, tokens[0]))
var result2 = getNumbers(tokens[2:end], sum+Aggregate(token[0], token[1]), Append(partialList, Aggregate(tokens[0], tokens[1])))
if result1 <> null return result1
if result2 <> null return result2
return null // No solution overall
You can do a lot better from different points of view, like tail recursion, pruning (you can have XXX,YYY only if YYY has 3 digits)... but this may work well enough for your app.
Divide-and-conquer would make for a nice improvement.
I think you should try all possible ways to parse the string and calculate the sum and return a list of those results that give the correct sum. This should be only one result in most cases unless you are very unlucky.
One thing to note that reduces the number of possibilities is that there is only an ambiguity if you have aa,bbb and bbb is exactly 3 digits. If you have aa,bb there is only one way to parse it.
Reading in C++:
std::pair<int,std::vector<int> > read_numbers(std::istream& is)
{
std::pair<int,std::vector<int> > result;
if(!is >> result.first) throw "foo!"
for(;;) {
int i;
if(!is >> i)
if(is.eof()) return result;
else throw "bar!";
result.second.push_back(i);
char ch;
if(is >> ch)
if(ch != ',') throw "foobar!";
is >> std::ws;
}
}
void f()
{
std::istringstream iss("6,852:6,100,752");
std::pair<int,std::vector<int> > foo = read_numbers(iss);
std::vector<int> result = get_winning_combination( foo.first
, foo.second.begin()
, foo.second.end() );
for( std::vector<int>::const_iterator i=result.begin(); i!=result.end(), ++i)
std::cout << *i << " ";
}
The actual cracking of the numbers is left as an exercise to the reader. :)
I think your main problem is deciding how to actually parse the numbers. The rest is just rote work with strings->numbers and iteration over combinations.
For instance, in the examples you gave, you could heuristically decide that a single-digit number followed by a three-digit number is, in fact, a four-digit number. Does a heuristic such as this hold true over a larger dataset? If not, you're also likely to have to iterate over the possible input parsing combinations, which means the naive solution is going to have a big polynomic complexity (O(nx), where x is >4).
Actually checking for which numbers add up is easy to do using a recursive search.
List<int> GetSummands(int total, int numberOfElements, IEnumerable<int> values)
{
if (numberOfElements == 0)
{
if (total == 0)
return new List<int>(); // Empty list.
else
return null; // Indicate no solution.
}
else if (total < 0)
{
return null; // Indicate no solution.
}
else
{
for (int i = 0; i < values.Count; ++i)
{
List<int> summands = GetSummands(
total - values[i], numberOfElements - 1, values.Skip(i + 1));
if (summands != null)
{
// Found solution.
summands.Add(values[i]);
return summands;
}
}
}
}
Interesting programming puzzle:
If the integers from 1 to 999,999,999
are written as words, sorted
alphabetically, and concatenated, what
is the 51 billionth letter?
To be precise: if the integers from 1
to 999,999,999 are expressed in words
(omitting spaces, ‘and’, and
punctuation - see note below for format), and sorted
alphabetically so that the first six
integers are
eight
eighteen
eighteenmillion
eighteenmillioneight
eighteenmillioneighteen
eighteenmillioneighteenthousand
and the last is
twothousandtwohundredtwo
then reading top to bottom, left to
right, the 28th letter completes the
spelling of the integer
“eighteenmillion”.
The 51 billionth letter also completes
the spelling of an integer. Which one,
and what is the sum of all the
integers to that point?
Note: For example, 911,610,034 is
written
“ninehundredelevenmillionsixhundredtenthousandthirtyfour”;
500,000,000 is written
“fivehundredmillion”; 1,709 is written
“onethousandsevenhundrednine”.
I stumbled across this on a programming blog 'Occasionally Sane', and couldn't think of a neat way of doing it, the author of the relevant post says his initial attempt ate through 1.5GB of memory in 10 minutes, and he'd only made it up to 20,000,000 ("twentymillion").
Can anyone think of come up with share with the group a novel/clever approach to this?
Edit: Solved!
You can create a generator that outputs the numbers in sorted order. There are a few rules for comparing concatenated strings that I think most of us know implicitly:
a < a+b, where b is non-null.
a+b < a+c, where b < c.
a+b < c+d, where a < c, and a is not a subset of c.
If you start with a sorted list of the first 1000 numbers, you can easily generate the rest by appending "thousand" or "million" and concatenating another group of 1000.
Here's the full code, in Python:
import heapq
first_thousand=[('', 0), ('one', 1), ('two', 2), ('three', 3), ('four', 4),
('five', 5), ('six', 6), ('seven', 7), ('eight', 8),
('nine', 9), ('ten', 10), ('eleven', 11), ('twelve', 12),
('thirteen', 13), ('fourteen', 14), ('fifteen', 15),
('sixteen', 16), ('seventeen', 17), ('eighteen', 18),
('nineteen', 19)]
tens_name = (None, 'ten', 'twenty', 'thirty', 'forty', 'fifty', 'sixty',
'seventy','eighty','ninety')
for number in range(20, 100):
name = tens_name[number/10] + first_thousand[number%10][0]
first_thousand.append((name, number))
for number in range(100, 1000):
name = first_thousand[number/100][0] + 'hundred' + first_thousand[number%100][0]
first_thousand.append((name, number))
first_thousand.sort()
def make_sequence(base_generator, suffix, multiplier):
prefix_list = [(name+suffix, number*multiplier)
for name, number in first_thousand[1:]]
prefix_list.sort()
for prefix_name, base_number in prefix_list:
for name, number in base_generator():
yield prefix_name + name, base_number + number
return
def thousand_sequence():
for name, number in first_thousand:
yield name, number
return
def million_sequence():
return heapq.merge(first_thousand,
make_sequence(thousand_sequence, 'thousand', 1000))
def billion_sequence():
return heapq.merge(million_sequence(),
make_sequence(million_sequence, 'million', 1000000))
def solve(stopping_size = 51000000000):
total_chars = 0
total_sum = 0
for name, number in billion_sequence():
total_chars += len(name)
total_sum += number
if total_chars >= stopping_size:
break
return total_chars, total_sum, name, number
It took a while to run, about an hour. The 51 billionth character is the last character of sixhundredseventysixmillionsevenhundredfortysixthousandfivehundredseventyfive, and the sum of the integers to that point is 413,540,008,163,475,743.
I'd sort the names of the first 20 integers and the names of the tens, hundreds and thousands, work out how many numbers start with each of those, and go from there.
For example, the first few are [ eight, eighteen, eighthundred, eightmillion, eightthousand, eighty, eleven, ....
The numbers starting with "eight" are 8. With "eighthundred", 800-899, 800,000-899,999, 800,000,000-899,999,999. And so on.
The number of letters in the concatenation of words for 0 ( represented by the empty string ) to 99 can be found and totalled; this can be multiplied with "thousand"=8 or "million"=7 added for higher ranges. The value for 800-899 will be 100 times the length of "eighthundred" plus the length of 0-99. And so on.
This guy has a solution to the puzzle written in Haskell. Apparently Michael Borgwardt was right about using a Trie for finding the solution.
Those strings are going to have lots and lots of common prefixes - perfect use case for a trie, which would drastically reduce memory usage and probably also running time.
Here's my python solution that prints out the correct answer in a fraction of a second. I'm not a python programmer generally, so apologies for any egregious code style errors.
#!/usr/bin/env python
import sys
ONES=[
"", "one", "two", "three", "four",
"five", "six", "seven", "eight", "nine",
"ten", "eleven", "twelve", "thirteen", "fourteen",
"fifteen", "sixteen", "seventeen","eighteen", "nineteen",
]
TENS=[
"zero", "ten", "twenty", "thirty", "forty",
"fifty", "sixty", "seventy", "eighty", "ninety",
]
def to_s_h(i):
if(i<20):
return(ONES[i])
return(TENS[i/10] + ONES[i%10])
def to_s_t(i):
if(i<100):
return(to_s_h(i))
return(ONES[i/100] + "hundred" + to_s_h(i%100))
def to_s_m(i):
if(i<1000):
return(to_s_t(i))
return(to_s_t(i/1000) + "thousand" + to_s_t(i%1000))
def to_s_b(i):
if(i<1000000):
return(to_s_m(i))
return(to_s_m(i/1000000) + "million" + to_s_m(i%1000000))
def try_string(s,t):
global letters_to_go,word_sum
l=len(s)
letters_to_go -= l
word_sum += t
if(letters_to_go == 0):
print "solved: " + s
print "sum is: " + str(word_sum)
sys.exit(0)
elif(letters_to_go < 0):
print "failed: " + s + " " + str(letters_to_go)
sys.exit(-1)
def solve(depth,prefix,prefix_num):
global millions,thousands,ones,letters_to_go,onelen,thousandlen,word_sum
src=[ millions,thousands,ones ][depth]
for x in src:
num=prefix + x[2]
nn=prefix_num+x[1]
try_string(num,nn)
if(x[0] == 0):
continue
if(x[0] == 1):
stl=(len(num) * 999) + onelen
ss=(nn*999) + onesum
else:
stl=(len(num) * 999999) + thousandlen + onelen*999
ss=(nn*999999) + thousandsum
if(stl < letters_to_go):
letters_to_go -= stl
word_sum += ss
else:
solve(depth+1,num,nn)
ones=[]
thousands=[]
millions=[]
onelen=0
thousandlen=0
onesum=(999*1000)/2
thousandsum=(999999*1000000)/2
for x in range(1,1000):
s=to_s_b(x)
l=len(s)
ones.append( (0,x,s) )
onelen += l
thousands.append( (0,x,s) )
thousands.append( (1,x*1000,s + "thousand") )
thousandlen += l + (l+len("thousand"))*1000
millions.append( (0,x,s) )
millions.append( (1,x*1000,s + "thousand") )
millions.append( (2,x*1000000,s + "million") )
ones.sort(key=lambda x: x[2])
thousands.sort(key=lambda x: x[2])
millions.sort(key=lambda x: x[2])
letters_to_go=51000000000
word_sum=0
solve(0,"",0)
It works by precomputing the length of the numbers from 1..999 and 1..999999 so that it can skip entire subtrees unless it knows that the answer lies somewhere within them.
(The first attempt at this is wrong, but I will leave it up since it's more useful to see mistakes on the way to solving something rather than just the final answer.)
I would first generate the strings from 0 to 999 and store them into an array called thousandsStrings. The 0 element is "", and "" represents a blank in the lists below.
The thousandsString setup uses the following:
Units: "" one two three ... nine
Teens: ten eleven twelve ... nineteen
Tens: "" "" twenty thirty forty ... ninety
The thousandsString setup is something like this:
thousandsString[0] = ""
for (i in 1..10)
thousandsString[i] = Units[i]
end
for (i in 10..19)
thousandsString[i] = Teens[i]
end
for (i in 20..99)
thousandsString[i] = Tens[i/10] + Units[i%10]
end
for (i in 100..999)
thousandsString[i] = Units[i/100] + "hundred" + thousandsString[i%100]
end
Then, I would sort that array alphabetically.
Then, assuming t1 t2 t3 are strings taken from thousandsString, all of the strings have the form
t1
OR
t1 + million + t2 + thousand + t3
OR
t1 + thousand + t2
To output them in the proper order, I would process the individual strings, followed by the millions strings followed by the string + thousands strings.
foreach (t1 in thousandsStrings)
if (t1 == "")
continue;
process(t1)
foreach (t2 in thousandsStrings)
foreach (t3 in thousandsStrings)
process (t1 + "million" + t2 + "thousand" + t3)
end
end
foreach (t2 in thousandsStrings)
process (t1 + "thousand" + t2)
end
end
where process means store the previous sum length and then add the new string length to the sum and if the new sum is >= your target sum, you spit out the results, and maybe return or break out of the loops, whatever makes you happy.
=====================================================================
Second attempt, the other answers were right that you need to use 3k strings instead of 1k strings as a base.
Start with the thousandsString from above, but drop the blank "" for zero. That leaves 999 elements and call this uStr (units string).
Create two more sets:
tStr = the set of all uStr + "thousand"
mStr = the set of all uStr + "million"
Now create two more set unions:
mtuStr = mStr union tStr union uStr
tuStr = tStr union uStr
Order uStr, tuStr, mtuStr
Now the looping and logic here are a bit different than before.
foreach (s1 in mtuStr)
process(s1)
// If this is a millions or thousands string, add the extra strings that can
// go after the millions or thousands parts.
if (s1.contains("million"))
foreach (s2 in tuStr)
process (s1+s2)
if (s2.contains("thousand"))
foreach (s3 in uStr)
process (s1+s2+s3)
end
end
end
end
if (s1.contains("thousand"))
foreach (s2 in uStr)
process (s1+s2)
end
end
end
What I did:
1) Iterate through 1 - 999 and generate the words for each of these.
As we generate:
2) Create 3 data structures where each node has a pointer to children and each node has a character value, and a pointer to Siblings. (A binary tree, in fact, but we don't want to think of it that way necessarily - for me it's easier to conceptualise as a list of siblings with lists of children hanging off, but if you think about it {draw a pic} you'll realise it is in fact a Binary Tree).
These 3 data structures are created cocurrently as follows:
a) first one with the word as generated (ie 1-999 sorted alphabetically)
b) all the values in the first + all the values with 'thousand' appended (ie 1-999 and 1,000 - 999,000 (step 1000) (1998 values in total)
c) all the values in B + all the values in a with million appended (2997 values in total)
3) For every leaf node in(b) add a Child as (a). For every leaf node in (c) add a child as (b).
4) Traverse the tree, counting how many characters we pass and stopping at 51 Billion.
NOTE: This doesn't sum the values (I didn't read that bit when I originally did it), and runs in just over 3 minutes (about 192 secs usually, using c++).
NOTE 2: (in case it isn't obvious) there are only 5,994 values stored, but they are stored in such a way that there are a billion paths through the tree
I did this about a year or two ago when I stumbled accross it, and have since realised there are many optimisations (the most time consuming bit is traversing the tree - by a LONG WAY). There are a few optimisations that I think would significantly improve this approach, but I could never be bothered taking it further, other than to optimise redundant nodes in the tree slightly, so they stored strings rather than characters
I have seen people claim on line that they've solved it in less than 5 seconds....
weird but fun idea.
build a sparse list of the lengths of the number from 0 to 9, then 10-90 by tens, then 100, 1000, etc etc, to billion, indexes are the value of the integer part who's lenght is stored.
write a function to calculate the number as a string length using the table.
(breaking the number into it's parts, and looking up the length of the aprts, never actally creating a string.)
then you're only doing math as you traverse the numbers, calculating the length from the
table afterward summing for your sum.
with the sum, and the value of the final integer, figure out the integer that's being spelled, and volia, you're done.
Yes, me again, but a completely different approach.
Simply, rather than storing the "onethousandeleventyseven" words, you write the sort to use that when comparing.
Crude java POC:
public class BillionsBillions implements Comparator {
public int compare(Object a, Object b) {
String s1 = (String)a; // "1234";
String s2 = (String)b; // "1235";
int n1 = Integer.valueOf(s1);
int n2 = Integer.valueOf(s2);
String w1 = numberToWords(n1);
String w2 = numberToWords(n2);
return w1.compare(w2);
}
public static void main(String args[]) {
long numbers[] = new long[1000000000]; // Bring your 64 bit JVMs
for(int i = 0; i < 1000000000; i++) {
numbers[i] = i;
}
Arrays.sort(numbers, 0, numbers.length, new BillionsBillions());
long total = 0;
for(int i : numbers) {
String l = numberToWords(i);
long offset = total + l - 51000000000;
if (offset >= 0) {
String c = l.substring(l - offset, l - offset + 1);
System.out.println(c);
break;
}
}
}
}
"numberToWords" is left as an exercise for the reader.
Do you need to save the entire string in memory?
If not, just save how many characters you've appended so far. For each iteration, you check the length the next number's textual representation. If it exceeds the nth letter you are looking for, the letter must be in that string, so extract it by it's index, print it, and stop execution. Otherwise, add the string length to the character count and move to the next number.
All the strings are going to start with either one, ten, two, twenty, three, thirty, four, etc so I'd start with figuring out how many are in each of the buckets. Then you should at least know which bucket you need to look closer at.
Then I'd look at subdividing the buckets further based on the possible prefixes. For example, within ninehundred, you are going to have all the same buckets that you had to start off with, just for numbers starting with 900.
The question is about efficient data storage not string manipulation. Create an enum to represent the words. the words should appear in sorted order so that when it comes time to sort it is a simplish compare. Now generate the list and sort. use the fact that you know how long each word is in conjunction with the enum to add up to the character you need.
Code wins...
#!/bin/bash
n=0
while [ $n -lt 1000000000 ]; do
number -l $n | sed -e 's/[^a-z]//g'
let n=n+1
done | sort > /tmp/bignumbers
awk '
BEGIN {
total = 0;
}
{
l = length($0);
offset = total + l - 51000000000;
print total " " offset
if (offset >= 0) {
c = substr($0, l - offset, 1);
print c;
exit;
}
total = total + l;
}' /tmp/bignumbers
Tested for a much smaller range ;-). Requires a LOT of diskspace, a compressed filesystem would be, umm, valuable, but not so much memory.
Sort has options to compress work files as well, and you could toss in gzip to directly compress data.
Not the zippiest solution.
But it does work.
Honestly I would let an RDBMS like SQL Server or Oracle do the work for me.
Insert the billion strings into an indexed table.
Compute a string length column.
Start pulling off the top X records at a time with a SUM, until I get to 51 billion.
Might beat up the server for a while as it would need to do a lot of Disk IO, but overall I think I could find an answer faster than someone who would write a program to do it.
Sometimes just getting it done is what the client really wants, and could care less what fancy design pattern or data structure you used.
figure out lengths for 1-999 and include length for 0 as 0.
so now you have an array for 0-999 namely uint32 sizes999[1000];
(not going to get into the details of generating this)
also need an array of thousand last letters last_letters[1000]
(again not going to get into the details of generating this as it is even easier even hundreds d even tens y except 10 which is n others cycle though last of on e through nin e zero is irrelavant)
uint32 sizes999[1000];
uint64 totallen = 0;
strlen_million = strlen("million");
strlen_thousand = strlen("thousand");
for (uint32 i = 0; i<1000;++i){
for (uint32 j = 0; j<1000;++j){
for (uint32 j = 0; j<1000;++j){
total_len += sizes999[i]+strlen_million +
sizes999[j]+strlen_thousand +
sizes999[k];
if totallen == 51000000000 goto done;
ASSERT(totallen <51000000000);//he claimed 51000000000 was not intermediate
}
}
}
done:
//now use i j k to get last letter by using last_letters999
//think of i,j,k as digits base 1000
//if k = 0 & j ==0 then the letter is n million
//if only k = 0 then the letter is d thousand
//other wise use the array of last_letters since
//the units digit base 1000, that is k, is not zero
//for the sum of the numbers i,j,k are the digits of the number base 1000 so
n = i*1000000 + j*1000 + k;
//represent the number and use
sum = n*(n+1)/2;
if you need to do it for number other than 51000000000 then also calculate sums_sizes999 and use that in the natural way.
total memory: 0(1000);
total time: 0(n) where n is the number
This is what I'd do:
Create an array of 2,997 strings: "one" through "ninehundredninetynine", "onethousand" through "ninehundredninetyninethousand", and "onemillion" through "ninehundredninetyninemillion".
Store the following about each string: length (this can be calculated of course), the integer value represented by the string, and some enum to signify whether it's "ones", "thousands", or "millions".
Sort the 2,997 strings alphabetically.
With this array created, it's straightforward to find all 999,999,999 strings in order alphabetically based on the following observations:
Nothing can follow a "ones" string
Either nothing, or a "ones" string, can follow a "thousands" string
Either nothing, a "ones" string, a "thousands" string, or a "thousands" string then a "ones" string, can follow a "millions" string.
Constructing the words basically involves creating one- to three-letter "words" based on these 2,997 tokens, making sure that the order of the tokens makes a valid number according to the rules above. Given a particular "word", the next "word" is found like this:
Lengthen the "word" by adding the token first alphabetically, if possible.
If this can't be done, advance the rightmost token to the next one alphabetically, if possible.
If this too is not possible, then remove the rightmost token, and advance the second-rightmost token to the next one alphabetically, if possible.
If this too is not possible, you're done.
At each step you can calculate the total length of the string and the sum of the numbers by just keeping two running totals.
It's important to note that there is a lot of overlapping and double counting if you iterate over all 100 billion possible numbers. It's important to realize that the number of strings that start with "eight" is the same number of numbers that start with "nin" or "seven" or "six" etc...
To me, this begs for a dynamic programming solution where the number of strings for tens, hundreds, thousands, etc are calculated and stored in some type of look up table. Ofcourse, there will be special cases for one vs eleven, two vs twelve, etc
I'll update this if I can get a quick running solution.
WRONG!!!!!!!!! I READ THE PROBLEM WRONG. I thought it meant "what's the last letter of the alphabetically last number"
what's wrong with:
public class Nums {
// if overflows happen, switch to an BigDecimal or something
// with arbitrary precision
public static void main(String[] args) {
System.out.println("last letter: " + lastLetter(1L, 51000000L);
System.out.println("sum: " + sum(1L, 51000000L);
}
static char lastLetter(long start, long end) {
String last = toWord(start);
for(long i = start; i < end; i++)
String current = toWord(i);
if(current.compareTo(last) > 1)
last = current;
return last.charAt(last.length()-1);
}
static String toWord(long num) {
// should be relatively easy, but for now ...
return "one";
}
static long sum(long first, long n) {
return (n * first + n*n) / 2;
}
}
haven't actually tried this :/ LOL
You have one billion numbers and 51 billion characters - there's a good chance that this is a trick question, as there are an average of 51 characters per number. Sum up the conversions of all the numbers and see if it adds up to 51 billion.
Edit: It adds up to 70,305,000,000 characters, so this is the wrong answer.
I solved this in Java sometime in 2008 as part of an application to work at ITA Software.
The code is long, and it now being three years later, I look at it with a bit of horror... So I'm not going to post it.
But I'll post quotes from some notes that I included with the application.
The problem with this puzzle is of course the size. The naïve approach would be to sort the list in word number order and then to iterate through the sorted list counting characters and summing. With a list of size 999,999,999 this would of course take a rather long time and the sort could likely not be done in memory.
But there are natural patterns in the ordering which allow shortcuts.
Immediately following any entry (say the number is X) ending in “million” will come 999,999 entries starting with the same text, representing all the numbers from X +1
to X + 10^6 -1.
The sum of all these numbers can be computed by a classic formula (an “arithmetic series”), and the character count can be computed by a similarly simple formula based on the prefix (X above) and a once-computed character count for the numbers from 1 to 999,999. Both depend only on the “millions” part of the number at the base of the range. Thus if the character count for the entire range will keep the entire count below the search goal, the individual entries need not be traversed.
Similar shortcuts apply for “thousand”, and indeed could be applied to “hundred” or “billion” though I didn’t bother with shortcuts at the hundreds level and the billions level is out of range for this problem.
In order to apply these shortcuts, my code creates and sorts a list of 2997 objects representing the numbers:
1 to 999 stepping by 1
1000 to 999000 stepping by 1000
1000000 to 999000000 stepping by 1000000
The code iterates through this list, accumulating sums and character counts, recursively creating, sorting and traversing similar but smaller lists as needed.
Explicit counting and adding is only needed near the end.
I didn't get the job, but later used the code as a "code sample" for another job, which I did get.
The Java code using these techniques for skipping much of the explicit counting and adding runs in about 8 seconds.
If I have a number like 12345, and I want an output of 2345, is there a mathematical algorithm that does this? The hack in me wants to convert the number to a string, and substring it. I know this will work, but I'm sure there has to be a better way, and Google is failing me.
Likewise, for 12345, if I want 1234, is there another algorithm that will do that? The best I can come up with is Floor(x / 10^(n)), where x is the input and n is the number of digits to strip, but I feel like there has to be a better way, and I just can't see it.
In the first case, don't you just want
n % 10000
i.e. the modulus wrt. 10000 ?
For your second case, if you're using integer arithmetic, just divide by 10. You might want to do this in a more 'explicit' fashion by modding with 10 to get the last digit, subtract and then divide (think of a shift in base 10).
Yes, the modulus operator (%) which is present in most languages, can return the n last digits:
12345 % 10^4 = 12345 % 10000 = 2345
Integral division (/ in C/C++/Java) can return the first n digits:
12345 / 10^4 = 12345 / 10000 = 1
Python 3.0:
>>> import math
>>> def remove_most_significant_digit(n, base=10):
... return n % (base ** int(math.log(n, base)))
...
>>> def remove_least_significant_digit(n, base=10):
... return int(n // base)
...
>>> remove_most_significant_digit(12345)
2345
>>> remove_least_significant_digit(12345)
1234
Converting to a string, and then using a substring method will ultimately be the fastest and best way, since you can just strip characters instead of doing math.
If you really don't want to do that though, you should use modulus (%), which gives the remainder of a division. 11 % 3 = 2, because 3 can only go into 11 three times (9). The remainder is then 2. 41 % 10 = 1, because 10 can go into 41 four times (40). The remainder is then 1.
For stripping the first digits, all you would have to do is mod the tens value that you want to get rid of. For stripping two digits from 12345, you should modulus by 1000. 1000 goes into 12345 twelve times, then the remainder will be 345, which is your answer. You would just need to find a way to find the tens value of the last digit you were trying to strip. Use x % (10^(n)), where x is input, and n is the lowest digit you want to strip.
For stripping the last digits, your way works just fine. What's easier than a quick formula like that?
I don't think there's any other approach than division for removing trailing numbers. It might be more efficient to do repeated integral division than to cast to a float, perform an exponent, then floor and cast back to an integer, but the basic idea remains the same.
Keep in mind that the operation is nearly identical for any base. To remove one trailing decimal digit, you do / 10. If you had 0b0111 and you wanted to remove one digit, it would have to be /2. Or you could have 0xff / 16 to get 0x0f.
You have to realize that numbers don't have digits, only strings do, and how many (and which) digits they have depends entirely on the base (which numbers don't have either). Internally, computers use what amounts to binary strings. So in general, manipulating base 10 digits requires you to convert the number to a string first - or do calculations that are the same you would do when converting it to a string. However, for your specific task of removing leading and trailing digits, these calculations (modulo and integer division) are very simple and much faster than converting the entire number.
i think that converting to string and then remove the first char wont do the trick.
i believe that the alg for converting to string is doing the div-mod routine, for optimization you might as well do the div-mod alg by yourself and manipulate it to your needs
Here is C++ code ... It's not tested.
int myPow ( int n , int k ){
int ret = 1;
for (int i=0;i<k;++i) ret*=n;
return ret;
}
int countDigits (int n ){
int count = 0;
while ( n )++count, n/=10;
return count;
}
int getLastDigits (int number , int numDigits ){
int tmp = myPow ( 10 , numDigits );
return number % tmp;
}
int getFirstDigits (int number, numDigits ){
int tmp = myPow ( 10, countDigits ( number) - numDigits );
return nuber / tmp;
}