ITA Software
I tried to solve the ITA Software word-numbers problem recently but hit a snag. User dons at Reddit posted Dylan Thurston, and Ken Shan's approach to Reddit. Thanks to them all for inspiring me to make this post describing my approach.
My background is more bottom-up than top down. I went from lots of scribbling and small programs to check my results to a program just large enough to get what I thought should be the answer in a reasonable amount of time. Instead my computed 51 billionth character was two away from the end of a number string. I'm eager to see the rest of Dylan and Ken's writeup to see where I went off the rails.
I observe/claim that once a "million" or "thousand" suffix is reached in the sorted number list, the next million minus one (or thousand minus one, respectively) entries will have the same prefix, whatever their alphabetical sorting among them.
For example, the full sorted list starts with "eight", "eighteen", and "eighteenmillion" but then the next 999,999 entries are all prefixed by "eighteenmillion". Now the strings "one" through "ninehundredninetyninethousandninehundredninetynine" are 44,872,000 long; if they are prefixed with "eighteenmillion", for example, I have to add another 15 times 1,000,000 (because "eighteeenmillion" appears 1,000,000 times) to that subtotal. So the first 1,000,002 strings add up to 5 + 8 + (15*1,000,000 + 44,872,000) = 59,872,013.
Combining these observations, I wrote a program, partialsums, that takes the 2997 strings representing 1, 2,
..., 999, 1000, 2000, ..., 999000, 1000000, 2000000, ..., 999000000 and sorts them. Then for a string which ends with "thousand" or "million" it computes the contribution of that prefix and its 999 or 999,999 following entries as I showed with the example above, and adds that to the running total.
The command line and some lines of output are:
partialsums 1-1000,1 \ 1000-1000000,1000 \ 1000000-1000000000,1000000 sixhundredseventysix 20 50958081297 sixhundredseventysixmillion 71872000 51029953297 sixhundredseventysixthousand 46440 51029999737
This showed that the 51 billionth letter was in a substring of 676,000,000, since after those million entries, the total number of characters is 51,029,953,297
I re-ran the program, this time expanding 676,000,000 by thousands instead of millions, but the other multiples of one million were still expanded by a million.
The command:
partialsums 1-1000,1 1000-1000000,1000 \ 1000000-676000000,1000000 \ 676000000-677000000,1000 \ 677000000-1000000000,1000000
narrowed the search to:
sixhundredseventysixmillionsevenhundredfortyonethousand 73440 50999929937 sixhundredseventysixmillionsevenhundredfortyseventhousand 75440 51000005377 sixhundredseventysixmillionsevenhundredfortysixthousand 73440 51000078817
And finally:
partialsums 1-1000,1 \
1000-1000000,1000 \
1000000-676000000,1000000 \
676000000-676747000,1000 \
676747000-676748000,1 \
676748000-677000000,1000 \
677000000-1000000000,1000000
came up with:
sixhundredseventysixmillionsevenhundredfortyseventhousandtwohundredfortyeight 77 50999999926 sixhundredseventysixmillionsevenhundredfortyseventhousandtwohundredfortyfive 76 51000000002 sixhundredseventysixmillionsevenhundredfortyseventhousandtwohundredfortyfour 76 51000000078
That's where I'm stumped as the claim was that the 51,000,000,000th letter ended a word.