ITA Software

I tried to solve the ITA Software word-numbers problem recently but hit a snag. User dons at Reddit posted

">Dylan Thurston

, and Ken Shan's approach to Reddit. Thanks to them all for inspiring me to make this post describing my approach.

My background is more bottom-up than top down. I went from lots of scribbling and small programs to check my results to a program just large enough to get what I thought should be the answer in a reasonable amount of time. Instead my computed 51 billionth character was two away from the end of a number string. I'm eager to see the rest of Dylan and Ken's writeup to see where I went off the rails.

I observe/claim that once a "million" or "thousand" suffix is reached in the sorted number list, the next million minus one (or thousand minus one, respectively) entries will have the same prefix, whatever their alphabetical sorting among them.

For example, the full sorted list starts with "eight", "eighteen", and "eighteenmillion" but then the next 999,999 entries are all prefixed by "eighteenmillion". Now the strings "one" through "ninehundredninetyninethousandninehundredninetynine" are 44,872,000 long; if they are prefixed with "eighteenmillion", for example, I have to add another 15 times 1,000,000 (because "eighteeenmillion" appears 1,000,000 times) to that subtotal. So the first 1,000,002 strings add up to 5 + 8 + (15*1,000,000 + 44,872,000) = 59,872,013.

Combining these observations, I wrote a program, partialsums, that takes the 2997 strings representing 1, 2, ..., 999, 1000, 2000, ..., 999000, 1000000, 2000000, ..., 999000000 and sorts them. Then for a string which ends with "thousand" or "million" it computes the contribution of that prefix and its 999 or 999,999 following entries as I showed with the example above, and adds that to the running total.

The command line and some lines of output are:

partialsums 1-1000,1 \
1000-1000000,1000 \

sixhundredseventysix 20 50958081297
sixhundredseventysixmillion 71872000 51029953297
sixhundredseventysixthousand 46440 51029999737

This showed that the 51 billionth letter was in a substring of 676,000,000, since after those million entries, the total number of characters is 51,029,953,297

I re-ran the program, this time expanding 676,000,000 by thousands instead of millions, but the other multiples of one million were still expanded by a million.

The command:

partialsums 1-1000,1 1000-1000000,1000 \
1000000-676000000,1000000 \
676000000-677000000,1000 \

narrowed the search to:

sixhundredseventysixmillionsevenhundredfortyonethousand 73440 50999929937
sixhundredseventysixmillionsevenhundredfortyseventhousand 75440 51000005377
sixhundredseventysixmillionsevenhundredfortysixthousand 73440 51000078817

And finally:

partialsums 1-1000,1 \
            1000-1000000,1000 \
            1000000-676000000,1000000 \
            676000000-676747000,1000 \
            676747000-676748000,1 \
            676748000-677000000,1000 \

came up with:

sixhundredseventysixmillionsevenhundredfortyseventhousandtwohundredfortyeight 77 50999999926
sixhundredseventysixmillionsevenhundredfortyseventhousandtwohundredfortyfive 76 51000000002
sixhundredseventysixmillionsevenhundredfortyseventhousandtwohundredfortyfour 76 51000000078

That's where I'm stumped as the claim was that the 51,000,000,000th letter ended a word.

changed November 9, 2007