Integer sequences

This project really started when I was implementing an optimized json parser in C. And how does one figure out what the weight of each binary index into each decimal index.

If we think about the number $15$ and we want to represent it as binary, we immediately know that the 1st index must be $0$. This follows from the rule that in binary an even number must have their 1st index as $1$.

Now just for the sake of formalness how does this rule propagate? Let us start from the bottom first we start with the number $0$, then we find the next number that has a 1st index that is $0$ in base 2. This comes out to be $2$. If we continue this list becomes $\left[0,2,4,6,8,10,12,...\right]$. This means that the 1st index into the binary number effectively acts as a $+2$. And this is where the first method of going from binary to decimal comes from.

If we extrapolate this out to the other indexs, the rule comes out very quickly that for each index into the binary number, this can be treated as a $+2^{index}$.

But this is slow, we have to compose each of the numbers every time, by doubling, and then doubling again. For arbitrarily long integers in base 10 – like those found in json files– this can become quite costly.

So what if we abstracted away the idea of how much a binary number represent in decimal, and instead found the amount that each binary index affects each decimal index. As we have already found the increasing indexes of base 2 are the powers of two, which are represented with $2^{index-1} = \left[0,2,4,8,16,32,64,128,...\right]$. And you might notice that there is a repeating sequence present there. If we are looking at the 1st index into the decimal numbers we get the list $\left[0,2,4,8,6,2,4,8,...\right]$.

We can then treat this as a function $F$ which returns a set. $F\left(radix_{base}=2,radix_{transposed}=10,index_{transposed}=0\right)$ $=$ $\left\{2,4,8,6,2,4,8,6,...\right\}$, where radix corresponds to the number of unique symbols we are counting with. Decimal is radix 10, while binary is radix 2. Where the base is where we are coming from, and the transposed value is the base that we are going to, and we are finding the index into the transposed number that has some affected value. I will be writing this function as $R_{radix_{base},radix_{transposed}}\left(index_{transposed}\right)$, so for this specific case it would be $R_{2,10}\left(0\right)$.

This really becomes interesting when we get to $R_{2,10}\left(1\right)$. This is probably the last one that you would be willing to work out by hand, but just for the sake of argument let’s go through a few. We are essentially looking at how much each power of two has an effect into the “tens” place of any decimal number. So for $2^0$ its $0$, for $2^1$ its 2. This continue in kind until we get to $2^4 = 16$ where that “tens” place has become a 1. If we now continue we know it will be following the powers of two still:

\[32 \to 3\] \[64 \to 6\] \[128 \to 2\] \[256 \to 5\] \[512 \to 1\]

Okay, grand, we are back at a value we’ve seen before, does that mean that it will loop?

\[1024 \to 2\]

Of course not! So the question becomes will this loop. And this is really what the project was, trying to find more and more optimized ways of finding higher indexes. To cut to the chase we find out that indeed it does loop, after 20 slots.

$$R_{2,10}\left(1\right) = \left\{ 0, 0, 0, 0, 1, 3, 6, 2, 5, 1, 2, 4, 9, 9, 8, 6, 3, 7, 4, 8\right\}$$

Now that is a pretty alarming jump, so the question then becomes does every sequence loop? And if so, how long are those loops. I will not go through the proof for each of the sequences looping here. However, they do, and the lenght of the loops follows a regular pattern. Namely, $\|R_{2,10}\left(n\right)\|=4*5^{n-1}$. This has many other fun and interesting sub problems and proofs, like what is $f\left(x,n\right) = \|R_{x,10}\left(n\right)\|$. They will be left to the reader if they want.

This means that for any length binary number we will be able to directly compute the “ones” and the “tens” place of the resulting number with minimal cost as they just will loop. For example we know that $10011011$ will end in a $55$. Or, where this actually becomes useful, given the following base64 number 9fkjX1POg0dRyN6eR8ZWV4QLVTFADIolmhAeos+dYVs= (2048 bit long base 2 number), we know that it will end in a $91$ in decimal with trivial compute complexity.

Getting back to the original question, is this actually going to be useful for json parsing?

No. Not at all.

Is this really useful for anything?

Not that I know of. No.

The source code can be found here.