@roman3017: For whatever reason I only just saw this post; I was testing the DLL on the most recent silicon and searching the channels to see what I had already said about it. The answer to your question is: (1) For each set of 3 stages, two are programmable, and of those two, one shadows the other. In other words, if the first stage is turned on, it bypasses the second stage, so it makes no difference whether the second stage is turned on. So there are two groups of bits, and the bits in one group must be turned on before the corresponding bits in the other group. Otherwise, (2) in theory, there is no difference in frequency between any sets of trim bits with the same number of ones and zeros. In practice, grouping the same path through the same multiplexers all on one side of the ring oscillator could cause larger or smaller than expected delay steps between certain trim points, so I designed my encoding to balance the multiplexing around the ring oscillator as much as possible.
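To make the shadowing rule concrete, here is a minimal Python sketch of one way to read it. Everything in it is an assumption for illustration, not the actual trim encoding: the names `effective_bypass` and `trim_sequence` are hypothetical, the "bypass units" (2 for a shadowing bit, 1 for a shadowed bit) are a made-up model, and the bits are turned on in plain index order rather than in the balanced order described above.

```python
from typing import Iterator, List, Tuple


def effective_bypass(shadowing: List[int], shadowed: List[int]) -> List[int]:
    """Per set of stages, return a hypothetical bypass amount.

    Assumed model: the shadowing bit bypasses both programmable stages
    (2 units); the shadowed bit alone bypasses one (1 unit). When the
    shadowing bit is on, the shadowed bit has no effect.
    """
    return [2 if a else (1 if b else 0) for a, b in zip(shadowing, shadowed)]


def trim_sequence(n_sets: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (shadowing, shadowed) settings in which every bit matters.

    All shadowed-group bits are turned on first, then the shadowing-group
    bits, so no bit is ever switched on while it is already bypassed.
    """
    shadowing = [0] * n_sets
    shadowed = [0] * n_sets
    yield list(shadowing), list(shadowed)
    for i in range(n_sets):
        shadowed[i] = 1
        yield list(shadowing), list(shadowed)
    for i in range(n_sets):
        shadowing[i] = 1
        yield list(shadowing), list(shadowed)


# Total bypass for each step of a 4-set ring under the assumed model.
totals = [sum(effective_bypass(a, b)) for a, b in trim_sequence(4)]
```

Under this model each step in the sequence changes the total by exactly one unit, whereas setting a shadowing bit first would make the matching shadowed bit a no-op.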
Having just measured the DLL running correctly for the first time, I get a measured jitter of 1.3ns. The numbers I cited in the documentation are what you would get if the ring oscillator were hand-optimized. Instead, we just ran the thing through standard place and route with the usual low core utilization, so there are significant parasitic wire delays. At 1.8V the DCO reaches 118MHz; by upping the vccd voltage to 2.0V it can hit 150MHz. Consequently, though, at standard 1.8V the delays are about double the ideal value. The 1.3ns RMS jitter number is quite tolerable, though, considering that it is done completely through standard digital synthesis.
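For reference, the two DCO frequencies convert to periods as in the short Python calculation below. The helper name `period_ns` is just for illustration. The output frequency at which the 1.3ns jitter was measured is not stated here, so the sketch does not relate the jitter to either period.

```python
def period_ns(freq_mhz: float) -> float:
    """Convert a frequency in MHz to a period in nanoseconds."""
    return 1e3 / freq_mhz


period_at_1v8 = period_ns(118.0)  # DCO maximum at vccd = 1.8V, about 8.47 ns
period_at_2v0 = period_ns(150.0)  # DCO maximum at vccd = 2.0V, about 6.67 ns

# Ratio of the two maximum frequencies, about 1.27.
speedup = 150.0 / 118.0
```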