This works pretty well. The latency is a bit painful, but it's the only approach I've really tried, and I don't want to redesign it. It's also very pretty, and exactly 1000g, so I don't feel too bad. I had a 990g solve at 53 cycles before making this, and I felt like I could squeak out one more cycle. I realized that because the top half runs two cycles ahead of the bottom half, I can give up one cycle on top, saving 10g, to gain one cycle on the bottom, spending 20g. That saves me one cycle overall for an extra 10g, which puts me at exactly 1000g. Hooray. I'm pretty sure the theoretical minimum is in the 22-24 cycle range; the latency makes it difficult to calculate exactly, and I didn't bother because of the cost restriction. If anyone gets under 30 cycles I will be stunned. Good luck, everyone.
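
For anyone who wants to check the trade-off arithmetic above, here is a rough sketch in Python. The variable names are made up for illustration, the 1000g cap is my reading of "cost restriction", and the 52-cycle result is inferred from "one cycle overall" rather than stated outright.

```python
# Sketch of the cost/cycle trade described in the post.
# Assumptions: the cost restriction is a 1000g cap, and the final cycle
# count is 53 - 1 (the post only says one cycle was saved overall).
BUDGET = 1000  # assumed cost cap, in g

before_cost, before_cycles = 990, 53  # earlier solve, from the post

# The top half runs two cycles ahead of the bottom half, so slowing the
# top by one cycle does not slow the solution as a whole.
top_cost_delta = -10     # give up one cycle on top, saving 10g
bottom_cost_delta = +20  # gain one cycle on the bottom, spending 20g
cycles_delta = -1        # net effect on the whole solution

after_cost = before_cost + top_cost_delta + bottom_cost_delta
after_cycles = before_cycles + cycles_delta

print(f"{after_cost}g at {after_cycles} cycles")
```

The point is just that the net change is +10g for -1 cycle, which lands the cost on the cap without going over it.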