Initialization takes 40 cycles, and assembly of products takes 36 cycles. Filling the left and right buffers can be sustained indefinitely at 4R (drawing one input every 2 cycles!), and assembly runs at 6R, for a total of 10R; that is, each additional output costs another 10 cycles, provided you extend the solve appropriately to the left and right. It feels like this approximate layout should be able to get down to about 65 cycles, but I only marginally improved on my initial 83-cycle solve (14R = 6R initialization + 8R assembly), which had a much worse layout.
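To make the scaling arithmetic concrete, here is a minimal sketch. It assumes, following the "10R means another 10 cycles per output" reading above, that each xR figure is x cycles per additional output and that the stage rates simply add. It also assumes the 40-cycle initialization and 36-cycle assembly run back to back. The function names are hypothetical, for illustration only.

```python
# Hypothetical linear cost model for extending a solve.
# Assumption: an "xR" figure means x cycles per additional output,
# and per-stage rates add up to the solve's overall marginal rate.

def marginal_rate(stage_rates):
    """Sum per-stage rates (cycles per output) into one marginal rate."""
    return sum(stage_rates)


def extended_cycles(base_cycles, rate, extra_outputs):
    """Cycle count after extending a solve by `extra_outputs` more products."""
    return base_cycles + rate * extra_outputs


# New layout: buffer filling (4R) + assembly (6R).
new_rate = marginal_rate([4, 6])
# Old layout: initialization (6R) + assembly (8R).
old_rate = marginal_rate([6, 8])
# Assumption: initialization and assembly phases run sequentially.
new_total = 40 + 36

print(new_rate, old_rate, new_total)
print(extended_cycles(new_total, new_rate, 1))
print(extended_cycles(83, old_rate, 1))
```

Under these assumptions the new layout's marginal cost is 10 cycles per output versus 14 for the old one, so the gap between the two solves widens by 4 cycles for every output the solve is extended by.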