Very interesting metric that was fairly hard to work with. Two latency cycles are lost by grabbing the third output on cycle 8, and one more is lost in the final assembly. I think buddy duping two inputs is the way to go for min latency; I attempted to dupe air-earth-salt into salt-salt-air, but that did not yield good results. I have other 11-latency solutions with the first two inputs duped on cycle 6 and the third input grabbed on cycle 6. Unfortunately, the geometry doesn't allow those to go below 11 latency, so this ugly solve wins because it is cheaper. I think 10 is definitely possible, but spending more time on the off chance of finding the one layout that actually works was not worth it for me. I'd be astonished if I saw single-digit latency, though.