As you may be able to tell from the solve title, I was very surprised that this solution ended up being good. When I first drafted it, it only saved me 6 cycles over my prior best: despite saving 6 rate, I'd added 24 latency in the process. Surely it would have been easier to find a rate-saving trick in my old solution? However, after taking another look at it a few times, I managed to find ways to remove a full 10 of that latency, leaving this solve far beyond what any of my prior designs could possibly have become.

Also, shoutout to arm 2. It used to be two different arms: one used in setup that needed to get out of the way, and another that helped the main loop. It turns out there was just enough time for the one-use arm to get into place as the looping arm. That means despite the setup time, every arm (except Berlo) is actually used in the main loop of the machine, which is pretty neat.