Slightly improved cost, cycles, and area over my first submission. I probably won't have time for further optimisations, but if I did, I would try splitting arm 7's work across several arms, as it seems to be the current bottleneck. I'd also consider whether arm 1's routine could be changed to save cycles and area.