OK, maybe one more iteration. This is a bit cleaner than my previous 12P solution, with an extra cycle and some area saved. I really feel like 8P should be possible, but I am not seeing an efficient way to get the input out of the way to make that work.