Ok so here's the thing - elemental comparator is a trap!

My very first idea for this puzzle was to use a test atom to somehow generate the 3 other cardinals, and then use those 3 cardinals to sense for match or no match.  But I didn't like that, because it would require so many unification attempts per result.  Then, when I focused instead on elemental comparator, I figured out what I thought was "the strat":

If two atoms are different from each other, then there exists another pair of atoms such that unification will proc.  Conversely, if you can't get unification to proc, then the test atoms must be the same.  Doesn't matter which actual type they are.
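A quick brute-force sanity check of that claim, as a minimal sketch. It assumes a simplified model where unification procs only when handed one each of the four cardinals; `unifies` and `some_pair_unifies` are names made up for illustration, not anything from the game.

```python
from itertools import product

# Assumed model: the four cardinal elements.
CARDINALS = ("air", "earth", "fire", "water")

def unifies(atoms):
    """Assumed rule: unification procs only on four distinct cardinals."""
    return len(atoms) == 4 and set(atoms) == set(CARDINALS)

def some_pair_unifies(a, b):
    """True if there is any pair of extra atoms that makes unification proc."""
    return any(unifies([a, b, c, d]) for c, d in product(CARDINALS, repeat=2))

# Check every possible pairing of two test atoms.
for a, b in product(CARDINALS, repeat=2):
    # Different atoms: a completing pair exists. Same atoms: none does.
    assert some_pair_unifies(a, b) == (a != b)
```

The loop never needs to know which element either atom is, only whether a completing pair exists.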

My first few designs for habitability detector relied on the idea that I was comparing two unknown atoms, both on equal footing.  To reduce the number of unifications, I first used triplex and a known fire to determine the easy cases (one fire atom = no match, two fire atoms = match).  Then I used unification to check for the 3 remaining possible no-match pairings.

All of this led to a machine that needed to make 216 comparisons to successfully complete habitability detector.  I started off around 1100 wsum, with a machine that used one unification glyph every 3 cycles, totaling a little over 700 cycles.  My next design intended to use 3 unification glyphs, each with a dedicated pair of atoms it was sensing for, but I abandoned this when the cost seemed like it would be over 2000, leading to a worse wsum.  It was also a nightmare to initialize.

I made an entire new design, still in the range of 700 cycles but a few dozen wsum better, primarily by saving cost.

But all of this was falling into the elemental comparator trap!  My earlier idea was actually better when you need to run a lot of comparisons against the same match pattern, because duplication exists.  Once per test atom in the match pattern, you do the awkward busywork of finding out the other 3 cardinals.  But then, each time you want to evaluate equality, you first duplicate the proper trio, and then it takes just one additional unification to get an answer!
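In logic terms, the trio trick looks roughly like this. It is only a sketch under the same assumed model (unification procs only on four distinct cardinals), and `trio_for` and `matches` are hypothetical helper names.

```python
# Assumed model: the four cardinal elements.
CARDINALS = frozenset({"air", "earth", "fire", "water"})

def trio_for(test_atom):
    """The one-time busywork: the 3 cardinals the test atom is NOT."""
    return CARDINALS - {test_atom}

def matches(candidate, trio):
    """One unification attempt: procs only if candidate completes the trio."""
    return trio | {candidate} == CARDINALS

# Set up once per test atom, then reuse (duplicate) the trio per comparison.
fire_trio = trio_for("fire")
results = [matches(atom, fire_trio) for atom in ("fire", "water", "air", "fire")]
```

Each entry in `results` costs one unification, no matter how many times the same trio gets reused.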

I made two designs based on this principle.  One of them had a big triangle, and the next much cheaper iteration had a pinwheel.  Both scored around 400 cycles and around 180 area, and the wsum got down under 800.

Instead of 216 comparisons, it takes just 84.  Four per test atom, and then one for each of the 12 comparisons needed on each of the 6 outputs.
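Here is the arithmetic spelled out, as a sketch. Two numbers in it are inferred rather than stated above: 3 unification attempts per comparison in the old design (216 / 72), and 3 test atoms in the match pattern (12 setup unifications at 4 each). Treat both as assumptions.

```python
OUTPUTS = 6
COMPARISONS_PER_OUTPUT = 12
comparisons = OUTPUTS * COMPARISONS_PER_OUTPUT  # 72 equality checks in total

# Old approach: assumed 3 unification attempts per comparison (inferred).
OLD_ATTEMPTS_PER_COMPARISON = 3
old_total = comparisons * OLD_ATTEMPTS_PER_COMPARISON

# New approach: 4 setup unifications per test atom, then 1 per comparison.
TEST_ATOMS = 3  # inferred from 84 - 72 = 12 = 4 * 3
SETUP_PER_TEST_ATOM = 4
new_total = TEST_ATOMS * SETUP_PER_TEST_ATOM + comparisons

assert old_total == 216
assert new_total == 84
```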

I'm pretty sure that you can still save a lot of area by adding cost, but I don't want to try to explore the tradeoffs in that dimension.  Every redesign is a new headache. 

My pinwheel looks cool, and so it is good enough for me.