While likely not viable in software, it seem to absolutely be possible to optimize hardware to use fewer xors in computing the exponent of a multiplication. While it doesn't help with carries, addition is largely xors, which is commutative and associative, so
s xor a xor b xor c xor s
can reduce to
a xor b xor c
.
I don't know how FPGAs work, but they seem to use LUTs more than primitive logic gates, so maybe this doesn't actually help in that context.