Sort accumulating operands onto the LHS of commutative operations

Description

For all SSE rules we play the following game:

r: FLOAT_ADD(r, r)
p.child1.isREGISTERNode() ? 11 : 13
EMIT_INSTRUCTION
SSE2_COP(IA32_ADDSS, P(p), Binary.getResult(P(p)), Binary.getVal1(P(p)), Binary.getVal2(P(p)));

r: FLOAT_ADD(r, r)
p.child2.isREGISTERNode() ? 11 : 13
EMIT_INSTRUCTION
SSE2_COP(IA32_ADDSS, P(p), Binary.getResult(P(p)), Binary.getVal2(P(p)), Binary.getVal1(P(p)));

that is, we make a rule cheaper (11 instead of 13) when the child it places on the LHS is truly a register. I believe the rationale is to have a register that accumulates the result. We don't play the same game for integer operations, for example:

czr: INT_ADD(r, riv)
13
EMIT_INSTRUCTION
EMIT_Commutative(IA32_ADD, P(p), Binary.getResult(P(p)), Binary.getVal1(P(p)), Binary.getVal2(P(p)));

which, if recoded using the SSE-style scheme, would be:

czr: INT_ADD(r, riv)
p.child1.isREGISTERNode() ? 11 : 13
EMIT_INSTRUCTION
EMIT_Commutative(IA32_ADD, P(p), Binary.getResult(P(p)), Binary.getVal1(P(p)), Binary.getVal2(P(p)));

czr: INT_ADD(r, riv)
p.child2.isREGISTERNode() ? 11 : 13
EMIT_INSTRUCTION
EMIT_Commutative(IA32_ADD, P(p), Binary.getResult(P(p)), Binary.getVal2(P(p)), Binary.getVal1(P(p)));

It strikes me that making these cost methods dynamic isn't a good thing. Instead, maybe we could make a pass over the IR that ensures true register operands occur on the LHS. This would reduce the number of rules by half and remove a runtime cost in BURS. If this doesn't make sense, then we should consider adding the SSE-style isREGISTERNode dynamic costs to the integer operations.
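A minimal sketch of what such a pass might look like, written against a toy model rather than the real IR. Everything here is an assumption for illustration: the Operand and BinaryInstr classes, the isRegister flag (standing in for whatever "truly a register" turns out to mean before BURS tree building), and the set of operators treated as commutative are hypothetical and are not the Jikes RVM classes.

```java
import java.util.ArrayList;
import java.util.List;

class CommutativeCanonicalizer {

  /** Hypothetical operand: a name plus a flag saying whether it is a plain register. */
  static final class Operand {
    final String name;
    final boolean isRegister;

    Operand(String name, boolean isRegister) {
      this.name = name;
      this.isRegister = isRegister;
    }
  }

  /** Hypothetical binary instruction: operator(val1, val2). */
  static final class BinaryInstr {
    final String operator;
    Operand val1;
    Operand val2;

    BinaryInstr(String operator, Operand val1, Operand val2) {
      this.operator = operator;
      this.val1 = val1;
      this.val2 = val2;
    }

    @Override
    public String toString() {
      return operator + "(" + val1.name + ", " + val2.name + ")";
    }
  }

  /** Assumed set of commutative operators; only the two named in this issue. */
  static boolean isCommutative(String operator) {
    return operator.equals("INT_ADD") || operator.equals("FLOAT_ADD");
  }

  /**
   * Swaps the operands of a commutative instruction when only the second
   * one is a register, so that the register ends up on the LHS.
   * Returns the number of instructions that were changed.
   */
  static int canonicalize(List<BinaryInstr> instrs) {
    int swaps = 0;
    for (BinaryInstr s : instrs) {
      if (isCommutative(s.operator) && !s.val1.isRegister && s.val2.isRegister) {
        Operand tmp = s.val1;
        s.val1 = s.val2;
        s.val2 = tmp;
        swaps++;
      }
    }
    return swaps;
  }

  public static void main(String[] args) {
    Operand r1 = new Operand("r1", true);
    Operand k = new Operand("42", false);
    List<BinaryInstr> code = new ArrayList<>();
    code.add(new BinaryInstr("INT_ADD", k, r1)); // candidate for a swap
    code.add(new BinaryInstr("INT_SUB", k, r1)); // not commutative, left alone
    int swaps = canonicalize(code);
    System.out.println(swaps + " swap(s): " + code);
  }
}
```

With the operands normalised like this ahead of instruction selection, a single static-cost rule per operation would presumably be enough, since the register case always appears in the same position.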

Environment

None

Assignee

Ian Rogers

Reporter

Ian Rogers

Labels

None

Fix versions

Priority

Medium