Summary
When compiling SIMD f16 vector compares for WebAssembly, SelectionDAG type legalization can enter a non-converging/cyclic legalization path.
The problem seems to happen when a v4f16 compare is first split/scalarized, but then the compare result is re-promoted/rebuilt back into vector integer masks (v2i16 / v4i16 / v8i16) through generic legalization paths such as concat_vectors, vector_shuffle, extract_vector_elt, and BUILD_VECTOR.
This eventually creates a cycle in the DAG instead of making progress.
Reproducer
A minimal reproducer looks like this:
define void @f(ptr %out, float %x) {
  %h = fptrunc float %x to half
  %v = insertelement <4 x half> poison, half %h, i32 0
  %cmp = fcmp une <4 x half> %v, zeroinitializer
  %shuf = shufflevector <4 x i1> %cmp, <4 x i1> poison,
                        <4 x i32> zeroinitializer
  %res = uitofp <4 x i1> %shuf to <4 x float>
  store <4 x float> %res, ptr %out
  ret void
}
What seems to happen
The initial/optimized lowered DAG is still reasonable:
- v4i1 = setcc
- v4i1 = vector_shuffle
- v4f32 = uint_to_fp
However, during type legalization, the v4f16 inputs are split/scalarized, and then the compare result is rebuilt as wider integer vector masks. In the failing path, legalization eventually constructs nodes like:
t61: v8i16 = vector_shuffle<0,8,u,u,u,u,u,u> t79, t56
t79: v8i16 = BUILD_VECTOR t82, t84, ...
t82: i32 = extract_vector_elt t61, 0
t84: i32 = extract_vector_elt t61, 1
which creates a cycle:
t61 -> t79 -> t82 -> t61
t61 -> t79 -> t84 -> t61
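To make the cycle concrete, here is a small sketch that models only the operand edges quoted in the dump above and walks them depth-first. It is a toy model, not LLVM code: the `OPERANDS` table and the `find_cycles` helper are hypothetical names introduced for illustration, and `t56` is treated as a leaf because its operands are not shown in the dump.

```python
# Toy model of the operand edges from the dump (node -> its operands).
# Not LLVM code. t56 is assumed to be a leaf since its operands are not shown.
OPERANDS = {
    "t61": ["t79", "t56"],  # vector_shuffle t79, t56
    "t79": ["t82", "t84"],  # BUILD_VECTOR t82, t84, ...
    "t82": ["t61"],         # extract_vector_elt t61, 0
    "t84": ["t61"],         # extract_vector_elt t61, 1
    "t56": [],
}


def find_cycles(operands, start):
    """Depth-first walk from `start`; return each cycle found as a node list
    that begins and ends at the same node."""
    cycles = []

    def visit(node, path):
        if node in path:
            # Reached a node already on the current path: record the loop.
            cycles.append(path[path.index(node):] + [node])
            return
        for operand in operands.get(node, []):
            visit(operand, path + [node])

    visit(start, [])
    return cycles


cycles = find_cycles(OPERANDS, "t61")
for cycle in cycles:
    print(" -> ".join(cycle))
# prints:
# t61 -> t79 -> t82 -> t61
# t61 -> t79 -> t84 -> t61
```

A well-formed SelectionDAG is acyclic, so a walk like this should never find a node on its own operand path; here both extract_vector_elt nodes lead back to the shuffle that consumes them.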
At that point legalization stops making progress and eventually fails with "Operand not processed?" on the user of that chain.
Why I think this is specific to the SIMD path
If I compile the same pattern without +simd128, legalization does not rebuild the compare result back into v*i16 vector masks.
Instead, it keeps splitting/scalarizing:
- v4i1 -> v2i1 -> v1i1 -> i1
- v4f16 -> v2f16 -> v1f16
- the half comparison eventually becomes a soft-promoted/scalarized f32 compare
- the result is then converted and stored as scalars
That path converges cleanly and does not form cycles.
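The reason this path converges can be sketched as follows: every step strictly reduces the element count, so no type is ever revisited. This is only an illustrative model of the sequence listed above, not the legalizer's actual logic; `split_chain` is a hypothetical helper and it assumes a power-of-two element count.

```python
def split_chain(elem, count):
    """List the types visited when a vector of `count` elements of type
    `elem` is halved repeatedly and finally scalarized.
    Assumes `count` is a power of two."""
    steps = []
    while count >= 1:
        steps.append(f"v{count}{elem}")
        count //= 2  # each split halves the element count
    steps.append(elem)  # v1<elem> is scalarized to the bare element type
    return steps


mask_chain = split_chain("i1", 4)
print(" -> ".join(mask_chain))
# prints: v4i1 -> v2i1 -> v1i1 -> i1
```

For the f16 operands the model would end at a scalar f16, which per the observation above is then soft-promoted to an f32 compare. The SIMD path differs in that it moves back up to wider types (v2i16 / v4i16 / v8i16) after having split, which is what allows a type to be reached twice.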
So the issue does not seem to be the f16 comparison semantics themselves; it appears to be the SIMD legalization path re-promoting compare masks after splitting.
Suspected root cause
The root cause appears to be:
- The v4f16 compare inputs are split/scalarized during legalization.
- In the SIMD path, the setcc result is then re-promoted/rebuilt into integer vector masks (v2i16 / v4i16 / v8i16).
- Generic vector legalization paths for those masks (concat_vectors, vector_shuffle, extract_vector_elt, BUILD_VECTOR) can end up feeding rebuilt nodes back into themselves, producing a cycle.
In other words, the problem seems to lie not in setcc itself but in the later reassembly of a split/scalarized f16 compare result into vector masks through generic promotion/widening logic.
Possible fixes
I think the most promising fix direction is:
Option 1 (preferred)
Avoid re-promoting SIMD f16 compare results through the generic vector-mask legalization path.
For WebAssembly SIMD f16 compares, once legalization starts splitting/scalarizing the compare, keep following that direction (similar to the non-SIMD path) instead of rebuilding the result as v*i16 masks.
Notes from debugging
I instrumented legalization and observed that the cycle is detected immediately after rewriting the second extract_vector_elt from the widened setcc source. I also tried bypassing some rebuild paths, which changed the shape of the cycle but did not eliminate the underlying problem, suggesting the more fundamental issue is the generic re-promotion/rebuild of setcc-derived masks in the SIMD f16 path.
Related issues: #171908, #189251