Non-SIMD math instructions are missing. #344

mcourteaux · 2025-02-21T09:38:15Z

I was looking at some performance metrics coming from vg-renderer, and it's spending a considerable amount of time in vg::vec2Dir, for example. Looking into that calculation, I saw that it calls bx::rsqrt, which seems like 100% the correct thing to do:

// Direction from a to b
inline Vec2 vec2Dir(const Vec2& a, const Vec2& b)
{
	const float dx = b.x - a.x;
	const float dy = b.y - a.y;
	const float lenSqr = dx * dx + dy * dy;
	const float invLen = lenSqr < VG_EPSILON ? 0.0f : bx::rsqrt(lenSqr);
	return{ dx * invLen, dy * invLen };
}

However, bx::rsqrt() does not have an implementation mapping to _mm_rsqrt_ss for x64 (SSE).
There is a mechanism that splats the value over the whole vector, compiles to a _mm_rsqrt_ps, and then extracts one element, which I'd consider wasteful, and which prevents the compiler or the micro-architecture from potentially vectorizing this.
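To make the difference concrete, here is a minimal sketch of the two paths, assuming an x86 target with SSE. The helper names rsqrtSplat and rsqrtScalar are hypothetical and are not part of bx; both return the raw hardware approximation (roughly 12 bits of precision) without any refinement step, and both fall back to 1.0f / std::sqrt on targets without SSE.

```cpp
#include <cmath>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#	include <xmmintrin.h>
#	define RSQRT_SKETCH_HAS_SSE 1
#else
#	define RSQRT_SKETCH_HAS_SSE 0
#endif

// Hypothetical helper (not bx API): broadcast the scalar to all four lanes,
// run the packed approximation, then read back lane 0.
inline float rsqrtSplat(float a)
{
#if RSQRT_SKETCH_HAS_SSE
	const __m128 aa    = _mm_set1_ps(a);   // splat a into all 4 lanes
	const __m128 rsqrt = _mm_rsqrt_ps(aa); // approximate 1/sqrt on all 4 lanes
	return _mm_cvtss_f32(rsqrt);           // extract lane 0
#else
	return 1.0f / std::sqrt(a);            // portable fallback, assumption for non-SSE targets
#endif
}

// Hypothetical helper (not bx API): place the scalar in lane 0 only and use
// the scalar form of the instruction.
inline float rsqrtScalar(float a)
{
#if RSQRT_SKETCH_HAS_SSE
	const __m128 aa    = _mm_set_ss(a);    // lane 0 = a, upper lanes = 0
	const __m128 rsqrt = _mm_rsqrt_ss(aa); // approximate 1/sqrt on lane 0 only
	return _mm_cvtss_f32(rsqrt);           // extract lane 0
#else
	return 1.0f / std::sqrt(a);            // portable fallback, assumption for non-SSE targets
#endif
}
```

Both helpers should agree to within the approximation error of the instruction; whether the scalar form is measurably faster depends on the compiler and the micro-architecture, so it is worth profiling rather than assuming.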

So, I'm thinking: those functions are missing? But I see that they don't really have a place right now, as all of the files are named simd_xxx, and this is an example of a non-SIMD instruction.


mcourteaux · 2025-02-21T15:48:30Z

For reference, I PR'd this in vg-renderer to work around this issue, and overall improve performance: jdryg/vg-renderer#43

bkaradzic · 2025-02-21T17:08:46Z

Code you're talking about is here:

bx/include/bx/inline/math.inl

Lines 768 to 780 in 8e9a998

    
inline BX_CONSTEXPR_FUNC float rsqrt(float _a)
{
#if BX_SIMD_SUPPORTED
	if (isConstantEvaluated() )
	{
		return rsqrtRef(_a);
	}
	return rsqrtSimd(_a);
#else
	return rsqrtRef(_a);
#endif // BX_SIMD_SUPPORTED
}

SIMD implementation here:

bx/include/bx/inline/math.inl

Lines 718 to 736 in 8e9a998

    
inline BX_CONST_FUNC float rsqrtSimd(float _a)
{
	if (_a < kFloatSmallest)
	{
		return kFloatInfinity;
	}
	const simd128_t aa = simd_splat(_a);
#if BX_SIMD_NEON
	const simd128_t rsqrta = simd_rsqrt_nr(aa);
#else
	const simd128_t rsqrta = simd_rsqrt_ni(aa);
#endif // BX_SIMD_NEON
	float result = 0.0f;
	simd_stx(&result, rsqrta);
	return result;
}

There is a mechanism that splats the value over the whole vector and compiles in a _mm_rsqrt_ps and then extracts one element, which I'd consider wasteful, and prevents the compiler, or micro-architectures from potentially vectorizing this.

You need to load the float into a SIMD register somehow, and splat is one way to do it. One component is extracted from the SIMD register because the expected result is a float.

Ideally, the SIMD functions in your vg-renderer PR should call the bx SIMD functions instead of using SSE intrinsics directly.
