WIP: clang build fixes and workarounds #16
experimental/bits/simd_x86.h
else
{
-  __builtin_memcpy(&__a, __mem + 16, 32);
+  __builtin_memcpy(&__a, static_cast<const char*>(__mem) + 16, 32);
My compiler flags this operation as always overflowing, since __a is an __m128i and therefore only 16 bytes.
This looks broken, yes. I'll take a look.
Yes, this needs to be 16. Seems I have no test coverage for this case 😉
There is already some value from clang then :)
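A minimal sketch of the corrected copy, assuming the intent is to load bytes 16 to 31 of a 32-byte buffer into one 16-byte register. To stay portable it uses std::array<unsigned char, 16> as a stand-in for __m128i, and load_upper_half is a hypothetical name. Taking the byte count from sizeof(dst) instead of a literal means the size cannot drift away from the destination type again:

```cpp
#include <array>
#include <cassert>
#include <cstring>

// Hypothetical stand-in for __m128i: any trivially copyable 16-byte object.
using Reg128 = std::array<unsigned char, 16>;
static_assert(sizeof(Reg128) == 16, "stand-in must match __m128i's size");

// Copies bytes [16, 32) of a 32-byte buffer into a 16-byte register.
// The byte count comes from sizeof(dst), so it can never exceed the target.
inline Reg128 load_upper_half(const void* mem) {
  assert(mem != nullptr);  // precondition: mem points to at least 32 bytes
  Reg128 dst;
  std::memcpy(&dst, static_cast<const char*>(mem) + 16, sizeof(dst));
  return dst;
}
```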
// __is_vectorizable {{{
template <typename _Tp>
-struct __is_vectorizable : public std::is_arithmetic<_Tp>
+struct __is_vectorizable : public std::is_arithmetic<std::remove_reference_t<_Tp>>
Why is this needed? _Tp should never be a reference, and references are not vectorizable. (Pointers might be, but that needs a proposal.)
For some reason, clang returns long long&& from operator[] in my example:
https://godbolt.org/z/WJN7_M
Then whatever calls __is_vectorizable<decltype(x[0])> is incorrect.
Yes, so a number of remove_reference_t calls will be needed in other places to work around that.
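To illustrate the point above: a trait built on std::is_arithmetic already rejects reference types, so feeding it decltype(x[0]) fails whenever operator[] returns a reference (decltype of a call that returns long long&& is long long&&). A minimal sketch, with hypothetical names is_vectorizable_sketch and element_t, that keeps the trait strict and strips the reference where the element type is computed:

```cpp
#include <type_traits>
#include <utility>

// Same shape as __is_vectorizable: reference types are deliberately rejected.
template <typename T>
struct is_vectorizable_sketch : std::is_arithmetic<T> {};

// Hypothetical helper: the element type of a subscriptable object of type V.
// std::decay_t strips the reference (and cv-qualifiers) from decltype(v[0]).
template <typename V>
using element_t = std::decay_t<decltype(std::declval<V&>()[0])>;
```

std::decay_t is used here because std::remove_cvref_t only exists from C++20 on.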
My hello world example, which I use to get an initial build working, has only 3 remaining errors. Are there any ideas on how to address them?
? __min_vector_size<_Tp>
: __next_power_of_2(_Np * sizeof(_Tp));
-using type [[__gnu__::__vector_size__(_Bytes)]] = _Tp;
+using type [[__gnu__::__vector_size__(_Bytes)]] = std::remove_reference_t<_Tp>;
Again, _Tp should never be a reference.
// split<_Sizes...>(simd) {{{
-template <size_t... _Sizes, typename _Tp, typename _Ap,
-          typename = enable_if_t<((_Sizes + ...) == simd<_Tp, _Ap>::size())>>
+template <size_t... _Sizes, typename _Tp, typename _Ap, typename>
Removing the SFINAE condition breaks the spec.
The SFINAE condition is still there in the declaration; this is the definition. Clang says it is a redefinition of the default argument.
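A minimal sketch of the pattern being discussed, with a hypothetical split_count standing in for split: the SFINAE default template argument appears only on the first declaration, and the out-of-line definition repeats the parameter list without it. Restating the default on the definition is what produces the redefinition error. is_viable is a hypothetical detection helper showing that the constraint still applies to calls:

```cpp
#include <cstddef>
#include <type_traits>

// Declaration: carries the constraint (here: the sizes must add up to 8).
template <std::size_t... Sizes, typename T,
          typename = std::enable_if_t<((Sizes + ... + 0) == 8)>>
constexpr std::size_t split_count(T);

// Definition: same parameter list, default argument not repeated.
template <std::size_t... Sizes, typename T, typename>
constexpr std::size_t split_count(T) {
  return sizeof...(Sizes);
}

// Detection: the int overload is removed by SFINAE when the call is invalid.
template <std::size_t... Sizes>
constexpr auto is_viable(int) -> decltype(split_count<Sizes...>(0), true) {
  return true;
}
template <std::size_t... Sizes>
constexpr bool is_viable(long) {
  return false;
}
```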
experimental/bits/simd_builtin.h
};
[[maybe_unused]] const auto __vi = __to_intrin(__v);
-auto&& __make_array = [](std::initializer_list<auto> __xs) {
+auto&& __make_array = [](auto __xs) {
This can't work: a braced initializer list argument cannot be deduced as an initializer_list for a plain auto parameter. But I have a fix coming up for this; Erich mentioned it yesterday.
Yes, we talked with Erich yesterday about that.
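A minimal sketch of the deduction rule, as far as I understand it: a braced list can deduce T for a parameter spelled std::initializer_list<T>, but not for a parameter whose type is a plain auto (or a plain template parameter T). std::initializer_list<auto> as a parameter type is, to my knowledge, not standard C++; C++20 template lambdas ([]<typename T>(std::initializer_list<T>)) would allow the named form. count_elements and make_array_sketch are hypothetical, and make_array_sketch is only one possible alternative, not necessarily the fix mentioned above:

```cpp
#include <array>
#include <cstddef>
#include <initializer_list>

// Deduces T from a braced list because the parameter names
// std::initializer_list<T>; a lambda taking `auto xs` rejects f({1, 2, 3}).
template <typename T>
constexpr std::size_t count_elements(std::initializer_list<T> xs) {
  return xs.size();
}

// Alternative: take the values as a pack (called as f(1, 2, 3), no braces)
// and let class template argument deduction build the std::array.
inline constexpr auto make_array_sketch = [](auto... xs) {
  return std::array{xs...};
};
```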
https://godbolt.org/z/NeTaNs: apparently only Clang rejects this pattern. That doesn't prove Clang is wrong, though. Not sure where to start in the standard 😉
Try this:
@@ -405,7 +405,12 @@ __is_neon_abi()
⸱
// }}}
// __make_dependent_t {{{
-template <typename, typename _Up> using __make_dependent_t = _Up;
+template <typename, typename _Up> struct __make_dependent
+{
+ using type = _Up;
+};
+template <typename _Tp, typename _Up>
+using __make_dependent_t = typename __make_dependent<_Tp, _Up>::type;
⸱
// }}}
// ^^^ ---- type traits ---- ^^^
That made it compile.
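Why the struct may help, as far as I understand it: a plain alias template that ignores its first parameter can be replaced by _Up right away, so __make_dependent_t<_Tp, _Up> may stop being a dependent type. Going through a member of a class template specialization keeps it dependent until _Tp is known. A minimal sketch with unreserved names; dependent_size is a hypothetical user of the helper:

```cpp
#include <cstddef>
#include <type_traits>

// Class-template form of the helper, mirroring the patch above.
template <typename, typename U>
struct make_dependent {
  using type = U;
};

// ::type is looked up in a specialization that depends on T, so inside a
// template the result stays dependent instead of collapsing to U early.
template <typename T, typename U>
using make_dependent_t = typename make_dependent<T, U>::type;

// Hypothetical use: tie an otherwise non-dependent type to T.
template <typename T>
constexpr std::size_t dependent_size() {
  return sizeof(make_dependent_t<T, long long>);
}
```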
Basic arithmetic ops seem to work. The next broken thing is here:
// _S_signmask, _S_absmask {{{
template <typename _V, typename = _VectorTraits<_V>>
static inline constexpr _V _S_signmask = __xor(_V() + 1, _V() - 1);
clang does not consider it a constant expression.
https://godbolt.org/z/RpWrqw
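The _S_signmask initializer relies on +1 and -1 differing only in the sign bit for floating-point element types, so XOR-ing their bit patterns leaves exactly that bit set. A scalar sketch for float, assuming 32-bit IEEE 754 floats; float_bits and signmask_float are hypothetical names, and std::memcpy stands in for the vector bitcast:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t), "assumes 32-bit float");

// Reinterprets a float's object representation as a 32-bit unsigned integer.
inline std::uint32_t float_bits(float f) {
  std::uint32_t u;
  std::memcpy(&u, &f, sizeof(u));
  return u;
}

// Scalar analogue of __xor(_V() + 1, _V() - 1) for float:
// +1.0f is 0x3F800000 and -1.0f is 0xBF800000, so their XOR is 0x80000000.
inline std::uint32_t signmask_float() {
  return float_bits(0.0f + 1) ^ float_bits(0.0f - 1);
}

// Usage check: only the sign bit survives the XOR.
inline void check_signmask() {
  assert(signmask_float() == 0x80000000u);
}
```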
That's an unfortunate restriction in Clang. The better solution would be if
Clang were to support constant expressions involving [[gnu::vector_size(N)]]
objects. (Note that I want to propose constexpr simd for inclusion in C++23
[1].) Until that happens, we should define a macro to turn this kind of
constexpr into a const.
[1] mattkretz/std-simd-feedback#14
experimental/bits/simd_detail.h
#if __clang__
#define _GLIBCXX_CONSTEXPR_SIMD
#else
#define _GLIBCXX_CONSTEXPR_SIMD constexpr
#endif
My naming convention has been to prefix everything with _GLIBCXX_SIMD_. I understand that naming the macro "constexpr simd" is meant to explain its purpose, but _GLIBCXX_SIMD_CONSTEXPR_SIMD would just be confusing again.
<bits/c++config> has _GLIBCXX_USE_CONSTEXPR which seems like a good fit here. I.e. name it _GLIBCXX_SIMD_USE_CONSTEXPR.
And define the __clang__ branch to const.
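A minimal sketch of the suggested macro, using the name proposed above and assuming the clang branch becomes plain const; mask_constants is a hypothetical user. The trade-off: with const, an initializer clang cannot constant-evaluate is accepted and may be initialized dynamically, but a const object of vector type then cannot be used in constant expressions:

```cpp
// Name modelled on _GLIBCXX_USE_CONSTEXPR, as suggested in the review.
#if defined(__clang__)
#define _GLIBCXX_SIMD_USE_CONSTEXPR const
#else
#define _GLIBCXX_SIMD_USE_CONSTEXPR constexpr
#endif

// Hypothetical class-scope constant declared through the macro: it expands to
// `static inline constexpr` with GCC and `static inline const` with clang.
template <typename T>
struct mask_constants {
  static inline _GLIBCXX_SIMD_USE_CONSTEXPR T all_bits = static_cast<T>(~T());
};
```

For integral types such as the ones in the check below, a const variable with a constant initializer is still usable in constant expressions, so both branches behave the same there.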
Interesting. Maybe you can check whether the _S_absmask constants are correct?
This is, IIRC, still just a loop over
__andnot(_S_signmask<_V>, _S_allbits<_V>); // does not work, and I have no idea why
auto a = _S_signmask<_V>;
auto b = _S_allbits<_V>;
__andnot(a, b); // works
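For checking the _S_absmask constants, a scalar sketch for float may help. It assumes __andnot(a, b) computes ~a & b (the operand order of the x86 ANDN instructions) and 32-bit IEEE 754 floats; andnot_bits, absmask_bits and abs_via_mask are hypothetical names. The expected absmask is 0x7FFFFFFF, and AND-ing with it clears only the sign bit, which computes the absolute value:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t), "assumes 32-bit float");

// Assumed semantics of __andnot(a, b): the bits of b that are not set in a.
constexpr std::uint32_t andnot_bits(std::uint32_t a, std::uint32_t b) {
  return ~a & b;
}

// Bit patterns of _S_signmask and _S_allbits for float, and the derived absmask.
constexpr std::uint32_t signmask_bits = 0x80000000u;
constexpr std::uint32_t allbits_bits = 0xFFFFFFFFu;
constexpr std::uint32_t absmask_bits = andnot_bits(signmask_bits, allbits_bits);
static_assert(absmask_bits == 0x7FFFFFFFu, "absmask clears only the sign bit");

// AND-ing a float's bits with the absmask clears its sign bit, i.e. |x|.
inline float abs_via_mask(float x) {
  std::uint32_t u;
  std::memcpy(&u, &x, sizeof(u));
  u &= absmask_bits;
  std::memcpy(&x, &u, sizeof(x));
  return x;
}

// Usage check.
inline void check_abs_via_mask() {
  assert(abs_via_mask(-2.5f) == 2.5f);
  assert(abs_via_mask(3.0f) == 3.0f);
}
```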
Last unresolved error preventing stdx::sin() from compiling:
/std-simd/experimental/bits/simd_builtin.h:1663:12: error: call to '__and' is ambiguous
return __and(__x._M_data, __y._M_data);
^~~~~
/std-simd/experimental/bits/simd.h:4569:14: note: in instantiation of function template specialization 'std::experimental::parallelism_v2::_SimdImplBuiltin<std::experimental::parallelism_v2::simd_abi::_VecBuiltin<8> >::__bit_and<int, 2>' requested here
_Impl::__bit_and(__data(__x), __data(__y)));
^
/std-simd/experimental/bits/simd_math.h:576:48: note: in instantiation of member function 'std::experimental::parallelism_v2::operator&' requested here
const auto __need_sin = (__f._M_quadrant & 1) == 0;
^
/std-simd/experimental/bits/simd_fixed_size.h:1632:37: note: in instantiation of function template specialization 'std::experimental::parallelism_v2::sin<double, std::experimental::parallelism_v2::simd_abi::_VecBuiltin<16> >' requested here
_GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, sin)
^
/std-simd/experimental/bits/simd_math.h:558:46: note: in instantiation of function template specialization 'std::experimental::parallelism_v2::_SimdImplFixedSize<4>::__sin<double, std::experimental::parallelism_v2::simd_abi::_VecBuiltin<16>, std::experimental::parallelism_v2::simd_abi::_VecBuiltin<16> >' requested here
return {__private_init, _Abi::_SimdImpl::__sin(__data(__x))};
^
/std-simd/experimental/bits/simd_math.h:564:6: note: in instantiation of function template specialization 'std::experimental::parallelism_v2::sin<double, std::experimental::parallelism_v2::simd_abi::_Fixed<4> >' requested here
sin(static_simd_cast<rebind_simd_t<double, _V>>(__x)));
^
/std-simd/experimental/bits/simd_fixed_size.h:1632:37: note: in instantiation of function template specialization 'std::experimental::parallelism_v2::sin<float, std::experimental::parallelism_v2::simd_abi::_VecBuiltin<16> >' requested here
_GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, sin)
^
/std-simd/experimental/bits/simd_math.h:558:46: note: in instantiation of function template specialization 'std::experimental::parallelism_v2::_SimdImplFixedSize<32>::__sin<float, std::experimental::parallelism_v2::simd_abi::_VecBuiltin<16>, std::experimental::parallelism_v2::simd_abi::_VecBuiltin<16>, std::experimental::parallelism_v2::simd_abi::_VecBuiltin<16>, std::experimental::parallelism_v2::simd_abi::_VecBuiltin<16>, std::experimental::parallelism_v2::simd_abi::_VecBuiltin<16>, std::experimental::parallelism_v2::simd_abi::_VecBuiltin<16>, std::experimental::parallelism_v2::simd_abi::_VecBuiltin<16>, std::experimental::parallelism_v2::simd_abi::_VecBuiltin<16> >' requested here
return {__private_init, _Abi::_SimdImpl::__sin(__data(__x))};
^
my_test.cpp:69:39: note: in instantiation of function template specialization 'std::experimental::parallelism_v2::sin<float, std::experimental::parallelism_v2::simd_abi::_Fixed<32> >' requested here
print_simd("stdx::sin( fa )", stdx::sin( fa ) );
^
/std-simd/experimental/bits/simd.h:1615:1: note: candidate function [with _Tp = __attribute__((__vector_size__(2 * sizeof(int)))) int, _TVT = std::experimental::parallelism_v2::_VectorTraitsImpl<__attribute__((__vector_size__(2 * sizeof(int)))) int, void>, _Dummy = <>]
__and(_Tp __a, typename _TVT::type __b, _Dummy...) noexcept
^
/std-simd/experimental/bits/simd.h:1626:1: note: candidate function [with _Tp = __attribute__((__vector_size__(2 * sizeof(int)))) int, $1 = __attribute__((__vector_size__(2 * sizeof(int)))) int]
__and(_Tp __a, _Tp __b) noexcept
^
This smells like a compiler bug: https://godbolt.org/z/SYU0Fp
Interesting. Clang and GCC disagree on how overload resolution works here.
OK, my stdx::sin example works and produces reasonable numbers. I got a number of similar warnings, though:
In file included from /std-simd/experimental/simd:62:
/std-simd/experimental/bits/simd_x86.h:2452:25: warning: '__builtin_is_constant_evaluated' will always evaluate to 'true' in a manifestly constant-evaluated expression [-Wconstant-evaluated]
else if constexpr (!__builtin_is_constant_evaluated() && sizeof(__x) == 8) // {{{
^
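The warning is accurate: the condition of an if constexpr is itself manifestly constant-evaluated, so __builtin_is_constant_evaluated() is always true there and the !__builtin_is_constant_evaluated() branch can never be taken. The builtin only distinguishes compile-time from run-time evaluation in a plain if inside a constexpr function. A minimal sketch, assuming GCC 9+ or Clang 9+ for the builtin; pick_path and check_runtime_path are hypothetical names:

```cpp
#include <cassert>

// Returns 1 when evaluated at compile time and 2 when evaluated at run time.
// `if constexpr (__builtin_is_constant_evaluated())` would instead always take
// the first branch (and triggers clang's -Wconstant-evaluated).
constexpr int pick_path() {
  if (__builtin_is_constant_evaluated())
    return 1;
  else
    return 2;
}

// Compile time: static_assert forces constant evaluation.
static_assert(pick_path() == 1, "constant-evaluated path");

// Run time: the volatile operand keeps the initializer out of constant
// evaluation, so the builtin yields false inside pick_path().
inline void check_runtime_path() {
  volatile int zero = 0;
  int r = pick_path() + zero;
  assert(r == 2);
}
```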
Tried to compile the tests:
for testfile in *.cpp; do
  $CXX -std=c++17 -Ivirtest -D__remove_cvref_t=std::remove_cvref_t \
    -Wno-everything \
    -D_GLIBCXX_SIMD_ABI=__sse "$testfile"
done
Those tests compiled:
Those tests require the __make_array workaround:
Internal compiler error:
Other reasons:
2020.05.15: Updated based on the __andnot fix
BTW, just so you're aware: I'm currently working on integrating this repo into libstdc++, and the relevant repo is mattkretz/gcc, branch mkretz/simd2. However, I regularly force-push to that branch, so don't switch to working there.
Then I'll continue trying to build it here. Here is an error I get from the tests, which I have reproduced with the following example:
Update: resolved.
I am observing strange behavior with
Here the ABI is deduced as
void foo(stdx::simd_mask<long double> a);
But that thing is deduced to
void foo() {
  stdx::simd_mask<long double>();
}
Looks like that:
The error you see from constructing an object of type
At the moment I see the weirdest behavior. In the same compilation unit
OK,
@mattkretz, do you happen to think of
The next thing I hit is the missing unary not operator (!) for vector types in clang. This fires in the operators test:
template <typename _Tp, size_t _Np>
_GLIBCXX_SIMD_INTRINSIC static constexpr _MaskMember<_Tp>
__negate(_SimdWrapper<_Tp, _Np> __x) noexcept
{
  return __vector_bitcast<_Tp>(!__x._M_data);
}
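A possible workaround, assuming the GCC/Clang vector extensions: GCC documents !v on vectors as equivalent to v == 0, which yields -1 (all bits set) in true lanes and 0 otherwise, so an explicit comparison with a zero vector produces the same mask without the unary operator. v4si, logical_not and check_logical_not are hypothetical names; for floating-point vectors the comparison yields an integer mask, which is presumably where a __vector_bitcast like the one in __negate would come in:

```cpp
#include <cassert>

// GCC/Clang vector extension type: four 32-bit ints in 16 bytes.
typedef int v4si __attribute__((vector_size(16)));

// Element-wise logical not without unary `!`: comparing with a zero vector
// yields -1 (all bits set) in lanes where x is 0 and 0 in all other lanes.
inline v4si logical_not(v4si x) {
  const v4si zero = {};
  return x == zero;
}

// Usage check: lanes 0 and 2 are zero, so only they become -1.
inline void check_logical_not() {
  const v4si in = {0, 5, 0, -3};
  const v4si r = logical_not(in);
  assert(r[0] == -1 && r[1] == 0 && r[2] == -1 && r[3] == 0);
}
```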
I think I've fixed or worked around all known compilation problems up to Haswell. Now I need to run the tests. Compiling the tests directly does not work well: the compiler fails when instantiating the huge test template functions:
virtest/vir/test.h:1000:12: instantiating function definition {and here is a function declaration 200KB long}
Is there a way to run the test system for only one instruction set (e.g. SSE) and/or one data type (e.g. double)?
Sorry, I dropped out for a few days because of other work and a public holiday here. The first thing I'll do is merge my gcc branch back here for convenience. That'll be a patch drop, but it should help us duplicate less work.
This is an initial list of changes needed to make the library buildable with clang.
The list of fixes is not complete; this PR is here to share early findings, which may be useful anyway.