Project Stage 2 - Troubleshooting

This blog post is for the SPO600 class I am taking at Seneca College. It relates to stage 2 of our final project, detailed here.

The post detailing my selection and general course of action for this project can be found here. The previous part of my project where I detail my implementation plan can be found here.

In my previous post I briefly touched on the roadblock I faced while trying to implement and test my changes to the RTM library. I'd like to explore in further detail the issues I ran into and the troubleshooting that came along with them.

Build Errors on AArch64

Building RTM for testing never succeeded on the CDOT AArch64 system "Israel". It did build, however, on my personal workstation. For reference, before getting into the errors encountered on Israel, here is what a successful build looks like:

After a hundred or so progress messages...

Looking great!

A clean build on my own system and a successful run of the tests had me feeling confident moving on to testing for AArch64 on the Israel system. This is what I ran into instead:

Fitting the entire build attempt in one image is a little tricky and makes it tough to read, so I will highlight the important part. The first error we run into is in the "scalar.h" file at line 1062, within the function "scalar_sincos". Here is where it goes wrong:

Specifically the line calling "vcreate_f32" is what is giving us trouble. There are a few things to identify here. What is this line trying to accomplish, what is the error, and what could we do about it?

First, this line is trying to create a two-lane vector of 32-bit float values from the scalarf (float) values sin and cos. Simple enough. What isn't simple is the operation performed on these values to make this possible. The code first aliases the two float values as unsigned 32-bit integers, then packs them into one unsigned 64-bit integer, which in theory allows the vcreate intrinsic to store them in one 32x2 float vector.
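To make the packing step concrete, here is a minimal sketch of the idea. It is not the RTM source: the helper name pack_two_floats is hypothetical, which float lands in which half is my assumption, and I copy the bits with memcpy instead of a pointer cast so the example is well-defined on any platform.

```cpp
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t), "sketch assumes 32-bit float");

// Hypothetical helper, not taken from RTM.
// Packs the raw bits of two floats into one 64-bit integer:
// 'lo' occupies bits 0-31 and 'hi' occupies bits 32-63.
inline std::uint64_t pack_two_floats(float lo, float hi)
{
    std::uint32_t lo_bits;
    std::uint32_t hi_bits;

    // Copy the bytes of each float into an integer of the same size.
    std::memcpy(&lo_bits, &lo, sizeof(lo_bits));
    std::memcpy(&hi_bits, &hi, sizeof(hi_bits));

    // Combine the two 32-bit patterns into a single 64-bit value.
    return static_cast<std::uint64_t>(lo_bits) |
           (static_cast<std::uint64_t>(hi_bits) << 32);
}
```

On AArch64, a 64-bit value built this way is the kind of argument vcreate_f32 accepts (it takes a uint64_t and returns a float32x2_t). For example, pack_two_floats(1.0f, 2.0f) gives 0x400000003F800000, since 1.0f is 0x3F800000 and 2.0f is 0x40000000.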

That brings us to the error itself. In truth it is a warning, but the compiler options have been set to treat warnings as errors, so here we are. The warning notes "Dereferencing a type-punned pointer will break strict-aliasing rules". There's a lot to work with here, so let's start with type-punning.

Type-punning in essence means looking at a piece of data stored as one type, and treating it as if it were another type. Here we are accessing the value of two floats as if they were uint32_t.

This operation is apparently breaking strict-aliasing rules, so what is strict aliasing? This is where it gets a little complicated for me, so I will boil it down to this: the compiler only allows us to access a stored value through certain types. In this case, accessing a float through a uint32_t is not a valid alias, so we are warned, and because of the compiler options that warning stops us in our tracks. To read up on strict aliasing rules and type-punning, this write-up by Shafik Yaghmour was very helpful to me.
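As a small illustration of the difference, here is a sketch of the kind of cast that triggers the warning (shown only in a comment) next to a commonly used alternative that copies the bytes instead. The function names float_bits and bits_to_float are my own for this example and do not come from RTM.

```cpp
#include <cstdint>
#include <cstring>

// The pattern the compiler warns about reads a float object through a
// uint32_t pointer, which is not an allowed alias for float:
//
//     std::uint32_t bits = *reinterpret_cast<std::uint32_t*>(&value);
//
// Copying the bytes avoids the problem because no float object is ever
// accessed through an incompatible type.

// Hypothetical helper: returns the raw bit pattern of a float.
inline std::uint32_t float_bits(float value)
{
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof(bits));
    return bits;
}

// Hypothetical helper: rebuilds a float from a raw bit pattern.
inline float bits_to_float(std::uint32_t bits)
{
    float value;
    std::memcpy(&value, &bits, sizeof(value));
    return value;
}
```

Compilers generally turn a fixed-size memcpy like this into a plain register move, so the safe version should not cost anything extra, though I have not measured that on Israel.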

The final step here is exploring the options to solve this issue. What can we do about it?

The simplest and maybe most reckless answer would be to disable the -Werror compiler option. It's only a warning, so why not just tell the compiler to stop worrying so much and keep moving? This is certainly an option but the ideal is no errors OR warnings when shipping a product. We want our solutions to be airtight and options like -Werror help us to accomplish this. Just out of curiosity though, we should give it a try and see if this at least gets it to compile.

Disabling -Werror is simple enough: finding the CMake file responsible for building the test program and commenting out the line that adds the option is quick and easy.

Changes made in "rtm/cmake/CMakeCompiler.cmake"

Now let's give it a run and see what we get!


More Errors

The build attempted after telling the compiler to stop treating warnings as errors revealed much greater issues with the RTM library on AArch64 systems. More conversion errors than I could count halt the compilation on AArch64. This problem was completely absent on my personal x86_64 machine, where the RTM test program compiled and passed its tests with no issues. The aliasing warning may be easy for someone to fix, but the many invalid conversions, seemingly isolated to AArch64, are problematic to say the least. There are clearly issues with building this library on AArch64 that need to be solved. I did reach out to the author of the library, Nicholas Frechette, and detailed the errors I was encountering.

It is worth noting that in my correspondence with Nicholas I was told RTM is being tested with GCC versions 5 through 10. The Israel system is currently running GCC 11. Perhaps for now this is isolated to this version of GCC on AArch64, and perhaps once RTM is tested with GCC 11 and onward these AArch64 build errors will be identified and solved.

That concludes my investigation of the errors preventing me from moving forward with my testing of SVE/SVE2 intrinsics! The end of the semester is closing in and researching build errors is not the purpose of this project! In my next post I will be concluding my project journey with an analysis of why SVE is perhaps not suitable for the purposes and intended use cases of RTM.
