Project Stage 2 - Implementation

This blog post is for the SPO600 class I am taking at Seneca College, and it covers stage 2 of our final project, detailed here.

The post detailing my selection and general course of action for this project can be found here. To summarize, I selected Realtime Math (RTM) to work with and, because it uses ARM Neon intrinsics, decided to investigate adding support for SVE/SVE2 intrinsics.

To begin, I searched the library for a section that uses the ARM Neon intrinsics. This was simple enough: the "Math.h" file contains code that detects which intrinsics to use based on the compilation environment. If an ARM64 architecture is detected, "RTM_NEON_INTRINSICS" is defined, flagging the Neon intrinsics for use in that environment. Once I had discovered that, all it took was searching the library for any code that is only compiled when "RTM_NEON_INTRINSICS" is defined.

These are the results:

 

There are lots of great places to start that check for the ability to use Neon intrinsics. The one that caught my eye was "vector_common.h", so I decided to start there. The next step was finding where the Neon intrinsics are used, and what they are used for.

Inside "vector_common.h" I found this function right at the top:

At first this function was very intimidating to me, but it simply creates a four-lane vector of 32-bit float values. Let's break it down, focusing on the information that matters for my modifications.

First, what is this function returning? It returns an alias type defined in the file "types.h". This is yet another part of the library that changes depending on the compilation environment. It is important to note that on an ARM64 architecture, where we have access to Neon intrinsics, this type is "float32x4_t".

Next, which Neon intrinsics are being used to create this vector? The answer is a combination of vcreate_f32 and vcombine_f32. First, vcreate_f32 creates a two-lane vector of 32-bit float values (float32x2_t), and it is called twice. To place two float values into one float32x2_t vector, the bits of each float are reinterpreted as a 32-bit integer, shifted, and ORed into a single uint64_t value, with one float occupying the lower 32 bits and the other the upper 32. Once both of these vectors are created, vcombine_f32 combines the two float32x2_t vectors into one float32x4_t vector.
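To make the packing step concrete, here is a minimal sketch of the idea rather than RTM's actual code. The helper names (float_bits, pack_two_floats, make_vector_neon) are my own inventions for illustration, and I use memcpy instead of a pointer cast to reinterpret the float bits. The Neon part only compiles where Neon is available, and the sketch assumes IEEE-754 32-bit floats.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

static_assert(sizeof(float) == sizeof(std::uint32_t), "sketch assumes 32-bit floats");

// Reinterpret the bits of a float as a 32-bit unsigned integer.
// memcpy avoids the strict-aliasing problems of a pointer cast.
inline std::uint32_t float_bits(float value)
{
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof(bits));
    return bits;
}

// Pack two floats into one 64-bit value: `lo` in bits 0-31, `hi` in bits 32-63.
inline std::uint64_t pack_two_floats(float lo, float hi)
{
    return static_cast<std::uint64_t>(float_bits(lo))
         | (static_cast<std::uint64_t>(float_bits(hi)) << 32);
}

#if defined(__ARM_NEON)
// Hypothetical helper (not an RTM function), compiled only with Neon.
inline float32x4_t make_vector_neon(float x, float y, float z, float w)
{
    const float32x2_t xy = vcreate_f32(pack_two_floats(x, y)); // two lanes: x, y
    const float32x2_t zw = vcreate_f32(pack_two_floats(z, w)); // two lanes: z, w
    return vcombine_f32(xy, zw);                                // four lanes: x, y, z, w
}
#endif

// Small usage example: 1.0f is 0x3F800000 and 2.0f is 0x40000000 in IEEE-754.
inline void packing_example()
{
    assert(pack_two_floats(1.0f, 2.0f) == 0x400000003F800000ull);
}
```

The first argument lands in the low half of the 64-bit value, which is why x and z end up in the lower lane of their respective two-lane vectors.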

How can this be changed to use SVE? My goal here would be to simply change this function to return an SVE sizeless vector (svfloat32_t) created with the four 32-bit float values provided to the function.

First and foremost is detecting support for SVE and following the conventions of RTM. Inside "Math.h" this change is made to start:


This enables SVE/SVE2 support when the GCC compiler option "-march=armv8-a+sve2" is specified, indicating the desire to build code that includes SVE2 instructions.
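As a rough sketch of what this kind of detection can look like: when SVE or SVE2 is enabled for the target, the compiler predefines the ACLE feature macros __ARM_FEATURE_SVE and __ARM_FEATURE_SVE2. The macro names RTM_SVE_INTRINSICS and RTM_SVE2_INTRINSICS below are hypothetical, chosen only to mirror "RTM_NEON_INTRINSICS"; they are not part of RTM, and the two helper functions exist purely for illustration.

```cpp
#include <cassert>

// Hypothetical macro names modelled on RTM_NEON_INTRINSICS (not part of RTM).
// __ARM_FEATURE_SVE / __ARM_FEATURE_SVE2 are predefined by the compiler when
// the target enables those extensions, e.g. with -march=armv8-a+sve2.
#if defined(__ARM_FEATURE_SVE)
    #define RTM_SVE_INTRINSICS
#endif
#if defined(__ARM_FEATURE_SVE2)
    #define RTM_SVE2_INTRINSICS
#endif

#if defined(RTM_SVE_INTRINSICS)
    #include <arm_sve.h> // SVE intrinsics and the sizeless vector types
#endif

// Illustration-only helpers that report what was detected at compile time.
constexpr bool sve_intrinsics_enabled()
{
#if defined(RTM_SVE_INTRINSICS)
    return true;
#else
    return false;
#endif
}

constexpr bool sve2_intrinsics_enabled()
{
#if defined(RTM_SVE2_INTRINSICS)
    return true;
#else
    return false;
#endif
}

inline void detection_example()
{
    // SVE2 builds on SVE, so the SVE2 flag should never be set on its own.
    assert(!sve2_intrinsics_enabled() || sve_intrinsics_enabled());
}
```

On a machine or build without SVE enabled, both helpers simply return false and the rest of the library would fall back to its existing code paths.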

Next, since I want to return a sizeless vector, the return type for vector_set must be changed to "svfloat32_t". This is accomplished in the "types.h" file mentioned earlier. In this file where the alias "vector4f" is defined as "float32x4_t" for machines supporting ARM Neon, it would instead be defined as "svfloat32_t" on machines supporting SVE/SVE2.
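The alias swap could be sketched as follows. The alias name vector4f_sketch and the function float_lane_count are hypothetical and stand in for whatever "types.h" actually defines. The point of the sketch is what "sizeless" means in practice: the number of float lanes in an svfloat32_t is only known at run time (svcntw reports it), while the Neon type always has four.

```cpp
#include <cassert>
#include <cstdint>

#if defined(__ARM_FEATURE_SVE)
    #include <arm_sve.h>
    // Sizeless SVE vector: its width depends on the hardware.
    using vector4f_sketch = svfloat32_t;
    inline std::uint64_t float_lane_count() { return svcntw(); } // 32-bit lanes per vector
#elif defined(__ARM_NEON)
    #include <arm_neon.h>
    // Fixed 128-bit Neon vector: always four float lanes.
    using vector4f_sketch = float32x4_t;
    inline std::uint64_t float_lane_count() { return 4; }
#else
    // Plain scalar stand-in so the sketch also builds on other machines.
    struct vector4f_sketch { float x, y, z, w; };
    inline std::uint64_t float_lane_count() { return 4; }
#endif

inline void lane_count_example()
{
    // SVE vector lengths are multiples of 128 bits, so there are always at
    // least four float lanes, and the count is a multiple of four.
    assert(float_lane_count() >= 4);
    assert(float_lane_count() % 4 == 0);
}
```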

Finally, a sizeless vector has to be created out of the four float values provided. The SVE intrinsic svdupq_n_f32 can accomplish this. The adjusted code ends up looking like this:


The modification is simple: check for SVE support, and use the SVE intrinsic svdupq_n_f32 to create a sizeless vector from the four 32-bit float values.
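In sketch form, and with the caveat that I have not been able to run the SVE path on real hardware, the idea looks roughly like this. The names vector_set_sketch and first_four_lanes are hypothetical and not RTM functions, and the non-SVE branch is a plain scalar stand-in so the example builds anywhere. Note that svdupq_n_f32 builds a 128-bit group from the four values and repeats it across the whole vector, so on hardware with vectors wider than 128 bits the four floats appear more than once.

```cpp
#include <array>
#include <cassert>

#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>

// SVE path: replicate the 128-bit group {x, y, z, w} across the vector.
inline svfloat32_t vector_set_sketch(float x, float y, float z, float w)
{
    return svdupq_n_f32(x, y, z, w);
}

// Store only the first four 32-bit lanes so they can be inspected.
inline std::array<float, 4> first_four_lanes(svfloat32_t v)
{
    std::array<float, 4> out{};
    svst1_f32(svptrue_pat_b32(SV_VL4), out.data(), v);
    return out;
}
#else
// Scalar stand-in used when SVE is not enabled for the build.
using fallback_vector = std::array<float, 4>;

inline fallback_vector vector_set_sketch(float x, float y, float z, float w)
{
    return {{x, y, z, w}};
}

inline std::array<float, 4> first_four_lanes(const fallback_vector& v)
{
    return v;
}
#endif

inline void vector_set_example()
{
    const auto lanes = first_four_lanes(vector_set_sketch(1.0f, 2.0f, 3.0f, 4.0f));
    assert(lanes[0] == 1.0f && lanes[1] == 2.0f);
    assert(lanes[2] == 3.0f && lanes[3] == 4.0f);
}
```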

Testing

Unfortunately, this is where my project takes a turn for the worse. After planning my approach to these modifications and building RTM on my own machine, I tried to build it (without my modifications) on the Israel ARM64 system, and the build failed with puzzling errors.

These errors were unexpected, to say the least, and failing to build an unmodified version of RTM threw a major wrench in my plans for testing my proposed changes. After reaching out to the author of RTM, it became clear that I will not be able to build RTM on Israel to test my changes. This means I am unable to verify that they work, so for this stage of my project I merely have theory and proposed changes.

What's Next?

What this endeavour DID provide me, however, was a dialogue with Nicholas Frechette, the author of RTM. When I reached out to him and explained my goal of adding SVE/SVE2 support to RTM, he provided me with an alternative topic for the final stage of my project. Nicholas explained to me that an SVE implementation for RTM isn't necessary! In light of my inability to properly implement and test my proposed changes, the final stage of my project will instead explore why SVE is not the right fit for RTM.



