Update: A D'Oh moment, and the curious case of static class members.

     So last night I was lying in bed, thinking about the myriad issues I had been having with functions pointing to member objects while multiple threads tried to assign values to those objects, when I realized something.  The part I was attempting to parallelize was a waste of effort because it had multiple serial functions embedded in it.  What actually made sense was to move further into the code: instead of trying to parallelize the flux calculation in time, parallelize the spatial calculations, since those do not share the same dependencies on member objects.
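
     For concreteness, here is a rough sketch of the shape I have in mind, with entirely made-up names (Grid, computeSpatialFlux, spatialFluxKernel) rather than this project's actual code: the time stepping stays serial on the host, while each step launches the independent per-cell spatial work across GPU threads.

    #include <cuda_runtime.h>

    // Hypothetical stand-in for the simulation class: the per-cell spatial
    // calculation is tagged so it can run on both host and device.
    class Grid {
    public:
        int nCells;
        __host__ __device__ double computeSpatialFlux(int i) const {
            (void)i;        // placeholder for the real per-cell physics
            return 0.0;
        }
    };

    // The kernel lives at file scope and receives a pointer to
    // device-accessible memory; each thread handles one cell independently.
    __global__ void spatialFluxKernel(const Grid* g, double* flux) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < g->nCells) {
            flux[i] = g->computeSpatialFlux(i);
        }
    }

    // Host side, schematically: the time loop stays serial and launches the
    // spatial kernel once per step.
    //   for (int n = 0; n < nSteps; ++n) {
    //       spatialFluxKernel<<<blocks, threadsPerBlock>>>(d_grid, d_flux);
    //       cudaDeviceSynchronize();
    //   }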

     This morning has consisted of implementing that change: tagging the required functions as device-runnable code, going back and catching typos, and then relocating the kernel, since __global__ functions cannot be member functions of a class.  I was feeling great about this and was quite confident as I keyed in the make arguments, when...


Whoops...


     Turns out that static class members have their memory allocated by the compiler (this is what I have read; I am not a computer scientist), and there is no compile-time way to fold that allocation into unified memory.  I am currently looking at ways to move the data into something the device can play with, but won't break any host-side calls to the array.  Positive steps, though.
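
     One option I am considering (a sketch of a common CUDA unified-memory pattern, not this project's actual code) is to keep the data out of static storage altogether: give the class overloaded new and delete that allocate through cudaMallocManaged, so instances live in unified memory and the same pointer works from both host and device code.

    #include <cuda_runtime.h>
    #include <cstddef>

    // Base class whose heap allocations land in unified (managed) memory,
    // visible to both the host and the device through the same pointer.
    class Managed {
    public:
        void* operator new(std::size_t len) {
            void* ptr = nullptr;
            cudaMallocManaged(&ptr, len);
            cudaDeviceSynchronize();
            return ptr;
        }
        void operator delete(void* ptr) {
            cudaDeviceSynchronize();
            cudaFree(ptr);
        }
    };

    // Hypothetical replacement for the troublesome static member array:
    // an ordinary heap object that happens to live in managed memory.
    class FluxTable : public Managed {
    public:
        double values[1024];
    };

     Host-side code would then allocate with new FluxTable as usual, while a kernel would receive the same pointer as an argument; whether that plays nicely with the existing host-side calls to the array is exactly what I still need to check.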

Cheers,

TEJ
