Skip to main content

Here things are.

     Okay, this is the inaugural post of the Afterglow project development blog.  I have been pushing hard the last couple of weeks to get the CUDA-accelerated version of boxfit to into a state where it can be alpha-tested.  The main issue is in trying to deal with the way variables are allocated in classes.  There are a lot of functions that call class objects which are just doubles or ints, with the idea being that they can be passed between functions without actually having to be explicitly passed.

     This works quite well when each object is allocated and requested sequentially for a thread, or when each thread has an independent instance of that object.  The problem with porting the code to CUDA is that the objects become shared among the threads, so each thread attempts to assign its own value to the variable and everything would go up in flames were it not for the compiler catching the code as incompatible.

     I made a video talking about this which I may or may not post because it rambles quite a bit, but I am hoping to have a working code tomorrow, as well as an in-depth discussion of how I got around this and a few other issues.  But for now, it is back to building the websites and scratching my head as I stare at terminals.

Cheers,

TEJ

Comments

Popular posts from this blog

Prototyping some CUDA code

After a few false starts, and at a bit of a slowed pace do to other academic responsibilities created some working CUDA code.  The static member class is still causing some issues, but the workaround was to create a new array and overload the new and delete operators to allow CUDA to handle moving memory to and from the GPU.  It is a definite step forward, but there are still several issues.  GPU utilization sits quite low except for specific moments and I think that is due to a lot of the radiation calculations being on the CPU.  As you can see, the GPU does eventually catch up with the serial code, but the divisions being noted are in terms of the number of rays used for the two dimensional radiative transfer calculations.  The crossover point occurs somewhere near 1000 radial and 1000 azimuthal rays, which is not particularly useful in practice.  I am pleased with the start, though, as we now have compiling and running GPU code.  The next task...

So that static array is actually really cool... but not super useful in CUDA

     So that static array I mentioned last post just got a lot more interesting.  At first, I thought it was an array of values that contained information about the Equidistant Surface for doing calculations on how much flux the observer actually sees.  It does do this, but the way it does it is very different than what I thought.  My intuition was that each array element was some value, and that it was static for ease of access by mulitiple programs.  What it actually is, is an array of a struct that contains several doubles.      What this means is you specify values for each struct object for each array point, creating a collection of parameters for for every point in discrete space.  It is very cool indeed, but I am now even more unsure of how to copy this static object to the GPU.  I don't want to dismantle it, as it is such an elegant solution to the problem of defining all these variables for each point, but I have...

An update on the static class member

    Okay, so I have started my teaching duties for the summer, I have also started working with the original code to add inverse-Compton scatter--This may be helpful for finally finishing up the model fit of GRB 070125: The original reason I started playing with boxfit .  Both of these have slowed progess on the CUDA port, so it is behind where I wanted it to be in the original roadmap.  I do bear some good news, though.  I think I understand why the static is causing such a big problem with CUDA.      Originally, I was confused because I had overloaded the new and delete operators to automatically move any declared variables to device memory using CUDAMallocManaged() (I know this is not an efficient means of allocated device memory, but it is useful for getting a prototype up quickly.)  The problem is that static members of a class are identical across all instances of the class, so there is no way to seperately call the array fo...