Skip to main content

Synchotron Self-Compton Radiation and some multithreading

  Progress has been made since the last post.  My modifications to boxfit now allow for basic Inverse-Compton radiation.  Here is a reference spectrum generated using the shipped settings.  The current method uses the definition of the inverse compton parameter (Y) laid out in Nakar et. al. Apj, 703, 675, and functions for the slow cooling regime mainly, with placeholders in the other regimes.
The orange is the SSC enabled spectrum, and it is behaving exactly as expected above the cooling break.

  The Next step is to get the proper parameterization for Y based on Nakar et. al. as well as Beniamini et al. MNRAS, 454, 1073B.  This includes the Klein-Nishina effect at higher frequencies.  I do worry a bit about how computationally expensive this will be, but I can't really speak to optimizations until I have a better idea of what the algorithm is going to look like.

  I am still working on the CUDA port, but I haven't had much time to think about how I want to change the data structures to fit within what CUDA can work with.  I have started playing with gcc's built-in multithreading schemes.  The -ftree-parallelize-loops= option has given some modest but tangible improvements over single threaded compilation.  I am also working to write it with OpenMP, but just parallelizing the loops seems to slow things down a bit, probably related to cacheline refreshes.  I am going to try with SIMD instructions in the next couple of days, as well as see if I can tweak how the code runs to avoid memory hiccups and hopefully provide an additional speedup, as OpenMP will fully utilize the cores of the machine, unlike g++.  I am also thinking about replacing CUDA with OpenACC for the initial port as that is largely manufacturer agnostic and a bit simpler to code in than OpenCL. 

Comments

Popular posts from this blog

Prototyping some CUDA code

After a few false starts, and at a bit of a slowed pace do to other academic responsibilities created some working CUDA code.  The static member class is still causing some issues, but the workaround was to create a new array and overload the new and delete operators to allow CUDA to handle moving memory to and from the GPU.  It is a definite step forward, but there are still several issues.  GPU utilization sits quite low except for specific moments and I think that is due to a lot of the radiation calculations being on the CPU.  As you can see, the GPU does eventually catch up with the serial code, but the divisions being noted are in terms of the number of rays used for the two dimensional radiative transfer calculations.  The crossover point occurs somewhere near 1000 radial and 1000 azimuthal rays, which is not particularly useful in practice.  I am pleased with the start, though, as we now have compiling and running GPU code.  The next task...

So that static array is actually really cool... but not super useful in CUDA

     So that static array I mentioned last post just got a lot more interesting.  At first, I thought it was an array of values that contained information about the Equidistant Surface for doing calculations on how much flux the observer actually sees.  It does do this, but the way it does it is very different than what I thought.  My intuition was that each array element was some value, and that it was static for ease of access by mulitiple programs.  What it actually is, is an array of a struct that contains several doubles.      What this means is you specify values for each struct object for each array point, creating a collection of parameters for for every point in discrete space.  It is very cool indeed, but I am now even more unsure of how to copy this static object to the GPU.  I don't want to dismantle it, as it is such an elegant solution to the problem of defining all these variables for each point, but I have...