
An update on the static class member

Okay, so I have started my teaching duties for the summer, and I have also started working with the original code to add inverse-Compton scattering. This may be helpful for finally finishing up the model fit of GRB 070125, the original reason I started playing with boxfit. Both of these have slowed progress on the CUDA port, so it is behind where I wanted it to be in the original roadmap. I do bear some good news, though: I think I understand why the static member is causing such a big problem with CUDA.

Originally, I was confused because I had overloaded the new and delete operators to automatically move any declared variables to device memory using cudaMallocManaged(). (I know this is not an efficient means of allocating device memory, but it is useful for getting a prototype up quickly.) The problem is that static members of a class are shared across all instances of the class. A static member has a single copy in static storage and is never allocated through the class's overloaded new, so there is no way to separately address the array for both the host and the device; the memory pointer always looks to the original allocation on the host. What this means is that I will likely have to make a separate array for the GPU. That is something I was trying to avoid for the sake of IOPs, but it may not actually be much worse than what it would have been anyway. I have to look a little bit further to see if that breaks other calls in the algorithm, but the work on adding another emission method will put me up close and personal with the portions of boxfit that are affected by this, so I should have a much better idea of the extent of the renovation in about a week's time.
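
To make the static-member issue concrete, here is a minimal host-only sketch in plain C++. It is not boxfit code: `Grid`, `table`, `g_managed_allocs`, and `static_is_shared()` are hypothetical names made up for illustration, and `std::malloc` stands in for the place where a CUDA build would call cudaMallocManaged(), so that the sketch compiles without the CUDA toolkit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Counts how many times the class-level operator new has run.
static int g_managed_allocs = 0;

// Hypothetical class whose instances are meant to live in managed memory.
struct Grid {
    // One array shared by every Grid; it lives in host static storage
    // and is never routed through operator new below.
    static double table[4];
    double value = 0.0;

    // Assumption: in a CUDA build this is where cudaMallocManaged() would
    // be called. std::malloc is only a placeholder for this sketch.
    static void* operator new(std::size_t size) {
        ++g_managed_allocs;
        void* p = std::malloc(size);
        if (!p) throw std::bad_alloc();
        return p;
    }

    static void operator delete(void* p) noexcept { std::free(p); }

    // Address of the shared array as seen from this instance.
    const double* table_address() const { return table; }
};

// The single definition of the static array.
double Grid::table[4] = {1.0, 2.0, 3.0, 4.0};

// Returns true when two separately allocated instances see the same array.
bool static_is_shared() {
    Grid* a = new Grid;
    Grid* b = new Grid;
    bool same = (a->table_address() == b->table_address());
    delete a;
    delete b;
    return same;
}
```

Calling `static_is_shared()` allocates two instances through the overloaded new, so the counter goes up by two, yet both instances report the same address for `table`. The array itself never passes through the allocator, which is why the overload alone cannot hand the GPU its own copy and a separate device-side array looks necessary.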
