Okay, so I have started my teaching duties for the summer, and I have also started working with the original code to add inverse-Compton scattering. This may be helpful for finally finishing up the model fit of GRB 070125, which was the original reason I started playing with boxfit. Both of these have slowed progress on the CUDA port, so it is behind where I wanted it to be in the original roadmap. I do bear some good news, though: I think I understand why the static member is causing such a big problem with CUDA.
Originally, I was confused because I had overloaded the new and delete operators to automatically move any declared variables to device memory using cudaMallocManaged(). (I know this is not an efficient means of allocating device memory, but it is useful for getting a prototype up quickly.) The problem is that a static member of a class is shared across all instances of the class and is not part of any object that new allocates, so there is no way to separately address the array on both the host and the device; the memory pointer always looks to the original allocation on the host. What this means is that I will likely have to make a separate array for the GPU, something that I was trying to avoid for the sake of IOPs, but that may not actually be much worse than what it would have been anyway. I have to look a little bit further to see if that breaks other calls in the algorithm, but the work on adding another emission method will put me up close and personal with the portions of boxfit that are affected by this, so I should have a much better idea of the extent of the renovation in about a week's time.
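To make the static-member issue concrete, here is a minimal sketch in plain C++ that runs without a GPU. Everything in it is an assumption for illustration: `fake_managed_alloc` stands in for cudaMallocManaged() and just wraps `malloc`, and `Managed`, `Emitter`, `flux` and `table` are hypothetical names, not taken from boxfit. It only shows that an overloaded new covers the per-instance data and never the static array.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>
#include <utility>
#include <vector>

// Hypothetical stand-in for the managed heap: every block handed out is
// recorded so we can later ask whether an address came from it.
std::vector<std::pair<std::uintptr_t, std::size_t>>& managed_blocks() {
    static std::vector<std::pair<std::uintptr_t, std::size_t>> blocks;
    return blocks;
}

void* fake_managed_alloc(std::size_t n) {
    void* p = std::malloc(n);  // a real port would call cudaMallocManaged(&p, n)
    if (!p) throw std::bad_alloc();
    managed_blocks().emplace_back(reinterpret_cast<std::uintptr_t>(p), n);
    return p;
}

void fake_managed_free(void* p) {
    auto& blocks = managed_blocks();
    for (auto it = blocks.begin(); it != blocks.end(); ++it) {
        if (it->first == reinterpret_cast<std::uintptr_t>(p)) {
            blocks.erase(it);
            break;
        }
    }
    std::free(p);  // a real port would call cudaFree(p)
}

// True if the address lies inside a block handed out by fake_managed_alloc.
bool is_managed(const void* p) {
    const auto a = reinterpret_cast<std::uintptr_t>(p);
    for (const auto& b : managed_blocks()) {
        if (a >= b.first && a < b.first + b.second) return true;
    }
    return false;
}

// Base class whose new/delete route object storage through the managed heap.
struct Managed {
    static void* operator new(std::size_t n) { return fake_managed_alloc(n); }
    static void operator delete(void* p) noexcept {
        if (p) fake_managed_free(p);
    }
};

// Hypothetical class, not from boxfit.
struct Emitter : Managed {
    double flux[4];          // per-instance: stored inside the object
    static double table[4];  // static: one copy, in static storage
};
double Emitter::table[4] = {1.0, 2.0, 3.0, 4.0};

// Returns true when the static table is NOT inside the managed heap.
bool static_table_is_outside_managed_heap() {
    Emitter* e = new Emitter();
    assert(is_managed(e));  // the object itself went through the overloaded new
    const bool outside = !is_managed(&Emitter::table[0]);
    delete e;
    return outside;
}
```

Calling `static_table_is_outside_managed_heap()` from a `main` should return true: the object and its `flux` array sit in the tracked heap, while `table` stays in static storage no matter how many instances are created. That is why a separate GPU-side copy of the array looks unavoidable in the real port.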