implementing CPU postprocessor #89
Replies: 15 comments 12 replies
-
Thanks for the code, let me try it in my fork to see if I can get it to work on OSX :)
-
Adapting the code you provided, I am able to decode on the CPU, but with some artefacts in the image.
-
Mmmh, the code is mainly C, except for the CUDA code :)
-
Hi @MartinPulec, I just pushed some more changes and I think the decoder is now working on CPU. It gives me a clean output for the raw.rgb file:
-
Hi @MartinPulec I also prepared the code for runtime change of the encoder/decoder
to
Where gpujpeg_device can be created like this:
6077209#diff-6b0299835ddf4a55e2f46a9de6c72b171531cfad7ffb9bf756f7f18705f06520
-
Hi Martin, how are you? I am trying not to use your CPU implementation but instead to generalize your kernel so that I can run it on the CPU. The other advantage is that the CPU gets a similar kernel, whose speed can later be improved with OpenMP and multithreading. To keep it similar, I currently pass the block/thread indices as parameters of the function. Right now the raw data I get is fully gray. Would you mind checking the code to see what I missed, please? To replace __syncthreads, I am currently splitting the kernel into three steps (with OpenMP we will use an equivalent barrier). Thanks
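The step-splitting idea above can be illustrated with a toy kernel. This is a minimal sketch, not code from the fork: `reverse_blocks_cpu` and `BLOCK_SIZE` are hypothetical names, and the "kernel" only reverses each block through a shared buffer. It assumes the thread loop is cut at every point where the GPU version would call `__syncthreads()`, so that all emulated threads finish writing shared memory before any of them reads it.

```c
#include <assert.h>
#include <stddef.h>

#define BLOCK_SIZE 8 /* hypothetical number of threads per block */

/* CPU emulation of a CUDA-style kernel that reverses each block of
 * BLOCK_SIZE elements through a "shared" buffer. On the GPU the two
 * phases would be separated by __syncthreads(). */
void reverse_blocks_cpu(const int *in, int *out, size_t block_count)
{
    assert(in != NULL && out != NULL);

    for (size_t block = 0; block < block_count; ++block) {
        int shared[BLOCK_SIZE]; /* stands in for __shared__ memory */

        /* Phase 1: every emulated thread writes its own element. */
        for (size_t thread = 0; thread < BLOCK_SIZE; ++thread) {
            shared[thread] = in[block * BLOCK_SIZE + thread];
        }

        /* Barrier: the loop above has completed for all threads, which
         * is the guarantee __syncthreads() gives on the GPU. */

        /* Phase 2: every emulated thread reads another thread's element. */
        for (size_t thread = 0; thread < BLOCK_SIZE; ++thread) {
            out[block * BLOCK_SIZE + thread] = shared[BLOCK_SIZE - 1 - thread];
        }
    }
}
```

If both phases were placed in a single thread loop, thread 0 would read `shared[BLOCK_SIZE - 1]` before thread 7 had written it, which is one way to end up with garbage or uniform output.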
-
Hi @MartinPulec, no worries, I also have a lot of tasks right now and was not able to make much progress. But I am starting to understand more of what I was missing. In my last update I tried to use a more C-like syntax and slightly modified the code so that, instead of passing the device to gpujpeg_init as an int (the CUDA GPU number), you can now specify a gpujpeg_device, which can be
But a lot of work remains to be done :) I will try to find a few hours this week to make some progress.
-
I totally understand. Not sure if it helps you, but just out of curiosity I tried to compare the outcome of your code with the original for a very simple example (just 2 DCT values in a single 8x8 block), and the values look the same. You can try out tst.cu.txt; it is basically your code with
-
Hi @MartinPulec, I am working on the codebase. I created a PR on my side so you can easily see the changes I am making and give me your feedback: anthonyliot#1
The code still needs a lot of cleanup and improvement, but the idea is mainly: you need a gpujpeg_device as a parameter to initialize gpujpeg; the device can be 'cpu', 'opencl:0', 'cuda:0', ... I have one class, accelerate, with a list of function pointers per acceleration. Right now the CPU decoder is implemented (but the output is incorrect), and I will start adding back the CUDA decoder. If that works, the logic for the encoder should be the same.
The interesting part is that, once everything works, the CPU code could be modified just a little to use OpenMP, and then all the for loops in the kernel that mimic CUDA would also be accelerated on the CPU using multiple cores.
There is still a lot to do, and right now my CPU decode still gives me weird output: In case you have an idea why it looks like that, do not hesitate to share :)
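To make the device-string and function-pointer idea above concrete, here is a rough sketch. It is not the API of GPUJPEG or of the fork: `backend_kind`, `device_spec`, `parse_device`, `accelerator_ops` and `select_ops` are all hypothetical names, and only a trivial CPU `init` hook is filled in. It assumes device names of the form `cpu`, `cuda:N` or `opencl:N`, with the index defaulting to 0.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical backend kinds; not the actual GPUJPEG or fork API. */
enum backend_kind { BACKEND_CPU, BACKEND_CUDA, BACKEND_OPENCL, BACKEND_INVALID };

struct device_spec {
    enum backend_kind kind;
    int index; /* device index, 0 when omitted */
};

/* Parse "cpu", "cuda:0", "opencl:1", ... into a device_spec. */
struct device_spec parse_device(const char *name)
{
    struct device_spec spec = { BACKEND_INVALID, 0 };
    assert(name != NULL);

    const char *colon = strchr(name, ':');
    size_t len = colon ? (size_t)(colon - name) : strlen(name);

    if (len == 3 && strncmp(name, "cpu", 3) == 0) spec.kind = BACKEND_CPU;
    else if (len == 4 && strncmp(name, "cuda", 4) == 0) spec.kind = BACKEND_CUDA;
    else if (len == 6 && strncmp(name, "opencl", 6) == 0) spec.kind = BACKEND_OPENCL;

    if (colon != NULL && spec.kind != BACKEND_INVALID) {
        char *end = NULL;
        long idx = strtol(colon + 1, &end, 10);
        /* Reject empty, non-numeric, negative or oversized indices. */
        if (end == colon + 1 || *end != '\0' || idx < 0 || idx > INT_MAX) {
            spec.kind = BACKEND_INVALID;
        } else {
            spec.index = (int)idx;
        }
    }
    return spec;
}

/* One table of function pointers per backend (hypothetical layout). */
struct accelerator_ops {
    const char *name;
    int (*init)(int device_index); /* NULL while a backend is unimplemented */
};

static int cpu_init(int device_index)
{
    return device_index == 0 ? 0 : -1; /* only one CPU "device" */
}

static const struct accelerator_ops ops_table[] = {
    [BACKEND_CPU]    = { "cpu",    cpu_init },
    [BACKEND_CUDA]   = { "cuda",   NULL },
    [BACKEND_OPENCL] = { "opencl", NULL },
};

/* Return the function table for a backend, or NULL for an invalid kind. */
const struct accelerator_ops *select_ops(enum backend_kind kind)
{
    if (kind == BACKEND_INVALID) return NULL;
    return &ops_table[kind];
}
```

A caller would parse the user-supplied string once, look up the table with `select_ops`, and then go through the pointers for every stage, so the rest of the code does not need to know which backend is active.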
-
Hi @MartinPulec, I made some progress: I was able to get the code (decoder only) fully working with CUDA on my Linux machine, with correct output, all using this new acceleration system. I know something is wrong in my IDCT kernel: by modifying how I mimic the running kernel on the CPU, I am able to get an output, but with a lot of noise/artefacts. Here, for example, is the result of the -L command on gpujpeg (list of devices); I still need to complete some values for CPU and OpenCL
To decode an image you just have to do:
-
OK, I got the decoder working on CPU; the issue was related to a misunderstanding of the __syncthreads behavior.
-
Hi @MartinPulec, anyway, I attached the output from the CUDA encoder in case you have an idea what could give me such a result.
-
Hi @MartinPulec, after fixing that I was able to get a better output (still using the interval 0, so that I can focus first on the DCT and preprocessor kernels). The output is almost good (we can see the image), but there is still something wrong. Any idea?
-
Found the issue: it was related to some __syncthreads on the shared array. That said, I still feel there is an issue in my DCT kernel, because the output JPEG seems to have some weird vertical lines. I need to fix the DCT kernel and the encoder kernel, then I will be able to start on OpenCL.
-
refers to #75 (comment)
For the CPU postprocessor, you should already have data in `decoder->coder->component[0-2].d_data`. What the postprocessor actually does is the conversion from the JPEG raw data to the output format. The data is almost always 8-bit YCbCr BT.601 and it may be subsampled.
Let's say that you want to decode to RGB. Your JPEG (in the original post) is subsampled 4:4:4, which is great, because you won't need to deal with the subsampling. So you'll just need to convert YCbCr->RGB, and I believe that in this easiest case it could be something similar to the following (using coefficients from the wiki):
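A minimal sketch of such a conversion, under stated assumptions: the three components are already available in host memory as separate planar 8-bit buffers of equal size (4:4:4, no padding), the data is full-range JPEG/JFIF YCbCr with BT.601 coefficients, and the output is interleaved RGB. The function and parameter names (`ycbcr444_to_rgb`, `clamp_u8`, `pixel_count`) are made up for illustration and are not part of GPUJPEG.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round to nearest and clamp to the 8-bit range. */
static uint8_t clamp_u8(float v)
{
    if (v < 0.0f) return 0;
    if (v > 255.0f) return 255;
    return (uint8_t)(v + 0.5f);
}

/* Convert planar full-range YCbCr 4:4:4 (JFIF, BT.601 coefficients)
 * to interleaved RGB. All names here are hypothetical. */
void ycbcr444_to_rgb(const uint8_t *y, const uint8_t *cb, const uint8_t *cr,
                     uint8_t *rgb, size_t pixel_count)
{
    assert(y != NULL && cb != NULL && cr != NULL && rgb != NULL);

    for (size_t i = 0; i < pixel_count; ++i) {
        float luma = (float)y[i];
        float blue_diff = (float)cb[i] - 128.0f; /* Cb is centered on 128 */
        float red_diff = (float)cr[i] - 128.0f;  /* Cr is centered on 128 */

        rgb[3 * i + 0] = clamp_u8(luma + 1.402f * red_diff);
        rgb[3 * i + 1] = clamp_u8(luma - 0.344136f * blue_diff
                                       - 0.714136f * red_diff);
        rgb[3 * i + 2] = clamp_u8(luma + 1.772f * blue_diff);
    }
}
```

For an image of `width * height` pixels, `pixel_count` would be `width * height` and `rgb` must hold three bytes per pixel. Subsampled input would need the chroma planes upsampled first, which this sketch does not attempt.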
There may be other aspects to handle besides subsampling, e.g. when the image dimensions are not divisible by 8. GPUJPEG also supports more output formats than RGB (the format is specified by `decoder->coder->param_image`), etc.
UPDATE 2024-01-17: fixed `d_data_raw`, which was mistakenly written as `d_data_row`, and the pointers (either `decoder->coder.` or `coder->`, but not `decoder->coder->`)