Hello.
First of all, thank you for the good work.
For image compression models, given the downsampling in the hyperprior, I understand that the input image's spatial resolution should be a multiple of 64. (Kodak24 satisfies this condition.)
For the video compression model (SSF), the hyperprior encoder downscales the latent y 3 times, so I think the input frames' spatial resolution should be a multiple of 128.
However, the UVG dataset's frame size is 1080x1920.
So I padded the input frames (reflection padding) so that their spatial resolution is a multiple of 128 (1152x1920).
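For reference, the padding arithmetic described above can be sketched as follows. This is a minimal sketch, not code from the library: the function names are hypothetical, and the split into 4 stride-2 stages in the main encoder plus 3 in the hyperprior encoder is an assumption inferred from the factor of 128 stated above.

```python
def required_multiple(num_stride2_stages: int) -> int:
    """Each stride-2 stage halves the resolution, so n stages need a multiple of 2**n."""
    return 2 ** num_stride2_stages


def padded_size(height: int, width: int, multiple: int) -> tuple:
    """Round each dimension up to the next multiple (integer ceiling division)."""
    new_h = -(-height // multiple) * multiple
    new_w = -(-width // multiple) * multiple
    return new_h, new_w


def split_padding(height: int, width: int, multiple: int) -> tuple:
    """Return (left, right, top, bottom) padding, centred on the original frame."""
    new_h, new_w = padded_size(height, width, multiple)
    pad_h, pad_w = new_h - height, new_w - width
    top, left = pad_h // 2, pad_w // 2
    return left, pad_w - left, top, pad_h - top


# Assumption: 4 stride-2 stages (main encoder) + 3 (hyperprior encoder).
multiple = required_multiple(4 + 3)
print(multiple)                               # 128
print(padded_size(1080, 1920, multiple))      # (1152, 1920)
print(split_padding(1080, 1920, multiple))    # (0, 0, 36, 36)
```

The `(left, right, top, bottom)` order matches what `torch.nn.functional.pad` expects for the last two dimensions of a 4D tensor, so the same tuple, negated, can be used to crop the reconstruction back to 1080x1920.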
But the results show that P-frame compression performance is not much better than I-frame compression performance, as shown below (quality 4).
Video name: Beauty (GOP 12)
Frame, Bpp, PSNR (dB) | Residual Bpp, Motion Bpp
Beauty 000, 0.096373, 33.829868, | 0.096373, 0.000000
Beauty 001, 0.113426, 33.837170, | 0.107114, 0.006312
Beauty 002, 0.110293, 33.835434, | 0.104244, 0.006049
Beauty 003, 0.106605, 33.844116, | 0.100617, 0.005988
Beauty 004, 0.101451, 33.867897, | 0.095602, 0.005849
Beauty 005, 0.099583, 33.888126, | 0.093596, 0.005988
Beauty 006, 0.095417, 33.928127, | 0.089784, 0.005633
Beauty 007, 0.091497, 33.947746, | 0.086296, 0.005201
Beauty 008, 0.089059, 33.971870, | 0.083719, 0.005340
Beauty 009, 0.087022, 34.006771, | 0.081744, 0.005278
Beauty 010, 0.085957, 34.008808, | 0.080664, 0.005293
Beauty 011, 0.082716, 34.018856, | 0.077377, 0.005340
Beauty 012, 0.078920, 34.021229, | 0.078920, 0.000000
Beauty 013, 0.087793, 34.046371, | 0.082330, 0.005463
Beauty 014, 0.082145, 34.026760, | 0.076898, 0.005247
Beauty 015, 0.081883, 34.017952, | 0.076806, 0.005077
Beauty 016, 0.081312, 34.029579, | 0.076281, 0.005031
Beauty 017, 0.080370, 34.024151, | 0.075324, 0.005046
Beauty 018, 0.081497, 34.020878, | 0.076204, 0.005293
Beauty 019, 0.082068, 34.008186, | 0.076991, 0.005077
Beauty 020, 0.081019, 34.009117, | 0.075679, 0.005340
Beauty 021, 0.081852, 34.009365, | 0.076466, 0.005386
Beauty 022, 0.079491, 34.031116, | 0.074336, 0.005154
Beauty 023, 0.076559, 34.029755, | 0.071451, 0.005108
Then I tried to analyze what happened by center-cropping the UVG frames to 768x768 under the same test conditions (no padding).
I got the following results, which look reasonable to me.
Video name: Beauty (GOP 12)
Frame, Bpp, PSNR (dB) | Residual Bpp, Motion Bpp
Beauty 000, 0.086643, 34.743198, | 0.086643, 0.000000
Beauty 001, 0.077854, 34.867981, | 0.055990, 0.005425
Beauty 003, 0.062283, 34.860504, | 0.056641, 0.005642
Beauty 004, 0.056044, 34.889980, | 0.050890, 0.005154
Beauty 005, 0.058485, 34.888424, | 0.053168, 0.005317
Beauty 006, 0.056044, 34.859280, | 0.051161, 0.004883
Beauty 007, 0.057020, 34.869595, | 0.052192, 0.004829
Beauty 008, 0.054145, 34.831181, | 0.049371, 0.004774
Beauty 009, 0.050836, 34.863861, | 0.046224, 0.004612
Beauty 010, 0.050727, 34.832378, | 0.045953, 0.004774
Beauty 011, 0.051107, 34.835670, | 0.046441, 0.004666
Beauty 012, 0.072862, 34.713913, | 0.072862, 0.000000
Beauty 013, 0.064724, 34.843842, | 0.060113, 0.004612
Beauty 014, 0.051866, 34.804825, | 0.047201, 0.004666
Beauty 015, 0.051215, 34.842400, | 0.046604, 0.004612
Beauty 016, 0.047092, 34.867538, | 0.042860, 0.004232
Beauty 017, 0.048611, 34.870323, | 0.044434, 0.004178
Beauty 018, 0.045193, 34.853519, | 0.040961, 0.004232
Beauty 019, 0.047092, 34.858543, | 0.043077, 0.004015
Beauty 020, 0.045898, 34.857796, | 0.041558, 0.004340
Beauty 021, 0.043620, 34.851822, | 0.039117, 0.004503
Beauty 022, 0.039876, 34.868626, | 0.035590, 0.004286
Beauty 023, 0.040744, 34.894634, | 0.036404, 0.004340
What is the best way to handle this padding issue?
(Added: Bpp and PSNR calculation code.)
SSF pre-trained model results (quality 3, 4) compared to the paper's results
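The calculation code itself is not reproduced in this text. As a rough illustration only, the sketch below shows one common convention: Bpp is normalized by the original (unpadded) pixel count, and PSNR is computed from the MSE of the reconstruction after cropping the padding away. The function names and the byte count are hypothetical, and pixel values are assumed to lie in [0, 1].

```python
import math


def bits_per_pixel(num_bytes: int, height: int, width: int) -> float:
    """Bits spent on a frame divided by the number of pixels used for normalization."""
    return num_bytes * 8 / (height * width)


def psnr(mse: float, max_value: float = 1.0) -> float:
    """PSNR in dB from a mean squared error; infinite for a perfect reconstruction."""
    if mse == 0:
        return math.inf
    return 10 * math.log10(max_value ** 2 / mse)


# Hypothetical compressed size of one frame, for illustration only.
num_bytes = 25_000

bpp_original = bits_per_pixel(num_bytes, 1080, 1920)  # normalized by the original frame
bpp_padded = bits_per_pixel(num_bytes, 1152, 1920)    # normalized by the padded frame

# The two conventions differ by the ratio of the pixel counts, 1152 / 1080.
print(bpp_original / bpp_padded)
```

Whichever normalization is used, it is worth stating it alongside the numbers, since the padded and cropped experiments are only comparable if both use the same convention.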