Replies: 2 comments
-
Hi. Thanks! I think there is a small misunderstanding. We focused on generating visually relevant sounds, which does not imply alignment. We selected RegNet because it exhibits alignment and, therefore, relevance, which makes it state of the art for both. RegNet is a great idea and very relevant to our topic; leaving it out of the comparison would not be fair to its authors. Regarding alignment with SpecVQGAN: in general, it does not support alignment, and I hope we never claimed that it does. I think adding this ability to the model would be a great contribution, and I am looking forward to seeing it!
-
I see. No worries, I was just curious about that, and I am trying to identify my own project. Thanks a lot!
-
Hi,
First of all, congrats on the really great work! While there are lots of audio examples, I haven't found any examples with videos, so it is hard to tell. Since you have compared with RegNet, which claims to generate Visually Aligned Sound from Videos, I am curious whether this work can also achieve that. Thank you.