todo #5

Open
3 of 7 tasks
lucidrains opened this issue Dec 7, 2024 · 0 comments

lucidrains commented Dec 7, 2024

  • classifier free guidance + disney research
  • allow for peeking at the last frame before deciding on next action for next time step
  • order the actions, then use a small hierarchical action transformer to predict next set of actions
  • abstract the vq pre / post transformer logic into a wrapper, and prepare for swapping out various wrappers (do a residual VQ version first, followed by some guesses at working in a continuous latent space). the main ambiguity is whether they operate on discrete or continuous embeddings from imagen
  • design an axial space / time version of the transformer
  • allow for decoding of next set of actions
  • add a concise example for pong at root
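
for the classifier free guidance item, a minimal sketch of the usual logit blend, assuming the model is run twice per step: once with the action conditioning and once with it dropped. the function name and the plain-list inputs are illustrative only and not taken from this repo; in practice these would be tensors, and anything specific to the disney research variant is not covered here.

```python
def classifier_free_guidance(cond_logits, uncond_logits, scale=1.0):
    """Blend conditional and unconditional logits (hypothetical helper).

    scale == 0.0 returns the unconditional logits,
    scale == 1.0 returns the conditional logits unchanged,
    scale  > 1.0 extrapolates further away from the unconditional logits.
    """
    if len(cond_logits) != len(uncond_logits):
        raise ValueError("conditional and unconditional logits must have the same length")

    # guided = uncond + scale * (cond - uncond), applied elementwise
    return [u + scale * (c - u) for c, u in zip(cond_logits, uncond_logits)]


# toy example: logits over three next-token candidates
cond = [2.0, 0.5, -1.0]    # model output with the action conditioning
uncond = [1.0, 1.0, 0.0]   # model output with the conditioning dropped

guided = classifier_free_guidance(cond, uncond, scale=3.0)
print(guided)
```

a scale above 1 pushes the prediction harder toward what the action implies, at the cost of a second forward pass per step.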