Replies: 2 comments
-
I also see that NVIDIA published a package named NeMo-Guardrails which does a similar thing.
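A minimal sketch of what wiring NeMo-Guardrails around a model could look like, assuming a local rails configuration directory (the `./config` path and the rail definitions inside it are assumptions, not something from this thread):

```python
# Sketch only: assumes "./config" contains a NeMo-Guardrails configuration
# (config.yml plus Colang rail definitions) describing which topics to block.
from nemoguardrails import LLMRails, RailsConfig

config = RailsConfig.from_path("./config")  # assumed local rails config
rails = LLMRails(config)

# The rails wrapper can refuse or redirect instead of answering directly.
response = rails.generate(messages=[
    {"role": "user", "content": "Tell me something unsafe."}
])
print(response["content"])
```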
-
I wrote an extension that applies the lightweight linear model from profanity-check. It's not perfect, but it's something. Attached: requirements.txt, script.py.
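For anyone who wants the idea without the attachments, here is a rough sketch of what such a script.py can look like. The `output_modifier` hook signature and the blocking threshold are assumptions and may need adjusting to your webui version:

```python
# script.py - sketch of a text-generation-webui extension that blocks replies
# flagged by profanity-check's linear classifier.
# Assumptions: the webui exposes an output_modifier(string, state) hook and
# profanity-check is listed in this extension's requirements.txt.
from profanity_check import predict_prob

BLOCK_THRESHOLD = 0.8  # assumed cutoff; tune to taste


def output_modifier(string, state):
    """Called on each generated reply before it is displayed."""
    score = predict_prob([string])[0]  # probability the text is offensive
    if score >= BLOCK_THRESHOLD:
        return "[Blocked: the generated text was classified as unsafe.]"
    return string
```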
-
I want to offer my installation of the text-generation-webui to minors for educational purposes. Currently I am trying out a Llama 2 Chat model, and it does a pretty good job of self-censoring unsafe topics in the chat and instruct modes. However, in the default and notebook tabs it is trivial to delete the Q&A context and unlock the production of unsafe content, so I need some additional guardrails to keep minors from seeing that content.
I saw that Meta released a Llama Guard model that can classify a model's input and output as safe or unsafe. Can I use a model like this to block unsafe content in the default and notebook tabs? (A rough sketch of what that might look like is included below.)
https://huggingface.co/meta-llama/LlamaGuard-7b
For reference, my Stable Diffusion WebUI installation has this extension, which uses an image classification model to detect unsafe images and block them.
https://github.com/AUTOMATIC1111/stable-diffusion-webui-nsfw-censor
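A minimal sketch of how LlamaGuard-7b could be called to screen text from the default/notebook tabs, loosely following the usage shown on the model card. The generation parameters, the threshold on the "unsafe" prefix, and the blocking message are assumptions:

```python
# Sketch: classify a prompt (or a generated reply) with LlamaGuard-7b and only
# pass it through when it is judged "safe".
# Assumes access to meta-llama/LlamaGuard-7b and a CUDA device.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/LlamaGuard-7b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="cuda"
)


def moderate(chat):
    """Return LlamaGuard's verdict, e.g. 'safe' or 'unsafe' plus a category."""
    input_ids = tokenizer.apply_chat_template(chat, return_tensors="pt").to(model.device)
    output = model.generate(input_ids=input_ids, max_new_tokens=32, pad_token_id=0)
    return tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True)


verdict = moderate([{"role": "user", "content": "Text entered in the notebook tab"}])
if verdict.strip().startswith("unsafe"):
    print("[Blocked: LlamaGuard classified this content as unsafe.]")
```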