This project consists of two parts:
- Topic modeling
  - BERTopic workflow: trains the model -> tunes hyperparameters -> visualizes and measures coherence
  - LDA workflow: trains the model -> tunes hyperparameters -> visualizes and measures coherence (a coherence sketch follows this list)
- GAN
  - FrozenBert workflow: trains a pre-trained BERT model on PubMed data
  - Discriminator workflow: uses BERT as the embedder
  - Discriminator workflow: uses Sentence-BERT as the embedder
  - GAN workflow: uses BERT as the embedder
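Both topic-modeling workflows measure topic coherence. As a point of reference, here is a minimal, hypothetical sketch of computing coherence with gensim's `CoherenceModel`; the project's actual evaluation code may differ.

```python
# Hedged sketch: compute c_v topic coherence with gensim.
# The toy corpus and topics below are illustrative only.
from gensim.corpora import Dictionary
from gensim.models import CoherenceModel

texts = [["gene", "expression", "cell"], ["protein", "binding", "cell"]]
dictionary = Dictionary(texts)
topics = [["gene", "cell"], ["protein", "binding"]]  # top words per topic

cm = CoherenceModel(topics=topics, texts=texts, dictionary=dictionary, coherence="c_v")
print(cm.get_coherence())  # higher generally means more interpretable topics
```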
Each workflow is an independent script: it generates the data it needs and repairs the project structure if anything is missing. To start a workflow, invoke its `Run` function.
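As a minimal sketch, launching a workflow from Python might look like the following; the import path and the `Run` signature are assumptions, so adapt them to the workflow module you actually want to start.

```python
# Hypothetical launcher: the import path below is an assumption based on
# the project layout; point it at the workflow module you want to run.
from TopicModeling.Bert.src.main import Run

if __name__ == "__main__":
    Run()  # generates missing data/folders, then runs the workflow
```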
Each workflow folder contains a `run_on_server.sh` script. Use it to run the workflow as a batch job on the Technion `lambda` server:

`sbatch -c 2 --gres=gpu:1 -o run.out run_on_server.sh`

Use the `hparams_config.py` file in each workflow to tune the hyperparameters as desired. The workflows log to WandB, so run `wandb login` before starting them.
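The contents of `hparams_config.py` differ per workflow; as a purely hypothetical illustration, a BERTopic config might expose a dictionary of tunable values like this (the actual names and keys in this project may differ):

```python
# Hypothetical hparams_config.py contents; the real keys and defaults
# in this project's config files may differ.
hparams = {
    "min_topic_size": 10,  # BERTopic: minimum number of documents per topic
    "n_neighbors": 15,     # UMAP: size of the local neighborhood
    "n_components": 5,     # UMAP: target embedding dimensionality
    "top_n_words": 10,     # BERTopic: number of words kept per topic
}
```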
For example, the BERTopic workflow:
- Move to `TopicModeling/Bert/src`
- Configure the hyperparameters for your BERTopic experiment in `TopicModeling/Bert/src/hparams_config.py`
- Inside the lambda server, run `sbatch -c 2 --gres=gpu:1 -o run.out run_on_server.sh`
- Results:
  - Trained topic models are saved in `TopicModeling/Bert/saved_models` (a loading sketch follows this list)
  - Visualizations and coherence CSV files are saved in `TopicModeling/Bert/results`
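Once a run finishes, you can inspect a trained model. This is a minimal sketch assuming the model was saved with BERTopic's built-in serialization; the exact file name under `TopicModeling/Bert/saved_models` is an assumption.

```python
# Hedged sketch: load a saved BERTopic model and print a topic summary.
# The model file name is an assumption; check saved_models for the real one.
from bertopic import BERTopic

topic_model = BERTopic.load("TopicModeling/Bert/saved_models/bertopic_model")
print(topic_model.get_topic_info().head())  # topic ids, sizes, and top words
```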
Note: the project has a requirements file; run `pip install -r requirements.txt` to set up the environment.
Enjoy :)