
Refactor(Fine-Tuning): Improve fine-tuning script modularity, options #176

Open · 26 tasks
justinthelaw opened this issue Aug 30, 2023 · 0 comments
Assignees: justinthelaw
Labels: feature (New feature or request), performance (Refactor commits that improve performance), python (Pull requests that update Python code)

justinthelaw (Owner) commented Aug 30, 2023

The code in fine_tuning.ipynb needs to be refactored for more modularity and fine-grained control:

  • All blocks in the Jupyter Notebook that contain named methods and classes need to be split out into separate Python files

In addition, extra modules and options should be added that can be hot-swapped and/or turned on/off depending on user preference. After some research, the following are possible ways to improve fine_tuning.ipynb by introducing more parameters and methods (minimal sketches for several of these follow the list):

  • Fine-Tuning:

    • Data Augmentation: Augment your training data by adding noise, paraphrasing, or using techniques like back-translation. This can help the model generalize better to unseen examples.
    • Curriculum Learning: Start training with easier examples and gradually increase the difficulty. This can make the optimization landscape smoother and help with convergence (see the curriculum sketch after this list).
  • Regularization:

    • Dropout: If it's not already being used, introducing dropout layers can help prevent overfitting.
    • Weight Decay (L2 Regularization): Add L2 regularization to the model's weights. This can often be done directly through the optimizer (e.g., the weight_decay parameter in Adam/AdamW); see the regularization sketch after this list.
    • LayerNorm: Ensure that normalization layers like LayerNorm are used, especially if they are part of the original T5 model.
  • Optimization:

    • Alternative Optimizers: Consider trying optimizers like AdamW, RAdam, or LAMB.
    • Learning Rate Schedule: Adjust the learning rate schedule. Experiment with other schedules like cosine annealing or cyclic learning rates.
    • Gradient Clipping: The provided code uses a gradient clipping value of 1.0; experiment with different values to see if they yield better results (see the optimization sketch after this list).
  • Loss Functions:

    • Label Smoothing: Instead of using a hard one-hot encoded target, you can soften the targets, which can prevent the model from becoming overly confident in its predictions.
    • Focal Loss: This can be used to handle extreme class imbalance or when you want the model to focus more on harder examples (see the loss-function sketch after this list).
  • Custom Penalty:

    • Refine Format Penalty: The custom format penalty can be refined based on what works best empirically. Consider using different regular expressions or adjusting the penalty strength (see the format-penalty sketch after this list).
    • Additional Penalties: Introduce penalties based on other criteria specific to your use case.
  • Batching:

    • Dynamic Padding: Instead of padding all sequences to a fixed length, pad the sequences in each batch to the maximum length in that batch. This can save computation and sometimes improve results.
    • Gradient Accumulation: If memory constraints prevent a larger batch size, consider accumulating gradients over several forward/backward passes before performing an optimization step (see the batching sketch after this list).
  • External Knowledge:

    • Knowledge Distillation: If you have a larger, more accurate model, you can use its predictions to guide the training of the smaller T5 model, which learns to imitate the larger model's behavior (see the distillation sketch after this list).
  • Evaluation & Feedback:

    • Early Stopping: Monitor the model's performance on a validation set and stop training once performance plateaus or starts degrading, to avoid overfitting.
    • Model Checkpointing: Regularly save model checkpoints so you can revert to the best version if later versions start to overfit (see the early-stopping/checkpointing sketch after this list).
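
Curriculum sketch — a minimal way to order training from "easy" to "hard", assuming the data is a list of (source, target) string pairs and that target length is an acceptable difficulty proxy (both assumptions, not taken from the current notebook):

```python
# Curriculum learning sketch: sort examples by an assumed difficulty score
# (here, target length); any project-specific score could be substituted.
def build_curriculum(examples: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Return examples sorted by ascending target length (easy -> hard)."""
    return sorted(examples, key=lambda pair: len(pair[1].split()))

# Usage idea: feed the sorted examples with shuffle=False for the first
# epoch(s), then switch back to shuffled batches once training stabilizes.
```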
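Regularization sketch (dropout + weight decay) — assumes Hugging Face transformers with a T5 checkpoint and a PyTorch optimizer; the checkpoint name and values are placeholders:

```python
import torch
from transformers import T5ForConditionalGeneration

# Dropout: T5 exposes a single dropout_rate in its config, applied to the
# dropout layers inside the model.
model = T5ForConditionalGeneration.from_pretrained(
    "t5-small",         # placeholder checkpoint
    dropout_rate=0.1,   # tune to trade off under- vs. over-fitting
)

# Weight decay (L2 regularization) via AdamW; bias and layer-norm weights
# are conventionally excluded from decay.
no_decay = ("bias", "layer_norm", "LayerNorm")
grouped_params = [
    {"params": [p for n, p in model.named_parameters()
                if not any(nd in n for nd in no_decay)],
     "weight_decay": 0.01},
    {"params": [p for n, p in model.named_parameters()
                if any(nd in n for nd in no_decay)],
     "weight_decay": 0.0},
]
optimizer = torch.optim.AdamW(grouped_params, lr=3e-4)
```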
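Optimization sketch (AdamW, cosine schedule, gradient clipping) — assumes a manual PyTorch training loop with an existing `model` and `train_dataloader`; step counts and learning rates are placeholders:

```python
import torch
from transformers import get_cosine_schedule_with_warmup

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.01)
scheduler = get_cosine_schedule_with_warmup(
    optimizer,
    num_warmup_steps=500,        # placeholder
    num_training_steps=10_000,   # placeholder
)

for batch in train_dataloader:
    loss = model(**batch).loss
    loss.backward()
    # Gradient clipping: 1.0 matches the value mentioned above; expose it as an option.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
    scheduler.step()
    optimizer.zero_grad()
```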
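Loss-function sketch (label smoothing and a token-level focal loss) — computed directly from the model's logits, assuming labels use -100 for padded positions per the transformers convention:

```python
import torch
import torch.nn.functional as F

def label_smoothed_loss(logits, labels, smoothing=0.1):
    """Token-level cross-entropy with label smoothing."""
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        labels.reshape(-1),
        label_smoothing=smoothing,
        ignore_index=-100,
    )

def focal_loss(logits, labels, gamma=2.0):
    """Down-weights easy tokens so training focuses on harder ones."""
    ce = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        labels.reshape(-1),
        reduction="none",
        ignore_index=-100,
    )
    pt = torch.exp(-ce)                   # probability of the true token
    mask = labels.reshape(-1) != -100     # drop padded positions
    return ((1 - pt) ** gamma * ce)[mask].mean()
```

Either function can replace the loss T5 computes internally by calling the model with labels, taking `outputs.logits`, and computing the custom loss instead of using `outputs.loss`.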
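Format-penalty sketch — one possible way to parameterize the penalty; the regex, the `strength` knob, and the multiplicative application below are placeholders, not the repository's actual implementation:

```python
import re

# Hypothetical expected output format (e.g., "label: value" lines); swap in the
# project's real target pattern.
EXPECTED_FORMAT = re.compile(r"^[\w\- ]+: .+$")

def format_penalty(decoded_texts: list[str], strength: float = 0.1) -> float:
    """Fraction of generated lines violating the format, scaled by `strength`."""
    lines = [ln for text in decoded_texts for ln in text.splitlines() if ln.strip()]
    if not lines:
        return 0.0
    violations = sum(1 for ln in lines if not EXPECTED_FORMAT.match(ln))
    return strength * violations / len(lines)

# The penalty is not differentiable, so one simple option is to scale the
# token-level loss, e.g. loss = loss * (1 + format_penalty(decoded_batch)).
```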
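Batching sketch (dynamic padding + gradient accumulation) — assumes an existing `tokenizer`, `model`, tokenized `train_dataset`, and `optimizer`; batch size and accumulation steps are placeholders:

```python
import torch
from torch.utils.data import DataLoader
from transformers import DataCollatorForSeq2Seq

# Dynamic padding: each batch is padded only to its own longest sequence.
collator = DataCollatorForSeq2Seq(tokenizer, model=model, padding="longest")
train_dataloader = DataLoader(train_dataset, batch_size=8, shuffle=True,
                              collate_fn=collator)

# Gradient accumulation: simulate a larger effective batch size
# (batch_size * accumulation_steps) without extra memory.
accumulation_steps = 4
optimizer.zero_grad()
for step, batch in enumerate(train_dataloader):
    loss = model(**batch).loss / accumulation_steps
    loss.backward()
    if (step + 1) % accumulation_steps == 0:
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
        optimizer.step()
        optimizer.zero_grad()
```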
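Distillation sketch — blends the usual label loss with a KL-divergence term between teacher and student logits, assuming a larger frozen teacher checkpoint is available (checkpoint names, temperature, and weighting are placeholders):

```python
import torch
import torch.nn.functional as F
from transformers import T5ForConditionalGeneration

teacher = T5ForConditionalGeneration.from_pretrained("t5-large").eval()  # placeholder
student = T5ForConditionalGeneration.from_pretrained("t5-small")         # placeholder

def distillation_loss(batch, temperature=2.0, alpha=0.5):
    """Hard-label loss plus soft-target KL term from the frozen teacher."""
    student_out = student(**batch)
    with torch.no_grad():
        teacher_out = teacher(**batch)
    kl = F.kl_div(
        F.log_softmax(student_out.logits / temperature, dim=-1),
        F.softmax(teacher_out.logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    return alpha * student_out.loss + (1 - alpha) * kl
```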
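Early-stopping/checkpointing sketch — assumes placeholder `train_one_epoch()` and `evaluate()` helpers (the latter returning mean validation loss) and an existing `tokenizer`; none of these names come from the current notebook:

```python
import math

best_val_loss = math.inf
patience, epochs_without_improvement = 3, 0

for epoch in range(num_epochs):                      # assumes num_epochs is defined
    train_one_epoch(model, train_dataloader)         # placeholder training step
    val_loss = evaluate(model, val_dataloader)       # placeholder evaluation helper

    if val_loss < best_val_loss:
        best_val_loss = val_loss
        epochs_without_improvement = 0
        # Checkpointing: keep the best weights so later overfit epochs can be discarded.
        model.save_pretrained(f"checkpoints/epoch-{epoch}")
        tokenizer.save_pretrained(f"checkpoints/epoch-{epoch}")
    else:
        epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            print(f"Early stopping at epoch {epoch}: no improvement for {patience} epochs")
            break
```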
@justinthelaw justinthelaw self-assigned this Aug 30, 2023
@justinthelaw justinthelaw converted this from a draft issue Aug 30, 2023
@justinthelaw justinthelaw added the feature, performance, and python labels Aug 30, 2023
Projects: Status: 🏗 In Progress