FM Training Estimator

Estimators for Large Language Model Training.

Estimate resource consumption - memory, tokens, time etc for training and fine-tuning jobs using an hybrid of theory and learned regression models.

Feature Matrix and Roadmap

Technique	Support
Full (1 gpu)	✔️
FSDP (multi)	✔️
PT (1 gpu)	Coming soon
Lora (1 gpu)	Coming soon
QLora (1 gpu)	Planned
Speculators	Planned
Tensor Parallelism	Planned

Time

Full learned approach. Coverage based on availability of training data.

Memory

Hybrid theory + learned. Coverage of learned approach is subject to availability of training data.

Tokens

Fully theory. Simulation based models available.

Technique	Explanation	Availability
TE0	Simulation based - slow but accurate	✔️
TE1	Statistical	Planned
TE2	Approximate - fast, light, reasonable accurate	Coming soon

Usage

First, clone the repository in your local machine. Within the repository, create a virtual environment to ensure no conflicts in dependencies.

python -m venv .venv
source .venv/bin/activate

Install

make install

Setup

Now, prepare data in the expected format for lookup and regression. Look at the csv files in ./fm_training_estimator/regressor/test_data/ for examples. Save this file into ./workdir/data.csv.

mkdir workdir
mv <data file> ./workdir/data.csv

Now, build a regression model using this data, using the provided make target:

make build-model

This will create a model called ./workdir/model.json which will be used to estimate the resource consumption.

You can now run the estimator, using one of the various UIs.

Interacting

For a full list of UIs, look into the ./fm_training_estimator/ui folder.

Easiest option is to run the Web UI.

To do this, first prepare a txt file called model_whitelist.txt in the workdir/ with a list of model names, 1 per line. Note that these are the models on which you want to run the estimator to estimate their resource consumption. You can use the provided example listing using:

cp ./fm_training_estimator/ui/model_whitelist.txt ./workdir/

Modify this list as needed.

Now, run the ui:

make run-web-ui

This will start the UI on localhost:3000 port.

(The web ui has other options, not covered in this simple setup. If you want to skip the model whitelisting or change the port, directly run the UI as shown in the README in the ./fm_training_estimator/ui folder.)

Container Image

To build the estimator container image:

Make sure both model.json and data.csv files are present in the workdir folder.
Use this command to build and push the image:

make cbuild
make cpush # If you want to push to the container registry

Use this command to run the image:

docker run --rm -it -v "/path/to/input.json:/app/input.json" icr.io/ftplatform/fm_training_estimator:latest

Name		Name	Last commit message	Last commit date
Latest commit History 70 Commits
adrs		adrs
fm_training_estimator		fm_training_estimator
.gitignore		.gitignore
.isort.cfg		.isort.cfg
.pre-commit-config.yaml		.pre-commit-config.yaml
.pylintrc		.pylintrc
Dockerfile		Dockerfile
LICENSE		LICENSE
MANIFEST.in		MANIFEST.in
Makefile		Makefile
README.md		README.md
requirements.txt		requirements.txt
setup.py		setup.py
tox.ini		tox.ini
tox.sh		tox.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

FM Training Estimator

Feature Matrix and Roadmap

Time

Memory

Tokens

Usage

Install

Setup

Interacting

Container Image

About

Releases

Packages

Contributors 8

Languages

License

foundation-model-stack/fm-training-estimator

Folders and files

Latest commit

History

Repository files navigation

FM Training Estimator

Feature Matrix and Roadmap

Time

Memory

Tokens

Usage

Install

Setup

Interacting

Container Image

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 8

Languages

Packages