Cloud Parallelism Project for TCSS 562 Fall 2024
Objective: Evaluate and optimize GPU hardware and pricing configurations for efficient, cost-effective training of large-scale Automatic Speech Recognition (ASR) models.
- Designed and implemented a distributed training pipeline for Wav2Vec2 on the 100-hour LibriSpeech dataset, leveraging AWS SageMaker's smdistributed module for multi-GPU parallelism.
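A minimal sketch of how such a training job might be launched with the SageMaker Python SDK; the entry-point script name, hyperparameter values, and S3 path are illustrative assumptions, not the project's actual settings:

```python
# Enabling SageMaker's smdistributed data-parallel library is done via the
# estimator's `distribution` argument.
distribution = {"smdistributed": {"dataparallel": {"enabled": True}}}

hyperparameters = {
    "epochs": 3,                  # illustrative values only
    "per_device_batch_size": 8,
    "dataset": "librispeech_clean_100",
}

# Launch sketch (requires AWS credentials; shown commented out):
# from sagemaker.pytorch import PyTorch
# estimator = PyTorch(
#     entry_point="train_wav2vec2.py",   # hypothetical script name
#     role=role,
#     instance_type="ml.g4dn.12xlarge",  # 4 NVIDIA T4 GPUs per node
#     instance_count=1,
#     framework_version="2.0",
#     py_version="py310",
#     distribution=distribution,
#     hyperparameters=hyperparameters,
# )
# estimator.fit({"train": "s3://my-bucket/librispeech/train-clean-100"})
```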
- Conducted a comparative analysis of single-GPU (ml.g4dn.2xlarge) vs. multi-GPU (ml.g4dn.12xlarge) setups, isolating the impact of hardware distribution on training throughput, GPU utilization, and network latency.
- Experimentally determined that multi-GPU setups reduced per-epoch training time by up to 50% and nearly doubled throughput (from 30.8 to 61.3 samples/second on 4 GPUs), while lowering inter-node communication latency by 7.6%.
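The throughput figures above imply the following speedup and per-GPU scaling efficiency (a quick back-of-the-envelope check using only the numbers reported in this summary):

```python
single_gpu = 30.8   # samples/s on ml.g4dn.2xlarge (1 GPU)
four_gpu = 61.3     # samples/s on ml.g4dn.12xlarge (4 GPUs)

speedup = four_gpu / single_gpu   # ~1.99x, i.e. "nearly doubled"
efficiency = speedup / 4          # ~0.50: each GPU runs at ~50% of ideal
print(f"speedup: {speedup:.2f}x, scaling efficiency: {efficiency:.0%}")
```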
- Analyzed cost-performance trade-offs between on-demand and spot-instance clusters, showing that spot instances achieved up to 35% cost savings with minimal performance trade-offs, reducing total training cost by $7.96 for the 12-GPU cluster.
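The two reported figures (35% savings, $7.96 saved) jointly imply the run's approximate totals, assuming the 35% discount applied to the whole 12-GPU run:

```python
savings = 7.96           # $ saved on the 12-GPU cluster (reported)
savings_fraction = 0.35  # spot vs. on-demand discount (reported)

# Implied totals, derived only from the two reported numbers:
implied_on_demand_total = savings / savings_fraction     # ~$22.74
implied_spot_total = implied_on_demand_total - savings   # ~$14.78
```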
- Identified diminishing returns when scaling GPU clusters beyond 8 GPUs due to data-pipeline and synchronization bottlenecks, highlighting critical areas for optimizing distributed training.
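The diminishing returns beyond 8 GPUs are the behavior Amdahl's law predicts when part of each step (data loading, gradient all-reduce) does not parallelize. A minimal illustration, with the 10% serial fraction being an assumed value for demonstration, not a measured one:

```python
def amdahl_speedup(n_gpus: int, serial_fraction: float) -> float:
    """Ideal speedup on n_gpus when serial_fraction of the work
    (e.g. data pipeline + synchronization) cannot be parallelized."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_gpus)

# With an assumed 10% serial fraction, marginal gains shrink quickly:
for n in (1, 4, 8, 12):
    print(n, round(amdahl_speedup(n, 0.10), 2))  # 8 GPUs -> ~4.7x
```

The gain from 8 to 12 GPUs is noticeably smaller than from 4 to 8, matching the observed plateau.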