Skip to content

Commit

Permalink
Merge branch 'master' into has_triton
Browse files Browse the repository at this point in the history
  • Loading branch information
loadams authored Feb 6, 2025
2 parents ec8e8e1 + f04649d commit 053a74c
Show file tree
Hide file tree
Showing 150 changed files with 2,092 additions and 587 deletions.
2 changes: 1 addition & 1 deletion .github/ISSUE_TEMPLATE/deepspeed_chat_bug_report.md
Original file line number Diff line number Diff line change
Expand Up @@ -32,7 +32,7 @@ If applicable, add screenshots to help explain your problem.
**System info (please complete the following information):**
- OS: [e.g. Ubuntu 18.04]
- GPU count and types [e.g. two machines with x8 A100s each]
- (if applicable) what [DeepSpeed-MII](https://github.com/microsoft/deepspeed-mii) version are you using
- (if applicable) what [DeepSpeed-MII](https://github.com/deepspeedai/deepspeed-mii) version are you using
- (if applicable) Hugging Face Transformers/Accelerate/etc. versions
- Python version
- Any other relevant info about your setup
Expand Down
2 changes: 1 addition & 1 deletion .github/ISSUE_TEMPLATE/inference_bug_report.md
Original file line number Diff line number Diff line change
Expand Up @@ -29,7 +29,7 @@ If applicable, add screenshots to help explain your problem.
**System info (please complete the following information):**
- OS: [e.g. Ubuntu 18.04]
- GPU count and types [e.g. two machines with x8 A100s each]
- (if applicable) what [DeepSpeed-MII](https://github.com/microsoft/deepspeed-mii) version are you using
- (if applicable) what [DeepSpeed-MII](https://github.com/deepspeedai/deepspeed-mii) version are you using
- (if applicable) Hugging Face Transformers/Accelerate/etc. versions
- Python version
- Any other relevant info about your setup
Expand Down
2 changes: 1 addition & 1 deletion .github/workflows/nv-a6000.yml
Original file line number Diff line number Diff line change
Expand Up @@ -66,7 +66,7 @@ jobs:
BRANCH="${{ github.event.inputs.mii_branch }}"
fi
echo "Cloning DeepSpeed-MII branch: $BRANCH"
git clone -b $BRANCH --depth=1 https://github.com/microsoft/DeepSpeed-MII.git
git clone -b $BRANCH --depth=1 https://github.com/deepspeedai/DeepSpeed-MII.git
cd DeepSpeed-MII
pip install .[dev]
cd tests
Expand Down
2 changes: 1 addition & 1 deletion .github/workflows/nv-ds-chat.yml
Original file line number Diff line number Diff line change
Expand Up @@ -54,7 +54,7 @@ jobs:
BRANCH="${{ github.event.inputs.dse_branch }}"
fi
echo "DeepSpeedExamples Branch: $BRANCH"
git clone -b $BRANCH https://github.com/microsoft/DeepSpeedExamples.git
git clone -b $BRANCH https://github.com/deepspeedai/DeepSpeedExamples.git
cd DeepSpeedExamples/applications/DeepSpeed-Chat
pip install -r requirements.txt
pip install -e .
Expand Down
2 changes: 1 addition & 1 deletion .github/workflows/nv-mii.yml
Original file line number Diff line number Diff line change
Expand Up @@ -66,7 +66,7 @@ jobs:
BRANCH="${{ github.event.inputs.mii_branch }}"
fi
echo "Cloning DeepSpeed-MII branch: $BRANCH"
git clone -b $BRANCH --depth=1 https://github.com/microsoft/DeepSpeed-MII.git
git clone -b $BRANCH --depth=1 https://github.com/deepspeedai/DeepSpeed-MII.git
cd DeepSpeed-MII
pip install .[dev]
unset TORCH_CUDA_ARCH_LIST # only jit compile for current arch
Expand Down
19 changes: 13 additions & 6 deletions .github/workflows/xpu-max1100.yml
Original file line number Diff line number Diff line change
Expand Up @@ -36,26 +36,31 @@ jobs:
unit-tests:
runs-on: [self-hosted, intel, xpu]
container:
image: intel/oneapi-basekit:2024.2.1-0-devel-ubuntu22.04
image: intel/oneapi-basekit:2025.0.1-0-devel-ubuntu24.04
ports:
- 80
options: --privileged -it --rm --device /dev/dri:/dev/dri -v /dev/dri/by-path:/dev/dri/by-path --ipc=host --cap-add=ALL

steps:
- uses: actions/checkout@v4
- name: Install prerequisite
shell: bash
run: |
apt-get update
apt-get install clinfo libaio-dev python3-pip -y
pip install torch==2.3.1 -f https://pytorch-extension.intel.com/release-whl/stable/xpu/cn/torch/
pip install intel-extension-for-pytorch==2.3.110+xpu -f https://pytorch-extension.intel.com/release-whl/stable/xpu/cn/intel-extension-for-pytorch/
pip install oneccl_bind_pt==2.3.100+xpu -f https://pytorch-extension.intel.com/release-whl/stable/xpu/cn/oneccl-bind-pt/
pip install torchvision==0.18.1 -f https://pytorch-extension.intel.com/release-whl/stable/xpu/cn/torchvision/
apt-get install clinfo libaio-dev python3-pip python3.12-venv -y
python3 -m venv ~/ds_env
source ~/ds_env/bin/activate
pip install torch==2.5.1 -f https://pytorch-extension.intel.com/release-whl/stable/xpu/cn/torch/
pip install intel-extension-for-pytorch==2.5.10+xpu -f https://pytorch-extension.intel.com/release-whl/stable/xpu/cn/intel-extension-for-pytorch/
pip install oneccl_bind_pt==2.5.0+xpu -f https://pytorch-extension.intel.com/release-whl/stable/xpu/cn/oneccl-bind-pt/
pip install torchvision==0.20.1 -f https://pytorch-extension.intel.com/release-whl/stable/xpu/cn/torchvision/
pip install py-cpuinfo numpy
pip install .[dev,autotuning]
- name: Check container state
shell: bash
run: |
source ~/ds_env/bin/activate
ldd --version
ds_report
python3 -c "import torch; print('torch:', torch.__version__, torch)"
Expand All @@ -64,7 +69,9 @@ jobs:
pip list
- name: Unit tests
shell: bash
run: |
source ~/ds_env/bin/activate
cd tests/unit
pytest --verbose accelerator/*
pytest --verbose autotuning/*
Expand Down
8 changes: 4 additions & 4 deletions CONTRIBUTING.md
Original file line number Diff line number Diff line change
Expand Up @@ -23,7 +23,7 @@ and then repeat the previous `git commit` command.
## Testing
DeepSpeed tracks two types of tests: unit tests and more costly model convergence tests.
The model convergence tests train
[DeepSpeedExamples](https://github.com/microsoft/DeepSpeedExamples/) and measure
[DeepSpeedExamples](https://github.com/deepspeedai/DeepSpeedExamples/) and measure
end-to-end convergence and related metrics. Unit tests are found in `tests/unit/` and
the model convergence tests are found in `tests/model/`.

Expand All @@ -40,7 +40,7 @@ tests. Note that [pytest-forked](https://github.com/pytest-dev/pytest-forked) an

### Model Tests
To execute model tests, first [install DeepSpeed](#installation). The
[DeepSpeedExamples](https://github.com/microsoft/DeepSpeedExamples/) repository is cloned
[DeepSpeedExamples](https://github.com/deepspeedai/DeepSpeedExamples/) repository is cloned
as part of this process. Next, execute the model test driver:
```bash
cd tests/model/
Expand Down Expand Up @@ -85,8 +85,8 @@ Based on the issue we shall discuss the merit of the new feature and decide whet
### Step 2: implementation and verification
Contributor will go ahead and implement the feature, and the DeepSpeed team will provide guidance/helps as needed. The required deliverables include:

* A PR to [microsoft/DeepSpeed](https://github.com/microsoft/DeepSpeed) including (1) the feature implementation (2) unit tests (3) documentation (4) tutorial
* A PR to [microsoft/DeepSpeedExamples](https://github.com/microsoft/DeepSpeedExamples) or [microsoft/Megatron-DeepSpeed](https://github.com/microsoft/Megatron-DeepSpeed) including the examples of how to use the feature (this is related to the planned testing experiments in proposal)
* A PR to [deepspeedai/DeepSpeed](https://github.com/deepspeedai/DeepSpeed) including (1) the feature implementation (2) unit tests (3) documentation (4) tutorial
* A PR to [deepspeedai/DeepSpeedExamples](https://github.com/deepspeedai/DeepSpeedExamples) or [deepspeedai/Megatron-DeepSpeed](https://github.com/deepspeedai/Megatron-DeepSpeed) including the examples of how to use the feature (this is related to the planned testing experiments in proposal)
* In the implementation (code, documentation, tutorial), we require the feature author to record their GitHub username as a contact method for future questions/maintenance.

After receiving the PRs, we will review them and merge them after necessary tests/fixes.
Expand Down
Loading

0 comments on commit 053a74c

Please sign in to comment.