This repository has been archived by the owner on Oct 11, 2024. It is now read-only.

Upstream sync 2024 05 05 #224

Merged 127 commits on May 13, 2024.

Commits (127)
7b11874
[Core] Add `shutdown()` method to `ExecutorBase` (#4349)
njhill Apr 25, 2024
28590fc
[Core] Move function tracing setup to util function (#4352)
njhill Apr 25, 2024
7873343
[ROCm][Hardware][AMD][Doc] Documentation update for ROCm (#4376)
hongxiayang Apr 26, 2024
5f32d89
[Bugfix] Fix parameter name in `get_tokenizer` (#4107)
DarkLight1337 Apr 26, 2024
c20ff92
[Frontend] Add --log-level option to api server (#4377)
normster Apr 26, 2024
ec4050a
[CI] Disable non-lazy string operation on logging (#4326)
rkooo567 Apr 26, 2024
ee654c9
[Core] Refactoring sampler and support prompt logprob for chunked pre…
rkooo567 Apr 26, 2024
4f5d020
[Misc][Refactor] Generalize linear_method to be quant_method (#4373)
comaniac Apr 26, 2024
dc47676
[Misc] add RFC issue template (#4401)
youkaichao Apr 26, 2024
192c704
[Core] Introduce `DistributedGPUExecutor` abstract class (#4348)
njhill Apr 27, 2024
1e88172
[Kernel] Optimize FP8 support for MoE kernel / Mixtral via static sca…
pcmoritz Apr 27, 2024
b9e05fa
[Frontend][Bugfix] Disallow extra fields in OpenAI API (#4355)
DarkLight1337 Apr 27, 2024
5395fa3
[Misc] Fix logger format typo (#4396)
esmeetu Apr 27, 2024
cc7a791
[ROCm][Hardware][AMD] Enable group query attention for triton FA (#4406)
hongxiayang Apr 27, 2024
77c1eb1
[Kernel] Full Tensor Parallelism for LoRA Layers (#3524)
FurtherAI Apr 27, 2024
287d987
[Model] Phi-3 4k sliding window temp. fix (#4380)
caiom Apr 27, 2024
b3759af
[Bugfix][Core] Fix get decoding config from ray (#4335)
esmeetu Apr 27, 2024
6a44e8e
[Bugfix] Abort requests when the connection to /v1/completions is int…
chestnut-Q Apr 27, 2024
821a91a
[BugFix] Fix `min_tokens` when `eos_token_id` is None (#4389)
njhill Apr 27, 2024
5a4c41b
[Core] Support offline use of local cache for models (#4374)
prashantgupta24 Apr 27, 2024
593db14
[BugFix] Fix return type of executor execute_model methods (#4402)
njhill Apr 27, 2024
1f87fe1
[BugFix] Resolved Issues For LinearMethod --> QuantConfig (#4418)
robertgshaw2-neuralmagic Apr 27, 2024
b24aae6
[Misc] fix typo in llm_engine init logging (#4428)
DefTruth Apr 28, 2024
6a8a97b
Add more Prometheus metrics (#2764)
ronensc Apr 28, 2024
8ab0de8
[CI] clean docker cache for neuron (#4441)
simon-mo Apr 28, 2024
7f5a450
[mypy][5/N] Support all typing on model executor (#4427)
rkooo567 Apr 29, 2024
1e75df8
[Kernel] Marlin Expansion: Support AutoGPTQ Models with Marlin (#3922)
robertgshaw2-neuralmagic Apr 29, 2024
19187df
[CI] hotfix: soft fail neuron test (#4458)
simon-mo Apr 29, 2024
43add77
[Core][Distributed] use cpu group to broadcast metadata in cpu (#4444)
youkaichao Apr 29, 2024
768facf
[Misc] Upgrade to `torch==2.3.0` (#4454)
mgoin Apr 30, 2024
10b984a
[Bugfix][Kernel] Fix compute_type for MoE kernel (#4463)
WoosukKwon Apr 30, 2024
42929fe
[Core]Refactor gptq_marlin ops (#4466)
jikunshang Apr 30, 2024
da4215e
[BugFix] fix num_lookahead_slots missing in async executor (#4165)
leiwen83 Apr 30, 2024
40b286f
[Doc] add visualization for multi-stage dockerfile (#4456)
prashantgupta24 Apr 30, 2024
faed3eb
[Kernel] Support Fp8 Checkpoints (Dynamic + Static) (#4332)
robertgshaw2-neuralmagic Apr 30, 2024
8b9d685
[Frontend] Support complex message content for chat completions endpo…
fgreinacher Apr 30, 2024
9ad9b65
[Frontend] [Core] Tensorizer: support dynamic `num_readers`, update v…
alpayariyak Apr 30, 2024
195439e
[Bugfix][Minor] Make ignore_eos effective (#4468)
bigPYJ1151 Apr 30, 2024
7cff2a5
fix_tokenizer_snapshot_download_bug (#4493)
kingljl Apr 30, 2024
666ccdb
Unable to find Punica extension issue during source code installation…
kingljl May 1, 2024
e1fc3da
[Core] Centralize GPU Worker construction (#4419)
njhill May 1, 2024
2ef0a89
[Misc][Typo] type annotation fix (#4495)
HarryWu99 May 1, 2024
bd7f454
[Misc] fix typo in block manager (#4453)
Juelianqvq May 1, 2024
66d2c00
Allow user to define whitespace pattern for outlines (#4305)
robcaulk May 1, 2024
dc2970e
[Misc]Add customized information for models (#4132)
jeejeelee May 1, 2024
b496ac2
[Test] Add ignore_eos test (#4519)
rkooo567 May 1, 2024
c1e7a79
[Bugfix] Fix the fp8 kv_cache check error that occurs when failing to…
AnyISalIn May 1, 2024
d05b702
[Bugfix] Fix 307 Redirect for `/metrics` (#4523)
robertgshaw2-neuralmagic May 1, 2024
75c6ebf
[Doc] update(example model): for OpenAI compatible serving (#4503)
fpaupier May 1, 2024
21bc3bf
[Bugfix] Use random seed if seed is -1 (#4531)
sasha0552 May 1, 2024
752043f
[CI/Build][Bugfix] VLLM_USE_PRECOMPILED should skip compilation (#4534)
tjohnson31415 May 1, 2024
862330a
[Speculative decoding] Add ngram prompt lookup decoding (#4237)
leiwen83 May 1, 2024
3d32972
[Core] Enable prefix caching with block manager v2 enabled (#4142)
leiwen83 May 1, 2024
56d2002
[Core] Add `multiproc_worker_utils` for multiprocessing-based workers…
njhill May 1, 2024
7c04a00
[Kernel] Update fused_moe tuning script for FP8 (#4457)
pcmoritz May 1, 2024
0533a6b
[Bugfix] Add validation for seed (#4529)
sasha0552 May 1, 2024
224ecd7
[Bugfix][Core] Fix and refactor logging stats (#4336)
esmeetu May 1, 2024
5b174c4
[Core][Distributed] fix pynccl del error (#4508)
youkaichao May 1, 2024
4be23dd
[Misc] Remove Mixtral device="cuda" declarations (#4543)
pcmoritz May 1, 2024
de3262f
[Misc] Fix expert_ids shape in MoE (#4517)
WoosukKwon May 1, 2024
b85188d
[MISC] Rework logger to enable pythonic custom logging configuration …
May 2, 2024
b259286
[Bug fix][Core] assert num_new_tokens == 1 fails when SamplingParams.…
rkooo567 May 2, 2024
91f8b48
[CI]Add regression tests to ensure the async engine generates metrics…
ronensc May 2, 2024
2017aaf
[mypy][6/N] Fix all the core subdirectory typing (#4450)
rkooo567 May 2, 2024
27f0c2b
[Core][Distributed] enable multiple tp group (#4512)
youkaichao May 2, 2024
2078207
[Kernel] Support running GPTQ 8-bit models in Marlin (#4533)
alexm-neuralmagic May 2, 2024
ed6d376
[mypy][7/N] Cover all directories (#4555)
rkooo567 May 2, 2024
87d793d
[Misc] Exclude the `tests` directory from being packaged (#4552)
itechbear May 2, 2024
4dc269d
[BugFix] Include target-device specific requirements.txt in sdist (#4…
markmc May 2, 2024
f7d8e46
[Misc] centralize all usage of environment variables (#4548)
youkaichao May 2, 2024
673e4eb
[kernel] fix sliding window in prefix prefill Triton kernel (#4405)
mmoskal May 2, 2024
2ff2756
[CI/Build] AMD CI pipeline with extended set of tests. (#4267)
Alexei-V-Ivanov-AMD May 2, 2024
3d453d0
[Core] Ignore infeasible swap requests. (#4557)
rkooo567 May 2, 2024
2a0fb55
[Core][Distributed] enable allreduce for multiple tp groups (#4566)
youkaichao May 3, 2024
82bbb3d
[BugFix] Prevent the task of `_force_log` from being garbage collecte…
Atry May 3, 2024
44f6086
[Misc] remove chunk detected debug logs (#4571)
DefTruth May 3, 2024
f62ba17
[Doc] add env vars to the doc (#4572)
youkaichao May 3, 2024
fc4f08f
[Core][Model runner refactoring 1/N] Refactor attn metadata term (#4518)
rkooo567 May 3, 2024
f10844f
[Bugfix] Allow "None" or "" to be passed to CLI for string args that …
mgoin May 3, 2024
e132240
Fix/async chat serving (#2727)
schoennenbeck May 3, 2024
4b0f703
[Kernel] Use flashinfer for decoding (#4353)
LiuXiaoxuanPKU May 3, 2024
6dd96ce
[Speculative decoding] Support target-model logprobs (#4378)
cadedaniel May 3, 2024
19ae179
[Misc] add installation time env vars (#4574)
youkaichao May 3, 2024
12c155b
[Misc][Refactor] Introduce ExecuteModelData (#4540)
comaniac May 4, 2024
5d65e2f
[Doc] Chunked Prefill Documentation (#4580)
rkooo567 May 4, 2024
55dd119
[Kernel] Support MoE Fp8 Checkpoints for Mixtral (Static Weights with…
mgoin May 4, 2024
c152bd7
[CI] check size of the wheels (#4319)
simon-mo May 4, 2024
f8fb8c1
[Bugfix] Fix inappropriate content of model_name tag in Prometheus me…
DearPlanet May 4, 2024
2d96b61
bump version to v0.4.2 (#4600)
simon-mo May 5, 2024
9f817f0
[CI] Reduce wheel size by not shipping debug symbols (#4602)
simon-mo May 5, 2024
f57a219
make linter happy
robertgshaw2-neuralmagic May 6, 2024
6b2c4c1
updated sparsity integration
robertgshaw2-neuralmagic May 6, 2024
18a6e93
hooked up sparsity properly post refactor
robertgshaw2-neuralmagic May 6, 2024
bcf686d
lint
robertgshaw2-neuralmagic May 6, 2024
8423620
updated skip for remote push
robertgshaw2-neuralmagic May 6, 2024
50c1029
stray issues
robertgshaw2-neuralmagic May 7, 2024
a55fb2b
updated test
May 8, 2024
b091999
fixed rotary embeddingS
robertgshaw2-neuralmagic May 8, 2024
4c04122
format
robertgshaw2-neuralmagic May 8, 2024
0300194
fixed torch reinit
May 9, 2024
5dc0afe
identified OOM issue causing server to die
robertgshaw2-neuralmagic May 11, 2024
94878e5
Merge branch 'main' into upstream-sync-2024-05-05
robertgshaw2-neuralmagic May 11, 2024
81f5e29
format
robertgshaw2-neuralmagic May 11, 2024
e04b743
skip test chunked prefill basic correctness
robertgshaw2-neuralmagic May 11, 2024
774df9d
format
robertgshaw2-neuralmagic May 11, 2024
02b7775
updated block threshold
robertgshaw2-neuralmagic May 11, 2024
fe43f6b
formt
robertgshaw2-neuralmagic May 11, 2024
9ba99bd
format
robertgshaw2-neuralmagic May 12, 2024
8e49ada
updated to run with -v instead of -s due to too much logs
robertgshaw2-neuralmagic May 12, 2024
e61507e
made into no reporting
robertgshaw2-neuralmagic May 12, 2024
2b2f301
updated build test to run build on AWS
robertgshaw2-neuralmagic May 12, 2024
304a5f9
added fix for fp8 kernels
robertgshaw2-neuralmagic May 12, 2024
2f6849f
tweaked gptq marlin test
robertgshaw2-neuralmagic May 13, 2024
e257749
format
robertgshaw2-neuralmagic May 13, 2024
6a22a11
updated cache test to skip gpu1
robertgshaw2-neuralmagic May 13, 2024
6207d84
Update test_basic_distributed_correctness.py
robertgshaw2-neuralmagic May 13, 2024
c8450a7
Update test_chunked_prefill_distributed.py
robertgshaw2-neuralmagic May 13, 2024
2df6bda
Update test_tensorizer.py
robertgshaw2-neuralmagic May 13, 2024
cb216b6
Update test_layer_variation.py
robertgshaw2-neuralmagic May 13, 2024
a767eb8
Update test_layer_variation.py
robertgshaw2-neuralmagic May 13, 2024
e7dd38e
format
robertgshaw2-neuralmagic May 13, 2024
30d202d
Update test_logprobs.py
robertgshaw2-neuralmagic May 13, 2024
5096714
Update test_logprobs.py (#236)
robertgshaw2-neuralmagic May 13, 2024
635fc10
updated to skip distributed tests
robertgshaw2-neuralmagic May 13, 2024
4477333
reverted to gke build
robertgshaw2-neuralmagic May 13, 2024
9a9c899
updated skip-test-lists to skip spec-decode (running OOM)
robertgshaw2-neuralmagic May 13, 2024
1c359ae
updated skip lists to skip spec decode
robertgshaw2-neuralmagic May 13, 2024
36 changes: 36 additions & 0 deletions .buildkite/check-wheel-size.py
@@ -0,0 +1,36 @@
import os
import zipfile

MAX_SIZE_MB = 100


def print_top_10_largest_files(zip_file):
    with zipfile.ZipFile(zip_file, 'r') as z:
        file_sizes = [(f, z.getinfo(f).file_size) for f in z.namelist()]
        file_sizes.sort(key=lambda x: x[1], reverse=True)
        for f, size in file_sizes[:10]:
            print(f"{f}: {size/(1024*1024)} MBs uncompressed.")


def check_wheel_size(directory):
    for root, _, files in os.walk(directory):
        for f in files:
            if f.endswith(".whl"):
                wheel_path = os.path.join(root, f)
                wheel_size = os.path.getsize(wheel_path)
                wheel_size_mb = wheel_size / (1024 * 1024)
                if wheel_size_mb > MAX_SIZE_MB:
                    print(
                        f"Wheel {wheel_path} is too large ({wheel_size_mb} MB) "
                        f"compared to the allowed size ({MAX_SIZE_MB} MB).")
                    print_top_10_largest_files(wheel_path)
                    return 1
                else:
                    print(f"Wheel {wheel_path} is within the allowed size "
                          f"({wheel_size_mb} MB).")
    return 0


if __name__ == "__main__":
    import sys
    sys.exit(check_wheel_size(sys.argv[1]))
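The new check exits nonzero when any wheel exceeds the 100 MB cap. The same walk-and-measure pattern in a standalone sketch (the helper name and the demo wheel below are illustrative, not part of the PR):

```python
import os
import tempfile
import zipfile


def oversized_wheels(directory, limit_mb=100):
    """Return the paths of .whl files under directory larger than limit_mb."""
    hits = []
    for root, _, files in os.walk(directory):
        for name in files:
            if name.endswith(".whl"):
                path = os.path.join(root, name)
                if os.path.getsize(path) / (1024 * 1024) > limit_mb:
                    hits.append(path)
    return hits


# Build a tiny wheel-shaped zip; it is far below the cap, so nothing is flagged.
with tempfile.TemporaryDirectory() as d:
    whl = os.path.join(d, "demo-0.1-py3-none-any.whl")
    with zipfile.ZipFile(whl, "w") as z:
        z.writestr("demo/__init__.py", "")
    print(oversized_wheels(d))  # → []
```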
58 changes: 25 additions & 33 deletions .buildkite/run-amd-test.sh
@@ -1,10 +1,11 @@
# This script build the ROCm docker image and run the API server inside the container.
# It serves a sanity check for compilation and basic model usage.
# This script build the ROCm docker image and runs test inside it.
set -ex

# Print ROCm version
echo "--- ROCm info"
rocminfo

echo "--- Resetting GPUs"

echo "reset" > /opt/amdgpu/etc/gpu_state

@@ -16,37 +17,28 @@ while true; do
  fi
done

echo "--- Building container"
sha=$(git rev-parse --short HEAD)
container_name=rocm_${sha}
docker build \
    -t ${container_name} \
    -f Dockerfile.rocm \
    --progress plain \
    .

remove_docker_container() {
    docker rm -f ${container_name} || docker image rm -f ${container_name} || true
}
trap remove_docker_container EXIT

echo "--- Running container"

# Try building the docker image
docker build -t rocm -f Dockerfile.rocm .

# Setup cleanup
remove_docker_container() { docker rm -f rocm || true; }
trap remove_docker_container EXIT
remove_docker_container

# Run the image
export HIP_VISIBLE_DEVICES=1
docker run --device /dev/kfd --device /dev/dri --network host -e HIP_VISIBLE_DEVICES --name rocm rocm python3 -m vllm.entrypoints.api_server &

# Wait for the server to start
wait_for_server_to_start() {
    timeout=300
    counter=0

    while [ "$(curl -s -o /dev/null -w ''%{http_code}'' localhost:8000/health)" != "200" ]; do
        sleep 1
        counter=$((counter + 1))
        if [ $counter -ge $timeout ]; then
            echo "Timeout after $timeout seconds"
            break
        fi
    done
}
wait_for_server_to_start
docker run \
    --device /dev/kfd --device /dev/dri \
    --network host \
    --rm \
    -e HF_TOKEN \
    --name ${container_name} \
    ${container_name} \
    /bin/bash -c $(echo $1 | sed "s/^'//" | sed "s/'$//")

# Test a simple prompt
curl -X POST -H "Content-Type: application/json" \
    localhost:8000/generate \
    -d '{"prompt": "San Francisco is a"}'
5 changes: 5 additions & 0 deletions .buildkite/run-benchmarks.sh
@@ -53,6 +53,11 @@ echo '```' >> benchmark_results.md
tail -n 20 benchmark_serving.txt >> benchmark_results.md # last 20 lines
echo '```' >> benchmark_results.md

# if the agent binary is not found, skip uploading the results, exit 0
if [ ! -f /workspace/buildkite-agent ]; then
    exit 0
fi

# upload the results to buildkite
/workspace/buildkite-agent annotate --style "info" --context "benchmark-results" < benchmark_results.md
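The added guard is the usual "skip an optional step when its tool is absent" check; the same control flow in a standalone Python sketch (the function name and paths are invented for illustration):

```python
import os


def maybe_upload(agent_path, report_path):
    """Report-upload step that skips quietly when the agent binary is missing."""
    if not os.path.isfile(agent_path):
        return "skipped: agent not found"
    # In the real script this point would exec the agent binary,
    # piping the report file to its stdin.
    return f"uploaded {report_path}"


print(maybe_upload("/nonexistent/buildkite-agent", "benchmark_results.md"))
# → skipped: agent not found
```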
14 changes: 14 additions & 0 deletions .buildkite/run-neuron-test.sh
@@ -4,6 +4,20 @@ set -e

# Try building the docker image
aws ecr get-login-password --region us-west-2 | docker login --username AWS --password-stdin 763104351884.dkr.ecr.us-west-2.amazonaws.com

# prune old image and containers to save disk space, and only once a day
# by using a timestamp file in tmp.
if [ -f /tmp/neuron-docker-build-timestamp ]; then
    last_build=$(cat /tmp/neuron-docker-build-timestamp)
    current_time=$(date +%s)
    if [ $((current_time - last_build)) -gt 86400 ]; then
        docker system prune -f
        echo $current_time > /tmp/neuron-docker-build-timestamp
    fi
else
    echo $(date +%s) > /tmp/neuron-docker-build-timestamp
fi

docker build -t neuron -f Dockerfile.neuron .

# Setup cleanup
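The prune guard added here is a generic "at most once per interval" throttle keyed on a timestamp file. A standalone Python sketch of the same idea (the function name and stamp-file handling are illustrative, not from the PR):

```python
import time


def should_prune(stamp_path, interval_s=86400, now=None):
    """Return True at most once per interval_s, tracking state in stamp_path.

    Mirrors the shell logic: the first call only records a timestamp;
    later calls succeed once the interval has elapsed, refreshing the stamp.
    """
    now = time.time() if now is None else now
    try:
        with open(stamp_path) as f:
            last = float(f.read().strip())
    except (FileNotFoundError, ValueError):
        # No stamp yet: record one, but do not prune on the first run.
        with open(stamp_path, "w") as f:
            f.write(str(now))
        return False
    if now - last > interval_s:
        with open(stamp_path, "w") as f:
            f.write(str(now))
        return True
    return False
```

In the shell version, `docker system prune -f` runs only on the calls where this predicate would return True.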
23 changes: 21 additions & 2 deletions .buildkite/test-pipeline.yaml
@@ -17,27 +17,38 @@ steps:
  - VLLM_ATTENTION_BACKEND=FLASH_ATTN pytest -v -s basic_correctness/test_basic_correctness.py
  - VLLM_ATTENTION_BACKEND=XFORMERS pytest -v -s basic_correctness/test_chunked_prefill.py
  - VLLM_ATTENTION_BACKEND=FLASH_ATTN pytest -v -s basic_correctness/test_chunked_prefill.py
  - VLLM_TEST_ENABLE_ARTIFICIAL_PREEMPT=1 pytest -v -s basic_correctness/test_preemption.py

- label: Core Test
  mirror_hardwares: [amd]
  command: pytest -v -s core

- label: Distributed Comm Ops Test
  command: pytest -v -s test_comm_ops.py
  working_dir: "/vllm-workspace/tests/distributed"
  num_gpus: 2 # only support 1 or 2 for now.
  num_gpus: 2

- label: Distributed Tests
  working_dir: "/vllm-workspace/tests/distributed"
  num_gpus: 2 # only support 1 or 2 for now.
  mirror_hardwares: [amd]
  commands:
  - pytest -v -s test_pynccl.py
  - pytest -v -s test_pynccl_library.py
  - TEST_DIST_MODEL=facebook/opt-125m pytest -v -s test_basic_distributed_correctness.py
  - TEST_DIST_MODEL=meta-llama/Llama-2-7b-hf pytest -v -s test_basic_distributed_correctness.py
  - TEST_DIST_MODEL=facebook/opt-125m pytest -v -s test_chunked_prefill_distributed.py
  - TEST_DIST_MODEL=meta-llama/Llama-2-7b-hf pytest -v -s test_chunked_prefill_distributed.py

- label: Distributed Tests (Multiple Groups)
  working_dir: "/vllm-workspace/tests/distributed"
  num_gpus: 4
  commands:
  - pytest -v -s test_pynccl.py

- label: Engine Test
  mirror_hardwares: [amd]
  command: pytest -v -s engine tokenization test_sequence.py test_config.py test_logger.py

- label: Entrypoints Test
@@ -48,6 +59,7 @@ steps:

- label: Examples Test
  working_dir: "/vllm-workspace/examples"
  mirror_hardwares: [amd]
  commands:
  # install aws cli for llava_example.py
  - pip install awscli
@@ -61,29 +73,35 @@
  parallelism: 4

- label: Models Test
  mirror_hardwares: [amd]
  commands:
  - bash ../.buildkite/download-images.sh
  - pytest -v -s models --ignore=models/test_llava.py --ignore=models/test_mistral.py

- label: Llava Test
  mirror_hardwares: [amd]
  commands:
  - bash ../.buildkite/download-images.sh
  - pytest -v -s models/test_llava.py

- label: Prefix Caching Test
  mirror_hardwares: [amd]
  commands:
  - pytest -v -s prefix_caching

- label: Samplers Test
  command: pytest -v -s samplers

- label: LogitsProcessor Test
  mirror_hardwares: [amd]
  command: pytest -v -s test_logits_processor.py

- label: Worker Test
  mirror_hardwares: [amd]
  command: pytest -v -s worker

- label: Speculative decoding tests
  mirror_hardwares: [amd]
  command: pytest -v -s spec_decode

- label: LoRA Test %N
@@ -101,6 +119,7 @@ steps:

- label: Benchmarks
  working_dir: "/vllm-workspace/.buildkite"
  mirror_hardwares: [amd]
  commands:
  - pip install aiohttp
  - bash run-benchmarks.sh
25 changes: 20 additions & 5 deletions .buildkite/test-template.j2
@@ -16,17 +16,29 @@ steps:
        limit: 5
  - wait

  - label: "AMD Test"
    agents:
      queue: amd
    command: bash .buildkite/run-amd-test.sh
  - group: "AMD Tests"
    depends_on: ~
    steps:
    {% for step in steps %}
    {% if step.mirror_hardwares and "amd" in step.mirror_hardwares %}
      - label: "AMD: {{ step.label }}"
        agents:
          queue: amd
        command: bash .buildkite/run-amd-test.sh "'cd {{ (step.working_dir or default_working_dir) | safe }} && {{ step.command or (step.commands | join(' && ')) | safe }}'"
        env:
          DOCKER_BUILDKIT: "1"
    {% endif %}
    {% endfor %}

  - label: "Neuron Test"
    depends_on: ~
    agents:
      queue: neuron
    command: bash .buildkite/run-neuron-test.sh
    soft_fail: true

  - label: "CPU Test"
  - label: "Intel Test"
    depends_on: ~
    command: bash .buildkite/run-cpu-test.sh

{% for step in steps %}
@@ -44,6 +56,9 @@
    plugins:
      - kubernetes:
          podSpec:
            {% if step.num_gpus %}
            priorityClassName: gpu-priority-cls-{{ step.num_gpus }}
            {% endif %}
            volumes:
              - name: dshm
                emptyDir:
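The template change above fans each step tagged `mirror_hardwares: [amd]` out into an "AMD Tests" group. The filtering it performs is essentially this (the step data below is a made-up subset for illustration):

```python
steps = [
    {"label": "Core Test", "mirror_hardwares": ["amd"], "command": "pytest -v -s core"},
    {"label": "Samplers Test", "command": "pytest -v -s samplers"},
]

# Keep only steps that opt in to AMD mirroring, relabelled for the AMD queue.
amd_steps = [
    {"label": f"AMD: {s['label']}", "queue": "amd", "command": s["command"]}
    for s in steps
    if "amd" in s.get("mirror_hardwares", [])
]

print([s["label"] for s in amd_steps])  # → ['AMD: Core Test']
```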
49 changes: 49 additions & 0 deletions .github/ISSUE_TEMPLATE/750-RFC.yml
@@ -0,0 +1,49 @@
name: 💬 Request for comments (RFC).
description: Ask for feedback on major architectural changes or design choices.
title: "[RFC]: "
labels: ["RFC"]

body:
- type: markdown
  attributes:
    value: >
      #### Please take a look at previous [RFCs](https://github.com/vllm-project/vllm/issues?q=label%3ARFC+sort%3Aupdated-desc) for reference.
- type: textarea
  attributes:
    label: Motivation.
    description: >
      The motivation of the RFC.
  validations:
    required: true
- type: textarea
  attributes:
    label: Proposed Change.
    description: >
      The proposed change of the RFC.
  validations:
    required: true
- type: textarea
  attributes:
    label: Feedback Period.
    description: >
      The feedback period of the RFC. Usually at least one week.
  validations:
    required: false
- type: textarea
  attributes:
    label: CC List.
    description: >
      The list of people you want to CC.
  validations:
    required: false
- type: textarea
  attributes:
    label: Any Other Things.
    description: >
      Any other things you would like to mention.
  validations:
    required: false
- type: markdown
  attributes:
    value: >
      Thanks for contributing 🎉!
2 changes: 2 additions & 0 deletions .github/scripts/run-tests
@@ -117,6 +117,8 @@ do
        CUDA_VISIBLE_DEVICES=0,1 pytest ${CC_PYTEST_FLAGS} --junitxml=${RESULT_XML} ${TEST} || LOCAL_SUCCESS=$?
    elif [[ "${TEST}" == *"test_models_logprobs"* ]]; then
        pytest --forked ${CC_PYTEST_FLAGS} --junitxml=${RESULT_XML} ${TEST} || LOCAL_SUCCESS=$?
    elif [[ "${TEST}" == *"basic_correctness/test_preemption"* ]]; then
        VLLM_TEST_ENABLE_ARTIFICIAL_PREEMPT=1 pytest ${CC_PYTEST_FLAGS} --junitxml=${RESULT_XML} ${TEST} || LOCAL_SUCCESS=$?
    else
        pytest ${CC_PYTEST_FLAGS} --junitxml=${RESULT_XML} ${TEST} || LOCAL_SUCCESS=$?
    fi
6 changes: 3 additions & 3 deletions .github/workflows/build-test.yml
@@ -15,19 +15,19 @@ on:
      build_label:
        description: "requested runner label (specifies instance)"
        type: string
        default: gcp-k8s-build
        default: aws-avx512-192G-4-T4-64G
      build_timeout:
        description: "time limit for build in minutes "
        type: string
        default: "60"
      Gi_per_thread:
        description: 'requested GiB to reserve per thread'
        type: string
        default: "1"
        default: "4"
      nvcc_threads:
        description: "number of threads nvcc build threads"
        type: string
        default: "4"
        default: "8"
      # test related parameters
      test_label_solo:
        description: "requested runner label (specifies instance)"
8 changes: 4 additions & 4 deletions .github/workflows/mypy.yaml
@@ -33,6 +33,7 @@ jobs:
      - name: Mypy
        run: |
          mypy vllm/attention --config-file pyproject.toml
          mypy vllm/core --config-file pyproject.toml
          mypy vllm/distributed --config-file pyproject.toml
          mypy vllm/entrypoints --config-file pyproject.toml
          mypy vllm/executor --config-file pyproject.toml
@@ -42,9 +43,8 @@ jobs:
          mypy vllm/engine --config-file pyproject.toml
          mypy vllm/worker --config-file pyproject.toml
          mypy vllm/spec_decode --config-file pyproject.toml
          mypy vllm/model_executor --config-file pyproject.toml
          mypy vllm/lora --config-file pyproject.toml

          # TODO(sang): Fix nested dir
          mypy vllm/model_executor/*.py --config-file pyproject.toml
          mypy vllm/core/*.py --follow-imports=skip --config-file pyproject.toml
          mypy vllm/logging --config-file pyproject.toml
          mypy vllm/model_executor --config-file pyproject.toml
