vllm.utils ¶
Modules:
| Name | Description |
|---|---|
| argparse_utils | Argument parsing utilities for vLLM. |
| async_utils | Contains helpers related to asynchronous code. |
| cache | |
| collection_utils | Contains helpers that are applied to collections. |
| deep_gemm | Compatibility wrapper for DeepGEMM API changes. |
| flashinfer | Compatibility wrapper for FlashInfer API changes. |
| func_utils | Contains helpers that are applied to functions. |
| gc_utils | |
| hashing | |
| import_utils | Contains helpers related to importing modules. |
| jsontree | Helper functions to work with nested JSON structures. |
| math_utils | Math utility functions for vLLM. |
| mem_constants | |
| mem_utils | |
| nccl | |
| network_utils | |
| platform_utils | |
| profiling | |
| serial_utils | |
| system_utils | |
| tensor_schema | |
| torch_utils | |
MULTIMODAL_MODEL_MAX_NUM_BATCHED_TOKENS module-attribute ¶
POOLING_MODEL_MAX_NUM_BATCHED_TOKENS module-attribute ¶
_DEPRECATED_MAPPINGS module-attribute ¶
_DEPRECATED_MAPPINGS = {
    "cprofile": "profiling",
    "cprofile_context": "profiling",
    "get_open_port": "network_utils",
}
AtomicCounter ¶
An atomic, thread-safe counter
Source code in vllm/utils/__init__.py
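The page does not show how the counter is built, so the following is only a minimal sketch of one common way to make a counter thread-safe: guard every read and write with a `threading.Lock`. The class name `SketchAtomicCounter` and its `inc`, `dec`, and `value` members are hypothetical and are not claimed to match vLLM's `AtomicCounter`.

```python
import threading


class SketchAtomicCounter:
    """Hypothetical lock-based counter; not vLLM's implementation."""

    def __init__(self, initial: int = 0) -> None:
        self._value = initial
        self._lock = threading.Lock()

    def inc(self, num: int = 1) -> int:
        # The lock makes the read-modify-write sequence indivisible.
        with self._lock:
            self._value += num
            return self._value

    def dec(self, num: int = 1) -> int:
        with self._lock:
            self._value -= num
            return self._value

    @property
    def value(self) -> int:
        with self._lock:
            return self._value


def hammer(counter: SketchAtomicCounter, times: int) -> None:
    """Increment the shared counter `times` times."""
    for _ in range(times):
        counter.inc()


counter = SketchAtomicCounter()
threads = [
    threading.Thread(target=hammer, args=(counter, 1000)) for _ in range(8)
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(counter.value)  # 8000: no increments are lost
```

Without the lock, concurrent `+=` operations could interleave and drop updates; with it, eight threads of 1000 increments each always total 8000.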
Counter ¶
Device ¶
LayerBlockType ¶
__dir__ ¶
__getattr__ ¶
Module-level getattr to handle deprecated utilities.
Source code in vllm/utils/__init__.py
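As a rough illustration of the technique, a module-level `__getattr__` (PEP 562) is called only when normal attribute lookup on the module fails, which makes it a convenient hook for forwarding moved names. The sketch below assumes a mapping in the same shape as `_DEPRECATED_MAPPINGS` (old name to new module), but every name in it is hypothetical: it forwards to standard-library modules so it runs without vLLM, and it is not the actual vLLM code.

```python
import importlib
import types
import warnings

# Hypothetical mapping: deprecated attribute name -> module that now hosts it.
# Standard-library modules stand in for the real submodules.
_SKETCH_DEPRECATED = {"sqrt": "math", "dumps": "json"}


def _sketch_getattr(name: str):
    """Fallback lookup, invoked only when the module lacks `name`."""
    if name in _SKETCH_DEPRECATED:
        target = _SKETCH_DEPRECATED[name]
        warnings.warn(
            f"legacy_utils.{name} is deprecated; import it from {target}",
            DeprecationWarning,
            stacklevel=2,
        )
        return getattr(importlib.import_module(target), name)
    raise AttributeError(f"module 'legacy_utils' has no attribute {name!r}")


# Build a throwaway module object so the sketch fits in one file. In a real
# package, `__getattr__` would simply be defined at the top level of the module.
legacy_utils = types.ModuleType("legacy_utils")
legacy_utils.__getattr__ = _sketch_getattr

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = legacy_utils.sqrt(9.0)

print(result)  # 3.0
print(caught[0].category.__name__)  # DeprecationWarning
```

Raising `AttributeError` for unknown names matters: it keeps `hasattr` and `from module import name` behaving as they would for an ordinary module.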
length_from_prompt_token_ids_or_embeds ¶
length_from_prompt_token_ids_or_embeds(
    prompt_token_ids: list[int] | None,
    prompt_embeds: Tensor | None,
) -> int
Calculate the request length (in number of tokens) given either prompt_token_ids or prompt_embeds.
Source code in vllm/utils/__init__.py
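The signature suggests the length is taken from whichever input is present. The following sketch shows one plausible reading and is not the library's code. It assumes that the first dimension of the embeddings is the token dimension, that supplying neither input is an error, and that supplying both requires the lengths to agree. The function name `sketch_prompt_length` is hypothetical, and a nested list stands in for a `Tensor` so the example runs without PyTorch.

```python
from __future__ import annotations

from collections.abc import Sized


def sketch_prompt_length(
    prompt_token_ids: list[int] | None,
    prompt_embeds: Sized | None,
) -> int:
    """Hypothetical sketch: request length in tokens from ids or embeddings."""
    ids_len = None if prompt_token_ids is None else len(prompt_token_ids)
    # Assumption: len() of the embeddings gives the number of tokens.
    embeds_len = None if prompt_embeds is None else len(prompt_embeds)

    if ids_len is None and embeds_len is None:
        raise ValueError("Provide prompt_token_ids or prompt_embeds.")
    if ids_len is not None and embeds_len is not None and ids_len != embeds_len:
        # Assumption: conflicting lengths are treated as an error.
        raise ValueError(
            f"Length mismatch: {ids_len} token ids vs {embeds_len} embeddings."
        )
    return ids_len if ids_len is not None else embeds_len


from_ids = sketch_prompt_length([101, 2023, 102], None)
from_embeds = sketch_prompt_length(None, [[0.1, 0.2], [0.3, 0.4]])
print(from_ids, from_embeds)  # 3 2
```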
warn_for_unimplemented_methods ¶
A replacement for abc.ABC. With abc.ABC, a subclass fails to instantiate unless it implements every abstract method. Here, the base class only needs to raise NotImplementedError in each such method, and a warning is logged if a subclass does not implement it.
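One way such a decorator could work is sketched below; it is an illustration of the idea, not vLLM's implementation. The decorator name `sketch_warn_for_unimplemented`, the logger name, and the stub-detection rule are all assumptions: a base-class method counts as a stub if its bytecode references `NotImplementedError`, and a subclass counts as "not implementing" it if the attribute found on the subclass is still that same function object.

```python
import functools
import inspect
import logging

logger = logging.getLogger("unimplemented_sketch")  # hypothetical logger name


def sketch_warn_for_unimplemented(cls):
    """Hypothetical class decorator: warn at instantiation, do not fail."""
    # Assumption: a method is a stub if its code mentions NotImplementedError.
    stubs = {
        name: func
        for name, func in vars(cls).items()
        if inspect.isfunction(func)
        and "NotImplementedError" in func.__code__.co_names
    }
    original_init = cls.__init__

    @functools.wraps(original_init)
    def wrapped_init(self, *args, **kwargs):
        original_init(self, *args, **kwargs)
        # A stub is still missing if the subclass resolves to the base function.
        missing = sorted(
            name
            for name, func in stubs.items()
            if getattr(type(self), name, None) is func
        )
        if missing:
            logger.warning(
                "%s does not implement: %s",
                type(self).__name__,
                ", ".join(missing),
            )

    cls.__init__ = wrapped_init
    return cls


@sketch_warn_for_unimplemented
class Worker:
    def load(self):
        raise NotImplementedError

    def run(self):
        raise NotImplementedError


class PartialWorker(Worker):
    def load(self):
        return "loaded"


# Capture log messages in a list so the effect is visible.
captured = []


class _ListHandler(logging.Handler):
    def emit(self, record):
        captured.append(record.getMessage())


logger.addHandler(_ListHandler())

worker = PartialWorker()  # instantiation succeeds, unlike with abc.ABC
print(captured)  # ['PartialWorker does not implement: run']
```

The trade-off against `abc.ABC` is that a partially implemented subclass can still be constructed and used; the missing method only fails if it is actually called.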