vllm.v1.sample.logits_processor.builtin
LogitBiasLogitsProcessor
Bases: LogitsProcessor
Source code in vllm/v1/sample/logits_processor/builtin.py
logits_slice instance-attribute
__init__
_device_tensor
apply
is_argmax_invariant
is_argmax_invariant() -> bool
Logit bias can rebalance token probabilities and change the outcome of argmax in greedy sampling.
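The effect described above can be illustrated with a minimal, framework-free sketch. The helpers `apply_logit_bias` and `argmax` are hypothetical names introduced only for this illustration; they are not part of vLLM, and the real processor operates on batched device tensors rather than Python lists.

```python
def apply_logit_bias(logits: list[float], bias: dict[int, float]) -> list[float]:
    """Return a copy of `logits` with bias[token_id] added to each listed token.

    Hypothetical helper for illustration only; not a vLLM function.
    """
    biased = list(logits)
    for token_id, delta in bias.items():
        biased[token_id] += delta
    return biased


def argmax(values: list[float]) -> int:
    """Index of the largest value (the first one wins on ties)."""
    return max(range(len(values)), key=values.__getitem__)


logits = [2.0, 1.5, 0.3]
biased = apply_logit_bias(logits, {1: 1.0})  # push token 1 up by 1.0

greedy_before = argmax(logits)  # token 0 is the greedy choice
greedy_after = argmax(biased)   # token 1 overtakes it after the bias
```

Because a bias can reorder the top tokens like this, a processor of this kind cannot be skipped for greedy requests.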
update_state
update_state(batch_update: Optional[BatchUpdate])
MinPLogitsProcessor
Bases: LogitsProcessor
min_p_cpu_tensor instance-attribute
min_p_cpu_tensor = zeros(
    (max_num_reqs,),
    dtype=float32,
    device="cpu",
    pin_memory=is_pin_memory,
)
min_p_device instance-attribute
__init__
__init__(
    vllm_config: VllmConfig,
    device: device,
    is_pin_memory: bool,
)
apply
get_min_p_by_index
update_state
update_state(batch_update: Optional[BatchUpdate])
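For orientation, min-p filtering is commonly defined as discarding every token whose probability falls below `min_p` times the probability of the most likely token. The sketch below shows that rule on a single request with plain Python lists. The function names are hypothetical, and this is an assumption about the general technique, not a copy of the `apply` method documented here.

```python
import math


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax over one request's logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def min_p_filter(logits: list[float], min_p: float) -> list[float]:
    """Mask tokens whose probability is below min_p * (max probability).

    Hypothetical helper for illustration only; not a vLLM function.
    """
    probs = softmax(logits)
    threshold = min_p * max(probs)
    return [x if p >= threshold else -math.inf for x, p in zip(logits, probs)]


logits = [3.0, 2.0, -1.0]
filtered = min_p_filter(logits, min_p=0.2)  # the unlikely last token is masked
unfiltered = min_p_filter(logits, min_p=0.0)  # min_p = 0 keeps every token
```

For `min_p` between 0 and 1 the most likely token always passes the threshold, so under this definition the greedy choice is unchanged.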
MinTokensLogitsProcessor
Bases: LogitsProcessor
logits_slice instance-attribute
__init__
__init__(
    vllm_config: VllmConfig,
    device: device,
    is_pin_memory: bool,
)
_device_tensor
add_request staticmethod
add_request(
    params: SamplingParams,
    _: Optional[list[int]],
    output_tok_ids: list[int],
) -> Optional[tuple[int, Sequence[int], set[int]]]
apply
is_argmax_invariant
is_argmax_invariant() -> bool
By censoring stop tokens, min-tokens can change the outcome of the argmax operation in greedy sampling.
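A small sketch of why this happens: while a request has produced fewer than its minimum number of tokens, the stop tokens are masked to negative infinity, so a stop token that would otherwise win the argmax is replaced by the runner-up. The helper names and the list-based representation are assumptions made for this example and do not mirror the tensor-based implementation.

```python
import math


def mask_stop_tokens(
    logits: list[float],
    num_output_tokens: int,
    min_tokens: int,
    stop_token_ids: set[int],
) -> list[float]:
    """Mask stop tokens until at least `min_tokens` tokens were generated.

    Hypothetical helper for illustration only; not a vLLM function.
    """
    if num_output_tokens >= min_tokens:
        return list(logits)  # minimum reached: leave the logits untouched
    return [-math.inf if i in stop_token_ids else x for i, x in enumerate(logits)]


def argmax(values: list[float]) -> int:
    """Index of the largest value (the first one wins on ties)."""
    return max(range(len(values)), key=values.__getitem__)


logits = [0.1, 4.0, 1.2]  # token 1 is a stop token and the current top choice
early = mask_stop_tokens(logits, num_output_tokens=2, min_tokens=5, stop_token_ids={1})
late = mask_stop_tokens(logits, num_output_tokens=5, min_tokens=5, stop_token_ids={1})

greedy_early = argmax(early)  # stop token censored, token 2 is chosen instead
greedy_late = argmax(late)    # minimum reached, the stop token may be chosen
```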
update_state
update_state(batch_update: Optional[BatchUpdate])
process_dict_updates
process_dict_updates(
    req_entries: dict[int, T],
    batch_update: Optional[BatchUpdate],
    new_state: Callable[
        [SamplingParams, Optional[list[int]], list[int]],
        Optional[T],
    ],
) -> bool
Utility function to update dict state for sparse LogitsProcessors.
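The idea behind a dict-based ("sparse") state update can be sketched as follows: only requests that actually need the processor get an entry keyed by batch index, and each batch update adds, drops, or relocates those entries. Everything below is a simplified stand-in written for illustration. `ToyBatchUpdate`, its field layout, the processing order, and `update_sparse_state` are assumptions, not the real `BatchUpdate` type or the body of `process_dict_updates`.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


@dataclass
class ToyBatchUpdate:
    """Hypothetical stand-in for BatchUpdate; the field layout is assumed."""
    removed: list[int] = field(default_factory=list)
    # (batch index, params, prompt token ids, output token ids)
    added: list[tuple[int, dict, Optional[list[int]], list[int]]] = field(
        default_factory=list)
    # (source index, destination index, is_swap)
    moved: list[tuple[int, int, bool]] = field(default_factory=list)


def update_sparse_state(
    req_entries: dict[int, T],
    batch_update: Optional[ToyBatchUpdate],
    new_state: Callable[[dict, Optional[list[int]], list[int]], Optional[T]],
) -> bool:
    """Apply one batch update to `req_entries`; return True if it changed."""
    if batch_update is None:
        return False
    changed = False
    for index in batch_update.removed:
        if req_entries.pop(index, None) is not None:
            changed = True
    for index, params, prompt_ids, output_ids in batch_update.added:
        state = new_state(params, prompt_ids, output_ids)
        if state is not None:
            req_entries[index] = state  # request needs this processor
            changed = True
        elif req_entries.pop(index, None) is not None:
            changed = True  # slot reused by a request that does not need it
    for src, dst, is_swap in batch_update.moved:
        src_entry = req_entries.pop(src, None)
        dst_entry = req_entries.pop(dst, None)
        if src_entry is not None:
            req_entries[dst] = src_entry
            changed = True
        if dst_entry is not None:
            changed = True
            if is_swap:
                req_entries[src] = dst_entry
    return changed


def new_state(params: dict, _prompt_ids, output_ids: list[int]):
    """Toy factory: track a request only while it is below `min_tokens`."""
    min_tokens = params.get("min_tokens", 0)
    return (min_tokens, output_ids) if min_tokens > len(output_ids) else None


entries: dict[int, tuple[int, list[int]]] = {}
first = ToyBatchUpdate(added=[(0, {"min_tokens": 4}, None, []), (1, {}, None, [])])
changed_first = update_sparse_state(entries, first, new_state)
second = ToyBatchUpdate(moved=[(0, 1, False)])
changed_second = update_sparse_state(entries, second, new_state)
changed_none = update_sparse_state(entries, None, new_state)
```

In this sketch the boolean return value lets a caller rebuild any derived device tensors only when the dict actually changed.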