vllm.model_executor.models.transformers_pooling ¶
Wrapper around transformers models for pooling tasks.
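The classes below back vLLM's Transformers fallback for pooling models. As a minimal offline sketch (the checkpoint name is only an example, and model_impl="transformers" assumes a vLLM build that exposes that engine argument):

from vllm import LLM

# Example checkpoint (assumption): any BERT/RoBERTa-style embedding model
# supported by the Transformers backend would work the same way.
llm = LLM(
    model="BAAI/bge-base-en-v1.5",
    task="embed",                 # route requests to the "embed" pooler
    model_impl="transformers",    # force the Transformers fallback
)

outputs = llm.embed(["vLLM wraps transformers models for pooling."])
print(len(outputs[0].outputs.embedding))  # embedding dimensionality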
TransformersEmbeddingModel ¶
Bases: TransformersPoolingBase
pooler instance-attribute ¶
pooler = DispatchPooler(
    {
        "encode": for_encode(pooler_config),
        "embed": for_embed(pooler_config),
    }
)
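DispatchPooler selects a pooler based on the task attached to each request: "embed" yields one pooled vector per prompt, while "encode" returns the raw pooling output. A hedged sketch, reusing the llm from the module-level example above and assuming PoolingRequestOutput exposes its tensor as outputs.data:

# "embed" -> pooled sentence embedding (the for_embed pooler)
emb = llm.embed(["a single sentence"])[0].outputs.embedding

# "encode" -> generic pooling output (the for_encode pooler);
# llm.encode() is vLLM's lower-level entry point for pooling requests
tok = llm.encode(["a single sentence"])[0].outputs.data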
__init__ ¶
__init__(*, vllm_config: VllmConfig, prefix: str = '')
TransformersForSequenceClassification ¶
Bases: TransformersPoolingBase
pooler instance-attribute ¶
pooler = DispatchPooler(
    {
        "encode": for_encode(pooler_config),
        "classify": ClassifierPooler(
            pooling=CLSPool(),
            classifier=classifier,
            act_fn=act_fn_for_seq_cls(model_config),
        ),
        "score": ClassifierPooler(
            pooling=CLSPool(),
            classifier=classifier,
            act_fn=act_fn_for_cross_encoder(model_config),
        ),
    }
)
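Both the "classify" and "score" entries pool the CLS position and run it through the classification head; they differ only in the activation applied on top (sequence-classification vs. cross-encoder conventions). A usage sketch for the scoring path (the checkpoint name is illustrative):

from vllm import LLM

# Illustrative cross-encoder checkpoint (assumption).
llm = LLM(
    model="cross-encoder/ms-marco-MiniLM-L-6-v2",
    task="score",
    model_impl="transformers",
)

# score() pairs a query with a document through the cross-encoder head;
# classify() would instead return per-class outputs via the "classify" pooler.
out = llm.score("what is vllm?", "vLLM is a fast LLM inference engine.")
print(out[0].outputs.score)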
__init__ ¶
__init__(*, vllm_config: VllmConfig, prefix: str = '')
TransformersMoEEmbeddingModel ¶
Bases: TransformersMoEBase, TransformersEmbeddingModel
TransformersMoEForSequenceClassification ¶
Bases: TransformersMoEBase, TransformersForSequenceClassification
TransformersPoolingBase ¶
Bases: TransformersBase, VllmModelForPooling
hf_to_vllm_mapper class-attribute instance-attribute ¶
hf_to_vllm_mapper = WeightsMapper(
    orig_to_new_prefix={
        "roberta": "model",
        "bert": "model",
        "": "model.",
        "model.model.": "model.",
        "model.score": "classifier",
        "model.classifier": "classifier",
    },
    orig_to_new_suffix={
        ".gamma": ".weight",
        ".beta": ".bias",
    },
)
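The mapper renames Hugging Face checkpoint weights into the wrapper's namespace: the encoder prefix ("bert"/"roberta") becomes "model", bare names gain a "model." prefix (with the doubled "model.model." collapsed back down), the classification head is exposed as "classifier", and legacy TF-style LayerNorm names (".gamma"/".beta") become ".weight"/".bias". An illustrative re-implementation of those rules (a sketch, not vLLM's actual WeightsMapper, assuming each matching rule is applied in order, a reading consistent with the rule list above):

# Illustrative re-implementation of the renaming rules, for demonstration only.
PREFIX_RULES = [
    ("roberta", "model"),
    ("bert", "model"),
    ("", "model."),              # prefix bare HF names with "model."
    ("model.model.", "model."),  # collapse the doubled prefix this creates
    ("model.score", "classifier"),
    ("model.classifier", "classifier"),
]
SUFFIX_RULES = [
    (".gamma", ".weight"),  # legacy TF-style LayerNorm parameter names
    (".beta", ".bias"),
]

def map_name(name: str) -> str:
    for old, new in PREFIX_RULES:
        if name.startswith(old):
            name = new + name[len(old):]
    for old, new in SUFFIX_RULES:
        if name.endswith(old):
            name = name[:-len(old)] + new
    return name

print(map_name("bert.encoder.layer.0.output.LayerNorm.gamma"))
# -> model.encoder.layer.0.output.LayerNorm.weight
print(map_name("classifier.weight"))
# -> classifier.weight (head weights round-trip to "classifier")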
__init__ ¶
__init__(*, vllm_config: VllmConfig, prefix: str = '')
create_attention_instances ¶
create_attention_instances(
    attn_type: AttentionType = DECODER,
) -> dict[int, Attention]
forward ¶
forward(
    input_ids: Optional[Tensor],
    positions: Tensor,
    intermediate_tensors: Optional[IntermediateTensors] = None,
    inputs_embeds: Optional[Tensor] = None,
) -> Union[Tensor, IntermediateTensors]
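As in vLLM's other model implementations, callers pass either input_ids or precomputed inputs_embeds (hence both are Optional), and intermediate_tensors is only supplied on pipeline-parallel ranks that receive activations from an upstream stage, in which case the method returns IntermediateTensors rather than a final Tensor.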