vllm.model_executor.layers.quantization.utils.mxfp8_utils

logger `module-attribute`

logger = init_logger(__name__)

mxfp8_quantize

mxfp8_quantize(x: Tensor) -> tuple[Tensor, Tensor]

Source code in vllm/model_executor/layers/quantization/utils/mxfp8_utils.py

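def mxfp8_quantize(x: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
    # Import lazily so that flashinfer is only required when MX-FP8 is used.
    # The alias avoids shadowing this wrapper's own name inside its body.
    try:
        from flashinfer import mxfp8_quantize as flashinfer_mxfp8_quantize
    except ImportError as err:
        raise ImportError(
            "The package `flashinfer` is required to do "
            "MX-FP8 quantization. Please install it with "
            "`pip install flashinfer`"
        ) from err

    # Request the plain (non-swizzled) scale-factor layout.
    return flashinfer_mxfp8_quantize(x, is_sf_swizzled_layout=False)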
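
The wrapper above hands all the numerical work to flashinfer, so the page does not show what the two returned tensors hold. As a rough illustration only, the pure-Python sketch below follows the general microscaling (MX) idea: values are grouped into blocks of 32, and each block shares one power-of-two scale chosen so that the scaled values fit the FP8 E4M3 range. The helper names `block_scale_exponent` and `scale_blocks` are hypothetical, and the scale rule (floor of log2 of the block maximum, minus 8) is an assumption taken from the common MX formulation. flashinfer's kernel may round, store, or lay out the scales differently.

```python
import math

BLOCK_SIZE = 32      # assumed MX block size (elements sharing one scale)
E4M3_MAX = 448.0     # largest finite FP8 E4M3 magnitude
E4M3_EMAX = 8        # exponent of the largest E4M3 power of two (2**8)


def block_scale_exponent(block):
    """Return the shared power-of-two exponent for one block of floats."""
    max_abs = max(abs(v) for v in block)
    if max_abs == 0.0:
        return 0  # arbitrary choice for an all-zero block (assumption)
    # frexp gives max_abs = m * 2**e with 0.5 <= m < 1, so floor(log2) is e - 1.
    _, e = math.frexp(max_abs)
    return (e - 1) - E4M3_EMAX


def scale_blocks(values):
    """Split values into 32-element blocks; return (scaled values, exponents)."""
    if len(values) % BLOCK_SIZE != 0:
        raise ValueError("length must be a multiple of 32")
    scaled, exponents = [], []
    for start in range(0, len(values), BLOCK_SIZE):
        block = values[start:start + BLOCK_SIZE]
        exp = block_scale_exponent(block)
        scale = 2.0 ** exp
        # Clamp to the E4M3 range; a real kernel would also round to FP8 here.
        scaled.extend(max(-E4M3_MAX, min(E4M3_MAX, v / scale)) for v in block)
        exponents.append(exp)
    return scaled, exponents
```

In this sketch, `scaled` plays the role of the quantized tensor and `exponents` the role of the per-block scale tensor. Rounding each scaled value to an actual FP8 bit pattern is deliberately left out.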