When will an NVFP4 version be available?
Is it possible to release NVFP4 versions of both models? Currently we only have FP8.
Not planned, however llm-compressor is a cool library for doing quantization yourself if you'd like to try!
Shouldn't a quality quantization be done from the BF16 version, which Mistral hasn't released, rather than from FP8? I guess there is no other way if only the FP8 model is released. Thanks for the answer. :)
I was also thinking of fine-tuning the model, but the script asks for 'float16', 'bfloat16', or 'float32' weights, so I can't use this model as-is. Other Mistral models are released in BF16; why not this one?
"LLM Compressor does support NVFP4 quantization, but it does not support taking an already‑FP8 model and directly turning it into NVFP4; you need to start from an unquantized (FP16/BF16/FP32) checkpoint."
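For reference, a oneshot NVFP4 run with llm-compressor looks roughly like the following. This is a hedged sketch patterned on the library's published FP4 examples, not a tested recipe: the model path is a placeholder, and the exact scheme name and `ignore` list should be checked against the current llm-compressor documentation.

```python
# Sketch: NVFP4 quantization with llm-compressor, starting from an
# UNQUANTIZED (BF16) checkpoint -- it cannot start from the FP8 release.
from transformers import AutoModelForCausalLM

from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

MODEL_ID = "path/to/unquantized-bf16-checkpoint"  # placeholder path

model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="bfloat16")

# Quantize all Linear layers to NVFP4, keeping the LM head in high precision.
recipe = QuantizationModifier(
    targets="Linear", scheme="NVFP4", ignore=["lm_head"]
)

oneshot(model=model, recipe=recipe)
model.save_pretrained(MODEL_ID + "-NVFP4", save_compressed=True)
```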
Hey! To get BF16, one way would be to simply descale the weights, i.e. multiply each quantized weight tensor by its stored scale :)
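The descale step is just `W ≈ W_q * scale`: FP8 checkpoints store the quantized values plus a scale (per tensor, channel, or block), so recovering a higher-precision weight is a single multiply. Here is a toy illustration in plain Python, with a coarse rounding grid standing in for a real float8 cast; the function names are illustrative, not the actual checkpoint keys.

```python
FP8_E4M3_MAX = 448.0  # largest finite value representable in float8_e4m3

def quantize_fp8_like(weights):
    """Scale weights into the FP8 range and round coarsely (a stand-in for
    the real float8 cast). Returns (quantized values, scale)."""
    amax = max(abs(w) for w in weights)
    scale = amax / FP8_E4M3_MAX
    quantized = [round(w / scale, 1) for w in weights]  # coarse rounding
    return quantized, scale

def descale_to_bf16_like(quantized, scale):
    """The 'descale' step: multiply the stored quantized values by the
    stored scale to recover (approximately) the original weights."""
    return [q * scale for q in quantized]

weights = [0.5, -1.25, 2.0, -0.031]
q, s = quantize_fp8_like(weights)
recovered = descale_to_bf16_like(q, s)
# `recovered` matches `weights` up to the rounding error of the fake grid.
```

Note the caveat raised below in the thread: the recovered tensor is only approximately the original, since the FP8 cast itself is lossy.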
Oh, and to answer the second part of the question: we don't release BF16 checkpoints anymore, to limit the number of checkpoints we publish, make it easier to identify our models, and avoid cluttering our org with duplicates. For all inference cases it is strictly better to use FP8, since our models are natively trained to handle this format, so it is a free memory gain.
As per my comment above, it is quite easy to recover a BF16 model from an FP8 one, so this shouldn't be a blocker!
So you are not willing to publish the BF16, and not willing to make an NVFP4 from the original BF16 for your users, so users have to work around it by casting the FP8 back to a non-original BF16 in order to produce an NVFP4. I am afraid the BF16 recovered from FP8 would not be identical to your private BF16, since FP8 quantization is lossy, and that would degrade the NVFP4 quality.