## Weights Information
The model's weights are bundled inside model.pt (17 GB).
In the AWS Neuron reference format, weights are typically stored separately, one safetensors file per tensor-parallel rank:
- weights/tp0_sharded_checkpoint.safetensors
- weights/tp1_sharded_checkpoint.safetensors
To extract the weights into safetensors format, you would need to:
- Load the model using optimum-neuron
- Extract the state_dict
- Convert to safetensors format
- Shard by tensor-parallel rank
This is currently not straightforward for compiled Neuron models, because the weights are embedded in the compiled model.pt rather than stored as standalone tensors.
## Current Structure
The model.pt file contains:
- Compiled graphs (NEFF format)
- Model weights (optimized for Neuron)
- Runtime metadata
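Because model.pt bundles graphs, weights, and metadata in one 17 GB file, it can help to see which entries take up the space before deciding how to extract anything. This is a minimal sketch, assuming model.pt was written in PyTorch's default zip-based serialization format (used by both torch.save and torch.jit.save in recent PyTorch releases). It only reads the archive index with Python's zipfile module, so it does not need the Neuron runtime. The function name largest_entries is hypothetical.

```python
import zipfile
from typing import List, Tuple


def largest_entries(path: str, top_n: int = 10) -> List[Tuple[str, int]]:
    """Return the top_n largest members of a zip archive as (name, size_in_bytes)."""
    with zipfile.ZipFile(path) as archive:
        # infolist() reads only the central directory, not the member data
        entries = [(info.filename, info.file_size) for info in archive.infolist()]
    # Largest uncompressed size first
    return sorted(entries, key=lambda e: e[1], reverse=True)[:top_n]
```

Calling largest_entries("model.pt") returns up to ten (name, size in bytes) pairs, largest first; if the assumption about the format is wrong, zipfile raises BadZipFile.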
The separate directories contain:
- context_encoding_model/: NEFF files for context encoding
- token_generation_model/: NEFF files for token generation
- layout_opt/: layout optimization artifacts
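Before loading, a quick check that a model directory matches the structure described above can save a confusing load error later. This sketch assumes NEFF files use the .neff extension and may sit in nested subdirectories; the function name check_layout and the shape of the returned dictionary are illustrative, not part of optimum-neuron.

```python
from pathlib import Path
from typing import Dict, List

# Directory names taken from the structure listed above
EXPECTED_DIRS = ["context_encoding_model", "token_generation_model", "layout_opt"]


def check_layout(model_dir: str) -> Dict[str, object]:
    """Report missing artifacts and count .neff files in each expected directory."""
    root = Path(model_dir)
    missing: List[str] = []
    if not (root / "model.pt").is_file():
        missing.append("model.pt")
    neff_counts: Dict[str, int] = {}
    for name in EXPECTED_DIRS:
        sub = root / name
        if not sub.is_dir():
            missing.append(name + "/")
            continue
        # rglob searches nested subdirectories as well
        neff_counts[name] = sum(1 for _ in sub.rglob("*.neff"))
    return {"missing": missing, "neff_counts": neff_counts}
```

An empty "missing" list means the expected files are present; the NEFF counts report how many compiled graphs each directory holds.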
## Usage
Load this model using:
```python
from optimum.neuron import NeuronModelForCausalLM

# Load the precompiled Neuron artifacts from the local model directory
model = NeuronModelForCausalLM.from_pretrained("path/to/model")
```