Qwen2.5-3B-Instruct is an instruction-tuned language model in the Qwen2.5 series, with approximately 3.09 billion parameters. The main features of this model include:
Model Type: Causal Language Model
Training Stages: Pre-training and post-training
Architecture: Transformer, adopting RoPE, SwiGLU, RMSNorm, Attention QKV bias, and tied word embeddings
Number of Parameters: 3.09 billion (2.77 billion non-embedding parameters)
Number of Layers: 36 layers
Number of Attention Heads (GQA): 16 query heads, 2 key-value heads
Context Length: Full 32,768 tokens, with generation of up to 8,192 tokens
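The practical payoff of grouped-query attention is a much smaller KV cache at long context lengths. The sketch below estimates this from the spec above (36 layers, 2 KV heads, 32,768-token context); the head dimension of 128 and fp16 cache precision are assumptions, not stated in the spec.

```python
# Sketch: KV-cache savings from grouped-query attention (GQA).
# Layer count, KV-head count, and context length come from the spec above;
# head_dim=128 and 2-byte (fp16) elements are assumptions for illustration.

def kv_cache_bytes(num_kv_heads, head_dim, num_layers, seq_len, bytes_per_elem=2):
    # Two cached tensors (K and V) per layer, each [seq_len, num_kv_heads, head_dim]
    return 2 * num_layers * seq_len * num_kv_heads * head_dim * bytes_per_elem

gqa = kv_cache_bytes(num_kv_heads=2, head_dim=128, num_layers=36, seq_len=32_768)
mha = kv_cache_bytes(num_kv_heads=16, head_dim=128, num_layers=36, seq_len=32_768)

print(f"GQA cache at full context: {gqa / 2**30:.2f} GiB")
print(f"Full-MHA cache would be:   {mha / 2**30:.2f} GiB")
print(f"Reduction factor: {mha // gqa}x")
```

Under these assumptions, caching with 2 KV heads instead of 16 shrinks the cache eightfold (roughly 1.1 GiB versus 9 GiB in fp16 at the full 32,768-token context), which is what makes long-context inference feasible on a 3B model.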
This model shows significant improvements in instruction following, long-text generation, and structured-data understanding, and supports 29 languages, including English, Chinese, and French.