Gemma 4 Unique Technical Features
Gemma 4's key innovations include configurable image token budgets, native bounding box detection, Per-Layer Embeddings, shared KV cache, and native function calling.
Gemma 4 introduces several notable technical features: 1. Configurable image token budget (70, 140, 280, 560, 1120 tokens per image) — trade speed vs detail without pre-processing images. Use low budgets for classification, high budgets for OCR. 2. Native JSON bounding box output for object detection and GUI element pointing (1000x1000 coordinate space) — useful for UI automation and screen-parsing agents, no special prompting needed. 3. Per-Layer Embeddings (PLE) — a secondary embedding pathway that feeds token-specific signals into every decoder layer, allowing smaller models to punch above their weight. 4. Shared KV Cache — last N layers reuse key/value tensors from earlier layers, reducing memory and compute for long-context generation. 5. Variable aspect ratio support in vision encoder. 6. Native function calling and structured JSON output. 7. 140+ language support.