All your AI models in one API—LLM, TTS, image gen, instantly.
`nomic-embed-text` is a large-context-length text encoder that surpasses OpenAI's `text-embedding-ada-002` and `text-embedding-3-small` on short- and long-context tasks.
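A minimal sketch of generating an embedding, assuming the `ollama` Python client against a local server (the exact client and endpoint for this API may differ):

```python
import ollama

# Request a single dense vector for a passage.
resp = ollama.embeddings(
    model="nomic-embed-text",
    prompt="The quick brown fox jumps over the lazy dog.",
)
vector = resp["embedding"]
print(len(vector))  # dimensionality of the returned embedding
```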
As of March 2024, this model achieves SOTA performance for BERT-large-sized models on the MTEB benchmark. It outperforms commercial models like OpenAI's `text-embedding-3-large` and matches the performance of models 20x its size. `mxbai-embed-large` was trained with no overlap with the MTEB data, which indicates that the model generalizes well across several domains, tasks, and text lengths.
`BGE-M3` is based on the XLM-RoBERTa architecture and is distinguished by its versatility in Multi-Functionality, Multi-Linguality, and Multi-Granularity:

* **Multi-Functionality**: It can simultaneously perform the three common retrieval functionalities of embedding models: dense retrieval, multi-vector retrieval, and sparse retrieval.
* **Multi-Linguality**: It supports more than 100 working languages.
* **Multi-Granularity**: It can process inputs of different granularities, spanning from short sentences to long documents of up to 8192 tokens.
`snowflake-arctic-embed` is a suite of text embedding models focused on high-quality retrieval performance. The models are built on existing open-source text representation models, such as `bert-base-uncased`, and trained in a multi-stage pipeline to optimize their retrieval performance. The suite is available in five parameter sizes:

* `snowflake-arctic-embed:335m` (default)
* `snowflake-arctic-embed:137m`
* `snowflake-arctic-embed:110m`
* `snowflake-arctic-embed:33m`
* `snowflake-arctic-embed:22m`
The model is intended to be used as a sentence and short-paragraph encoder. Given an input text, it outputs a vector that captures its semantic information. The sentence vector may be used for information retrieval, clustering, or sentence-similarity tasks.
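Since these sentence vectors are meant for similarity and retrieval, here is a hedged sketch of comparing two sentences by cosine similarity; the model tag is a stand-in for this encoder's actual name, and the `ollama` client is an assumption:

```python
import math
import ollama

MODEL = "all-minilm"  # stand-in tag; substitute this encoder's actual name

def embed(text: str) -> list[float]:
    # One dense vector per input text.
    return ollama.embeddings(model=MODEL, prompt=text)["embedding"]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Semantically close sentences should score near 1.0.
print(cosine(embed("A cat sits on the mat."), embed("A kitten rests on a rug.")))
```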
Snowflake's frontier embedding model. Arctic Embed 2.0 adds multilingual support without sacrificing English performance or scalability.
Gemma is a lightweight family of models from Google built on Gemini technology. The Gemma 3 models are multimodal, processing text and images, and feature a 128K context window with support for over 140 languages. Available in 1B, 4B, 12B, and 27B parameter sizes, they excel at tasks like question answering, summarization, and reasoning, while their compact design allows deployment on resource-limited devices.
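As a sketch of a multimodal request, assuming the `ollama` Python client and an illustrative local image path:

```python
import ollama

# Send text plus an image to a multimodal Gemma 3 variant (4B and up).
resp = ollama.chat(
    model="gemma3:4b",
    messages=[{
        "role": "user",
        "content": "Describe what is happening in this picture.",
        "images": ["photo.png"],  # illustrative path, not part of the model
    }],
)
print(resp["message"]["content"])
```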
The Llama 3.2-Vision collection comprises instruction-tuned multimodal large language models (LLMs) for image reasoning in 11B and 90B sizes (text + images in / text out). The Llama 3.2-Vision instruction-tuned models are optimized for visual recognition, image reasoning, captioning, and answering general questions about an image. The models outperform many of the available open-source and closed multimodal models on common industry benchmarks.

**Supported Languages**: For text-only tasks, English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai are officially supported. Llama 3.2 has been trained on a broader collection of languages than these eight. Note that for image+text applications, English is the only supported language.
`MiniCPM-V 2.6` is the latest and most capable model in the MiniCPM-V series. The model is built on SigLip-400M and Qwen2-7B with a total of 8B parameters. It exhibits a significant performance improvement over MiniCPM-Llama3-V 2.5 and introduces new features for multi-image and video understanding. Notable features of MiniCPM-V 2.6 include:

* **🔥 Leading Performance**: MiniCPM-V 2.6 achieves an average score of 65.2 on the latest version of OpenCompass, a comprehensive evaluation over 8 popular benchmarks. With only 8B parameters, it surpasses widely used proprietary models like GPT-4o mini, GPT-4V, Gemini 1.5 Pro, and Claude 3.5 Sonnet for single-image understanding.
* **🖼️ Multi-Image Understanding and In-Context Learning**: MiniCPM-V 2.6 can also perform conversation and reasoning over multiple images. It achieves state-of-the-art performance on popular multi-image benchmarks such as Mantis-Eval, BLINK, Mathverse mv, and Sciverse mv, and also shows promising in-context learning capability.
* **💪 Strong OCR Capability**: MiniCPM-V 2.6 can process images with any aspect ratio and up to 1.8 million pixels (e.g., 1344x1344). It achieves state-of-the-art performance on OCRBench, surpassing proprietary models such as GPT-4o, GPT-4V, and Gemini 1.5 Pro. Based on the latest RLAIF-V and VisCPM techniques, it features trustworthy behaviors, with significantly lower hallucination rates than GPT-4o and GPT-4V on Object HalBench, and supports multilingual capabilities in English, Chinese, German, French, Italian, Korean, etc.
* **🚀 Superior Efficiency**: In addition to its friendly size, MiniCPM-V 2.6 shows state-of-the-art token density (i.e., number of pixels encoded into each visual token). It produces only 640 tokens when processing a 1.8M-pixel image, 75% fewer than most models, which directly improves inference speed, first-token latency, memory usage, and power consumption.
`llava-llama-3-8b-v1_1` is a LLaVA model fine-tuned from `meta-llama/Meta-Llama-3-8B-Instruct` and `CLIP-ViT-Large-patch14-336` with ShareGPT4V-PT and InternVL-SFT by XTuner.
`Moondream` is an open-source visual language model that understands images using simple text prompts. It's fast, wildly capable, and just 1GB in size.

* **Vision AI at Warp Speed**: Forget everything you thought you needed to know about computer vision. With Moondream, there's no training, no ground-truth data, and no heavy infrastructure. Just a model, a prompt, and a whole world of visual understanding.
* **Ridiculously lightweight**: Under 2B parameters. Quantized to 4-bit. Just 1GB. Moondream runs anywhere, from edge devices to your laptop.
* **Actually affordable**: Run it locally for free. Or use our cloud API to process a high volume of images quickly and cheaply. Free tier included.
* **Simple by design**: Choose a capability. Write a prompt. Get results. That's it. Moondream is designed for developers who don't want to babysit models.
* **Versatile as hell**: Go beyond basic visual Q&A. Moondream can caption, detect objects, locate things, read documents, follow gaze, and more.
* **Tried, tested, trusted**: 6M+ downloads. 8K+ GitHub stars. Used across industries, from healthcare to robotics to mobile apps.
`granite-vision-3.2-2b` is a compact and efficient vision-language model, specifically designed for visual document understanding, enabling automated content extraction from tables, charts, infographics, plots, diagrams, and more. The model was trained on a meticulously curated instruction-following dataset, comprising diverse public datasets and synthetic datasets tailored to support a wide range of document understanding and general image tasks. It was trained by fine-tuning a Granite large language model with both image and text modalities.
**Mistral-Small-3.1-24B-Instruct-2503**: Building upon Mistral Small 3 (2501), Mistral Small 3.1 (2503) adds state-of-the-art vision understanding and enhances long-context capabilities up to 128k tokens without compromising text performance. With 24 billion parameters, this model achieves top-tier capabilities in both text and vision tasks. It is an instruction-finetuned version of `Mistral-Small-3.1-24B-Base-2503`. Mistral Small 3.1 can be deployed locally and is exceptionally "knowledge-dense," fitting within a single RTX 4090 or a 32GB-RAM MacBook once quantized. It is ideal for:

* Fast-response conversational agents.
* Low-latency function calling.
* Subject-matter experts via fine-tuning.
* Local inference for hobbyists and organizations handling sensitive data.
* Programming and math reasoning.
* Long-document understanding.
* Visual understanding.

For enterprises requiring specialized capabilities (increased context, specific modalities, domain-specific knowledge, etc.), we will release commercial models beyond what Mistral AI contributes to the community. Learn more about Mistral Small 3.1 in our [blog post](https://mistral.ai/news/mistral-small-3-1).

**Key Features**

* **Vision**: Vision capabilities enable the model to analyze images and provide insights based on visual content in addition to text.
* **Multilingual**: Supports dozens of languages, including English, French, German, Greek, Hindi, Indonesian, Italian, Japanese, Korean, Malay, Nepali, Polish, Portuguese, Romanian, Russian, Serbian, Spanish, Swedish, Turkish, Ukrainian, Vietnamese, Arabic, Bengali, Chinese, and Farsi.
* **Agent-Centric**: Offers best-in-class agentic capabilities with native function calling and JSON output (see the sketch after this list).
* **Advanced Reasoning**: State-of-the-art conversational and reasoning capabilities.
* **Apache 2.0 License**: Open license allowing usage and modification for both commercial and non-commercial purposes.
* **Context Window**: A 128k context window.
* **System Prompt**: Maintains strong adherence and support for system prompts.
* **Tokenizer**: Utilizes a Tekken tokenizer with a 131k vocabulary size.
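Given the native function-calling support, a minimal tool-use sketch; the `ollama` client, the model tag, and the `get_weather` tool are all assumptions for illustration:

```python
import ollama

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool, defined only for this example
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = ollama.chat(
    model="mistral-small3.1",  # assumed tag; check your library's listing
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)

# Instead of prose, the model should reply with a structured tool call.
for call in resp["message"]["tool_calls"] or []:
    print(call["function"]["name"], call["function"]["arguments"])
```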
The Cogito v1 Preview LLMs are instruction-tuned generative models (text in/text out). All models are released under an open license for commercial use.

* Cogito models are hybrid reasoning models: each model can answer directly (like a standard LLM) or self-reflect before answering (like a reasoning model).
* The LLMs are trained using Iterated Distillation and Amplification (IDA), a scalable and efficient alignment strategy for superintelligence using iterative self-improvement.
* The models have been optimized for coding, STEM, instruction following, and general helpfulness, and have significantly higher multilingual, coding, and tool-calling capabilities than size-equivalent counterparts.
* In both standard and reasoning modes, Cogito v1-preview models outperform their size-equivalent counterparts on common industry benchmarks.
* Each model is trained in over 30 languages and supports a context length of 128k.
`DeepSeek R1 Distill Qwen 1.5B` is a distilled large language model based on `Qwen 2.5 Math 1.5B`, using outputs from DeepSeek R1. It is a very small and efficient model that outperforms GPT-4o-0513 on math benchmarks. Benchmark results include:

* AIME 2024 pass@1: 28.9
* AIME 2024 cons@64: 52.7
* MATH-500 pass@1: 83.9

The model leverages fine-tuning on DeepSeek R1's outputs, enabling performance comparable to larger frontier models.
`DeepSeek R1 Distill Qwen 14B` is a distilled large language model based on Qwen 2.5 14B, using outputs from DeepSeek R1. It outperforms OpenAI's o1-mini across various benchmarks, achieving new state-of-the-art results for dense models. Benchmark results include:

* AIME 2024 pass@1: 69.7
* MATH-500 pass@1: 93.9
* CodeForces rating: 1481

The model leverages fine-tuning on DeepSeek R1's outputs, enabling performance comparable to larger frontier models.
`DeepSeek R1 Distill Qwen 32B` is a distilled large language model based on Qwen 2.5 32B, using outputs from DeepSeek R1. It outperforms OpenAI's o1-mini across various benchmarks, achieving new state-of-the-art results for dense models. Benchmark results include:

* AIME 2024 pass@1: 72.6
* MATH-500 pass@1: 94.3
* CodeForces rating: 1691

The model leverages fine-tuning on DeepSeek R1's outputs, enabling performance comparable to larger frontier models.
`DeepSeek R1 Distill Llama 70B` is a distilled large language model based on `Llama-3.3-70B-Instruct`, using outputs from DeepSeek R1. The model combines advanced distillation techniques to achieve high performance across multiple benchmarks, including:

* AIME 2024 pass@1: 70.0
* MATH-500 pass@1: 94.5
* CodeForces rating: 1633

The model leverages fine-tuning on DeepSeek R1's outputs, enabling performance comparable to larger frontier models.
DeepSeek's first generation of reasoning models, with performance comparable to OpenAI o1, including six dense models distilled from DeepSeek-R1 based on Llama and Qwen.
`DeepSeek R1 Distill Llama 8B` is a distilled large language model based on `Llama-3.1-8B-Instruct`, using outputs from DeepSeek R1. The model combines advanced distillation techniques to achieve high performance across multiple benchmarks, including:

* AIME 2024 pass@1: 50.4
* MATH-500 pass@1: 89.1
* CodeForces rating: 1205

The model leverages fine-tuning on DeepSeek R1's outputs, enabling performance comparable to larger frontier models.
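These distills emit their chain of thought between `<think>` tags before the final answer; a small sketch separating the two, assuming the `ollama` client and the `deepseek-r1:8b` tag for this variant:

```python
import re
import ollama

resp = ollama.generate(
    model="deepseek-r1:8b",
    prompt="What is 17 * 24? Think step by step.",
)
text = resp["response"]

# R1-style models wrap their reasoning in <think>...</think> before answering.
match = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
reasoning = match.group(1).strip() if match else ""
answer = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()
print("reasoning (truncated):", reasoning[:80])
print("answer:", answer)
```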
The Meta Llama 3.1 collection of multilingual large language models (LLMs) comprises pretrained and instruction-tuned generative models in 8B, 70B, and 405B sizes (text in/text out). The Llama 3.1 instruction-tuned text-only models (8B, 70B, 405B) are optimized for multilingual dialogue use cases and outperform many of the available open-source and closed chat models on common industry benchmarks.
The Meta Llama 3.2 collection of multilingual large language models (LLMs) comprises pretrained and instruction-tuned generative models in 1B and 3B sizes (text in/text out). The Llama 3.2 instruction-tuned text-only models are optimized for multilingual dialogue use cases, including agentic retrieval and summarization tasks. They outperform many of the available open-source and closed chat models on common industry benchmarks.
The Meta Llama 3.3 multilingual large language model (LLM) is a pretrained and instruction-tuned generative model with 70B parameters (text in/text out). The Llama 3.3 instruction-tuned text-only model is optimized for multilingual dialogue use cases and outperforms many of the available open-source and closed chat models on common industry benchmarks. **Supported languages**: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai.
Mistral is a 7B-parameter model distributed under the Apache license. It is available in both instruct (instruction-following) and text-completion variants. The Mistral AI team has noted that Mistral 7B:

* Outperforms Llama 2 13B on all benchmarks
* Outperforms Llama 1 34B on many benchmarks
* Approaches CodeLlama 7B performance on code, while remaining good at English tasks
Mistral NeMo is a 12B model built in collaboration with NVIDIA. Mistral NeMo offers a large context window of up to 128k tokens. Its reasoning, world knowledge, and coding accuracy are state-of-the-art in its size category. Because it relies on a standard architecture, Mistral NeMo is easy to use and a drop-in replacement in any system using Mistral 7B.
`phi-4` is a state-of-the-art open model built upon a blend of synthetic datasets, data from filtered public-domain websites, and acquired academic books and Q&A datasets. The goal of this approach was to ensure that small, capable models were trained with data focused on high quality and advanced reasoning. `phi-4` underwent a rigorous enhancement and alignment process, incorporating both supervised fine-tuning and direct preference optimization to ensure precise instruction adherence and robust safety measures.
`Phi-4-mini-instruct` is a lightweight open model built upon synthetic data and filtered publicly available websites, with a focus on high-quality, reasoning-dense data. The model belongs to the Phi-4 model family and supports a 128K-token context length. It underwent an enhancement process, incorporating both supervised fine-tuning and direct preference optimization to support precise instruction adherence and robust safety measures.
`Qwen2.5` is the latest series of Qwen large language models. For Qwen2.5, a range of base language models and instruction-tuned models are released, with sizes ranging from 0.5 to 72 billion parameters. Qwen2.5 introduces the following improvements over Qwen2:

* It possesses significantly more knowledge and has greatly enhanced capabilities in coding and mathematics, due to specialized expert models in these domains.
* It demonstrates significant advancements in instruction following, long-text generation (over 8K tokens), understanding structured data (e.g., tables), and generating structured outputs, especially in JSON format (see the sketch below). It is also more resilient to diverse system prompts, improving role-play and condition-setting for chatbots.
* It supports long contexts of up to 128K tokens and can generate up to 8K tokens.
* It offers multilingual support for over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, Arabic, and more.
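Given the emphasis on JSON generation, a minimal structured-output sketch using JSON mode; the `ollama` client and the `qwen2.5:7b` tag are assumptions:

```python
import json
import ollama

resp = ollama.chat(
    model="qwen2.5:7b",
    messages=[{
        "role": "user",
        "content": "List three EU capitals as JSON shaped like "
                   '{"capitals": [{"city": "...", "country": "..."}]}',
    }],
    format="json",  # constrain the reply to valid JSON
)
data = json.loads(resp["message"]["content"])
print(data["capitals"])
```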
`QwQ` is the reasoning model of the Qwen series. Compared with conventional instruction-tuned models, QwQ, which is capable of thinking and reasoning, can achieve significantly enhanced performance on downstream tasks, especially hard problems. `QwQ-32B` is the medium-sized reasoning model, capable of competitive performance against state-of-the-art reasoning models such as DeepSeek-R1 and o1-mini.
Mistral AI is contributing Mathstral to the science community to bolster efforts in advanced mathematical problems requiring complex, multi-step logical reasoning. The Mathstral release is part of their broader effort to support academic projects—it was produced in the context of Mistral AI’s collaboration with Project Numina. Akin to Isaac Newton in his time, Mathstral stands on the shoulders of Mistral 7B and specializes in STEM subjects. It achieves state-of-the-art reasoning capacities in its size category across various industry-standard benchmarks.
Over the past year, we have dedicated significant effort to researching and enhancing the reasoning capabilities of large language models, with a particular focus on their ability to solve arithmetic and mathematical problems. Today, we are delighted to introduce a series of math-specific large language models of our Qwen2 series: `Qwen2-Math` and `Qwen2-Math-Instruct-1.5B/7B/72B`. Qwen2-Math is a series of specialized math language models built upon the Qwen2 LLMs that significantly surpasses the mathematical capabilities of open-source models, and even closed-source models (e.g., GPT-4o). We hope that Qwen2-Math can contribute to the community's efforts to solve complex mathematical problems.
🚀 Democratizing Reinforcement Learning for LLMs 🌟

`DeepScaleR-1.5B-Preview` is a language model fine-tuned from `DeepSeek-R1-Distilled-Qwen-1.5B` using distributed reinforcement learning (RL) to scale up to long context lengths. The model achieves 43.1% Pass@1 accuracy on AIME 2024, representing a 15% improvement over the base model (28.8%) and surpassing OpenAI's o1-preview performance with just 1.5B parameters.
* **Powerful**: `Qwen2.5-Coder-32B-Instruct` has become the current SOTA open-source code model, matching the coding capabilities of GPT-4o. While demonstrating strong and comprehensive coding abilities, it also possesses good general and mathematical skills.
* **Diverse**: Building on the two previously open-sourced sizes (1.5B and 7B), this release brings four more model sizes: 0.5B, 3B, 14B, and 32B. Qwen2.5-Coder now covers six mainstream model sizes to meet the needs of different developers.
* **Practical**: We explore the practicality of Qwen2.5-Coder in two scenarios, code assistants and Artifacts, with examples showcasing its potential applications in real-world scenarios.
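A simple code-assistant sketch; the `ollama` client and the `qwen2.5-coder:7b` tag are assumptions, so pick the size that fits your hardware:

```python
import ollama

resp = ollama.chat(
    model="qwen2.5-coder:7b",
    messages=[{
        "role": "user",
        "content": "Write a Python function that checks whether a string is a palindrome.",
    }],
)
print(resp["message"]["content"])
```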
`DeepCoder-14B-Preview` is a code reasoning LLM fine-tuned from `DeepSeek-R1-Distilled-Qwen-14B` using distributed reinforcement learning (RL) to scale up to long context lengths. The model achieves 60.6% Pass@1 accuracy on LiveCodeBench v5 (8/1/24-2/1/25), representing an 8% improvement over the base model (53%) and achieving performance similar to OpenAI's o3-mini with just 14B parameters.
`CodeGemma` is a collection of powerful, lightweight models that can perform a variety of coding tasks like fill-in-the-middle code completion, code generation, natural language understanding, mathematical reasoning, and instruction following.
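A fill-in-the-middle sketch using CodeGemma's published FIM control tokens; the raw completion call via the `ollama` client and the `codegemma:2b` tag are assumptions:

```python
import ollama

prefix = "def mean(xs):\n    return "
suffix = " / len(xs)\n"

# CodeGemma's FIM format: give the prefix and suffix, then ask for the middle.
prompt = f"<|fim_prefix|>{prefix}<|fim_suffix|>{suffix}<|fim_middle|>"

resp = ollama.generate(model="codegemma:2b", prompt=prompt, raw=True)
print(resp["response"])  # expected to be something like: sum(xs)
```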
`Deepseek Coder` is composed of a series of code language models, each trained from scratch on 2T tokens, with a composition of 87% code and 13% natural language in both English and Chinese. We provide various sizes of the code model, ranging from 1.3B to 33B. Each model is pre-trained on a project-level code corpus using a 16K window size and an extra fill-in-the-blank task to support project-level code completion and infilling. For coding capabilities, Deepseek Coder achieves state-of-the-art performance among open-source code models on multiple programming languages and various benchmarks.

* **Massive Training Data**: Trained from scratch on 2T tokens, including 87% code and 13% linguistic data in both English and Chinese.
* **Highly Flexible & Scalable**: Offered in model sizes of 1.3B, 5.7B, 6.7B, and 33B, enabling users to choose the setup most suitable for their requirements.
* **Superior Model Performance**: State-of-the-art performance among publicly available code models on the HumanEval, MultiPL-E, MBPP, DS-1000, and APPS benchmarks.
* **Advanced Code Completion Capabilities**: A 16K window size and a fill-in-the-blank task, supporting project-level code completion and infilling.
We present `DeepSeek-Coder-V2`, an open-source Mixture-of-Experts (MoE) code language model that achieves performance comparable to GPT-4 Turbo in code-specific tasks. Specifically, DeepSeek-Coder-V2 is further pre-trained from an intermediate checkpoint of DeepSeek-V2 with an additional 6 trillion tokens. Through this continued pre-training, DeepSeek-Coder-V2 substantially enhances the coding and mathematical reasoning capabilities of DeepSeek-V2, while maintaining comparable performance in general language tasks. Compared to DeepSeek-Coder-33B, DeepSeek-Coder-V2 demonstrates significant advancements in various aspects of code-related tasks, as well as reasoning and general capabilities. Additionally, DeepSeek-Coder-V2 expands its support for programming languages from 86 to 338, while extending the context length from 16K to 128K.
`Kokoro` is an open-weight TTS model with 82 million parameters. Despite its lightweight architecture, it delivers comparable quality to larger models while being significantly faster and more cost-efficient. With Apache-licensed weights, Kokoro can be deployed anywhere from production environments to personal projects.
The IBM Granite 3.3 8B model is an 8-billion-parameter instruction-tuned LLM with a 128K token context window, optimized for reasoning, instruction following, fill-in-the-middle code completion, and structured reasoning.
`Qwen2.5-VL` is the new flagship vision-language model series from Qwen, representing a significant leap from the previous `Qwen2-VL`.

**Key Features**:

* **Understand Things Visually**: `Qwen2.5-VL` is proficient in recognizing common objects (flowers, birds, fish, insects) and excels at analyzing texts, charts, icons, graphics, and layouts within images.
* **Agentic Capabilities**: Acts as a visual agent that can reason and dynamically direct tools, enabling computer and phone use.
* **Visual Localization**: Accurately localizes objects in an image by generating bounding boxes or points, providing stable JSON outputs for coordinates and attributes.
* **Structured Outputs**: Supports structured outputs for data like scans of invoices, forms, and tables, beneficial for finance, commerce, etc.

**Performance**: The flagship model, `Qwen2.5-VL-72B-Instruct`, achieves competitive performance across benchmarks. Smaller models like `Qwen2.5-VL-7B-Instruct` outperform `GPT-4o-mini` in several tasks. The `Qwen2.5-VL-3B` model, designed for edge AI, even surpasses the 7B model of the previous `Qwen2-VL` version. (Note: Requires Ollama 0.7.0 or later.)
`Qwen2.5-VL` is the new flagship vision-language model series from Qwen, representing a significant leap from the previous `Qwen2-VL`.

**Key Features**:

* **Understand Things Visually**: `Qwen2.5-VL` is proficient in recognizing common objects and excels at analyzing texts, charts, icons, graphics, and layouts within images.
* **Agentic Capabilities**: Acts as a visual agent that can reason and dynamically direct tools.
* **Visual Localization**: Accurately localizes objects, providing stable JSON outputs for coordinates and attributes.
* **Structured Outputs**: Supports structured outputs for data like scans of invoices, forms, and tables.

**Performance**: `Qwen2.5-VL-7B-Instruct` outperforms `GPT-4o-mini` in a number of tasks. It is part of a series whose flagship, `Qwen2.5-VL-72B-Instruct`, achieves competitive performance across many benchmarks. (Note: Requires Ollama 0.7.0 or later.)
`Qwen2.5-VL` is the new flagship vision-language model series from Qwen. This 32B variant offers a powerful balance of capability and resource requirements within the `Qwen2.5-VL` family.

**Key Features**:

* **Advanced Visual Understanding**: Proficient in recognizing diverse objects and analyzing complex visual content, including text, charts, and layouts.
* **Strong Agentic Capabilities**: Functions as a visual agent, capable of reasoning and directing tools for tasks like computer and phone interaction.
* **Precise Visual Localization**: Accurately localizes objects using bounding boxes or points, with stable JSON output for coordinates.
* **Structured Data Extraction**: Efficiently generates structured outputs from visual data such as invoices and forms.

**Performance**: As part of the `Qwen2.5-VL` series, the 32B model benefits from the architectural improvements that allow the flagship `Qwen2.5-VL-72B-Instruct` to achieve SOTA-competitive results. It offers a step up in performance from the smaller variants for more demanding tasks. (Note: Requires Ollama 0.7.0 or later.)
`Qwen2.5-VL-72B-Instruct` is the flagship vision-language model from Qwen, showcasing top-tier performance and a comprehensive feature set.

**Key Features**:

* **Superior Visual Understanding**: Excels at recognizing a wide array of objects and analyzing intricate visual details in texts, charts, icons, graphics, and layouts.
* **Highly Agentic**: Functions effectively as a visual agent, demonstrating strong reasoning and tool utilization for complex interactions like computer and phone operation.
* **Accurate Visual Localization**: Precisely identifies and localizes objects, generating bounding boxes or points with stable JSON outputs for coordinates and attributes (see the sketch after this entry).
* **Robust Structured Output Generation**: Adept at extracting and structuring information from visual documents like invoices, forms, and tables, ideal for applications in finance and commerce.

**Performance**: `Qwen2.5-VL-72B-Instruct` achieves competitive performance in a series of benchmarks covering diverse domains and tasks, including college-level problems, math, document understanding, general question answering, and visual agent capabilities. It demonstrates significant advantages in understanding documents and diagrams and can operate as a visual agent without task-specific fine-tuning. (Note: Requires Ollama 0.7.0 or later.)
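A hedged localization sketch, asking for bounding boxes as JSON from an illustrative image; the `ollama` client and the `qwen2.5vl:7b` tag are assumptions (Ollama 0.7.0+ per the note above):

```python
import json
import ollama

resp = ollama.chat(
    model="qwen2.5vl:7b",
    messages=[{
        "role": "user",
        "content": "Locate every person in the image. Reply as JSON: "
                   '[{"label": "...", "bbox_2d": [x1, y1, x2, y2]}]',
        "images": ["street.jpg"],  # illustrative path
    }],
    format="json",  # keep the coordinate output machine-readable
)
print(json.loads(resp["message"]["content"]))
```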
`Phi-4 Reasoning` and `Phi-4 Reasoning Plus` are 14-billion-parameter models from Microsoft, designed to rival much larger models on complex reasoning tasks.

* **`Phi-4 Reasoning`**: Trained via supervised fine-tuning (SFT) of `Phi-4` on carefully curated reasoning demonstrations, including from OpenAI's `o3-mini`. This highlights how meticulous data curation and high-quality synthetic datasets enable smaller models to compete with larger counterparts.
* **`Phi-4 Reasoning Plus`**: Builds upon `Phi-4 Reasoning` and is further trained with reinforcement learning (RL) to deliver higher accuracy.

**Performance**: These models consistently outperform the base `Phi-4` model by significant margins on representative reasoning benchmarks (mathematical and scientific reasoning). They exceed `DeepSeek-R1 Distill Llama 70B` (5x larger) and demonstrate competitive performance against significantly larger models like `DeepSeek-R1`.
`Qwen3` is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. The 4B variant provides an efficient entry point into the Qwen3 family.

**Key Capabilities**:

* **Thinking/Non-Thinking Modes**: Uniquely supports seamless switching between thinking mode (for complex logical reasoning, math, coding) and non-thinking mode (for efficient, general-purpose dialogue) within a single model (see the sketch after this list).
* **Enhanced Reasoning**: Significant improvements in reasoning, surpassing previous `QwQ` (thinking mode) and `Qwen2.5 Instruct` (non-thinking mode) models in mathematics, code generation, and logical reasoning. `Qwen3-4B` can rival the performance of `Qwen2.5-72B-Instruct` in some aspects.
* **Human Preference Alignment**: Excels in creative writing, role-playing, multi-turn dialogues, and instruction following.
* **Agent Capabilities**: Precise integration with external tools in both modes, with leading performance among open-source models in complex agent-based tasks.
* **Multilingual Support**: Supports 100+ languages and dialects with strong capabilities for multilingual instruction following and translation.
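Qwen3's mode switch can be driven from the prompt itself: Qwen documents `/think` and `/no_think` as soft switches appended to a user message. A sketch, assuming the `ollama` client and the `qwen3:4b` tag:

```python
import ollama

# /no_think skips the thinking phase for quick, general-purpose replies;
# /think forces step-by-step reasoning before the final answer.
quick = ollama.chat(
    model="qwen3:4b",
    messages=[{"role": "user", "content": "What is the capital of France? /no_think"}],
)
careful = ollama.chat(
    model="qwen3:4b",
    messages=[{"role": "user", "content": "Is 1001 prime? /think"}],
)
print(quick["message"]["content"])
print(careful["message"]["content"])
```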
`Qwen3` is the latest generation of large language models in the Qwen series. The 8B variant offers a balanced blend of performance and efficiency.

**Key Capabilities**:

* **Thinking/Non-Thinking Modes**: Supports seamless switching between modes for complex reasoning/coding and general dialogue.
* **Enhanced Reasoning**: Significant improvements in mathematics, code generation, and logical reasoning over previous Qwen generations.
* **Human Preference Alignment**: Excels in creative writing, role-playing, multi-turn dialogues, and instruction following.
* **Agent Capabilities**: Precise integration with external tools, leading in complex agent-based tasks among open-source models.
* **Multilingual Support**: Supports 100+ languages with strong instruction following and translation.
* **Extended Context**: This variant supports a 128k context window.
`Qwen3` is the latest generation of Qwen LLMs. The 14B model provides enhanced capabilities for more demanding tasks.

**Key Capabilities**:

* **Dual Modes**: Seamlessly switches between thinking (complex logic, math, code) and non-thinking (general dialogue) modes.
* **Advanced Reasoning**: Outperforms previous Qwen models (`QwQ`, `Qwen2.5 Instruct`) in math, coding, and logical reasoning.
* **Superior Alignment**: Strong in creative writing, role-playing, multi-turn conversations, and following instructions.
* **Expert Agent**: Integrates precisely with external tools, leading in agent-based tasks.
* **Broad Multilingualism**: Supports over 100 languages and dialects.
* **Large Context**: Features a 128k context window.
`Qwen3-30B-A3B` is a Mixture-of-Experts (MoE) model from the latest `Qwen3` LLM series, designed for high efficiency and performance.

**Key Capabilities**:

* **MoE Architecture**: Provides strong performance, outcompeting `QwQ-32B` with 10 times fewer activated parameters.
* **Dual Operational Modes**: Supports seamless switching between a "thinking mode" for complex reasoning, math, and coding tasks, and a "non-thinking mode" for efficient, general-purpose dialogue.
* **Significantly Enhanced Reasoning**: Surpasses previous `QwQ` (in thinking mode) and `Qwen2.5 Instruct` models (in non-thinking mode) on mathematics, code generation, and commonsense logical reasoning.
* **Superior Human Preference Alignment**: Excels in creative writing, role-playing, multi-turn dialogues, and instruction following, delivering a more natural and engaging conversational experience.
* **Expertise in Agent Capabilities**: Enables precise integration with external tools in both modes and achieves leading performance among open-source models in complex agent-based tasks.
* **Extensive Multilingual Support**: Supports over 100 languages and dialects with strong capabilities for multilingual instruction following and translation.
* **Large Context Window**: Supports a context length of 128k tokens.
The `Qwen3-32B` model is a powerful dense model from the latest `Qwen3` LLM series, offering strong all-around capabilities.

**Key Capabilities**:

* **Dual Operational Modes**: Features seamless switching between "thinking mode" (for complex tasks like logical reasoning, math, and coding) and "non-thinking mode" (for efficient, general-purpose dialogue).
* **Enhanced Reasoning Abilities**: Demonstrates significant improvements in reasoning, surpassing previous generations like `QwQ` (in thinking mode) and `Qwen2.5 Instruct` models (in non-thinking mode) in mathematics, code generation, and logical reasoning.
* **Excellent Human Preference Alignment**: Strong performance in creative writing, role-playing, multi-turn dialogues, and instruction following, leading to more natural and engaging interactions.
* **Advanced Agent Functionality**: Capable of precise integration with external tools in both operational modes, achieving leading performance among open-source models for complex agent-based tasks.
* **Comprehensive Multilingual Support**: Supports over 100 languages and dialects, with robust capabilities for multilingual instruction following and translation.
* **Large Context Window**: Supports a 128k token context length.