Liquid AI unveils LFM2-VL: Fast, efficient AI for on-device vision-language
Liquid AI has unveiled LFM2-VL, a new family of vision-language foundation models engineered for efficient deployment across a broad spectrum of hardware, from smartphones and laptops to wearables and embedded systems. These models promise to deliver low-latency performance and robust accuracy, offering significant flexibility for real-world applications.
Building on the company’s established LFM2 architecture, LFM2-VL extends its capabilities into multimodal processing, integrating text and image inputs at variable resolutions. Liquid AI asserts that the new models can achieve up to twice the GPU inference speed of comparable vision-language models while maintaining competitive results on standard benchmarks. Ramin Hasani, co-founder and CEO of Liquid AI, underscored the company’s core philosophy in the announcement, stating, “Efficiency is our product.” He highlighted the release of two open-weight variants, at 450 million and 1.6 billion parameters, noting their improved GPU speed, native 512x512 image processing, and smart patching for larger images.
The LFM2-VL release comprises two model sizes tailored to different operational needs. LFM2-VL-450M is a highly efficient model with fewer than half a billion parameters, designed for severely resource-constrained environments. Complementing it is LFM2-VL-1.6B, a more capable model that remains light enough for single-GPU systems and on-device deployment. Both variants process images at native resolution up to 512x512 pixels, avoiding distortion and unnecessary upscaling. For larger images, the system splits the input into non-overlapping patches and adds a downscaled thumbnail for global context, allowing the model to capture both fine detail and the broader scene.
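To make the tiling idea concrete, here is a minimal sketch of how a large image could be split into non-overlapping 512x512 patches plus a downscaled thumbnail. The function name and the use of Pillow are illustrative assumptions; Liquid AI’s actual preprocessing pipeline may differ.

```python
from PIL import Image

TILE = 512  # native resolution the encoder handles, per the announcement

def split_into_patches(image: Image.Image, tile: int = TILE):
    """Illustrative tiling: cut a large image into non-overlapping tile x tile
    patches and add a downscaled thumbnail for global context."""
    width, height = image.size
    patches = []
    for top in range(0, height, tile):
        for left in range(0, width, tile):
            box = (left, top, min(left + tile, width), min(top + tile, height))
            patches.append(image.crop(box))
    # The thumbnail gives the model a coarse view of the whole scene.
    thumbnail = image.copy()
    thumbnail.thumbnail((tile, tile))
    return patches, thumbnail
```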
Liquid AI was founded by former researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) with an ambitious goal: to develop AI architectures that transcend the limitations of the widely used transformer model. The company’s flagship models, the Liquid Foundation Models (LFMs), are rooted in principles derived from dynamical systems, signal processing, and numerical linear algebra. This foundational approach yields general-purpose AI models adept at handling diverse data types, including text, video, audio, time series, and other sequential information. Unlike conventional architectures, Liquid’s methodology aims to achieve comparable or superior performance with substantially fewer computational resources, enabling real-time adaptability during inference while minimizing memory demands. This makes LFMs well suited to both large-scale enterprise applications and resource-limited edge deployments.
Further solidifying its platform strategy, Liquid AI introduced the Liquid Edge AI Platform (LEAP) in July 2025. LEAP is a cross-platform Software Development Kit (SDK) designed to simplify the process for developers to run small language models directly on mobile and embedded devices. It offers operating system-agnostic support for both iOS and Android, integrating seamlessly with Liquid’s proprietary models as well as other open-source small language models (SLMs). The platform includes a built-in library featuring models as compact as 300MB, small enough for modern smartphones with minimal RAM. Its companion application, Apollo, empowers developers to test models entirely offline, aligning with Liquid AI’s emphasis on privacy-preserving, low-latency AI. Together, LEAP and Apollo underscore the company’s commitment to decentralizing AI execution, reducing reliance on cloud infrastructure, and enabling developers to craft optimized, task-specific models for real-world scenarios.
The technical design of LFM2-VL incorporates a modular architecture, combining a language model backbone with a SigLIP2 NaFlex vision encoder and a multimodal projector. The projector itself features a two-layer MLP connector with pixel unshuffle, an efficient mechanism that reduces the number of image tokens and enhances processing throughput. Users have the flexibility to adjust parameters, such as the maximum number of image tokens or patches, allowing them to fine-tune the balance between speed and quality based on their specific deployment needs. The training process for these models involved approximately 100 billion multimodal tokens, sourced from a combination of open datasets and in-house synthetic data.
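For readers who want a feel for how such a connector reduces the image token count, the following is a minimal, hypothetical PyTorch sketch of a pixel-unshuffle plus two-layer MLP projector. The dimensions, activation, and downscale factor are placeholders, not Liquid AI’s published configuration.

```python
import torch
import torch.nn as nn

class MultimodalProjector(nn.Module):
    """Sketch of a pixel-unshuffle + two-layer MLP connector (illustrative)."""
    def __init__(self, vision_dim=768, text_dim=2048, downscale=2):
        super().__init__()
        # PixelUnshuffle trades spatial resolution for channel depth,
        # shrinking the token grid by downscale^2 (fewer image tokens).
        self.unshuffle = nn.PixelUnshuffle(downscale)
        in_dim = vision_dim * downscale * downscale
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, text_dim),
            nn.GELU(),
            nn.Linear(text_dim, text_dim),
        )

    def forward(self, vision_feats: torch.Tensor) -> torch.Tensor:
        # vision_feats: (batch, channels, height, width) feature map,
        # with height and width divisible by the downscale factor.
        x = self.unshuffle(vision_feats)   # (B, C*d*d, H/d, W/d)
        x = x.flatten(2).transpose(1, 2)   # (B, tokens, C*d*d)
        return self.mlp(x)                 # (B, tokens, text_dim)
```

Because the unshuffle step folds each downscale x downscale neighborhood into the channel dimension, the number of tokens handed to the language backbone drops by a factor of downscale squared, which is where the throughput gain comes from.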
In terms of performance, the LFM2-VL models demonstrate competitive benchmark results across a spectrum of vision-language evaluations. The LFM2-VL-1.6B model, for instance, achieved strong scores in RealWorldQA (65.23), InfoVQA (58.68), and OCRBench (742), while maintaining solid performance in broader multimodal reasoning tasks. During inference testing, LFM2-VL recorded the fastest GPU processing times in its class when subjected to a standard workload involving a 1024x1024 image and a brief text prompt.
The LFM2-VL models are now publicly available on Hugging Face, accompanied by example fine-tuning code accessible via Colab. They are fully compatible with Hugging Face transformers and TRL. These models are released under a custom “LFM1.0 license,” which Liquid AI describes as being based on the principles of Apache 2.0, though the complete license text has yet to be published. The company has indicated that commercial use will be permitted under specific conditions, with differing terms for businesses above and below $10 million in annual revenue. With LFM2-VL, Liquid AI aims to democratize access to high-performance multimodal AI, making it viable for on-device and resource-limited deployments without compromising on capability.
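As a rough illustration of the Hugging Face compatibility mentioned above, the snippet below shows how such a model would typically be loaded and prompted with transformers. The repository name, class choices, and chat-template usage are assumptions based on common vision-language conventions; the official model card should be treated as authoritative.

```python
# Illustrative loading sketch; repo name and usage are assumptions.
from PIL import Image
from transformers import AutoProcessor, AutoModelForImageTextToText

model_id = "LiquidAI/LFM2-VL-1.6B"  # assumed Hugging Face repository name
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(model_id, device_map="auto")

image = Image.open("example.jpg")
conversation = [
    {"role": "user", "content": [
        {"type": "image", "image": image},
        {"type": "text", "text": "Describe this image."},
    ]},
]
inputs = processor.apply_chat_template(
    conversation, add_generation_prompt=True,
    tokenize=True, return_dict=True, return_tensors="pt",
).to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(processor.batch_decode(output, skip_special_tokens=True)[0])
```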