Apache Flink 2.1.0 Integrates AI for Real-Time Decision-Making
The Apache Flink Project Management Committee (PMC) has announced the release of Apache Flink 2.1.0, a significant upgrade to its real-time data processing engine. This latest version introduces robust support for defining, managing, and invoking AI models in real time, laying the groundwork for end-to-end, real-time AI workflows.
A central feature of Flink 2.1.0 is its enhanced AI capabilities. Users can now define and manage AI models programmatically through the Model DDL (Data Definition Language) Table API, available for both Java and Python. This offers a flexible, code-driven approach to integrating and managing models within Flink applications. Complementing this, the ML_PREDICT
table-valued function has been expanded, enabling seamless, real-time model inference directly within SQL queries. This allows machine learning models to be applied to data streams as they arrive. The implementation supports Flink's built-in model providers, such as OpenAI, and offers interfaces for users to define custom model providers, marking a strategic shift for Flink towards becoming a unified real-time AI platform.
Beyond AI integration, Apache Flink 2.1 introduces Process Table Functions (PTFs), described by the PMC as the most powerful type of function for Flink SQL and Table API. PTFs serve as a superset of all other user-defined functions, capable of mapping zero, one, or multiple input tables to zero, one, or multiple output rows. This functionality empowers users to implement sophisticated custom operators that can rival the feature richness of built-in operations, with PTFs having access to Flink’s managed state, event-time processing, table change logs, and timer services.
Another notable addition in Flink 2.1 is the VARIANT
data type, designed to improve handling of semi-structured data like JSON. This new type allows for storing any semi-structured data, including arrays, maps (with string keys), and scalar types, while preserving field type information in a JSON-like structure. Unlike the ROW
and STRUCTURED
types, VARIANT
offers superior flexibility for managing deeply nested and evolving schemas. Users can convert JSON-formatted string data to VARIANT
using the PARSE_JSON
or TRY_PARSE_JSON
functions.
Further enhancements in Apache Flink 2.1 include:
- The introduction of a
DeltaJoin
operator for stream processing jobs, accompanied by optimizations for simpler streaming join pipelines. - Added support for the Smile binary format for compiled plans, providing a memory-efficient alternative to JSON for serialization and deserialization.
- A new pluggable batching mechanism for Async Sink at runtime, allowing users to define custom batching write strategies tailored to specific requirements.
- A new connector for keyed state, which enables users to query keyed state directly from a checkpoint or savepoint using Flink SQL. This simplifies the process of inspecting, debugging, and validating the state of Flink jobs without requiring custom tooling.
These updates collectively reinforce Apache Flink’s position as a leading real-time data processing engine, now with significantly expanded capabilities for AI-driven applications and improved flexibility for diverse data types and operational needs.