Memp: Boosting LLM Agent Efficiency with Lifelong Procedural Memory

Marktechpost

Large language model (LLM) agents have advanced significantly, demonstrating impressive capabilities on intricate tasks, from web research and report generation to data analysis and multi-step software workflows. Despite these strides, a critical limitation persists: procedural memory. Unlike humans, who instinctively build and reuse routines from past experience, current LLM agents often possess procedural knowledge that is rigid, manually hard-coded, or deeply embedded in their model weights. This inflexibility makes them fragile: an unexpected disruption, such as a network outage or a user-interface change, can force a complete restart of their operations. Existing frameworks offer structural abstractions but largely leave the optimization of memory lifecycles unresolved, preventing agents from systematically building, refining, and reusing learned procedural skills.

Memory is fundamental to the functionality of language agents, enabling them to recall past interactions across short-term, episodic, and long-term contexts. While contemporary systems employ techniques like vector embeddings, semantic search, and hierarchical structures for information storage and retrieval, the effective management of memory—particularly procedural memory—remains a significant hurdle. Procedural memory is crucial for agents to internalize and automate recurring tasks, yet the strategies for its construction, updating, and reuse have been largely underexplored. Similarly, while agents learn from experience through methods such as reinforcement learning, imitation, or replay, they frequently encounter issues of low efficiency, poor generalization, and the tendency to forget previously learned information.
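The vector-embedding and semantic-search retrieval the paragraph mentions can be sketched in a few lines. This is a generic illustration, not Memp's implementation: the `embed` function here is a toy bag-of-words stand-in for the dense embedding models real systems use, and `MemoryStore` is a hypothetical name.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; real systems use dense vector models."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class MemoryStore:
    """Minimal semantic-search memory: store (embedding, payload) pairs,
    retrieve the top-k payloads most similar to a query."""
    def __init__(self):
        self.entries = []

    def add(self, key_text: str, payload):
        self.entries.append((embed(key_text), payload))

    def retrieve(self, query: str, k: int = 1):
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(e[0], q), reverse=True)
        return [payload for _, payload in ranked[:k]]
```

A store populated with a flight-booking procedure and a cleaning procedure would return the flight entry for a query like "book a flight", since retrieval ranks by similarity rather than exact match.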

Addressing these challenges, researchers from Zhejiang University and Alibaba Group have introduced Memp, an innovative framework designed to equip agents with a lifelong, adaptable procedural memory. Memp fundamentally transforms past operational trajectories into both granular, step-level instructions and more abstract, higher-level scripts. Crucially, it provides systematic strategies for memory construction, retrieval, and continuous updating. Unlike static approaches that fix knowledge, Memp dynamically refines its memory through a cycle of addition, validation, reflection, and discarding outdated information, thereby ensuring relevance and efficiency. Comprehensive testing on two distinct environments, ALFWorld and TravelPlanner, demonstrated that Memp consistently improved task accuracy, significantly reduced unnecessary exploratory actions, and optimized the use of computational tokens. A particularly notable finding was Memp’s ability to transfer procedural memory built from more powerful models to weaker ones, resulting in substantial performance boosts for the smaller systems. This underscores Memp’s capacity to enable agents to learn, adapt, and generalize effectively across diverse tasks.
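The add/validate/reflect/discard cycle described above can be pictured as a small bookkeeping loop over stored procedures. The sketch below is purely illustrative: the class and field names, the reliability score, and the discard thresholds are assumptions for exposition, not details from the paper.

```python
from dataclasses import dataclass

@dataclass
class ProcedureEntry:
    task: str
    steps: list      # granular, step-level instructions
    script: str      # abstract, higher-level script
    successes: int = 0
    failures: int = 0

    @property
    def reliability(self) -> float:
        total = self.successes + self.failures
        return self.successes / total if total else 0.5  # unvalidated prior

class LifecycleMemory:
    """Hypothetical Memp-style lifecycle: add new procedures, reflect on
    outcome feedback, and discard entries that repeatedly fail validation."""
    def __init__(self, discard_below: float = 0.25, min_trials: int = 4):
        self.procedures = {}
        self.discard_below = discard_below
        self.min_trials = min_trials

    def add(self, entry: ProcedureEntry):
        # Construction: store a procedure distilled from past trajectories.
        self.procedures[entry.task] = entry

    def reflect(self, task: str, success: bool):
        # Update: fold task outcomes back into the stored procedure.
        entry = self.procedures.get(task)
        if entry is None:
            return
        if success:
            entry.successes += 1
        else:
            entry.failures += 1
        # Discard: drop knowledge that has proven outdated or wrong.
        trials = entry.successes + entry.failures
        if trials >= self.min_trials and entry.reliability < self.discard_below:
            del self.procedures[task]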

When an agent interacts with its environment, executing actions, utilizing tools, and refining its behavior over multiple steps, it effectively operates within a Markov Decision Process. Each interaction generates states, actions, and feedback, forming trajectories that also yield rewards based on task success. However, without an efficient memory system, agents tackling new tasks in unfamiliar environments often waste computational steps and tokens by repeating exploratory actions already performed in earlier, similar contexts. Inspired by the human ability to recall and reuse learned procedures, Memp equips agents with a dedicated memory module that stores, retrieves, and updates this crucial procedural knowledge. This enables agents to leverage past experiences, drastically reducing redundant trials and enhancing overall efficiency in complex, multi-step tasks.
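A trajectory in this MDP view is a sequence of (state, action, feedback) steps, and building procedural memory means distilling it into the two granularities the article describes: step-level instructions and an abstract script. The sketch below uses simple string templates to make that mapping concrete; in practice the distillation would itself be done by an LLM, and all names here are illustrative.

```python
def distill(trajectory, task):
    """Distill a trajectory of (state, action, feedback) tuples into the two
    memory granularities: keeps only steps whose feedback was successful.
    Illustrative sketch; the actual distillation is LLM-driven."""
    ok_steps = [(state, action) for state, action, fb in trajectory if fb == "ok"]
    steps = [f"When {state}, do {action}" for state, action in ok_steps]   # fine-grained
    script = f"To {task}: " + " then ".join(action for _, action in ok_steps)  # abstract
    return {"task": task, "steps": steps, "script": script}
```

Filtering out failed steps is what lets a later attempt skip the wasted exploration: the stored procedure encodes only the path that worked.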

The experiments on the TravelPlanner and ALFWorld datasets provided compelling evidence for this design. Storing trajectories, whether as highly detailed steps or as abstract scripts, demonstrably enhanced accuracy and curtailed exploration time. Retrieval strategies based on semantic similarity further refined the utility of this memory. Concurrently, dynamic update mechanisms—including validation of new information, adjustment based on feedback, and reflection on outcomes—allowed agents to correct errors, discard obsolete knowledge, and continually hone their skills. The results indicate that procedural memory not only boosts task completion rates and operational efficiency but also facilitates effective knowledge transfer from more robust models to less capable ones, providing smaller systems with significant performance gains. Interestingly, while retrieving more memories generally improved outcomes, there was a point beyond which excessive memory overwhelmed the agent's context, paradoxically reducing effectiveness. This highlights procedural memory as a potent pathway to making artificial agents more adaptive, efficient, and akin to human learning processes.
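The finding that too much retrieved memory can overwhelm the agent's context suggests one simple mitigation: cap retrieval by a context budget rather than a fixed k. The helper below is a hedged sketch of that idea, with an illustrative whitespace token estimate; the name and the greedy policy are assumptions, not the paper's method.

```python
def select_within_budget(ranked_memories, token_budget):
    """Greedily take the highest-ranked memories until the context budget is
    exhausted. `ranked_memories` is a list of (text, score) pairs sorted by
    descending relevance; token counts are crudely estimated by word count."""
    chosen, used = [], 0
    for text, score in ranked_memories:
        cost = len(text.split())  # crude stand-in for a real tokenizer
        if used + cost > token_budget:
            break  # stop at the first entry that no longer fits
        chosen.append(text)
        used += cost
    return chosen
```

With a tight budget the agent keeps only its most relevant procedures in context, trading recall for focus — exactly the balance the scaling result above points at.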

In essence, Memp is a task-agnostic framework that elevates procedural memory to a core optimization target for LLM-based agents. By systematically engineering strategies for memory construction, retrieval, and dynamic updating, Memp empowers agents to distill, refine, and reuse their past experiences, leading to improved efficiency and accuracy in long-horizon tasks such as those found in TravelPlanner and ALFWorld. Unlike static or manually engineered memory systems, Memp evolves dynamically, continuously updating and discarding outdated knowledge. The observed outcomes consistently show steady performance gains, more efficient learning, and even transferable benefits when memory is migrated from stronger to weaker models. Looking ahead, the integration of richer retrieval methods and advanced self-assessment mechanisms promises to further bolster agents’ adaptability and performance in complex, real-world scenarios.