Optimizing Monorepo Merges: Strategies for Green Builds

The New Stack

For many small software development teams, or those managing multiple independent code repositories, merging new code changes into the main codebase seems straightforward: a simple click of a button. In large organizations operating within a single, shared codebase, commonly known as a monorepo, that same click becomes one of the most significant bottlenecks in the software delivery pipeline. Here, dozens or even hundreds of engineers contribute simultaneously, escalating the complexity of integration.

While the debate between monorepos and polyrepos is extensive, monorepos offer distinct advantages, such as streamlined vulnerability identification and patching, easier cross-project refactoring, and consistent tooling and shared libraries across diverse projects. However, these benefits come with inherent challenges. Developers frequently encounter stale dependencies because pull requests (PRs) are based on outdated versions of the main branch, subtle conflicts arising from simultaneous work on similar code, and persistent infrastructure issues like timeouts. Furthermore, managing internal and third-party dependencies becomes complex, and shared state can lead to inconsistent, “flaky” test behavior. As an engineering organization scales, these challenges compound, often leaving developers spending unproductive hours simply waiting for builds to complete successfully.

To mitigate these escalating delays, modern development workflows have increasingly adopted merge automation tools, such as GitHub Merge Queues, GitLab Merge Trains, and similar solutions. These systems fundamentally change the game by introducing automated gates that regulate the flow of changes into the main codebase. The process typically involves a developer marking a pull request as ready for integration. The system then automatically rebases this PR onto the very latest version of the main branch. A continuous integration (CI) process is then triggered in this updated context. If the CI checks pass, the system proceeds to merge the changes. Critically, if new PRs arrive while a CI run is in progress, they are queued and await their turn, ensuring an orderly and validated integration sequence.
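To make the flow concrete, here is a minimal sketch of such a serial merge queue in Python. The rebase, CI, and merge helpers are hypothetical stand-ins for the Git and CI integrations a real system (such as GitHub's or GitLab's) would provide, and the simulated failure rate is purely illustrative.

```python
from collections import deque
import random

# Hypothetical stand-ins for real Git/CI integration.
def rebase_onto_main(pr):
    print(f"rebasing {pr} onto the latest main")

def run_ci(pr) -> bool:
    print(f"running CI for {pr}")
    return random.random() > 0.1        # simulate a ~10% chance of failure

def merge(pr):
    print(f"merging {pr} into main")

def serial_merge_queue(ready_prs):
    """Process PRs strictly one at a time: rebase onto the latest main,
    validate with CI, then merge. PRs that arrive mid-run wait their turn."""
    queue = deque(ready_prs)
    while queue:
        pr = queue.popleft()
        rebase_onto_main(pr)            # always test against current main
        if run_ci(pr):
            merge(pr)                   # main stays green
        else:
            print(f"{pr} failed CI and is returned to its author")

serial_merge_queue(["PR-101", "PR-102", "PR-103"])
```

The key property is that every change is validated against the exact state of main it will land on, so main stays green, at the cost of one full CI run per PR.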

While merge queues provide a foundational solution, the sheer volume of changes in large monorepos necessitates further optimization. One common strategy is batching pull requests. Instead of processing one PR at a time, several are grouped together and then subjected to a single CI run. This approach significantly reduces the total number of CI executions and shortens overall waiting times. If the CI process for a batch succeeds, all included PRs are merged simultaneously. Should a failure occur within a batch, the system can systematically “bisect” the batch to pinpoint the problematic PR, allowing the remaining successful changes to proceed. While batching can dramatically reduce merge times under ideal conditions—for instance, hypothetically cutting a 50-hour wait down to 12.5 hours with a batch size of four—real-world scenarios with even a modest 10% failure rate can extend the total merge time considerably, potentially doubling it, and increasing the number of CI runs due to repeated processing.
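A rough sketch of the batch-and-bisect idea follows, assuming a hypothetical `run_ci_for(prs)` hook that builds a combined branch from a list of PRs and runs one CI pass over it, and a `merge_all(prs)` hook that merges them. A production queue would also retest later halves on top of earlier merges rather than splitting as naively as this does.

```python
def merge_batch(prs, run_ci_for, merge_all):
    """Validate a batch with one CI run; on failure, bisect the batch to
    isolate the offending PR(s) while letting the rest proceed. Returns the
    list of rejected PRs."""
    if not prs:
        return []
    if run_ci_for(prs):
        merge_all(prs)                  # one green run merges the whole batch
        return []
    if len(prs) == 1:
        return prs                      # culprit isolated: reject it
    mid = len(prs) // 2
    return (merge_batch(prs[:mid], run_ci_for, merge_all)
            + merge_batch(prs[mid:], run_ci_for, merge_all))

# Toy run in which "C" is the change that breaks the build.
bad = {"C"}
rejected = merge_batch(
    ["A", "B", "C", "D"],
    run_ci_for=lambda prs: not (bad & set(prs)),
    merge_all=lambda prs: print("merged:", prs),
)
print("rejected:", rejected)            # -> rejected: ['C']
```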

To push efficiency further, optimistic queues introduce a paradigm shift, moving from a serial processing model to a more parallel approach. Rather than waiting for one pull request’s CI process to fully complete, the system optimistically assumes it will pass. It then creates an alternate mainline branch to immediately begin the CI process for the next pull request in the queue. If the initial PR passes its CI, it merges into the main branch; similarly, the subsequent PR merges upon its successful completion. If the first PR fails, the alternate mainline is discarded, and a new one is created without the problematic change, allowing the validation process to restart for the remaining PRs. Combining this “optimistic” approach with batching leads to optimistic batching, where entire groups of PRs are processed in parallel, with failures leading to a split-and-identify mechanism.
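One way to picture an optimistic queue is sketched below, again assuming hypothetical `run_ci_for(prefix)` and `merge(pr)` hooks: the CI run for the k-th PR starts immediately against a speculative branch that already contains the PRs ahead of it, and a failure discards the now-invalid speculation and restarts it without the offending change.

```python
from concurrent.futures import ThreadPoolExecutor

def optimistic_queue(prs, run_ci_for, merge):
    """Sketch of an optimistic queue. All speculative CI runs start in
    parallel; run k validates main plus prs[0..k]. `run_ci_for` and `merge`
    are hypothetical hooks into the real Git/CI system."""
    rejected = []
    while prs:
        with ThreadPoolExecutor() as pool:
            futures = [pool.submit(run_ci_for, prs[: k + 1])
                       for k in range(len(prs))]
            restart_from = None
            for k, future in enumerate(futures):
                if future.result():
                    merge(prs[k])            # everything ahead of it already passed
                else:
                    rejected.append(prs[k])  # drop the failure...
                    restart_from = k + 1     # ...and redo the invalid speculation
                    break
        if restart_from is None:
            break                            # the whole queue merged cleanly
        prs = prs[restart_from:]             # results of stale runs are ignored
    return rejected

# Toy run in which "C" fails its speculative CI.
print(optimistic_queue(
    ["A", "B", "C", "D"],
    run_ci_for=lambda prefix: "C" not in prefix,
    merge=lambda pr: print("merged:", pr),
))
# -> ['C']
```

In optimistic batching, each element of the queue would itself be a batch, with a bisection step like the one sketched earlier isolating failures inside a batch.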

Another advanced technique is predictive modeling. This involves analyzing historical data and characteristics of pull requests—such as lines of code changed, file types modified, or the number of dependencies—to calculate a score indicating the likelihood of success or failure. By leveraging these predictions, the system can prioritize or reorder PRs, focusing CI resources on paths most likely to succeed, thereby reducing overall CI costs and accelerating merges.
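A toy version of this scoring idea is sketched below; the features, weights, and logistic form are illustrative assumptions, since a real system would train its model on the organization's own historical merge outcomes.

```python
import math

def pass_probability(features, weights, bias):
    """Toy logistic score for a PR's chance of passing CI."""
    z = bias + sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical weights "learned" from history: large diffs and many touched
# dependencies lower the odds, documentation-only changes raise them.
weights = {"lines_changed": -0.002, "deps_touched": -0.3, "docs_only": 1.5}
bias = 2.0

prs = {
    "PR-1": {"lines_changed": 40,   "deps_touched": 0, "docs_only": 1},
    "PR-2": {"lines_changed": 2500, "deps_touched": 4, "docs_only": 0},
}

# Put the PRs most likely to succeed at the front of the queue.
ranked = sorted(prs, key=lambda p: pass_probability(prs[p], weights, bias),
                reverse=True)
print(ranked)                            # -> ['PR-1', 'PR-2']
```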

For truly massive monorepos, a single queue can still become a bottleneck. This is addressed by multi-queues and affected targets. Modern monorepo build tools like Bazel, Nx, or Turborepo can precisely identify which parts of the codebase are impacted by a specific change. This intelligence allows the system to group pull requests into independent, parallel queues based on the “affected targets.” For example, if a system produces four distinct build types (A, B, C, D) and incoming PRs only affect a subset of these, separate queues can be established for each build type. This ensures that unrelated changes do not block each other, significantly speeding up the overall integration process by allowing concurrent execution of independent builds.
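The sketch below shows how such grouping might work once a tool like Bazel, Nx, or Turborepo has reported each PR's affected targets; the union-find grouping and the A/B/C/D-style input are illustrative, not any particular product's implementation. PRs whose target sets overlap, directly or transitively, land in the same queue, and each queue can run its CI in parallel with the others.

```python
from collections import defaultdict

def build_queues(pr_targets):
    """Group PRs into independent merge queues: PRs share a queue only if
    their affected-target sets overlap (directly or transitively), so
    unrelated changes never block each other. `pr_targets` maps each PR to
    the set of build targets its change affects."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path compression
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    # Linking each PR to its targets connects any PRs that touch a common target.
    for pr, targets in pr_targets.items():
        for target in targets:
            union(("pr", pr), ("target", target))

    queues = defaultdict(list)
    for pr in pr_targets:
        queues[find(("pr", pr))].append(pr)
    return list(queues.values())

# Toy input echoing the four build types above (illustrative only).
print(build_queues({
    "PR-1": {"A"},
    "PR-2": {"B", "C"},
    "PR-3": {"C"},
    "PR-4": {"D"},
}))
# -> [['PR-1'], ['PR-2', 'PR-3'], ['PR-4']]: three queues proceed in parallel
```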

Beyond these queue-based strategies, other complementary optimizations enhance workflow efficiency. Reordering changes involves prioritizing pull requests with a lower risk of failure or higher business importance, placing them earlier in the queue to minimize cascading failures. More complex or uncertain changes are scheduled later. The “fail fast” principle dictates prioritizing the execution of tests most likely to fail early in the CI process, ensuring problematic changes are identified and addressed quickly. Finally, splitting test execution can involve running a set of fast, critical tests pre-merge to catch common issues, while more extensive or slower tests (like integration or smoke tests) are executed post-merge. In the rare event of a post-merge failure, an automated rollback mechanism can mitigate the risk.
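As a small illustration of the “fail fast” idea, the sketch below orders a test suite by how likely each test is to catch a failure per second of runtime; the history data and the scoring rule are illustrative assumptions rather than any particular CI system's behavior.

```python
def fail_fast_order(tests, history):
    """Order tests so the ones most likely to fail, relative to how long they
    take, run first and surface a bad change as early as possible.
    `history` maps test name -> (observed failure rate, average seconds)."""
    def score(test):
        failure_rate, avg_seconds = history[test]
        return failure_rate / max(avg_seconds, 1e-9)
    return sorted(tests, key=score, reverse=True)

# Hypothetical per-test history: (failure rate, average duration in seconds).
history = {
    "test_api_contract": (0.08, 30),
    "test_lint":         (0.15, 5),
    "test_e2e_checkout": (0.10, 600),
}
print(fail_fast_order(list(history), history))
# -> ['test_lint', 'test_api_contract', 'test_e2e_checkout']
```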

Ultimately, the sophisticated orchestration of these merge strategies aims to strike a crucial balance: maintaining high reliability and code quality while simultaneously maximizing release velocity. Beyond merely saving CI cycles, advanced merge automation significantly reduces developer waiting times, accelerates the delivery of features, and preserves the sanity of engineering teams navigating the complexities of large-scale software development.