Feb 15, 2021

Too Much Code for Bazel Monorepo? Try Going Virtual

Updated: Aug 17, 2021

Bazel is really optimized for monorepos. Yet at Wix we are able to run it with source dependencies between 50 interdependent Git repositories. Read on to learn just how we do it.

Photo by Evan Wise on Unsplash: A clonal colony of an individual male quaking aspen, determined to be a single living organism by identical genetic markers and assumed to have one massive underground root system (wiki)

Background

As a source-dependency build tool with aggressive caching mechanisms, Bazel is optimized for building one big monorepo. The more interdependent repositories you have, the more complexity is introduced into build triggering and management of external repo dependencies (source).

The problem is that Git, which developers love and which Wix uses, does not scale to very large monorepos.

This in turn means that, with Bazel, we need to manage a "virtual" monorepo: more than one repo, with source code dependencies between them. In this post I will share how we managed to do it with 50 interdependent Git repositories.

Motivation: Monorepo

When it comes to consuming "2nd party libraries", which are essentially code built by a different unit but still inside the company, the Wix backend has always favored the "latest consumption" approach over the versioned one. The goal of this approach is to avoid version drift, which would force us to support a long tail of versions and would likely require special hotfixes.

Of course, choosing "latest consumption" requires a very efficient continuous integration system that makes sure all components are always built against the latest version of their 2nd party dependencies. In a regular monorepo built with Bazel, this task is easy. We identified a few amazing advantages of Bazel:

Real latest consumption - unlike Maven/Gradle SNAPSHOT, with Bazel source dependencies in a monorepo, a change to a core library is consumed by the applications that depend on it as soon as it's pushed. There's no need for separate builds for core libraries and applications, or for configuring a triggering chain - everything is built in a single Bazel invocation.
Complete feedback for developers on the branch - in the past it was very hard for developers to understand the impact of a change on other components before merging it to master. Only after a push to master would a build upload the change to the binary repository, and a trigger chain would then kick off the downstream builds that consume it. With Bazel it is entirely possible to give complete feedback by running all test targets affected by a change before it is merged to master.
Complete isolation of feature branches for applications - developers working on a feature branch can do long-running work without being disrupted by the latest binary drops, which might otherwise force them to lose focus.

Why not Real Monorepo

When we tried to put all of the Wix backend code into a single Git repository, we noticed that everyday operations like running "git status" became insanely slow (>>10 minutes). Microsoft had done work on supporting large monorepos on top of Git (GVFS), but the codebase was not mature enough and did not support our platforms (GitHub / macOS). Twitter also has its own internal solution for Git, but it was never made public. Our only option for a real monorepo was to move away from Git.

Why did we stay with Git then?

Keep developers happy - moving to Bazel was very challenging on the developer experience front: a lack of proper tooling, a relatively small community and body of knowledge, and the need to learn a whole new way to define your builds. We understood that changing source control as well would be just too much at that point.
Defer the problem to the future - we genuinely believe that in the future there will be better solutions for Git and large monorepos, or that we will manage to get a team that focuses solely on source control and can decide on the right solution.

Nevertheless, we wanted to enjoy source dependencies and incremental builds as much as possible, so we chose to consolidate our >1000 repositories into 50 big monorepos that depend on each other through latest-HEAD consumption.

Virtual monorepo - requirements

So we can't really use a monorepo… bummer. Let's go back to the features that led us to a monorepo in the first place and see if we can implement them some other way. In a monorepo, all the code of all services, common libraries, APIs, and third party definitions can be found in a single place. This gives us several benefits:

Target visibility - all (public) targets are visible / accessible from any other target inside the repo.
Determinism - the current code state almost exclusively defines the build results: [a] creating a branch locks the state of your code, as well as of 2nd party and 3rd party definitions; [b] pulling code from master updates your code, 2nd party deps and 3rd party definitions together.
Efficient CI - a single CI job can give complete feedback on any code change by running every target at Wix that was affected by the change (the Bazel cache keeps such a build both correct and efficient).

So can we actually get all three without really having a monorepo?

Visibility - Bazel external repository

Bazel actually has a built-in mechanism for source dependencies on external repositories.

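Given two Bazel repositories, fruits.git and milkshake.git, milkshake can depend on fruits by declaring an external repository rule for it (for example `git_repository`, with a remote URL and a pinned commit) in its WORKSPACE file.

Any BUILD file in the milkshake repo can then reference targets from the fruits repository using labels of the form `@fruits//package:target`.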
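For illustration, the following minimal Python sketch renders the kind of WORKSPACE stanza milkshake could use, plus the label a milkshake BUILD file would put in its `deps`. `git_repository` (loaded from `@bazel_tools//tools/build_defs/repo:git.bzl`) and its `name`, `remote` and `commit` attributes are standard Bazel. The remote URL, the commit and the `//apple:apple` target are placeholders.

```python
def render_git_repository(name: str, remote: str, commit: str) -> str:
    """Render a WORKSPACE stanza that pins an external repo to one commit."""
    return (
        'load("@bazel_tools//tools/build_defs/repo:git.bzl", "git_repository")\n'
        "\n"
        "git_repository(\n"
        f'    name = "{name}",\n'
        f'    remote = "{remote}",\n'
        f'    commit = "{commit}",\n'
        ")\n"
    )


def external_label(repo: str, package: str, target: str) -> str:
    """Label a BUILD file uses to reference a target in another repository."""
    return f"@{repo}//{package}:{target}"


if __name__ == "__main__":
    # Placeholder remote and commit, for illustration only.
    print(render_git_repository("fruits", "https://github.com/example/fruits.git", "0123abc"))
    # A milkshake BUILD target could list this label in its deps.
    print(external_label("fruits", "apple", "apple"))  # @fruits//apple:apple
```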

So how does this help us? For simplicity, let's assume that we have three repositories: wix-a.git, wix-b.git and wix-c.git.

Solution Part 1: 2nd party macro file

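We can now introduce a "2nd party macro file", `wix_2nd_party.bzl`, which declares wix-a, wix-b and wix-c as external repositories, each pinned to a commit, so that every repo's WORKSPACE can load it.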
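Here is a minimal sketch of what such a file could look like, produced by a small Python generator. The macro name `wix_2nd_party_repos`, the remote URL pattern, the placeholder SHAs and the choice to leave the current repo out of its own file are all assumptions made for illustration.

```python
from typing import Mapping

# Hypothetical remote URL pattern; adjust to your Git host and organization.
REMOTE_TEMPLATE = "git@github.com:wix/{repo}.git"


def bazel_repo_name(repo: str) -> str:
    """Turn a Git repo name like 'wix-a' into an external repo name like 'wix_a'."""
    return repo.replace("-", "_")


def render_2nd_party_file(commits: Mapping[str, str], current_repo: str) -> str:
    """Render wix_2nd_party.bzl: one git_repository per *other* VMR member."""
    lines = [
        'load("@bazel_tools//tools/build_defs/repo:git.bzl", "git_repository")',
        "",
        "def wix_2nd_party_repos():",
    ]
    others = sorted(repo for repo in commits if repo != current_repo)
    for repo in others:
        lines += [
            "    git_repository(",
            f'        name = "{bazel_repo_name(repo)}",',
            f'        remote = "{REMOTE_TEMPLATE.format(repo=repo)}",',
            f'        commit = "{commits[repo]}",',
            "    )",
        ]
    if not others:
        lines.append("    pass")  # a Starlark function body cannot be empty
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    latest = {"wix-a": "aaa111", "wix-b": "bbb222", "wix-c": "ccc333"}  # placeholder SHAs
    # Inside wix-a, only wix-b and wix-c are external.
    print(render_2nd_party_file(latest, current_repo="wix-a"))
```

Each repo's WORKSPACE would then load the file and call the macro once, e.g. `load("//:wix_2nd_party.bzl", "wix_2nd_party_repos")` followed by `wix_2nd_party_repos()`.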

Determinism - external repos management scripts

So this more or less solves the first requirement, visibility. But some questions remain:

Is this file committed to each repo?
How do we keep the "commit" attribute up to date?

The bad choice - git-aware 2nd party macro file

Let's assume we decide to make the file git-aware, i.e. commit it to each repo. This would also satisfy the second requirement, determinism.

Every push to any of the git repositories will trigger automation that will update that file in all 3 repositories.
The automation will open a PR and pre-submit checks will validate that this change didn't break any targets in the current repository.
There would have to be some mechanism to avoid an infinite loop (updating the commit attribute effectively creates another push → which triggers the automation → which updates the commit attribute again).

Even if we managed to solve the infinite loop, this solution would flood the commit history with tons of mechanical commits, and the problem would grow as more and more developers join. We dropped this option almost immediately.

Solution Part 2: external repos management scripts

So let's take the 2nd party file out of Git. How is it created, then? Here comes the second part of the solution: 2nd party management scripts, a very thin executable that generates (or regenerates) the 2nd party macro file with the latest HEAD commits of all repositories**. The scripts can be called in two use cases:

From the Bazel wrapper* (tools/bazel), in "create file if it does not exist" mode.
Directly by a user in "overwrite the file" mode.
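The two modes differ only in whether an existing file is kept. Below is a minimal Python sketch of that logic. The function name is hypothetical, and `render` stands in for whatever fetches the latest commits and renders the file. It is passed as a callable so that the common "file already exists" path never has to do that work.

```python
import tempfile
from pathlib import Path
from typing import Callable


def write_2nd_party_file(path: Path, render: Callable[[], str], overwrite: bool) -> bool:
    """Create or regenerate the 2nd party macro file; return True if it was written.

    overwrite=False: "create file if it does not exist" mode (Bazel wrapper).
    overwrite=True:  "overwrite the file" mode (run explicitly by a developer).
    """
    if path.exists() and not overwrite:
        return False  # keep the commits this file is currently pinned to
    path.parent.mkdir(parents=True, exist_ok=True)
    # render() runs only when a write is needed, so the common wrapper path
    # never has to contact the latest-commits service.
    content = render()
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(content)
    tmp.replace(path)  # rename over the target so readers never see a partial file
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        target = Path(workdir) / "wix_2nd_party.bzl"
        print(write_2nd_party_file(target, lambda: "# v1\n", overwrite=False))  # True
        print(write_2nd_party_file(target, lambda: "# v2\n", overwrite=False))  # False
        print(write_2nd_party_file(target, lambda: "# v3\n", overwrite=True))   # True
```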

*Bazel wrapper

The Bazel wrapper script, an optional executable located at <workspace_root>/tools/bazel, is a very useful yet, strangely, quite undocumented feature of Bazel. If this file exists, any invocation of Bazel inside your repo runs the wrapper instead of Bazel itself. The script can do some pre-processing, generate some files, and then call the real Bazel executable.

I couldn't find the docs for it, but you can have a look at the test.

** How do we know the latest HEAD commits? Simple yet efficient: we created a backend service that receives webhooks from GitHub and stores the latest commit of every virtual monorepo member in an atomic KV store.
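Below is a minimal sketch of that service's core. It assumes GitHub `push` webhook payloads, which carry `ref`, `after`, `deleted` and `repository.full_name`, and it uses a lock-protected dict in place of the atomic KV store. HTTP handling and webhook signature verification are left out, and the class name is hypothetical.

```python
import threading
from typing import Any, Dict, Mapping


class LatestCommits:
    """Tracks the latest master commit of every virtual monorepo member.

    A dict guarded by a lock stands in for the atomic KV store.
    """

    def __init__(self, branch: str = "master") -> None:
        self._ref = f"refs/heads/{branch}"
        self._commits: Dict[str, str] = {}
        self._lock = threading.Lock()

    def on_push(self, payload: Mapping[str, Any]) -> bool:
        """Handle a GitHub push payload; return True if a commit was recorded."""
        if payload.get("ref") != self._ref or payload.get("deleted"):
            return False  # ignore other branches and branch deletions
        repo = payload["repository"]["full_name"]
        with self._lock:
            self._commits[repo] = payload["after"]  # SHA of the new branch head
        return True

    def snapshot(self) -> Dict[str, str]:
        """Consistent copy of all latest commits, e.g. to render a 2nd party file."""
        with self._lock:
            return dict(self._commits)


if __name__ == "__main__":
    store = LatestCommits()
    store.on_push({"ref": "refs/heads/master", "after": "abc123",
                   "repository": {"full_name": "wix/wix-a"}})
    store.on_push({"ref": "refs/heads/feature", "after": "fff000",
                   "repository": {"full_name": "wix/wix-a"}})
    print(store.snapshot())  # {'wix/wix-a': 'abc123'}
```

A production version would also need to handle retried or out-of-order deliveries, so that an older push never overwrites a newer one.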

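We solved the problem of polluting the Git history with mechanical commits. Super! But we lost determinism: there's an unmanaged file that affects our build results. We still need to solve the two determinism properties listed earlier: [a] creating a branch should lock the 2nd party state, and [b] pulling from master should update it.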

Locally, we chose to solve this by managing a local store of the file, with one copy per branch.

This answers [a]: when a developer moves to a new branch, we copy the current 2nd party file into a new file that matches the new branch, and that's how we "lock" the code of every 2nd party.
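Below is a minimal sketch of the per-branch store and the branch-switch step. The `.vmr` directory name, the file-naming scheme (`/` becomes `__`) and the function names are hypothetical. The step could be triggered, for example, by an IDE plugin or a Git `post-checkout` hook.

```python
import shutil
from pathlib import Path

# Hypothetical git-ignored directory holding one 2nd party file per branch.
STORE_DIR = Path(".vmr")


def branch_file(branch: str, store: Path = STORE_DIR) -> Path:
    """Per-branch 2nd party file, e.g. 'feature/x' -> .vmr/feature__x.bzl."""
    return store / (branch.replace("/", "__") + ".bzl")


def on_branch_switch(old_branch: str, new_branch: str, store: Path = STORE_DIR) -> Path:
    """Lock 2nd party commits for a new branch by copying the current branch's file.

    A file that already exists for `new_branch` is left untouched, so switching
    back to a branch restores exactly the commits it was locked to.
    """
    src, dst = branch_file(old_branch, store), branch_file(new_branch, store)
    if src.exists() and not dst.exists():
        store.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(src, dst)
    return dst
```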

What about [b]? The most correct solution would be to couple `git pull` with updating the current 2nd party file. Although we gave users tools to do both in a single operation (using an IDE plugin), we chose not to force users to update the 2nd party file every time they pull from master, for two reasons:

First, re-downloading all 2nd party repos was not efficient or fast enough. Second, if an upstream change causes mass cache invalidation, the developer has to wait much, much longer for Bazel to finish its work.

At that point we realized we had to relax reproducibility to allow for decent performance.

What about CI builds? We identified two categories: (1) builds that have to work with the latest virtual monorepo (VMR) commits (master builds, pre-merge checks), and (2) builds that can accept locked VMR commits ("branch only" builds). By default, the folder with the 2nd party files is ignored by Git so that users don't push those mechanical files. But we provided yet another tool that lets developers "lock" their local 2nd party file by making Git aware of it. This way, developers working on the same branch share the same VMR file, and "branch only" builds also use that file.
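One way to express that split in a CI job is sketched below in Python. The `BuildKind` names and the rule "use the branch's committed (locked) file when one exists" are our reading of the two categories above, not a description of Wix's actual CI code.

```python
from enum import Enum
from typing import Dict, Optional


class BuildKind(Enum):
    MASTER = "master"            # category 1: latest VMR commits
    PRE_MERGE = "pre_merge"      # category 1: latest VMR commits
    BRANCH_ONLY = "branch_only"  # category 2: may use locked VMR commits


def commits_for_build(kind: BuildKind, latest: Dict[str, str],
                      locked: Optional[Dict[str, str]]) -> Dict[str, str]:
    """Pick the 2nd party commits a CI build should consume.

    `latest` comes from the latest-commits service; `locked` is parsed from a
    2nd party file committed on the branch, or None if the branch has none.
    """
    if kind is BuildKind.BRANCH_ONLY and locked is not None:
        return locked
    return latest
```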

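The Bazel wrapper ties these pieces together: before calling the real Bazel executable, it makes sure that a 2nd party file exists for the current branch, creating it from the latest HEAD commits if needed.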
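Here is a minimal Python sketch of such a wrapper; `tools/bazel` is often a shell script, but any executable works. It relies on the `BAZEL_REAL` environment variable, which both the `bazel` launcher and Bazelisk set to the real binary's path before running `tools/bazel`. The `.vmr` store, the root-level `wix_2nd_party.bzl` that WORKSPACE loads, and the `render_latest` placeholder are hypothetical.

```python
#!/usr/bin/env python3
"""Sketch of <workspace_root>/tools/bazel for a virtual monorepo (VMR)."""
import os
import subprocess
import sys
from pathlib import Path
from typing import Callable, List


def git(*args: str) -> str:
    result = subprocess.run(["git", *args], capture_output=True, text=True, check=True)
    return result.stdout.strip()


def render_latest() -> str:
    # Hypothetical placeholder: a real wrapper would query the latest-commits
    # service and render one repository rule per VMR member.
    return "def wix_2nd_party_repos():\n    pass\n"


def prepare(branch: str, store: Path, active: Path, render: Callable[[], str]) -> Path:
    """Make sure the branch has a 2nd party file and expose it where WORKSPACE loads it."""
    branch_file = store / (branch.replace("/", "__") + ".bzl")
    if not branch_file.exists():  # "create file if it does not exist" mode
        store.mkdir(parents=True, exist_ok=True)
        branch_file.write_text(render())
    content = branch_file.read_text()
    # Rewrite the active file only when it differs, to avoid touching it on every run.
    if not active.exists() or active.read_text() != content:
        active.write_text(content)
    return branch_file


def main(argv: List[str]) -> None:
    root = Path(git("rev-parse", "--show-toplevel"))
    branch = git("rev-parse", "--abbrev-ref", "HEAD")
    prepare(branch, root / ".vmr", root / "wix_2nd_party.bzl", render_latest)
    real_bazel = os.environ["BAZEL_REAL"]
    os.execv(real_bazel, [real_bazel, *argv])  # replace this process with real Bazel


if __name__ == "__main__":
    # The bazel launcher and Bazelisk set BAZEL_REAL before running tools/bazel.
    if "BAZEL_REAL" in os.environ:
        main(sys.argv[1:])
    else:
        print("tools/bazel: BAZEL_REAL is not set; run this through bazel", file=sys.stderr)
```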

So with that we got:

Full target visibility
Partial determinism - the tricky part is getting users to understand the role of the 2nd party file and when to update it.

We think we can do better on determinism, but for now we need to live with the current setup until we solve the build performance problems (e.g. a better mechanism for retrieving external code that doesn't re-download full repos on each new commit, cleanup of unused deps, a super-low-latency, always-up-to-date dev cache, and many other crazy ideas).

Efficient CI - Cross-repo checks and trigger all masters

Let's move on to CI: how do we make sure CI gives complete feedback on each PR? Our problem: running `bazel test //...` on the current repository only tests the affected targets in that repository, but there may be many more affected targets in any of the other 49 repositories.

Observation: since we use a generated 2nd party file and source dependencies, we can generate special 2nd party files with modified commits, so that the build consumes the latest commit of a specific branch (not master).
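A minimal sketch of that observation: start from the latest master commits of all VMR members and override one entry with the head commit of the branch under test. The function name and the dict-of-SHAs representation are hypothetical. The resulting mapping would then be rendered into the 2nd party file of each triggered repo.

```python
from typing import Dict, Mapping


def cross_repo_commits(latest: Mapping[str, str], changed_repo: str,
                       branch_head: str) -> Dict[str, str]:
    """2nd party commits for checking `changed_repo`'s branch against other repos."""
    if changed_repo not in latest:
        raise KeyError(f"{changed_repo} is not a virtual monorepo member")
    commits = dict(latest)               # every other repo stays on latest master
    commits[changed_repo] = branch_head  # the repo under test points at the branch
    return commits
```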

Question: Which repos should we trigger with this special "cross repo check" mode?

Expensive and naive solution:

Trigger all other repos. If the change didn't cause cache invalidation, you get a "no-op" build that is "cheap enough" in terms of time and money. Problem: a no-op build still requires us to warm up the build agent, run the analysis phase… Sometimes that alone can be expensive. Also, some repos suffered from flaky tests, and we'd rather not run checks on them if we can avoid it.

Optimal solution:

Create some kind of target determinator: a pre-build step that determines which targets are affected by a change. We know that Dropbox and Pinterest have already implemented something like that for their real monorepos. The trick for us is to make it work across repos (a simple `bazel query rdeps` will not return valid answers for external targets that depend on the current target).
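The option above was deferred. Purely as an illustration, the heart of a cross-repo target determinator is a reverse-dependency closure over one graph that spans every repository. The sketch assumes the per-repo dependency edges have already been collected, for example from `bazel query` output. It also assumes they were normalized to fully qualified labels such as `@wix_a//lib:core`, so a target has the same name no matter which repo refers to it.

```python
from collections import deque
from typing import Dict, Iterable, List, Set


def affected_targets(deps: Dict[str, List[str]], changed: Iterable[str]) -> Set[str]:
    """Return all targets that transitively depend on any changed target.

    `deps` maps each fully qualified target label to the labels it depends on,
    across all repos. The changed targets themselves are included in the result.
    """
    # Invert the graph: for each target, which targets depend on it directly?
    rdeps: Dict[str, List[str]] = {}
    for target, target_deps in deps.items():
        for dep in target_deps:
            rdeps.setdefault(dep, []).append(target)

    # Breadth-first walk over reverse edges.
    seen: Set[str] = set(changed)
    queue = deque(seen)
    while queue:
        current = queue.popleft()
        for dependent in rdeps.get(current, []):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen
```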

Since building our own always-up-to-date skygraph service would be challenging, we decided to defer this option to a later stage.

Solution Part 3: semi-manual cross-repo CI checks

Each repo has a `cross_repo_check_configuration` that tells our triggering system which other repositories to trigger on each PR. We chose to keep this file in a `.ciconfig` directory at the repo root; an alternative would have been a more granular configuration, allowing different cross-repo checks for different areas of the code.
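The format of `cross_repo_check_configuration` isn't shown here. Assume a hypothetical plain-text format with one repository name per line and `#` comments. Resolving which repositories to trigger for a PR could then look like this:

```python
from typing import List


def repos_to_trigger(config_text: str, current_repo: str) -> List[str]:
    """Parse a hypothetical cross_repo_check_configuration (one repo per line)."""
    repos: List[str] = []
    for raw in config_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if line and line != current_repo and line not in repos:
            repos.append(line)  # keep file order, skip duplicates and self
    return repos


if __name__ == "__main__":
    config = "# downstream consumers\nwix-b\nwix-c  # payments\n"
    print(repos_to_trigger(config, current_repo="wix-a"))  # ['wix-b', 'wix-c']
```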

Now, if developers really need full feedback on their branch, they can get the status of running all of Wix's code against their change. This was a huge win for our infra developers: for the first time, they could merge changes to master with much more confidence.

This actually became a product and can be a great topic for a different engineering blog post.

Complete honesty

The virtual monorepo (VMR) solution allowed us to overcome a huge blocker in migrating Wix from Maven to Bazel. That said, it came with a large mental and technical overhead: we had to accept that our code is affected not only by the current Git repository's commit, but by 49 other commits. We had to build many custom solutions around it, both locally and in CI.

If you ask our engineers what's the problem with VMR they would probably say:

Bad performance - unlike in a real monorepo, whenever the commit of an external 2nd party repo changes, you are forced to re-download the whole repo (that's how `git_repository` is implemented; we moved to `http_archive` for better performance, but it's still not satisfying). We also have a bigger issue of large cache invalidations due to neglected BUILD file hygiene (but I think that is outside the scope of VMR; it's more a question of whether Bazel is indeed fast enough with super large repos, and how hard we need to work to keep it that way).
Mental complexity - if only I had a nickel every time a developer asked me "when do I need to update the 2nd party file?", and a dollar every time a local build started failing because a developer forgot to update the 2nd party file… No matter how many times we explained this concept, developers ran into issues and forgot about this mechanism.
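On the `http_archive` point: switching a 2nd party repo from `git_repository` to `http_archive` typically means pointing at the host's source tarball for one commit, which contains a single snapshot without Git history. Below is a minimal sketch that renders such a stanza. It assumes GitHub's `https://github.com/<owner>/<repo>/archive/<commit>.tar.gz` URLs, whose top-level directory is `<repo>-<commit>` (hence `strip_prefix`). The owner, repo and commit values are placeholders.

```python
from typing import Optional


def render_http_archive(name: str, owner: str, repo: str, commit: str,
                        sha256: Optional[str] = None) -> str:
    """Render an http_archive stanza that downloads one commit's source tarball.

    Use the full 40-character commit SHA so that strip_prefix matches.
    """
    url = f"https://github.com/{owner}/{repo}/archive/{commit}.tar.gz"
    lines = [
        'load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")',
        "",
        "http_archive(",
        f'    name = "{name}",',
        f'    urls = ["{url}"],',
        # GitHub tarballs unpack into a "<repo>-<commit>" top-level directory.
        f'    strip_prefix = "{repo}-{commit}",',
    ]
    if sha256:
        # Pinning the checksum lets Bazel verify the download and reuse it
        # from its repository cache.
        lines.append(f'    sha256 = "{sha256}",')
    lines.append(")")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    placeholder_sha = "0123456789abcdef0123456789abcdef01234567"
    print(render_http_archive("wix_b", "wix", "wix-b", placeholder_sha))
```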

Want to know more about migrating Wix from Maven to Bazel? Watch the meetup talk "Principles on How to Build Fast at Scale" and listen to this episode of our podcast.

Summary

If you want to use Bazel and can put all of your code in a single repository, go for it! You are very lucky! :-) The Wix virtual monorepo solution was made possible thanks to:

Bazel support for external source dependencies.
Floating external dependencies management script.
Smart CI system that can support cross repo pre-checks.

The features above mitigated the fact that we had a lot of repositories, but at the price of a performance vs. correctness tradeoff and a mental complexity that we are still dealing with today.

If you want to get more details or you solved the same problem differently - feel free to reach out to me. I'd love to hear about it.

Stay tuned for more posts from our devex group on how we do Bazel at Wix.

If you found this interesting and want to be part of this challenge, I'm hiring.

Contact me - ors@wix.com

This post was written by Or Shachar
