Using Buildkite for scaling out and running parallel CI steps
Learn how Redpanda engineers use Buildkite and GitHub to automatically trigger multiple instances of CI steps running in parallel.
At Redpanda, we always want to provide an experience that is fast, simple, and productive for developers. That applies to our own team of engineers, too. When considering how we could achieve a more stable continuous integration (CI) pipeline, we wanted that same experience: fast, simple, productive. By running multiple instances of our pipeline steps in parallel on our CI platform, Buildkite, we can now repeat the same Buildkite step many times in roughly the time needed for a single run of that step.
Today, our devs can kick off any number of builds in parallel simply by attaching a label like "ci-repeat-X" to their PR. In the rest of this post, I'll discuss how we made this easy dev experience possible: we achieve repeatable builds by taking advantage of Buildkite's parallelism attribute and pre-command hook, in combination with GitHub labels on pull requests for triggering parallel builds.
Buildkite parallel programming
When Buildkite introduced a feature to run multiple repetitions of a build step in parallel, we took advantage of it by adding an attribute called parallelism to our CI pipeline configuration. This attribute defines how many instances of a step run in parallel. We started off with a constant value of 1.
However, the challenge was to make the parallelism value configurable, so that developers can enable or disable it whenever they want and choose the number of parallel instances per step. Ideally, developers should be able to set this number outside Buildkite's context. GitHub is a good candidate for that, but we needed a "bridge" between it and Buildkite. The bridge cannot live in a step's command that queries the GitHub pull request, because by the time a step's command runs, it is too late to change that step's parallelism attribute. While looking for a way around this, we discovered Buildkite's pre-command hook.
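For reference, when a step's parallelism is set to N, Buildkite creates N copies of the job and tells each copy which one it is through the BUILDKITE_PARALLEL_JOB (zero-based index) and BUILDKITE_PARALLEL_JOB_COUNT environment variables. The Python sketch below shows how a test wrapper might read them to label its output; the function name describe_repetition is hypothetical and not part of Redpanda's tooling.

```python
import os
from typing import Mapping


def describe_repetition(env: Mapping[str, str] = os.environ) -> str:
    """Return a human-readable label for this parallel job.

    BUILDKITE_PARALLEL_JOB is 0-based. Both variables may be unset
    (for example, on steps without parallelism), in which case the job
    is treated as the single "1 of 1" repetition.
    """
    index = int(env.get("BUILDKITE_PARALLEL_JOB", "0"))
    count = int(env.get("BUILDKITE_PARALLEL_JOB_COUNT", "1"))
    return f"repetition {index + 1} of {count}"


if __name__ == "__main__":
    print(describe_repetition())
```

Since every repetition runs the same command, an index like this mainly helps when comparing logs or artifacts from the N copies side by side.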
Buildkite pre-command
Buildkite includes hooks that, once enabled, run automatically before a step's command starts (pre-command) or after it finishes (post-command). Hooks let us run any bash script we want before a pipeline step executes, so we used the pre-command hook to discover the value the user wants for parallelism. This means we can set a variable in the pre-command hook in order to update the parallelism attribute of a Buildkite step.
Having done this, we addressed the next natural question: what is the most productive way for users to update this variable when opening a pull request? Our options were:
- Comment on the pull request (e.g. /hey-buildkite repeat 5)
- Edit a file to update the value and push the code
- Add a GitHub label (e.g. ci-repeat-5)
Our selection criteria were:
- Productive
- Easy to use
- Clean
If we went with option one, we would end up with a long pull request conversation full of scattered comments that clutter what should be a discussion between developers about the change. We rejected this option because it violates the second and third criteria.
For option two, we would have to answer these questions:
- What happens when we want to merge the PR?
- Do we want our default branch to be based on this file and run in parallel? (If so, what’s the impact on our cost?)
These questions suggested that option two wasn't the best course either. It also violates the second criterion, because the user has to push code each time they want to change the requested level of parallelism.
So we went with the third and best option: adding a GitHub label. With this approach, users who want to run their PR tests in parallel only need to add a label to their PR and rebuild the pipeline.
The workflow
The parallelism attribute is set in each Buildkite step of the pipeline.yml configuration. Its value is provided dynamically through an environment variable called PARALLEL_STEPS, so all we have to do is set this environment variable in the pre-command hook.
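To make the substitution concrete, here is a simplified Python model of it. It assumes the pipeline.yml refers to the variable as ${PARALLEL_STEPS:-1}, so a step falls back to a single instance when no label is present; the step label and command in the YAML string are hypothetical placeholders. Buildkite performs this interpolation itself when a pipeline is uploaded; the sketch only mimics the ${NAME} and ${NAME:-default} forms and is not the agent's implementation.

```python
import re
from typing import Mapping

# Hypothetical step from pipeline.yml; the label and command are placeholders.
PIPELINE_YML = """\
steps:
  - label: "unit tests"
    command: "./run-tests.sh"
    parallelism: ${PARALLEL_STEPS:-1}
"""

# Matches ${NAME} and ${NAME:-default}.
_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}")


def interpolate(text: str, env: Mapping[str, str]) -> str:
    """Substitute ${NAME} and ${NAME:-default} using values from env."""

    def _sub(match: re.Match) -> str:
        name, default = match.group(1), match.group(2)
        value = env.get(name, "")
        # As in the shell, ":-" also falls back when the variable is empty.
        if value == "" and default is not None:
            return default
        return value

    return _VAR.sub(_sub, text)


if __name__ == "__main__":
    print(interpolate(PIPELINE_YML, {"PARALLEL_STEPS": "5"}))
```

The default of 1 is what keeps ordinary builds cheap: if the hook exports nothing, every step runs exactly once.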
We wrote a script that runs before the steps are loaded into Buildkite and queries the GitHub API for the labels on the PR (Buildkite provides the PR number in the BUILDKITE_PULL_REQUEST environment variable). It then matches those labels against the pattern ci-repeat-NN. With that, the whole workflow is in place: the hook queries GitHub, finds the matching label, extracts the number, and exports it as the environment variable PARALLEL_STEPS.
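The lookup could be sketched in Python as follows. This is an illustrative version, not Redpanda's actual script. It assumes a token in a GITHUB_TOKEN variable and takes the repository owner and name as command-line arguments (both hypothetical choices), calls GitHub's REST endpoint that lists an issue's labels (pull requests share issue numbers), and, if several ci-repeat-NN labels are present, picks the largest. It prints an export line that the bash pre-command hook could evaluate.

```python
import json
import os
import re
import sys
import urllib.request
from typing import Iterable, List, Optional

LABEL_PATTERN = re.compile(r"^ci-repeat-(\d+)$")


def parallelism_from_labels(labels: Iterable[str], default: int = 1) -> int:
    """Return N from the largest ci-repeat-N label, or default if none match."""
    counts = [int(m.group(1)) for m in map(LABEL_PATTERN.match, labels) if m]
    counts = [n for n in counts if n >= 1]  # ignore values such as ci-repeat-0
    return max(counts, default=default)


def fetch_pr_labels(owner: str, repo: str, pr: str, token: Optional[str]) -> List[str]:
    """List label names on a PR via GitHub's issue-labels REST endpoint."""
    url = f"https://api.github.com/repos/{owner}/{repo}/issues/{pr}/labels?per_page=100"
    request = urllib.request.Request(url, headers={"Accept": "application/vnd.github+json"})
    if token:
        request.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(request) as response:
        return [label["name"] for label in json.load(response)]


def main(argv: List[str]) -> None:
    # Buildkite sets BUILDKITE_PULL_REQUEST to "false" for non-PR builds.
    pr = os.environ.get("BUILDKITE_PULL_REQUEST", "false")
    labels: List[str] = []
    if pr != "false":
        owner, repo = argv[1], argv[2]  # hypothetical CLI arguments
        labels = fetch_pr_labels(owner, repo, pr, os.environ.get("GITHUB_TOKEN"))
    # The bash pre-command hook can eval this line to set the variable.
    print(f"export PARALLEL_STEPS={parallelism_from_labels(labels)}")


if __name__ == "__main__":
    main(sys.argv)
```

Keeping the label parsing in a pure function separate from the HTTP call makes the matching rules easy to test without network access.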
What about cost? Shouldn't we require users to delete this label after their job is done? Otherwise, won't Buildkite run each step multiple times in parallel on every subsequent commit? As mentioned, we aim to increase developer productivity, and asking users to remove the label afterward is a manual step, which we avoid as much as we can. Once the pre-command hook has discovered the label, there is no reason to keep it on the PR, so the bot we're using can delete it. By simply deleting a label, we remove a manual step for the developer and keep costs down.
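The post does not say which bot removes the label, so the Python below is only a sketch of the underlying call: GitHub's REST endpoint for removing a single label from an issue or pull request. The owner, repository, and token handling are assumptions for illustration.

```python
import urllib.parse
import urllib.request
from typing import Optional


def build_remove_label_request(owner: str, repo: str, pr: str, label: str,
                               token: Optional[str] = None) -> urllib.request.Request:
    """Build a DELETE request that removes one label from a PR (issue)."""
    # Label names may contain spaces or slashes, so escape every reserved character.
    name = urllib.parse.quote(label, safe="")
    url = f"https://api.github.com/repos/{owner}/{repo}/issues/{pr}/labels/{name}"
    request = urllib.request.Request(
        url, method="DELETE", headers={"Accept": "application/vnd.github+json"}
    )
    if token:
        request.add_header("Authorization", f"Bearer {token}")
    return request


def remove_label(owner: str, repo: str, pr: str, label: str,
                 token: Optional[str] = None) -> None:
    """Send the request; urlopen raises HTTPError on a non-2xx response."""
    with urllib.request.urlopen(build_remove_label_request(owner, repo, pr, label, token)):
        pass
```

Removing the label right after it is read means only the build the developer explicitly asked for runs with extra parallelism; later pushes fall back to the default.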
Building with DevProd in mind
In summary, our process for running multiple instances of CI steps in parallel was created with developer productivity in mind. By parallelizing and running multiple instances of CI steps on Buildkite, we decreased our build’s total running time and improved the stability of CI testing in Redpanda.
Learn more about Redpanda and download our binary on GitHub. Interact with our developers directly by joining our Slack Community to ask questions about our CI steps or anything else. For more information about Redpanda and its features, browse our documentation.