How Flux and Pulumi give each other superpowers

This is a talk I gave at the Weave User Group, back in March 2023. The source code for the demo is all in https://github.com/squaremo/flux-plus-pulumi.

Hi, I’m Michael Bridgen, I worked on Flux for quite a while, then I worked on Pulumi for a bit, and I’m going to talk about how those two things, Flux and Pulumi, can be combined to great effect.

Since you’re here, you are probably familiar with Flux, but you may not be familiar with Pulumi. So I’ll start by talking a bit about what Pulumi is and what it’s good at. I have some things to say about what Flux is good at too! But later.

Pulumi is a product that self-identifies as “Infrastructure as Code”. This means, simply, that you write out a description of your infrastructure – virtual machines, databases and so on, on AWS, Azure, Google Cloud and so on – as a program, and Pulumi will run that program and create, update and delete the infrastructure as necessary. If this reminds you of Terraform, then you are absolutely on the right page. The big difference is that Terraform is based on its own special language – HCL – while Pulumi uses general purpose programming languages like JavaScript, Go, Python and some others. If you want to read or even participate in comparisons, I can recommend Reddit.

Here is a Pulumi program. It looks just like a regular JavaScript program – here are some imports, I assign some new objects to variables here, and I refer to fields of the objects when creating more new objects. This all represents a description of some infrastructure.

The clever bit that Pulumi does is to register every time you create a new object representing some bit of infrastructure – even this secret here. Then, once your program is finished, it goes away and figures out what needs to be created, deleted, updated or replaced so that it all exists as described. Where there are dependencies between objects, like using a value that is provided by the platform, it orders things such that the dependencies have been realised before doing the next bit. It also knows what can be updated in place when things change, and will do that where possible, to avoid causing downtime.

So far so good! You can write a Pulumi program in YAML. This, to me, is really neat, because it is a representation as data. Or, more practically: anything that looks like YAML can probably go into a Kubernetes resource. Some foreshadowing there!

Here’s a Pulumi program written as YAML. You can see here it’s doing a similar thing to the JavaScript program, putting objects (like this service account) against variables (“default”). And here, I’m referring to a field of a resource when constructing another resource: ${default.email}. It gets interpolated into that field value.

Back up here I’m generating a service account key, and making that available for another program to use by putting it as an “output”. Later we’ll see that this is a crucial part of being able to layer your infrastructure definitions – you make a security context, then provide that as the environment for the next layer.
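
To make that concrete, here’s a minimal sketch of the shape of such a Pulumi YAML program. It isn’t the program from the slides; the names, type tokens and properties are illustrative:

name: bootstrap
runtime: yaml
resources:
  default:                              # a GCP service account for the next layer
    type: gcp:serviceaccount:Account
    properties:
      accountId: next-layer
      displayName: Service account for the next layer
  binding:                              # a field of one resource interpolated into another
    type: gcp:projects:IAMMember
    properties:
      project: my-project               # hypothetical project id
      role: roles/container.developer
      member: serviceAccount:${default.email}
  key:                                  # a key for the account
    type: gcp:serviceaccount:Key
    properties:
      serviceAccountId: ${default.name}
outputs:
  serviceAccountKey: ${key.privateKey}  # made available for the next layer to use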

I said I was going to talk a bit about what makes Flux special. In my opinion, it’s not so much what it does, but what it encourages you to do. And that is to structure the description of your system. In particular, into layers.

Diagram of layering Flux syncs

What I mean is this: mechanically, Flux gives you sources and (for want of a better term) appliers, so GitRepositorys and Kustomizations, say. If you want to apply things from different places, you have to introduce “hops” – you put the definition for another GitRepository or whatever in the git repository you were already syncing.

Sometimes I have thought of this as more bookkeeping and complication, and it can be that; but it’s also an opportunity, and an encouragement, to layer your system. Perhaps you are the platform team and you are handing over control of a namespace to an app team. In the first layer, you set up the environment for the next layer – the namespace and a service account and RBAC objects, say – and a sync for that next hop, which will be run in the security context you set up for it. We’ll see how that idea plays out on a larger scale in the demo, in a bit.
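
Sketched out, with hypothetical names, that first layer might look something like this (the apiVersions are whatever your Flux version uses, and the RBAC objects for the service account are omitted):

apiVersion: v1
kind: Namespace
metadata:
  name: app-team
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: app-team-sync
  namespace: app-team
---
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: GitRepository
metadata:
  name: app-team-config
  namespace: app-team
spec:
  interval: 1m
  url: https://github.com/example/app-team-config   # hypothetical repo
  ref:
    branch: main
---
apiVersion: kustomize.toolkit.fluxcd.io/v1beta2
kind: Kustomization
metadata:
  name: app-team
  namespace: app-team
spec:
  interval: 10m
  path: ./deploy
  prune: true
  serviceAccountName: app-team-sync   # the next hop runs in this security context
  sourceRef:
    kind: GitRepository
    name: app-team-config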

The other big win with Flux is that it gives you points of integration for continuous delivery. Taking the system description from git or OCI artifacts and applying it gives you something to tie automation and human processes to. The obvious one is the original byline for GitOps: “operations by pull request”. Reviewing pull requests ties a human-level workflow to what happens in git, and Flux ties what happens in git to what happens in the cluster. Now you have a human-level workflow, with built-in audit and rollback, around what happens in your Kubernetes cluster! That’s GitOps in a nutshell.

Diagram of how config flows through git to Flux to Kubernetes

But there are more integration points: working with git and OCI artifacts means you can use the nice code signing and security scanning features that come with git- and artifact-hosting these days. Validate your system description against policies, before you package it as an OCI artifact, which you sign. Only then is it considered fit to be deployed by Flux. You might move the same artifact through a series of environments, as it passes further checks. I think Weave GitOps builds this kind of thing in.
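
For example, Flux’s source-controller can verify cosign signatures on OCI artifacts before handing them on to be applied. A sketch, with a hypothetical registry and key secret:

apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: OCIRepository
metadata:
  name: app-config
  namespace: flux-system
spec:
  interval: 5m
  url: oci://ghcr.io/example/app-config   # hypothetical artifact
  ref:
    tag: latest
  verify:
    provider: cosign       # artifacts that fail verification are not handed on
    secretRef:
      name: cosign-pub     # secret holding the cosign public key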

This brings us to the first Flux+Pulumi combo: you can use Flux sources for your Pulumi programs, and get those pipelines going. If your Pulumi program uses an NPM package with a CVE against it, the continuous delivery pipeline can tell you about it before it gets run in the cluster; and as a backstop, Flux will refuse to provide the source code to Pulumi if it’s not signed.

The thing that makes this combo possible is the Pulumi Kubernetes operator, or Pulumi operator (since we’re talking with respect to Kubernetes here). This is a Kubernetes operator, like the controllers Flux uses, which runs Pulumi programs from within a Kubernetes cluster. It runs in Kubernetes but the programs don’t have to be about Kubernetes. They can describe infrastructure in AWS, or Google Cloud, or Azure, or any of the hundred or so platforms and APIs available to use with Pulumi programs.

And this is the neatest thing: you can supply Pulumi programs as Kubernetes resources, so they can be synced by Flux.
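
For instance, a Stack object – the Pulumi operator’s main custom resource – can itself live in git and be synced by Flux, and can point at a Flux source for the program it runs. I’m writing the field names from memory here, so treat them as approximate and check the operator’s Stack CRD:

apiVersion: pulumi.com/v1
kind: Stack
metadata:
  name: app-infra
  namespace: apps
spec:
  stack: my-org/app-infra/demo        # hypothetical Pulumi stack name
  fluxSource:                         # take the program from a Flux source artifact
    sourceRef:
      apiVersion: source.toolkit.fluxcd.io/v1beta2
      kind: GitRepository
      name: app-config
    dir: infra/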

Let me replay that from another angle: with the Pulumi operator, Flux can sync Kubernetes resources that describe all sorts of infrastructure, possibly all your infrastructure.

Now you can have human-level workflows, with built-in auditing and rollback, around what happens in your entire infrastructure! That’s GitOps, in a nutshell. For a much larger nut.

Diagram of how config flows through git to Flux to Pulumi

Let’s have a look at this in action. Demo time!

I want to get to the point where I can describe an application and its required ex-Kubernetes infrastructure, as Kubernetes resources synced by Flux. That means my app team can describe their app in terms of Deployments and Services, but also the database that holds the data. The app team is going to run … Wordpress on Google Cloud!

Before we can run Flux or the Pulumi operator, we have to have a Kubernetes cluster. There’s a bootstrapping part of this, and you could do it many ways, but it bottoms out at provisioning a Kubernetes cluster. I’ll do that with a Pulumi program, here it is again.

This sets up the cluster, and creates a service account, with a key which I’m going to give to the Pulumi operator so it can create things in Google Cloud. There’s that idea of creating a security context in which the next layer runs – I run this under my superuser permissions, but I create a principal with fewer permissions for the next layer to use. Actually it’s got pretty broad permissions, because I was spending too much time going back and forth to the Google Cloud docs. But you get the idea.

I’m also creating a CloudSQL instance here, because it takes ages and I don’t want to have to wait around for it.

This is what happens when I run the program (pulumi up --show-sames). It’s all already there, I ran it yesterday, so there’s nothing to do.

I construct a kubeconfig for the cluster to use later; it’s this big interpolated string here. It gets encrypted before it’s sent anywhere, and decrypted when it’s used in a program again.
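
Roughly, in a YAML program that output looks like the sketch below; the kubeconfig is truncated here, and the exact fn::secret syntax depends on your Pulumi YAML version:

outputs:
  # One big interpolated string, assembled from the cluster's outputs.
  # Wrapping it in fn::secret means Pulumi encrypts it in the stack state.
  kubeconfig:
    fn::secret: |-
      apiVersion: v1
      kind: Config
      clusters:
      - name: gke
        cluster:
          server: https://${cluster.endpoint}
          certificate-authority-data: ${cluster.masterAuth.clusterCaCertificate}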

There’s a second bit of provisioning that has to occur before Flux is running the show – that’s installing Flux itself. I could do a manual flux bootstrap, but I’m trying to stay within Pulumi. I can’t do that with YAML, for arcane and hopefully temporary reasons, otherwise it could go in the same YAML file as the cluster. But here, we have that JavaScript program again. It uses a Pulumi SDK that wraps the Terraform provider to generate the appropriate YAMLs for installing Flux, then it applies them.

This relies on the kubeconfig generated before – you can see here a StackReference, which is used to fetch the output “kubeconfig”. Again there’s this idea of creating a context for the next layer and handing it onward.

You can see here some secrets that various bits to come later will use for authenticating to the Pulumi service and for authenticating to Google Cloud.

The program also creates a GitRepository and Kustomization to sync whatever else I put in this repo under the directory sync/. It’s another “hop” – I could create a namespace and service account here, and restrict what the objects in sync/ can do.

The syncs themselves are here in sync/. This is where the Pulumi operator appears, finally. Here’s a GitRepository for it, a Kustomization for the CRDs, which go first, and another Kustomization for the operator, which needs the CRDs defined.
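
Something like this – the paths and release tag are illustrative rather than copied from the demo repo:

apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: GitRepository
metadata:
  name: pulumi-kubernetes-operator
  namespace: flux-system
spec:
  interval: 10m
  url: https://github.com/pulumi/pulumi-kubernetes-operator
  ref:
    tag: v1.11.0        # an illustrative release tag
---
apiVersion: kustomize.toolkit.fluxcd.io/v1beta2
kind: Kustomization
metadata:
  name: pulumi-operator-crds
  namespace: flux-system
spec:
  interval: 10m
  path: ./deploy/crds
  prune: true
  sourceRef:
    kind: GitRepository
    name: pulumi-kubernetes-operator
---
apiVersion: kustomize.toolkit.fluxcd.io/v1beta2
kind: Kustomization
metadata:
  name: pulumi-operator
  namespace: flux-system
spec:
  dependsOn:
    - name: pulumi-operator-crds    # the CRDs must be established first
  interval: 10m
  path: ./deploy/yaml
  prune: true
  sourceRef:
    kind: GitRepository
    name: pulumi-kubernetes-operator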

I could have put it earlier, in the previous program. But just applying the YAMLs is actually the simplest way to do it! And this way, now we’re in Flux territory, and everything happens through Flux.

Let’s run this one.

pulumi up

You can see it calculates, and reports, all the Kubernetes bits that are generated by the Flux provider. It has some idea of what depends on what, and when they are ready to use, so you can see it doing lots of things in parallel, with some things being held back or retried.

In the end it makes it. Hooray!

We can now look at the cluster through the lens of Flux.

flux get source all

That shows the original git repository, and the git repository that’s defined within to sync the Pulumi operator. Is that running? Let’s have a look:

flux get kustomization

There it is. And did it make anything?

kubectl get deploy

There’s the Pulumi operator. Let’s give it something to do.

Here are the app team’s YAMLs: app-infra.yaml and wordpress.yaml. In app-infra.yaml there’s a Pulumi program. This is basically the Pulumi YAML you saw before, with some Kubernetes bits wrapped around it. The Stack definition tells the operator how to run the program. Notably, it provides the Google Cloud service account key from before, so the program can actually create things in Google Cloud.
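
The shape of it is roughly as below. I’m sketching from memory of the operator’s API (the Program kind, programRef and envRefs), so the field names may not be exact, and the secret names here are hypothetical stand-ins for the ones created in the earlier layers:

apiVersion: pulumi.com/v1
kind: Program
metadata:
  name: app-infra
program:
  resources: {}      # the Pulumi YAML program goes here (sketched below)
---
apiVersion: pulumi.com/v1
kind: Stack
metadata:
  name: app-infra
spec:
  stack: my-org/app-infra/demo
  programRef:
    name: app-infra
  envRefs:
    PULUMI_ACCESS_TOKEN:            # for the Pulumi service backend
      type: Secret
      secret:
        name: pulumi-access-token
        key: token
    GOOGLE_CREDENTIALS:             # the service account key from the layer below
      type: Secret
      secret:
        name: gcp-credentials
        key: key.json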

Eagle-eyed audience members might spot that I reference the bootstrap stack, as I did before, and might reason that if I can do that, can’t I escape whatever restrictions were created for me in the layer below? The answer is Yes – that’s just me being hasty. Strictly these should go in separate Pulumi accounts, so you can’t just reach over layers like that.

In the program, it defines a database and a user for the database, and sticks the user’s name and password into a secret for the Deployment to use. This is a trapeze act that Kubernetes users will be familiar with.
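
A sketch of that part of the program, with illustrative names for the CloudSQL instance and the secret:

resources:
  password:
    type: random:RandomPassword
    properties:
      length: 20
      special: false
  database:
    type: gcp:sql:Database
    properties:
      instance: wordpress-db          # the CloudSQL instance created earlier
      name: wordpress
  dbUser:
    type: gcp:sql:User
    properties:
      instance: wordpress-db
      name: wordpress
      password: ${password.result}
  dbSecret:                           # the trapeze: hand the credentials to the Deployment
    type: kubernetes:core/v1:Secret
    properties:
      metadata:
        name: wordpress-db
      stringData:
        username: ${dbUser.name}
        password: ${password.result}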

Lastly, wordpress.yaml has a regular old Deployment. It refers to the secrets created by the program in the usual ways (as env entries, and mounted as a volume).
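
Sketched, assuming the secret is named wordpress-db as above; the env var names are the ones the official wordpress image expects, and details like the database host entry are omitted:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: wordpress
spec:
  replicas: 1
  selector:
    matchLabels:
      app: wordpress
  template:
    metadata:
      labels:
        app: wordpress
    spec:
      containers:
      - name: wordpress
        image: wordpress:6
        ports:
        - containerPort: 80
        env:
        - name: WORDPRESS_DB_USER
          valueFrom:
            secretKeyRef:
              name: wordpress-db
              key: username
        - name: WORDPRESS_DB_PASSWORD
          valueFrom:
            secretKeyRef:
              name: wordpress-db
              key: password
        volumeMounts:
        - name: db-credentials        # the same secret, mounted as a volume
          mountPath: /etc/wordpress/db
          readOnly: true
      volumes:
      - name: db-credentials
        secret:
          secretName: wordpress-db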

I’m going to cheat a little and just apply these directly, rather than committing them to git and letting Flux do it. That’s because I had to do it again and again when making this demo, and it’s a pain to reset everything.

kubectl apply -f app-infra.yaml -f wordpress.yaml

What happens? We can check on the Stack. It should succeed pretty quickly because of another cheat: all the stuff like the database exists already, and it’s just making sure it’s up to date.

kubectl get stacks -w

And what about the mighty app?

kubectl get deploy wordpress

There it is! And, we can port forward to it, and see what it looks like in the browser.

kubectl port-forward deploy/wordpress :80

Plot twist! If I had more time I would debug it and check in the necessary changes for Flux to sync, and it would work.

What did we learn?

Flux can work with the Pulumi operator to hook it into continuous delivery pipelines. Pulumi can work with Flux to extend the scope of your GitOpsing to, potentially, your whole infrastructure. Win win!

Are there any … caveats?

Yes, of course there are. As always, it’s quite tricky trying to make things happen at a distance.

Thanks everyone, and sorry you didn’t get to hijack my Wordpress installation.