Learnings from Writing a Kubernetes Operator as a Solution Architect
As a Kubernetes Solution Architect, you are rarely tasked with writing a Kubernetes operator — typically, that’s the domain of platform engineers. However, if you find yourself with such an opportunity and some Go skills in your toolkit, it can be an incredibly fun, challenging and rewarding journey.
During my time at Giant Swarm, I led the effort on standardizing Teleport as a unified access solution for all managed infrastructure, including Kubernetes clusters and cluster nodes (SSH).
Alongside infrastructure access, we also ran a PoC on using Teleport to access internal web applications such as Grafana dashboards and the Backstage UI. The upcoming Teleport v16 feature called “VNet” looked promising for that need. However, at the time of writing, it is only compatible with macOS and still in active development by the folks at Teleport.
We soon realized that integrating Teleport into our managed Kubernetes offering based on Cluster API (CAPI) required more than simply deploying a highly-available self-hosted Teleport cluster. We needed a custom Kubernetes operator that could dynamically manage the registration of Kubernetes clusters and cluster nodes with the Teleport cluster.
At Giant Swarm, customers, engineers and cluster test suites spin up many Kubernetes clusters every day. Automating the registration of these newly created clusters with Teleport is therefore critical to being able to access them, as is the ability to SSH into cluster nodes for troubleshooting and debugging.
This led to the development of a custom Kubernetes operator. It is not to be confused with the Teleport Kubernetes Operator by Teleport, which we use in our CI/CD pipeline to manage Teleport-specific resources such as Roles and Bots.
Here’s what I learned while helping develop this custom Kubernetes operator, which we simply called teleport-operator.
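To make the idea concrete, here is a minimal sketch of what such a registration loop could look like. It is not the actual teleport-operator code: the `Cluster` type, the `Registrar` interface and the in-memory `fakeTeleport` are hypothetical stand-ins for the CAPI cluster object and the Teleport API, and only the Go standard library is used.

```go
package main

import (
	"fmt"
	"sort"
)

// Cluster is a hypothetical, simplified view of a CAPI cluster.
type Cluster struct {
	Name     string
	Deleting bool // true once the cluster has been marked for deletion
}

// Registrar is a hypothetical abstraction over the Teleport API.
type Registrar interface {
	IsRegistered(name string) bool
	Register(name string) error
	Deregister(name string) error
}

// fakeTeleport is an in-memory Registrar used only for illustration.
type fakeTeleport struct{ registered map[string]bool }

func (f *fakeTeleport) IsRegistered(name string) bool { return f.registered[name] }
func (f *fakeTeleport) Register(name string) error    { f.registered[name] = true; return nil }
func (f *fakeTeleport) Deregister(name string) error  { delete(f.registered, name); return nil }

// Reconcile drives Teleport toward the desired state for one cluster.
// It is idempotent: calling it again with the same input changes nothing.
func Reconcile(c Cluster, r Registrar) error {
	switch {
	case c.Deleting && r.IsRegistered(c.Name):
		return r.Deregister(c.Name)
	case !c.Deleting && !r.IsRegistered(c.Name):
		return r.Register(c.Name)
	}
	return nil
}

// registeredNames returns the registered cluster names in sorted order.
func registeredNames(f *fakeTeleport) []string {
	names := make([]string, 0, len(f.registered))
	for n := range f.registered {
		names = append(names, n)
	}
	sort.Strings(names)
	return names
}

func main() {
	tp := &fakeTeleport{registered: map[string]bool{}}
	events := []Cluster{
		{Name: "alpha"},
		{Name: "beta"},
		{Name: "beta", Deleting: true}, // beta is torn down again
	}
	for _, c := range events {
		if err := Reconcile(c, tp); err != nil {
			fmt.Println("reconcile failed:", err)
		}
	}
	fmt.Println(registeredNames(tp))
}
```

In a real operator, a framework such as Kubebuilder would call the reconcile function whenever a cluster object changes. The sketch only captures the decision logic: register what is missing, deregister what is being deleted, and do nothing otherwise.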
1. Go Is Approachable
Go is a nice, approachable language compared to, say, Rust. Having written a couple of tools in Go, such as ssl-handshake and cleanup-aws-access-keys, I found Go to be intuitive and enjoyable to work with. Go’s simplicity, clear syntax, and robust standard library make it the de facto language for developing Kubernetes operators.
Frameworks such as Kubebuilder simplify the process of bootstrapping a Kubernetes operator from scratch, significantly reduce boilerplate code and accelerate development. Even if you’re new to Go, you can quickly pick it up and get up to speed solving problems.
2. Leverage AI for Boilerplate Code
Writing an operator involves managing Kubernetes resources like ConfigMaps and Secrets. These are common tasks that can consume a lot of time.
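A typical example of such boilerplate is a "create or update" helper. The sketch below shows the pattern using only the Go standard library. The `Secret` type, the `Store` interface, the in-memory `memStore` and the `join-token` key are all hypothetical stand-ins for illustration; a real operator would talk to the Kubernetes API through a client library instead.

```go
package main

import (
	"errors"
	"fmt"
	"reflect"
)

// Secret is a hypothetical, trimmed-down stand-in for a Kubernetes Secret.
type Secret struct {
	Name string
	Data map[string]string
}

var errNotFound = errors.New("not found")

// Store is a hypothetical abstraction over the Kubernetes API.
type Store interface {
	Get(name string) (Secret, error)
	Create(s Secret) error
	Update(s Secret) error
}

// memStore is an in-memory Store used only for illustration.
type memStore struct{ items map[string]Secret }

func (m *memStore) Get(name string) (Secret, error) {
	s, ok := m.items[name]
	if !ok {
		return Secret{}, errNotFound
	}
	return s, nil
}
func (m *memStore) Create(s Secret) error { m.items[s.Name] = s; return nil }
func (m *memStore) Update(s Secret) error { m.items[s.Name] = s; return nil }

// EnsureSecret creates the Secret if it is missing, updates it if its data
// differs, and otherwise leaves it alone. It reports which action was taken.
func EnsureSecret(st Store, want Secret) (string, error) {
	got, err := st.Get(want.Name)
	if errors.Is(err, errNotFound) {
		if err := st.Create(want); err != nil {
			return "", fmt.Errorf("create secret %q: %w", want.Name, err)
		}
		return "created", nil
	}
	if err != nil {
		return "", fmt.Errorf("get secret %q: %w", want.Name, err)
	}
	if reflect.DeepEqual(got.Data, want.Data) {
		return "unchanged", nil
	}
	if err := st.Update(want); err != nil {
		return "", fmt.Errorf("update secret %q: %w", want.Name, err)
	}
	return "updated", nil
}

func main() {
	st := &memStore{items: map[string]Secret{}}
	for _, token := range []string{"abc", "abc", "xyz"} {
		want := Secret{Name: "teleport-join", Data: map[string]string{"join-token": token}}
		action, err := EnsureSecret(st, want)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(action)
	}
}
```

Run as is, this prints `created`, `unchanged` and `updated` on separate lines. The pattern matters because a reconcile loop may run many times for the same object, so every write has to be safe to repeat.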
GenAI tools such as ChatGPT, Claude, or even Perplexity AI (instead of Stack Overflow) can help generate boilerplate code, accelerating development and letting you focus on the core problem your operator addresses. This makes the development process more enjoyable and productive.
The future is AI-driven programming, but a foundational understanding is still key to using these tools effectively. Make sure to grasp the fundamentals of Go before diving into writing operators.
3. Testing Is Crucial
After implementing the logic, writing unit tests is essential to ensure your operator behaves as expected.
Go’s built-in support for unit testing, along with frameworks such as Ginkgo, simplifies this process by providing a structured approach to testing your operator.
This is especially important for mission-critical operators like teleport-operator. If it fails, clusters won’t register with Teleport, leading to frustrated engineers and disrupting their productivity. Thorough testing can save you from a lot of headaches down the line.
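As an illustration of the kind of test that pays off here, the sketch below applies the table-driven style to a small decision function. The `Cluster` type and the `Action` function are hypothetical and not taken from teleport-operator. The cases are checked from `main` so the example runs on its own; in a real project the same table would live in a `_test.go` file and be driven by Go's `testing` package, or by Ginkgo's `DescribeTable`.

```go
package main

import (
	"fmt"
	"os"
)

// Cluster is a hypothetical, simplified view of a cluster's state.
type Cluster struct {
	Registered bool // already known to Teleport
	Deleting   bool // marked for deletion
}

// Action decides what the operator should do for a cluster.
func Action(c Cluster) string {
	switch {
	case c.Deleting && c.Registered:
		return "deregister"
	case !c.Deleting && !c.Registered:
		return "register"
	default:
		return "none"
	}
}

// testCase is one row of the table: an input and the expected decision.
type testCase struct {
	name string
	in   Cluster
	want string
}

var cases = []testCase{
	{"new cluster", Cluster{}, "register"},
	{"already registered", Cluster{Registered: true}, "none"},
	{"deleting and registered", Cluster{Registered: true, Deleting: true}, "deregister"},
	{"deleting, never registered", Cluster{Deleting: true}, "none"},
}

// RunCases returns a description of every failing case.
func RunCases() []string {
	var failures []string
	for _, tc := range cases {
		if got := Action(tc.in); got != tc.want {
			failures = append(failures, fmt.Sprintf("%s: got %q, want %q", tc.name, got, tc.want))
		}
	}
	return failures
}

func main() {
	if failures := RunCases(); len(failures) > 0 {
		for _, f := range failures {
			fmt.Println("FAIL", f)
		}
		os.Exit(1)
	}
	fmt.Println("all cases passed")
}
```

Keeping the decision logic in a pure function like this makes every edge case a one-line table entry, with no cluster or Teleport instance needed to exercise it.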
4. Collaborate and Seek Help
Don’t develop in isolation. Reach out to fellow experienced platform engineers who have written operators before. Their insights can help you navigate challenges and avoid common pitfalls.
5. Release Frequently and Iterate
Aim for a minimum viable operator to begin with, one that covers the basic logic. Avoid bundling all functionality into a single big release.
Smaller, iterative releases make code reviews manageable, reduce the cognitive load on reviewers and foster quality reviews. This iterative feedback loop improves code quality over time and is key to building a robust and stable Kubernetes operator.
6. It’s All About Problem-Solving
Although I am a Solution Architect, writing an operator felt like a natural extension of my role. Diving into Kubernetes internals, client APIs and Teleport APIs, and brainstorming with fellow platform engineers to solve hard problems and meet business requirements, was both fun and rewarding.
You learn a ton in the process and gain a deeper understanding of, and appreciation for, the Kubernetes ecosystem. I think this is crucial for any architect who wants to grasp the bigger picture inside out and be a well-rounded engineer.
Final Thoughts
Writing a Kubernetes operator is fun, challenging and rewarding. If it directly integrates into your company’s product, that adds a layer of excitement, along with the responsibility to ensure the operator is fully tested, stable and performant.
For me, this experience reinforced the importance of understanding the “nuts and bolts” of systems as a Solution Architect. It’s not just about designing solutions; it’s about building them too.
As Richard Feynman famously said: “What I cannot create, I do not understand.”