Building a Kubernetes Platform That Scales From SaaS To Self-Managed - Florian Forster, GitLab

CNCF [Cloud Native Computing Foundation]

Video Summary

GitLab is evolving its platform to scale from single self-managed instances to its massive SaaS offering. Initially built on a Ruby on Rails monolith, the company faced scalability challenges. To address this, GitLab developed "Runway," an internal platform designed to deploy satellite services independently of the monolith. Runway's first iteration used Cloud Run for rapid development, but this approach wasn't suitable for self-managed customers due to data residency and regulatory concerns. Consequently, GitLab transitioned to Kubernetes as a common platform, utilizing Helm charts for packaging and deployment, aiming for a unified experience across both SaaS and self-managed offerings. A key insight is that while vendor solutions like Cloud Run offer velocity, they can lead to vendor lock-in, necessitating a shift to open standards like Kubernetes for greater portability and scalability.

Short Highlights

  • GitLab's core is a large Ruby on Rails monolith that presents scalability challenges.
  • "Runway" is an internal platform built to deploy "satellite services" around the monolith.
  • The early version of Runway used Cloud Run for velocity, but that was not viable for self-managed customers.
  • Kubernetes was adopted as a common platform for both GitLab.com (SaaS) and self-managed offerings.
  • GitLab utilizes Helm charts for packaging and deploying services, aiming for consistency.
  • The shift to Kubernetes introduces complexity for operations teams and infrastructure provisioning responsibilities for self-managed customers.
  • GitLab Environment Toolkit (GET) helps self-managed customers provision necessary infrastructure.
  • GitLab's "LabKit" application framework and common CI tasks aim to standardize development practices.

Key Details

The Monolith's Limitations and the Genesis of Runway [01:15]

  • GitLab's core architecture is a large Ruby on Rails monolith that, despite its historical success, is showing signs of scalability friction.
  • To overcome this, GitLab developed "Runway," an internal platform initially focused on deploying services that surround the monolith, termed "satellite services."
  • The initial version of Runway leveraged Cloud Run to prioritize velocity and quickly meet business needs, abstracting away operational complexities for developers.
  • Developers provide a container image, and Runway handles the deployment process, enabling a "push on green" deployment mechanism integrated with CI.
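
The "push on green" idea in the bullets above can be sketched as a small gate: a container image is deployed only when every CI job passed on the default branch. This is a minimal Python sketch under stated assumptions, not Runway's implementation; `JobResult`, `should_deploy`, `deploy_command`, and the `main` branch name are all hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class JobResult:
    """Outcome of one CI job (hypothetical shape, not Runway's data model)."""
    name: str
    passed: bool


def should_deploy(jobs: Sequence[JobResult], branch: str,
                  default_branch: str = "main") -> bool:
    """Return True only for a fully green pipeline on the default branch."""
    if branch != default_branch:
        return False
    # An empty pipeline proves nothing, so it does not count as green.
    return len(jobs) > 0 and all(job.passed for job in jobs)


def deploy_command(image: str, jobs: Sequence[JobResult], branch: str) -> str:
    """Describe the action a pipeline would take for this container image."""
    if should_deploy(jobs, branch):
        return f"deploy {image}"
    return "skip"
```

Treating an empty job list as "not green" is a deliberate choice in this sketch: a pipeline that ran nothing should not be able to trigger a deployment.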

"This monolithic architecture, this Ruby on Rails monolith has served GitLab fairly well and this is why we are where we are at the moment. But it is showing kind of signs of weaknesses..."

The Challenge of Serving Both SaaS and Self-Managed Customers [05:48]

  • GitLab faces the dual challenge of serving its highly scaled SaaS offering (gitlab.com) and its diverse self-managed customer base.
  • Self-managed customers often have hard requirements, such as data residency or regulatory compliance, that rule out vendor-hosted services like Cloud Run.
  • GitLab recognized that a vendor-specific choice made for internal development (Cloud Run) could not be carried over to self-managed customers.

"And for those reasons, for many of them Cloud Run is not an option. It's not a preference. It's not the cost. It's simply not an option for them to use Cloud Run because it's not an option to kind of hand the data over to another cloud provider multinational like that."

Transitioning to Kubernetes as a Unified Platform [08:08]

  • Kubernetes was chosen as the common platform to serve both the SaaS offering and to assist self-managed customers in running their services.
  • The key advantages of Kubernetes are its open-standard nature, avoiding vendor lock-in, and its scalability to manage vast infrastructure.
  • Helm packages were adopted as a common denominator for packaging, aligning with industry standards and customer familiarity.

"There are two main reasons. The first main reason is it it's an open standard. So you're not tied to any specific vendor to kind of provide the APIs."

The Role of Helm Charts in the New Architecture [09:29]

  • The Kubernetes-based platform introduced an intermediate step of generating Helm charts for each satellite service.
  • This strategy was a conscious decision to plan for eventual self-managed support, ensuring a future-proof approach.
  • The developer experience stays much the same: CI pipelines build container images, which are then deployed automatically to the Kubernetes clusters behind GitLab.com.
  • This unified Helm chart approach allows for a single deployment mechanism for services like the Duo coding agent, enabling earlier bug detection.
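
To make the "generate a Helm chart per satellite service" step concrete, here is a minimal Python sketch that renders a `Chart.yaml` for one service. The field names (`apiVersion`, `name`, `type`, `version`, `appVersion`) are standard Helm chart metadata; the function name, the service name, and the versions are hypothetical and do not describe how Runway generates its charts.

```python
import re

# Lowercase letters, digits, and dashes, following Helm's chart naming conventions.
_CHART_NAME = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")


def render_chart_yaml(service: str, chart_version: str, image_tag: str) -> str:
    """Render a minimal Chart.yaml for one satellite service (illustrative only)."""
    if not _CHART_NAME.match(service):
        raise ValueError(f"invalid chart name: {service!r}")
    lines = [
        "apiVersion: v2",              # Helm 3 chart format
        f"name: {service}",
        "type: application",
        f"version: {chart_version}",   # version of the chart itself (SemVer)
        f'appVersion: "{image_tag}"',  # version of the packaged application
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_chart_yaml("example-service", "0.1.0", "1.4.2"))
```

Generating this metadata from the same inputs for GitLab.com and for self-managed installs is one plausible way to keep a single deployment mechanism, as the talk describes.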

"We didn't necessarily need that at the time. So we didn't need that to make the transition just from Cloud Run uh to GKE and EKS. But we made the decision in order to be kind of future proof into to plan for an eventual self-managed support."

Empowering Teams and Evolving Release Cycles [11:20]

  • The new setup necessitates a departure from the past model of long release cycles tied to the monolith.
  • While a "GitLab distribution" (akin to a Linux distribution) will still exist for tested and supported versions, individual teams are no longer constrained by monthly or yearly releases.
  • This grants teams the freedom to move faster and innovate independently.

"The individual teams, they are no longer uh tied to these long release cycles. They can move much faster and they can innovate."

Addressing Self-Managed Customer Needs with Helm and GET [12:31]

  • Ongoing work involves integrating satellite services into a single umbrella Helm chart for self-managed customers, a topic of active internal discussion.
  • The adoption of Kubernetes introduces complexity, requiring operations teams to develop new skill sets and handle infrastructure provisioning responsibilities.
  • For services requiring relational databases or blob storage, self-managed customers must provide and point to their own infrastructure, unlike the SaaS offering.
  • The GitLab Environment Toolkit (GET) provides an opinionated Terraform and Ansible codebase to assist customers in provisioning the necessary infrastructure.
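
The division of responsibility described above (GitLab provisions backing services for GitLab.com, while self-managed customers must supply their own) can be illustrated with a small pre-install check. This is a hedged Python sketch: `ServiceRequirements`, `missing_infrastructure`, the requirement keys, and the example endpoints are hypothetical and are not part of GET or any GitLab chart.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ServiceRequirements:
    """Backing infrastructure a satellite service declares it needs (hypothetical)."""
    name: str
    needs: List[str]  # for example ["postgres", "object_storage"]


def missing_infrastructure(service: ServiceRequirements,
                           provided: Dict[str, str]) -> List[str]:
    """Return the declared needs for which the customer supplied no endpoint."""
    # An empty or absent endpoint counts as "not provided".
    return [need for need in service.needs if not provided.get(need)]


if __name__ == "__main__":
    svc = ServiceRequirements("example-service", ["postgres", "object_storage"])
    customer_config = {"postgres": "postgres://db.internal.example:5432/app"}
    print(missing_infrastructure(svc, customer_config))
```

Failing fast on a non-empty result before installation gives operators a clear list of what still has to be provisioned, for instance with the Terraform and Ansible code that GET provides.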

"Customers now have the responsibility to provide us with the infrastructure needed. So if a satellite service say needs a relational database and blob storage, we are not going to like we're going to provision that for gitlab.com. We are not going to provision that for self-managed customers."

Key Takeaways: Tradeoffs, Standards, and Consistency [16:38]

  • Vendor solutions are a legitimate trade-off: they provide value, but they can limit users or cause lock-in, so such decisions should be reviewed carefully.
  • Trade-offs should be revisited when they no longer serve current needs, and it's acceptable to change course as requirements evolve.
  • Open standards like Kubernetes offer significant advantages, enabling scalability from single machines to massive SaaS offerings.
  • Using the same orchestrator for both SaaS and self-managed offerings allows GitLab to "eat its own dog food" and maintain consistency.

"Open standards are a genuine advantage not just for your kubernes eretes and the ecosystem but also for for companies building on Kubernetes because this entire scalability that we can go from like a single machine to the githlip.com SAS offering at scale that is essentially powered by Kubernetes."

Future Directions and Compatibility Considerations [18:49]

  • The general plan is to move away from the monolithic architecture towards more independently deployable satellite services or modules.
  • While simplicity for customers is a goal, the unified Helm chart, though convenient, is becoming difficult to maintain due to its manual curation.
  • The omnibus package, a popular and simple installation method, will not be removed; new services will increasingly focus on Kubernetes, with omnibus handling core functionality.
  • Ensuring backward compatibility for APIs is crucial but challenging; it requires strict adherence to compatibility rules, backed by static checks that catch unintended breakage.
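
As an illustration of what a static backward-compatibility check might look like, the following Python sketch compares two versions of an API response schema and reports removed fields and changed types, while allowing additions. The schema representation (a flat mapping of field name to type name) and the function name are assumptions made for this example; the talk does not describe GitLab's actual tooling.

```python
from typing import Dict, List


def breaking_changes(old: Dict[str, str], new: Dict[str, str]) -> List[str]:
    """List changes in `new` that would break clients written against `old`."""
    problems = []
    for field_name, old_type in sorted(old.items()):
        if field_name not in new:
            problems.append(f"removed field: {field_name}")
        elif new[field_name] != old_type:
            problems.append(
                f"type changed: {field_name} ({old_type} -> {new[field_name]})"
            )
    # Fields that exist only in `new` are additions and are treated as compatible.
    return problems


if __name__ == "__main__":
    v1 = {"id": "int", "title": "string", "state": "string"}
    v2 = {"id": "string", "title": "string", "labels": "list"}
    for problem in breaking_changes(v1, v2):
        print(problem)
```

Run in CI against the previously released schema, a check like this turns an accidental breaking change into a failed pipeline instead of a customer-facing incident.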

"The problem with the unified health drive like it it's convenient. I totally get it, but it's very hard to maintain because at the moment it's all kind of hand curated and that doesn't scale unfortunately."

Extending the Platform into the Application [25:26]

  • Beyond deployments, GitLab emphasizes extending the platform into applications via frameworks like "LabKit."
  • LabKit provides standardized metrics, tracing, and logging endpoints, ensuring internal services operate consistently.
  • The use of common CI tasks, including dependency updates and semantic releases, further streamlines development workflows.
  • A copier template allows for rapid setup of new repositories with pre-configured tools, enabling developers to focus on business logic.
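
The idea of extending the platform into the application can be sketched with a shared logging helper that gives every service the same structured output. This Python example uses only the standard library and is not LabKit's API; `JsonFormatter`, `new_logger`, the `platform.` logger prefix, and the field names are hypothetical.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Format every record as one JSON object with a fixed set of fields."""

    def __init__(self, service: str):
        super().__init__()
        self.service = service

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "service": self.service,
            "level": record.levelname,
            "message": record.getMessage(),
        }
        return json.dumps(payload, sort_keys=True)


def new_logger(service: str, stream=None) -> logging.Logger:
    """Return a logger that all services configure the same way."""
    logger = logging.getLogger(f"platform.{service}")
    logger.setLevel(logging.INFO)
    logger.handlers.clear()   # avoid duplicate handlers on repeated calls
    logger.propagate = False  # keep output independent of the root logger
    handler = logging.StreamHandler(stream)  # defaults to stderr when stream is None
    handler.setFormatter(JsonFormatter(service))
    logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    log = new_logger("example-service")
    log.info("service started")
```

Because every service emits the same fields, log pipelines and dashboards can be built once and reused, which is the kind of consistency the bullets above attribute to a shared framework.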

"What I want you to take away is that deployments is not everything. You also kind of need to think in the other direction. You need to ex extend your platform into the application."
