The Cause and Effect of Cluster Sprawl

Sep 01, 2020

Alex Hisaka

D2iQ

Kubernetes gives organizations the ability to run clusters at scale across different cloud infrastructures and distributions. Unfortunately, this is where many of the challenges begin. As the number of clusters and workloads grows, they are often managed independently, with very little consistency. The result is a chaotic environment of cluster and workload sprawl that creates redundant effort and wasted resources, a myriad of governance challenges, and a software environment that is difficult, if not impossible, for the organization to support.

Below are four ways that cluster sprawl impacts your multi-cluster operations.

Provisioning

In adopting Kubernetes, organizations need to maintain granular control over how and where new clusters are used, as well as who is able to set policy for and operate the services that run on them. In practice, however, various teams provision and use clusters across a wide variety of projects with very little consistency, and without the unified management and visibility needed to divide responsibilities across roles in the organization.

As an organization grows in its use of infrastructure as a service, its ability to monitor, manage, and optimize those resources in a cost-effective manner often fails to keep pace. This problem only grows in complexity as new clusters are added, as users join, leave, or change teams, and as projects multiply.

Configuration

As your organization expands its usage of Kubernetes, clusters will exist in different pockets, each with its own policies, roles, and configurations.

Individual teams need to keep their environments secure, patched, and up-to-date. If there are only one or two ways that things are configured, your staff is more likely to do the right thing. Conversely, when there are a hundred different ways of configuring and using Kubernetes, each with its own variations, it becomes incredibly challenging to simplify and create consistency across the organization’s clusters. Teams can’t configure clusters and services based on intended architectures from the beginning or define best-practice architectures for their Kubernetes environments. They also lose the flexibility to configure access and delegate responsibilities as a user’s role within the organization changes.

The lack of a consistent way to configure new clusters can have a serious impact on the speed and efficiency of the deployment process.
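To make the idea of configuration consistency concrete, here is a minimal sketch in Python that compares each cluster's settings against a shared baseline and reports where they differ. Everything in it is an assumption for illustration: the setting names, the cluster names, and the values are hypothetical, and a real inventory would come from whatever tooling your organization already uses rather than from hard-coded dictionaries.

```python
# Hypothetical baseline: the settings every cluster is expected to share.
BASELINE = {
    "kubernetes_version": "1.18",
    "rbac_enabled": True,
    "network_policy": "deny-by-default",
}

# Hypothetical inventory of clusters and their current settings.
CLUSTERS = {
    "team-a-prod": {
        "kubernetes_version": "1.18",
        "rbac_enabled": True,
        "network_policy": "deny-by-default",
    },
    "team-b-dev": {
        "kubernetes_version": "1.16",
        "rbac_enabled": True,
        # network_policy was never set on this cluster
    },
}


def find_drift(baseline: dict, cluster: dict) -> dict:
    """Return {setting: (expected, actual)} for every setting that differs."""
    drift = {}
    for key, expected in baseline.items():
        actual = cluster.get(key)  # None when the setting is missing entirely
        if actual != expected:
            drift[key] = (expected, actual)
    return drift


def drift_report(baseline: dict, clusters: dict) -> dict:
    """Return only the clusters that deviate from the baseline."""
    report = {}
    for name, config in clusters.items():
        drift = find_drift(baseline, config)
        if drift:
            report[name] = drift
    return report


if __name__ == "__main__":
    for name, drift in drift_report(BASELINE, CLUSTERS).items():
        for setting, (expected, actual) in drift.items():
            print(f"{name}: {setting} expected {expected!r}, found {actual!r}")
```

With a single agreed baseline, a check like this stays small. With many competing conventions there is no one baseline to compare against, which is the core of the problem described above.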

Deployment

Deploying each service is time-consuming and requires significant engineering effort. And if these technologies aren’t configured correctly or standardized, the results can be detrimental to the business.

Services are deployed repeatedly and independently within and across clusters. In addition, all configuration and policy management, such as the setup of roles and secrets, is repeated for each one, wasting time and creating the opportunity for mistakes.

If software is deployed with inconsistent builds or versions, it can introduce performance and reliability issues, security risks, and snowflake (one-of-a-kind) implementations that only certain personnel know how to maintain.

All of this leads to a list of IT responsibilities that keeps expanding in scope and complexity.

Updates and Ongoing Maintenance

The complexity of achieving operational readiness and managing ongoing operations, on Day 2 and beyond, is a headache for many businesses because governance and management were not in place from the beginning.

When there’s a lack of centralized governance and visibility over how and where these resources are provisioned, it negatively impacts the bottom line. If a cluster goes down, you can’t troubleshoot problems without losing valuable time. You can’t easily obtain insights on cluster performance to deliver better resource utilization. You also can’t identify role violations, assess governance risk, and perform compliance checks. And if there are dozens of potential software versions in use, managing all of them across the organization is nearly impossible. When more time is spent putting out fires, there is less time for efficient operations.
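As a rough illustration of what centralized visibility over software versions could look like, the Python sketch below groups a list of deployments by service and version, then flags any service that runs more than one version across clusters. The cluster names, service names, and version numbers are all hypothetical, and the hard-coded list stands in for data that would normally be collected from the clusters themselves.

```python
from collections import defaultdict

# Hypothetical inventory: (cluster, service, deployed version).
DEPLOYMENTS = [
    ("team-a-prod", "ingress-controller", "0.34.1"),
    ("team-b-dev", "ingress-controller", "0.30.0"),
    ("team-c-prod", "ingress-controller", "0.34.1"),
    ("team-a-prod", "metrics-agent", "2.1.0"),
    ("team-b-dev", "metrics-agent", "2.1.0"),
]


def versions_by_service(deployments):
    """Map each service to {version: sorted list of clusters running it}."""
    grouped = defaultdict(lambda: defaultdict(list))
    for cluster, service, version in deployments:
        grouped[service][version].append(cluster)
    return {
        service: {version: sorted(names) for version, names in versions.items()}
        for service, versions in grouped.items()
    }


def inconsistent_services(deployments):
    """Return only the services that run more than one version."""
    return {
        service: versions
        for service, versions in versions_by_service(deployments).items()
        if len(versions) > 1
    }


if __name__ == "__main__":
    for service, versions in inconsistent_services(DEPLOYMENTS).items():
        print(f"{service} runs {len(versions)} versions:")
        for version, clusters in versions.items():
            print(f"  {version}: {', '.join(clusters)}")
```

The grouping itself is trivial. The hard part in a sprawled environment is gathering a trustworthy inventory in one place to begin with.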

To learn more about how to control cluster sprawl, download the cheat sheet, “Multi-Cluster Management: Reduce Overhead and Redundant Efforts.”
