
Cryptomining Attacks on Kubeflow: What You Need to Know

Learn how D2iQ Kaptain and Konvoy can future-proof your infrastructure against widespread malicious workloads and cryptomining campaigns

Jun 16, 2021

Anton Kirillov


Microsoft recently reported two widespread cryptomining attacks targeting Kubeflow, a popular cloud-native platform for machine learning (ML) workloads on Kubernetes. The attackers used either the Kubeflow Central Dashboard or the Kubeflow Pipelines interface to schedule cryptomining workloads. Attacks like these not only strain infrastructure resources, they can also expose intellectual property, personnel files, and other at-risk assets, and any such breach can damage a business. How can you future-proof your infrastructure against cryptomining campaigns like these? Read on to learn how the security integration between D2iQ Konvoy and Kaptain can protect your business against malicious workloads and security breaches.

The most recent cryptomining attack was similar to one that Microsoft reported last June, in which attackers used Kubeflow installations with a publicly exposed Kubeflow Central Dashboard to launch a widespread cryptomining campaign. The URL of a misconfigured Istio Gateway can become publicly reachable when the gateway is deployed as a LoadBalancer service type. Cloud security settings often overlook this situation, and as a result the Kubeflow access endpoint becomes publicly available. If no authentication mechanism is integrated with the Kubeflow installation, anonymous users can create a valid user namespace and start deploying their workloads.
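
One way to catch this kind of exposure early is to audit Service manifests for externally reachable types. The following is a minimal Python sketch and not part of Kubeflow or Kaptain. It assumes the manifests have already been parsed into plain dictionaries (for example, from the `items` of `kubectl get services --all-namespaces -o json`); the function name `find_exposed_services` and the sample data are hypothetical.

```python
# Illustrative sketch: flag Services whose type makes them reachable from outside the cluster.
# Input is a list of Service manifests as plain dicts (assumed to be parsed elsewhere).

EXTERNAL_TYPES = {"LoadBalancer", "NodePort"}


def find_exposed_services(services):
    """Return (namespace, name, type) for each Service with an externally reachable type."""
    exposed = []
    for svc in services:
        # Kubernetes treats a Service as ClusterIP when spec.type is omitted.
        svc_type = svc.get("spec", {}).get("type", "ClusterIP")
        if svc_type in EXTERNAL_TYPES:
            meta = svc.get("metadata", {})
            exposed.append((meta.get("namespace", "default"), meta.get("name", ""), svc_type))
    return exposed


# Hypothetical sample data for illustration only.
sample_services = [
    {"metadata": {"name": "istio-ingressgateway", "namespace": "istio-system"},
     "spec": {"type": "LoadBalancer"}},
    {"metadata": {"name": "ml-pipeline", "namespace": "kubeflow"},
     "spec": {"type": "ClusterIP"}},
]

for namespace, name, svc_type in find_exposed_services(sample_services):
    print(f"{namespace}/{name} is exposed as {svc_type}")
```

A flagged Service is not necessarily a problem; it only marks an endpoint that should sit behind authentication.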

In addition to exploiting an exposed Kubeflow Central Dashboard, attackers hijacked access to the Kubeflow Pipelines UI to create a new pipeline. Any Kubeflow user with access to the system can schedule workloads, and because all workloads run in containers, users can run any containerized application on the cluster. In these attacks, the containerized applications ran cryptomining software.

Another problem with default Kubeflow installations, revealed in the first attack report, is the automatic namespace creation for newly logged-in users. In fact, the namespace is only one of several resources created for users during the onboarding process. The parent resource is called a Profile and is managed by the Profile Controller. When a Profile is deployed to a cluster, it is associated with a single user. The Profile Controller creates a namespace for that user and a set of Istio Authorization Policies that manage access to platform components such as the Kubeflow Pipelines API service, which allows scheduling pipelines. Once the Profile is created, the user can start scheduling workloads in their namespace. With the default Kubeflow installation, a Profile is created automatically if none is found for the user.
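
The effect of automatic Profile creation can be illustrated with a small model. This is a simplified Python sketch of the onboarding decision only, not the Profile Controller's actual code: the `Cluster` class, its `auto_create_profiles` flag, and the way a namespace name is derived from the email address are all assumptions made for illustration.

```python
# Illustrative model of the onboarding flow; not the Profile Controller's actual code.
from dataclasses import dataclass, field


@dataclass
class Cluster:
    auto_create_profiles: bool
    profiles: dict = field(default_factory=dict)  # owner email -> namespace name

    def onboard(self, user_email):
        """Return the namespace the user may schedule into, or None if access is denied."""
        if user_email in self.profiles:
            return self.profiles[user_email]
        if not self.auto_create_profiles:
            # No Profile and no automatic creation: the user cannot schedule anything.
            return None
        # Default behavior: a Profile (and its namespace) appears on first login.
        namespace = user_email.split("@")[0]  # hypothetical naming rule
        self.profiles[user_email] = namespace
        return namespace


default_install = Cluster(auto_create_profiles=True)
locked_down = Cluster(auto_create_profiles=False, profiles={"user@company.com": "user"})

print(default_install.onboard("anonymous@example.com"))
print(locked_down.onboard("anonymous@example.com"))
print(locked_down.onboard("user@company.com"))
```

In this model, the unknown user gets a namespace on the default installation, while the cluster without automatic creation admits only the user whose Profile was created in advance.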

Together, the public exposure of the unprotected UI endpoint and the automatic Profile creation mean that anybody who has the gateway’s URL can access Kubeflow without credentials, create a Profile, and execute any containerized workload. Attacks like these can make the cluster unusable for business-critical workloads, and they also carry the risk of malicious users gaining access to in-cluster data storage and internal APIs. How can businesses prevent unauthorized access to their clusters and avoid the scheduling of malicious payloads? That’s where D2iQ Kaptain and Konvoy can help.

D2iQ Kaptain is an enterprise-grade Machine Learning (ML) platform that ships with a fully integrated security stack, enabled by default and built on tight integration with Konvoy. All exposed endpoints are secured with authentication and automatic resource creation is prohibited, which protects the default installation against the types of attacks described above. Let’s take a closer look at Kaptain’s security mechanisms and integrations that prevent unauthorized access and workload scheduling on Kubeflow clusters.

To prevent unauthorized access to the web interfaces and the Istio Gateway, Kaptain includes the OIDC AuthService, configured to work with the Dex platform service that is part of Konvoy. Dex is an OIDC Identity Provider that integrates with other Identity Providers. Konvoy configures the Kubernetes cluster to use Dex as its OIDC Identity Provider. Once Kaptain is installed on a Konvoy cluster, it automatically integrates with Dex, and all unauthenticated web requests are redirected to a login page. Kaptain creates a specific EnvoyFilter for the Istio Gateway so that every request is routed through the OIDC AuthService, which checks the request for a user header and a session token and triggers the authentication workflow if either is missing or expired. With this setup, a publicly available Kaptain endpoint is secure and protected from unauthorized access. Kaptain also supports Dex installations external to the cluster and provides support for custom domain names and TLS certificates for the endpoints used to access its web interfaces.
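
The request check described above can be modeled in a few lines. The following Python sketch only illustrates the decision logic and is not the OIDC AuthService's real implementation: the header names `x-user` and `x-session-token`, the in-memory `sessions` store, and the two result markers are placeholders chosen for this example.

```python
# Illustrative model of the gateway check described above; not the OIDC AuthService's real code.
import time

LOGIN_REDIRECT = "redirect-to-login"   # hypothetical marker for "start the OIDC flow"
FORWARD = "forward-to-backend"         # hypothetical marker for "let the request through"


def check_request(headers, sessions, now=None):
    """Decide what to do with a request based on its user header and session token.

    headers:  dict of request headers (the header names used here are placeholders)
    sessions: dict mapping session token -> (user, expiry as a Unix timestamp)
    """
    now = time.time() if now is None else now
    user = headers.get("x-user")
    token = headers.get("x-session-token")
    if user is None or token is None:
        return LOGIN_REDIRECT              # missing credentials
    session = sessions.get(token)
    if session is None:
        return LOGIN_REDIRECT              # unknown token
    session_user, expires_at = session
    if session_user != user or expires_at <= now:
        return LOGIN_REDIRECT              # mismatched or expired session
    return FORWARD


sessions = {"abc123": ("user@company.com", 2_000.0)}
valid = {"x-user": "user@company.com", "x-session-token": "abc123"}

print(check_request({}, sessions, now=1_000.0))      # no credentials at all
print(check_request(valid, sessions, now=1_000.0))   # session still valid
print(check_request(valid, sessions, now=3_000.0))   # session expired
```

Only the second request is forwarded; the anonymous request and the expired session are both sent back to the login flow.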

In Kaptain, automatic Profile creation is disabled by default, and the recommended way to onboard new users is to create Profiles using Kubernetes manifests. Here’s an example of a Profile manifest:

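apiVersion: kubeflow.org/v1
kind: Profile
metadata:
  name: "user"                # also used as the name of the user's namespace
spec:
  owner:
    kind: User
    name: "user@company.com"  # identity of the Profile owner
  resourceQuotaSpec:          # quota applied to the user's namespace
    hard:
      cpu: 20
      memory: 80G
      pods: 20
      nvidia.com/gpu: 10
      persistentvolumeclaims: 20
      requests.storage: 200G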
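
When many users need onboarding, manifests like this one can be generated rather than written by hand. The following Python sketch builds the same structure as a dictionary and prints it as JSON, which `kubectl apply -f` accepts alongside YAML. The helper name `build_profile` and the sample values are hypothetical; the field layout follows the Kubeflow Profile resource.

```python
# Illustrative helper for generating Profile manifests; names and values are examples only.
import json


def build_profile(name, owner_email, hard_limits):
    """Return a Kubeflow Profile manifest as a plain dict."""
    return {
        "apiVersion": "kubeflow.org/v1",
        "kind": "Profile",
        "metadata": {"name": name},
        "spec": {
            "owner": {"kind": "User", "name": owner_email},
            "resourceQuotaSpec": {"hard": dict(hard_limits)},
        },
    }


profile = build_profile(
    "user",
    "user@company.com",
    {"cpu": 20, "memory": "80G", "pods": 20},
)

# JSON output can be saved to a file and applied like any other manifest.
print(json.dumps(profile, indent=2))
```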

There are two benefits of disabling the automatic Profile creation:

  1. First, when automatic Profile creation is disabled, even attackers who compromise the identity provider and obtain access credentials cannot run workloads or reach the internal APIs to schedule them unless a Profile already exists for that identity.
  2. Second, you can configure quotas for the user namespace. Whether it’s a malicious workload attempting to use all the cluster resources or just a human error in scheduling a large number of experiments, resource quotas minimize the “noisy neighbor” impact and cap the amount of resources that a single user can allocate.
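
The capping effect of a quota can be sketched as a simple check. This Python example is a simplified model, not Kubernetes’ actual admission logic: it uses plain numbers instead of Kubernetes quantity strings, and the function name `fits_quota` and the usage figures are made up for illustration.

```python
# Illustrative model of a per-namespace quota check; not Kubernetes' actual admission code.

def fits_quota(hard, used, requested):
    """Return True if adding `requested` to `used` stays within every `hard` limit."""
    for resource, amount in requested.items():
        limit = hard.get(resource)
        if limit is None:
            continue  # no cap configured for this resource
        if used.get(resource, 0) + amount > limit:
            return False
    return True


hard = {"cpu": 20, "pods": 20, "nvidia.com/gpu": 10}   # mirrors part of the Profile example
used = {"cpu": 16, "pods": 5, "nvidia.com/gpu": 0}     # hypothetical current usage

print(fits_quota(hard, used, {"cpu": 2, "pods": 1}))   # fits within the cap
print(fits_quota(hard, used, {"cpu": 8, "pods": 1}))   # would exceed the CPU cap
```

The first request fits and the second is rejected, so a runaway or malicious workload cannot grow past the limits set in the user’s Profile.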

Cryptomining attacks are a serious issue that your business shouldn’t take lightly. To learn how you can take the appropriate steps for defense with Kaptain, visit our product page or official documentation.
