Secure AWS Instance Profiles on DC/OS


May 06, 2020

Jan Ulferts



Security is always an important topic in today's distributed systems. With DC/OS Enterprise, we offer a feature called DC/OS Secrets, which makes it possible to inject secure information like passwords or cryptographic keys into your application. No other application is able to read or change this information, and with DC/OS Identity and Access Management (IAM) you can also restrict the group of users that have access to this information.
The Usual Workflow
Let's assume you have an application that needs to access AWS resources, such as an S3 bucket. The usual approach is to create an IAM user and attach a policy that grants access to that particular bucket. You add the user's ACCESS_KEY to your Marathon specification and store the SECRET_ACCESS_KEY in a DC/OS Secret in your default vault, which you also reference in your Marathon application. This practice is not bad, but the credentials are static: you must rotate them regularly, and every rotation means updating your application.
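As a rough sketch of this workflow, the static-credential variant might look like the following. The app ID, secret path, key values, and command are placeholders made up for illustration; only the env/secrets structure follows Marathon's secret reference format.

```shell
# Sketch of the static-credential workflow. All names and values are placeholders.

# Write a Marathon app definition that takes the secret access key from a
# DC/OS secret instead of embedding it in the specification.
write_static_credential_app() {
  cat > "$1" <<'EOF'
{
  "id": "/static-credential-app",
  "cmd": "sleep 3600",
  "cpus": 0.1,
  "mem": 32,
  "env": {
    "AWS_ACCESS_KEY_ID": "AKIAEXAMPLEKEYID",
    "AWS_SECRET_ACCESS_KEY": { "secret": "aws-secret-key" }
  },
  "secrets": {
    "aws-secret-key": { "source": "static-credential-app/aws-secret-access-key" }
  }
}
EOF
}

# Usage (requires an attached DC/OS Enterprise cluster):
#   dcos security secrets create static-credential-app/aws-secret-access-key -v "<secret access key>"
#   write_static_credential_app app.json
#   dcos marathon app add app.json
```

Every key rotation in this variant means creating a new key pair for the IAM user, updating the secret, and restarting the app.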
AWS Agents Can Do Better
If your agents are already running on AWS instances, there is a better, best-practice solution to this problem: instance profiles. An instance profile lets you assign an IAM role to an instance. The AWS SDK running on that instance will try to retrieve credentials for the role from the instance metadata API. The huge benefit is that you do not need to rotate credentials yourself, as AWS takes care of it. These credentials have a short lifetime, so even if they leak, an attacker has only a limited amount of time to use them.
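To see what the SDK does under the hood, the lookup can be reproduced with curl from a shell on the instance. This is a minimal sketch, assuming the metadata service is reachable at its standard link-local address and accepts IMDSv1-style requests (instances that enforce IMDSv2 require a session token first). The function names are made up for illustration.

```shell
IMDS_URL="http://169.254.169.254/latest/meta-data/iam/security-credentials"

# Print the name of the role attached through the instance profile.
instance_role_name() {
  curl -sf --max-time 2 "$IMDS_URL/"
}

# Print the temporary credentials document (JSON) for that role.
instance_role_credentials() {
  local role
  role="$(instance_role_name)" || return 1
  curl -sf --max-time 2 "$IMDS_URL/$role"
}

# Usage (only works on an EC2 instance with an instance profile):
#   instance_role_credentials
```

On an agent, the returned JSON document contains AccessKeyId, SecretAccessKey, Token, and Expiration fields. Any process on the instance that can reach this address can make the same request.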
Not Every Task Should Have This Privilege
On DC/OS, multiple applications share the same agent and therefore the same instance profile, so every task on that agent can obtain the same credentials. This sharing is something you should avoid. In the workflow described at the start, you hand out the user's credentials only to the applications you have selected: the secret containing the credentials decides which application gets them.
AssumeRole and external_id
We can combine the security of instance profiles with the selective authorization of DC/OS Secrets. AWS offers a process called AssumeRole. With this process, a principal (an instance role or a user) can retrieve temporary credentials for another role, even one in a different AWS account. So, in our example, the instance would assume a role that has access to the S3 bucket. This process alone does not solve the authorization problem, as every application on the agent would still be able to use it, but AWS offers an additional layer of security for this procedure called external_id. The external_id works like a pre-shared key (PSK): it is added as a condition to the trust relationship of the role, and only callers that present it are allowed to assume the role. This lets DC/OS Secrets act as the authorization mechanism for our application: we place an AssumeRole configuration that includes the external_id in a secret, and only applications with access to that secret can assume the role.
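To make the moving parts concrete, here is a hedged sketch of the two pieces involved: the trust policy on the target role and the AWS config that ends up in the DC/OS secret. The account ID, role names, and external ID are placeholders, and the files produced by the example repository's main.tf may look different.

```shell
# Placeholders for illustration only.
ACCOUNT_ID="123456789012"
INSTANCE_ROLE="example-agent-instance-role"
TARGET_ROLE="example-s3-bucket-role"
EXTERNAL_ID="example-pre-shared-key"

# Trust policy of the target role: the agents' instance role may assume it,
# but only when the request carries the matching external ID.
write_trust_policy() {
  cat > "$1" <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::${ACCOUNT_ID}:role/${INSTANCE_ROLE}" },
      "Action": "sts:AssumeRole",
      "Condition": { "StringEquals": { "sts:ExternalId": "${EXTERNAL_ID}" } }
    }
  ]
}
EOF
}

# AWS config handed to the application as a secret: take the base credentials
# from the instance metadata, then assume the target role with the external ID.
write_aws_config() {
  cat > "$1" <<EOF
[default]
role_arn = arn:aws:iam::${ACCOUNT_ID}:role/${TARGET_ROLE}
credential_source = Ec2InstanceMetadata
external_id = ${EXTERNAL_ID}
EOF
}
```

The same call can be tried by hand with aws sts assume-role, passing --role-arn, --role-session-name, and --external-id. Without the correct external ID, STS rejects the request with an access-denied error.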
The repository at https://github.com/fatz/dcos-secure-instance-profiles contains a main.tf with an example setup. You only need to place a DC/OS license at $HOME/license.txt and the public key of the SSH key you've loaded into your ssh-agent at ~/.ssh/id_rsa.pub. If these files are in different locations, edit main.tf and change the paths for your environment.
Creating the cluster
Once you've downloaded all the files of this repository (git clone https://github.com/fatz/dcos-secure-instance-profiles && cd dcos-secure-instance-profiles), you will need to initialize terraform and start creating the cluster.
Before you start creating the cluster, make sure your AWS setup is finished and working. Either set $AWS_PROFILE to the profile you want to use, or make sure you've properly configured the AWS CLI with aws configure. To ensure you are using the correct account, run aws sts get-caller-identity and check that it shows the account ID you intend to use.
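If you want to turn this check into a guard, a small helper along these lines can stop you from creating the cluster in the wrong account. The function name and the account ID in the usage line are hypothetical; the aws sts get-caller-identity call with --query and --output is standard AWS CLI usage.

```shell
# Fail early when the active AWS credentials belong to an unexpected account.
require_aws_account() {
  local expected="$1" actual
  actual="$(aws sts get-caller-identity --query Account --output text)" || return 1
  if [ "$actual" != "$expected" ]; then
    echo "expected AWS account $expected but found $actual" >&2
    return 1
  fi
}

# Usage:
#   require_aws_account 123456789012 && terraform apply
```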
Be aware that this repository uses the Universal Installer to install DC/OS. At the moment, it only works with Terraform 0.11.x. Check the DC/OS documentation for install instructions.
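Because of the version constraint, it can help to check the installed Terraform before initializing. The following is a small sketch with a made-up function name. It relies on terraform version printing a first line such as "Terraform v0.11.14".

```shell
# Succeed only when the installed Terraform is a 0.11.x release.
require_terraform_0_11() {
  local version_line
  version_line="$(terraform version | head -n 1)" || return 1
  case "$version_line" in
    "Terraform v0.11."*) return 0 ;;
    *) echo "need Terraform 0.11.x, found: $version_line" >&2; return 1 ;;
  esac
}

# Usage:
#   require_terraform_0_11 && terraform init -upgrade .
```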
terraform init -upgrade .
terraform apply
If you have not already done so, install the dcos-cli:
# on OSX
brew install dcos-cli

# on linux
curl https://downloads.dcos.io/cli/releases/binaries/dcos/linux/x86-64/latest/dcos -o dcos
chmod +x ./dcos
sudo mv dcos /usr/local/bin

Attach to cluster
After successfully creating the cluster, we have to attach to the cluster:
# In this setup we have to use --insecure because we did not give the load balancer an ACM certificate, so it uses a self-signed one.
dcos cluster setup $(terraform output masters_dns_name) --password=deleteme --username=bootstrapuser --insecure
Ensure enterprise CLI
Let’s make sure we have the enterprise features available in our CLI (this is usually only needed for older versions of DC/OS or the DC/OS CLI):
dcos package install dcos-enterprise-cli --cli --yes
AWS config secret
We already prepared the aws config for the application in our main.tf. Next we create the secret from it:
dcos security secrets create /instance-profile-app/aws-config -v "$(terraform output secret_aws_conf)"
Install EdgeLB
To access our app, let's install EdgeLB. As we’re running in strict mode, we have to create a service account and a service account secret.
Prepare service account and secret
dcos security org service-accounts keypair edge-lb-private-key.pem edge-lb-public-key.pem
dcos security org service-accounts create -p edge-lb-public-key.pem -d "Edge-LB service account" edge-lb-principal
dcos security secrets create-sa-secret --strict edge-lb-private-key.pem edge-lb-principal dcos-edgelb/edge-lb-secret
dcos security org groups add_user superusers edge-lb-principal
Install and configure EdgeLB
echo '{"service": {"secretName": "dcos-edgelb/edge-lb-secret","principal": "edge-lb-principal","mesosProtocol": "https"}}' | dcos package install edgelb --options=/dev/stdin --yes
And wait for EdgeLB to respond:
until dcos edgelb ping; do sleep 1; done
Deploy the marathon app
The last step is to deploy our simple app using the bucket that we've prepared. We’re using the template defined in our Terraform file. You can review it by running terraform output marathon_app_definition.
terraform output marathon_app_definition | dcos marathon app add
As we are using the EdgeLB AutoPool feature, let's wait for the pool to come up:
until dcos edgelb status auto-default; do sleep 1; done
Let's ensure our app became healthy meanwhile:
until dcos marathon app show /instance-profile-app | jq -e '.tasksHealthy == 1' >/dev/null; do echo "waiting for app to become healthy"; sleep 10; done
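The until loops in this walkthrough wait forever if something goes wrong. If you script these steps, a helper with a timeout may be more convenient. This is a minimal sketch; wait_for is a hypothetical helper, not part of the DC/OS CLI, and it expects whole seconds for both numbers.

```shell
# Retry a command until it succeeds or the timeout (in seconds) expires.
# Usage: wait_for TIMEOUT INTERVAL COMMAND [ARGS...]
wait_for() {
  local timeout="$1" interval="$2" waited=0
  shift 2
  until "$@"; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out after ${waited}s waiting for: $*" >&2
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
}

# Usage (requires an attached cluster):
#   wait_for 600 5 dcos edgelb status auto-default
```

The elapsed time only counts the sleep intervals, so a slow command can stretch the real waiting time beyond the given timeout.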
Using the app
Once the app is healthy, we can post data to it with curl.
echo "foobar" | curl --user testuser:password -X POST -H "Host: binapp.mesosphere.com" -d @- $(terraform output public-agents-loadbalancer)/bin
The app creates an ID for the posted content and stores the content in the specified bucket under this ID. We can now use the aws-cli to check that it worked. Replace the placeholder in the command below with the ID you received from the command above.
aws s3 cp s3://$(terraform output s3_bucket_name)/bin/<id returned by the post> -
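The two manual steps can also be chained into a single round-trip check. This is only a sketch: it assumes the last path segment of the response body is the object ID, which may not match what the app actually returns, and roundtrip_check is a made-up name. The curl and aws invocations mirror the commands above.

```shell
# Post content to the app and read it back from the bucket.
# Assumption: the last path segment of the response body is the object ID.
roundtrip_check() {
  local content="$1" response id stored
  response="$(printf '%s' "$content" | curl -sf --user testuser:password -X POST \
    -H "Host: binapp.mesosphere.com" -d @- \
    "$(terraform output public-agents-loadbalancer)/bin")" || return 1
  id="${response##*/}"
  stored="$(aws s3 cp "s3://$(terraform output s3_bucket_name)/bin/${id}" -)" || return 1
  [ "$stored" = "$content" ]
}

# Usage (run from the repository directory with the cluster up):
#   roundtrip_check "foobar" && echo "content found in bucket"
```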
You can see that there is a file in the S3 bucket and that its content is what we posted above.
All of this is based on the instance profile and AssumeRole. No static credentials are involved apart from the external_id, which only works in combination with the account and role of our DC/OS cluster.
You can experiment with this technique; it becomes even more valuable if you're running an AWS multi-account setup.
