Recently the DragonOps team was on the hunt for a VPN solution we could build out to provide out-of-the-box private access alongside our infrastructure. We were open to most solutions, but had a few key sticking points:
This guide is a detailed, technical walkthrough of what we ended up implementing, in the hopes that it helps others looking for a similar solution!
As we mentioned above, finding the right VPN solution can be a tall order. You need something that checks all the important boxes: affordability, scalability, speed, security, and ease of use. That is difficult with many of the common solutions already out there, especially as most require you to peer your VPCs in order to connect to more than one of them at the same time. Peering networks defeats the purpose of environment isolation, so for us that was a huge factor in our decision to build out our own integrated solution.
Some other factors we took into account were cost optimization (we don't want our clients paying more than they need to) and how easy it is to control access across networks. We found that OpenVPN and AWS VPN can both get pretty pricey for medium to large teams, configuration isn't always straightforward, and VPC peering was a requirement for simultaneous connectivity. Tailscale was one of the few solutions we found that didn't require VPC peering, but it requires users to interact with a third-party dashboard to finish configuration, which we were trying to abstract away as much as possible.
A few hours of research later, we decided to go the custom route. Here's what we found!
One of the first things we found when researching was WireGuard. We actually found it as a result of investigating how Tailscale worked under the hood, and, because we had a little familiarity with it already, we decided to give it a try. The issue we ran into again and again was that the documentation wasn't really there for what we wanted to do, and there was a lot of trial and error involved.
Fast-forward to now, and we have a robust, secure, fast, scalable VPN solution built into DragonOps that allows developers to connect to multiple networks via a single WireGuard tunnel, without the need to peer networks. This was such a huge win for us that we wanted to share how we were able to get it working for anyone else who might want to do the same thing.
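To make the "single tunnel, multiple networks" idea a little more concrete, here is a minimal sketch of what a client-side WireGuard config for that setup could look like. It relies on the fact that WireGuard routes traffic by each peer's `AllowedIPs`, so one interface can hold one `[Peer]` per network as long as the VPC CIDRs don't overlap. The helper name `render_client_config`, the keys, the endpoints, and the CIDRs below are all hypothetical placeholders, not the exact config DragonOps generates.

```shell
# Hypothetical helper: renders a WireGuard client config with one [Peer] per network.
# Usage: render_client_config <client_private_key> <client_address> <peer>...
# Each <peer> is "server_public_key|endpoint_host:port|vpc_cidr".
render_client_config() {
  client_private_key="$1"
  client_address="$2"
  shift 2

  printf '[Interface]\n'
  printf 'PrivateKey = %s\n' "$client_private_key"
  printf 'Address = %s\n' "$client_address"

  for peer in "$@"; do
    # Split the "key|endpoint|cidr" triple on the pipe characters.
    server_public_key="${peer%%|*}"
    rest="${peer#*|}"
    endpoint="${rest%%|*}"
    vpc_cidr="${rest#*|}"

    # Only traffic destined for this network's CIDR is sent to this peer.
    printf '\n[Peer]\n'
    printf 'PublicKey = %s\n' "$server_public_key"
    printf 'Endpoint = %s\n' "$endpoint"
    printf 'AllowedIPs = %s\n' "$vpc_cidr"
  done
}

# Example: one tunnel, two isolated networks (all values are placeholders).
render_client_config \
  "CLIENT_PRIVATE_KEY_PLACEHOLDER" "192.168.2.10/32" \
  "DEV_SERVER_PUBLIC_KEY|dev-vpn.example.com:51820|10.0.0.0/16" \
  "PROD_SERVER_PUBLIC_KEY|prod-vpn.example.com:51821|10.1.0.0/16"
```

Because each network only appears in its own peer's `AllowedIPs`, the networks never need to route to each other, which is what lets us skip VPC peering.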
One callout we want to make is that some VPN solutions out there do provide access via SSO, and do not use static keys. This is not one of them (in its current state at least). WireGuard is set up with static key pairs, and is therefore inherently less secure than an SSO solution. With good practices around security and control of your private key though, it is a solution more than secure enough for most use cases.
While our setup varies from this slightly in that it's built into our platform, we have used this exact setup for clients in the past, and it's served them well. For this guide, we've provided instructions for a complete tutorial, where all resources are spun up from scratch, as well as partial instructions for plugging our solution into your existing infrastructure.
To keep things clear in this guide, we want to quickly define two terms as we use them in this post:
Each of the below points will be covered in much greater detail, but for a more holistic view, our WireGuard solution will look like this when we are finished:
Updates are made via a Lambda function (code provided), which uses DynamoDB as a datastore. DynamoDB changes trigger another Lambda function (code provided) which updates the WireGuard servers.
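We haven't reproduced the Lambda code here, but purely as an illustration, granting or revoking a user on a WireGuard server boils down to adding or removing a peer with the standard `wg set` command. The sketch below only prints the command it would run. The helper name `peer_command`, the interface name `wg0`, and the example key and IP are assumptions, and how the real updater delivers the change to each server is not shown.

```shell
# Hypothetical helper: prints (does not run) the wg command for one access change.
# Usage: peer_command <grant|revoke> <interface> <user_public_key> [client_ip]
peer_command() {
  action="$1"
  interface="$2"
  user_public_key="$3"
  client_ip="${4:-}"

  case "$action" in
    grant)
      if [ -z "$client_ip" ]; then
        echo "grant requires a client IP" >&2
        return 1
      fi
      # The /32 limits this peer to its single tunnel address.
      printf 'wg set %s peer %s allowed-ips %s/32\n' \
        "$interface" "$user_public_key" "$client_ip"
      ;;
    revoke)
      printf 'wg set %s peer %s remove\n' "$interface" "$user_public_key"
      ;;
    *)
      echo "unknown action: $action" >&2
      return 1
      ;;
  esac
}

# Example with placeholder values.
peer_command grant wg0 "USER_PUBLIC_KEY_PLACEHOLDER" 192.168.2.10
peer_command revoke wg0 "USER_PUBLIC_KEY_PLACEHOLDER"
```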
To follow along with the entirety of this guide, you will need the following:
We've provided Terraform for you to make the process simpler. Feel free to clone/steal/copy anything you want and make it your own as needed.
Here's the repo: WireGuard Infrastructure.
We have organized the WireGuard resources into modules for easier consumption and organization. The `wireguard_vpn_server` module provisions an EC2 instance, security group, EIP, and IAM permissions for the WireGuard server. The `wireguard_updater` module provisions the Lambda function and DynamoDB table needed for updating user access to your networks. You can choose to use one, both, or the entire repository depending on what is the best fit for your use case.
We are aware there are optimizations and improvements that could be made to the Terraform provided, but in an effort to make this guide as simple and flexible as possible, we've opted to purposefully avoid more advanced Terraform topics, such as workspaces, backend state storage, and numerous stacks. Our goal is to provide the minimum code needed to explain the setup, allowing our readers the option to customize as they need for their more specific use-cases.
To deploy our entire solution with example VPCs and private instances for testing, simply:
1. Run `git clone repo-link-goes-here`.
2. Run `cd repo-name`.
3. Run `./generate_wireguard_keys.sh` in your terminal twice.
4. In `vpn.tf`, replace `GENERATE_ME_WITH_SCRIPT` for each `vpn` module with the outputs from running the script.
5. Run `terraform get`.
6. Run `terraform init`.
7. Run `terraform apply`.
8. Enter `yes` to confirm.
That's it! Here's what you deployed:
You can now move on to Managing network access.
Already have a network defined in Terraform, and just need the code to add the VPN and updater Lambda?
You'll need to do a couple of things:
1. Copy the `modules` directory into your own IaC (or, if you already have a `modules` directory, just copy the two modules into your existing directory).
2. Run `./generate_wireguard_keys.sh` in your terminal for each vpn module you add.
3. In `vpn.tf`, replace `GENERATE_ME_WITH_SCRIPT` for each `vpn` module with the outputs from running the script.
4. Run `terraform get`.
5. Run `terraform init`.
6. Run `terraform apply`.
7. Enter `yes` to confirm.
You can create your own system for generating the WireGuard public and private key pair, so that the private key does not end up in Terraform state. In our setup, we have some Golang code that can handle the automation part, but for the purposes of this guide we wanted to provide an easy way to get up and running even if you don't have automation built out.
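If you do want to roll your own key generation, a minimal sketch using the standard `wg` CLI (from wireguard-tools) might look like the following. This is not the repository's `generate_wireguard_keys.sh`. The function name `generate_wireguard_keypair` and the `private_key=` / `public_key=` output format are our own assumptions for illustration.

```shell
# Hypothetical helper: prints a fresh WireGuard key pair to stdout.
# Requires the `wg` binary from wireguard-tools.
generate_wireguard_keypair() {
  # Fail early with a clear message if wireguard-tools is not installed.
  if ! command -v wg >/dev/null 2>&1; then
    echo "wg not found; install wireguard-tools first" >&2
    return 1
  fi

  # `wg genkey` prints a base64 private key; `wg pubkey` derives the
  # matching public key from a private key read on stdin.
  private_key="$(wg genkey)" || return 1
  public_key="$(printf '%s\n' "$private_key" | wg pubkey)" || return 1

  printf 'private_key=%s\n' "$private_key"
  printf 'public_key=%s\n' "$public_key"
}
```

Nothing is written to disk here, so it is up to the caller to put the private key somewhere safe (a secrets manager, for example) rather than in version control.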
    # Deployed per network you want to manage access to.
    module "vpn_dev" {
      source      = "./modules/wireguard_vpn_server"
      environment = "dev"
      account_id  = data.aws_caller_identity.account.account_id
      subnet_id   = module.vpc_dev.public_subnets[0] # Change if using your own IaC / existing network
      vpc_id      = module.vpc_dev.vpc_id            # Change if using your own IaC / existing network

      wireguard_port        = "51820"                   # Should be a new, random port per WireGuard server (valid UDP ports are 1-65535)
      wireguard_ip_address  = "192.168.2.2/32"          # Should be an unused IP in the range XXXXX
      wireguard_public_key  = "GENERATE_ME_WITH_SCRIPT" # Generate this and the below using the script provided in this repository
      wireguard_private_key = "GENERATE_ME_WITH_SCRIPT"
    }

    # Only deployed a single time, NOT per network/environment, but updated with the environments you want it to manage access to.
    module "wireguard_updater" {
      source     = "./modules/wireguard_updater"
      account_id = data.aws_caller_identity.account.account_id

      # Update the below list with your environments.
      # If you have different Terraform workspaces or environments, you can use stack outputs instead of module outputs.
      vpn_environments = [
        {
          environment        = "dev"                         # Change if using your own IaC / existing network
          vpc_cidr           = module.vpc_dev.vpc_cidr_block # Change if using your own IaC / existing network
          instance_id        = module.vpn_dev.wireguard_instance_id
          public_key         = module.vpn_dev.wireguard_public_key
          wireguard_endpoint = module.vpn_dev.wireguard_public_endpoint
        }
      ]
    }

    # Optional, if you don't already have a data block defined for aws_caller_identity.
    data "aws_caller_identity" "account" {}
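Since each WireGuard server should get its own port, here is one small, hedged way to pick a value for `wireguard_port`. It draws a random port from the IANA dynamic/private range (49152 to 65535), which always stays within the valid UDP port range. The function name `random_wireguard_port` is hypothetical, and you should still check that the result isn't already used by another of your servers.

```shell
# Hypothetical helper: prints a random UDP port in 49152..65535.
random_wireguard_port() {
  # Read two random bytes as one unsigned integer in 0..65535.
  n="$(od -An -N2 -tu2 /dev/urandom | tr -dc '0-9')"
  # 16384 ports are available in the dynamic/private range.
  echo $((49152 + n % 16384))
}

random_wireguard_port
```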
That's it! Here's what you deployed:
You can now move on to Managing network access.