How to: Set up a WireGuard VPN for multiple VPCs in AWS

WireGuard VPN Setup

Introduction

Recently the DragonOps team was on the hunt for a VPN solution we could build out that provides out-of-the-box private access along with our infrastructure. We were open to most solutions, but had a few key sticking points:

  • We didn't want to peer preproduction and production networks
  • We didn't want users to need another third-party dashboard to manage access
  • We wanted it to be very, very affordable for teams of all sizes
  • We wanted the big three: scalability, reliability, and speed

This guide is a detailed, technical walkthrough of the solution we ended up implementing, in the hopes that it helps others looking for something similar!

The Problem

As we mentioned above, finding the right VPN solution can be a tall order. You need something that checks all the important boxes, like affordability, scalability, speed, security, and ease of use. This can be difficult with many of the common solutions out there, especially as most require you to peer your VPCs in order to connect to more than one at the same time. Peering networks defeats the purpose of environment isolation, so for us that was a huge factor in our decision to build out our own integrated solution.

Some other factors we took into account were cost optimization (we don't want our clients paying more than they need to) and ease of controlling access across networks. We found that OpenVPN and AWS VPN can both get pretty pricey for medium to large teams, that configuration isn't always straightforward, and that VPC peering was a requirement for simultaneous connectivity. Tailscale was one of the few solutions we found that didn't require VPC peering, but it requires users to interact with a third-party dashboard to finish configuration, which we were trying to abstract away as much as possible.

A few hours of research later, we decided to go the custom route. Here's what we found!

The Solution

One of the first things we found when researching was WireGuard. We actually found it as a result of investigating how Tailscale worked under the hood, and, because we had a little familiarity with it already, we decided to give it a try. The issue we ran into again and again was that the documentation wasn't really there for what we wanted to do, and there was a lot of trial and error involved.

Fast-forward to now, and we have a robust, secure, fast, scalable VPN solution built into DragonOps that allows developers to connect to multiple networks via a single WireGuard tunnel, without the need to peer networks. This was such a huge win for us that we wanted to share how we got it working for anyone else who might want to do the same thing.

One callout we want to make is that some VPN solutions out there do provide access via SSO, and do not use static keys. This is not one of them (in its current state at least). WireGuard is set up with static key-pairs, and is therefore inherently less secure than an SSO solution. With good practices around security and control of your private key though, it is a solution more than secure enough for most use-cases.

Implementation Overview

While our setup varies from this slightly in that it's built into our platform, we have used this exact setup for clients in the past, and it's served them well. For this guide, we've provided instructions for a complete tutorial, where all resources are spun up from scratch, as well as partial instructions for plugging our solution into your existing infrastructure.

To keep things clear in this guide, we want to quickly define two terms as we use them in this post:

  • server: A server is an EC2 instance running in AWS with WireGuard configured and running.
  • client: Any other machine (such as a developer's laptop) that needs access to one or more private networks in AWS.

High-level setup

Each of the below points will be covered in much greater detail, but for a more holistic view, our WireGuard solution will look like this when we are finished:

  • Each network (VPC) needs to have its own WireGuard server (we use Terraform, provided below).
  • A dedicated subnet (CIDR range) that does not overlap with any of your VPCs is used to "connect" all WireGuard clients and servers.
  • Each WireGuard server and client is assigned an IP address from that subnet.
  • Each WireGuard server includes a configuration file which denotes which clients have access.
  • Each WireGuard client has a config file which denotes which networks it can access (access has to also exist on the server).
  • Updates are made via a Lambda function (code provided), which uses DynamoDB as a datastore. DynamoDB changes trigger another Lambda function (code provided) which updates the WireGuard servers.
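
To make the client side of the list above more concrete, here is a minimal Python sketch of how a single client config with one peer per VPC could be rendered. The VpnServer class, the render_client_config function, and every key, endpoint, and CIDR value are illustrative assumptions, not the code DragonOps runs. The sketch also rejects overlapping VPC ranges, because WireGuard routes each AllowedIPs range to exactly one peer.

```python
import ipaddress
from dataclasses import dataclass
from typing import List


@dataclass
class VpnServer:
    """One WireGuard server, i.e. one VPC (all values are placeholders)."""
    name: str
    public_key: str
    endpoint: str   # "host:port" of the server's public address
    vpc_cidr: str   # private range reachable behind this server


def render_client_config(private_key: str, address: str,
                         servers: List[VpnServer]) -> str:
    """Build a wg-quick style config with one [Peer] section per VPC."""
    networks = [ipaddress.ip_network(s.vpc_cidr) for s in servers]
    # Each destination range can belong to only one peer, so fail early.
    for i, first in enumerate(networks):
        for second in networks[i + 1:]:
            if first.overlaps(second):
                raise ValueError(f"{first} overlaps {second}")

    lines = ["[Interface]",
             f"PrivateKey = {private_key}",
             f"Address = {address}",
             ""]
    for server in servers:
        lines += [f"# {server.name}",
                  "[Peer]",
                  f"PublicKey = {server.public_key}",
                  f"Endpoint = {server.endpoint}",
                  f"AllowedIPs = {server.vpc_cidr}",
                  ""]
    return "\n".join(lines)


if __name__ == "__main__":
    example_servers = [
        VpnServer("dev", "DEV_PUBLIC_KEY", "203.0.113.10:51820", "10.0.0.0/16"),
        VpnServer("prod", "PROD_PUBLIC_KEY", "203.0.113.20:51821", "10.1.0.0/16"),
    ]
    print(render_client_config("CLIENT_PRIVATE_KEY", "192.168.2.10/32",
                               example_servers))
```

Because every VPC is just another peer on the same interface, the client reaches dev and prod through one tunnel while the two VPCs never route to each other.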

Prerequisites

To follow along with the entirety of this guide, you will need the following:

  • Terraform installed on your local machine
  • An AWS account, and the AWS CLI installed with credentials configured

Deploy Infrastructure

We've provided Terraform for you to make the process simpler. Feel free to clone/steal/copy anything you want and make it your own as needed.

Here's the repo: WireGuard Infrastructure.

We have organized the WireGuard resources into modules for easier consumption and organization. The wireguard_vpn_server module provisions an EC2 instance, security group, EIP, and IAM permissions for the WireGuard server. The wireguard_updater module provisions the Lambda function and DynamoDB table needed for updating user access to your networks. You can choose to use one module, both, or the entire repository, depending on what best fits your use-case.

We are aware there are optimizations and improvements that could be made to the Terraform provided, but in an effort to make this guide as simple and flexible as possible, we've opted to purposefully avoid more advanced Terraform topics, such as workspaces, backend state storage, and numerous stacks. Our goal is to provide the minimum code needed to explain the setup, allowing our readers the option to customize as they need for their more specific use-cases.

Deploy everything - Terraform

To deploy our entire solution with example VPCs and private instances for testing, simply:

  1. Clone the repo: git clone repo-link-goes-here.
  2. Change directories: cd repo-name.
  3. Run ./generate_wireguard_keys.sh in your terminal twice.
  4. In vpn.tf, replace GENERATE_ME_WITH_SCRIPT for each vpn with the outputs from running the script.
  5. Export/refresh your AWS credentials.
  6. Run terraform get.
  7. Run terraform init.
  8. Run terraform apply.
  9. Accept the changes by typing yes.

That's it! Here's what you deployed:

  • Two VPCs! Each with: two public subnets, two private subnets, a WireGuard server (pre-configured and already running the WireGuard process), and a private instance with which we can test connectivity.
  • A Python Lambda function for updating access (adds/removes/updates rows in the DynamoDB table)
  • A DynamoDB table for managing the state of user access (changes here trigger updates to the WireGuard servers)

You can now move on to Managing network access.

Deploy VPN only - Terraform

Already have a network defined in Terraform, and just need the code to add the VPN and updater Lambda?

You'll need to do a couple of things:

  1. Copy the entire modules directory into your own IaC (or, if you already have a `modules` directory, just copy the two modules into your existing directory).
  2. Copy-paste the below into the root of your Terraform stack, updating the values indicated below. If our naming conventions don't align, or you prefer more specific tagging, update the code in the modules you already copied over.
  3. Run ./generate_wireguard_keys.sh in your terminal for each vpn module you add.
  4. In vpn.tf, replace GENERATE_ME_WITH_SCRIPT for each vpn with the outputs from running the script.
  5. Export/refresh your AWS credentials.
  6. Run terraform get.
  7. Run terraform init.
  8. Run terraform apply.
  9. Accept the changes by typing yes.

You can create your own system for generating the WireGuard public and private key pair, so that the private key does not end up in Terraform state. In our setup, we have some Golang code that can handle the automation part, but for the purposes of this guide we wanted to provide an easy way to get up and running even if you don't have automation built out.

# Deployed per network you want to manage access to.
module "vpn_dev" {
  source                = "./modules/wireguard_vpn_server"
  environment           = "dev"
  account_id            = data.aws_caller_identity.account.id
  subnet_id             = module.vpc_dev.public_subnets[0]   # Change if using your own IaC / existing network
  vpc_id                = module.vpc_dev.vpc_id              # Change if using your own IaC / existing network
  wireguard_port        = "51820"                            # Should be a new, random port per WireGuard server (valid range: 1-65535)
  wireguard_ip_address  = "192.168.2.2/32"             # Should be an unused IP in the range XXXXX
  wireguard_public_key  = "GENERATE_ME_WITH_SCRIPT"    # Generate this and the below using the script provided in this repository
  wireguard_private_key = "GENERATE_ME_WITH_SCRIPT"
}

# Only deployed a single time, NOT per network/environment, but updated with the environments you want it to manage access to.
module "wireguard_updater" {
  source     = "./modules/wireguard_updater"
  account_id = data.aws_caller_identity.account.id

  # Update the list below with your environments.
  # If you have different Terraform workspaces or environments, you can use stack outputs instead of module outputs.
  vpn_environments = [
    {
      environment         = "dev"                             # Change if using your own IaC / existing network
      vpc_cidr            = module.vpc_dev.vpc_cidr_block     # Change if using your own IaC / existing network
      instance_id         = module.vpn_dev.wireguard_instance_id      
      public_key          = module.vpn_dev.wireguard_public_key
      wireguard_endpoint  = module.vpn_dev.wireguard_public_endpoint
    }
  ]
}

# Optional, if you don't already have a data block defined for aws_caller_identity.
data "aws_caller_identity" "account" {}
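
Before running terraform apply, it can help to sanity-check the values you put into each vpn module. The Python sketch below is one way to do that. The helper names are made up for this example, and the overlay range 192.168.2.0/24 is an assumption based on the sample address above, so substitute your own range.

```python
import base64
import ipaddress
from typing import Set


def is_wireguard_key(key: str) -> bool:
    """WireGuard keys are 32 bytes, base64-encoded."""
    try:
        return len(base64.b64decode(key, validate=True)) == 32
    except ValueError:
        return False


def is_valid_port(port: str) -> bool:
    """UDP ports must fall between 1 and 65535."""
    return port.isascii() and port.isdigit() and 1 <= int(port) <= 65535


def next_free_address(overlay_cidr: str, used: Set[str]) -> str:
    """Pick the first host address in the overlay range that is not taken."""
    for host in ipaddress.ip_network(overlay_cidr).hosts():
        if str(host) not in used:
            return f"{host}/32"
    raise ValueError(f"no free addresses left in {overlay_cidr}")


if __name__ == "__main__":
    # Assumed overlay range and already-assigned addresses, for illustration only.
    print(next_free_address("192.168.2.0/24", {"192.168.2.1", "192.168.2.2"}))
    print(is_valid_port("51820"))
    print(is_wireguard_key("GENERATE_ME_WITH_SCRIPT"))
```

A check like this catches a leftover GENERATE_ME_WITH_SCRIPT placeholder or an out-of-range port before Terraform or the WireGuard process fails on it.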

That's it! Here's what you deployed:

  • As many WireGuard VPN servers as you defined, pre-configured and already running the WireGuard process.
  • A Python Lambda function for updating access (adds/removes/updates rows in the DynamoDB table)
  • A DynamoDB table for managing the state of user access (changes here trigger updates to the WireGuard servers)

You can now move on to Managing network access.