AWS PrivateLink and Cross-Account Package Deployment

Recently I got a task from the company's app team: install an agent on EC2 instances across several AWS accounts, and enable connectivity via AWS PrivateLink to a target AWS account where the management server is hosted. In this post I will show how to use AWS Systems Manager (SSM) to distribute and install software across multiple AWS accounts, and how to set up AWS PrivateLink using Terraform.

Automate agent installation via SSM Run Command

Here I need a shell script to:

  • Identify Running Instances: Detect all running Linux and Windows EC2 instances in the specified AWS account.
  • Tag Instances Based on SSM Availability: Check if the SSM agent is available for each instance and assign tags to reflect the SSM status (ssm-linux, no-ssm-linux, ssm-windows, no-ssm-windows).
  • Create SSM Command Documents:
    • For Linux: Install Agent by running a shell script.
    • For Windows: Run a testing command to retrieve the Windows OS version.
  • Execute Commands via SSM: Run the corresponding document on the Linux and Windows instances that have the SSM agent available.
#!/bin/bash

# Define the tag key and new values
TAG_KEY="dtagent"
REGION="ap-southeast-2"

# Step 1: Get a list of all running Linux and Windows instance IDs

# Get Linux instance IDs
LINUX_INSTANCE_IDS=$(aws ec2 describe-instances \
  --filters "Name=platform-details,Values=Linux/UNIX" "Name=instance-state-name,Values=running" \
  --query "Reservations[*].Instances[*].InstanceId" \
  --output text)

# Get Windows instance IDs
WINDOWS_INSTANCE_IDS=$(aws ec2 describe-instances \
  --filters "Name=platform-details,Values=Windows" "Name=instance-state-name,Values=running" \
  --query "Reservations[*].Instances[*].InstanceId" \
  --output text)

# Step 2: Check for SSM agent availability and tag instances accordingly

# Function to check SSM availability and assign tags
tag_instance_based_on_ssm() {
  local INSTANCE_ID=$1
  local PLATFORM=$2

  # Check if the instance is managed by SSM
  SSM_STATUS=$(aws ssm describe-instance-information \
    --filters "Key=InstanceIds,Values=$INSTANCE_ID" \
    --query "InstanceInformationList[*].PingStatus" \
    --output text)

  # Determine tag value based on SSM status and platform
  if [ "$PLATFORM" == "Linux" ]; then
    if [ "$SSM_STATUS" == "Online" ]; then
      TAG_VALUE="ssm-linux"
    else
      TAG_VALUE="no-ssm-linux"
    fi
  elif [ "$PLATFORM" == "Windows" ]; then
    if [ "$SSM_STATUS" == "Online" ]; then
      TAG_VALUE="ssm-windows"
    else
      TAG_VALUE="no-ssm-windows"
    fi
  fi

  # Apply the determined tag to the instance
  aws ec2 create-tags --resources "$INSTANCE_ID" --tags Key=$TAG_KEY,Value=$TAG_VALUE
}

# Tag Linux instances based on SSM availability
for INSTANCE_ID in $LINUX_INSTANCE_IDS; do
  tag_instance_based_on_ssm "$INSTANCE_ID" "Linux"
done

# Tag Windows instances based on SSM availability
for INSTANCE_ID in $WINDOWS_INSTANCE_IDS; do
  tag_instance_based_on_ssm "$INSTANCE_ID" "Windows"
done

echo "Tagging complete. Instances have been tagged with SSM availability status."

sleep 10

# Step 3: Create JSON files for SSM command documents (Linux and Windows)
cat <<EOF > linux_dtcommand.json
{
  "schemaVersion": "2.2",
  "description": "Dynatrace oneagent installation command for Linux via SSM",
  "mainSteps": [
    {
      "action": "aws:runShellScript",
      "name": "LinuxDTCommand",
      "inputs": {
        "runCommand": [
          "curl -o /tmp/dtssm.sh https://raw.githubusercontent.com/ZackZhouHB/zack-gitops-project/refs/heads/editing/Python_scripts/testssm.sh",
          "chmod +x /tmp/dtssm.sh",
          "/tmp/dtssm.sh"
        ]
      }
    }
  ]
}
EOF

cat <<EOF > windows_dtcommand.json
{
  "schemaVersion": "2.2",
  "description": "Fetch Windows OS version via SSM",
  "mainSteps": [
    {
      "action": "aws:runPowerShellScript",
      "name": "WindowsOSVersionCheck",
      "inputs": {
        "runCommand": [
          "(Get-ItemProperty -Path 'HKLM:\\\\SOFTWARE\\\\Microsoft\\\\Windows NT\\\\CurrentVersion').ProductName"        ]
      }
    }
  ]
}
EOF


# Step 4: Create the SSM documents
aws ssm create-document \
  --name "LinuxDTCommand" \
  --document-type "Command" \
  --content file://linux_dtcommand.json

aws ssm create-document \
  --name "WindowsDTCommand" \
  --document-type "Command" \
  --content file://windows_dtcommand.json

# Step 5: Send the SSM command to Linux instances with "ssm-linux" tag
aws ssm send-command \
  --document-name "LinuxDTCommand" \
  --targets "Key=tag:$TAG_KEY,Values=ssm-linux" \
  --comment "Execute Dynatrace oneagent installation on all Linux instances with SSM" \
  --max-concurrency "50" \
  --max-errors "0" \
  --region $REGION

# Step 6: Send the SSM command to Windows instances with "ssm-windows" tag
aws ssm send-command \
  --document-name "WindowsDTCommand" \
  --targets "Key=tag:$TAG_KEY,Values=ssm-windows" \
  --comment "Execute Dynatrace oneagent installation on all Windows instances with SSM" \
  --max-concurrency "50" \
  --max-errors "0" \
  --region $REGION

echo "Commands sent to instances with SSM agent available."

The result can be verified in the EC2 console (instance tags) and in the AWS SSM console (Run Command execution status).
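
The same check can also be done from the CLI. This is a minimal sketch; the command ID placeholder is hypothetical and comes from the send-command output:

# List instances tagged by the script and their dtagent tag value
aws ec2 describe-tags \
  --filters "Name=key,Values=dtagent" \
  --query "Tags[].{Instance:ResourceId,Value:Value}" \
  --output table --region ap-southeast-2

# Check per-instance execution status of a Run Command invocation
aws ssm list-command-invocations \
  --command-id <command-id-from-send-command-output> \
  --query "CommandInvocations[].{Instance:InstanceId,Status:Status}" \
  --output table --region ap-southeast-2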

Cross-account AWS PrivateLink setup

Here I will use two of my own AWS accounts, `zack` and `joe`, to set up and verify a PrivateLink connection:

  • Provider Account (Zack): Hosting a service (Python HTTP server on port 9080) running on an EC2 instance behind a Network Load Balancer (NLB).
  • Consumer Account (Joe): Accessing the service securely through a VPC Endpoint without exposing the service to the internet.

Run a simple Python HTTP server on port 9080 in the provider account (Zack)

In Provider Account (Zack):

  • SSH into the Zackblog EC2 instance and start a Dockerized Python HTTP service on port 9080.
root@ip-172-31-26-78:/var/snap/amazon-ssm-agent/9881# docker pull python:3

root@ip-172-31-26-78:/var/snap/amazon-ssm-agent/9881# docker run -d -p 9080:9080 --name test-container python:3 python -m http.server 9080

root@ip-172-31-26-78:/var/snap/amazon-ssm-agent/9881# docker ps
CONTAINER ID   IMAGE      COMMAND                  CREATED             STATUS             PORTS                                       NAMES
6e0dabbd7d5b   python:3   "python -m http.serv…"   About an hour ago   Up About an hour   0.0.0.0:9080->9080/tcp, :::9080->9080/tcp   test-container

root@ip-172-31-26-78:/var/snap/amazon-ssm-agent/9881# curl localhost:9080
  • Create an NLB with a security group that forwards traffic to TCP port 9080 on the EC2 instance.

  • Register the instance in a target group, and create a VPC Endpoint Service to expose the NLB to consumer accounts.

  • Manually approve the connection request when it arrives from the consumer account.

In Consumer Account (Joe):

  • Create a VPC interface endpoint and request a connection to the provider's endpoint service.

  • Launch a testing EC2 instance with port 9080 allowed in its security group, then curl the endpoint's DNS name on port 9080 to verify the PrivateLink connection.
ubuntu@ip-172-31-54-206:~$ curl http://vpce-0961b67b61820a01e-6kvjc4rs.vpce-svc-056bd4182a8904985.ap-southeast-2.vpce.amazonaws.com:9080

This shows that traffic from the consumer Ubuntu EC2 instance is routed through the interface endpoint to the NLB and on to the service in the provider account.
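
A quick way to convince yourself the path stays private is to resolve the endpoint DNS name from the consumer instance; it returns the private IP of the endpoint ENI rather than any public address (the answer shown is illustrative):

ubuntu@ip-172-31-54-206:~$ dig +short vpce-0961b67b61820a01e-6kvjc4rs.vpce-svc-056bd4182a8904985.ap-southeast-2.vpce.amazonaws.com
172.31.x.x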

Next, I use Terraform to create the above resources so the lab can be easily destroyed and reused across more AWS accounts.

First, in the provider account, create the NLB, a security group for the NLB that allows port 9080, a target group registering the Zackblog EC2 instance, a listener, and the VPC endpoint service.

# main.tf for provider account

# Provider Account Configuration
provider "aws" {
  profile = "zack" # AWS account profile for provider
  region  = "ap-southeast-2"
}

# Fetch Existing VPC and Subnet
data "aws_vpc" "default" {
  default = true
}

data "aws_subnet" "subnet1" {
  id = "subnet-0ssssssssss69a" 
}

data "aws_subnet" "subnet2" {
  id = "subnet-073ssssssssss46db" 
}

data "aws_subnet" "subnet3" {
  id = "subnet-09sssssss129" # 
}

# Fetch Existing Security Group for EC2 Instance
data "aws_security_group" "ec2_sg" {
  id = "sg-01ssssssss7c" # Security Group ID of Zackblog EC2 instance
}


# Create Security Group for NLB
resource "aws_security_group" "nlb_sg" {
  name        = "nlb-private-link-sg"
  vpc_id      = data.aws_vpc.default.id
  description = "Allow traffic to NLB from consumer accounts"

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  ingress {
    from_port   = 9080
    to_port     = 9080
    protocol    = "tcp"
    cidr_blocks = ["172.31.48.0/20"] # Updated to match consumer's new CIDR
  }
}

# Allow NLB to Access EC2 on Port 9080
resource "aws_security_group_rule" "ec2_inbound_from_nlb" {
  type                     = "ingress"
  security_group_id        = data.aws_security_group.ec2_sg.id # Existing EC2 SG
  from_port                = 9080
  to_port                  = 9080
  protocol                 = "tcp"
  source_security_group_id = aws_security_group.nlb_sg.id # Allow traffic from NLB SG
}


# Create NLB
resource "aws_lb" "nlb" {
  name               = "private-link-nlb"
  internal           = true
  load_balancer_type = "network"
  subnets            = [
    data.aws_subnet.subnet1.id,
    data.aws_subnet.subnet2.id,
    data.aws_subnet.subnet3.id
  ]
  security_groups    = [aws_security_group.nlb_sg.id]
}

# Create NLB Target Group
resource "aws_lb_target_group" "tg" {
  name        = "private-link-tg"
  port        = 9080
  protocol    = "TCP"
  vpc_id      = data.aws_vpc.default.id
  target_type = "instance"
}

# Attach EC2 Instance to Target Group
resource "aws_lb_target_group_attachment" "tg_attachment" {
  target_group_arn = aws_lb_target_group.tg.arn
  target_id        = "i-076sssssssscf2" # Zackblog instance ID
  port             = 9080
}

# Add Listener for NLB
resource "aws_lb_listener" "nlb_listener" {
  load_balancer_arn = aws_lb.nlb.arn  # Reference the NLB created in your Terraform configuration
  port              = 9080           # Listener port
  protocol          = "TCP"          # Protocol for the listener

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.tg.arn  # Reference the target group created in your Terraform configuration
  }
}

# Create Endpoint Service
resource "aws_vpc_endpoint_service" "private_link_service" {
  acceptance_required = true
  network_load_balancer_arns = [
    aws_lb.nlb.arn,
  ]
  allowed_principals = [
    "arn:aws:iam::8ssssssssss5:root" # Replace with account joe's ID
  ]
  private_dns_name = "zzservice.internal" # Provide a custom private DNS name
}

output "endpoint_service_name" {
  value       = aws_vpc_endpoint_service.private_link_service.service_name
  description = "The name of the VPC Endpoint Service to share with the consumer account"
}
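
After applying this in the zack account, the service name to share with the consumer account can be read back from the Terraform output (the value below only illustrates the format):

terraform apply
terraform output endpoint_service_name
# e.g. com.amazonaws.vpce.ap-southeast-2.vpce-svc-0xxxxxxxxxxxxxxxx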

Then, in the consumer account, create an interface endpoint, a security group for the interface endpoint, and a testing Ubuntu EC2 instance in a new, non-overlapping subnet.

# main.tf for joe account as consumer

# Consumer Account Configuration
provider "aws" {
  profile = "joe" # AWS account profile for consumer
  region  = "ap-southeast-2"
}

# Fetch Existing VPC and Subnet
data "aws_vpc" "default" {
  default = true
}

# Create a new non-overlapping subnet in account Joe
resource "aws_subnet" "new_consumer_subnet" {
  vpc_id                  = data.aws_vpc.default.id
  cidr_block              = "172.31.48.0/20"     # Updated CIDR to avoid overlap
  availability_zone       = "ap-southeast-2c" # Same AZ as the testing EC2 instance
  map_public_ip_on_launch = false             # Optional: Prevent public IP assignment
  tags = {
    Name = "Consumer-New-PrivateLink-Subnet"
  }
}

# Create Security Group for Interface Endpoint
resource "aws_security_group" "endpoint_sg" {
  name        = "private-link-endpoint-sg"
  vpc_id      = data.aws_vpc.default.id
  description = "Allow traffic to the interface endpoint"

  ingress {
    from_port   = 9080
    to_port     = 9080
    protocol    = "tcp"
    cidr_blocks = ["172.31.48.0/20"] # Updated to match new consumer CIDR
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

resource "aws_vpc_endpoint" "interface_endpoint" {
  vpc_id             = data.aws_vpc.default.id
  service_name       = "com.amazonaws.vpce.ap-southeast-2.vpce-svc-0ssssssssss5"
  subnet_ids         = [aws_subnet.new_consumer_subnet.id]
  security_group_ids = [aws_security_group.endpoint_sg.id]
  private_dns_enabled = false # Temporarily disabled; switch to true after the provider account accepts the connection
  vpc_endpoint_type  = "Interface"
}

# Replace the null resource with a manual acceptance process (preferred)
output "accept_endpoint_instructions" {
  value = <<-EOT
    In the provider (zack) account, accept the connection request with:
    aws ec2 accept-vpc-endpoint-connections --service-id <endpoint-service-id> \
      --vpc-endpoint-ids ${aws_vpc_endpoint.interface_endpoint.id}
  EOT
}

# Dynamically Fetch the Subnet
data "aws_subnet" "new_consumer_subnet" {
  filter {
    name   = "cidr-block"
    values = ["172.31.48.0/20"] # Replace with the CIDR block of the desired subnet
  }

  filter {
    name   = "vpc-id"
    values = [data.aws_vpc.default.id]
  }
}

# Create Security Group for Testing
resource "aws_security_group" "testing_sg" {
  name        = "testing-private-link-sg"
  vpc_id      = data.aws_vpc.default.id
  description = "Security group for testing EC2 instance"

  ingress {
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"] # Allow SSH from everywhere (change for production use)
  }

  ingress {
    from_port   = 9080
    to_port     = 9080
    protocol    = "tcp"
    cidr_blocks = ["172.31.0.0/16"] # Allow traffic to PrivateLink endpoint within VPC
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

# Launch EC2 Instance in 172.31.48.0/20 Subnet
resource "aws_instance" "testing_ec2" {
  ami                  = "ami-040e71e7b8391cae4" # Replace with a valid Amazon Linux 2 AMI for ap-southeast-2
  instance_type        = "t2.micro"
  subnet_id            = data.aws_subnet.new_consumer_subnet.id
  vpc_security_group_ids = [aws_security_group.testing_sg.id] # Use the correct attribute for SGs with subnets

  associate_public_ip_address = true # Assign a public IP address

  key_name = "xxxxx1" 

  tags = {
    Name = "PrivateLink-Testing-EC2"
  }
}
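
After running the acceptance command from the output above in the zack account, the endpoint state can be checked from the joe side; it should move from pendingAcceptance to available (a sketch with a placeholder endpoint ID):

aws ec2 describe-vpc-endpoints \
  --vpc-endpoint-ids vpce-0xxxxxxxxxxxxxxxx \
  --query "VpcEndpoints[].State" --output text \
  --profile joe --region ap-southeast-2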

Finally, once the connection is accepted, SSH into the new testing EC2 instance and curl the interface endpoint's DNS name on port 9080 to test the PrivateLink connection.
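
A minimal check from the consumer instance looks like this (the DNS name is illustrative; use the one reported for your interface endpoint):

curl http://vpce-0xxxxxxxxxxxxxxxx-xxxxxxxx.vpce-svc-0xxxxxxxxxxxxxxxx.ap-southeast-2.vpce.amazonaws.com:9080
# A directory listing from the Python http.server confirms end-to-end connectivity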

Conclusion

  • Traffic Flow
    • Consumer EC2 → PrivateLink Endpoint: Consumer EC2 uses the private DNS name to send requests to the endpoint. The VPC interface endpoint forwards traffic securely to the NLB in the provider account.
    • NLB → EC2 Instance: The NLB routes traffic from the VPC endpoint to the EC2 instance based on the target group configuration.
    • EC2 Instance → Service: The EC2 instance processes the request and responds back via the same path.
  • What We Accomplished
    • Used Terraform to automate resource creation for NLB, target group, VPC endpoint service, and consumer endpoint.
    • Successfully accessed the Python HTTP server from the consumer EC2 instance using the private DNS name via PrivateLink.
    • Verified connectivity from consumer EC2 to provider service.
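
Since everything on both sides was created with Terraform, tearing the lab down is a single command per account:

# In each account's Terraform working directory (zack, then joe):
terraform destroy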
