Implementing MCP OAuth 2.1 with Keycloak on AWS
Introduction
The Model Context Protocol (MCP) ecosystem mandates OAuth 2.1-compliant authorization servers to facilitate secure, federated access to AI model services. MCP clients, such as Claude Code, Cursor, and VS Code extensions, rely on modern OAuth specifications including Dynamic Client Registration (RFC 7591), PKCE (RFC 7636), and crucially, Resource Indicators (RFC 8707) for audience-restricted tokens.
However, most Identity-as-a-Service (IDaaS) providers, including the open-source Keycloak platform, currently lack full RFC 8707 support. Keycloak, while robust in OAuth 2.0 capabilities, employs a proprietary audience parameter in contrast to the standardized resource parameter defined in RFC 8707. For a comprehensive analysis of this compatibility landscape, refer to my previous post: Technical Deconstruction of MCP Authorization: A Deep Dive into OAuth 2.1 and IETF RFC Specifications.
This article provides a detailed guide on configuring Keycloak as an MCP-compatible authorization server through strategic use of protocol mappers and realm configuration. The implemented solution encompasses:
- RFC 8707 Workaround: Custom audience protocol mappers to inject correct aud claims into JWT tokens.
- Dynamic Client Registration: Automated client onboarding via realm default scopes.
- Zero-Configuration MCP Support: Automatic audience restriction without manual client configuration.
- Infrastructure Automation: Terraform deployment on AWS utilizing ECS Fargate and Aurora PostgreSQL.
Upon completion of this guide, you will possess a clear understanding of how to configure Keycloak for seamless MCP client support, enabling dynamic client registration with automated audience restriction.
Architecture Overview
This deployment leverages AWS managed services to establish a scalable Keycloak infrastructure tailored for MCP OAuth workflows.
Core Components
Compute Layer (ECS Fargate)
Keycloak operates as containerized workloads on AWS Fargate, offering managed compute capacity:
- Custom Docker Image: Built from the official Keycloak 26.4.4 release, pre-configured with JDBC_PING for clustering.
- Multi-AZ Deployment: Tasks are strategically distributed across multiple Availability Zones for resilience.
- Health Monitoring: Integrated with AWS CloudWatch Container Insights for robust performance and health visibility.
Database Layer (Aurora PostgreSQL)
Amazon Aurora provides a highly available, scalable PostgreSQL-compatible database backend:
- Database Engine: PostgreSQL 16 (Keycloak 26.4.4 requires PostgreSQL 13+ minimum, 16.8 recommended).
- Scalability: Aurora Serverless v2, featuring configurable capacity and auto-scaling.
- High Availability: Multi-AZ deployment with automatic failover mechanisms.
- Security: Data encryption at rest and automated backup procedures.
Load Balancing (Application Load Balancer)
The Application Load Balancer (ALB) manages TLS termination and intelligent traffic distribution:
- HTTPS/TLS: Certificate management handled by AWS Certificate Manager (ACM).
- Health Checks: Continuously monitors Keycloak health endpoints to ensure service availability.
- Session Affinity: Supports sticky sessions for maintaining stateful client connections.
Networking Infrastructure
A Virtual Private Cloud (VPC) provides a logically isolated network environment:
- Subnets: Public and private subnets distributed across multiple Availability Zones.
- NAT Gateways: Enable secure outbound internet access for resources within private subnets.
- VPC Endpoints: Facilitate private connectivity to select AWS services.
- Security Groups: Enforce granular network access controls.
Deployment Workflow
The infrastructure deployment adheres to a phased approach:
```mermaid
flowchart TD
    A[Create VPC & Networking] --> B[Deploy Aurora RDS]
    B --> C[Create ECS Cluster]
    C --> D[Build & Push Container Image]
    D --> E[Start ECS Tasks]
    E --> F[Configure MCP OAuth Realm]
    F --> G[Verify Dynamic Client Registration]
```
This structured methodology ensures that foundational infrastructure is provisioned prior to implementing MCP-specific Keycloak configurations.
Understanding Keycloak's RFC 8707 Gap
To comprehend the necessity of custom configuration in this deployment, we must analyze the incompatibility between Keycloak's audience implementation and the MCP specification's requirements.
The RFC 8707 Standard
RFC 8707 (Resource Indicators for OAuth 2.0) specifies a standardized mechanism for audience restriction within OAuth access tokens. This specification introduces a resource parameter, which clients include in both authorization and token requests:
```http
POST /token HTTP/1.1
Host: auth.example.com
Content-Type: application/x-www-form-urlencoded

grant_type=authorization_code
&code=ABC123
&redirect_uri=https://client.example.com/callback
&resource=https://api.example.com        ← Target audience
&client_id=CLIENT_ID
```
The Authorization Server (AS) utilizes this resource parameter to populate the JWT's aud (audience) claim, thereby ensuring the token's validity is restricted to the specified Resource Server (RS).
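For illustration, the effect is easiest to see by base64-decoding the payload (second dot-separated segment) of the resulting access token. This is a minimal sketch: the token is assumed to be in an `ACCESS_TOKEN` shell variable, and the decoded values in the comments are illustrative only.

```bash
# Decode the JWT payload and inspect the audience-related claims
payload=$(echo "$ACCESS_TOKEN" | cut -d '.' -f2 | tr '_-' '/+')
case $(( ${#payload} % 4 )) in
  2) payload="${payload}==" ;;  # restore padding dropped by base64url encoding
  3) payload="${payload}=" ;;
esac
echo "$payload" | base64 -d | jq '{aud, iss, scope}'
# Illustrative output when the AS honors the resource parameter:
# {
#   "aud": "https://api.example.com",
#   "iss": "https://auth.example.com/realms/example",
#   "scope": "openid profile"
# }
```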
Keycloak's Proprietary Approach
Keycloak's audience functionality was implemented prior to the publication of RFC 8707 in February 2020. As detailed in the MCP authorization compatibility matrix, Keycloak employs a proprietary audience parameter that predates the standardized approach.
The Problem: MCP clients (e.g., Claude Code, Cursor, VS Code extensions) adhere to RFC 8707 and transmit the resource parameter. Keycloak, however, disregards this parameter, resulting in JWT tokens that either lack the mandatory aud claim or contain incorrect audience values.
The Consequence: MCP servers validate the aud claim to mitigate token replay attacks, addressing the "Confused Deputy" problem. Without proper audience restriction, tokens risk rejection or potential misuse across disparate resource servers.
The Workaround Architecture
The proposed solution strategically leverages Keycloak's Protocol Mappers to automatically inject the correct aud claim, circumventing the absence of native RFC 8707 support. This architecture integrates three key components:
```mermaid
flowchart LR
    A[Dynamic Client Registration] --> B{Realm Default Scopes}
    B --> C[Auto-assign mcp:run scope]
    C --> D[mcp:run has Audience Mapper]
    D --> E[Token Request]
    E --> F{Mapper Active?}
    F -->|Yes| G[Inject aud claim]
    G --> H[JWT with correct audience]
    H --> I[MCP Server validates aud]
    I --> J[Access Granted]
    style D fill:#f59e0b
    style G fill:#10b981
    style I fill:#3b82f6
```
Component 1: Audience Protocol Mapper
A hardcoded claim mapper, associated with the mcp:run client scope, injects the MCP server's URL into the aud claim:
1resource "keycloak_openid_hardcoded_claim_protocol_mapper" "mcp_run_audience_mapper" {
2 realm_id = keycloak_realm.mcp.id
3 client_scope_id = keycloak_openid_client_scope.mcp_run.id
4 name = "mcp-audience"
5
6 claim_name = "aud"
7 claim_value = var.resource_server_uri # e.g., "https://mcp-server.example.com/mcp"
8 claim_value_type = "String"
9
10 add_to_id_token = false
11 add_to_access_token = true # ← Critical: Only in access tokens
12 add_to_userinfo = false
13}
Component 2: Realm Default Scopes
By configuring mcp:run as a realm-wide default scope, all clients, including those registered via Dynamic Client Registration, automatically inherit this audience mapper:
1resource "keycloak_realm_default_client_scopes" "mcp_realm_defaults" {
2 realm_id = keycloak_realm.mcp.id
3
4 default_scopes = [
5 "profile",
6 "email",
7 "mcp:run", # ← Critical: Auto-assigned to DCR clients
8 "roles",
9 "web-origins",
10 "acr",
11 "basic",
12 ]
13}
Component 3: DCR Allowed Scopes Configuration
Client Registration Policies are configured to permit mcp:run within the allowed scopes for dynamically registered clients. This step is performed using the Keycloak Admin REST API due to current Terraform provider limitations:
```bash
# Extract Client Registration Policy component ID
COMPONENT_ID=$(curl -s "${KEYCLOAK_URL}/admin/realms/mcp/components" \
  -H "Authorization: Bearer ${ADMIN_TOKEN}" | \
  jq -r '.[] | select(.name=="Allowed Client Scopes") | .id')

# Update allowed scopes to include mcp:run
curl -X PUT "${KEYCLOAK_URL}/admin/realms/mcp/components/${COMPONENT_ID}" \
  -H "Authorization: Bearer ${ADMIN_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{
    "config": {
      "allow-default-scopes": ["true"],
      "allowed-client-scopes": ["openid", "profile", "email", "mcp:run"]
    }
  }'
```
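The snippet above assumes an administrator bearer token in `${ADMIN_TOKEN}`. One way to obtain it, sketched here under the assumption that `${KEYCLOAK_URL}` already includes the `/auth` relative path used in this deployment, is the master realm's token endpoint with the built-in `admin-cli` client:

```bash
# Obtain an admin access token (password grant against the master realm's admin-cli client)
ADMIN_TOKEN=$(curl -s -X POST \
  "${KEYCLOAK_URL}/realms/master/protocol/openid-connect/token" \
  -d "grant_type=password" \
  -d "client_id=admin-cli" \
  -d "username=admin" \
  -d "password=${ADMIN_PASSWORD}" | jq -r '.access_token')
```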
Complete Flow of Operations
When an MCP client (e.g., Claude Code) attempts to access a protected MCP server:
- Discovery: The client retrieves the MCP server's metadata (RFC 9728) to identify the required Authorization Server.
- Registration: The client dynamically registers with Keycloak via a POST request to /clients-registrations/openid-connect (see the example request after this list).
- Automatic Scope Inheritance: Keycloak automatically assigns the mcp:run scope (due to realm default configuration) to the newly registered client.
- Authorization Flow: The client initiates the OAuth Authorization Code flow, incorporating PKCE.
- Token Issuance: Keycloak generates a JWT access token, and the audience mapper injects the aud: "https://mcp-server.example.com/mcp" claim.
- Validation: The MCP server validates the aud claim against its own identifier and grants access to the MCP resources.
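For illustration, the registration step amounts to a single anonymous POST against the realm's registration endpoint. This is a sketch: the client name and redirect URI are placeholders, and the call assumes anonymous DCR is permitted as configured in this deployment.

```bash
# Register a public client via OpenID Connect Dynamic Client Registration (RFC 7591)
curl -s -X POST \
  "${KEYCLOAK_URL}/realms/mcp/clients-registrations/openid-connect" \
  -H "Content-Type: application/json" \
  -d '{
    "client_name": "example-mcp-client",
    "redirect_uris": ["http://localhost:33418/callback"],
    "grant_types": ["authorization_code", "refresh_token"],
    "token_endpoint_auth_method": "none"
  }' | jq '{client_id, registration_access_token}'
```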
Result: The MCP client achieves full functionality without requiring any manual configuration within the Keycloak administrative console. This pattern of realm default scopes combined with an audience mapper establishes fully automated MCP compatibility.
MCP OAuth 2.1 Configuration Deep Dive
This section details the Terraform configurations that transform a standard Keycloak deployment into an MCP-compliant authorization server.
RFC Compliance Matrix
The implementation ensures OAuth 2.1 compatibility through selective RFC adoption:
| RFC | Specification | Implementation Status | Notes |
|---|---|---|---|
| RFC 7591 | Dynamic Client Registration | ✅ Complete | Anonymous DCR enabled for zero-configuration clients |
| RFC 7636 | PKCE (Proof Key for Code Exchange) | ✅ Complete | S256 challenge method mandatory for all clients |
| RFC 8414 | Authorization Server Metadata | ✅ Complete | OIDC discovery at /.well-known/openid-configuration |
| RFC 8707 | Resource Indicators | ✅ Complete | Via audience mapper workaround (native support in development) |
| RFC 9728 | Protected Resource Metadata | ⚠️ MCP Server-dependent | Implemented by MCP servers, not the AS |
Realm Configuration
The MCP realm (mcp-realm.tf) is meticulously configured to establish security policies and token lifespans, optimized for AI model access patterns:
1resource "keycloak_realm" "mcp" {
2 realm = "mcp"
3 enabled = true
4
5 display_name = "MCP Authorization Server"
6 display_name_html = "<b>Model Context Protocol</b>"
7
8 # Token lifespans - optimized for MCP sessions
9 access_token_lifespan = "1h" # Longer for AI workflows
10 sso_session_idle_timeout = "30m"
11 sso_session_max_lifespan = "10h"
12 offline_session_idle_timeout = "720h" # 30 days
13
14 # Security policies
15 ssl_required = "external" # Require HTTPS for external connections
16
17 password_policy = "length(12) and upperCase(1) and lowerCase(1) and digits(1) and specialChars(1)"
18
19 security_defenses {
20 headers {
21 x_frame_options = "DENY"
22 content_security_policy = "frame-src 'self'; frame-ancestors 'self'; object-src 'none';"
23 content_security_policy_report_only = ""
24 x_content_type_options = "nosniff"
25 x_robots_tag = "none"
26 x_xss_protection = "1; mode=block"
27 strict_transport_security = "max-age=31536000; includeSubDomains"
28 }
29
30 brute_force_detection {
31 permanent_lockout = false
32 max_login_failures = 5
33 wait_increment_seconds = 60
34 quick_login_check_milli_seconds = 1000
35 minimum_quick_login_wait_seconds = 60
36 max_failure_wait_seconds = 900
37 failure_reset_time_seconds = 900
38 }
39 }
40}
Client Scopes and Audience Mapper
The mcp:run client scope (mcp-scopes.tf) forms the core of the workaround, intelligently combining scope definition with the critical audience mapper:
```hcl
# Define the mcp:run client scope
resource "keycloak_openid_client_scope" "mcp_run" {
  realm_id               = keycloak_realm.mcp.id
  name                   = "mcp:run"
  description            = "Scope for MCP model execution with audience restriction"
  consent_screen_text    = "Access MCP model servers"
  include_in_token_scope = true
}

# Attach the audience mapper to mcp:run scope
resource "keycloak_openid_hardcoded_claim_protocol_mapper" "mcp_run_audience_mapper" {
  realm_id        = keycloak_realm.mcp.id
  client_scope_id = keycloak_openid_client_scope.mcp_run.id
  name            = "mcp-audience"

  claim_name       = "aud"
  claim_value      = var.resource_server_uri
  claim_value_type = "String"

  add_to_id_token     = false
  add_to_access_token = true
  add_to_userinfo     = false
}

# Make mcp:run a default scope for all clients
resource "keycloak_realm_default_client_scopes" "mcp_realm_defaults" {
  realm_id = keycloak_realm.mcp.id

  default_scopes = [
    "profile",
    "email",
    keycloak_openid_client_scope.mcp_run.name, # ← Critical
    "roles",
    "web-origins",
    "acr",
    "basic",
  ]
}
```
Key Design Decision: The mapper specifically configures add_to_access_token = true and add_to_id_token = false. This intentional design ensures the aud claim is present in the access token (for resource server validation) but excluded from the ID token (consumed by the client for user information).
Two-Phase Deployment Pattern
The Keycloak Terraform Provider currently exhibits a limitation: it cannot directly manage Client Registration Policies, which govern DCR behavior. This necessitates a hybrid deployment approach:
Phase 1: Terraform Resources (terraform apply)
This phase declaratively provisions the infrastructure:
- Realm with defined security policies
- Client scopes with embedded protocol mappers
- Realm default scopes
- Optional example clients
Phase 2: REST API Configuration (Bash scripts)
This phase configures imperative settings using the Keycloak Admin REST API:
- fix-allowed-scopes.sh: Modifies the Client Registration Policy to include mcp:run in the allowed scopes list.
- disable-trusted-hosts.sh: Removes the Trusted Hosts policy to accommodate custom redirect URI schemes (e.g., cursor://, vscode://, claude://).
- enable-dcr.sh: Verifies Dynamic Client Registration functionality and confirms proper scope inheritance.
The integrated deploy.sh orchestrator automates the execution of both phases:
```bash
#!/bin/bash
set -e

echo "Phase 1: Terraform deployment..."
terraform init
terraform apply -auto-approve

echo "Phase 2: REST API configuration..."
./fix-allowed-scopes.sh
./disable-trusted-hosts.sh

echo "Verification: Testing DCR..."
./enable-dcr.sh

echo "Deployment complete! MCP OAuth 2.1 realm ready."
```
Trusted Hosts Policy Removal
MCP clients frequently employ non-standard redirect URI schemes that Keycloak's default policies typically reject. The solution involves completely removing the Trusted Hosts policy component:
```bash
# Find the Trusted Hosts policy component
TRUSTED_HOSTS_ID=$(curl -s "${KEYCLOAK_URL}/admin/realms/mcp/components" \
  -H "Authorization: Bearer ${ADMIN_TOKEN}" | \
  jq -r '.[] | select(.name=="Trusted Hosts") | .id')

# Delete it entirely
curl -X DELETE "${KEYCLOAK_URL}/admin/realms/mcp/components/${TRUSTED_HOSTS_ID}" \
  -H "Authorization: Bearer ${ADMIN_TOKEN}"
```
Security Consideration: This action allows all redirect URI schemes, including http://localhost:* for development purposes. For production deployments safeguarding sensitive data, it is recommended to implement a custom policy that explicitly whitelists only approved schemes such as https://, cursor://, vscode://, and claude://.
Infrastructure Components
Beyond the OAuth configuration, several AWS infrastructure components provide essential support for the deployment.
JDBC_PING Clustering for ECS Fargate
Keycloak's native clustering mechanism, JGroups, typically relies on UDP multicast for node discovery. However, AWS VPCs do not support multicast, and ECS Fargate instances lack static IP addresses. The adopted solution is JDBC_PING, which utilizes the PostgreSQL database as a robust coordination mechanism.
How JDBC_PING Functions:
- Each Keycloak container registers its IP address and port within the JGROUPSPING table in PostgreSQL.
- Containers periodically query this table to discover active cluster members.
- Session data is replicated across the discovered cluster members.
- Upon container termination, its corresponding entry is gracefully removed from the table.
Configuration (cache-ispn-jdbc-ping.xml):
1<config xmlns="urn:org:jgroups"
2 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
3 xsi:schemaLocation="urn:org:jgroups http://www.jgroups.org/schema/jgroups-4.2.xsd">
4 <TCP bind_addr="${jgroups.bind.address,jgroups.tcp.address:SITE_LOCAL}"
5 bind_port="${jgroups.bind.port,jgroups.tcp.port:7800}"
6 recv_buf_size="5m"
7 send_buf_size="1m"
8 max_bundle_size="64k"/>
9
10 <JDBC_PING
11 connection_driver="org.postgresql.Driver"
12 connection_url="${env.KC_DB_URL}"
13 connection_username="${env.KC_DB_USERNAME}"
14 connection_password="${env.KC_DB_PASSWORD}"
15 initialize_sql="CREATE TABLE IF NOT EXISTS JGROUPSPING (
16 own_addr VARCHAR(200) NOT NULL,
17 cluster_name VARCHAR(200) NOT NULL,
18 ping_data BYTEA,
19 constraint PK_JGROUPSPING PRIMARY KEY (own_addr, cluster_name)
20 );"
21 info_writer_sleep_time="500"
22 remove_all_data_on_view_change="true"
23 stack.combine="REPLACE"
24 stack.position="MPING"/>
25
26 <MERGE3 min_interval="10000" max_interval="30000"/>
27 <FD_SOCK/>
28 <FD_ALL timeout="60000" interval="15000"/>
29 <VERIFY_SUSPECT timeout="5000"/>
30 <pbcast.NAKACK2 use_mcast_xmit="false" xmit_interval="1000"/>
31 <UNICAST3 xmit_interval="500"/>
32 <pbcast.STABLE desired_avg_gossip="50000" max_bytes="8m"/>
33 <pbcast.GMS print_local_addr="true" join_timeout="2000"/>
34 <UFC max_credits="2m" min_threshold="0.4"/>
35 <MFC max_credits="2m" min_threshold="0.4"/>
36 <FRAG2 frag_size="60k"/>
37</config>
Keycloak Container Configuration:
The Dockerfile integrates this configuration during the build process:
```dockerfile
FROM quay.io/keycloak/keycloak:26.4.4 AS builder

ENV KC_HEALTH_ENABLED=true
ENV KC_METRICS_ENABLED=true
ENV KC_HTTP_RELATIVE_PATH=/auth
ENV KC_DB=postgres

# Copy JDBC_PING configuration
COPY ./cache-ispn-jdbc-ping.xml /opt/keycloak/conf/cache-ispn-jdbc-ping.xml

# Build optimized image
RUN /opt/keycloak/bin/kc.sh build --cache-config-file=cache-ispn-jdbc-ping.xml

FROM quay.io/keycloak/keycloak:26.4.4
COPY --from=builder /opt/keycloak /opt/keycloak

# JDBC_PING coordination port (Dockerfile comments must be on their own line)
EXPOSE 7800
ENTRYPOINT ["/opt/keycloak/bin/kc.sh"]
```
ECS Task Definition:
The ECS task definition exposes port 7800 to facilitate cluster communication:
```json
{
  "containerDefinitions": [{
    "name": "keycloak",
    "image": "${ecr_repository_url}:${image_tag}",
    "portMappings": [
      {
        "containerPort": 8080,
        "protocol": "tcp"
      },
      {
        "containerPort": 7800,
        "protocol": "tcp"
      }
    ],
    "environment": [
      {"name": "KC_DB", "value": "postgres"},
      {"name": "KC_DB_URL", "value": "jdbc:postgresql://..."},
      {"name": "KC_PROXY_HEADERS", "value": "xforwarded"},
      {"name": "KC_CACHE_CONFIG_FILE", "value": "cache-ispn-jdbc-ping.xml"}
    ],
    "secrets": [
      {"name": "KC_DB_PASSWORD", "valueFrom": "arn:aws:ssm:..."},
      {"name": "KEYCLOAK_ADMIN_PASSWORD", "valueFrom": "arn:aws:ssm:..."}
    ],
    "healthCheck": {
      "command": ["CMD-SHELL", "curl -f http://localhost:8080/auth/health || exit 1"],
      "interval": 30,
      "timeout": 5,
      "retries": 3
    }
  }]
}
```
Result: ECS can dynamically scale tasks up or down. New containers automatically join the cluster, and terminated containers are gracefully removed, ensuring session data persistence across container restarts.
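For example, adding a third task is a single service update; the new container writes itself into JGROUPSPING and joins the cluster without further configuration. This sketch assumes the cluster and service names used in this deployment; for a Terraform-managed service, the equivalent change is updating desired_count in terraform.tfvars.

```bash
# Scale the Keycloak service to three tasks; the new task discovers peers via JDBC_PING
aws ecs update-service \
  --cluster keycloak-cluster \
  --service keycloak \
  --desired-count 3
```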
Aurora Serverless v2 Configuration
Aurora Serverless v2 offers a PostgreSQL-compatible database with sub-second scaling and granular pay-per-second billing:
1module "aurora_postgresql" {
2 source = "terraform-aws-modules/rds-aurora/aws"
3 version = "~> 8.0"
4
5 name = "keycloak-db"
6 engine = "aurora-postgresql"
7 engine_version = "16.8"
8 instance_class = "db.serverless"
9 instances = {
10 one = {}
11 two = {} # Multi-AZ for high availability
12 }
13
14 serverlessv2_scaling_configuration = {
15 min_capacity = 0.5 # 1 GB RAM - minimal idle cost
16 max_capacity = 2 # 4 GB RAM - handles production traffic
17 }
18
19 vpc_id = module.vpc.vpc_id
20 db_subnet_group_name = aws_db_subnet_group.aurora.name
21 security_group_rules = {
22 keycloak_ingress = {
23 source_security_group_id = aws_security_group.keycloak_ecs.id
24 }
25 }
26
27 storage_encrypted = true
28 apply_immediately = true
29
30 backup_retention_period = 7
31 preferred_backup_window = "03:00-04:00"
32
33 database_name = "keycloak"
34 master_username = "keycloak"
35}
Scaling Behavior: Aurora Serverless v2 continuously monitors database load (CPU, connections, memory) and adjusts ACU capacity in fine-grained increments within seconds. Typical Keycloak workload scaling ranges are:
- Idle/Development: 0.5 ACU (approximately $0.12/hour)
- Normal Production: 1-1.5 ACU (approximately $0.24/hour)
- High Load (authentication storms): 1.5-2 ACU (approximately $0.36/hour)
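Actual capacity consumption can be checked against the ServerlessDatabaseCapacity CloudWatch metric. The sketch below assumes the cluster identifier keycloak-db from the module above and GNU date syntax:

```bash
# Average ACU consumption over the last hour, at 5-minute resolution
aws cloudwatch get-metric-statistics \
  --namespace AWS/RDS \
  --metric-name ServerlessDatabaseCapacity \
  --dimensions Name=DBClusterIdentifier,Value=keycloak-db \
  --start-time "$(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ)" \
  --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
  --period 300 \
  --statistics Average
```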
ECS Service Configuration
The ECS service is responsible for managing task placement and continuous health monitoring:
1resource "aws_ecs_service" "keycloak" {
2 name = "keycloak"
3 cluster = aws_ecs_cluster.main.id
4 task_definition = aws_ecs_task_definition.keycloak.arn
5 desired_count = var.desired_count # 2 for HA
6
7 launch_type = "FARGATE"
8 platform_version = "LATEST"
9
10 deployment_maximum_percent = 200 # Allow 2x capacity during updates
11 deployment_minimum_healthy_percent = 100 # Always maintain full capacity
12 health_check_grace_period_seconds = 600 # Allow time for Keycloak startup
13
14 network_configuration {
15 subnets = module.vpc.private_subnets
16 security_groups = [aws_security_group.keycloak_ecs.id]
17 }
18
19 load_balancer {
20 target_group_arn = aws_lb_target_group.keycloak.arn
21 container_name = "keycloak"
22 container_port = 8080
23 }
24
25 depends_on = [aws_lb_listener.https]
26}
Deployment Strategy: This configuration ensures zero-downtime updates through the following sequence:
- New tasks are initiated, temporarily increasing capacity to 200%.
- New tasks successfully pass health checks (Keycloak startup is allotted 600 seconds).
- Traffic is progressively diverted to the new tasks.
- Old tasks are gracefully drained and terminated.
- The system returns to its stable state of 100% capacity (2 tasks).
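The rollout can be observed from the CLI while it runs; a sketch using the cluster and service names assumed throughout this deployment:

```bash
# Watch the rolling deployment: PRIMARY is the new task set, ACTIVE is being drained
aws ecs describe-services \
  --cluster keycloak-cluster \
  --services keycloak \
  --query 'services[0].deployments[].{status:status,desired:desiredCount,running:runningCount,state:rolloutState}' \
  --output table
```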
Deployment Walkthrough
This section provides a comprehensive, step-by-step guide for deploying the Keycloak infrastructure.
Prerequisites
Local Tools:
- Terraform >= 1.0
- AWS CLI v2, configured with appropriate credentials
- Docker (for building container images)
- jq (for JSON parsing in scripts)
- make (optional, for simplified command execution)
AWS Permissions:
- Creation of VPC, Subnet, Security Group, and NAT Gateway resources.
- Provisioning of RDS Aurora cluster and instances.
- Management of ECS cluster, task definitions, and services.
- Creation of ECR repositories and pushing container images.
- Creation of IAM roles for ECS task execution.
- Read/write access to SSM Parameter Store.
- Creation of ACM certificates (or access to an existing certificate ARN).
Step 1: Clone and Initialize Infrastructure
```bash
# Clone the repository
git clone https://github.com/your-org/terraform-keycloak-aws.git
cd terraform-keycloak-aws

# Create a new environment
cp -r environments/template environments/production
cd environments/production

# Configure terraform.tfvars
cat > terraform.tfvars <<EOF
aws_region         = "us-east-1"
environment        = "production"
vpc_cidr           = "10.0.0.0/16"
availability_zones = ["us-east-1a", "us-east-1b"]

# Start with 0 to avoid costs during initial setup
desired_count = 0

# Use existing ACM certificate or create new one
certificate_arn = "arn:aws:acm:us-east-1:ACCOUNT:certificate/CERT_ID"
domain_name     = "auth.example.com"

# Database configuration
db_instance_class    = "db.serverless"
db_allocated_storage = 20
db_engine_version    = "16.8"

# Aurora Serverless v2 scaling
aurora_serverless_min_capacity = 0.5
aurora_serverless_max_capacity = 2
EOF

# Initialize and create infrastructure (no running tasks yet)
make all
# Or manually:
# terraform init
# terraform plan
# terraform apply
```
Result: The VPC, subnets, NAT gateways, Aurora RDS, ECS cluster, and ALB are successfully provisioned. No ECS tasks are yet operational.
Step 2: Build and Push Container Image
```bash
cd ../../build/keycloak

# Configure environment
export AWS_REGION=us-east-1
export ENV_NAME=production

# Build and push (uses Makefile automation)
make all

# Or manually:
# aws ecr get-login-password --region us-east-1 | \
#   docker login --username AWS --password-stdin $(aws sts get-caller-identity --query Account --output text).dkr.ecr.us-east-1.amazonaws.com
# docker build -t keycloak-mcp:latest .
# docker tag keycloak-mcp:latest ECR_URL:latest
# docker push ECR_URL:latest
```
Result: A custom Keycloak container image, configured with JDBC_PING clustering, is built and pushed to Amazon ECR.
Step 3: Scale Up ECS Service
```bash
cd ../../environments/production

# Update terraform.tfvars
sed -i 's/desired_count = 0/desired_count = 2/' terraform.tfvars

# Apply changes
make update
# Or: terraform apply
```
Result: Two Keycloak containers are launched within private subnets, establish a cluster via JDBC_PING, and register with the ALB. Keycloak becomes accessible at https://auth.example.com/auth.
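A quick smoke test, assuming the health endpoint is reachable through the ALB at the path configured in this deployment:

```bash
# Confirm the ALB and Keycloak respond; the exact response shape may vary by Keycloak version
curl -fsS https://auth.example.com/auth/health | jq .
# Expected: {"status": "UP", ...}
```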
Step 4: Create Admin User
```bash
# Get admin password from SSM Parameter Store
ADMIN_PASSWORD=$(aws ssm get-parameter \
  --name "/keycloak/production/admin_password" \
  --with-decryption \
  --query Parameter.Value \
  --output text)

echo "Admin URL: https://auth.example.com/auth/admin"
echo "Username: admin"
echo "Password: ${ADMIN_PASSWORD}"
```
Log in to the Keycloak admin console to verify the deployment's integrity.
Step 5: Configure MCP OAuth Realm
```bash
cd mcp-oauth

# Auto-generate configuration from parent deployment
./init-from-parent.sh --mcp-server-url "https://mcp-server.example.com/mcp"

# Review generated terraform.tfvars
cat terraform.tfvars

# Deploy MCP OAuth realm (Terraform + REST API)
make deploy

# Verify Dynamic Client Registration
./enable-dcr.sh
```
Result: The MCP realm is successfully created and configured with:
- An mcp:run client scope, incorporating the audience mapper.
- Properly configured realm default scopes.
- Enabled and verified Dynamic Client Registration.
- Removal of the Trusted Hosts policy.
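Before pointing a real MCP client at the realm, the OIDC discovery document offers a quick sanity check; with the /auth relative path used in this deployment it should advertise a registration endpoint and S256 PKCE support:

```bash
# Inspect the mcp realm's discovery metadata (RFC 8414 / OIDC Discovery)
curl -fsS "https://auth.example.com/auth/realms/mcp/.well-known/openid-configuration" | \
  jq '{issuer, registration_endpoint, code_challenge_methods_supported}'
```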
Step 6: Test with MCP Client
Configure an MCP client (e.g., Claude Code, Cursor, VS Code) to establish a connection:
MCP Server Configuration Example:
```json
{
  "servers": {
    "my-mcp-server": {
      "url": "https://mcp-server.example.com/mcp",
      "auth": {
        "type": "oauth",
        "authorizationUrl": "https://auth.example.com/auth/realms/mcp/protocol/openid-connect/auth",
        "tokenUrl": "https://auth.example.com/auth/realms/mcp/protocol/openid-connect/token"
      }
    }
  }
}
```
Expected Flow:
- The MCP client discovers Authorization Server metadata from Keycloak's OIDC discovery endpoint.
- The client dynamically registers via DCR, obtaining a client_id.
- The client initiates the Authorization Code flow with PKCE.
- The user authenticates through Keycloak.
- The client receives a JWT access token containing aud: "https://mcp-server.example.com/mcp".
- The MCP server validates the token's aud claim and grants access to MCP resources.
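If the MCP server implements RFC 9728, the discovery step in this flow can also be reproduced by hand. The well-known path below is the one defined by the RFC; whether and where it is served depends on the MCP server implementation:

```bash
# Fetch the MCP server's Protected Resource Metadata (RFC 9728), if exposed
curl -fsS "https://mcp-server.example.com/.well-known/oauth-protected-resource" | \
  jq '{resource, authorization_servers}'
```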
Deployment Considerations
High Availability Configuration
The deployment is engineered for high availability, incorporating redundancy across multiple layers:
Multi-AZ Distribution:
- ECS Tasks: Distributed across two or more Availability Zones using ECS placement strategies.
- Aurora: Multi-AZ cluster with automated failover, achieving a Recovery Time Objective (RTO) typically under 60 seconds.
- ALB: Employs cross-zone load balancing by default.
- NAT Gateways: One per AZ (total of two) for independent outbound connectivity.
Failure Scenarios:
- Single ECS Task Failure: The ALB reroutes traffic to healthy tasks, and ECS automatically initiates a replacement.
- Availability Zone Failure: Aurora promotes a replica, and ECS tasks in other AZs continue to serve traffic.
- Database Primary Failure: Aurora automatically fails over to a replica in a different AZ.
Zero-Downtime Deployments:
The deployment_minimum_healthy_percent = 100 configuration ensures continuous full capacity during service updates:
```text
Initial state:    [Task1] [Task2]                    (100% capacity)
Update triggered: [Task1] [Task2] [Task3] [Task4]    (200% capacity)
Health checks:    [Task1] [Task2] [Task3✓] [Task4✓]
Drain old:        [Task3✓] [Task4✓]
Final state:      [Task3] [Task4]                    (100% capacity)
```
Monitoring and Observability
CloudWatch Logs:
All container output is streamed to CloudWatch Logs with configurable retention policies:
1resource "aws_cloudwatch_log_group" "keycloak" {
2 name = "/ecs/keycloak"
3 retention_in_days = 30
4}
Key Logs to Monitor:
- Authentication failures: Search for WARN.*org.keycloak.events.
- Database errors: Search for ERROR.*Hibernate.
- Clustering issues: Search for WARN.*JGroups or JDBC_PING.
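These searches can be run directly against the log group defined above with the AWS CLI; the filter pattern here is illustrative:

```bash
# Search the last hour of Keycloak logs for authentication-event warnings
aws logs filter-log-events \
  --log-group-name /ecs/keycloak \
  --filter-pattern '"WARN" "org.keycloak.events"' \
  --start-time $(( ($(date +%s) - 3600) * 1000 ))
```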
Container Insights:
ECS Container Insights should be enabled for comprehensive cluster-level metrics:
1resource "aws_ecs_cluster" "main" {
2 name = "keycloak-cluster"
3
4 setting {
5 name = "containerInsights"
6 value = "enabled"
7 }
8}
This provides metrics for:
- CPU and memory utilization per task.
- Network ingress/egress.
- Task startup and health check durations.
Keycloak Built-in Endpoints:
- Health: https://auth.example.com/auth/health → Returns {"status": "UP"}.
- Metrics (Prometheus): https://auth.example.com/auth/metrics → Offers detailed application metrics.
- Server Info: Accessible via Admin Console → Server Info; provides version, clustering status, and memory usage.
Recommended Alarms:
1resource "aws_cloudwatch_metric_alarm" "ecs_cpu_high" {
2 alarm_name = "keycloak-cpu-high"
3 comparison_operator = "GreaterThanThreshold"
4 evaluation_periods = "2"
5 metric_name = "CPUUtilization"
6 namespace = "AWS/ECS"
7 period = "300"
8 statistic = "Average"
9 threshold = "80"
10 alarm_description = "ECS CPU utilization is too high"
11 alarm_actions = [aws_sns_topic.alerts.arn]
12
13 dimensions = {
14 ClusterName = aws_ecs_cluster.main.name
15 ServiceName = aws_ecs_service.keycloak.name
16 }
17}
18
19resource "aws_cloudwatch_metric_alarm" "aurora_cpu_high" {
20 alarm_name = "keycloak-db-cpu-high"
21 comparison_operator = "GreaterThanThreshold"
22 evaluation_periods = "2"
23 metric_name = "CPUUtilization"
24 namespace = "AWS/RDS"
25 period = "300"
26 statistic = "Average"
27 threshold = "80"
28 alarm_description = "Aurora CPU utilization is too high"
29 alarm_actions = [aws_sns_topic.alerts.arn]
30
31 dimensions = {
32 DBClusterIdentifier = module.aurora_postgresql.cluster_id
33 }
34}
Security Best Practices
Encryption Everywhere:
- ALB: Enforces TLS 1.2+ with ACM certificates.
- RDS: AES-256 encryption at rest, managed by KMS.
- ECR: Encrypted container images.
- Parameter Store: SecureString encryption for sensitive credentials.
- In-transit: All external communication is secured via HTTPS.
Secrets Management:
Sensitive values are securely stored in AWS Systems Manager Parameter Store:
```bash
# Store admin password
aws ssm put-parameter \
  --name "/keycloak/production/admin_password" \
  --value "$(openssl rand -base64 32)" \
  --type SecureString

# Store database password
aws ssm put-parameter \
  --name "/keycloak/production/db_password" \
  --value "$(openssl rand -base64 32)" \
  --type SecureString
```
ECS tasks retrieve secrets at runtime through IAM role permissions, eliminating hardcoded credentials in task definitions or source code.
Network Isolation:
- ECS Tasks: Confined to private subnets (no direct internet access).
- RDS: Located in private subnets, accessible only from the ECS security group.
- ALB: Deployed in public subnets (internet-facing).
- NAT Gateways: Reside in public subnets, providing outbound-only internet access for private subnets.
Security Group Rules:
```hcl
# ALB security group - allow HTTPS from anywhere
resource "aws_security_group" "alb" {
  name_prefix = "keycloak-alb-"
  vpc_id      = module.vpc.vpc_id

  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

# ECS security group - allow traffic only from ALB
resource "aws_security_group" "keycloak_ecs" {
  name_prefix = "keycloak-ecs-"
  vpc_id      = module.vpc.vpc_id

  ingress {
    from_port       = 8080
    to_port         = 8080
    protocol        = "tcp"
    security_groups = [aws_security_group.alb.id]
  }

  # Allow clustering between ECS tasks
  ingress {
    from_port = 7800
    to_port   = 7800
    protocol  = "tcp"
    self      = true
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

# RDS security group - allow traffic only from ECS
resource "aws_security_group" "aurora" {
  name_prefix = "keycloak-aurora-"
  vpc_id      = module.vpc.vpc_id

  ingress {
    from_port       = 5432
    to_port         = 5432
    protocol        = "tcp"
    security_groups = [aws_security_group.keycloak_ecs.id]
  }
}
```
Troubleshooting Common Issues
Issue 1: DCR Clients Missing mcp:run Scope
Symptoms: Dynamically registered clients receive tokens without the aud claim, or with aud: [] (an empty array).
Root Cause: The mcp:run scope is not configured as a realm default scope, or the Client Registration Policy does not permit it.
Solution:
```bash
# Verify realm default scopes
curl -s "${KEYCLOAK_URL}/admin/realms/mcp" \
  -H "Authorization: Bearer ${ADMIN_TOKEN}" | \
  jq '.defaultDefaultClientScopes'

# Expected output should include "mcp:run".
# If missing, recreate the realm or manually add via the admin console:
# Realm Settings → Client Scopes → Default Client Scopes → Add "mcp:run".

# Verify Client Registration Policy allows mcp:run
cd environments/production/mcp-oauth
./fix-allowed-scopes.sh
```
Issue 2: Trusted Hosts Policy Blocking Custom Redirect URIs
Symptoms: Dynamic Client Registration succeeds, but authorization requests fail with an "Invalid redirect URI" error for schemes like cursor:// or vscode://.
Root Cause: Keycloak's Trusted Hosts policy defaults to rejecting non-HTTPS schemes.
Solution:
```bash
cd environments/production/mcp-oauth
./disable-trusted-hosts.sh
```
Verify within the admin console: Client Registration → Policies → (confirm "Trusted Hosts" policy is absent).
Issue 3: Clustering Failures ("Split Brain")
Symptoms: Users experience inconsistent authentication states, or sessions unexpectedly expire. Logs display repeated VIEW_CHANGE messages or WARN.*JDBC_PING errors.
Root Cause: JDBC_PING communication issues, typically due to:
- Database connectivity problems.
- Security group rules blocking port 7800.
- Multiple tasks simultaneously attempting to write to the JGROUPSPING table.
Solution:
```bash
# Check JGROUPSPING table
psql -h AURORA_ENDPOINT -U keycloak -d keycloak -c "SELECT * FROM JGROUPSPING;"

# The table should display one row per running ECS task.
# If empty or stale, investigate:
# 1. Ensure the security group permits port 7800 between ECS tasks.
# 2. Verify the KC_DB_URL environment variable is correct.
# 3. Confirm database credentials are valid.

# Restart the ECS service to force a cluster re-join
aws ecs update-service \
  --cluster keycloak-cluster \
  --service keycloak \
  --force-new-deployment
```
Issue 4: ALB Health Checks Failing
Symptoms: ECS tasks initiate, pass initial health checks, but then repeatedly fail and restart.
Root Cause: The health check path /auth/health may not respond promptly during Keycloak startup (which can take 60-120 seconds), or the health check interval is too aggressive.
Solution:
Increase the health check grace period:
1resource "aws_ecs_service" "keycloak" {
2 # ...
3 health_check_grace_period_seconds = 600 # 10 minutes
4}
Alternatively, utilize a more reliable health check path:
1resource "aws_lb_target_group" "keycloak" {
2 # ...
3 health_check {
4 enabled = true
5 path = "/auth/realms/master" # More reliable than /auth/health
6 port = "traffic-port"
7 protocol = "HTTP"
8 timeout = 5
9 interval = 30
10 healthy_threshold = 2
11 unhealthy_threshold = 3
12 matcher = "200"
13 }
14}
Conclusion
This guide has elucidated the process of configuring Keycloak as an MCP-compatible OAuth 2.1 authorization server. By strategically leveraging Keycloak's protocol mapper extensibility and realm configuration, this solution effectively addresses the platform's native lack of RFC 8707 support, while enabling a zero-configuration experience for MCP clients.
Key Implementation Takeaways:
- Audience Mapper Workaround: Custom protocol mappers are employed to inject the requisite aud claim into JWT access tokens, compensating for Keycloak's lack of native RFC 8707 support.
- Realm Default Scopes: Configuring mcp:run as a realm default scope ensures that all dynamically registered clients automatically inherit the audience mapper.
- Two-Phase Configuration: A hybrid approach, combining Terraform resources with REST API configuration, works around the current limitations of the Keycloak Terraform provider concerning Client Registration Policies.
- Automated Infrastructure: Terraform modules facilitate repeatable AWS deployments, incorporating ECS Fargate, Aurora PostgreSQL, and essential networking components.
When to Choose Keycloak for MCP:
- Self-hosted Requirements: When on-premises or private cloud deployment is mandated.
- Keycloak-Specific Features: When advanced features like user federation, identity brokering, or custom authentication flows are necessary.
- Cost Considerations: For scenarios prioritizing open-source licensing with infrastructure-only costs.
- Customization Needs: When full control over authentication flows and user management is essential.
Alternative Solutions for Consideration:
For organizations preferring managed identity solutions, IDaaS providers offering native RFC 8707 support should be evaluated:
- Amazon Cognito: Provides native RFC 8707 support, though it requires custom implementation for the DCR endpoint.
- Ping Identity: Offers comprehensive RFC 8707 and RFC 7591 compliance (as detailed in the MCP authorization compatibility analysis).
For a deeper technical understanding of the OAuth 2.1 specifications and RFC requirements underpinning MCP authorization, refer to my comprehensive analysis: Technical Deconstruction of MCP Authorization: A Deep Dive into OAuth 2.1 and IETF RFC Specifications.
The complete Terraform configuration, Dockerfile, and deployment automation scripts are available in the terraform-keycloak-aws repository.
Resources & References
Official Documentation
- Keycloak Documentation: Official Keycloak server administration and configuration guide.
- AWS ECS Best Practices: AWS guidance on container orchestration with ECS Fargate.
- Aurora Serverless v2 Documentation: Detailed information on Aurora Serverless scaling and configuration.
OAuth and MCP Specifications
- RFC 7591 - OAuth 2.0 Dynamic Client Registration: Dynamic client registration protocol specification.
- RFC 7636 - Proof Key for Code Exchange (PKCE): PKCE specification for authorization code flow protection.
- RFC 8707 - Resource Indicators for OAuth 2.0: Audience restriction via resource parameter.
- Model Context Protocol Specification: Official MCP specification and requirements.
Related Articles
- Technical Deconstruction of MCP Authorization: A Deep Dive into OAuth 2.1 and IETF RFC Specifications: Comprehensive analysis of OAuth 2.1 RFCs and MCP requirements.
- Building an MCP Agentic Chatbot on AWS: Patterns for MCP server implementation on AWS.
- Using MCP Client OAuthClientProvider with AWS Agentcore: Practical OAuth client implementation with MCP.
GitHub Repository
- terraform-keycloak-aws: Complete source code for this deployment, including Terraform modules, Dockerfile, and deployment automation scripts.