flashiam12/migration-pipeline

Migration Pipeline on Confluent Cloud

Transfer CDC data from a MySQL database on RDS to BigQuery in a real-time, secure, and scalable way

This repository contains Terraform configurations for a CDC data pipeline from an RDS MySQL database to BigQuery using Confluent Cloud. It provides infrastructure as code (IaC) to manage resources and deploy applications on AWS, GCP, and Confluent Cloud, using private networking and Stream Governance.

Architecture

Architecture Diagram

  • AWS: RDS, RDS Proxy, PrivateLink Service, Network Load Balancer, Secrets Manager, EC2, VPC Endpoint, Route 53
  • GCP: BigQuery, IAM
  • Confluent: PLATT Networking on AWS, BigQuery & MySQL CDC V2 Connectors, Schema Registry, Enterprise Kafka Cluster

Pre-requisites

  • Terraform: Version
  • Cloud Provider CLI: (AWS, GCP & Confluent Cloud)
  • Cloud Provider Access: (AWS, GCP & Confluent Cloud)
    • AWS: ACCESS_KEY & SECRET_KEY with permissions for Network Administrator, RDS, KMS, Secrets Manager, and EC2
    • GCP: Service Account - BigQuery writer, IAM Editor
    • Confluent: Cloud API KEY & SECRET
  • Linux Tools: mysql client, git, VS Code
  • Terraform Cloud/State backend (optional): If you store the Terraform state in a remote backend (e.g., AWS S3 or HashiCorp Consul), ensure that it is set up before running Terraform.
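
Before running Terraform, it can help to confirm that the command-line tools listed above are on the PATH. The following is a minimal sketch, not part of this repository. It assumes the binaries are named `terraform`, `aws`, `gcloud`, `confluent`, `mysql`, and `git`; `gcloud` and `confluent` are the usual names of the GCP and Confluent Cloud CLIs, but this README does not name them.

```shell
# missing_tools: print the name of every argument that is not found on PATH.
# Hypothetical helper for illustration; not part of this repository.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Example call (binary names are assumptions, adjust to your setup):
#   missing_tools terraform aws gcloud confluent mysql git
```

An empty result means every listed tool was found; any printed name needs to be installed first.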

Initialization

Scaffolding

    # Create a parameter group that enables row-based binary logging, which MySQL CDC requires
    aws rds create-db-parameter-group \
        --db-parameter-group-name confluent-mysql8 \
        --db-parameter-group-family mysql8.0 \
        --description "Parameter group binlog setting for cdc"

    aws rds modify-db-parameter-group \
        --db-parameter-group-name confluent-mysql8 \
        --parameters "ParameterName=binlog_format,ParameterValue=ROW,ApplyMethod=immediate"

    # Attach the parameter group to the instance, then reboot so it takes effect
    aws rds modify-db-instance \
        --db-instance-identifier <YOUR_DB_INSTANCE_IDENTIFIER> \
        --db-parameter-group-name confluent-mysql8 \
        --apply-immediately

    aws rds reboot-db-instance --db-instance-identifier <YOUR_DB_INSTANCE_IDENTIFIER>

    # Provision the pipeline with Terraform
    terraform init
    terraform plan
    terraform apply
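
After the reboot, it is worth confirming that the instance actually reports row-based binary logging before starting the CDC connector. This is a minimal sketch, assuming the `mysql` client from the prerequisites and two placeholder environment variables, `DB_HOST` and `DB_USER` (both hypothetical names, not defined by this repository).

```shell
# current_binlog_format: print the server's global binlog_format.
# DB_HOST and DB_USER are placeholder environment variables (assumptions);
# -N drops the column header and -B prints plain batch output.
current_binlog_format() {
  mysql -h "$DB_HOST" -u "$DB_USER" -p -N -B -e 'SELECT @@global.binlog_format'
}

# binlog_ready: succeed only if the given value is ROW (case-insensitive).
binlog_ready() {
  case "$1" in
    [Rr][Oo][Ww]) return 0 ;;
    *) echo "binlog_format is '$1', expected ROW" >&2; return 1 ;;
  esac
}

# Example:
#   binlog_ready "$(current_binlog_format)" && echo "binlog OK for CDC"
```

Keeping the check separate from the query makes it easy to reuse with a value obtained another way, for example from a console session.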

Nginx proxy setup

Confluent Cloud Proxy Setup with NGINX

This section outlines the steps to install and configure an NGINX proxy that routes traffic to your Confluent Cloud cluster. The proxy reads the Server Name Indication (SNI) hostname from each incoming TLS connection and forwards the connection to that hostname on ports 443 and 9092.

Prerequisites

  • A Virtual Machine (VM) in your VPC or VNet that is connected to Confluent Cloud.
  • Access to the Confluent Cloud Console.
  • Basic Linux command-line knowledge.

Installation and Configuration

  1. Provision a VM:

    • Create a VM in your VPC or VNet that has network connectivity to your Confluent Cloud environment. Default VM properties are sufficient.
  2. Install NGINX:

    • Connect to your VM via SSH.
    • For Ubuntu/Debian:
      sudo apt update
      sudo apt install nginx
    • For RedHat:
      sudo yum install nginx
  3. Test NGINX Configuration:

    • Verify the NGINX installation and configuration syntax (root privileges are usually needed to read the log and PID files):
      sudo nginx -t
  4. Enable ngx_stream_module (if needed):

    • If you encounter an error related to ngx_stream_module.so, locate the module. Common locations are /usr/lib/nginx/modules or /usr/lib64/nginx/modules.
    • Add the following line to the top of /etc/nginx/nginx.conf:
      load_module /usr/lib/nginx/modules/ngx_stream_module.so; #adjust the path if needed
    • Re-test the configuration:
      sudo nginx -t
  5. Configure NGINX for SNI Routing:

    • Replace the contents of /etc/nginx/nginx.conf with the following (if you added the load_module line in step 4, keep it as the first line of the file):
      events {}
      stream {
        # Route each connection to the hostname the client sent as SNI
        map $ssl_preread_server_name $targetBackend {
          default $ssl_preread_server_name;
        }
      
        # Kafka traffic
        server {
          listen 9092;
      
          proxy_connect_timeout 1s;
          proxy_timeout 7200s;
      
          # Local systemd-resolved stub; see step 6 if resolution fails
          resolver 127.0.0.53;
      
          proxy_pass $targetBackend:9092;
          ssl_preread on;
        }
      
        # HTTPS traffic (REST and other endpoints)
        server {
          listen 443;
      
          proxy_connect_timeout 1s;
          proxy_timeout 7200s;
      
          resolver 127.0.0.53;
      
          proxy_pass $targetBackend:443;
          ssl_preread on;
        }
      
        # Trailing space after $remote_addr keeps the log fields separated
        log_format stream_routing '[$time_local] remote address $remote_addr '
                                  'with SNI name "$ssl_preread_server_name" '
                                  'proxied to "$upstream_addr" '
                                  '$protocol $status $bytes_sent $bytes_received '
                                  '$session_time';
        access_log /var/log/nginx/stream-access.log stream_routing;
      }
    • Important: Do not replace $targetBackend. It is an NGINX variable that the map block fills with the SNI hostname, not a placeholder.
  6. Verify DNS Resolver:

    • Test the resolver configuration:
      nslookup <ConfluentCloud_BootstrapHostname> 127.0.0.53
      • Replace <ConfluentCloud_BootstrapHostname> with your Confluent Cloud bootstrap hostname.
    • Check /var/log/nginx/error.log for resolver errors.
    • If DNS resolution fails, adjust the resolver directive in both server blocks:
      • AWS: resolver 169.254.169.253;
      • Azure: resolver 168.63.129.16;
      • Google Cloud: resolver 169.254.169.254;
  7. Restart NGINX:

    • Apply the changes:
      sudo systemctl restart nginx
  8. Verify NGINX Status:

    • Ensure NGINX is running:
      sudo systemctl status nginx
  9. Configure DNS Resolution:

    • On your local machine (not the proxy VM), update your DNS configuration (e.g., /etc/hosts) to route Confluent Cloud traffic through the proxy.
    • Add lines similar to the following, replacing placeholders with your VM's public IP and Confluent Cloud endpoints:
      <Public IP Address of VM instance> <Kafka-REST-Endpoint>
      <Public IP Address of VM instance> <Flink-private-endpoint>
      
      • Retrieve the <Kafka-REST-Endpoint> from the Confluent Cloud Console.
      • The Kafka bootstrap and REST endpoints often share the same hostname, differing only in port number.
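
The hosts-file entries from step 9 can be generated rather than typed by hand. The sketch below is illustrative only: `hosts_entries` and `check_sni_route` are hypothetical helper names, and the IP address and hostnames in the usage note are documentation placeholders, not real endpoints. The second helper assumes `openssl` is installed on the local machine.

```shell
# hosts_entries: print one "<ip> <hostname>" line per hostname.
# Usage: hosts_entries <proxy-public-ip> <hostname>...
hosts_entries() {
  proxy_ip="$1"
  shift
  for host in "$@"; do
    printf '%s %s\n' "$proxy_ip" "$host"
  done
}

# check_sni_route: open a TLS connection to the proxy on port 443 while
# sending the Confluent hostname as SNI. A completed handshake suggests the
# proxy forwarded the connection; it does not prove Kafka access works.
# Usage: check_sni_route <proxy-public-ip> <hostname>
check_sni_route() {
  openssl s_client -connect "$1:443" -servername "$2" </dev/null
}
```

For example, `hosts_entries 203.0.113.10 kafka.example.internal flink.example.internal | sudo tee -a /etc/hosts` appends two lines to the hosts file, after which `check_sni_route 203.0.113.10 kafka.example.internal` can be used as a quick connectivity check.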

Notes

  • Ensure your VM's security groups allow inbound traffic on ports 443 and 9092.
  • The proxy_timeout is set to 7200 seconds (2 hours). Adjust as needed.
  • This setup assumes your Confluent Cloud cluster uses standard ports 443 and 9092.
  • If you are using a firewall on the VM, ensure it allows connections to the Confluent Cloud cluster.
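
If the proxy VM runs on AWS EC2, the inbound rules from the first note could be added with the AWS CLI. This is a sketch under that assumption; `allow_proxy_ports` is a hypothetical helper, and the security group ID and CIDR range are values you must supply. Restricting the CIDR to your own client network is generally safer than opening the ports to all addresses.

```shell
# allow_proxy_ports: add inbound TCP rules for ports 443 and 9092.
# Usage: allow_proxy_ports <security-group-id> <client-cidr>
allow_proxy_ports() {
  for port in 443 9092; do
    aws ec2 authorize-security-group-ingress \
      --group-id "$1" \
      --protocol tcp \
      --port "$port" \
      --cidr "$2"
  done
}

# Example (placeholder values):
#   allow_proxy_ports sg-0123456789abcdef0 198.51.100.0/24
```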

Teardown

    terraform destroy
