Terraform Code Generator Using Ollama and CodeGemma

In modern cloud infrastructure development, writing Terraform code manually can be time-consuming and error-prone—especially for teams that frequently deploy modular and scalable environments. There’s a growing need for tools that:

  • Allow natural language input to describe infrastructure requirements.
  • Automatically generate clean, modular Terraform code.
  • Integrate with cloud authentication mechanisms.
  • Save and organize code into execution-ready files.

This model bridges the gap between human-readable infrastructure descriptions and machine-executable Terraform scripts, making infrastructure-as-code more accessible and efficient. To build this model, we utilize CodeGemma, a lightweight AI model optimized for coding tasks, which runs locally via Ollama.


In this blog, we explore how to build a Terraform code generator web app using:

  • Flask for the web interface
  • Ollama’s CodeGemma model for AI-powered code generation
  • Azure CLI authentication using service principal credentials
  • Modular Terraform file creation based on user queries

This tool empowers developers to describe infrastructure needs in natural language and receive clean, modular Terraform code ready for deployment.

Technologies Used

CodeGemma

CodeGemma is a family of lightweight, open-source models optimized for coding tasks. It supports code generation from natural language.

Running CodeGemma locally via Ollama means:

  • No cloud dependency: You don’t need to send data to external APIs.
  • Faster response times: Ideal for iterative development.
  • Privacy and control: Your infrastructure queries and generated code stay on your machine.
  • Offline capability: Ideal for use in restricted or secure environments.
  • Zero cost: Since the model runs locally, there’s no usage fee or subscription required—unlike cloud-based AI services.

Flask

We chose Flask as the web framework for this project because of its:

  • Simplicity and flexibility: Flask is a lightweight and easy-to-set-up framework, making it ideal for quick prototyping.

Initial Setup

  • Install Python.
winget install Python.Python.3
  • Install Ollama, then pull and run the CodeGemma model locally.
ollama pull codegemma:7b
ollama run codegemma:7b
  • Install the Ollama Python library so the script can call CodeGemma from Python (a quick sanity check follows).
pip install ollama
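
Before moving on, it can help to confirm that the local model is reachable from Python. A minimal check, assuming codegemma:7b has already been pulled, looks like this:

# quick_check.py - verify the local CodeGemma model responds via the Ollama Python client
from ollama import generate

response = generate(
    model='codegemma:7b',
    prompt='Write a Terraform azurerm provider block.'
)

# The generated text is returned under the 'response' key
print(response.get('response', '').strip())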

Folder Structure


Code

from flask import Flask, jsonify, request, render_template_string
from ollama import generate
import subprocess
import re
import os

app = Flask(__name__)
# Azure credentials
CLIENT_ID = "Enter your credentials here."
CLIENT_SECRET = "Enter your credentials here."
TENANT_ID = "Enter your credentials here."

auth_status = {"status": "not_authenticated", "details": ""}
input_fields_html = ""
def authenticate_with_azure():
    try:
        result = subprocess.run(
            ["cmd.exe", "/c", "C:\\Program Files\\Microsoft SDKs\\Azure\\CLI2\\wbin\\az.cmd",
             "login", "--service-principal", "-u", CLIENT_ID, "-p", CLIENT_SECRET, "--tenant", TENANT_ID],
            capture_output=True, text=True, check=True
        )
        auth_status["status"] = "success"
        auth_status["details"] = result.stdout
    except subprocess.CalledProcessError as e:
        auth_status["status"] = "failed"
        auth_status["details"] = e.stderr
    except Exception as ex:
        auth_status["status"] = "terminated"
        auth_status["details"] = str(ex)

@app.route('/', methods=['GET', 'POST'])
def home():
    terraform_code = ""
    user_query = ""
    input_fields_html = ""

    if request.method == 'POST':
        user_query = request.form.get('query', '')

        base_prompt = (
            "Generate modular Terraform code using best practices. "
            "Create separate files for main.tf, vm.tf, vars.tf, terraform.tfvars, subnet.tf, kubernetes_cluster etc. "
            "Ensure the code is clean and execution-ready. "
            "Use markdown headers like ## Main.tf: followed by code blocks."
        )

        full_prompt = base_prompt + "\n" + user_query
        try:
            response_cleaned = generate(model='codegemma:7b', prompt=full_prompt)
            terraform_code = response_cleaned.get('response', '').strip()
        except Exception as e:
            terraform_code = f"# Error generating code: {str(e)}"

        # Prepend the azurerm provider block so the generated code is deployable as-is
        provider_block = f"""
        provider "azurerm" {{
          features {{}}
          subscription_id = "Enter your credentials here."
          client_id       = "{CLIENT_ID}"
          client_secret   = "{CLIENT_SECRET}"
          tenant_id       = "{TENANT_ID}"
        }}"""
        terraform_code = provider_block + "\n\n" + terraform_code

        # Save the full response to main.tf as a backup copy
        with open('main.tf', 'w', encoding='utf-8') as f:
            f.write(terraform_code)


        # Create output directory
        output_dir = r"C:\Users\riya.achkarpohre\Desktop\AI\test7\terraform_output"
        os.makedirs(output_dir, exist_ok=True)

        # Define output paths
        paths = {
            "main.tf": os.path.join(output_dir, "Main.tf"),
            "vm.tf": os.path.join(output_dir, "VM.tf"),
            "subnet.tf": os.path.join(output_dir, "Subnet.tf"),
            "vpc.tf": os.path.join(output_dir, "VPC.tf"),
            "vars.tf": os.path.join(output_dir, "Vars.tf"),
            "terraform.tfvars": os.path.join(output_dir, "Terraform.tfvars"),
            "kubernetes_cluster.tf": os.path.join(output_dir, "kubernetes_cluster.tf")
        }

        # Split response using markdown headers
        sections = re.split(r'##\s*(.*?)\.tf:\s*\n+```(?:terraform)?\n', terraform_code)

        # sections = ['', 'Main', '<code>', 'VM', '<code>', ...]
        for i in range(1, len(sections), 2):
            filename = sections[i].strip().lower() + '.tf'
            code_block = sections[i + 1].strip()

            # Remove closing backticks if present
            code_block = re.sub(r'```$', '', code_block)

            # Save to file if path is defined
            if filename in paths:
                with open(paths[filename], 'w', encoding='utf-8') as f:
                    f.write(code_block)
                    print(f"\n--- Written: {filename} ---")
                    print(code_block)
            else:
                print(f"\n--- Skipped unknown file: {filename} ---")

        return render_template_string(f"""
        <html>
        <head><title>Terraform Generator</title></head>
        <body>
            <form method="post">
                <center>
                    <label>Enter your query:</label><br>
                    <textarea name="query" rows="6" cols="80" placeholder="Describe your infrastructure requirement here..."></textarea><br><br>
                    <input type="submit" value="Generate Terraform">
                </center>
            </form>
            <hr>
            <h2>Generated Terraform Code:</h2>
            <pre>{terraform_code}</pre>
            <h2>Enter values for the required variables:</h2>
            <h2>Authentication Status:</h2>
            <pre>Status: {auth_status['status']}\n{auth_status['details']}</pre>
        </body>
        </html>
        """)

    # Initial GET request
    return render_template_string('''
    <html>
    <head><title>Terraform Generator</title></head>
    <body>
        <form method="post">
            <center>
                <label>Enter your query:</label><br>
                <textarea name="query" rows="6" cols="80" placeholder="Describe your infrastructure requirement here..."></textarea><br><br>
                <input type="submit" value="Generate Terraform">
            </center>
        </form>
    </body>
    </html>
    ''')

authenticate_with_azure()
@app.route('/authenticate', methods=['POST'])
def authenticate():
    authenticate_with_azure()
    return jsonify(auth_status)

if __name__ == '__main__':
    app.run(debug=True)

Open Visual Studio, create a new file named file.py, and paste the code into it. Then, open the terminal and run the script by typing:

python file.py

Flask Development Server


Code Structure Explanation

  • Azure Authentication
    • The app uses the Azure CLI (az.cmd) via Python’s subprocess.run() to authenticate with Azure using a service principal. This ensures secure access to Azure resources before generating Terraform code.
  • User Query Handling
    • When a user submits a query through the web form, it is captured using:
user_query = request.form.get('query', '')
  • Prompt Construction
    • The query is appended to a base prompt that instructs CodeGemma to generate modular Terraform code using best practices. This prompt includes instructions to split the code into files, such as main.tf, vm.tf, subnet.tf, etc.
  • Code Generation via CodeGemma
    • The prompt is sent to the CodeGemma:7b model using:
response_cleaned = generate(model='codegemma:7b', prompt=full_prompt)
  • Saving the Full Response
    • The entire generated Terraform code is first saved to a main.tf file as a backup.
  • Output Directory Setup
    • A specific output directory is created using os.makedirs() to store the split .tf files:
output_dir = r"C:\Users\riya.achkarpohre\Desktop\AI\test7\terraform_output"
  • File Path Mapping
    • A dictionary maps expected filenames (such as main.tf and vm.tf) to their respective output paths. This ensures each section of the generated code is saved correctly.
  • Code Splitting Logic
    • The response is split using a regex-based approach, based on markdown headers like ## main.tf: followed by Terraform code blocks. This helps isolate each module (a short demonstration follows this list).
  • Conditional File Writing
    • For each split section, the code checks if the filename exists in the predefined path dictionary:
      • If defined, the code block is written to the corresponding file.
      • If not defined, the section is skipped and logged as  “unknown file”.
  • Web Output Rendering
    • The generated code and authentication status are displayed on the webpage using render_template_string().
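
To make the splitting behaviour concrete, here is a small standalone demonstration (illustrative only, using the same regex as the app) of how a markdown-formatted response is broken into per-file code blocks:

import re

# Sample of the markdown-formatted response CodeGemma is prompted to produce
sample = (
    "## Main.tf:\n"
    "```terraform\n"
    'resource "azurerm_resource_group" "rg" {\n'
    '  name     = "demo-rg"\n'
    '  location = "eastus"\n'
    "}\n"
    "```\n"
    "## Vars.tf:\n"
    "```terraform\n"
    'variable "location" { default = "eastus" }\n'
    "```\n"
)

# Same pattern used in the app: the header name lands in group 1, the code block follows
sections = re.split(r'##\s*(.*?)\.tf:\s*\n+```(?:terraform)?\n', sample)

for i in range(1, len(sections), 2):
    filename = sections[i].strip().lower() + '.tf'
    code_block = re.sub(r'```$', '', sections[i + 1].strip()).strip()
    print(f"--- {filename} ---")
    print(code_block)

Running this prints main.tf and vars.tf with their respective code blocks, which is exactly how the app decides which output file each section belongs to.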

Terminal


The Power of AI in Infrastructure Automation

This project demonstrates how combining AI models, such as CodeGemma, with simple tools like Flask and Terraform can revolutionize the way we approach cloud infrastructure provisioning. By allowing developers to describe their infrastructure in natural language and instantly receive clean, modular Terraform code, we eliminate the need for repetitive manual scripting and reduce the chances of human error.

Running CodeGemma locally via Ollama ensures:

  • Full control over data
  • Zero cost for code generation
  • Fast and private execution
  • Seamless integration with existing workflows

The use of Azure CLI authentication adds a layer of real-world applicability, making the generated code deployable in enterprise environments.

Whether you’re a cloud engineer, DevOps practitioner, or technical consultant, this tool empowers you to move faster, prototype smarter, and deploy infrastructure with confidence.

As AI continues to evolve, tools like this will become essential in bridging the gap between human intent and machine execution, making infrastructure-as-code not only powerful but also intuitive.

Automating Azure Key Vault Secret and Certificate Expiry Monitoring with Azure Function App

How to monitor hundreds of Key Vaults across multiple subscriptions for just $15-25/month

The Challenge: Key Vault Sprawl in Enterprise Azure

If you’re managing Azure at enterprise scale, you’ve likely encountered this scenario: Key Vaults scattered across dozens of subscriptions, hundreds of certificates and secrets with different expiry dates, and the constant fear of unexpected outages due to expired certificates. Manual monitoring simply doesn’t scale when you’re dealing with:

  • Multiple Azure subscriptions (often 10-50+ in large organizations)
  • Hundreds of Key Vaults across different teams and environments
  • Thousands of certificates with varying renewal cycles
  • Critical secrets that applications depend on
  • Different time zones and rotation schedules

The traditional approach of spreadsheets, manual checks, or basic Azure Monitor alerts breaks down quickly. You need something that scales automatically, costs practically nothing, and provides real-time visibility across your entire Azure estate.

The Solution: Event-Driven Monitoring Architecture


Single Function App, Unlimited Key Vaults

Instead of deploying monitoring resources per Key Vault (expensive and complex), we use a centralized architecture:

Management Group (100+ Key Vaults)
           ↓
   Single Function App
           ↓
     Action Group
           ↓
    Notifications

This approach provides:

  • Unlimited scalability: Monitor 1 or 1000+ Key Vaults with the same infrastructure
  • Cross-subscription coverage: Works across your entire Azure estate
  • Real-time alerts: Sub-5-minute notification delivery
  • Cost optimization: $15-25/month total (not per Key Vault!)

How It Works: The Technical Deep Dive

1. Event Grid System Topics (The Sensors)

Azure Key Vault automatically generates events when certificates and secrets are about to expire. We create Event Grid System Topics for each Key Vault to capture these events:

Event Types Monitored:
• Microsoft.KeyVault.CertificateNearExpiry
• Microsoft.KeyVault.CertificateExpired  
• Microsoft.KeyVault.SecretNearExpiry
• Microsoft.KeyVault.SecretExpired

The beauty? These events are generated automatically by Azure – no polling, no manual checking, just real-time notifications when things are about to expire.

2. Centralized Processing (The Brain)

A single Azure Function App processes ALL events from across your organization:

// Simplified event processing flow
eventGridEvent → parseEvent() → extractMetadata() → 
formatAlert() → sendToActionGroup()

Example Alert Generated:
{
  severity: "Sev1",
  alertTitle: "Certificate Expired in Key Vault",
  description: "Certificate 'prod-ssl-cert' has expired in Key Vault 'prod-keyvault'",
  keyVaultName: "prod-keyvault",
  objectType: "Certificate",
  expiryDate: "2024-01-15T00:00:00.000Z"
}
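
The processing function in this solution is written in JavaScript, but the parsing step itself is simple. As a rough Python illustration only (field names assume the Key Vault Event Grid event schema: VaultName, ObjectName, ObjectType, and EXP for the expiry timestamp), the same flow could be sketched as:

from datetime import datetime, timezone

def format_alert(event: dict) -> dict:
    """Turn a Key Vault Event Grid event into the alert payload sent to the Action Group."""
    data = event.get("data", {})
    event_type = event.get("eventType", "")   # e.g. Microsoft.KeyVault.CertificateExpired
    expired = event_type.endswith("Expired")
    expiry_ts = data.get("EXP")               # expiry date as a Unix timestamp

    return {
        "severity": "Sev1" if expired else "Sev2",
        "alertTitle": f"{data.get('ObjectType')} {'Expired' if expired else 'Near Expiry'} in Key Vault",
        "description": (
            f"{data.get('ObjectType')} '{data.get('ObjectName')}' "
            f"{'has expired' if expired else 'is about to expire'} "
            f"in Key Vault '{data.get('VaultName')}'"
        ),
        "keyVaultName": data.get("VaultName"),
        "objectType": data.get("ObjectType"),
        "expiryDate": datetime.fromtimestamp(expiry_ts, tz=timezone.utc).isoformat() if expiry_ts else None,
    }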

3. Smart Notification Routing (The Messenger)

Azure Action Groups handle notification distribution with support for:

  • Email notifications (unlimited recipients)
  • SMS alerts for critical expiries
  • Webhook integration with ITSM tools (ServiceNow, Jira, etc.)
  • Voice calls for emergency situations.

Implementation: Infrastructure as Code

The entire solution is deployed using Terraform, making it repeatable and version-controlled. Here’s the high-level infrastructure:

Resource Architecture

# Single monitoring resource group
resource "azurerm_resource_group" "monitoring" {
  name     = "rg-kv-monitoring-${var.timestamp}"
  location = var.primary_location
}

# Function App (handles ALL Key Vaults)
resource "azurerm_linux_function_app" "kv_processor" {
  name                = "func-kv-monitoring-${var.timestamp}"
  service_plan_id     = azurerm_service_plan.function_plan.id
  # ... configuration
}

# Event Grid System Topics (one per Key Vault)
resource "azurerm_eventgrid_system_topic" "key_vault" {
  for_each = { for kv in var.key_vaults : kv.name => kv }
  
  name                   = "evgt-${each.key}"
  source_arm_resource_id = "/subscriptions/${each.value.subscriptionId}/resourceGroups/${each.value.resourceGroup}/providers/Microsoft.KeyVault/vaults/${each.key}"
  topic_type            = "Microsoft.KeyVault.vaults"
}

# Event Subscriptions (route events to Function App)
resource "azurerm_eventgrid_event_subscription" "certificate_expiry" {
  for_each = { for kv in var.key_vaults : kv.name => kv }
  
  azure_function_endpoint {
    function_id = "${azurerm_linux_function_app.kv_processor.id}/functions/EventGridTrigger"
  }
  
  included_event_types = [
    "Microsoft.KeyVault.CertificateNearExpiry",
    "Microsoft.KeyVault.CertificateExpired"
  ]
}

CI/CD Pipeline Integration

The solution includes an Azure DevOps pipeline that:

  1. Discovers Key Vaults across your management group automatically
  2. Generates Terraform variables with all discovered Key Vaults
  3. Deploys infrastructure using infrastructure as code
  4. Validates deployment to ensure everything works
# Simplified pipeline flow
stages:
  - stage: DiscoverKeyVaults
    # Scan management group for all Key Vaults
    
  - stage: DeployMonitoring  
    # Deploy Function App and Event Grid subscriptions
    
  - stage: ValidateDeployment
    # Ensure monitoring is working correctly

Cost Analysis: Why This Approach Wins

Traditional Approach (Per-Key Vault Monitoring)

100 Key Vaults × $20/month per KV = $2,000/month
Annual cost: $24,000

This Approach (Centralized Monitoring)

Base infrastructure: $15-25/month
Event Grid events: $2-5/month  
Total: $17-30/month
Annual cost: $204-360

Savings: 98%+ reduction in monitoring costs

Detailed Cost Breakdown

Component | Monthly Cost | Notes
Function App (Basic B1) | $13.14 | Handles unlimited Key Vaults
Storage Account | $1-3 | Function runtime storage
Log Analytics | $2-15 | Centralized logging
Event Grid | $0.50-2 | $0.60 per million operations
Action Group | $0 | Email notifications free
Total | $17-33 | Scales to unlimited Key Vaults

Implementation Guide: Getting Started

Prerequisites

  1. Azure Management Group with Key Vaults to monitor
  2. Service Principal with appropriate permissions:
    • Reader on Management Group
    • Contributor on monitoring subscription
    • Event Grid Contributor on Key Vault subscriptions
  3. Azure DevOps or similar CI/CD platform

Step 1: Repository Setup

Create this folder structure:

keyvault-monitoring/
├── terraform/
│   ├── main.tf              # Infrastructure definitions
│   ├── variables.tf         # Configuration variables
│   ├── terraform.tfvars     # Your specific settings
│   └── function_code/       # Function App source code
├── azure-pipelines.yml      # CI/CD pipeline
└── docs/                    # Documentation

Step 2: Configuration

Update terraform.tfvars with your settings:

# Required configuration
notification_emails = [
  "your-team@company.com",
  "security@company.com"
]

primary_location = "East US"
log_retention_days = 90

# Optional: SMS for critical alerts
sms_notifications = [
  {
    country_code = "1"
    phone_number = "5551234567"
  }
]

# Optional: Webhook integration
webhook_url = "https://your-itsm-tool.com/api/alerts"

Step 3: Deployment

The pipeline automatically:

  1. Scans your management group for all Key Vaults
  2. Generates infrastructure code with discovered Key Vaults
  3. Deploys monitoring resources using Terraform
  4. Validates functionality with test events

Expected deployment time: 5-10 minutes

Step 4: Validation

Test the setup by creating a short-lived certificate:

# Create test certificate with 1-day expiry
az keyvault certificate create \
  --vault-name "your-test-keyvault" \
  --name "test-monitoring-cert" \
  --policy '{
    "issuerParameters": {"name": "Self"},
    "x509CertificateProperties": {
      "validityInMonths": 1,
      "subject": "CN=test-monitoring"
    }
  }'

# You should receive an alert within 5 minutes

Operational Excellence

Monitoring the Monitor

The solution includes comprehensive observability:

// Function App performance dashboard
FunctionAppLogs
| where TimeGenerated > ago(24h)
| summarize 
    ExecutionCount = count(),
    SuccessRate = (countif(Level != "Error") * 100.0) / count(),
    AvgDurationMs = avg(DurationMs)
| extend PerformanceScore = case(
    SuccessRate >= 99.5, "Excellent",
    SuccessRate >= 99.0, "Good", 
    "Needs Attention"
)

Advanced Features and Customizations

1. Integration with ITSM Tools

The webhook capability enables integration with enterprise tools:

// ServiceNow integration example
const serviceNowPayload = {
  short_description: `${objectType} '${objectName}' expiring in Key Vault '${keyVaultName}'`,
  urgency: severity === 'Sev1' ? '1' : '3',
  category: 'Security',
  subcategory: 'Certificate Management',
  caller_id: 'keyvault-monitoring-system'
};

2. Custom Alert Routing

Different Key Vaults can route to different teams:

// Route alerts based on Key Vault naming convention
const getNotificationGroup = (keyVaultName) => {
  if (keyVaultName.includes('prod-')) return 'production-team';
  if (keyVaultName.includes('dev-')) return 'development-team';
  return 'platform-team';
};

3. Business Hours Filtering

Critical alerts can bypass business hours, while informational alerts respect working hours:

const shouldSendImmediately = (severity, currentTime) => {
  if (severity === 'Sev1') return true; // Always send critical alerts
  
  const businessHours = isBusinessHours(currentTime);
  return businessHours || isNearBusinessHours(currentTime, 2); // 2 hours before business hours
};

Troubleshooting Common Issues

Issue: No Alerts Received

Symptoms:

Events are visible in Azure, but no notifications are arriving

Resolution Steps:

  1. Check the Action Group configuration in the Azure Portal
  2. Verify the Function App is running and healthy
  3. Review Function App logs for processing errors
  4. Validate Event Grid subscription is active

Issue: High Alert Volume

Symptoms:

Too many notifications, alert fatigue

Resolution:

// Implement intelligent batching
const batchAlerts = (alerts, timeWindow = '15m') => {
  return alerts.reduce((batches, alert) => {
    const key = `${alert.keyVaultName}-${alert.objectType}`;
    batches[key] = batches[key] || [];
    batches[key].push(alert);
    return batches;
  }, {});
};

Issue: Missing Key Vaults

Symptoms: Some Key Vaults are not included in monitoring

Resolution:

  1. Re-run the discovery pipeline to pick up new Key Vaults
  2. Verify service principal has Reader access to all subscriptions
  3. Check for Key Vaults in subscriptions outside the management group
Part 2: Implementing Azure Virtual WAN – A Practical Walkthrough

In Part 1 of this series, we discussed what Azure Virtual WAN is and why it’s a powerful solution for global networking. Now, let’s get hands-on and walk through the actual implementation—step by step, in a simple, conversational way.


1. Creating the Virtual WAN – The Network’s Control Plane

Virtual WAN is the heart of a global network, not just another resource. It replaces isolated VPN gateways per region, manual ExpressRoute configurations, and complex peering relationships.

Setting it up is easy:

  • Navigate to Azure Portal → Search “Virtual WAN”
  • Click Create and configure.
  • Name: Naming matters for enterprise environments
  • Resource Group: Create new rg-network-global (best practice for lifecycle management)
  • Type: Standard (Basic lacks critical features like ExpressRoute support)

Azure will set up the Virtual WAN in a few seconds. Now, the real fun begins.

2. Setting Up the Virtual WAN Hub – The Heart of The Network

The hub is where all connections converge. It’s like a major airport hub where traffic from different locations meets and gets efficiently routed. Without a hub, you’d need to configure individual gateways for every VPN and ExpressRoute connection, leading to higher costs and management overhead.

  • Navigate to the Virtual WAN resource → Click Hubs → New Hub.
  • Configure the Hub.
  • Region: Choose based on: Primary user locations & Azure service availability (some regions lack certain services)
  • Address Space: Assign a private IP range (e.g., 10.100.0.0/24).

Wait for deployment; this takes about 30 minutes (Azure is building VPN gateways, ExpressRoute gateways, and more behind the scenes).

Once done, the hub is ready to connect everything: offices, cloud resources, and remote users.

3. Connecting Offices via Site-to-Site VPN – Building Secure Tunnels

Branches and data centres need a reliable, encrypted connection to Azure. Site-to-Site VPN provides this over the public internet while keeping data secure. Without VPN tunnels, branch offices would rely on slower, less secure internet connections to access cloud resources, increasing latency and security risks.

  • In the Virtual WAN Hub, go to VPN (Site-to-Site) → Create VPN Site.
  • Name: branch-nyc-01
  • Private Address Space: e.g., 192.168.100.0/24 (must match on-premises network)
  • Link Speed: Set accurately for Azure’s QoS calculations
  • Download VPN Configuration: Azure provides a config file—apply it to the office’s VPN device (like a Cisco or Fortinet firewall).
  • Lastly, connect the VPN Site to the Hub.
  • Navigate to VPN connections → Create connection → Link the office to the hub.

Now, the office and Azure are securely connected.

4. Adding ExpressRoute – The Private Superhighway

For critical applications (like databases or ERP systems), VPNs might not provide enough bandwidth or stability. ExpressRoute gives us a dedicated, high-speed connection that bypasses the public internet. Without ExpressRoute, latency-sensitive applications (like VoIP or real-time analytics) could suffer from internet congestion or unpredictable performance.

  • Order an ExpressRoute Circuit: We can do this via the Azure Portal or through an ISP (like AT&T or Verizon).
  • Authorize the Circuit in Azure
  • Navigate to the Virtual WAN Hub → ExpressRoute → Authorize.
  • Link it to the Hub: Once it is authorized, connect the ExpressRoute circuit to the hub.

Now, the on-premises network has a dedicated, high-speed connection to Azure—no internet required.

5. Enabling Point-to-Site VPN for Remote Workers – The Digital Commute

Employees working from home need secure access to internal apps without exposing them to the public internet. P2S VPN lets them “dial in” securely from anywhere. Without P2S VPN, remote workers might resort to risky workarounds like exposing RDP or databases to the internet.

  • Configure P2S in The Hub
  • Navigate to VPN (Point-to-Site) → Configure.
  • Set Up Authentication: Choose certificate-based auth (secure and easy to manage) and upload the root/issuer certificates.
  • Assign an IP Pool. e.g., 192.168.100.0/24 (this is where remote users will get their IPs).
  • Download & Distribute the VPN Client

Employees install this on their laptops to connect securely. Now, the team can access Azure resources from anywhere just like they’re in the office.

6. Linking Azure Virtual Networks (VNets) – The Cloud’s Backbone

Applications in one VNet (e.g., frontend servers) often need to talk to another (e.g., databases). Rather than complex peering, the Virtual WAN handles routing automatically. Without VNet integration, it needs manual peering and route tables for every connection, creating a management nightmare at scale.

  • VNets need to be attached.
  • Navigate to The Hub → Virtual Network Connections → Add Connection.
  • Select the VNets. e.g., Connect vnet-app (for applications) and vnet-db (for databases).
  • Azure handles the Routing: Traffic flows automatically through the hub-no manual route tables needed.

Now, the cloud resources communicate seamlessly.

Monitoring & Troubleshooting

Networks aren’t “set and forget.” We need visibility to prevent outages and quickly fix issues. We can use tools like Azure Monitor, which tracks VPN/ExpressRoute health—like a dashboard showing all trains (data packets) moving smoothly. Again, Network Watcher can help to diagnose why a branch can’t connect.

Common Problems & Fixes

  • When VPN connections fail, the problem is often a mismatched shared key—simply re-enter it on both ends.
  • If ExpressRoute goes down, check with your ISP—circuit issues usually require provider intervention.
  • When VNet traffic gets blocked, verify route tables in the hub—missing routes are a common culprit.
House Price Predictor – An MLOps Learning Project Using Azure DevOps

Machine Learning (ML) is no longer limited to research labs — it’s actively driving decisions in real estate, finance, healthcare, and more. But deploying and managing ML models in production is a different ballgame. That’s where MLOps comes in.

In this blog, we’ll walk through a practical MLOps learning project — building a House Price Predictor using Azure DevOps as the CI/CD backbone. We’ll explore the evolution from DevOps to MLOps, understand the model development lifecycle, and see how to automate and manage it effectively.

What is MLOps?

MLOps (Machine Learning Operations) is the discipline of combining Machine Learning, DevOps, and Data Engineering to streamline the end-to-end ML lifecycle.

It aims to:

  • Automate training, testing, and deployment of models
  • Enable reproducibility and version control for data and models
  • Support continuous integration and delivery (CI/CD) for ML workflows
  • Monitor model performance in production

MLOps ensures that your model doesn’t just work in Jupyter notebooks but continues to deliver accurate predictions in production environments over time.

From DevOps to MLOps: The Evolution

DevOps revolutionized software engineering by integrating development and operations through automation, CI/CD, and infrastructure as code (IaC). However, ML projects add new complexity:

Aspect | Traditional DevOps | MLOps
Artifact | Source code | Code + data + models
Version Control | Git | Git + Data Versioning (e.g., DVC)
Testing | Unit & integration tests | Data validation + model validation
Deployment | Web services, APIs | ML models, pipelines, batch jobs
Monitoring | Logs, uptime, errors | Model drift, data drift, accuracy decay

So, MLOps builds on DevOps but extends it with data-centric workflows, experimentation tracking, and model governance.

House Price Prediction: Project Overview

Our goal is to build an ML model that predicts house prices based on input features like square footage, number of bedrooms, location, etc. This learning project is structured to follow MLOps best practices, using Azure DevOps pipelines for automation.

 Project Structure

house-price-predictor/
├── configs/               # Model configurations stored in YAML format
├── data/                  # Contains both raw and processed data files
├── deployment/
│    └── mlflow/           # Docker Compose files to set up MLflow tracking
├── models/                # Saved model artifacts and preprocessing objects
├── notebooks/             # Jupyter notebooks for exploratory analysis and prototyping
├── src/
│    ├── data/             # Scripts for data preparation and transformation
│    ├── features/         # Logic for generating and engineering features
│    ├── models/           # Code for model building, training, and validation
├── k8s/
│    ├── deployment.yaml        # Kubernetes specs to deploy the Streamlit frontend
│    └── fast_model.yaml        # Kubernetes specs to deploy the FastAPI model service
├── requirements.txt       # List of required Python packages

 Setting Up Your Development Environment

Before getting started, make sure the following tools are installed on your machine: Git, Python 3.11, UV, and Docker (or Podman).

 Preparing Your Environment

  • Fork this repo on GitHub to your personal or organization account.
  • Clone your forked repository
# Replace 'xxxxxx' with your GitHub username or organization
git clone https://github.com/xxxxxx/house-price-predictor.git
cd house-price-predictor
  • Create a virtual environment using UV:
uv venv --python python3.11
source .venv/bin/activate
  • Install the required Python packages:
uv pip install -r requirements.txt

 Configure MLflow for Experiment Tracking

To enable experiment and model run tracking with MLflow:

cd deployment/mlflow
docker compose -f mlflow-docker-compose.yml up -d
docker compose ps

 Using Podman Instead of Docker?

podman compose -f mlflow-docker-compose.yml up -d
podman compose ps

Access the MLflow UI. Once running, open your browser and navigate to http://localhost:5555

Model Workflow

 Step 1: Data Processing

Perform cleaning and preprocessing on the raw housing dataset:

python src/data/run_processing.py   --input data/raw/house_data.csv   --output data/processed/cleaned_house_data.csv

 Step 2: Feature Engineering

Perform data transformations and feature generation:

python src/features/engineer.py   --input data/processed/cleaned_house_data.csv   --output data/processed/featured_house_data.csv   --preprocessor models/trained/preprocessor.pkl

 Step 3: Modeling & Experimentation

Train the model and track all metrics using MLflow:

python src/models/train_model.py   --config configs/model_config.yaml   --data data/processed/featured_house_data.csv   --models-dir models   --mlflow-tracking-uri http://localhost:5555
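
The project’s train_model.py drives this from its YAML config; as a stripped-down illustration of the MLflow pattern it relies on (the synthetic data and model choice here are placeholders, not the project’s actual configuration), the core logging flow looks roughly like this:

import mlflow
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

mlflow.set_tracking_uri("http://localhost:5555")      # the MLflow server started via docker compose
mlflow.set_experiment("house-price-predictor")

# Tiny synthetic stand-in for the featured dataset
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 4))                          # e.g. sqft, bedrooms, bathrooms, year_built
y = X @ np.array([120.0, 15.0, 10.0, 5.0]) + rng.normal(scale=5.0, size=500)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=42)

with mlflow.start_run():
    model = RandomForestRegressor(n_estimators=200, random_state=42)
    model.fit(X_train, y_train)
    mae = mean_absolute_error(y_val, model.predict(X_val))

    mlflow.log_param("n_estimators", 200)
    mlflow.log_metric("mae", mae)
    mlflow.sklearn.log_model(model, artifact_path="model")   # stores the model artifact with the run

Every run logged this way shows up in the MLflow UI at http://localhost:5555, which is how the project compares experiments.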

Step 4: Building FastAPI and Streamlit

The source code for both applications — the FastAPI backend and the Streamlit frontend — is already available in the src/api and streamlit_app directories, respectively. To build and launch these applications:

  • Add a Dockerfile in the src/api directory to containerize the FastAPI service.
  • Add a Dockerfile inside streamlit_app/ to package the Streamlit interface.
  • Create a docker-compose.yaml file at the project root to orchestrate both containers.
    Make sure to set the environment variable API_URL=http://fastapi:8000 for the Streamlit app to connect to the FastAPI backend.

Once both services are up and running, you can access the Streamlit web UI in your browser to make predictions.

You can also test the prediction API directly by sending requests to the FastAPI endpoint.

curl -X POST "http://localhost:8000/predict" \
  -H "Content-Type: application/json" \
  -d '{
    "sqft": 1500,
    "bedrooms": 3,
    "bathrooms": 2,
    "location": "suburban",
    "year_built": 2000,
    "condition": "fair"
  }'

Be sure to replace http://localhost:8000/predict with the actual endpoint based on where it’s running.
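
The real FastAPI service lives in src/api, but as a rough sketch of what the /predict endpoint involves (field names taken from the request above; the pricing logic here is a placeholder rather than the project’s trained model), it boils down to a Pydantic request model and a prediction call:

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class HouseFeatures(BaseModel):
    sqft: float
    bedrooms: int
    bathrooms: int
    location: str
    year_built: int
    condition: str

@app.get("/health")
def health():
    return {"status": "ok"}

@app.post("/predict")
def predict(features: HouseFeatures):
    # In the real service, the trained preprocessor and model loaded from models/trained/
    # would transform the features and call model.predict(); a placeholder formula is shown here.
    estimated_price = 50_000 + features.sqft * 150
    return {"predicted_price": round(estimated_price, 2)}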

At this stage, your project is running locally. Now it’s time to implement the same workflow using Azure DevOps.

Prerequisites for Implementing This Approach in Azure DevOps.

To implement a similar MLOps pipeline using Azure DevOps, the following prerequisites must be in place:

  1. Azure Service Connection (Workload Identity-based)
    • Create a Workload Identity Service Connection in Azure DevOps.
    • Assign it Contributor access to the target Azure subscription or resource group.
    • This enables secure and passwordless access to Azure resources from the pipeline.
  2. Azure Kubernetes Service (AKS) Cluster
    • Provision an AKS cluster to serve as the deployment environment for your ML application.
    • Ensure the service connection has sufficient permissions (e.g., Azure Kubernetes Service Cluster User RBAC role) to interact with the cluster.

Start by cloning the existing GitHub repository into your Azure Repos. Inside the repository, you’ll find the azure-pipeline.yaml file, which defines the Azure DevOps CI/CD pipeline consisting of the following four stages:

  1. Data Processing Stage – Handles data cleaning and preparation.
  2. Model Training Stage – Trains the machine learning model and logs experiments.
  3. Build and Publish Stage – Builds Docker images and publishes them to the container registry.
  4. Deploy to AKS Stage – Deploys the application components to Azure Kubernetes Service (AKS).

This pipeline automates the end-to-end ML workflow from raw data to production deployment.

The CI/CD pipeline is already defined in the existing YAML file and is configured to run manually based on the parameters specified at runtime.

This pipeline is manually triggered (no automatic trigger on commits or pull requests) and supports the conditional execution of specific stages using parameters.

It consists of four stages, each representing a step in the MLOps lifecycle:

  1. Data Processing Stage

Condition: Runs if run_all or run_data_processing is set to true.

What it does:

  • Checks out the code.
  • Sets up Python 3.11.13 and installs dependencies.
  • Runs scripts to:
    • Clean and preprocess the raw dataset.
    • Perform feature engineering.
  • Publishes the processed data and the trained preprocessor as pipeline artifacts
  2. Model Training Stage

Depends on: DataProcessing
Condition: Runs if run_all or run_model_training is set to true.

What it does:

  • Downloads the processed data artifact.
  • Spins up an MLflow server using Docker.
  • Waits for MLflow to be ready.
  • Trains the machine learning model using the processed data.
  • Logs the training results to MLflow.
  • Publishes the trained model as a pipeline artifact.
  • Stops and removes the temporary MLflow container.
  3. Build and Publish Stage

Depends on: ModelTraining
Condition: Runs if run_all or run_build_and_publish is set to true.

What it does:

  • Downloads trained model and preprocessor artifacts.
  • Builds Docker images for:
    • FastAPI (model API)
    • Streamlit (frontend)
  • Tags both images using the current commit hash and the latest tag.
  • Runs and tests both containers locally (verifies /health and web access).
  • Pushes the tested Docker images to Docker Hub using credentials stored in the pipeline.
  4. Deploy to AKS Stage

Depends on: BuildAndPublish
Condition: Runs only if the previous stages succeed.

What it does:

  • Uses the Azure CLI to:
    • Set the AKS cluster context (make sure to update the cluster name).
    • Update Kubernetes deployment YAML files with the new Docker image tags.
    • Apply the updated deployment configurations to the AKS cluster using kubectl.

Now, the next step is to set up the Kubernetes deployment and service configuration for both components of the application:

  • Streamlit App: This serves as the frontend interface for users.
  • FastAPI App: This functions as the backend, handling API requests from the Streamlit frontend and returning model predictions.

Both deployment and service YAML files for these components are already present in the k8s/ folder and will be used for deploying to Azure Kubernetes Service (AKS).

This k8s/deployment.yaml file sets up a Streamlit app on Kubernetes with two key components:

  • Deployment: Runs 2 replicas of the Streamlit app using a Docker image. It exposes port 8501 and sets the API_URL environment variable to connect with the FastAPI backend.
  • Service: Creates a LoadBalancer service that exposes the app on port 80, making it accessible externally.

In short, it deploys the Streamlit frontend and makes it publicly accessible while connecting it to the FastAPI backend for predictions.

This k8s/fastapi_model.yaml file deploys the FastAPI backend for the house price prediction app:

  • It creates a Deployment named house-price-api with 2 replicas running the FastAPI app on port 8000.
  • A LoadBalancer Service named house-price-api-service exposes the app externally on port 8000, allowing other services (like Streamlit) or users to access the API.

In short, it runs the backend API in Kubernetes and makes it accessible for predictions.

Now it’s time for the final run to verify the deployment on the AKS cluster. Trigger the pipeline by selecting the run_all parameter.


 

After the pipeline completes successfully, all four stages and their corresponding jobs will be executed, confirming that the application has been successfully deployed to the AKS cluster.

 


 

Now, log in to the Azure portal and retrieve the external IP address of the Streamlit app service. Once accessed in your browser, you’ll see the House Price Prediction Streamlit application up and running.

 


 


 

Now, go ahead and perform model inference by selecting the appropriate parameter values and clicking on “Predict Price” to see how the model generates the prediction.

 


Conclusion

In this blog, we explored the fundamentals of MLOps and how it bridges the gap between machine learning development and scalable, production-ready deployment. We walked through a complete MLOps workflow—from data processing and feature engineering to model training, packaging, and deployment—using modern tools like FastAPI, Streamlit, and MLflow.

Using Azure DevOps, we implemented a robust CI/CD pipeline to automate each step of the ML lifecycle. Finally, we deployed the complete House Price Predictor application on an Azure Kubernetes Service (AKS) cluster, enabling a user-friendly frontend (Streamlit) to interact seamlessly with a predictive backend (FastAPI).

This end-to-end project not only showcases how MLOps principles can be applied in real-world scenarios but also provides a strong foundation for deploying scalable and maintainable ML solutions in production.

AI in Medical Device Software: From Concept to Compliance

Whether you’re building embedded software for next-gen diagnostics, modernizing lab systems, or scaling user-facing platforms, the pressure to innovate is universal, and AI is becoming a key differentiator. When embedded into the software development lifecycle (SDLC), AI offers a path to reduce costs, accelerate timelines, and equip the enterprise to scale with confidence. 

But AI doesn’t implement itself. It requires a team that understands the nuance of regulated software, SDLC complexities, and the strategic levers that drive growth. Our experts are helping MedTech leaders move beyond experimentation and into execution, embedding AI into the core of product development, testing, and regulatory readiness. 

“AI is being used to reduce manual effort and improve accuracy in documentation, testing, and validation.” – Reuters MedTech Report, 2025 

Whether it’s generating test cases from requirements, automating hazard analysis, or accelerating documentation, we help clients turn AI into a strategic accelerator. 

AI-Accelerated Regulatory Documentation 

Outcome: Faster time to submission, reduced manual burden, improved compliance confidence 

Regulatory documentation remains one of the most resource-intensive phases of medical device development.  

  • Risk classification automation: AI can analyze product attributes and applicable standards to suggest classification and required documentation. 
  • Drafting and validation: Generative AI can produce up to 75% of required documentation, which is then refined and validated by human experts. 
  • AI-assisted review: Post-editing, AI can re-analyze content to flag gaps or inconsistencies, acting as a second set of eyes before submission. 

AI won’t replace regulatory experts, but it will eliminate the grind. That’s where the value lies. 

For regulatory affairs leaders and product teams, this means faster submissions, reduced rework, and greater confidence in compliance, all while freeing up resources to focus on innovation. 

Agentic AI in the SDLC 

Outcome: Increased development velocity, reduced error rates, scalable automation 

Agentic AI—systems of multiple AI agents working in coordination—is emerging as a force multiplier in software development. 

  • Task decomposition: Complex development tasks are broken into smaller units, each handled by specialized agents, reducing hallucinations and improving accuracy. 
  • Peer review by AI: One agent can validate the output of another, creating a self-checking system that mirrors human code reviews. 
  • Digital workforce augmentation: Repetitive, labor-intensive tasks (e.g., documentation scaffolding, test case generation) are offloaded to AI, freeing teams to focus on innovation. This is especially impactful for engineering and product teams looking to scale development without compromising quality or compliance. 
  • Guardrails and oversight mechanisms: Our balanced implementation approach maintains security, compliance, and appropriate human supervision to deliver immediate operational gains and builds a foundation for continuous, iterative improvement. 

Agentic AI can surface vulnerabilities early and propose mitigations faster than traditional methods. This isn’t about replacing engineers. It’s about giving them a smarter co-pilot. 

AI-Enabled Quality Assurance and Testing 

Outcome: Higher product reliability, faster regression cycles, better user experiences 

AI is transforming QA from a bottleneck into a strategic advantage. 

  • Smart regression testing: AI frameworks run automated test suites across releases, identifying regressions with minimal human input. 
  • Synthetic test data generation: AI creates high-fidelity, privacy-safe test data in minutes—data that once took weeks to prepare. 
  • GenAI-powered visual testing: AI evaluates UI consistency and accessibility, flagging issues that traditional automation often misses. 
  • Chatbot validation: AI tools now test AI-powered support interfaces, ensuring they provide accurate, compliant responses. 

We’re not just testing functionality—we’re testing intelligence. That requires a new kind of QA.

Organizations managing complex software portfolios can unlock faster, safer releases. 

AI-Enabled, Scalable Talent Solutions 

Outcome: Scalable expertise without long onboarding cycles 

AI tools are only as effective as the teams that deploy them. We provide specialized talent—regulatory technologists, QA engineers, data scientists—that bring both domain knowledge and AI fluency. 

  • Accelerate proof-of-concept execution: Our teams integrate quickly into existing workflows, leveraging Agile and SAFe methodologies to deliver iterative value and maintain velocity. 
  • Reduce internal training burden: AI-fluent professionals bring immediate impact, minimizing ramp-up time and aligning with sprint-based development cycles. 
  • Ensure compliance alignment from day one: Specialists understand regulated environments and embed quality and traceability into every phase of the SDLC, consistent with Agile governance models. 

Whether you’re a CIO scaling digital health initiatives or a VP of Software managing multiple product lines, our AI-fluent teams integrate seamlessly to accelerate delivery and reduce risk. 

Proof of Concept Today, Scalable Solution Tomorrow 

Outcome: Informed investment decisions, future-ready capabilities 

Many of the AI capabilities discussed are already in early deployment or active pilot phases. Others are in proof-of-concept, with clear paths to scale. 

We understand that every organization is on a unique AI journey. Whether you’re starting from scratch, experimenting with pilots, or scaling AI across your enterprise, we meet you where you are. Our structured approach delivers value at every stage, helping you turn AI from an idea into a business advantage. 

As you evaluate your innovation and investment priorities across the SDLC, consider these questions: 

  1. Are we spending too much time on manual documentation?
  2. Do we have visibility into risk classification and mitigation?
  3. Can our QA processes scale with product complexity?
  4. How are we building responsible AI governance?
  5. Do we have the right partner to operationalize AI?

Final Thought: AI Demands a Partner, Not Just a Platform 

AI isn’t the new compliance partner. It’s the next competitive edge, but only when guided by the right strategy. For MedTech leaders, AI’s real opportunity comes by adopting and scaling it with precision, speed, and confidence. That kind of transformation can be accelerated by a partner who understands the regulatory terrain, the complexity of the SDLC, and the business outcomes that matter most. 

No matter where you sit — on the engineering team, in the lab, in business leadership, or in patient care — AI is reshaping how MedTech companies build, test, and deliver value. 

From insight to impact, our industry, platform, data, and AI expertise help organizations modernize systems, personalize engagement, and scale innovation. We deliver AI-powered transformation that drives engagement, efficiency, and loyalty throughout the lifecycle—from product development to commercial success. 

  • Business Transformation: Deepen collaboration, integration, and support throughout the value chain, including channel sales, providers, and patients. 
  • Modernization: Streamline legacy systems to drive greater connectivity, reduce duplication, and enhance employee and consumer experiences. 
  • Data + Analytics: Harness real-time data to support business success and to impact health outcomes. 
  • Consumer Experience: Support patient and consumer decision making, product usage, and outcomes through tailored digital experiences. 

Ready to move from AI potential to performance? Let’s talk about how we can accelerate your roadmap with the right talent, tools, and strategy. Contact us to get started. 

Document Summarization with AI on Amazon Bedrock

Objective

Enable automated document summarization by allowing us to upload TXT, PDF, or DOCX files, extracting content, summarizing it using Amazon Bedrock, and delivering the summary either via API response or by storing it for future retrieval.

Why This Is Needed

  • Organizations face information overload with a large number of documents.
  • Manual summarization is time-consuming and inconsistent.
  • AI enables faster, accurate, and scalable content summarization.
  • Amazon Bedrock provides easy access to powerful foundation models without managing infrastructure.
  • Helps improve decision-making by delivering quick, reliable insights.

Architecture Overview


  1. Uploads a document (TXT, PDF, or DOCX) to an S3 bucket.
  2. S3 triggers a Lambda function.
  3. Extracted content is passed to Amazon Bedrock for summarization (e.g., Claude 3 Sonnet).
  4. The summary is stored in Amazon S3.
  5. Lambda returns a response confirming successful summarization and storage.

AWS Services We Used

  • Amazon S3: Used to upload and store original documents like TXT, PDF, or DOCX files.
  • AWS Lambda: Handles the automation logic, triggered by S3 upload, it parses content and invokes Bedrock.
  • Amazon Bedrock: Provides powerful foundation models (Claude, Titan, or Llama 3) for generating document summaries.
  • IAM Roles: Securely manage permissions across services to ensure least-privilege access control.

Step-by-Step Guide

1. Create an S3 Bucket

  1. Navigate to AWS Console → S3 → Create Bucket
  • Example bucket name: kunaldoc-bucket

Note: Use the US East (N. Virginia) region (us-east-1) since Amazon Bedrock is not available in all regions (for example, it is not available in Ohio).

  2. Inside the bucket, create two folders:
  • uploads/ – to store original documents (TXT, PDF, DOCX)
  • summaries/ – to save the AI-generated summaries.


Step 2: Enable Amazon Bedrock Access

  1. Go to the Amazon Bedrock console.
  2. Navigate to Model access from the left menu.


  3. Select and enable access to the foundation models to be used, such as:
    • Claude 3.5 Sonnet (I used)
    • Meta Llama 3
    • Anthropic Claude
  4. Wait for the status to show as Access granted (this may take a few minutes).

Note: Make sure you’re in the same region as your Lambda function (e.g., us-east-1 / N. Virginia).


Step 3: Set Up IAM Role for Lambda

  1. Go to IAM > Roles > Create Role
  2. Choose Lambda as the trusted entity type
  3. Attach these AWS managed policies:
  • AmazonS3FullAccess
  • AmazonBedrockFullAccess
  • AWSLambdaBasicExecutionRole
  4. Name the role something like: LambdaBedrockExecutionRole

This role allows Lambda functions to securely access S3, invoke Amazon Bedrock, and write logs to CloudWatch.


Step 4: Create the Lambda Function

  1. Go to AWS Lambda > Create Function
  2. Set the Function Name: docSummarizerLambda (I used that)
  3. Select Runtime: Python 3.9
  4. Choose the Execution Role you created earlier.

(LambdaBedrockExecutionRole)

  5. Upload your code:
  • I added the lambda_function.py code to the GitHub repo.
  • Dependencies (like lxml, PDF) are also included in the same GitHub repo.
  • Download the dependencies zip file to your local machine and attach it as a Lambda Layer during Lambda configuration.


This Lambda function handles document parsing, model invocation, and storing the generated summary.
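
The complete lambda_function.py (including the PDF and DOCX parsers) is in the GitHub repo. As an illustrative sketch only, reduced to plain-text files, the core flow is: read the uploaded object, call Bedrock, and write the summary back to the summaries/ prefix. The model ID and request body below assume the Anthropic messages format on Bedrock and may need adjusting to the model you enabled.

import json
import urllib.parse
import boto3

s3 = boto3.client("s3")
bedrock = boto3.client("bedrock-runtime")

MODEL_ID = "anthropic.claude-3-5-sonnet-20240620-v1:0"   # adjust to the model enabled in your account

def lambda_handler(event, context):
    # 1. Identify the uploaded object from the S3 event
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = urllib.parse.unquote_plus(record["object"]["key"])

    # 2. Read the document (TXT shown here; PDF/DOCX need their own parsers)
    text = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8", errors="ignore")

    # 3. Ask Bedrock for a summary
    body = json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 500,
        "messages": [{
            "role": "user",
            "content": [{"type": "text", "text": f"Summarize the following document:\n\n{text[:20000]}"}],
        }],
    })
    response = bedrock.invoke_model(modelId=MODEL_ID, body=body)
    summary = json.loads(response["body"].read())["content"][0]["text"]

    # 4. Store the summary next to the original, under summaries/
    out_key = key.replace("uploads/", "summaries/", 1).rsplit(".", 1)[0] + "_summary.txt"
    s3.put_object(Bucket=bucket, Key=out_key, Body=summary.encode("utf-8"))

    return {"statusCode": 200, "body": f"Summary written to s3://{bucket}/{out_key}"}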

Step 5: Set S3 as the Trigger for Lambda

  1. Go to your Lambda function → Configuration → Triggers → Click “Add trigger”
  2. Select Source as S3
  3. Choose the S3 bucket you created earlier (which contains uploads/ and summaries/ folders)
  4. Set the Event type – PUT
  5. Under Prefix, enter: uploads/
  6. Leave Suffix empty (optional)
  7. Click “Add” to finalize the trigger.


This ensures your Lambda function is automatically invoked whenever a new file is uploaded to the uploads/ folder in your bucket.

Step 6: Add Lambda Layer for Dependencies

To include external Python libraries (like lxml, pdfminer.six, or python-docx), create a Lambda Layer:

  1. Download the dependencies ZIP

  • Clone or download the dependencies folder from the GitHub repo.
  2. Create the Layer

  • Go to AWS Lambda > Layers > Create layer
  • Name it (e.g., kc-lambda-layer)
  • Upload the ZIP file you downloaded
  • Set the compatible runtime to Python 3.9
  • Click Create


  3. Attach Layer to Lambda Function

  • Open your Lambda function
  • Go to Configuration > Layers
  • Click Add a layer > Custom layers
  • Select the layer you just created
  • Click Add


The final version of the Lambda function is shown below:


 

Step 7: Upload a Document

  1. Navigate to S3 > uploads/  folder.
  2. Upload your document


Once uploaded, the Lambda function is automatically triggered and performs the following actions:

  • Sends content to Bedrock for AI-based summarization.
  • Saves the summary in the summaries/ folder in the same S3 bucket.

Sample content of the uploaded "Document Summarization with AI on Amazon Bedrock" file:


Step 8: Monitor Lambda Logs in CloudWatch

To debug or verify your Lambda execution:

  1. Go to your Lambda Function in the AWS Console.
  2. Click on the Monitor tab → then View CloudWatch Logs.
  3. Open the Log stream to inspect detailed logs and execution steps.

This helps track any errors or view how the document was processed and summarized.


Step 9: View Output Summary

  1. Navigate to your S3 bucket → open the summaries/ folder.
  2. Download the generated file (e.g., script_summary.txt).

Picture15

Results

We can see that the summary for the document summarization with AI.txt file has been successfully generated and saved as document summarization with_summary.txt inside the summaries/ folder.

Picture16

Conclusion

With this serverless workflow, you’ve built an automated document summarization pipeline using Amazon S3, Lambda, and Bedrock. This solution allows us to upload documents in various formats (TXT, PDF, DOCX) and receive concise summaries stored securely in S3 without manual intervention. It’s scalable, cost-effective, and ideal for document-heavy workflows like legal, academic, or business reporting.

We can further enhance it by adding an API Gateway to fetch summaries on demand or integrating DynamoDB for indexing and search.

]]>
https://blogs.perficient.com/2025/07/30/document-summarization-with-ai-on-amazon-bedrock/feed/ 1 385454
AI-Driven Auto-Tagging of EC2 Instances Using Amazon SageMaker https://blogs.perficient.com/2025/07/24/ai-driven-auto-tagging-of-ec2-instances-using-amazon-sagemaker/ https://blogs.perficient.com/2025/07/24/ai-driven-auto-tagging-of-ec2-instances-using-amazon-sagemaker/#respond Thu, 24 Jul 2025 07:22:38 +0000 https://blogs.perficient.com/?p=384469

Managing cloud infrastructure effectively requires consistent and meaningful tagging of resources. Manual tagging is prone to errors and difficult to scale. In this blog, I’ll show you how to use Amazon SageMaker and Python to automatically apply intelligent tags to your EC2 instances using either rule-based logic or AI-powered enhancements.

What You Will Learn

  • How to use SageMaker Studio to run Python scripts for EC2 auto-tagging
  • How to set up SageMaker Domains and user profiles
  • How to securely grant EC2 permissions to SageMaker
  • How to fetch EC2 metadata and apply intelligent tag logic

Why This Is Needed

  • Manual tagging doesn’t scale across large environments
  • Lack of consistent tags causes billing, visibility, and compliance issues
  • AI can intelligently assign tags based on patterns in instance metadata

How It Works

  1. Launch SageMaker Studio (requires a domain and user profile)
  2. Use a notebook to run a Python script that fetches all EC2 instances
  3. Apply simple rule-based or AI-enhanced logic
  4. Use Boto3 to update tags automatically

Pre-requisites

  • AWS account
  • Existing EC2 instances
  • SageMaker execution role with permissions:
    • ec2:DescribeInstances
    • ec2:CreateTags
  • Familiarity with Python and basic AWS concepts

Architecture

Architecture Auto Tag Sagemaker

Step-by-Step Guide

1. Create a SageMaker Domain and User

  1. Go to the AWS Console → Amazon SageMaker → Domains
  2. Click “Create domain.”
  3. Use IAM authentication
  4. Create a user profile (e.g., ai-user)
  5. Choose ml.t3.medium for the Studio instance type
  6. Click Create and wait for provisioning

Sagemaker Ai

2. Add Permissions to the Execution Role

  1. Go to IAM Console → Roles
  2. Search and select the SageMaker execution role (e.g., AmazonSageMaker-ExecutionRole-*)
  3. Attach the AmazonEC2ReadOnlyAccess policy
  4. Add inline permissions for ec2:CreateTags if needed

IAM Role screenshot

3. Launch SageMaker Studio

  1. Open the SageMaker domain
  2. Click on your user profile → Open Studio
  3. In Studio, go to File → New → Notebook
  4. Choose the Python 3 (Data Science) kernel

Studio, notebook, and JupyterLab creation screenshots

Python Scripting

Python code in the Jupyter notebook
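The screenshot above shows the notebook code itself; as a reference, here is a minimal rule-based sketch of the same idea. The tag values and the Name-based rule are illustrative assumptions, not the exact script from the post.

import boto3

ec2 = boto3.client("ec2")

def infer_environment(instance):
    """Simple rule-based logic: derive an Environment tag from the instance's Name tag."""
    name = next((t["Value"] for t in instance.get("Tags", []) if t["Key"] == "Name"), "").lower()
    if "prod" in name:
        return "Production"
    if "dev" in name or "test" in name:
        return "Development"
    return "Unclassified"

# Fetch every EC2 instance in the region (paginated for large environments)
paginator = ec2.get_paginator("describe_instances")
for page in paginator.paginate():
    for reservation in page["Reservations"]:
        for instance in reservation["Instances"]:
            tags = [
                {"Key": "Environment", "Value": infer_environment(instance)},
                {"Key": "Tagged By", "Value": "SageMaker-AutoTagger"},
            ]
            ec2.create_tags(Resources=[instance["InstanceId"]], Tags=tags)
            print(f"Tagged {instance['InstanceId']} -> {tags}")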

Validate the Output

  • Go to EC2 Console → Instances
  • Check the Tags tab
  • You should see Environment and Tagged By tags

Results

Conclusion

This process leverages the power of SageMaker and Python to auto-tag EC2 instances. It enhances consistency, reduces manual effort, and lays the foundation for ML-driven infrastructure management.

Future improvements can include utilizing Amazon Comprehend or Bedrock for more intelligent tag inference, or automating the process using Lambda and EventBridge.

]]>
https://blogs.perficient.com/2025/07/24/ai-driven-auto-tagging-of-ec2-instances-using-amazon-sagemaker/feed/ 0 384469
Zero Trust & Infosec Mesh: Org’s Survival Guide https://blogs.perficient.com/2025/07/21/cybersecurity-mesh-zero-trust/ https://blogs.perficient.com/2025/07/21/cybersecurity-mesh-zero-trust/#comments Mon, 21 Jul 2025 14:14:18 +0000 https://blogs.perficient.com/?p=384742

Zero Trust & Cybersecurity Mesh: The New Security Paradigm

Traditional cybersecurity methods have fallen apart under their own presumptions in a world where employees access systems from kitchen counters, cafés, and even virtual reality headsets, and data is no longer housed behind a single firewall.

The castle-and-moat model (a classic approach in which the internal network is treated as a protected area, like a castle, with strong perimeter defenses such as firewalls and VPNs acting as the moat to keep external threats out), where everything outside the network is the enemy and everything inside is trusted, is not just out of date. It’s risky.

Presenting the dual revolution in contemporary digital defense:

“Never trust, always verify” is the motto of zero trust security.

Cybersecurity Mesh Architecture (CSMA): Contextualized security for anything, anywhere.

Together, they are changing the definition of what it means to create safe systems in the era of edge computing, cloud-first deployments, decentralization, and AI-powered agents.

What Is Zero Trust?

Fundamentally, Zero Trust assumes that no individual, gadget, or service—not even within the boundaries of the company—is intrinsically reliable.
Rather than granting users full access after they are “in,” Zero Trust systems:

  • Constantly confirm your identity
  • Examine the posture of the device.
  • Use the least privilege principle.
  • Track the context of access (location, network, behaviour)

Every interaction turns into a transactional validation.

Real-World Analogy

Consider it similar to airport security:

  • You are not only inspected at the door.
  • At every gate, checkpoint and aircraft, you are validated.

Tech Stack in Zero Trust

  • Identity Providers: Azure AD and Okta
  • MFA/SSO: Ping Identity, Duo
  • Device Credibility: Jamf, Kandji, and CrowdStrike
  • Access Guidelines: ZScaler, Tailscale, and Google BeyondCorp

What Is Cybersecurity Mesh?

Cybersecurity Mesh Architecture (CSMA) acknowledges the decentralisation of organisations.

These days, data, users, devices, and workloads are spread across:

  • Multiple cloud service providers
  • Hybrid data centers
  • Remote-first teams
  • IoT devices, containers, and APIs

The Mesh architecture surrounds each asset—not the network—with context-aware, modular security controls. It makes it possible for security to be dynamic, extensible, and modular wherever data moves.

Essential Idea:

“Security follows the asset, not the location.”

Why Are These Models Critical Now?

Microservices, SaaS applications, and remote work broke down the perimeter. As a result, threats are more dispersed, persistent, and advanced than before:

Threat types and why traditional models fail against them:

  • Supply chain attacks: trust assumptions in 3rd-party code
  • Insider threats: no visibility into internal access
  • Cloud misconfigurations: poor access boundaries
  • AI hallucination/exfiltration: no identity enforcement for LLMs

Gartner Prediction: Organizations that adopt Cybersecurity Mesh will see a 90% reduction in the financial impact of intrusions by 2026.

Zero Trust + Mesh: A Power Combo

The two aren’t rivals—they’re complementary.

Comparing the two, feature by feature:

  • Focus: Zero Trust centers on identity and trust minimisation; Cybersecurity Mesh on distributed access enforcement.
  • Scope: Zero Trust applies per user/device; Cybersecurity Mesh per resource, location, and context.
  • Best for: Zero Trust suits apps, users, and endpoints; Cybersecurity Mesh suits APIs, microservices, and the data fabric.
  • Integration points: Zero Trust plugs into identity providers, MFA, and the policy engine; Cybersecurity Mesh into multi-cloud policy enforcement layers.

Collectively, they provide:

  • Granular control
  • Adaptable coverage
  • Robust response to intrusions and unknown threats

Real-World Adoption

Google BeyondCorp:

After the 2010 Aurora hack, Google incorporated Zero Trust into its core values by switching from VPNs to real-time identity-aware proxies.

IBM’s Cybersecurity Mesh Suite

Provides cross-cloud visibility, dynamic policy enforcement, and distributed identity brokering for contemporary businesses.

U.S. Department of Defense

Mission-critical workloads are being moved to Zero Trust + Mesh in response to 5G edge deployments and hybrid cloud operations.

Getting Started: A Playbook

For Security Architects:

  • To begin with, use identity federation (Okta, Azure AD).
  • Next, make use of policy-as-code technologies (HashiCorp Sentinel, Open Policy Agent).
  • Map the micro-perimeters surrounding microservices and APIs.

For Developers:

  • Never assume a trusted origin while writing code; apply the principle of least privilege.
  • Use device-aware endpoint debugging.
  • Authenticate per request rather than per session, using short-lived tokens (see the sketch after this list).
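To make that last point concrete, here is a minimal, framework-agnostic sketch of per-request verification with a short-lived HMAC-signed token. It uses only the Python standard library; the secret, claims, and expiry window are illustrative assumptions.

import base64
import hashlib
import hmac
import json
import time

SECRET = b"rotate-me-regularly"  # illustrative; fetch from a secrets manager in practice

def sign_token(claims: dict) -> str:
    """Issue a short-lived token bound to a specific user, device, and scope."""
    payload = base64.urlsafe_b64encode(json.dumps(claims).encode())
    sig = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return f"{payload.decode()}.{sig}"

def verify_request(token: str, required_scope: str) -> dict:
    """Re-validate identity, expiry, and scope on every request, never once per session."""
    payload_b64, sig = token.rsplit(".", 1)
    expected = hmac.new(SECRET, payload_b64.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        raise PermissionError("bad signature")
    claims = json.loads(base64.urlsafe_b64decode(payload_b64))
    if claims["exp"] < time.time():
        raise PermissionError("token expired")
    if required_scope not in claims["scopes"]:
        raise PermissionError("least-privilege check failed")
    return claims

# Example: a 60-second token scoped to read-only access
token = sign_token({"sub": "user-42", "device": "laptop-7", "scopes": ["read"], "exp": time.time() + 60})
print(verify_request(token, required_scope="read"))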

For DevSecOps:

  • Use CI/CD to automate security scans.
  • For runtime enforcement, use a service mesh (such as Istio + eBPF).
  • Utilise Grafana + Prometheus integrations to track security observability.

What’s Next: Zero Trust for AI

With the growth of LLMs, agents, and autonomous APIs, we are suddenly confronted with algorithmic risks.

New Questions for Zero Trust:

  • Can you confirm the caller’s identity?
  • Should all memory tokens be accessible through that API?
  • Can you prove the identity and behaviour of your agents?

Similar to today’s user identities, Zero Trust will be applied in 2026 and beyond to model-serving APIs, memory access boundaries, and prompt-injection defenses.

Final Thoughts

Security no longer lives at the perimeter. Trust is not taken for granted; it must be earned. Identity-awareness, modularity, and composability are necessary for security.

Zero Trust and Cybersecurity Mesh are not merely trendy terms; they are your survival guide.

 

]]>
https://blogs.perficient.com/2025/07/21/cybersecurity-mesh-zero-trust/feed/ 2 384742
Mitigate DNS Vulnerabilities Proactively with Amazon Route 53 Resolver DNS Firewall https://blogs.perficient.com/2025/07/02/mitigate-dns-vulnerabilities-proactively-with-amazon-route-53-resolver-dns-firewall/ https://blogs.perficient.com/2025/07/02/mitigate-dns-vulnerabilities-proactively-with-amazon-route-53-resolver-dns-firewall/#respond Wed, 02 Jul 2025 10:48:29 +0000 https://blogs.perficient.com/?p=383529

In today’s cloud-first world, securing your DNS layer is more critical than ever. DNS (Domain Name System) is a foundational element of network infrastructure, but it’s often overlooked as a security risk. Attackers frequently exploit DNS to launch phishing campaigns, exfiltrate data, and communicate with command-and-control servers. Proactive DNS security is no longer optional – it’s essential.

To strengthen DNS-layer security, Amazon Route 53 Resolver DNS Firewall provides robust control over DNS traffic by enabling the use of domain lists, allowing specific domains to be explicitly permitted or denied. Complementing these custom lists are AWS Managed Domain Lists, which autonomously block access to domains identified as malicious, leveraging threat intelligence curated by AWS and its trusted security partners. While this method is highly effective in countering known threats, cyber adversaries are increasingly employing sophisticated evasion techniques that go undetected by conventional blocklists. In this blog, I’ll explore DNS vulnerabilities, introduce Route 53 Resolver DNS Firewall, and walk you through practical strategies to safeguard your cloud resources.

This is where Route 53 Resolver DNS Firewall Advanced comes in. By analyzing attributes such as query entropy, length, and frequency, it can detect and intercept potentially harmful DNS traffic, even when interacting with previously unknown domains. This proactive approach enhances defense against advanced tactics, such as DNS tunneling and domain generation algorithms (DGAs), which attackers often use to establish covert communication channels or maintain malware connectivity with command-and-control servers.

In this blog, I’ll guide you through a hands-on journey into the world of DNS-layer threats and the tools available to defend against them. You’ll discover how to configure effective Route 53 Resolver DNS Firewall Advanced rules. I’ll also walk through a real-world threat detection scenario, demonstrating how the service seamlessly integrates with AWS Security Hub to provide enhanced visibility and actionable alerts. By the end of this post, you’ll be equipped with the knowledge to implement DNS Firewall rules that deliver intelligent, proactive protection for your AWS workloads.

Risks Linked to DNS Tunneling and Domain Generation Algorithms

DNS tunneling and Domain Generation Algorithms (DGAs) are sophisticated techniques employed by cyber adversaries to establish hidden communication channels and evade traditional security measures.

DNS Tunneling: This method exploits the DNS protocol by encapsulating non-DNS data within DNS queries and responses. Since DNS traffic is typically permitted through firewalls and security devices to facilitate normal internet operations, attackers leverage this trust to transmit malicious payloads or exfiltrate sensitive data without detection. The risks associated with DNS tunneling are significant, including unauthorized data transfer, persistent command-and-control (C2) communication, and the potential for malware to bypass network restrictions. Detecting such activity requires vigilant monitoring for anomalies such as unusually large DNS payloads, high-frequency queries to unfamiliar domains, and irregular query patterns.

Domain Generation Algorithms (DGAs): DGAs enable malware to generate a vast number of pseudo-random domain names, which are used to establish connections with Command and Control (C2) servers. This dynamic approach makes it challenging for defenders to block malicious domains using traditional blacklisting techniques, as the malware can swiftly switch to new domains if previous ones are taken down. The primary risks posed by DGAs include the resilience of malware infrastructures, difficulty in predicting and blocking malicious domains, and the potential for widespread distribution of malware updates. Effective mitigation strategies involve implementing advanced threat intelligence, machine learning models to detect anomalous domain patterns, and proactive domain monitoring to identify and block suspicious activities.

Understanding and addressing the threats posed by DNS tunneling and DGAs are crucial for maintaining robust cybersecurity defenses.

Let’s See How DNS Firewall Works

Route 53 Resolver DNS Firewall Advanced enhances DNS-layer security by intelligently analyzing DNS queries in real time to detect and block threats that traditional firewalls or static domain blocklists might miss. Here’s a breakdown of how it operates:

  1. Deep DNS Query Inspection

When a DNS query is made from resources within your VPC, it is routed through the Amazon Route 53 Resolver. DNS Firewall Advanced inspects each query before it is resolved. It doesn’t just match the domain name against a list—it analyses the structure, behaviour, and characteristics of the domain itself.

  2. Behavioural Analysis Using Machine Learning

The advanced firewall uses machine learning models trained on massive datasets of real-world domain traffic. These models understand what “normal” DNS behaviour looks like and can flag anomalies such as:

  • Randomized or algorithm-generated domain names (used by DGAs)
  • Unusual query patterns
  • High entropy in domain names
  • Excessive subdomain nesting (common in DNS tunnelling)

This allows it to detect suspicious domains, even if they’ve never been seen before.

  3. Confidence Thresholds

Each suspicious query is scored based on how closely it resembles malicious behaviour. You can configure confidence levels—High, Medium, or Low:

  • High Confidence: Detects obvious threats, with minimal false positives (ideal for production).
  • Medium Confidence: Balanced sensitivity for broader detection.
  • Low Confidence: Aggressive detection for highly secure or test environments.

  4. Action Controls (Block, Alert, Allow)

Based on your configured rules and confidence thresholds, the firewall can:

  • Block the DNS query
  • Alert (log the suspicious activity, but allow the query)
  • Allow known safe queries

These controls give you flexibility to tailor the firewall’s behavior to your organization’s risk tolerance.

  5. Rule Groups and Customization

You can organize rules into rule groups, apply AWS Managed Domain Lists, and define custom rules based on your environment’s needs. You can also associate these rule groups with specific VPCs, ensuring DNS protection is applied at the network boundary.

  6. Real-Time Response Without Latency

Despite performing deep inspections, the firewall processes each DNS request in under a millisecond. This ensures there is no perceptible impact on application performance.

Blank Diagram

The above figure shows Route 53 DNS Firewall logs ingested into CloudWatch and analysed through Contributor Insights.

Demonstration

To begin, I’ll demonstrate how to manually create a Route 53 Resolver DNS Firewall Advanced rule using the AWS Management Console. This rule will be configured to block DNS queries identified as high-confidence DNS tunneling attempts.

Step 1: Navigate to Route 53 Resolver DNS Firewall

  • Sign in to the AWS Management Console.
  • In the search bar, type “Route 53” and select “Route 53 Resolver”.
  • In the left navigation pane, choose “DNS Firewall Rule groups” under the DNS Firewall section.

Picture1

Step 2: Create a New Rule Group

  • Click on “Create rule group”.
  • Enter a name and an optional description (e.g., BlockHighConfidenceDNS).
  • Click Next to proceed to add rules.

Picture2

Step 3: Add a Rule to the Rule Group

  • Click “Add rule”.

Picture3

  • For Rule name, enter a name (e.g., BlockTunnelingHighConfidence).

Picture4

  • Under DNS Firewall Advanced protection:
    1. Select DNS tunneling detection.
    2. For the Confidence threshold, select High.
    3. Leave the Query Type field blank to apply the rule to all query types.
  • Under the Action Section:
    1. Set the Action to Block.
    2. For the Response type, choose OVERRIDE.
    3. In the Record value field, enter: dns-firewall-advanced-block.
    4. For the Record type, select CNAME.
    5. Click Add rule to save the configuration.

Picture5
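If you later want to script a similar setup, below is a minimal boto3 sketch under stated assumptions. Note that it creates a classic domain-list-based BLOCK rule with the same OVERRIDE/CNAME response configured above, rather than the Advanced DNS-tunneling rule from the console steps, because the tunneling-detection parameters depend on your SDK version; check the current route53resolver API reference before relying on it. The domain, VPC ID, and names are placeholders.

import uuid
import boto3

r53r = boto3.client("route53resolver")

# 1. Create a rule group to hold the firewall rules
group = r53r.create_firewall_rule_group(
    Name="BlockHighConfidenceDNS",
    CreatorRequestId=str(uuid.uuid4()),
)["FirewallRuleGroup"]

# 2. Create a domain list and add the domains you want to block
domain_list = r53r.create_firewall_domain_list(
    Name="suspicious-domains",
    CreatorRequestId=str(uuid.uuid4()),
)["FirewallDomainList"]
r53r.update_firewall_domains(
    FirewallDomainListId=domain_list["Id"],
    Operation="ADD",
    Domains=["bad-example.com."],  # placeholder domain
)

# 3. Add a BLOCK rule that overrides the response with a CNAME, as in the console steps
r53r.create_firewall_rule(
    CreatorRequestId=str(uuid.uuid4()),
    FirewallRuleGroupId=group["Id"],
    FirewallDomainListId=domain_list["Id"],
    Priority=100,
    Action="BLOCK",
    BlockResponse="OVERRIDE",
    BlockOverrideDomain="dns-firewall-advanced-block",
    BlockOverrideDnsType="CNAME",
    BlockOverrideTtl=60,
    Name="BlockSuspiciousDomains",
)

# 4. Associate the rule group with the VPC you want to protect
r53r.associate_firewall_rule_group(
    CreatorRequestId=str(uuid.uuid4()),
    FirewallRuleGroupId=group["Id"],
    VpcId="vpc-0123456789abcdef0",  # placeholder
    Priority=101,
    Name="protect-my-vpc",
)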

Monitoring and Insights

Route 53 Resolver query logging offers comprehensive visibility into DNS queries originating from resources within your VPCs, allowing you to monitor and analyze DNS traffic for both security and compliance purposes. When enabled, query logging captures key details for each DNS request—such as the queried domain name, record type, response code, and the source VPC or instance. This capability becomes especially powerful when paired with Route 53 Resolver DNS Firewall, as it enables you to track blocked DNS queries and refine your security rules based on real traffic behavior within your environment. Below are sample log entries generated when the DNS Firewall identifies and acts upon suspicious activity, showcasing the depth of information available for threat analysis and incident response.

Example log entry: DNS tunneling block

The following is an example of a DNS tunneling block.

Picture6

Key Indicators of DNS Tunneling

  • query_name: Very long, random-looking domain name—typical of data being exfiltrated via DNS.
  • rcode: NXDOMAIN indicates no valid domain exists—often seen in tunneling.
  • answers: The query response was overridden with a controlled CNAME (dns-firewall-advanced-block.).
  • firewall_rule_action: Shows this was an intentional BLOCK action.
  • firewall_protection: Labeled as DNS_TUNNELING, indicating why the query was blocked.
  • srcids: Helps trace back to the source EC2 instance making the suspicious request.
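To act on these indicators at scale, you can filter exported query-log entries programmatically. Here is a minimal sketch, assuming newline-delimited JSON records with the field names shown above (for example, exported from CloudWatch Logs or S3); the file name is a placeholder.

import json

def find_tunneling_blocks(log_lines):
    """Yield (source_id, query_name) for every blocked DNS-tunneling query."""
    for line in log_lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed lines
        if (entry.get("firewall_rule_action") == "BLOCK"
                and entry.get("firewall_protection") == "DNS_TUNNELING"):
            # srcids is assumed to be a map of source type to resource ID
            src_ids = entry.get("srcids") or {}
            sources = src_ids.values() if isinstance(src_ids, dict) else src_ids
            for src in sources:
                yield src, entry.get("query_name")

# Example usage against a local export of the query logs
with open("resolver-query-logs.jsonl") as f:  # placeholder file name
    for instance_id, domain in find_tunneling_blocks(f):
        print(f"Instance {instance_id} attempted blocked query: {domain}")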

Example log entry: DNS tunneling alert

Picture7

Use Case

This type of alert is useful in:

  • Monitoring mode during firewall tuning.
  • Staging environments where you want visibility without enforcement.
  • Incident investigations—tracking which resources may be compromised or leaking data.

Final Thoughts

Amazon Route 53 Resolver DNS Firewall Advanced marks a significant advancement in protecting organizations against sophisticated DNS-layer threats. As discussed, DNS queries directed to the Route 53 Resolver take a distinct route that bypasses conventional AWS security measures such as security groups, network ACLs, and even AWS Network Firewall, introducing a potential security blind spot in many environments.

In this post, I examined how attackers exploit this gap using techniques like DNS tunneling and domain generation algorithms (DGAs), and how Route 53 Resolver DNS Firewall Advanced leverages real-time pattern recognition and anomaly detection to mitigate these risks. You also saw how to set up the service via the AWS Management Console; the same configuration can be deployed through a CloudFormation template with pre-configured rules that block high-confidence threats and alert on suspicious activity. Enabling query logging adds visibility into DNS behavior, and integrating with AWS Security Hub consolidates threat insights across your environment.

By adopting these capabilities, you can better safeguard your infrastructure from advanced DNS-based attacks that traditional blocklists often miss, strengthening your cloud security posture without compromising performance.

]]>
https://blogs.perficient.com/2025/07/02/mitigate-dns-vulnerabilities-proactively-with-amazon-route-53-resolver-dns-firewall/feed/ 0 383529
Developing a Serverless Blogging Platform with AWS Lambda and Python https://blogs.perficient.com/2025/06/11/developing-a-serverless-blogging-platform-with-aws-lambda-and-python/ https://blogs.perficient.com/2025/06/11/developing-a-serverless-blogging-platform-with-aws-lambda-and-python/#respond Thu, 12 Jun 2025 04:55:52 +0000 https://blogs.perficient.com/?p=382159

Introduction

Serverless is changing the game—no need to manage servers anymore. In this blog, we’ll see how to build a serverless blogging platform using AWS Lambda and Python. It’s scalable, efficient, and saves cost—perfect for modern apps.

How It Works

 

Serverless Architecture Diagram

Prerequisites

Before starting the demo, make sure you have: an AWS account, basic Python knowledge, AWS CLI and Boto3 installed.

Demonstration: Step-by-Step Guide

Step 1: Create a Lambda Function

Open the Lambda service and click “Create function.” Choose “Author from scratch,” name it something like BlogPostHandler, select Python 3.x, and give it a role with access to DynamoDB and S3. Then write your code using Boto3 to handle CRUD operations for blog posts stored in DynamoDB.

Lamda_Function.txt
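The full function is attached above; for orientation, here is a simplified sketch of what such a handler can look like with Lambda Proxy integration. The table name, routes, and CORS headers are assumptions that should match your own setup from Steps 2 and 3.

import json
import uuid
import boto3

table = boto3.resource("dynamodb").Table("BlogPosts")  # table created in Step 3

CORS_HEADERS = {
    "Access-Control-Allow-Origin": "*",
    "Access-Control-Allow-Headers": "Content-Type",
    "Access-Control-Allow-Methods": "GET,POST,DELETE,OPTIONS",
}

def respond(status, body):
    # default=str handles DynamoDB Decimal values during JSON serialization
    return {"statusCode": status, "headers": CORS_HEADERS, "body": json.dumps(body, default=str)}

def lambda_handler(event, context):
    method = event.get("httpMethod", "")

    if method == "GET":                       # list all posts
        return respond(200, table.scan().get("Items", []))

    if method == "POST":                      # create a post
        post = json.loads(event.get("body") or "{}")
        post["postId"] = post.get("postId", str(uuid.uuid4()))
        table.put_item(Item=post)
        return respond(201, post)

    if method == "DELETE":                    # delete a post by id
        post_id = (event.get("queryStringParameters") or {}).get("postId")
        table.delete_item(Key={"postId": post_id})
        return respond(200, {"deleted": post_id})

    return respond(400, {"error": f"Unsupported method {method}"})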

Step 2: Set Up API Gateway

First, go to REST API and click “Build.” Choose “New API,” name it something like BlogAPI, and select “Edge optimized” for global access. Then create a resource like /posts, add methods like GET or POST, and link them to your Lambda function (e.g. BlogPostHandler) using Lambda Proxy integration. After setting up all methods, deploy it by creating a stage like prod. You’ll get an Invoke URL which you can test using Postman or curl.

Picture1

 

Step 3: Configure DynamoDB

Open DynamoDB and click “Create table.” Name it something like BlogPosts, set postId as the partition key. If needed, add a sort key like category for filtering. Default on-demand capacity is fine—it scales automatically. You can also add extra attributes like timestamp or tags for sorting and categorizing. Once done, hit “Create.”


 

Picture2
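If you prefer code over the console, a minimal boto3 sketch of the same table follows; it keeps postId as the only key so it stays consistent with the Lambda sketch in Step 1.

import boto3

dynamodb = boto3.client("dynamodb")

dynamodb.create_table(
    TableName="BlogPosts",
    AttributeDefinitions=[{"AttributeName": "postId", "AttributeType": "S"}],
    KeySchema=[{"AttributeName": "postId", "KeyType": "HASH"}],  # partition key only
    BillingMode="PAY_PER_REQUEST",  # on-demand capacity, as in the console walkthrough
)
# Optionally add a sort key such as "category" (as noted above) if you plan to
# query posts by category; the Lambda sketch in Step 1 assumes postId alone.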

Step 4: Deploy Static Content on S3

First, make your front-end files—HTML, CSS, maybe some JavaScript. Then go to AWS S3, create a new bucket with a unique name, and upload your files like index.html. This will host your static website.

Index.html

After uploading, set the bucket policy to allow public read access so anyone can view your site. That’s it—your static website will now be live from S3.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "PublicReadGetObject",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::your-bucket-name/*"
        }
    ]
}

After uploading, don’t forget to replace your-bucket-name in the bucket policy with your actual S3 bucket name. This makes sure the permissions work properly. Now your static site is live—S3 will serve your HTML, CSS, and JS smoothly and reliably.

Step 5: Distribute via CloudFront

Go to CloudFront and create a new Web distribution. Set the origin to your S3 website URL (like your-bucket-name.s3-website.region.amazonaws.com, not the ARN). For Viewer Protocol Policy, choose “Redirect HTTP to HTTPS” for secure access. Leave other settings as-is unless you want to tweak cache settings. Then click “Create Distribution”—your site will now load faster worldwide.

Picture3

To let your frontend talk to the backend, you need to enable CORS in API Gateway. Just open the console, go to each method (like GET, POST, DELETE), click “Actions,” and select “Enable CORS.” That’s it—your frontend and backend can now communicate properly.

Picture4

Additionally, make sure your Lambda function responses include the CORS headers (Access-Control-Allow-Origin, Access-Control-Allow-Headers, and Access-Control-Allow-Methods); these are already set in the Lambda function from Step 1.

 

Results

That’s it—your serverless blogging platform is ready! API Gateway gives you the endpoints, Lambda handles the logic, DynamoDB stores your blog data, and S3 + CloudFront serve your frontend fast and globally. Fully functional, scalable, and no server headaches!

 

Picture5

Conclusion

Building a serverless blog with AWS Lambda and Python shows how powerful and flexible serverless really is. It’s low-maintenance, cost-effective, and scales easily, making it perfect for anything from a personal blog to a full content site. A solid setup for modern web apps!

]]>
https://blogs.perficient.com/2025/06/11/developing-a-serverless-blogging-platform-with-aws-lambda-and-python/feed/ 0 382159
Boost Cloud Efficiency: AWS Well-Architected Cost Tips https://blogs.perficient.com/2025/06/09/boost-cloud-efficiency-aws-well-architected-cost-tips/ https://blogs.perficient.com/2025/06/09/boost-cloud-efficiency-aws-well-architected-cost-tips/#respond Mon, 09 Jun 2025 06:36:11 +0000 https://blogs.perficient.com/?p=378814

In today’s cloud-first world, building a secure, high-performing, resilient, and efficient infrastructure is more critical than ever. That’s where the AWS Well-Architected Framework comes in a powerful guide designed to help architects and developers make informed decisions and build better cloud-native solutions.

What is the AWS Well-Architected Framework?

The AWS Well-Architected Framework provides a consistent approach for evaluating and improving your cloud architecture. It’s built around six core pillars that represent key areas of focus for building robust and scalable systems:

  • Operational Excellence – Continuously monitor and improve systems and processes.
  • Security – Protect data, systems, and assets through risk assessments and mitigation strategies.
  • Reliability – Ensure workloads perform as intended and recover quickly from failures.
  • Performance Efficiency – Use resources efficiently and adapt to changing requirements.
  • Cost Optimization – Avoid unnecessary costs and maximize value.
  • Sustainability – Minimize environmental impact by optimizing resource usage and energy consumption


Explore the AWS Well-Architected Framework here https://aws.amazon.com/architecture/well-architected

AWS Well-Architected Timeline

From time to time, AWS updates the framework and introduces new resources that we can follow to apply it more effectively to our use cases and build better architectures.


AWS Well-Architected Tool

To help you apply these principles, AWS offers the Well-Architected Tool—a free service that guides you through evaluating your workloads against the six pillars.

How it Works:

  • Select a workload.
  • Answer a series of questions aligned with the framework.
  • Review insights and recommendations.
  • Generate reports and track improvements over time.

Try the AWS Well-Architected Tool here https://aws.amazon.com/well-architected-tool/

Go Deeper with Labs and Lenses

AWS also provides:

  • Well-Architected Labs: hands-on labs with guided exercises for each pillar.
  • Well-Architected Lenses: domain-specific guidance that extends the framework to workloads such as serverless applications, SaaS, machine learning, and data analytics.

Deep Dive: Cost Optimization Pillar

Cost Optimization is not just about cutting costs—it’s about maximizing value. It ensures that your cloud investments align with business goals and scale efficiently.

Why It Matters:

  • Understand your spending patterns.
  • Ensure costs support growth, not hinder it.
  • Maintain control as usage scales.

5 Best Practices for Cost Optimization

  1. Practice Cloud Financial Management
  • Build a cost optimization team.
  • Foster collaboration between finance and tech teams.
  • Use budgets and forecasts.
  • Promote cost-aware processes and culture.
  • Quantify business value through automation and lifecycle management.
  2. Expenditure and Usage Awareness
  • Implement governance policies.
  • Monitor usage and costs in real time (see the sketch after this list).
  • Decommission unused or underutilized resources.
  3. Use Cost-Effective Resources
  • Choose the right services and pricing models.
  • Match resource types and sizes to workload needs.
  • Plan for data transfer costs.
  4. Manage Demand and Supply
  • Use auto-scaling, throttling, and buffering to avoid over-provisioning.
  • Align resource supply with actual demand patterns.
  5. Optimize Over Time
  • Regularly review new AWS features and services.
  • Adopt innovations that reduce costs and improve performance.
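To make the Expenditure and Usage Awareness practice concrete, here is a minimal boto3 sketch that pulls a month's cost per service from Cost Explorer. The dates and grouping are illustrative, and Cost Explorer must be enabled in the account.

import boto3

ce = boto3.client("ce")  # Cost Explorer

response = ce.get_cost_and_usage(
    TimePeriod={"Start": "2025-05-01", "End": "2025-06-01"},  # illustrative month
    Granularity="MONTHLY",
    Metrics=["UnblendedCost"],
    GroupBy=[{"Type": "DIMENSION", "Key": "SERVICE"}],
)

# Print cost per service so unused or oversized resources stand out
for group in response["ResultsByTime"][0]["Groups"]:
    service = group["Keys"][0]
    amount = group["Metrics"]["UnblendedCost"]["Amount"]
    print(f"{service}: ${float(amount):.2f}")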

Conclusion

The AWS Well-Architected Framework is more than a checklist—it’s a mindset. By embracing its principles, especially cost optimization, you can build cloud environments that are not only efficient and scalable but also financially sustainable.

]]>
https://blogs.perficient.com/2025/06/09/boost-cloud-efficiency-aws-well-architected-cost-tips/feed/ 0 378814
Mastering Databricks Jobs API: Build and Orchestrate Complex Data Pipelines https://blogs.perficient.com/2025/06/06/mastering-databricks-jobs-api-build-and-orchestrate-complex-data-pipelines/ https://blogs.perficient.com/2025/06/06/mastering-databricks-jobs-api-build-and-orchestrate-complex-data-pipelines/#respond Fri, 06 Jun 2025 18:45:09 +0000 https://blogs.perficient.com/?p=382492

In this post, we’ll dive into orchestrating data pipelines with the Databricks Jobs API, empowering you to automate, monitor, and scale workflows seamlessly within the Databricks platform.

Why Orchestrate with Databricks Jobs API?

When data pipelines become complex involving multiple steps—like running notebooks, updating Delta tables, or training machine learning models—you need a reliable way to automate and manage them with ease. The Databricks Jobs API offers a flexible and efficient way to automate your jobs/workflows directly within Databricks or from external systems (for example AWS Lambda or Azure Functions) using the API endpoints.

Unlike external orchestrators such as Apache Airflow, Dagster etc., which require separate infrastructure and integration, the Jobs API is built natively into the Databricks platform. And the best part? It doesn’t cost anything extra. The Databricks Jobs API allows you to fully manage the lifecycle of your jobs/workflows using simple HTTP requests.

Below is the list of API endpoints for the CRUD operations on the workflows:

  • Create: Set up new jobs with defined tasks and configurations via the POST /api/2.1/jobs/create endpoint. Define single or multi-task jobs, specifying the tasks to be executed (e.g., notebooks, JARs, Python scripts), their dependencies, and the compute resources.
  • Retrieve: Access job details, check statuses, and review run logs using GET /api/2.1/jobs/get or GET /api/2.1/jobs/list.
  • Update: Change job settings such as parameters, task sequences, or cluster details through POST /api/2.1/jobs/update and /api/2.1/jobs/reset.
  • Delete: Remove jobs that are no longer required using POST /api/2.1/jobs/delete.

These full CRUD capabilities make the Jobs API a powerful tool to automate job management completely, from creation and monitoring to modification and deletion—eliminating the need for manual handling.

Key components of a Databricks Job

  • Tasks: Individual units of work within a job, such as running a notebook, JAR, Python script, or dbt task. Jobs can have multiple tasks with defined dependencies and conditional execution.
  • Dependencies: Relationships between tasks that determine the order of execution, allowing you to build complex workflows with sequential or parallel steps.
  • Clusters: The compute resources on which tasks run. These can be ephemeral job clusters created specifically for the job or existing all-purpose clusters shared across jobs.
  • Retries: Configuration to automatically retry failed tasks to improve job reliability.
  • Scheduling: Options to run jobs on cron-based schedules, triggered events, or on demand.
  • Notifications: Alerts for job start, success, or failure to keep teams informed.

Getting started with the Databricks Jobs API

Before leveraging the Databricks Jobs API for orchestration, ensure you have access to a Databricks workspace, a valid Personal Access Token (PAT), and sufficient privileges to manage compute resources and job configurations. This guide will walk through key CRUD operations and relevant Jobs API endpoints for robust workflow automation.

1. Creating a New Job/Workflow:

To create a job, you send a POST request to the /api/2.1/jobs/create endpoint with a JSON payload defining the job configuration.

{
  "name": "Ingest-Sales-Data",
  "tasks": [
    {
      "task_key": "Ingest-CSV-Data",
      "notebook_task": {
        "notebook_path": "/Users/name@email.com/ingest_csv_notebook",
        "source": "WORKSPACE"
      },
      "new_cluster": {
        "spark_version": "15.4.x-scala2.12",
        "node_type_id": "i3.xlarge",
        "num_workers": 2
      }
    }
  ],
  "schedule": {
    "quartz_cron_expression": "0 30 9 * * ?",
    "timezone_id": "UTC",
    "pause_status": "UNPAUSED"
  },
  "email_notifications": {
    "on_failure": [
      "name@email.com"
    ]
  }
}

This JSON payload defines a Databricks job that executes a notebook-based task on a newly provisioned cluster, scheduled to run daily at 9:30 AM UTC. The components of the payload are explained below:

  • name: The name of your job.
  • tasks: An array of tasks to be executed. A job can have one or more tasks.
    • task_key: A unique identifier for the task within the job. Used for defining dependencies.
    • notebook_task: Specifies a notebook task. Other task types include spark_jar_task, spark_python_task, spark_submit_task, pipeline_task, etc.
      • notebook_path: The path to the notebook in your Databricks workspace.
      • source: The source of the notebook (e.g., WORKSPACE, GIT).
    • new_cluster: Defines the configuration for a new cluster that will be created for this job run. You can also use existing_cluster_id to use an existing all-purpose cluster (though new job clusters are recommended).
      • spark_version, node_type_id, num_workers: Standard cluster configuration options.
  • schedule: Defines the job schedule using a cron expression and timezone.
  • email_notifications: Configures email notifications for job events.

To create a Databricks workflow, the above JSON payload can be included in the body of a POST request sent to the Jobs API’s create endpoint—either using curl or programmatically via the Python requests library as shown below:

Using Curl:

curl -X POST \
  https://<databricks-instance>.cloud.databricks.com/api/2.1/jobs/create \
  -H "Authorization: Bearer <Your-PAT>" \
  -H "Content-Type: application/json" \
  -d '@workflow_config.json' #Place the above payload in workflow_config.json

Using Python requests library:

import requests
import json
create_response = requests.post("https://<databricks-instance>.cloud.databricks.com/api/2.1/jobs/create", data=json.dumps(your_json_payload), auth=("token", token))
if create_response.status_code == 200:
    job_id = json.loads(create_response.content.decode('utf-8'))["job_id"]
    print("Job created with id: {}".format(job_id))
else:
    print("Job creation failed with status code: {}".format(create_response.status_code))
    print(create_response.text)

The above example demonstrated a basic single-task workflow. However, the full potential of the Jobs API lies in orchestrating multi-task workflows with dependencies. The tasks array in the job payload allows you to configure multiple dependent tasks.
For example, the following workflow defines three tasks that execute sequentially: Ingest-CSV-Data → Transform-Sales-Data → Write-to-Delta.

{
  "name": "Ingest-Sales-Data-Pipeline",
  "tasks": [
    {
      "task_key": "Ingest-CSV-Data",
      "notebook_task": {
        "notebook_path": "/Users/name@email.com/ingest_csv_notebook",
        "source": "WORKSPACE"
      },
      "new_cluster": {
        "spark_version": "15.4.x-scala2.12",
        "node_type_id": "i3.xlarge",
        "num_workers": 2
      }
    },
    {
      "task_key": "Transform-Sales-Data",
      "depends_on": [
        {
          "task_key": "Ingest-CSV-Data"
        }
      ],
      "notebook_task": {
        "notebook_path": "/Users/name@email.com/transform_sales_data",
        "source": "WORKSPACE"
      },
      "new_cluster": {
        "spark_version": "15.4.x-scala2.12",
        "node_type_id": "i3.xlarge",
        "num_workers": 2
      }
    },
    {
      "task_key": "Write-to-Delta",
      "depends_on": [
        {
          "task_key": "Transform-Sales-Data"
        }
      ],
      "notebook_task": {
        "notebook_path": "/Users/name@email.com/write_to_delta_notebook",
        "source": "WORKSPACE"
      },
      "new_cluster": {
        "spark_version": "15.4.x-scala2.12",
        "node_type_id": "i3.xlarge",
        "num_workers": 2
      }
    }
  ],
  "schedule": {
    "quartz_cron_expression": "0 30 9 * * ?",
    "timezone_id": "UTC",
    "pause_status": "UNPAUSED"
  },
  "email_notifications": {
    "on_failure": [
      "name@email.com"
    ]
  }
}

 

Picture1


2. Updating Existing Workflows:

For modifying existing workflows, we have two endpoints: the update endpoint /api/2.1/jobs/update and the reset endpoint /api/2.1/jobs/reset. The update endpoint applies a partial update to your job, meaning you can tweak parts of the job — like adding a new task or changing a cluster spec — without redefining the entire workflow. The reset endpoint, by contrast, performs a complete overwrite of the job configuration. Therefore, when resetting a job, you must provide the entire desired job configuration, including any settings you wish to keep unchanged, to avoid them being overwritten or removed entirely. Let us go over a few examples to understand these endpoints better.

2.1. Update Workflow Name & Add New Task:

Let us modify the above workflow by renaming it from Ingest-Sales-Data-Pipeline to Sales-Workflow-End-to-End, adding an input parameter source_location to the Ingest-CSV-Data task, and introducing a new task Write-to-Postgres, which runs after the successful completion of Transform-Sales-Data.

{
  "job_id": 947766456503851,
  "new_settings": {
    "name": "Sales-Workflow-End-to-End",
    "tasks": [
      {
        "task_key": "Ingest-CSV-Data",
        "notebook_task": {
          "notebook_path": "/Users/name@email.com/ingest_csv_notebook",
          "base_parameters": {
            "source_location": "s3://<bucket>/<key>"
          },
          "source": "WORKSPACE"
        },
        "new_cluster": {
          "spark_version": "15.4.x-scala2.12",
          "node_type_id": "i3.xlarge",
          "num_workers": 2
        }
      },
      {
        "task_key": "Transform-Sales-Data",
        "depends_on": [
          {
            "task_key": "Ingest-CSV-Data"
          }
        ],
        "notebook_task": {
          "notebook_path": "/Users/name@email.com/transform_sales_data",
          "source": "WORKSPACE"
        },
        "new_cluster": {
          "spark_version": "15.4.x-scala2.12",
          "node_type_id": "i3.xlarge",
          "num_workers": 2
        }
      },
      {
        "task_key": "Write-to-Delta",
        "depends_on": [
          {
            "task_key": "Transform-Sales-Data"
          }
        ],
        "notebook_task": {
          "notebook_path": "/Users/name@email.com/write_to_delta_notebook",
          "source": "WORKSPACE"
        },
        "new_cluster": {
          "spark_version": "15.4.x-scala2.12",
          "node_type_id": "i3.xlarge",
          "num_workers": 2
        }
      },
      {
        "task_key": "Write-to-Postgres",
        "depends_on": [
          {
            "task_key": "Transform-Sales-Data"
          }
        ],
        "notebook_task": {
          "notebook_path":"/Users/name@email.com/write_to_postgres_notebook",
          "source": "WORKSPACE"
        },
        "new_cluster": {
          "spark_version": "15.4.x-scala2.12",
          "node_type_id": "i3.xlarge",
          "num_workers": 2
        }
      }
    ],
    "schedule": {
      "quartz_cron_expression": "0 30 9 * * ?",
      "timezone_id": "UTC",
      "pause_status": "UNPAUSED"
    },
    "email_notifications": {
      "on_failure": [
        "name@email.com"
      ]
    }
  }
}

Picture2

2.2. Update Cluster Configuration:

Cluster startup can take several minutes, especially for larger, more complex clusters. Sharing the same cluster allows subsequent tasks to start immediately after previous ones complete, speeding up the entire workflow. Parallel tasks can also run concurrently, sharing the same cluster resources efficiently. Let’s update the above workflow to share the same cluster across all the tasks.

{
  "job_id": 947766456503851,
  "new_settings": {
    "name": "Sales-Workflow-End-to-End",
    "job_clusters": [
      {
        "job_cluster_key": "shared-cluster",
        "new_cluster": {
          "spark_version": "15.4.x-scala2.12",
          "node_type_id": "i3.xlarge",
          "num_workers": 2
        }
      }
    ],
    "tasks": [
      {
        "task_key": "Ingest-CSV-Data",
        "notebook_task": {
          "notebook_path": "/Users/name@email.com/ingest_csv_notebook",
          "base_parameters": {
            "source_location": "s3://<bucket>/<key>"
          },
          "source": "WORKSPACE"
        },
        "job_cluster_key": "shared-cluster"
      },
      {
        "task_key": "Transform-Sales-Data",
        "depends_on": [
          {
            "task_key": "Ingest-CSV-Data"
          }
        ],
        "notebook_task": {
          "notebook_path": "/Users/name@email.com/transform_sales_data",
          "source": "WORKSPACE"
        },
        "job_cluster_key": "shared-cluster"
      },
      {
        "task_key": "Write-to-Delta",
        "depends_on": [
          {
            "task_key": "Transform-Sales-Data"
          }
        ],
        "notebook_task": {
          "notebook_path": "/Users/name@email.com/write_to_delta_notebook",
          "source": "WORKSPACE"
        },
        "job_cluster_key": "shared-cluster"
      },
      {
        "task_key": "Write-to-Postgres",
        "depends_on": [
          {
            "task_key": "Transform-Sales-Data"
          }
        ],
        "notebook_task": {
          "notebook_path":"/Users/name@email.com/write_to_postgres_notebook",
          "source": "WORKSPACE"
        },
        "job_cluster_key": "shared-cluster"
      }
    ],
    "schedule": {
      "quartz_cron_expression": "0 30 9 * * ?",
      "timezone_id": "UTC",
      "pause_status": "UNPAUSED"
    },
    "email_notifications": {
      "on_failure": [
        "name@email.com"
      ]
    }
  }
}

Picture3

2.3. Update Task Dependencies:

Let’s add a new task named Enrich-Sales-Data and update the dependencies as shown below:
Ingest-CSV-Data → Enrich-Sales-Data → Transform-Sales-Data → [Write-to-Delta, Write-to-Postgres]. Since we are updating dependencies of existing tasks, we need to use the reset endpoint /api/2.1/jobs/reset.

{
  "job_id": 947766456503851,
  "new_settings": {
    "name": "Sales-Workflow-End-to-End",
    "job_clusters": [
      {
        "job_cluster_key": "shared-cluster",
        "new_cluster": {
          "spark_version": "15.4.x-scala2.12",
          "node_type_id": "i3.xlarge",
          "num_workers": 2
        }
      }
    ],
    "tasks": [
      {
        "task_key": "Ingest-CSV-Data",
        "notebook_task": {
          "notebook_path":"/Users/name@email.com/ingest_csv_notebook",
          "base_parameters": {
            "source_location": "s3://<bucket>/<key>"
          },
          "source": "WORKSPACE"
        },
        "job_cluster_key": "shared-cluster"
      },
      {
        "task_key": "Enrich-Sales-Data",
        "depends_on": [
          {
            "task_key": "Ingest-CSV-Data"
          }
        ],
        "notebook_task": {
          "notebook_path":"/Users/name@email.com/enrich_sales_data",
          "source": "WORKSPACE"
        },
        "job_cluster_key": "shared-cluster"
      },
      {
        "task_key": "Transform-Sales-Data",
        "depends_on": [
          {
            "task_key": "Enrich-Sales-Data"
          }
        ],
        "notebook_task": {
          "notebook_path":"/Users/name@email.com/transform_sales_data",
          "source": "WORKSPACE"
        },
        "job_cluster_key": "shared-cluster"
      },
      {
        "task_key": "Write-to-Delta",
        "depends_on": [
          {
            "task_key": "Transform-Sales-Data"
          }
        ],
        "notebook_task": {
          "notebook_path":"/Users/name@email.com/write_to_delta_notebook",
          "source": "WORKSPACE"
        },
        "job_cluster_key": "shared-cluster"
      },
      {
        "task_key": "Write-to-Postgres",
        "depends_on": [
          {
            "task_key": "Transform-Sales-Data"
          }
        ],
        "notebook_task": {
          "notebook_path":"/Users/name@email.com/write_to_postgres_notebook",
          "source": "WORKSPACE"
        },
        "job_cluster_key": "shared-cluster"
      }
    ],
    "schedule": {
      "quartz_cron_expression": "0 30 9 * * ?",
      "timezone_id": "UTC",
      "pause_status": "UNPAUSED"
    },
    "email_notifications": {
      "on_failure": [
        "name@email.com"
      ]
    }
  }
}

Picture4

The update endpoint is useful for minor modifications such as updating the workflow name, the notebook path, input parameters to tasks, the job schedule, or cluster configurations like node count, while the reset endpoint should be used for deleting existing tasks, redefining task dependencies, renaming tasks, and similar structural changes.
The update endpoint does not delete tasks or settings you omit, i.e., tasks not mentioned in the request remain unchanged, while the reset endpoint removes any fields or tasks not included in the request.

3. Trigger an Existing Job/Workflow:

Use the /api/2.1/jobs/run-now endpoint to trigger a job run on demand. Pass the input parameters to your notebook tasks using the notebook_params field.

curl -X POST https://<databricks-instance>/api/2.1/jobs/run-now \
  -H "Authorization: Bearer <DATABRICKS_TOKEN>" \
  -H "Content-Type: application/json" \
  -d '{
    "job_id": 947766456503851,
    "notebook_params": {
      "source_location": "s3://<bucket>/<key>"
    }
  }'

4. Get Job Status:

To check the status of a specific job run, use the /api/2.1/jobs/runs/get endpoint with the run_id. The response includes details about the run, including its life cycle state (e.g., PENDING, RUNNING, TERMINATED) and, once the run finishes, a result state (e.g., SUCCESS, FAILED).

curl -X GET \
  https://<databricks-instance>.cloud.databricks.com/api/2.1/jobs/runs/get?run_id=<your-run-id> \
  -H "Authorization: Bearer <Your-PAT>"

5. Delete Job:

To remove an existing Databricks workflow, simply call the DELETE /api/2.1/jobs/delete endpoint using the Jobs API. This allows you to programmatically clean up outdated or unnecessary jobs as part of your pipeline management strategy.

curl -X POST https://<databricks-instance>/api/2.1/jobs/delete \
  -H "Authorization: Bearer <DATABRICKS_PERSONAL_ACCESS_TOKEN>" \
  -H "Content-Type: application/json" \
  -d '{ "job_id": 947766456503851 }'

 

Conclusion:

The Databricks Jobs API empowers data engineers to orchestrate complex workflows natively, without relying on external scheduling tools. Whether you’re automating notebook runs, chaining multi-step pipelines, or integrating with CI/CD systems, the API offers fine-grained control and flexibility. By mastering this API, you’re not just building workflows—you’re building scalable, production-grade data pipelines that are easier to manage, monitor, and evolve.

]]>
https://blogs.perficient.com/2025/06/06/mastering-databricks-jobs-api-build-and-orchestrate-complex-data-pipelines/feed/ 0 382492