Serverless & Edge Computing

Serverless Monitoring & Observability

Serverless applications are inherently distributed: a single user request might touch an API Gateway, three Lambda functions, DynamoDB, SQS, and an external API. Monitoring built around long-lived hosts and agents cannot follow a request across these managed services. We implement comprehensive observability for serverless applications using CloudWatch metrics, X-Ray distributed tracing, structured logging, and custom dashboards that give you full visibility into your system's health, performance, and cost.

Need this done for your project?

We implement, you ship. Async, documented, done in days.


Structured Logging with Powertools

We implement structured JSON logging using AWS Lambda Powertools, which provides consistent log format, correlation IDs, and log level management across all functions.

// Powertools logger setup
import { Logger } from '@aws-lambda-powertools/logger';
import { Tracer } from '@aws-lambda-powertools/tracer';
import { Metrics, MetricUnit } from '@aws-lambda-powertools/metrics';
import type { APIGatewayProxyEventV2 } from 'aws-lambda'; // types from @types/aws-lambda

const logger = new Logger({
  serviceName: 'order-service',
  logLevel: process.env.LOG_LEVEL || 'INFO',
  persistentLogAttributes: {
    environment: process.env.ENVIRONMENT,
    version: process.env.APP_VERSION,
  },
});

const tracer = new Tracer({ serviceName: 'order-service' });
const metrics = new Metrics({ namespace: 'OrderService' });

export const handler = async (event: APIGatewayProxyEventV2) => {
  // Correlation ID from API Gateway
  const correlationId = event.requestContext.requestId;
  logger.appendKeys({ correlationId });

  logger.info('Processing order request', {
    path: event.rawPath,
    method: event.requestContext.http.method,
  });

  try {
    const order = await createOrder(event);
    
    metrics.addMetric('OrderCreated', MetricUnit.Count, 1);
    metrics.addMetric('OrderValue', MetricUnit.NoUnit, order.total); // NoUnit is published as "None"
    
    logger.info('Order created successfully', { orderId: order.id });
    return { statusCode: 201, body: JSON.stringify(order) };
  } catch (err) {
    logger.error('Order creation failed', { error: err as Error });
    metrics.addMetric('OrderFailed', MetricUnit.Count, 1);
    return { statusCode: 500, body: 'Internal error' };
  } finally {
    metrics.publishStoredMetrics();
  }
};

// Log output (JSON):
// {
//   "level": "INFO",
//   "message": "Order created successfully",
//   "service": "order-service",
//   "timestamp": "2025-06-15T10:30:00.000Z",
//   "correlationId": "abc-123",
//   "orderId": "ord_789",
//   "xray_trace_id": "1-abc-def"
// }

Structured logs are queryable with CloudWatch Logs Insights. When each function forwards the correlation ID to the next hop (for example as an SQS message attribute) and appends it to its own logger, a single Insights query on that ID returns the complete request across services.
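The correlation ID does not cross an SQS queue on its own; the producer has to attach it and the consumer has to read it back. A minimal sketch of both sides, assuming the ID travels as a String message attribute named `correlationId` (a name chosen for this example, not an AWS convention). The local interfaces mirror only the fields used here; a real project would use `SendMessageCommandInput` from `@aws-sdk/client-sqs` and `SQSRecord` from `@types/aws-lambda`. Note that the Lambda SQS event uses lower-camel attribute fields (`stringValue`, `dataType`).

```typescript
// Shape of the SendMessage input we build (subset of SendMessageCommandInput).
interface OutgoingMessage {
  QueueUrl: string;
  MessageBody: string;
  MessageAttributes: Record<string, { DataType: string; StringValue: string }>;
}

// Shape of one record in the Lambda SQS event (subset of SQSRecord).
interface IncomingRecord {
  messageId: string;
  body: string;
  messageAttributes: Record<string, { dataType: string; stringValue?: string }>;
}

// Producer side: attach the correlation ID as a String message attribute.
function buildMessage(queueUrl: string, correlationId: string, payload: unknown): OutgoingMessage {
  return {
    QueueUrl: queueUrl,
    MessageBody: JSON.stringify(payload),
    MessageAttributes: {
      correlationId: { DataType: 'String', StringValue: correlationId },
    },
  };
}

// Consumer side: read the attribute back. Fall back to the SQS message ID so
// every log line still carries an identifier when the producer did not set one.
function correlationIdFrom(record: IncomingRecord): string {
  return record.messageAttributes['correlationId']?.stringValue ?? record.messageId;
}
```

In the consumer, the result would be passed to `logger.appendKeys({ correlationId: correlationIdFrom(record) })` before processing each record, so its log lines join the same Insights query as the API handler's.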

X-Ray Distributed Tracing

AWS X-Ray traces requests across Lambda, API Gateway REST APIs, DynamoDB, SQS, and outgoing HTTP calls. We enable active tracing on every service that supports it and add custom subsegments for business logic.

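# Enable active tracing on every Lambda function
# (the execution role also needs X-Ray write access, e.g. the AWSXRayDaemonWriteAccess managed policy)
resource "aws_lambda_function" "order_handler" {
  # ... other config
  tracing_config {
    mode = "Active"
  }
}

# HTTP APIs (API Gateway v2) do not support X-Ray tracing; detailed metrics add
# per-route CloudWatch metrics instead. For REST APIs, set
# xray_tracing_enabled = true on aws_api_gateway_stage.
resource "aws_apigatewayv2_stage" "prod" {
  # ... other config
  default_route_settings {
    detailed_metrics_enabled = true
  }
}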

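// Custom subsegments in code
import { Tracer } from '@aws-lambda-powertools/tracer';
import { DynamoDBClient } from '@aws-sdk/client-dynamodb';

// captureHTTPsRequests is on by default: outgoing HTTP(S) calls are traced automatically
const tracer = new Tracer({ serviceName: 'order-service' });
// Wrap the AWS SDK v3 client once, at module scope, so every DynamoDB call is traced
const dynamoClient = tracer.captureAWSv3Client(new DynamoDBClient({}));

// Order, validateInventory, and callPaymentGateway are application code (not shown)
async function processOrder(order: Order) {
  const parent = tracer.getSegment();
  // Custom subsegment for business logic; make it current so nested calls attach to it
  const subsegment = parent?.addNewSubsegment('validateOrder');
  if (subsegment) tracer.setSegment(subsegment);
  try {
    await validateInventory(order.items);
    subsegment?.addAnnotation('orderValue', order.total); // indexed, usable in trace filters
    subsegment?.addMetadata('items', order.items);        // stored with the trace, not indexed
  } catch (err) {
    subsegment?.addError(err as Error);
    throw err;
  } finally {
    subsegment?.close();
    if (parent) tracer.setSegment(parent); // restore the parent segment
  }

  // External HTTPS call: traced automatically by the Tracer
  await callPaymentGateway(order);
}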

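The X-Ray service map gives you a visual topology of your serverless application, showing latency and error rates for every service interaction. In production we configure sampling rules that trace a small reservoir of requests each second plus a fixed percentage (for example 5%) of the rest, to control costs, and we raise the rate (up to 100%) on affected routes when errors are detected. Because X-Ray decides whether to sample a request when it enters the system, before its outcome is known, errors in unsampled requests remain visible through the structured logs and error metrics.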
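The sampling rule model of a per-second reservoir followed by a fixed rate can be illustrated with a small simulation. This is a simplified sketch of the decision logic only, not the X-Ray SDK: the real service shares reservoir quota across all instances, and the class and parameter names here are hypothetical. The random source is injectable so the behavior can be checked deterministically.

```typescript
// Simplified X-Ray-style sampling rule: trace the first `reservoirPerSecond`
// requests in each one-second window, then a `fixedRate` fraction of the rest.
class SamplingRule {
  private currentSecond = -1;
  private usedThisSecond = 0;
  private readonly reservoirPerSecond: number;
  private readonly fixedRate: number; // 0.05 means 5%
  private readonly random: () => number;

  constructor(reservoirPerSecond: number, fixedRate: number, random: () => number = Math.random) {
    this.reservoirPerSecond = reservoirPerSecond;
    this.fixedRate = fixedRate;
    this.random = random;
  }

  // Called when a request arrives, before its outcome (success or error) is known.
  shouldSample(nowMs: number): boolean {
    const second = Math.floor(nowMs / 1000);
    if (second !== this.currentSecond) {
      // New one-second window: the reservoir refills.
      this.currentSecond = second;
      this.usedThisSecond = 0;
    }
    if (this.usedThisSecond < this.reservoirPerSecond) {
      this.usedThisSecond++;
      return true;
    }
    return this.random() < this.fixedRate;
  }
}
```

With `new SamplingRule(1, 0.05)`, a burst of 100 requests within one second traces the first request and roughly 5% of the remaining 99. Because `shouldSample` runs before the handler, it cannot favor failing requests, which is why errors are also tracked through logs and metrics.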

CloudWatch Dashboards & Alarms

We build operational dashboards that show the health of your entire serverless application at a glance.

resource "aws_cloudwatch_dashboard" "serverless" {
  dashboard_name = "${var.project}-${var.env}"
  dashboard_body = jsonencode({
    widgets = [
      {
        type   = "metric"
        x      = 0
        y      = 0
        width  = 12
        height = 6
        properties = {
          title   = "API Latency (p50/p95/p99)"
          metrics = [
            ["AWS/ApiGateway", "Latency", "ApiId", var.api_id, { stat = "p50" }],
            ["...", { stat = "p95" }],
            ["...", { stat = "p99" }],
          ]
          period = 60
          region = var.region
        }
      },
      {
        type   = "metric"
        width  = 12
        height = 6
        properties = {
          title   = "Lambda Errors by Function"
          metrics = [
            for fn in var.function_names :
            ["AWS/Lambda", "Errors", "FunctionName", fn, { stat = "Sum" }]
          ]
          period = 300
          region = var.region # metric widgets need a region
        }
      },
      {
        type   = "log"
        width  = 24
        height = 6
        properties = {
          title  = "Recent Errors"
          # Log widgets name their log groups with SOURCE; Lambda logs to /aws/lambda/<function-name>
          query  = "${join(" | ", [for fn in var.function_names : "SOURCE '/aws/lambda/${fn}'"])} | fields @timestamp, @message | filter @message like /ERROR/ | sort @timestamp desc | limit 20"
          region = var.region
          stacked = false
          view    = "table"
        }
      }
    ]
  })
}

# Composite alarm: triggers only when both child alarms are in ALARM.
# high_error_rate and high_latency are aws_cloudwatch_metric_alarm resources defined elsewhere.
resource "aws_cloudwatch_composite_alarm" "api_degraded" {
  alarm_name = "${var.project}-api-degraded-${var.env}"
  alarm_rule = "ALARM(${aws_cloudwatch_metric_alarm.high_error_rate.alarm_name}) AND ALARM(${aws_cloudwatch_metric_alarm.high_latency.alarm_name})"
  alarm_actions = [aws_sns_topic.alerts.arn]
}

Composite alarms reduce alert fatigue by triggering only when multiple conditions are true simultaneously — a spike in both error rate and latency indicates a real problem, not just a transient hiccup. We configure PagerDuty or OpsGenie integration for critical alarms and Slack for warnings.
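One way to split critical and warning alarms is a small SNS-subscribed function that inspects each alarm notification. When CloudWatch publishes an alarm to SNS, the message body is JSON with fields such as `AlarmName`, `NewStateValue`, and `NewStateReason`, which arrives in Lambda as `event.Records[i].Sns.Message`. The naming convention below (critical alarms contain `-critical-` or are the composite `-api-degraded-` alarm) is a hypothetical assumption for this sketch. Sending to PagerDuty or Slack is left out; the function only decides the destination.

```typescript
// Subset of the JSON CloudWatch alarms publish to SNS.
interface AlarmNotification {
  AlarmName: string;
  NewStateValue: 'ALARM' | 'OK' | 'INSUFFICIENT_DATA';
  NewStateReason: string;
}

type Destination = 'pagerduty' | 'slack' | 'drop';

// Hypothetical naming convention: these substrings mark an alarm as critical.
const CRITICAL_MARKERS = ['-critical-', '-api-degraded-'];

function routeAlarm(snsMessage: string): Destination {
  const alarm = JSON.parse(snsMessage) as AlarmNotification;
  // Only transitions into ALARM are routed; recoveries and missing-data changes are dropped.
  if (alarm.NewStateValue !== 'ALARM') return 'drop';
  const critical = CRITICAL_MARKERS.some((marker) => alarm.AlarmName.includes(marker));
  return critical ? 'pagerduty' : 'slack';
}
```

A common alternative is two SNS topics (critical and warning) with the severity chosen in Terraform through `alarm_actions`. The router above trades that static wiring for a single topic and a naming rule.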

Custom Metrics & Business KPIs

Beyond infrastructure metrics, we publish custom business metrics that let you monitor product health alongside system health.
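Powertools Metrics publishes without calling the CloudWatch `PutMetricData` API. It writes CloudWatch Embedded Metric Format (EMF) JSON to the function's log output, and CloudWatch Logs extracts the metrics from it. The dependency-free sketch below builds one such record by hand to show the format. The function name and arguments are hypothetical; in practice Powertools generates this record for you.

```typescript
interface MetricValue {
  name: string;
  unit: string; // e.g. 'Count', 'Milliseconds', 'None'
  value: number;
}

// Build one EMF log line. Metric values and dimension values sit at the top
// level of the record; the _aws block tells CloudWatch which keys are metrics.
function buildEmfRecord(
  namespace: string,
  dimensions: Record<string, string>,
  metrics: MetricValue[],
  metadata: Record<string, unknown> = {},
  timestampMs: number = Date.now(),
): string {
  const record: Record<string, unknown> = {
    _aws: {
      Timestamp: timestampMs,
      CloudWatchMetrics: [
        {
          Namespace: namespace,
          // A single dimension set containing every dimension key
          Dimensions: [Object.keys(dimensions)],
          Metrics: metrics.map((m) => ({ Name: m.name, Unit: m.unit })),
        },
      ],
    },
    ...metadata, // extra fields: searchable in Logs Insights, but not dimensions
    ...dimensions,
  };
  for (const m of metrics) record[m.name] = m.value;
  return JSON.stringify(record);
}
```

Printing the result with `console.log` inside a Lambda function is enough for CloudWatch to create the metric. Every key listed in `Dimensions` becomes part of the metric's identity, which matters when an alarm later queries it.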

import { Metrics, MetricUnit } from '@aws-lambda-powertools/metrics';

const metrics = new Metrics({
  namespace: 'MyApp/Business',
  defaultDimensions: {
    environment: process.env.ENVIRONMENT!,
    service: 'order-service',
  },
});

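// Business metrics published alongside Lambda execution
// (tenant, order, and duration come from the surrounding handler code)
metrics.addMetric('OrderPlaced', MetricUnit.Count, 1);
metrics.addMetric('OrderValue', MetricUnit.NoUnit, order.total);
metrics.addMetric('CheckoutDuration', MetricUnit.Milliseconds, duration);
// Per-plan breakdown as metadata: a searchable log field, not a dimension.
// addDimension('plan', ...) would move every metric above into a per-plan
// dimension set, and the environment/service-level alarm below would see no data.
metrics.addMetadata('plan', tenant.plan);
metrics.publishStoredMetrics(); // flush the EMF record before the handler returns

// CloudWatch Insights query for business reporting
// (EMF field names are case-sensitive and match the metric names)
// filter ispresent(OrderPlaced)
// | stats
//     sum(OrderPlaced) as orders,
//     sum(OrderValue) as revenue,
//     avg(CheckoutDuration) as avgCheckoutMs
//   by bin(1h), plan

# Anomaly detection on business metrics
resource "aws_cloudwatch_metric_alarm" "order_volume_anomaly" {
  alarm_name          = "order-volume-anomaly"
  comparison_operator = "LessThanLowerOrGreaterThanUpperThreshold"
  evaluation_periods  = 2
  threshold_metric_id = "ad1"
  
  metric_query {
    id          = "m1"
    return_data = true
    metric {
      metric_name = "OrderPlaced"
      namespace   = "MyApp/Business"
      period      = 3600
      stat        = "Sum"
      # Must match the published dimensions exactly, or the alarm sees no data
      dimensions = {
        environment = var.env # same value as the function's ENVIRONMENT variable
        service     = "order-service"
      }
    }
  }
  
  metric_query {
    id          = "ad1"
    expression  = "ANOMALY_DETECTION_BAND(m1, 2)" # band width: 2 standard deviations
    label       = "Order Volume (expected)"
    return_data = true
  }
  
  alarm_actions = [aws_sns_topic.alerts.arn]
}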

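Anomaly detection on business metrics catches problems that static threshold alarms miss. CloudWatch trains a model on the metric's history, including its daily and weekly patterns, and draws an expected band around it. A 40% drop in order volume compared with the same time last week will typically fall outside that band, and the alarm above fires after two consecutive hourly periods out of range. We publish these metrics to a unified dashboard where you see infrastructure health and business KPIs side by side.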
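The idea of an expected band can be shown with a deliberately simple stand-in: the mean plus or minus k standard deviations of the same hour in previous weeks. This is NOT CloudWatch's anomaly detection algorithm, which trains its own model on the metric's history. The sketch only illustrates why a sharp drop lands outside the band while normal variation does not. The `k` parameter plays the role of the `2` in `ANOMALY_DETECTION_BAND(m1, 2)`, and the sample counts are hypothetical.

```typescript
interface Band {
  lower: number;
  upper: number;
}

// Band = mean ± k * (population) standard deviation of the historical values.
function bandFromHistory(history: number[], k: number): Band {
  if (history.length === 0) throw new Error('need at least one historical value');
  const mean = history.reduce((a, b) => a + b, 0) / history.length;
  const variance = history.reduce((a, b) => a + (b - mean) ** 2, 0) / history.length;
  const sd = Math.sqrt(variance);
  return { lower: mean - k * sd, upper: mean + k * sd };
}

// Mirrors comparison_operator = "LessThanLowerOrGreaterThanUpperThreshold".
function isAnomalous(value: number, band: Band): boolean {
  return value < band.lower || value > band.upper;
}
```

With hourly order counts of 100, 110, 95, and 105 for the same weekday and hour over four weeks, the mean is 102.5 and the band with `k = 2` is roughly 91.3 to 113.7. A drop of about 40% to 60 orders falls outside it, while 100 orders stays inside.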

Why Anubiz Engineering

100% async — no calls, no meetings

Delivered in days, not weeks

Full documentation included

Production-grade from day one

Security-first approach

Post-delivery support included

Ready to get started?

Skip the research. Tell us what you need, and we'll scope it, implement it, and hand it back — fully documented and production-ready.
