VPS Performance Monitoring: Tools and Best Practices

You cannot fix what you cannot see. Performance monitoring on a VPS lets you identify resource bottlenecks before they cause downtime, understand traffic patterns, and plan capacity upgrades with real data. This guide covers the full monitoring stack, from quick CLI tools for immediate investigation to long-term metrics with dashboards and alerting.

Essential CLI Tools for Real-Time Diagnostics

Before setting up persistent monitoring, master the CLI tools that give you immediate insight. `htop` is the essential process viewer: it shows CPU and memory per process and the load average, and lets you sort by resource consumption and kill processes interactively. Install it with `apt install htop`. Press `F6` to choose the sort column (for example CPU or memory) and `F9` to send a signal to the selected process.

For disk I/O investigation, `iotop -ao` shows accumulated disk I/O for the processes that are actually doing I/O (install with `apt install iotop`; it needs root). If your server is slow while CPU usage is low, the culprit is often I/O, and iotop identifies which process is hammering the disk. `iostat -x 1` (from the `sysstat` package) prints extended per-device statistics every second, including queue length and await time. As a rule of thumb, `%util` that stays close to 100% or `await` that stays well above 10 ms points to disk saturation. On SSD and NVMe devices, which serve requests in parallel, `%util` can overstate the load, so weigh `await` and queue length more heavily.

For network monitoring, `nethogs` shows bandwidth per process (install with `apt install nethogs`, run `nethogs eth0`, substituting your interface name from `ip link` if it is not `eth0`). `ss -s` gives socket statistics including TCP connection counts by state. Large numbers of TIME_WAIT connections mean your server is handling many short-lived connections, which is normal for busy web servers. `iftop` shows per-connection bandwidth in real time, which is useful for identifying which remote IP is consuming the most bandwidth.

These tools are invaluable for immediate triage; for historical analysis you need persistent metrics collection.
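When `iostat -x` output scrolls past quickly, a small filter can help pick out the busy devices. The following is a minimal sketch rather than a standard tool: the function name `flag_busy_devices` and the default threshold of 90 are assumptions made for illustration. It locates the `%util` column by name, because column positions vary between sysstat versions.

```shell
# Print "device %util" for every device in `iostat -x` output (read from
# stdin) whose %util exceeds the given threshold.
# flag_busy_devices and the default of 90 are hypothetical choices.
flag_busy_devices() {
  awk -v limit="${1:-90}" '
    $1 == "Device" || $1 == "Device:" {
      # Header row: remember which column holds %util for the rows below.
      col = 0
      for (i = 1; i <= NF; i++) if ($i == "%util") col = i
      next
    }
    # Data row: rows with fewer fields (such as the avg-cpu line) are skipped.
    col && NF >= col && ($col + 0) > (limit + 0) { print $1, $col }
  '
}

# Usage on a live system (requires the sysstat package):
#   LC_ALL=C iostat -x 1 3 | flag_busy_devices 90
```

Running it with `LC_ALL=C` keeps the decimal separator as a dot, which the numeric comparison relies on. A device that appears in every sample is worth a closer look with `iotop -ao`.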

Setting Up Prometheus and Node Exporter

Prometheus is a widely adopted open-source standard for time-series metrics collection. Install Node Exporter on your VPS to expose system metrics: download a release from `https://github.com/prometheus/node_exporter/releases`, extract it, and create a systemd service. Node Exporter exposes several hundred metric series on port 9100 (the exact count depends on the host and the enabled collectors), covering CPU, memory, disk, network, and filesystems. Create `/etc/systemd/system/node_exporter.service` with `ExecStart` pointing to your node_exporter binary. The systemd collector is disabled by default, so add the `--collector.systemd` flag if you also want systemd unit states.

Restrict port 9100 via firewall to only your Prometheus server: `ufw allow from PROMETHEUS_IP to any port 9100 proto tcp`. Never expose Node Exporter to the public internet, as it leaks detailed server information.

Install Prometheus on a separate monitoring server (or the same server for small setups) and configure a scrape job for your VPS: in `prometheus.yml`, add a job whose `static_configs` target is `your-vps-ip:9100`. The example configuration shipped with Prometheus sets `scrape_interval` to 15 seconds (the built-in default is 1 minute if you omit it), and Prometheus keeps data in its TSDB for 15 days unless you change `--storage.tsdb.retention.time`.

Set up alerting rules: CPU above 90% for 5 minutes, available memory below 10%, disk usage above 85%, and any service entering the failed state. Alertmanager handles routing alerts to Slack, email, or PagerDuty.
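The pieces described above can be sketched as three small files. This is one possible layout under stated assumptions, not a definitive setup: the binary path `/usr/local/bin/node_exporter`, the `node_exporter` service user, the rules path `/etc/prometheus/rules/node_alerts.yml`, the alert names, and every `for:` duration other than the 5-minute CPU window are illustrative choices. The script only stages the files in a scratch directory (`OUT_DIR` is a hypothetical variable) so you can review them before copying anything into place.

```shell
# Stage config files for review; nothing is installed by this script.
OUT_DIR="${OUT_DIR:-$(mktemp -d)}"
mkdir -p "$OUT_DIR"

# systemd unit: assumes the binary path and a dedicated node_exporter user.
cat > "$OUT_DIR/node_exporter.service" <<'EOF'
[Unit]
Description=Prometheus Node Exporter
After=network-online.target
Wants=network-online.target

[Service]
User=node_exporter
Group=node_exporter
ExecStart=/usr/local/bin/node_exporter --collector.systemd
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF

# Prometheus config fragment: one scrape job plus a pointer to the rules file.
cat > "$OUT_DIR/prometheus.yml" <<'EOF'
global:
  scrape_interval: 15s
rule_files:
  - /etc/prometheus/rules/node_alerts.yml
scrape_configs:
  - job_name: node
    static_configs:
      - targets: ['your-vps-ip:9100']
EOF

# Alert rules for the four conditions listed in the text.
cat > "$OUT_DIR/node_alerts.yml" <<'EOF'
groups:
  - name: node_alerts
    rules:
      - alert: HighCpuUsage
        expr: 100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 90
        for: 5m
      - alert: LowAvailableMemory
        expr: node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes * 100 < 10
        for: 5m
      - alert: DiskAlmostFull
        expr: (1 - node_filesystem_avail_bytes{fstype!~"tmpfs|overlay"} / node_filesystem_size_bytes{fstype!~"tmpfs|overlay"}) * 100 > 85
        for: 10m
      - alert: SystemdUnitFailed
        expr: node_systemd_unit_state{state="failed"} == 1
        for: 1m
EOF

echo "Staged files in $OUT_DIR"
# Next steps on the real hosts (run manually, as root):
#   promtool check rules "$OUT_DIR/node_alerts.yml"
#   cp "$OUT_DIR/node_exporter.service" /etc/systemd/system/
#   systemctl daemon-reload && systemctl enable --now node_exporter
```

The `SystemdUnitFailed` rule only returns data when Node Exporter runs with `--collector.systemd`. Routing the resulting alerts still requires an `alerting` section in `prometheus.yml` that points at your Alertmanager instance, which is left out here.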

Grafana Dashboards and Alerting

Grafana visualizes Prometheus metrics in configurable dashboards. Install Grafana on your monitoring server by adding the Grafana apt repository and installing the package with apt. Then open Grafana's UI at `http://grafana-ip:3000` and add Prometheus as a data source. Import the Node Exporter Full dashboard (ID 1860 from grafana.com/grafana/dashboards); it provides a comprehensive pre-built view of system metrics without any manual panel creation.

Create custom dashboards for your specific services. For a web server, track `nginx_http_requests_total` from the Nginx Prometheus exporter, along with response time percentiles and upstream latency if your exporter or log pipeline exposes them. For databases, use `postgres_exporter` or `mysqld_exporter` to expose query rates, connection counts, replication lag, and slow query counts. Custom dashboards focused on your actual stack are far more actionable than generic system dashboards.

Configure Grafana alerts from dashboard panels or from the Alerting section. Set thresholds and notification targets (Slack webhook, email, Telegram bot). Grafana 9+ uses a unified alerting system that can fire on missing data as well as threshold breaches, which is critical for detecting when your metrics collection itself has failed. For weekly capacity reports, use Grafana's snapshot feature or a reporting feature or plugin to email dashboard screenshots to stakeholders every Monday.
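As an alternative to adding the data source by hand in the UI, Grafana can load it from a provisioning file at startup, which keeps the setup reproducible. The sketch below is a swapped-in technique, not the UI flow described above, and it assumes Prometheus runs on the same host as Grafana on its default port 9090. The file name and `OUT_DIR` variable are hypothetical.

```shell
# Stage a Grafana data source provisioning file for review.
OUT_DIR="${OUT_DIR:-$(mktemp -d)}"
mkdir -p "$OUT_DIR"

# Assumes Prometheus is reachable from Grafana at http://localhost:9090.
cat > "$OUT_DIR/prometheus-datasource.yml" <<'EOF'
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://localhost:9090
    isDefault: true
EOF

echo "Staged $OUT_DIR/prometheus-datasource.yml"
# To apply on the monitoring server (run manually, as root):
#   cp "$OUT_DIR/prometheus-datasource.yml" /etc/grafana/provisioning/datasources/
#   systemctl restart grafana-server
```

After the restart, the data source should appear in Grafana's data source list, and the Node Exporter Full dashboard can be imported against it.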
