From Manual to Automated: CPU Remediation in 5 Minutes

Alberto Moraga
January 20, 2025

Step-by-step guide to setting up automated CPU monitoring and remediation using Stavily's plugin system. Real-world example from a Nordic fintech.

This guide shows how to set up automated CPU monitoring and remediation using Stavily's plugin system. It follows a real-world example from a Nordic fintech company, "Nordic Finance," which reduced CPU-related incidents by 60% and freed its engineers from manual monitoring and service restarts.

The Challenge:

Nordic Finance, a provider of online lending solutions, was experiencing frequent CPU spikes on their production servers. Their engineering team was spending hours manually monitoring CPU usage and restarting affected services. This manual process was time-consuming, error-prone, and disruptive to their customers.

The Solution:

Nordic Finance implemented Stavily and leveraged its plugin system to automate CPU monitoring and remediation. They deployed the following Stavily plugins:

  • Prometheus Metrics Trigger: This plugin monitored CPU usage on their servers using Prometheus metrics.
  • System Command Action: This plugin restarted affected services when CPU usage exceeded a predefined threshold.
  • Slack Notification Output: This plugin sent notifications to a dedicated Slack channel when a CPU-related incident occurred.
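
The hand-off between the three plugins can be sketched in plain Python. This is not Stavily's plugin API, which this guide does not show. The names `run_pipeline`, `cpu_trigger`, `restart_action`, and `slack_output`, and the service name `api`, are hypothetical and only illustrate how a trigger, an action, and an output chain together.

```python
from typing import Callable, Optional

# Hypothetical stage signatures; Stavily's real plugin interfaces may differ.
Trigger = Callable[[], Optional[dict]]  # returns an event, or None if nothing fired
Action = Callable[[dict], str]          # remediates and returns a short result
Output = Callable[[dict, str], None]    # reports the event and the result


def run_pipeline(trigger: Trigger, action: Action, output: Output) -> bool:
    """Run one trigger -> action -> output pass. Returns True if the trigger fired."""
    event = trigger()
    if event is None:
        return False
    result = action(event)
    output(event, result)
    return True


# Stand-ins for the three plugins described above.
def cpu_trigger() -> Optional[dict]:
    cpu = 95.0  # in practice this value would come from Prometheus
    return {"service": "api", "cpu": cpu} if cpu > 90.0 else None


def restart_action(event: dict) -> str:
    return f"restarted {event['service']}"  # a real action would run a command


messages: list[str] = []


def slack_output(event: dict, result: str) -> None:
    # A real output would post to Slack; here the message is only collected.
    messages.append(f"CPU at {event['cpu']}%: {result}")


fired = run_pipeline(cpu_trigger, restart_action, slack_output)
```

Keeping each stage behind a small function boundary is what lets one piece be swapped, for example a different notification channel, without touching the other two.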

Step-by-Step Implementation Guide:

  1. Install the Prometheus Metrics Trigger: Install the Prometheus Metrics Trigger plugin in Stavily and configure it to monitor the cpu_usage_percent metric.
  2. Set the CPU Usage Threshold: Set the CPU usage threshold to 90%, so that the trigger fires whenever CPU usage exceeds that value.
  3. Install the System Command Action: Install the System Command Action plugin in Stavily and configure it to restart the affected service.
  4. Configure the Slack Notification Output: Configure the Slack Notification Output plugin to send notifications to a dedicated Slack channel when a CPU-related incident occurs.
  5. Create a Workflow: Create a workflow in Stavily that uses the Prometheus Metrics Trigger to monitor CPU usage, the System Command Action to restart affected services, and the Slack Notification Output to send notifications.
  6. Deploy the Workflow: Deploy the workflow to your production servers.
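
The logic behind steps 1 to 4 can be sketched in Python as follows. This is a minimal illustration, not Stavily configuration: the function names, the `service` label, the sample values, and the use of `systemctl` are all assumptions. The only external formats relied on are the JSON shape of a Prometheus instant query response and the `text` field of a Slack incoming-webhook payload.

```python
import json

CPU_THRESHOLD = 90.0  # percent, as in step 2


def services_over_threshold(prom_response: dict, threshold: float = CPU_THRESHOLD) -> dict:
    """Map service name -> CPU % for every sample strictly above the threshold.

    Expects the JSON body of a Prometheus instant query for cpu_usage_percent.
    Assumes each series carries a 'service' label (hypothetical).
    """
    over = {}
    for sample in prom_response["data"]["result"]:
        _timestamp, raw_value = sample["value"]  # Prometheus encodes values as strings
        value = float(raw_value)
        if value > threshold:
            over[sample["metric"]["service"]] = value
    return over


def restart_command(service: str) -> list:
    """Command a System Command Action might run; systemctl is an assumption."""
    return ["systemctl", "restart", service]


def slack_payload(service: str, cpu: float) -> str:
    """JSON body for a Slack incoming webhook, which accepts a 'text' field."""
    return json.dumps({"text": f"CPU at {cpu:.1f}% on {service}; service restarted."})


# Example response with made-up sample values.
response = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"service": "lending-api"}, "value": [1737367200, "93.5"]},
            {"metric": {"service": "auth"}, "value": [1737367200, "41.2"]},
        ],
    },
}

for service, cpu in services_over_threshold(response).items():
    print(restart_command(service), slack_payload(service, cpu))
```

In a real deployment the command list could be handed to `subprocess.run(..., check=True)` and the payload posted to the webhook URL. Both are left out here so the sketch has no side effects.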

The Results:

By implementing Stavily and its plugin system, Nordic Finance achieved the following results:

  • 60% Reduction in CPU-Related Incidents: The number of CPU-related incidents dropped by 60% after the workflow was deployed.
  • Reduced Manual Effort: Automation replaced the hours the engineering team had spent manually watching CPU usage and restarting services.
  • Improved Uptime: Services were restarted automatically when CPU usage exceeded the predefined threshold, which improved their uptime.
  • Faster Incident Response: The workflow restarted the affected service and notified the Slack channel as soon as the threshold was crossed, without waiting for an engineer to notice the spike.

Key Takeaways:

This case study shows what automated CPU monitoring and remediation can deliver. With Stavily's plugin system, SMBs can reduce CPU-related incidents, improve uptime, and speed up incident response, leaving their engineers free to focus on more strategic work.
