Monitoring Sidekiq with email and SMS alerts

Sidekiq is a simple and efficient background processing tool for Ruby on Rails apps. You should use it for tasks that take too long to put in a controller or tasks that need to run on a schedule. Common examples are sending emails, generating PDFs and connecting to other services through an API.

Like all things in IT, Sidekiq can crash, get slow or need more capacity. And for some applications that really should not happen. Because nobody interacts with Sidekiq directly, it can be a long time before someone notices that something is wrong.

A good solution to quickly restart (kick, hehe) a crashed Sidekiq is to have a watchdog on the server. Both systemd and upstart can do that for you, and there are a lot of other watchdogs you can install. But I’d still like to know when something happened, or when an edge case comes up that the watchdog cannot fix. The solution? Monitoring Sidekiq from the outside.

In this short tutorial we’ll be using updown.io to check if Sidekiq is still running smoothly. To do that we’ll add a route and a simple controller method that connects to Sidekiq and outputs a text that updown.io can check for. You can use any other website monitoring tool that can check for a text on a page.

1. Add a route

Add this route to config/routes.rb. This will route requests for yourdomain.com/health_checks to the sidekiq method in health_checks_controller.rb that we’ll create in the next step.

get 'health_checks' => 'health_checks#sidekiq'

2. Create the controller

Put the following code in a new file, app/controllers/health_checks_controller.rb:

class HealthChecksController < ApplicationController
  # only if you use Pundit
  before_action :skip_authorization

  def sidekiq
    require 'sidekiq/api'

    latency = Sidekiq::Queue.new.latency
    stats = Sidekiq::Stats.new

    render(plain: 'No processes',
      status: :service_unavailable) && return if stats.processes_size == 0
    render(plain: "Too many enqueued (#{stats.enqueued})",
      status: :service_unavailable) && return if stats.enqueued > 250
    render(plain: "Latency more than 10 minutes (#{latency})",
      status: :service_unavailable) && return if latency > 600
    render plain: 'Sidekiq is alive and kicking',
      status: :ok
  end
end

You can probably make updown.io log in to your app and add a user for it, but to me that seems like a waste of resources and useless complexity. So I just made the /health_checks page public. There is no security risk: it’s just a text on a page that nobody knows about.

Because the page has to be public you have to tell Pundit to skip authorization. If you don’t use Pundit, just delete that line.

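The code inside the sidekiq method checks if Sidekiq is alive, if there aren’t too many jobs in the queue and if the latency isn’t more than 10 minutes. Latency here is the number of seconds the oldest job in the queue has been waiting to be picked up, and Sidekiq::Queue.new without arguments looks at the default queue only. You can add more checks if you have different problems.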
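If you want to test the thresholds without a running Sidekiq, the three checks can be pulled out into a plain Ruby method. This is only a minimal sketch: the method name sidekiq_health, its keyword arguments and the two constants are hypothetical names of my own choosing, and the limits are simply the ones used in the controller above.

```ruby
# Hypothetical helper, not part of Sidekiq. It takes the three numbers the
# controller reads from Sidekiq's API and returns [http_status, message].
MAX_ENQUEUED = 250          # same limit as in the controller
MAX_LATENCY_SECONDS = 600   # 10 minutes

def sidekiq_health(processes:, enqueued:, latency:)
  return [:service_unavailable, 'No processes'] if processes.zero?
  return [:service_unavailable, "Too many enqueued (#{enqueued})"] if enqueued > MAX_ENQUEUED
  return [:service_unavailable, "Latency more than 10 minutes (#{latency})"] if latency > MAX_LATENCY_SECONDS

  [:ok, 'Sidekiq is alive and kicking']
end
```

In the controller you would then pass in stats.processes_size, stats.enqueued and the queue latency, and hand the returned status and message to render.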

The text is a minimal explanation that will be emailed to you by updown.io if the “Sidekiq is alive and kicking” text is not visible. That gives you an indication of how urgent the problem is.

3. Set up the updown.io check

I’ve written about updown.io before here: Simple downtime alerts for your Rails app

Follow the steps in that post to create an account and add a check that verifies that “https://www.yourdomain.com/health_checks” displays “Sidekiq is alive”. You can use another monitoring tool if you already have one that you like and trust.
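To try the page from your own machine before you set up the monitor, you can mimic roughly what such a check does: fetch the URL and look for the expected text. This is a rough sketch and not how updown.io itself is implemented; check_passes? and check_url are hypothetical helper names, and the URL is the placeholder from this post.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: the check passes when the response is a 200 and the
# body contains the expected text.
def check_passes?(status_code, body, expected_text)
  status_code.to_i == 200 && body.to_s.include?(expected_text)
end

# Fetches the page and applies the check above.
def check_url(url, expected_text)
  response = Net::HTTP.get_response(URI(url))
  check_passes?(response.code, response.body, expected_text)
end

# Example call (needs network access and your real domain, so it is
# left commented out):
# puts check_url('https://www.yourdomain.com/health_checks', 'Sidekiq is alive')
```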

[Screenshot: What it looks like in the browser.]

Surface Interval doesn’t use Sidekiq yet, but I’ve added the check so you can see what it looks like: https://www.surfaceinterval.co/health_checks

4. Add more checks

If you want to monitor something else, just add another route and another method to the controller. Or if you want to be cheap and save on checks, add it to the same method.
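If you go the cheap route and put several checks in one method, one way to keep it readable is to list the checks by name and report the first one that fails. This is just a sketch of that idea: first_failure is a hypothetical helper, and the two lambdas are placeholders that always pass, standing in for your real checks.

```ruby
# Hypothetical helper: takes a hash of name => callable, where each callable
# returns true when that part of the system is healthy. Returns a message
# for the first failing check, or nil when everything passes.
def first_failure(checks)
  checks.each do |name, check|
    return "#{name} failed" unless check.call
  end
  nil
end

checks = {
  'Sidekiq' => -> { true },     # placeholder for the Sidekiq checks from step 2
  'Other check' => -> { true }  # placeholder for whatever else you monitor
}

message = first_failure(checks) || 'Everything is alive and kicking'
puts message
```

The monitor then only needs to look for the one success text, and the failure message tells you which check tripped.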

I found no need to add a separate check for Redis, because Sidekiq will fail immediately if there is something wrong with Redis.