Migrating from Graphite to InfluxDB

Metrics platforms have evolved over the decades.

Before Graphite there were RRDtool, Nagios, Munin, and Cacti. Graphite was released in 2008 and was viewed as a major step forward, but, as always, technology kept evolving.

InfluxDB was designed partly in response to Graphite's shortcomings as companies' needs evolved, and partly to address emerging market trends.

Contents

Reasons
Tradeoffs
- No timeShift() Function Yet
Implementation

Reasons

Graphite Pain Points

Has a rather opaque configuration that is prone to errors, which can lead to data loss.
Querying is notoriously difficult.
Operational requirements have evolved to the point that it is not a good fit anymore.
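Not suitable for sparse metrics, because its Whisper storage format preallocates a fixed-size file for every metric.
Not suitable for high-cardinality metrics, because every unique metric name becomes a separate file on disk.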

Performance

InfluxDB is a custom, high-performance datastore built from the ground up to handle high write and query loads. It is designed specifically for timestamped data, including DevOps monitoring, application metrics, IoT sensor data, and real-time analytics.

InfluxDB picks up where Graphite left off. It focuses on write performance and data compression, and it handles sparse metrics and higher cardinality.

Tagging

InfluxDB supports tagging of data, something that feels natural and was not supported in Graphite. Tagging provides improved organization of data and simplifies querying.
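Tagged data is easier to picture once it is written out. InfluxDB 1.x line protocol attaches tags to the measurement name as comma-separated key=value pairs. The tags are followed by the fields and an optional nanosecond timestamp. The Python sketch below renders a single point in that format. The helper name `to_line_protocol` is hypothetical, and the sketch covers only the common escaping rules (commas, spaces, equals signs, and quoted strings), not every edge case in the specification.

```python
from typing import Mapping, Optional, Union

FieldValue = Union[bool, int, float, str]


def _escape(text: str, chars: str) -> str:
    """Backslash-escape each character in `chars` that appears in `text`."""
    for ch in chars:
        text = text.replace(ch, "\\" + ch)
    return text


def _format_field(value: FieldValue) -> str:
    # bool must be checked before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"  # the "i" suffix marks an integer field
    if isinstance(value, float):
        return repr(value)
    # String field values are double-quoted; escape backslashes and quotes.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_line_protocol(
    measurement: str,
    tags: Mapping[str, str],
    fields: Mapping[str, FieldValue],
    timestamp_ns: Optional[int] = None,
) -> str:
    """Render one point as a line-protocol string (hypothetical helper)."""
    if not fields:
        raise ValueError("line protocol requires at least one field")
    parts = [_escape(measurement, ", ")]
    # Tags are sorted by key, which InfluxDB recommends for write performance.
    for key in sorted(tags):
        parts.append(f"{_escape(key, ',= ')}={_escape(tags[key], ',= ')}")
    head = ",".join(parts)
    body = ",".join(
        f"{_escape(k, ',= ')}={_format_field(v)}" for k, v in fields.items()
    )
    line = f"{head} {body}"
    if timestamp_ns is not None:
        line += f" {timestamp_ns}"
    return line
```

For example, a buoy reading with two tags and two fields becomes `ocean_buoy.wind,id=7BE91F,model=B5CPSW speed=7.0,direction=341i`. Every value in that line is addressable by its tags at query time.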

Familiar SQL-Like Query Language

Working with InfluxQL, InfluxDB's SQL-like query language, makes it easy to get started and to be confident in the correctness of the data the system returns.

A relatively common and simple case might look like this:


SELECT sum("points")
FROM default_db.autogen.data_ingress
WHERE time >= now() - 24h AND "component" = 'Telegraf'
GROUP BY time(10m), "customer"
FILL(null)
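To run a query like this outside a dashboard, you can send it to the `/query` endpoint of the InfluxDB 1.x HTTP API. The Python sketch below uses only the standard library. It assumes a server at `http://localhost:8086`, the default 1.x HTTP port. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# Assumption: an InfluxDB 1.x server on the default HTTP port.
INFLUX_URL = "http://localhost:8086"

QUERY = (
    'SELECT sum("points") FROM default_db.autogen.data_ingress '
    "WHERE time >= now() - 24h AND \"component\" = 'Telegraf' "
    'GROUP BY time(10m), "customer" FILL(null)'
)


def build_query_url(query: str, db: Optional[str] = None, base: str = INFLUX_URL) -> str:
    """Return a GET URL for the 1.x /query endpoint with the query URL-encoded."""
    params = {"q": query}
    if db is not None:
        params["db"] = db  # optional here, since FROM is fully qualified
    return f"{base}/query?{urllib.parse.urlencode(params)}"


def run_query(query: str, db: Optional[str] = None) -> dict:
    """Send the query and decode the JSON response (requires a running server)."""
    with urllib.request.urlopen(build_query_url(query, db), timeout=10) as resp:
        return json.load(resp)
```

`run_query(QUERY)` returns a dict with a `results` list, one entry per statement. Each entry holds `series` with `columns` and `values`, and the `GROUP BY "customer"` clause produces one series per customer.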

Complete Tool Suite

InfluxDB is the central piece of a complete suite of tools built to make collecting, storing, graphing, and alerting on time series data incredibly easy. This suite is known as the TICK stack, and the “I” in TICK stands for InfluxDB. The other components in the platform are:

Telegraf

Telegraf is a plugin-driven server agent for collecting and reporting metrics. It has plugins, or integrations, that source a variety of metrics directly from the system it’s running on, pull metrics from third-party APIs, or even listen for metrics via StatsD and Kafka consumer services. It also has output plugins that send metrics to a variety of other datastores, services, and message queues, including InfluxDB, Graphite, OpenTSDB, Datadog, Librato, Kafka, MQTT, NSQ, and many others.

Kapacitor

Kapacitor is a native data processing engine. It can process both stream and batch data from InfluxDB. Kapacitor lets you plug in your own custom logic or user-defined functions to process alerts with dynamic thresholds, match metrics for patterns, compute statistical anomalies, and perform specific actions based on these alerts, such as dynamic load rebalancing. Kapacitor integrates with HipChat, OpsGenie, Alerta, Sensu, PagerDuty, Slack, and more.

Chronograf

Chronograf is the administrative user interface and visualization engine of the platform. It makes monitoring and alerting for your infrastructure easy to set up and maintain. It is simple to use and includes templates and libraries that let you rapidly build dashboards with real-time visualizations of your data and easily create alerting and automation rules.

Tradeoffs

No timeShift() Function Yet

Advanced dashboards might allow the user to compare one time period to the previous one. For example, you might want to compare today to yesterday, or this week to the previous one.

In Graphite, you can accomplish this with the timeShift() function; however, InfluxDB does not yet offer an equivalent. You'll probably be able to make do without this functionality for now.
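One possible workaround, which is not an InfluxDB feature, is to run the same query twice. The first run covers the current window and the second covers the previous one, for example `time >= now() - 48h AND time < now() - 24h`. You then shift the older series forward on the client before plotting both. The Python sketch below covers only the shifting and alignment step. The function names and the `(timestamp, value)` point shape are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Optional, Tuple

Point = Tuple[datetime, Optional[float]]


def time_shift(points: List[Point], delta: timedelta) -> List[Point]:
    """Move every timestamp forward by `delta` so an earlier window overlays a later one."""
    return [(ts + delta, value) for ts, value in points]


def overlay(
    current: List[Point], previous: List[Point], delta: timedelta
) -> List[Tuple[datetime, Optional[float], Optional[float]]]:
    """Pair each current point with the shifted previous value at the same timestamp."""
    shifted = dict(time_shift(previous, delta))
    return [(ts, value, shifted.get(ts)) for ts, value in current]


# Usage: compare today with yesterday by shifting yesterday forward one day.
utc = timezone.utc
yesterday = [(datetime(2024, 1, 1, 0, 0, tzinfo=utc), 1.0)]
today = [(datetime(2024, 1, 2, 0, 0, tzinfo=utc), 3.0)]
rows = overlay(today, yesterday, timedelta(days=1))
```

This works cleanly when both windows are bucketed with the same `GROUP BY time(...)` interval, because the shifted timestamps then line up exactly.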

Implementation

As mentioned, InfluxDB is part of a full suite of tools, including Telegraf, which can act as a StatsD receiver. Conveniently, Telegraf's StatsD interface has built-in support for tags and field names.

Tags are straightforward; however, things get a little more complicated for fields.

For field names, the built-in support relies on a custom format such as ocean_buoy.wind speed=7 direction=341, which lets you specify multiple fields and their values at once. However, the standard StatsD libraries/packages cannot output such a format. Unless you want to use custom packages, you'll need a workaround, and fortunately it is a simple one.

Configure Telegraf to use the last segment of the metric name as the field name. There's no clean way to express this in the telegraf.conf file, so the configuration looks a little messy:

# In the [[inputs.statsd]] section of telegraf.conf
templates = [
   "*.*.*.*.*.*.*.*.*.*.*.*.* measurement.measurement.measurement.measurement.measurement.measurement.measurement.measurement.measurement.measurement.measurement.measurement.field",
   "*.*.*.*.*.*.*.*.*.*.*.* measurement.measurement.measurement.measurement.measurement.measurement.measurement.measurement.measurement.measurement.measurement.field",
   "*.*.*.*.*.*.*.*.*.*.* measurement.measurement.measurement.measurement.measurement.measurement.measurement.measurement.measurement.measurement.field",
   "*.*.*.*.*.*.*.*.*.* measurement.measurement.measurement.measurement.measurement.measurement.measurement.measurement.measurement.field",
   "*.*.*.*.*.*.*.*.* measurement.measurement.measurement.measurement.measurement.measurement.measurement.measurement.field",
   "*.*.*.*.*.*.*.* measurement.measurement.measurement.measurement.measurement.measurement.measurement.field",
   "*.*.*.*.*.*.* measurement.measurement.measurement.measurement.measurement.measurement.field",
   "*.*.*.*.*.* measurement.measurement.measurement.measurement.measurement.field",
   "*.*.*.*.* measurement.measurement.measurement.measurement.field",
   "*.*.*.* measurement.measurement.measurement.field",
   "*.*.* measurement.measurement.field",
   "*.* measurement.field",
   "* measurement",
]

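Note that the last template, "* measurement", is an exception to the rule. A single-segment name has no field part, so the whole name becomes the measurement and the value is stored in the default field, value.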
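Because the template list is purely mechanical, you could generate it instead of typing it by hand. The Python sketch below reproduces the thirteen templates shown in the configuration. The function names are hypothetical, and the sketch only produces text to paste into telegraf.conf; it does not interact with Telegraf itself.

```python
from typing import List


def statsd_templates(max_segments: int) -> List[str]:
    """Build templates that map the last dot-separated segment to the field name.

    A name with N >= 2 segments gets N-1 "measurement" parts plus one "field";
    a single-segment name maps entirely to the measurement.
    """
    templates = []
    for n in range(max_segments, 0, -1):
        pattern = ".".join(["*"] * n)
        if n == 1:
            template = "measurement"
        else:
            template = ".".join(["measurement"] * (n - 1) + ["field"])
        templates.append(f"{pattern} {template}")
    return templates


def render_templates(max_segments: int) -> str:
    """Render the list in the same shape as the telegraf.conf snippet."""
    lines = ["templates = ["]
    lines += [f'   "{t}",' for t in statsd_templates(max_segments)]
    lines.append("]")
    return "\n".join(lines)
```

Calling `print(render_templates(13))` prints the same list as the configuration. To support longer metric names, raise the argument.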

Looking at some examples will elucidate how this configuration behaves:

Incoming Metric Name Sent by Your Systems	Measurement Name Stored in InfluxDB	Field Name Stored in InfluxDB
ocean_buoy_wave_height	ocean_buoy_wave_height	value
ocean_buoy.wave_height	ocean_buoy	wave_height
ocean_buoy_wave_height.value	ocean_buoy_wave_height	value
ocean_buoy.wave_height.value	ocean_buoy.wave_height	value
ocean_buoy.wave.height	ocean_buoy.wave	height
ocean_buoy.wave.frequency	ocean_buoy.wave	frequency
ocean_buoy.wind.speed	ocean_buoy.wind	speed
ocean_buoy.wind.direction	ocean_buoy.wind	direction
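The rule behind the table can be restated as follows. Split the name on dots, take the last segment as the field, and rejoin the rest as the measurement. A single-segment name falls back to the field value. The Python sketch below models that rule so you can check a naming scheme before deploying it. It is a simplified model, not Telegraf's template engine. It assumes measurement parts are rejoined with "." as the table shows, and it ignores the 13-segment limit of the configuration.

```python
from typing import Tuple

DEFAULT_FIELD = "value"  # field name used when a template has no "field" part


def map_metric_name(name: str) -> Tuple[str, str]:
    """Approximate how the templates split a metric name into (measurement, field)."""
    parts = name.split(".")
    if len(parts) == 1:
        # "* measurement": the whole name is the measurement.
        return name, DEFAULT_FIELD
    # The last segment becomes the field; the rest, rejoined with ".", the measurement.
    return ".".join(parts[:-1]), parts[-1]


for metric in ["ocean_buoy_wave_height", "ocean_buoy.wave_height", "ocean_buoy.wind.speed"]:
    print(metric, "->", map_metric_name(metric))
```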

To send these metrics to InfluxDB, you can call your StatsD library like this:

metrics.gauge('ocean_buoy.wind.speed', 7)

metrics.gauge('ocean_buoy.wind.direction', 341)

You can include tags just as easily:

metrics.gauge('ocean_buoy.wind.speed,id=7BE91F,model=B5CPSW', 7)
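The Python sketch below builds the same tagged gauge by hand and sends it over UDP, which shows what actually goes over the wire. It assumes Telegraf's statsd input is listening on UDP port 8125 on localhost. The helper names are hypothetical. In practice, your StatsD client library does this work for you.

```python
import socket
from typing import Mapping, Optional

# Assumption: Telegraf's statsd input listening on UDP localhost:8125.
STATSD_ADDR = ("127.0.0.1", 8125)


def format_gauge(name: str, value: float, tags: Optional[Mapping[str, str]] = None) -> str:
    """Build a gauge line, appending tags to the name as ",key=value" pairs."""
    tag_part = "".join(f",{k}={v}" for k, v in (tags or {}).items())
    return f"{name}{tag_part}:{value}|g"


def send_gauge(name: str, value: float, tags: Optional[Mapping[str, str]] = None) -> None:
    """Send a single gauge as one UDP datagram (fire-and-forget, like StatsD)."""
    payload = format_gauge(name, value, tags).encode("utf-8")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, STATSD_ADDR)
```

For example, `send_gauge("ocean_buoy.wind.speed", 7, {"id": "7BE91F", "model": "B5CPSW"})` sends the datagram `ocean_buoy.wind.speed,id=7BE91F,model=B5CPSW:7|g`. With the templates above, Telegraf stores it in the ocean_buoy.wind measurement, in the speed field, tagged with id and model.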