InfluxDB in IoT world: Hosting and scaling on AWS (Part 2)
In the previous part we took a bird’s-eye view of InfluxDB, its core features, and the reasons to embrace the database in the wake of the IoT data onslaught. In this part, we’re going to see how easy it is to install and start using InfluxDB on AWS, how to scale it, and how fast it is on different types of AWS instances.
Hosting on Amazon Web Services
AWS doesn’t have out-of-the-box support for InfluxDB, so we’ll need to do some manual installation. Let’s begin by firing up a new EC2 instance, say m4.large (as suggested in the InfluxDB installation docs), with the Amazon Linux AMI. Let’s also create a security group and allow incoming TCP traffic to ports 8086 and 8088. Next, let’s install the thing. SSH into the server and execute the following:
```
sudo yum update
sudo reboot

cat <<EOF | sudo tee /etc/yum.repos.d/influxdb.repo
[influxdb]
name = InfluxDB Repository - RHEL
baseurl = https://repos.influxdata.com/rhel/7Server/x86_64/stable
enabled = 1
gpgcheck = 1
gpgkey = https://repos.influxdata.com/influxdb.key
EOF

sudo yum install influxdb
```
Next, I’d highly recommend setting up authentication and HTTPS for a real system, as described in [the official documentation](https://docs.influxdata.com/influxdb/v1.3/administration/security/).
Now, start it:

```
sudo /etc/init.d/influxdb start
```

and check that it’s running with the `influx` command. You should connect to the local InfluxDB and see something similar to this:

```
$ influx
Connected to http://localhost:8086 version x.x.x
InfluxDB shell version: x.x.x
>
```
So far in this series InfluxDB has been a knight atop a white stallion. There’s a twist, though: the OSS version does not support replication and sharding. You’re pretty much stuck with a single installation (a pretty powerful one, though, as we saw in Part 1). High availability is achievable with influxdb-relay, but that’s one more layer and piece of infrastructure to manage. If you need out-of-the-box sharding, service availability, monitoring, and other enterprise features (I’d just call them production-ready features, but that’s fuel for another post), then your company has to go with the commercial InfluxEnterprise or the fully managed SaaS. With the free version, you can stick to half-century-old methods:
- Custom sharding
- Scale-up (use faster hardware)
Custom sharding is difficult and costly to develop: it requires good domain understanding and the ability to predict change (up to oracle-level powers). Still, it can win in the long run. The problem with scaling up is that you eventually run into the law of diminishing returns: to get a machine twice as fast, you might pay ten times as much, and the price keeps growing steeply from there. Vertical scaling also isn’t appropriate for every technology; depending on a system’s specific bottlenecks, scaling up may be nearly impossible. Disk I/O, for example, is not an easy thing to upgrade.
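To make the custom-sharding option more concrete, here is a minimal sketch of client-side shard routing: each point is sent to one of several independent OSS instances, chosen by a stable hash of a shard key (a sensor ID here). The node URLs and the choice of `sensor_id` as the key are illustrative assumptions, not part of any InfluxDB API.

```python
import hashlib

# Hypothetical endpoints: one independent InfluxDB OSS instance per shard.
NODES = [
    "http://influx-0.internal:8086",
    "http://influx-1.internal:8086",
    "http://influx-2.internal:8086",
]

def shard_for(sensor_id: str, nodes=NODES) -> str:
    """Route a sensor to a node by a stable hash of its ID.

    md5 (rather than Python's built-in hash()) keeps the mapping
    stable across processes and interpreter versions.
    """
    digest = hashlib.md5(sensor_id.encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]

# The same sensor always lands on the same node, so its full
# history stays queryable from a single instance.
assert shard_for("sensor-42") == shard_for("sensor-42")
```

This also shows why the approach is costly: adding a node changes the modulus and remaps most keys, so you either migrate data or move to something like consistent hashing, and any query spanning multiple sensors now has to fan out and merge results yourself.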
As for InfluxDB, vertical scalability is pretty feasible, as we’re going to see next.
Let’s see some data on how InfluxDB scales up. Below is a comparison of query execution times on different AWS instances for two different types of load. I’ll call them:
- Analytical read:
  ```
  SELECT stddev("value") FROM "measurements" WHERE "type" = 'PM25' AND time > 'xxx' AND time < 'yyy' GROUP BY time(30d) fill(none)
  ```
- Write: Writing ~4M multi-value data entries from another node in the same AWS network using the InfluxDB benchmark project’s bulk_load_influx tool (see Part 1 for more details).
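For context on what a multi-value write looks like on the wire, InfluxDB 1.x ingests batches of plain-text "line protocol" lines over its HTTP `/write` endpoint. Below is a simplified sketch of that serialization (it skips the escaping of spaces and commas, quoting of string fields, and the `i` suffix for integers that the real protocol requires); the tag and field names are made up for illustration.

```python
def to_line_protocol(measurement: str, tags: dict, fields: dict, ts_ns: int) -> str:
    """Serialize one point into (simplified) InfluxDB line protocol:
    measurement,tag=val,... field=val,... timestamp"""
    tag_str = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    field_str = ",".join(f"{k}={v}" for k, v in sorted(fields.items()))
    return f"{measurement},{tag_str} {field_str} {ts_ns}"

# A hypothetical multi-value air-quality point (names are illustrative):
line = to_line_protocol(
    "measurements",
    {"type": "PM25", "station": "krakow-01"},
    {"value": 17.4, "humidity": 61.0},
    1500000000000000000,
)
# line == "measurements,station=krakow-01,type=PM25 humidity=61.0,value=17.4 1500000000000000000"
```

A bulk loader like bulk_load_influx essentially joins many such lines with newlines and POSTs them in batches, which is why the write benchmark stresses CPU (serialization, parsing, indexing) as much as disk.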
The execution time results:
| AWS instance type | vCPUs | RAM (approx.) | Analytical read execution time | Write execution time |
|---|---|---|---|---|
| m4.large | 2 | 4 GB | 157 seconds | 130 seconds |
| m4.2xlarge | 8 | 32 GB | 130 seconds | 34 seconds |
| m4.4xlarge | 16 | 64 GB | 119 seconds | 19 seconds |
| r4.large | 2 | 15 GB | 154 seconds | 125 seconds |
| r4.2xlarge | 8 | 61 GB | 132 seconds | 34 seconds |
| c4.2xlarge | 8 | 15 GB | 120 seconds | 33 seconds |
| c4.8xlarge | 36 | 60 GB | 122 seconds | 13 seconds |
This is by no means a thorough benchmark. However, it is a fair approximation of Airly’s intended load for the DB.
How can the results be interpreted? It looks like write operations are CPU-intensive. Here is how the `top` command looked on c4.2xlarge (note the 783% CPU usage):

```
  PID USER     PR NI VIRT  RES SHR S  %CPU %MEM   TIME+ COMMAND
 2686 influxdb 20  0 1748m 1.1g 54m S 783.8  7.3 2:58.15 influxd
```
Thus, write load throughput scales up nicely, especially with the number of CPUs.
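A quick back-of-the-envelope calculation on the write times from the table above (all arithmetic, no new data) shows the scaling is close to linear in vCPUs at first and then tapers off:

```python
# (vCPUs, write time in seconds), taken from the benchmark table above
results = {
    "m4.large":   (2, 130),
    "m4.2xlarge": (8, 34),
    "m4.4xlarge": (16, 19),
    "c4.8xlarge": (36, 13),
}

base_cpus, base_time = results["m4.large"]
for name, (cpus, t) in results.items():
    speedup = base_time / t
    # e.g. "m4.2xlarge: 4x vCPUs -> 3.8x faster writes"
    print(f"{name}: {cpus // base_cpus}x vCPUs -> {speedup:.1f}x faster writes")
```

Going from 2 to 8 vCPUs buys a ~3.8x write speedup (near-linear), 2 to 16 buys ~6.8x, while the jump to 36 vCPUs yields only ~10x, which is the law of diminishing returns from the scale-up discussion showing up in practice.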
Read load is probably more I/O-intensive (disk and memory). That’s why we don’t see much decrease in read query execution time across instances, provided the working set fits into RAM.
To summarize, CPU power is very important for write throughput. RAM size matters when most of your working set can fit into memory; in that case, high-memory instance types (r4, m4, etc.) are recommended. InfluxDB can fit a lot of data in 64 GB of RAM.
Coming up next…
In the next part we’re going to plot some graphs with Grafana using data from InfluxDB!