Introduction to Apache IoTDB


What is a time series database

For background, see the article "Time series database: giving wings to the Internet of Things" from HBase/Spark/BigData with attitude.

Overall introduction

Apache IoTDB is a time series database that originated at the School of Software of Tsinghua University. Its main usage scenarios are in Internet of Things (IoT) related industries, such as the Internet of Vehicles, wind power, subways, and aircraft monitoring; for specific application cases and company details, see "IoTDB usage information collection in actual companies". It uses columnar storage, data encoding, pre-computation, and indexing, and provides a SQL-like interface. It can write millions of data points per second per node and return query results over trillions of data points within seconds. It can also be easily integrated with Apache Hadoop, MapReduce, and Apache Spark for analysis.
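To make "columnar storage" and "data encoding" more concrete, here is a minimal sketch, assuming points arrive as (timestamp, series path, value) rows. It regroups the rows into one time-ordered column per series and then applies a generic second-order difference (delta-of-delta) encoding to a timestamp column. IoTDB's documentation names a TS_2DIFF encoding, but its real encoder works block by block and bit-packs the results, so this is an illustration of the idea rather than IoTDB's implementation. The function names and series paths (`to_columns`, `encode_2diff`, `root.sg.d1.s1`) are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[int, str, float]  # (timestamp in ms, series path, value)


def to_columns(rows: List[Row]) -> Dict[str, Tuple[List[int], List[float]]]:
    """Regroup row-oriented points into one (timestamps, values) pair per series."""
    cols: Dict[str, Tuple[List[int], List[float]]] = defaultdict(lambda: ([], []))
    for ts, path, value in sorted(rows):  # sort so each column is time-ordered
        times, values = cols[path]
        times.append(ts)
        values.append(value)
    return dict(cols)


def encode_2diff(timestamps: List[int]) -> Tuple[int, int, List[int]]:
    """Generic second-order difference: (first value, first delta, deltas of deltas)."""
    if len(timestamps) < 2:
        raise ValueError("need at least two timestamps")
    first_delta = timestamps[1] - timestamps[0]
    prev_delta = first_delta
    dods: List[int] = []
    for prev, cur in zip(timestamps[1:], timestamps[2:]):
        delta = cur - prev
        dods.append(delta - prev_delta)  # stays near 0 for regularly sampled data
        prev_delta = delta
    return timestamps[0], first_delta, dods


def decode_2diff(first: int, first_delta: int, dods: List[int]) -> List[int]:
    """Invert encode_2diff by re-accumulating the deltas."""
    out = [first, first + first_delta]
    delta = first_delta
    for dod in dods:
        delta += dod
        out.append(out[-1] + delta)
    return out


if __name__ == "__main__":
    rows = [
        (1000, "root.sg.d1.s1", 20.5),
        (2000, "root.sg.d1.s1", 20.7),
        (3000, "root.sg.d1.s1", 20.6),
        (4001, "root.sg.d1.s1", 20.9),  # one sample arrives 1 ms late
        (1000, "root.sg.d1.s2", 0.0),
    ]
    times, _ = to_columns(rows)["root.sg.d1.s1"]
    print(encode_2diff(times))  # (1000, 1000, [0, 1])
```

Because sensors usually sample at a near-fixed rate, the delta-of-delta values are mostly zero or tiny, which is what lets a later bit-packing step store each timestamp in very few bits.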

Features of IoT data collection

The Internet of Things is characterized by one or more devices, organized together in various forms, that observe or record data generated in the same environment at the same time.
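One way to picture this organization is the dot-separated path naming that IoTDB uses for time series, such as `root.ln.wf01.wt01.temperature` from the IoTDB documentation: a device path followed by a sensor name, with a storage-group prefix at the front. The sketch below is a hypothetical model of that hierarchy; the `Device` class, `group_by_storage_group`, and the choice of a two-level storage-group prefix are assumptions made for illustration, not IoTDB APIs or defaults.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Device:
    path: str  # device path, e.g. "root.ln.wf01.wt01"
    sensors: List[str] = field(default_factory=list)

    def series_paths(self) -> List[str]:
        """One time series per (device, sensor) pair."""
        return [f"{self.path}.{sensor}" for sensor in self.sensors]


def group_by_storage_group(devices: List[Device], depth: int = 2) -> Dict[str, List[str]]:
    """Bucket series under a prefix made of the first `depth` path levels.

    depth=2 (e.g. "root.ln") is an illustrative choice, not an IoTDB default.
    """
    groups: Dict[str, List[str]] = {}
    for device in devices:
        prefix = ".".join(device.path.split(".")[:depth])
        groups.setdefault(prefix, []).extend(device.series_paths())
    return groups


if __name__ == "__main__":
    devices = [
        Device("root.ln.wf01.wt01", ["status", "temperature"]),
        Device("root.ln.wf01.wt02", ["temperature"]),
    ]
    for prefix, series in group_by_storage_group(devices).items():
        print(prefix, series)
```

Devices that sample at the same moments share timestamps, so their series line up naturally when queried together by device path.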

The Past and Present Life of Apache IoTDB

In 2012, one of Sany Heavy Industry's businesses had 200,000 devices whose data had been stored for 3 years, and this TB-level volume was more than Oracle could handle. The key problem was not only the large amount of existing data, but also that new data kept growing very fast. The company then contacted the first group of IoTDB developers, but the solution at that time was still based on Cassandra. A 5-machine cluster was planned, and it just met the performance requirements; over time, however, the total number of devices kept increasing, and so did the number of query requests from the business system.

In 2015, we began developing a distributed time series database based on Cassandra, but because it was not fully self-developed, we were somewhat constrained. After a lot of effort, we found that adapting Cassandra would require large-scale restructuring of its code. We therefore decided to design a new storage method that delivers efficient writing of time series data, low-latency reads, and high-compression-ratio persistence for IoT scenarios. This was the birth of the IoTDB project and the start of its independent research and development. IoTDB was later donated to the Apache Software Foundation for incubation and, after graduating from the incubator, became today's Apache IoTDB.

System architecture

The IoTDB suite consists of several components that together provide a chain of functions: data collection, data writing, data storage, data query, data visualization, and data analysis. The following figure shows the overall application architecture formed when all components of the IoTDB suite are used. Note that "IoTDB suite" refers to all of these components together, while "IoTDB" refers specifically to the time series database component among them.

In the figure above, users can import time series data into a local or remote IoTDB through JDBC: for example, status data collected from sensors on devices, server load and CPU/memory metrics, time series data in message queues, application time series data, or data from other databases. Users can also write this data directly to TsFile files locally (or on HDFS).
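Writes through JDBC are ordinary SQL statements. IoTDB's documentation shows inserts of the form `INSERT INTO root.ln.wf01.wt01(timestamp, status) VALUES (1509465600000, true)`. The sketch below only builds such a statement string; sending it (for example through the IoTDB JDBC driver's `java.sql.Statement`) is not shown. `build_insert` and `format_value` are hypothetical helpers, and they deliberately support only booleans and numbers because text values would need quoting rules not covered here.

```python
from typing import Mapping, Union

Value = Union[bool, int, float]


def format_value(v: Value) -> str:
    """Render one value as an IoTDB SQL literal (booleans and numbers only)."""
    if isinstance(v, bool):  # check bool first: bool is a subclass of int
        return "true" if v else "false"
    if isinstance(v, (int, float)):
        return repr(v)
    raise TypeError(f"unsupported value type: {type(v).__name__}")


def build_insert(device: str, timestamp: int, values: Mapping[str, Value]) -> str:
    """Build one INSERT for a device: the timestamp column, then one column per sensor."""
    if not values:
        raise ValueError("at least one sensor value is required")
    columns = ", ".join(["timestamp", *values.keys()])
    literals = ", ".join([str(timestamp), *(format_value(v) for v in values.values())])
    return f"INSERT INTO {device}({columns}) VALUES ({literals})"


if __name__ == "__main__":
    print(build_insert("root.ln.wf01.wt01", 1509465600000, {"status": True, "temperature": 25.96}))
```

For high-volume ingestion, a batched client API is usually a better fit than one SQL string per row; the string form is shown only because it mirrors the SQL-like interface described above.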

TsFile files can be written to HDFS, so that data processing tasks such as anomaly detection and machine learning can run on Hadoop or Spark data processing platforms.

For TsFile files written to HDFS or to local disk, the TsFile-Hadoop and TsFile-Spark connectors let Hadoop or Spark process the data.

Analysis results can be written back into TsFile files.

IoTDB and TsFile also provide client tools for viewing and writing data, in SQL, script, and graphical forms.

Performance comparison

The tests were run with a benchmark tool developed by the Big Data Laboratory of Tsinghua University.

1. Comparison of write performance

Dataset 2 (database)	Clients	Storage groups	Devices	Sensors	Batch size	Loops	Data volume (points)	Write speed (points/s)	Disk data size
IoTDB	10	10	10	10	1000	1000000	1.00E+11	24750321.93	38306092
InfluxDB	10	10	10	10	1000	1000000	1.00E+11		304682932
TimescaleDB	10	10	10	10	1000	1000000	1.00E+11	737689.22	1610219064

Dataset 1 (database)	Clients	Storage groups	Devices	Sensors	Batch size	Loops	Data volume (points)	Write speed (points/s)	Disk data size
IoTDB	10	10	10	10	1000	100000	10000000000	20706345.15	3599732
InfluxDB	10	10	10	10	1000	100000	10000000000	1729907.81	30546560
TimescaleDB	10	10	10	10	1000	100000	10000000000	715857	161026468
KairosDB	10	10	10	10	10000	10000	10000000000	24924.97	76263380

The data above show that IoTDB's write speed is more than 10 times that of the other databases tested, with a single machine writing 20 to 25 million points per second. It also uses the least disk space, which in online businesses with large data volumes may save one or two hard disks per month.
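As a quick arithmetic check of these claims, the sketch below recomputes the ratios from the Dataset 1 write table. It also encodes an assumption inferred only from the table values, not stated in the source: that Data volume equals devices times sensors times batch size times loops (this holds for every row in both datasets). The unit of the disk size column is not given, so only ratios are computed from it.

```python
from typing import Dict

# Figures copied from the "Dataset 1" write-performance table.
WRITE_SPEED: Dict[str, float] = {  # points per second
    "IoTDB": 20706345.15,
    "InfluxDB": 1729907.81,
    "TimescaleDB": 715857.0,
    "KairosDB": 24924.97,
}
DISK_SIZE: Dict[str, float] = {  # "Disk data size" column; unit not stated in the source
    "IoTDB": 3599732,
    "InfluxDB": 30546560,
    "TimescaleDB": 161026468,
    "KairosDB": 76263380,
}


def total_points(devices: int, sensors: int, batch_size: int, loops: int) -> int:
    """Assumed: each loop writes batch_size rows per device, one value per sensor."""
    return devices * sensors * batch_size * loops


def speedups(speed: Dict[str, float], reference: str = "IoTDB") -> Dict[str, float]:
    """How many times faster the reference writes than each other database."""
    return {name: speed[reference] / v for name, v in speed.items() if name != reference}


def disk_ratios(size: Dict[str, float], reference: str = "IoTDB") -> Dict[str, float]:
    """How many times more disk each other database uses than the reference."""
    return {name: v / size[reference] for name, v in size.items() if name != reference}


if __name__ == "__main__":
    print(total_points(10, 10, 1000, 100_000))  # 10000000000, matches the table
    for name, r in speedups(WRITE_SPEED).items():
        print(f"write speed, IoTDB vs {name}: {r:.1f}x")
    for name, r in disk_ratios(DISK_SIZE).items():
        print(f"disk usage, {name} vs IoTDB: {r:.1f}x")
```

On these figures the write-speed ratios come out at roughly 12x (InfluxDB), 29x (TimescaleDB), and 830x (KairosDB), and the other databases use roughly 8.5x, 44.7x, and 21.2x as much disk.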

2. Query performance comparison

Raw data query

Database	Clients	Storage groups	Devices	Series data volume	Sensors	Points per query	Loops	Speed (points/s)	AVG	MIN
IoTDB	10	10	10	1.00E+09	1	1000000	100	12942984.85	740.27	457.04
InfluxDB	10	10	10	1.00E+09	1	1000000	100	1779606.4	5591	4666.39
TimescaleDB	10	10	10	1.00E+09	1	1000000	100	3781467.52	2345.69	1193.78

Aggregated data query

Database	Clients	Storage groups	Devices	Series data volume	Sensors	Loops	Scope	Speed (points/s)	AVG	MIN
IoTDB-1	10	10	10	1.00E+09	1	100	0.0001	49.75	27.87	18.03
IoTDB-2	10	10	10	1.00E+09	1	100	0.001	49.75	49.14	19.87
IoTDB-3	10	10	10	1.00E+09	1	100	0.01	49.76	48.69	22.32
IoTDB-4	10	10	10	1.00E+09	1	100	0.1	48.68	99.14	25.56
IoTDB-5	10	10	10	1.00E+09	1	100	1	14	595.61	45.54
InfluxDB-1	10	10	10	1.00E+09	1	100	0.0001	234.32	40.28	21.63
InfluxDB-2	10	10	10	1.00E+09	1	100	0.001	28.88	341.9	238.1
InfluxDB-3	10	10	10	1.00E+09	1	100	0.01	3.07	3226.87	2664.86
TimescaleDB-1	10	10	10	1.00E+09	1	100	0.0001	42.39	220.57	120.5
TimescaleDB-2	10	10	10	1.00E+09	1	100	0.001	5.8	1502.9	754.15
TimescaleDB-3	10	10	10	1.00E+09	1	100	0.01	1.02	9711.55	7148.69

3. Comparison chart

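Overall, in these tests IoTDB leads in writing, raw data queries, and aggregation queries. Its write speed is roughly 12 to 34 times that of InfluxDB and TimescaleDB (and far higher than KairosDB), its raw query speed is about 3 to 7 times higher, it has the lowest average query time (AVG) in every aggregation test, and the other databases use about 8 to 45 times more disk space.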
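The query figures behind this summary can be recomputed from the two query tables. The sketch below assumes the AVG column is an average query time, which the tables do not state explicitly, so a ratio above 1 means the other database was slower than IoTDB at that scope. Only scopes that both databases were tested at are compared.

```python
from typing import Dict

# Raw data query: "Speed (points/s)" column.
RAW_SPEED: Dict[str, float] = {
    "IoTDB": 12942984.85,
    "InfluxDB": 1779606.4,
    "TimescaleDB": 3781467.52,
}

# Aggregated query: AVG column, keyed by the "Scope" column.
AGG_AVG: Dict[str, Dict[float, float]] = {
    "IoTDB": {0.0001: 27.87, 0.001: 49.14, 0.01: 48.69, 0.1: 99.14, 1.0: 595.61},
    "InfluxDB": {0.0001: 40.28, 0.001: 341.9, 0.01: 3226.87},
    "TimescaleDB": {0.0001: 220.57, 0.001: 1502.9, 0.01: 9711.55},
}


def raw_speedups(speed: Dict[str, float], reference: str = "IoTDB") -> Dict[str, float]:
    """Reference raw-query speed divided by each other database's speed."""
    return {name: speed[reference] / v for name, v in speed.items() if name != reference}


def agg_avg_ratios(
    avg: Dict[str, Dict[float, float]], reference: str = "IoTDB"
) -> Dict[str, Dict[float, float]]:
    """Other AVG divided by reference AVG, only for scopes both were tested at."""
    ref = avg[reference]
    return {
        name: {scope: value / ref[scope] for scope, value in rows.items() if scope in ref}
        for name, rows in avg.items()
        if name != reference
    }


if __name__ == "__main__":
    for name, r in raw_speedups(RAW_SPEED).items():
        print(f"raw query speed, IoTDB vs {name}: {r:.1f}x")
    for name, rows in agg_avg_ratios(AGG_AVG).items():
        for scope, r in rows.items():
            print(f"aggregation AVG, {name} / IoTDB at scope {scope}: {r:.1f}x")
```

The raw-query ratios are roughly 7.3x (InfluxDB) and 3.4x (TimescaleDB), and the aggregation AVG ratios range from about 1.4x (InfluxDB at the smallest scope) to about 199x (TimescaleDB at scope 0.01).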
