Scalability was one of the key goals when designing the Cassandra PV Archiver. The Cassandra PV Archiver is designed to work both for very small setups (possibly as small as a single-node installation) and for very large-scale setups (with tens or even hundreds of nodes). By using Apache Cassandra as the data store, the Cassandra PV Archiver can scale linearly: each added node increases the number of channels that can be handled and the amount of data that can be stored.
The Cassandra PV Archiver is not just scalable when making the first deployment. An existing deployment can easily be scaled up with zero downtime by adding more nodes as demand grows. However, there are a few limitations regarding the data that can be stored for individual channels, of which the administrator should be aware. These limitations are largely intrinsic to the use of Apache Cassandra as the data store, but for some of them there are workarounds, which are described in the following paragraphs.
The archiving of each sample results in an INSERT statement being executed in the Cassandra database. As the number of statements that can be executed per second is usually limited to something in the order of 100,000 statements per second per node, archiving samples at extremely high rates is typically not a good idea. For example, for channels with an update rate of about 1 kHz, only about one hundred channels could be archived per node. In addition to that, samples for the same channel are archived one after another. This means that the next sample is only archived once the INSERT statement for the preceding sample has finished. Due to the latency involved in executing each statement, this effectively limits the rate at which samples for a single channel can be written.
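To put this limitation into perspective, assume (purely for illustration) a round-trip latency of about two milliseconds per statement: a single channel could then not be archived at much more than roughly 500 samples per second, regardless of how many nodes the cluster has.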
This issue can be worked around by providing a custom control-system support (see Chapter V, Extending Cassandra PV Archiver) that archives samples at a lower rate. For example, a control-system support can choose to accumulate all samples that are received within a second and then create and archive a single “meta-sample” that actually contains the data of all these samples. This reduces the number of INSERT statements required and can thus reduce the load significantly. As a side effect, it also resolves the latency problem.
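The following sketch illustrates the accumulation approach. The type and method names used here (RawSample, MetaSample, Archiver) are hypothetical placeholders, not the actual control-system support API described in Chapter V, Extending Cassandra PV Archiver; the sketch only shows how samples received within one second can be merged into a single meta-sample before one archive operation is triggered.

    import java.util.ArrayList;
    import java.util.List;

    // Sketch only: RawSample, MetaSample, and Archiver are hypothetical
    // placeholders, not types from the Cassandra PV Archiver API.
    public class AccumulatingWriter {

        // Raw sample as received from the control system.
        public record RawSample(long timestampNanos, double value) {}

        // Meta-sample bundling all raw samples of one interval.
        public record MetaSample(long startTimestampNanos, List<RawSample> samples) {}

        // Stand-in for the code that actually archives a (meta-)sample.
        public interface Archiver {
            void archive(MetaSample sample);
        }

        private static final long INTERVAL_NANOS = 1_000_000_000L; // one second

        private final Archiver archiver;
        private final List<RawSample> buffer = new ArrayList<>();
        private long intervalStartNanos = -1L;

        public AccumulatingWriter(Archiver archiver) {
            this.archiver = archiver;
        }

        // Called for every raw sample; emits at most one meta-sample per second.
        public void onSample(RawSample sample) {
            if (intervalStartNanos < 0L) {
                intervalStartNanos = sample.timestampNanos();
            } else if (sample.timestampNanos() - intervalStartNanos >= INTERVAL_NANOS) {
                flush();
                intervalStartNanos = sample.timestampNanos();
            }
            buffer.add(sample);
        }

        // Archives the accumulated samples as a single meta-sample, resulting
        // in one INSERT statement instead of one per raw sample.
        public void flush() {
            if (!buffer.isEmpty()) {
                archiver.archive(new MetaSample(intervalStartNanos, List.copyOf(buffer)));
                buffer.clear();
            }
        }
    }

A matching read path in the control-system support then has to expand such a meta-sample back into its individual samples when the archive is queried.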
For most scenarios, it should not be necessary to implement this workaround: The Cassandra PV Archiver typically works fine at update rates of about 10 Hz and supervisory control and data acquisition (SCADA) systems rarely deal with significantly higher data rates. Therefore, implementing this workaround only has to be considered when archiving data from a system with exceptionally high update rates.
As described in Section 2, “Data storage”, archived samples are organized in sample buckets. In order to ensure data consistency even in the event of a server crash at a very inconvenient point in time, the Cassandra PV Archiver takes special precautions when creating a new sample bucket. These precautions result in a significant overhead, so creating new sample buckets very frequently is not advisable. This means that a channel producing data at a rate of tens of megabytes per second should not be (directly) archived with the Cassandra PV Archiver.
More importantly, the meta-data about which sample buckets exist is stored in a single partition. When deleting old samples, the corresponding reference to the sample bucket is removed by issuing a DELETE statement in the database. In Apache Cassandra, a DELETE statement results in a so-called tombstone being written. When a lot of tombstones accumulate, this can have a significant impact on read operations, which is why Apache Cassandra aborts a read operation when it encounters too many tombstones (please refer to the Cassandra documentation for details).
Typically, this is not a problem, but when inserting large amounts of data at comparatively high rates and only retaining this data for a limited amount of time, the number of tombstones generated when deleting old data might actually exceed this limit.
There are two possible workarounds. The first one is changing the configuration options for Apache Cassandra. By reducing the so-called GC grace period, tombstones can be discarded earlier, so that fewer tombstones accumulate. Please be sure to understand the consequences of this change before applying it: it is very important that the periodic nodetool repair operation runs more frequently than the GC grace period. If it does not, deleted data can reappear, which in the context of the Cassandra PV Archiver can result in data corruption. The other configuration change is increasing the number of tombstones that may be encountered before a read operation is aborted. Increasing this number has an impact on the memory consumption of read operations, and read operations that encounter many tombstones may run very slowly.
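As a rough sketch of this configuration-based workaround: the GC grace period is a per-table property that can be changed with a CQL statement. The keyspace and table names below are placeholders and do not necessarily match the schema actually used by the Cassandra PV Archiver. The tombstone limit is typically controlled by the tombstone_warn_threshold and tombstone_failure_threshold settings in cassandra.yaml.

    -- Placeholder keyspace/table names; 345600 seconds = 4 days.
    -- nodetool repair must then run more often than once every 4 days.
    ALTER TABLE pv_archive.channels_samples WITH gc_grace_seconds = 345600;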
The second and preferred workaround is to store large amounts of data outside the Apache Cassandra database, for example using a scalable, distributed file-system (like Ceph). Such a solution can be implemented by providing a custom control-system support that stores the raw data in files and archives the meta-data (which file contains the data for a specific sample) using the Cassandra PV Archiver.
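The following sketch outlines this idea. FileBackedSampleWriter and FileReferenceSample are hypothetical placeholders rather than part of the actual control-system support API; the point is merely that only a small meta-sample (file path, offset, and length) would end up in the Cassandra database, while the bulk data is appended to a file on the distributed file-system.

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;

    // Sketch only: stores bulk sample data in a file (e.g. on a Ceph mount)
    // and produces a small meta-sample that would be archived in Cassandra.
    public class FileBackedSampleWriter {

        // Meta-data archived in Cassandra instead of the raw data.
        public record FileReferenceSample(
                long timestampNanos, String filePath, long offset, int length) {}

        private final FileChannel dataFile;
        private final String filePath;

        public FileBackedSampleWriter(Path dataFilePath) throws IOException {
            this.filePath = dataFilePath.toString();
            this.dataFile = FileChannel.open(dataFilePath,
                    StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                    StandardOpenOption.APPEND);
        }

        // Appends the raw data to the file and returns the meta-sample that
        // describes where the data can be found later.
        public FileReferenceSample write(long timestampNanos, ByteBuffer rawData)
                throws IOException {
            int length = rawData.remaining();
            long offset = dataFile.size();
            dataFile.write(rawData);
            return new FileReferenceSample(timestampNanos, filePath, offset, length);
        }
    }

Reading a sample then works the other way around: the control-system support retrieves the meta-sample from the archive and uses the stored file path, offset, and length to load the actual data from the file-system.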
As a rule of thumb, you should consider storing the sample data outside the Cassandra database when the average data rate of a single channel exceeds the order of 50 KB per second. The average data rate means the rate averaged over an extended amount of time. For example, having a burst of data at a rate of 5 MB per second for ten seconds is fine when it is typically followed by a period of 30 minutes where virtually no data is archived.
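In that example, the burst amounts to about 50 MB of data which, spread over the following 30 minutes, corresponds to an average rate of roughly 28 KB per second and thus stays well below the suggested threshold.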