The Cassandra PV Archiver stores its data in several CQL tables, listed in Table A.1, “Cassandra PV Archiver CQL tables”.
|cluster_servers||Status and location information for Cassandra PV Archiver servers|
|pending_channel_operations_by_server||Protective entries for channels in order to prevent concurrent modifications|
|channels||Channel configuration and information about sample buckets for each channel|
|channels_by_server||Channel configuration and state for all channels associated with each server (for faster startup)|
|generic_data_store||Generic configuration information (e.g. credentials for the administrative user interface)|
In addition to these tables, each control-system support has one ore more
Please refer to the documentation of the respective control-system support
Most of the tables listed earlier are considered internal to the operation
of the Cassandra PV Archiver and thus are not discussed in greater detail.
channels table is relevant for accessing data
stored in the archive. This table is discussed in
Section 1, “Table channels”.
channels table stores configuration information
and information about sample buckets for each channel.
The table’s structure is described by
Table A.2, “Columns of table channels”.
|Column name||Column type||Data type||Description|
|channel_name||Partition key||text||Channel name.|
|decimation_level||Clustering Key||int||Decimation level (identified by the decimation period in seconds). Zero indicates raw samples.|
|bucket_start_time||Clustering Key||bigint||Start time of the sample bucket (in nanoseconds since epoch, which is January 1st, 1970, 00:00:00 UTC).|
|bucket_end_time||Regular||bigint||End time of the sample bucket (in nanoseconds since epoch, which is January 1st, 1970, 00:00:00 UTC).|
|channel_data_id||Static||uuid||Data ID associated with the channel. This information is used to identify associated data in the control-system support’s table(s).|
|control_system_type||Static||text||ID of the control-system support used for the channel.|
|decimation_levels||Static||set<int>||Set containing all decimation levels that exist for the channel (identified by their decimation periods in seconds).|
|server_id||Static||uuid||UUID of the server to which the channel belongs.|
The channel name is used as the partition key and the decimation level and bucket start time are used as clustering keys. This means that for each channel, there is a partition and for each sample bucket there is a row in this partition. The ordering of the clustering keys (decimation level first, bucket start time second) makes it possible to search for sample buckets for a specific decimation level that are in a certain time range. All configuration information is stored in static columns (columns that are shared among all rows in the partition) because this information obviously does not depend on the sample bucket.
The bucket end time is a regular column and thus it is not possible to search by end time. However, the end time is typically just one nanosecond before the start time of the following bucket (it is guaranteed to be strictly less than the start time of the next bucket). Therefore, there is usually no need to search based on the end time.
When reading samples, one has to search for the sample buckets that store the samples for the relevant period of time. One can use a query like the following to search for all sample buckets that start in a certain period of time:
SELECT * FROM channels WHERE channel_name = 'myChannel' AND decimation_level = 0 AND bucket_start_time >= 1468429000000000000 AND bucket_start_time <= 1468431000000000000 ORDER BY decimation_level ASC;
In this example,
myChannel is the name of the channel
and we search for sample buckets storing raw samples (decimation period of
0) and starting between the time stamps
It might seem strange to order by the
decimation_level column when we actually want to
order by the
However, Cassandra (currently) only allows specifying the first column
of a composite clustering key in the
ORDER BY clause.
ORDER BY clause still has the intended effect of
(also) ordering by the
Typically, one also needs the sample bucket that starts before the lower time-stamp, unless there is a sample bucket starting right at the lower limit of the search period, which will only happen by chance. One can retrieve information about this sample bucket with a query like the following:
SELECT * FROM channels WHERE channel_name = 'myChannel' AND decimation_level = 0 AND bucket_start_time < 1468429000000000000 ORDER BY decimation_level DESC LIMIT 1;
We are only interested in the first sample bucket just before our lower limit, which is why we use descending order and limit the results to a single row.
Once we know the sample buckets, we can retrieve the corresponding
samples from the control-system support’s table(s).
We need the
bucket_start_time in order to identify the sample
bucket in the control-system support’s table(s).
When querying these tables, the time stamp of the samples should be
limited to the range specified by the
bucket_end_time, unless the limits imposed by the
time period that is queried are more narrow.
Always using these limits ensures that we do not read samples that have
accidentally been written into a sample bucket where they do not belong.
Usually, such samples should not exist, but it is better to be safe.