3. CQL table layout

In the database, channels that use the Channel Access control-system support can be identified by having their control-system type set to “channel_access”. The Channel Access control-system support stores all samples in a single table with the name channel_access_samples. The columns of this table are described by Table D.1, “Columns of table channels_access_samples”.

Table D.1. Columns of table channels_access_samples
Column nameColumn typeData typeDescription
channel_data_idPartition keyuuid Channel data ID.
decimation_levelPartition Keyint Decimation level (identified by the decimation period in seconds). Zero indicates raw samples.
bucket_start_timePartition Keybigint Start time of the sample bucket (in nanoseconds since epoch, which is January 1st, 1970, 00:00:00 UTC).
sample_timeClustering Keybigint Time stamp of the sample (in nanoseconds since epoch, which is January 1st, 1970, 00:00:00 UTC).
a_charRegularfrozen<channel_access_array_char> Data for a sample of type DBR_CHAR with more than one element.
a_doubleRegularfrozen<channel_access_array_double> Data for a sample of type DBR_DOUBLE with more than one element.
a_enumRegularfrozen<channel_access_array_enum> Data for a sample of type DBR_ENUM with more than one element.
a_floatRegularfrozen<channel_access_array_float> Data for a sample of type DBR_FLOAT with more than one element.
a_longRegularfrozen<channel_access_array_long> Data for a sample of type DBR_LONG with more than one element.
a_shortRegularfrozen<channel_access_array_short> Data for a sample of type DBR_SHORT with more than one element.
a_stringRegularfrozen<channel_access_array_string> Data for a sample of type DBR_STRING with more than one element.
current_bucket_sizeStaticint Accumulated size (in bytes) of the samples that have been written to the sample bucket so far.
disabledRegularboolean Marker for a sample indicating that the channel was disabled at that point in time.
disconnectedRegularboolean Marker for a sample indicating that the channel was disconnected at that point in time.
gs_charRegularfrozen<channel_access_aggregated_scalar_char> Data for an aggregated sample that has been built from samples of type DBR_CHAR, each having a single element.
gs_doubleRegularfrozen<channel_access_aggregated_scalar_double> Data for an aggregated sample that has been built from samples of type DBR_DOUBLE, each having a single element.
gs_floatRegularfrozen<channel_access_aggregated_scalar_float> Data for an aggregated sample that has been built from samples of type DBR_FLOAT, each having a single element.
gs_longRegularfrozen<channel_access_aggregated_scalar_long> Data for an aggregated sample that has been built from samples of type DBR_LONG, each having a single element.
gs_shortRegularfrozen<channel_access_aggregated_scalar_short> Data for an aggregated sample that has been built from samples of type DBR_SHORT, each having a single element.
s_charRegularfrozen<channel_access_scalar_char> Data for a sample of type DBR_CHAR with a single element.
s_doubleRegularfrozen<channel_access_scalar_double> Data for a sample of type DBR_DOUBLE with a single element.
s_enumRegularfrozen<channel_access_scalar_enum> Data for a sample of type DBR_ENUM with a single element.
s_floatRegularfrozen<channel_access_scalar_float> Data for a sample of type DBR_FLOAT with a single element.
s_longRegularfrozen<channel_access_scalar_long> Data for a sample of type DBR_LONG with a single element.
s_shortRegularfrozen<channel_access_scalar_short> Data for a sample of type DBR_SHORT with a single element.
s_stringRegularfrozen<channel_access_scalar_string> Data for a sample of type DBR_STRING with a single element.

The channel_data_id, decimation_level, and bucket_start_time form a composite partition key that identifies the sample bucket. These parameters are passed to the control-system support by the Cassandra PV Archiver server and are simply used “as-is”.

The sample_time is used as the clustering key. This way, it is easily possible to select only those samples from a sample bucket that have a time stamp within a specific interval.

The current_bucket_size is a static column because it obviously is the same for the whole sample bucket. This column is updated by the control-system support each time a sample is added to the sample bucket.

All other columns are used for storing the sample’s data. For each sample, exactly one of these columns has a non-null value. The disabled and disconnected columns are simple boolean columns. If one of them is true, it means that the sample is a marker of the corresponding type. Each column that stores a regular (non-marker) sample uses a user-defined type (UDT) that is only used by that column.

[Note]Note

The names of the data columns have intentionally been chosen to be very short. The reason for this is simple: Due to how regular columns are internally handled by Cassandra, the column name is serialized for each row. When there are many rows, a long column name can contribute to the total data size significantly. Most of this overhead is compensated by the compression that is applied to SSTables before storing them on disk. However, the sample bucket size that is limited to about 100 MB is measured before applying the compression. For this reason, longer column names would significantly reduce the number of samples that could be stored in each sample bucket.

User-defined types (UDTs) are used for the same reason: When the various fields that are needed to store a sample would be represented as separate columns, the overhead that is caused by the meta-data for each column would increase the total data size significantly. Frozen UDTs, on the other hand, are as efficient as frozen tuples, allowing for the space-efficient storage of sample data while having human-readable names for their fields.

The UDTs that are used by the Channel Access control-system support all share a similar structure. The fields that may be present in these UDTs are listed in Table D.2, “Fields of the user-defined types”.

Table D.2. Fields of the user-defined types
Field nameData typeDescription
valuedepends on UDT sample’s value.
stddouble standard deviation for an aggregated sample.
mindouble least original value for an aggregated sample.
maxdouble greatest original value for an aggregated sample.
covered_period_fractiondouble fraction of the period that is actually covered by the data in the aggregated sample. A value of 1.0 means that the data that was used to calculate the aggregated sample actually covers the full period that is supposed to be represented by the aggregated sample. A value of 0.5 means that the data that was used to calculate the aggregated sample actually only covers half of the period that is supposed to be covered by the aggregated sample.
alarm_severitysmallint alarm severity (0 means NO_ALARM, 1 means MINOR, 2 means MAJOR, 3 means INVALID).
alarm_statussmallint alarm status (the number is the status code that is used by the Channel Access protocol to signal the corresponding alarm status).
precisionsmallint display precision for floating point numbers.
unitstext engineering units.
labelsfrozen<list<text>> labels for enum states.
lower_warning_limitdepends on UDT lower warning limit.
upper_warning_limitdepends on UDT upper warning limit.
lower_alarm_limitdepends on UDT lower alarm limit.
upper_alarm_limitdepends on UDT upper alarm limit.
lower_display_limitdepends on UDT lower display limit.
upper_display_limitdepends on UDT upper display limit.
lower_control_limitdepends on UDT lower control limit.
upper_control_limitdepends on UDT upper control limit.

Not all of these fields are present in each UDT. The value, alarm_severity, and alarm_status fields are the only ones that are present in all UDTs. The std, min, max, and covered_period_fraction fields are only present in the channel_access_aggregated_* UDTs. The precision field is only present in UDTs representing samples of a floating-point type. The units, lower_warning_limit, upper_warning_limit, lower_alarm_limit, upper_alarm_limit, lower_display_limit, upper_display_limit, lower_control_limit, and upper_control_limit fields are only present in UDTs that represent samples of a numeric type. The labels field is only present in the channel_access_array_enum and channel_access_scalar_enum UDTs.

The type of the value field depends on the type of the sample that is represented by the UDT. The same applies to the lower_warning_limit, upper_warning_limit, lower_alarm_limit, upper_alarm_limit, lower_display_limit, upper_display_limit, lower_control_limit, and upper_control_limit fields. The types used for those fields are listed in Table D.3, “Type of UDT fields”.

Table D.3. Type of UDT fields
User-defined typeValue field typeLimit fields type
channel_access_aggregated_scalar_chardoubletinyint
channel_access_aggregated_scalar_doubledoubledouble
channel_access_aggregated_scalar_floatdoublefloat
channel_access_aggregated_scalar_longdoubleint
channel_access_aggregated_scalar_shortdoublesmallint
channel_access_array_charblobtinyint
channel_access_array_doubleblobdouble
channel_access_array_enumblobn/a
channel_access_array_floatblobfloat
channel_access_array_longblobint
channel_access_array_shortblobsmallint
channel_access_array_stringblobn/a
channel_access_scalar_chartinyinttinyint
channel_access_scalar_doubledoubledouble
channel_access_scalar_enumsmallintn/a
channel_access_scalar_floatfloatfloat
channel_access_scalar_longintint
channel_access_scalar_shortsmallintsmallint
channel_access_scalar_stringtextn/a

For aggregated samples, the value field is always of type double because it stores the mean of all source samples. The array types store the value elements in a blob. The reason for this is that Cassandra’s list type comes with an overhead that is significant when representing a large number of elements as it is commonly encountered for Channel Access channels that have array values.

Storing these arrays inside a blob is very efficient because the size occupied by each element is not more than the element’s actual size (e.g. two bytes for a each element of a DBR_SHORT sample). The numbers inside the blob are stored in big endian format, so that when using Java, they can easily be converted back to numbers by interpreting the ByteBuffer representing the blob as a buffer of numbers (e.g. an IntBuffer for samples of type DBR_LONG).

For array samples of type DBR_STRING, the blob stores 40 bytes for each element. These 40 bytes represent the raw value as it has been received from the Channel Access server.

The complete list of Java element and buffer types that correspond to the data stored in the value fields of the array UDTs is given by Table D.4, “Java types corresponding to blobs storing sample values”.

Table D.4. Java types corresponding to blobs storing sample values
User-defined typeJava element typeJava buffer type
channel_access_array_charbyteByteBuffer
channel_access_array_doubledoubleDoubleBuffer
channel_access_array_enumshortShortBuffer
channel_access_array_floatfloatFloatBuffer
channel_access_array_longintIntBuffer
channel_access_array_shortshortShortBuffer
channel_access_array_stringbyte[40]ByteBuffer