3. CQL table layout

3. CQL table layout
Prev	Appendix D. Channel Access control-system support

In the database, channels that use the Channel Access control-system support can be identified by having their control-system type set to “channel_access”. The Channel Access control-system support stores all samples in a single table with the name channel_access_samples. The columns of this table are described by Table D.1, “Columns of table channels_access_samples”.

Table D.1. Columns of table channels_access_samples

Column name	Column type	Data type	Description
channel_data_id	Partition key	uuid	Channel data ID.
decimation_level	Partition Key	int	Decimation level (identified by the decimation period in seconds). Zero indicates raw samples.
bucket_start_time	Partition Key	bigint	Start time of the sample bucket (in nanoseconds since epoch, which is January 1^st, 1970, 00:00:00 UTC).
sample_time	Clustering Key	bigint	Time stamp of the sample (in nanoseconds since epoch, which is January 1^st, 1970, 00:00:00 UTC).
a_char	Regular	frozen<channel_access_array_char>	Data for a sample of type `DBR_CHAR` with more than one element.
a_double	Regular	frozen<channel_access_array_double>	Data for a sample of type `DBR_DOUBLE` with more than one element.
a_enum	Regular	frozen<channel_access_array_enum>	Data for a sample of type `DBR_ENUM` with more than one element.
a_float	Regular	frozen<channel_access_array_float>	Data for a sample of type `DBR_FLOAT` with more than one element.
a_long	Regular	frozen<channel_access_array_long>	Data for a sample of type `DBR_LONG` with more than one element.
a_short	Regular	frozen<channel_access_array_short>	Data for a sample of type `DBR_SHORT` with more than one element.
a_string	Regular	frozen<channel_access_array_string>	Data for a sample of type `DBR_STRING` with more than one element.
current_bucket_size	Static	int	Accumulated size (in bytes) of the samples that have been written to the sample bucket so far.
disabled	Regular	boolean	Marker for a sample indicating that the channel was disabled at that point in time.
disconnected	Regular	boolean	Marker for a sample indicating that the channel was disconnected at that point in time.
gs_char	Regular	frozen<channel_access_aggregated_scalar_char>	Data for an aggregated sample that has been built from samples of type `DBR_CHAR`, each having a single element.
gs_double	Regular	frozen<channel_access_aggregated_scalar_double>	Data for an aggregated sample that has been built from samples of type `DBR_DOUBLE`, each having a single element.
gs_float	Regular	frozen<channel_access_aggregated_scalar_float>	Data for an aggregated sample that has been built from samples of type `DBR_FLOAT`, each having a single element.
gs_long	Regular	frozen<channel_access_aggregated_scalar_long>	Data for an aggregated sample that has been built from samples of type `DBR_LONG`, each having a single element.
gs_short	Regular	frozen<channel_access_aggregated_scalar_short>	Data for an aggregated sample that has been built from samples of type `DBR_SHORT`, each having a single element.
s_char	Regular	frozen<channel_access_scalar_char>	Data for a sample of type `DBR_CHAR` with a single element.
s_double	Regular	frozen<channel_access_scalar_double>	Data for a sample of type `DBR_DOUBLE` with a single element.
s_enum	Regular	frozen<channel_access_scalar_enum>	Data for a sample of type `DBR_ENUM` with a single element.
s_float	Regular	frozen<channel_access_scalar_float>	Data for a sample of type `DBR_FLOAT` with a single element.
s_long	Regular	frozen<channel_access_scalar_long>	Data for a sample of type `DBR_LONG` with a single element.
s_short	Regular	frozen<channel_access_scalar_short>	Data for a sample of type `DBR_SHORT` with a single element.
s_string	Regular	frozen<channel_access_scalar_string>	Data for a sample of type `DBR_STRING` with a single element.

The channel_data_id, decimation_level, and bucket_start_time form a composite partition key that identifies the sample bucket. These parameters are passed to the control-system support by the Cassandra PV Archiver server and are simply used “as-is”.

The sample_time is used as the clustering key. This way, it is easily possible to select only those samples from a sample bucket that have a time stamp within a specific interval.

The current_bucket_size is a static column because it obviously is the same for the whole sample bucket. This column is updated by the control-system support each time a sample is added to the sample bucket.

All other columns are used for storing the sample’s data. For each sample, exactly one of these columns has a non-null value. The disabled and disconnected columns are simple boolean columns. If one of them is true, it means that the sample is a marker of the corresponding type. Each column that stores a regular (non-marker) sample uses a user-defined type (UDT) that is only used by that column.

	Note
The names of the data columns have intentionally been chosen to be very short. The reason for this is simple: Due to how regular columns are internally handled by Cassandra, the column name is serialized for each row. When there are many rows, a long column name can contribute to the total data size significantly. Most of this overhead is compensated by the compression that is applied to SSTables before storing them on disk. However, the sample bucket size that is limited to about 100 MB is measured before applying the compression. For this reason, longer column names would significantly reduce the number of samples that could be stored in each sample bucket. User-defined types (UDTs) are used for the same reason: When the various fields that are needed to store a sample would be represented as separate columns, the overhead that is caused by the meta-data for each column would increase the total data size significantly. Frozen UDTs, on the other hand, are as efficient as frozen tuples, allowing for the space-efficient storage of sample data while having human-readable names for their fields.

Note

The names of the data columns have intentionally been chosen to be very short. The reason for this is simple: Due to how regular columns are internally handled by Cassandra, the column name is serialized for each row. When there are many rows, a long column name can contribute to the total data size significantly. Most of this overhead is compensated by the compression that is applied to SSTables before storing them on disk. However, the sample bucket size that is limited to about 100 MB is measured before applying the compression. For this reason, longer column names would significantly reduce the number of samples that could be stored in each sample bucket.

User-defined types (UDTs) are used for the same reason: When the various fields that are needed to store a sample would be represented as separate columns, the overhead that is caused by the meta-data for each column would increase the total data size significantly. Frozen UDTs, on the other hand, are as efficient as frozen tuples, allowing for the space-efficient storage of sample data while having human-readable names for their fields.

The UDTs that are used by the Channel Access control-system support all share a similar structure. The fields that may be present in these UDTs are listed in Table D.2, “Fields of the user-defined types”.

Table D.2. Fields of the user-defined types

Field name	Data type	Description
value	depends on UDT	sample’s value.
std	double	standard deviation for an aggregated sample.
min	double	least original value for an aggregated sample.
max	double	greatest original value for an aggregated sample.
covered_period_fraction	double	fraction of the period that is actually covered by the data in the aggregated sample. A value of `1.0` means that the data that was used to calculate the aggregated sample actually covers the full period that is supposed to be represented by the aggregated sample. A value of `0.5` means that the data that was used to calculate the aggregated sample actually only covers half of the period that is supposed to be covered by the aggregated sample.
alarm_severity	smallint	alarm severity (`0` means `NO_ALARM`, `1` means `MINOR`, `2` means `MAJOR`, `3` means `INVALID`).
alarm_status	smallint	alarm status (the number is the status code that is used by the Channel Access protocol to signal the corresponding alarm status).
precision	smallint	display precision for floating point numbers.
units	text	engineering units.
labels	frozen<list<text>>	labels for enum states.
lower_warning_limit	depends on UDT	lower warning limit.
upper_warning_limit	depends on UDT	upper warning limit.
lower_alarm_limit	depends on UDT	lower alarm limit.
upper_alarm_limit	depends on UDT	upper alarm limit.
lower_display_limit	depends on UDT	lower display limit.
upper_display_limit	depends on UDT	upper display limit.
lower_control_limit	depends on UDT	lower control limit.
upper_control_limit	depends on UDT	upper control limit.

Not all of these fields are present in each UDT. The value, alarm_severity, and alarm_status fields are the only ones that are present in all UDTs. The std, min, max, and covered_period_fraction fields are only present in the channel_access_aggregated_* UDTs. The precision field is only present in UDTs representing samples of a floating-point type. The units, lower_warning_limit, upper_warning_limit, lower_alarm_limit, upper_alarm_limit, lower_display_limit, upper_display_limit, lower_control_limit, and upper_control_limit fields are only present in UDTs that represent samples of a numeric type. The labels field is only present in the channel_access_array_enum and channel_access_scalar_enum UDTs.

The type of the value field depends on the type of the sample that is represented by the UDT. The same applies to the lower_warning_limit, upper_warning_limit, lower_alarm_limit, upper_alarm_limit, lower_display_limit, upper_display_limit, lower_control_limit, and upper_control_limit fields. The types used for those fields are listed in Table D.3, “Type of UDT fields”.

Table D.3. Type of UDT fields

User-defined type	Value field type	Limit fields type
channel_access_aggregated_scalar_char	double	tinyint
channel_access_aggregated_scalar_double	double	double
channel_access_aggregated_scalar_float	double	float
channel_access_aggregated_scalar_long	double	int
channel_access_aggregated_scalar_short	double	smallint
channel_access_array_char	blob	tinyint
channel_access_array_double	blob	double
channel_access_array_enum	blob	n/a
channel_access_array_float	blob	float
channel_access_array_long	blob	int
channel_access_array_short	blob	smallint
channel_access_array_string	blob	n/a
channel_access_scalar_char	tinyint	tinyint
channel_access_scalar_double	double	double
channel_access_scalar_enum	smallint	n/a
channel_access_scalar_float	float	float
channel_access_scalar_long	int	int
channel_access_scalar_short	smallint	smallint
channel_access_scalar_string	text	n/a

For aggregated samples, the value field is always of type double because it stores the mean of all source samples. The array types store the value elements in a blob. The reason for this is that Cassandra’s list type comes with an overhead that is significant when representing a large number of elements as it is commonly encountered for Channel Access channels that have array values.

Storing these arrays inside a blob is very efficient because the size occupied by each element is not more than the element’s actual size (e.g. two bytes for a each element of a DBR_SHORT sample). The numbers inside the blob are stored in big endian format, so that when using Java, they can easily be converted back to numbers by interpreting the ByteBuffer representing the blob as a buffer of numbers (e.g. an IntBuffer for samples of type DBR_LONG).

For array samples of type DBR_STRING, the blob stores 40 bytes for each element. These 40 bytes represent the raw value as it has been received from the Channel Access server.

The complete list of Java element and buffer types that correspond to the data stored in the value fields of the array UDTs is given by Table D.4, “Java types corresponding to blobs storing sample values”.

Table D.4. Java types corresponding to blobs storing sample values

User-defined type	Java element type	Java buffer type
channel_access_array_char	byte	ByteBuffer
channel_access_array_double	double	DoubleBuffer
channel_access_array_enum	short	ShortBuffer
channel_access_array_float	float	FloatBuffer
channel_access_array_long	int	IntBuffer
channel_access_array_short	short	ShortBuffer
channel_access_array_string	byte[40]	ByteBuffer

Prev	Up
2. Decimated samples	Home