3. Server configuration

The configuration options used by the Cassandra PV Archiver server are controlled through a configuration file in the YAML format. The configuration file is located in the conf directory of the binary distribution or in the /etc/cassandra-pv-archiver directory when using the Debian package. In either case, the configuration file is called cassandra-pv-archiver.yaml. It is not an error if the configuration file does not exist at the expected location. In this case, the server starts using default values for all configuration options.

The path to the configuration file can be overridden by specifying the --config-file command line option to the cassandra-pv-archiver-server script. When this option is specified, the default location is not used. Unlike the configuration file in the default location, a configuration file specified with the --config-file option must exist; the server does not start if it is missing.

The configuration options are organized in a hierarchy. For the rest of this document, the first level of this hierarchy is called the section. The hierarchical path to a configuration option can either be specified inline or through indentation. For example, specifying

level1a:
  option1: value1
  level2:
    option1: value2
level1b:
  option1: value3

is equivalent to specifying

level1a.option1: value1
level1a.level2.option1: value2
level1b.option1: value3

The default values given in this document are the values that are used when a configuration option is not specified at all. They are not necessarily the values set in the configuration file that is distributed as part of the binary distribution or the Debian package.

This section only describes the part of the configuration that is stored in the per-server configuration file, not the configuration that is stored in the database. For the latter, please refer to Section 4, “Administrative user interface”.

3.1. Cassandra cluster

The cassandra section configures the server’s connection to the Cassandra cluster.

Hosts

The cassandra.hosts option specifies the list of hosts which are used for initially establishing the connection with the Cassandra cluster. This list does not have to contain all Cassandra hosts because all hosts in the cluster are detected automatically once the connection to at least one host has been established. However, it is still a good idea to specify more than one host here because this ensures that the connection can be established even if one of the hosts is down when the Cassandra PV Archiver server is started.

By default, the list only contains localhost. The list of hosts has to be specified as a YAML list, using the regular or the inline list syntax. For example, a list specifying three hosts might look like this:

cassandra:
  hosts:
    - server1.example.com
    - server2.example.com
    - server3.example.com

Port

The cassandra.port option specifies the port number on which the Cassandra hosts are listening for incoming connections (for Cassandra’s native protocol). The default value is 9042, which is also the default value used by Cassandra.

Keyspace

The cassandra.keyspace option specifies the name of the keyspace in which the Cassandra PV Archiver stores its data. The default value is pv_archive. While strictly speaking mixed-case names are allowed, the use of such names is discouraged because many tools have problems with them and they typically require quoting. For this reason, the keyspace name should be all lower-case when possible.

Username

The cassandra.username option specifies the username that is used when authenticating with the Cassandra cluster. When empty, the connection to the Cassandra cluster is established without trying to authenticate the client. The default value is the empty string (no authentication).

Password

The cassandra.password option specifies the password that is used when authenticating with the Cassandra cluster. The password is only used when the username is not empty. The default value is the empty string.
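
Taken together, the connection options described above might be combined as in the following sketch. The host names, the username, and the password are placeholders invented for this example; the port and the keyspace are simply the documented defaults and could be omitted.

cassandra:
  # Contact points (placeholder host names).
  hosts:
    - server1.example.com
    - server2.example.com
  # Documented defaults, listed only for illustration.
  port: 9042
  keyspace: pv_archive
  # Placeholder credentials; an empty username disables authentication.
  username: archiver
  password: change-me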

Use local consistency level

The cassandra.useLocalConsistencyLevel option specifies whether the data-center-local variants of the consistency levels are used for database operations. The default value is false. This option only has an effect when the Cassandra cluster is distributed across multiple data centers. When this option is set to true, the LOCAL_QUORUM consistency level is used where the QUORUM consistency level would usually be used. In the same way, the LOCAL_SERIAL consistency level is used instead of the SERIAL consistency level.

This option must only be enabled if only a single data center makes modifications to the data and all other data centers only use the database for read access. In this case, enabling this option can reduce the latency of operations because the client only has to wait for nodes local to the data center. The most likely scenario is a situation where all nodes running the Cassandra PV Archiver servers are in a single data center, but there is a second data center to which all data is replicated for disaster recovery.

Important

Never enable this option when there is more than one data center that is used for write access to the database. In this case, enabling this option will lead to data corruption because operations that are expected to result in a consistent state might actually leave inconsistencies.

This option merely provides a performance optimization, so in case of doubt, leave it at its default value of false.
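
For the disaster-recovery scenario described above (all archiver servers in one data center, with a second data center used only as a replication target), the option could be enabled as in the following sketch. Whether this is safe depends entirely on the conditions stated in the preceding note.

cassandra:
  # Only safe when a single data center writes to the database.
  useLocalConsistencyLevel: true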

3.2. Archiving server

The server section configures the archiving server (for example the ID assigned to each server instance and the address and ports on which the archiving server listens). While the address and port settings can usually be left at their defaults, the server’s ID has to be set.

Server UUID

Each server in the cluster is identified by a unique ID (UUID). Because this UUID has to be unique for each server, there is no reasonable default value, so it has to be specified explicitly. The server’s UUID can be specified using the server.uuid option. Alternatively, it can be specified by passing the --server-uuid parameter to the server’s start script.

Important

Starting two server instances with the same UUID results in data corruption, regardless of whether these instances are started on the same host or different hosts. For this reason, care should be taken to ensure that each UUID is only used for exactly one process.
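
A configuration file that sets the UUID explicitly might look like the following sketch. The UUID shown is an arbitrary example value and must not be reused; each server needs its own.

server:
  # Example value only; generate a fresh UUID for every server instance.
  uuid: 5f3c1d2e-8b7a-4c6d-9e0f-1a2b3c4d5e6f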

Server UUID file

As an alternative to specifying the server’s UUID in the configuration file or on the command line, it is possible to have a separate file that specifies the UUID. The path to this file can be specified with the server.uuidFile option. If this file exists, it is expected to contain a single line with the UUID that is then used as the server’s UUID. If this file does not exist, the server tries to create it on startup, using a randomly generated UUID. By default, this option is not set, so the server expects an explicitly specified UUID. This option is particularly useful in an environment where servers are deployed automatically and should thus automatically generate a UUID the first time they are started.
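
For an automatically deployed server, the option might be set as in the following sketch. The path is an assumption made for this example; presumably any location that the server process can read and write would work.

server:
  # Hypothetical path; if the file does not exist, the server tries to
  # create it on startup with a randomly generated UUID.
  uuidFile: /var/lib/cassandra-pv-archiver/server-uuid.txt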

Listen address

The server.listenAddress option specifies the IP address (or the hostname resolving to the IP address) on which the server listens for incoming connections. If it is empty (the default), the server listens on the first non-loopback address that is found. This means that typically, this option only has to be set for servers that have more than one (non-loopback) interface.

The specified address is used for the administrative user-interface, the archive-access interface, and the inter-node communication interface. In addition to the specified address, the administrative user-interface and the archive-access interface are also made available on the loopback address.

This option should never be set to localhost, 127.0.0.1, ::1, or any other loopback address. Other servers try to contact the server on the specified address, and a loopback address would make them connect to themselves instead, leading to unexpected results.

Admin port

The server.adminPort option specifies the TCP port number on which the administrative user-interface is made available. The default is port 4812.

Archive access port

The server.archiveAccessPort option specifies the TCP port number on which the archive-access interface is made available. The default is port 9812. The archive-access interface is the web-interface through which clients access the data stored in the archive.

Inter-node communication port

The server.interNodeCommunicationPort option specifies the TCP port number on which the inter-node communication interface is made available. The default is port 9813. As the name suggests, the inter-node communication interface is used for the internal communication between Cassandra PV Archiver servers that is needed in order to coordinate the cluster operation (for example in case of configuration changes).
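
As a sketch, the network-related options of the server section could be written as follows for a host with more than one interface. The IP address is a placeholder from a documentation-only address range; the three port numbers are the documented defaults and could be omitted.

server:
  # Placeholder address; must not be a loopback address.
  listenAddress: 192.0.2.10
  # Documented defaults, listed only for illustration.
  adminPort: 4812
  archiveAccessPort: 9812
  interNodeCommunicationPort: 9813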

3.3. Throttling

The throttling section contains options for throttling database operations. The Cassandra PV Archiver server tries to run database operations in parallel in order to reduce the effective latency of complex operations (e.g. operations involving many channels). However, depending on the exact configuration of the Cassandra cluster (for example the size of the cluster, network bandwidth and latency, hardware used for the cluster, load caused by other applications), the number of operations that can safely be run in parallel might differ.

Running too many operations in parallel results in some of the operations timing out. This can be avoided by reducing the number of operations allowed to run in parallel. On the other hand, when operations never time out, one might try to increase the limits in order to improve the performance.

The limits can be controlled separately for read and write operations, and separately for operations touching the channels’ meta-data (for example the configuration and information about sample buckets) and operations touching the actual samples. Operations modifying channel meta-data are typically carried out using the SERIAL consistency level, so in this case write operations typically are more expensive than read operations. Thus the limit for write operations should be lower than the limit for read operations. In the case of operations dealing with actual samples, read operations typically are more expensive than write operations (due to how Cassandra works internally), so the limit for read operations should be lower than the limit for write operations.

Note

When trying to optimize the throttling settings, it can be helpful to connect to the Cassandra PV Archiver server via JMX (for example using JConsole from the JDK). The current number of operations that are running and waiting is exposed via MBeans, so that it is possible to monitor how changing the throttling parameters affects the operation.

Max. concurrent channel meta-data read statements

The throttling.maxConcurrentChannelMetaDataReadStatements configuration option controls how many read operations for channel meta-data should be allowed to run in parallel. Usually, these are statements reading from the channels, channels_by_server, and pending_channel_operations_by_server tables. Typically, this limit should be greater than the limit set by the throttling.maxConcurrentChannelMetaDataWriteStatements option. The default value is 64.

Max. concurrent channel meta-data write statements

The throttling.maxConcurrentChannelMetaDataWriteStatements configuration option controls how many write operations for channel meta-data should be allowed to run in parallel. Usually, these are statements writing to the channels, channels_by_server, and pending_channel_operations_by_server tables. Typically, such operations are light-weight transactions and thus this limit should be less than the limit set by the throttling.maxConcurrentChannelMetaDataReadStatements option. The default value is 16.

Max. concurrent control-system support read statements

The throttling.maxConcurrentControlSystemSupportReadStatements configuration option controls how many read operations issued by the control-system supports (all of them combined) are allowed to run in parallel. Usually, these are statements that read actual samples and thus read from the tables used by the control-system support(s). Typically, this limit should be less than the limit set by the throttling.maxConcurrentControlSystemSupportWriteStatements option, but significantly greater than the limit set by the throttling.maxConcurrentChannelMetaDataReadStatements option. The default value is 128.

Max. concurrent control-system support write statements

The throttling.maxConcurrentControlSystemSupportWriteStatements configuration option controls how many write operations issued by the control-system supports (all of them combined) are allowed to run in parallel. Usually, these are statements that write actual samples (for each sample that is written, an INSERT statement is triggered) and that thus write to the tables used by the control-system support(s). Typically, this limit should be greater than the limit set by the throttling.maxConcurrentControlSystemSupportReadStatements option and significantly greater than the limits set by the throttling.maxConcurrentChannelMetaDataReadStatements and throttling.maxConcurrentChannelMetaDataWriteStatements options. The default value is 512.
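
For reference, a throttling section that spells out the four documented default values might look like the following sketch. It can serve as a starting point when tuning: lower the values if operations time out, or raise them cautiously if they never do.

throttling:
  # Channel meta-data: writes are more expensive, so their limit is lower.
  maxConcurrentChannelMetaDataReadStatements: 64
  maxConcurrentChannelMetaDataWriteStatements: 16
  # Samples: reads are more expensive, so their limit is lower.
  maxConcurrentControlSystemSupportReadStatements: 128
  maxConcurrentControlSystemSupportWriteStatements: 512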

3.4. Control-system supports

The controlSystemSupport section contains the configuration options for the various control-system supports. For each available control-system support, this section has a corresponding sub-section. The configuration options in these sub-sections are not handled by the Cassandra PV Archiver server itself but passed as-is to the respective control-system support. For this reason, the names of the available options entirely depend on the respective control-system support. Please refer to the documentation of the respective control-system support for details. For example, the documentation for the Channel Access control-system support is available in Appendix D, Channel Access control-system support.

3.5. Logging

The Cassandra PV Archiver server is based on the Spring Boot framework. For this reason, the options supported for configuring logging are the same ones that are supported by Spring Boot. These options are documented in the Spring Boot Reference Guide. The Cassandra PV Archiver server uses Logback as its logging backend, so the specifics of how to configure Logback for Spring Boot might also be of interest.

In order to get started more easily, this section contains a few pointers on how the logging configuration can be modified.

Log levels

The log level can be set both globally and for specific subtrees of the class hierarchy. When specifying different log levels for different parts of the hierarchy, more specific definitions (the ones covering a smaller sub-tree of the hierarchy) take precedence over more general definitions.

The available log levels are ERROR, WARN, INFO, DEBUG, and TRACE. Each log level includes the preceding log levels (for example, the log level INFO also includes ERROR and WARN).

The log level for the root of the hierarchy (that is used for all loggers that do not have a more specific definition) is set through the logging.level.root option. By default, this log level is set to INFO. This results in a lot of diagnostic messages being logged, so you might want to consider reducing it to WARN.

The log level for individual parts of the hierarchy can be set by using a configuration option containing the path to the respective hierarchy level. For example, in order to enable DEBUG messages for all classes in the com.aquenos.cassandra.pvarchiver package (and its sub-packages), one could set logging.level.com.aquenos.cassandra.pvarchiver to DEBUG.
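
Following Spring Boot’s convention of placing log levels under logging.level, a configuration that quiets the root logger while enabling DEBUG output for the archiver’s own classes might look like the following sketch (written with inline paths):

# Reduce the general verbosity.
logging.level.root: WARN
# The more specific definition takes precedence for this package and its sub-packages.
logging.level.com.aquenos.cassandra.pvarchiver: DEBUG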

Log file

The path to the log file can be specified using the logging.file option. If no log file is specified (the default), log messages are only written to the standard output. In order to log to more than one log file (for example depending on the log level or the class writing the log message) or in order to disable logging to the standard output, one has to specify a custom Logback configuration file (see the next section).
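
As a minimal sketch, a log file could be configured as follows. The path is a placeholder chosen for this example; the directory has to be writable by the server process.

logging:
  # Hypothetical path to the log file.
  file: /var/log/cassandra-pv-archiver/server.log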

Logging configuration file

When the configuration options directly available through the Cassandra PV Archiver server configuration file are not sufficient, one can specify a custom Logback configuration file. The path to this file is specified using the logging.config option. The information available in the Spring Boot Reference Guide might be useful when using this option.
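
A custom Logback configuration file might be referenced as in the following sketch. The file name and location are assumptions made for this example; the content of that file follows Logback’s own format and is not shown here.

logging:
  # Hypothetical path to a custom Logback configuration file.
  config: /etc/cassandra-pv-archiver/logback.xml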

3.6. Environment variables

In addition to the configuration options that can be specified in the server’s configuration file, there are two environment variables that can be passed to the server’s startup script. When using the Debian package, these environment variables should be set in the file /etc/default/cassandra-pv-archiver-server.

The first environment variable is JAVA_HOME. It specifies the path to the JRE. When starting the Java process, the server’s startup script uses the $JAVA_HOME/bin/java executable (%JAVA_HOME%/bin/java.exe on Windows). When JAVA_HOME is not set, the startup script uses the java executable that is in the search PATH of the shell executing the startup script.

The second environment variable is JAVA_OPTS. When set, the value of this environment variable is added to the parameters passed to the java executable. It can be used to configure JVM options like the maximum heap size.