Table of Contents
engineConfiguration
engineConfigurationToGroups
groupConfiguration
groupConfigurationToChannels
channelConfiguration
channelConfigurationToCompressionLevels
compressionLevelConfiguration
samples
This manual is divided into four chapters (not counting this introduction). The first chapter introduces the concepts of Apache Cassandra in general and the Cassandra Archiver in particular. The second chapter describes the few steps needed to set up a basic installation of the Cassandra Archiver. The third chapter gives more detailed instructions on how to install the Cassandra Archiver. Finally, the fourth chapter explains how to configure the archiver.
This chapter introduces the concepts behind the Cassandra Archiver and Apache Cassandra. First, column-oriented database systems and Apache Cassandra are briefly presented. Subsequently, the Cassandra Archiver for Control System Studio is introduced. Finally, the structure of the keyspace storing data for the Cassandra Archiver is explained. You might want to skip this last section if you are reading this manual for the first time and are only interested in getting started with the Cassandra Archiver.
Apache Cassandra is a column-oriented database management system (CDBMS), which is optimized for storing large amounts (tera- or even petabytes) of data grouped in column families. It is a special form of a key-value store. Unlike a relational database management system (RDBMS), it is not optimized for storing relational data or modifying data in a transactional way. The main advantages of a CDBMS compared to an RDBMS are superior read and write performance, linear scalability and high availability at low operating costs.
In a CDBMS, data is stored in column families. Each column family contains an arbitrary number of rows, which (in a multi-node setup) are distributed over all cluster nodes. Each row is identified by a unique row key and contains one or more columns. Each column is identified by a column name, which must be unique within the respective row. A column can, but does not have to, store a value. Row keys, column names and column values are stored as arrays of bytes. The meaning of the bytes depends on the application accessing the database, so the CDBMS itself is unaware of data types.
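To make this data model concrete, here is a minimal toy sketch in Python. It is not Cassandra's API, and the class `ColumnFamily` and its methods are hypothetical. It models a column family as nested maps from row key to column name to value, where every key and value is a raw byte string. The point is that the store only ever sees bytes, and the application decides how to encode its data.

```python
from typing import Dict, List, Optional


class ColumnFamily:
    """Toy model of a column family: row key -> column name -> value, all raw bytes."""

    def __init__(self) -> None:
        self._rows: Dict[bytes, Dict[bytes, bytes]] = {}

    def insert(self, row_key: bytes, column: bytes, value: bytes = b"") -> None:
        # Rows and columns come into existence on first write; the value may be empty.
        self._rows.setdefault(row_key, {})[column] = value

    def get(self, row_key: bytes, column: bytes) -> Optional[bytes]:
        return self._rows.get(row_key, {}).get(column)

    def columns(self, row_key: bytes) -> List[bytes]:
        # Column names within a row are kept sorted; byte-wise order is used here.
        return sorted(self._rows.get(row_key, {}))


cf = ColumnFamily()
# The application decides how to turn its data into bytes (UTF-8 here).
cf.insert(b"myEngine", b"url", "http://localhost:4812/".encode("utf-8"))
cf.insert(b"myEngine", b"description", b"Test engine")
```

The engine name and URL in the usage lines are made-up example values.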
In a multi-node setup, data is distributed across the nodes, so that the amount of data stored is not limited by the disk space of a single computer. Typically, low-cost servers which are not fault-tolerant are used, and the same data is stored on multiple nodes (typically three). The database clients and servers have built-in facilities that automatically switch to a different node if the first node fails. Therefore, the database cluster is fault-tolerant and highly available, although cheap, unreliable computers are used.
This document does not provide a detailed introduction to column-oriented database management systems or Apache Cassandra. For understanding the concepts of a CDBMS, the original paper about Google Bigtable by Chang et al. is a good starting point.
If you want to set up a cluster of Cassandra servers or are interested in advanced configuration options and performance tuning, you should read the Apache Cassandra Documentation provided by DataStax. This manual only describes the basic steps needed to set up a single-node Cassandra cluster for getting started with the Cassandra Archiver.
The Cassandra Archiver for Control System Studio is a set of plugins that extend the existing archive reader and writer architecture so that a database hosted by Apache Cassandra can be used instead of a traditional RDBMS like MySQL or Oracle.
By using a column-oriented database management system, huge amounts of channel samples can be archived. The Cassandra Archiver uses one column family for storing all channel samples. Each row stores one sample and is identified by a key that aggregates the channel name, the time-stamp of the sample and the compression-level name. The columns of each row store the sample's value and meta-data (e.g. alarm severity). As the data in the samples column family is compressed before being written to disk, the space requirements of the database are reduced. Due to the way Cassandra stores data, the read and write performance of the database is not reduced by using compression. In fact, using compression can even slightly increase the data throughput.
The Cassandra Archiver can be regarded as a hybrid between the RDB Archiver and the Channel Archiver. Like the RDB Archiver, the Cassandra Archiver uses an existing, well-tested database management system for storing data. However, like the Channel Archiver, it uses a storage format that is better optimized for storing channel samples and can provide high write and read rates.
The HyperArchiver uses a concept similar to that of the Channel Archiver. However, it uses Hypertable to store the samples and MySQL to store the configuration, while the Cassandra Archiver stores the configuration and the samples in the same database, simplifying installation and maintenance. For a HyperArchiver setup where the Hypertable server is not running on the same node as the archive engine, the source code of the HyperArchiver has to be modified, because important configuration values are hard-coded. Unlike Apache Cassandra, which does not have a single point of failure, Hypertable has a master server which, when down, causes the whole cluster to fail. Besides, Cassandra is implemented in pure Java and is thus 100 percent platform-independent, while Hypertable needs to be compiled for each supported platform. In summary, the Cassandra Archiver is easier to set up and maintain and more reliable than the HyperArchiver, making it the better choice for most scenarios.
This section explains the various column families which are used to store the configuration and samples. If you are not interested in the details, you can simply skip this section and continue with the next chapter. The information in this section is not needed for setting up the Cassandra Archiver.
For row keys which have several parts, the parts are separated by a null byte. All row keys are prefixed with a (binary) MD5 hash followed by a null byte in order to make sure that they are evenly distributed across the cluster nodes. The MD5 hash is calculated by concatenating the constituent byte arrays of the key (without separating null bytes) and then calculating the MD5 hash of the resulting byte array.
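The key scheme described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the archiver's own code. It assumes that names are encoded as UTF-8, and the helper names `make_row_key` and `group_key` are hypothetical.

```python
import hashlib
from typing import Sequence

SEPARATOR = b"\x00"


def make_row_key(parts: Sequence[bytes]) -> bytes:
    """MD5 prefix, a null byte, then all key parts separated by null bytes."""
    # The hash covers the parts concatenated WITHOUT separators.
    digest = hashlib.md5(b"".join(parts)).digest()  # 16 raw bytes, not hex
    return digest + SEPARATOR + SEPARATOR.join(parts)


def group_key(engine: str, group: str) -> bytes:
    # Hypothetical helper: row key of the groupConfiguration column family,
    # assuming names are encoded as UTF-8.
    return make_row_key([engine.encode("utf-8"), group.encode("utf-8")])


key = group_key("myEngine", "firstGroup")
```

A single-part key, such as an engine name, follows the same pattern. It is the MD5 hash of the name, a null byte, and then the name itself.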
The engineConfiguration column family stores information about archive engines. The engine name, which must be unique, is used as the row key. Each row has columns with the names url and description storing the URL and the description of the respective archive engine.
The engineConfigurationToGroups column family maps engines to their respective archive groups. The engine name is used as the row key. A column exists for each group in the archive engine, using the name of the group as the column name.
The groupConfiguration column family stores the configuration for each group. The row key is a combination of the engine name and the group name. The column enablingChannel stores the name of the channel that enables or disables the group.
The groupConfigurationToChannels column family maps archive groups to the channels they contain. The row key is the same as used for the groupConfiguration column family. A column exists for each channel in the archive group, using the name of the channel as the column name.
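The following Python sketch shows how an engine's groups and channels could map onto the rows of the two mapping column families described above. It rests on two assumptions. First, it reuses the key scheme from the previous section with UTF-8 names. Second, it stores an empty column value, since only the column name carries information here and the manual does not say what the value contains. The function names are hypothetical.

```python
import hashlib
from typing import Dict, List

SEP = b"\x00"
Rows = Dict[bytes, Dict[bytes, bytes]]


def row_key(*parts: str) -> bytes:
    # MD5 over the concatenated parts, a null byte, then the parts joined by null bytes.
    raw = [p.encode("utf-8") for p in parts]
    return hashlib.md5(b"".join(raw)).digest() + SEP + SEP.join(raw)


def mapping_rows(engine: str, groups: Dict[str, List[str]]) -> Dict[str, Rows]:
    """Rows of engineConfigurationToGroups and groupConfigurationToChannels for one engine."""
    engine_to_groups: Rows = {row_key(engine): {}}
    group_to_channels: Rows = {}
    for group, channels in groups.items():
        # One column per group, named after the group (value assumed empty).
        engine_to_groups[row_key(engine)][group.encode("utf-8")] = b""
        # One row per (engine, group); one column per channel in that group.
        group_to_channels[row_key(engine, group)] = {c.encode("utf-8"): b"" for c in channels}
    return {
        "engineConfigurationToGroups": engine_to_groups,
        "groupConfigurationToChannels": group_to_channels,
    }


rows = mapping_rows("myEngine", {"firstGroup": ["firstChannel", "secondChannel"],
                                 "anotherGroup": ["someOtherChannel"]})
```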
The channelConfiguration column family stores information about channels. The channel name, which must be unique, is used as the row key. Each row has columns with the names engine, group, sampleMode, samplePeriod, sampleDelta and lastSampleTime, storing the engine and group the channel is associated with, the sampling options, and the time of the last raw sample that has been written for the channel.
The channelConfigurationToCompressionLevels column family maps channels to their respective compression levels. The channel name is used as the row key. A column exists for each compression level that is configured for the respective channel. However, the special "raw" compression level always exists, even if there is no column for it.
The compressionLevelConfiguration column family stores the configuration for each compression level of a channel. The row key is a combination of the channel name and the compression-level name. The columns compressionPeriod, retentionPeriod, lastSavedSampleTime and nextSampleTime store the period between samples (not for the "raw" compression level), the time after which samples are deleted, the time-stamp of the latest sample, and the time-stamp of the next sample to be calculated (not for the "raw" compression level).
The samples column family stores the actual samples for the different channels. The row key is a combination of the compression-level name, the channel name and the time-stamp. However, the time-stamp is not included when calculating the MD5 hash.
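Leaving the time-stamp out of the hash means that all samples of one channel and compression level share the same key prefix. With an order-preserving partitioner, those samples are therefore stored next to each other and sorted by time, which lets a time range be read as a key range. The sketch below demonstrates this. The time-stamp encoding it uses (8-byte big-endian seconds followed by 4-byte big-endian nanoseconds) is a hypothetical choice, not taken from the archiver. What matters is that fixed-width big-endian integers sort the same way as bytes and as numbers.

```python
import hashlib
import struct

SEP = b"\x00"


def encode_timestamp(seconds: int, nanos: int) -> bytes:
    # HYPOTHETICAL encoding: unsigned big-endian seconds (8 bytes) + nanoseconds (4 bytes).
    return struct.pack(">QI", seconds, nanos)


def sample_row_key(level: str, channel: str, seconds: int, nanos: int = 0) -> bytes:
    level_b, channel_b = level.encode("utf-8"), channel.encode("utf-8")
    # The time-stamp is deliberately left out of the hash...
    digest = hashlib.md5(level_b + channel_b).digest()
    # ...but it is still the last part of the key.
    return SEP.join([digest, level_b, channel_b, encode_timestamp(seconds, nanos)])


keys = [sample_row_key("raw", "firstChannel", t) for t in (1000, 20, 300)]
# Sorting the keys byte-wise yields the samples in chronological order.
ordered = sorted(keys)
```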
The columns severity and status exist for all rows and store the alarm severity and status of the sample.
For samples of the type IDoubleValue, the column doubleValue stores the value(s). For samples that are not in the "raw" compression level, the valueDoubleMin and valueDoubleMax columns store the minimum and maximum value in the compression interval.
For samples of the type IEnumValue, the valueEnum column stores the value(s) of the sample. If the names associated with the different enum states are known, they are stored in the metaDataEnumStates column.
For samples of the type ILongValue, the valueLong column stores the value(s) of the sample.
For samples of the type IStringValue, the valueString column stores the value(s) of the sample.
If the sample has meta-data of the type INumericMetaData associated with it, the columns metaDataNumDispLow, metaDataNumDispHigh, metaDataNumWarnLow, metaDataNumWarnHigh, metaDataNumAlarmLow, metaDataNumAlarmHigh, metaDataNumPrecision and metaDataNumUnits store the meta-information for the sample.
For all samples except the first sample for a given channel and compression level, the column precedingSampleTime stores the time-stamp of the sample directly preceding the sample.
For setting up a simple test environment for the Cassandra Archiver, four steps are needed. First, Apache Cassandra has to be installed. Second, the Cassandra Archiver Engine and the accompanying tools have to be installed. Third, the keyspace used by the Cassandra Archiver has to be set up and an initial archive engine configuration has to be imported. Finally, the Cassandra Archiver Reader has to be installed in Control System Studio.
All the steps needed to install and configure the Cassandra Archiver are described in Chapter 4, Installation and Chapter 5, Configuration. If you are using a simple setup, where the Archive Engine, the Apache Cassandra Server and Control System Studio are all running on the same host, you can simply skip the sections marked as optional in these two chapters.
This section describes the steps needed for setting up the Apache Cassandra server for use with the Cassandra Archiver.
Important: This section contains important information about configuration options that must be set for the Cassandra Archiver to work correctly. Thus, you should carefully read this section (in particular the section called “Configuring the Partitioner”), even if you already have a running Cassandra server.
You can download Apache Cassandra from the project's website. You should choose the newest version of the binary download from the 1.0 branch, having a filename like apache-cassandra-1.0.x-bin.tar.gz. Apache Cassandra is implemented in Java, so the binary download is the same for all platforms. You need a Java Runtime Environment version 6 or higher in order to run Cassandra.
After downloading the tarball, extract it to some place on your hard disk. For the rest of this document, we assume that you unpacked it to /path/to/cassandra.
Apache Cassandra stores its configuration in /path/to/cassandra/conf. For a simple, single-node configuration, there are two relevant files: cassandra.yaml and log4j-server.properties.
Before starting Cassandra, you either have to change the paths where Cassandra stores its data, or you have to create the directories used by default and make sure that the user running Cassandra can write to these directories.
There are four directories Cassandra uses to store data. The first three are configured in cassandra.yaml. The option data_file_directories is set to /var/lib/cassandra/data by default and defines where the actual data from the various column families is saved. The option saved_caches_directory defaults to /var/lib/cassandra/saved_caches and is used to store cached data. The third option is commitlog_directory, which defaults to /var/lib/cassandra/commitlog. This directory is used for storing the write-ahead log. If you aim for maximum performance, you might want to consider storing the commit log on a different disk than the data directories. For most setups, however, storing the commit log on the same disk is fine.
The last directory is configured in log4j-server.properties and is used to store the server log. The option log4j.appender.R.File defaults to /var/log/cassandra/system.log. In contrast to the other options, this option specifies a file and not a directory.
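If you keep the default paths, the following Python sketch creates the four directories and checks that the current user can write to them. Run it as the user that will run Cassandra. If you run it as root instead, you still have to hand ownership of the directories to the Cassandra user yourself, which the sketch does not do. The `root` parameter is a hypothetical addition that makes the sketch testable in a scratch directory. With the default value, it creates the real paths listed above.

```python
import os
from pathlib import Path
from typing import List

# Default locations from cassandra.yaml and log4j-server.properties (paths relative to root).
DEFAULT_DIRS = [
    "var/lib/cassandra/data",
    "var/lib/cassandra/saved_caches",
    "var/lib/cassandra/commitlog",
    "var/log/cassandra",  # parent directory of system.log
]


def create_cassandra_dirs(root: str = "/") -> List[Path]:
    """Create the default data and log directories below root and verify they are writable."""
    created = []
    for rel in DEFAULT_DIRS:
        path = Path(root) / rel
        path.mkdir(parents=True, exist_ok=True)
        if not os.access(path, os.W_OK):
            raise PermissionError(f"{path} is not writable by the current user")
        created.append(path)
    return created
```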
Important: An order-preserving partitioner must be used for the Cassandra Archiver. The partitioner cannot be changed after data has been stored in the database. Therefore, you have to change this option before starting Cassandra for the first time.
In cassandra.yaml, the partitioner option has to be changed to refer to org.apache.cassandra.dht.ByteOrderedPartitioner.
The Cassandra Archiver uses key-range queries for retrieving samples in a specific time range, so an order-preserving partitioner must be used. If you have other applications which do not use range queries, you should run them on a different Cassandra cluster using the random partitioner. Running applications which are designed for use with the random partitioner on a cluster with an order-preserving partitioner will lead to unequal data distribution across the cluster nodes and bad read and write performance.
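Because the partitioner cannot be changed later, it can be worth checking cassandra.yaml before the first start. The Python sketch below does this with a simple line-based scan, not a full YAML parser. It assumes that the option is written at the top level in the form `partitioner: <class name>`, and it skips commented-out lines. The function names are hypothetical.

```python
import re

REQUIRED_PARTITIONER = "org.apache.cassandra.dht.ByteOrderedPartitioner"


def configured_partitioner(yaml_text: str) -> str:
    """Return the value of the top-level 'partitioner' option (simple line-based scan)."""
    for line in yaml_text.splitlines():
        # Lines starting with '#' do not match, so commented-out options are ignored.
        match = re.match(r"^partitioner:\s*(\S+)", line)
        if match:
            return match.group(1)
    raise ValueError("no 'partitioner' option found")


def check_partitioner(yaml_text: str) -> None:
    value = configured_partitioner(yaml_text)
    if value != REQUIRED_PARTITIONER:
        raise RuntimeError(
            f"partitioner is {value}, but the Cassandra Archiver needs {REQUIRED_PARTITIONER}")
```

For example, `check_partitioner(open("/path/to/cassandra/conf/cassandra.yaml").read())` raises an error if the random partitioner is still configured.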
Note: If the Cassandra server, the Cassandra Archiver Engine and the Control System Studio client are all running on the same machine, you can skip this step.
There are five configuration options regarding the network interface used by the Cassandra server. The first three options (storage_port, ssl_storage_port and listen_address) are only relevant for a multi-node Cassandra cluster and thus outside the scope of this manual.
The other two options (rpc_address and rpc_port) are relevant if you want to run Control System Studio or the archive engine on different machines than the Cassandra server. By default, rpc_address is configured to only listen on the loopback interface. You should change this to the IP address of the network interface your machine uses to connect to the rest of the network. If you are sure that your hostname and IP address configuration is correct (in particular, that /etc/hosts and /etc/hostname are configured correctly), you can also set a blank value to make Cassandra determine the right IP address by itself.
The rpc_port option only needs to be changed if you run two or more Cassandra servers on the same host, or if a different service uses the same port. By default, TCP port 9160 is used for the Thrift service. If you change this port number, you also have to adjust the setting in the archive engine and archive reader configurations.
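After changing rpc_address, a quick way to verify it is to try a plain TCP connection to the Thrift port from the machine that will run the archive engine or Control System Studio. The Python sketch below does exactly that and nothing more. A successful connection only shows that something is listening on the port. It does not speak the Thrift protocol. The function name is hypothetical.

```python
import socket


def thrift_port_reachable(host: str, port: int = 9160, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

For example, `thrift_port_reachable("cassandra.example.com")` returning False usually means that rpc_address is still bound to the loopback interface, or that a firewall blocks the port. The host name here is a placeholder.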
Note: Configuring the authentication options is completely optional. By default, Cassandra grants full write access to all connections without any authentication. If you are using Cassandra in a production environment, however, you might want to use authentication for better security.
Cassandra's security system is divided into two components: authentication and authorization. Authentication is the task of checking the credentials provided by a client and assigning a principal. Authorization is the task of checking whether a specific principal may perform a certain operation.
By default Cassandra is distributed with an authenticator which accepts any credentials and an authority which grants any permission to any principal.
The SimpleAuthenticator and SimpleAuthority are part of the Cassandra source code but are not distributed with the binary distribution. For your convenience, a JAR file with the compiled versions of the two classes is distributed with the Cassandra Archiver in the cassandra-simpleauth directory. Copy this JAR to the lib directory of the Cassandra installation and add the following two lines to the end of the cassandra-env.sh configuration file:
JVM_OPTS="$JVM_OPTS -Dpasswd.properties=$CASSANDRA_CONF/passwd.properties"
JVM_OPTS="$JVM_OPTS -Daccess.properties=$CASSANDRA_CONF/access.properties"
Besides adding these system properties, you also have to adjust the authenticator and authority options in cassandra.yaml to refer to org.apache.cassandra.auth.SimpleAuthenticator and org.apache.cassandra.auth.SimpleAuthority, respectively. You also have to create the configuration files passwd.properties and access.properties in the conf directory of the Cassandra installation.
The passwd.properties file uses a simple syntax where the property name is the username and the property value is the clear-text password for the user. The following example defines four users with different passwords:
admin=superSafePassword
archive-read=somePassword
archive-write=someDifferentPassword
archive-config=anotherPassword
The access.properties file uses a syntax where the property name represents a privilege and the property value is a comma-separated list of principals which are granted that privilege. The following example assigns four levels of privileges. The user admin may perform any operation. The user archive-read may read data from the column families in the cssArchive keyspace. The user archive-write may write data to the column families samples, channelConfiguration and compressionLevelConfiguration in the cssArchive keyspace. The user archive-config may write data to any column family in the cssArchive keyspace:
<modify-keyspaces>=admin
cssArchive.<ro>=archive-read,archive-write,archive-config
cssArchive.<rw>=admin
cssArchive.engineConfiguration.<ro>=archive-read,archive-write
cssArchive.engineConfiguration.<rw>=archive-config,admin
cssArchive.engineConfigurationToGroups.<ro>=archive-read,archive-write
cssArchive.engineConfigurationToGroups.<rw>=archive-config,admin
cssArchive.groupConfiguration.<ro>=archive-read,archive-write
cssArchive.groupConfiguration.<rw>=archive-config,admin
cssArchive.groupConfigurationToChannels.<ro>=archive-read,archive-write
cssArchive.groupConfigurationToChannels.<rw>=archive-config,admin
cssArchive.channelConfiguration.<ro>=archive-read
cssArchive.channelConfiguration.<rw>=archive-config,archive-write,admin
cssArchive.channelConfigurationToCompressionLevels.<ro>=archive-read,archive-write
cssArchive.channelConfigurationToCompressionLevels.<rw>=archive-config,admin
cssArchive.compressionLevelConfiguration.<ro>=archive-read
cssArchive.compressionLevelConfiguration.<rw>=archive-config,archive-write,admin
cssArchive.samples.<ro>=archive-read
cssArchive.samples.<rw>=archive-config,archive-write,admin
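The example above follows a simple pattern. Five column families are only written by the config tool, and three (channelConfiguration, compressionLevelConfiguration and samples) are also written by the archive engine. The Python sketch below generates the same set of privileges for an arbitrary keyspace name, which helps if you do not use cssArchive. It is a convenience sketch only. The user names are the ones from the example, the function name is hypothetical, and the lines come out in a slightly different order than above, which does not matter in a properties file.

```python
from typing import List

# Column families that only the config tool changes (written by archive-config).
CONFIG_CFS = ["engineConfiguration", "engineConfigurationToGroups",
              "groupConfiguration", "groupConfigurationToChannels",
              "channelConfigurationToCompressionLevels"]
# Column families the archive engine itself writes to (written by archive-write, too).
ENGINE_CFS = ["channelConfiguration", "compressionLevelConfiguration", "samples"]


def access_properties(keyspace: str = "cssArchive") -> List[str]:
    """Return the lines of an access.properties file granting the privileges shown above."""
    lines = [
        "<modify-keyspaces>=admin",
        f"{keyspace}.<ro>=archive-read,archive-write,archive-config",
        f"{keyspace}.<rw>=admin",
    ]
    for cf in CONFIG_CFS:
        lines.append(f"{keyspace}.{cf}.<ro>=archive-read,archive-write")
        lines.append(f"{keyspace}.{cf}.<rw>=archive-config,admin")
    for cf in ENGINE_CFS:
        lines.append(f"{keyspace}.{cf}.<ro>=archive-read")
        lines.append(f"{keyspace}.{cf}.<rw>=archive-config,archive-write,admin")
    return lines
```

Write the result with something like `Path("conf/access.properties").write_text("\n".join(access_properties()) + "\n")`.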
The Cassandra server can be started using the script /path/to/cassandra/bin/cassandra. You can use the -f flag to start Cassandra in the foreground (recommended when testing Cassandra for the first time).
In order to use the Cassandra Archiver, you first have to create the keyspace and the column families used by the archiver. You can do this by starting /path/to/cassandra/bin/cassandra-cli -h <hostname or IP address of your Cassandra server>.
If you enabled authentication for your Cassandra server, you have to specify additional parameters. Call cassandra-cli --help to get a list of all supported command-line parameters.
Once you successfully started the Cassandra CLI and it is connected to the Cassandra server, you can execute the following commands to create the keyspace and the column families for the Cassandra Archiver.
CREATE KEYSPACE cssArchive;
USE cssArchive;
CREATE COLUMN FAMILY engineConfiguration;
CREATE COLUMN FAMILY engineConfigurationToGroups;
CREATE COLUMN FAMILY groupConfiguration;
CREATE COLUMN FAMILY groupConfigurationToChannels;
CREATE COLUMN FAMILY channelConfiguration;
CREATE COLUMN FAMILY channelConfigurationToCompressionLevels;
CREATE COLUMN FAMILY compressionLevelConfiguration;
CREATE COLUMN FAMILY samples WITH compression_options = { sstable_compression: DeflateCompressor, chunk_length_kb: 256 };
Instead of cssArchive, you can use a different name for the keyspace. However, if you do not use the default keyspace name, you will have to configure the keyspace name for the tools using the Cassandra server. The column-family names are fixed and cannot be changed.
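Because only the keyspace name (and possibly the chunk length) varies while the column-family names are fixed, the CLI statements can be generated. The Python sketch below produces the same statements as listed above for a given keyspace name, and you can paste its output into the cassandra-cli session. The function name `keyspace_script` is hypothetical.

```python
from typing import List

# Fixed column-family names used by the Cassandra Archiver (samples is handled separately).
COLUMN_FAMILIES = [
    "engineConfiguration", "engineConfigurationToGroups",
    "groupConfiguration", "groupConfigurationToChannels",
    "channelConfiguration", "channelConfigurationToCompressionLevels",
    "compressionLevelConfiguration",
]


def keyspace_script(keyspace: str = "cssArchive", chunk_length_kb: int = 256) -> str:
    """Return cassandra-cli statements creating the archiver keyspace and column families."""
    statements: List[str] = [f"CREATE KEYSPACE {keyspace};", f"USE {keyspace};"]
    statements += [f"CREATE COLUMN FAMILY {cf};" for cf in COLUMN_FAMILIES]
    # Only the samples column family is created with compression enabled.
    statements.append(
        "CREATE COLUMN FAMILY samples WITH compression_options = "
        f"{{ sstable_compression: DeflateCompressor, chunk_length_kb: {chunk_length_kb} }};"
    )
    return "\n".join(statements) + "\n"


print(keyspace_script("myArchive"))
```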
You can change the chunk_length_kb option for the samples column family. Choosing the right chunk length is a trade-off between the optimal compression ratio and the best performance for random reads. A value of 256 kilobytes should be fine for most environments. On the one hand, random reads of samples are rare, so there is no significant benefit from using a smaller chunk size. On the other hand, for the kind of data typically stored in the samples column family, increasing the chunk size will not improve the compression ratio significantly.
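If you want to check the chunk-size trade-off against your own data, you can estimate it outside Cassandra. Cassandra's DeflateCompressor uses the Deflate algorithm, which Python's zlib module also implements. The sketch below compresses a byte string in independent chunks of several sizes and reports the resulting ratio (original size divided by compressed size). Treat the numbers only as a rough estimate. The input is assumed to be some export of typical sample data, and Cassandra's on-disk format and per-chunk overhead differ from this simulation. The function names are hypothetical.

```python
import zlib
from typing import Dict, Iterable


def chunked_ratio(data: bytes, chunk_kb: int) -> float:
    """Compress data in independent chunks of chunk_kb kilobytes and return the ratio."""
    if not data:
        raise ValueError("data must not be empty")
    size = chunk_kb * 1024
    # Each chunk is compressed on its own, as with per-chunk SSTable compression.
    compressed = sum(len(zlib.compress(data[i:i + size])) for i in range(0, len(data), size))
    return len(data) / compressed


def compare_chunk_sizes(data: bytes, sizes_kb: Iterable[int] = (16, 64, 256, 1024)) -> Dict[int, float]:
    return {kb: chunked_ratio(data, kb) for kb in sizes_kb}
```

If the ratios for 256 KB and larger chunks are close to each other, a larger chunk size would gain little, as argued above.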
After downloading the binary distribution from the Cassandra Archiver website you should unpack the archive. The archive contains four directories:
archive-engine
archive-cleanup-tool
archive-config-tool
css-plugins
While the programs in the first three directories can be used as-is, the files in the css-plugins directory have to be copied to the plugins directory of your Control System Studio installation. The plugins have been developed for version 3.0.2 of CSS, so they might not work with other versions.
If the Cassandra server is not running on the same host as the archive engine, you have configured Cassandra to listen on a different port than the default port, or you enabled authentication, you have to create a plug-in customization file.
While the archive config tool and the archive clean-up tool can also be configured using command-line parameters, the use of a plug-in customization file is mandatory for the archive engine. For Control System Studio, no plug-in customization file is needed, because all options can be set in the archive URL.
The plug-in customization file is usually called plugin_customization.ini and placed in the root directory of the software it is used for. Here is an example of a plug-in customization file specifying the relevant options for the Cassandra Archiver:
; Comma-Separated List of Cassandra Servers.
; You can specify only one server, but if you have a cluster
; with several nodes, you want to list more here for fail-over.
com.aquenos.csstudio.archive.cassandra/hosts=first-host.example.com,second-host.example.com
; Thrift Port for the Cassandra Server(s).
com.aquenos.csstudio.archive.cassandra/port=9160
; Cassandra Keyspace Name.
com.aquenos.csstudio.archive.cassandra/keyspace=cssArchive
; Cassandra Username
com.aquenos.csstudio.archive.cassandra/username=myCassandraWriteUser
; Cassandra Password
com.aquenos.csstudio.archive.cassandra/password=myPassword
; Number of Compressor Worker Threads
com.aquenos.csstudio.archive.writer.cassandra/numCompressorWorkers=1
The hosts property has to be specified if the Cassandra server is not running on the same host as the archive engine or if you use a multi-node Cassandra cluster.
The port property has to be specified if you do not use the default Thrift port.
The keyspace property has to be specified if you are using a different keyspace name than cssArchive.
The username and password properties have to be specified if you enabled authentication for the Cassandra server.
The numCompressorWorkers property (note the different bundle name) specifies how many threads run in parallel to perform the sample compression and deletion (see Section 5.2, “Compression Levels”). The default setting is 1. This number can be increased if the compression process does not keep up with the generation of new data (usually because the same archive engine is handling a lot of channels). If this number is set to zero, the compression process is disabled. This means that no data for compression levels is generated and old samples are not deleted. This option was introduced in version 1.2.0. In earlier versions, there is always exactly one compressor thread.
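The rules above amount to "only write what differs from the defaults." The Python sketch below applies them and emits only the properties you actually need. The bundle names and property keys are taken from the example file. The defaults it compares against (port 9160, keyspace cssArchive, no authentication) are the ones stated in this manual. The function itself is a hypothetical helper, not part of the distribution.

```python
from typing import List, Optional

BUNDLE = "com.aquenos.csstudio.archive.cassandra"
WRITER_BUNDLE = "com.aquenos.csstudio.archive.writer.cassandra"


def plugin_customization(hosts: Optional[List[str]] = None, port: int = 9160,
                         keyspace: str = "cssArchive", username: Optional[str] = None,
                         password: Optional[str] = None,
                         num_compressor_workers: Optional[int] = None) -> str:
    """Return plugin_customization.ini content containing only non-default settings."""
    lines: List[str] = []
    if hosts:
        lines.append(f"{BUNDLE}/hosts={','.join(hosts)}")
    if port != 9160:
        lines.append(f"{BUNDLE}/port={port}")
    if keyspace != "cssArchive":
        lines.append(f"{BUNDLE}/keyspace={keyspace}")
    if username is not None:
        # Username and password are only needed when authentication is enabled.
        lines.append(f"{BUNDLE}/username={username}")
        lines.append(f"{BUNDLE}/password={password or ''}")
    if num_compressor_workers is not None:
        # Note the different bundle name for the writer option.
        lines.append(f"{WRITER_BUNDLE}/numCompressorWorkers={num_compressor_workers}")
    return "\n".join(lines) + ("\n" if lines else "")
```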
In order to tell a program to use the plugin_customization.ini, you can use the command-line parameter -pluginCustomization plugin_customization.ini.
Note: A configuration has to be loaded into the database before the archive engine can be started. Refer to Section 5.3, “Loading the Configuration” for details about how to load a configuration.
The archive engine can be started by changing to the directory where it is installed (usually archive-engine) and executing ArchiveEngine.sh. You will have to specify a few parameters, e.g. ./ArchiveEngine.sh -engine MyEngineName -data workspace. Two instances of the archive engine cannot share the same engine name or workspace, so make sure the parameter values are unique within your cluster.
Call ./ArchiveEngine.sh -help for a full list of available command-line options. If the Cassandra database is not running on the same host as the archive engine, you are using a non-default keyspace name, or you enabled authentication, you have to specify a plug-in customization file using the -pluginCustomization parameter. See Section 4.2.2, “Configuration” for details on how to define plug-in customization options.
Follow the instructions in Section 4.2.1, “Download and Installation” for installing the plugins needed to integrate the Cassandra Archive Reader into the data browser in Control System Studio.
The Cassandra Archive Reader is configured the same way as the other archive readers in Control System Studio:
In Control System Studio, go to → . This will open the preferences window. In the tree to the left select → → . Now you can add the URL of the Cassandra database to the list Archive Data Server URLs.
The URLs supported by the Cassandra Archive Reader have the format cassandra://<hosts>:<port>/<keyspace>?username=<username>&password=<password>.
In EBNF the syntax is:
The symbols used but not defined here are defined in RFC 3986.
For a multi-node Cassandra setup, the list of hosts should include all hosts which export the service via Thrift. In this case the client can try all available hosts and continue operation if some of the hosts are down. The port specified here must be the same as the Thrift port specified in the Cassandra configuration (see the section called “Configuring the Network Interface”). This port must be the same for all nodes in the cluster.
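The Python sketch below assembles such a URL. Two details are assumptions not confirmed by this manual. First, several hosts are joined with commas, as in the hosts property of the plug-in customization file. Second, the username and password are percent-encoded (per RFC 3986) so that characters like & or = cannot break the query string. If the reader turns out not to decode percent-encoded values, avoid such characters in passwords instead. The function name is hypothetical.

```python
from typing import List, Optional
from urllib.parse import quote


def archive_url(hosts: List[str], port: int = 9160, keyspace: str = "cssArchive",
                username: Optional[str] = None, password: Optional[str] = None) -> str:
    """Build a cassandra:// archive URL (comma-separated hosts are an assumption)."""
    url = f"cassandra://{','.join(hosts)}:{port}/{keyspace}"
    if username is not None:
        # safe='' also escapes '/', so every reserved character is percent-encoded.
        url += f"?username={quote(username, safe='')}&password={quote(password or '', safe='')}"
    return url
```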
Basically, the configuration format used by the Cassandra Archiver is the same as the one used by the RDB Archiver. However, the syntax is extended by a new tag used to configure compression levels.
For explaining the syntax of the configuration file, we use a simple example:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<engineconfig>
  <group>
    <name>firstGroup</name>
    <channel>
      <name>firstChannel</name>
      <period>0.5</period>
      <monitor/>
      <compression-level name="raw" retention-period="86400"/>
      <compression-level name="30s" compression-period="30"/>
      <compression-level name="5m" compression-period="300"/>
    </channel>
    <channel>
      <enable/>
      <name>secondChannel</name>
      <period>1</period>
      <scan/>
    </channel>
  </group>
  <group>
    <name>anotherGroup</name>
    <channel>
      <name>someOtherChannel</name>
      <period>10</period>
      <scan/>
      <compression-level name="30s" compression-period="30"/>
      <compression-level name="5m" compression-period="300"/>
    </channel>
  </group>
</engineconfig>
Every engine configuration is enclosed by the engineconfig tag. Within the engineconfig, there must be at least one group. Each group must have a name. The group name must be unique within the engine configuration.
Within a group, there can be an arbitrary number of channel tags. Each channel must specify a name. The channel name must be unique across all engine configurations. A channel must also specify a period and either the scan or monitor mode.
In scan mode, the period specifies the interval (as a floating-point number in seconds) between the snapshots taken from the channel. If the channel has not changed since the last snapshot, the new snapshot is discarded.
In monitor mode, every change received for the channel is saved. In this case, period specifies the expected time between changes. This is used to allocate the queue which stores new samples before they are written to the database. If the specified period is too long and the actual change rate is higher, samples might be lost because the queue fills up. If the specified period is much shorter than the actual change period, more memory than needed is allocated for the channel. As computer memory is rather cheap today, you should rather choose a value that is too small than one that is too big.
The compression-level tag is optional and its meaning is discussed in the next section.
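Before importing a configuration, you can check it against the structural rules listed above. The Python sketch below does this with the standard library's ElementTree. It checks only those rules, and it can verify channel-name uniqueness only within one file, not across all engine configurations in the database. Compression-level attributes are not checked. The function name `validate_engine_config` is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, Union


def validate_engine_config(xml_text: Union[str, bytes]) -> List[str]:
    """Return a list of problems found in an engine configuration (empty list means OK)."""
    root = ET.fromstring(xml_text)
    if root.tag != "engineconfig":
        return [f"root element is <{root.tag}>, expected <engineconfig>"]
    problems: List[str] = []
    groups = root.findall("group")
    if not groups:
        problems.append("no <group> element")
    group_names, channel_names = set(), set()
    for group in groups:
        gname = (group.findtext("name") or "").strip()
        if not gname:
            problems.append("group without <name>")
        elif gname in group_names:
            problems.append(f"duplicate group name {gname!r}")
        group_names.add(gname)
        for channel in group.findall("channel"):
            cname = (channel.findtext("name") or "").strip()
            if not cname:
                problems.append(f"channel without <name> in group {gname!r}")
                continue
            if cname in channel_names:
                problems.append(f"duplicate channel name {cname!r}")
            channel_names.add(cname)
            if channel.find("period") is None:
                problems.append(f"channel {cname!r} has no <period>")
            # Exactly one sampling mode must be present.
            modes = [m for m in ("scan", "monitor") if channel.find(m) is not None]
            if len(modes) != 1:
                problems.append(f"channel {cname!r} must use exactly one of <scan> or <monitor>")
    return problems
```

For a file on disk, pass the raw bytes, e.g. `validate_engine_config(Path("myEngineConfig.xml").read_bytes())`, so that the parser honors the encoding declaration.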
Unlike the RDB Archiver, the Cassandra Archiver does not compress samples for each read request, but stores the compressed samples instead. The advantage is that queries requesting samples for a long period have to read less data and can thus be answered more quickly.
The compression levels are configured independently for each channel. If no compression levels are configured, only raw samples are saved and they are never deleted. Each compression-level tag must have a name attribute. The compression-level name must be unique within the channel configuration. The special name raw is reserved for the raw samples, which are not calculated but represent the samples received from the channel.
Each compression-level except the raw level must specify a compression-period attribute. This interval (an integer number of seconds) specifies the time between two compressed samples. If two consecutive samples have the same (average) value as well as the same minimum and maximum bounds, the second sample is not saved. All compressed samples are aligned to January 1st, 1970, 00:00:00 UTC. This way, the compressed samples from two different channels using the same compression period are aligned with respect to each other. The compression-period attribute is not valid for the special raw compression level.
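Aligning to the Unix epoch means that the compression intervals for a given period always start at integer multiples of that period, counted in seconds since 1970-01-01 00:00:00 UTC. The Python sketch below computes these interval boundaries. It makes no claim about whether the archiver time-stamps a compressed sample with the start or the end of its interval, since this manual does not say. The function names are hypothetical.

```python
from datetime import datetime, timezone
from typing import List


def interval_start(epoch_seconds: int, period: int) -> int:
    """Start of the compression interval containing the given time (aligned to the epoch)."""
    # Python's % is non-negative for a positive period, so this is a floor to a multiple of period.
    return epoch_seconds - epoch_seconds % period


def boundaries(start: int, end: int, period: int) -> List[int]:
    """All aligned interval boundaries in the half-open range [start, end)."""
    first = interval_start(start, period)
    if first < start:
        first += period
    return list(range(first, end, period))


# Example: with a 300 s (5 min) period, 12:34:56 UTC falls into the interval starting at 12:30:00 UTC.
example = int(datetime(2012, 3, 1, 12, 34, 56, tzinfo=timezone.utc).timestamp())
example_start = datetime.fromtimestamp(interval_start(example, 300), tz=timezone.utc)
```

Because the boundaries depend only on the period, two channels with a 300-second level get the same boundaries, which is the alignment described above.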
The retention-period attribute is optional for the compression-level tag. If a positive retention period (in integer seconds) is defined, samples that are older than the newest sample minus the specified period are deleted. The retention-period attribute is also valid for the special raw compression level.
Important: When specifying a retention period, you have to make sure that all compressed samples have been calculated before the samples needed for this calculation are deleted. Compressed samples are usually calculated from the compression level with the next shorter compression period that evenly divides the compression period of the level to be calculated. If no such compression level exists, the raw samples are used. As a rule of thumb, the retention period for any compression level should be at least double the largest compression period for the same channel.
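The Python sketch below puts the three rules of this section into code. It selects the source level for a compressed level, computes the deletion cut-off for a retention period, and applies the rule of thumb for retention periods. It follows the manual's wording ("usually") and is not the archiver's implementation. The function names and the `levels` dictionary (level name mapped to compression period, with raw excluded) are hypothetical.

```python
from typing import Dict, List, Optional

RAW = "raw"


def source_level(levels: Dict[str, int], target: str) -> str:
    """Level a compressed level is calculated from: the longest shorter period dividing it evenly."""
    target_period = levels[target]
    candidates = [(period, name) for name, period in levels.items()
                  if period < target_period and target_period % period == 0]
    return max(candidates)[1] if candidates else RAW


def deletion_cutoff(newest_sample_time: int, retention_period: int) -> Optional[int]:
    """Samples older than the returned time are deleted; None if there is no positive retention."""
    return newest_sample_time - retention_period if retention_period > 0 else None


def retention_warnings(levels: Dict[str, int], retention: Dict[str, int]) -> List[str]:
    """Rule of thumb: each retention period >= 2 * the channel's largest compression period."""
    if not levels:
        return []
    minimum = 2 * max(levels.values())
    return [f"{name}: retention {r} s < {minimum} s"
            for name, r in retention.items() if 0 < r < minimum]


# firstChannel from the example configuration: 30 s and 5 min levels, raw kept for one day.
levels = {"30s": 30, "5m": 300}
```

For firstChannel, the 5m level would be calculated from the 30s level and the 30s level from the raw samples. The raw retention of 86400 seconds is well above the suggested minimum of 600 seconds.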
For loading or updating an engine configuration, you have to use the archive config tool, which is distributed in the archive-config-tool directory of the binary distribution. For importing an engine configuration file, you can call ./ArchiveConfigTool.sh -engine myEngineName -config myEngineConfig.xml -import.
If you want to replace the configuration of an existing engine, you have to add the -replace_engine parameter. Replacing an engine configuration will first delete the existing configuration and then import the new configuration. Thus, it is equivalent to first using the -delete_config parameter and then importing the new configuration.
Deleting an engine configuration will never delete the samples associated with the engine's channels. However, if a channel does not exist in the configuration, there is no way to retrieve its samples using the archive reader. Therefore, instead of completely deleting channels, you should move them to a disabled group if you want to be able to retrieve historic data. If you finally want to delete samples for deleted channels, you have to use the clean-up tool.
If the default connection parameters (Cassandra host is localhost, port is 9160, keyspace name is cssArchive and no authentication is used) are not correct for your setup, you either have to specify the connection parameters as command-line parameters, or you have to specify a plug-in customization file.
Call ./ArchiveConfigTool.sh -help for a list of all supported command-line parameters.
If you want to delete the samples for non-existing channels or want to clean-up small inconsistencies, which can occur if a write operation is interrupted, you can use the clean-up tool.
The clean-up tool is distributed in the archive-cleanup-tool directory of the binary distribution. You can start it by invoking ./ArchiveCleanUpTool.sh. If you are using non-default connection parameters, the same considerations as for the archive config tool apply.
Important: A run of the clean-up tool can take a very long time. During this time, you should not use the archive config tool, because new configurations added by the config tool and the respective samples might be deleted by the clean-up tool. However, the archive engine can run while the clean-up process is running.