====== Cache management utility ======
{{indexmenu_n>4}}

CCU (Cache Control Utility) is designed to load data into the cache, manage its contents and collect statistics. The utility is configured flexibly through the parameters described below.

====== CCU configuration ======

The main configuration file is //__/etc/ccu/ccu.conf__//. The configuration in this file is written in the [[https://ru.wikipedia.org/wiki/YAML|YAML]] format.

===== YAML format =====

The basic elements of the [[https://ru.wikipedia.org/wiki/YAML|YAML]] format are:
  * a "key-value" pair, for example:\\ ''default: offpeak''
  * a sequence of values, for example:\\ ''- "00:00:00 - 01:00:00"''\\ ''- "17:00:00 - 23:59:59"''

Complex data structures can be built from these basic elements. In the [[https://ru.wikipedia.org/wiki/YAML|YAML]] format, nesting is expressed by indentation from the beginning of the line: elements located on the same hierarchy level must have the same number of leading spaces. For example, the following text:

  #
  # Time classes
  time_classes:
    # Defines the classes of time
    default: offpeak  # Specifies the default value for time classes
    # All other items define the names of time classes and the rules that define them
    # Each rule starts with the name of a day category and contains
    # the list of time ranges in the form "HH:MI:SS-HH:MI:SS"
    peak:
      workdays:
        - "00:00:00 - 01:00:00"
        - "17:00:00 - 23:59:59"
      weekend_eve:
        - "00:00:00 - 02:00:00"
        - "16:00:00 - 23:59:59"
      weekend:
        - "00:00:00 - 02:00:00"
        - "09:00:00 - 23:59:59"
      holidays:
        - "00:00:00 - 02:00:00"
        - "09:00:00 - 23:59:59"

can be treated as the ''time_classes'' data structure with the fields ''default'' and ''peak''. The ''peak'' field is in turn a structure with the ''workdays'', ''weekend_eve'', ''weekend'' and ''holidays'' fields, each of which is a named sequence of strings.

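
To make the relationship between indentation and nesting concrete, the same structure can be written as nested Python mappings and lists, which is what a YAML parser would typically produce. This is only an illustrative sketch: the ''describe'' helper is hypothetical and is not part of CCU.

```python
# The time_classes example, written as the nested structure that a YAML
# parser would typically produce (mapping -> dict, sequence -> list).
time_classes = {
    "default": "offpeak",
    "peak": {
        "workdays": ["00:00:00 - 01:00:00", "17:00:00 - 23:59:59"],
        "weekend_eve": ["00:00:00 - 02:00:00", "16:00:00 - 23:59:59"],
        "weekend": ["00:00:00 - 02:00:00", "09:00:00 - 23:59:59"],
        "holidays": ["00:00:00 - 02:00:00", "09:00:00 - 23:59:59"],
    },
}


def describe(node, depth=0):
    """Return one text line per element, indented by two spaces per level.

    Hypothetical helper: it only shows that elements on the same level
    of the hierarchy end up with the same number of leading spaces.
    """
    lines = []
    indent = "  " * depth
    if isinstance(node, dict):
        for key, value in node.items():
            if isinstance(value, (dict, list)):
                lines.append(f"{indent}{key}:")
                lines.extend(describe(value, depth + 1))
            else:
                lines.append(f"{indent}{key}: {value}")
    elif isinstance(node, list):
        for item in node:
            lines.append(f'{indent}- "{item}"')
    return lines


if __name__ == "__main__":
    print("\n".join(describe(time_classes)))
```

All four day-category fields under ''peak'' are printed with the same indent, and every time range is printed one level deeper than the field that owns it.
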
Given the above, the main rule when editing [[https://ru.wikipedia.org/wiki/YAML|YAML]] documents is: **__use spaces, not tabs, for indentation__**.

===== Conventions to specify the parameters =====

  * To refer to a particular field in a complex nested structure, the full field name is used: all field names along the path, separated by slashes, for example: ''time_classes/peak/workdays''
  * To refer to a user-defined element, or to an element that refers to another element, a symbolic name enclosed in angle brackets (the less-than and greater-than signs) is used, for example: ''time_classes/peak/<day_category>''
  * Unless otherwise specified, the full name is relative to the root of the configuration file, for example: ''time_classes/peak/workdays''. Alternatively, a description can state the base of the name explicitly, for example: the time classes (''time_classes'') are specified by parameters of the form ''<time_class>/<day_category>'', which contain a list of time intervals

===== Parameters =====

==== Working directories ====

The following two parameters specify directories: ''pid_files_path'' is where the files with the identifiers of running processes are stored, and ''work_files_path'' is where the working files of the cache management utility are stored. The working files include, in particular, the files that store the cache usage statistics.

  #
  # PID files path, default: /var/run/ccu
  pid_files_path: /var/run/ccu
  #
  # Work files directory, default: /var/lib/ccu
  work_files_path: /var/lib/ccu

==== Events ====

When certain events occur, **ccu** can perform user-specified actions. The actions must be written according to the rules of the command line interpreter.\\ Currently only one event is supported: the creation of a text file containing information about the objects stored in the caches. To set the response to this event, define the ''events/on_after_enumeration_creation'' parameter.
The event configuration template contains a command that creates a binary representation of the cache contents.

  #
  # Events
  events:
    # Command which should be executed after caches' files enumeration
    on_after_enumeration_creation: "cut -d' ' -f2 /var/cache/ccu/data/enumerated.cs | url2dic /var/cache/ccu/data/enumerated.bin"

==== Logging ====

Logging options include the path to the log files and the message levels to be written to the log. The most detailed logging level is //''debug''//: when it is enabled, all possible messages are written to the file. When the //''error''// logging level is specified, only errors are written to the file.\\ The logging levels are specified separately for each **ccu** command.

  #
  # Logging parameters
  logging:
    path: /var/log/ccu
    levels:
      # Enabled log levels for particular command
      # Possible levels are:
      #   error, warning, info, diagnostic, debug
      # default logging level is "info"
      load: "info"
      purge: "info"
      remove: "warning"
      online: "info"
      monitor: "info"

==== Statistics collection ====

The fields of the ''statistics/collectors/<collector_name>'' parameter specify the host address and the port to which the cache usage data will be sent. The collector name can later be used when [[#Statistics|describing the caches]].

  #
  # Statistics parameters
  statistics:
    collectors:          # list of collectors' descriptions
      local:             # name of collector description
        host: localhost  # listening address
        port: 1600       # listening port

==== Day categories and time classes ====

Day categories and time classes are used to restrict data loading into the cache. To define a day category, create the ''day_categories/<day_category>'' parameter, whose value is the list of days that belong to that category. An element of the list can be either the name of a day of the week or a partially or fully defined date.
When the category of a day is determined for a specific date, the more specific definitions are examined first, followed by the more general ones.

  #
  # Day categories
  day_categories:
    # Defines day categories which could be used in time class definitions
    # Items under "day_categories" define names of categories;
    # To define a day category it is necessary to specify a list of values,
    # each value can be one of the following:
    #   - week day
    #   - partially or fully defined date in form DD.MM.YYYY
    #     for example:
    #       "01"         - the first day of each month
    #       "07.01"      - the 7th of January of each year
    #       "09.05.2015" - 2015, the 9th of May
    workdays: [Mon, Tue, Wed, Thu]
    weekend_eve: [Fri]
    weekend: [Sat, Sun]
    holidays:
      - "01.01"
      - "07.01"
      - "23.02"
      - "08.03"
      - "01.05"
      - "09.05"
      - "12.06"
      - "04.11"

Time classes divide the day of a certain category into time intervals. To define a time class, create the ''time_classes/<time_class>'' parameter and, inside it, one parameter per day category; the value of each is a list of time intervals. The ''time_classes/default'' parameter is special: it names the default time class, which is used when the time class cannot be determined from the defined rules.\\ \\ In the ''time_classes'' configuration example below, the ''peak'' time class is defined, together with the time intervals that count as peak for each day category. If no rule of the ''peak'' time class matches, the default time class, ''offpeak'', is used.

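
The resolution order described above can be sketched in Python. This is not the CCU implementation, only an illustration under stated assumptions: the matching order is taken to be full date, then ''DD.MM'', then ''DD'', then day of the week; interval bounds are treated as inclusive; and the function names ''day_category'' and ''time_class'' are hypothetical. The values are copied from the configuration examples in this section.

```python
from datetime import datetime, time

WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

# Values copied from the day_categories and time_classes examples.
DAY_CATEGORIES = {
    "workdays": ["Mon", "Tue", "Wed", "Thu"],
    "weekend_eve": ["Fri"],
    "weekend": ["Sat", "Sun"],
    "holidays": ["01.01", "07.01", "23.02", "08.03",
                 "01.05", "09.05", "12.06", "04.11"],
}
TIME_CLASSES = {
    "default": "offpeak",
    "peak": {
        "workdays": ["00:00:00 - 01:00:00", "17:00:00 - 23:59:59"],
        "weekend_eve": ["00:00:00 - 02:00:00", "16:00:00 - 23:59:59"],
        "weekend": ["00:00:00 - 02:00:00", "09:00:00 - 23:59:59"],
        "holidays": ["00:00:00 - 02:00:00", "09:00:00 - 23:59:59"],
    },
}


def day_category(moment, categories=DAY_CATEGORIES):
    """Return the day category name, trying specific definitions first."""
    # Assumed order: full date, day and month, day of month, week day.
    candidates = [
        moment.strftime("%d.%m.%Y"),
        moment.strftime("%d.%m"),
        moment.strftime("%d"),
        WEEKDAYS[moment.weekday()],
    ]
    for candidate in candidates:
        for name, days in categories.items():
            if candidate in days:
                return name
    return None


def parse_range(text):
    """Split "HH:MI:SS - HH:MI:SS" into a pair of datetime.time values."""
    start, end = (part.strip() for part in text.split("-"))
    return time.fromisoformat(start), time.fromisoformat(end)


def time_class(moment, categories=DAY_CATEGORIES, classes=TIME_CLASSES):
    """Return the time class for a moment, or the default if no rule matches."""
    category = day_category(moment, categories)
    now = moment.time().replace(microsecond=0)
    for name, rules in classes.items():
        if name == "default":
            continue
        for text in rules.get(category, []):
            start, end = parse_range(text)
            if start <= now <= end:  # assumption: both bounds inclusive
                return name
    return classes["default"]


if __name__ == "__main__":
    print(time_class(datetime(2024, 1, 3, 12, 0)))  # a Wednesday at noon
```

With these values, 1 January is resolved as ''holidays'' even when it falls on a working day, because the ''DD.MM'' definition is more specific than the day of the week.
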
  #
  # Time classes
  time_classes:
    # Defines the classes of time
    default: offpeak  # Specifies the default value for time classes
    # All other items define the names of time classes and the rules that define them
    # Each rule starts with the name of a day category and contains
    # the list of time ranges in the form "HH:MI:SS-HH:MI:SS"
    peak:
      workdays:
        - "00:00:00 - 01:00:00"
        - "17:00:00 - 23:59:59"
      weekend_eve:
        - "00:00:00 - 02:00:00"
        - "16:00:00 - 23:59:59"
      weekend:
        - "00:00:00 - 02:00:00"
        - "09:00:00 - 23:59:59"
      holidays:
        - "00:00:00 - 02:00:00"
        - "09:00:00 - 23:59:59"

==== Jobs ====

Job parameters are specified as nested structures of the ''jobs'' parameter. The following jobs exist:
  * ''jobs/monitor'' - monitoring of the network interfaces and of data cache usage;
  * ''jobs/load'' - loading/validating data in the cache;
  * ''jobs/scan'' - auxiliary job for analyzing the data of the [[#Storage|storages]].

In the descriptions below, full parameter names are given relative to the ''jobs'' parameter.

=== Monitoring ===

For the monitoring job, specify the incoming connection parameters (''monitor/listener/host'' and ''monitor/listener/port''), which are used to receive messages from other processes. If all processes run on one computer, these parameters must match those defined in the [[#Statistics gathering|statistics collectors description]].\\ \\ In other words, the ''monitor/listener'' parameter specifies the server side of the IP connection, and the parameters in the [[#Statistics gathering|statistics collectors description]] specify the client side.\\ \\ In addition to gathering statistics from the processes that work with the cache, the monitoring job gathers system information about network interfaces. To limit the set of network interfaces for which statistics are gathered, list them in the ''monitor/network_interfaces'' parameter.

  monitor:
    listener:
      # listener collects information from different processes about
      # incoming/outgoing traffic
      host: localhost
      port: 1600
    network_interfaces:
      # list of network interfaces which is used for getting
      # incoming/outgoing traffic statistics
      # if list is empty, all interfaces except "lo" will be used
      -

=== Load ===

Load parameters are divided into:
  * generic ones, specified as fields of the ''load'' parameter;
  * those used in offline loading (''load/offline'');
  * those used in online loading (''load/online'').

Generic load parameters:

  #
  # Load jobs parameters
  load:
    ip_binding:
      # optional, list of local IP addresses which could be used
      # for binding while loading
      -
    ignored_clients:
      # list of CIDRs for source IP addresses which should be ignored;
      # the list can be defined inside the configuration file as a list of CIDRs
      # under the key "cidr_list" or in external files under the key "cidr_files";
      # if the list is defined externally in a file, each line of that file must contain
      # only one CIDR
      cidr_list:
        -
      cidr_files:
        -
    rate_limits:
      # defines rate limit for data loading depending on time class;
      # value can be one of the following:
      #   - "unlimited"
      #   - number of bytes which defines amount of loaded data per second
      offpeak: unlimited
      peak: 1m
    # ... skipping ...

The ''load/ip_binding'' parameter specifies the list of IP addresses to be used when loading data; the listed addresses are used evenly.\\ \\ The ''load/ignored_clients'' parameter specifies a list of CIDRs to be ignored: requests sent from these CIDRs are not taken into account when loading data. CIDRs can be specified both in the configuration file (the ''load/ignored_clients/cidr_list'' list) and in external files (the ''load/ignored_clients/cidr_files'' list).\\ \\ The ''load/rate_limits'' parameter defines the data loading speed limits based on the [[#Day categories and time classes|time classes]].
To limit the data loading speed at a certain time, you should create the ''load / rate_limits ///