
Operational Concerns

Richard Warburton edited this page Feb 7, 2019 · 5 revisions

Monitoring and Statistical Counters

To monitor the progress of Artio in production, it helps to be aware of a number of counters that the Artio system exports. These are exposed via Aeron's well-documented counter mechanism. Most of these counters measure failure events. Each class of counter has an associated type ID, documented below:

Engine-wide Counters

Failed Inbound - 10000 - Failed to offer a message on the inbound Aeron stream, i.e. going from the FixEngine to a FixLibrary. A rapid increase in this counter indicates that FixLibrary instances are backpressuring the FixEngine.

Failed Outbound - 10001 - Failed to offer a message on the outbound Aeron stream, i.e. going from a FixLibrary to the FixEngine. A rapid increase in this counter indicates that the FixEngine is backpressuring FixLibrary instances.

Failed Replay - 10002 - Failed to offer a message on the replay Aeron stream, i.e. going from the Replayer inside the FixEngine to the Framer. A rapid increase in this counter indicates that the Framer is backpressuring the Replayer, or possibly that a counter-party is requesting a large number of replay operations.
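Since the signal for these three counters is a rapid increase rather than any absolute value, a monitor would normally compare successive samples. The following is a minimal sketch of that idea in plain Java. The class name `CounterRateMonitor`, its methods, and the threshold are hypothetical and not part of Artio or Aeron; how the counter values are read (for example through Aeron's counters API) is left out.

```java
/**
 * Hypothetical helper, not part of Artio: flags a counter whose value
 * grows faster than a configured number of increments per second.
 */
public final class CounterRateMonitor {
    private final double maxIncreasePerSecond;
    private long lastValue;
    private long lastTimestampNs;
    private boolean hasSample;

    public CounterRateMonitor(final double maxIncreasePerSecond) {
        this.maxIncreasePerSecond = maxIncreasePerSecond;
    }

    /**
     * Records a new sample of the counter and returns true when the increase
     * since the previous sample exceeds the configured rate.
     * The first sample only establishes a baseline and returns false.
     */
    public boolean onSample(final long value, final long timestampNs) {
        if (!hasSample) {
            lastValue = value;
            lastTimestampNs = timestampNs;
            hasSample = true;
            return false;
        }

        final long elapsedNs = timestampNs - lastTimestampNs;
        final long delta = value - lastValue;
        lastValue = value;
        lastTimestampNs = timestampNs;

        if (elapsedNs <= 0) {
            // No time has passed (or the clock went backwards): no rate to compute.
            return false;
        }

        final double increasePerSecond = delta * 1_000_000_000.0 / elapsedNs;
        return increasePerSecond > maxIncreasePerSecond;
    }
}
```

One monitor instance would be kept per counter (for example one for type ID 10000 and one for 10001) and fed from whatever polling loop already reads the counters. A suitable threshold depends on the deployment and has to be tuned.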

Per FIX Connection Counters

Messages Read - 10003 - the number of fully framed FIX messages read from the TCP connection. This excludes messages with an invalid checksum value.

Bytes In Buffer - 10004 - the number of bytes that have been queued up ready to be sent, but not sent yet. This number should be 0 for any FIX session that isn't a Slow Consumer.

Invalid Library Attempts - 10005 - the number of attempted message sends that have been ignored because the wrong library sent them. This indicates that a library believes it owns a Session when it actually doesn't.

Sent Message Sequence Number - 10006 - the last sent msgSeqNum for a given FIX session.

Received Message Sequence Number - 10007 - the last received msgSeqNum for a given FIX session.
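For monitoring tools that see only the numeric type ID, the list above can be captured as a small lookup table. This is a sketch in plain Java: the IDs and labels are taken from this page, while the class name `ArtioCounterTypes` and its methods are hypothetical and not part of Artio.

```java
/**
 * Hypothetical lookup table for the counter type IDs documented on this page.
 * Not part of Artio; the labels are copied from the documentation above.
 */
public final class ArtioCounterTypes {
    private ArtioCounterTypes() {
    }

    /** Returns the documented label for a type ID, or "Unknown" if it is not listed. */
    public static String describe(final int typeId) {
        switch (typeId) {
            case 10000: return "Failed Inbound";
            case 10001: return "Failed Outbound";
            case 10002: return "Failed Replay";
            case 10003: return "Messages Read";
            case 10004: return "Bytes In Buffer";
            case 10005: return "Invalid Library Attempts";
            case 10006: return "Sent Message Sequence Number";
            case 10007: return "Received Message Sequence Number";
            default: return "Unknown";
        }
    }

    /** True for the counters that exist once per FIX connection (10003 to 10007). */
    public static boolean isPerConnection(final int typeId) {
        return typeId >= 10003 && typeId <= 10007;
    }

    /** True for the engine-wide counters (10000 to 10002). */
    public static boolean isEngineWide(final int typeId) {
        return typeId >= 10000 && typeId <= 10002;
    }
}
```

A dashboard could use `isPerConnection` to decide whether to group a counter by FIX session or to display it once for the whole engine.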

Directory structure

Artio stores persistent state using both the Aeron Archiver and its own files.

The Aeron Archiver is used for a persistent log of messages that are exchanged between FixEngine and FixLibrary. Please refer to the Aeron Archiver documentation for how to read these files.

All of Artio's own persistent state is held by the FixEngine and stored in files in the directory configured by EngineConfiguration.logFileDir().

Session Information

FIX Sessions are assigned surrogate session ids based on their unique company ids. The mapping between these surrogate session ids and company ids is stored in session_id_buffer. This state must persist between restarts if you want sequence numbers to persist over restarts.

Sequence Numbers

Artio's Sequence Number Index persists the mapping between Session Ids and the last associated sequence number for that session. These are stored in two files called: sequence_numbers_received and sequence_numbers_sent.

Replay Positions

Artio keeps track of a mapping between the Aeron stream position and the FIX message sequence numbers so that it can replay messages correctly. Files of the form replay-index-- record that mapping. Here the stream-id is the Aeron Stream Id and the fixSessionId is the surrogate key assigned to each FIX session. The last position up to which replays were indexed is recorded in the replay-positions- files. These files only need to be persisted over restarts if persistent sequence numbers are used or if a catchup replay from a previous sequence index is to be requested.
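Before restarting an engine that relies on persistent sequence numbers, it can be useful to check that the files named above are still present in the log file directory. The sketch below does this in plain Java. The class `ArtioStateFiles` and its methods are hypothetical and not part of Artio; the file names and the `replay-index-` and `replay-positions-` prefixes are taken from this page, and the directory passed in is assumed to be the one configured by EngineConfiguration.logFileDir().

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/**
 * Hypothetical pre-restart check, not part of Artio: reports which of the
 * state files named on this page are absent from a directory listing.
 */
public final class ArtioStateFiles {
    /** Fixed file names mentioned in the Session Information and Sequence Numbers sections. */
    static final List<String> NAMED_STATE_FILES = Arrays.asList(
        "session_id_buffer",
        "sequence_numbers_received",
        "sequence_numbers_sent");

    private ArtioStateFiles() {
    }

    /** Returns the named state files that do not appear in the given file names. */
    public static List<String> missing(final Collection<String> presentFileNames) {
        final List<String> missing = new ArrayList<>();
        for (final String name : NAMED_STATE_FILES) {
            if (!presentFileNames.contains(name)) {
                missing.add(name);
            }
        }
        return missing;
    }

    /** Counts files whose names start with one of the replay prefixes from this page. */
    public static long countReplayFiles(final Collection<String> presentFileNames) {
        return presentFileNames.stream()
            .filter(name -> name.startsWith("replay-index-") || name.startsWith("replay-positions-"))
            .count();
    }

    /** Lists the names of the entries directly inside the log file directory. */
    public static List<String> listFileNames(final Path logFileDir) throws IOException {
        try (Stream<Path> entries = Files.list(logFileDir)) {
            return entries
                .map(path -> path.getFileName().toString())
                .collect(Collectors.toList());
        }
    }
}
```

A deployment script might call `ArtioStateFiles.missing(ArtioStateFiles.listFileNames(dir))` and refuse to start the engine if the result is non-empty. Whether a missing file is actually a problem depends on whether persistent sequence numbers are in use, as described above.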
