Operational Concerns
To monitor Artio in production, it helps to be aware of the counters that the Artio system exports. These are exposed via Aeron's well-documented counter mechanism. Most of these counters measure failure events. Each class of counter has an associated type ID, documented below:
Failed Inbound - 10000
- Failed to successfully offer a message on the inbound Aeron stream, i.e. going from the FixEngine to a FixLibrary. Rapid increases in this counter indicate FixLibrary instances backpressuring the FixEngine.
Failed Outbound - 10001
- Failed to successfully offer a message on the outbound Aeron stream, i.e. going from a FixLibrary to the FixEngine. Rapid increases in this counter indicate the FixEngine backpressuring FixLibrary instances.
Failed Replay - 10002
- Failed to successfully offer a message on the replay Aeron stream, i.e. going from the Replayer inside the FixEngine to the Framer. Rapid increases in this counter indicate Framer instances backpressuring the Replayer, or possibly a counterparty requesting a large number of replay operations.
Messages Read - 10003
- the number of fully framed FIX messages read from the TCP connection. This excludes messages with an invalid checksum value.
Bytes In Buffer - 10004
- the number of bytes that have been queued up ready to be sent, but not sent yet. This number should be 0 for any FIX session that isn't a Slow Consumer.
Invalid Library Attempts - 10005
- the number of messages that a library attempted to send but that were ignored because the wrong library sent them. This indicates that a library believes it owns a Session when it actually doesn't.
Sent message sequence number - 10006
- this is the last sent msgSeqNum for a given FIX session.
Received message sequence number - 10007
- this is the last received msgSeqNum for a given FIX session.
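As a minimal sketch of how the type IDs above might be used in a monitoring tool, the following maps each ID to a name and flags a counter whose value rises faster than a chosen rate. The enum, its constant names, the `offerFailure` flag, and `RapidIncreaseDetector` are all hypothetical helpers, not part of Artio or Aeron, and the rate threshold is an assumption to tune per deployment. In a live system the sampled values would come from Aeron's counters API rather than being passed in by hand.

```java
/** Type IDs as listed above. The constant names are illustrative, not Artio's own. */
enum ArtioCounterType {
    FAILED_INBOUND(10000, true),
    FAILED_OUTBOUND(10001, true),
    FAILED_REPLAY(10002, true),
    MESSAGES_READ(10003, false),
    BYTES_IN_BUFFER(10004, false),
    INVALID_LIBRARY_ATTEMPTS(10005, false),
    SENT_MSG_SEQ_NO(10006, false),
    RECEIVED_MSG_SEQ_NO(10007, false);

    final int typeId;
    /** True for the three "failed to offer" counters, where rapid growth suggests backpressure. */
    final boolean offerFailure;

    ArtioCounterType(int typeId, boolean offerFailure) {
        this.typeId = typeId;
        this.offerFailure = offerFailure;
    }

    /** Returns the matching counter type, or null if the type ID is not one listed above. */
    static ArtioCounterType fromTypeId(int typeId) {
        for (ArtioCounterType type : values()) {
            if (type.typeId == typeId) {
                return type;
            }
        }
        return null;
    }
}

/** Hypothetical helper: reports when a counter grows faster than maxPerSecond between samples. */
final class RapidIncreaseDetector {
    private final double maxPerSecond;
    private long lastValue;
    private long lastNanos;
    private boolean hasSample;

    RapidIncreaseDetector(double maxPerSecond) {
        this.maxPerSecond = maxPerSecond;
    }

    /** Records a sample and returns true if the rate since the previous sample exceeds the limit. */
    boolean onSample(long value, long nowNanos) {
        boolean rapid = false;
        if (hasSample && nowNanos > lastNanos) {
            double perSecond = (value - lastValue) * 1e9 / (nowNanos - lastNanos);
            rapid = perSecond > maxPerSecond;
        }
        lastValue = value;
        lastNanos = nowNanos;
        hasSample = true;
        return rapid;
    }
}
```

One detector instance per counter keeps the samples independent. A monitoring loop would typically only attach detectors to counters whose type has `offerFailure` set.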
Artio stores persistent state using both the Aeron Archiver and its own files.
The Aeron Archiver is used for a persistent log of messages that are exchanged between FixEngine and FixLibrary. Please refer to the Aeron Archiver documentation for how to read these files.
All of Artio's own persistent state is held by the FixEngine and stored in files in a directory configured by EngineConfiguration.logFileDir().
FIX Sessions are assigned surrogate session ids given their unique company ids. The mapping between these surrogate session ids and company ids is stored in session_id_buffer. This state must persist between restarts if you want to use sequence numbers that persist over restarts.
Artio's Sequence Number Index persists the mapping between Session Ids and the last associated sequence number for that session. These are stored in two files called sequence_numbers_received and sequence_numbers_sent.
Artio keeps track of a mapping between the Aeron stream position and the FIX message sequence numbers so that it can replay messages correctly. Files of the form replay-index-- record that mapping. Here the stream-id is the Aeron Stream Id and the fixSessionId is the surrogate key assigned to each FIX session. The last position up to which replays were indexed is recorded in the replay-positions- files. These files only need to be persisted over restarts if persistent sequence numbers are used or if a catchup replay from a previous sequence index is to be requested.
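The file names described above can be turned into a simple pre-restart check. The sketch below classifies a file name from the log file directory by the state it holds and reports which of the fixed-name files are absent. `ArtioStateFile` and `PersistentStateCheck` are hypothetical names introduced for illustration, and the check only matches the name prefixes given in the text; it makes no assumption about what follows `replay-index-` or `replay-positions-`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

/** Hypothetical classification of files found in the directory set by logFileDir(). */
enum ArtioStateFile {
    SESSION_IDS, SEQUENCE_NUMBER_INDEX, REPLAY_INDEX, REPLAY_POSITIONS, UNKNOWN;

    static ArtioStateFile classify(String fileName) {
        if (fileName.equals("session_id_buffer")) {
            return SESSION_IDS;
        }
        if (fileName.equals("sequence_numbers_received") || fileName.equals("sequence_numbers_sent")) {
            return SEQUENCE_NUMBER_INDEX;
        }
        // Only the prefixes are taken from the documentation; suffixes are not interpreted.
        if (fileName.startsWith("replay-index-")) {
            return REPLAY_INDEX;
        }
        if (fileName.startsWith("replay-positions-")) {
            return REPLAY_POSITIONS;
        }
        return UNKNOWN;
    }
}

/** Hypothetical check for the fixed-name files needed when sequence numbers persist over restarts. */
final class PersistentStateCheck {
    static final List<String> REQUIRED = Arrays.asList(
        "session_id_buffer", "sequence_numbers_received", "sequence_numbers_sent");

    /** Returns the required file names that are not in the given directory listing. */
    static List<String> missing(Collection<String> presentFileNames) {
        List<String> result = new ArrayList<>();
        for (String name : REQUIRED) {
            if (!presentFileNames.contains(name)) {
                result.add(name);
            }
        }
        return result;
    }
}
```

The directory listing could be gathered with `java.nio.file.Files.list` on the configured log file directory before the engine is restarted. An empty result from `missing` only says the files exist, not that their contents are valid.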