NeTEx is a European standard for exchanging transit data. OTP can import NeTEx into its internal model. The XML parser supports the entire NeTEx specification and is not limited to a specific profile, but not every part of it is mapped into OTP; only a small subset of the entities is supported. When loading NeTEx data, OTP should print warnings for all NeTEx data types that are not loaded.
OTP is tested with data from Entur, which uses the Nordic NeTEx profile, and data from HVV, which uses the EPIP NeTEx profile. If you find that some part of your data is not imported/supported by OTP, you will need to add support for it in this model. NeTEx is huge, and ONLY data relevant for travel planning should be imported.
OTP assumes the data is valid, and as a main rule the data is not washed or improved inside OTP. Poor data quality should be fixed BEFORE loading the data into OTP. OTP will try to ignore invalid data, allowing the rest to be imported.
- Import Transit data from NeTEx xml-files
- Handle large input file sets (10 GB)
- Allow some data to be shared, and group other data together in an isolated scope
- Support reading data fast and multi-threaded (the design supports this, but it is not implemented yet)
- Warn or report issues on poor data, but keep building a graph so one "bad" line does not block the entire import.
- The import should not put any restrictions on the order of XML types in the files. If ServiceJourney comes before Authority in the XML file, that should be ok. The file hierarchy is an optional way to group and scope data.
The 2 main classes are the NetexModule and the NetexBundle. The NetexModule is a GraphBuilderModule and is responsible for building all bundles, while a bundle is responsible for importing a NeTEx bundle, normally a zip-file with a NeTEx data set. You may start OTP with as many bundles as you like, and you may mix GTFS and NeTEx bundles in the same build.
The NeTEx files are XML files, and one data set can be more than 5 GB in size. There is no fixed
relationship between file names and content as there is in GTFS, where, for example, stops.txt
contains all stops. Instead, OTP imports NeTEx data based on a file hierarchy.
As seen above, the NeTEx file bundle is organized in a hierarchy. This is done to support loading large data sets and to avoid keeping XML DOM entities in memory. The hierarchy also prevents entities in different files at the same level from referencing each other. It allows OTP to go through the steps of parsing XML data into NeTEx POJOs, validating the relationships, and mapping these POJOs into OTP's internal data model for each set/group of files.
The general rule is that an entity referencing other entities should be in the same file as the referenced entities or placed at a lower level in the hierarchy, so the referenced objects already exist when the entity is mapped. There are exceptions to this, for example trip-to-trip interchanges.
The shared data is available during the entire mapping process. The group data is kept in memory for the duration of parsing and mapping each group. Data in one group is not visible to another group.
Within each group there are also shared-group-files and group-files (leaf-files).
- Entities in group-files can reference other entities in the same file, entities in the shared-group-files of the same group, and entities in the global shared-files, but not entities in other group-files.
- Entities in shared-group-files can reference other entities in the same file and entities in the same group of shared-group-files and in the global shared-files, but not entities in any group-files.
- Entities in global shared-files can reference other entities in the same file and entities in other global shared-files.
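The visibility rules above can be expressed as a small predicate. This is a minimal sketch, assuming a hypothetical `FileRef` record that holds a file's level, its group (null for global shared files), and its name; `Level`, `FileRef`, and `ReferenceRules` are illustrative names, not OTP types.

```java
import java.util.Objects;

// The three file levels described above (hypothetical names, not OTP types).
enum Level { SHARED, GROUP_SHARED, GROUP }

// Where an entity is defined: level, group (null for global shared files) and file name.
record FileRef(Level level, String group, String fileName) {}

final class ReferenceRules {
  private ReferenceRules() {}

  // Returns true if an entity defined in `from` may reference an entity defined in `to`.
  static boolean mayReference(FileRef from, FileRef to) {
    if (from.equals(to)) {
      return true; // references within the same file are always allowed
    }
    return switch (from.level()) {
      // Global shared files may only see other global shared files.
      case SHARED -> to.level() == Level.SHARED;
      // Shared-group files and leaf group files see the global shared files and the
      // shared-group files of their own group, but never another leaf group file.
      case GROUP_SHARED, GROUP -> to.level() == Level.SHARED
          || (to.level() == Level.GROUP_SHARED && Objects.equals(from.group(), to.group()));
    };
  }
}
```

For example, a leaf file in group A may reference its own group's shared-group file, but not a leaf file next to it or the shared-group file of group B.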
✅ Note! You can configure how your data files are grouped into the 3 levels above using regular expressions in the build-config.json.
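As an illustration of regex-based grouping, the sketch below classifies file names into the three levels. The class name, the patterns used in the example, and the convention that capture group 1 is the group key are all assumptions made for this sketch; they are not OTP's configuration parameters or defaults. The shared-group pattern is tried before the group pattern, because a broad group pattern would also match shared-group files.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical classifier: the patterns passed in are illustrative, not OTP's defaults.
final class FileGrouping {
  enum Level { SHARED, GROUP_SHARED, GROUP, IGNORED }

  // The level a file belongs to and, for group levels, the group key (capture group 1).
  record Classified(Level level, String group) {}

  private final Pattern sharedFile;
  private final Pattern sharedGroupFile;
  private final Pattern groupFile;

  FileGrouping(String sharedFile, String sharedGroupFile, String groupFile) {
    this.sharedFile = Pattern.compile(sharedFile);
    this.sharedGroupFile = Pattern.compile(sharedGroupFile);
    this.groupFile = Pattern.compile(groupFile);
  }

  Classified classify(String fileName) {
    if (sharedFile.matcher(fileName).matches()) {
      return new Classified(Level.SHARED, null);
    }
    // Try the more specific shared-group pattern first; the group pattern would match it too.
    Matcher m = sharedGroupFile.matcher(fileName);
    if (m.matches()) {
      return new Classified(Level.GROUP_SHARED, m.group(1));
    }
    m = groupFile.matcher(fileName);
    if (m.matches()) {
      return new Classified(Level.GROUP, m.group(1));
    }
    return new Classified(Level.IGNORED, null); // files matching no pattern are skipped
  }
}
```

With the example patterns `shared-data\.xml`, `(\w+)-shared\.xml` and `(\w+)-.*\.xml`, the file `A-shared.xml` is a shared-group file in group `A`, and `A-line-1.xml` is a leaf group file in the same group.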
For each level in the hierarchy and each group of files, OTP performs the same steps:
- Load XML entities (NeTEx XML DOM POJOs). See NetexDataSourceHierarchy.
- Parse the XML files and insert the XML POJOs into the index. See NetexXmlParser.
- Validate relationships. See Validator.
- Map XML entities to the OTP internal model. See NetexMapper.
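The parse-and-index step can be sketched with the JDK's built-in DOM parser: every element with an `id` attribute goes into an index keyed by element name and id, so references are resolved only after the whole file has been read, whatever order the elements appear in. This is a simplified sketch; the flat XML (no frames, no namespaces) and the `XmlIndexSketch` class are assumptions, DOM is used only for brevity, and it is not meant to reflect how NetexXmlParser is implemented.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

final class XmlIndexSketch {
  private XmlIndexSketch() {}

  // Parses a whole XML document and indexes every element that has an id attribute,
  // keyed first by element name and then by id. Element order in the file does not matter.
  static Map<String, Map<String, Element>> parseAndIndex(String xml) {
    Document doc;
    try {
      doc = DocumentBuilderFactory.newInstance()
          .newDocumentBuilder()
          .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    } catch (Exception e) {
      throw new IllegalArgumentException("Could not parse XML", e);
    }
    Map<String, Map<String, Element>> index = new HashMap<>();
    NodeList all = doc.getElementsByTagName("*"); // every element, in document order
    for (int i = 0; i < all.getLength(); i++) {
      Element e = (Element) all.item(i);
      if (e.hasAttribute("id")) {
        index.computeIfAbsent(e.getTagName(), k -> new HashMap<>()).put(e.getAttribute("id"), e);
      }
    }
    return index;
  }

  // Resolves the ref attribute of the first descendant named refTag; returns null if unresolved.
  static Element resolveRef(
      Map<String, Map<String, Element>> index, Element from, String refTag, String targetTag) {
    NodeList refs = from.getElementsByTagName(refTag);
    if (refs.getLength() == 0) {
      return null;
    }
    String ref = ((Element) refs.item(0)).getAttribute("ref");
    return index.getOrDefault(targetTag, Map.of()).get(ref);
  }
}
```

Because the whole file is indexed before any reference is followed, a ServiceJourney with an `OperatorRef` resolves even when the Operator element comes later in the file.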
OTP loads entities into a hierarchical NetexEntityDataIndex
before validating and mapping each entity. Entities may appear in any order in the XML files, so
doing the validation in a separate step ensures all entities are available when the validation runs.
If an entity or a required relation is missing, the validator should remove the invalid entity. This
makes the mapping easier, because the mapper can assume all required data and entities exist.
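A minimal validation sketch, assuming a hypothetical `Entity` record that lists the ids of the entities it requires. It removes every entity with a dangling required reference and repeats until nothing changes, since removing one entity can invalidate another that points to it; the repeat-until-stable loop is a design choice for this example, not something the text above prescribes.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified entity: its id and the ids of the entities it requires.
record Entity(String id, List<String> requiredRefs) {}

final class ValidatorSketch {
  private ValidatorSketch() {}

  // Removes every entity with a missing required reference and returns one warning per removal.
  // Repeats until stable, because removing one entity can leave another with a dangling reference.
  static List<String> removeInvalid(Map<String, Entity> index) {
    List<String> warnings = new ArrayList<>();
    boolean changed = true;
    while (changed) {
      changed = false;
      Iterator<Entity> it = index.values().iterator();
      while (it.hasNext()) {
        Entity entity = it.next();
        for (String ref : entity.requiredRefs()) {
          if (!index.containsKey(ref)) {
            warnings.add("Removed " + entity.id() + ": missing " + ref);
            it.remove(); // safe removal while iterating
            changed = true;
            break;
          }
        }
      }
    }
    return warnings;
  }
}
```

If a journey requires a journey pattern whose route is missing, both the pattern and the journey are removed, and the mapper never sees either of them.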
Here is an outline of the process including the file-hierarchy traversal and the steps at each level:
- Load shared-data-files into index
- Validate loaded entities
- Map shared-data-entries
- For each group:
  - Load group-shared-files into index
  - Validate loaded entities
  - Map group-shared-entries
  - For each leaf group-file:
    - Load group-file into index
    - Validate loaded entities
    - Map group-entries
    - Clear leaf data from index
  - Remove group data from index
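The traversal outline can be sketched as two nested loops that log each step instead of doing real work. The `Bundle` and `Group` records and the log format are hypothetical; the `push` and `pop` entries mark where a real importer would enter and leave a scope.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical description of one bundle: global shared files plus groups of files.
record Group(String name, List<String> sharedGroupFiles, List<String> groupFiles) {}

record Bundle(List<String> sharedFiles, List<Group> groups) {}

final class TraversalSketch {
  private TraversalSketch() {}

  // Returns the steps in the order they would run; a real importer would do the work instead.
  static List<String> run(Bundle bundle) {
    List<String> log = new ArrayList<>();
    loadValidateMap(log, bundle.sharedFiles());
    for (Group group : bundle.groups()) {
      log.add("push"); // enter the group scope
      loadValidateMap(log, group.sharedGroupFiles());
      for (String leaf : group.groupFiles()) {
        log.add("push"); // enter the leaf scope
        loadValidateMap(log, List.of(leaf));
        log.add("pop"); // clear leaf data
      }
      log.add("pop"); // remove group data
    }
    return log;
  }

  private static void loadValidateMap(List<String> log, List<String> files) {
    log.add("load " + files);
    log.add("validate " + files);
    log.add("map " + files);
  }
}
```

Every leaf file is processed in its own scope, so its data is discarded before the next leaf file is loaded, while the shared-group data stays until the whole group is done.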
The NetexBundle repeats the exact same steps for each group/set of files. To emulate navigation in
the hierarchy, both the NetexEntityDataIndex and the NetexMapper persist data in a stack-like
structure. The NetexBundle calls push() and pop() on the index and the mapper to enter and exit
each file set at a given level. Entities loaded at a given level are in the local scope, while
entities loaded at a higher level are in the global scope. The index has methods to access both
local and global scoped entities, but it is only possible to add entities at the local scope.
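A stack-like index with local and global scope can be sketched with an `ArrayDeque` of maps. The `ScopedIndex` class and its method names are a hypothetical illustration, not the NetexEntityDataIndex API: entities are only added to the top (local) map, `lookupLocal` searches only that map, and `lookup` falls back to the enclosing (global) scopes.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical stack-like index; not the NetexEntityDataIndex API.
final class ScopedIndex<V> {
  // Head of the deque is the current (local) scope; the rest are enclosing (global) scopes.
  private final Deque<Map<String, V>> scopes = new ArrayDeque<>();

  ScopedIndex() {
    push(); // outermost scope, used for the global shared files
  }

  void push() {
    scopes.push(new HashMap<>());
  }

  void pop() {
    if (scopes.size() == 1) {
      throw new IllegalStateException("Cannot pop the outermost scope");
    }
    scopes.pop(); // drops everything added since the matching push()
  }

  // New entities can only be added to the local scope.
  void addLocal(String id, V entity) {
    scopes.peek().put(id, entity);
  }

  // Searches the local scope only.
  Optional<V> lookupLocal(String id) {
    return Optional.ofNullable(scopes.peek().get(id));
  }

  // Searches the local scope first, then each enclosing scope outwards.
  Optional<V> lookup(String id) {
    for (Map<String, V> scope : scopes) { // iterates from head (newest) to tail (oldest)
      V entity = scope.get(id);
      if (entity != null) {
        return Optional.of(entity);
      }
    }
    return Optional.empty();
  }
}
```

Popping a scope discards everything added in it in one step, which matches clearing leaf data and removing group data in the outline above without tracking individual entities.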