Normalization
Normalization is the first step of the structured data import. At the end of this step, the data from the new structured data source has been transformed to the subject scheme and saved to a temporary table.
First, the data has to be imported into the database. To do this, a new Import-Job has to be added to the dataimport package. The Import-Job has to extract the data from a resource file, transform it into a data-source-dependent entity type, and finally keep only those entities that represent German businesses.
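The three responsibilities of an Import-Job (extract, transform, filter) can be sketched as follows. This is only an illustration in Python, not the project's actual code: the resource format (one JSON object per line), the `SourceEntity` type, its fields, and the rule for recognising a German business are all assumptions made for this example.

```python
import json
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Optional


@dataclass
class SourceEntity:
    """Hypothetical entity type for one specific data source."""
    source_id: str
    label: str
    country: Optional[str]
    is_business: bool


def extract(lines: Iterable[str]) -> Iterator[dict]:
    """Parse one JSON object per non-empty line of the resource file."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def transform(record: dict) -> SourceEntity:
    """Map a raw record to the data-source-dependent entity type."""
    return SourceEntity(
        source_id=str(record["id"]),
        label=record.get("name", ""),
        country=record.get("country"),
        is_business=record.get("type") == "business",
    )


def is_german_business(entity: SourceEntity) -> bool:
    """Assumed filter rule: a business whose country code is DE."""
    return entity.is_business and entity.country == "DE"


def run_import(lines: Iterable[str]) -> List[SourceEntity]:
    """Extract, transform, then keep only German businesses."""
    return [e for e in map(transform, extract(lines)) if is_german_business(e)]
```

In a real job, the returned entities would then be written to the database table for that data source.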
After the data of the data source has been saved to the database, it has to be transformed to the subject scheme. To do so, another DataLakeImport-Job has to be implemented. The implementation should inherit from the DataLakeImportImplementation class and override the necessary methods. This transformation includes normalizing the data source's original attributes to the database's uniform attributes. The normalized attributes can be found here.
The resulting subjects are then saved to a temporary table.
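The inherit-and-override pattern for this transformation might look roughly like the sketch below. It is written in Python for illustration only. `ImportSketchBase` is a hypothetical stand-in and does not reproduce the interface of DataLakeImportImplementation, and the `Subject` type, the method names, and the attribute names in the mapping are likewise assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Subject:
    """Simplified stand-in for the subject scheme."""
    name: str
    properties: Dict[str, List[str]] = field(default_factory=dict)


class ImportSketchBase:
    """Hypothetical base class; not the project's real interface."""

    # Maps a uniform attribute name to the source attributes that feed it.
    attribute_mapping: Dict[str, List[str]] = {}

    def normalize_attribute(self, attribute: str, values: List[str]) -> List[str]:
        """Default normalization: strip whitespace and drop empty values."""
        return [v.strip() for v in values if v and v.strip()]

    def entity_name(self, entity: Dict[str, List[str]]) -> str:
        raise NotImplementedError

    def translate(self, entity: Dict[str, List[str]]) -> Subject:
        """Build a subject from a source entity using the attribute mapping."""
        properties: Dict[str, List[str]] = {}
        for uniform, sources in self.attribute_mapping.items():
            values = [v for s in sources for v in entity.get(s, [])]
            normalized = self.normalize_attribute(uniform, values)
            if normalized:
                properties[uniform] = normalized
        return Subject(name=self.entity_name(entity), properties=properties)


class ExampleSourceImport(ImportSketchBase):
    """Source-specific job that overrides only what differs from the default."""

    # Made-up attribute names, for illustration only.
    attribute_mapping = {
        "uniform_country": ["country", "countryCode"],
        "uniform_urls": ["website"],
    }

    def normalize_attribute(self, attribute: str, values: List[str]) -> List[str]:
        cleaned = super().normalize_attribute(attribute, values)
        if attribute == "uniform_country":
            return [v.upper() for v in cleaned]  # e.g. "de" becomes "DE"
        return cleaned

    def entity_name(self, entity: Dict[str, List[str]]) -> str:
        labels = entity.get("label") or [""]
        return labels[0].strip()
```

Calling `ExampleSourceImport().translate(entity)` for each imported entity yields the subjects that would then be written to the temporary table.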
Next step: Duplicate Detection