Consists of multiple parts:
- Extracts the specified Infobox definitions from the XML dump file and saves them as separate files for further processing.
- Converts Infobox text files to corresponding CSV files.
- Reports.scala - sample Spark SQL queries for analyzing the CSV files of specific Infoboxes.
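
The Infobox-to-CSV step above can be sketched roughly as follows. This is only an illustration, not the project's actual converter: the object `InfoboxToCsv`, its methods, and the sample column names are all hypothetical, and it assumes each Infobox field sits on its own line in the form `| key = value`. Multi-line values and nested templates are not handled.

```scala
// Hypothetical sketch: names and input layout are assumptions, not the project's code.
object InfoboxToCsv {

  // Collect "| key = value" lines into ordered (key, value) pairs.
  // Lines without "=" (for example "{{Infobox person" or "}}") are skipped.
  def parseFields(infobox: String): Seq[(String, String)] =
    infobox
      .split("\n")
      .toSeq
      .map(_.trim)
      .filter(_.startsWith("|"))
      .map(_.drop(1).split("=", 2)) // split on the first "=" only
      .collect { case Array(key, value) => key.trim -> value.trim }

  // Quote a cell if it contains a comma, a double quote, or a newline.
  def escape(cell: String): String =
    if (cell.exists(c => c == ',' || c == '"' || c == '\n'))
      "\"" + cell.replace("\"", "\"\"") + "\""
    else cell

  // Emit one CSV row for a fixed column order; missing fields become empty cells.
  def toCsvRow(columns: Seq[String], fields: Seq[(String, String)]): String = {
    val byKey = fields.toMap
    columns.map(c => escape(byKey.getOrElse(c, ""))).mkString(",")
  }
}

object InfoboxToCsvDemo extends App {
  // Illustrative input only; real Infobox files may look different.
  val sample = Seq(
    "{{Infobox person",
    "| name = Ada Lovelace",
    "| birth_place = London, England",
    "}}"
  ).mkString("\n")

  val columns = Seq("name", "birth_place", "occupation")
  println(columns.mkString(","))
  println(InfoboxToCsv.toCsvRow(columns, InfoboxToCsv.parseFields(sample)))
  // prints: Ada Lovelace,"London, England",
}
```

Fixing the column order up front keeps every row aligned with one header, which makes it easier to load the resulting CSV into Spark SQL later.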