Overview

Mario Martin Basa edited this page Jan 24, 2024 · 7 revisions

Overview of Excavator-J

The Excavator-J project is an extension of the original Excavator project, which collects and catalogs the personal information that various social media applications gather from a user, knowingly or sometimes unknowingly. Like Excavator, Excavator-J traverses a specified Takeout directory, looking for specific JSON and CSV data, which it converts into tabular data in a SQLite database. Since SQLite is widely used, the tabular data created from the Takeout can be used in a wide variety of applications.
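
The traversal step described above can be pictured with a short sketch. This is not taken from the Excavator-J source: the class name `TakeoutScanSketch` and its methods are hypothetical, and matching files only by their `.json` or `.csv` extension is an assumption made for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper for illustration; not the actual Excavator-J code.
final class TakeoutScanSketch {

    // Walks the directory tree under root and returns every regular file
    // whose name ends in .json or .csv, sorted by path.
    static List<Path> findDataFiles(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(TakeoutScanSketch::isDataFile)
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    // Assumption: candidate files are recognised by extension only
    // (case-insensitive).
    static boolean isDataFile(Path p) {
        String name = p.getFileName().toString().toLowerCase(Locale.ROOT);
        return name.endsWith(".json") || name.endsWith(".csv");
    }

    public static void main(String[] args) throws IOException {
        // Defaults to ./Takeout when no directory is given.
        Path root = Paths.get(args.length > 0 ? args[0] : "./Takeout");
        for (Path p : findDataFiles(root)) {
            System.out.println(p);
        }
    }
}
```

A real extractor would then have to decide, per file, which parser and which target table apply; the sketch stops at listing the candidates.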

Excavator-J, though, deals solely with the most recent Google Takeout Location History and the Takeout's updated JSON data format. It not only saves Point data from the Location History, but also creates Linestrings from the waypoints of the Activity Segment in the Semantic information. With this, Lines as well as Points can be viewed in a mapping application such as QGIS.
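
As a rough illustration of how waypoints can become a Linestring, the sketch below builds a Well Known Text (WKT) string from a list of coordinates. It assumes the waypoints arrive as integer latitude/longitude pairs scaled by 1e7 (the `latE7`/`lngE7` convention used in Takeout's Semantic Location History files); the class `LinestringSketch`, its method names, and the sample coordinates are hypothetical and not taken from the Excavator-J source.

```java
import java.util.Locale;
import java.util.StringJoiner;

// Hypothetical helper for illustration; not the actual Excavator-J code.
final class LinestringSketch {

    // Converts an E7-scaled integer coordinate to decimal degrees.
    static double fromE7(int e7) {
        return e7 / 1e7;
    }

    // Builds a WKT LINESTRING from waypoints given as {latE7, lngE7} pairs.
    // WKT lists each point as "x y", i.e. longitude first, then latitude.
    static String toLinestringWkt(int[][] waypointsE7) {
        if (waypointsE7.length < 2) {
            throw new IllegalArgumentException("a LINESTRING needs at least two points");
        }
        StringJoiner wkt = new StringJoiner(", ", "LINESTRING (", ")");
        for (int[] wp : waypointsE7) {
            // Locale.ROOT keeps the decimal separator a '.' on every system.
            wkt.add(String.format(Locale.ROOT, "%.7f %.7f", fromE7(wp[1]), fromE7(wp[0])));
        }
        return wkt.toString();
    }

    public static void main(String[] args) {
        // Illustrative coordinates only.
        int[][] waypoints = {
            {356812360, 1397671250},
            {356895070, 1397005670},
        };
        // prints: LINESTRING (139.7671250 35.6812360, 139.7005670 35.6895070)
        System.out.println(toLinestringWkt(waypoints));
    }
}
```

Storing the geometry as WKT text keeps the database free of spatial extensions, while applications such as QGIS can still interpret the column as a line geometry.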

Java

The original Excavator project was written in the Rust programming language to create a very robust and performant application. The downside is that the application must be compiled for every OS and every computer architecture to produce a native binary (e.g. once for an Intel-based Mac and once for an M1 Mac). It was therefore decided to rewrite the application in the Java programming language: once a Java Archive (JAR) file containing the application has been created, it can run on any OS and architecture that has a Java Runtime Environment (JRE) installed.

SQLite

SQLite is a small, fast, self-contained, highly reliable, and full-featured database engine. Its file format is cross-platform, and SQLite database files are commonly used as containers for data transferred between systems and applications. The Excavator project, along with Excavator-J, writes the information found in the Takeout directories into SQLite so that it can be used by many other software applications.

Excavator-J will create and populate the following tables:

  • google_location_history - a legacy table kept for compatibility with previously made Jupyter Notebooks. The data is populated from a combination of Records.json and the Semantic Data's place_visit data. Note that the timestamp is in milliseconds, again for backward compatibility.
      Column      |       Type       | Collation | Nullable | Default 
------------------+------------------+-----------+----------+---------
 id               | integer          |           | not null | 
 source           | text             |           |          | 
 activity         | text             |           |          | 
 address          | text             |           |          | 
 place_name       | text             |           |          | 
 timestamp_msec   | bigint           |           | not null | 
 accuracy         | integer          |           |          | 
 verticalaccuracy | integer          |           |          | 
 altitude         | integer          |           |          | 
 lat              | double precision |           | not null | 
 lng              | double precision |           | not null | 
  • google_location_history_updated - populated with data from Records.json of the Location History directory
      Column      |            Type             | Collation | Nullable | Default 
------------------+-----------------------------+-----------+----------+---------
 id               | integer                     |           | not null | 
 origin           | text                        |           |          | 
 platform_type    | text                        |           |          | 
 form_factor      | text                        |           |          | 
 source           | text                        |           |          | 
 activity         | text                        |           |          | 
 timestamp        | timestamp without time zone |           | not null | 
 accuracy         | integer                     |           |          | 
 verticalaccuracy | integer                     |           |          | 
 altitude         | integer                     |           |          | 
 lat              | double precision            |           | not null | 
 lng              | double precision            |           | not null | 

  • google_location_placevisit - populated from the different files found in the sub-directories of the Semantic Location History directory.
          Column          |            Type             | Collation | Nullable | Default 
--------------------------+-----------------------------+-----------+----------+---------
 id                       | integer                     |           | not null | 
 address                  | text                        |           |          | 
 name                     | text                        |           |          | 
 lat                      | double precision            |           | not null | 
 lng                      | double precision            |           | not null | 
 start_timestamp          | timestamp without time zone |           | not null | 
 end_timestamp            | timestamp without time zone |           | not null | 
 edit_confirmation_status | text                        |           |          | 
 location_confidence      | integer                     |           |          | 
 place_visit_type         | text                        |           |          | 
 place_visit_importance   | text                        |           |          | 
  • google_location_activity - populated from the different files found in the sub-directories of the Semantic Location History directory. This table will contain the Points of the activitySegment data that includes the start and end points as well as the waypointPath.
     Column      |            Type             | Collation | Nullable | Default 
-----------------+-----------------------------+-----------+----------+---------
 id              | integer                     |           | not null | 
 start_timestamp | timestamp without time zone |           |          | 
 end_timestamp   | timestamp without time zone |           |          | 
 lat             | double precision            |           | not null | 
 lng             | double precision            |           | not null | 
  • google_location_activitysegment - populated from the different files found in the sub-directories of the Semantic Location History directory. This table will contain the data of the activitySegment with an added Linestring Well Known Text (WKT) that will contain the waypointPath and is stored in the wkt column.
     Column      |            Type             | Collation | Nullable | Default 
-----------------+-----------------------------+-----------+----------+---------
 id              | integer                     |           | not null | 
 start_timestamp | timestamp without time zone |           | not null | 
 end_timestamp   | timestamp without time zone |           | not null | 
 start_lat       | double precision            |           | not null | 
 start_lng       | double precision            |           | not null | 
 end_lat         | double precision            |           | not null | 
 end_lng         | double precision            |           | not null | 
 distance        | integer                     |           |          | 
 activity_type   | text                        |           |          | 
 confidence      | text                        |           |          | 
 wkt             | text                        |           |          | 
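
The legacy google_location_history table stores its time as milliseconds (timestamp_msec), while the other tables use a timestamp column. The sketch below shows one way to move between the two representations with java.time. It assumes that the milliseconds count from the Unix epoch, that the timestamps are interpreted as UTC, and that the source value is an ISO-8601 string such as "2022-01-12T17:18:24.190Z"; the class `TimestampSketch` and its methods are hypothetical and not taken from the Excavator-J source.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;

// Hypothetical helper for illustration; not the actual Excavator-J code.
final class TimestampSketch {

    // Epoch milliseconds (as in timestamp_msec) to a zone-less timestamp,
    // interpreted as UTC (an assumption of this sketch).
    static LocalDateTime fromEpochMillis(long timestampMsec) {
        return LocalDateTime.ofInstant(Instant.ofEpochMilli(timestampMsec), ZoneOffset.UTC);
    }

    // ISO-8601 instant string to epoch milliseconds.
    static long toEpochMillis(String isoTimestamp) {
        return Instant.parse(isoTimestamp).toEpochMilli();
    }

    public static void main(String[] args) {
        long msec = toEpochMillis("1970-01-01T00:00:01.500Z");
        System.out.println(msec);                  // prints: 1500
        System.out.println(fromEpochMillis(msec)); // prints: 1970-01-01T00:00:01.500
    }
}
```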

Usage

To use Excavator-J, first download and extract the Google Takeout archive, which can be obtained from Google Takeout (takeout.google.com). Next, download the latest Excavator-J JAR file from this project's Release page. A Java Runtime Environment (JRE) or a Java Development Kit (JDK) must be installed as a prerequisite.

Excavator-J can be run in this manner:

java -jar excavator.jar

which will display the list of parameters:

usage: excavator [opts]
 -d,--outputDB <arg>   sqlite3 output file.
 -h,--help             Help
 -i,--inputDir <arg>   Takeout directory

Running the application with these parameters:

java -jar excavator.jar -i ./Takeout -d ichnion.db

will let Excavator-J traverse the Takeout directory and create and populate the tables in the ichnion.db SQLite database file.

Note

If any information in the Location History data was missed by the extraction process, please create an Issue on this project page and it will be addressed as soon as possible.
