Adding new table with Mozilla's urlClassification -- How do I ensure that table is correctly populated #1091
-
I would like to add a table that stores the request_id together with the firstParty and thirdParty flags from Mozilla's urlClassification. However, the firstParty and thirdParty fields are not being populated during my OpenWPM crawl; they contain empty arrays instead. I have reviewed my crawl setup and logs, but I cannot identify the root cause. I set up the OpenWPM environment and made sure all dependencies are installed. I made a new listener:
as well as a new handler
but the information I want still does not show up. The other fields (e.g. visit_id) are filled in, but the firstParty and thirdParty arrays are always empty. What could be the issue? Thank you.
Replies: 4 comments 7 replies
-
Do you see the correct output in the log? Also, you could add something here: OpenWPM/openwpm/storage/storage_controller.py, lines 159 to 160 in 566d03b, like:
    if table_name == "urlclassification":
        self.logger.error("Got classification record %r", data)
If this still doesn't work, it might make sense to have a look at the generated insert statement: OpenWPM/openwpm/storage/sql_provider.py, lines 55 to 56 in 566d03b.
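To make it clearer what to look for in a generated insert statement, here is a minimal sketch of how a parameterized INSERT can be built from a record's keys. It is not OpenWPM's actual code: `build_insert`, the column names `first_party` and `third_party`, and the sample record are all assumptions made for illustration. If the printed values already contain empty arrays, the problem is upstream of storage.

```python
import sqlite3
from typing import Any, Dict, Tuple


def build_insert(table_name: str, record: Dict[str, Any]) -> Tuple[str, tuple]:
    """Build a parameterized INSERT from a record's keys (illustrative helper)."""
    columns = ", ".join(record.keys())
    placeholders = ", ".join("?" for _ in record)
    statement = f"INSERT INTO {table_name} ({columns}) VALUES ({placeholders})"
    return statement, tuple(record.values())


# Hypothetical record, shaped like what a storage layer might receive.
# The arrays are stored as JSON text; the column names are assumptions.
record = {"visit_id": 1, "request_id": 42, "first_party": "[]", "third_party": "[]"}
statement, values = build_insert("urlclassification", record)
print(statement)
print(values)

# Run the statement against an in-memory database to confirm it is valid SQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE urlclassification ("
    "visit_id INTEGER, request_id INTEGER, first_party TEXT, third_party TEXT)"
)
conn.execute(statement, values)
rows = conn.execute("SELECT * FROM urlclassification").fetchall()
print(rows)
```

Printing both the statement and the bound values separates two failure modes: a column mismatch between the record and the schema, versus data that arrives empty.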
-
Thanks, the urlClassification table seems to be processed as desired
but I'm not sure why firstParty and thirdParty still show up as empty arrays. I wonder if this could be because Enhanced Tracking Protection is disabled, which would prevent the firstParty and thirdParty arrays from being populated.
-
If you go into the log file that is written to disk, you should also see the output of the extension.
I have no particular insight into this setup, but of course you can't collect data that isn't provided 😁
-
To whoever faces a similar issue in the future, remember to edit:
schema.sql
parquet_schema.py
http_instrument.ts
test_values.py
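Once those files agree, one way to confirm that the table is actually populated is to count how many rows carry empty arrays. This is a rough sketch, assuming the arrays are stored as JSON text in columns named `first_party` and `third_party` of a table named `urlclassification`; the column names, the helper `count_empty_classifications`, and the sample rows are hypothetical and should be adapted to your own schema.

```python
import json
import sqlite3


def count_empty_classifications(conn, table="urlclassification"):
    """Return (total_rows, rows_where_both_arrays_are_empty)."""
    rows = conn.execute(
        f"SELECT first_party, third_party FROM {table}"
    ).fetchall()
    empty = sum(
        1 for fp, tp in rows if not json.loads(fp) and not json.loads(tp)
    )
    return len(rows), empty


# In-memory stand-in for a crawl database, filled with made-up sample rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE urlclassification ("
    "visit_id INTEGER, request_id INTEGER, first_party TEXT, third_party TEXT)"
)
sample = [
    (1, 10, json.dumps([]), json.dumps([])),
    (1, 11, json.dumps([]), json.dumps(["tracking"])),
]
conn.executemany("INSERT INTO urlclassification VALUES (?, ?, ?, ?)", sample)

total, empty = count_empty_classifications(conn)
print(f"{empty} of {total} rows have empty firstParty and thirdParty arrays")
```

If every row comes back empty, the browser is probably not supplying classification flags for those requests at all, which is a different problem from a schema mismatch.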