[Opta] Ordering of events #267
Are there any details on how to sort events correctly while maintaining millisecond precision? One solution could be to extract the timestamp from the “min” and “sec” attributes, but then we lose the precision. |
My documentation doesn't mention the precision of the "timestamp" field. However, my version of the documentation is extremely outdated. Maybe @JanVanHaaren has something more up-to-date. I find it strange that the "timestamp" field does not align with the "min" and "sec" fields. If the precision of the "timestamp" field were inferior to that of the "min" and "sec" fields, I don't see why we would infer an (incorrect) millisecond precision from it. |
Looking at a few more timestamps, I now realize that Opta does not add leading zeros to the milliseconds. So, "2018-08-20T21:32:27.98" is actually "2018-08-20T21:32:27.098000". Python's
|
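A minimal Python sketch of the padding issue described above. It assumes the fractional part is a plain millisecond count without leading zeros and that the string carries no timezone suffix. The helper name `parse_opta_timestamp` is hypothetical and is not part of Kloppy.

```python
from datetime import datetime, timedelta


def parse_opta_timestamp(raw: str) -> datetime:
    """Parse a timestamp whose fraction is a millisecond count without leading zeros.

    Hypothetical helper for illustration; assumes no timezone suffix.
    """
    base, _, fraction = raw.partition(".")
    parsed = datetime.strptime(base, "%Y-%m-%dT%H:%M:%S")
    if fraction:
        # Treat the digits as an integer number of milliseconds (".98" -> 98 ms).
        parsed += timedelta(milliseconds=int(fraction))
    return parsed


raw = "2018-08-20T21:32:27.98"

# "%f" pads on the right, so ".98" is read as 980000 microseconds.
naive = datetime.strptime(raw, "%Y-%m-%dT%H:%M:%S.%f")
fixed = parse_opta_timestamp(raw)

print(naive.microsecond)  # 980000
print(fixed.microsecond)  # 98000
```

Splitting on the dot keeps the date and time parsing in `strptime` and handles only the fraction by hand, which avoids rewriting the string before parsing.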
The Documentation Opta F24
Documentation Stats Perform MA3
|
So, to conclude, would it be okay to fill the "timestamp" field in Kloppy with |
That suggestion sounds good to me. The Wyscout V3 deserializer fills the Should we explicitly store a sequence number for each event as well? StatsBomb and Wyscout explicitly provide a sequence number in the |
I would just make sure that the records in a dataset are chronologically ordered. Storing a sequence number then does not provide any added value since you would be able to infer it from the position in the list of records. |
Small question about the timestamp vs min/sec: when the record is not altered afterwards, does the timestamp match the min/sec? |
My understanding is that the I suspect that the |
I will contact the Stats Perform support desk. The official documentation is confusing. Documentation website
|
I haven't heard back yet from Stats Perform, but I think I finally understand how the timestamps work. I suspect the meaning of the For example, the event data for this friendly match between Salzburg and Ried has coverage level 14. The game took place on 12 October 2023, but the
|
The question is rather whether they can be used as a reliable way to measure the relative time that has passed since the "period start" event. |
I don't know yet, but my feeling is that it should be possible for the highest coverage levels. I'll investigate a few more matches. Unfortunately, I don't have access to much event data that was collected at lower coverage levels. |
Opta does not zero-pad milliseconds. Therefore, they were incorrectly parsed by Python's default "%f" format code. See also PySport#267
I noticed that Opta events can sometimes be slightly out of order. The F24 docs specify that the following attributes (in the given order) should be used to order each team's match events chronologically:
Sorting by timestamp alone does not always give the same result. For example:
Since the Opta deserializer currently only parses the "timestamp" field, it does not seem possible to order events chronologically.
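A rough Python sketch of how a multi-attribute sort can differ from a timestamp-only sort. The attribute list from the F24 docs is not reproduced here, so the key order below ("min", "sec", then "timestamp" as a tie-breaker) is only an assumption for illustration, and the two events are made-up values, not real Opta data.

```python
from datetime import datetime

# Made-up events: e2 has an earlier game clock but a later timestamp.
events = [
    {"id": "e1", "min": 10, "sec": 5, "timestamp": datetime(2018, 8, 20, 21, 10, 5, 400000)},
    {"id": "e2", "min": 10, "sec": 4, "timestamp": datetime(2018, 8, 20, 21, 10, 5, 900000)},
]


def clock_key(event: dict) -> tuple:
    # Assumed ordering: game clock first, timestamp only to break ties.
    return (event["min"], event["sec"], event["timestamp"])


by_timestamp = sorted(events, key=lambda e: e["timestamp"])
by_clock = sorted(events, key=clock_key)

print([e["id"] for e in by_timestamp])  # ['e1', 'e2']
print([e["id"] for e in by_clock])      # ['e2', 'e1']
```

Because Python compares tuples element by element, the timestamp only matters when "min" and "sec" are equal. A real implementation would need the deserializer to parse those attributes as well.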