-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Converting MFP CSV to YAML schedule #111
Conversation
I'll ask the students to add a column with station time. I hope it will become available in later downloads from the MFP website directly as well. It is included online.
|
@ammedd |
I was discussing with @ammedd , and I think the best thing to do would be to use the CSV to populate as many fields as it can in the schedule, and leave the rest up to the user to populate in the YAML. This saves getting the user to add columns to the CSV (which I think would be more error prone). I'll be able to give this a proper review on Friday - just have some other items to clear before then :) |
Okay, I've managed to look through the code. Thanks for picking this up @iuryt ! I definitely see how this will streamline things for users - especially for the waypoints and the bounding box.
Correct, we'll have to leave the dates blank - its just information that is not provided from the MFP export. Though the order of points in the YAML should match the CSV. If the MFP export changes in future, then we can adapt. So yes, it would also be difficult to guess whether stations are the same. Honestly, I think that we should avoid making that guess. Making it clear that Its not optimal, but given MFP->virtualship isn't necessarily 1to1, and given the limited export, I think that its all we can do. I also think this is the clearer approach from a maintenance POV - being explicit with what is supported, and by not making additional assumptions which may not be right - and hands the control to the user to modify the schedule file.
Sounds like a good plan! My vote is for
Let's install Other notes:
Let me know what your thoughts are on all this ^, and if there's anything I can do to help :) |
But I tested it, and the current version is working. # Define maximum depth and buffer for each instrument
instrument_properties = {
"CTD": {"depth": 5000, "buffer": 1},
"DRIFTER": {"depth": 1, "buffer": 5},
"ARGO_FLOAT": {"depth": 2000, "buffer": 5},
} Any comments on this, @erikvansebille ? Let me know what you think. I have not added yet
|
Looks goo, but perhaps explain what |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good progress! Left some review comments. I think there is some duplication of information between the comments and the code. I recommend in general: Don't Write Comments - rewrite the code (really good YouTube channel in general for code style - highly recommend). The only time I write comments is to explain why something was done. E.g., # Keeping this legacy format for backward compatibility with older datasets
There are many ways to make things self documenting. E.g.,
mfp_to_yaml(
mfp_file, str(path)
) # Pass the path to save in the correct directory
could become
mfp_to_yaml(
mfp_file, save_directory = str(path)
)
.
It's possible to have clean, readable code with 0 duplication, but also 0 comments ;)
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
start_time: datetime | None = None | ||
end_time: datetime | None = None | ||
|
||
@model_validator(mode="after") | ||
def _check_time_range(self) -> Self: | ||
if not self.start_time < self.end_time: | ||
raise ValueError("start_time must be before end_time") | ||
if self.start_time and self.end_time: | ||
if not self.start_time < self.end_time: | ||
raise ValueError("start_time must be before end_time") | ||
return self |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@VeckoTheGecko @ammedd
While using pydantic, I had to change this to accept nonetype time, but this means that when we do schedule.from_yaml()
for fetch
it will not give any error, right? What should we do?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. I thought that pydantic would have a way of disabling validation during the initialisation of the object, but looking further at the documentation its looking that that isn't possible ... . Longterm it would be good to have start_time
and end_time
not none, but perhaps thats something for a future PR
for more information, see https://pre-commit.ci
@ammedd |
class Waypoint(BaseModel): | ||
"""A Waypoint to sail to with an optional time and an optional instrument.""" | ||
|
||
location: Location | ||
time: datetime | None = None | ||
instrument: InstrumentType | list[InstrumentType] | None = None | ||
|
||
@field_serializer("instrument") | ||
def serialize_instrument(self, instrument): | ||
"""Ensure InstrumentType is serialized as a string (or list of strings).""" | ||
if isinstance(instrument, list): | ||
return [inst.value for inst in instrument] | ||
return instrument.value if instrument else None | ||
|
||
@field_serializer("time") | ||
def serialize_time(self, time): | ||
"""Ensure datetime is formatted properly in YAML.""" | ||
return time.strftime("%Y-%m-%d %H:%M:%S") if time else None |
This comment was marked as outdated.
This comment was marked as outdated.
Sorry, something went wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i see why this was done now - that InstrumentType serialization wasn't happening as expected. Ok, let's keep this (though still not sure about the time serialization as that looks to be ok? reverting that part for the timebeing, but can introduce in future PR)
f065ac9
to
c94567b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tested with a csv on my side. Works!
@ammedd @VeckoTheGecko
The file does not contain the time expected for each of the stations, how should we guess this?
Also, how should we organize this code? Should we embed this to the
virtualship init
command?e.g.
virtualship init --mfp_file ./CoordinatesExport-Filled.xlsx
So far, this is just a script that creates the
yaml
file. We can delete them after deciding where to implement it.I am also using the CSV version to avoid installing
openpyxl
, but we can also install it, no problem.