Converting MFP CSV to YAML schedule #111

iuryt · 2025-01-28T18:43:12Z

The file does not contain the expected time for each station. How should we guess this?

Also, how should we organize this code? Should we embed this into the virtualship init command?
e.g. virtualship init --mfp_file ./CoordinatesExport-Filled.xlsx

So far, this is just a script that creates the YAML file. We can delete it after deciding where to implement the functionality.

I am also using the CSV version to avoid installing openpyxl, but we can also install it, no problem.

Closes YAML schedule from MFP CSV #61
Tests added
Re-ran README.md help commands if the CLI has changed

for more information, see https://pre-commit.ci

ammedd · 2025-01-29T07:48:51Z

I'll ask the students to add a column with the station time. I hope it will also become available directly in later downloads from the MFP website; it is already included in the online version.

virtualship init --mfp_file ./CoordinatesExport-Filled.xlsx sounds great! Maybe replace "file" in the flag name with "csv" or "DD".
When you download from MFP, you choose Export Coordinates - DD (decimal degrees). There are other options but I would not cater for those at the moment.

iuryt · 2025-01-29T09:46:22Z

@ammedd
Can you give me a version of this file with the times using the template you plan to use?

VeckoTheGecko · 2025-01-29T12:08:21Z

I was discussing with @ammedd, and I think the best thing to do would be to use the CSV to populate as many fields in the schedule as it can, and leave the rest up to the user to populate in the YAML. This saves the user from having to add columns to the CSV (which I think would be more error-prone).

I'll be able to give this a proper review on Friday - just have some other items to clear before then :)

scripts/coordinates_to_yaml_output.yaml

VeckoTheGecko · 2025-01-31T09:52:19Z

Okay, I've managed to look through the code. Thanks for picking this up @iuryt ! I definitely see how this will streamline things for users - especially for the waypoints and the bounding box.

But that means that we need to either leave the dates blank or guess them, right? Some stations will also have more than one instrument, and it will be hard to tell that these belong to the same station if we leave the dates blank. They do share the same location, but if their longitudes and latitudes are only close rather than identical, detecting this might be harder than just adding a column to the CSV. I do understand, though, that adding a column manually is more error-prone.

Correct, we'll have to leave the dates blank; it's just information that the MFP export doesn't provide. The order of points in the YAML should still match the CSV, though. If the MFP export changes in future, we can adapt. And yes, it would also be difficult to guess whether stations are the same. Honestly, I think we should avoid making that guess. Instead, we should make it clear that virtualship init --from-mfp only populates some of the fields, and that the user is required to take it from there (e.g., merging waypoints that were initially spaced a bit apart in MFP into the same location, and filling in the dates from the MFP UI, since they aren't included in the export).

It's not optimal, but given that MFP -> virtualship isn't necessarily 1-to-1, and given the limited export, I think it's all we can do. I also think this is the clearer approach from a maintenance point of view: it is explicit about what is supported, it avoids additional assumptions that may not be right, and it hands control to the user to modify the schedule file.
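A minimal sketch of this "populate what we can" approach, assuming the export has been saved as CSV and uses the hypothetical column names Latitude, Longitude and Instrument (the real MFP headers may differ). It keeps the CSV row order, leaves every time as None, and deliberately does not try to merge nearby stations:

```python
import csv
import io
from typing import Any

# Hypothetical column names; check them against a real MFP "Export Coordinates > DD" file.
LAT_COLUMN = "Latitude"
LON_COLUMN = "Longitude"
INSTRUMENT_COLUMN = "Instrument"


def csv_to_waypoints(csv_text: str) -> list[dict[str, Any]]:
    """Build one waypoint per CSV row, in file order, with the time left blank."""
    waypoints = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        waypoints.append(
            {
                "location": {
                    "latitude": float(row[LAT_COLUMN]),
                    "longitude": float(row[LON_COLUMN]),
                },
                # The export carries no station times, so the user fills these in later.
                "time": None,
                # An empty cell means no instrument at this waypoint.
                "instrument": row[INSTRUMENT_COLUMN].strip() or None,
            }
        )
    return waypoints


example = "Latitude,Longitude,Instrument\n52.0,4.0,CTD\n52.5,4.5,\n"
waypoints = csv_to_waypoints(example)
```

Any merging of stations or filling in of dates is then left to the user editing the YAML, as discussed above.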

Also, how should we organize this code? Should we embed this into the virtualship init command?
e.g. virtualship init --mfp_file ./CoordinatesExport-Filled.xlsx

Sounds like a good plan! My vote is for --from-mfp, since that is more in line with CLI conventions. We could then add something like this to the help message:

--from-mfp    Partially initialise a project from an exported xlsx or csv file from NIOZ's Marine Facilities Planning tool (specifically the "Export Coordinates > DD" option). User edits are required after initialisation.
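To see how the proposed flag and help text fit together, here is a sketch using the standard library's argparse. It is only an illustration: the parser layout, the positional path argument and the prog name are assumptions, and virtualship's actual CLI may be built with a different framework.

```python
import argparse
from pathlib import Path

FROM_MFP_HELP = (
    "Partially initialise a project from an exported xlsx or csv file from NIOZ's "
    'Marine Facilities Planning tool (specifically the "Export Coordinates > DD" '
    "option). User edits are required after initialisation."
)


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser for an `init` command with an optional --from-mfp file."""
    parser = argparse.ArgumentParser(prog="virtualship init")
    parser.add_argument("path", type=Path, help="Directory to create the project in.")
    parser.add_argument(
        "--from-mfp",
        type=Path,
        default=None,  # without the flag, init behaves as before
        metavar="FILE",
        help=FROM_MFP_HELP,
    )
    return parser


args = build_parser().parse_args(["my_expedition", "--from-mfp", "CoordinatesExport.xlsx"])
```

argparse turns --from-mfp into the attribute args.from_mfp, so the hyphenated flag name keeps the CLI convention without awkward Python names.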

I am also using the CSV version to avoid installing openpyxl, but we can also install it, no problem.

Let's install openpyxl

Other notes:

Could we add an error message to users if the CSV headers are not as expected?
- Incorrect headers -> Error: "Found columns [...], but expected columns [...]. Are you sure that you're using the correct export from MFP?"
- Additional headers -> Warning: "Found additional unexpected columns [...]. Manually added columns have no effect. If the MFP export format changed, please submit an issue https://github.com/OceanParcels/virtualship/issues."
It would be nice to have a message at the end saying that it's up to the user to populate the rest of the YAML
Would you be comfortable writing some unit tests for this functionality as well?
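The two header checks above could look roughly like this. The expected column set is a hypothetical placeholder (the thread does not list the real MFP headers), and "incorrect headers" is interpreted here as "an expected column is missing":

```python
import warnings

# Hypothetical expected headers; replace with the actual columns of the MFP export.
EXPECTED_COLUMNS = {"Name", "Latitude", "Longitude", "Instrument"}
ISSUES_URL = "https://github.com/OceanParcels/virtualship/issues"


def check_mfp_columns(found_columns: list[str]) -> None:
    """Raise if expected MFP columns are missing; warn about any extra columns."""
    found = set(found_columns)
    if EXPECTED_COLUMNS - found:
        raise ValueError(
            f"Found columns {sorted(found)}, but expected columns "
            f"{sorted(EXPECTED_COLUMNS)}. "
            "Are you sure that you're using the correct export from MFP?"
        )
    extra = found - EXPECTED_COLUMNS
    if extra:
        warnings.warn(
            f"Found additional unexpected columns {sorted(extra)}. "
            "Manually added columns have no effect. If the MFP export format "
            f"changed, please submit an issue {ISSUES_URL}.",
            UserWarning,
            stacklevel=2,
        )
```

Raising for missing columns but only warning for extra ones means a user who added a helper column still gets a usable schedule.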

Let me know what your thoughts are on all this ^, and if there's anything I can do to help :)


iuryt · 2025-02-03T17:48:16Z

@ammedd and @VeckoTheGecko

I tested it, and the current version works.
I also added an instrument-dependent maximum_depth and a longitude/latitude buffer.

    # Define maximum depth and buffer for each instrument
    instrument_properties = {
        "CTD": {"depth": 5000, "buffer": 1},
        "DRIFTER": {"depth": 1, "buffer": 5},
        "ARGO_FLOAT": {"depth": 2000, "buffer": 5},
    }

Any comments on this, @erikvansebille ?

Let me know what you think.

I have not yet added:

Error message to users if the CSV headers are not as expected
Additional headers -> Warning
Message at the end that it's up to the user to populate the rest of the YAML
Unit tests for this functionality
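One possible reading of the instrument_properties dictionary above, sketched under assumptions: buffer is taken to be in degrees, padded around the longitude/latitude extent of the waypoints, and the region uses the largest buffer and deepest depth among the instruments present. The output key names are hypothetical, and this is not the committed code (buffering was later removed in "Remove buffering from mfp conversion"):

```python
# Copied from the dictionary above; "buffer" is assumed to be in degrees.
instrument_properties = {
    "CTD": {"depth": 5000, "buffer": 1},
    "DRIFTER": {"depth": 1, "buffer": 5},
    "ARGO_FLOAT": {"depth": 2000, "buffer": 5},
}


def region_from_waypoints(
    points: list[tuple[float, float]], instruments: set[str]
) -> dict[str, float]:
    """Pad the (lon, lat) extent of the points by the largest instrument buffer.

    Assumes at least one point and one known instrument; longitudes are not
    wrapped across the antimeridian.
    """
    buffer = max(instrument_properties[name]["buffer"] for name in instruments)
    lons = [lon for lon, _ in points]
    lats = [lat for _, lat in points]
    return {
        "minimum_longitude": min(lons) - buffer,
        "maximum_longitude": max(lons) + buffer,
        # Clamp latitude so the buffer cannot push the box past the poles.
        "minimum_latitude": max(min(lats) - buffer, -90.0),
        "maximum_latitude": min(max(lats) + buffer, 90.0),
        "maximum_depth": max(instrument_properties[name]["depth"] for name in instruments),
    }


region = region_from_waypoints([(4.0, 52.0), (6.0, 54.0)], {"CTD", "DRIFTER"})
```

Under this reading, drifters and Argo floats get a wider box because they move away from their deployment point, while a CTD stays on station.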

erikvansebille · 2025-02-04T07:04:50Z

Looks good, but perhaps explain what buffer means in the dictionary above?

VeckoTheGecko

Good progress! I left some review comments. I think some information is duplicated between the comments and the code. In general, I recommend "Don't Write Comments" (a video from a really good YouTube channel on code style; highly recommend it): rewrite the code instead. The only time I write comments is to explain why something was done, e.g., # Keeping this legacy format for backward compatibility with older datasets

There are many ways to make code self-documenting. For example,

    mfp_to_yaml(mfp_file, str(path))  # Pass the path to save in the correct directory

could become

    mfp_to_yaml(mfp_file, save_directory=str(path))

It's possible to have clean, readable code with 0 duplication, but also 0 comments ;)

src/virtualship/utils.py

src/virtualship/cli/commands.py

environment.yml

src/virtualship/utils.py

src/virtualship/cli/commands.py


iuryt · 2025-02-13T22:24:22Z

src/virtualship/expedition/space_time_region.py

+    start_time: datetime | None = None
+    end_time: datetime | None = None

    @model_validator(mode="after")
    def _check_time_range(self) -> Self:
-        if not self.start_time < self.end_time:
-            raise ValueError("start_time must be before end_time")
+        if self.start_time and self.end_time:
+            if not self.start_time < self.end_time:
+                raise ValueError("start_time must be before end_time")
        return self


@VeckoTheGecko @ammedd
While using pydantic, I had to change this to accept NoneType times, but this means that schedule.from_yaml() in fetch will no longer raise an error when the times are missing, right? What should we do?

I see. I thought pydantic would have a way of disabling validation during initialisation of the object, but looking further at the documentation, it seems that isn't possible. Long-term it would be good to have start_time and end_time be non-None, but perhaps that's something for a future PR.
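One way to keep the relaxed model for init while still failing early in fetch would be a separate completeness check run just before fetching. This is a hypothetical helper, written with a stdlib dataclass standing in for the pydantic model; TimeRange and require_times are illustrative names, not part of virtualship:

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime


@dataclass
class TimeRange:
    """Stand-in for the relaxed model: times may be None after init --from-mfp."""

    start_time: datetime | None = None
    end_time: datetime | None = None


def require_times(time_range: TimeRange) -> tuple[datetime, datetime]:
    """Return (start, end) for fetching, raising if the user has not filled them in."""
    if time_range.start_time is None or time_range.end_time is None:
        raise ValueError(
            "start_time and end_time must be set in the schedule before fetching data."
        )
    if not time_range.start_time < time_range.end_time:
        raise ValueError("start_time must be before end_time")
    return time_range.start_time, time_range.end_time
```

The model itself stays permissive so a partially filled schedule can be written and loaded, and only commands that actually need the times enforce them.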


iuryt · 2025-02-14T02:19:53Z

@ammedd
I tested the code, and it seems to be working on my end, including the warning and error messages. I also ran the unit tests, which appear to be functioning properly. Let me know if I should add any other functionalities. I hope this is sufficient for you to merge so you don't have a lot of work tomorrow.

tests/test_mfp_to_yaml.py

VeckoTheGecko · 2025-02-14T12:32:35Z

src/virtualship/expedition/waypoint.py

+class Waypoint(BaseModel):
    """A Waypoint to sail to with an optional time and an optional instrument."""

    location: Location
    time: datetime | None = None
    instrument: InstrumentType | list[InstrumentType] | None = None
+
+    @field_serializer("instrument")
+    def serialize_instrument(self, instrument):
+        """Ensure InstrumentType is serialized as a string (or list of strings)."""
+        if isinstance(instrument, list):
+            return [inst.value for inst in instrument]
+        return instrument.value if instrument else None
+
+    @field_serializer("time")
+    def serialize_time(self, time):
+        """Ensure datetime is formatted properly in YAML."""
+        return time.strftime("%Y-%m-%d %H:%M:%S") if time else None


I see why this was done now: the InstrumentType serialization wasn't happening as expected. OK, let's keep this (though I'm still not sure the time serialization is needed, since times already seem to serialize fine? I'm reverting that part for the time being, but it can be introduced in a future PR).
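Why the instrument serializer matters can be shown with the standard json module as a stand-in for the YAML dumper: neither can encode a plain Enum member directly. The InstrumentType members and values below are hypothetical, taken from the instrument names used in this thread:

```python
import json
from enum import Enum


class InstrumentType(Enum):
    # Hypothetical members/values based on the instrument names in this thread.
    CTD = "CTD"
    DRIFTER = "DRIFTER"
    ARGO_FLOAT = "ARGO_FLOAT"
    XBT = "XBT"


def serialize_instrument(instrument):
    """Same logic as the field_serializer in the diff, as a plain function."""
    if isinstance(instrument, list):
        return [inst.value for inst in instrument]
    return instrument.value if instrument else None


waypoint = {"instrument": [InstrumentType.CTD, InstrumentType.XBT]}

# Dumping the raw enum members fails: json cannot encode Enum objects.
try:
    json.dumps(waypoint)
    raw_dump_failed = False
except TypeError:
    raw_dump_failed = True

# After serialization the value is a plain list of strings.
serialized = json.dumps({"instrument": serialize_instrument(waypoint["instrument"])})
```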

ammedd

Tested with a csv on my side. Works!

src/virtualship/utils.py

iuryt and others added 2 commits January 28, 2025 13:40

draft script for converting MFP CSV to YAML schedule

5c5bf9a

[pre-commit.ci] auto fixes from pre-commit.com hooks

40c7ff9

for more information, see https://pre-commit.ci

iuryt changed the title ~~draft script for converting MFP CSV to YAML schedule~~ Converting MFP CSV to YAML schedule Jan 28, 2025

iuryt commented Jan 29, 2025


scripts/coordinates_to_yaml_output.yaml (outdated, resolved)

iuryt and others added 6 commits February 3, 2025 12:31

add openpyxl

366dede

add mfp_to_yaml function

d046444

add new command to init to accept mfp file as input

e3199fa

delete files from scripts/

d9fe46a

deleted scripts files

7dc9bd7

[pre-commit.ci] auto fixes from pre-commit.com hooks

a79433c

for more information, see https://pre-commit.ci

VeckoTheGecko requested changes Feb 4, 2025


iuryt and others added 9 commits February 13, 2025 14:28

export the schedule body instead of saving file

11332f8

change name of cli param and adapt for new mfp_to_yaml function

ad54992

[pre-commit.ci] auto fixes from pre-commit.com hooks

66adb18

for more information, see https://pre-commit.ci

add warning message for time entry on yaml

2672afa

change to pydantic and change name of variables

a370641

add XBT

b87d944

accept nonetype time

eba08b8

change to Waypoint to BaseModel and add field_serializer for instrument and time

c0a52ac

[pre-commit.ci] auto fixes from pre-commit.com hooks

526d2af

for more information, see https://pre-commit.ci

iuryt commented Feb 13, 2025


iuryt added 3 commits February 13, 2025 21:16

remove restriction for version

c51043d

add checking for columns from excel file

f3daaa7

add unit tests

4c59420

[pre-commit.ci] auto fixes from pre-commit.com hooks

b67b15d

for more information, see https://pre-commit.ci

ammedd reviewed Feb 14, 2025


tests/test_mfp_to_yaml.py (resolved)

Add update comments and var naming

6f63cd4

VeckoTheGecko reviewed Feb 14, 2025


VeckoTheGecko added 2 commits February 14, 2025 13:25

Remove buffering from mfp conversion

222df85

update references to Waypoint

c94567b

VeckoTheGecko force-pushed the mfp_yaml_its branch from f065ac9 to c94567b Compare February 14, 2025 12:38

ammedd approved these changes Feb 14, 2025


iuryt commented Feb 14, 2025


src/virtualship/utils.py (outdated, resolved)

erikvansebille merged commit 0357ac4 into OceanParcels:main Feb 14, 2025
10 checks passed

iuryt commented Feb 14, 2025

This comment was marked as outdated.
