Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

File paths are os dependent #1

Open
A-Fisk opened this issue Jul 15, 2019 · 6 comments
Open

File paths are os dependent #1

A-Fisk opened this issue Jul 15, 2019 · 6 comments

Comments

@A-Fisk
Copy link

A-Fisk commented Jul 15, 2019

The file paths read in from the spreadsheets are kept in strings and we keep running into problems of getting backslashes/forwardslashes as we keep changing from unix to windows.

Worth updating to read them in as pathlib objects?

@paulbrodersen
Copy link
Owner

Yeah, totally agree.

Unfortunately, I don't have much time this week as I am presenting at our lab meeting on Wednesday and I am off for my holiday Thursday morning for two weeks. Hence: do you want to make a pull request? Otherwise, I will get to it when I am back Aug 2.

I have started writing some skeleton code for a decorator so the body of the relevant functions can ideally remain unchanged. If you look at the latest commit, I have added to data I/O functions in example_pipeline/data_io.py the following decorator:

def _handle_file_path(func):
    def func_wrapper(file_path, *args, **kwargs):
        pathlib_object = _get_pathlib_object(file_path)
        output = func(pathlib_object, *args, **kwargs)
        return output
    func_wrapper.__name__ = func.__name__
    func_wrapper.__doc__ = func.__doc__
    return func_wrapper


def _get_pathlib_object(file_path):
    return file_path

_get_pathlib_object is obviously a dud (it just returns the string without doing anything to it).
I would propose that _get_pathlib_object

  1. converts the string to a pathlib Path,
  2. checks that the path exist,
  3. returns the sanitized path based on OS or, alternatively, the Path object (if that plays nice with all the import functions).

@A-Fisk
Copy link
Author

A-Fisk commented Jul 16, 2019

I can have a look at putting something together later this week. Hopefully it won't be beyond me, I'll follow your steps and see how that goes.

This doesn't have any tests yet to see if it's working does it?

@paulbrodersen
Copy link
Owner

I have a couple of test scripts locally, but they are still in a mess and in no form to be published, and still need a lot of work to be made into a proper test suite. Most I/O functions are still missing, mostly because I have only been getting files from your lab and didn't really know what exceptions to test against.

In the meantime, example_pipeline/02_test_state_annotation.py is an ok poor man's integration test as the coverage is actually pretty good (apart from the other scripts in the example_pipeline). If 01_preprocess_signals.py passes and 02_test_state_annotation.py still gives the same accuracy values, everything is probably ok.

@paulbrodersen
Copy link
Owner

paulbrodersen commented Jul 16, 2019

On a different note, scratch point 2 in my objectives for _get_pathlib_object.
It will be counter-productive for functions that write data (as the path probably shouldn't exist) and for functions that read data, it would be better if those functions threw the error and not the generic wrapper.

So all that needs to go into the _get_pathlib_object is:

  1. Convert the string to a pathlib Path.
  2. Determine the OS (Windows/other is probably enough).
  3. Convert pathlib object to a more appropriate Path class (e.g WindowsPath or whatever it is called).
  4. Convert back to string in case some of the downstream functions don't play nice with a file handle.

The whole function should probably be renamed to _sanitize_file_path, coming to think of it.

@A-Fisk
Copy link
Author

A-Fisk commented Jul 16, 2019

Re: testing. I've come across a few problems running my scripts because I've processed them slightly differently to other people. Not sure where the best place to record them is, open another issue for them?

Re: objectives. checking if path exists.
How about checking if the path exists, if it doesn't at least check the parent exists so it can still write there?

@paulbrodersen
Copy link
Owner

Not sure where the best place to record them is, open another issue for them?

Yes, please.

How about checking if the path exists, if it doesn't at least check the parent exists so it can still write there?

Great idea. I should have thought of that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants