This repository has been archived by the owner on Jun 22, 2022. It is now read-only.

Structure of steps - ideas for making it cleaner #87

Open
stared opened this issue Jul 13, 2018 · 2 comments

Comments


stared commented Jul 13, 2018

@kamil-kaczmarek, @jakubczakon I know it is a bunch of different ideas and suggestions clustered in one issue. Let me know which of those are compatible with the current roadmap. (I am happy to contribute/collaborate on some.)

  • a default data folder (e.g. ./.steppy/step_name/), configurable if needed, but overridden only when strictly necessary
  • no input_data; it complicates things for no obvious reason!
  • names optional, automatically generated from class names + number
  • more explicit job structure (steps = Sequence([step1, step2])); vide Keras API
  • adapters inheriting from BaseTrainers, e.g. step = Rename({'a': 'aaa', 'b': 'bbb'}); vide rename in Pandas
  • how to separate persist-data vs persist-parameters? (e.g. for image preprocessing, it may be time-saving to save the processed images once)
  • built-in data tests (e.g. len(X) == len(Y)), in def test
  • built-in test if persist->load is correct (i.e. loaded data is the same as saved)
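
To make a few of these ideas concrete, here is a minimal sketch of automatic names, a default data folder, an explicit Sequence, and a Rename adapter written as an ordinary step. Everything in it is hypothetical: the Step base class, the name/root arguments, and the transform method are stand-ins invented for illustration, not the current steppy API.

```python
from collections import Counter
from pathlib import Path


class Step:
    """Hypothetical stand-in for a pipeline step (not the steppy class)."""

    _counts = Counter()  # number of instances created so far, per class name

    def __init__(self, name=None, root=".steppy"):
        cls_name = type(self).__name__
        Step._counts[cls_name] += 1
        # The name is optional: fall back to "<ClassName>_<n>".
        self.name = name or f"{cls_name}_{Step._counts[cls_name]}"
        # Default data folder derived from the name; `root` can override it.
        self.data_dir = Path(root) / self.name

    def transform(self, data):
        return data


class Rename(Step):
    """Adapter written as a step, in the spirit of rename in Pandas."""

    def __init__(self, mapping, **kwargs):
        super().__init__(**kwargs)
        self.mapping = mapping

    def transform(self, data):
        # Keys missing from the mapping are passed through unchanged.
        return {self.mapping.get(k, k): v for k, v in data.items()}


class Sequence(Step):
    """Explicit job structure: run the steps in the order given."""

    def __init__(self, steps, **kwargs):
        super().__init__(**kwargs)
        self.steps = list(steps)

    def transform(self, data):
        for step in self.steps:
            data = step.transform(data)
        return data


pipeline = Sequence([Rename({"a": "aaa", "b": "bbb"}), Rename({"aaa": "x"})])
result = pipeline.transform({"a": 1, "b": 2, "c": 3})
```

In this sketch the two adapters end up named Rename_1 and Rename_2 without any name being passed, and each step gets its own folder under ./.steppy/ by default.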

jakubczakon commented Jul 28, 2018

@stared Sorry for the late response.

Go-for-it's

  • default data folder
  • default names
  • explicit job structure. Sounds great, though I'm not sure how complicated it is to construct

Could-be's

  • dropping input_data would create a need for an input_data step, I guess. I don't mind the idea, but I gotta see it work first.
  • Rename is a good idea, but remember that there could be multiple steps with the same output key that are joined somewhere, so it will be more complicated than what you suggested. I would love to improve the adapter structure though.
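
The same-output-key concern can be illustrated with a small sketch. The step names (model_a, model_b), the y_pred key, and both merge helpers are made up for this example; prefixing keys with the step name is just one possible way around the collision, not something steppy is known to do.

```python
def merge_flat(outputs):
    """Merge step outputs into one dict; later steps overwrite earlier ones."""
    merged = {}
    for step_output in outputs.values():
        merged.update(step_output)
    return merged


def merge_namespaced(outputs):
    """Prefix every key with its step name so that nothing is overwritten."""
    return {
        f"{step_name}.{key}": value
        for step_name, step_output in outputs.items()
        for key, value in step_output.items()
    }


# Two upstream steps that both emit a "y_pred" key and are joined downstream.
outputs = {
    "model_a": {"y_pred": [0.1, 0.9]},
    "model_b": {"y_pred": [0.4, 0.6]},
}

flat = merge_flat(outputs)              # model_a's prediction is lost
namespaced = merge_namespaced(outputs)  # both predictions survive
```

A plain Rename({'y_pred': ...}) applied after the flat merge would only ever see one of the two values, which is why a join over several steps needs more than a key-to-key mapping.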

Don't-get-it's

  • persist-data is different than persist-parameters for exactly that reason
  • persist->load: can you elaborate?
  • I am always for any tests. Could you explain what you mean by those data tests?
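
One possible reading of the two test ideas, as a sketch only: a data test checks an invariant on the inputs (such as len(X) == len(Y)), and a persist->load test saves an object, loads it back, and compares the two. The helper names and the pickle-based persist/load pair below are assumptions made for illustration, not steppy functions.

```python
import pickle
import tempfile
from pathlib import Path


def check_same_length(X, y):
    """Data test: features and targets must have the same number of rows."""
    if len(X) != len(y):
        raise ValueError(f"len(X)={len(X)} but len(y)={len(y)}")


def check_round_trip(obj, persist, load):
    """Persist obj to a temporary file, load it back, and compare."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "artifact"
        persist(obj, path)
        loaded = load(path)
    if loaded != obj:
        raise ValueError("loaded object differs from the persisted one")


def persist_pickle(obj, path):
    """Example persist function: write obj to path with pickle."""
    with open(path, "wb") as f:
        pickle.dump(obj, f)


def load_pickle(path):
    """Example load function: read back what persist_pickle wrote."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Both checks pass silently on consistent data.
check_same_length([[1.0], [2.0]], [0, 1])
check_round_trip({"mean": 0.5, "std": 1.2}, persist_pickle, load_pickle)
```

The != comparison is enough for plain Python objects; arrays or fitted models would need a type-specific equality check instead.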

@kamil-kaczmarek

Update here:

  • default data folder -> implemented
  • names optional, automatically generated from class names + number -> implemented

more on the way... 😃
