Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convenience: allow specifying the corpus description json file #172

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

mtrofin
Copy link
Collaborator

@mtrofin mtrofin commented Oct 27, 2022

I found it useful sometimes to have variations of the corpus description - e.g. remove swaths of modules. This patch allows multiple descriptions to co-exist under a corpus directory, and select the desired one transparently.

Renamed the ctor argument since "data path" is more of a directory than a file, "location" covers both cases.

I found it useful sometimes to have variations of the corpus description
- e.g. remove swaths of modules. This patch allows multiple descriptions
to co-exist under a corpus directory, and select the desired one
transparently.

Renamed the argument since "data path" is more of a directory than a
file, "location" covers both.
@@ -238,7 +240,10 @@ def __init__(self,
output) and validated.

Args:
data_path: corpus directory.
location: either the path to the corpus description json file in a corpus
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it better that we only allow the input being the corpus_description path? The only concern is that it adds some migration cost...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that'd be better, but to avoid migration pain, we could stage it: this patch first; then warn about deprecation; then deprecate

we can skip the 2nd step because we don't have a release model and so when folks would notice the change is out of our control; but can work with e.g. Fuchsia to update their scripts.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the migration plan sgtm!

@@ -32,6 +32,8 @@
# command line, where all the flags reference existing, local files.
FullyQualifiedCmdLine = Tuple[str, ...]

DEFAULT_CORPUS_DESCRIPTION_FILENAME = 'corpus_description.json'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DEFAULT_CORPUS...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DEFAULT...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean adding underscore('') before DEFAULT...

github believes that ''DEFAULT... means italic the word following the '_'

@@ -225,7 +227,7 @@ class ReplaceContext:

def __init__(self,
*,
data_path: str,
location: str,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

location is a bit confusing, seems that data_path is a more informative name

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine to change to anything, but did you read the commit message (because I'm making the opposite argument there).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh i missed that part.

data_path sounds to me that can represents both file path or dir path. My major concern about the naming of location is: 1) it does not describe it is the location of what so can be a bit confusing, data_location may be more descriptive in that sense; 2) the flag name is data_path now, having different names increases the burden a little bit.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about corpus_description, since we'd subsequently go with deprecating dir-based value anyway? (probably subsequently worth updating the flag, too)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue with corpus_description is that it is a bit too specific so if in the future we have more/different corpus formatting we have to do some migration work again. Meanwhile, data_path is a general and reasonable name imo.

I'm fine with either as long as the name is the same as the flag name, so up to you :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

corpus_path (and ack on the flag). Less general than "data", reasonably ambiguous-ish to mean either dir or file, and says nothing about description or anything like that. wdyt?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

corpus_path sounds great!

@@ -238,7 +240,10 @@ def __init__(self,
output) and validated.

Args:
data_path: corpus directory.
location: either the path to the corpus description json file in a corpus
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the migration plan sgtm!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants