Suggestion: leverage data formats #2

Open
Beanow opened this issue Feb 29, 2020 · 3 comments

Beanow commented Feb 29, 2020

I noticed that the script parses the txt file by hand:

repo, extra_tags = line.strip().split(" ")
extra_tags = extra_tags.split(",")
repo = "/".join(repo.split("/")[-2:])

JSON/YAML could probably make this easier and self-documenting:

# json style
[
  {
    "owner": "organization",
    "name": "example-repo",
    "tags": ["hello", "world"]
  }
]

# yaml style
- owner: organization
  name: example-repo
  tags: [hello, world]
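
As a rough sketch of what reading the JSON form could look like on the Python side, assuming the field names `owner`, `name` and `tags` from the example above (`Repo` and `load_repos` are hypothetical names, not part of the project):

```python
import json
from dataclasses import dataclass, field


@dataclass
class Repo:
    """One entry of the JSON list; field names follow the example above."""
    owner: str
    name: str
    tags: list = field(default_factory=list)


def load_repos(text):
    """Parse a JSON list of repo records into Repo objects (hypothetical helper)."""
    return [Repo(**entry) for entry in json.loads(text)]


example = '[{"owner": "organization", "name": "example-repo", "tags": ["hello", "world"]}]'
repos = load_repos(example)
print(repos[0].owner, repos[0].name, repos[0].tags)
```

An entry with an unexpected key fails loudly here, since `Repo(**entry)` raises a `TypeError`. That is part of what makes the format self-documenting.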

Likewise, I think it would be really useful to have an option to produce data output, rather than going directly to front matter + markdown, even if only internally in the Python scripts.

Here's some pseudocode showing what I mean in terms of interfaces.

type Filename = string;
type MarkdownContent = string;

interface ScriptFunctions {
  fetchRepos(path: Filename): Repository[];
  fetchIssues(repos: Repository[]): Map<Repository, Issue[]>;
  templateIssues(data: Map<Repository, Issue[]>): Map<Filename, MarkdownContent>;
  writeMarkdownFiles(files: Map<Filename, MarkdownContent>): void;
}

This would make it trivial to support writing JSON data out instead of MD files :]
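
A minimal Python sketch of the same split, under stated assumptions: the function names mirror the pseudocode above, `lookup` is a stand-in for whatever call actually talks to the GitHub API, and the front matter fields and filenames are made up for illustration. Repositories are keyed by an `owner/name` string because Python dicts need hashable keys.

```python
import json


def fetch_issues(repos, lookup):
    """Collect issues per repo; `lookup` stands in for the real GitHub API call."""
    return {f"{r['owner']}/{r['name']}": lookup(r) for r in repos}


def template_issues(data):
    """Render one markdown document per issue, keyed by a made-up filename."""
    files = {}
    for repo, issues in data.items():
        for issue in issues:
            filename = f"{repo.replace('/', '-')}-{issue['number']}.md"
            files[filename] = f"---\ntitle: {issue['title']}\n---\n"
    return files


def to_json(data):
    """Alternative final stage: emit the same intermediate data as JSON."""
    return json.dumps(data, indent=2, sort_keys=True)


def fake_lookup(repo):
    # Hard-coded sample data so the sketch runs without network access.
    return [{"number": 1, "title": "Fix typo"}]


repos = [{"owner": "organization", "name": "example-repo", "tags": ["hello"]}]
data = fetch_issues(repos, fake_lookup)
print(sorted(template_issues(data)))
print(to_json(data))
```

Because `template_issues` and `to_json` both consume the same intermediate `data`, choosing the output format becomes a decision about the last stage only.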

Contributor

vsoch commented Feb 29, 2020

I chose the input "format" for an audience of academic and data science groups, many of whom prefer formats like text / csv over json (and definitely over yaml). People will most readily copy-paste a GitHub URL (which already includes the organization and repository name), and after that we only need a list of tags, so a single URL followed by comma-delimited tags is a simple and logical solution. I think the formats you shared would be reasonable if more than that were needed.
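
For reference, a sketch of a slightly more forgiving parser for that line format. It is an illustration only, not the project's code: `parse_line` is a hypothetical name, and treating the tags as optional and tolerating extra whitespace are assumptions that go beyond what the current script accepts.

```python
def parse_line(line):
    """Split 'URL tag1,tag2' into ('owner/repo', [tags]); tags are optional here."""
    parts = line.strip().split(maxsplit=1)
    if not parts:
        raise ValueError("empty line")
    # Keep only the last two path segments of the URL: owner and repository.
    repo = "/".join(parts[0].rstrip("/").split("/")[-2:])
    tags = []
    if len(parts) > 1:
        tags = [tag.strip() for tag in parts[1].split(",") if tag.strip()]
    return repo, tags


print(parse_line("https://github.com/organization/example-repo hello,world"))
```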

What kind of use cases do you have in mind for just data output?

Author

Beanow commented Feb 29, 2020

Writing yaml is already a requirement for setting up the GH action.
https://github.com/rseng/good-first-issues#example-usage

Copy-pasting URLs rather than pulling them apart is a fair point. I'm interpreting that as a deliberate choice to prioritize being simple to update over being self-documenting.

You could keep that same priority with something very close to the custom txt format you're using that is still valid JSON/YAML.

# yaml style, closest to txt format
https://github.com/organization/example-repo: [hello, world]

# json style
{
  "https://github.com/organization/example-repo": ["hello", "world"]
}

# yaml alternative, with some empty lines would be nice to read
https://github.com/organization/example-repo:
  - hello
  - world
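
A sketch of how the URL-keyed JSON variant could be read, assuming the shape shown above (`repos_from_mapping` is a hypothetical helper). The YAML variants would load into the same dict shape with a third-party parser such as PyYAML's `yaml.safe_load`, which is left out here to keep the example standard-library only.

```python
import json
from urllib.parse import urlparse


def repos_from_mapping(text):
    """Turn {"<repo url>": [tags]} into a list of owner/name/tags records."""
    result = []
    for url, tags in json.loads(text).items():
        # The URL path looks like "/organization/example-repo".
        owner, name = urlparse(url).path.strip("/").split("/")[:2]
        result.append({"owner": owner, "name": name, "tags": list(tags)})
    return result


example = '{"https://github.com/organization/example-repo": ["hello", "world"]}'
print(repos_from_mapping(example))
```

The URL stays copy-pasteable for whoever edits the file, and the script still ends up with separate owner and name fields.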

> What kind of use cases do you have in mind for just data output?

Raw data in a simple, well-supported format like JSON is by far the best interoperability choice you could make 😄

Take the other issue: #3
Q: Why not just scrape the HTML to get this data, instead of waiting for an API?
A: Because HTML (and XML for that matter) is a horrid and unwieldy data format.
In fact, it's unfair to call it a data format, because it's a markup language. Same as markdown.

Having the scores.json format allowed me to create SourceCred widgets before I had a clue how any of the SourceCred code worked or what sort of APIs or tooling it had.

Likewise, I think giving the option to output JSON allows people to use this action as input for whatever else they can come up with. Maybe they would like to load the JSON into some JavaScript browser tool rather than static-rendering. Maybe they'll load it into their own script and add bounties / cred scores for these issues. Maybe they want to store it in a database.

Whatever the use case is, having "just data" will make it significantly easier to do.
That said, I don't have a use case myself right now, so I would suggest this more as a best practice than a feature request. 🙂

Contributor

vsoch commented Feb 29, 2020

> Likewise, I think giving the option to output JSON allows people to use this action as input for whatever else they can come up with. Maybe they would like to load the JSON into some JavaScript browser tool rather than static-rendering. Maybe they'll load it into their own script and add bounties / cred scores for these issues. Maybe they want to store it in a database.

I am in total support! I would suggest that we wait until someone actually asks for this before designing something that nobody has requested.
