stylo(): derive label coloring from metadata table instead of filename #21

Open
christofs opened this issue Dec 18, 2018 · 2 comments

Comments

@christofs

This is a feature request that has been on my mind for a while. It would be really neat if the stylo() function could pull information used to color the dendrogram from a metadata table instead of from the filename.

Maybe the default GUI option could remain as it is, but CLI options could be used to specify (a) that the coloring should be derived from a CSV file with metadata, (b) the path to the metadata file, and (c) the column to be used for coloring. The metadata table would need to have a column called "filenames" (or something along these lines) that contains the filenames actually used, so that each metadata row can be mapped to its file.

You could dispense with (b) if a conventional filename, e.g. "metadata.csv", and a conventional file location, e.g. the current working directory, were assumed instead.
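To make the proposed mapping concrete, here is a minimal sketch of the lookup logic, written in Python purely as illustration. It is not the stylo API: the function names, the palette, and the sample rows are all hypothetical. Only the "filenames" column and the idea of a selectable grouping column come from the request above.

```python
import csv
import io


def groups_from_metadata(csv_file, corpus_files, grouping_column,
                         filename_column="filenames"):
    """Map each corpus filename to its value in grouping_column.

    csv_file is any open text file object (hypothetical helper, not stylo).
    """
    reader = csv.DictReader(csv_file)
    lookup = {row[filename_column]: row[grouping_column] for row in reader}
    # Fail loudly if a corpus file has no metadata row.
    missing = [name for name in corpus_files if name not in lookup]
    if missing:
        raise KeyError(f"no metadata row for: {missing}")
    return {name: lookup[name] for name in corpus_files}


def colors_for_groups(groups, palette=("red", "blue", "green", "orange")):
    """Assign one palette color per distinct group, in order of first appearance."""
    distinct = list(dict.fromkeys(groups.values()))
    color_of = {g: palette[i % len(palette)] for i, g in enumerate(distinct)}
    return {name: color_of[g] for name, g in groups.items()}


# Invented sample data standing in for a "metadata.csv" file.
metadata = io.StringIO(
    "filenames,author,genre\n"
    "text_a.txt,Austen,novel\n"
    "text_b.txt,Austen,letters\n"
    "text_c.txt,Dickens,novel\n"
)
corpus = ["text_a.txt", "text_b.txt", "text_c.txt"]

by_author = colors_for_groups(groups_from_metadata(metadata, corpus, "author"))
metadata.seek(0)  # rewind so the same table can be read again
by_genre = colors_for_groups(groups_from_metadata(metadata, corpus, "genre"))
```

Switching the coloring criterion only changes the `grouping_column` argument; the corpus files and their names stay untouched.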

The advantage would be that, for a given stylometric analysis, it becomes much easier to switch between colorings of different types (e.g. author vs. genre vs. decade of publication) and to compare the clusterings against various factors that may influence text similarity. There would be no need to keep multiple copies of the same corpus that differ only in their filenames, and no need to calculate the text similarity multiple times.

@pielstroem
Contributor

I'm working on it. We are currently preparing a prototype of a brand-new Shiny GUI for the stylo() function, and need some metadata handling for that anyway.

@pielstroem pielstroem mentioned this issue Dec 20, 2018
@christofs
Author

Thanks so much, it is a great feature. One more follow-up wish: it would be wonderful if the filename of the resulting dendrogram could include the value of "grouping.column" as a suffix after the other parameters. That way, one could use several different grouping criteria without the dendrogram files overwriting each other.
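As a rough illustration of the naming idea, the sketch below appends the grouping column to the other parts of an output filename. It is written in Python and is not stylo code. The function name, the underscore separator, the file extension, and the example parameter strings are assumptions made up for this example and do not reflect stylo's actual naming scheme.

```python
def dendrogram_filename(parameter_parts, grouping_column, extension="png"):
    """Join the existing name parts and append the grouping column as a suffix.

    Hypothetical helper: separator and extension are illustrative choices.
    """
    parts = list(parameter_parts) + [grouping_column]
    # Replace spaces so each part stays a single filename token.
    safe = [str(part).strip().replace(" ", "-") for part in parts]
    return "_".join(safe) + "." + extension


# Invented parameter strings for one analysis, rendered with two groupings.
params = ["mycorpus", "CA", "100-MFWs"]
name_author = dendrogram_filename(params, "author")
name_genre = dendrogram_filename(params, "genre")
```

Because the two names differ only in the suffix, plots of the same analysis colored by different criteria can sit side by side in one output directory.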
