stylo(): derive label coloring from metadata table instead of filename #21

christofs · 2018-12-18T08:40:14Z

This is a feature request that has been on my mind for a while. It would be really neat if the stylo() function could pull information used to color the dendrogram from a metadata table instead of from the filename.

Maybe the default GUI option could remain as it is, but CLI options could be used to specify (a) that the coloring should be derived from a CSV file with metadata, (b) the path to the metadata file, and (c) the column to be used for coloring. The metadata table would need to have a column called "filenames" (or something along these lines) that contains the filenames actually used, so that the metadata can be mapped to the actual files.

You could dispense with (b) if a conventional filename, e.g. "metadata.csv", and a conventional file location, e.g. the current working directory, were assumed instead.

The advantage of this would be that it becomes much easier to switch, for a given stylometric analysis, between colorings of different types (e.g. author vs. genre vs. decade of publication) in order to compare the clusterings to various potential factors influencing text similarity. There would be no need to keep multiple copies of the same corpus that differ only in their filenames, and no need to calculate the text similarity multiple times.
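To make the proposal concrete, here is a minimal sketch of the mapping step. It is written in Python purely as an illustration and is not stylo's R code or API. Only the "filenames" column name comes from the request above; the sample rows, the palette, and the helper names `load_metadata` and `colors_for` are assumptions made up for this example.

```python
import csv
import io

# Hypothetical metadata table. Only the "filenames" column is taken from
# the proposal; the rows and the other columns are invented sample data.
METADATA_CSV = """filenames,author,genre,decade
austen_emma.txt,Austen,novel,1810
austen_persuasion.txt,Austen,novel,1810
shelley_frankenstein.txt,Shelley,gothic,1810
stoker_dracula.txt,Stoker,gothic,1890
"""

# Arbitrary palette, reused cyclically if there are more groups than colors.
PALETTE = ["red", "blue", "green", "orange", "purple"]


def load_metadata(handle, key_column="filenames"):
    """Return {filename: row} for a CSV that has a filename column."""
    return {row[key_column]: row for row in csv.DictReader(handle)}


def colors_for(labels, metadata, grouping_column):
    """Map each dendrogram label to a color via the chosen metadata column."""
    missing = [label for label in labels if label not in metadata]
    if missing:
        raise KeyError(f"no metadata row for: {missing}")
    # Assign colors to groups in order of first appearance.
    groups = []
    for label in labels:
        value = metadata[label][grouping_column]
        if value not in groups:
            groups.append(value)
    color_of_group = {g: PALETTE[i % len(PALETTE)] for i, g in enumerate(groups)}
    return [color_of_group[metadata[label][grouping_column]] for label in labels]


metadata = load_metadata(io.StringIO(METADATA_CSV))
labels = list(metadata)  # in practice: the filenames of the analysed corpus

# The same labels (and hence the same distance matrix) can be recolored
# by switching the column, without touching the corpus files.
by_author = colors_for(labels, metadata, "author")
by_genre = colors_for(labels, metadata, "genre")
```

In this sketch, switching from author to genre coloring only changes the column argument; the corpus and any computed distances stay untouched, which is the point of the request.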

pielstroem · 2018-12-20T11:36:02Z

I'm working on it. We are currently preparing a prototype of a brand-new Shiny GUI for the stylo() function, and we need some metadata handling for that anyway.

christofs · 2019-03-12T04:04:48Z

Thanks so much, this is a great feature. One more follow-up wish: it would be wonderful if the filename of the resulting dendrogram could have the value of "grouping.column" as a suffix after the other parameters. That way, one could use several different grouping criteria without the dendrogram files getting overwritten.
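The naming wish can be sketched as follows, again in Python as an illustration only. The function `dendrogram_filename`, the underscore-joined scheme, and the sample parameter strings are hypothetical and do not reproduce how stylo actually names its output files.

```python
def dendrogram_filename(base, parameters, grouping_column=None, extension="png"):
    """Build an output filename, appending the grouping column as a suffix.

    Hypothetical naming scheme for illustration; not stylo's actual one.
    """
    parts = [base] + list(parameters)
    if grouping_column:
        # The suffix keeps runs with different groupings from colliding.
        parts.append(grouping_column)
    return "_".join(parts) + "." + extension


params = ["100_MFWs", "Classic_Delta"]  # invented sample parameters
name_author = dendrogram_filename("corpus_CA", params, "author")
name_genre = dendrogram_filename("corpus_CA", params, "genre")
name_plain = dendrogram_filename("corpus_CA", params)
```

With a scheme like this, two runs that differ only in the grouping column produce two distinct files instead of one overwriting the other.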

pielstroem mentioned this issue Dec 20, 2018

Metadata #23

Merged
