This is a feature request that has been on my mind for a while. It would be really neat if the stylo() function could pull information used to color the dendrogram from a metadata table instead of from the filename.
Maybe the default GUI option could remain as it is, but CLI options could be used to specify (a) that the coloring should be derived from a CSV file with metadata, (b) the path to the metadata file, and (c) the column to be used for coloring. The metadata table would need a column called "filenames" (or something along these lines) that contains the filenames actually used, so that each metadata row can be mapped to its file.
You could dispense with (b) by adopting a convention instead: a fixed filename, e.g. "metadata.csv", in a fixed location, e.g. the current working directory.
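To make the request concrete, here is a minimal sketch of the mapping step in plain R. It is not stylo code: the helper `colors.from.metadata`, the example filenames, and the column names other than "filenames" are all made up for illustration, and the table is built inline where a real run would call `read.csv("metadata.csv")`.

```r
# Hypothetical metadata table; in practice this would come from
# read.csv("metadata.csv") in the current working directory.
metadata <- data.frame(
  filenames = c("austen_emma.txt", "dickens_bleak.txt", "austen_persuasion.txt"),
  author    = c("Austen", "Dickens", "Austen"),
  decade    = c("1810s", "1850s", "1810s"),
  stringsAsFactors = FALSE
)

# Filenames as loaded for the analysis (order may differ from the table).
corpus.files <- c("dickens_bleak.txt", "austen_emma.txt", "austen_persuasion.txt")

# Hypothetical helper: look up one metadata column for each corpus file
# and assign one color per distinct value in that column.
colors.from.metadata <- function(files, metadata, column) {
  if (!column %in% names(metadata)) {
    stop("No such metadata column: ", column)
  }
  idx <- match(files, metadata$filenames)
  if (anyNA(idx)) {
    stop("No metadata row for: ", paste(files[is.na(idx)], collapse = ", "))
  }
  groups <- metadata[[column]][idx]
  palette <- rainbow(length(unique(groups)))
  # Named character vector: one color per file, in the order of `files`.
  setNames(palette[match(groups, unique(groups))], files)
}

# Same corpus, two different colorings, no renaming of files.
by.author <- colors.from.metadata(corpus.files, metadata, "author")
by.decade <- colors.from.metadata(corpus.files, metadata, "decade")
```

Switching the coloring only changes the `column` argument, so the distances would not need to be recomputed.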
The advantage is that, for a given stylometric analysis, it becomes much easier to switch between colorings of different types (e.g. author vs. genre vs. decade of publication) and so compare the clustering against various factors that might influence text similarity. There would be no need to keep multiple copies of the same corpus that differ only in their filenames, or to calculate the text similarity multiple times.
Thanks so much, it is a great feature. One more follow-up wish: it would be wonderful if the filename of the resulting dendrogram could include the value of "grouping.column" as a suffix after the other parameters. That way, one could use several different grouping criteria without the dendrogram files overwriting each other.
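As a rough illustration of the naming I have in mind (the helper `dendrogram.filename` and the base name are made up for this example and do not reflect how stylo actually names its output files):

```r
# Hypothetical helper: append the grouping column to an existing base name.
dendrogram.filename <- function(base, grouping.column, ext = "png") {
  # Replace anything that is awkward in a filename with a hyphen.
  suffix <- gsub("[^A-Za-z0-9]+", "-", grouping.column)
  paste0(base, "_", suffix, ".", ext)
}

by.author.file <- dendrogram.filename("corpus_CA_100_MFWs", "author")
by.genre.file  <- dendrogram.filename("corpus_CA_100_MFWs", "genre")
```

Each grouping criterion then yields its own file, so a second run with a different column leaves the first plot in place.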