Curate sort order of facet terms #233

Open
ACharbonneau opened this issue Aug 30, 2021 · 8 comments

@ACharbonneau
Contributor

ACharbonneau commented Aug 30, 2021

Summary

We could allow the CFDE-CC to curate the presentation order of values in a facet by adding a rank value to the vocabulary tables.

For each affected vocabulary:

  1. Add a rank (integer or float?) to the vocab tracking table in the registry
    • but rank is not added to the c2m2 vocab table, i.e. rank is not controlled by the submitting DCC
    • default value can be null, meaning no preferred order for newly discovered terms
  2. Modify the catalog prep ETL to augment vocab table in the prepared catalog with rank column
    • pull ranks from registry and copy into local vocab
    • gaps occur when a submission introduces new terms and/or existing terms have not been assigned a rank number
    • apply some kind of fallback strategy to fill in gaps? (lexical order? original insertion order?)
  3. Annotate the portal UI config to use a rank-based sort order
    • lower rank number displayed before higher rank number
    • use fallback order if applicable
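A minimal sketch of steps 2 and 3, assuming the registry ranks arrive as a plain mapping of term id to rank (or null) and that lexical order of the term name is the chosen fallback. The function names `apply_ranks` and `facet_order`, the row layout, and the sample ids are hypothetical; none of this is the existing ETL code.

```python
from typing import Optional


def apply_ranks(vocab_rows: list[dict], registry_ranks: dict[str, Optional[float]]) -> list[dict]:
    """Copy registry ranks into local vocab rows; terms unknown to the registry get rank None."""
    return [dict(row, rank=registry_ranks.get(row["id"])) for row in vocab_rows]


def facet_order(vocab_rows: list[dict]) -> list[dict]:
    """Ranked terms first (lower rank first), then unranked terms in lexical order by name."""
    return sorted(
        vocab_rows,
        key=lambda r: (
            r["rank"] is None,                          # unranked terms sort after ranked ones
            r["rank"] if r["rank"] is not None else 0,  # placeholder keeps the key comparable
            r["name"].lower(),                          # fallback: lexical order
        ),
    )


rows = [
    {"id": "T:3", "name": "liver"},
    {"id": "T:1", "name": "heart"},
    {"id": "T:4", "name": "brain"},   # new term, not yet in the registry
    {"id": "T:2", "name": "kidney"},  # in the registry but never assigned a rank
]
ranks = {"T:1": 20, "T:2": None, "T:3": 10}

ordered = [r["name"] for r in facet_order(apply_ranks(rows, ranks))]
print(ordered)  # ['liver', 'heart', 'brain', 'kidney']
```

Putting unranked terms after ranked ones is only one possible gap policy; original insertion order would work the same way with a different final key.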

This would allow for an out-of-band process by a CFDE-CC member to curate the ranks in the registry, affecting the ordering behavior for new catalogs built after those registry updates.

  • Manual adjustments could be made via Chaise registry UI
  • Simple CLI could be provided to do bulk curation via TSV or JSON files?
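As a rough illustration of the bulk-curation idea, the parsing half of such a CLI might look like the sketch below. It assumes a TSV with `id` and `rank` columns, where a blank rank means "no preferred order"; the column names, the function `load_rank_updates`, and the example ids are hypothetical, and the step that writes the values to the registry is deliberately left out.

```python
import csv
import io
from typing import Optional


def load_rank_updates(tsv_file) -> dict[str, Optional[float]]:
    """Parse a TSV with 'id' and 'rank' columns; a blank rank becomes None."""
    updates = {}
    # start=2 because line 1 of the file is the header row
    for line_no, row in enumerate(csv.DictReader(tsv_file, delimiter="\t"), start=2):
        raw = (row.get("rank") or "").strip()
        try:
            updates[row["id"]] = float(raw) if raw else None
        except ValueError:
            raise ValueError(f"line {line_no}: rank {raw!r} is not a number")
    return updates


sample = io.StringIO("id\trank\nEX:1\t1\nEX:2\t2.5\nEX:3\t\n")
updates = load_rank_updates(sample)
print(updates)  # {'EX:1': 1.0, 'EX:2': 2.5, 'EX:3': None}
```

Validating the whole file before touching the registry would let a curator fix a bad row without leaving a half-applied update.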

Original issue text

  • What is the current sort order?
  • Is it possible to (globally) change the sort order to something else? If so, what options do we have?
  • Would it ever be possible to sort on something more dynamic, like putting the most popular ones towards the top (based on some kind of assessment I did, and made a list of, for instance)

This is for sure not a thing I am asking you to do for Q1

@karlcz
Contributor

karlcz commented Sep 1, 2021

I think it is defaulting to sort by an internal numerical id which is essentially a proxy for an insertion order, which isn't particularly meaningful.

We could configure a preferred sort for each vocabulary table, e.g. lexicographic sort by term name. We could also consider augmenting a vocabulary table with a "rank" column to add an integer (or floating point!) ordinal value to use as the sort order. This would then require a process for populating the rank:

  • ontology group could provide rank info extracted from ontology somehow (a global/static property)
  • some kind of ETL could determine it during catalog preparation (a per-inventory property)

I'm not sure where your idea of an assessment would fit into this. Is it my second option (something we can code into the catalog building process) or does it imply a new async workflow to allow an external process (or human) to run assessments of input data and supply an extra mapping of term-rank values that would be incorporated into the catalog build... I hope not the latter, as it sounds too cumbersome in practice.
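On the integer versus floating point question, one possible argument for floats (an assumption here, not something decided in this thread) is that a term can be slotted between two existing ranks without renumbering the others. A small sketch with a hypothetical helper `rank_between` and made-up terms:

```python
def rank_between(before: float, after: float) -> float:
    """Return a rank that sorts strictly between two existing ranks."""
    return (before + after) / 2


ranks = {"A": 1.0, "B": 2.0}

# Insert a new term between A and B without touching their ranks.
ranks["C"] = rank_between(ranks["A"], ranks["B"])

order = sorted(ranks, key=ranks.get)
print(order)  # ['A', 'C', 'B']
```

With integer ranks the same effect needs either gaps left on purpose (10, 20, 30) or a renumbering pass.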

@karlcz
Contributor

karlcz commented Sep 1, 2021

For completeness, I should add that we have a third concept in Chaise but it is only supported for scalar facets (which we aren't using much), and I think it is also inadvisable due to our scalability concerns...

That is to dynamically compute a number-of-occurrences for each value and sort the facet choices to show the most frequently used values on top. This adapts on the fly to the currently matching result set. But, it makes more expensive queries against the service to do so, which will scale poorly with the table size.
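To make the cost concrete, here is an in-memory illustration of occurrence-based ordering. It is not how Chaise implements it; the function `order_by_frequency` and the sample values are hypothetical. The point is that every matching row has to be counted before the facet choices can be ordered.

```python
from collections import Counter


def order_by_frequency(matching_values: list[str]) -> list[str]:
    """Return distinct facet values, most frequent first; ties broken alphabetically."""
    counts = Counter(matching_values)  # one pass over every matching row
    return sorted(counts, key=lambda v: (-counts[v], v))


# One entry per row in the currently matching result set.
result_set = ["blood", "liver", "blood", "brain", "liver", "blood"]
choices = order_by_frequency(result_set)
print(choices)  # ['blood', 'liver', 'brain']
```

Because the counts depend on the current result set, they have to be recomputed whenever the filters change, which is where the scaling concern comes from.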

@mgiglio99

I'm wondering if it makes sense to use the slim terms as a structure around which to hang the non-slim terms. We could order the non-slim terms based on slim category first, then alphabetically. This would work now for uberon.
For other ontologies, until we get slims for those, we can simply list them alphabetically. Doing something to infer order based on the ontology structure would be great, but will likely take a good bit of work. Will be better to wait for slims.
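A rough sketch of the slim-first ordering, assuming each term maps to at most one slim category and that terms without a slim are listed last, alphabetically. The function `order_by_slim` is hypothetical and the mapping below is made up for illustration; it is not real uberon slim data.

```python
def order_by_slim(terms: list[str], slim_of: dict[str, str]) -> list[str]:
    """Sort by slim category name, then by term name; terms without a slim go last."""
    def key(term: str):
        slim = slim_of.get(term)
        return (slim is None, slim or "", term)
    return sorted(terms, key=key)


# Illustrative term -> slim mapping (assumed, single slim per term).
slim_of = {"left ventricle": "heart", "aorta": "blood vessel", "cortex": "brain"}
terms = ["left ventricle", "cortex", "unmapped term", "aorta"]

ordered = order_by_slim(terms, slim_of)
print(ordered)  # ['aorta', 'cortex', 'left ventricle', 'unmapped term']
```

For a vocabulary with no slims at all the mapping is empty and the result is a plain alphabetical list, which matches the interim behavior suggested above.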

@karlcz
Contributor

karlcz commented Sep 2, 2021 via email

@mgiglio99

Yes, forgot about those.
They can't show up in the list twice under both slim terms?

@karlcz
Contributor

karlcz commented Sep 3, 2021 via email

@ACharbonneau
Contributor Author

What I meant by assessment was mostly an order I pick for some reason and that we update maybe quarterly, specifically so that it does not dynamically reorder by query.

@karlcz
Contributor

karlcz commented Sep 3, 2021 via email

@karlcz karlcz changed the title Questions about sort order of facet terms Curate sort order of facet terms Dec 3, 2021