Curate sort order of facet terms #233

ACharbonneau · 2021-08-30T20:23:43Z

Summary

We could allow the CFDE-CC to curate the presentation order of values in a facet by adding a rank value to the vocabulary tables.

For each affected vocabulary:

1. Add a rank (integer or float?) to the vocab tracking table in the registry
   - but rank is not added to the c2m2 vocab table, i.e. rank is not controlled by the submitting DCC
   - default value can be null, meaning no preferred order for newly discovered terms
2. Modify the catalog prep ETL to augment the vocab table in the prepared catalog with a rank column
   - pull ranks from the registry and copy them into the local vocab
   - gaps occur when a submission introduces new terms and/or existing terms have not been assigned a rank number
   - apply some kind of fallback strategy to fill in gaps? (lexical order? original insertion order?)
3. Annotate the portal UI config to use a rank-based sort order
   - lower rank number displayed before higher rank number
   - use fallback order if applicable
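
As a rough illustration of the ordering rule in the steps above, here is a minimal Python sketch. It assumes each term is a dict with hypothetical `name` and `rank` keys (not the actual registry or catalog schema) and picks lexical order by name as the fallback for unranked terms, which is only one of the fallback options floated above.

```python
from typing import Dict, List


def sort_terms(terms: List[Dict]) -> List[Dict]:
    """Order terms by curated rank; unranked terms follow in lexical order."""

    def key(term: Dict):
        rank = term.get("rank")  # None (or missing) means "no curated rank"
        # Ranked terms sort first (False < True), then by ascending rank,
        # then by lowercased name as tie-breaker and as the fallback order.
        return (rank is None, rank if rank is not None else 0, term["name"].lower())

    return sorted(terms, key=key)


example_terms = [
    {"name": "liver", "rank": 2},
    {"name": "brain", "rank": 1},
    {"name": "zebra", "rank": None},
    {"name": "aorta"},
]
sorted_names = [t["name"] for t in sort_terms(example_terms)]
```

Here `sorted_names` places the two ranked terms first (lower rank first) and the two unranked terms after them alphabetically.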

This would allow for an out-of-band process by a CFDE-CC member to curate the ranks in the registry, affecting the ordering behavior for new catalogs built after those registry updates.

- Manual adjustments could be made via the Chaise registry UI
- A simple CLI could be provided to do bulk curation via TSV or JSON files?
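
A sketch of what the TSV side of such a bulk-curation tool might parse, assuming a hypothetical two-column file with `id` and `rank` headers (the real file layout is undecided here). An empty rank cell is read as "no preferred order", matching the null default proposed above. Writing the values back to the registry is out of scope for this sketch.

```python
import csv
import io
from typing import Dict, Optional


def load_ranks(tsv_text: str) -> Dict[str, Optional[float]]:
    """Parse 'id<TAB>rank' rows into a term-id -> rank mapping."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    ranks: Dict[str, Optional[float]] = {}
    for row in reader:
        raw = (row.get("rank") or "").strip()
        # Blank rank cells become None, i.e. the term stays unranked.
        ranks[row["id"]] = float(raw) if raw else None
    return ranks


# Placeholder term ids, for illustration only.
example_tsv = "id\trank\nEX:0001\t1\nEX:0002\t2.5\nEX:0003\t\n"
example_ranks = load_ranks(example_tsv)
```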

Original issue text

What is the current sort order?
Is it possible to (globally) change the sort order to something else? If so, what options do we have?
Would it ever be possible to sort on something more dynamic, like putting the most popular ones towards the top (based on some kind of assessment I did, and made a list of, for instance)?

This is for sure not a thing I am asking you to do for Q1


karlcz · 2021-09-01T19:27:32Z

I think it is defaulting to sort by an internal numerical id which is essentially a proxy for an insertion order, which isn't particularly meaningful.

We could configure a preferred sort for each vocabulary table, e.g. lexicographic sort by term name. We could also consider augmenting a vocabulary table with a "rank" column to add an integer (or floating point!) ordinal value to use as the sort order. This would then require a process for populating the rank:

- ontology group could provide rank info extracted from ontology somehow (a global/static property)
- some kind of ETL could determine it during catalog preparation (a per-inventory property)

I'm not sure where your idea of an assessment would fit into this. Is it my second option (something we can code into the catalog building process) or does it imply a new async workflow to allow an external process (or human) to run assessments of input data and supply an extra mapping of term-rank values that would be incorporated into the catalog build... I hope not the latter, as it sounds too cumbersome in practice.

karlcz · 2021-09-01T19:32:16Z

For completeness, I should add that we have a third concept in Chaise but it is only supported for scalar facets (which we aren't using much), and I think it is also inadvisable due to our scalability concerns...

That is to dynamically compute a number-of-occurrences for each value and sort the facet choices to show the most frequently used values on top. This adapts on the fly to the currently matching result set. But, it makes more expensive queries against the service to do so, which will scale poorly with the table size.
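
To make the cost concrete, here is an in-memory Python sketch of the idea. It is not how Chaise implements it (Chaise issues queries against the service); the function name and input shape are assumptions for illustration. Every value in the current result set has to be counted before any facet choice can be placed, which is why the work grows with table size.

```python
from collections import Counter
from typing import Iterable, List


def order_by_frequency(values: Iterable[str]) -> List[str]:
    """Return distinct values, most frequently occurring first."""
    counts = Counter(values)  # one pass over the whole matching result set
    # Negative count gives descending frequency; the value itself breaks ties
    # so that the order is deterministic.
    return sorted(counts, key=lambda v: (-counts[v], v))


facet_order = order_by_frequency(["b", "a", "b", "c", "a", "b"])
```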

mgiglio99 · 2021-09-01T23:05:37Z

I'm wondering if it makes sense to use the slim terms as a structure around which to hang the non-slim terms. We could order the non-slim terms based on slim category first, then alphabetically. This would work now for uberon.
For other ontologies, until we get slims for those, we can simply list them alphabetically. Doing something to infer order based on the ontology structure would be great, but will likely take a good bit of work. Will be better to wait for slims.
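
A minimal Python sketch of that ordering, assuming a hypothetical `slim_of` mapping from each term name to a single slim category name. Terms with no slim are placed last in alphabetical order, which is an assumption of this sketch; with an empty mapping the whole list simply falls back to alphabetical order.

```python
from typing import Dict, List


def sort_by_slim(terms: List[str], slim_of: Dict[str, str]) -> List[str]:
    """Order terms by slim category, then alphabetically within a category."""

    def key(name: str):
        slim = slim_of.get(name)
        # Terms with a slim come first, grouped by slim name; terms without
        # one fall to the end and are ordered alphabetically.
        return (slim is None, slim or "", name.lower())

    return sorted(terms, key=key)


example_slims = {
    "aorta": "cardiovascular system",
    "heart": "cardiovascular system",
    "cortex": "nervous system",
}
slim_ordered = sort_by_slim(["femur", "cortex", "heart", "aorta"], example_slims)
```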

karlcz · 2021-09-02T00:14:30Z

One concern would be the DAG aspects, where a term maps to more than one slim term. You'd have to pick one primary category under which the fine-grained terms would sort.
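
One way to make that choice deterministic, sketched in Python under assumptions: a hypothetical curated `slim_priority` mapping says which slim wins when a term maps to several, and slims missing from it lose to any listed slim, with ties broken by name. How the priority would actually be curated is not decided here.

```python
from typing import Dict, Iterable, Optional


def primary_slim(slims: Iterable[str], slim_priority: Dict[str, int]) -> Optional[str]:
    """Pick the single slim category under which a term should sort."""
    candidates = list(slims)
    if not candidates:
        return None  # the term has no slim mapping at all
    # Lowest priority number wins; unlisted slims sort after listed ones,
    # and the slim name breaks any remaining ties.
    return min(candidates, key=lambda s: (slim_priority.get(s, float("inf")), s))


chosen = primary_slim(
    {"nervous system", "sensory system"},
    {"sensory system": 1, "nervous system": 2},
)
```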

mgiglio99 · 2021-09-03T00:28:26Z

Yes, forgot about those.
They can't show up in the list twice under both slim terms?

karlcz · 2021-09-03T01:52:00Z

No, the UI is querying an API to return a _set_ of terms connected to the results...

ACharbonneau · 2021-09-03T13:54:20Z

What I meant by assessment was mostly an order I pick for some reason, that we update maybe quarterly, to specifically not try to have it dynamically reorder by query

karlcz · 2021-09-03T16:44:21Z

Sure, that would be covered by the same mechanism as the term ranks assigned by ontologists. The hard part would be getting a human workflow where you participate in the ontology-wg to get a specific ranking assigned each quarter. The easy part would be where I modify the catalog build process to load these updated ranks, and deal with corner cases like a submission providing terms not known by the central term file w/ assigned ranks.

jrchudy assigned jrchudy and karlcz Nov 5, 2021

karlcz changed the title ~~Questions about sort order of facet terms~~ Curate sort order of facet terms Dec 3, 2021
