In order to provide a broad, configurable search and filtering interface across areas of our site, we have implemented a custom StreamField block, FilterableList, that allows a user to specify what filters are available, how to order results, and which pages should be included in the search.
The journey on how a page gets a filterable form is not necessarily a straight or simple path, but it is something that is important to know. The page class must both support the FilterableList
block within its content StreamField and must also inherit from AbstractFilterablePage
.
Pges must extend from the AbstractFilterablePage
class. This class defines several important methods, such as get_form_class
, which defines the form to use. We also have some methods that retrieve relevant information for the form to use, such as get_filterable_root
and get_filterable_queryset
. The bulk of the work is done in the get_context
method, which is responsible for getting and populating the form, processing the form, and returning the results to the user.
Page classes may optionally define filterable_categories
as a list of categories used to limit search results. We can see this in action with both Newsroom (NewsroomLandingPage
) and Recent Updates (ActivityLogPage
) pages.
As of our initial release of Elasticsearch-backed filterable lists in March 2021, our filterable forms can be broken into three specific forms: FilterableListForm
, EnforcementActionsFilterableListForm
, and EventArchiveFilterForm
. The majority of our filterable lists rely on FilterableListForm
and the other two are each leveraged by a single page.
This is the base form that the vast majority of cf.gov uses for filterable lists. It defines the core fields that are visible on the form as well as functions to assist in setting initial data and sanitizing form input. The important information regarding FilterableListForm
is that it defines the function get_page_set
, which is responsible for invoking a search query.
The EnforcementActionsFilterForm
is an extension of FilterableListForm
, adding on two fields specific to Enforcement Actions, and using a refined search class to provide search functionality against the new fields and a proper ordering by initial filing date.
The EventArchiveFilterForm
is another extension of FilterableListForm
, the only real modification being the invocation of an event specific search class that allows us to provide filtering based on event dates rather than page publication dates.
There is currently only one type of document defined, FilterablePagesDocument
, which is based off the AbstractFilterPage
class. This document is responsible for housing data related to any of the filterable page types that extend AbstractFilterPage
, including EnforcementActionPage
, BlogPage
, EventPage
, and NewsroomPage
, to name a few. In order to get fields that are specific to a page type, such as the status list for an Enforcement Action, you use the prepare_field
function syntax. The use of get_instances_from_related
is to enforce the auto-updating of our index when changes occur to a specific page we have indexed, rather than just the relation to AbstractFilterPage
that is reflected in the database.
Search is the final piece of the puzzle, where we actually leverage Elasticsearch to filter and match documents and return them in an ordered QuerySet
. Before breaking down the search classes, it's important to discuss the current implementation from an Elasticsearch perspective to understand how we're gathering results.
The expanded search for filterable lists is using a multi-match query across the title, topic name, preview description, and content fields of all FilterablePagesDocument
s. We are leveraging a phrase_prefix matching style with a currently configured slop of 2, to allow for some looser matching restrictions. We also provide a boost score for matching to the title and topic name fields, indicated by ^10
within the code base. This boost score is to enable better ordering by relevance when desired. Search currently supports two different methods of ordering results: relevance and date published. Relevance is calculated by the Elasticsearch engine when returning results, and the date published is calculated based on page publication date. Enforcement Actions define their own ordering logic based on initial filing date.
FilterablePagesDocumentSearch
is the core search class that is used across the majority of our searching. It is invoked from FilterableListForm
. This search class defines the common structure for our search function, as well as the base logic for filtering against all common fields and logic behind our multi-match and ordering steps. The core function called from outside the class is the search
function, which properly chains all of our filter/match/sorting logic and returns the resulting list as a Django QuerySet
.
EventFilterablePagesDocumentSearch
is an extension of FilterablePagesDocumentSearch
that defines behavior specific to our future and past Events listings. The class overwrites one method from its parent, the filter_date
function, to change the behavior to filter based on fields specific to events, the start and end date of an event.
EnforcementActionsFilterForm
is an extension of FilterablePagesDocumentSearch
that exposes some additional filter logic through the apply_specific_filters
function. We also see that filter_date
and order_results
have been overwritten to leverage an Enforcement Action-specific field, initial filing date.