Datagovteam would like a utility that can search for and serve datasets that correspond to a natural language query
Anticipated/hypothesized benefits
Natural-language search will allow all users, novice and expert alike, to locate datasets and draw connections between them that would not otherwise be possible with a traditional Solr query.
Measurements/metrics
measure the adoption of the tool
measure the quality of the results
compare the use of the tool against the use of the standard Solr-backed search
Knowing a little bit more about LLMs now, I would ask the team two questions, which may redefine the scope of this ticket:
What is the intent for the chatbot? If I ask a question about "what state has the most potholes", will it give me the answer? Or will it just point me to the source that it thinks has the answer?
With knowledge of a catalog of data that's constantly changing, do you really want to train an LLM or just employ a more informed LLM? There are different methods to do the latter.
I would suggest a Retrieval-Augmented Generation (RAG) approach, which takes a set of search results and passes them to an LLM to process. As it gets more information from the user, it can pull different data from the DB and give a more specialized response. It's extremely lightweight to use an off-the-shelf open-source model and ask it a question while passing it information to interpret.
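To make the RAG idea concrete, here is a minimal sketch of the retrieve-then-prompt loop. Everything in it is an assumption for illustration: the `CATALOG` records are made up, `search` is a toy keyword ranker standing in for whatever search backend data.gov ends up exposing, and `llm` is any callable that maps a prompt string to a reply (an off-the-shelf model would be plugged in there).

```python
import re
from typing import Callable

# Hypothetical in-memory records standing in for real search results.
CATALOG = [
    {"title": "Pothole Repair Requests", "notes": "Reported potholes by city and state."},
    {"title": "National Bridge Inventory", "notes": "Condition ratings for highway bridges."},
    {"title": "Airline On-Time Performance", "notes": "Flight delays by carrier and airport."},
]


def tokens(text: str) -> set[str]:
    """Lowercase a string and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search(query: str, limit: int = 2) -> list[dict]:
    """Toy retrieval: rank records by the number of words shared with the query."""
    query_words = tokens(query)

    def score(record: dict) -> int:
        return len(query_words & tokens(record["title"] + " " + record["notes"]))

    ranked = sorted(CATALOG, key=score, reverse=True)
    return [record for record in ranked if score(record) > 0][:limit]


def build_prompt(question: str, results: list[dict]) -> str:
    """Put the retrieved records in front of the question so the model can use them."""
    context = "\n".join(f"- {r['title']}: {r['notes']}" for r in results)
    return (
        "Answer using only the datasets listed below. "
        "If none of them covers the question, say so.\n\n"
        f"Datasets:\n{context}\n\n"
        f"Question: {question}\n"
    )


def answer(question: str, llm: Callable[[str], str]) -> str:
    """Retrieve, build the prompt, and hand it to whichever model is plugged in."""
    return llm(build_prompt(question, search(question)))
```

Swapping the toy `search` for a real catalog query and `llm` for a real model call would not change the shape of `answer`; no training step is involved, which is the point of the approach.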
For this to be an open-source contribution (i.e. for someone like me to do it, haha), the thing I'd like to see is an API endpoint to get search results (I know it was CKAN before, but I don't know which parts of the API are still functional in the transition) and, hmm... maybe that's it?
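For reference, this is roughly what consuming such an endpoint could look like if the CKAN Action API's `package_search` call stays available. The base URL and its continued availability through the transition are assumptions, not something confirmed in this thread; the response shape (`success`, `result.results`, and the `name`/`title`/`notes` fields) follows the standard CKAN format.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the catalog still serves the CKAN Action API at this address.
BASE_URL = "https://catalog.data.gov/api/3/action/package_search"


def build_search_url(query: str, rows: int = 5, base_url: str = BASE_URL) -> str:
    """Build a package_search URL; `q` and `rows` are standard CKAN parameters."""
    return base_url + "?" + urlencode({"q": query, "rows": rows})


def parse_search_response(body: str) -> list[dict]:
    """Reduce a CKAN package_search response to the fields an LLM prompt needs."""
    payload = json.loads(body)
    if not payload.get("success"):
        raise ValueError("search request was not successful")
    return [
        {
            "name": dataset.get("name"),
            "title": dataset.get("title"),
            "notes": dataset.get("notes") or "",
        }
        for dataset in payload["result"]["results"]
    ]


def fetch_datasets(query: str, rows: int = 5) -> list[dict]:
    """Call the endpoint over the network and return the trimmed results."""
    with urlopen(build_search_url(query, rows), timeout=10) as response:
        return parse_search_response(response.read().decode("utf-8"))
```

The output of `fetch_datasets` is a plain list of title/description records, which is all a RAG prompt needs as context.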
As a side note, for generic questions, ChatGPT's web crawler probably has enough information to give a decent response 🤷♀️
Not using data.gov data,
Answer with data.gov (sounds accurate, though the data isn't there),
Asking a question that I know is better covered on data.gov,
All of these demos go to show that if the LLM just has access to the data, it can provide more informed answers; training an LLM from scratch is almost unnecessary.