A Clojure library originally created by Wit.ai for identifying semantic content in text - like time, measurements, currencies and similar.
Wit.ai decided to abandon the Clojure version in favour of a Haskell version.
See the old hosted documentation at https://duckling.wit.ai for more information.
org.msync/duckling is a Clojure library that parses text into structured data:
(parse :en$core "the first Tuesday of October")
({:dim :ordinal, :body "first", :value {:value 1}, :start 4, :end 9}
{:dim :time,
:body "the first Tuesday of October",
:value
{:type "value",
:value "2021-10-05T00:00:00.000+05:30",
:grain :day,
:values
({:type "value",
:value "2021-10-05T00:00:00.000+05:30",
:grain :day})},
:start 0,
:end 28})
For the time the hosted Clojure version by wit.ai exists, you can try it out at https://duckling.wit.ai
The original blog post announcement can be found at https://wit.ai/blog/2014/10/01/open-source-parser-duckling for more context.
To use Duckling in your project, you just need two functions: `load!` to load the default configuration, and `parse` to parse a string.
(ns myproject.core
(:require [duckling.core :as p]))
(p/load!) ;; Load all languages
;; Or optionally, load only specific languages
(load! ["en" "fr"])
(parse :en$core ;; core configuration for English ; see also :fr$core, :es$core, :zh$core
"wake me up the last Monday of January 2015 at 6am"
[:time]) ;; We are interested in :time expressions only ; see also :duration, :temperature, etc.
({:dim :time,
:body "last Monday of January 2015 at 6am",
:value {:type "value",
:value "2015-01-26T06:00:00.000+05:30",
:grain :hour,
:values ()},
:start 15,
:end 49})
There are multiple languages supported. For reference, the current list looks like below
; In the duckling.core namespace
(available-languages)
#{"ar"
"da"
"de"
"en"
"es"
"et"
"fr"
"ga"
"he"
"hr"
"id"
"it"
"ja"
"ko"
"my"
"nb"
"nl"
"pl"
"pt"
"ro"
"ru"
"sv"
"tr"
"uk"
"vi"
"zh"}
Before you can use duckling, you will need to load the relevant languages’ corpuses - datasets that contain examples and rules from which duckling learns how to generalize. For example, to load English and French data, run the following
(load! ["en" "fr"])
{:en$core
(:phone-number
:number
:distance
:volume
:time
:temperature
:url
:email
:timezone
:leven-unit
:leven-product
:unit
:quantity
:amount-of-money
:ordinal
:unit-of-duration
:cycle
:duration),
:fr$core
(:phone-number
:time
:number
:distance
:volume
:temperature
:url
:email
:timezone
:leven-unit
:leven-product
:quantity
:unit
:amount-of-money
:unit-of-duration
:cycle
:duration
:ordinal)}
As you may already notice, there is support for identifying structured information on time, money, phone-numbers, temperature et al. The English language data, now available with the key :en$core, and French data with the key :fr$core
To parse a sentence, in a known language, use the parse function and the right language key. For example
(parse :en$core "Meet me at 8")
({:dim :number,
:body "8",
:value {:type "value", :value 8},
:start 11,
:end 12}
{:dim :distance,
:body "8",
:value {:type "value", :value 8},
:start 11,
:end 12,
:latent true}
{:dim :volume,
:body "8",
:value {:type "value", :value 8},
:start 11,
:end 12,
:latent true}
{:dim :temperature,
:body "8",
:value {:type "value", :value 8},
:start 11,
:end 12,
:latent true}
{:dim :time,
:body "at 8",
:value
{:type "value",
:value "2021-04-17T08:00:00.000+05:30",
:grain :hour,
:values
({:type "value",
:value "2021-04-17T08:00:00.000+05:30",
:grain :hour}
{:type "value",
:value "2021-04-17T20:00:00.000+05:30",
:grain :hour}
{:type "value",
:value "2021-04-18T08:00:00.000+05:30",
:grain :hour})},
:start 8,
:end 12})
The returned map gives multiple possible interpretations, and the caller should pick the most appropriate one. The type of the value - the dimension - is given under the :dim key. For the dimensions duckling is more confident about, there is no :latent flag. So, in the above example, :number and :time are the most confident interpretations.
If you are sure about what dimension you are looking to extract, you can specify it
(parse :en$core "Meet me at 8" [:time])
({:dim :time,
:body "at 8",
:value
{:type "value",
:value "2021-04-17T20:00:00.000+05:30",
:grain :hour,
:values
({:type "value",
:value "2021-04-17T20:00:00.000+05:30",
:grain :hour}
{:type "value",
:value "2021-04-18T08:00:00.000+05:30",
:grain :hour}
{:type "value",
:value "2021-04-18T20:00:00.000+05:30",
:grain :hour})},
:start 8,
:end 12})
Notice that the results are contextual - dependent on the time when it was called. In the above example, 8 was interpreted to be the closest times when you’d see 8 on the clock - both PM and AM, in the immediate future.
But you can also supply a context - which has a reference time to consider while parsing.
(require '[duckling.time.obj :as time])
(parse :en$core "Meet me at 8" [:time] {:reference-time (time/t 0 2020 04 01)})
({:dim :time,
:body "at 8",
:value
{:type "value",
:value "2020-04-01T08:00:00.000Z",
:grain :hour,
:values
({:type "value", :value "2020-04-01T08:00:00.000Z", :grain :hour}
{:type "value", :value "2020-04-01T20:00:00.000Z", :grain :hour}
{:type "value", :value "2020-04-02T08:00:00.000Z", :grain :hour})},
:start 8,
:end 12})
Another interesting example is the following - duckling can consider other signals, like the world tomorrow below.
(parse :en$core "Meet me tomorrow at 8" [:time] {:reference-time (time/t 0 2020 04 01)})
({:dim :time,
:body "tomorrow at 8",
:value
{:type "value",
:value "2020-04-02T08:00:00.000Z",
:grain :hour,
:values
({:type "value", :value "2020-04-02T08:00:00.000Z", :grain :hour}
{:type "value", :value "2020-04-02T20:00:00.000Z", :grain :hour})},
:start 8,
:end 21})