Skip to content
This repository has been archived by the owner on Oct 9, 2021. It is now read-only.

2017 05 22.md

arwenhutt edited this page Jun 19, 2017 · 18 revisions

LD4 Community Reconciliation Working Group Kick-off Call

Monday 22 May 2017, 1 PM Pacific / 2 AM Mountain / 3 PM Central / 4 PM Eastern / 9 PM BST

  • Moderator/Gatekeeper: Arwen & Christina
  • Notetaker: Brian Tingle
  • Connection information: Meeting Link is https://stanford.zoom.us/j/626468296 or click Details below for more connection information (calling in, mobile connections, international numbers, etc.)
Join from PC, Mac, Linux, iOS or Android: https://stanford.zoom.us/j/626468296
Or iPhone one-tap (US Toll):  +16465588656,626468296# or +14086380968,626468296#
Or Telephone:
    Dial: +1 646 558 8656 (US Toll) or +1 408 638 0968 (US Toll)
    +886 277 417 473 (Taiwan Toll)
    +1 855 703 8985 (Canada Toll Free)
    Meeting ID: 626 468 296
    International numbers available: https://stanford.zoom.us/zoomconference?m=YRhOncnPpiwtSPc6hqNZ2vbFjE3dZ_xx
Or an H.323/SIP room system:
    H.323: 
        162.255.37.11 (US West)
        162.255.36.11 (US East)
        221.122.88.195 (China)
        115.114.131.7 (India)
        213.19.144.110 (EMEA)
        202.177.207.158 (Australia)
        209.9.211.110 (Hong Kong)
    Meeting ID: 626 468 296
    SIP: [email protected]

Attendees

Agenda

  • Quick Introductions of Members
  • Introduction to this Group
  • Recon WG Work Areas
    • Review, Add, & Discuss
      • List of existing reconciliation tools & a comparison of them
      • Identify common reconciliation use cases
      • Develop flexible (modular perhaps?) abstract workflow models for use cases
      • Create functional requirements for shared workflow components
      • Set of leveled (i.e. simplest to complex) reconciliation and matching algorithms for common reconciliation cases & data types (i.e. going from label look-ups based off various string matching algorithms to contextual data matching based off of string and other data type matching algorithms for multiple fields attached to a resource)
      • Work on light-weight tools for certain needs listed above to make proof of concepts or support intermediate work & review
      • (Meta) Model for LD4 / Other blackbox LODLAM efforts in community engagement & work
    • Select / Prioritize
    • Start of Work Planning
  • Recon WG Logistics
    • Calls
    • Communication (outside of calls)
    • Working Space
  • Questions, Comments, Issues
  • Confirm Next Call

Notes

Intros:

  • C. Harlow
  • A. Hutt (common workflows)
  • A. Soroka (data repository, get more from links, dicipline specific vocabularies, large scale database for reuse)
  • B. Tingle (here to listen)
  • C. Rissmeyer (community solutions / tools / workflows / work together to leverage linked data)
  • G. Reser (make day to day work more efficient, gnarly issues)
  • J. Hardesty (avalon / hydra-fedora ; focus in on migration (ie Fedora 3 to Fedora 4) / automation in migration workflow)
  • K. Hess (experience with matching at LoC)
  • plisks (susanne) deals with weird data streams
  • R. Sanderson (NER/strings-to-thing or thing to thing -- wants services that resolve identity w/o NER; work together vs do your own thing, no silo)
  • R. Johnson (enity resolution with scripts and local services / looking to make that less siloed)
  • Theodore (metadata librarian, Russian Architecture linked data current project / 3 people working part time on the dataset)
  • T. Hill (gets lots URIs in Europeana, but not always sure what they mean / sharing authority files)
  • Darcy (ILS / linked data, contribute to something community based)
  • Arcadia
  • J. Greben (library systems programmer / converting records or new records)

Introduction to this Group -- one of the first LD4* community meetings, what does it mean to start moving into linked data.

scope -- will decide today

timeframe -- one year good target

frequency of call -- will decide today / or once we have a better idea of the work / once or twice a month for ~30 mins

#ld4all in stanford slack (ping harlow if you need to get on slack team)

email list? not sure yet

http://recurse.com/manual#sub-sec-social-rules <-- ground rules from recurse center

What do we want to work on?

  • list of tools
    • yelp for dataservices and datasets
  • common use cases (user stories and analysis) (Cornell has some stub use cases)
  • workflow models for use cases / what can be made generic?
    • functional requirements for workflows
  • algorithem development (MARC 650 could be resolved with xyz points / foaf agent, bibframe agent, here is a subgraph)
  • lightweight tools [- meta model for community engagement]

think of maintenance costs (ie for list of tools)

w/r/t tools, Europeana has very hetrogenous data, if it works them, will work for anyone?

reach out to dp.la?

Start with a list of tools? Use cases? Dive into algo?

Start with id-ing common use cases as a group (with short timeline, expressed in a common way / or start with general ones and make more specific)/ then split --> algo --> workflow then; come around and eval tools

any effective algo will be tied to data source / LCNAF will use a diff algo than getty ULAN (both about people, but different semantics and data structures) Elephant: library or generality // some grey area with NER algo that are not tied to data source -- what do we mean by algo? script I can run? or more abastract. A framework that uses different algos

Where do libraries have divergent needs from the general case?

Would it be helpfull to have evaluation rubric/criteria for tools as a work product (vs writing tools)?

Action Items

  • on-going: make sure work is not limited to libraries
  • Christina & Arwen: Make a WORKPLAN.md doc that people can review, comment on, etc.
  • set & start short timeline for use cases
    • Christina will look for Islandora use cases as starting point for this
    • Arwen + Christina will review set up then email group for input
    • Group: give input on use case structure then contribute use cases
  • meet again in two weeks, time TBD: Christina will set up / email out
  • look at workflows: follows use cases
  • start looking at how we perform matching: follows use cases