Cap on Observation #71

Open
wants to merge 1 commit into
base: master
12 changes: 6 additions & 6 deletions docs/data/evidence-records.md
@@ -34,15 +34,15 @@ An Action contains information about the thing that happened. This corresponds t

### Observations

Each Action contains one or more observations. Because there are a diverse range of types of input data in Event Data, different observations can be made. For example each of these is an observation:
Each Action contains one or more Observations. Because there are a diverse range of types of input data in Event Data, different Observations can be made. For example each of these is an Observation:

- The text of a tweet (which may contain plaintext DOIs).
- The automatically extracted URLs from a tweet (which could be DOIs or Article Landing Pages).
- The URL of a blog post from an RSS newsfeed (which must be visited to see the content of the blog post).

As you can see from the tweet example, it's possible to make different observations of different types about the same input. In some cases, such as Twitter, we do not have the permission to include the text, so it is removed before the Evidence Record is saved. In this case, the `sensitive` flag on the Evidence Record is set to `true` and a SHA1 hash of the content is included. This means that if you want to verify the Evidence Record you can retrieve the tweet text yourself from Twitter and compare the hash. If the hash matches, you know you were working from the same input text as the Agent.
As you can see from the tweet example, it's possible to make different Observations of different types about the same input. In some cases, such as Twitter, we do not have the permission to include the text, so it is removed before the Evidence Record is saved. In this case, the `sensitive` flag on the Evidence Record is set to `true` and a SHA1 hash of the content is included. This means that if you want to verify the Evidence Record you can retrieve the tweet text yourself from Twitter and compare the hash. If the hash matches, you know you were working from the same input text as the Agent.
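
The hash comparison described above can be sketched as follows. This is a minimal illustration, not the Agent's actual code: it assumes the hash is a hex-encoded SHA1 of the UTF-8 bytes of the text, and the function names `sha1_hex` and `matches_recorded_hash` are hypothetical. The exact encoding and any normalization applied before hashing are not stated here, so check them against a real Evidence Record.

```python
import hashlib


def sha1_hex(text: str) -> str:
    """Return the hex SHA1 digest of the text, assuming UTF-8 encoding."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def matches_recorded_hash(retrieved_text: str, recorded_hash: str) -> bool:
    """Compare text you retrieved yourself against the hash in the record.

    `recorded_hash` stands for the SHA1 value found in the Evidence Record
    (assumed here to be a hex string; case and surrounding whitespace are ignored).
    """
    return sha1_hex(retrieved_text) == recorded_hash.strip().lower()
```

If `matches_recorded_hash` returns `True` for the text you fetched, you were working from the same input as the Agent. A `False` result may mean the text differs or only that the encoding assumption above does not hold.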

The following observation types are available:
The following Observation types are available:

- `plaintext` - some text that could contain plain text DOIs, DOI URLs or landing page URLs
- `html` - some HTML that could contain plain text DOIs, DOI URLs or landing page URLs
@@ -61,11 +61,11 @@ See [Duplication and Redundancy](/data/duplication) for further discussion.

### Observations to Candidates to Matches to Events

Each observation may or may not ultimately yield Events.
Each Observation may or may not ultimately yield Events.

The first step is to create a set of Candidates for each observation. For example some `plaintext` may contain something that looks like a DOI and something that looks like an Article Landing Page URL. The webpage at the end of a `content-url` may contain something that looks like an Article Landing Page in the HTML of that page.
The first step is to create a set of Candidates for each Observation. For example some `plaintext` may contain something that looks like a DOI and something that looks like an Article Landing Page URL. The webpage at the end of a `content-url` may contain something that looks like an Article Landing Page in the HTML of that page.
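
As a rough sketch of this first step for a `plaintext` Observation, the snippet below pulls URLs out of some text and labels each one as a Candidate. It is an illustration under assumptions, not the real extraction logic: the regular expressions are simplified, the function name `find_candidates` and the dictionary shape are hypothetical, and `landing-page-url` is a placeholder label. Only `doi-url` is a Candidate type named in this document. Treating every non-DOI URL as a possible landing page is a deliberate oversimplification.

```python
import re

# Simplified patterns, for illustration only.
DOI_URL_RE = re.compile(r"https?://(?:dx\.)?doi\.org/10\.\d{4,9}/[^\s\"<>]+", re.IGNORECASE)
URL_RE = re.compile(r"https?://[^\s\"<>]+", re.IGNORECASE)


def find_candidates(plaintext: str) -> list:
    """Return a list of {"type", "value"} dicts for URLs found in the text."""
    candidates = []
    for match in URL_RE.finditer(plaintext):
        # Drop punctuation that commonly trails a URL in running text.
        url = match.group(0).rstrip(".,;)")
        if DOI_URL_RE.fullmatch(url):
            kind = "doi-url"
        else:
            kind = "landing-page-url"  # hypothetical label, not from the docs
        candidates.append({"type": kind, "value": url})
    return candidates
```

A Candidate produced this way is only something that looks like a DOI or a landing page. Whether it yields an Event depends on the later matching steps.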

The set of available candidate types are:
The set of available Candidate types are:

- `doi-url` - a full DOI URL
- `pii` - a Publication Item Identifier