Atom with embedded schema.org info #10

Closed
hober opened this issue Mar 2, 2020 · 8 comments
Comments

@hober

hober commented Mar 2, 2020

Hi,

@dbaron, @hadleybeeman, @torgo, and I looked at this in the @w3ctag Wellington F2F as requested in w3ctag/design-reviews#477. Thank you so much for bringing this to our attention.

In your explainer, you dismiss using MediaRSS because "it does not give us enough detail about whether the content is a TV show, movie as well as the structure around a TV series and seasons." Instead, you propose a new, JSON-LD-based feed format whose basic structure resembles existing feed formats like RSS and Atom, which embeds schema.org information about the media, and you propose an addition to the web app manifest to link to the feed.

Given that it's already possible to embed schema.org information like this in Atom, which is a widely-implemented feed format supported by all major podcast players, and to use <link rel=feed> to associate an Atom feed with a web app — it's not yet clear to us what the benefits of a new format would be.

In general, adding new formats to the web is something we should only ever do with caution. Rolling out support for the format in multiple implementations takes time; independent implementations of the parser, serializer, and so on can be error-prone and fail to interoperate. And getting developers to change their behavior can take a long time.

We'd like to hear your thoughts on this, particularly on what this new format might enable that the existing Atom/schema.org ecosystem can't already express. We'd like to help you achieve your aims here, but are also keen to help you avoid unnecessary frustration.

@hadleybeeman

Adding in @danbri for schema.org thoughts too.

@beccahughes
Owner

Is there more information about embedding schema.org in atom? I tried searching but could not find anything.

@hober
Author

hober commented Mar 2, 2020

Is there more information about embedding schema.org in atom? I tried searching but could not find anything.

I think you could express schema.org metadata in Atom using either RDF/XML or RDFa; here's one way to express RDFa in Atom. That said, I'm sure @danbri knows more about this than I do.
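
To make the RDFa option a little more concrete, here is a minimal sketch in Python. It assumes schema.org terms are carried as RDFa Lite attributes (vocab, typeof, property) on the XHTML content of an Atom entry. The entry fragment, its id, and the toy_rdfa_properties helper are illustrative assumptions, not an established convention, and the helper only reads attributes; it is not an RDFa processor.

```python
import xml.etree.ElementTree as ET

# Hypothetical Atom entry fragment: the RDFa Lite attributes sit on the
# XHTML <div> that Atom requires inside <content type="xhtml">.
ATOM_ENTRY = """<entry xmlns="http://www.w3.org/2005/Atom">
  <title>Women Take Center Stage</title>
  <id>urn:example:episode-1</id>
  <updated>2019-03-17T00:00:00Z</updated>
  <content type="xhtml">
    <div xmlns="http://www.w3.org/1999/xhtml"
         vocab="http://schema.org/" typeof="TVEpisode">
      <span property="name">Women Take Center Stage</span>
      <span property="genre">Reality-TV</span>
      <time property="datePublished" datetime="2019-03-17">17 March 2019</time>
    </div>
  </content>
</entry>"""


def toy_rdfa_properties(entry_xml):
    """Collect typeof/property attributes from one entry.

    Toy reader only: no nesting, no prefixes, no vocab resolution.
    """
    root = ET.fromstring(entry_xml)
    item = {}
    for element in root.iter():
        if "typeof" in element.attrib:
            item["@type"] = element.attrib["typeof"]
        prop = element.attrib.get("property")
        if prop:
            # Prefer a machine-readable datetime attribute over the text.
            text = (element.text or "").strip()
            item[prop] = element.attrib.get("datetime") or text
    return item


episode = toy_rdfa_properties(ATOM_ENTRY)
```

Even this small case needs an XML parser plus RDFa-specific attribute handling on the consumer side.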

@beccahughes
Owner

I don't think Atom/RSS is the right solution here. Browsers do not have native support for Atom/RSS + RDFa, which is going to be harder to implement than JSON-LD. JSON-LD is an extension of JSON, which is not a new format and which browsers already support pretty solidly. JSON-LD schema.org feeds are already used for server-to-server feeds and we want to be consistent with these (https://developers.google.com/actions/media/reference/feed-examples/watch-actions-examples#base_case). Nobody uses RDFa in Atom/RSS, whereas most sites that will implement Media Feeds are already providing these feeds for Media Actions. Podcasts are also not relevant, since this is focused on video.

JSON-LD is also used everywhere on the web already and is used by search engines: https://developers.google.com/search/docs/guides/intro-structured-data

There is also an open PR to get this to use a tag instead: https://github.com/beccahughes/media-feeds/pull/8/files

@danbri

danbri commented Mar 2, 2020

short version

Thanks for copying me in. Excuse the long notes. For context, I am responsible for Schema.org; I work at Google, and I was involved in the RDF and RSS efforts back in the day.

I am sympathetic to the TAG concern here, particularly if there are specific cases of a feed-reader app that wants to use this data. On balance I think we'll find the JSON-LD path much less painful. I recommend going with JSON-LD unless someone wants to put a serious amount of time into getting adoption for structured data graphs (more or less RDF) inside RSS/Atom feeds.

too-long version

The idea of having rich structured data graphs using additional vocabularies within RSS (and later Atom) feeds has been around since 1998 or so. In 2005 I put some time into the possibility of Atom and RDF having a converged syntax. This is also essentially the RSS 1.0 vision from 2000, where we hoped feeds (in RDF/XML) would be augmented with rich extension data (e.g. job search example). I would love to see these kinds of approaches to feeds come back to life, but history doesn't seem to have gone there.

a few points -

  1. the terms like TVEpisode, Organization, Movie etc from Schema.org (mostly types and properties) are essentially part of an RDF schema. This means they can very easily be used within syntaxes like JSON-LD and RDFa 1.1, which are explicitly built for encoding RDFish structured data graphs. They can also be used with Microdata, which is close enough to the same underlying approach to data. They could also be used with trivial ease in syntaxes like the near-obsolete RDF/XML, N-Triples (a line-oriented data-dump format), or Turtle; I wouldn't recommend any of those here. Any of those could be put inside Atom; but without picking an encoding, saying "put it in Atom" isn't enough.

already possible to embed schema.org information like this in Atom,

Atom lets you include information from extensions beyond Atom, but as far as I recall the group never ultimately blessed any particular extension syntax. It was much discussed at the time, since Atom evolved out of RSS, which had a long and painful history of being "kinda sorta" RDF. I am not aware of much established practice for putting RDF extension blocks inside Atom, but the choice would basically be: use an existing syntax (JSON-LD, RDFa, Microdata, Turtle, N-Triples, RDF/XML etc.), or invent one. The latter would be more work than it sounds and make nobody happy. The non-XML formats would look out of place; the XML formats would look retro. RDFa and Microdata would also look out of place, in that they're at their best when annotating existing HTML marked-up content. But maybe something could be cooked up.

There is no clear winner for a non-JSON syntax to put RDF inside Atom. Microdata from HTML5 pretty much eclipsed RDFa 1.0 in the 2011-2014 period. RDFa 1.1 (Lite) was a response to that but the rise of JSON-LD (in a search engine context at least) eclipsed both of these. At Google in Search we cautiously switched our default syntax recommendation from Microdata to JSON-LD in the light of very positive feedback from publishers/webmasters and developers who found JSON-LD much more practical and readable. It would be weird to go back to RDFa or Microdata. Similarly, elsewhere around W3C there are proposals to declare RDF/XML obsolete and rescind the REC. Whether or not that happens, it is rarely considered a modern choice for an RDF syntax.

I would also caution against trying to pick out the subset of schema.org JSON you think publishers and consumers will want and fixing the syntax to encode just that. Doing so means that the feed format maintainers will need to keep revising their spec to account for evolved schemas and new requirements; again, a lot of work. (Similarly, don't auto-generate Java code from schema.org's schema collection like some folks; you run into similar maintenance problems.)

  2. On Schema.org in JSON-LD...

While the specific use case proposed might look a bit alien and new to folk here, the combination of JSON-LD within HTML, using Schema.org for vocabulary, is very very mainstream in terms of web markup for Search. Mainstream in the sense of JSON-LD + Schema.org being on tens of millions of sites, and also commonly supported in CMS themes/addons and SEO tooling; for example see https://yoast.com/yoast-seo-13-1/

For the very specific combination of vocabulary terms (TVEpisode, TVSeason, Organization etc), and graph data shape in this proposal, it is most heavily seen in non-public feeds. At Google it is how we get information from streaming media services about their catalogues.

That said you can also see some of the same terms used in various sites, e.g. here are a few examples using the TV-related terms:

<script type="application/ld+json">{
  "@context": "http://schema.org",
  "@type": "TVEpisode",
  "url": "/title/tt9916118/",
  "name": "Women Take Center Stage",
  "image": "https://m.media-amazon.com/images/M/MV5BOTllNTE2YWUtOTU1Ni00Zjg5LWI0MmItY2U1MTUyZWMyYzZjXkEyXkFqcGdeQXVyODg3NDc1OTE@._V1_.jpg",
  "genre": "Reality-TV",
  "datePublished": "2019-03-17",
  "trailer": {
    "@type": "VideoObject",
    "name": "TOTAL BELLAS: Women Take Center Stage",
    "embedUrl": "/video/imdb/vi295615513",
    "thumbnail": {
      "@type": "ImageObject",
      "contentUrl": "https://m.media-amazon.com/images/M/MV5BY2ExNTUwYmItOGIxNi00ZjVjLTk2OTktMTZhZjZmNzc4ZTgyXkEyXkFqcGdeQXRyYW5zY29kZS13b3JrZmxvdw@@._V1_.jpg"
    },
    "thumbnailUrl": "https://m.media-amazon.com/images/M/MV5BY2ExNTUwYmItOGIxNi00ZjVjLTk2OTktMTZhZjZmNzc4ZTgyXkEyXkFqcGdeQXRyYW5zY29kZS13b3JrZmxvdw@@._V1_.jpg",
    "description": "TOTAL BELLAS: Women Take Center Stage",
    "uploadDate": "2019-04-22T20:28:59Z"
  }
}</script>

For "VideoObject" markup, see lots of sites. Every YouTube video page, for example, has (JS-injected) schema.org JSON-LD markup. Example.
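
As a rough illustration of how a consumer might read markup like the TVEpisode example above, here is a minimal Python sketch that pulls application/ld+json script blocks out of an HTML page using only the standard library. The sample page is a shortened copy of that example, and the JsonLdExtractor and extract_jsonld names are made up for this sketch. It treats each block as plain JSON and does no JSON-LD processing.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of <script type="application/ld+json">."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._chunks = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._chunks = []

    def handle_data(self, data):
        # Script text may arrive in several pieces, so buffer it.
        if self._in_jsonld:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            self.blocks.append(json.loads("".join(self._chunks)))


def extract_jsonld(html):
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.blocks


# Shortened version of the TVEpisode markup quoted above.
SAMPLE_PAGE = """<html><head>
<script type="application/ld+json">{
  "@context": "http://schema.org",
  "@type": "TVEpisode",
  "name": "Women Take Center Stage",
  "datePublished": "2019-03-17",
  "trailer": {"@type": "VideoObject", "uploadDate": "2019-04-22T20:28:59Z"}
}</script>
</head><body></body></html>"""

blocks = extract_jsonld(SAMPLE_PAGE)
```

Markup that is injected by JavaScript, as on the YouTube pages mentioned above, would not be visible to a static parse like this one.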

@danbri

danbri commented Mar 3, 2020

I realize some of my comments above are backed up by my privileged access to Google Search crawl data. For alternative evidence about the "very very mainstream" nature of JSON-LD (within HTML) markup, see also Web Data Commons, e.g. https://lists.w3.org/Archives/Public/semantic-web/2020Jan/0018.html for their most recent summary. It is from a relatively limited crawl but should give some sense for what's out there, and the trends they're seeing.

@dbaron

dbaron commented May 27, 2020

In #10 (comment) @beccahughes wrote:

Browsers do not have native support for Atom/RSS + RDFa, which is going to be harder to implement than JSON-LD. JSON-LD is an extension of JSON, which is not a new format and which browsers already support pretty solidly.

I think whether this is true depends a lot on exactly what is specified. Browsers all have support for JSON, but I'm not aware of any browsers currently having support for the RDF data model that JSON-LD represents or support for turning JSON-LD into that data model. So I think the balance here on which is more complexity to add to browsers depends a lot on whether the specification requires JSON processing or JSON-LD processing of this JSON feed file.
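
A small Python sketch of the difference between the two kinds of processing: the two documents below are intended to describe the same schema.org data in JSON-LD terms, yet a plain JSON consumer sees different keys. The toy_expand helper is a deliberately tiny stand-in that resolves top-level terms against an inline @vocab and nothing else. It is an assumption made for illustration, not the JSON-LD expansion algorithm.

```python
import json

# Same data, two spellings: short terms with a context, or absolute IRIs.
DOC_WITH_CONTEXT = json.loads("""{
  "@context": {"@vocab": "http://schema.org/"},
  "@type": "TVEpisode",
  "name": "Women Take Center Stage"
}""")

DOC_WITH_IRIS = json.loads("""{
  "@type": "http://schema.org/TVEpisode",
  "http://schema.org/name": "Women Take Center Stage"
}""")


def toy_expand(doc):
    """Toy stand-in for JSON-LD expansion.

    Handles only an inline {"@vocab": ...} context and top-level keys.
    Remote contexts, prefixes, nesting, and typed values are all ignored.
    """
    context = doc.get("@context", {})
    vocab = context.get("@vocab", "") if isinstance(context, dict) else ""

    def absolute(term):
        # Leave keywords and anything that already looks like an IRI alone.
        if term.startswith("@") or ":" in term:
            return term
        return vocab + term

    expanded = {}
    for key, value in doc.items():
        if key == "@context":
            continue
        if key == "@type":
            expanded[key] = absolute(value)
        else:
            expanded[absolute(key)] = value
    return expanded


plain_json_equal = DOC_WITH_CONTEXT == DOC_WITH_IRIS
expanded_equal = toy_expand(DOC_WITH_CONTEXT) == toy_expand(DOC_WITH_IRIS)
```

Here plain_json_equal is False and expanded_equal is True. A specification that requires only JSON processing would have to fix one spelling, while one that requires JSON-LD processing would have to accept both.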

@beccahughes
Owner

The spec is inspired by JSON-LD but browsers only need to implement JSON processing and then the added Media Feed processing on top (which is covered by the spec). It does not require the browser to implement JSON-LD or RDF.
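
A minimal Python sketch of what "JSON processing plus feed processing" could look like: parse the text as ordinary JSON, then apply feed-specific checks to fixed key names. The feed shape (a schema.org-style DataFeed with a dataFeedElement list), the set of accepted item types, and the read_feed_items helper are assumptions made for this sketch and are not taken from the Media Feeds spec.

```python
import json

# Hypothetical set of item types this consumer understands.
KNOWN_ITEM_TYPES = {"Movie", "TVSeries", "TVEpisode", "VideoObject"}


def read_feed_items(text):
    """Parse a feed as plain JSON and keep only the items we understand.

    No context resolution or RDF conversion: keys are matched literally.
    """
    feed = json.loads(text)
    if not isinstance(feed, dict) or feed.get("@type") != "DataFeed":
        raise ValueError("not a feed object")

    entries = feed.get("dataFeedElement", [])
    if not isinstance(entries, list):
        raise ValueError("dataFeedElement must be a list")

    items = []
    for entry in entries:
        if not isinstance(entry, dict):
            continue
        item_type = entry.get("@type")
        name = entry.get("name")
        # Skip unknown types and malformed entries instead of failing.
        if isinstance(item_type, str) and item_type in KNOWN_ITEM_TYPES \
                and isinstance(name, str):
            items.append({"type": item_type, "name": name})
    return items


SAMPLE_FEED = """{
  "@context": "http://schema.org",
  "@type": "DataFeed",
  "dataFeedElement": [
    {"@type": "Movie", "name": "Example Movie"},
    {"@type": "Recipe", "name": "Not a media item"},
    {"@type": "TVEpisode", "name": "Women Take Center Stage"}
  ]
}"""

items = read_feed_items(SAMPLE_FEED)
```

In this sketch the @context value is never dereferenced, so the consumer needs nothing beyond a JSON parser and the checks above.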


5 participants