Structured Data 2024 Chapter #3811

Draft
wants to merge 15 commits into base: main

Conversation

@cyberandy (Contributor) commented Oct 25, 2024

@tunetheweb tunetheweb changed the title Draft Update Structured Data Chapter for 2024 and Add Images (WIP) Structured Data 2024 Chapter for 2024 Oct 25, 2024
@@ -8,7 +8,7 @@
STATIC_DIR = ROOT_DIR + "/static"

SUPPORTED_YEARS = []
-DEFAULT_YEAR = "2022"
+DEFAULT_YEAR = "2024"
Member

Revert this before merging. Just to get it to run for now.

@@ -4,7 +4,7 @@ const convert = require('xml-js');

const { get_yearly_configs } = require('../generate/shared');

-const default_year = 2022;
+const default_year = 2024;
Member

Revert this before merging. Just to get it to run for now.

@tunetheweb tunetheweb added the writing Related to wording and content label Oct 28, 2024
@tunetheweb tunetheweb changed the title Structured Data 2024 Chapter for 2024 Structured Data 2024 Chapter Oct 30, 2024
@capjamesg capjamesg left a comment

I have left several comments across the document. If you need any clarification, let me know.

I am excited to see so much data available, and look forward to seeing the final report come together!


## The Expanding Landscape of Structured Data

Over the past 18 months, the structured data landscape has changed significantly. In 2023, Google deprecated rich results for FAQs and HowTos on its search engine results pages (SERPs). Starting in November 2024, Google will also remove the Sitelinks Search Box from search results. In parallel, however, both Google and Bing have driven a new wave of innovation and expansion in the use of structured data.

1. **New Structured Data Types**: Google introduced several new types, including Vehicle listings, Course info, Vacation Rentals, and 3D Models for products. In e-commerce, Google has also integrated loyalty programs into its structured data offerings, particularly through the Merchant Center and schema.org.
2. **Enhanced Existing Types**: Improvements to organization data, product variants, and the introduction of discount-rich results.
3. **Structured Data Carousels**: The beta launch of structured data carousels, combining ItemList with other types, opens new content presentation possibilities on Google’s SERP.
4. **GS1 Integrations**: Support has increased for GS1 standards such as the GS1 Digital Link, which aims to bridge the gap between physical and digital product information. This technology enables manufacturers and retailers to connect physical products to their digital identities through QR codes. When scanned, these codes provide access to comprehensive product information, enhancing transparency and customer engagement. In addition, the gs1:CertificationDetails property has been officially adopted by Google as schema:Certification, demonstrating how industry-specific extensions can successfully influence and integrate with schema.org standards.

I would add a point on use of / growing awareness of semantic data outside of search applications here.

Relevant resources:

While rel=me support may have been added to Mastodon a while ago (i.e. see this discussion from 2022 about using rel=me with Mastodon and a Ghost website: https://forum.ghost.org/t/verifying-mastodon-account-with-rel-me/34227), Verification is a known feature that many Mastodon users support.
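To make the two-way nature of rel=me verification concrete, here is a minimal sketch in Python. It assumes verification succeeds only when each page links to the other with `rel="me"`; the URLs, the `RelMeParser` class, and the `mutually_verified` helper are hypothetical names for illustration, and this does not reproduce the exact rules Mastodon applies (for example around redirects or HTTPS).

```python
from html.parser import HTMLParser


class RelMeParser(HTMLParser):
    """Collects href values of <a> and <link> elements whose rel includes "me"."""

    def __init__(self):
        super().__init__()
        self.me_links = set()

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attr = dict(attrs)
        # rel is a space-separated token list, so "me nofollow" still counts.
        rel_tokens = (attr.get("rel") or "").lower().split()
        href = attr.get("href")
        if "me" in rel_tokens and href:
            self.me_links.add(href.rstrip("/"))


def rel_me_links(html):
    parser = RelMeParser()
    parser.feed(html)
    return parser.me_links


def mutually_verified(url_a, html_a, url_b, html_b):
    """True only if page A links to B with rel=me and B links back to A."""
    return (url_b.rstrip("/") in rel_me_links(html_a)
            and url_a.rstrip("/") in rel_me_links(html_b))


# Hypothetical pages: a personal site and a social profile.
site_html = '<a rel="me" href="https://social.example/@alice">Mastodon</a>'
profile_html = '<a href="https://example.com/" rel="me nofollow noopener">Website</a>'

verified = mutually_verified(
    "https://example.com", site_html,
    "https://social.example/@alice", profile_html,
)
print(verified)
```

A one-way link is not enough under this assumption: if the profile page omitted its link back, `mutually_verified` would return `False`.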


## Beyond Traditional Implementation

As the structured data ecosystem matures, we're witnessing a diversification in implementation strategies:

See above comment on rel=me support, which adds to a broader theme of using structured data to enrich social web applications / two-way identity verification.


The rise of generative AI and advanced machine learning has further underscored the importance of structured data:

* **Fact Validation**: Structured data provides a reliable source for AI systems to verify information and combat misinformation.

I recommend saying "parsable" instead of "reliable":

Structured data provides a parsable source for AI systems to verify information and combat misinformation.

Data being structured doesn't mean that it is reliable; rather, it means the data can be processed by a computer more efficiently / accurately.


Linked data remains a cornerstone of structured data. We create an interconnected web of information by adding structured data to web pages and providing URI links to referenced entities. This contributes to the semantic web, where data is linked through the Resource Description Framework (RDF), enabling machines to treat web pages as databases.

The concept of semantic triples (subject-predicate-object) continues to be fundamental in expressing relationships between entities. While [SPARQL](https://en.wikipedia.org/wiki/SPARQL) remains helpful for querying RDF data, the focus has shifted towards more accessible ways of leveraging this linked data structure, such as [GraphQL](https://en.wikipedia.org/wiki/GraphQL).

I thought SPARQL and GraphQL were addressing different needs? In my mind, SPARQL is for querying graph data / triples, whereas GraphQL is a data modeling layer that can be added on top of a back-end / database.
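As a rough illustration of the subject-predicate-object idea discussed above, the following Python sketch flattens a JSON-LD-style dictionary into triples and then filters them by pattern. It is deliberately simplified and is not a conforming JSON-LD processor: there is no `@context` expansion, and every node is assumed to carry an `@id`. The `to_triples` and `match` helpers and the example.com identifiers are hypothetical. The `match` function only loosely mirrors what a single triple pattern does in a graph query language; it is not SPARQL.

```python
def to_triples(node):
    """Flatten a nested JSON-LD-style dict into (subject, predicate, object) tuples.

    Simplification: no @context expansion and no blank nodes; nested
    nodes must carry their own "@id".
    """
    subject = node["@id"]
    triples = []
    for key, value in node.items():
        if key == "@id":
            continue
        predicate = "rdf:type" if key == "@type" else key
        values = value if isinstance(value, list) else [value]
        for item in values:
            if isinstance(item, dict):
                # Link to the nested entity, then emit its own triples.
                triples.append((subject, predicate, item["@id"]))
                triples.extend(to_triples(item))
            else:
                triples.append((subject, predicate, item))
    return triples


def match(triples, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


org = {
    "@id": "https://example.com/#org",
    "@type": "Organization",
    "name": "Example Co",
    "founder": {
        "@id": "https://example.com/#alice",
        "@type": "Person",
        "name": "Alice",
    },
}

triples = to_triples(org)
for triple in triples:
    print(triple)

people = match(triples, p="rdf:type", o="Person")
```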


## JSON-LD

**JSON-LD types** continue to show diverse implementation patterns across websites, with the **WebSite** schema leading adoption at **12.73%** of mobile pages, significantly higher than other types. **Organization** and **LocalBusiness** types maintain a strong presence at **7.16%** and **3.97%**, respectively, reflecting their importance in establishing **entity identity**.

What does "diverse implementation patterns" mean? We should be explicit. For example:

JSON-LD is widely used across the websites we analyzed, with varied types of data implemented from WebSite to Organization.
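For readers wondering how per-type figures like these could be derived from a page, here is a minimal sketch that extracts `application/ld+json` blocks and tallies every `@type` it finds. This is an assumption-laden illustration using only the Python standard library, not the methodology actually used for the chapter; `JsonLdExtractor`, `iter_types`, `count_types`, and the sample page are hypothetical. Note that it counts occurrences within one page, whereas the percentages above are shares of pages.

```python
import json
from collections import Counter
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects parsed JSON-LD blocks from <script type="application/ld+json">."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.blocks.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # malformed JSON-LD is skipped in this sketch


def iter_types(value):
    """Yield every @type string found anywhere in a parsed JSON-LD value."""
    if isinstance(value, dict):
        types = value.get("@type")
        if isinstance(types, str):
            yield types
        elif isinstance(types, list):
            yield from (t for t in types if isinstance(t, str))
        for child in value.values():
            yield from iter_types(child)
    elif isinstance(value, list):
        for item in value:
            yield from iter_types(item)


def count_types(html):
    parser = JsonLdExtractor()
    parser.feed(html)
    return Counter(iter_types(parser.blocks))


page = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org",
 "@graph": [
   {"@type": "WebSite", "name": "Example"},
   {"@type": "Organization", "name": "Example Co"}
 ]}
</script>
</head></html>
"""

type_counts = count_types(page)
print(type_counts)
```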


![A year on year comparison of JSON-LD usage on mobile pages][image31]

The consistency in implementation across devices indicates a mature approach to structured data deployment, where developers are ensuring uniform markup regardless of the target platform. This alignment between mobile and desktop implementations suggests that organizations are following best practices for responsive design while maintaining consistent structured data across all user experiences.

Did HowTo/FAQ publishing rates drop after Google's policy change?


Most notably, these patterns indicate that **structured data implementation is moving beyond simple SEO markup toward creating true knowledge graphs** that can support AI-powered search experiences and rich data integrations.

![][image33]

NB: There are no images here.


The structured data landscape is rapidly evolving, marked by Google's introduction of specialized schemas for vehicles, courses, and 3D product models, alongside increased support for Digital Product Passports through GS1 Digital Link. The growing adoption of JSON-LD (now at 41% of pages) and sophisticated entity relationships through sameAs properties indicates a maturing ecosystem focused on comprehensive knowledge graph development.

The data shows a clear shift toward more specialized implementation patterns, particularly in e-commerce and local business contexts. Entity disambiguation has become increasingly critical, with organizations leveraging structured data to establish clear digital identities across platforms and knowledge bases.

"The data shows a clear shift toward more specialized implementation patterns" -- to what data does this refer? Perhaps mention that general publishing methods like FAQ/HowTo are no longer supported by Google, and that newly added methods are for more domain-specific tasks.

---

## Introduction

I am unsure where to suggest new data to analyze, but I would be curious about use of all rel= values, especially:

  • rel=me
  • rel=search

Also, as a future trend, it may be worth mentioning https://blog.joinmastodon.org/2024/07/highlighting-journalism-on-mastodon/. The feature, while marketed towards journalists, is not exclusively for journalists. It can be used by any web publisher. Here is an example of someone who publishes the fediverse:creator attribute, for reference: https://www.benji.dog/
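If it helps scope that analysis, a per-page tally of `rel` tokens and `fediverse:creator` meta tags could look roughly like the sketch below. It is only an illustration in standard-library Python, not a proposal for the actual query or custom metric; the `RelTally` class and the sample markup (including the social.example handle) are hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser


class RelTally(HTMLParser):
    """Counts rel tokens on link-like elements and collects fediverse:creator values."""

    def __init__(self):
        super().__init__()
        self.rel_counts = Counter()
        self.fediverse_creators = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag in ("a", "link", "area", "form"):
            # rel holds space-separated tokens; count each one separately.
            for token in (attr.get("rel") or "").lower().split():
                self.rel_counts[token] += 1
        elif tag == "meta" and (attr.get("name") or "").lower() == "fediverse:creator":
            if attr.get("content"):
                self.fediverse_creators.append(attr["content"])


sample = """
<head>
  <link rel="search" type="application/opensearchdescription+xml" href="/opensearch.xml">
  <link rel="me" href="https://social.example/@alice">
  <meta name="fediverse:creator" content="@alice@social.example">
</head>
<body><a rel="me noopener" href="https://social.example/@alice">Profile</a></body>
"""

tally = RelTally()
tally.feed(sample)
print(tally.rel_counts)
print(tally.fediverse_creators)
```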


## Structured Data

The landscape of structured data implementation continues to evolve, with RDFa and Open Graph maintaining their dominant positions, now reaching 66% and 64% of pages respectively. X (Twitter) meta tags have shown significant growth, appearing on 45% of pages, while JSON-LD usage has expanded to 41% of pages, reflecting its growing adoption for structured data implementation.

One of my big gripes with this report over the years has been the conflation of syntax/encoding with schema. OGP is actually RDFa. It's a vocabulary of RDFa and even specifies an RDF schema. You could encode schema.org this way (and it's weirdly possible some of Google's systems would read it, we just convert them to triples either way). Similarly, you could put OGP in JSON-LD. Yes, most consumers probably wouldn't read it, but it's orthogonal technically. Dublin Core can be encoded several ways but you're probably talking about RDFa as well.

Things that I think are syntax/encoding:

  • JSON-LD
  • Microdata
  • RDFa
  • HEAD data (non RDFa meta tag usage like Twitter cards)
  • Class metadata (if you want to give a name to whatever microformats does)

Things that I feel are vocabulary:

  • schema.org
  • OGP
  • Twitter
  • Dublin Core
  • Microformats

I'm aware that this has been done like this for a while and so there is value in some consistency with regard to over-time graphs. And I agree it would be silly to not recognize that no one is using class metadata encoding except microformats so maybe that doesn't need to be discussed separately.

But this text often compares them side by side and that's the part I feel like can be shifted a bit.
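One way to picture the split described in this comment is to record each detected item as an (encoding, vocabulary) pair rather than as a single label. The sketch below is a hedged illustration in standard-library Python: the `PREFIX_VOCAB` table, `vocab_from_url` helper, and `EncodingVocabTally` class are hypothetical, the detection rules are crude, and treating any `property` attribute as RDFa follows the framing in the comment rather than a full RDFa parse.

```python
import json
from collections import Counter
from html.parser import HTMLParser

# Hypothetical prefix-to-vocabulary table, for this sketch only.
PREFIX_VOCAB = {"og": "OGP", "twitter": "Twitter", "dc": "Dublin Core"}


def vocab_from_url(url):
    return "schema.org" if "schema.org" in (url or "") else "other"


class EncodingVocabTally(HTMLParser):
    """Tallies (encoding, vocabulary) pairs so the two axes stay separate."""

    def __init__(self):
        super().__init__()
        self.pairs = Counter()
        self._jsonld = None  # text buffer while inside a JSON-LD script

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "script" and (attr.get("type") or "").lower() == "application/ld+json":
            self._jsonld = []
        if "itemtype" in attr:  # Microdata encoding
            self.pairs[("Microdata", vocab_from_url(attr["itemtype"]))] += 1
        if attr.get("property"):  # RDFa-style property attribute
            prefix = attr["property"].split(":", 1)[0]
            self.pairs[("RDFa", PREFIX_VOCAB.get(prefix, "other"))] += 1
        elif tag == "meta" and attr.get("name"):  # plain meta name tags
            prefix = attr["name"].split(":", 1)[0]
            if prefix in PREFIX_VOCAB:
                self.pairs[("meta name", PREFIX_VOCAB[prefix])] += 1

    def handle_data(self, data):
        if self._jsonld is not None:
            self._jsonld.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._jsonld is not None:
            raw, self._jsonld = "".join(self._jsonld), None
            try:
                doc = json.loads(raw)
            except json.JSONDecodeError:
                return
            context = doc.get("@context") if isinstance(doc, dict) else None
            vocab = vocab_from_url(context if isinstance(context, str) else "")
            self.pairs[("JSON-LD", vocab)] += 1


sample = """
<meta property="og:title" content="Hello">
<meta name="twitter:card" content="summary">
<div itemscope itemtype="https://schema.org/Article"></div>
<script type="application/ld+json">{"@context": "https://schema.org", "@type": "WebSite"}</script>
"""

tally = EncodingVocabTally()
tally.feed(sample)
for (encoding, vocabulary), count in sorted(tally.pairs.items()):
    print(encoding, vocabulary, count)
```

With pairs like these, over-time graphs could still be grouped by encoding alone for continuity, while the text compares vocabularies with vocabularies.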


Successfully merging this pull request may close these issues.

Structured Data 2024
4 participants