Merge pull request #14 from lsst-sqre/tickets/DM-40815
DM-40815: Embed Highwire Press metadata tag
jonathansick authored Sep 20, 2023
2 parents 9ad506a + d765100 commit a5872fe
Showing 19 changed files with 584 additions and 16 deletions.
5 changes: 5 additions & 0 deletions Makefile
@@ -11,3 +11,8 @@ clean:
 	rm -rf docs/_build
 	rm -rf docs/api
 	rm -f demo/_build
+
+.PHONY: demo
+demo:
+	npm run build
+	tox run -e demo
9 changes: 9 additions & 0 deletions changelog.d/20230920_111916_jsick_DM_40815.md
@@ -0,0 +1,9 @@
### New features

- Include common metadata in the technote HTML:

  - Standard HTML meta tags like `description` and `canonical` URL link rel.
  - Highwire Press meta tags (used by Google Scholar)
  - OpenGraph meta tags (used by social media and messaging apps)
  - microformats2 annotations on relevant elements
  - Custom data attributes on relevant elements (the link to the technote source repository)
8 changes: 5 additions & 3 deletions docs/user-guide/index.rst
@@ -4,6 +4,8 @@
 User guide
 ##########
 
-.. .. toctree::
-.. :maxdepth: 2
-.. .. :titlesonly:
+.. toctree::
+   :maxdepth: 2
+   :titlesonly:
+
+   metadata
68 changes: 68 additions & 0 deletions docs/user-guide/metadata.rst
@@ -0,0 +1,68 @@
##############################
Metadata published by technote
##############################

Technote publishes metadata with HTML documents.
This metadata can be used for a number of purposes, ranging from search engine optimization and inclusion in Google Scholar to link unfurling in social media and messaging apps, and even maintaining institutional documentation indices.
Technote supports a number of metadata standards, including Highwire Press, OpenGraph, microformats2, and custom element annotations with data attributes.
This page describes the metadata that Technote publishes.

Standard HTML metadata
======================

Technote publishes standard HTML metadata:

- ``meta name="title"`` is the document's title (h1 heading).
- ``meta name="description"`` is the document's description derived from the ``abstract`` directive.
- ``meta name="generator"`` is the name of the software that generated the document. Example: ``<meta name="generator" content="technote 1.0.0: https://technote.lsst.io">``.
- ``link rel="canonical"`` is the canonical URL of the document, derived from the ``canonical_url`` field in a document's ``technote.toml`` configuration file.
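As a minimal sketch of how a consumer might read these standard tags back out of a built page, the example below uses only the standard library's ``html.parser``. The ``StandardMetaReader`` class and the sample HTML (including its ``example.org`` URL) are hypothetical illustrations, not part of technote.

```python
from __future__ import annotations

from html.parser import HTMLParser


class StandardMetaReader(HTMLParser):
    """Collect <meta name=...> values and the canonical link (hypothetical helper)."""

    def __init__(self) -> None:
        super().__init__()
        self.meta: dict[str, str] = {}
        self.canonical_url: str | None = None

    def handle_starttag(self, tag: str, attrs: list[tuple[str, str | None]]) -> None:
        attributes = dict(attrs)
        if tag == "meta" and attributes.get("name"):
            # Keep the first value seen for each meta name.
            self.meta.setdefault(attributes["name"], attributes.get("content") or "")
        elif tag == "link" and "canonical" in (attributes.get("rel") or "").split():
            # rel may hold several space-separated values.
            self.canonical_url = attributes.get("href")


sample = """
<head>
<meta name="title" content="Example technote">
<meta name="description" content="A short abstract.">
<meta name="generator" content="technote 1.0.0: https://technote.lsst.io">
<link rel="canonical" href="https://example.org/technote/">
</head>
"""

reader = StandardMetaReader()
reader.feed(sample)
print(reader.meta["description"])
print(reader.canonical_url)
```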

Highwire Press metadata
=======================

Google Scholar uses Highwire Press metadata to index literature.
Technote publishes the following ``meta`` tags:

- ``citation_title``
- ``citation_author``
- ``citation_author_institution``
- ``citation_author_email``
- ``citation_author_orcid``
- ``citation_date``
- ``citation_doi``
- ``citation_technical_report_number``
- ``citation_fulltext_html_url``
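Because a page can carry several ``citation_author`` tags, a consumer has to associate each ``citation_author_*`` tag with the right author. The sketch below assumes the ordering that ``HighwireMetadata.author_info`` produces in this commit, where each author's institution, email, and ORCID tags immediately follow that author's ``citation_author`` tag. The ``group_highwire_authors`` function and its ``(name, content)`` input format are hypothetical.

```python
from __future__ import annotations


def group_highwire_authors(tags: list[tuple[str, str]]) -> list[dict[str, list[str]]]:
    """Group (name, content) Highwire tag pairs into per-author records.

    Assumes every citation_author_* tag follows the citation_author tag it
    belongs to (hypothetical helper, not part of technote).
    """
    authors: list[dict[str, list[str]]] = []
    for name, content in tags:
        if name == "citation_author":
            # Each citation_author tag starts a new author record.
            authors.append({"name": [content]})
        elif name.startswith("citation_author_") and authors:
            # e.g. "citation_author_orcid" -> "orcid"
            field = name.removeprefix("citation_author_")
            authors[-1].setdefault(field, []).append(content)
    return authors


tags = [
    ("citation_title", "Example technote"),
    ("citation_author", "Ada Example"),
    ("citation_author_institution", "Example Observatory"),
    ("citation_author_institution", "Example University"),
    ("citation_author_orcid", "https://orcid.org/0000-0002-1825-0097"),
    ("citation_author", "Bo Example"),
    ("citation_author_email", "bo@example.org"),
]
for record in group_highwire_authors(tags):
    print(record)
```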

OpenGraph metadata
==================

Social media and messaging apps use OpenGraph metadata to unfurl links.
Technote publishes the following ``meta`` tags:

- ``og:title``
- ``og:description``
- ``og:url``
- ``og:type`` (always ``article``)
- ``og:article:author``
- ``og:article:published_time``
- ``og:article:modified_time``
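A minimal sketch of rendering the tags listed above from a plain dictionary, skipping missing values, repeating list-valued properties such as multiple authors, and HTML-escaping the content. It uses the ``property`` attribute, as the Open Graph protocol specifies. The ``opengraph_tags`` function is hypothetical and is not technote's ``OpenGraphMetadata`` class, whose source is not part of this excerpt.

```python
from __future__ import annotations

from html import escape

# Property names in the order listed above.
OG_PROPERTIES = (
    "og:title",
    "og:description",
    "og:url",
    "og:type",
    "og:article:author",
    "og:article:published_time",
    "og:article:modified_time",
)


def opengraph_tags(values: dict[str, str | list[str] | None]) -> str:
    """Render OpenGraph meta tags, one per line (hypothetical helper)."""
    # og:type is always "article" for a technote.
    values = {**values, "og:type": "article"}
    lines: list[str] = []
    for prop in OG_PROPERTIES:
        value = values.get(prop)
        if value is None:
            continue  # skip metadata that is not available
        # Repeatable properties, such as multiple authors, may be lists.
        items = [value] if isinstance(value, str) else value
        for item in items:
            content = escape(item, quote=True)
            lines.append(f'<meta property="{prop}" content="{content}">')
    return "\n".join(lines)


print(
    opengraph_tags(
        {
            "og:title": "Example technote",
            "og:url": "https://example.org/technote/",
            "og:article:author": ["Ada Example", "Bo Example"],
            "og:article:published_time": "2023-09-20",
        }
    )
)
```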

microformats2 metadata
======================

microformats2 is a standard for marking up HTML elements that carry common document metadata, such as authors and publication dates.
The annotations are published as ``class`` attributes on HTML elements.

- ``h-entry`` is applied to the container element for the document (including sidebars).
- ``e-content`` is applied to the container element for the document's content.
- ``p-summary`` is applied to the abstract's container section.
- ``p-author`` is applied to the name of each author.
- ``dt-updated`` is applied to the date element of the last update.
- ``dt-published`` is applied to the date element of the original publication date.
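To show how these class names sit in a page, the sketch below scans a hypothetical HTML skeleton with the standard library's ``html.parser`` and lists the microformats2 classes it finds, in document order. It only recognizes the ``h-``, ``p-``, ``u-``, ``dt-``, and ``e-`` class prefixes and does no real mf2 parsing; a full parser such as mf2py (added as a test dependency in this commit) builds the nested ``h-entry`` structure. The ``Mf2ClassScanner`` class and the skeleton markup are assumptions for illustration.

```python
from __future__ import annotations

from html.parser import HTMLParser

MF2_PREFIXES = ("h-", "p-", "u-", "dt-", "e-")


class Mf2ClassScanner(HTMLParser):
    """Record (tag, mf2 class) pairs in document order (hypothetical helper)."""

    def __init__(self) -> None:
        super().__init__()
        self.found: list[tuple[str, str]] = []

    def handle_starttag(self, tag: str, attrs: list[tuple[str, str | None]]) -> None:
        for name, value in attrs:
            if name != "class" or not value:
                continue
            # class may hold several space-separated names.
            for cls in value.split():
                if cls.startswith(MF2_PREFIXES):
                    self.found.append((tag, cls))


skeleton = """
<div class="technote h-entry">
  <section class="technote-abstract p-summary" id="abstract">...</section>
  <span class="p-author">Ada Example</span>
  <time class="dt-published" datetime="2023-09-01">2023-09-01</time>
  <time class="dt-updated" datetime="2023-09-20">2023-09-20</time>
  <article class="e-content">...</article>
</div>
"""

scanner = Mf2ClassScanner()
scanner.feed(skeleton)
print(scanner.found)
```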

Element data attributes
=======================

For on-page metadata not covered by the standards above, Technote adds data attributes to the relevant HTML elements.

- ``data-technote-source-url`` is set to the URL of the source repository for the document (e.g. on GitHub). This data attribute is applied to the ``a`` element that links to the source repository.
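A documentation index could read this attribute directly instead of guessing which link points at the repository. The sketch below assumes, as described above, that the attribute sits on an ``a`` element; the ``SourceUrlFinder`` class and the ``github.com/example/...`` URL are hypothetical.

```python
from __future__ import annotations

from html.parser import HTMLParser


class SourceUrlFinder(HTMLParser):
    """Find data-technote-source-url on the first <a> that has it (hypothetical)."""

    def __init__(self) -> None:
        super().__init__()
        self.source_url: str | None = None

    def handle_starttag(self, tag: str, attrs: list[tuple[str, str | None]]) -> None:
        if tag == "a" and self.source_url is None:
            # Stays None for links without the attribute.
            self.source_url = dict(attrs).get("data-technote-source-url")


finder = SourceUrlFinder()
finder.feed(
    '<a href="/">Home</a>'
    '<a href="https://github.com/example/technote-demo" '
    'data-technote-source-url="https://github.com/example/technote-demo">'
    "View source</a>"
)
print(finder.source_url)
```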
1 change: 1 addition & 0 deletions pyproject.toml
@@ -41,6 +41,7 @@ dev = [
     # Test dependencies for analyzing HTML output
     "lxml",
     "cssselect",
+    "mf2py",
     # Documentation
     "documenteer[guide] @ git+https://github.com/lsst-sqre/documenteer@main",
     "autodoc_pydantic",
56 changes: 51 additions & 5 deletions src/technote/config.py
@@ -22,6 +22,7 @@
 from dataclasses import dataclass
 from datetime import date
 from enum import Enum
+from importlib.metadata import PackageNotFoundError, version
 from pathlib import Path
 from typing import Self
 from urllib.parse import urlparse
@@ -37,6 +38,8 @@
 )
 from sphinx.errors import ConfigError
 
+from .metadata.highwire import HighwireMetadata
+from .metadata.opengraph import OpenGraphMetadata
 from .metadata.orcid import validate_orcid_url
 from .metadata.orcid import verify_checksum as verify_orcid_checksum
 from .metadata.ror import validate_ror_url
@@ -648,17 +651,14 @@ def title(self) -> str:
         return self._content_title
 
     @property
-    def abstract(self) -> str:
+    def abstract(self) -> str | None:
         """The technote's unformatted abstract.
 
         This content is extracted from the ``abstract`` directive, and all
         markup is removed as part of that process. This attribute can be used
         for populating summary tags in the HTML header.
         """
-        if self._content_abstract:
-            return self._content_abstract
-        else:
-            return "N/A"
+        return self._content_abstract
 
     @property
     def date_updated_iso(self) -> str | None:
@@ -668,11 +668,24 @@ def date_updated_iso(self) -> str | None:
         else:
             return None
 
+    @property
+    def date_created_iso(self) -> str | None:
+        """The date of initial publication, as an ISO 8601 string."""
+        if self.toml.technote.date_created:
+            return self._format_iso_date(self.toml.technote.date_created)
+        else:
+            return None
+
     @property
     def version(self) -> str | None:
         """The version, as a string if available."""
         return self.toml.technote.version
 
+    @property
+    def canonical_url(self) -> str | None:
+        """The canonical URL of the technote, if available."""
+        return str(url) if (url := self.toml.technote.canonical_url) else None
+
     @property
     def github_url(self) -> str | None:
         """The GitHub repository URL."""
@@ -736,3 +749,36 @@ def set_abstract(self, abstract: str) -> None:
     def _format_iso_date(self, date: date) -> str:
         """Format a date in ISO 8601 format."""
         return date.isoformat()
+
+    @property
+    def highwire_metadata_tags(self) -> str:
+        """The Highwire metadata tags for the technote."""
+        highwire = HighwireMetadata(
+            metadata=self.toml.technote,
+            title=self.title,
+            abstract=self.abstract,
+        )
+        return highwire.as_html()
+
+    @property
+    def opengraph_metadata_tags(self) -> str:
+        """The OpenGraph metadata tags for the technote."""
+        og = OpenGraphMetadata(
+            metadata=self.toml.technote,
+            title=self.title,
+            abstract=self.abstract,
+        )
+        return og.as_html()
+
+    @property
+    def generator_tag(self) -> str:
+        """A meta name=generator tag to identify the version of technote."""
+        try:
+            tn_version = version("technote")
+        except PackageNotFoundError:
+            # package is not installed
+            tn_version = "0.0.0"
+        return (
+            f'<meta name="generator" content="technote {tn_version}: '
+            'https://technote.lsst.io">'
+        )
4 changes: 3 additions & 1 deletion src/technote/ext/abstract.py
@@ -23,7 +23,9 @@ def visit_abstract_node_html(
     self: HTML5Translator, node: nodes.Element
 ) -> None:
     """Add HTML content before the `AbstractNode`."""
-    self.body.append('<section class="technote-abstract" id="abstract">')
+    self.body.append(
+        '<section class="technote-abstract p-summary" id="abstract">'
+    )
     self.body.append('<h2 class="technote-abstract__header">Abstract</h2>')
 
 
122 changes: 122 additions & 0 deletions src/technote/metadata/highwire.py
@@ -0,0 +1,122 @@
"""Support for the Highwire schema for academic metadata in HTML."""

from __future__ import annotations

from typing import TYPE_CHECKING

from .metatagbase import MetaTagFormatterBase

if TYPE_CHECKING:
    from technote.config import TechnoteTable


class HighwireMetadata(MetaTagFormatterBase):
    """A class that transforms technote metadata into Highwire metadata
    tags.

    Notes
    -----
    Resources for learning about Highwire metadata tags:

    - https://cheb.hatenablog.com/entry/2014/07/25/002548#f-c017c3cf
    - https://scholar.google.com/intl/en/scholar/inclusion.html#indexing
    """

    def __init__(
        self,
        *,
        metadata: TechnoteTable,
        title: str,
        abstract: str | None = None,
    ) -> None:
        self._metadata = metadata
        self._title = title
        self._abstract = abstract

    @property
    def tag_attributes(self) -> list[str]:
        """The names of class properties that create tags."""
        return [
            "title",
            "author_info",
            "date",
            "doi",
            "technical_report_number",
            "html_url",
        ]

    @property
    def title(self) -> str:
        """The title metadata."""
        return f'<meta name="citation_title" content="{ self._title }">'

    @property
    def author_info(self) -> list[str]:
        """The author metadata.

        Each author is represented with these tags:

        - ``citation_author``
        - ``citation_author_institution``
        - ``citation_author_email``
        - ``citation_author_orcid``
        """
        authors = self._metadata.authors
        author_tags: list[str] = []
        for author in authors:
            author_tags.append(
                self._format_tag("author", author.name.plain_text_name)
            )
            affil_tags = [
                self._format_tag("author_institution", affiliation.name)
                for affiliation in author.affiliations
                if affiliation.name is not None
            ]
            author_tags.extend(affil_tags)
            if author.email is not None:
                author_tags.append(
                    self._format_tag("author_email", author.email)
                )
            if author.orcid is not None:
                author_tags.append(
                    self._format_tag("author_orcid", str(author.orcid))
                )
        return author_tags

    @property
    def date(self) -> str | None:
        """The ``citation_date`` metadata tag."""
        if self._metadata.date_updated is None:
            return None
        iso8601_date = self._metadata.date_updated.isoformat()
        return self._format_tag("date", iso8601_date)

    @property
    def doi(self) -> str | None:
        """The ``citation_doi`` metadata tag."""
        if self._metadata.doi is None:
            return None
        return self._format_tag("doi", str(self._metadata.doi))

    @property
    def technical_report_number(self) -> str | None:
        """The ``citation_technical_report_number`` metadata tag."""
        if self._metadata.id is None:
            return None
        return self._format_tag("technical_report_number", self._metadata.id)

    @property
    def html_url(self) -> str | None:
        """The ``citation_fulltext_html_url`` metadata tag."""
        if self._metadata.canonical_url is None:
            return None
        return self._format_tag(
            "fulltext_html_url", str(self._metadata.canonical_url)
        )

    def _format_tag(self, name: str, content: str) -> str:
        """Format a Highwire metadata tag."""
        return (
            f'<meta name="citation_{ name }" content="{ content }" '
            f'data-highwire="true">'
        )
38 changes: 38 additions & 0 deletions src/technote/metadata/metatagbase.py
@@ -0,0 +1,38 @@
"""Support for generating HTML meta tags."""

from __future__ import annotations

from abc import ABC, abstractmethod


class MetaTagFormatterBase(ABC):
    """A base class for generating HTML meta tags."""

    def __str__(self) -> str:
        """Create the HTML meta tags."""
        return self.as_html()

    @property
    @abstractmethod
    def tag_attributes(self) -> list[str]:
        """The names of class properties that create tags."""
        raise NotImplementedError

    def as_html(self) -> str:
        """Create the HTML meta tags, one per line."""
        tags: list[str] = []
        for prop in self.tag_attributes:
            self.extend_not_none(tags, getattr(self, prop))
        return "\n".join(tags) + "\n"

    @staticmethod
    def extend_not_none(
        entries: list[str], new_item: None | str | list[str]
    ) -> None:
        """Extend a list with new items if they are not None."""
        if new_item is None:
            return
        if isinstance(new_item, str):
            entries.append(new_item)
        else:
            entries.extend(new_item)
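To illustrate how a subclass plugs into `MetaTagFormatterBase`, here is a self-contained sketch that re-declares a trimmed copy of the base class (so it runs without technote installed) along with a hypothetical `DemoMetadata` subclass. Each name in `tag_attributes` is looked up with `getattr`, and each property may return a single tag, a list of tags, or `None` to omit the tag entirely.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class MetaTagFormatterBase(ABC):
    """Trimmed copy of the base class above, for illustration only."""

    @property
    @abstractmethod
    def tag_attributes(self) -> list[str]:
        raise NotImplementedError

    def as_html(self) -> str:
        tags: list[str] = []
        for prop in self.tag_attributes:
            item = getattr(self, prop)
            if item is None:
                continue  # the property chose to emit nothing
            if isinstance(item, str):
                tags.append(item)
            else:
                tags.extend(item)
        return "\n".join(tags) + "\n"


class DemoMetadata(MetaTagFormatterBase):
    """Hypothetical subclass: one single tag, one list, one optional tag."""

    def __init__(self, title: str, authors: list[str], doi: str | None) -> None:
        self._title = title
        self._authors = authors
        self._doi = doi

    @property
    def tag_attributes(self) -> list[str]:
        return ["title", "authors", "doi"]

    @property
    def title(self) -> str:
        return f'<meta name="citation_title" content="{self._title}">'

    @property
    def authors(self) -> list[str]:
        return [f'<meta name="citation_author" content="{a}">' for a in self._authors]

    @property
    def doi(self) -> str | None:
        if self._doi is None:
            return None
        return f'<meta name="citation_doi" content="{self._doi}">'


print(DemoMetadata("Example", ["Ada Example", "Bo Example"], doi=None).as_html(), end="")
```

Returning `None` rather than an empty string keeps blank lines out of the rendered `<head>`, which is why the base class filters `None` before joining.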