diff --git a/README.md b/README.md
index bf604613..385718da 100644
--- a/README.md
+++ b/README.md
@@ -334,184 +334,9 @@ Documentation:
 [djangotemplates]: https://docs.djangoproject.com/en/4.2/topics/templates/
 
 
-## Importing the existing legal tool text
-
-> :warning: **This section should no longer be required and will eventually be
-> moved to a better location.**
-
-Note that once the site is up and running in production, the data in the site
-will become the canonical source, and the process described here should not
-need to be repeated after that.
-
-The implementation is the Django management command `load_html_files`, which
-reads from the legacy HTML legal code files in the
-[creativecommons/cc-legal-tools-data][repodata] repository, and populates the
-database records and translation files.
-
-`load_html_files` uses [BeautifulSoup4][bs4docs] to parse the legacy HTML legal
-code:
-1. `import_zero_license_html()` for CC0 Public Domain tool
-   - HTML is handled specifically (using tag ids and classes) to populate
-     translation strings and to be used with specific HTML formatting when
-     displayed via template
-2. `import_by_40_license_html()` for 4.0 License tools
-   - HTML is handled specifically (using tag ids and classes) to populate
-     translation strings and to be used with specific HTML formatting when
-     displayed via a template
-3. `import_by_30_unported_license_html()` for unported 3.0 License tools
-   (English-only)
-   - HTML is handled specifically to be used with specific HTML formatting
-     when displayed via a template
-4. `simple_import_license_html()` for everything else
-   - HTML is handled generically; only the title and license body are
-     identified. The body is stored in the `html` field of the
-     `LegalCode` model
-
-[bs4docs]: https://www.crummy.com/software/BeautifulSoup/bs4/doc/
-[repodata]: https://github.com/creativecommons/cc-legal-tools-data
-
-
-### Import Process
-
-> :warning: **This section should no longer be required and will eventually be
-> moved to a better location.**
-
-This process will read the HTML files from the specified directory, populate
-`LegalCode` and `Tool` models, and create the `.po` portable object Gettext
-files in [creativecommons/cc-legal-tools-data][repodata].
-
-1. Ensure the [Data Repository](#data-repository), above, is in place
-2. Ensure [Docker Compose Setup](#docker-compose-setup), above, is complete
-3. Clear data in the database
-   ```shell
-   docker compose exec app ./manage.py clear_license_data
-   ```
-4. Load legacy HTML in the database
-   ```shell
-   docker compose exec app ./manage.py load_html_files
-   ```
-5. Optionally (and only as appropriate):
-   1. Commit the `.po` portable object Gettext file changes in
-      [creativecommons/cc-legal-tools-data][repodata]
-   2. [Translation Update Process](#translation-update-process), below
-   3. [Generate Static Files](#generate-static-files), below
-
-[repodata]:https://github.com/creativecommons/cc-legal-tools-data
-
-
-### Import Dependency Documentation
-
-> :warning: **This section should no longer be required and will eventually be
-> moved to a better location.**
-
-- [Beautiful Soup Documentation — Beautiful Soup 4 documentation][bs4docs]
-  - [lxml - Processing XML and HTML with Python][lxml]
-- [Quick start guide — polib documentation][polibdocs]
-
-[bs4docs]: https://www.crummy.com/software/BeautifulSoup/bs4/doc/
-[lxml]: https://lxml.de/
-[polibdocs]: https://polib.readthedocs.io/en/latest/quickstart.html
-
-
 ## Translation
 
-To upload/download translation files to/from Transifex, you'll need an account
-there with access to these translations. Then follow the [Authentication -
-Transifex API v3][transauth]: to get an API token, and set
-`TRANSIFEX["API_TOKEN"]` in your environment with its value.
-
-The [creativecommons/cc-legal-tools-data][repodata] repository must be cloned
-next to this `cc-legal-tools-app` repository. (It can be elsewhere, then you
-need to set `DATA_REPOSITORY_DIR` to its location.) Be sure to clone using a
-URL that starts with `git@github...` and not `https://github...`, or you won't
-be able to push to it. Also see [Data Repository](#data-repository), above.
-
-In production, the `check_for_translation_updates` management command should be
-run hourly. See [Check for Translation
-Updates](#check-for-translation-updates), below.
-
-Also see [Publishing changes to git repo](#publishing-changes-to-git-repo),
-below.
-
-[Babel][babel] is used for localization information.
-
-Documentation:
-- [Babel — Babel documentation][babel]
-- [Translation | Django documentation | Django][djangotranslation]
-
-[babel]: http://babel.pocoo.org/en/latest/index.html
-[repodata]:https://github.com/creativecommons/cc-legal-tools-data
-[transauth]: https://transifex.github.io/openapi/index.html#section/Authentication
-
-
-### How the tool translation is implemented
-
-Django Translation uses two sets of Gettext Files in the
-[creativecommons/cc-legal-tools-data][repodata] repository (the [Data
-Repository](#data-repository), above). See that repository for detailed
-information and definitions.
-
-Documentation:
-- [Translation | Django documentation | Django][djangotranslation]
-- Transifex API
-  - [Introduction to API 3.0 | Transifex Documentation][api30intro]
-  - [Transifex API v3][api30]
-  - Python SDK: [transifex-python/transifex/api][apisdk]
-
-[api30]: https://transifex.github.io/openapi/index.html#section/Introduction
-[api30intro]: https://docs.transifex.com/api-3-0/introduction-to-api-3-0
-[apisdk]: https://github.com/transifex/transifex-python/tree/devel/transifex/api
-[djangotranslation]: https://docs.djangoproject.com/en/4.2/topics/i18n/translation/
-[repodata]: https://github.com/creativecommons/cc-legal-tools-data
-
-
-### Check for Translation Updates
-
-> :warning: **This functionality is currently disabled.**
-
-The hourly run of `check_for_translation_updates` looks to see if any of the
-translation files in Transifex have newer last modification times than we know
-about. It performs the following process (which can also be done manually:
-
-1. Ensure the [Data Repository](#data-repository), above, is in place
-2. Within the [creativecommons/cc-legal-tools-data][repodata] (the [Data
-   Repository](#data-repository)):
-   1. Checkout or create the appropriate branch.
-      - For example, if a French translation file for BY 4.0 has changed, the
-        branch name will be `cc4-fr`.
-   2. Download the updated `.po` portable object Gettext file from Transifex
-   3. Do the [Translation Update Process](#translation-update-process) (below)
-      - _This is important and easy to forget,_ but without it, Django will
-        keep using the old translations
-   4. Commit that change and push it upstream.
-3. Within this `cc-legal-tools-app` repository:
-   1. For each branch that has been updated, [Generate Static
-      Files](#generate-static-files) (below). Use the options to update git and
-      push the changes.
-
-[repodata]:https://github.com/creativecommons/cc-legal-tools-data
-
-
-### Check for Translation Updates Dependency Documentation
-
-- [GitPython Documentation — GitPython documentation][gitpythondocs]
-- [Requests: HTTP for Humans™ — Requests documentation][requestsdocs]
-
-[gitpythondocs]: https://gitpython.readthedocs.io/en/stable/index.html
-[requestsdocs]: https://docs.python-requests.org/en/master/
-
-
-### Translation Update Process
-
-This Django Admin command must be run any time the `.po` portable object
-Gettext files are created or changed.
-
-1. Ensure the [Data Repository](#data-repository), above, is in place
-2. Ensure [Docker Compose Setup](#docker-compose-setup), above, is complete
-3. Compile translation messages (update the `.mo` machine object Gettext files)
-   ```shell
-   docker compose exec app ./manage.py compilemessages
-   ```
+See [`docs/translation.md`](docs/translation.md).
 
 
 ## Generate Static Files
@@ -557,6 +382,11 @@ the full path to that deploy key file.
 [lxml]: https://lxml.de/
 
 
+## Machine/metadata layer: RDF/XML
+
+For details and history, see [`docs/rdf.md`](docs/rdf.md).
+
+
 ## Licenses
 
 
diff --git a/docs-outdated/format_techspec.sh b/docs/_ARCHIVED/format_techspec.sh
similarity index 100%
rename from docs-outdated/format_techspec.sh
rename to docs/_ARCHIVED/format_techspec.sh
diff --git a/docs-outdated/index_rdf_data_issues.txt b/docs/_ARCHIVED/index_rdf_data_issues.txt
similarity index 100%
rename from docs-outdated/index_rdf_data_issues.txt
rename to docs/_ARCHIVED/index_rdf_data_issues.txt
diff --git a/docs/_ARCHIVED/load_html_import.md b/docs/_ARCHIVED/load_html_import.md
new file mode 100644
index 00000000..41ceb624
--- /dev/null
+++ b/docs/_ARCHIVED/load_html_import.md
@@ -0,0 +1,77 @@
+## Helper Scripts
+
+Best run before every commit:
+- `./dev/20231009_concatenatemessages.sh` - Concatenate legacy ccEngine
+  translations into cc-legal-tools-app
+
+
+## Importing the existing legal tool text
+
+Note that once the site is up and running in production, the data in the site
+will become the canonical source, and the process described here should not
+need to be repeated after that.
+
+The implementation is the Django management command
+`20231010_load_html_files`, which reads from the legacy HTML legal code
+files in the [creativecommons/cc-legal-tools-data][repodata] repository, and
+populates the database records and translation files.
+
+`20231010_load_html_files` uses [BeautifulSoup4][bs4docs] to parse the legacy
+HTML legal code:
+1. `import_zero_license_html()` for CC0 Public Domain tool
+   - HTML is handled specifically (using tag ids and classes) to populate
+     translation strings and to be used with specific HTML formatting when
+     displayed via template
+2. `import_by_40_license_html()` for 4.0 License tools
+   - HTML is handled specifically (using tag ids and classes) to populate
+     translation strings and to be used with specific HTML formatting when
+     displayed via a template
+3. `import_by_30_unported_license_html()` for unported 3.0 License tools
+   (English-only)
+   - HTML is handled specifically to be used with specific HTML formatting
+     when displayed via a template
+4. `simple_import_license_html()` for everything else
+   - HTML is handled generically; only the title and license body are
+     identified. The body is stored in the `html` field of the
+     `LegalCode` model
+
+[bs4docs]: https://www.crummy.com/software/BeautifulSoup/bs4/doc/
+[repodata]: https://github.com/creativecommons/cc-legal-tools-data
+
+
+### Import Process
+
+This process will read the HTML files from the specified directory, populate
+`LegalCode` and `Tool` models, and create the `.po` portable object Gettext
+files in [creativecommons/cc-legal-tools-data][repodata].
+
+1. Ensure the Data Repository (see [`../../README.md`](../../README.md)) is in
+   place
+2. Ensure Docker Compose Setup (see [`../../README.md`](../../README.md)) is
+   complete
+3. Clear data in the database
+   ```shell
+   docker compose exec app ./manage.py clear_license_data
+   ```
+4. Load legacy HTML in the database
+   ```shell
+   docker compose exec app ./manage.py 20231010_load_html_files
+   ```
+5. Optionally (and only as appropriate):
+   1. Commit the `.po` portable object Gettext file changes in
+      [creativecommons/cc-legal-tools-data][repodata]
+   2. Translation Update Process (see [`../translation.md`](../translation.md))
+   3. Generate Static Files (see [`../../README.md`](../../README.md))
+
+[repodata]: https://github.com/creativecommons/cc-legal-tools-data
+
+
+### Import Dependency Documentation
+
+- [Beautiful Soup Documentation — Beautiful Soup 4 documentation][bs4docs]
+  - [lxml - Processing XML and HTML with Python][lxml]
+- [Quick start guide — polib documentation][polibdocs]
+
+[bs4docs]: https://www.crummy.com/software/BeautifulSoup/bs4/doc/
+[lxml]: https://lxml.de/
+[polibdocs]: https://polib.readthedocs.io/en/latest/quickstart.html
diff --git a/docs-outdated/provisioning.md b/docs/_ARCHIVED/provisioning.md
similarity index 100%
rename from docs-outdated/provisioning.md
rename to docs/_ARCHIVED/provisioning.md
diff --git a/docs-outdated/techspec.md b/docs/_ARCHIVED/techspec.md
similarity index 100%
rename from docs-outdated/techspec.md
rename to docs/_ARCHIVED/techspec.md
diff --git a/docs-outdated/translation.md b/docs/_ARCHIVED/translation.md
similarity index 100%
rename from docs-outdated/translation.md
rename to docs/_ARCHIVED/translation.md
diff --git a/rdf.md b/docs/rdf.md
similarity index 72%
rename from rdf.md
rename to docs/rdf.md
index c7f331fd..e1b6f204 100644
--- a/rdf.md
+++ b/docs/rdf.md
@@ -1,8 +1,48 @@
 # RDF/XML
 
+(Return to primary [`../README.md`](../README.md).)
+
 ## Namespaces
 
+
+### ccREL Schema
+
+`schema.rdf` excerpt:
+```xml
+
+```
+| Prefix | Name                | URL                                         |
+| ------ | ------------------- | ------------------------------------------- |
+| `cc`   | ccREL               | http://creativecommons.org/ns#              |
+| `owl`  | OWL 2               | http://www.w3.org/2002/07/owl#              |
+| `rdf`  | RDF XML Syntax      | http://www.w3.org/1999/02/22-rdf-syntax-ns# |
+| `rdfs` | RDF Schema          | http://www.w3.org/2000/01/rdf-schema#       |
+
+
+### Images
+
+`images.rdf` excerpt:
+```xml
+
+```
+| Prefix | Name                | URL                                         |
+| ------ | ------------------- | ------------------------------------------- |
+| `exif` | Exif RDF Schema     | http://www.w3.org/2003/12/exif/ns#          |
+| `rdf`  | RDF XML Syntax      | http://www.w3.org/1999/02/22-rdf-syntax-ns# |
+
+
+### Legal code
+
+`**/rdf` excerpt:
 ```xml
+Vocabulary to describe an Exif format picture data. All Exif 2.2 tags are defined as RDF properties, as well as several terms to help this schema
+
+- [Exif RDF Schema][exifrdf]
+
+[exifrdf]: https://www.w3.org/2003/12/exif/
+
+
+### FOAF Vocabulary (`foaf`)
 
 [FOAF - Wikipedia](https://en.wikipedia.org/wiki/FOAF) (retrieved 2023-07-20):
 > FOAF (an acronym of friend of a friend) is a machine-readable ontology
@@ -55,16 +105,17 @@
 [foafvocab]: http://xmlns.com/foaf/0.1/
 
 
-### OWL 2
+### OWL 2 (`owl` prefix)
 
-[OWL 2 Web Ontology Language Document Overview (Second Edition)][owl2overiew]
+[OWL 2 Web Ontology Language Document Overview (Second Edition)][owl2overview]
 (retrieved 2023-07-20):
 > The OWL 2 Web Ontology Language, informally OWL 2, is an ontology language
 > for the Semantic Web with formally defined meaning. OWL 2 ontologies provide
 > classes, properties, individuals, and data values and are stored as Semantic
 > Web documents.
 
-- [OWL 2 Web Ontology Language Document Overview (Second Edition)][owl2overiew]
+- [OWL 2 Web Ontology Language Document Overview (Second
+  Edition)][owl2overview]
 - [OWL 2 Web Ontology Language Structural Specification and
   Functional-Style Syntax (Second Edition)][owl2spec]
 - [OWL 2 Web Ontology Language XML Serialization (Second Edition)][owl2xml]
@@ -76,13 +127,39 @@
 [wikipediasameas]: https://en.wikipedia.org/wiki/SameAs
 
 
-### RDF XML Syntax
+### RDF XML Syntax (`rdf` prefix)
 
 - [RDF 1.1 XML Syntax](https://www.w3.org/TR/rdf-syntax-grammar/)
 
 
+### RDF Schema (`rdfs` prefix)
+
+- [RDF Schema 1.1](https://www.w3.org/TR/rdf11-schema/)
+
+
+## RDF canonical URL
+
+For historical reasons, the canonical URL for the legal tools in RDF uses the
+HTTP (unencrypted) protocol.
+
+Although the legal tools are not served over HTTP, the HTTP URLs still work
+because they redirect to HTTPS. Additionally, the RDF now includes an
+`owl:sameAs` element with the HTTPS URL for further compatibility.
+
+For example:
+
+| Context | Protocol | Canonical URL | Published data example |
+| ------- | -------- | ------------- | ---------------------- |
+| RDF | HTTP | `http://creativecommons.org/licenses/by/4.0/` | [licenses/by/4.0/rdf#L3][ex1] |
+| all other uses | HTTPS | `https://creativecommons.org/licenses/by/4.0/` | [licenses/by/4.0/legalcode.en.html#L390-L397][ex2] |
+
+[ex1]: https://github.com/creativecommons/cc-legal-tools-data/blob/ba0781024c735f5cd4eb59f8a1716eb9a12df212/docs/licenses/by/4.0/rdf#L3
+[ex2]: https://github.com/creativecommons/cc-legal-tools-data/blob/ba0781024c735f5cd4eb59f8a1716eb9a12df212/docs/licenses/by/4.0/legalcode.en.html#L390-L397
+
+
 ## Changes
 
+
 ### Overview
 
 The changes between the old legacy ccEngine RDF/XML and the new CC Legal Tools
diff --git a/docs/translation.md b/docs/translation.md
new file mode 100644
index 00000000..85706a8a
--- /dev/null
+++ b/docs/translation.md
@@ -0,0 +1,124 @@
+# Translation
+
+(Return to primary [`../README.md`](../README.md).)
+
+
+## Overview
+
+To upload/download translation files to/from Transifex, you'll need an account
+there with access to these translations. Then follow [Authentication -
+Transifex API v3][transauth] to get an API token, and set
+`TRANSIFEX["API_TOKEN"]` in your environment to that token.
+
+The [creativecommons/cc-legal-tools-data][repodata] repository must be cloned
+next to this `cc-legal-tools-app` repository. (It can be elsewhere, but then
+you need to set `DATA_REPOSITORY_DIR` to its location.) Be sure to clone using
+a URL that starts with `git@github...` and not `https://github...`, or you
+won't be able to push to it. See [`../README.md`](../README.md) for details.
+
+~~In production, the `check_for_translation_updates` management command should
+be run hourly. See [Check for Translation
+Updates](#check-for-translation-updates), below.~~
+
+Also see the "Publishing changes to git repo" section of
+[`../README.md`](../README.md).
+
+[Babel][babel] is used for localization information.
+
+Documentation:
+- [Babel — Babel documentation][babel]
+- [Translation | Django documentation | Django][djangotranslation]
+
+[babel]: http://babel.pocoo.org/en/latest/index.html
+[repodata]: https://github.com/creativecommons/cc-legal-tools-data
+[transauth]: https://transifex.github.io/openapi/index.html#section/Authentication
+
+
+## How the tool translation is implemented
+
+Django Translation uses two sets of Gettext Files in the
+[creativecommons/cc-legal-tools-data][repodata] repository (the Data
+Repository; see [`../README.md`](../README.md)). See that repository for
+detailed information and definitions.
+
+Documentation:
+- [Translation | Django documentation | Django][djangotranslation]
+- Transifex API
+  - [Introduction to API 3.0 | Transifex Documentation][api30intro]
+  - [Transifex API v3][api30]
+  - Python SDK: [transifex-python/transifex/api][apisdk]
+
+[api30]: https://transifex.github.io/openapi/index.html#section/Introduction
+[api30intro]: https://docs.transifex.com/api-3-0/introduction-to-api-3-0
+[apisdk]: https://github.com/transifex/transifex-python/tree/devel/transifex/api
+[djangotranslation]: https://docs.djangoproject.com/en/4.2/topics/i18n/translation/
+[repodata]: https://github.com/creativecommons/cc-legal-tools-data
+
+
+## Add translation
+
+1. Add the language to the appropriate resource in Transifex
+2. Ensure the language is present in Django
+   - If not, update `cc_legal_tools/settings/base.py`
+3. Add objects for the new language translation using the `add_objects`
+   management command.
+   - Examples:
+     ```shell
+     docker compose exec app ./manage.py add_objects -v2 --licenses -l tlh
+     ```
+     ```shell
+     docker compose exec app ./manage.py add_objects -v2 --zero -l tlh
+     ```
+4. Synchronize repository Gettext files with Transifex (see below)
+5. Compile `.mo` machine object Gettext files:
+   ```shell
+   docker compose exec app ./manage.py compilemessages
+   ```
+
+## Synchronize repository Gettext files with Transifex
+
+- **TODO** document processes of synchronizing the repository Gettext files
+  with Transifex, including the following management commands:
+  - `locale_info`
+  - `normalize_translations`
+  - `compare_translations`
+  - `pull_translation`
+  - `push_translation`
+  - `compilemessages`
+
+
+## Check for translation updates
+
+> :warning: **This functionality is currently disabled.**
+
+~~The hourly run of `check_for_translation_updates` looks to see if any of the
+translation files in Transifex have newer last modification times than we know
+about. It performs the following process (which can also be done manually):~~
+
+1. ~~Ensure the Data Repository ([`../README.md`](../README.md)) is in place~~
+2. ~~Within the [creativecommons/cc-legal-tools-data][repodata] (the Data
+   Repository):~~
+   1. ~~Checkout or create the appropriate branch.~~
+      - ~~For example, if a French translation file for BY 4.0 has changed, the
+        branch name will be `cc4-fr`.~~
+   2. ~~Download the updated `.po` portable object Gettext file from
+      Transifex~~
+   3. ~~Compile the `.mo` machine object Gettext files
+      (`./manage.py compilemessages`)~~
+      - ~~_This is important and easy to forget,_ but without it, Django will
+        keep using the old translations~~
+   4. ~~Commit that change and push it upstream.~~
+3. ~~Within this `cc-legal-tools-app` repository:~~
+   1. ~~For each branch that has been updated, Generate Static
+      Files ([`../README.md`](../README.md)). Use the options to update git and
+      push the changes.~~
+
+[repodata]: https://github.com/creativecommons/cc-legal-tools-data
+
+
+Documentation:
+- [GitPython Documentation — GitPython documentation][gitpythondocs]
+- [Requests: HTTP for Humans™ — Requests documentation][requestsdocs]
+
+[gitpythondocs]: https://gitpython.readthedocs.io/en/stable/index.html
+[requestsdocs]: https://docs.python-requests.org/en/master/
diff --git a/legal_tools/management/commands/load_html_files.py b/legal_tools/management/commands/20231010_load_html_files.py
similarity index 100%
rename from legal_tools/management/commands/load_html_files.py
rename to legal_tools/management/commands/20231010_load_html_files.py
diff --git a/legal_tools/management/commands/add_objects.py b/legal_tools/management/commands/add_objects.py
new file mode 100644
index 00000000..4ab6d9e5
--- /dev/null
+++ b/legal_tools/management/commands/add_objects.py
@@ -0,0 +1,82 @@
+# Standard library
+import logging
+from argparse import ArgumentParser
+
+# Third-party
+from django.conf import settings
+from django.core.management import BaseCommand, CommandError
+
+# First-party/Local
+from legal_tools.models import LegalCode, Tool
+
+LOG = logging.getLogger(__name__)
+LOG_LEVELS = {
+    0: logging.ERROR,
+    1: logging.WARNING,
+    2: logging.INFO,
+    3: logging.DEBUG,
+}
+
+
+class Command(BaseCommand):
+    """
+    Create new Licenses 4.0 or CC Zero 1.0 LegalCode objects for a given
+    language.
+    """
+
+    def add_arguments(self, parser: ArgumentParser):
+        domains = parser.add_mutually_exclusive_group(required=True)
+        domains.add_argument(
+            "--licenses",
+            action="store_const",
+            const="licenses",
+            help="Add licenses 4.0 translations",
+            dest="domains",
+        )
+        domains.add_argument(
+            "--zero",
+            action="store_const",
+            const="zero",
+            help="Add CC0 1.0 translation",
+            dest="domains",
+        )
+        parser.add_argument(
+            "-l",
+            "--language",
+            action="store",
+            required=True,
+            help="Language Code of the translation to add",
+        )
+        parser.add_argument(
+            "-n",
+            "--dryrun",
+            action="store_true",
+            help="dry run: do not make any changes",
+        )
+
+    def add_legal_code(self, options, category, version, unit=None):
+        tool_parameters = {"category": category, "version": version}
+        if unit is not None:
+            tool_parameters["unit"] = unit
+        tools = Tool.objects.filter(**tool_parameters).order_by("unit")
+        for tool in tools:
+            title = f"{tool.unit} {tool.version} {options['language']}"
+            legal_code_parameters = {
+                "tool": tool,
+                "language_code": options["language"],
+            }
+            if LegalCode.objects.filter(**legal_code_parameters).exists():
+                LOG.warning(f"LegalCode object already exists: {title}")
+            else:
+                LOG.info(f"Creating LegalCode object: {title}")
+                if not options["dryrun"]:
+                    _ = LegalCode.objects.create(**legal_code_parameters)
+
+    def handle(self, **options):
+        LOG.setLevel(LOG_LEVELS[int(options["verbosity"])])
+        if options["language"] not in settings.LANG_INFO:
+            raise CommandError(f"Invalid language code: {options['language']}")
+        if options["domains"] == "licenses":
+            self.add_legal_code(options, "licenses", "4.0")
+        elif options["domains"] == "zero":
+            self.add_legal_code(options, "publicdomain", "1.0", "zero")
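The new `add_objects` command selects its target with a required, mutually exclusive `--licenses`/`--zero` group. Both of those `store_const` options write to the shared `dest="domains"`. The following is a minimal standalone sketch that re-creates that argument handling with plain `argparse`, outside Django. It shows what `options` would contain for the documented `add_objects -v2 --licenses -l tlh` invocation. It illustrates the command under stated assumptions and is not the command itself:

- The `-v`/`--verbosity` option is added by hand. It stands in for the one Django's `BaseCommand` normally supplies.
- `tool_filter()` is a hypothetical helper. It mirrors the `category`/`version`/`unit` values that `handle()` passes to `add_legal_code()`.

```python
import argparse
import logging

# Mirrors LOG_LEVELS in add_objects.py: Django verbosity -> logging level.
LOG_LEVELS = {
    0: logging.ERROR,
    1: logging.WARNING,
    2: logging.INFO,
    3: logging.DEBUG,
}


def build_parser() -> argparse.ArgumentParser:
    """Standalone re-creation of the add_objects arguments (no Django)."""
    parser = argparse.ArgumentParser(prog="add_objects")
    # Assumption: stands in for the -v/--verbosity option that Django's
    # BaseCommand adds to every management command.
    parser.add_argument(
        "-v", "--verbosity", type=int, choices=[0, 1, 2, 3], default=1
    )
    domains = parser.add_mutually_exclusive_group(required=True)
    # Both flags store a constant into the same destination, so
    # options.domains ends up as either "licenses" or "zero".
    domains.add_argument(
        "--licenses", action="store_const", const="licenses", dest="domains"
    )
    domains.add_argument(
        "--zero", action="store_const", const="zero", dest="domains"
    )
    parser.add_argument("-l", "--language", required=True)
    parser.add_argument("-n", "--dryrun", action="store_true")
    return parser


def tool_filter(domains: str) -> dict:
    """Hypothetical helper: the Tool filter that handle() would build."""
    if domains == "licenses":
        return {"category": "licenses", "version": "4.0"}
    return {"category": "publicdomain", "version": "1.0", "unit": "zero"}


if __name__ == "__main__":
    options = build_parser().parse_args(["-v2", "--licenses", "-l", "tlh"])
    print(options.domains, options.language, options.dryrun)
    print(logging.getLevelName(LOG_LEVELS[options.verbosity]))
    print(tool_filter(options.domains))
```

`argparse` exits with status 2 if both `--licenses` and `--zero` are passed, or if neither is. That is why `handle()` can get by with just the two `if`/`elif` branches.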