From ceefa82ed5c9441dba2391ea0836c74dce1bf932 Mon Sep 17 00:00:00 2001 From: Michael McCarthy <51542091+mccarthy-m-g@users.noreply.github.com> Date: Wed, 3 May 2023 15:22:07 -0700 Subject: [PATCH] update to quarto v1.3.340 and apply changes to site based on resolved issues I filed --- _common/_appendix.qmd | 10 +- _common/_metadata.yml | 1 + .../index/execute-results/html.json | 8 +- _freeze/site_libs/clipboard/clipboard.min.js | 4 +- _quarto.yml | 2 +- _site/404.html | 140 +- _site/about/index.html | 148 +- _site/index.html | 178 +- _site/index.xml | 10106 +++++++++++++--- .../posts/2021-06-15_demons-souls/index.html | 204 +- _site/posts/2021-06-19_distill/index.html | 222 +- .../2022-06-16_projectile-motion/index.html | 217 +- _site/posts/2022-09-29_borderlands/index.html | 207 +- _site/posts/2022-12-20_palettes/index.html | 200 +- .../posts/2023-05-03_r-developers/index.html | 199 +- _site/posts/4000-01-01_test-post/index.html | 318 +- _site/search.json | 301 +- .../index.html | 247 +- .../posts/learning-r.html | 228 +- .../posts/productivity-tips.html | 247 +- .../posts/python-setup.html | 243 +- .../posts/r-getting-help.html | 216 +- .../posts/r-packages.html | 264 +- .../posts/r-setup.html | 247 +- .../posts/start-here.html | 208 +- .../posts/system-setup.html | 249 +- .../posts/technical-writing.html | 257 +- .../posts/version-control-setup.html | 243 +- .../posts/zotero-setup.html | 257 +- _site/site_libs/bootstrap/bootstrap-icons.css | 316 +- .../site_libs/bootstrap/bootstrap-icons.woff | Bin 137124 -> 164168 bytes _site/site_libs/bootstrap/bootstrap.min.css | 4 +- _site/site_libs/clipboard/clipboard.min.js | 4 +- .../quarto-syntax-highlighting.css | 32 + _site/site_libs/quarto-html/quarto.js | 338 +- _site/site_libs/quarto-nav/quarto-nav.js | 57 +- .../quarto-search/autocomplete.umd.js | 4 +- _site/site_libs/quarto-search/fuse.min.js | 6 +- .../site_libs/quarto-search/quarto-search.js | 33 +- _site/sitemap.xml | 48 +- .../index.html | 205 +- 
.../index.html | 203 +- .../index.html | 205 +- .../index.html | 214 +- assets/theme.scss | 5 - posts/4000-01-01_test-post/index.qmd | 2 + .../index.qmd | 5 +- 47 files changed, 13570 insertions(+), 3482 deletions(-) diff --git a/_common/_appendix.qmd b/_common/_appendix.qmd index aac2c02..d7fbf9b 100644 --- a/_common/_appendix.qmd +++ b/_common/_appendix.qmd @@ -1,5 +1,5 @@ -## {.appendix .unnumbered .unlisted} +## {.appendix} ::: {.d-flex} @@ -16,7 +16,7 @@ Thanks for reading! I’m Michael, the voice behind Tidy Tales. I am an award wi ::: -## Comments {.appendix .unnumbered .unlisted} +## Comments {.appendix}
@@ -24,7 +24,7 @@ Thanks for reading! I’m Michael, the voice behind Tidy Tales. I am an award wi :::
-## Session Info {.appendix .unnumbered .unlisted} +## Session Info {.appendix}
@@ -52,13 +52,13 @@ pkg_sesh
`r if (is.null(params$data_file)) ""` `r if (!params$fair_dealing) ""` diff --git a/_common/_metadata.yml b/_common/_metadata.yml index 738caf1..d4084ac 100644 --- a/_common/_metadata.yml +++ b/_common/_metadata.yml @@ -14,4 +14,5 @@ toc-location: left # Citation Metadata citation: true +appendix-cite-as: display csl: ../../assets/apa.csl diff --git a/_freeze/posts/4000-01-01_test-post/index/execute-results/html.json b/_freeze/posts/4000-01-01_test-post/index/execute-results/html.json index a037e8c..0e2f7f2 100644 --- a/_freeze/posts/4000-01-01_test-post/index/execute-results/html.json +++ b/_freeze/posts/4000-01-01_test-post/index/execute-results/html.json @@ -1,8 +1,10 @@ { - "hash": "129ea4e82f617e05688e70f429251c88", + "hash": "6516bf42d6b08d35edecdf963560cc44", "result": { - "markdown": "---\ndraft: true\n\ntitle: Test Post\ndescription: |\n A post for testing things.\n\ndate: '4000-01-01'\n\ncategories:\n - .Testing\n\nparams:\n use_renv: FALSE\n data_file: NULL\n fair_dealing: FALSE\n---\n\n\n\n\nA test post.\n", - "supporting": [], + "markdown": "---\ndraft: true\n\ntitle: Test Post\ndescription: |\n A post for testing things.\n\ndate: '4000-01-01'\n\ncategories:\n - .Testing\n\nparams:\n use_renv: FALSE\n data_file: NULL\n fair_dealing: FALSE\n---\n\n\n\n\nA test post.\n\n\n## {.appendix}\n\n::: {.d-flex}\n\n![](/assets/images/avatar.png){fig-align=\"left\" fig-alt=\"Avatar\" .rounded .avatar}\n\n::: {}\n\n::: {}\n## Michael McCarthy {.quarto-appendix-heading}\n:::\n\nThanks for reading! I’m Michael, the voice behind Tidy Tales. I am an award winning data scientist and R programmer with the skills and experience to help you solve the problems you care about. You can learn more about me, my [consulting services](https://michaelmccarthy.tidytales.ca/consulting/), and my other projects on my [personal website](https://michaelmccarthy.tidytales.ca).\n:::\n\n:::\n\n## Comments {.appendix}\n\n
\n\n::: {.giscus}\n:::\n
\n\n## Session Info {.appendix}\n\n
\n\n::: {.cell}\n::: {.cell-output .cell-output-stdout}\n```\n─ Session info ───────────────────────────────────────────────────────────────\n setting value\n version R version 4.2.2 (2022-10-31)\n os macOS Mojave 10.14.6\n system x86_64, darwin17.0\n ui X11\n language (EN)\n collate en_CA.UTF-8\n ctype en_CA.UTF-8\n tz America/Vancouver\n date 2023-05-03\n pandoc 2.14.0.3 @ /Applications/RStudio.app/Contents/MacOS/pandoc/ (via rmarkdown)\n quarto 1.3.340 @ /usr/local/bin/quarto\n\n─ Packages ───────────────────────────────────────────────────────────────────\n package * version date (UTC) lib source\n sessioninfo * 1.2.2 2021-12-06 [1] CRAN (R 4.2.0)\n\n [1] /Users/Michael/Library/R/x86_64/4.2/library/__tidytales\n [2] /Library/Frameworks/R.framework/Versions/4.2/Resources/library\n\n──────────────────────────────────────────────────────────────────────────────\n```\n:::\n:::\n
\n\n\n\n\n\n", + "supporting": [ + "index_files" + ], "filters": [ "rmarkdown/pagebreak.lua" ], diff --git a/_freeze/site_libs/clipboard/clipboard.min.js b/_freeze/site_libs/clipboard/clipboard.min.js index 41c6a0f..1103f81 100644 --- a/_freeze/site_libs/clipboard/clipboard.min.js +++ b/_freeze/site_libs/clipboard/clipboard.min.js @@ -1,7 +1,7 @@ /*! - * clipboard.js v2.0.10 + * clipboard.js v2.0.11 * https://clipboardjs.com/ * * Licensed MIT © Zeno Rocha */ -!function(t,e){"object"==typeof exports&&"object"==typeof module?module.exports=e():"function"==typeof define&&define.amd?define([],e):"object"==typeof exports?exports.ClipboardJS=e():t.ClipboardJS=e()}(this,function(){return n={686:function(t,e,n){"use strict";n.d(e,{default:function(){return o}});var e=n(279),i=n.n(e),e=n(370),u=n.n(e),e=n(817),c=n.n(e);function a(t){try{return document.execCommand(t)}catch(t){return}}var f=function(t){t=c()(t);return a("cut"),t};var l=function(t){var e,n,o,r=1= 1.0.0" +quarto-required: ">= 1.3.340" diff --git a/_site/404.html b/_site/404.html index 0a5c889..2940e25 100644 --- a/_site/404.html +++ b/_site/404.html @@ -2,7 +2,7 @@ - + @@ -17,7 +17,7 @@ ul.task-list{list-style: none;} ul.task-list li input[type="checkbox"] { width: 0.8em; - margin: 0 0.8em 0.2em -1.6em; + margin: 0 0.8em 0.2em -1em; /* quarto-specific, see https://github.com/quarto-dev/quarto-cli/issues/4556 */ vertical-align: middle; } @@ -92,49 +92,51 @@ Tidy Tales + @@ -197,9 +199,23 @@

Page Not Found

icon: icon }; anchorJS.add('.anchored'); + const isCodeAnnotation = (el) => { + for (const clz of el.classList) { + if (clz.startsWith('code-annotation-')) { + return true; + } + } + return false; + } const clipboard = new window.ClipboardJS('.code-copy-button', { - target: function(trigger) { - return trigger.previousElementSibling; + text: function(trigger) { + const codeEl = trigger.previousElementSibling.cloneNode(true); + for (const childEl of codeEl.children) { + if (isCodeAnnotation(childEl)) { + childEl.remove(); + } + } + return codeEl.innerText; } }); clipboard.on('success', function(e) { @@ -264,6 +280,92 @@

Page Not Found

return note.innerHTML; }); } + let selectedAnnoteEl; + const selectorForAnnotation = ( cell, annotation) => { + let cellAttr = 'data-code-cell="' + cell + '"'; + let lineAttr = 'data-code-annotation="' + annotation + '"'; + const selector = 'span[' + cellAttr + '][' + lineAttr + ']'; + return selector; + } + const selectCodeLines = (annoteEl) => { + const doc = window.document; + const targetCell = annoteEl.getAttribute("data-target-cell"); + const targetAnnotation = annoteEl.getAttribute("data-target-annotation"); + const annoteSpan = window.document.querySelector(selectorForAnnotation(targetCell, targetAnnotation)); + const lines = annoteSpan.getAttribute("data-code-lines").split(","); + const lineIds = lines.map((line) => { + return targetCell + "-" + line; + }) + let top = null; + let height = null; + let parent = null; + if (lineIds.length > 0) { + //compute the position of the single el (top and bottom and make a div) + const el = window.document.getElementById(lineIds[0]); + top = el.offsetTop; + height = el.offsetHeight; + parent = el.parentElement.parentElement; + if (lineIds.length > 1) { + const lastEl = window.document.getElementById(lineIds[lineIds.length - 1]); + const bottom = lastEl.offsetTop + lastEl.offsetHeight; + height = bottom - top; + } + if (top !== null && height !== null && parent !== null) { + // cook up a div (if necessary) and position it + let div = window.document.getElementById("code-annotation-line-highlight"); + if (div === null) { + div = window.document.createElement("div"); + div.setAttribute("id", "code-annotation-line-highlight"); + div.style.position = 'absolute'; + parent.appendChild(div); + } + div.style.top = top - 2 + "px"; + div.style.height = height + 4 + "px"; + let gutterDiv = window.document.getElementById("code-annotation-line-highlight-gutter"); + if (gutterDiv === null) { + gutterDiv = window.document.createElement("div"); + gutterDiv.setAttribute("id", "code-annotation-line-highlight-gutter"); + 
gutterDiv.style.position = 'absolute'; + const codeCell = window.document.getElementById(targetCell); + const gutter = codeCell.querySelector('.code-annotation-gutter'); + gutter.appendChild(gutterDiv); + } + gutterDiv.style.top = top - 2 + "px"; + gutterDiv.style.height = height + 4 + "px"; + } + selectedAnnoteEl = annoteEl; + } + }; + const unselectCodeLines = () => { + const elementsIds = ["code-annotation-line-highlight", "code-annotation-line-highlight-gutter"]; + elementsIds.forEach((elId) => { + const div = window.document.getElementById(elId); + if (div) { + div.remove(); + } + }); + selectedAnnoteEl = undefined; + }; + // Attach click handler to the DT + const annoteDls = window.document.querySelectorAll('dt[data-target-cell]'); + for (const annoteDlNode of annoteDls) { + annoteDlNode.addEventListener('click', (event) => { + const clickedEl = event.target; + if (clickedEl !== selectedAnnoteEl) { + unselectCodeLines(); + const activeEl = window.document.querySelector('dt[data-target-cell].code-annotation-active'); + if (activeEl) { + activeEl.classList.remove('code-annotation-active'); + } + selectCodeLines(clickedEl); + clickedEl.classList.add('code-annotation-active'); + } else { + // Unselect the line + unselectCodeLines(); + clickedEl.classList.remove('code-annotation-active'); + } + }); + } const findCites = (el) => { const parentEl = el.parentElement; if (parentEl) { @@ -302,12 +404,12 @@

Page Not Found

} } var localhostRegex = new RegExp(/^(?:http|https):\/\/localhost\:?[0-9]*\//); - var filterRegex = new RegExp(/https:\/\/tidytales\.ca/); + var filterRegex = new RegExp("https:\/\/tidytales\.ca"); var isInternal = (href) => { return filterRegex.test(href) || localhostRegex.test(href); } // Inspect non-navigation links and adorn them if external - var links = window.document.querySelectorAll('a:not(.nav-link):not(.navbar-brand):not(.toc-action):not(.sidebar-link):not(.sidebar-item-toggle):not(.pagination-link):not(.no-external):not([aria-hidden]):not(.dropdown-item)'); + var links = window.document.querySelectorAll('a[href]:not(.nav-link):not(.navbar-brand):not(.toc-action):not(.sidebar-link):not(.sidebar-item-toggle):not(.pagination-link):not(.no-external):not([aria-hidden]):not(.dropdown-item)'); for (var i=0; iPage Not Found diff --git a/_site/about/index.html b/_site/about/index.html index 92b3030..eb4033f 100644 --- a/_site/about/index.html +++ b/_site/about/index.html @@ -2,7 +2,7 @@ - + @@ -17,7 +17,7 @@ ul.task-list{list-style: none;} ul.task-list li input[type="checkbox"] { width: 0.8em; - margin: 0 0.8em 0.2em -1.6em; + margin: 0 0.8em 0.2em -1em; /* quarto-specific, see https://github.com/quarto-dev/quarto-cli/issues/4556 */ vertical-align: middle; } @@ -64,18 +64,14 @@ - + - - - + - - @@ -92,49 +88,51 @@ Tidy Tales + @@ -223,9 +221,23 @@

Contact

icon: icon }; anchorJS.add('.anchored'); + const isCodeAnnotation = (el) => { + for (const clz of el.classList) { + if (clz.startsWith('code-annotation-')) { + return true; + } + } + return false; + } const clipboard = new window.ClipboardJS('.code-copy-button', { - target: function(trigger) { - return trigger.previousElementSibling; + text: function(trigger) { + const codeEl = trigger.previousElementSibling.cloneNode(true); + for (const childEl of codeEl.children) { + if (isCodeAnnotation(childEl)) { + childEl.remove(); + } + } + return codeEl.innerText; } }); clipboard.on('success', function(e) { @@ -290,6 +302,92 @@

Contact

return note.innerHTML; }); } + let selectedAnnoteEl; + const selectorForAnnotation = ( cell, annotation) => { + let cellAttr = 'data-code-cell="' + cell + '"'; + let lineAttr = 'data-code-annotation="' + annotation + '"'; + const selector = 'span[' + cellAttr + '][' + lineAttr + ']'; + return selector; + } + const selectCodeLines = (annoteEl) => { + const doc = window.document; + const targetCell = annoteEl.getAttribute("data-target-cell"); + const targetAnnotation = annoteEl.getAttribute("data-target-annotation"); + const annoteSpan = window.document.querySelector(selectorForAnnotation(targetCell, targetAnnotation)); + const lines = annoteSpan.getAttribute("data-code-lines").split(","); + const lineIds = lines.map((line) => { + return targetCell + "-" + line; + }) + let top = null; + let height = null; + let parent = null; + if (lineIds.length > 0) { + //compute the position of the single el (top and bottom and make a div) + const el = window.document.getElementById(lineIds[0]); + top = el.offsetTop; + height = el.offsetHeight; + parent = el.parentElement.parentElement; + if (lineIds.length > 1) { + const lastEl = window.document.getElementById(lineIds[lineIds.length - 1]); + const bottom = lastEl.offsetTop + lastEl.offsetHeight; + height = bottom - top; + } + if (top !== null && height !== null && parent !== null) { + // cook up a div (if necessary) and position it + let div = window.document.getElementById("code-annotation-line-highlight"); + if (div === null) { + div = window.document.createElement("div"); + div.setAttribute("id", "code-annotation-line-highlight"); + div.style.position = 'absolute'; + parent.appendChild(div); + } + div.style.top = top - 2 + "px"; + div.style.height = height + 4 + "px"; + let gutterDiv = window.document.getElementById("code-annotation-line-highlight-gutter"); + if (gutterDiv === null) { + gutterDiv = window.document.createElement("div"); + gutterDiv.setAttribute("id", "code-annotation-line-highlight-gutter"); + 
gutterDiv.style.position = 'absolute'; + const codeCell = window.document.getElementById(targetCell); + const gutter = codeCell.querySelector('.code-annotation-gutter'); + gutter.appendChild(gutterDiv); + } + gutterDiv.style.top = top - 2 + "px"; + gutterDiv.style.height = height + 4 + "px"; + } + selectedAnnoteEl = annoteEl; + } + }; + const unselectCodeLines = () => { + const elementsIds = ["code-annotation-line-highlight", "code-annotation-line-highlight-gutter"]; + elementsIds.forEach((elId) => { + const div = window.document.getElementById(elId); + if (div) { + div.remove(); + } + }); + selectedAnnoteEl = undefined; + }; + // Attach click handler to the DT + const annoteDls = window.document.querySelectorAll('dt[data-target-cell]'); + for (const annoteDlNode of annoteDls) { + annoteDlNode.addEventListener('click', (event) => { + const clickedEl = event.target; + if (clickedEl !== selectedAnnoteEl) { + unselectCodeLines(); + const activeEl = window.document.querySelector('dt[data-target-cell].code-annotation-active'); + if (activeEl) { + activeEl.classList.remove('code-annotation-active'); + } + selectCodeLines(clickedEl); + clickedEl.classList.add('code-annotation-active'); + } else { + // Unselect the line + unselectCodeLines(); + clickedEl.classList.remove('code-annotation-active'); + } + }); + } const findCites = (el) => { const parentEl = el.parentElement; if (parentEl) { @@ -328,12 +426,12 @@

Contact

} } var localhostRegex = new RegExp(/^(?:http|https):\/\/localhost\:?[0-9]*\//); - var filterRegex = new RegExp(/https:\/\/tidytales\.ca/); + var filterRegex = new RegExp("https:\/\/tidytales\.ca"); var isInternal = (href) => { return filterRegex.test(href) || localhostRegex.test(href); } // Inspect non-navigation links and adorn them if external - var links = window.document.querySelectorAll('a:not(.nav-link):not(.navbar-brand):not(.toc-action):not(.sidebar-link):not(.sidebar-item-toggle):not(.pagination-link):not(.no-external):not([aria-hidden]):not(.dropdown-item)'); + var links = window.document.querySelectorAll('a[href]:not(.nav-link):not(.navbar-brand):not(.toc-action):not(.sidebar-link):not(.sidebar-item-toggle):not(.pagination-link):not(.no-external):not([aria-hidden]):not(.dropdown-item)'); for (var i=0; iContact diff --git a/_site/index.html b/_site/index.html index 049d819..75a5cfa 100644 --- a/_site/index.html +++ b/_site/index.html @@ -2,7 +2,7 @@ - + @@ -17,7 +17,7 @@ ul.task-list{list-style: none;} ul.task-list li input[type="checkbox"] { width: 0.8em; - margin: 0 0.8em 0.2em -1.6em; + margin: 0 0.8em 0.2em -1em; /* quarto-specific, see https://github.com/quarto-dev/quarto-cli/issues/4556 */ vertical-align: middle; } @@ -129,7 +129,7 @@ const options = { valueNames: ['listing-date','listing-title','listing-description','listing-categories',{ data: ['index'] },{ data: ['categories'] },{ data: ['listing-date-sort'] },{ data: ['listing-file-modified-sort'] }], - searchColumns: ["listing-date","listing-title","listing-author","listing-description","listing-categories"], + searchColumns: ["listing-date","listing-title","listing-author","listing-image","listing-description","listing-categories"], }; window['quarto-listings'] = window['quarto-listings'] || {}; @@ -195,49 +195,51 @@ Tidy Tales + @@ -294,9 +296,9 @@

Posts

-
+
-

+

-
+
-

+

-
+
-

+

-
+
-

+

-
+
-

+

-
+
-

+

-
+
-
+
-
+
-
+
-
+

@@ -830,9 +832,23 @@

icon: icon }; anchorJS.add('.anchored'); + const isCodeAnnotation = (el) => { + for (const clz of el.classList) { + if (clz.startsWith('code-annotation-')) { + return true; + } + } + return false; + } const clipboard = new window.ClipboardJS('.code-copy-button', { - target: function(trigger) { - return trigger.previousElementSibling; + text: function(trigger) { + const codeEl = trigger.previousElementSibling.cloneNode(true); + for (const childEl of codeEl.children) { + if (isCodeAnnotation(childEl)) { + childEl.remove(); + } + } + return codeEl.innerText; } }); clipboard.on('success', function(e) { @@ -897,6 +913,92 @@

return note.innerHTML; }); } + let selectedAnnoteEl; + const selectorForAnnotation = ( cell, annotation) => { + let cellAttr = 'data-code-cell="' + cell + '"'; + let lineAttr = 'data-code-annotation="' + annotation + '"'; + const selector = 'span[' + cellAttr + '][' + lineAttr + ']'; + return selector; + } + const selectCodeLines = (annoteEl) => { + const doc = window.document; + const targetCell = annoteEl.getAttribute("data-target-cell"); + const targetAnnotation = annoteEl.getAttribute("data-target-annotation"); + const annoteSpan = window.document.querySelector(selectorForAnnotation(targetCell, targetAnnotation)); + const lines = annoteSpan.getAttribute("data-code-lines").split(","); + const lineIds = lines.map((line) => { + return targetCell + "-" + line; + }) + let top = null; + let height = null; + let parent = null; + if (lineIds.length > 0) { + //compute the position of the single el (top and bottom and make a div) + const el = window.document.getElementById(lineIds[0]); + top = el.offsetTop; + height = el.offsetHeight; + parent = el.parentElement.parentElement; + if (lineIds.length > 1) { + const lastEl = window.document.getElementById(lineIds[lineIds.length - 1]); + const bottom = lastEl.offsetTop + lastEl.offsetHeight; + height = bottom - top; + } + if (top !== null && height !== null && parent !== null) { + // cook up a div (if necessary) and position it + let div = window.document.getElementById("code-annotation-line-highlight"); + if (div === null) { + div = window.document.createElement("div"); + div.setAttribute("id", "code-annotation-line-highlight"); + div.style.position = 'absolute'; + parent.appendChild(div); + } + div.style.top = top - 2 + "px"; + div.style.height = height + 4 + "px"; + let gutterDiv = window.document.getElementById("code-annotation-line-highlight-gutter"); + if (gutterDiv === null) { + gutterDiv = window.document.createElement("div"); + gutterDiv.setAttribute("id", "code-annotation-line-highlight-gutter"); + 
gutterDiv.style.position = 'absolute'; + const codeCell = window.document.getElementById(targetCell); + const gutter = codeCell.querySelector('.code-annotation-gutter'); + gutter.appendChild(gutterDiv); + } + gutterDiv.style.top = top - 2 + "px"; + gutterDiv.style.height = height + 4 + "px"; + } + selectedAnnoteEl = annoteEl; + } + }; + const unselectCodeLines = () => { + const elementsIds = ["code-annotation-line-highlight", "code-annotation-line-highlight-gutter"]; + elementsIds.forEach((elId) => { + const div = window.document.getElementById(elId); + if (div) { + div.remove(); + } + }); + selectedAnnoteEl = undefined; + }; + // Attach click handler to the DT + const annoteDls = window.document.querySelectorAll('dt[data-target-cell]'); + for (const annoteDlNode of annoteDls) { + annoteDlNode.addEventListener('click', (event) => { + const clickedEl = event.target; + if (clickedEl !== selectedAnnoteEl) { + unselectCodeLines(); + const activeEl = window.document.querySelector('dt[data-target-cell].code-annotation-active'); + if (activeEl) { + activeEl.classList.remove('code-annotation-active'); + } + selectCodeLines(clickedEl); + clickedEl.classList.add('code-annotation-active'); + } else { + // Unselect the line + unselectCodeLines(); + clickedEl.classList.remove('code-annotation-active'); + } + }); + } const findCites = (el) => { const parentEl = el.parentElement; if (parentEl) { @@ -935,12 +1037,12 @@

} } var localhostRegex = new RegExp(/^(?:http|https):\/\/localhost\:?[0-9]*\//); - var filterRegex = new RegExp(/https:\/\/tidytales\.ca/); + var filterRegex = new RegExp("https:\/\/tidytales\.ca"); var isInternal = (href) => { return filterRegex.test(href) || localhostRegex.test(href); } // Inspect non-navigation links and adorn them if external - var links = window.document.querySelectorAll('a:not(.nav-link):not(.navbar-brand):not(.toc-action):not(.sidebar-link):not(.sidebar-item-toggle):not(.pagination-link):not(.no-external):not([aria-hidden]):not(.dropdown-item)'); + var links = window.document.querySelectorAll('a[href]:not(.nav-link):not(.navbar-brand):not(.toc-action):not(.sidebar-link):not(.sidebar-item-toggle):not(.pagination-link):not(.no-external):not([aria-hidden]):not(.dropdown-item)'); for (var i=0; i

diff --git a/_site/index.xml b/_site/index.xml index 51496d3..7ef4aee 100644 --- a/_site/index.xml +++ b/_site/index.xml @@ -9,7 +9,7 @@ https://tidytales.ca/index.html -quarto-1.2.313 +quarto-1.3.340 Wed, 03 May 2023 07:00:00 GMT The Pareto Principle in R package development @@ -28,19 +28,35 @@

Prerequisites

-
library(tidyverse)
-library(stringi)
-library(scales)
-library(gt)
+
library(tidyverse)
+library(stringi)
+library(scales)
+library(gt)

I’ll be using the CRAN package repository data returned by tools::CRAN_package_db() to get package and author metadata for the current packages available on CRAN. This returns a data frame with character columns containing most metadata from the DESCRIPTION file of a given R package.

Since this data will change over time, here’s when tools::CRAN_package_db() was run for reference: 2023-05-03.

-
cran_pkg_db <- tools::CRAN_package_db()
+
cran_pkg_db <- tools::CRAN_package_db()
 
-glimpse(cran_pkg_db)
+glimpse(cran_pkg_db)
#> Rows: 19,473
 #> Columns: 67
@@ -118,17 +134,57 @@
 

Wrangle

Since we only care about package and author metadata, a good first step is to remove everything else. This leaves us with a Package field and two author fields: Author and Authors@R. The difference between the two author fields is that Author is an unstructured text field that can contain any text in any format, and Authors@R is a structured text field containing R code that defines authors’ names and roles with the person() function.

-
cran_pkg_db <- cran_pkg_db |>
-  select(package = Package, authors = Author, authors_r = `Authors@R`) |>
-  as_tibble()
+
cran_pkg_db <- cran_pkg_db |>
+  select(package = Package, authors = Author, authors_r = `Authors@R`) |>
+  as_tibble()

Here’s a comparison of the two fields, using the dplyr package as an example:

-
# Author
-cran_pkg_db |>
-  filter(package == "dplyr") |>
-  pull(authors) |>
-  cat()
+
# Author
+cran_pkg_db |>
+  filter(package == "dplyr") |>
+  pull(authors) |>
+  cat()
#> Hadley Wickham [aut, cre] (<https://orcid.org/0000-0003-4757-117X>),
 #>   Romain François [aut] (<https://orcid.org/0000-0002-2444-4226>),
@@ -137,11 +193,29 @@
 #>   Davis Vaughan [aut] (<https://orcid.org/0000-0003-4777-038X>),
 #>   Posit Software, PBC [cph, fnd]
-
# Authors@R
-cran_pkg_db |>
-  filter(package == "dplyr") |>
-  pull(authors_r) |>
-  cat()
+
# Authors@R
+cran_pkg_db |>
+  filter(package == "dplyr") |>
+  pull(authors_r) |>
+  cat()
#> c(
 #>     person("Hadley", "Wickham", , "hadley@posit.co", role = c("aut", "cre"),
@@ -179,9 +253,19 @@
 

From the output above you can see that every package uses the Author field, but not all packages use the Authors@R field. This is unfortunate, because it means that the names and roles of authors need to be extracted from the unstructured text in the Author field for a subset of packages, which is difficult to do and somewhat error-prone. Just for consideration, here’s how many packages don’t use the Authors@R field.

-
cran_pkg_db |>
-  filter(is.na(authors_r)) |>
-  nrow()
+
cran_pkg_db |>
+  filter(is.na(authors_r)) |>
+  nrow()
#> [1] 6361
@@ -193,7 +277,43 @@

Extracting from Authors@R

Getting the data we want from the Authors@R field is pretty straightforward. Each package that uses this field stores a vector of person objects as a character string like:

-
mm_string <- "person(\"Michael\", \"McCarthy\", , role = c(\"aut\", \"cre\"))"
+
mm_string <- "person(\"Michael\", \"McCarthy\", , role = c(\"aut\", \"cre\"))"
 
 mm_string
@@ -202,18 +322,56 @@

Which can be parsed and evaluated as R code like:

-
mm_eval <- eval(parse(text = mm_string))
+
mm_eval <- eval(parse(text = mm_string))
 
-class(mm_eval)
+class(mm_eval)
#> [1] "person"

Then the format() method for the person class can be used to get names and roles into the format I want simply and accurately.

-
mm_person <- format(mm_eval, include = c("given", "family"))
-mm_roles  <- format(mm_eval, include = c("role"))
-tibble(person = mm_person, roles = mm_roles)
+
mm_person <- format(mm_eval, include = c("given", "family"))
+mm_roles  <- format(mm_eval, include = c("role"))
+tibble(person = mm_person, roles = mm_roles)
#> # A tibble: 1 × 2
 #>   person           roles     
@@ -223,28 +381,130 @@
 

I’ve wrapped this up into a small helper function, authors_r(), that includes some light tidying steps to deal with a couple of small discrepancies I noticed in a subset of packages.

-
# Get names and roles from "person" objects in the Authors@R field
-authors_r <- function(x) {
-  # Some light preprocessing is needed to replace the unicode symbol for line
-  # breaks with the regular "\n". This is an edge case from at least one
-  # package.
-  code <- str_replace_all(x, "\\<U\\+000a\\>", "\n")
-  persons <- eval(parse(text = code))
-  person <- str_trim(format(persons, include = c("given", "family")))
-  roles <- format(persons, include = c("role"))
-  tibble(person = person, roles = roles)
+
# Get names and roles from "person" objects in the Authors@R field
+authors_r <- function(x) {
+  # Some light preprocessing is needed to replace the unicode symbol for line
+  # breaks with the regular "\n". This is an edge case from at least one
+  # package.
+  code <- str_replace_all(x, "\\<U\\+000a\\>", "\n")
+  persons <- eval(parse(text = code))
+  person <- str_trim(format(persons, include = c("given", "family")))
+  roles <- format(persons, include = c("role"))
+  tibble(person = person, roles = roles)
 }

Here’s an example of it with dplyr:

-
cran_pkg_db |>
-  filter(package == "dplyr") |>
-  pull(authors_r) |>
-  # Normalizing names leads to more consistent results with summary statistics
-  # later on, since some people use things like umlauts and accents
-  # inconsistently.
-  stri_trans_general("latin-ascii") |>
-  authors_r()
+
cran_pkg_db |>
+  filter(package == "dplyr") |>
+  pull(authors_r) |>
+  # Normalizing names leads to more consistent results with summary statistics
+  # later on, since some people use things like umlauts and accents
+  # inconsistently.
+  stri_trans_general("latin-ascii") |>
+  authors_r()
#> # A tibble: 6 × 2
 #>   person              roles     
@@ -274,53 +534,289 @@
 

Fortunately, regular expressions worked pretty well here. I tried two approaches. First I tried to split the names (and roles) up by commas (and, as I ran into edge cases, other punctuation). This worked alright; there were clearly errors in the data with this method, but since most packages use a simple structure in the Author field it correctly extracted names from most packages.

Second I tried to extract the names (and roles) directly with a regular expression that could match a variety of names. This is the solution I settled on. It still isn’t perfect, but the data is cleaner than with the other method. Regardless, the difference in the number of observations between the two methods was only in the mid hundreds, so I think any statistics based on this data, although not completely accurate, are still sufficient to give a good idea of the R developer landscape on CRAN.

# This regex was adapted from <https://stackoverflow.com/a/7654214/16844576>.
# It's designed to capture a wide range of names, including those with
# punctuation in them. It's tailored to this data, so I don't know how well
# it would generalize to other situations, but feel free to try it.
persons_roles <- r"((\'|\")*[A-Z]([A-Z]+|(\'[A-Z])?[a-z]+|\.)(?:(\s+|\-)[A-Z]([a-z]+|\.?))*(?:(\'?\s+|\-)[a-z][a-z\-]+){0,2}(\s+|\-)[A-Z](\'?[A-Za-z]+(\'[A-Za-z]+)?|\.)(?:(\s+|\-)[A-Za-z]([a-z]+|\.))*(\'|\")*(?:\s*\[(.*?)\])?)"
# Some packages put the person() code in the wrong field, but it's also
# formatted incorrectly and throws an error when evaluated, so the best we can
# do is just extract the whole thing for each person.
person_objects <- r"(person\((.*?)\))"
 
# Get names and roles from character strings in the Author field
authors <- function(x) {
  # The Author field is unstructured and there are idiosyncrasies between
  # different packages. The steps here attempt to fix the idiosyncrasies so
  # authors can be extracted with as few errors as possible.
  persons <- x |>
    # Line breaks should be replaced with spaces in case they occur in the
    # middle of a name.
    str_replace_all("\\n|\\<U\\+000a\\>|\\n(?=[:upper:])", " ") |>
    # Periods should always have a space after them so initials will be
    # recognized as part of a name.
    str_replace_all("\\.", "\\. ") |>
    # Commas before roles will keep them from being included in the regex.
    str_remove_all(",(?= \\[)") |>
    # Get persons and their roles.
    str_extract_all(paste0(persons_roles, "|", person_objects)) |>
    unlist() |>
    # Multiple spaces can be replaced with a single space for cleaner names.
    str_replace_all("\\s+", " ")

  tibble(person = persons) |>
    mutate(
      roles  = str_extract(person, "\\[(.*?)\\]"),
      person = str_remove(
        str_remove(person, "\\s*\\[(.*?)\\]"),
        "^('|\")|('|\")$" # Some names are wrapped in quotations
      )
    )
}

Here’s an example of it with dplyr. If you compare it to the output from authors_r() above you can see the data quality is still good enough for rock ‘n’ roll, but it isn’t perfect; Posit’s roles are no longer defined because the comma in their name cut off the regex before it captured the roles. So there are some edge cases like this that will create measurement error in the person or roles columns, but I don’t think it’s bad enough to invalidate the results.
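To see why a comma inside a name breaks things, here is a toy Python reproduction of the failure mode, using a simplified pattern and a made-up string (not the post's actual regex or data):

```python
import re

# A comma inside an organization name ("Posit Software, PBC") ends the name
# match early, so the bracketed roles get detached from the organization and
# end up attached to the wrong fragment.
pattern = re.compile(r"([A-Z][A-Za-z ]+?)(?:\s*\[([^\]]*)\])?(?=,|$)")

field = "Hadley Wickham [aut], Posit Software, PBC [cph, fnd]"
print(pattern.findall(field))
```

The first entry comes out cleanly, while "Posit Software" loses its roles to the stray "PBC" fragment, which is the same class of measurement error described above.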

cran_pkg_db |>
  filter(package == "dplyr") |>
  pull(authors) |>
  stri_trans_general("latin-ascii") |>
  authors()
#> # A tibble: 6 × 2
#>   person          roles
 

Kurt Hornik, Duncan Murdoch and Achim Zeileis published a nice article in The R Journal explaining the roles of R package authors and where they come from. Briefly, they come from the “Relator and Role” codes and terms from MARC (MAchine-Readable Cataloging, Library of Congress, 2012) here: https://www.loc.gov/marc/relators/relaterm.html.

There are a lot of roles there; I just took the ones that were present in the data at the time I wrote this post.

marc_roles <- c(
  analyst = "anl",
  architecht = "arc",
  artist = "art",
  author = "aut",
  author_in_quotations = "aqt",
  author_of_intro = "aui",
  bibliographic_antecedent = "ant",
  collector = "col",
  compiler = "com",
  conceptor = "ccp",
  conservator = "con",
  consultant = "csl",
  consultant_to_project = "csp",
  contestant_appellant = "cot",
  contractor = "ctr",
  contributor = "ctb",
  copyright_holder = "cph",
  corrector = "crr",
  creator = "cre",
  data_contributor = "dtc",
  degree_supervisor = "dgs",
  editor = "edt",
  funder = "fnd",
  illustrator = "ill",
  inventor = "inv",
  lab_director = "ldr",
  lead = "led",
  metadata_contact = "mdc",
  musician = "mus",
  owner = "own",
  presenter = "pre",
  programmer = "prg",
  project_director = "pdr",
  scientific_advisor = "sad",
  second_party = "spy",
  sponsor = "spn",
  supporting_host = "sht",
  teacher = "tch",
  thesis_advisor = "ths",
  translator = "trl",
  research_team_head = "rth",
  research_team_member = "rtm",
  researcher = "res",
  reviewer = "rev",
  witness = "wit",
  woodcutter = "wdc"
)

Tidying the data

With all the explanations out of the way we can now tidy the data with our helper functions.

cran_authors <- cran_pkg_db |>
  mutate(
    # Letters with accents, etc. should be normalized so that names including
    # them are picked up by the regex.
    across(c(authors, authors_r), \(.x) stri_trans_general(.x, "latin-ascii")),
    # The extraction functions aren't vectorized so they have to be mapped over.
    # This creates a list column.
    persons = if_else(
      is.na(authors_r),
      map(authors, \(.x) authors(.x)),
      map(authors_r, \(.x) authors_r(.x))
    )
  ) |>
  select(-c(authors, authors_r)) |>
  unnest(persons) |>
  # If a package only has one author then they must be the author and creator,
  # so it's safe to impute this when it isn't there.
  group_by(package) |>
  mutate(roles = if_else(
    is.na(roles) & n() == 1, "[aut, cre]", roles
  )) |>
  ungroup()

Then add the indicator columns for roles. Note the use of the walrus operator (:=) here to create new columns from the full names of MARC roles on the left side of the walrus, while detecting the MARC codes with str_detect() on the right side. I’m mapping over this because the left side can’t be a vector.

cran_authors_tidy <- cran_authors |>
  # Add indicator columns for all roles.
  bind_cols(
    map2_dfc(
      names(marc_roles), marc_roles,
      function(.x, .y) {
        cran_authors |>
          mutate(!!.x := str_detect(roles, .y)) |>
          select(!!.x)
      }
    )
  ) |>
  # Not everyone's role is known.
  mutate(unknown = is.na(roles))
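The indicator-column logic itself is just substring detection. Here is a hypothetical mini-version in Python with made-up rows (with the same caveat as str_detect(): a short code could in principle match inside a longer one):

```python
# One boolean column per role, flagged by checking whether the MARC code
# appears in the person's roles string. Data and names are made up.
marc_roles = {"author": "aut", "creator": "cre", "contributor": "ctb"}

rows = [
    {"person": "A. Author", "roles": "[aut, cre]"},
    {"person": "B. Helper", "roles": "[ctb]"},
    {"person": "C. Mystery", "roles": None},  # role unknown
]

for row in rows:
    for role_name, code in marc_roles.items():
        # Missing roles propagate as None, mirroring str_detect(NA, ...)
        row[role_name] = (code in row["roles"]) if row["roles"] is not None else None
    row["unknown"] = row["roles"] is None
```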

This all leaves us with a tidy (mostly error free) data frame about R developers and their roles that is ready to explore:

glimpse(cran_authors_tidy)
#> Rows: 52,719
#> Columns: 50
 

R developer statistics

I’ll start with person-level stats, mainly because some of the other stats are further summaries of these statistics. Nothing fancy here, just the number of packages a person has contributed to, role counts, and nominal and percentile rankings. Both the ranking methods used here give every tie the same (smallest) value, so if two people tied for second place both their ranks would be 2, and the next person’s rank would be 4.
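Those tie semantics are easy to check with a small sketch (in Python, not the post's code): min_rank() assigns each tie the smallest rank in its group, and percent_rank() is equivalent to (min_rank - 1) / (n - 1).

```python
def min_rank_desc(values):
    """dplyr-style min_rank(desc(x)): ties share the smallest rank."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]

def percent_rank(values):
    """dplyr-style percent_rank(x): (min_rank(x) - 1) / (n - 1)."""
    ascending = sorted(values)
    n = len(values)
    # index() returns the first (smallest) position, i.e. min_rank(x) - 1
    return [ascending.index(v) / (n - 1) for v in values]

pkgs = [10, 7, 7, 3]
print(min_rank_desc(pkgs))  # → [1, 2, 2, 4]
print(percent_rank(pkgs))
```

Note the two people tied on 7 packages both get rank 2, and the next rank is 4, exactly as described above.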

cran_author_pkg_counts <- cran_authors_tidy |>
  group_by(person) |>
  summarise(
    n_packages = n(),
    across(analyst:unknown, function(.x) sum(.x, na.rm = TRUE))
  ) |>
  mutate(
    # Discretizing this for visualization purposes later on
    n_pkgs_fct = case_when(
      n_packages == 1 ~ "One",
      n_packages == 2 ~ "Two",
      n_packages == 3 ~ "Three",
      n_packages >= 4 ~ "Four+"
    ),
    n_pkgs_fct = factor(n_pkgs_fct, levels = c("One", "Two", "Three", "Four+")),
    rank = min_rank(desc(n_packages)),
    percentile = percent_rank(n_packages) * 100,
    .after = n_packages
  ) |>
  arrange(desc(n_packages))

Here’s an interactive gt table of the person-level stats so you can find yourself, or ask silly questions like how many other authors share a name with you. If you page or search through it you can also get an idea of the data quality (e.g., try “Posit” under the person column and you’ll see that they don’t use a consistent organization name across all packages, which creates some measurement error here).

cran_author_pkg_counts |>
  select(-n_pkgs_fct) |>
  gt() |>
  tab_header(
    title = "R Developer Contributions",
    subtitle = "CRAN Package Authorships and Roles"
  ) |>
  text_transform(
    \(.x) str_to_title(str_replace_all(.x, "_", " ")),
    locations = cells_column_labels()
  ) |>
  fmt_number(
    columns = percentile
  ) |>
  fmt(
    columns = rank,
    fns = \(.x) label_ordinal()(.x)
  ) |>
  cols_width(everything() ~ px(120)) |>
  opt_interactive(use_sorting = FALSE, use_filters = TRUE)

Package contributions

The title of this post probably gave this away, but around 90% of R developers have worked on one to three packages, and only around 10% have worked on four or more packages.

cran_author_pkg_counts |>
  group_by(n_pkgs_fct) |>
  summarise(n_people = n()) |>
  ggplot(mapping = aes(x = n_pkgs_fct, y = n_people)) +
    geom_col() +
    scale_y_continuous(
      sec.axis = sec_axis(
        trans = \(.x) .x / nrow(cran_author_pkg_counts),
        name = "Percent of sample",
        labels = label_percent(),
        breaks = c(0, .05, .10, .15, .70)
      )
    ) +
    labs(
      x = "Package contributions",
      y = "People"
    )

Notably, in the group that have worked on four or more packages, the spread of package contributions is huge. This vast range is mostly driven by people who do R package development as part of their job (e.g., if you look at the cran_author_pkg_counts table above, most of the people at the very top are either professors of statistics or current or former developers from Posit, rOpenSci, or the R Core Team).

cran_author_pkg_counts |>
  filter(n_pkgs_fct == "Four+") |>
  group_by(rank, n_packages) |>
  summarise(n_people = n()) |>
  ggplot(mapping = aes(x = n_packages, y = n_people)) +
    geom_segment(aes(xend = n_packages, yend = 0)) +
    geom_point() +
    scale_y_continuous(
      sec.axis = sec_axis(
        trans = \(.x) .x / nrow(cran_author_pkg_counts),
        name = "Percent of sample",
        labels = label_percent()
      )
    ) +
    labs(
      x = "Package contributions",
      y = "People"
    )

Here are some subsample summary statistics to complement the plots above.

cran_author_pkg_counts |>
  group_by(n_packages >= 4) |>
  summarise(
    n_developers = n(),
    n_pkgs_mean = mean(n_packages),
    n_pkgs_sd = sd(n_packages),
    n_pkgs_median = median(n_packages),
    n_pkgs_min = min(n_packages),
    n_pkgs_max = max(n_packages)
  )
#> # A tibble: 2 × 7
 

Not every contribution to an R package involves code. For example, two authors of the wiad package were woodcutters! The package is for wood image analysis, so although it’s surprising a role like that exists, it makes a lot of sense in context. Anyways, neat factoids aside, the point of this section is to look at the distribution of different roles in R package development.

To start, let’s get an idea of how many people were involved in programming-related roles. This won’t be universally true, but most of the time the following roles will involve programming:

programming_roles <-
  c("author", "creator", "contributor", "compiler", "programmer")

Here’s the count:

cran_author_pkg_counts |>
  filter(if_any(!!programming_roles, \(.x) .x > 0)) |>
  nrow()
#> [1] 24170

There were also 5434 people whose role was unknown (either because it wasn’t specified or because it wasn’t picked up by my regex method). Regardless, most people have been involved in programming-related roles; other roles occur, but they’re relatively rare.

Here’s a plot to complement this point:

cran_authors_tidy |>
  summarise(across(analyst:unknown, function(.x) sum(.x, na.rm = TRUE))) |>
  pivot_longer(cols = everything(), names_to = "role", values_to = "n") |>
  arrange(desc(n)) |>
  ggplot(mapping = aes(x = n, y = reorder(role, n))) +
    geom_segment(aes(xend = 0, yend = role)) +
    geom_point() +
    labs(
      x = "Count across packages",
      y = "Role"
    )

This is why Hadley is on the cover of Glamour magazine and we’re not.

cran_author_pkg_counts |>
  # We don't want organizations or groups here
  filter(!(person %in% c("RStudio", "R Core Team", "Posit Software, PBC"))) |>
  head(20) |>
  select(person, n_packages) |>
  gt() |>
  tab_header(
    title = "Top 20 R Developers",
    subtitle = "Based on number of CRAN package authorships"
  ) |>
  text_transform(
    \(.x) str_to_title(str_replace_all(.x, "_", " ")),
    locations = cells_column_labels()
  ) |>
  cols_width(person ~ px(140))


Thanks for reading! I’m Michael, the voice behind Tidy Tales. I am an award winning data scientist and R programmer with the skills and experience to help you solve the problems you care about. You can learn more about me, my consulting services, and my other projects on my personal website.

Comments

Session Info

Citation

For attribution, please cite this work as:
McCarthy, M. (2023, May 3). The Pareto Principle in R package development. https://tidytales.ca/posts/2023-05-03_r-developers
Artwork

Reuse

Citation

For attribution, please cite this work as:
McCarthy, M. (2023, January 24). Reproducible Data Science. https://tidytales.ca/series/2023-01-24_reproducible-data-science
Prerequisites

To access the datasets, help pages, and functions that we will use in this code snippet, load the following packages:

library(ggplot2)
library(ggdist)
library(palettes)
library(forcats)
ggplot(likert_scores, aes(x = score, y = item)) +
  stat_slab(
    aes(fill = cut(after_stat(x), breaks = breaks(x))),
    justification = -.2,
    height = 0.7,
    slab_colour = "black",
    slab_linewidth = 0.5,
    trim = TRUE
  ) +
  geom_boxplot(
    width = .2,
    outlier.shape = NA
  ) +
  geom_jitter(width = .1, height = .1, alpha = .3) +
  scale_fill_manual(
    values = pal_ramp(met_palettes$Hiroshige, 5, -1),
    labels = 1:5,
    guide = guide_legend(title = "score", reverse = TRUE)
  )

ggplot(likert_scores, aes(x = score, y = item)) +
  stat_slab(
    justification = -.2,
    height = 0.7,
    slab_colour = "black",
    slab_linewidth = 0.5,
    trim = FALSE
  ) +
  geom_boxplot(
    width = .2,
    outlier.shape = NA
  ) +
  geom_jitter(width = .1, height = .1, alpha = .3) +
  scale_x_continuous(breaks = 1:5)
trim = FALSE

Histogram raincloud plots

First make some data.

set.seed(123)

likert_scores <- data.frame(
  item = rep(letters[1:2], times = 33),
  score = sample(1:5, 66, replace = TRUE)
)
It’s straightforward to make density histograms for each item with ggplot2.

ggplot(likert_scores, aes(x = score, y = after_stat(density))) +
  geom_histogram(
    aes(fill = after_stat(x)),
    bins = 5,
    colour = "black"
  ) +
  scale_fill_gradientn(
    colours = pal_ramp(met_palettes$Hiroshige, 5, -1),
    guide = guide_legend(title = "score", reverse = TRUE)
  ) +
  facet_wrap(vars(fct_rev(item)), ncol = 1)

However, the density histograms in this plot can’t be vertically justified to give space for the box and whiskers plot and points used in a typical raincloud plot. For that we need the stat_slab() function from the ggdist package and a small helper function to determine where to put breaks in the histogram.

#' Set breaks so bins are centred on each score
#'
#' @param x A vector of values.
#' @param width Any value between 0 and 0.5 for setting the width of the bins.
breaks <- function(x, width = 0.49999999) {
  rep(1:max(x), each = 2) + c(-width, width)
}
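As a cross-check on what breaks() produces, here is the same edge computation in Python (using a width of 0.5 for readability; the R helper's 0.49999999, just under 0.5, presumably keeps adjacent bin edges from coinciding exactly):

```python
# Each score k gets a bin spanning [k - width, k + width], so the edge list
# comes in pairs centred on the integer scores, mirroring
# rep(1:max(x), each = 2) + c(-width, width) in the R helper above.
def breaks(max_score, width=0.5):
    edges = []
    for k in range(1, max_score + 1):
        edges.extend([k - width, k + width])
    return edges

print(breaks(3))  # → [0.5, 1.5, 1.5, 2.5, 2.5, 3.5]
```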

The default slab type for stat_slab() is a probability density (or mass) function ("pdf"), but it can also calculate density histograms ("histogram"). To match the appearance of geom_histogram(), the breaks argument needs to be given the location of each bin’s left and right edge; this also necessitates using cut() with the fill aesthetic so the fill breaks correctly align with each bin.

ggplot(likert_scores, aes(x = score, y = item)) +
  stat_slab(
    # Divide fill into five equal bins
    aes(fill = cut(after_stat(x), breaks = 5)),
    slab_type = "histogram",
    breaks = \(x) breaks(x),
    # Justify the histogram upwards
    justification = -.2,
    # Reduce the histogram's height so it doesn't cover geoms from other items
    height = 0.7,
    # Add black outlines because they look nice
    slab_colour = "black",
    outline_bars = TRUE,
    slab_linewidth = 0.5
  ) +
  geom_boxplot(
    width = .2,
    # Hide outliers since the raw data will be plotted
    outlier.shape = NA
  ) +
  geom_jitter(width = .1, height = .1, alpha = .3) +
  # Cutting the fill into bins puts it on a discrete scale
  scale_fill_manual(
    values = pal_ramp(met_palettes$Hiroshige, 5, -1),
    labels = 1:5,
    guide = guide_legend(title = "score", reverse = TRUE)
  )

Comments

Session Info

Reuse

Citation

For attribution, please cite this work as:
McCarthy, M. (2023, January 19). Histogram raincloud plots. https://tidytales.ca/snippets/2023-01-19_ggdist-histogram-rainclouds

Prerequisites

To access the datasets, help pages, and functions that we will use in this code snippet, load the following packages:

library(ggplot2)
library(patchwork)

Then make some data and ggplot2 plots to be used in the patchwork.

huron <- data.frame(year = 1875:1972, level = as.vector(LakeHuron))
h <- ggplot(huron, aes(year))

h1 <- h +
  geom_ribbon(aes(ymin = level - 1, ymax = level + 1), fill = "grey70") +
  geom_line(aes(y = level))

h2 <- h + geom_area(aes(y = level))

Shared x-axis labels

We set the bottom margin to 0 so the tag is in the same vertical position that the x-axis would otherwise be in.

# Create the patchwork, dropping the x-axis labels from the plots, and setting
# the margins
h_patch <- h1 + h2 & xlab(NULL) & theme(plot.margin = margin(5.5, 5.5, 0, 5.5))

# Use the tag label as an x-axis label
wrap_elements(panel = h_patch) +
  labs(tag = "year") +
  theme(
    plot.tag = element_text(size = rel(1)),
    plot.tag.position = "bottom"
  )
Shared y-axis labels

We set the left margin to 0 so the tag is in the same horizontal position that the y-axis would otherwise be in.

# Create the patchwork, dropping the y-axis labels from the plots, and setting
# the margins
h_patch <- h1 / h2 & ylab(NULL) & theme(plot.margin = margin(5.5, 5.5, 5.5, 0))

# Use the tag label as a y-axis label
wrap_elements(h_patch) +
  labs(tag = "level") +
  theme(
    plot.tag = element_text(size = rel(1), angle = 90),
    plot.tag.position = "left"
  )

Shared axis labels without using patchwork

Elio Campitelli shared a solution on Mastodon that accomplishes the same results as above, but without patchwork. It uses the magic tilde notation to create functions in the data argument of each geom that adds a grouping variable var that can be faceted on.

h <- ggplot(huron, aes(year)) +
  geom_ribbon(
    data = ~ transform(.x, var = "a"),
    aes(ymin = level - 1, ymax = level + 1),
    fill = "grey70"
  ) +
  geom_line(data = ~ transform(.x, var = "a"), aes(y = level)) +
  geom_area(data = ~ transform(.x, var = "b"), aes(y = level)) +
  # Since we don't care about the facet strips here, we can remove them.
  theme(
    strip.text = element_blank(),
    strip.background = element_blank()
  )

Facet by rows for a shared x-axis.

h +
  facet_wrap(vars(var), scales = "free_y")

Facet by columns for a shared y-axis.

-
h +
-  facet_wrap(vars(var), scales = "free_y", ncol = 1)
+
h +
+  facet_wrap(vars(var), scales = "free_y", ncol = 1)

@@ -2456,7 +4559,7 @@ plots.

+

Comments

+

Comments

@@ -2480,7 +4583,7 @@ plots.

Session Info

+

Session Info

Reuse

Citation

BibTeX citation:
@online{mccarthy2022,
-  author = {Michael McCarthy},
-  title = {Shared Axis Labels in Patchwork Plots},
-  date = {2022-12-22},
-  url = {https://tidytales.ca/snippets/2022-12-22_patchwork-shared-axis-labels},
-  langid = {en}
-}
-
For attribution, please cite this work as:
-Michael McCarthy. (2022, December 22). Shared axis labels in patchwork plots. https://tidytales.ca/snippets/2022-12-22_patchwork-shared-axis-labels
+

Reuse

Citation

For attribution, please cite this work as:
+McCarthy, M. (2022, December 22). Shared axis labels in patchwork plots. https://tidytales.ca/snippets/2022-12-22_patchwork-shared-axis-labels
]]>
.Visualize {ggplot2}
@@ -2579,7 +4675,11 @@ $(function(){
  • To make it easy for anyone to make their own colour palette package. Colour palette packages made with palettes exist solely for the purpose of distributing colour palettes and get access to all the features of palettes for free.

  • If you just want to jump in and start using palettes, you can install it from CRAN with:

    -
    install.packages("palettes")
    +
    install.packages("palettes")

    The package website is the best place to start: https://mccarthy-m-g.github.io/palettes/index.html

    If you want to learn more about why you should be using palettes, read on to learn more about the motivation of the package and how it makes working with colour vectors and colour palettes easy and fun for everyone.

    @@ -2589,7 +4689,7 @@ $(function(){

    -

    Hellboy promo poster by Mike Mignola

    +
    Hellboy promo poster by Mike Mignola

    So how does one make a colour palette package in R? My answer now is to read the Creating a colour palette package vignette and make it with palettes. My answer then was to read the source code of several other colour palette packages, then reimplement the relevant functions in BPRDcolours. Not a great answer, but it’s the approach everyone else was using.

    @@ -2605,7 +4705,9 @@ $(function(){

    Just show me some colour palettes already!

    Okay, okay.

    -
    library(palettes)
    +
    library(palettes)

    Colour classes in palettes come in two forms:

      @@ -2615,7 +4717,23 @@ $(function(){

      Colour vectors can be thought of as a base type for colours, and colour palettes are just (named) lists of colour vectors. To illustrate, let’s use some colours from the MetBrewer package.

      pal_colour() is a nice way to create a colour vector.

      -
      java <- pal_colour(c("#663171", "#cf3a36", "#ea7428", "#e2998a", "#0c7156"))
      +
      java <- pal_colour(c("#663171", "#cf3a36", "#ea7428", "#e2998a", "#0c7156"))
       java
      #> <palettes_colour[5]>
       #>  #663171
      @@ -2627,9 +4745,27 @@ $(function(){
       

      pal_palette() is a nice way to create named colour palettes.

      -
      metbrewer_palettes <- pal_palette(
      -  egypt = c("#dd5129", "#0f7ba2", "#43b284", "#fab255"),
      -  java  = java
      +
      metbrewer_palettes <- pal_palette(
      +  egypt = c("#dd5129", "#0f7ba2", "#43b284", "#fab255"),
      +  java  = java
       )
       metbrewer_palettes
      #> <palettes_palette[2]>
      @@ -2651,7 +4787,9 @@ $(function(){
       

      plot() is a nice way to showcase colour vectors and colour palettes. The appearance of the plot depends on the input.

      -
      plot(metbrewer_palettes)
      +
      plot(metbrewer_palettes)

      @@ -2659,7 +4797,11 @@ $(function(){

      Casting and coercion methods are also available to turn other objects (like character vectors or lists) into colour vectors and colour palettes.
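For example, a minimal sketch of the character-vector and list cases (the as_pal_colour() and as_pal_palette() names here are taken from the palettes reference, so treat this as illustrative rather than exhaustive):

```r
library(palettes)

# Assumed API: as_pal_colour() casts a character vector of hex codes to a
# colour vector; as_pal_palette() casts a named list to a colour palette
as_pal_colour(c("#dd5129", "#0f7ba2"))
as_pal_palette(list(egypt = c("#dd5129", "#0f7ba2", "#43b284", "#fab255")))
```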

      You can even cast colour vectors and colour palettes into tibbles.

      -
      metbrewer_tbl <- as_tibble(metbrewer_palettes)
      +
      metbrewer_tbl <- as_tibble(metbrewer_palettes)
       metbrewer_tbl
      #> # A tibble: 9 × 2
       #>   palette colour   
      @@ -2677,9 +4819,19 @@ $(function(){
       

      This is useful if you want to wrangle the colours with dplyr.

      -
      library(dplyr)
      +
      library(dplyr)
       
      -metbrewer_tbl <- slice(metbrewer_tbl, -8)
      +metbrewer_tbl <- slice(metbrewer_tbl, -8)
       metbrewer_tbl
      #> # A tibble: 8 × 2
       #>   palette colour   
      @@ -2696,12 +4848,28 @@ $(function(){
       

      Then go back to a colour palette with the deframe() function from tibble.

      -
      library(tibble)
      +
      library(tibble)
       
      -metbrewer_tbl %>%
      -  group_by(palette) %>%
      -  summarise(pal_palette(colour)) %>%
      -  deframe()
+metbrewer_tbl %>%
+  group_by(palette) %>%
+  summarise(pal_palette(colour)) %>%
+  deframe()
      #> <palettes_palette[2]>
       #> $egypt
       #> <palettes_colour[4]>
      @@ -2723,17 +4891,67 @@ $(function(){
       

      What about ggplot2 plots?

      Just use one of the scale_ functions!

      -
      library(ggplot2)
      +
      library(ggplot2)
       
      -hiroshige <- pal_colour(c(
      -  "#1e466e", "#376795", "#528fad", "#72bcd5", "#aadce0",
      -  "#ffe6b7", "#ffd06f", "#f7aa58", "#ef8a47", "#e76254"
      +hiroshige <- pal_colour(c(
      +  "#1e466e", "#376795", "#528fad", "#72bcd5", "#aadce0",
      +  "#ffe6b7", "#ffd06f", "#f7aa58", "#ef8a47", "#e76254"
       ))
       
      -ggplot(faithfuld, aes(waiting, eruptions, fill = density)) +
      -  geom_raster() +
      -  coord_cartesian(expand = FALSE) +
      -  scale_fill_palette_c(hiroshige)
+ggplot(faithfuld, aes(waiting, eruptions, fill = density)) +
+  geom_raster() +
+  coord_cartesian(expand = FALSE) +
+  scale_fill_palette_c(hiroshige)
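The same idea applies to discrete aesthetics. A small sketch, assuming the discrete scale is named scale_colour_palette_d() as in the palettes reference, and reusing the java palette created earlier in the post:

```r
library(palettes)
library(ggplot2)

# The java colour vector defined earlier in the post
java <- pal_colour(c("#663171", "#cf3a36", "#ea7428", "#e2998a", "#0c7156"))

# drv has three levels, so the five java colours are more than enough
ggplot(mpg, aes(displ, hwy, colour = drv)) +
  geom_point() +
  scale_colour_palette_d(java)
```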

      @@ -2755,15 +4973,12 @@ $(function(){
    • Make a better hex sticker (looking for help on this one!)

    If you have other suggestions or requests, please file an issue on GitHub.

    -
    - -
    -

    +

    @@ -2778,7 +4993,7 @@ $(function(){

Thanks for reading! I’m Michael, the voice behind Tidy Tales. I am an award-winning data scientist and R programmer with the skills and experience to help you solve the problems you care about. You can learn more about me, my consulting services, and my other projects on my personal website.

    -

    Comments

    +

    Comments

    @@ -2787,7 +5002,7 @@ $(function(){
    -

    Session Info

    +

    Session Info

@@ -2837,15 +5052,8 @@ Any of the trademarks, service marks, collective marks, design rights or similar
-

    Citation

    BibTeX citation:
    @online{mccarthy2022,
    -  author = {Michael McCarthy},
    -  title = {Introducing the Palettes Package},
    -  date = {2022-12-20},
    -  url = {https://tidytales.ca/posts/2022-12-20_palettes},
    -  langid = {en}
    -}
    -
    For attribution, please cite this work as:
-Michael McCarthy. (2022, December 20). Introducing the palettes package. https://tidytales.ca/posts/2022-12-20_palettes
+

    Citation

    For attribution, please cite this work as:
    +McCarthy, M. (2022, December 20). Introducing the palettes package. https://tidytales.ca/posts/2022-12-20_palettes
]]>
.Visualize
@@ -2870,38 +5078,170 @@ package. https://t

    Prerequisites

    To access the datasets, help pages, and functions that we will use in this code snippet, load the following packages:

    -
    library(tidyverse)
    -library(ggdist)
    -library(geomtextpath)
    +
    library(tidyverse)
    +library(ggdist)
    +library(geomtextpath)

    Directly labeling lineribbons

    First make some data.

    -
    set.seed(1234)
    -n = 5000
    +
    set.seed(1234)
    +n = 5000
     
    -df <- tibble(
    -  .draw = 1:n,
    -  intercept = rnorm(n, 3, 1),
    -  slope = rnorm(n, 1, 0.25),
    -  x = list(-4:5),
    -  y = map2(intercept, slope, ~ .x + .y * -4:5)
    -) %>%
    -  unnest(c(x, y))
+df <- tibble(
+  .draw = 1:n,
+  intercept = rnorm(n, 3, 1),
+  slope = rnorm(n, 1, 0.25),
+  x = list(-4:5),
+  y = map2(intercept, slope, ~ .x + .y * -4:5)
+) %>%
+  unnest(c(x, y))

    Then plot it.

    -
    df %>%
    -  group_by(x) %>%
    -  median_qi(y, .width = c(.50, .80, .95)) %>%
    -  ggplot(aes(x = x, y = y, ymin = .lower, ymax = .upper)) +
    -  # Hide the line from geom_lineribbon() by setting `size = 0`
    -  geom_lineribbon(size = 0) +
    -  scale_fill_brewer() +
    -  # Replace the hidden line with a labelled line
    -  geom_textline(label = "label")
    +
    df %>%
    +  group_by(x) %>%
    +  median_qi(y, .width = c(.50, .80, .95)) %>%
    +  ggplot(aes(x = x, y = y, ymin = .lower, ymax = .upper)) +
    +  # Hide the line from geom_lineribbon() by setting `size = 0`
    +  geom_lineribbon(size = 0) +
    +  scale_fill_brewer() +
    +  # Replace the hidden line with a labelled line
    +  geom_textline(label = "label")

    @@ -2911,7 +5251,7 @@ package.
https://t
-
]]>
.Visualize
@@ -3026,7 +5359,7 @@ lineribbons with geomtextpath.

    -

    In-game screenshot from Borderlands 3.

    +
    In-game screenshot from Borderlands 3.
    @@ -3045,7 +5378,7 @@ lineribbons with geomtextpath.

    -

    The ECHO-3 in-game menu in Borderlands 3.

    +
    The ECHO-3 in-game menu in Borderlands 3.
    @@ -3057,7 +5390,7 @@ lineribbons with geomtextpath.

    -

    Loot in Borderlands (guns, grenades, shields, etc.) are colour categorized by the rarity with which they can be found in containers or dropped by defeated enemies. From left to right the categories are: Common, Uncommon, Rare, Epic, Legendary.

    +
Loot in Borderlands (guns, grenades, shields, etc.) is colour-categorized by the rarity with which it can be found in containers or dropped by defeated enemies. From left to right the categories are: Common, Uncommon, Rare, Epic, Legendary.
    @@ -3077,16 +5410,38 @@ lineribbons with geomtextpath.

    Prerequisites

    -
    library(tidyverse)
    -library(glue)
    -library(lubridate)
    -library(magick)
    -library(ggdist)
    +
    library(tidyverse)
    +library(glue)
    +library(lubridate)
    +library(magick)
    +library(ggdist)

    I’ll be using Steam player data for my plot. The data contains statistics for the average and peak number of players playing a variety of games each month from July 2012 to February 2022. You can download this data with the Data Source code in the appendix, or from Tidy Tuesday with tidytuesdayR::tt_load("2021-03-16").

    -
    # Load the weekly data
    -games <- read_csv(here("data", "2021-03-16_games.csv"))
    +
    # Load the weekly data
    +games <- read_csv(here("data", "2021-03-16_games.csv"))
     games
@@ -3102,11 +5457,27 @@ lineribbons with geomtextpath.

Wrangle

    I only want data from the mainline Borderlands titles for my plot, so let’s get those.

    -
    # Filter to mainline Borderlands titles available in the data. The first game
    -# is not available in the dataset so filtering based on the title and digit
    -# works fine here.
    -borderlands <- games %>%
    -  filter(str_detect(gamename, "Borderlands[[:space:]][[:digit:]]"))
    +
    # Filter to mainline Borderlands titles available in the data. The first game
    +# is not available in the dataset so filtering based on the title and digit
    +# works fine here.
    +borderlands <- games %>%
    +  filter(str_detect(gamename, "Borderlands[[:space:]][[:digit:]]"))
     
     borderlands
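To see why matching on the title plus a digit is enough here, a quick check of the pattern (the spin-off titles below are illustrative, not taken from the dataset):

```r
library(stringr)

# Space followed by a digit matches the mainline titles only
str_detect(
  c("Borderlands 2", "Borderlands 3", "Borderlands GOTY",
    "Borderlands: The Pre-Sequel"),
  "Borderlands[[:space:]][[:digit:]]"
)
#> [1]  TRUE  TRUE FALSE FALSE
```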
@@ -3120,10 +5491,24 @@ lineribbons with geomtextpath.
-
    # Summarize how much data exists for each Borderlands title
    -borderlands %>%
    -  group_by(gamename) %>%
    -  summarise(count = n())
    +
    # Summarize how much data exists for each Borderlands title
    +borderlands %>%
    +  group_by(gamename) %>%
    +  summarise(count = n())
@@ -3135,20 +5520,80 @@ lineribbons with geomtextpath.
-
    # Wrangle date data into a date-time object to prepare for filtering
    -borderlands <- borderlands %>%
    -  mutate(date = glue("{year}-{month}"),
    -         date = parse_date_time(date, "ym"),
    -         .after = gamename)
    +
    # Wrangle date data into a date-time object to prepare for filtering
    +borderlands <- borderlands %>%
    +  mutate(date = glue("{year}-{month}"),
    +         date = parse_date_time(date, "ym"),
    +         .after = gamename)
     
    -# Filter Borderlands 2 data down to only its first year of release to make
    -# comparisons with Borderlands 3 more appropriate. There is no need to filter
    -# by date for Borderlands 3 since only its first year of data are available in
    -# the dataset.
    -borderlands <- borderlands %>%
    -  filter(gamename == "Borderlands 2" &
    -         date %within% interval(ymd("2012--09-01"), ymd("2013--08-01")) |
    -         gamename == "Borderlands 3") 
    +# Filter Borderlands 2 data down to only its first year of release to make
    +# comparisons with Borderlands 3 more appropriate. There is no need to filter
    +# by date for Borderlands 3 since only its first year of data are available in
    +# the dataset.
    +borderlands <- borderlands %>%
    +  filter(gamename == "Borderlands 2" &
    +         date %within% interval(ymd("2012--09-01"), ymd("2013--08-01")) |
    +         gamename == "Borderlands 3") 
     
     borderlands
@@ -3162,10 +5607,32 @@ lineribbons with geomtextpath.
-
    # This code is sufficient since the data is in reverse chronological order.
    -borderlands <- borderlands %>%
    -  group_by(gamename) %>%
    -  mutate(since_release = 11:0, .after = month)
    +
    # This code is sufficient since the data is in reverse chronological order.
    +borderlands <- borderlands %>%
    +  group_by(gamename) %>%
    +  mutate(since_release = 11:0, .after = month)
     
     borderlands
@@ -3179,8 +5646,16 @@ lineribbons with geomtextpath.
-
    borderlands %>%
    -  summarise(quantile = quantile(peak))
    +
    borderlands %>%
    +  summarise(quantile = quantile(peak))
@@ -3192,13 +5667,73 @@ lineribbons with geomtextpath.
-
    borderlands <- borderlands %>%
    -  mutate(rarity = case_when(
    -    between(peak, 0, 19999) ~ "white",
    -    between(peak, 20000, 39999) ~ "green",
    -    between(peak, 40000, 59999) ~ "blue",
    -    between(peak, 60000, 79999) ~ "purple",
    -    between(peak, 80000, 150000) ~ "orange"
    +
    borderlands <- borderlands %>%
    +  mutate(rarity = case_when(
    +    between(peak, 0, 19999) ~ "white",
    +    between(peak, 20000, 39999) ~ "green",
    +    between(peak, 40000, 59999) ~ "blue",
    +    between(peak, 60000, 79999) ~ "purple",
    +    between(peak, 80000, 150000) ~ "orange"
       ))
    @@ -3208,23 +5743,123 @@ lineribbons with geomtextpath.
    -
    outline_plot <- ggplot(borderlands, aes(since_release, peak)) +
    -  facet_wrap(vars(gamename)) +
    -  labs(
    -    x = "Months Since Release",
    -    y = "Peak Player Count",
    -    title = "Peak players in Borderlands drop faster\nthan common loot",
    -    caption = "Source: Steam / Graphic: Michael McCarthy"
    -  ) +
    -  theme_bw() +
    -  theme(
    -    text = element_text(family = "Compacta Bold", colour = "black"),
    -    axis.text = element_text(colour = "black"),
    -    axis.line = element_blank(),
    -    panel.grid.major = element_blank(), 
    -    panel.grid.minor = element_blank(),
    -    panel.border = element_rect(colour = "black", fill = NA),
    -    strip.background = element_rect(fill = "white", colour = "black")
    +
    outline_plot <- ggplot(borderlands, aes(since_release, peak)) +
    +  facet_wrap(vars(gamename)) +
    +  labs(
    +    x = "Months Since Release",
    +    y = "Peak Player Count",
    +    title = "Peak players in Borderlands drop faster\nthan common loot",
    +    caption = "Source: Steam / Graphic: Michael McCarthy"
    +  ) +
    +  theme_bw() +
    +  theme(
    +    text = element_text(family = "Compacta Bold", colour = "black"),
    +    axis.text = element_text(colour = "black"),
    +    axis.line = element_blank(),
    +    panel.grid.major = element_blank(), 
    +    panel.grid.minor = element_blank(),
    +    panel.border = element_rect(colour = "black", fill = NA),
    +    strip.background = element_rect(fill = "white", colour = "black")
       )
     
     outline_plot
    @@ -3234,28 +5869,140 @@ lineribbons with geomtextpath.
    -
    blue <- "#08283c"
    -light_blue <- "#115190"
    -baby_blue <- "#a7e5ff"
    -indigo <- "#cef8ff"
    +
    blue <- "#08283c"
    +light_blue <- "#115190"
    +baby_blue <- "#a7e5ff"
    +indigo <- "#cef8ff"
     
    -colour_plot <- outline_plot +
    -  theme(
    -    text = element_text(colour = "white"),
    -    # Axis
    -    axis.text = element_text(colour = indigo),
    -    axis.ticks = element_line(colour = light_blue),
    -    # Panel
    -    panel.grid.major.y = element_line(colour = light_blue),
    -    panel.grid.minor.y = element_line(colour = light_blue),
    -    panel.border = element_rect(colour = light_blue, fill = NA),
    -    panel.background = element_rect(fill = blue, colour = light_blue),
    -    # Plot
    -    plot.title = element_text(colour = "#fff01a"),
    -    plot.background = element_rect(fill = "pink"),
    -    # Strip
    -    strip.text = element_text(colour = baby_blue),
    -    strip.background = element_rect(fill = "#00378f", colour = light_blue)
    +colour_plot <- outline_plot +
    +  theme(
    +    text = element_text(colour = "white"),
    +    # Axis
    +    axis.text = element_text(colour = indigo),
    +    axis.ticks = element_line(colour = light_blue),
    +    # Panel
    +    panel.grid.major.y = element_line(colour = light_blue),
    +    panel.grid.minor.y = element_line(colour = light_blue),
    +    panel.border = element_rect(colour = light_blue, fill = NA),
    +    panel.background = element_rect(fill = blue, colour = light_blue),
    +    # Plot
    +    plot.title = element_text(colour = "#fff01a"),
    +    plot.background = element_rect(fill = "pink"),
    +    # Strip
    +    strip.text = element_text(colour = baby_blue),
    +    strip.background = element_rect(fill = "#00378f", colour = light_blue)
       )
     
     colour_plot
    @@ -3265,102 +6012,732 @@ lineribbons with geomtextpath.
    -
    # Create plot used for the outline
    -file <- tempfile(fileext = '.png')
    -ragg::agg_png(file, width = 1920, height = 1200, res = 300, units = "px", scaling = 0.5)
    -outline_plot +
    -  geom_col(fill = "white") +
    -  stat_ccdfinterval(fill = "white", point_alpha = 0) +
    -  theme(
    -    # Axis
    -    axis.title = element_text(size = 36),
    -    axis.text = element_text(size = 28),
    -    axis.text.x = element_text(margin = margin(5, 0, 5, 0, "pt")),
    -    axis.text.y = element_text(margin = margin(0, 5, 0, 5, "pt")),
    -    axis.line = element_line(size = 0),
    -    axis.ticks = element_line(size = 2),
    -    axis.ticks.length = unit(10, "pt"),
    -    # Panel
    -    panel.border = element_rect(size = 0),
    -    panel.background = element_rect(colour = "black", size = 5),
    -    panel.spacing = unit(3, "lines"),
    -    # Plot
    -    plot.title = element_text(size = 56),
    -    plot.margin = unit(c(40, 40, 40, 40), "pt"),
    -    # Strip
    -    strip.text = element_text(size = 36, margin = margin(0.5,0,0.5,0, "cm")),
    -    strip.background = element_rect(size = 5),
    -    # Caption
    -    plot.caption = element_text(size = 24)
    +
    # Create plot used for the outline
    +file <- tempfile(fileext = '.png')
    +ragg::agg_png(file, width = 1920, height = 1200, res = 300, units = "px", scaling = 0.5)
    +outline_plot +
    +  geom_col(fill = "white") +
    +  stat_ccdfinterval(fill = "white", point_alpha = 0) +
    +  theme(
    +    # Axis
    +    axis.title = element_text(size = 36),
    +    axis.text = element_text(size = 28),
    +    axis.text.x = element_text(margin = margin(5, 0, 5, 0, "pt")),
    +    axis.text.y = element_text(margin = margin(0, 5, 0, 5, "pt")),
    +    axis.line = element_line(size = 0),
    +    axis.ticks = element_line(size = 2),
    +    axis.ticks.length = unit(10, "pt"),
    +    # Panel
    +    panel.border = element_rect(size = 0),
    +    panel.background = element_rect(colour = "black", size = 5),
    +    panel.spacing = unit(3, "lines"),
    +    # Plot
    +    plot.title = element_text(size = 56),
    +    plot.margin = unit(c(40, 40, 40, 40), "pt"),
    +    # Strip
    +    strip.text = element_text(size = 36, margin = margin(0.5,0,0.5,0, "cm")),
    +    strip.background = element_rect(size = 5),
    +    # Caption
    +    plot.caption = element_text(size = 24)
       )
    -invisible(dev.off())
    +invisible(dev.off())

For the actual post-processing, I detect the edges of all the plot elements, then dilate them outwards. Finally, the white areas in the plot are made transparent, so all that's left is the black outlines. To demonstrate, I've created a blank white image here and flattened the outline plot on top of it.

    -
    plot_outline_layer <- image_read(file) %>%
    -  image_convert(type="Grayscale") %>%
    -  image_negate() %>%
    -  image_threshold("white", "5%") %>%
    -  image_morphology('EdgeOut', "Diamond", iterations = 6) %>%
    -  image_morphology('Dilate', "Diamond", iterations = 1) %>%
    -  image_negate() %>%
    -  image_transparent("white", fuzz = 7)
    +
    plot_outline_layer <- image_read(file) %>%
    +  image_convert(type="Grayscale") %>%
    +  image_negate() %>%
    +  image_threshold("white", "5%") %>%
    +  image_morphology('EdgeOut', "Diamond", iterations = 6) %>%
    +  image_morphology('Dilate', "Diamond", iterations = 1) %>%
    +  image_negate() %>%
    +  image_transparent("white", fuzz = 7)
     
    -image_flatten(c(image_blank(1920, 1200, color = "white"), plot_outline_layer))
    +image_flatten(c(image_blank(1920, 1200, color = "white"), plot_outline_layer))

    Next the colour plot, which just needs to be scaled up with the bars added to it, then saved to a temporary file. Here I’ve used CCDF bars with a gradient, courtesy of the ggdist package, going from black to colour to match the gradients in the ECHO-3 in-game menu in Borderlands 3. It’s a bit tacky, and there isn’t an easy way to add gradients to any other plot elements, but it fits the theme.

    -
    file <- tempfile(fileext = '.png')
    -ragg::agg_png(file, width = 1920, height = 1200, res = 300, units = "px", scaling = 0.5)
    -colour_plot +
    -  # First a solid fill column
    -  geom_col(aes(fill = rarity)) +
    -  # Then use a ccdfinterval to create a vertical gradient over top the solid
    -  # fill
    -  stat_ccdfinterval(
    -    aes(fill = rarity, fill_ramp = stat(y)),
    -    fill_type = "gradient",
    -    show.legend = FALSE,
    -    point_alpha = 0
    -  ) +
    -  scale_fill_identity() +
    -  scale_fill_ramp_continuous(
    -    from = "black",
    -    range = c(0.8, 1),
    -    limits = c(0, 15000)
    -  ) +
    -  expand_limits(y = 0) +
    -  # Finally add a black outline over top of everything
    -  geom_col(fill = NA, colour = "black", size = 1) +
    -  theme(
    -    # Axis
    -    axis.title = element_text(size = 36),
    -    axis.text = element_text(size = 28),
    -    axis.text.x = element_text(margin = margin(5, 0, 5, 0, "pt")),
    -    axis.text.y = element_text(margin = margin(0, 5, 0, 5, "pt")),
    -    axis.line = element_line(size = 0),
    -    axis.ticks = element_line(size = 2),
    -    axis.ticks.length = unit(10, "pt"),
    -    # Panel
    -    panel.border = element_rect(size = 0),
    -    panel.background = element_rect(size = 5),
    -    panel.spacing = unit(3, "lines"),
    -    # Plot
    -    plot.title = element_text(size = 56),
    -    plot.margin = unit(c(40, 40, 40, 40), "pt"),
    -    plot.background = element_rect(fill = "pink"),
    -    # Strip
    -    strip.text = element_text(size = 36, margin = margin(0.5,0,0.5,0, "cm")),
    -    strip.background = element_rect(size = 5),
    -    # Caption
    -    plot.caption = element_text(size = 24)
    +
    file <- tempfile(fileext = '.png')
    +ragg::agg_png(file, width = 1920, height = 1200, res = 300, units = "px", scaling = 0.5)
    +colour_plot +
    +  # First a solid fill column
    +  geom_col(aes(fill = rarity)) +
    +  # Then use a ccdfinterval to create a vertical gradient over top the solid
    +  # fill
    +  stat_ccdfinterval(
    +    aes(fill = rarity, fill_ramp = stat(y)),
    +    fill_type = "gradient",
    +    show.legend = FALSE,
    +    point_alpha = 0
    +  ) +
    +  scale_fill_identity() +
    +  scale_fill_ramp_continuous(
    +    from = "black",
    +    range = c(0.8, 1),
    +    limits = c(0, 15000)
    +  ) +
    +  expand_limits(y = 0) +
    +  # Finally add a black outline over top of everything
    +  geom_col(fill = NA, colour = "black", size = 1) +
    +  theme(
    +    # Axis
    +    axis.title = element_text(size = 36),
    +    axis.text = element_text(size = 28),
    +    axis.text.x = element_text(margin = margin(5, 0, 5, 0, "pt")),
    +    axis.text.y = element_text(margin = margin(0, 5, 0, 5, "pt")),
    +    axis.line = element_line(size = 0),
    +    axis.ticks = element_line(size = 2),
    +    axis.ticks.length = unit(10, "pt"),
    +    # Panel
    +    panel.border = element_rect(size = 0),
    +    panel.background = element_rect(size = 5),
    +    panel.spacing = unit(3, "lines"),
    +    # Plot
    +    plot.title = element_text(size = 56),
    +    plot.margin = unit(c(40, 40, 40, 40), "pt"),
    +    plot.background = element_rect(fill = "pink"),
    +    # Strip
    +    strip.text = element_text(size = 36, margin = margin(0.5,0,0.5,0, "cm")),
    +    strip.background = element_rect(size = 5),
    +    # Caption
    +    plot.caption = element_text(size = 24)
       )
    -invisible(dev.off())
    +invisible(dev.off())
     
    -plot_fill_layer <- image_read(file)
    +plot_fill_layer <- image_read(file)
     
     plot_fill_layer
@@ -3369,8 +6746,22 @@ lineribbons with geomtextpath.
-
    plot_layer <- image_composite(plot_fill_layer, plot_outline_layer) %>% 
    -  image_transparent("pink", fuzz = 7)
    +
    plot_layer <- image_composite(plot_fill_layer, plot_outline_layer) %>% 
    +  image_transparent("pink", fuzz = 7)
     
     plot_layer

    This plot isn’t going to win any awards (unless it’s for an ugly plots contest), but it does show that you can do some pretty cool programmatic image processing of your plots (or any other images) with the magick package.

    @@ -3402,7 +6819,7 @@ lineribbons with geomtextpath.

    +

    Comments

    +

    Comments

    @@ -3426,7 +6843,7 @@ lineribbons with geomtextpath.

    Session Info

    +

    Session Info

    Data

    Download the data used in this post.

    -

    Fair Dealing

    +

    Fair Dealing

    Any of the trademarks, service marks, collective marks, design rights or similar rights that are mentioned, used, or cited in this article are the property of their respective owners. They are used here as fair dealing for the purpose of education in accordance with section 29 of the Copyright Act and do not infringe copyright.

    -

    Citation

    BibTeX citation:
    @online{mccarthy2022,
    -  author = {Michael McCarthy},
    -  title = {Tales from the {Borderlands}},
    -  date = {2022-09-29},
    -  url = {https://tidytales.ca/posts/2022-09-29_borderlands},
    -  langid = {en}
    -}
    -
    For attribution, please cite this work as:
-Michael McCarthy. (2022, September 29). Tales from the Borderlands. https://tidytales.ca/posts/2022-09-29_borderlands
+

    Citation

    For attribution, please cite this work as:
+McCarthy, M. (2022, September 29). Tales from the Borderlands. https://tidytales.ca/posts/2022-09-29_borderlands
]]>
.Wrangle .Visualize
@@ -3520,7 +6930,7 @@ Borderlands. ht

    -

    Projectile motion of an object launched at the same height and velocity but different angles. The symmetrical U-shaped curve of each trajectory is known as a parabola.

    +
    Projectile motion of an object launched at the same height and velocity but different angles. The symmetrical U-shaped curve of each trajectory is known as a parabola.
    @@ -3541,12 +6951,24 @@ Borderlands. ht

    Prerequisites

    -
    library(tidyverse)
    -library(gganimate)
    -library(ggh4x)
    -library(formattable)
    -library(emojifont)
    -library(glue)
    +
    library(tidyverse)
    +library(gganimate)
    +library(ggh4x)
    +library(formattable)
    +library(emojifont)
    +library(glue)

    I’ll be simulating data for my plot by turning the equations for projectile motion into R functions. You can download this data with the Data Source link in the appendix. The sources I used for the equations can also be found in the appendix.

    @@ -3560,16 +6982,52 @@ Borderlands.
    ht

where v is the initial velocity and θ is the launch angle, giving vx = v cos(θ) and vy = v sin(θ). Horizontal and vertical velocity can be computed in R with the following functions.

    -
    velocity_x <- function(velocity, angle) {
    -  # Degrees need to be converted to radians in cos() since that is what the
    -  # function uses
    -  velocity * cos(angle * (pi/180))
    +
    velocity_x <- function(velocity, angle) {
    +  # Degrees need to be converted to radians in cos() since that is what the
    +  # function uses
    +  velocity * cos(angle * (pi/180))
     }
     
    -velocity_y <- function(velocity, angle) {
    -  # Degrees need to be converted to radians in sin() since that is what the
    -  # function uses
    -  velocity * sin(angle * (pi/180))
    +velocity_y <- function(velocity, angle) {
    +  # Degrees need to be converted to radians in sin() since that is what the
    +  # function uses
    +  velocity * sin(angle * (pi/180))
     }
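As a quick sanity check of the two helpers (not part of the original post): at a 45 degree launch angle the horizontal and vertical components are equal, since cos(45°) = sin(45°).

```r
# 20 * cos(45 * pi / 180) and 20 * sin(45 * pi / 180) are both ~14.14
velocity_x(20, 45)
#> [1] 14.14214
velocity_y(20, 45)
#> [1] 14.14214
```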
    @@ -3579,8 +7037,34 @@ Borderlands. ht

where vy is the vertical velocity, g is the force of gravity, and h is the initial height the object is launched from. Time of flight is the time from when the object is launched to the time the object reaches the surface. It can be computed in R with the following function.

    -
    flight_time <- function(velocity_y, height, gravity = 9.80665) {
    -  ( velocity_y + sqrt(velocity_y^2 + 2 * gravity * height) ) / gravity
    +
    flight_time <- function(velocity_y, height, gravity = 9.80665) {
    +  ( velocity_y + sqrt(velocity_y^2 + 2 * gravity * height) ) / gravity
     }
    @@ -3590,8 +7074,36 @@ Borderlands.
    ht

where vx is the horizontal velocity and t is the time of flight. The range of the projectile is the total horizontal distance travelled during the time of flight. It can be computed in R with the following function.

    -
    distance <- function(velocity_x, velocity_y, height, gravity = 9.80665) {
    -  velocity_x * ( velocity_y + sqrt(velocity_y^2 + 2 * gravity * height) ) /
    +
    distance <- function(velocity_x, velocity_y, height, gravity = 9.80665) {
    +  velocity_x * ( velocity_y + sqrt(velocity_y^2 + 2 * gravity * height) ) /
       gravity
     }
    @@ -3602,8 +7114,28 @@ Borderlands.
    ht

where h is the initial height, vy is the vertical velocity, and g is the force of gravity. The maximum height is reached when the vertical velocity reaches zero. It can be computed in R with the following function.

    -
    height_max <- function(velocity_y, height, gravity = 9.80665) {
    -  height + velocity_y^2 / (2 * gravity)
    +
    height_max <- function(velocity_y, height, gravity = 9.80665) {
    +  height + velocity_y^2 / (2 * gravity)
     }
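Putting the four helpers together for one set of launch parameters shows how they chain (a sketch with rounded values, not output from the original post):

```r
# An object launched at 20 m/s, at 45 degrees, from 10 m above the surface
vx <- velocity_x(20, 45)             # ~14.14 m/s horizontal
vy <- velocity_y(20, 45)             # ~14.14 m/s vertical
t  <- flight_time(vy, height = 10)   # total time in the air, ~3.47 s
d  <- distance(vx, vy, height = 10)  # horizontal range, ~49.1 m
h  <- height_max(vy, height = 10)    # peak height, ~20.2 m
```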
    @@ -3611,57 +7143,221 @@ Borderlands.
    ht

    Projectile motion calculator

    Now to wrap all the components into a single function that will calculate the result for each component based on a set of parameters given to it. These results can then be used to determine the position and velocity of the projectile at any point in time during its trajectory, which I want to return as a data frame that can be used for plotting.

    #' nframes and fps can be used to animate the trajectory as close to real time as possible.
    #' There will be some rounding error though so it won't be exactly the same as the flight
    #' time.
    projectile_motion <- function(velocity, angle, height, gravity = 9.80665, nframes = 30) {
      
      # Velocity components
      vx <- velocity_x(velocity, angle)
      vy <- velocity_y(velocity, angle)
      # Flight components
      t  <- flight_time(vy, height, gravity)
      d  <- distance(vx, vy, height, gravity)
      # Max height components
      hm <- height_max(vy, height, gravity)
      th <- vy / gravity
      hd <- vx * th
      
      # Calculate the position of the projectile in 2D space at a given point in
      # time to approximate its trajectory over time
      x_pos <- map_dbl(seq(0, t, length = nframes), ~ {
        vx * .x
      })
      
      y_pos <- map_dbl(seq(0, t, length = nframes), ~ {
        height + ( vy * .x + 0.5 * -gravity * .x^2 )
      })
      
      # Calculate the vertical velocity of the projectile at a given point in time
      vy_t  <- map_dbl(seq(0, t, length = nframes), ~ {
        vy - gravity * .x
      })
      
      trajectory <- data.frame(
        x = x_pos,
        y = y_pos,
        vx = vx,
        vy = vy_t,
        second = seq(0, t, length = nframes)
      )
      
      # Return a list with all calculated values
      list(
        velocity_x = vx,
        velocity_y = vy,
        flight_time = t,
        distance = d,
        max_height = hm,
        max_height_time = th,
        max_height_dist = hd,
        trajectory = trajectory,
        nframes = nframes,
        fps = nframes/t
      )
      
    }
  • nframes which represents how many points in time to record in the trajectory data frame.
    projectile_motion(
      velocity = 11.4,
      angle = 52.1,
      height = 18,
      nframes = 10
    )
    #> $velocity_x
     

    Given the inspiration for this post, a space-themed simulation seems appropriate. Here I want to test how the gravity of each planet in our solar system influences projectile motion, given a projectile launched with the same velocity, angle, and height.

    First I need to construct a named vector of the gravity of each planet in our solar system. NASA provides these values as ratios of each planet’s gravity relative to Earth’s, so I had to multiply each one by Earth’s gravity to get the units correct.
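    To illustrate the conversion (the 0.378 ratio for Mercury is my assumption based on NASA's planetary fact sheet, not a value from the post):

    ```r
    # Mercury's surface gravity as a ratio of Earth's, multiplied by Earth's
    # gravity in m/s^2, recovers the absolute value used below
    earth_gravity <- 9.80665
    mercury_ratio <- 0.378
    mercury_ratio * earth_gravity
    #> [1] 3.706914
    ```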

    # All values are in metres per second squared
    planets <- c(
      mercury = 3.7069137,
      venus   = 8.8946315,
      earth   = 9.80665,
      moon    = 1.6279039,
      mars    = 3.697107,
      jupiter = 23.143694,
      saturn  = 8.9828914,
      uranus  = 8.7181118,
      neptune = 10.983448,
      pluto   = 0.6962721
    )

    Then I can create a named list of projectile motion calculations, one for each planet. Each planet has its own list of output from projectile_motion(), so the resulting list of projectile motion calculations is actually a list of lists. This can be tidied into a tibble to make it easier to work with.

    # Calculate projectile motion for each planet, given the same velocity,
    # angle, and height
    planets_pm <- map(planets, ~{
      projectile_motion(
        velocity = 20,
        angle = 45,
        height = 35,
        gravity = .x,
        nframes = 100)
    })
    
    # Tidying the list of lists into a tibble makes it easier to work with. Note
    # that the trajectory column is a list column since it contains the trajectory
    # data frame for each planet.
    planets_df <- planets_pm %>%
      enframe() %>%
      unnest_wider(value) %>%
      rename(planet = name)
    
    planets_trajectory <- planets_df %>%
      select(planet, trajectory) %>%
      unnest(trajectory) %>%
      mutate(planet = factor(planet, levels = names(planets)))

    A simple trajectory

    This is the same simple trajectory I showed the output for earlier, only with more frames to make the animation smoother.

    simple_trajectory <- projectile_motion(
      velocity = 11.4,
      angle = 52.1,
      height = 18,
      nframes = 100
    )
    
    # Assign the data frame and max height parameters to objects to make the plot
    # code easier to read
    df <- simple_trajectory$trajectory
    max_height_dist <- simple_trajectory$max_height_dist
    max_height_time <- simple_trajectory$max_height_time
    max_height <- simple_trajectory$max_height

    I’m going to build the plot for this simple trajectory up in chunks to make the code easier to understand. The foundation of the plot is fairly standard. The only unusual things here are the group aesthetics in geom_line() and geom_point(). These are used to tell gganimate which rows in the data correspond to the same graphic element.

    p <- ggplot(df, aes(x = x, y = y)) +
      geom_line(
        aes(group = 1),
        linetype = "dashed",
        colour = "red",
        alpha = 0.5
      ) +
      geom_point(aes(group = 1), size = 2)

    For the data I simulated, the projectile starts with a positive vertical velocity. However, at its maximum height, the vertical velocity of the projectile becomes 0 m/s for a brief moment, as it stops rising and starts falling. This happens because gravity is constantly influencing the vertical velocity of the projectile. This is an important and interesting piece of information I want to communicate in my plot. It can be communicated subtly by displaying the vertical velocity of the projectile at each point in time, or more overtly using a text annotation. I’m going to do both.

    First the text annotation. I’m using geom_curve() to draw an arrow between the annotation and the point at which the projectile is at its maximum height, and geom_text() to draw the annotation. I’ve supplied each geom with its own data frame containing a second column whose sole value corresponds to the time the projectile reaches its maximum height. This will control when the annotation appears in the animation. I’ve also given the pair a different group aesthetic from geom_line() and geom_point().

    p <- p +
      geom_curve(
        data = data.frame(
          second = max_height_time
        ),
        aes(
          xend = max_height_dist,
          yend = max_height + 0.2,
          x = max_height_dist + 2,
          y = max_height + 2,
          group = 2
        ),
        curvature = 0.45,
        angle = 105,
        ncp = 15,
        arrow = arrow(length = unit(0.1, "cm"), type = "closed")
      ) +
      geom_text(
        data = data.frame(
          second = max_height_time
        ),
        aes(
          x = max_height_dist + 2.4,
          y = max_height + 2,
          group = 2
        ),
        hjust = "left",
        lineheight = 1,
        family = "serif",
        label = str_c(
          "At its maximum height, the vertical velocity \n",
          "of the projectile is 0 m/s for a brief moment, \n",
          "as it stops rising and starts falling."
        )
      )

    Second the vertical velocity. I’m displaying this in the plot’s subtitle along with the time elapsed since the projectile was launched. The displayed values are updated each frame using the value returned by the expression enclosed in glue braces for a frame. The variable frame_along is made available by gganimate::transition_along() (see below) and gives the position on the along-dimension (time in seconds in this case) that a frame corresponds to. Here I’m using frame_along to display the elapsed time, and to index the data frame df for the vertical velocity at a given second. The latter is a slight workaround because vy cannot be accessed directly in the glue braces.

    p <- p +
      labs(
        title = str_c(
          "Projectile motion of an object launched with ",
          "a speed of 11.4 m/s at an angle of 52.1°"
        ),
        subtitle = str_c(
          "Time: ",
          "{formattable(frame_along, digits = 2, format = 'f')}s",
          "\n",
          "Vertical velocity = ",
          "{formattable(df$vy[df$second == frame_along], digits = 2, format = 'f')}",
          " m/s"
        ),
        x = "Distance (m)",
        y = "Height (m)",
        caption = "Graphic: Michael McCarthy"
      )

    Now for theming. I wanted something minimalistic with a scientific feel: the classic theme paired with truncated axes courtesy of ggh4x does this nicely. Finally, I originally planned to use element_markdown() from ggtext to enable markdown text in the subtitle of the plot so that vertical velocity could be written with a proper subscript; however, this caused issues with the text spacing when rendering the animation to video, so I opted not to.1

    p <- p +
      guides(x = "axis_truncated", y = "axis_truncated") +
      theme_classic(base_family = "serif")

    And finally, the animation code. Yes, that’s it. Animations can be tweaked and spiced up with other functions in gganimate, but I ran into issues making the ones I wanted to use work with transition_reveal().

    Just a note: The behaviour of transition_reveal() shown here was broken in v1.0.8 of gganimate.

    anim <- p +
      transition_reveal(second)
    
    anim


    Now to test how the gravity of each planet in our solar system influences projectile motion. As a reminder, I already simulated the projectile motion data in planets_trajectory, so now it’s just a matter of plotting it.

    Since the simulation is space-themed, the plot should be too. Instead of using a simple point to represent the projectile, I’m going to use Font Awesome’s rocket icon by way of the emojifont package. To make it extra, I’ll also add propulsion and rotation to the rocket’s animation.

    # Make it so the propulsion is only present for first half of animation, so it
    # looks like the rockets are launching.
    rocket_propulsion <- planets_trajectory %>%
      group_by(planet) %>%
      mutate(retain = rep(c(TRUE, FALSE), each = 50)) %>%
      ungroup() %>%
      mutate(x = case_when(
        retain == FALSE ~ NA_real_,
        TRUE ~ x
      ))

    The plotting code is mostly boilerplate, but I’ve added comments to highlight a few noteworthy points.

    p <- ggplot(planets_trajectory, aes(x = x, y = y)) +
      geom_line(
        aes(colour = planet, group = planet),
        linetype = "dashed",
        alpha = 0.5,
        # Change the key glyph in the legend to a point, to represent a planet
        key_glyph = "point"
      ) +
      geom_point(
        data = rocket_propulsion,
        aes(group = planet),
        colour = "orange"
      ) +
      # Change the angle at different frames to rotate the rocket
      geom_text(
        aes(colour = planet, group = planet, label = fontawesome("fa-rocket")),
        family = 'fontawesome-webfont',
        angle = rep(seq(0, 45, length = 100), 10),
        size = 6,
        # There is no rocket key glyph, so override this too
        key_glyph = "point"
      ) +
      scale_color_manual(
        values = c(
          "#97979F",
          "#BBB7AB",
          "#8CB1DE",
          "#DAD9D7",
          "#E27B58",
          "#C88B3A",
          "#C5AB6E",
          "#93B8BE",
          "#6081FF",
          "#4390BA"
        )
      ) +
      labs(
        title = str_c(
          "projectile motion of an object launched on different planets in our solar system"
        ),
        x = "distance (m)",
        y = "height (m)",
        caption = "graphic: michael mccarthy"
      ) +
      guides(
        x = "axis_truncated",
        y = "axis_truncated",
        colour = guide_legend(title.vjust = .7, nrow = 1, label.position = "bottom")
      ) +
      theme_classic(base_family = "mono") +
      theme(
        text = element_text(colour = "white"),
        axis.text = element_text(colour = "white"),
        rect = element_rect(fill = "black"),
        panel.background = element_rect(fill = "black"),
        axis.ticks = element_line(colour = "white"),
        axis.line = element_line(colour = "white"),
        legend.position = "top",
        legend.justification = "left"
      )

    Finally, the animation code. The shadow_wake() function is applied to the orange points used for rocket propulsion to really sell the effect.

    anim <- p +
      transition_reveal(second) +
      shadow_wake(wake_length = 0.1, size = 2, exclude_layer = c(1, 3))
    
    anim

    Michael McCarthy. (2022, June 16). On motion.

    Prerequisites

    To access the datasets, help pages, and functions that we will use in this code snippet, load the following packages:

    library(here)    # used below to build the path to the data file
    library(lavaan)
    library(semTools)

    And read in the data:

    social_exchanges <- read.csv(here("data", "2021-11-01_social-exchanges.csv"))

    The data contains simulated values for several indicators of positive and negative social exchanges, measured on two occasions (w1 and w2). There are three continuous indicators that measure perceived companionship (vst1, vst2, vst3), and three binary indicators that measure unwanted advice (unw1, unw2, unw3). The data and some of the examples come from Longitudinal Structural Equation Modeling: A Comprehensive Introduction by Jason Newsom.

    Configural Invariance

    Using the lavaan package.

    configural_model_lav <- ("
      # Measurement model
      w1comp =~ w1vst1 + w1vst2 + w1vst3
      w2comp =~ w2vst1 + w2vst2 + w2vst3
      
      # Variances and covariances
      w2comp ~~ w1comp
      w1comp ~~ w1comp
      w2comp ~~ w2comp
    
      w1vst1 ~~ w1vst1
      w1vst2 ~~ w1vst2
      w1vst3 ~~ w1vst3
      w2vst1 ~~ w2vst1
      w2vst2 ~~ w2vst2
      w2vst3 ~~ w2vst3
    
      w1vst1 ~~ w2vst1
      w1vst2 ~~ w2vst2
      w1vst3 ~~ w2vst3
    ")
    
    configural_model_lav_fit <- sem(configural_model_lav, data = social_exchanges)

    Using the semTools package.

    # First, define the configural model, using the repeated measures factors and
    # indicators.
    configural_model_smt <- ("
      # Measurement model
      w1comp =~ w1vst1 + w1vst2 + w1vst3
      w2comp =~ w2vst1 + w2vst2 + w2vst3
    ")
    
    # Second, create a named list indicating which factors are actually the same
    # latent variable measured repeatedly.
    longitudinal_factor_names <- list(
      comp = c("w1comp", "w2comp")
    )
    
    # Third, generate the lavaan model syntax using semTools.
    configural_model_smt <- measEq.syntax(
      configural.model = configural_model_smt,
      longFacNames = longitudinal_factor_names,
      ID.fac = "std.lv",
      ID.cat = "Wu.Estabrook.2016",
      data = social_exchanges
    )
    configural_model_smt <- as.character(configural_model_smt)
    
    # Finally, fit the model using lavaan.
    configural_model_smt_fit <- sem(configural_model_smt, data = social_exchanges)

    Compare lavaan and semTools fit measures.

    Configural invariance is met if the model fits well, indicators load on the same factors, and loadings are all of acceptable magnitude. An alternative way of testing longitudinal configural invariance is to fit separate confirmatory factor models at each time point; configural invariance is met if the previously stated criteria hold and the measure has the same factor structure at each time point.
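    The alternative approach can be sketched as follows (a minimal sketch, assuming the lavaan package and the social_exchanges data loaded above; the model and object names here are mine, not from the original post):

    ```r
    # Fit a separate one-factor CFA at each time point using the same indicators
    cfa_w1_fit <- cfa("w1comp =~ w1vst1 + w1vst2 + w1vst3", data = social_exchanges)
    cfa_w2_fit <- cfa("w2comp =~ w2vst1 + w2vst2 + w2vst3", data = social_exchanges)

    # Configural invariance is supported if the factor structure holds and the
    # standardized loadings are of acceptable magnitude at each time point
    standardizedSolution(cfa_w1_fit)
    standardizedSolution(cfa_w2_fit)
    ```

    Note that a one-factor model with three indicators is just-identified, so for these separate models the loadings, rather than global fit, carry the evidence.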

    fitMeasures(configural_model_lav_fit, c("chisq", "df", "pvalue", "cfi", "rmsea"))
    #>  chisq     df pvalue    cfi  rmsea 
    #>  9.911  5.000  0.078  0.997  0.041
    
    fitMeasures(configural_model_smt_fit, c("chisq", "df", "pvalue", "cfi", "rmsea"))
    #>  chisq     df pvalue    cfi  rmsea 
    #>  9.911  5.000  0.078  0.997  0.041
    Weak Invariance

    Using the lavaan package.

    weak_model_lav <- ("
      # Measurement model
      w1comp =~ w1vst1 + a*w1vst2 + b*w1vst3 # Factor loading equality constraint
      w2comp =~ w2vst1 + a*w2vst2 + b*w2vst3 # Factor loading equality constraint
    
      # Variances and covariances
      w2comp ~~ w1comp
      w1comp ~~ w1comp
      w2comp ~~ w2comp
    
      w1vst1 ~~ w1vst1
      w1vst2 ~~ w1vst2
      w1vst3 ~~ w1vst3
      w2vst1 ~~ w2vst1
      w2vst2 ~~ w2vst2
      w2vst3 ~~ w2vst3
    
      w1vst1 ~~ w2vst1
      w1vst2 ~~ w2vst2
      w1vst3 ~~ w2vst3
    ")
    
    weak_model_lav_fit <- sem(weak_model_lav, social_exchanges)

    Using the semTools package.

    weak_model_smt <- measEq.syntax(
      configural.model = configural_model_smt,
      longFacNames = longitudinal_factor_names,
      ID.fac = "std.lv",
      ID.cat = "Wu.Estabrook.2016",
      long.equal = c("loadings"),
      data = social_exchanges
    )
    weak_model_smt <- as.character(weak_model_smt)
    
    weak_model_smt_fit <- sem(weak_model_smt, data = social_exchanges)

    Compare lavaan and semTools fit measures.

    fitMeasures(weak_model_lav_fit, c("chisq", "df", "pvalue", "cfi", "rmsea"))
    #>  chisq     df pvalue    cfi  rmsea 
    #> 12.077  7.000  0.098  0.997  0.036
    
    fitMeasures(weak_model_smt_fit, c("chisq", "df", "pvalue", "cfi", "rmsea"))
    #>  chisq     df pvalue    cfi  rmsea 
    #> 12.077  7.000  0.098  0.997  0.036
    lavTestLRT(configural_model_lav_fit, weak_model_lav_fit)

    Using the semTools package.

    # Example 2.2
    strong_model_smt <- measEq.syntax(
      configural.model = configural_model_smt,
      longFacNames = longitudinal_factor_names,
      ID.fac = "std.lv",
      ID.cat = "Wu.Estabrook.2016",
      long.equal = c("loadings", "lv.variances"),
      data = social_exchanges
    )
    strong_model_smt <- as.character(strong_model_smt)
    
    strong_model_smt_fit <- sem(strong_model_smt, social_exchanges)

    Compare lavaan and semTools fit measures.

    fitMeasures(strong_model_lav_fit, c("chisq", "df", "pvalue", "cfi", "rmsea"))
    #>  chisq     df pvalue    cfi  rmsea 
    #> 37.553  8.000  0.000  0.983  0.080
    
    fitMeasures(strong_model_smt_fit, c("chisq", "df", "pvalue", "cfi", "rmsea"))
    #>  chisq     df pvalue    cfi  rmsea 
    #> 37.553  8.000  0.000  0.983  0.080
    lavTestLRT(configural_model_lav_fit, weak_model_lav_fit, strong_model_lav_fit)
    Strict Invariance

    Using the lavaan package.

    strict_model_lav <- ("
      # Measurement model
      w1comp =~ w1vst1 + a*w1vst2 + b*w1vst3 # Factor loading equality constraint
      w2comp =~ w2vst1 + a*w2vst2 + b*w2vst3 # Factor loading equality constraint
    
      # Variances & covariances
      w2comp ~~ w1comp
    
      w1comp ~~ c*w1comp # Factor variance equality constraint
      w2comp ~~ c*w2comp # Factor variance equality constraint
    
      w1vst1 ~~ w2vst1
      w1vst2 ~~ w2vst2
      w1vst3 ~~ w2vst3
    
      w1vst1 ~~ d*w1vst1 # Measurement residual equality constraint
      w1vst2 ~~ e*w1vst2 # Measurement residual equality constraint
      w1vst3 ~~ f*w1vst3 # Measurement residual equality constraint
    
      w2vst1 ~~ d*w2vst1 # Measurement residual equality constraint
      w2vst2 ~~ e*w2vst2 # Measurement residual equality constraint
      w2vst3 ~~ f*w2vst3 # Measurement residual equality constraint
    ")
    
    strict_model_lav_fit <- sem(strict_model_lav, social_exchanges)

    Using the semTools package.

    -
    strict_model_smt <- measEq.syntax(
    -  configural.model = configural_model_smt,
    -  longFacNames = longitudinal_factor_names,
    -  ID.fac = "std.lv",
    -  ID.cat = "Wu.Estabrook.2016",
    -  long.equal = c("loadings", "lv.variances", "residuals"),
    -  data = social_exchanges
    +
    strict_model_smt <- measEq.syntax(
    +  configural.model = configural_model_smt,
    +  longFacNames = longitudinal_factor_names,
    +  ID.fac = "std.lv",
    +  ID.cat = "Wu.Estabrook.2016",
    +  long.equal = c("loadings", "lv.variances", "residuals"),
    +  data = social_exchanges
     )
    -strict_model_smt <- as.character(strict_model_smt)
    +strict_model_smt <- as.character(strict_model_smt)
     
    -strict_model_smt_fit <- sem(strict_model_smt, social_exchanges)
    +strict_model_smt_fit <- sem(strict_model_smt, social_exchanges)

    Compare lavaan and semTools fit measures.

    -
    fitMeasures(strict_model_lav_fit, c("chisq", "df", "pvalue", "cfi", "rmsea"))
    +
    fitMeasures(strict_model_lav_fit, c("chisq", "df", "pvalue", "cfi", "rmsea"))
    #>  chisq     df pvalue    cfi  rmsea 
     #> 78.779 11.000  0.000  0.961  0.104
    -
    fitMeasures(strict_model_smt_fit, c("chisq", "df", "pvalue", "cfi", "rmsea"))
    +
    fitMeasures(strict_model_smt_fit, c("chisq", "df", "pvalue", "cfi", "rmsea"))
    #>  chisq     df pvalue    cfi  rmsea 
     #> 78.779 11.000  0.000  0.961  0.104
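The fit measures above feed directly into the likelihood ratio test that lavTestLRT() reports next. As a quick sanity check, the chi-square difference between the strong and strict models can be computed by hand from the reported values (a sketch of the arithmetic only, not lavaan's lavTestLRT() implementation):

```javascript
// Sketch: chi-square difference between two nested models, using the fit
// measures reported above (not lavaan's lavTestLRT() implementation).
function chisqDiff(lessConstrained, moreConstrained) {
  return {
    deltaChisq: +(moreConstrained.chisq - lessConstrained.chisq).toFixed(3),
    deltaDf: moreConstrained.df - lessConstrained.df,
  };
}

const strong = { chisq: 37.553, df: 8 };  // strong invariance fit
const strict = { chisq: 78.779, df: 11 }; // strict invariance fit

// The 3 extra df correspond to the three residual equality constraints
// (d, e, f) that the strict model adds to the strong model.
const diff = chisqDiff(strong, strict); // deltaChisq 41.226, deltaDf 3
```
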
    @@ -4404,7 +9291,9 @@ Michael McCarthy. (2022, June 16). On motion.
    -
    lavTestLRT(
    +
    lavTestLRT(
       configural_model_lav_fit,
       weak_model_lav_fit,
       strong_model_lav_fit,
    @@ -4425,7 +9314,7 @@ Michael McCarthy. (2022, June 16). On motion. 

    +

    Comments

    +

    Comments

    @@ -4449,7 +9338,7 @@ Michael McCarthy. (2022, June 16). On motion.

    Session Info

    +

    Session Info

    Data

    Download the data used in this post.

    -

    Reuse

    Citation

    BibTeX citation:
    @online{mccarthy2021,
    -  author = {Michael McCarthy},
    -  title = {Longitudinal {Measurement} {Invariance}},
    -  date = {2021-11-01},
    -  url = {https://tidytales.ca/snippets/2021-11-01_longitudinal-measurement-invariance},
    -  langid = {en}
    -}
    -
    For attribution, please cite this work as:
-Michael McCarthy. (2021, November 1). Longitudinal Measurement
+

    Reuse

    Citation

    For attribution, please cite this work as:
    +McCarthy, M. (2021, November 1). Longitudinal Measurement Invariance. https://tidytales.ca/snippets/2021-11-01_longitudinal-measurement-invariance
]]>

@@ -4532,68 +9414,470 @@ Invariance.

article metadata. However, some of these features are not available for non-article pages on distill blogs (such as link preview images for the home page), and the automatic behaviour of these features limits how much they can be customized. Both of these limitations can be overcome using the metathis package by Garrick Aden-Buie.

Tom Mock has a great blog post diving into how metadata can be used to customize how links from a distill blog appear on social media. I followed it to add metadata and preview images to the home and about pages of Tidy Tales.

    Here is what the index.Rmd file for the Tidy Tales home page looks like.

    -
    ---
    -title: "Wrangling, Visualizing, Modelling, Communicating data"
    -site: distill::distill_website
    -listing: posts
    ----
    +
    ---
    +title: "Wrangling, Visualizing, Modelling, Communicating data"
    +site: distill::distill_website
    +listing: posts
    +---
     
    -```{r, include=FALSE, results='asis'}
    -library(metathis)
    +```{r, include=FALSE, results='asis'}
    +library(metathis)
     
    -meta() %>%
    -  meta_social(
    -    title = "Tidy Tales",
    -    description = "Wrangling, Visualizing, Modelling, Communicating data",
    -    url = "https://tidytales.ca",
    -    image = "https://tidytales.ca/inst/images/twittercard.png",
    -    image_alt = "Tidy Tales logo",
    -    og_type = "website",
    -    twitter_card_type = "summary",
    -    twitter_site = NULL
    -  )
    -```
+meta() %>%
+  meta_social(
+    title = "Tidy Tales",
+    description = "Wrangling, Visualizing, Modelling, Communicating data",
+    url = "https://tidytales.ca",
+    image = "https://tidytales.ca/inst/images/twittercard.png",
+    image_alt = "Tidy Tales logo",
+    og_type = "website",
+    twitter_card_type = "summary",
+    twitter_site = NULL
+  )
+```

When the site is built, distill automatically generates metadata for the home page, and the metathis code in index.Rmd generates additional metadata on top of it. Here is what it looks like in HTML.

    -
    <!-- Generated by distill -->
    -<meta property="og:title" content="Tidy Tales | Michael McCarthy: Wrangling, Visualizing, Modelling, Communicating data">
    -<meta property="og:type" content="article">
    -<meta property="og:locale" content="en_US">
    -<meta property="og:site_name" content="Tidy Tales | Michael McCarthy">
    -<meta property="twitter:card" content="summary">
    -<meta property="twitter:title" content="Tidy Tales | Michael McCarthy: Wrangling, Visualizing, Modelling, Communicating data">
    -<meta property="twitter:site" content="@propertidytales">
    -<meta property="twitter:creator" content="@mccarthymg">
    +
    <!-- Generated by distill -->
    +<meta property="og:title" content="Tidy Tales | Michael McCarthy: Wrangling, Visualizing, Modelling, Communicating data">
    +<meta property="og:type" content="article">
    +<meta property="og:locale" content="en_US">
    +<meta property="og:site_name" content="Tidy Tales | Michael McCarthy">
    +<meta property="twitter:card" content="summary">
    +<meta property="twitter:title" content="Tidy Tales | Michael McCarthy: Wrangling, Visualizing, Modelling, Communicating data">
    +<meta property="twitter:site" content="@propertidytales">
    +<meta property="twitter:creator" content="@mccarthymg">
     
    -<!-- Generated by metathis -->
    -<meta property="og:locale" content="en_US">
    -<meta name="twitter:title" content="Tidy Tales">
    -<meta name="twitter:description" content="Wrangling, Visualizing, Modelling, Communicating data">
    -<meta name="twitter:url" content="https://tidytales.ca">
    -<meta name="twitter:image:src" content="https://tidytales.ca/inst/images/twittercard.png">
    -<meta name="twitter:image:alt" content="Tidy Tales logo">
    -<meta name="twitter:card" content="summary">
    -<meta property="og:title" content="Tidy Tales">
    -<meta property="og:description" content="Wrangling, Visualizing, Modelling, Communicating data">
    -<meta property="og:url" content="https://tidytales.ca">
    -<meta property="og:image" content="https://tidytales.ca/inst/images/twittercard.png">
    -<meta property="og:image:alt" content="Tidy Tales logo">
    -<meta property="og:type" content="website">
+<!-- Generated by metathis -->
+<meta property="og:locale" content="en_US">
+<meta name="twitter:title" content="Tidy Tales">
+<meta name="twitter:description" content="Wrangling, Visualizing, Modelling, Communicating data">
+<meta name="twitter:url" content="https://tidytales.ca">
+<meta name="twitter:image:src" content="https://tidytales.ca/inst/images/twittercard.png">
+<meta name="twitter:image:alt" content="Tidy Tales logo">
+<meta name="twitter:card" content="summary">
+<meta property="og:title" content="Tidy Tales">
+<meta property="og:description" content="Wrangling, Visualizing, Modelling, Communicating data">
+<meta property="og:url" content="https://tidytales.ca">
+<meta property="og:image" content="https://tidytales.ca/inst/images/twittercard.png">
+<meta property="og:image:alt" content="Tidy Tales logo">
+<meta property="og:type" content="website">

There is some overlap between the <meta> tags generated by distill and metathis; however, the tags generated by metathis seem to take precedence over those automatically generated by distill. For example, the Twitter card for the Tidy Tales home page displays “Tidy Tales” as its title, rather than “Tidy Tales | Michael McCarthy: Wrangling, Visualizing, Modelling, Communicating data”.
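This precedence is consistent with last-tag-wins behaviour: when the same property appears twice in a page's head, many scrapers keep the later occurrence, and the metathis tags are emitted after distill's. A minimal sketch of that resolution rule (an illustration of the observed behaviour, not a specification of how any particular platform parses tags):

```javascript
// Illustrative sketch: if a scraper keeps the last tag it sees for each
// property, tags emitted later (metathis) win over earlier ones (distill).
function resolveMetaTags(tags) {
  const resolved = new Map();
  for (const { property, content } of tags) {
    resolved.set(property, content); // later occurrences overwrite earlier ones
  }
  return resolved;
}

const tags = [
  // Generated by distill
  { property: "og:title", content: "Tidy Tales | Michael McCarthy: Wrangling, Visualizing, Modelling, Communicating data" },
  // Generated by metathis
  { property: "og:title", content: "Tidy Tales" },
];

const winner = resolveMetaTags(tags).get("og:title"); // "Tidy Tales"
```
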

    Article Metadata

The ability to override some of the metadata generated by distill using metathis is hacky, but it also affords more customization for distill blogs. One trick I’m taking full advantage of is using different preview images for a post’s listing and its social card. Distill allows you to specify a preview image for a post using the preview chunk option.

    -
    ```{r, preview=TRUE}
    -library(ggplot2)
    -ggplot(diamonds, aes(carat, price)) +
    -  geom_smooth()
    -```
    +
    ```{r, preview=TRUE}
    +library(ggplot2)
    +ggplot(diamonds, aes(carat, price)) +
    +  geom_smooth()
    +```

This preview image will be used alongside post listings and in social cards. However, if a different image is specified in metathis::meta_social(), that image will be used in social cards while the preview image specified in the post chunk will still be used alongside post listings. I’m using this on Tidy Tales to have branded images for social cards and plain images for post listings. Here’s an example of the branded social card image from my first post.

    -

    The branded social card image for my first post. Copy the post link into a tweet to see it in action.

    +
    The branded social card image for my first post. Copy the post link into a tweet to see it in action.
    @@ -4604,60 +9888,358 @@ Invariance. Utterances Comments

Distill officially supports only Disqus comments. I did not want to use Disqus on Tidy Tales because it would add bloat to my posts, and because I do not want a third party data mining and tracking Tidy Tales readers. Utterances is a lightweight alternative that uses GitHub issues for comments. Miles McBain shared an HTML script on his blog to add Utterances to a distill blog.

    Here is what the script for Tidy Tales looks like.

    -
    <script>
    - document.addEventListener("DOMContentLoaded", function () {
    -   if (!/posts/.test(location.pathname)) {
    -     return;
    +
    <script>
    + document.addEventListener("DOMContentLoaded", function () {
    +   if (!/posts/.test(location.pathname)) {
    +     return;
        }
     
    -   var script = document.createElement("script");
    -   script.src = "https://utteranc.es/client.js";
    -   script.setAttribute("repo", "mccarthy-m-g/tidytales");
    -   script.setAttribute("issue-term", "title");
    -   script.setAttribute("crossorigin", "anonymous");
    -   script.setAttribute("label", "utterances");
    -   script.setAttribute("theme", "github-light");
    +   var script = document.createElement("script");
    +   script.src = "https://utteranc.es/client.js";
    +   script.setAttribute("repo", "mccarthy-m-g/tidytales");
    +   script.setAttribute("issue-term", "title");
    +   script.setAttribute("crossorigin", "anonymous");
    +   script.setAttribute("label", "utterances");
    +   script.setAttribute("theme", "github-light");
     
    -   /* wait for article to load, append script to article element */
    -   var observer = new MutationObserver(function (mutations, observer) {
    -     var article = document.querySelector("details.comment-section");
    -     if (article) {
    -       observer.disconnect();
    -       /* HACK: article scroll */
    -       article.setAttribute("style", "overflow-y: hidden");
    -       article.appendChild(script);
    +   /* wait for article to load, append script to article element */
    +   var observer = new MutationObserver(function (mutations, observer) {
    +     var article = document.querySelector("details.comment-section");
    +     if (article) {
    +       observer.disconnect();
    +       /* HACK: article scroll */
    +       article.setAttribute("style", "overflow-y: hidden");
    +       article.appendChild(script);
          }
    -   });
    +   });
     
    -   observer.observe(document.body, { childList: true });
    - });
    -</script>
+   observer.observe(document.body, { childList: true });
+ });
+</script>

The script uses JavaScript to inject the Utterances <iframe> at the end of the first HTML element in the document that matches the CSS selector specified in document.querySelector(). By default, the script shared by Miles places the comment section at the end of a distill post’s body. Since Utterances comment sections are not collapsible, this presents a problem: as more comments are made, readers have to scroll further and further to reach a post’s appendix.
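The guard at the top of the script is just a regular expression test against the current path, so Utterances only loads on post pages. Its behaviour can be checked in isolation (the paths below are examples, not an exhaustive list of the site's routes):

```javascript
// The same guard used in the script above: only inject Utterances when the
// current pathname contains "posts".
function shouldInjectUtterances(pathname) {
  return /posts/.test(pathname);
}

shouldInjectUtterances("/posts/2021-06-15_demons-souls/"); // true
shouldInjectUtterances("/about/");                         // false
```
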

    To overcome this on Tidy Tales I created new CSS selectors that use the <details> tag, so readers can show and hide the comments section as they please, and added a brightness and opacity filter to the selector for the Utterances <iframe> to make it fit into the Tidy Tales colour scheme better. I also wanted my comments section to be in the appendix of my posts rather than the body.

    -
    d-appendix details.comment-section {
    -  color: var(--dark-shade-alpha);
    -  font-family: var(--heading-font);
    -  font-size: 15px !important;
    +
    d-appendix details.comment-section {
    +  color: var(--dark-shade-alpha);
    +  font-family: var(--heading-font);
    +  font-size: 15px !important;
     }
     
    -d-appendix details.comment-section summary:after {
    -  content: "Show";
    +d-appendix details.comment-section summary:after {
    +  content: "Show";
     }
     
    -d-appendix details[open].comment-section summary:after {
    -  content: "Hide";
    +d-appendix details[open].comment-section summary:after {
    +  content: "Hide";
     }
     
    -.utterances {
    -  filter: brightness(95%) opacity(85%);
    +.utterances {
    +  filter: brightness(95%) opacity(85%);
     }

The HTML and CSS above are applied to all Tidy Tales posts using the theme and includes parameters in _site.yml, so to add Utterances to a post I only need to include the following appendix header in the post’s R Markdown file.

    -
    ## Comments {.appendix}
    +
    ## Comments {.appendix}
     
    -<details open class="comment-section">
    -   <summary>
    -   </summary>
    -</details>
+<details open class="comment-section">
+   <summary>
+   </summary>
+</details>

    Post Templates

    @@ -4670,7 +10252,7 @@ Invariance.

    -

    Preview of the new post from template RStudio addin for distilltools.

    +
    Preview of the new post from template RStudio addin for distilltools.
    @@ -4684,7 +10266,7 @@ Invariance.

    @@ -4818,7 +10392,7 @@ A soul-devouring demon

    -

    The Demon’s Souls logo.

    +
    The Demon’s Souls logo.
    @@ -4834,34 +10408,226 @@ A soul-devouring demon

    Prerequisites

    -
    library(tidyverse)
    -library(ggfx)
    -library(magick)
    +
    library(tidyverse)
    +library(ggfx)
    +library(magick)

    I’ll be using PlayStation Network trophy data for my plot. The data contains statistics for the percent of players who have slain a given boss in Demon’s Souls out of all the players who have ever played the game. I have constructed the data manually since Sony does not provide an API to access PlayStation Network trophy data programmatically. Demon’s Souls was released on February 5, 2009, so it is unlikely these stats will change much in the future.

    -
    # Tribbles are not just useful for scaring Klingons, they make it easy to
    -# create tibbles too
    -demons_souls <- tribble(
    -  ~boss,            ~boss_type,  ~location,              ~archstone, ~percent_completed,
    -  "Phalanx",        "Demon",     "Boletarian Palace",    "1-1",      63.1,               
    -  "Tower Knight",   "Demon",     "Boletarian Palace",    "1-2",      46.6,               
    -  "Penetrator",     "Demon",     "Boletarian Palace",    "1-3",      30.3,               
    -  "False King",     "Archdemon", "Boletarian Palace",    "1-4",      24.2,               
    -  "Armor Spider",   "Demon",     "Stonefang Tunnel",     "2-1",      43.9,               
    -  "Flamelurker",    "Demon",     "Stonefang Tunnel",     "2-2",      35.1,               
    -  "Dragon God",     "Archdemon", "Stonefang Tunnel",     "2-3",      33.1,               
    -  "Fool’s Idol",    "Demon",     "Tower of Latria",      "3-1",      35.7,               
    -  "Maneater",       "Demon",     "Tower of Latria",      "3-2",      28.7,               
    -  "Old Monk",       "Archdemon", "Tower of Latria",      "3-3",      27.7,               
    -  "Adjudicator",    "Demon",     "Shrine of Storms",     "4-1",      36.1,               
    -  "Old Hero",       "Demon",     "Shrine of Storms",     "4-2",      28.8,               
    -  "Storm King",     "Archdemon", "Shrine of Storms",     "4-3",      28.1,               
    -  "Leechmonger",    "Demon",     "Valley of Defilement", "5-1",      32.5,               
    -  "Dirty Colossus", "Demon",     "Valley of Defilement", "5-2",      27.2,               
    -  "Maiden Astraea", "Archdemon", "Valley of Defilement", "5-3",      26.6
    -) %>%
    -  mutate(across(boss_type:archstone, as_factor))
    +
    # Tribbles are not just useful for scaring Klingons, they make it easy to
    +# create tibbles too
    +demons_souls <- tribble(
    +  ~boss,            ~boss_type,  ~location,              ~archstone, ~percent_completed,
    +  "Phalanx",        "Demon",     "Boletarian Palace",    "1-1",      63.1,               
    +  "Tower Knight",   "Demon",     "Boletarian Palace",    "1-2",      46.6,               
    +  "Penetrator",     "Demon",     "Boletarian Palace",    "1-3",      30.3,               
    +  "False King",     "Archdemon", "Boletarian Palace",    "1-4",      24.2,               
    +  "Armor Spider",   "Demon",     "Stonefang Tunnel",     "2-1",      43.9,               
    +  "Flamelurker",    "Demon",     "Stonefang Tunnel",     "2-2",      35.1,               
    +  "Dragon God",     "Archdemon", "Stonefang Tunnel",     "2-3",      33.1,               
    +  "Fool’s Idol",    "Demon",     "Tower of Latria",      "3-1",      35.7,               
    +  "Maneater",       "Demon",     "Tower of Latria",      "3-2",      28.7,               
    +  "Old Monk",       "Archdemon", "Tower of Latria",      "3-3",      27.7,               
    +  "Adjudicator",    "Demon",     "Shrine of Storms",     "4-1",      36.1,               
    +  "Old Hero",       "Demon",     "Shrine of Storms",     "4-2",      28.8,               
    +  "Storm King",     "Archdemon", "Shrine of Storms",     "4-3",      28.1,               
    +  "Leechmonger",    "Demon",     "Valley of Defilement", "5-1",      32.5,               
    +  "Dirty Colossus", "Demon",     "Valley of Defilement", "5-2",      27.2,               
    +  "Maiden Astraea", "Archdemon", "Valley of Defilement", "5-3",      26.6
    +) %>%
    +  mutate(across(boss_type:archstone, as_factor))
     
     demons_souls
    @@ -4879,17 +10645,67 @@ A soul-devouring demon

    The data is already structured the way I want it for my plot, but there are still some interesting things to explore through wrangling and summary stats.

    Within each location, players have to slay each demon in the order specified by the archstones. For example, in the Boletarian Palace a player cannot face the Tower Knight before they have slain the Phalanx. So each location has a first, second, and third boss (and the Boletarian Palace has a fourth that can only be faced after slaying all the other demons). This can be used to get an imperfect idea of player attrition in the game.

    -
    # Detect the order of bosses based on archstone suffix
    -demons_souls <- demons_souls %>%
    -  mutate(
    -    archstone_boss = case_when(
    -      str_detect(archstone, "-1") ~ "First",
    -      str_detect(archstone, "-2") ~ "Second",
    -      str_detect(archstone, "-3") ~ "Third",
    -      str_detect(archstone, "-4") ~ "Fourth (False King)"
    +
    # Detect the order of bosses based on archstone suffix
    +demons_souls <- demons_souls %>%
    +  mutate(
    +    archstone_boss = case_when(
    +      str_detect(archstone, "-1") ~ "First",
    +      str_detect(archstone, "-2") ~ "Second",
    +      str_detect(archstone, "-3") ~ "Third",
    +      str_detect(archstone, "-4") ~ "Fourth (False King)"
         ),
    -    archstone_boss = as_factor(archstone_boss),
    -    .after = archstone
    +    archstone_boss = as_factor(archstone_boss),
    +    .after = archstone
       )
     
     demons_souls
    @@ -4904,11 +10720,27 @@ A soul-devouring demon

    Now, there are two ways to go about getting this imperfect idea of player attrition in the game. The first involves using the entire data set.

    -
    # Calculate the average percent of players who have slain the first, second,
    -# ..., archstone boss across locations. 
    -demons_souls %>%
    -  group_by(archstone_boss) %>%
    -  summarise(average_completed = mean(percent_completed))
    +
    # Calculate the average percent of players who have slain the first, second,
    +# ..., archstone boss across locations. 
    +demons_souls %>%
    +  group_by(archstone_boss) %>%
    +  summarise(average_completed = mean(percent_completed))
    @@ -4920,11 +10752,33 @@ A soul-devouring demon

The second involves removing the Phalanx from the data set due to its influential pull on the average for the first archstone boss. It has a much higher completion percentage (63.1%) than the other bosses because the Phalanx is the first boss in the game: players must slay it before they can face the first archstone boss in any other location. Removing the Phalanx might give a more accurate picture of average completion for first archstone bosses.

    -
    # Trophy earned: Slayer of Demon "Phalanx"
    -demons_souls %>%
    -  filter(boss != "Phalanx") %>%
    -  group_by(archstone_boss) %>%
    -  summarise(average_completed = mean(percent_completed))
    +
    # Trophy earned: Slayer of Demon "Phalanx"
    +demons_souls %>%
    +  filter(boss != "Phalanx") %>%
    +  group_by(archstone_boss) %>%
    +  summarise(average_completed = mean(percent_completed))
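The group_by()/summarise() pipelines above reduce to grouping rows and averaging a column. The same calculation can be sketched outside R (a plain JavaScript translation using the first-archstone-boss percentages from the data above), which makes the Phalanx's pull on the average easy to see:

```javascript
// Sketch of the dplyr group_by() + summarise(mean) pipeline in plain
// JavaScript, using the first-archstone-boss rows from the data above.
function meanBy(rows, key, value) {
  const groups = new Map();
  for (const row of rows) {
    const group = groups.get(row[key]) ?? [];
    group.push(row[value]);
    groups.set(row[key], group);
  }
  return new Map(
    [...groups].map(([k, v]) => [k, v.reduce((a, b) => a + b, 0) / v.length])
  );
}

const firstBosses = [
  { boss: "Phalanx",      archstone_boss: "First", percent_completed: 63.1 },
  { boss: "Armor Spider", archstone_boss: "First", percent_completed: 43.9 },
  { boss: "Fool’s Idol",  archstone_boss: "First", percent_completed: 35.7 },
  { boss: "Adjudicator",  archstone_boss: "First", percent_completed: 36.1 },
  { boss: "Leechmonger",  archstone_boss: "First", percent_completed: 32.5 },
];

// Average completion for first archstone bosses, with and without the Phalanx
const withPhalanx = meanBy(firstBosses, "archstone_boss", "percent_completed");
const withoutPhalanx = meanBy(
  firstBosses.filter((row) => row.boss !== "Phalanx"),
  "archstone_boss",
  "percent_completed"
);
```

With the Phalanx, the first-boss average works out to 42.26%; without it, 37.05%.
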
    @@ -4940,12 +10794,34 @@ A soul-devouring demon

    Umbassa.

About one quarter of Demon’s Souls players persisted to the end of the game. But three quarters did not. Assuming most players at least attempted each location, averaging by location gives an imperfect idea of each location's overall difficulty for players during their first playthrough.

    -
    # Calculate the average completion rate by location, arranged from "easiest" to
    -# "hardest"
    -demons_souls %>%
    -  group_by(location) %>%
    -  summarise(average_completed = mean(percent_completed)) %>%
    -  arrange(desc(average_completed))
    +
    # Calculate the average completion rate by location, arranged from "easiest" to
    +# "hardest"
    +demons_souls %>%
    +  group_by(location) %>%
    +  summarise(average_completed = mean(percent_completed)) %>%
    +  arrange(desc(average_completed))
    @@ -4960,309 +10836,1409 @@ A soul-devouring demon

    Visualize

    -
    # Define aliases for plot fonts and colours
    -optimus <- "OptimusPrinceps"
    -optimus_b <- "OptimusPrincepsSemiBold"
    -yellow <- "#ffaf24" #  #fec056
    +
    # Define aliases for plot fonts and colours
    +optimus <- "OptimusPrinceps"
    +optimus_b <- "OptimusPrincepsSemiBold"
    +yellow <- "#ffaf24" #  #fec056

    The plot I want to make is inspired by this Tidy Tuesday plot by Georgios Karamanis. I used Georgios’ code as a starting point, then modified it to get the behaviour and result I wanted.

    The centrepiece of the plot is the coloured text that shows the percent of Demon’s Souls players who have completed a given boss in yellow and who have not in red. This effect is achieved by applying a rectangular filter over the text that only allows the portion of the text within the filter’s borders to be shown. Doing this once for yellow text and once for red text allows the full string to appear, with the ratio of colours within a boss’s name reflecting the percent of players that have completed it. A few calculations are needed in order for the ratios to be accurate, and for the text to look aesthetically pleasing.
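One detail worth making explicit about the bounding boxes: for a completion fraction p, the yellow box spans from -0.5 to -0.5 + p and the red box spans from 0.5 - (1 - p) to 0.5. Since -0.5 + p equals 0.5 - (1 - p), the two boxes abut exactly at the completion boundary, with no gap or overlap. A quick sketch of that arithmetic (an illustration only, not the plot code itself):

```javascript
// The xmin/xmax arithmetic behind the two-colour text effect: a yellow box
// covering the completed fraction and a red box covering the remainder.
function textBoxes(percentCompleted) {
  return {
    yellow: { xmin: -0.5, xmax: -0.5 + percentCompleted },
    red: { xmin: 0.5 - (1 - percentCompleted), xmax: 0.5 },
  };
}

// e.g. the Phalanx, with 63.1% completion (0.631 in decimal form).
// The boxes abut exactly: yellow ends where red begins.
const boxes = textBoxes(0.631);
```
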

    -
    demons_souls_plot <- demons_souls %>%
    -  mutate(
    -    # Percentages need to be in decimal form for the calculations and plotting
    -    # to work properly
    -    percent_completed = percent_completed/100,
    -    boss = fct_reorder(toupper(boss), percent_completed),
    -    # In order to justify text to the same width, a ratio of how many times
    -    # each string would fit into the widest string needs to be calculated. This
    -    # can then be multiplied by an arbitrary value to determine the final size
    -    # for each string of text.
    -    str_width = strwidth(boss, family = optimus_b, units = "inches") * 25.4, # in millimetres
    -    str_ratio = max(str_width)/str_width,
    -    text_size = 4.9 * str_ratio,
    -    # The division here is arbitrary, its effect is reflected in the scale of the
    -    # y-axis
    -    tile_height = text_size / 10
    -  ) %>%
    -  # Bosses will appear from top to bottom based on completion ratios. The
    -  # calculation here accounts for the differences in text size for each string.
    -  arrange(percent_completed) %>%
    -  mutate(y = cumsum(lag(tile_height/2, default = 0) + tile_height/2))
    +
    demons_souls_plot <- demons_souls %>%
    +  mutate(
    +    # Percentages need to be in decimal form for the calculations and plotting
    +    # to work properly
    +    percent_completed = percent_completed/100,
    +    boss = fct_reorder(toupper(boss), percent_completed),
    +    # In order to justify text to the same width, a ratio of how many times
    +    # each string would fit into the widest string needs to be calculated. This
    +    # can then be multiplied by an arbitrary value to determine the final size
    +    # for each string of text.
    +    str_width = strwidth(boss, family = optimus_b, units = "inches") * 25.4, # in millimetres
    +    str_ratio = max(str_width)/str_width,
    +    text_size = 4.9 * str_ratio,
    +    # The division here is arbitrary, its effect is reflected in the scale of the
    +    # y-axis
    +    tile_height = text_size / 10
    +  ) %>%
    +  # Bosses will appear from top to bottom based on completion ratios. The
    +  # calculation here accounts for the differences in text size for each string.
    +  arrange(percent_completed) %>%
    +  mutate(y = cumsum(lag(tile_height/2, default = 0) + tile_height/2))
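The final mutate() is what stacks the labels: each label's centre sits below the previous one by the mean of the two rows' heights, so rows with different text sizes touch without overlapping. The cumsum(lag(tile_height/2, default = 0) + tile_height/2) idiom can be sketched as (an illustration of the arithmetic, not dplyr itself):

```javascript
// Sketch of cumsum(lag(tile_height / 2, default = 0) + tile_height / 2):
// stack boxes of varying heights so consecutive centres are separated by
// the mean of the two heights, i.e. the boxes touch exactly.
function stackCentres(heights) {
  const centres = [];
  let y = 0;
  heights.forEach((h, i) => {
    const previousHalf = i === 0 ? 0 : heights[i - 1] / 2; // lag(..., default = 0)
    y += previousHalf + h / 2;                             // running cumsum(...)
    centres.push(y);
  });
  return centres;
}

stackCentres([2, 4, 6]); // centres at 1, 4, 9
```

Consecutive centres differ by (2 + 4) / 2 = 3 and (4 + 6) / 2 = 5, so each box begins exactly where the previous one ends.
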

    Now the plot can be constructed. The final code for the plot is roughly 100 lines long, so I’ve hidden it in the section below. However, there are a few parts of the code I want to highlight before showing the final plot.

Show Code
-
    # The trick for geom spacing is to set the size of the plot from the start
    -file <- tempfile(fileext = '.png')
    -ragg::agg_png(file, width = 4, height = 5.5, res = 300, units = "in")
    +
    # The trick for geom spacing is to set the size of the plot from the start
    +file <- tempfile(fileext = '.png')
    +ragg::agg_png(file, width = 4, height = 5.5, res = 300, units = "in")
     
    -ggplot(demons_souls_plot) +
    -  # Make it easier to see where 50% is using a vertical line. geom_segment() is
    -  # used here instead of geom_vline() because the latter goes up into the title
    -  # text. An empty data frame is supplied so that only one copy of the geom is
    -  # drawn.
    -  geom_segment(aes(
    -    x = 0,
    -    xend = 0,
    -    y = 10.9,
    -    yend = 0,
    -    size = 0.6),
    -    data = data.frame(),
    -    alpha = 0.3,
    -    colour = "grey",
    -    lineend = "round",
    -    linetype = "twodash"
    -  ) +
    -  scale_alpha_identity() +
    +ggplot(demons_souls_plot) +
    +  # Make it easier to see where 50% is using a vertical line. geom_segment() is
    +  # used here instead of geom_vline() because the latter goes up into the title
    +  # text. An empty data frame is supplied so that only one copy of the geom is
    +  # drawn.
    +  geom_segment(aes(
    +    x = 0,
    +    xend = 0,
    +    y = 10.9,
    +    yend = 0,
    +    size = 0.6),
    +    data = data.frame(),
    +    alpha = 0.3,
    +    colour = "grey",
    +    lineend = "round",
    +    linetype = "twodash"
    +  ) +
    +  scale_alpha_identity() +
       
    -  # Set bounding box for yellow portion of centrepiece text
    -  as_reference(
    -    geom_rect(aes(
    -      xmin = -0.5,
    -      xmax = -0.5 + ((percent_completed)),
    -      ymin = y - (tile_height * 0.5),
    -      ymax = y + (tile_height * 0.5)
    +  # Set bounding box for yellow portion of centrepiece text
    +  as_reference(
    +    geom_rect(aes(
    +      xmin = -0.5,
    +      xmax = -0.5 + ((percent_completed)),
    +      ymin = y - (tile_height * 0.5),
    +      ymax = y + (tile_height * 0.5)
         )), 
    -    id = "demon_vanquished"
    -  ) +
    -  # Only show the portion of yellow centrepiece text located within the
    -  # bounding box
    -  with_blend(
    -    geom_text(aes(
    -      x = 0,
    -      y = y,
    -      label = boss,
    -      size = text_size
    +    id = "demon_vanquished"
    +  ) +
    +  # Only show the portion of yellow centrepiece text located within the
    +  # bounding box
    +  with_blend(
    +    geom_text(aes(
    +      x = 0,
    +      y = y,
    +      label = boss,
    +      size = text_size
         ),
    -    colour = yellow,
    -    family = optimus_b),
    -    bg_layer = "demon_vanquished",
    -    blend_type = "in"
    -  ) +
    -  # Set bounding box for red portion of centrepiece text
    -  as_reference(
    -    geom_rect(aes(
    -      xmin = 0.5 - ((1 - percent_completed)),
    -      xmax = 0.5,
    -      ymin = y - (tile_height * 0.5),
    -      ymax = y + (tile_height * 0.5)
    +    colour = yellow,
    +    family = optimus_b),
    +    bg_layer = "demon_vanquished",
    +    blend_type = "in"
    +  ) +
    +  # Set bounding box for red portion of centrepiece text
    +  as_reference(
    +    geom_rect(aes(
    +      xmin = 0.5 - ((1 - percent_completed)),
    +      xmax = 0.5,
    +      ymin = y - (tile_height * 0.5),
    +      ymax = y + (tile_height * 0.5)
         )), 
    -    id = "you_died"
    -  ) +
    -  # Only show the portion of red centrepiece text located within the bounding
    -  # box
    -  with_blend(
    -    geom_text(aes(
    -      x = 0,
    -      y = y,
    -      label = boss,
    -      size = text_size
    +    id = "you_died"
    +  ) +
    +  # Only show the portion of red centrepiece text located within the bounding
    +  # box
    +  with_blend(
    +    geom_text(aes(
    +      x = 0,
    +      y = y,
    +      label = boss,
    +      size = text_size
         ),
    -    colour = "red",
    -    family = optimus_b),
    -    bg_layer = "you_died",
    -    blend_type = "in"
    -  ) +
    +    colour = "red",
    +    family = optimus_b),
    +    bg_layer = "you_died",
    +    blend_type = "in"
    +  ) +
       
    -  # Draw "axis" for Demon Vanquished
    -  annotate(
    -    "text",
    -    x = -0.65,
    -    y = 7.75,
    -    label = "demon vanquished",
    -    angle = 90,
    -    size = 5,
    -    family = optimus,
    -    colour = yellow
    -  ) +
    -  geom_segment(aes(
    -    x = -0.645,
    -    xend = -0.645,
    -    y = 10.05,
    -    yend = 10.45),
    -    lineend = "round",
    -    colour = yellow,
    -    size = 0.3,
    -    arrow = arrow(angle = 45, length = unit(1, "mm"), type = "open")
    -  ) +
    -  # Draw "axis" for You Died
    -  annotate(
    -    "text",
    -    x = 0.65,
    -    y = 4.65,
    -    label = "you died",
    -    angle = 270,
    -    size = 5,
    -    family = optimus,
    -    colour = "red"
    -  ) +
    -  geom_segment(aes(
    -    x = 0.645,
    -    xend = 0.645,
    -    y = 3.51,
    -    yend = 3.01),
    -    lineend = "round",
    -    colour = "red",
    -    size = 0.3,
    -    arrow = arrow(angle = 45, length = unit(1, "mm"), type = "open")
    -  ) +
    +  # Draw "axis" for Demon Vanquished
    +  annotate(
    +    "text",
    +    x = -0.65,
    +    y = 7.75,
    +    label = "demon vanquished",
    +    angle = 90,
    +    size = 5,
    +    family = optimus,
    +    colour = yellow
    +  ) +
    +  geom_segment(aes(
    +    x = -0.645,
    +    xend = -0.645,
    +    y = 10.05,
    +    yend = 10.45),
    +    lineend = "round",
    +    colour = yellow,
    +    size = 0.3,
    +    arrow = arrow(angle = 45, length = unit(1, "mm"), type = "open")
    +  ) +
    +  # Draw "axis" for You Died
    +  annotate(
    +    "text",
    +    x = 0.65,
    +    y = 4.65,
    +    label = "you died",
    +    angle = 270,
    +    size = 5,
    +    family = optimus,
    +    colour = "red"
    +  ) +
    +  geom_segment(aes(
    +    x = 0.645,
    +    xend = 0.645,
    +    y = 3.51,
    +    yend = 3.01),
    +    lineend = "round",
    +    colour = "red",
    +    size = 0.3,
    +    arrow = arrow(angle = 45, length = unit(1, "mm"), type = "open")
    +  ) +
       
    -  # Draw a title surrounded by line decorations at the top of the panel
    -  geom_segment(aes(
    -    x = -0.75,
    -    xend = 0.75,
    -    y = 13.2,
    -    yend = 13.2,
    -    size = 0.3),
    -    lineend = "round",
    -    colour = "grey"
    -  ) +
    -  annotate(
    -    "text",
    -    x = 0,
    -    y = 12.325,
    -    size = 7,
    -    family = optimus_b,
    -    colour = "white",
    -    lineheight = 0.75,
    -    label = "DEMON’S SOULS\nBOSS COMPLETION"
    -  ) +
    -  geom_segment(aes(
    -    x = -0.025,
    -    xend = -0.75,
    -    y = 11.4,
    -    yend = 11.4,
    -    size = 0.3),
    -    lineend = "round",
    -    colour = "grey"
    -  ) +
    -  geom_segment(aes(
    -    x = 0.025,
    -    xend = 0.75,
    -    y = 11.4,
    -    yend = 11.4,
    -    size = 0.3),
    -    lineend = "round",
    -    colour = "grey"
    -  ) +
    -  annotate(
    -    "point",
    -    x  = 0,
    -    y = 11.4,
    -    colour = "grey",
    -    shape = 5,
    -    size = 2,
    -    stroke = 0.6
    -  ) +
    -  annotate(
    -    "point",
    -    x  = 0,
    -    y = 11.4,
    -    colour = "grey",
    -    shape = 5,
    -    size = 0.75
    -  ) +
    +  # Draw a title surrounded by line decorations at the top of the panel
    +  geom_segment(aes(
    +    x = -0.75,
    +    xend = 0.75,
    +    y = 13.2,
    +    yend = 13.2,
    +    size = 0.3),
    +    lineend = "round",
    +    colour = "grey"
    +  ) +
    +  annotate(
    +    "text",
    +    x = 0,
    +    y = 12.325,
    +    size = 7,
    +    family = optimus_b,
    +    colour = "white",
    +    lineheight = 0.75,
    +    label = "DEMON’S SOULS\nBOSS COMPLETION"
    +  ) +
    +  geom_segment(aes(
    +    x = -0.025,
    +    xend = -0.75,
    +    y = 11.4,
    +    yend = 11.4,
    +    size = 0.3),
    +    lineend = "round",
    +    colour = "grey"
    +  ) +
    +  geom_segment(aes(
    +    x = 0.025,
    +    xend = 0.75,
    +    y = 11.4,
    +    yend = 11.4,
    +    size = 0.3),
    +    lineend = "round",
    +    colour = "grey"
    +  ) +
    +  annotate(
    +    "point",
    +    x  = 0,
    +    y = 11.4,
    +    colour = "grey",
    +    shape = 5,
    +    size = 2,
    +    stroke = 0.6
    +  ) +
    +  annotate(
    +    "point",
    +    x  = 0,
    +    y = 11.4,
    +    colour = "grey",
    +    shape = 5,
    +    size = 0.75
    +  ) +
       
    -  # Draw plot caption
    -  annotate(
    -    "text",
    -    x = 1,
    -    y = 10.33,
    -    angle = 270,
    -    hjust = 0,
    -    size = 3,
    -    alpha = 0.3,
    -    label = "SOURCE: PLAYSTATION NETWORK | GRAPHIC: MICHAEL MCCARTHY",
    -    family = optimus,
    -    color = "grey"
    -  ) +
    +  # Draw plot caption
    +  annotate(
    +    "text",
    +    x = 1,
    +    y = 10.33,
    +    angle = 270,
    +    hjust = 0,
    +    size = 3,
    +    alpha = 0.3,
    +    label = "SOURCE: PLAYSTATION NETWORK | GRAPHIC: MICHAEL MCCARTHY",
    +    family = optimus,
    +    color = "grey"
    +  ) +
       
    -  # Make sure the text size calculated for each string is used so that strings
    -  # are justified
    -  scale_size_identity() +
    -  # Take axis limits exactly from data so there's no spacing around the panel,
    -  # allow drawing outside of the panel for annotations, and set the axis limits
    -  # to match the limits of the text.
    -  coord_cartesian(expand = FALSE, clip = "off", xlim = c(-0.5, 0.5)) +
    -  # Specify the panel size manually. This makes it easier to position plot
    -  # elements with absolute positions.
    -  ggh4x::force_panelsizes(rows = unit(5, "in"), # height
    -                          cols = unit(1.8, "in")) + # width
    -  theme_void() +
    -  theme(
    -    legend.position = "none",
    -    plot.margin = unit(c(0.5, 4, 0.5, 4), "in"),
    -    plot.background = element_rect(fill = "black", color = NA))
    +  # Make sure the text size calculated for each string is used so that strings
    +  # are justified
    +  scale_size_identity() +
    +  # Take axis limits exactly from data so there's no spacing around the panel,
    +  # allow drawing outside of the panel for annotations, and set the axis limits
    +  # to match the limits of the text.
    +  coord_cartesian(expand = FALSE, clip = "off", xlim = c(-0.5, 0.5)) +
    +  # Specify the panel size manually. This makes it easier to position plot
    +  # elements with absolute positions.
    +  ggh4x::force_panelsizes(rows = unit(5, "in"), # height
    +                          cols = unit(1.8, "in")) + # width
    +  theme_void() +
    +  theme(
    +    legend.position = "none",
    +    plot.margin = unit(c(0.5, 4, 0.5, 4), "in"),
    +    plot.background = element_rect(fill = "black", color = NA))
     
    -invisible(dev.off())
    +invisible(dev.off())
     
    -# Apply a mask texture to the final image to mimic the style of the Demon's
    -# Souls logo in the plot title
    -mask <- image_read(
    -  here("posts", "2021-06-15_demons-souls", "images", "texture.png")
    -  ) %>%
    -  image_transparent("white") %>%
    -  image_threshold("black", "90%")
    +# Apply a mask texture to the final image to mimic the style of the Demon's
    +# Souls logo in the plot title
    +mask <- image_read(
    +  here("posts", "2021-06-15_demons-souls", "images", "texture.png")
    +  ) %>%
    +  image_transparent("white") %>%
    +  image_threshold("black", "90%")
     
    -final_plot <- image_composite(image_read(file), mask, operator = "Over")
    +final_plot <- image_composite(image_read(file), mask, operator = "Over")

    First, the code behind the coloured centrepiece text. It uses ggfx::as_reference() and ggfx::with_blend() to selectively apply a filter over portions of the text, as I discussed earlier. The boundaries of the filter are defined by the ggplot2 geom inside ggfx::as_reference(); ggfx::with_blend() then applies the filter specified by blend_type to the ggplot2 geom inside it. By applying this process twice, once for the yellow text and again for the red text, each with filter boundaries based on the percent completed and the percent not completed, the entire boss name is displayed with accurate colour fills.

    -
      # Set bounding box for yellow portion of centrepiece text
    -  as_reference(
    -    geom_rect(aes(
    -      xmin = -0.5,
    -      xmax = -0.5 + ((percent_completed)),
    -      ymin = y - (tile_height * 0.5),
    -      ymax = y + (tile_height * 0.5)
    +
      # Set bounding box for yellow portion of centrepiece text
    +  as_reference(
    +    geom_rect(aes(
    +      xmin = -0.5,
    +      xmax = -0.5 + ((percent_completed)),
    +      ymin = y - (tile_height * 0.5),
    +      ymax = y + (tile_height * 0.5)
         )), 
    -    id = "demon_vanquished"
    -  ) +
    -  # Only show the portion of yellow centrepiece text located within the
    -  # bounding box
    -  with_blend(
    -    geom_text(aes(
    -      x = 0,
    -      y = y,
    -      label = boss,
    -      size = text_size
    +    id = "demon_vanquished"
    +  ) +
    +  # Only show the portion of yellow centrepiece text located within the
    +  # bounding box
    +  with_blend(
    +    geom_text(aes(
    +      x = 0,
    +      y = y,
    +      label = boss,
    +      size = text_size
         ),
    -    colour = yellow,
    -    family = optimus_b),
    -    bg_layer = "demon_vanquished",
    -    blend_type = "in"
    -  ) +
    -   # Set bounding box for red portion of centrepiece text
    -  as_reference(
    -    geom_rect(aes(
    -      xmin = 0.5 - ((1 - percent_completed)),
    -      xmax = 0.5,
    -      ymin = y - (tile_height * 0.5),
    -      ymax = y + (tile_height * 0.5)
    +    colour = yellow,
    +    family = optimus_b),
    +    bg_layer = "demon_vanquished",
    +    blend_type = "in"
    +  ) +
    +   # Set bounding box for red portion of centrepiece text
    +  as_reference(
    +    geom_rect(aes(
    +      xmin = 0.5 - ((1 - percent_completed)),
    +      xmax = 0.5,
    +      ymin = y - (tile_height * 0.5),
    +      ymax = y + (tile_height * 0.5)
         )), 
    -    id = "you_died"
    -  ) +
    -  # Only show the portion of red centrepiece text located within the bounding
    -  # box
    -  with_blend(
    -    geom_text(aes(
    -      x = 0,
    -      y = y,
    -      label = boss,
    -      size = text_size
    +    id = "you_died"
    +  ) +
    +  # Only show the portion of red centrepiece text located within the bounding
    +  # box
    +  with_blend(
    +    geom_text(aes(
    +      x = 0,
    +      y = y,
    +      label = boss,
    +      size = text_size
         ),
    -    colour = "red",
    -    family = optimus_b),
    -    bg_layer = "you_died",
    -    blend_type = "in"
    +    colour = "red",
    +    family = optimus_b),
    +    bg_layer = "you_died",
    +    blend_type = "in"
       )

    Second, the code behind the distressed, broken style of the title text. This one is actually quite simple. It uses magick::image_composite() to apply a texture mask I made in Krita over the composed plot. The mask has a transparent background with black lines located over the space where the plot title is. Both the composed plot and mask images have the same dimensions, so when they’re composed together the effect is applied exactly where I want it.

    -
    image_composite(plot, mask, operator = "Over")
    +
    image_composite(plot, mask, operator = "Over")

    Finally, it’s worth noting that the decorative lines around the plot’s title text are also made from ggplot2 geoms. I overlaid two ggplot2::geom_point() geoms with different sizes to create the diamond on the bottom line.
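    As a minimal standalone sketch (using the same shape, colour, and size values as the full plot code above, but placeholder coordinates), the nested-diamond decoration comes from drawing two open diamonds (shape = 5) at the same position:

    ```r
    library(ggplot2)

    # Two open diamonds (shape = 5) at the same point; the larger one uses a
    # heavier stroke, and the smaller one sits inside it, creating the nested
    # diamond seen between the title's decorative lines.
    ggplot() +
      annotate("point", x = 0, y = 0, colour = "grey", shape = 5,
               size = 2, stroke = 0.6) +
      annotate("point", x = 0, y = 0, colour = "grey", shape = 5,
               size = 0.75) +
      theme_void()
    ```

    Because both layers share the same x and y, resizing one diamond never shifts it relative to the other.
    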

    @@ -5280,7 +12256,7 @@ A soul-devouring demon

    -

    +

    @@ -5295,7 +12271,7 @@ A soul-devouring demon

    Thanks for reading! I’m Michael, the voice behind Tidy Tales. I am an award-winning data scientist and R programmer with the skills and experience to help you solve the problems you care about. You can learn more about me, my consulting services, and my other projects on my personal website.

    -

    Comments

    +

    Comments

    @@ -5304,7 +12280,7 @@ A soul-devouring demon

    -

    Session Info

    +

    Session Info

    @@ -5348,26 +12324,18 @@ A soul-devouring demon

    -

    Data

    +

    Data

    Download the data used in this post.

    -

    Fair Dealing

    +

    Fair Dealing

    Any of the trademarks, service marks, collective marks, design rights or similar rights that are mentioned, used, or cited in this article are the property of their respective owners. They are used here as fair dealing for the purpose of education in accordance with section 29 of the Copyright Act and do not infringe copyright.

    -

    Citation

    BibTeX citation:
    @online{mccarthy2021,
    -  author = {Michael McCarthy},
    -  title = {Go Forth, Slayer of {Demons}},
    -  date = {2021-06-15},
    -  url = {https://tidytales.ca/posts/2021-06-15_demons-souls},
    -  langid = {en}
    -}
    -
    For attribution, please cite this work as:
    -Michael McCarthy. (2021, June 15). Go forth, slayer of Demons. -https://tidytales.ca/posts/2021-06-15_demons-souls +

    Citation

    For attribution, please cite this work as:
    +McCarthy, M. (2021, June 15). Go forth, slayer of Demons. https://tidytales.ca/posts/2021-06-15_demons-souls
    ]]> .Wrangle .Visualize diff --git a/_site/posts/2021-06-15_demons-souls/index.html b/_site/posts/2021-06-15_demons-souls/index.html index 05b252b..aabbb00 100644 --- a/_site/posts/2021-06-15_demons-souls/index.html +++ b/_site/posts/2021-06-15_demons-souls/index.html @@ -2,7 +2,7 @@ - + @@ -20,9 +20,10 @@ ul.task-list{list-style: none;} ul.task-list li input[type="checkbox"] { width: 0.8em; - margin: 0 0.8em 0.2em -1.6em; + margin: 0 0.8em 0.2em -1em; /* quarto-specific, see https://github.com/quarto-dev/quarto-cli/issues/4556 */ vertical-align: middle; } +/* CSS for syntax highlighting */ pre > code.sourceCode { white-space: pre; position: relative; } pre > code.sourceCode > span { display: inline-block; line-height: 1.25; } pre > code.sourceCode > span:empty { height: 1.2em; } @@ -49,43 +50,13 @@ -khtml-user-select: none; -moz-user-select: none; -ms-user-select: none; user-select: none; padding: 0 4px; width: 4em; - color: #aaaaaa; } -pre.numberSource { margin-left: 3em; border-left: 1px solid #aaaaaa; padding-left: 4px; } +pre.numberSource { margin-left: 3em; padding-left: 4px; } div.sourceCode { } @media screen { pre > code.sourceCode > span > a:first-child::before { text-decoration: underline; } } -code span.al { color: #ff0000; font-weight: bold; } /* Alert */ -code span.an { color: #60a0b0; font-weight: bold; font-style: italic; } /* Annotation */ -code span.at { color: #7d9029; } /* Attribute */ -code span.bn { color: #40a070; } /* BaseN */ -code span.bu { color: #008000; } /* BuiltIn */ -code span.cf { color: #007020; font-weight: bold; } /* ControlFlow */ -code span.ch { color: #4070a0; } /* Char */ -code span.cn { color: #880000; } /* Constant */ -code span.co { color: #60a0b0; font-style: italic; } /* Comment */ -code span.cv { color: #60a0b0; font-weight: bold; font-style: italic; } /* CommentVar */ -code span.do { color: #ba2121; font-style: italic; } /* Documentation */ -code span.dt { color: #902000; } /* DataType */ -code span.dv { color: 
#40a070; } /* DecVal */ -code span.er { color: #ff0000; font-weight: bold; } /* Error */ -code span.ex { } /* Extension */ -code span.fl { color: #40a070; } /* Float */ -code span.fu { color: #06287e; } /* Function */ -code span.im { color: #008000; font-weight: bold; } /* Import */ -code span.in { color: #60a0b0; font-weight: bold; font-style: italic; } /* Information */ -code span.kw { color: #007020; font-weight: bold; } /* Keyword */ -code span.op { color: #666666; } /* Operator */ -code span.ot { color: #007020; } /* Other */ -code span.pp { color: #bc7a00; } /* Preprocessor */ -code span.sc { color: #4070a0; } /* SpecialChar */ -code span.ss { color: #bb6688; } /* SpecialString */ -code span.st { color: #4070a0; } /* String */ -code span.va { color: #19177c; } /* Variable */ -code span.vs { color: #4070a0; } /* VerbatimString */ -code span.wa { color: #60a0b0; font-weight: bold; font-style: italic; } /* Warning */ @@ -161,49 +132,51 @@ Tidy Tales
    +
    @@ -234,7 +207,7 @@
    Affiliation

    @@ -259,7 +232,7 @@

    -
    @@ -317,7 +291,7 @@

    Theming Inspiration

    -

    The Demon’s Souls logo.

    +
    The Demon’s Souls logo.

    @@ -779,7 +753,7 @@

    Final Graphic

    -

    +

    @@ -794,7 +768,7 @@

    M

    Thanks for reading! I’m Michael, the voice behind Tidy Tales. I am an award-winning data scientist and R programmer with the skills and experience to help you solve the problems you care about. You can learn more about me, my consulting services, and my other projects on my personal website.

    -

    Comments

    +

    Comments

    @@ -803,7 +777,7 @@

    M

    -

    Session Info

    +

    Session Info

    @@ -847,26 +821,18 @@

    M

    -

    Data

    +

    Data

    Download the data used in this post.

    -

    Fair Dealing

    +

    Fair Dealing

    Any of the trademarks, service marks, collective marks, design rights or similar rights that are mentioned, used, or cited in this article are the property of their respective owners. They are used here as fair dealing for the purpose of education in accordance with section 29 of the Copyright Act and do not infringe copyright.

    -

    Citation

    BibTeX citation:
    @online{mccarthy2021,
    -  author = {Michael McCarthy},
    -  title = {Go Forth, Slayer of {Demons}},
    -  date = {2021-06-15},
    -  url = {https://tidytales.ca/posts/2021-06-15_demons-souls},
    -  langid = {en}
    -}
    -
    For attribution, please cite this work as:
    -Michael McCarthy. (2021, June 15). Go forth, slayer of Demons. -https://tidytales.ca/posts/2021-06-15_demons-souls +

    Citation

    For attribution, please cite this work as:
    +McCarthy, M. (2021, June 15). Go forth, slayer of Demons. https://tidytales.ca/posts/2021-06-15_demons-souls
    @@ -196,49 +167,51 @@ Tidy Tales
    +
    @@ -270,7 +243,7 @@
    Affiliation

    @@ -295,7 +268,7 @@

    -
    @@ -736,9 +710,6 @@

    A simple trajectory

    anim <- p +
      transition_reveal(second)

    -
    -
    -

    @@ -826,9 +797,6 @@

    anim <- p +
       transition_reveal(second) +
       shadow_wake(wake_length = 0.1, size = 2, exclude_layer = c(1, 3))

    -
    -
    -

    @@ -838,7 +806,7 @@

    +

    @@ -853,7 +821,7 @@

    M

    Thanks for reading! I’m Michael, the voice behind Tidy Tales. I am an award-winning data scientist and R programmer with the skills and experience to help you solve the problems you care about. You can learn more about me, my consulting services, and my other projects on my personal website.

    -

    Comments

    +

    Comments

    @@ -862,7 +830,7 @@

    M

    -

    Session Info

    +

    Session Info

    @@ -924,14 +892,14 @@

    M -

    References

    -
    +

    References

    +
    Projectile motion. (n.d.). Retrieved June 24, 2021, from https://courses.lumenlearning.com/boundless-physics/chapter/projectile-motion/
    -
    +
    Projectile motion calculator. (n.d.). Retrieved June 24, 2021, from https://www.omnicalculator.com/physics/projectile-motion
    -
    +
    What is 2D projectile motion? (n.d.). Retrieved June 24, 2021, from https://www.khanacademy.org/science/physics/two-dimensional-motion/two-dimensional-projectile-mot/a/what-is-2d-projectile-motion

    Footnotes

    @@ -939,15 +907,8 @@

    M
    1. I didn’t look into it too deeply, but I’m guessing it’s related to this issue in ggtext. If you render to a gif instead you won’t have this issue and can use ggtext as normal.↩︎

    -

    Citation

    BibTeX citation:
    @online{mccarthy2022,
    -  author = {Michael McCarthy},
    -  title = {On Motion},
    -  date = {2022-06-16},
    -  url = {https://tidytales.ca/posts/2022-06-16_projectile-motion},
    -  langid = {en}
    -}
    -
    For attribution, please cite this work as:
    -Michael McCarthy. (2022, June 16). On motion. https://tidytales.ca/posts/2022-06-16_projectile-motion +

    Citation

    For attribution, please cite this work as:
    +McCarthy, M. (2022, June 16). On motion. https://tidytales.ca/posts/2022-06-16_projectile-motion

    diff --git a/_site/search.json b/_site/search.json index 2cd26b3..73cda34 100644 --- a/_site/search.json +++ b/_site/search.json @@ -6,6 +6,13 @@ "section": "", "text": "Data science is an exciting discipline that allows you to transform raw data into understanding, insight, and knowledge.\n\n— Hadley Wickham, Mine Çetinkaya-Rundel, and Garrett Grolemund in R for Data Science\nA data scientist is someone who creates understanding, insight, and knowledge from raw data with programming. Programming is an essential tool in nearly every part of a data science project because it allows you to do data science efficiently and reproducibly.\nThere are many different programming languages you can use to do data science, but here we cover my favourite programming language: R." }, + { + "objectID": "series/2023-01-24_reproducible-data-science/posts/learning-r.html#whats-data-science", + "href": "series/2023-01-24_reproducible-data-science/posts/learning-r.html#whats-data-science", + "title": "Learning R", + "section": "", + "text": "Data science is an exciting discipline that allows you to transform raw data into understanding, insight, and knowledge.\n\n— Hadley Wickham, Mine Çetinkaya-Rundel, and Garrett Grolemund in R for Data Science\nA data scientist is someone who creates understanding, insight, and knowledge from raw data with programming. Programming is an essential tool in nearly every part of a data science project because it allows you to do data science efficiently and reproducibly.\nThere are many different programming languages you can use to do data science, but here we cover my favourite programming language: R." + }, { "objectID": "series/2023-01-24_reproducible-data-science/posts/learning-r.html#whats-r", "href": "series/2023-01-24_reproducible-data-science/posts/learning-r.html#whats-r", @@ -62,6 +69,13 @@ "section": "Parting words", "text": "Parting words\n\nData science is not 100% about writing code. 
There’s a human side to it.\n\n— Hadley Wickham in Designing Data Science\nI discussed this earlier, but it bears repeating: the human side of data science is really important if you want to solve problems successfully. One of the reasons R is my favourite language is because it’s been designed to make statistical thinking and computing accessible to anyone. This accessibility has had a big impact on me—I doubt I would be doing data science without R—and I think it’s why we have such a strong, diverse community of R users and programmers.\nSo I try to make all my work as accessible as it can be, and I recommend you do too. It makes a difference." }, + { + "objectID": "series/2023-01-24_reproducible-data-science/posts/learning-r.html#footnotes", + "href": "series/2023-01-24_reproducible-data-science/posts/learning-r.html#footnotes", + "title": "Learning R", + "section": "Footnotes", + "text": "Footnotes\n\n\nMyself included. It’s hard for me to recommend how to learn research or statistics in the same way I’ve recommended how to learn R. Hands-On Programming with R and R for Data Science are excellent, beginner-friendly, books that will get you started using the tools of the trade in the way they were intended to be used. But a lot of the excellent statistics books I’ve read are not beginner-friendly (even if they claim to be) and assume you have prior training in statistics. On the other hand, beginner-friendly books can encourage statistical rituals over statistical thinking, which you then have to unlearn in the future as your knowledge and skills develop.↩︎\nRegression and Other Stories is an updated and expanded second edition of the regressions parts of Data Analysis Using Regression and Multilevel/Hierarchical Models. 
The authors are also working on an updated and expanded second edition of the multilevel modelling parts of Data Analysis Using Regression and Multilevel/Hierarchical Models, but it isn’t out yet.↩︎\nMost of these books have also had their examples translated to use different R packages than the authors used. For example, Andrew Heiss has translated Bayes Rules! and Statistical Rethinking into the tidyverse, brms, and marginaleffects packages; Emil Hvitfeldt has translated An introduction to statistical learning into the tidymodels set of packages; and A. Solomon Kurz has translated Regression and Other Stories into the tidyverse and brms packages.↩︎\nThis should go without saying, but the old “garbage-in garbage-out” adage still applies to reproducible data products. If your data has problems, your code has bugs, your visualizations are misleading, your models are inappropriate, or your communications are unclear, then your data product will be reproducible but not very useful (or maybe even harmful). Quality assurance has to happen at every step, and reproducibility is the last step. It’s supposed to be the little bow on top that ties all the other great work you’ve done together.↩︎" + }, { "objectID": "series/2023-01-24_reproducible-data-science/posts/technical-writing.html", "href": "series/2023-01-24_reproducible-data-science/posts/technical-writing.html", @@ -69,12 +83,19 @@ "section": "", "text": "Literate programming is the practice of mixing text and executable code in the same document, making it possible to write reproducible documents. There are four engines for executing code in Quarto documents:\n\nknitr (supports R, Python, Julia, and more)\njupyter (Python only)\nIJulia (Julia only)\nObservable JS (Observable JS only)\n\nAll engines support executing code in code blocks within a document, which can be used to execute code and include its output in a document. 
Code blocks can produce a wide variety of outputs, including plots, tabular output from data frames, and plain text. The behaviour of code execution and output can be set with Quarto’s Execution Options.\nknitr, jupyter, and Observable JS also support executing inline code within markdown text, which can be used to allow narrative to automatically use the most up to date computations. The syntax for inline code varies across the engines.\nThe remainder of this post covers some useful literate programming practices when writing reproducible documents with the knitr engine.\n\n\n\n\n\n\nProject management\n\n\n\nLiterate programming is a powerful tool for writing reproducible documents, but it can also become unwieldy if your computations require a lot of code. Rather than writing all the code within a document, it is often better to source the required code within a document, then include output using the objects created by the sourced code. The simplest way to adopt this approach is by using the source() function to source R script files containing the computations for a document; a better way is to use the literate programming approach described in the targets R package." }, + { + "objectID": "series/2023-01-24_reproducible-data-science/posts/technical-writing.html#literate-programming", + "href": "series/2023-01-24_reproducible-data-science/posts/technical-writing.html#literate-programming", + "title": "Technical Writing", + "section": "", + "text": "Literate programming is the practice of mixing text and executable code in the same document, making it possible to write reproducible documents. There are four engines for executing code in Quarto documents:\n\nknitr (supports R, Python, Julia, and more)\njupyter (Python only)\nIJulia (Julia only)\nObservable JS (Observable JS only)\n\nAll engines support executing code in code blocks within a document, which can be used to execute code and include its output in a document. 
Code blocks can produce a wide variety of outputs, including plots, tabular output from data frames, and plain text. The behaviour of code execution and output can be set with Quarto’s Execution Options.\nknitr, jupyter, and Observable JS also support executing inline code within markdown text, which can be used to allow narrative to automatically use the most up to date computations. The syntax for inline code varies across the engines.\nThe remainder of this post covers some useful literate programming practices when writing reproducible documents with the knitr engine.\n\n\n\n\n\n\nProject management\n\n\n\nLiterate programming is a powerful tool for writing reproducible documents, but it can also become unwieldy if your computations require a lot of code. Rather than writing all the code within a document, it is often better to source the required code within a document, then include output using the objects created by the sourced code. The simplest way to adopt this approach is by using the source() function to source R script files containing the computations for a document; a better way is to use the literate programming approach described in the targets R package." + }, { "objectID": "series/2023-01-24_reproducible-data-science/posts/technical-writing.html#plain-text-outputs", "href": "series/2023-01-24_reproducible-data-science/posts/technical-writing.html#plain-text-outputs", "title": "Technical Writing", "section": "Plain text outputs", - "text": "Plain text outputs\nMost code you write will be output as plain text. Depending on the purpose of a document, it might be fine to leave that output as is, or you might want to format the text, include it inline, and so forth.\nFor example, say you wanted to report the number of participants in a study you ran. 
Rather than writing a sentence like this:\n\nThere were twelve participants in the study.\n\nYou could get the number of participants computationally, format that number into the word “twelve”, and include it in the sentence using inline code:\n\nn_participants <- xfun::numbers_to_words(12)\n\n\nThere were `r n_participants` participants in the study.\n\nThis sentence would become “There were twelve participants in the study.” when you render the document. Likewise, if the number of participants changed, the number reported in the sentence would change the next time the document rendered (as long as the R object storing the number of participants knew about the change).\nIf there are many values you need to report inline, storing them in a list is a good practice. For an overview of this approach, see:\n\nLists are my secret weapon for reporting stats with knitr by Tristan Mahr" + "text": "Plain text outputs\nMost code you write will be output as plain text. Depending on the purpose of a document, it might be fine to leave that output as is, or you might want to format the text, include it inline, and so forth.\nFor example, say you wanted to report the number of participants in a study you ran. Rather than writing a sentence like this:\n\nThere were twelve participants in the study.\n\nYou could get the number of participants computationally, format that number into the word “twelve”, and include it in the sentence using inline code:\n\nn_participants <- xfun::numbers_to_words(12)\n\n\nThere were `r n_participants` participants in the study.\n\nThis sentence would become “There were twelve participants in the study.” when you render the document. Likewise, if the number of participants changed, the number reported in the sentence would change the next time the document rendered (as long as the R object storing the number of participants knew about the change).\nIf there are many values you need to report inline, storing them in a list is a good practice. 
For an overview of this approach, see:\n\nLists are my secret weapon for reporting stats with knitr by Tristan Mahr" }, { "objectID": "series/2023-01-24_reproducible-data-science/posts/technical-writing.html#figure-and-table-outputs", @@ -88,7 +109,14 @@ "href": "series/2023-01-24_reproducible-data-science/posts/technical-writing.html#equations", "title": "Technical Writing", "section": "Equations", - "text": "Equations\nQuarto features extensive support for writing beautiful math equations with LaTeX math expressions authored using standard Pandoc markdown syntax: Use $ delimiters for inline math and $$ delimiters for display math. Provide an #eq- label immediately after a display math equation to make it referenceable.\n\n\n\n\n\n\n\nMarkdown Syntax\nOutput\n\n\n\n\ninline math: $E = mc^{2}$\ninline math: E=mc^{2}\n\n\ndisplay math:\n$$E = mc^{2}$$\ndisplay math:\nE = mc^{2}\n\n\ndisplay math with label:\n$$E = mc^{2}$${#eq-mc-squared}\ndisplay math with label:\nE = mc^{2} \\tag{1}\n\n\n\nMath expressions can be rendered in any of Quarto’s output formats, but different rendering methods are used depending on the format:\n\nhtml offers several math rendering methods\npdf uses LaTeX (including raw LaTeX)\ndocx uses Microsoft Word’s equation environment\n\nTo learn more about writing LaTeX math expressions, see:\n\nMathJax basic tutorial and quick reference1\nCheatsheet for LaTeX Math Commands\n\n\n\n\n\n\n\nInspecting equations rendered by MathJax\n\n\n\nEquations rendered by MathJax can be inspected by right clicking them. This is useful if you want to view or copy the underlying TeX code from an equation Show Math As > TeX Commands or Copy to Clipboard > TeX Commands. 
Note that Tidy Tales uses the KaTeX renderer, so it won’t work on this site.\n\n\n\nUsing inline R code in math equations\nInline R code `r ` can be used within inline or display math to include code output in math equations.\n\nmtcars_fit <- lm(mpg ~ am, data = mtcars)\ncoef_int <- coef(mtcars_fit)[\"(Intercept)\"]\ncoef_am <- coef(mtcars_fit)[\"am\"]\n\nInline math: $\\mathrm{\\widehat{mpg}} = `r coef_int` + `r coef_am`(\\mathrm{am})$\nInline math: \\mathrm{\\widehat{mpg}} = 17.1473684 + 7.2449393(\\mathrm{am})\nThe same approach also works for display math.\n\n\n\n\n\n\nWriting model equations\n\n\n\nThe equatiomatic package can be used to write equations from a fitted model. Learn more on the package website.\n\nlibrary(equatiomatic)\n\nBy default the model equation uses math symbols.\n\nextract_eq(mtcars_fit)\n\n\n\n#> $$\n#> \\operatorname{mpg} = \\alpha + \\beta_{1}(\\operatorname{am}) + \\epsilon\n#> $$\n\n\nBut model estimates can be used too.\n\nextract_eq(mtcars_fit, use_coefs = TRUE)\n\n\n\n#> $$\n#> \\operatorname{\\widehat{mpg}} = 17.15 + 7.24(\\operatorname{am})\n#> $$\n\n\n\n\n\n\nUsing math equations in plots\nBoth base R and ggplot2 plots feature support for writing beautiful math equations with plotmath expressions. Equations can be used in plot labels, legends, and text.\nIf you would rather write equations using LaTeX math expressions, the latex2exp package can be used to parse and convert LaTeX to plotmath expressions.2 Learn more on the package website.\nggplot2 also includes some convenience functions for using plotmath in plot labels:\n\nlabel_parsed() interprets labels as plotmath expressions\nlabel_bquote() offers a flexible way of labelling facet rows or columns with plotmath expressions" + "text": "Equations\nQuarto features extensive support for writing beautiful math equations with LaTeX math expressions authored using standard Pandoc markdown syntax: Use $ delimiters for inline math and $$ delimiters for display math. 
Provide an #eq- label immediately after a display math equation to make it referenceable.\n\n\n\n\n\n\n\nMarkdown Syntax\nOutput\n\n\n\n\ninline math: $E = mc^{2}$\ninline math: E=mc^{2}\n\n\ndisplay math:\n$$E = mc^{2}$$\ndisplay math:\nE = mc^{2}\n\n\ndisplay math with label:\n$$E = mc^{2}$${#eq-mc-squared}\ndisplay math with label:\nE = mc^{2} \\tag{1}\n\n\n\nMath expressions can be rendered in any of Quarto’s output formats, but different rendering methods are used depending on the format:\n\nhtml offers several math rendering methods\npdf uses LaTeX (including raw LaTeX)\ndocx uses Microsoft Word’s equation environment\n\nTo learn more about writing LaTeX math expressions, see:\n\nMathJax basic tutorial and quick reference1\nCheatsheet for LaTeX Math Commands\n\n\n\n\n\n\n\nInspecting equations rendered by MathJax\n\n\n\nEquations rendered by MathJax can be inspected by right-clicking them. This is useful if you want to view or copy the underlying TeX code from an equation via Show Math As > TeX Commands or Copy to Clipboard > TeX Commands. Note that Tidy Tales uses the KaTeX renderer, so it won’t work on this site.\n\n\n\nUsing inline R code in math equations\nInline R code `r ` can be used within inline or display math to include code output in math equations.\n\nmtcars_fit <- lm(mpg ~ am, data = mtcars)\ncoef_int <- coef(mtcars_fit)[\"(Intercept)\"]\ncoef_am <- coef(mtcars_fit)[\"am\"]\n\nInline math: $\\mathrm{\\widehat{mpg}} = `r coef_int` + `r coef_am`(\\mathrm{am})$\nInline math: \\mathrm{\\widehat{mpg}} = 17.1473684 + 7.2449393(\\mathrm{am})\nThe same approach also works for display math.\n\n\n\n\n\n\nWriting model equations\n\n\n\nThe equatiomatic package can be used to write equations from a fitted model. 
Learn more on the package website.\n\nlibrary(equatiomatic)\n\nBy default the model equation uses math symbols.\n\nextract_eq(mtcars_fit)\n\n\n\n#> $$\n#> \\operatorname{mpg} = \\alpha + \\beta_{1}(\\operatorname{am}) + \\epsilon\n#> $$\n\n\nBut model estimates can be used too.\n\nextract_eq(mtcars_fit, use_coefs = TRUE)\n\n\n\n#> $$\n#> \\operatorname{\\widehat{mpg}} = 17.15 + 7.24(\\operatorname{am})\n#> $$\n\n\n\n\n\n\nUsing math equations in plots\nBoth base R and ggplot2 plots feature support for writing beautiful math equations with plotmath expressions. Equations can be used in plot labels, legends, and text.\nIf you would rather write equations using LaTeX math expressions, the latex2exp package can be used to parse and convert LaTeX to plotmath expressions.2 Learn more on the package website.\nggplot2 also includes some convenience functions for using plotmath in plot labels:\n\nlabel_parsed() interprets labels as plotmath expressions\nlabel_bquote() offers a flexible way of labelling facet rows or columns with plotmath expressions" + }, + { + "objectID": "series/2023-01-24_reproducible-data-science/posts/technical-writing.html#footnotes", + "href": "series/2023-01-24_reproducible-data-science/posts/technical-writing.html#footnotes", + "title": "Technical Writing", + "section": "Footnotes", + "text": "Footnotes\n\n\nMathJax is an open-source JavaScript rendering engine for LaTeX math expressions. It is the default rendering method in Quarto HTML documents.↩︎\nThere is also an open issue to support LaTeX math expressions using markdown in the ggtext package.↩︎" }, { "objectID": "series/2023-01-24_reproducible-data-science/posts/productivity-tips.html", @@ -97,6 +125,13 @@ "section": "", "text": "RStudio includes a variety of features to make you more productive working with R and RStudio. These features are covered in detail in the RStudio User Manual. Here I highlight some of the features I think are particularly helpful for working productively." 
}, + { + "objectID": "series/2023-01-24_reproducible-data-science/posts/productivity-tips.html#overview", + "href": "series/2023-01-24_reproducible-data-science/posts/productivity-tips.html#overview", + "title": "Productivity Tips", + "section": "", + "text": "RStudio includes a variety of features to make you more productive working with R and RStudio. These features are covered in detail in the RStudio User Manual. Here I highlight some of the features I think are particularly helpful for working productively." + }, { "objectID": "series/2023-01-24_reproducible-data-science/posts/productivity-tips.html#customize-rstudio", "href": "series/2023-01-24_reproducible-data-science/posts/productivity-tips.html#customize-rstudio", @@ -146,12 +181,19 @@ "section": "", "text": "The following will install, build, and configure the tools used in the remaining posts in this series (and some other things). This is my personal setup for macOS and is intended to get a new computer up and running with the tools and preferences I like to use. A lot of this is general to any data science setup on macOS, but some of it is specific to me.\nYou are free to use or modify this for your own setup, but should do so thoughtfully. Don’t run any of the scripts below without understanding what they do, and backup your system if you are trying this on an existing setup. To test drive my system setup safely, consider using a virtualized instance of macOS with UTM.\n\n\n\n\n\n\nBefore you begin\n\n\n\nMy system setup is done mainly done through the shell prompt in Terminal, which is the easiest recommended way to get a shell prompt on macOS. You can find Terminal in the Utilities directory inside your Applications directory.\nIf you have never used Terminal before, Apple has a nice shell scripting primer and command line primer book in their archives.\n\n\n\n\nXcode command line tools are needed to build certain packages from source. 
The command line tools provide a lighter alternative to installing the full Xcode release from the App Store, which contains a lot that isn’t needed for data science use cases.\nInstall Xcode command line tools from the Terminal with:\nxcode-select --install\n\n\n\nHomebrew is an open source package manager for macOS that makes installing, removing, and managing software and dependencies simple.\nInstall homebrew from the Terminal with:\n/bin/bash -c \"$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)\"\n\n\n\nchezmoi is an open source dotfile manager. Dotfiles are hidden files and hidden directories that store preferences and settings for packages and applications. Dotfiles can be made visible in Finder with the Command-Shift-. keyboard shortcut; learn more about dotfiles here and here.\nInstall chezmoi from the Terminal with Homebrew:\nbrew install chezmoi\nchezmoi relies on dotfiles being hosted in a public or private Git repository (e.g., GitHub) to share changes across multiple computers. For example, my private dotfiles repository is hosted at: https://github.com/mccarthy-m-g/dotfiles, where I’ve added my dotfiles and other setup scripts. For more information, see:\n\nchezmoi’s Quick Start guide\nAutomating the Setup of a New Mac by Moncef Belyamani" }, + { + "objectID": "series/2023-01-24_reproducible-data-science/posts/system-setup.html#prerequisites", + "href": "series/2023-01-24_reproducible-data-science/posts/system-setup.html#prerequisites", + "title": "System Setup", + "section": "", + "text": "The following will install, build, and configure the tools used in the remaining posts in this series (and some other things). This is my personal setup for macOS and is intended to get a new computer up and running with the tools and preferences I like to use. 
A lot of this is general to any data science setup on macOS, but some of it is specific to me.\nYou are free to use or modify this for your own setup, but should do so thoughtfully. Don’t run any of the scripts below without understanding what they do, and back up your system if you are trying this on an existing setup. To test drive my system setup safely, consider using a virtualized instance of macOS with UTM.\n\n\n\n\n\n\nBefore you begin\n\n\n\nMy system setup is done mainly through the shell prompt in Terminal, which is the easiest recommended way to get a shell prompt on macOS. You can find Terminal in the Utilities directory inside your Applications directory.\nIf you have never used Terminal before, Apple has a nice shell scripting primer and command line primer book in their archives.\n\n\n\n\nXcode command line tools are needed to build certain packages from source. The command line tools provide a lighter alternative to installing the full Xcode release from the App Store, which contains a lot that isn’t needed for data science use cases.\nInstall Xcode command line tools from the Terminal with:\nxcode-select --install\n\n\n\nHomebrew is an open source package manager for macOS that makes installing, removing, and managing software and dependencies simple.\nInstall homebrew from the Terminal with:\n/bin/bash -c \"$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)\"\n\n\n\nchezmoi is an open source dotfile manager. Dotfiles are hidden files and hidden directories that store preferences and settings for packages and applications. Dotfiles can be made visible in Finder with the Command-Shift-. keyboard shortcut; learn more about dotfiles here and here.\nInstall chezmoi from the Terminal with Homebrew:\nbrew install chezmoi\nchezmoi relies on dotfiles being hosted in a public or private Git repository (e.g., GitHub) to share changes across multiple computers. 
For example, my private dotfiles repository is hosted at: https://github.com/mccarthy-m-g/dotfiles, where I’ve added my dotfiles and other setup scripts. For more information, see:\n\nchezmoi’s Quick Start guide\nAutomating the Setup of a New Mac by Moncef Belyamani" + }, { "objectID": "series/2023-01-24_reproducible-data-science/posts/system-setup.html#my-setup", "href": "series/2023-01-24_reproducible-data-science/posts/system-setup.html#my-setup", "title": "System Setup", "section": "My setup", - "text": "My setup\nWith the prerequisites installed, it is now possible to install, build, and configure the remaining tools for my setup. There are many ways to do this, but I use chezmoi’s script support so it’s easy to setup my dotfiles and install software with Homebrew on a new computer using the same command:\nchezmoi init --apply $GITHUB_USERNAME\nWhat this does:\n\nClones the dotfiles repository\nApplies the changes and runs any scripts\n\nBelow I cover how to integrate this approach with Homebrew.\n\nInstalling software with Homebrew\nHomebrew can be used to install almost any software or dependency, whether it’s open source or closed source. 
Homebrew’s installation commands include:\n\nbrew: Install software packages (also known as formulae)\ncask: Install application casks (RStudio, etc.)\ntap: Add package repositories (from GitHub, etc.)\nmas: Install Mac App Store applications (requires brew install mas)\nwhalebrew: Install Docker images as if they were Homebrew formulae (requires brew install whalebrew)\n\nSoftware and dependencies can be installed interactively from the command line, but a better approach for system setup is to use Homebrew Bundle, which allows you to install a list of dependencies located in a Brewfile using any of Homebrew’s installation commands.\nThe Brewfile to install all the dependencies covered in this series looks something like:\n\n\nBrewfile\n\n# Version control setup\nbrew \"git\"\ncask \"github\"\nbrew \"gh\"\ntap \"microsoft/git\"\ncask \"git-credential-manager-core\"\n\n# R setup\ntap \"r-lib/rig\"\ncask \"rig\"\ncask \"rstudio\"\ncask \"quarto\"\n\n# Python setup\nbrew \"pyenv\"\nbrew \"openssl\"\nbrew \"readline\"\nbrew \"sqlite3\"\nbrew \"xz\"\nbrew \"zlib\"\nbrew \"tcl-tk\"\nbrew \"pipenv\"\n\n# Reference manager setup\ncask \"zotero\"\n\nAfter making a Brewfile, install its dependency list with:\nbrew bundle\nUse the optional --file argument to specify the path to the Brewfile.\n\n\n\n\n\n\nHomebrew Bundle tricks\n\n\n\n\n\nCreate a Brewfile of your current Homebrew installations in the current working directory with:\nbrew bundle dump\nUninstall all software and dependencies not listed in a Brewfile with:\nbrew bundle cleanup\n\n\n\nTo integrate Homebrew Bundle with chezmoi, create a run_once_ script containing the Brewfile as a Here document in your dotfiles repository:\n\n\nrun_once_before_install-packages-darwin.sh.tmpl\n\n{{- if eq .chezmoi.os \"darwin\" -}}\n#!/bin/bash\n\nbrew bundle --no-lock --file=/dev/stdin << Brewfile\n# Version control setup\nbrew \"git\"\ncask \"github\"\nbrew \"gh\"\n\n# R setup\ntap \"r-lib/rig\"\ncask \"rig\"\ncask 
\"rstudio\"\ncask \"quarto\"\n\n# Python setup\nbrew \"pyenv\"\nbrew \"openssl\"\nbrew \"readline\"\nbrew \"sqlite3\"\nbrew \"xz\"\nbrew \"zlib\"\nbrew \"tcl-tk\"\nbrew \"pipenv\"\n\n# Reference manager setup\ncask \"zotero\"\nBrewfile\n{{ end -}}\n\nThis script will be run the first time you initialize chezmoi (as shown earlier):\nchezmoi init --apply $GITHUB_USERNAME" + "text": "My setup\nWith the prerequisites installed, it is now possible to install, build, and configure the remaining tools for my setup. There are many ways to do this, but I use chezmoi’s script support so it’s easy to set up my dotfiles and install software with Homebrew on a new computer using the same command:\nchezmoi init --apply $GITHUB_USERNAME\nWhat this does:\n\nClones the dotfiles repository\nApplies the changes and runs any scripts\n\nBelow I cover how to integrate this approach with Homebrew.\n\nInstalling software with Homebrew\nHomebrew can be used to install almost any software or dependency, whether it’s open source or closed source. 
Homebrew’s installation commands include:\n\nbrew: Install software packages (also known as formulae)\ncask: Install application casks (RStudio, etc.)\ntap: Add package repositories (from GitHub, etc.)\nmas: Install Mac App Store applications (requires brew install mas)\nwhalebrew: Install Docker images as if they were Homebrew formulae (requires brew install whalebrew)\n\nSoftware and dependencies can be installed interactively from the command line, but a better approach for system setup is to use Homebrew Bundle, which allows you to install a list of dependencies located in a Brewfile using any of Homebrew’s installation commands.\nThe Brewfile to install all the dependencies covered in this series looks something like:\n\n\nBrewfile\n\n# Version control setup\nbrew \"git\"\ncask \"github\"\nbrew \"gh\"\ntap \"microsoft/git\"\ncask \"git-credential-manager-core\"\n\n# R setup\ntap \"r-lib/rig\"\ncask \"rig\"\ncask \"rstudio\"\ncask \"quarto\"\n\n# Python setup\nbrew \"pyenv\"\nbrew \"openssl\"\nbrew \"readline\"\nbrew \"sqlite3\"\nbrew \"xz\"\nbrew \"zlib\"\nbrew \"tcl-tk\"\nbrew \"pipenv\"\n\n# Reference manager setup\ncask \"zotero\"\n\nAfter making a Brewfile, install its dependency list with:\nbrew bundle\nUse the optional --file argument to specify the path to the Brewfile.\n\n\n\n\n\n\nHomebrew Bundle tricks\n\n\n\n\n\nCreate a Brewfile of your current Homebrew installations in the current working directory with:\nbrew bundle dump\nUninstall all software and dependencies not listed in a Brewfile with:\nbrew bundle cleanup\n\n\n\nTo integrate Homebrew Bundle with chezmoi, create a run_once_ script containing the Brewfile as a Here document in your dotfiles repository:\n\n\nrun_once_before_install-packages-darwin.sh.tmpl\n\n{{- if eq .chezmoi.os \"darwin\" -}}\n#!/bin/bash\n\nbrew bundle --no-lock --file=/dev/stdin << Brewfile\n# Version control setup\nbrew \"git\"\ncask \"github\"\nbrew \"gh\"\n\n# R setup\ntap \"r-lib/rig\"\ncask \"rig\"\ncask 
\"rstudio\"\ncask \"quarto\"\n\n# Python setup\nbrew \"pyenv\"\nbrew \"openssl\"\nbrew \"readline\"\nbrew \"sqlite3\"\nbrew \"xz\"\nbrew \"zlib\"\nbrew \"tcl-tk\"\nbrew \"pipenv\"\n\n# Reference manager setup\ncask \"zotero\"\nBrewfile\n{{ end -}}\n\nThis script will be run the first time you initialize chezmoi (as shown earlier):\nchezmoi init --apply $GITHUB_USERNAME" }, { "objectID": "series/2023-01-24_reproducible-data-science/posts/r-getting-help.html", @@ -160,6 +202,13 @@ "section": "", "text": "If you need help solving a problem, the first step is to create a reproducible example. The goal of a reproducible example is to make it easier for others to help you, or for you to help yourself, by packaging your problematic code in such a way that anyone can exactly reproduce your problem. A good reproducible example is:\n\nSelf-contained: Your code includes everything needed to reproduce your problem\nMinimal: Your example only includes code directly related to your problem\n\nThe work you put into writing a good reproducible example usually pays for itself:\n\nOften you will solve your own problem in the process of writing a reproducible example\nYou improve your chances of getting help when your problem is easy to understand and reproduce\n\nFor tips on writing good reproducible examples, see:\n\nWhat’s a reproducible example and how do I create one?\nHow to make a great R reproducible example\nReprex do’s and don’ts\ntidyverse: Get help!\n\n\n\n\n\n\n\nCreate an R project for writing reproducible examples\n\n\n\nBecause reproducible examples are self-contained and minimal, it’s good practice to create them separately from the project that inspired your problem. A nice way to do this is to create a new reprex R project that contains empty files where you can write reproducible examples like:\nreprex/\n├─reprex.Rproj\n├─reprex.R\n├─reprex.Rmd\n├─reprex.qmd\nThe contents of these files are meant to be ephemeral. 
After you write a reproducible example and share it, it’s safe to delete the contents of the file or overwrite it in the future. If you do want to save the reproducible example somewhere, consider turning it into a gist on GitHub or a question on Stack Overflow, Cross Validated, or the Posit forum.\n\n\n\n\nUse the reprex R package to test, render, and copy your reproducible examples to your clipboard. Then share them when asking for help on:\n\nStack Overflow\nCross Validated\nThe Posit forum\nAnd more!" }, + { + "objectID": "series/2023-01-24_reproducible-data-science/posts/r-getting-help.html#writing-reproducible-examples", + "href": "series/2023-01-24_reproducible-data-science/posts/r-getting-help.html#writing-reproducible-examples", + "title": "Getting Help with R", + "section": "", + "text": "If you need help solving a problem, the first step is to create a reproducible example. The goal of a reproducible example is to make it easier for others to help you, or for you to help yourself, by packaging your problematic code in such a way that anyone can exactly reproduce your problem. A good reproducible example is:\n\nSelf-contained: Your code includes everything needed to reproduce your problem\nMinimal: Your example only includes code directly related to your problem\n\nThe work you put into writing a good reproducible example usually pays for itself:\n\nOften you will solve your own problem in the process of writing a reproducible example\nYou improve your chances of getting help when your problem is easy to understand and reproduce\n\nFor tips on writing good reproducible examples, see:\n\nWhat’s a reproducible example and how do I create one?\nHow to make a great R reproducible example\nReprex do’s and don’ts\ntidyverse: Get help!\n\n\n\n\n\n\n\nCreate an R project for writing reproducible examples\n\n\n\nBecause reproducible examples are self-contained and minimal, it’s good practice to create them separately from the project that inspired your problem. 
A nice way to do this is to create a new reprex R project that contains empty files where you can write reproducible examples like:\nreprex/\n├─reprex.Rproj\n├─reprex.R\n├─reprex.Rmd\n├─reprex.qmd\nThe contents of these files are meant to be ephemeral. After you write a reproducible example and share it, it’s safe to delete the contents of the file or overwrite it in the future. If you do want to save the reproducible example somewhere, consider turning it into a gist on GitHub or a question on Stack Overflow, Cross Validated, or the Posit forum.\n\n\n\n\nUse the reprex R package to test, render, and copy your reproducible examples to your clipboard. Then share them when asking for help on:\n\nStack Overflow\nCross Validated\nThe Posit forum\nAnd more!" + }, { "objectID": "series/2023-01-24_reproducible-data-science/posts/r-getting-help.html#debugging-interactively", "href": "series/2023-01-24_reproducible-data-science/posts/r-getting-help.html#debugging-interactively", @@ -188,12 +237,19 @@ "section": "", "text": "Zotero is an open-source reference manager that styles itself as “your personal research assistant.”\nIf you are using another reference manager you should switch to Zotero, full stop. It’s easy to migrate your data from other reference management tools to Zotero, and will improve your workflow.\n\n\n\n\n\n\nUse Zotero’s built-in PDF reader\n\n\n\nZotero has its own built-in PDF reader and it’s really good. Much better than any external PDF readers. Learn more about it in the Zotero documentation.\n\n\n\n\nInstall Zotero from the Terminal with Homebrew:\nbrew install --cask zotero\n\n\n\nZotero Connector is a browser plugin that allows you to save web content (blogs, articles, etc.) to Zotero with a single click.\nThe Zotero Connector for Safari is bundled with Zotero. 
You can enable it from the Extensions pane of the Safari preferences.\n\n\n\n\n\n\nUse custom PDF resolvers\n\n\n\nZotero also supports custom PDF resolvers for automatically retrieving PDFs from other sources when Zotero can’t find or access a PDF. You can find some useful examples in Brenton Wiernik’s zotero-tools repository, or by searching something like “zotero pdf resolver” in your favourite search engine.\n\n\n\n\n\nThese are the core extensions I use with Zotero. See their documentation for installation instructions:\n\nBetter BibTeX\nZotFile\n\nI also use:\n\nscite\nPubPeer\n\nYou can find more extensions on the Zotero website.\n\n\n\nAs far as I’m aware, it isn’t possible to configure preferences in Zotero with dotfiles, so this has to be done manually.\nThe main preference I need to configure is the citation key formula for Better BibTeX:\nauthEtAl.lower+\"_\"+shorttitle(3,3)+\"_\"+year\nWhich means:\n\nLast name of the authors (with et al. for references with three or more authors)\nAn underscore\nFirst three words of the reference’s title in camel case\nAnother underscore\nYear of publication (if any)\n\nI use this over a authEtAl.lower+\"_\"+year citation key formula because the titles are a helpful reminder of what I’m citing. However, this citation key formula can make the markdown of a document hard to read, particularly when there are multiple citations in the same sentence or paragraph.\n\n\n\nZotero for iOS is an open source application for working with your Zotero library on an iPad or iPhone. Install it from the App Store." 
},
+ {
+ "objectID": "series/2023-01-24_reproducible-data-science/posts/zotero-setup.html#zotero",
+ "href": "series/2023-01-24_reproducible-data-science/posts/zotero-setup.html#zotero",
+ "title": "Reference Manager Setup",
+ "section": "",
+ "text": "Zotero is an open-source reference manager that styles itself as “your personal research assistant.”\nIf you are using another reference manager you should switch to Zotero, full stop. It’s easy to migrate your data from other reference management tools to Zotero, and will improve your workflow.\n\n\n\n\n\n\nUse Zotero’s built-in PDF reader\n\n\n\nZotero has its own built-in PDF reader and it’s really good. Much better than any external PDF reader. Learn more about it in the Zotero documentation.\n\n\n\n\nInstall Zotero from the Terminal with Homebrew:\nbrew install --cask zotero\n\n\n\nZotero Connector is a browser plugin that allows you to save web content (blogs, articles, etc.) to Zotero with a single click.\nThe Zotero Connector for Safari is bundled with Zotero. You can enable it from the Extensions pane of the Safari preferences.\n\n\n\n\n\n\nUse custom PDF resolvers\n\n\n\nZotero also supports custom PDF resolvers for automatically retrieving PDFs from other sources when Zotero can’t find or access a PDF. You can find some useful examples in Brenton Wiernik’s zotero-tools repository, or by searching something like “zotero pdf resolver” in your favourite search engine.\n\n\n\n\n\nThese are the core extensions I use with Zotero. See their documentation for installation instructions:\n\nBetter BibTeX\nZotFile\n\nI also use:\n\nscite\nPubPeer\n\nYou can find more extensions on the Zotero website.\n\n\n\nAs far as I’m aware, it isn’t possible to configure preferences in Zotero with dotfiles, so this has to be done manually.\nThe main preference I need to configure is the citation key formula for Better BibTeX:\nauthEtAl.lower+\"_\"+shorttitle(3,3)+\"_\"+year\nWhich means:\n\nLast name of the authors (with et al. 
for references with three or more authors)\nAn underscore\nFirst three words of the reference’s title in camel case\nAnother underscore\nYear of publication (if any)\n\nI use this over an authEtAl.lower+\"_\"+year citation key formula because the titles are a helpful reminder of what I’m citing. However, this citation key formula can make the markdown of a document hard to read, particularly when there are multiple citations in the same sentence or paragraph.\n\n\n\nZotero for iOS is an open source application for working with your Zotero library on an iPad or iPhone. Install it from the App Store."
},
{
"objectID": "series/2023-01-24_reproducible-data-science/posts/zotero-setup.html#configuring-zotero",
"href": "series/2023-01-24_reproducible-data-science/posts/zotero-setup.html#configuring-zotero",
"title": "Reference Manager Setup",
"section": "Configuring Zotero",
- "text": "Configuring Zotero\nThings to do after installing Zotero and extensions.\n\nCloud syncing\nUse Zotero’s sync functionality to sync your Zotero library across devices. Zotero syncing has two parts:\n\nData syncing\nFile syncing\n\nData syncing syncs library items, but doesn’t sync attached files (PDFs, images, etc.). Set up data syncing in the Zotero app by signing into your account.\nFile syncing syncs attached files of library items. Files can be synced using either:\n\nWebDAV\nZotero Storage\n\nWebDAV is a standard protocol for transferring files over the web and can be used to sync files in your personal library. There are a number of WebDAV storage providers known to work with Zotero.\nZotero Storage is the file sync option recommended by Zotero. Zotero Storage includes all the features of WebDAV syncing, plus file syncing in group libraries, web-based access to file attachments, easier setup, and guaranteed compatibility.\n\n\n\n\n\n\nWebDAV or Zotero Storage?\n\n\n\n\n\nI currently use WebDAV syncing through 4shared’s free plan, which gives me 15 GB of free space. 
I chose this option because I don’t need any of the additional features of Zotero Storage, and 15 GBs is more than enough for my personal library.\n\n\n\n\n\nTurn off automatic tagging\nGo to General > Miscellaneous and turn off “Automatically tag items with keywords and subject headings”. I personally found this feature more annoying than helpful, since different sources would use different tags for the same thing, resulting in messy metadata. Instead I manually add tags to items in my library using the following scheme:\n\nSubject: name (focus area or field)\nTopic: name (topic keywords)\nData: source (open data set name)\nPopulation: characteristic (age group, sex, gender, species, location, etc.)\nVariable: name (variable measured in the study)\nMethod: name (experimental, observational, literature review, etc.)\nAnalysis: method (statistical method used for analysis)\nSource: name (for where or how I discovered the item)\nStatus: reading status (unread, read partially, or read)\n\nI like this approach because it makes it easy to drill down a library collection to the items I need or to rediscover an item I forget the name of. It also reduces some of the cognitive overhead of working with tags by taking advantage of Zotero’s autocomplete when adding tags manually, and alphabetic tag sorting." + "text": "Configuring Zotero\nThings to do after installing Zotero and extensions.\n\nCloud syncing\nUse Zotero’s sync functionality to sync your Zotero library across devices. Zotero syncing has two parts:\n\nData syncing\nFile syncing\n\nData syncing syncs library items, but doesn’t sync attached files (PDFs, images, etc.). Set up data syncing in the Zotero app by signing into your account.\nFile syncing syncs attached files of library items. Files can be synced using either:\n\nWebDAV\nZotero Storage\n\nWebDAV is a standard protocol for transferring files over the web and can be used to sync files in your personal library. 
There are a number of WebDAV storage providers known to work with Zotero.\nZotero Storage is the file sync option recommended by Zotero. Zotero Storage includes all the features of WebDAV syncing, plus file syncing in group libraries, web-based access to file attachments, easier setup, and guaranteed compatibility.\n\n\n\n\n\n\nWebDAV or Zotero Storage?\n\n\n\n\n\nI currently use WebDAV syncing through 4shared’s free plan, which gives me 15 GB of free space. I chose this option because I don’t need any of the additional features of Zotero Storage, and 15 GB is more than enough for my personal library.\n\n\n\n\n\nTurn off automatic tagging\nGo to General > Miscellaneous and turn off “Automatically tag items with keywords and subject headings”. I personally found this feature more annoying than helpful, since different sources would use different tags for the same thing, resulting in messy metadata. Instead, I manually add tags to items in my library using the following scheme:\n\nSubject: name (focus area or field)\nTopic: name (topic keywords)\nData: source (open data set name)\nPopulation: characteristic (age group, sex, gender, species, location, etc.)\nVariable: name (variable measured in the study)\nMethod: name (experimental, observational, literature review, etc.)\nAnalysis: method (statistical method used for analysis)\nSource: name (for where or how I discovered the item)\nStatus: reading status (unread, read partially, or read)\n\nI like this approach because it makes it easy to drill down a library collection to the items I need or to rediscover an item I forget the name of. It also reduces some of the cognitive overhead of working with tags by taking advantage of Zotero’s autocomplete when adding tags manually, and alphabetic tag sorting." 
}, { "objectID": "series/2023-01-24_reproducible-data-science/posts/zotero-setup.html#using-zotero-for-citations-in-quarto-and-r-markdown-documents", @@ -209,6 +265,13 @@ "section": "", "text": "Git is an open source distributed version control system for tracking and managing changes to files.\n\n\nInstall Git from the Terminal with Homebrew:\nbrew install git" }, + { + "objectID": "series/2023-01-24_reproducible-data-science/posts/version-control-setup.html#git", + "href": "series/2023-01-24_reproducible-data-science/posts/version-control-setup.html#git", + "title": "Version Control Setup", + "section": "", + "text": "Git is an open source distributed version control system for tracking and managing changes to files.\n\n\nInstall Git from the Terminal with Homebrew:\nbrew install git" + }, { "objectID": "series/2023-01-24_reproducible-data-science/posts/version-control-setup.html#github", "href": "series/2023-01-24_reproducible-data-science/posts/version-control-setup.html#github", @@ -237,6 +300,13 @@ "section": "", "text": "Python is an open source programming language that is the “second best language for everything.” It’s worth setting up in case you need it, but for data science R is almost always simpler and better.\n\n\npyenv is an open source Python installation manager for installing, removing, and switching between multiple Python versions.\n\nInstall pyenv from the Terminal with Homebrew:\nbrew install pyenv\nSet up your shell environment for pyenv.\nInstall Python build dependencies with Homebrew:\nbrew install openssl readline sqlite3 xz zlib tcl-tk\n\n\n\n\nInstall the latest major version of Python 3 with shared library support1 from the Terminal with pyenv:\nenv PYTHON_CONFIGURE_OPTS=\"--enable-shared\" pyenv install 3:latest\n\n\n\npipenv is an open-source dependency manager for Python projects.\nInstall pipenv from the Terminal with Homebrew:\nbrew install pipenv\npoetry is an alternative open-source dependency manager for Python projects. 
I’m currently experimenting with poetry and pipenv to decide which I prefer.\nInstall poetry from the Terminal with:\ncurl -sSL https://install.python-poetry.org | python3 -\n\n\n\npipx is an open-source package manager for installing and running command line applications written in Python in separate environments from your Python libraries. It’s like Homebrew, but for Python applications.\nInstall pipx from the Terminal with:\nbrew install pipx\npipx ensurepath" }, + { + "objectID": "series/2023-01-24_reproducible-data-science/posts/python-setup.html#python", + "href": "series/2023-01-24_reproducible-data-science/posts/python-setup.html#python", + "title": "Python Setup", + "section": "", + "text": "Python is an open source programming language that is the “second best language for everything.” It’s worth setting up in case you need it, but for data science R is almost always simpler and better.\n\n\npyenv is an open source Python installation manager for installing, removing, and switching between multiple Python versions.\n\nInstall pyenv from the Terminal with Homebrew:\nbrew install pyenv\nSet up your shell environment for pyenv.\nInstall Python build dependencies with Homebrew:\nbrew install openssl readline sqlite3 xz zlib tcl-tk\n\n\n\n\nInstall the latest major version of Python 3 with shared library support1 from the Terminal with pyenv:\nenv PYTHON_CONFIGURE_OPTS=\"--enable-shared\" pyenv install 3:latest\n\n\n\npipenv is an open-source dependency manager for Python projects.\nInstall pipenv from the Terminal with Homebrew:\nbrew install pipenv\npoetry is an alternative open-source dependency manager for Python projects. 
I’m currently experimenting with poetry and pipenv to decide which I prefer.\nInstall poetry from the Terminal with:\ncurl -sSL https://install.python-poetry.org | python3 -\n\n\n\npipx is an open-source package manager for installing and running command line applications written in Python in separate environments from your Python libraries. It’s like Homebrew, but for Python applications.\nInstall pipx from the Terminal with:\nbrew install pipx\npipx ensurepath" + }, { "objectID": "series/2023-01-24_reproducible-data-science/posts/python-setup.html#calling-python-from-r", "href": "series/2023-01-24_reproducible-data-science/posts/python-setup.html#calling-python-from-r", @@ -251,6 +321,13 @@ "section": "Related Reading", "text": "Related Reading\n\n\nHow to Manage your Python Projects with Pipenv and Pyenv by Bruno Michetti\nPyenv, poetry and other rascals—modern Python dependency and version management by Olaf Górski" }, + { + "objectID": "series/2023-01-24_reproducible-data-science/posts/python-setup.html#footnotes", + "href": "series/2023-01-24_reproducible-data-science/posts/python-setup.html#footnotes", + "title": "Python Setup", + "section": "Footnotes", + "text": "Footnotes\n\n\nShared library support is needed to use Python versions with reticulate.↩︎" + }, { "objectID": "series/2023-01-24_reproducible-data-science/posts/r-setup.html", "href": "series/2023-01-24_reproducible-data-science/posts/r-setup.html", @@ -258,6 +335,13 @@ "section": "", "text": "R is an open source programming language for wrangling, visualizing, modelling, and communicating data, and so much more. 
It has a strong community behind it and is widely used among researchers, statisticians, and data scientists in a variety of fields.\n\n\nrig is an open source R installation manager for installing, removing, configuring, and switching between multiple R versions and user-level package libraries.\nInstall rig from the Terminal with Homebrew:\nbrew tap r-lib/rig\nbrew install --cask rig\n\n\n\nInstall the latest version of R from the Terminal with rig:\nrig add release\nR can also be installed from CRAN (the comprehensive R archive network) using the following link https://cloud.r-project.org. A new major version of R comes out once a year, and there are 2-3 minor releases each year.\n\n\n\nRStudio is an open source integrated development environment, or IDE, for R programming made by Posit.\nInstall RStudio from the Terminal with Homebrew:\nbrew install --cask rstudio\nRStudio can also be installed from Posit using the following link https://posit.co/download/rstudio-desktop/. RStudio is updated a couple of times a year. When a new version is available, RStudio will let you know. It’s a good idea to upgrade regularly so you can take advantage of the latest and greatest features.\nLearn more from the RStudio User Guide.\n\n\n\nQuarto is an open source scientific and technical publishing system built on Pandoc.\nInstall Quarto from the Terminal with Homebrew:\nbrew install --cask quarto\n\n\n\n\n\n\nQuarto version manager\n\n\n\n\n\nIf you need to manage and switch between versions of Quarto you can also install qvm, the Quarto version manager." }, + { + "objectID": "series/2023-01-24_reproducible-data-science/posts/r-setup.html#r", + "href": "series/2023-01-24_reproducible-data-science/posts/r-setup.html#r", + "title": "R Setup", + "section": "", + "text": "R is an open source programming language for wrangling, visualizing, modelling, and communicating data, and so much more. 
It has a strong community behind it and is widely used among researchers, statisticians, and data scientists in a variety of fields.\n\n\nrig is an open source R installation manager for installing, removing, configuring, and switching between multiple R versions and user-level package libraries.\nInstall rig from the Terminal with Homebrew:\nbrew tap r-lib/rig\nbrew install --cask rig\n\n\n\nInstall the latest version of R from the Terminal with rig:\nrig add release\nR can also be installed from CRAN (the comprehensive R archive network) using the following link https://cloud.r-project.org. A new major version of R comes out once a year, and there are 2-3 minor releases each year.\n\n\n\nRStudio is an open source integrated development environment, or IDE, for R programming made by Posit.\nInstall RStudio from the Terminal with Homebrew:\nbrew install --cask rstudio\nRStudio can also be installed from Posit using the following link https://posit.co/download/rstudio-desktop/. RStudio is updated a couple of times a year. When a new version is available, RStudio will let you know. It’s a good idea to upgrade regularly so you can take advantage of the latest and greatest features.\nLearn more from the RStudio User Guide.\n\n\n\nQuarto is an open source scientific and technical publishing system built on Pandoc.\nInstall Quarto from the Terminal with Homebrew:\nbrew install --cask quarto\n\n\n\n\n\n\nQuarto version manager\n\n\n\n\n\nIf you need to manage and switch between versions of Quarto you can also install qvm, the Quarto version manager." + }, { "objectID": "series/2023-01-24_reproducible-data-science/posts/r-setup.html#related-reading", "href": "series/2023-01-24_reproducible-data-science/posts/r-setup.html#related-reading", @@ -272,6 +356,13 @@ "section": "", "text": "This series is about learning and doing reproducible data science. 
It’s a mix of light tutorial content, discussion, and references to more comprehensive learning material that tries to:\n\nProvide a jumping off place for beginners\nServe as a quick reference for more experienced users\nUnearth some of the “hidden curriculum” you might not have been taught while learning R (whether you were taught in the classroom or are self-taught)" }, + { + "objectID": "series/2023-01-24_reproducible-data-science/posts/start-here.html#whats-in-this-series", + "href": "series/2023-01-24_reproducible-data-science/posts/start-here.html#whats-in-this-series", + "title": "Start Here", + "section": "", + "text": "This series is about learning and doing reproducible data science. It’s a mix of light tutorial content, discussion, and references to more comprehensive learning material that tries to:\n\nProvide a jumping off place for beginners\nServe as a quick reference for more experienced users\nUnearth some of the “hidden curriculum” you might not have been taught while learning R (whether you were taught in the classroom or are self-taught)" + }, { "objectID": "series/2023-01-24_reproducible-data-science/posts/start-here.html#who-is-this-series-for", "href": "series/2023-01-24_reproducible-data-science/posts/start-here.html#who-is-this-series-for", @@ -293,6 +384,13 @@ "section": "", "text": "The standard way to install R packages is with the install.packages() function that comes with base R.\ninstall.packages(\"tibble\")\nA fresh way to install R packages is with the pkg_install() function from the pak package.\npkg_install(\"tibble\")\npak is fast, safe, and convenient. You can learn more about it on the package website.\n\n\n\n\n\n\n“Do you want to install from sources the package which needs compilation?”\n\n\n\nWhen installing a package, you will sometimes encounter the above question along with the message “There are binary versions available but the source versions are later”. 
What this question is asking is whether you would like to have your computer build (compile) the package binary from source code then install it (yes) or have your computer install a pre-built binary (no). If you choose yes, you will get the newest version of the package; if you choose no you will get the newest binary version of the package (typically the previous release).\nUnless you need the newest version of the package for a specific feature or bug fix, I recommend choosing no—the binary version will install faster, and you can simply wait a few days for the binary of the newest version to be built on the repository you’re installing packages from, then install the package again to get the newest version. If you need the newest version of the package right now then choose yes, but you will need to set up your R build toolchain first.\n\n\nTo learn more about package installation, see the Package structure and state chapter in R Packages by Hadley Wickham and Jenny Bryan." }, + { + "objectID": "series/2023-01-24_reproducible-data-science/posts/r-packages.html#how-do-i-install-r-packages", + "href": "series/2023-01-24_reproducible-data-science/posts/r-packages.html#how-do-i-install-r-packages", + "title": "R Packages", + "section": "", + "text": "The standard way to install R packages is with the install.packages() function that comes with base R.\ninstall.packages(\"tibble\")\nA fresh way to install R packages is with the pkg_install() function from the pak package.\npkg_install(\"tibble\")\npak is fast, safe, and convenient. You can learn more about it on the package website.\n\n\n\n\n\n\n“Do you want to install from sources the package which needs compilation?”\n\n\n\nWhen installing a package, you will sometimes encounter the above question along with the message “There are binary versions available but the source versions are later”. 
What this question is asking is whether you would like to have your computer build (compile) the package binary from source code then install it (yes) or have your computer install a pre-built binary (no). If you choose yes, you will get the newest version of the package; if you choose no you will get the newest binary version of the package (typically the previous release).\nUnless you need the newest version of the package for a specific feature or bug fix, I recommend choosing no—the binary version will install faster, and you can simply wait a few days for the binary of the newest version to be built on the repository you’re installing packages from, then install the package again to get the newest version. If you need the newest version of the package right now then choose yes, but you will need to set up your R build toolchain first.\n\n\nTo learn more about package installation, see the Package structure and state chapter in R Packages by Hadley Wickham and Jenny Bryan." + }, { "objectID": "series/2023-01-24_reproducible-data-science/posts/r-packages.html#where-can-i-install-r-packages-from", "href": "series/2023-01-24_reproducible-data-science/posts/r-packages.html#where-can-i-install-r-packages-from", @@ -305,7 +403,7 @@ "href": "series/2023-01-24_reproducible-data-science/posts/r-packages.html#where-can-i-find-new-r-packages", "title": "R Packages", "section": "Where can I find new R packages?", - "text": "Where can I find new R packages?\n\n\n\nThere were 18954 R packages available on CRAN in 2022, and that number only continues to grow over time. 
That’s not to mention the 2165 R packages available on Bioconductor, the 1704 packages available on R-universe but not CRAN or Bioconductor, and the unknown number of packages only available on GitHub.\n\n\n\n\n\nFortunately, there are many places to find curated content about R packages:\n\nYour favourite search engine\nCRAN Task Views\nR Views\nThe Posit Blog\nPosit Videos\ntidyverse blog\nTrending on GitHub\npkgsearch\n\nYour favourite search engine is often a good starting place. Include r and package in your search, plus the topic you’re interested in, and you’ll usually find something helpful. Use boolean operators to narrow down your search or to hide unrelated things like Reddit (-site:reddit.com).\nCRAN Task Views provide guidance on which R packages are relevant for tasks related to a certain topic. The task views are not meant to endorse the “best” packages for a given task; rather, they take an encyclopedic approach that serves as a good reference.\nR Views is an R community blog edited by Posit where you can learn about new R packages and see how to use them (plus some other goodies).\nThe Posit Blog is where you can get news and updates about R packages made by Posit. The blog is about “all things data science and the world that benefits from [Posit] products, community, and events”, so it has a wider scope than R packages alone; but it’s a good place to follow anyways to stay up to date with what’s happening in the R world.\nOn a similar note, you can find videos from events hosted by Posit, such as conference talks, data science hang outs, tutorials, and more at Posit Videos (and the Posit YouTube channel). 
This is a good place to see what R packages the community is using, and how they use them.\nThe tidyverse blog is where you can get news and updates about R packages in the tidyverse.\nFinally, if you don’t want to leave the comfort of your IDE, the pkgsearch package is a package you can use to find other packages..!\n\n\n\n\n\n\nSocial Media\n\n\n\nWord of mouth on social media is a great way to find new packages or learn about new ways to use your favourite packages. Follow your favourite developers to keep up to date with their work, and check out the #RStats and #TidyTuesday hashtags to see what the community is up to." + "text": "Where can I find new R packages?\nThere were 18954 R packages available on CRAN in 2022, and that number only continues to grow over time. That’s not to mention the 2165 R packages available on Bioconductor, the 1704 packages available on R-universe but not CRAN or Bioconductor, and the unknown number of packages only available on GitHub.\n\n\n\n\n\nFortunately, there are many places to find curated content about R packages:\n\nYour favourite search engine\nCRAN Task Views\nR Views\nThe Posit Blog\nPosit Videos\ntidyverse blog\nTrending on GitHub\npkgsearch\n\nYour favourite search engine is often a good starting place. Include r and package in your search, plus the topic you’re interested in, and you’ll usually find something helpful. Use boolean operators to narrow down your search or to hide unrelated things like Reddit (-site:reddit.com).\nCRAN Task Views provide guidance on which R packages are relevant for tasks related to a certain topic. The task views are not meant to endorse the “best” packages for a given task; rather, they take an encyclopedic approach that serves as a good reference.\nR Views is an R community blog edited by Posit where you can learn about new R packages and see how to use them (plus some other goodies).\nThe Posit Blog is where you can get news and updates about R packages made by Posit. 
The blog is about “all things data science and the world that benefits from [Posit] products, community, and events”, so it has a wider scope than R packages alone; but it’s a good place to follow anyways to stay up to date with what’s happening in the R world.\nOn a similar note, you can find videos from events hosted by Posit, such as conference talks, data science hang outs, tutorials, and more at Posit Videos (and the Posit YouTube channel). This is a good place to see what R packages the community is using, and how they use them.\nThe tidyverse blog is where you can get news and updates about R packages in the tidyverse.\nFinally, if you don’t want to leave the comfort of your IDE, the pkgsearch package is a package you can use to find other packages..!\n\n\n\n\n\n\nSocial Media\n\n\n\nWord of mouth on social media is a great way to find new packages or learn about new ways to use your favourite packages. Follow your favourite developers to keep up to date with their work, and check out the #RStats and #TidyTuesday hashtags to see what the community is up to." }, { "objectID": "series/2023-01-24_reproducible-data-science/posts/r-packages.html#how-do-i-attach-r-packages", @@ -319,14 +417,21 @@ "href": "series/2023-01-24_reproducible-data-science/posts/r-packages.html#how-do-i-update-r-packages", "title": "R Packages", "section": "How do I update R packages?", - "text": "How do I update R packages?\nMost R packages available on CRAN are actively maintained and updated. For example, in 2022 more than half the packages available on CRAN had an update in the same or previous year. 
It’s good practice to keep up with these updates so you can get the latest features, improvements, and bug fixes.\n\n\n\n\n\nUse old.packages() to check which of the packages you’ve installed have updates available from the repositories listed in getOption(\"repos\").\nold.packages()\nUpdate specific packages to the most recent version by re-installing them with install.packages() or pak::pkg_install().\n# The standard way to update a package\ninstall.packages(\"tibble\")\n\n# A fresh way to update a package\npak::pkg_install(\"tibble\")\nUpdate all the packages listed by old.packages() at once with update.packages()1 or pak::pkg_install(old.packages()[,\"Package\"]). Note that update.packages() uses install.packages() under the hood, so it won’t be as fast as pak.\n# The standard way to update all packages at once\nupdate.packages()\n\n# A fresh way to update all packages at once\npak::pkg_install(old.packages()[,\"Package\"])\nYou can also install and update packages using RStudio, either from the Packages pane or from the menu bar Tools > Check for Package Updates.\n\n\n\n\n\n\nBreaking changes\n\n\n\nSometimes package updates will include breaking changes, which are named as such because they are expected to break code using older versions of the package. Most package updates won’t contain breaking changes, but occasionally they will. This should not discourage you from updating packages, but it should encourage you to update thoughtfully—don’t do your updates before important deadlines!\nI recommend creating multiple user libraries with rig that can be used for different tasks. This way different libraries can be updated more or less frequently, and it’s easier to make the choice whether or not to update. For example, you could have a devel library for package development that is updated weekly, a general main library that is updated monthly or yearly, and so forth. 
Another good practice—especially for long-running projects—is to create project-specific libraries with renv. The project libraries created by renv are isolated from your user libraries, so you can continue to update your user libraries as normal without worrying about breaking changes affecting the projects using renv." + "text": "How do I update R packages?\nMost R packages available on CRAN are actively maintained and updated. For example, in 2022 more than half the packages available on CRAN had an update in the same or previous year. It’s good practice to keep up with these updates so you can get the latest features, improvements, and bug fixes.\n\n\n\n\n\nUse old.packages() to check which of the packages you’ve installed have updates available from the repositories listed in getOption(\"repos\").\nold.packages()\nUpdate specific packages to the most recent version by re-installing them with install.packages() or pak::pkg_install().\n# The standard way to update a package\ninstall.packages(\"tibble\")\n\n# A fresh way to update a package\npak::pkg_install(\"tibble\")\nUpdate all the packages listed by old.packages() at once with update.packages()1 or pak::pkg_install(old.packages()[,\"Package\"]). Note that update.packages() uses install.packages() under the hood, so it won’t be as fast as pak.\n# The standard way to update all packages at once\nupdate.packages()\n\n# A fresh way to update all packages at once\npak::pkg_install(old.packages()[,\"Package\"])\nYou can also install and update packages using RStudio, either from the Packages pane or from the menu bar Tools > Check for Package Updates.\n\n\n\n\n\n\nBreaking changes\n\n\n\nSometimes package updates will include breaking changes, which are named as such because they are expected to break code using older versions of the package. Most package updates won’t contain breaking changes, but occasionally they will. 
This should not discourage you from updating packages, but it should encourage you to update thoughtfully—don’t do your updates before important deadlines!\nI recommend creating multiple user libraries with rig that can be used for different tasks. This way different libraries can be updated more or less frequently, and it’s easier to make the choice whether or not to update. For example, you could have a devel library for package development that is updated weekly, a general main library that is updated monthly or yearly, and so forth. Another good practice—especially for long-running projects—is to create project-specific libraries with renv. The project libraries created by renv are isolated from your user libraries, so you can continue to update your user libraries as normal without worrying about breaking changes affecting the projects using renv." }, { "objectID": "series/2023-01-24_reproducible-data-science/posts/r-packages.html#how-do-i-restore-my-current-library-into-a-new-library", "href": "series/2023-01-24_reproducible-data-science/posts/r-packages.html#how-do-i-restore-my-current-library-into-a-new-library", "title": "R Packages", "section": "How do I restore my current library into a new library?", - "text": "How do I restore my current library into a new library?\nA new library is created whenever you install a new major or minor version of R (the “x” or “y” in version “x.y.z”), so all your favourite packages need to be re-installed to this new library if you want to use them. This is also true if you create a new library in the current version of R (e.g., with rig library add).\n\n\n\n\n\n\nTransferring a library from one computer to another\n\n\n\n\n\nThe approach shown below can also be used to transfer a library from one computer to another. 
All you need to do is:\n\nFollow the steps to create the character vector of packages in your current library (pkgs)\nSave the pkgs R object with: saveRDS(pkgs, file = \"pkgs.rds\")\nCopy the pkgs.rds file to the other computer*\nLoad the pkgs R object with: pkgs <- readRDS(\"path/to/pkgs.rds\")\nInstall pak on the other computer with: install.packages(\"pak\")\nInstall the packages with: pkg_install(pkgs)\n\n*For added convenience, store pkgs.rds in a GitHub repository so you can access it wherever you go!\n\n\n\nIf you are using rig (and you should be), restoring your current library into a new library is easy with some help from pak and dplyr.2\nFirst—in your current library—get the path to your user library. Copy this down somewhere.\n.libPaths()[1]\nYou can now switch to your new library. The easiest way to switch on macOS is with the menu bar app; otherwise use the shell prompt. Open a new RStudio window to start a session with the new library.3\n# Switch to a different R version\nrig default \n\n# Switch to a different library\nrig library default \nIn your new library, install and attach pak and dplyr.\n# install.packages(c(\"pak\", \"dplyr\"))\nlibrary(pak)\nlibrary(dplyr)\nUse pak::pkg_list() to get a data frame containing data about the packages installed in your current library.\npkgs_tbl <- pkg_list(lib = \"path/to/your/user/library\")\nWrangle this data down into a character vector specifying the package source and package, following the package reference syntax used by pak.\npkgs <- pkgs_tbl |>\n select(package, remoteusername, repotype) |>\n mutate(pkg = case_when(\n # GitHub\n !is.na(remoteusername) ~ paste0(remoteusername, \"/\", package), \n # CRAN and Bioconductor\n repotype %in% c(\"cran\", \"bioc\") ~ paste0(repotype, \"::\", package),\n # Default to the `standard` package source\n TRUE ~ paste0(\"standard::\", package)\n )) |>\n pull(pkg)\nThen install all your packages.\npkg_install(pkgs)\n\n\n\n\n\n\nDefault library locations and 
.Renviron\n\n\n\nIf you have previously set R_LIBS_USER in your .Renviron to change the default library path, you may need to remove this variable or change its path if there are issues locating your various user libraries. A quick way to open .Renviron is with usethis::edit_r_environ()." + "text": "How do I restore my current library into a new library?\nA new library is created whenever you install a new major or minor version of R (the “x” or “y” in version “x.y.z”), so all your favourite packages need to be re-installed to this new library if you want to use them. This is also true if you create a new library in the current version of R (e.g., with rig library add).\n\n\n\n\n\n\nTransferring a library from one computer to another\n\n\n\n\n\nThe approach shown below can also be used to transfer a library from one computer to another. All you need to do is:\n\nFollow the steps to create the character vector of packages in your current library (pkgs)\nSave the pkgs R object with: saveRDS(pkgs, file = \"pkgs.rds\")\nCopy the pkgs.rds file to the other computer*\nLoad the pkgs R object with: pkgs <- readRDS(\"path/to/pkgs.rds\")\nInstall pak on the other computer with: install.packages(\"pak\")\nInstall the packages with: pkg_install(pkgs)\n\n*For added convenience, store pkgs.rds in a GitHub repository so you can access it wherever you go!\n\n\n\nIf you are using rig (and you should be), restoring your current library into a new library is easy with some help from pak and dplyr.2\nFirst—in your current library—get the path to your user library. Copy this down somewhere.\n.libPaths()[1]\nYou can now switch to your new library. The easiest way to switch on macOS is with the menu bar app; otherwise use the shell prompt. 
Open a new RStudio window to start a session with the new library.3\n# Switch to a different R version\nrig default <version>\n\n# Switch to a different library\nrig library default <lib-name>\nIn your new library, install and attach pak and dplyr.\n# install.packages(c(\"pak\", \"dplyr\"))\nlibrary(pak)\nlibrary(dplyr)\nUse pak::pkg_list() to get a data frame containing data about the packages installed in your current library.\npkgs_tbl <- pkg_list(lib = \"path/to/your/user/library\")\nWrangle this data down into a character vector specifying the package source and package, following the package reference syntax used by pak.\npkgs <- pkgs_tbl |>\n select(package, remoteusername, repotype) |>\n mutate(pkg = case_when(\n # GitHub\n !is.na(remoteusername) ~ paste0(remoteusername, \"/\", package), \n # CRAN and Bioconductor\n repotype %in% c(\"cran\", \"bioc\") ~ paste0(repotype, \"::\", package),\n # Default to the `standard` package source\n TRUE ~ paste0(\"standard::\", package)\n )) |>\n pull(pkg)\nThen install all your packages.\npkg_install(pkgs)\n\n\n\n\n\n\nDefault library locations and .Renviron\n\n\n\nIf you have previously set R_LIBS_USER in your .Renviron to change the default library path, you may need to remove this variable or change its path if there are issues locating your various user libraries. A quick way to open .Renviron is with usethis::edit_r_environ()." + }, + { + "objectID": "series/2023-01-24_reproducible-data-science/posts/r-packages.html#footnotes", + "href": "series/2023-01-24_reproducible-data-science/posts/r-packages.html#footnotes", + "title": "R Packages", + "section": "Footnotes", + "text": "Footnotes\n\n\nBy default update.packages() will ask you one by one whether you would like to update a given package. Use the ask = FALSE argument to update all packages without asking.↩︎\nNote that this approach will restore all packages with their newest versions in the new library, not the same versions as your current library. 
However, pak supports installing older versions of packages using the @ version syntax, so it should be possible to modify this pipeline to do the same. The version information can be found in the data frame returned by pkg_list() under the version (for CRAN and Bioconductor) and remotesha (for GitHub) columns.↩︎\nIt’s okay to keep the RStudio window for the current library open.↩︎" }, { "objectID": "series/2023-01-24_reproducible-data-science/index.html", @@ -347,28 +452,35 @@ "href": "snippets/2022-12-22_patchwork-shared-axis-labels/index.html", "title": "Shared axis labels in patchwork plots", "section": "", - "text": "To access the datasets, help pages, and functions that we will use in this code snippet, load the following packages:\n\nlibrary(ggplot2)\nlibrary(patchwork)\n\nThen make some data and ggplot2 plots to be used in the patchwork.\n\nhuron <- data.frame(year = 1875:1972, level = as.vector(LakeHuron))\nh <- ggplot(huron, aes(year))\n\nh1 <- h +\n geom_ribbon(aes(ymin = level - 1, ymax = level + 1), fill = \"grey70\") +\n geom_line(aes(y = level))\n\nh2 <- h + geom_area(aes(y = level))" + "text": "To access the datasets, help pages, and functions that we will use in this code snippet, load the following packages:\n\nlibrary(ggplot2)\nlibrary(patchwork)\n\nThen make some data and ggplot2 plots to be used in the patchwork.\n\nhuron <- data.frame(year = 1875:1972, level = as.vector(LakeHuron))\nh <- ggplot(huron, aes(year))\n\nh1 <- h +\n geom_ribbon(aes(ymin = level - 1, ymax = level + 1), fill = \"grey70\") +\n geom_line(aes(y = level))\n\nh2 <- h + geom_area(aes(y = level))" + }, + { + "objectID": "snippets/2022-12-22_patchwork-shared-axis-labels/index.html#prerequisites", + "href": "snippets/2022-12-22_patchwork-shared-axis-labels/index.html#prerequisites", + "title": "Shared axis labels in patchwork plots", + "section": "", + "text": "To access the datasets, help pages, and functions that we will use in this code snippet, load the following 
packages:\n\nlibrary(ggplot2)\nlibrary(patchwork)\n\nThen make some data and ggplot2 plots to be used in the patchwork.\n\nhuron <- data.frame(year = 1875:1972, level = as.vector(LakeHuron))\nh <- ggplot(huron, aes(year))\n\nh1 <- h +\n geom_ribbon(aes(ymin = level - 1, ymax = level + 1), fill = \"grey70\") +\n geom_line(aes(y = level))\n\nh2 <- h + geom_area(aes(y = level))" }, { "objectID": "snippets/2022-12-22_patchwork-shared-axis-labels/index.html#shared-x-axis-labels", "href": "snippets/2022-12-22_patchwork-shared-axis-labels/index.html#shared-x-axis-labels", "title": "Shared axis labels in patchwork plots", "section": "Shared x-axis labels", - "text": "Shared x-axis labels\nWe set the bottom margin to 0 so the tag is in the same vertical position that the x-axis would otherwise be in.\n\n# Create the patchwork, dropping the x-axis labels from the plots, and setting\n# the margins\nh_patch <- h1 + h2 & xlab(NULL) & theme(plot.margin = margin(5.5, 5.5, 0, 5.5))\n\n# Use the tag label as an x-axis label\nwrap_elements(panel = h_patch) +\n labs(tag = \"year\") +\n theme(\n plot.tag = element_text(size = rel(1)),\n plot.tag.position = \"bottom\"\n )" + "text": "Shared x-axis labels\nWe set the bottom margin to 0 so the tag is in the same vertical position that the x-axis would otherwise be in.\n\n# Create the patchwork, dropping the x-axis labels from the plots, and setting\n# the margins\nh_patch <- h1 + h2 & xlab(NULL) & theme(plot.margin = margin(5.5, 5.5, 0, 5.5))\n\n# Use the tag label as an x-axis label\nwrap_elements(panel = h_patch) +\n labs(tag = \"year\") +\n theme(\n plot.tag = element_text(size = rel(1)),\n plot.tag.position = \"bottom\"\n )" }, { "objectID": "snippets/2022-12-22_patchwork-shared-axis-labels/index.html#shared-y-axis-labels", "href": "snippets/2022-12-22_patchwork-shared-axis-labels/index.html#shared-y-axis-labels", "title": "Shared axis labels in patchwork plots", "section": "Shared y-axis labels", - "text": "Shared y-axis labels\nWe 
set the left margin to 0 so the tag is in the same horizontal position that the y-axis would otherwise be in.\n\n# Create the patchwork, dropping the y-axis labels from the plots, and setting\n# the margins\nh_patch <- h1 / h2 & ylab(NULL) & theme(plot.margin = margin(5.5, 5.5, 5.5, 0))\n\n# Use the tag label as a y-axis label\nwrap_elements(h_patch) +\n labs(tag = \"level\") +\n theme(\n plot.tag = element_text(size = rel(1), angle = 90),\n plot.tag.position = \"left\"\n )" + "text": "Shared y-axis labels\nWe set the left margin to 0 so the tag is in the same horizontal position that the y-axis would otherwise be in.\n\n# Create the patchwork, dropping the y-axis labels from the plots, and setting\n# the margins\nh_patch <- h1 / h2 & ylab(NULL) & theme(plot.margin = margin(5.5, 5.5, 5.5, 0))\n\n# Use the tag label as a y-axis label\nwrap_elements(h_patch) +\n labs(tag = \"level\") +\n theme(\n plot.tag = element_text(size = rel(1), angle = 90),\n plot.tag.position = \"left\"\n )" }, { "objectID": "snippets/2022-12-22_patchwork-shared-axis-labels/index.html#shared-axis-labels-without-using-patchwork", "href": "snippets/2022-12-22_patchwork-shared-axis-labels/index.html#shared-axis-labels-without-using-patchwork", "title": "Shared axis labels in patchwork plots", "section": "Shared axis labels without using patchwork", - "text": "Shared axis labels without using patchwork\nElio Campitelli shared a solution on Mastodon that accomplishes the same results as above, but without patchwork. 
It uses the magic tilde notation to create functions in the data argument of each geom that adds a grouping variable var that can be faceted on.\n\nh <- ggplot(huron, aes(year)) +\n geom_ribbon(\n data = ~ transform(.x, var = \"a\"),\n aes(ymin = level - 1, ymax = level + 1),\n fill = \"grey70\"\n ) +\n geom_line(data = ~ transform(.x, var = \"a\"), aes(y = level)) +\n geom_area(data = ~ transform(.x, var = \"b\"), aes(y = level)) +\n # Since we don't care about the facet strips here, we can remove them.\n theme(\n strip.text = element_blank(),\n strip.background = element_blank()\n )\n\nFacet by rows for a shared x-axis.\n\nh +\n facet_wrap(vars(var), scales = \"free_y\")\n\n\n\n\nFacet by columns for a shared y-axis.\n\nh +\n facet_wrap(vars(var), scales = \"free_y\", ncol = 1)" + "text": "Shared axis labels without using patchwork\nElio Campitelli shared a solution on Mastodon that accomplishes the same results as above, but without patchwork. It uses the magic tilde notation to create functions in the data argument of each geom that adds a grouping variable var that can be faceted on.\n\nh <- ggplot(huron, aes(year)) +\n geom_ribbon(\n data = ~ transform(.x, var = \"a\"),\n aes(ymin = level - 1, ymax = level + 1),\n fill = \"grey70\"\n ) +\n geom_line(data = ~ transform(.x, var = \"a\"), aes(y = level)) +\n geom_area(data = ~ transform(.x, var = \"b\"), aes(y = level)) +\n # Since we don't care about the facet strips here, we can remove them.\n theme(\n strip.text = element_blank(),\n strip.background = element_blank()\n )\n\nFacet by rows for a shared x-axis.\n\nh +\n facet_wrap(vars(var), scales = \"free_y\")\n\n\n\n\nFacet by columns for a shared y-axis.\n\nh +\n facet_wrap(vars(var), scales = \"free_y\", ncol = 1)" }, { "objectID": "snippets/2022-12-22_patchwork-shared-axis-labels/index.html#section", @@ -405,12 +517,19 @@ "section": "", "text": "To access the datasets, help pages, and functions that we will use in this code snippet, load the following 
packages:\n\nlibrary(tidyverse)\nlibrary(ggdist)\nlibrary(geomtextpath)" }, + { + "objectID": "snippets/2022-10-29_geomtextpath-with-ggdist/index.html#prerequisites", + "href": "snippets/2022-10-29_geomtextpath-with-ggdist/index.html#prerequisites", + "title": "Directly labeling ggdist lineribbons with geomtextpath", + "section": "", + "text": "To access the datasets, help pages, and functions that we will use in this code snippet, load the following packages:\n\nlibrary(tidyverse)\nlibrary(ggdist)\nlibrary(geomtextpath)" + }, { "objectID": "snippets/2022-10-29_geomtextpath-with-ggdist/index.html#directly-labeling-lineribbons", "href": "snippets/2022-10-29_geomtextpath-with-ggdist/index.html#directly-labeling-lineribbons", "title": "Directly labeling ggdist lineribbons with geomtextpath", "section": "Directly labeling lineribbons", - "text": "Directly labeling lineribbons\nFirst make some data.\n\nset.seed(1234)\nn = 5000\n\ndf <- tibble(\n .draw = 1:n,\n intercept = rnorm(n, 3, 1),\n slope = rnorm(n, 1, 0.25),\n x = list(-4:5),\n y = map2(intercept, slope, ~ .x + .y * -4:5)\n) %>%\n unnest(c(x, y))\n\nThen plot it.\n\ndf %>%\n group_by(x) %>%\n median_qi(y, .width = c(.50, .80, .95)) %>%\n ggplot(aes(x = x, y = y, ymin = .lower, ymax = .upper)) +\n # Hide the line from geom_lineribbon() by setting `size = 0`\n geom_lineribbon(size = 0) +\n scale_fill_brewer() +\n # Replace the hidden line with a labelled line\n geom_textline(label = \"label\")" + "text": "Directly labeling lineribbons\nFirst make some data.\n\nset.seed(1234)\nn = 5000\n\ndf <- tibble(\n .draw = 1:n,\n intercept = rnorm(n, 3, 1),\n slope = rnorm(n, 1, 0.25),\n x = list(-4:5),\n y = map2(intercept, slope, ~ .x + .y * -4:5)\n) %>%\n unnest(c(x, y))\n\nThen plot it.\n\ndf %>%\n group_by(x) %>%\n median_qi(y, .width = c(.50, .80, .95)) %>%\n ggplot(aes(x = x, y = y, ymin = .lower, ymax = .upper)) +\n # Hide the line from geom_lineribbon() by setting `size = 0`\n geom_lineribbon(size = 0) +\n 
scale_fill_brewer() +\n # Replace the hidden line with a labelled line\n geom_textline(label = \"label\")" }, { "objectID": "snippets/2022-10-29_geomtextpath-with-ggdist/index.html#section", @@ -447,19 +566,26 @@ "section": "", "text": "To access the datasets, help pages, and functions that we will use in this code snippet, load the following packages:\n\nlibrary(ggplot2)\nlibrary(ggdist)\nlibrary(palettes)\nlibrary(forcats)" }, + { + "objectID": "snippets/2023-01-19_ggdist-histogram-rainclouds/index.html#prerequisites", + "href": "snippets/2023-01-19_ggdist-histogram-rainclouds/index.html#prerequisites", + "title": "Histogram raincloud plots", + "section": "", + "text": "To access the datasets, help pages, and functions that we will use in this code snippet, load the following packages:\n\nlibrary(ggplot2)\nlibrary(ggdist)\nlibrary(palettes)\nlibrary(forcats)" + }, { "objectID": "snippets/2023-01-19_ggdist-histogram-rainclouds/index.html#rationale", "href": "snippets/2023-01-19_ggdist-histogram-rainclouds/index.html#rationale", "title": "Histogram raincloud plots", "section": "Rationale", - "text": "Rationale\nLikert scales are a commonly used measurement tool in surveys. 
A typical Likert scale is made of multiple items measuring respondent’s attitudes towards different statements (e.g., “The prime minister is doing a good job”, “The senate is doing a good job”, etc.).\nAttitudes towards each statement are then measured with a rating scale like:\n\n\nPlease indicate how much you agree or disagree with each of these statements:\n\n\n\n\n\n\n\n\n\n\n\n\nStrongly disagree\nSomewhat disagree\nNeither agree nor disagree\nSomewhat agree\nStrongly agree\n\n\n\n\nThe prime minister is doing a good job.\n1\n2\n3\n4\n5\n\n\nThe senate is doing a good job.\n1\n2\n3\n4\n5\n\n\n\n\n\nBecause items in a Likert scale are numeric but discrete, a density histogram is an ideal way to visualize the distribution of responses to each item (as opposed to the density curve typically used in raincloud plots with continuous data).\n\nWhy not a density curve?\nWhile it is possible to use a density curve, doing so should make it immediately obvious why it isn’t a great approach for discrete numeric data like this:\n\nThe density curve masks notable differences in density between different scores\nThe outermost fills in the density curve are cut off when it is trimmed to the range of the input data\nThe density curve goes far beyond the possible values of the data when it isn’t trimmed1\n\n\n\n\n\n\n\n\n\nCode\nggplot(likert_scores, aes(x = score, y = item)) +\n stat_slab(\n aes(fill = cut(after_stat(x), breaks = breaks(x))),\n justification = -.2,\n height = 0.7,\n slab_colour = \"black\",\n slab_linewidth = 0.5,\n trim = TRUE\n ) +\n geom_boxplot(\n width = .2,\n outlier.shape = NA\n ) +\n geom_jitter(width = .1, height = .1, alpha = .3) +\n scale_fill_manual(\n values = pal_ramp(met_palettes$Hiroshige, 5, -1),\n labels = 1:5,\n guide = guide_legend(title = \"score\", reverse = TRUE)\n )\nggplot(likert_scores, aes(x = score, y = item)) +\n stat_slab(\n justification = -.2,\n height = 0.7,\n slab_colour = \"black\",\n slab_linewidth = 0.5,\n trim = FALSE\n ) 
+\n geom_boxplot(\n width = .2,\n outlier.shape = NA\n ) +\n geom_jitter(width = .1, height = .1, alpha = .3) +\n scale_x_continuous(breaks = 1:5)\n\n\n\n\n\n\n\ntrim = TRUE\n\n\n\n\n\n\n\ntrim = FALSE\n\n\n\n\n\n\nHowever, each of these problems is easily solved by using a density histogram instead." + "text": "Rationale\nLikert scales are a commonly used measurement tool in surveys. A typical Likert scale is made of multiple items measuring respondents' attitudes towards different statements (e.g., “The prime minister is doing a good job”, “The senate is doing a good job”, etc.).\nAttitudes towards each statement are then measured with a rating scale like:\n\n\nPlease indicate how much you agree or disagree with each of these statements:\n\n\n\n\n\n\n\n\n\n\n\n\nStrongly disagree\nSomewhat disagree\nNeither agree nor disagree\nSomewhat agree\nStrongly agree\n\n\n\n\nThe prime minister is doing a good job.\n1\n2\n3\n4\n5\n\n\nThe senate is doing a good job.\n1\n2\n3\n4\n5\n\n\n\n\n\nBecause items in a Likert scale are numeric but discrete, a density histogram is an ideal way to visualize the distribution of responses to each item (as opposed to the density curve typically used in raincloud plots with continuous data).\n\nWhy not a density curve?\nWhile it is possible to use a density curve, doing so should make it immediately obvious why it isn’t a great approach for discrete numeric data like this:\n\nThe density curve masks notable differences in density between different scores\nThe outermost fills in the density curve are cut off when it is trimmed to the range of the input data\nThe density curve goes far beyond the possible values of the data when it isn’t trimmed1\n\n\n\nCode\nggplot(likert_scores, aes(x = score, y = item)) +\n stat_slab(\n aes(fill = cut(after_stat(x), breaks = breaks(x))),\n justification = -.2,\n height = 0.7,\n slab_colour = \"black\",\n slab_linewidth = 0.5,\n trim = TRUE\n ) +\n geom_boxplot(\n width = .2,\n outlier.shape = NA\n ) +\n 
geom_jitter(width = .1, height = .1, alpha = .3) +\n scale_fill_manual(\n values = pal_ramp(met_palettes$Hiroshige, 5, -1),\n labels = 1:5,\n guide = guide_legend(title = \"score\", reverse = TRUE)\n )\nggplot(likert_scores, aes(x = score, y = item)) +\n stat_slab(\n justification = -.2,\n height = 0.7,\n slab_colour = \"black\",\n slab_linewidth = 0.5,\n trim = FALSE\n ) +\n geom_boxplot(\n width = .2,\n outlier.shape = NA\n ) +\n geom_jitter(width = .1, height = .1, alpha = .3) +\n scale_x_continuous(breaks = 1:5)\n\n\n\n\n\n\n\ntrim = TRUE\n\n\n\n\n\n\n\ntrim = FALSE\n\n\n\n\n\n\nHowever, each of these problems is easily solved by using a density histogram instead." }, { "objectID": "snippets/2023-01-19_ggdist-histogram-rainclouds/index.html#histogram-raincloud-plots", "href": "snippets/2023-01-19_ggdist-histogram-rainclouds/index.html#histogram-raincloud-plots", "title": "Histogram raincloud plots", "section": "Histogram raincloud plots", - "text": "Histogram raincloud plots\nFirst make some data.\n\nset.seed(123)\n\nlikert_scores <- data.frame(\n item = rep(letters[1:2], times = 33),\n score = sample(1:5, 66, replace = TRUE)\n)\n\nIt’s straightforward to make density histograms for each item with ggplot2.\n\nggplot(likert_scores, aes(x = score, y = after_stat(density))) +\n geom_histogram(\n aes(fill = after_stat(x)),\n bins = 5,\n colour = \"black\"\n ) +\n scale_fill_gradientn(\n colours = pal_ramp(met_palettes$Hiroshige, 5, -1),\n guide = guide_legend(title = \"score\", reverse = TRUE)\n ) +\n facet_wrap(vars(fct_rev(item)), ncol = 1)\n\n\n\n\nHowever, the density histograms in this plot can’t be vertically justified to give space for the box and whiskers plot and points used in a typical raincloud plot. 
For that we need the stat_slab() function from the ggdist package and a small helper function to determine where to put breaks in the histogram.\n\n#' Set breaks so bins are centred on each score\n#'\n#' @param x A vector of values.\n#' @param width Any value between 0 and 0.5 for setting the width of the bins.\nbreaks <- function(x, width = 0.49999999) {\n rep(1:max(x), each = 2) + c(-width, width)\n}\n\nThe default slab type for stat_slab() is a probability density (or mass) function (\"pdf\"), but it can also calculate density histograms (\"histogram\"). To match the appearance of geom_histogram(), the breaks argument needs to be given the location of each bin’s left and right edge; this also necessitates using cut() with the fill aesthetic so the fill breaks correctly align with each bin.\n\nggplot(likert_scores, aes(x = score, y = item)) +\n stat_slab(\n # Divide fill into five equal bins\n aes(fill = cut(after_stat(x), breaks = 5)),\n slab_type = \"histogram\",\n breaks = \\(x) breaks(x),\n # Justify the histogram upwards\n justification = -.2,\n # Reduce the histogram's height so it doesn't cover geoms from other items\n height = 0.7,\n # Add black outlines because they look nice\n slab_colour = \"black\",\n outline_bars = TRUE,\n slab_linewidth = 0.5\n ) +\n geom_boxplot(\n width = .2,\n # Hide outliers since the raw data will be plotted\n outlier.shape = NA\n ) +\n geom_jitter(width = .1, height = .1, alpha = .3) +\n # Cutting the fill into bins puts it on a discrete scale\n scale_fill_manual(\n values = pal_ramp(met_palettes$Hiroshige, 5, -1),\n labels = 1:5,\n guide = guide_legend(title = \"score\", reverse = TRUE)\n )" + "text": "Histogram raincloud plots\nFirst make some data.\n\nset.seed(123)\n\nlikert_scores <- data.frame(\n item = rep(letters[1:2], times = 33),\n score = sample(1:5, 66, replace = TRUE)\n)\n\nIt’s straightforward to make density histograms for each item with ggplot2.\n\nggplot(likert_scores, aes(x = score, y = after_stat(density))) 
+\n geom_histogram(\n aes(fill = after_stat(x)),\n bins = 5,\n colour = \"black\"\n ) +\n scale_fill_gradientn(\n colours = pal_ramp(met_palettes$Hiroshige, 5, -1),\n guide = guide_legend(title = \"score\", reverse = TRUE)\n ) +\n facet_wrap(vars(fct_rev(item)), ncol = 1)\n\n\n\n\nHowever, the density histograms in this plot can’t be vertically justified to give space for the box and whiskers plot and points used in a typical raincloud plot. For that we need the stat_slab() function from the ggdist package and a small helper function to determine where to put breaks in the histogram.\n\n#' Set breaks so bins are centred on each score\n#'\n#' @param x A vector of values.\n#' @param width Any value between 0 and 0.5 for setting the width of the bins.\nbreaks <- function(x, width = 0.49999999) {\n rep(1:max(x), each = 2) + c(-width, width)\n}\n\nThe default slab type for stat_slab() is a probability density (or mass) function (\"pdf\"), but it can also calculate density histograms (\"histogram\"). 
To match the appearance of geom_histogram(), the breaks argument needs to be given the location of each bin’s left and right edge; this also necessitates using cut() with the fill aesthetic so the fill breaks correctly align with each bin.\n\nggplot(likert_scores, aes(x = score, y = item)) +\n stat_slab(\n # Divide fill into five equal bins\n aes(fill = cut(after_stat(x), breaks = 5)),\n slab_type = \"histogram\",\n breaks = \\(x) breaks(x),\n # Justify the histogram upwards\n justification = -.2,\n # Reduce the histogram's height so it doesn't cover geoms from other items\n height = 0.7,\n # Add black outlines because they look nice\n slab_colour = \"black\",\n outline_bars = TRUE,\n slab_linewidth = 0.5\n ) +\n geom_boxplot(\n width = .2,\n # Hide outliers since the raw data will be plotted\n outlier.shape = NA\n ) +\n geom_jitter(width = .1, height = .1, alpha = .3) +\n # Cutting the fill into bins puts it on a discrete scale\n scale_fill_manual(\n values = pal_ramp(met_palettes$Hiroshige, 5, -1),\n labels = 1:5,\n guide = guide_legend(title = \"score\", reverse = TRUE)\n )" }, { "objectID": "snippets/2023-01-19_ggdist-histogram-rainclouds/index.html#section", @@ -489,40 +615,54 @@ "section": "Session Info", "text": "Session Info\n\n\n\n\n\n\n─ Session info ───────────────────────────────────────────────────────────────\n setting value\n version R version 4.2.2 (2022-10-31)\n os macOS Mojave 10.14.6\n system x86_64, darwin17.0\n ui X11\n language (EN)\n collate en_CA.UTF-8\n ctype en_CA.UTF-8\n tz America/Vancouver\n date 2023-01-20\n pandoc 2.14.0.3 @ /Applications/RStudio.app/Contents/MacOS/pandoc/ (via rmarkdown)\n quarto 1.2.313 @ /usr/local/bin/quarto\n\n─ Packages ───────────────────────────────────────────────────────────────────\n package * version date (UTC) lib source\n forcats * 0.5.2 2022-08-19 [1] CRAN (R 4.2.0)\n ggdist * 3.2.1 2023-01-18 [1] CRAN (R 4.2.2)\n ggplot2 * 3.4.0 2022-11-04 [1] CRAN (R 4.2.0)\n palettes * 0.1.0 2022-12-19 [1] CRAN (R 
4.2.0)\n sessioninfo * 1.2.2 2021-12-06 [1] CRAN (R 4.2.0)\n\n [1] /Users/Michael/Library/R/x86_64/4.2/library/__tidytales\n [2] /Library/Frameworks/R.framework/Versions/4.2/Resources/library\n\n──────────────────────────────────────────────────────────────────────────────" }, + { + "objectID": "snippets/2023-01-19_ggdist-histogram-rainclouds/index.html#footnotes", + "href": "snippets/2023-01-19_ggdist-histogram-rainclouds/index.html#footnotes", + "title": "Histogram raincloud plots", + "section": "Footnotes", + "text": "Footnotes\n\n\nIt also makes it difficult to get the fill breaks right, hence the lack of any fill colours in the trim = FALSE example.↩︎" + }, { "objectID": "snippets/2021-11-01_longitudinal-measurement-invariance/index.html", "href": "snippets/2021-11-01_longitudinal-measurement-invariance/index.html", "title": "Longitudinal Measurement Invariance", "section": "", - "text": "To access the datasets, help pages, and functions that we will use in this code snippet, load the following packages:\n\nlibrary(lavaan)\nlibrary(semTools)\n\nAnd read in the data:\n\nsocial_exchanges <- read.csv(here(\"data\", \"2021-11-01_social-exchanges.csv\"))\n\nThe data contains simulated values for several indicators of positive and negative social exchanges, measured on two occasions (w1 and w2). There are three continuous indicators that measure perceived companionship (vst1, vst2, vst3), and three binary indicators that measure unwanted advice (unw1, unw2, unw3). The data and some of the examples come from Longitudinal Structural Equation Modeling: A Comprehensive Introduction by Jason Newsom." 
+ "text": "To access the datasets, help pages, and functions that we will use in this code snippet, load the following packages:\n\nlibrary(lavaan)\nlibrary(semTools)\n\nAnd read in the data:\n\nsocial_exchanges <- read.csv(here(\"data\", \"2021-11-01_social-exchanges.csv\"))\n\nThe data contains simulated values for several indicators of positive and negative social exchanges, measured on two occasions (w1 and w2). There are three continuous indicators that measure perceived companionship (vst1, vst2, vst3), and three binary indicators that measure unwanted advice (unw1, unw2, unw3). The data and some of the examples come from Longitudinal Structural Equation Modeling: A Comprehensive Introduction by Jason Newsom." + }, + { + "objectID": "snippets/2021-11-01_longitudinal-measurement-invariance/index.html#prerequisites", + "href": "snippets/2021-11-01_longitudinal-measurement-invariance/index.html#prerequisites", + "title": "Longitudinal Measurement Invariance", + "section": "", + "text": "To access the datasets, help pages, and functions that we will use in this code snippet, load the following packages:\n\nlibrary(lavaan)\nlibrary(semTools)\n\nAnd read in the data:\n\nsocial_exchanges <- read.csv(here(\"data\", \"2021-11-01_social-exchanges.csv\"))\n\nThe data contains simulated values for several indicators of positive and negative social exchanges, measured on two occasions (w1 and w2). There are three continuous indicators that measure perceived companionship (vst1, vst2, vst3), and three binary indicators that measure unwanted advice (unw1, unw2, unw3). The data and some of the examples come from Longitudinal Structural Equation Modeling: A Comprehensive Introduction by Jason Newsom." 
}, { "objectID": "snippets/2021-11-01_longitudinal-measurement-invariance/index.html#configural-invariance", "href": "snippets/2021-11-01_longitudinal-measurement-invariance/index.html#configural-invariance", "title": "Longitudinal Measurement Invariance", "section": "Configural Invariance", - "text": "Configural Invariance\nUsing the lavaan package.\n\nconfigural_model_lav <- (\"\n # Measurement model\n w1comp =~ w1vst1 + w1vst2 + w1vst3\n w2comp =~ w2vst1 + w2vst2 + w2vst3\n \n # Variances and covariances\n w2comp ~~ w1comp\n w1comp ~~ w1comp\n w2comp ~~ w2comp\n\n w1vst1 ~~ w1vst1\n w1vst2 ~~ w1vst2\n w1vst3 ~~ w1vst3\n w2vst1 ~~ w2vst1\n w2vst2 ~~ w2vst2\n w2vst3 ~~ w2vst3\n\n w1vst1 ~~ w2vst1\n w1vst2 ~~ w2vst2\n w1vst3 ~~ w2vst3\n\")\n\nconfigural_model_lav_fit <- sem(configural_model_lav, data = social_exchanges)\n\nUsing the semTools package.\n\n# First, define the configural model, using the repeated measures factors and\n# indicators.\nconfigural_model_smt <- (\"\n # Measurement model\n w1comp =~ w1vst1 + w1vst2 + w1vst3\n w2comp =~ w2vst1 + w2vst2 + w2vst3\n\")\n\n# Second, create a named list indicating which factors are actually the same\n# latent variable measured repeatedly.\nlongitudinal_factor_names <- list(\n comp = c(\"w1comp\", \"w2comp\")\n)\n\n# Third, generate the lavaan model syntax using semTools.\nconfigural_model_smt <- measEq.syntax(\n configural.model = configural_model_smt,\n longFacNames = longitudinal_factor_names,\n ID.fac = \"std.lv\",\n ID.cat = \"Wu.Estabrook.2016\",\n data = social_exchanges\n)\nconfigural_model_smt <- as.character(configural_model_smt)\n\n# Finally, fit the model using lavaan.\nconfigural_model_smt_fit <- sem(configural_model_smt, data = social_exchanges)\n\nCompare lavaan and semTools fit measures.\n\n\nConfigural invariance is met if the model fits well, indicators load on the same factors, and loadings are all of acceptable magnitude. 
An alternative way of testing longitudinal configural invariance is to fit separate confirmatory factor models at each time point; configural invariance is met if the previously stated criteria hold and the measure has the same factor structure at each time point.\n\nfitMeasures(configural_model_lav_fit, c(\"chisq\", \"df\", \"pvalue\", \"cfi\", \"rmsea\"))\n\n#> chisq df pvalue cfi rmsea \n#> 9.911 5.000 0.078 0.997 0.041\n\nfitMeasures(configural_model_smt_fit, c(\"chisq\", \"df\", \"pvalue\", \"cfi\", \"rmsea\"))\n\n#> chisq df pvalue cfi rmsea \n#> 9.911 5.000 0.078 0.997 0.041" + "text": "Configural Invariance\nUsing the lavaan package.\n\nconfigural_model_lav <- (\"\n # Measurement model\n w1comp =~ w1vst1 + w1vst2 + w1vst3\n w2comp =~ w2vst1 + w2vst2 + w2vst3\n \n # Variances and covariances\n w2comp ~~ w1comp\n w1comp ~~ w1comp\n w2comp ~~ w2comp\n\n w1vst1 ~~ w1vst1\n w1vst2 ~~ w1vst2\n w1vst3 ~~ w1vst3\n w2vst1 ~~ w2vst1\n w2vst2 ~~ w2vst2\n w2vst3 ~~ w2vst3\n\n w1vst1 ~~ w2vst1\n w1vst2 ~~ w2vst2\n w1vst3 ~~ w2vst3\n\")\n\nconfigural_model_lav_fit <- sem(configural_model_lav, data = social_exchanges)\n\nUsing the semTools package.\n\n# First, define the configural model, using the repeated measures factors and\n# indicators.\nconfigural_model_smt <- (\"\n # Measurement model\n w1comp =~ w1vst1 + w1vst2 + w1vst3\n w2comp =~ w2vst1 + w2vst2 + w2vst3\n\")\n\n# Second, create a named list indicating which factors are actually the same\n# latent variable measured repeatedly.\nlongitudinal_factor_names <- list(\n comp = c(\"w1comp\", \"w2comp\")\n)\n\n# Third, generate the lavaan model syntax using semTools.\nconfigural_model_smt <- measEq.syntax(\n configural.model = configural_model_smt,\n longFacNames = longitudinal_factor_names,\n ID.fac = \"std.lv\",\n ID.cat = \"Wu.Estabrook.2016\",\n data = social_exchanges\n)\nconfigural_model_smt <- as.character(configural_model_smt)\n\n# Finally, fit the model using lavaan.\nconfigural_model_smt_fit <- 
sem(configural_model_smt, data = social_exchanges)\n\nCompare lavaan and semTools fit measures.\n\n\nConfigural invariance is met if the model fits well, indicators load on the same factors, and loadings are all of acceptable magnitude. An alternative way of testing longitudinal configural invariance is to fit separate confirmatory factor models at each time point; configural invariance is met if the previously stated criteria hold and the measure has the same factor structure at each time point.\n\nfitMeasures(configural_model_lav_fit, c(\"chisq\", \"df\", \"pvalue\", \"cfi\", \"rmsea\"))\n\n#> chisq df pvalue cfi rmsea \n#> 9.911 5.000 0.078 0.997 0.041\n\nfitMeasures(configural_model_smt_fit, c(\"chisq\", \"df\", \"pvalue\", \"cfi\", \"rmsea\"))\n\n#> chisq df pvalue cfi rmsea \n#> 9.911 5.000 0.078 0.997 0.041" }, { "objectID": "snippets/2021-11-01_longitudinal-measurement-invariance/index.html#weak-invariance", "href": "snippets/2021-11-01_longitudinal-measurement-invariance/index.html#weak-invariance", "title": "Longitudinal Measurement Invariance", "section": "Weak Invariance", - "text": "Weak Invariance\nUsing the lavaan package.\n\nweak_model_lav <- (\"\n # Measurement model\n w1comp =~ w1vst1 + a*w1vst2 + b*w1vst3 # Factor loading equality constraint\n w2comp =~ w2vst1 + a*w2vst2 + b*w2vst3 # Factor loading equality constraint\n\n # Variances and covariances\n w2comp ~~ w1comp\n w1comp ~~ w1comp\n w2comp ~~ w2comp\n\n w1vst1 ~~ w1vst1\n w1vst2 ~~ w1vst2\n w1vst3 ~~ w1vst3\n w2vst1 ~~ w2vst1\n w2vst2 ~~ w2vst2\n w2vst3 ~~ w2vst3\n\n w1vst1 ~~ w2vst1\n w1vst2 ~~ w2vst2\n w1vst3 ~~ w2vst3\n\")\n\nweak_model_lav_fit <- sem(weak_model_lav, social_exchanges)\n\nUsing the semTools package.\n\nweak_model_smt <- measEq.syntax(\n configural.model = configural_model_smt,\n longFacNames = longitudinal_factor_names,\n ID.fac = \"std.lv\",\n ID.cat = \"Wu.Estabrook.2016\",\n long.equal = c(\"loadings\"),\n data = social_exchanges\n)\nweak_model_smt <- 
as.character(weak_model_smt)\n\nweak_model_smt_fit <- sem(weak_model_smt, data = social_exchanges)\n\nCompare lavaan and semTools fit measures.\n\nfitMeasures(weak_model_lav_fit, c(\"chisq\", \"df\", \"pvalue\", \"cfi\", \"rmsea\"))\n\n#> chisq df pvalue cfi rmsea \n#> 12.077 7.000 0.098 0.997 0.036\n\nfitMeasures(weak_model_smt_fit, c(\"chisq\", \"df\", \"pvalue\", \"cfi\", \"rmsea\"))\n\n#> chisq df pvalue cfi rmsea \n#> 12.077 7.000 0.098 0.997 0.036\n\n\nTest weak invariance.\n\nlavTestLRT(configural_model_lav_fit, weak_model_lav_fit)" + "text": "Weak Invariance\nUsing the lavaan package.\n\nweak_model_lav <- (\"\n # Measurement model\n w1comp =~ w1vst1 + a*w1vst2 + b*w1vst3 # Factor loading equality constraint\n w2comp =~ w2vst1 + a*w2vst2 + b*w2vst3 # Factor loading equality constraint\n\n # Variances and covariances\n w2comp ~~ w1comp\n w1comp ~~ w1comp\n w2comp ~~ w2comp\n\n w1vst1 ~~ w1vst1\n w1vst2 ~~ w1vst2\n w1vst3 ~~ w1vst3\n w2vst1 ~~ w2vst1\n w2vst2 ~~ w2vst2\n w2vst3 ~~ w2vst3\n\n w1vst1 ~~ w2vst1\n w1vst2 ~~ w2vst2\n w1vst3 ~~ w2vst3\n\")\n\nweak_model_lav_fit <- sem(weak_model_lav, social_exchanges)\n\nUsing the semTools package.\n\nweak_model_smt <- measEq.syntax(\n configural.model = configural_model_smt,\n longFacNames = longitudinal_factor_names,\n ID.fac = \"std.lv\",\n ID.cat = \"Wu.Estabrook.2016\",\n long.equal = c(\"loadings\"),\n data = social_exchanges\n)\nweak_model_smt <- as.character(weak_model_smt)\n\nweak_model_smt_fit <- sem(weak_model_smt, data = social_exchanges)\n\nCompare lavaan and semTools fit measures.\n\nfitMeasures(weak_model_lav_fit, c(\"chisq\", \"df\", \"pvalue\", \"cfi\", \"rmsea\"))\n\n#> chisq df pvalue cfi rmsea \n#> 12.077 7.000 0.098 0.997 0.036\n\nfitMeasures(weak_model_smt_fit, c(\"chisq\", \"df\", \"pvalue\", \"cfi\", \"rmsea\"))\n\n#> chisq df pvalue cfi rmsea \n#> 12.077 7.000 0.098 0.997 0.036\n\n\nTest weak invariance.\n\nlavTestLRT(configural_model_lav_fit, weak_model_lav_fit)" }, { "objectID": 
"snippets/2021-11-01_longitudinal-measurement-invariance/index.html#strong-invariance", "href": "snippets/2021-11-01_longitudinal-measurement-invariance/index.html#strong-invariance", "title": "Longitudinal Measurement Invariance", "section": "Strong Invariance", - "text": "Strong Invariance\nUsing the lavaan package.\n\n\nEquality tests of factor variances should only be conducted when all factor loadings also are constrained to be equal over time. When all non-referent loadings are set equal in the constrained model, the chi-square is the same regardless of the referent.\n\nstrong_model_lav <- (\"\n # Measurement model\n w1comp =~ w1vst1 + a*w1vst2 + b*w1vst3 # Factor loading equality constraint\n w2comp =~ w2vst1 + a*w2vst2 + b*w2vst3 # Factor loading equality constraint\n\n # Variances and covariances\n w2comp ~~ w1comp\n w2comp ~~ v*w2comp # Factor variance equality constraint\n w1comp ~~ v*w1comp # Factor variance equality constraint\n\n w1vst1 ~~ w2vst1\n w1vst2 ~~ w2vst2\n w1vst3 ~~ w2vst3\n\")\n\nstrong_model_lav_fit <- sem(strong_model_lav, social_exchanges)\n\nUsing the semTools package.\n\n# Example 2.2\nstrong_model_smt <- measEq.syntax(\n configural.model = configural_model_smt,\n longFacNames = longitudinal_factor_names,\n ID.fac = \"std.lv\",\n ID.cat = \"Wu.Estabrook.2016\",\n long.equal = c(\"loadings\", \"lv.variances\"),\n data = social_exchanges\n)\nstrong_model_smt <- as.character(strong_model_smt)\n\nstrong_model_smt_fit <- sem(strong_model_smt, social_exchanges)\n\nCompare lavaan and semTools fit measures.\n\nfitMeasures(strong_model_lav_fit, c(\"chisq\", \"df\", \"pvalue\", \"cfi\", \"rmsea\"))\n\n#> chisq df pvalue cfi rmsea \n#> 37.553 8.000 0.000 0.983 0.080\n\nfitMeasures(strong_model_smt_fit, c(\"chisq\", \"df\", \"pvalue\", \"cfi\", \"rmsea\"))\n\n#> chisq df pvalue cfi rmsea \n#> 37.553 8.000 0.000 0.983 0.080\n\n\nTest strong invariance.\n\nlavTestLRT(configural_model_lav_fit, weak_model_lav_fit, strong_model_lav_fit)" + "text": 
"Strong Invariance\nUsing the lavaan package.\n\n\nEquality tests of factor variances should only be conducted when all factor loadings also are constrained to be equal over time. When all non-referent loadings are set equal in the constrained model, the chi-square is the same regardless of the referent.\n\nstrong_model_lav <- (\"\n # Measurement model\n w1comp =~ w1vst1 + a*w1vst2 + b*w1vst3 # Factor loading equality constraint\n w2comp =~ w2vst1 + a*w2vst2 + b*w2vst3 # Factor loading equality constraint\n\n # Variances and covariances\n w2comp ~~ w1comp\n w2comp ~~ v*w2comp # Factor variance equality constraint\n w1comp ~~ v*w1comp # Factor variance equality constraint\n\n w1vst1 ~~ w2vst1\n w1vst2 ~~ w2vst2\n w1vst3 ~~ w2vst3\n\")\n\nstrong_model_lav_fit <- sem(strong_model_lav, social_exchanges)\n\nUsing the semTools package.\n\n# Example 2.2\nstrong_model_smt <- measEq.syntax(\n configural.model = configural_model_smt,\n longFacNames = longitudinal_factor_names,\n ID.fac = \"std.lv\",\n ID.cat = \"Wu.Estabrook.2016\",\n long.equal = c(\"loadings\", \"lv.variances\"),\n data = social_exchanges\n)\nstrong_model_smt <- as.character(strong_model_smt)\n\nstrong_model_smt_fit <- sem(strong_model_smt, social_exchanges)\n\nCompare lavaan and semTools fit measures.\n\nfitMeasures(strong_model_lav_fit, c(\"chisq\", \"df\", \"pvalue\", \"cfi\", \"rmsea\"))\n\n#> chisq df pvalue cfi rmsea \n#> 37.553 8.000 0.000 0.983 0.080\n\nfitMeasures(strong_model_smt_fit, c(\"chisq\", \"df\", \"pvalue\", \"cfi\", \"rmsea\"))\n\n#> chisq df pvalue cfi rmsea \n#> 37.553 8.000 0.000 0.983 0.080\n\n\nTest strong invariance.\n\nlavTestLRT(configural_model_lav_fit, weak_model_lav_fit, strong_model_lav_fit)" }, { "objectID": "snippets/2021-11-01_longitudinal-measurement-invariance/index.html#strict-invariance", "href": "snippets/2021-11-01_longitudinal-measurement-invariance/index.html#strict-invariance", "title": "Longitudinal Measurement Invariance", "section": "Strict Invariance", - 
"text": "Strict Invariance\nUsing the lavaan package.\n\nstrict_model_lav <- (\"\n # Measurement model\n w1comp =~ w1vst1 + a*w1vst2 + b*w1vst3 # Factor loading equality constraint\n w2comp =~ w2vst1 + a*w2vst2 + b*w2vst3 # Factor loading equality constraint\n\n # Variances & covariances\n w2comp ~~ w1comp\n\n w1comp ~~ c*w1comp # Factor variance equality constraint\n w2comp ~~ c*w2comp # Factor variance equality constraint\n\n w1vst1 ~~ w2vst1\n w1vst2 ~~ w2vst2\n w1vst3 ~~ w2vst3\n\n w1vst1 ~~ d*w1vst1 # Measurement residual equality constraint\n w1vst2 ~~ e*w1vst2 # Measurement residual equality constraint\n w1vst3 ~~ f*w1vst3 # Measurement residual equality constraint\n\n w2vst1 ~~ d*w2vst1 # Measurement residual equality constraint\n w2vst2 ~~ e*w2vst2 # Measurement residual equality constraint\n w2vst3 ~~ f*w2vst3 # Measurement residual equality constraint\n\")\n\nstrict_model_lav_fit <- sem(strict_model_lav, social_exchanges)\n\nUsing the semTools package.\n\nstrict_model_smt <- measEq.syntax(\n configural.model = configural_model_smt,\n longFacNames = longitudinal_factor_names,\n ID.fac = \"std.lv\",\n ID.cat = \"Wu.Estabrook.2016\",\n long.equal = c(\"loadings\", \"lv.variances\", \"residuals\"),\n data = social_exchanges\n)\nstrict_model_smt <- as.character(strict_model_smt)\n\nstrict_model_smt_fit <- sem(strict_model_smt, social_exchanges)\n\nCompare lavaan and semTools fit measures.\n\nfitMeasures(strict_model_lav_fit, c(\"chisq\", \"df\", \"pvalue\", \"cfi\", \"rmsea\"))\n\n#> chisq df pvalue cfi rmsea \n#> 78.779 11.000 0.000 0.961 0.104\n\nfitMeasures(strict_model_smt_fit, c(\"chisq\", \"df\", \"pvalue\", \"cfi\", \"rmsea\"))\n\n#> chisq df pvalue cfi rmsea \n#> 78.779 11.000 0.000 0.961 0.104\n\n\nTest strict invariance.\n\nlavTestLRT(\n configural_model_lav_fit,\n weak_model_lav_fit,\n strong_model_lav_fit,\n strict_model_lav_fit\n)" + "text": "Strict Invariance\nUsing the lavaan package.\n\nstrict_model_lav <- (\"\n # Measurement model\n w1comp =~ 
w1vst1 + a*w1vst2 + b*w1vst3 # Factor loading equality constraint\n w2comp =~ w2vst1 + a*w2vst2 + b*w2vst3 # Factor loading equality constraint\n\n # Variances & covariances\n w2comp ~~ w1comp\n\n w1comp ~~ c*w1comp # Factor variance equality constraint\n w2comp ~~ c*w2comp # Factor variance equality constraint\n\n w1vst1 ~~ w2vst1\n w1vst2 ~~ w2vst2\n w1vst3 ~~ w2vst3\n\n w1vst1 ~~ d*w1vst1 # Measurement residual equality constraint\n w1vst2 ~~ e*w1vst2 # Measurement residual equality constraint\n w1vst3 ~~ f*w1vst3 # Measurement residual equality constraint\n\n w2vst1 ~~ d*w2vst1 # Measurement residual equality constraint\n w2vst2 ~~ e*w2vst2 # Measurement residual equality constraint\n w2vst3 ~~ f*w2vst3 # Measurement residual equality constraint\n\")\n\nstrict_model_lav_fit <- sem(strict_model_lav, social_exchanges)\n\nUsing the semTools package.\n\nstrict_model_smt <- measEq.syntax(\n configural.model = configural_model_smt,\n longFacNames = longitudinal_factor_names,\n ID.fac = \"std.lv\",\n ID.cat = \"Wu.Estabrook.2016\",\n long.equal = c(\"loadings\", \"lv.variances\", \"residuals\"),\n data = social_exchanges\n)\nstrict_model_smt <- as.character(strict_model_smt)\n\nstrict_model_smt_fit <- sem(strict_model_smt, social_exchanges)\n\nCompare lavaan and semTools fit measures.\n\nfitMeasures(strict_model_lav_fit, c(\"chisq\", \"df\", \"pvalue\", \"cfi\", \"rmsea\"))\n\n#> chisq df pvalue cfi rmsea \n#> 78.779 11.000 0.000 0.961 0.104\n\nfitMeasures(strict_model_smt_fit, c(\"chisq\", \"df\", \"pvalue\", \"cfi\", \"rmsea\"))\n\n#> chisq df pvalue cfi rmsea \n#> 78.779 11.000 0.000 0.961 0.104\n\n\nTest strict invariance.\n\nlavTestLRT(\n configural_model_lav_fit,\n weak_model_lav_fit,\n strong_model_lav_fit,\n strict_model_lav_fit\n)" }, { "objectID": "snippets/2021-11-01_longitudinal-measurement-invariance/index.html#section", @@ -566,6 +706,13 @@ "section": "", "text": "I am proud to announce that version 0.1.0 of the palettes package is now on CRAN. 
palettes is an R package for working with colour vectors and colour palettes. I made it with three main goals in mind, each described in a vignette on the package website:\n\nTo provide a new family of colour classes (palettes_colour and palettes_palette) that always print as hex codes with colour previews.\nTo provide a comprehensive library of methods for working with colour vectors and colour palettes, including methods for ggplot2, gt, biscale, and other colour packages.\nTo make it easy for anyone to make their own colour palette package. Colour palette packages made with palettes exist solely for the purpose of distributing colour palettes and get access to all the features of palettes for free.\n\nIf you just want to jump in and start using palettes, you can install it from CRAN with:\ninstall.packages(\"palettes\")\nThe package website is the best place to start: https://mccarthy-m-g.github.io/palettes/index.html\nIf you want to learn more about why you should be using palettes, read on to learn more about the motivation of the package and how it makes working with colour vectors and colour palettes easy and fun for everyone." }, + { + "objectID": "posts/2022-12-20_palettes/index.html#overview", + "href": "posts/2022-12-20_palettes/index.html#overview", + "title": "Introducing the palettes package", + "section": "", + "text": "I am proud to announce that version 0.1.0 of the palettes package is now on CRAN. palettes is an R package for working with colour vectors and colour palettes. I made it with three main goals in mind, each described in a vignette on the package website:\n\nTo provide a new family of colour classes (palettes_colour and palettes_palette) that always print as hex codes with colour previews.\nTo provide a comprehensive library of methods for working with colour vectors and colour palettes, including methods for ggplot2, gt, biscale, and other colour packages.\nTo make it easy for anyone to make their own colour palette package. 
Colour palette packages made with palettes exist solely for the purpose of distributing colour palettes and get access to all the features of palettes for free.\n\nIf you just want to jump in and start using palettes, you can install it from CRAN with:\ninstall.packages(\"palettes\")\nThe package website is the best place to start: https://mccarthy-m-g.github.io/palettes/index.html\nIf you want to learn more about why you should be using palettes, read on to learn more about the motivation of the package and how it makes working with colour vectors and colour palettes easy and fun for everyone." + }, { "objectID": "posts/2022-12-20_palettes/index.html#origins", "href": "posts/2022-12-20_palettes/index.html#origins", @@ -585,14 +732,14 @@ "href": "posts/2022-12-20_palettes/index.html#just-show-me-some-colour-palettes-already", "title": "Introducing the palettes package", "section": "Just show me some colour palettes already!", - "text": "Just show me some colour palettes already!\nOkay, okay.\n\nlibrary(palettes)\n\nColour classes in palettes come in two forms:\n\nColour vectors (palettes_colour), which are created by pal_colour()\nColour palettes (palettes_palette), which are created by pal_palette()\n\nColour vectors can be thought of as a base type for colours, and colour palettes are just (named) lists of colour vectors. 
To illustrate, let’s use some colours from the MetBrewer package.\npal_colour() is a nice way to create a colour vector.\n\njava <- pal_colour(c(\"#663171\", \"#cf3a36\", \"#ea7428\", \"#e2998a\", \"#0c7156\"))\njava\n#> \n#> • #663171\n#> • #CF3A36\n#> • #EA7428\n#> • #E2998A\n#> • #0C7156\n\n\npal_palette() is a nice way to create named colour palettes.\n\nmetbrewer_palettes <- pal_palette(\n egypt = c(\"#dd5129\", \"#0f7ba2\", \"#43b284\", \"#fab255\"),\n java = java\n)\nmetbrewer_palettes\n#> \n#> $egypt\n#> \n#> • #DD5129\n#> • #0F7BA2\n#> • #43B284\n#> • #FAB255\n#> \n#> $java\n#> \n#> • #663171\n#> • #CF3A36\n#> • #EA7428\n#> • #E2998A\n#> • #0C7156\n\n\nplot() is a nice way to showcase colour vectors and colour palettes. The appearance of the plot depends on the input.\n\nplot(metbrewer_palettes)\n\n\n\n\nCasting and coercion methods are also available to turn other objects (like character vectors or lists) into colour vectors and colour palettes.\nYou can even cast colour vectors and colour palettes into tibbles.\n\nmetbrewer_tbl <- as_tibble(metbrewer_palettes)\nmetbrewer_tbl\n#> # A tibble: 9 × 2\n#> palette colour \n#> \n#> 1 egypt • #DD5129\n#> 2 egypt • #0F7BA2\n#> 3 egypt • #43B284\n#> 4 egypt • #FAB255\n#> 5 java • #663171\n#> 6 java • #CF3A36\n#> 7 java • #EA7428\n#> 8 java • #E2998A\n#> 9 java • #0C7156\n\n\nThis is useful if you want to wrangle the colours with dplyr.\n\nlibrary(dplyr)\n\nmetbrewer_tbl <- slice(metbrewer_tbl, -8)\nmetbrewer_tbl\n#> # A tibble: 8 × 2\n#> palette colour \n#> \n#> 1 egypt • #DD5129\n#> 2 egypt • #0F7BA2\n#> 3 egypt • #43B284\n#> 4 egypt • #FAB255\n#> 5 java • #663171\n#> 6 java • #CF3A36\n#> 7 java • #EA7428\n#> 8 java • #0C7156\n\n\nThen go back to a colour palette with the deframe() function from tibble.\n\nlibrary(tibble)\n\nmetbrewer_tbl %>%\n group_by(palette) %>%\n summarise(pal_palette(colour)) %>%\n deframe()\n#> \n#> $egypt\n#> \n#> • #DD5129\n#> • #0F7BA2\n#> • #43B284\n#> • #FAB255\n#> \n#> $java\n#> \n#> 
• #663171\n#> • #CF3A36\n#> • #EA7428\n#> • #0C7156" + "text": "Just show me some colour palettes already!\nOkay, okay.\n\nlibrary(palettes)\n\nColour classes in palettes come in two forms:\n\nColour vectors (palettes_colour), which are created by pal_colour()\nColour palettes (palettes_palette), which are created by pal_palette()\n\nColour vectors can be thought of as a base type for colours, and colour palettes are just (named) lists of colour vectors. To illustrate, let’s use some colours from the MetBrewer package.\npal_colour() is a nice way to create a colour vector.\n\njava <- pal_colour(c(\"#663171\", \"#cf3a36\", \"#ea7428\", \"#e2998a\", \"#0c7156\"))\njava\n#> <palettes_colour[5]>\n#> • #663171\n#> • #CF3A36\n#> • #EA7428\n#> • #E2998A\n#> • #0C7156\n\n\npal_palette() is a nice way to create named colour palettes.\n\nmetbrewer_palettes <- pal_palette(\n egypt = c(\"#dd5129\", \"#0f7ba2\", \"#43b284\", \"#fab255\"),\n java = java\n)\nmetbrewer_palettes\n#> <palettes_palette[2]>\n#> $egypt\n#> <palettes_colour[4]>\n#> • #DD5129\n#> • #0F7BA2\n#> • #43B284\n#> • #FAB255\n#> \n#> $java\n#> <palettes_colour[5]>\n#> • #663171\n#> • #CF3A36\n#> • #EA7428\n#> • #E2998A\n#> • #0C7156\n\n\nplot() is a nice way to showcase colour vectors and colour palettes. 
The appearance of the plot depends on the input.\n\nplot(metbrewer_palettes)\n\n\n\n\nCasting and coercion methods are also available to turn other objects (like character vectors or lists) into colour vectors and colour palettes.\nYou can even cast colour vectors and colour palettes into tibbles.\n\nmetbrewer_tbl <- as_tibble(metbrewer_palettes)\nmetbrewer_tbl\n#> # A tibble: 9 × 2\n#> palette colour \n#> <chr> <colour> \n#> 1 egypt • #DD5129\n#> 2 egypt • #0F7BA2\n#> 3 egypt • #43B284\n#> 4 egypt • #FAB255\n#> 5 java • #663171\n#> 6 java • #CF3A36\n#> 7 java • #EA7428\n#> 8 java • #E2998A\n#> 9 java • #0C7156\n\n\nThis is useful if you want to wrangle the colours with dplyr.\n\nlibrary(dplyr)\n\nmetbrewer_tbl <- slice(metbrewer_tbl, -8)\nmetbrewer_tbl\n#> # A tibble: 8 × 2\n#> palette colour \n#> <chr> <colour> \n#> 1 egypt • #DD5129\n#> 2 egypt • #0F7BA2\n#> 3 egypt • #43B284\n#> 4 egypt • #FAB255\n#> 5 java • #663171\n#> 6 java • #CF3A36\n#> 7 java • #EA7428\n#> 8 java • #0C7156\n\n\nThen go back to a colour palette with the deframe() function from tibble.\n\nlibrary(tibble)\n\nmetbrewer_tbl %>%\n group_by(palette) %>%\n summarise(pal_palette(colour)) %>%\n deframe()\n#> <palettes_palette[2]>\n#> $egypt\n#> <palettes_colour[4]>\n#> • #DD5129\n#> • #0F7BA2\n#> • #43B284\n#> • #FAB255\n#> \n#> $java\n#> <palettes_colour[4]>\n#> • #663171\n#> • #CF3A36\n#> • #EA7428\n#> • #0C7156" }, { "objectID": "posts/2022-12-20_palettes/index.html#what-about-ggplot2-plots", "href": "posts/2022-12-20_palettes/index.html#what-about-ggplot2-plots", "title": "Introducing the palettes package", "section": "What about ggplot2 plots?", - "text": "What about ggplot2 plots?\nJust use one of the scale_ functions!\n\nlibrary(ggplot2)\n\nhiroshige <- pal_colour(c(\n \"#1e466e\", \"#376795\", \"#528fad\", \"#72bcd5\", \"#aadce0\",\n \"#ffe6b7\", \"#ffd06f\", \"#f7aa58\", \"#ef8a47\", \"#e76254\"\n))\n\nggplot(faithfuld, aes(waiting, eruptions, fill = density)) +\n geom_raster() +\n 
coord_cartesian(expand = FALSE) +\n scale_fill_palette_c(hiroshige)\n\n\n\n\nThere are scale_ functions for discrete, continuous, and binned data, and you can pass additional arguments to them for further customization." + "text": "What about ggplot2 plots?\nJust use one of the scale_ functions!\n\nlibrary(ggplot2)\n\nhiroshige <- pal_colour(c(\n \"#1e466e\", \"#376795\", \"#528fad\", \"#72bcd5\", \"#aadce0\",\n \"#ffe6b7\", \"#ffd06f\", \"#f7aa58\", \"#ef8a47\", \"#e76254\"\n))\n\nggplot(faithfuld, aes(waiting, eruptions, fill = density)) +\n geom_raster() +\n coord_cartesian(expand = FALSE) +\n scale_fill_palette_c(hiroshige)\n\n\n\n\nThere are scale_ functions for discrete, continuous, and binned data, and you can pass additional arguments to them for further customization." }, { "objectID": "posts/2022-12-20_palettes/index.html#im-sold.-how-do-i-make-a-colour-palette-package", @@ -643,26 +790,33 @@ "section": "", "text": "During my (ongoing) job search for a data science or developer-focused role where I get to do R programming, this question came to me: Just how many R developers are there? That’s the question that inspired this post. However, the data needed to answer this question can also be used to answer other interesting questions about R developers, such as how many packages they’ve contributed to, their roles in package development, and so forth. So that’s what we’ll be doing here.\nIf you just want to see the stats, you can skip to the R developer statistics section. Otherwise follow along to see how I retrieved and wrangled the data into a usable state." }, + { + "objectID": "posts/2023-05-03_r-developers/index.html#overview", + "href": "posts/2023-05-03_r-developers/index.html#overview", + "title": "The Pareto Principle in R package development", + "section": "", + "text": "During my (ongoing) job search for a data science or developer-focused role where I get to do R programming, this question came to me: Just how many R developers are there? 
That’s the question that inspired this post. However, the data needed to answer this question can also be used to answer other interesting questions about R developers, such as how many packages they’ve contributed to, their roles in package development, and so forth. So that’s what we’ll be doing here.\nIf you just want to see the stats, you can skip to the R developer statistics section. Otherwise follow along to see how I retrieved and wrangled the data into a usable state." + }, { "objectID": "posts/2023-05-03_r-developers/index.html#prerequisites", "href": "posts/2023-05-03_r-developers/index.html#prerequisites", "title": "The Pareto Principle in R package development", "section": "Prerequisites", - "text": "Prerequisites\n\nlibrary(tidyverse)\nlibrary(stringi)\nlibrary(scales)\nlibrary(gt)\n\nI’ll be using the CRAN package repository data returned by tools::CRAN_package_db() to get package and author metadata for the current packages available on CRAN. This returns a data frame with character columns containing most metadata from the DESCRIPTION file of a given R package.\n\n\nSince this data will change over time, here’s when tools::CRAN_package_db() was run for reference: 2023-05-03.\n\ncran_pkg_db <- tools::CRAN_package_db()\n\nglimpse(cran_pkg_db)\n\n#> Rows: 19,473\n#> Columns: 67\n#> $ Package \"A3\", \"AalenJohansen\", \"AATtools\", \"ABACUS\",…\n#> $ Version \"1.0.0\", \"1.0\", \"0.0.2\", \"1.0.0\", \"0.1\", \"0.…\n#> $ Priority NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …\n#> $ Depends \"R (>= 2.15.0), xtable, pbapply\", NA, \"R (>=…\n#> $ Imports NA, NA, \"magrittr, dplyr, doParallel, foreac…\n#> $ LinkingTo NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, \"Rcp…\n#> $ Suggests \"randomForest, e1071\", \"knitr, rmarkdown\", N…\n#> $ Enhances NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …\n#> $ License \"GPL (>= 2)\", \"GPL (>= 2)\", \"GPL-3\", \"GPL-3\"…\n#> $ License_is_FOSS NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …\n#> $ License_restricts_use NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, …\n#> $ OS_type NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …\n#> $ Archs NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …\n#> $ MD5sum \"027ebdd8affce8f0effaecfcd5f5ade2\", \"d7eb2a6…\n#> $ NeedsCompilation \"no\", \"no\", \"no\", \"no\", \"no\", \"no\", \"no\", \"n…\n#> $ Additional_repositories NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …\n#> $ Author \"Scott Fortmann-Roe\", \"Martin Bladt [aut, cr…\n#> $ `Authors@R` NA, \"c(person(\\\"Martin\\\", \\\"Bladt\\\", email =…\n#> $ Biarch NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …\n#> $ BugReports NA, NA, \"https://github.com/Spiritspeak/AATt…\n#> $ BuildKeepEmpty NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …\n#> $ BuildManual NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …\n#> $ BuildResaveData NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …\n#> $ BuildVignettes NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …\n#> $ Built NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …\n#> $ ByteCompile NA, NA, \"true\", NA, NA, NA, NA, NA, NA, NA, …\n#> $ `Classification/ACM` NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …\n#> $ `Classification/ACM-2012` NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …\n#> $ `Classification/JEL` NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …\n#> $ `Classification/MSC` NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …\n#> $ `Classification/MSC-2010` NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …\n#> $ Collate NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …\n#> $ Collate.unix NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …\n#> $ Collate.windows NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …\n#> $ Contact NA, NA, NA, NA, NA, NA, NA, NA, \"Ian Morison…\n#> $ Copyright NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, \"Eli…\n#> $ Date \"2015-08-15\", NA, NA, NA, \"2021-12-12\", NA, …\n#> $ `Date/Publication` \"2015-08-16 23:05:52\", \"2023-03-01 10:42:09 …\n#> $ Description \"Supplies tools for tabulating and analyzing…\n#> $ Encoding NA, \"UTF-8\", \"UTF-8\", \"UTF-8\", \"UTF-8\", NA, …\n#> $ 
KeepSource NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …\n#> $ Language NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …\n#> $ LazyData NA, NA, \"true\", \"true\", NA, \"true\", NA, NA, …\n#> $ LazyDataCompression NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …\n#> $ LazyLoad NA, NA, NA, NA, NA, NA, NA, NA, NA, \"yes\", N…\n#> $ MailingList NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …\n#> $ Maintainer \"Scott Fortmann-Roe \",…\n#> $ Note NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …\n#> $ Packaged \"2015-08-16 14:17:33 UTC; scott\", \"2023-02-2…\n#> $ RdMacros NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …\n#> $ StagedInstall NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …\n#> $ SysDataCompression NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …\n#> $ SystemRequirements NA, NA, NA, NA, NA, NA, NA, NA, \"GNU make\", …\n#> $ Title \"Accurate, Adaptable, and Accessible Error M…\n#> $ Type \"Package\", \"Package\", \"Package\", NA, \"Packag…\n#> $ URL NA, NA, NA, \"https://shiny.abdn.ac.uk/Stats/…\n#> $ UseLTO NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …\n#> $ VignetteBuilder NA, \"knitr\", NA, \"knitr\", NA, \"knitr\", NA, N…\n#> $ ZipData NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …\n#> $ Path NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …\n#> $ `X-CRAN-Comment` NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …\n#> $ Published \"2015-08-16\", \"2023-03-01\", \"2022-08-12\", \"2…\n#> $ `Reverse depends` NA, NA, NA, NA, NA, NA, \"abctools, EasyABC\",…\n#> $ `Reverse imports` NA, NA, NA, NA, NA, NA, \"ecolottery, poems\",…\n#> $ `Reverse linking to` NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …\n#> $ `Reverse suggests` NA, NA, NA, NA, NA, NA, \"coala\", \"abctools\",…\n#> $ `Reverse enhances` NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …" + "text": "Prerequisites\n\nlibrary(tidyverse)\nlibrary(stringi)\nlibrary(scales)\nlibrary(gt)\n\nI’ll be using the CRAN package repository data returned by tools::CRAN_package_db() to get package and author metadata for the current packages 
available on CRAN. This returns a data frame with character columns containing most metadata from the DESCRIPTION file of a given R package.\n\n\nSince this data will change over time, here’s when tools::CRAN_package_db() was run for reference: 2023-05-03.\n\ncran_pkg_db <- tools::CRAN_package_db()\n\nglimpse(cran_pkg_db)\n\n#> Rows: 19,473\n#> Columns: 67\n#> $ Package <chr> \"A3\", \"AalenJohansen\", \"AATtools\", \"ABACUS\",…\n#> $ Version <chr> \"1.0.0\", \"1.0\", \"0.0.2\", \"1.0.0\", \"0.1\", \"0.…\n#> $ Priority <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …\n#> $ Depends <chr> \"R (>= 2.15.0), xtable, pbapply\", NA, \"R (>=…\n#> $ Imports <chr> NA, NA, \"magrittr, dplyr, doParallel, foreac…\n#> $ LinkingTo <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, \"Rcp…\n#> $ Suggests <chr> \"randomForest, e1071\", \"knitr, rmarkdown\", N…\n#> $ Enhances <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …\n#> $ License <chr> \"GPL (>= 2)\", \"GPL (>= 2)\", \"GPL-3\", \"GPL-3\"…\n#> $ License_is_FOSS <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …\n#> $ License_restricts_use <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …\n#> $ OS_type <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …\n#> $ Archs <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …\n#> $ MD5sum <chr> \"027ebdd8affce8f0effaecfcd5f5ade2\", \"d7eb2a6…\n#> $ NeedsCompilation <chr> \"no\", \"no\", \"no\", \"no\", \"no\", \"no\", \"no\", \"n…\n#> $ Additional_repositories <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …\n#> $ Author <chr> \"Scott Fortmann-Roe\", \"Martin Bladt [aut, cr…\n#> $ `Authors@R` <chr> NA, \"c(person(\\\"Martin\\\", \\\"Bladt\\\", email =…\n#> $ Biarch <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …\n#> $ BugReports <chr> NA, NA, \"https://github.com/Spiritspeak/AATt…\n#> $ BuildKeepEmpty <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …\n#> $ BuildManual <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …\n#> $ BuildResaveData <chr> NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, …\n#> $ BuildVignettes <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …\n#> $ Built <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …\n#> $ ByteCompile <chr> NA, NA, \"true\", NA, NA, NA, NA, NA, NA, NA, …\n#> $ `Classification/ACM` <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …\n#> $ `Classification/ACM-2012` <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …\n#> $ `Classification/JEL` <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …\n#> $ `Classification/MSC` <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …\n#> $ `Classification/MSC-2010` <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …\n#> $ Collate <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …\n#> $ Collate.unix <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …\n#> $ Collate.windows <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …\n#> $ Contact <chr> NA, NA, NA, NA, NA, NA, NA, NA, \"Ian Morison…\n#> $ Copyright <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, \"Eli…\n#> $ Date <chr> \"2015-08-15\", NA, NA, NA, \"2021-12-12\", NA, …\n#> $ `Date/Publication` <chr> \"2015-08-16 23:05:52\", \"2023-03-01 10:42:09 …\n#> $ Description <chr> \"Supplies tools for tabulating and analyzing…\n#> $ Encoding <chr> NA, \"UTF-8\", \"UTF-8\", \"UTF-8\", \"UTF-8\", NA, …\n#> $ KeepSource <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …\n#> $ Language <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …\n#> $ LazyData <chr> NA, NA, \"true\", \"true\", NA, \"true\", NA, NA, …\n#> $ LazyDataCompression <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …\n#> $ LazyLoad <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, \"yes\", N…\n#> $ MailingList <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …\n#> $ Maintainer <chr> \"Scott Fortmann-Roe <scottfr@berkeley.edu>\",…\n#> $ Note <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …\n#> $ Packaged <chr> \"2015-08-16 14:17:33 UTC; scott\", \"2023-02-2…\n#> $ RdMacros <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …\n#> $ StagedInstall <chr> NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …\n#> $ SysDataCompression <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …\n#> $ SystemRequirements <chr> NA, NA, NA, NA, NA, NA, NA, NA, \"GNU make\", …\n#> $ Title <chr> \"Accurate, Adaptable, and Accessible Error M…\n#> $ Type <chr> \"Package\", \"Package\", \"Package\", NA, \"Packag…\n#> $ URL <chr> NA, NA, NA, \"https://shiny.abdn.ac.uk/Stats/…\n#> $ UseLTO <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …\n#> $ VignetteBuilder <chr> NA, \"knitr\", NA, \"knitr\", NA, \"knitr\", NA, N…\n#> $ ZipData <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …\n#> $ Path <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …\n#> $ `X-CRAN-Comment` <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …\n#> $ Published <chr> \"2015-08-16\", \"2023-03-01\", \"2022-08-12\", \"2…\n#> $ `Reverse depends` <chr> NA, NA, NA, NA, NA, NA, \"abctools, EasyABC\",…\n#> $ `Reverse imports` <chr> NA, NA, NA, NA, NA, NA, \"ecolottery, poems\",…\n#> $ `Reverse linking to` <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …\n#> $ `Reverse suggests` <chr> NA, NA, NA, NA, NA, NA, \"coala\", \"abctools\",…\n#> $ `Reverse enhances` <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …" }, { "objectID": "posts/2023-05-03_r-developers/index.html#wrangle", "href": "posts/2023-05-03_r-developers/index.html#wrangle", "title": "The Pareto Principle in R package development", "section": "Wrangle", - "text": "Wrangle\nSince we only care about package and author metadata, a good first step is to remove everything else. This leaves us with a Package field and two author fields: Author and Authors@R. 
The difference between the two author fields is that Author is an unstructured text field that can contain any text in any format, and Authors@R is a structured text field containing R code that defines authors’ names and roles with the person() function.\n\ncran_pkg_db <- cran_pkg_db |>\n select(package = Package, authors = Author, authors_r = `Authors@R`) |>\n as_tibble()\n\nHere’s a comparison of the two fields, using the dplyr package as an example:\n\n# Author\ncran_pkg_db |>\n filter(package == \"dplyr\") |>\n pull(authors) |>\n cat()\n\n#> Hadley Wickham [aut, cre] (),\n#> Romain François [aut] (),\n#> Lionel Henry [aut],\n#> Kirill Müller [aut] (),\n#> Davis Vaughan [aut] (),\n#> Posit Software, PBC [cph, fnd]\n\n# Authors@R\ncran_pkg_db |>\n filter(package == \"dplyr\") |>\n pull(authors_r) |>\n cat()\n\n#> c(\n#> person(\"Hadley\", \"Wickham\", , \"hadley@posit.co\", role = c(\"aut\", \"cre\"),\n#> comment = c(ORCID = \"0000-0003-4757-117X\")),\n#> person(\"Romain\", \"François\", role = \"aut\",\n#> comment = c(ORCID = \"0000-0002-2444-4226\")),\n#> person(\"Lionel\", \"Henry\", role = \"aut\"),\n#> person(\"Kirill\", \"Müller\", role = \"aut\",\n#> comment = c(ORCID = \"0000-0002-1416-3412\")),\n#> person(\"Davis\", \"Vaughan\", , \"davis@posit.co\", role = \"aut\",\n#> comment = c(ORCID = \"0000-0003-4777-038X\")),\n#> person(\"Posit Software, PBC\", role = c(\"cph\", \"fnd\"))\n#> )\n\n\nAnd a glimpse at the data:\n\ncran_pkg_db\n\n#> # A tibble: 19,473 × 3\n#> package authors autho…¹\n#> \n#> 1 A3 \"Scott Fortmann-Roe\" \n#> 2 AalenJohansen \"Martin Bladt [aut, cre],\\n Christian Furrer [aut]\" \"c(per…\n#> 3 AATtools \"Sercan Kahveci [aut, cre]\" \"perso…\n#> 4 ABACUS \"Mintu Nath [aut, cre]\" \n#> 5 abbreviate \"Sigbert Klinke [aut, cre]\" \"\\n p…\n#> 6 abbyyR \"Gaurav Sood [aut, cre]\" \"perso…\n#> 7 abc \"Csillery Katalin [aut],\\n Lemaire Louisiane [aut],\\n… \"c( \\n…\n#> 8 abc.data \"Csillery Katalin [aut],\\n Lemaire Louisiane [aut],\\n… 
\"c( \\n…\n#> 9 ABC.RAP \"Abdulmonem Alsaleh [cre, aut], Robert Weeks [aut], Ia… \n#> 10 ABCanalysis \"Michael Thrun, Jorn Lotsch, Alfred Ultsch\" \n#> # … with 19,463 more rows, and abbreviated variable name ¹​authors_r\n\n\nFrom the output above you can see that every package uses the Author field, but not all packages use the Authors@R field. This is unfortunate, because it means that the names and roles of authors need to be extracted from the unstructured text in the Author field for a subset of packages, which is difficult to do and somewhat error-prone. Just for consideration, here’s how many packages don’t use the Authors@R field.\n\ncran_pkg_db |>\n filter(is.na(authors_r)) |>\n nrow()\n\n#> [1] 6361\n\n\nSo roughly one-third of all packages. From the output above it’s also clear that although there are similarities in how different packages populate the Author field, it does vary; so a simple rule like splitting the text on commas isn’t sufficient. These are fairly tame examples—some packages even use multiple sentences describing each author’s roles and affiliations, or contain other comments such as copyright disclaimers. All of these things make it more difficult to extract names and roles without errors.\nConversely, for the Authors@R field, all that’s needed is to parse and evaluate the R code stored there as a character string; this will return a person vector that has format() methods to get authors’ names and roles into an analysis-ready format. This removes the possibility for me to introduce errors into the data, although it doesn’t solve things like Authors using an inconsistent name across packages (e.g., sometimes including their middle initial and sometimes not, or just generally writing their name differently).\nBecause there are two fields, I’ll make two helper functions to get name and role data from each field. 
Regardless of the field, the end goal is to tidy cran_pkg_db into a data frame with three columns: package, person, and roles, with one package/person combination per row.\n\nExtracting from Authors@R\nGetting the data we want from the Authors@R field is pretty straightforward. For the packages where this is used, each one has a vector of person objects stored as a character string like:\n\nmm_string <- \"person(\\\"Michael\\\", \\\"McCarthy\\\", , role = c(\\\"aut\\\", \\\"cre\\\"))\"\n\nmm_string\n\n#> [1] \"person(\\\"Michael\\\", \\\"McCarthy\\\", , role = c(\\\"aut\\\", \\\"cre\\\"))\"\n\n\nWhich can be parsed and evaluated as R code like:\n\nmm_eval <- eval(parse(text = mm_string))\n\nclass(mm_eval)\n\n#> [1] \"person\"\n\n\nThen the format() method for the person class can be used to get names and roles into the format I want simply and accurately.\n\nmm_person <- format(mm_eval, include = c(\"given\", \"family\"))\nmm_roles <- format(mm_eval, include = c(\"role\"))\ntibble(person = mm_person, roles = mm_roles)\n\n#> # A tibble: 1 × 2\n#> person roles \n#> \n#> 1 Michael McCarthy [aut, cre]\n\n\nI’ve wrapped this up into a small helper function, authors_r(), that includes some light tidying steps just to deal with a couple small discrepancies I noticed in a subset of packages.\n\n# Get names and roles from \"person\" objects in the Authors@R field\nauthors_r <- function(x) {\n # Some light preprocessing is needed to replace the unicode symbol for line\n # breaks with the regular \"\\n\". 
This is an edge case from at least one\n # package.\n code <- str_replace_all(x, \"\\\\\", \"\\n\")\n persons <- eval(parse(text = code))\n person <- str_trim(format(persons, include = c(\"given\", \"family\")))\n roles <- format(persons, include = c(\"role\"))\n tibble(person = person, roles = roles)\n}\n\nHere’s an example of it with dplyr:\n\ncran_pkg_db |>\n filter(package == \"dplyr\") |>\n pull(authors_r) |>\n # Normalizing names leads to more consistent results with summary statistics\n # later on, since some people use things like umlauts and accents\n # inconsistently.\n stri_trans_general(\"latin-ascii\") |>\n authors_r()\n\n#> # A tibble: 6 × 2\n#> person roles \n#> \n#> 1 Hadley Wickham [aut, cre]\n#> 2 Romain Francois [aut] \n#> 3 Lionel Henry [aut] \n#> 4 Kirill Muller [aut] \n#> 5 Davis Vaughan [aut] \n#> 6 Posit Software, PBC [cph, fnd]\n\n\n\n\nExtracting from Author\nAs I mentioned before, getting the data we want from the Author field is more complicated since there’s no common structure between all packages. I tried a few approaches, including:\n\nChatGPT\nNamed Entity Extraction\nRegular expressions (regex)\n\nChatGPT worked excellently in the few examples I tried; however, OpenAI doesn’t provide free API access, so I had no way of using this with R without paying (which I didn’t want to do). Here’s the prompt I used (note that it would need to be expanded to deal with more edge cases):\n\nSeparate these names with commas and do not include any other information (including a response to the request); if any names are within person() they belong to one person: \n\nNamed Entity Extraction, which is a natural language processing (NLP) method that extracts entities (like peoples’ names) from text, didn’t work very well in the few examples I tried. It didn’t recognize certain names even when the only thing in a sentence was names separated by commas. 
This is probably my fault more than anything—I’ve never used this method before and didn’t want to spend too much time learning it just for this post, so I used a pre-trained model that probably wasn’t trained on a diverse set of names.\nFortunately, regular expressions actually worked pretty well, so this is the solution I settled on. I tried two approaches to this. First I tried to split the names (and roles) up by commas (and eventually other punctuation as I ran into edge cases). This worked alright; there were clearly errors in the data with this method, but since most packages use a simple structure in the Author field it correctly extracted names from most packages.\nSecond I tried to extract the names (and roles) directly with a regular expression that could match a variety of names. This is the solution I settled on. It still isn’t perfect, but the data is cleaner than with the other method. Regardless, the difference in number of observations between both methods was only in the mid hundreds—so I think any statistics based on this data, although not completely accurate, are still sufficient to get a good idea of the R developer landscape on CRAN.\n\n# This regex was adapted from .\n# It's designed to capture a wide range of names, including those with\n# punctuation in them. 
It's tailored to this data, so I don't know how well\n# it would generalize to other situations, but feel free to try it.\npersons_roles <- r\"((\\'|\\\")*[A-Z]([A-Z]+|(\\'[A-Z])?[a-z]+|\\.)(?:(\\s+|\\-)[A-Z]([a-z]+|\\.?))*(?:(\\'?\\s+|\\-)[a-z][a-z\\-]+){0,2}(\\s+|\\-)[A-Z](\\'?[A-Za-z]+(\\'[A-Za-z]+)?|\\.)(?:(\\s+|\\-)[A-Za-z]([a-z]+|\\.))*(\\'|\\\")*(?:\\s*\\[(.*?)\\])?)\"\n# Some packages put the person() code in the wrong field, but it's also\n# formatted incorrectly and throws an error when evaluated, so the best we can\n# do is just extract the whole thing for each person.\nperson_objects <- r\"(person\\((.*?)\\))\"\n\n# Get names and roles from character strings in the Author field\nauthors <- function(x) {\n # The Author field is unstructured and there are idiosyncrasies between\n # different packages. The steps here attempt to fix the idiosyncrasies so\n # authors can be extracted with as few errors as possible.\n persons <- x |>\n # Line breaks should be replaced with spaces in case they occur in the\n # middle of a name.\n str_replace_all(\"\\\\n|\\\\|\\\\n(?=[:upper:])\", \" \") |>\n # Periods should always have a space after them so initials will be\n # recognized as part of a name.\n str_replace_all(\"\\\\.\", \"\\\\. \") |>\n # Commas before roles will keep them from being included in the regex.\n str_remove_all(\",(?= \\\\[)\") |>\n # Get persons and their roles.\n str_extract_all(paste0(persons_roles, \"|\", person_objects)) |>\n unlist() |>\n # Multiple spaces can be replaced with a single space for cleaner names.\n str_replace_all(\"\\\\s+\", \" \")\n\n tibble(person = persons) |>\n mutate(\n roles = str_extract(person, \"\\\\[(.*?)\\\\]\"),\n person = str_remove(\n str_remove(person, \"\\\\s*\\\\[(.*?)\\\\]\"),\n \"^('|\\\")|('|\\\")$\" # Some names are wrapped in quotations\n )\n )\n}\n\nHere’s an example of it with dplyr. 
If you compare it to the output from authors_r() above you can see the data quality is still good enough for rock ‘n’ roll, but it isn’t perfect; Posit’s roles are no longer defined because the comma in their name cut off the regex before it captured the roles. So there are some edge cases like this that will create measurement error in the person or roles columns, but I don’t think it’s bad enough to invalidate the results.\n\ncran_pkg_db |>\n filter(package == \"dplyr\") |>\n pull(authors) |>\n stri_trans_general(\"latin-ascii\") |>\n authors()\n\n#> # A tibble: 6 × 2\n#> person roles \n#> \n#> 1 Hadley Wickham [aut, cre]\n#> 2 Romain Francois [aut] \n#> 3 Lionel Henry [aut] \n#> 4 Kirill Muller [aut] \n#> 5 Davis Vaughan [aut] \n#> 6 Posit Software \n\n\n\n\nExtracting roles\nFrom the example dplyr output above, we can see that the roles column is currently a character string with the role codes, which isn’t super useful. Later on I’ll split these out into indicator columns with a TRUE or FALSE for whether someone had a given role. I also wanted the full names for the roles, since some of the codes aren’t very obvious.\nKurt Hornik, Duncan Murdoch and Achim Zeileis published a nice article in The R Journal explaining the roles of R package authors and where they come from. 
Briefly, they come from the “Relator and Role” codes and terms from MARC (MAchine-Readable Cataloging, Library of Congress, 2012) here: https://www.loc.gov/marc/relators/relaterm.html.\nThere are a lot of roles there; I just took the ones that were present in the data at the time I wrote this post.\n\nmarc_roles <- c(\n analyst = \"anl\",\n architecht = \"arc\",\n artist = \"art\",\n author = \"aut\",\n author_in_quotations = \"aqt\",\n author_of_intro = \"aui\",\n bibliographic_antecedent = \"ant\",\n collector = \"col\",\n compiler = \"com\",\n conceptor = \"ccp\",\n conservator = \"con\",\n consultant = \"csl\",\n consultant_to_project = \"csp\",\n contestant_appellant = \"cot\",\n contractor = \"ctr\",\n contributor = \"ctb\",\n copyright_holder = \"cph\",\n corrector = \"crr\",\n creator = \"cre\",\n data_contributor = \"dtc\",\n degree_supervisor = \"dgs\",\n editor = \"edt\",\n funder = \"fnd\",\n illustrator = \"ill\",\n inventor = \"inv\",\n lab_director = \"ldr\",\n lead = \"led\",\n metadata_contact = \"mdc\",\n musician = \"mus\",\n owner = \"own\",\n presenter = \"pre\",\n programmer = \"prg\",\n project_director = \"pdr\",\n scientific_advisor = \"sad\",\n second_party = \"spy\",\n sponsor = \"spn\",\n supporting_host = \"sht\",\n teacher = \"tch\",\n thesis_advisor = \"ths\",\n translator = \"trl\",\n research_team_head = \"rth\",\n research_team_member = \"rtm\",\n researcher = \"res\",\n reviewer = \"rev\",\n witness = \"wit\",\n woodcutter = \"wdc\"\n)\n\n\n\nTidying the data\nWith all the explanations out of the way we can now tidy the data with our helper functions.\n\ncran_authors <- cran_pkg_db |>\n mutate(\n # Letters with accents, etc. 
should be normalized so that names including\n # them are picked up by the regex.\n across(c(authors, authors_r), \\(.x) stri_trans_general(.x, \"latin-ascii\")),\n # The extraction functions aren't vectorized so they have to be mapped over.\n # This creates a list column.\n persons = if_else(\n is.na(authors_r),\n map(authors, \\(.x) authors(.x)),\n map(authors_r, \\(.x) authors_r(.x))\n )\n ) |>\n select(-c(authors, authors_r)) |>\n unnest(persons) |>\n # If a package only has one author then they must be the author and creator,\n # so it's safe to impute this when it isn't there.\n group_by(package) |>\n mutate(roles = if_else(\n is.na(roles) & n() == 1, \"[aut, cre]\", roles\n )) |>\n ungroup()\n\nThen add the indicator columns for roles. Note the use of the walrus operator (:=) here to create new columns from the full names of MARC roles on the left side of the walrus, while detecting the MARC codes with str_detect() on the right side. I’m mapping over this because the left side can’t be a vector.\n\ncran_authors_tidy <- cran_authors |>\n # Add indicator columns for all roles.\n bind_cols(\n map2_dfc(\n names(marc_roles), marc_roles,\n function(.x, .y) {\n cran_authors |>\n mutate(!!.x := str_detect(roles, .y)) |>\n select(!!.x)\n }\n )\n ) |>\n # Not everyone's role is known.\n mutate(unknown = is.na(roles))\n\nThis all leaves us with a tidy (mostly error free) data frame about R developers and their roles that is ready to explore:\n\nglimpse(cran_authors_tidy)\n\n#> Rows: 52,719\n#> Columns: 50\n#> $ package \"A3\", \"AalenJohansen\", \"AalenJohansen\", \"AATt…\n#> $ person \"Scott Fortmann-Roe\", \"Martin Bladt\", \"Christ…\n#> $ roles \"[aut, cre]\", \"[aut, cre]\", \"[aut]\", \"[aut, c…\n#> $ analyst FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…\n#> $ architecht FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…\n#> $ artist FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…\n#> $ author TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRU…\n#> $ author_in_quotations 
FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…\n#> $ author_of_intro FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…\n#> $ bibliographic_antecedent FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…\n#> $ collector FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…\n#> $ compiler FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…\n#> $ conceptor FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…\n#> $ conservator FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…\n#> $ consultant FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…\n#> $ consultant_to_project FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…\n#> $ contestant_appellant FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…\n#> $ contractor FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…\n#> $ contributor FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…\n#> $ copyright_holder FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…\n#> $ corrector FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…\n#> $ creator TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE, FA…\n#> $ data_contributor FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…\n#> $ degree_supervisor FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…\n#> $ editor FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…\n#> $ funder FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…\n#> $ illustrator FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…\n#> $ inventor FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…\n#> $ lab_director FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…\n#> $ lead FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…\n#> $ metadata_contact FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…\n#> $ musician FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…\n#> $ owner FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…\n#> $ presenter FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…\n#> $ programmer FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…\n#> $ project_director FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…\n#> $ scientific_advisor FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…\n#> $ second_party FALSE, FALSE, FALSE, 
FALSE, FALSE, FALSE, FAL…\n#> $ sponsor FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…\n#> $ supporting_host FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…\n#> $ teacher FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…\n#> $ thesis_advisor FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…\n#> $ translator FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…\n#> $ research_team_head FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…\n#> $ research_team_member FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…\n#> $ researcher FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…\n#> $ reviewer FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…\n#> $ witness FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…\n#> $ woodcutter FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…\n#> $ unknown FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…" + "text": "Wrangle\nSince we only care about package and author metadata, a good first step is to remove everything else. This leaves us with a Package field and two author fields: Author and Authors@R. 
The difference between the two author fields is that Author is an unstructured text field that can contain any text in any format, and Authors@R is a structured text field containing R code that defines authors’ names and roles with the person() function.\n\ncran_pkg_db <- cran_pkg_db |>\n select(package = Package, authors = Author, authors_r = `Authors@R`) |>\n as_tibble()\n\nHere’s a comparison of the two fields, using the dplyr package as an example:\n\n# Author\ncran_pkg_db |>\n filter(package == \"dplyr\") |>\n pull(authors) |>\n cat()\n\n#> Hadley Wickham [aut, cre] (<https://orcid.org/0000-0003-4757-117X>),\n#> Romain François [aut] (<https://orcid.org/0000-0002-2444-4226>),\n#> Lionel Henry [aut],\n#> Kirill Müller [aut] (<https://orcid.org/0000-0002-1416-3412>),\n#> Davis Vaughan [aut] (<https://orcid.org/0000-0003-4777-038X>),\n#> Posit Software, PBC [cph, fnd]\n\n# Authors@R\ncran_pkg_db |>\n filter(package == \"dplyr\") |>\n pull(authors_r) |>\n cat()\n\n#> c(\n#> person(\"Hadley\", \"Wickham\", , \"hadley@posit.co\", role = c(\"aut\", \"cre\"),\n#> comment = c(ORCID = \"0000-0003-4757-117X\")),\n#> person(\"Romain\", \"François\", role = \"aut\",\n#> comment = c(ORCID = \"0000-0002-2444-4226\")),\n#> person(\"Lionel\", \"Henry\", role = \"aut\"),\n#> person(\"Kirill\", \"Müller\", role = \"aut\",\n#> comment = c(ORCID = \"0000-0002-1416-3412\")),\n#> person(\"Davis\", \"Vaughan\", , \"davis@posit.co\", role = \"aut\",\n#> comment = c(ORCID = \"0000-0003-4777-038X\")),\n#> person(\"Posit Software, PBC\", role = c(\"cph\", \"fnd\"))\n#> )\n\n\nAnd a glimpse at the data:\n\ncran_pkg_db\n\n#> # A tibble: 19,473 × 3\n#> package authors autho…¹\n#> <chr> <chr> <chr> \n#> 1 A3 \"Scott Fortmann-Roe\" <NA> \n#> 2 AalenJohansen \"Martin Bladt [aut, cre],\\n Christian Furrer [aut]\" \"c(per…\n#> 3 AATtools \"Sercan Kahveci [aut, cre]\" \"perso…\n#> 4 ABACUS \"Mintu Nath [aut, cre]\" <NA> \n#> 5 abbreviate \"Sigbert Klinke [aut, cre]\" \"\\n p…\n#> 6 abbyyR 
\"Gaurav Sood [aut, cre]\" \"perso…\n#> 7 abc \"Csillery Katalin [aut],\\n Lemaire Louisiane [aut],\\n… \"c( \\n…\n#> 8 abc.data \"Csillery Katalin [aut],\\n Lemaire Louisiane [aut],\\n… \"c( \\n…\n#> 9 ABC.RAP \"Abdulmonem Alsaleh [cre, aut], Robert Weeks [aut], Ia… <NA> \n#> 10 ABCanalysis \"Michael Thrun, Jorn Lotsch, Alfred Ultsch\" <NA> \n#> # … with 19,463 more rows, and abbreviated variable name ¹​authors_r\n\n\nFrom the output above you can see that every package uses the Author field, but not all packages use the Authors@R field. This is unfortunate, because it means that the names and roles of authors need to be extracted from the unstructured text in the Author field for a subset of packages, which is difficult to do and somewhat error-prone. Just for consideration, here’s how many packages don’t use the Authors@R field.\n\ncran_pkg_db |>\n filter(is.na(authors_r)) |>\n nrow()\n\n#> [1] 6361\n\n\nSo roughly one-third of all packages. From the output above it’s also clear that although there are similarities in how different packages populate the Author field, it does vary; so a simple rule like splitting the text on commas isn’t sufficient. These are fairly tame examples—some packages even use multiple sentences describing each author’s roles and affiliations, or contain other comments such as copyright disclaimers. All of these things make it more difficult to extract names and roles without errors.\nConversely, for the Authors@R field, all that’s needed is to parse and evaluate the R code stored there as a character string; this will return a person vector that has format() methods to get authors’ names and roles into an analysis-ready format. 
This removes the possibility for me to introduce errors into the data, although it doesn’t solve things like Authors using an inconsistent name across packages (e.g., sometimes including their middle initial and sometimes not, or just generally writing their name differently).\nBecause there are two fields, I’ll make two helper functions to get name and role data from each field. Regardless of the field, the end goal is to tidy cran_pkg_db into a data frame with three columns: package, person, and roles, with one package/person combination per row.\n\nExtracting from Authors@R\nGetting the data we want from the Authors@R field is pretty straightforward. For the packages where this is used, each one has a vector of person objects stored as a character string like:\n\nmm_string <- \"person(\\\"Michael\\\", \\\"McCarthy\\\", , role = c(\\\"aut\\\", \\\"cre\\\"))\"\n\nmm_string\n\n#> [1] \"person(\\\"Michael\\\", \\\"McCarthy\\\", , role = c(\\\"aut\\\", \\\"cre\\\"))\"\n\n\nWhich can be parsed and evaluated as R code like:\n\nmm_eval <- eval(parse(text = mm_string))\n\nclass(mm_eval)\n\n#> [1] \"person\"\n\n\nThen the format() method for the person class can be used to get names and roles into the format I want simply and accurately.\n\nmm_person <- format(mm_eval, include = c(\"given\", \"family\"))\nmm_roles <- format(mm_eval, include = c(\"role\"))\ntibble(person = mm_person, roles = mm_roles)\n\n#> # A tibble: 1 × 2\n#> person roles \n#> <chr> <chr> \n#> 1 Michael McCarthy [aut, cre]\n\n\nI’ve wrapped this up into a small helper function, authors_r(), that includes some light tidying steps just to deal with a couple small discrepancies I noticed in a subset of packages.\n\n# Get names and roles from \"person\" objects in the Authors@R field\nauthors_r <- function(x) {\n # Some light preprocessing is needed to replace the unicode symbol for line\n # breaks with the regular \"\\n\". 
This is an edge case from at least one\n # package.\n code <- str_replace_all(x, \"\\\\<U\\\\+000a\\\\>\", \"\\n\")\n persons <- eval(parse(text = code))\n person <- str_trim(format(persons, include = c(\"given\", \"family\")))\n roles <- format(persons, include = c(\"role\"))\n tibble(person = person, roles = roles)\n}\n\nHere’s an example of it with dplyr:\n\ncran_pkg_db |>\n filter(package == \"dplyr\") |>\n pull(authors_r) |>\n # Normalizing names leads to more consistent results with summary statistics\n # later on, since some people use things like umlauts and accents\n # inconsistently.\n stri_trans_general(\"latin-ascii\") |>\n authors_r()\n\n#> # A tibble: 6 × 2\n#> person roles \n#> <chr> <chr> \n#> 1 Hadley Wickham [aut, cre]\n#> 2 Romain Francois [aut] \n#> 3 Lionel Henry [aut] \n#> 4 Kirill Muller [aut] \n#> 5 Davis Vaughan [aut] \n#> 6 Posit Software, PBC [cph, fnd]\n\n\n\n\nExtracting from Author\nAs I mentioned before, getting the data we want from the Author field is more complicated since there’s no common structure between all packages. I tried a few approaches, including:\n\nChatGPT\nNamed Entity Extraction\nRegular expressions (regex)\n\nChatGPT worked excellently in the few examples I tried; however, OpenAI doesn’t provide free API access, so I had no way of using this with R without paying (which I didn’t want to do). Here’s the prompt I used (note that it would need to be expanded to deal with more edge cases):\n\nSeparate these names with commas and do not include any other information (including a response to the request); if any names are within person() they belong to one person: \n\nNamed Entity Extraction, which is a natural language processing (NLP) method that extracts entities (like peoples’ names) from text, didn’t work very well in the few examples I tried. It didn’t recognize certain names even when the only thing in a sentence was names separated by commas. 
This is probably my fault more than anything—I’ve never used this method before and didn’t want to spend too much time learning it just for this post, so I used a pre-trained model that probably wasn’t trained on a diverse set of names.\nFortunately, regular expressions actually worked pretty well, so this is the solution I settled on. I tried two approaches to this. First I tried to split the names (and roles) up by commas (and eventually other punctuation as I ran into edge cases). This worked alright; there were clearly errors in the data with this method, but since most packages use a simple structure in the Author field it correctly extracted names from most packages.\nSecond I tried to extract the names (and roles) directly with a regular expression that could match a variety of names. This is the solution I settled on. It still isn’t perfect, but the data is cleaner than with the other method. Regardless, the difference in number of observations between both methods was only in the mid hundreds—so I think any statistics based on this data, although not completely accurate, are still sufficient to get a good idea of the R developer landscape on CRAN.\n\n# This regex was adapted from <https://stackoverflow.com/a/7654214/16844576>.\n# It's designed to capture a wide range of names, including those with\n# punctuation in them. 
It's tailored to this data, so I don't know how well\n# it would generalize to other situations, but feel free to try it.\npersons_roles <- r\"((\\'|\\\")*[A-Z]([A-Z]+|(\\'[A-Z])?[a-z]+|\\.)(?:(\\s+|\\-)[A-Z]([a-z]+|\\.?))*(?:(\\'?\\s+|\\-)[a-z][a-z\\-]+){0,2}(\\s+|\\-)[A-Z](\\'?[A-Za-z]+(\\'[A-Za-z]+)?|\\.)(?:(\\s+|\\-)[A-Za-z]([a-z]+|\\.))*(\\'|\\\")*(?:\\s*\\[(.*?)\\])?)\"\n# Some packages put the person() code in the wrong field, but it's also\n# formatted incorrectly and throws an error when evaluated, so the best we can\n# do is just extract the whole thing for each person.\nperson_objects <- r\"(person\\((.*?)\\))\"\n\n# Get names and roles from character strings in the Author field\nauthors <- function(x) {\n # The Author field is unstructured and there are idiosyncrasies between\n # different packages. The steps here attempt to fix the idiosyncrasies so\n # authors can be extracted with as few errors as possible.\n persons <- x |>\n # Line breaks should be replaced with spaces in case they occur in the\n # middle of a name.\n str_replace_all(\"\\\\n|\\\\<U\\\\+000a\\\\>|\\\\n(?=[:upper:])\", \" \") |>\n # Periods should always have a space after them so initials will be\n # recognized as part of a name.\n str_replace_all(\"\\\\.\", \"\\\\. \") |>\n # Commas before roles will keep them from being included in the regex.\n str_remove_all(\",(?= \\\\[)\") |>\n # Get persons and their roles.\n str_extract_all(paste0(persons_roles, \"|\", person_objects)) |>\n unlist() |>\n # Multiple spaces can be replaced with a single space for cleaner names.\n str_replace_all(\"\\\\s+\", \" \")\n\n tibble(person = persons) |>\n mutate(\n roles = str_extract(person, \"\\\\[(.*?)\\\\]\"),\n person = str_remove(\n str_remove(person, \"\\\\s*\\\\[(.*?)\\\\]\"),\n \"^('|\\\")|('|\\\")$\" # Some names are wrapped in quotations\n )\n )\n}\n\nHere’s an example of it with dplyr. 
If you compare it to the output from authors_r() above you can see the data quality is still good enough for rock ‘n’ roll, but it isn’t perfect; Posit’s roles are no longer defined because the comma in their name cut off the regex before it captured the roles. So there are some edge cases like this that will create measurement error in the person or roles columns, but I don’t think it’s bad enough to invalidate the results.\n\ncran_pkg_db |>\n filter(package == \"dplyr\") |>\n pull(authors) |>\n stri_trans_general(\"latin-ascii\") |>\n authors()\n\n#> # A tibble: 6 × 2\n#> person roles \n#> <chr> <chr> \n#> 1 Hadley Wickham [aut, cre]\n#> 2 Romain Francois [aut] \n#> 3 Lionel Henry [aut] \n#> 4 Kirill Muller [aut] \n#> 5 Davis Vaughan [aut] \n#> 6 Posit Software <NA>\n\n\n\n\nExtracting roles\nFrom the example dplyr output above, we can see that the roles column is currently a character string with the role codes, which isn’t super useful. Later on I’ll split these out into indicator columns with a TRUE or FALSE for whether someone had a given role. I also wanted the full names for the roles, since some of the codes aren’t very obvious.\nKurt Hornik, Duncan Murdoch and Achim Zeileis published a nice article in The R Journal explaining the roles of R package authors and where they come from. 
Briefly, they come from the “Relator and Role” codes and terms from MARC (MAchine-Readable Cataloging, Library of Congress, 2012) here: https://www.loc.gov/marc/relators/relaterm.html.\nThere are a lot of roles there; I just took the ones that were present in the data at the time I wrote this post.\n\nmarc_roles <- c(\n analyst = \"anl\",\n architecht = \"arc\",\n artist = \"art\",\n author = \"aut\",\n author_in_quotations = \"aqt\",\n author_of_intro = \"aui\",\n bibliographic_antecedent = \"ant\",\n collector = \"col\",\n compiler = \"com\",\n conceptor = \"ccp\",\n conservator = \"con\",\n consultant = \"csl\",\n consultant_to_project = \"csp\",\n contestant_appellant = \"cot\",\n contractor = \"ctr\",\n contributor = \"ctb\",\n copyright_holder = \"cph\",\n corrector = \"crr\",\n creator = \"cre\",\n data_contributor = \"dtc\",\n degree_supervisor = \"dgs\",\n editor = \"edt\",\n funder = \"fnd\",\n illustrator = \"ill\",\n inventor = \"inv\",\n lab_director = \"ldr\",\n lead = \"led\",\n metadata_contact = \"mdc\",\n musician = \"mus\",\n owner = \"own\",\n presenter = \"pre\",\n programmer = \"prg\",\n project_director = \"pdr\",\n scientific_advisor = \"sad\",\n second_party = \"spy\",\n sponsor = \"spn\",\n supporting_host = \"sht\",\n teacher = \"tch\",\n thesis_advisor = \"ths\",\n translator = \"trl\",\n research_team_head = \"rth\",\n research_team_member = \"rtm\",\n researcher = \"res\",\n reviewer = \"rev\",\n witness = \"wit\",\n woodcutter = \"wdc\"\n)\n\n\n\nTidying the data\nWith all the explanations out of the way we can now tidy the data with our helper functions.\n\ncran_authors <- cran_pkg_db |>\n mutate(\n # Letters with accents, etc. 
should be normalized so that names including\n # them are picked up by the regex.\n across(c(authors, authors_r), \\(.x) stri_trans_general(.x, \"latin-ascii\")),\n # The extraction functions aren't vectorized so they have to be mapped over.\n # This creates a list column.\n persons = if_else(\n is.na(authors_r),\n map(authors, \\(.x) authors(.x)),\n map(authors_r, \\(.x) authors_r(.x))\n )\n ) |>\n select(-c(authors, authors_r)) |>\n unnest(persons) |>\n # If a package only has one author then they must be the author and creator,\n # so it's safe to impute this when it isn't there.\n group_by(package) |>\n mutate(roles = if_else(\n is.na(roles) & n() == 1, \"[aut, cre]\", roles\n )) |>\n ungroup()\n\nThen add the indicator columns for roles. Note the use of the walrus operator (:=) here to create new columns from the full names of MARC roles on the left side of the walrus, while detecting the MARC codes with str_detect() on the right side. I’m mapping over this because the left side can’t be a vector.\n\ncran_authors_tidy <- cran_authors |>\n # Add indicator columns for all roles.\n bind_cols(\n map2_dfc(\n names(marc_roles), marc_roles,\n function(.x, .y) {\n cran_authors |>\n mutate(!!.x := str_detect(roles, .y)) |>\n select(!!.x)\n }\n )\n ) |>\n # Not everyone's role is known.\n mutate(unknown = is.na(roles))\n\nThis all leaves us with a tidy (mostly error free) data frame about R developers and their roles that is ready to explore:\n\nglimpse(cran_authors_tidy)\n\n#> Rows: 52,719\n#> Columns: 50\n#> $ package <chr> \"A3\", \"AalenJohansen\", \"AalenJohansen\", \"AATt…\n#> $ person <chr> \"Scott Fortmann-Roe\", \"Martin Bladt\", \"Christ…\n#> $ roles <chr> \"[aut, cre]\", \"[aut, cre]\", \"[aut]\", \"[aut, c…\n#> $ analyst <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…\n#> $ architecht <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…\n#> $ artist <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…\n#> $ author <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, 
TRUE, TRUE, TRU…\n#> $ author_in_quotations <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…\n#> $ author_of_intro <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…\n#> $ bibliographic_antecedent <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…\n#> $ collector <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…\n#> $ compiler <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…\n#> $ conceptor <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…\n#> $ conservator <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…\n#> $ consultant <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…\n#> $ consultant_to_project <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…\n#> $ contestant_appellant <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…\n#> $ contractor <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…\n#> $ contributor <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…\n#> $ copyright_holder <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…\n#> $ corrector <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…\n#> $ creator <lgl> TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE, FA…\n#> $ data_contributor <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…\n#> $ degree_supervisor <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…\n#> $ editor <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…\n#> $ funder <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…\n#> $ illustrator <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…\n#> $ inventor <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…\n#> $ lab_director <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…\n#> $ lead <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…\n#> $ metadata_contact <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…\n#> $ musician <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…\n#> $ owner <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…\n#> $ presenter <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…\n#> $ programmer <lgl> FALSE, FALSE, FALSE, 
FALSE, FALSE, FALSE, FAL…\n#> $ project_director <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…\n#> $ scientific_advisor <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…\n#> $ second_party <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…\n#> $ sponsor <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…\n#> $ supporting_host <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…\n#> $ teacher <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…\n#> $ thesis_advisor <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…\n#> $ translator <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…\n#> $ research_team_head <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…\n#> $ research_team_member <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…\n#> $ researcher <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…\n#> $ reviewer <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…\n#> $ witness <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…\n#> $ woodcutter <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…\n#> $ unknown <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…" }, { "objectID": "posts/2023-05-03_r-developers/index.html#r-developer-statistics", "href": "posts/2023-05-03_r-developers/index.html#r-developer-statistics", "title": "The Pareto Principle in R package development", "section": "R developer statistics", - "text": "R developer statistics\nI’ll start with person-level stats, mainly because some of the other stats are further summaries of these statistics. Nothing fancy here, just the number of packages a person has contributed to, role counts, and nominal and percentile rankings. 
Both the ranking methods used here give every tie the same (smallest) value, so if two people tied for second place both their ranks would be 2, and the next person’s rank would be 4.\n\ncran_author_pkg_counts <- cran_authors_tidy |>\n group_by(person) |>\n summarise(\n n_packages = n(),\n across(analyst:unknown, function(.x) sum(.x, na.rm = TRUE))\n ) |>\n mutate(\n # Discretizing this for visualization purposes later on\n n_pkgs_fct = case_when(\n n_packages == 1 ~ \"One\",\n n_packages == 2 ~ \"Two\",\n n_packages == 3 ~ \"Three\",\n n_packages >= 4 ~ \"Four+\"\n ),\n n_pkgs_fct = factor(n_pkgs_fct, levels = c(\"One\", \"Two\", \"Three\", \"Four+\")),\n rank = min_rank(desc(n_packages)),\n percentile = percent_rank(n_packages) * 100,\n .after = n_packages\n ) |>\n arrange(desc(n_packages))\n\nHere’s an interactive gt table of the person-level stats so you can find yourself, or ask silly questions like how many other authors share a name with you. If you page or search through it you can also get an idea of the data quality (e.g., try “Posit” under the person column and you’ll see that they don’t use a consistent organization name across all packages, which creates some measurement error here).\n\n\nCode\ncran_author_pkg_counts |>\n select(-n_pkgs_fct) |>\n gt() |>\n tab_header(\n title = \"R Developer Contributions\",\n subtitle = \"CRAN Package Authorships and Roles\"\n ) |>\n text_transform(\n \\(.x) str_to_title(str_replace_all(.x, \"_\", \" \")),\n locations = cells_column_labels()\n ) |>\n fmt_number(\n columns = percentile\n ) |>\n fmt(\n columns = rank,\n fns = \\(.x) label_ordinal()(.x)\n ) |>\n cols_width(everything() ~ px(120)) |>\n opt_interactive(use_sorting = FALSE, use_filters = TRUE)\n\n\n\n\n\n\nR Developer Contributions\nCRAN Package Authorships and Roles\n\n\n\n\n\n\nSo there are around 29453 people who have some type of authorship on at least one currently available CRAN package at the time this post was published. 
I’ve emphasized “around” because of the measurement error from extracting names from the Author field of DESCRIPTION and from people writing their names in multiple ways across packages, but also because this number will fluctuate over time as new packages are published, unmaintained packages are archived, and so forth.\nTo try to put this number into perspective, Ben Ubah, Claudia Vitolo, and Rick Pack put together a dashboard with data on how many R users there are worldwide belonging to different R user groups. At the time of writing this post there were:\n\nAround 775,000 members of R user groups organized on Meetup\nAround 100,000 R-Ladies members\n\nThe R Consortium also states on their website that there are more than two million R users worldwide (although they don’t state when or where this number comes from). Regardless of the exact amount, it’s apparent that there are many more R users than R developers.\n\nPackage contributions\nThe title of this post probably gave this away, but around 90% of R developers have worked on one to three packages, and only around 10% have worked on four or more packages.\n\ncran_author_pkg_counts |>\n group_by(n_pkgs_fct) |>\n summarise(n_people = n()) |>\n ggplot(mapping = aes(x = n_pkgs_fct, y = n_people)) +\n geom_col() +\n scale_y_continuous(\n sec.axis = sec_axis(\n trans = \\(.x) .x / nrow(cran_author_pkg_counts),\n name = \"Percent of sample\",\n labels = label_percent(),\n breaks = c(0, .05, .10, .15, .70)\n )\n ) +\n labs(\n x = \"Package contributions\",\n y = \"People\"\n )\n\n\n\n\nNotably, in the group that have worked on four or more packages, the spread of package contributions is huge. 
This vast range is mostly driven by people who do R package development as part of their job (e.g., if you look at the cran_author_pkg_counts table above, most of the people at the very top are either professors of statistics or current or former developers from Posit, rOpenSci, or the R Core Team).\n\ncran_author_pkg_counts |>\n filter(n_pkgs_fct == \"Four+\") |>\n group_by(rank, n_packages) |>\n summarise(n_people = n()) |>\n ggplot(mapping = aes(x = n_packages, y = n_people)) +\n geom_segment(aes(xend = n_packages, yend = 0)) +\n geom_point() +\n scale_y_continuous(\n sec.axis = sec_axis(\n trans = \\(.x) .x / nrow(cran_author_pkg_counts),\n name = \"Percent of sample\",\n labels = label_percent()\n )\n ) +\n labs(\n x = \"Package contributions\",\n y = \"People\"\n )\n\n\n\n\nHere are some subsample summary statistics to compliment the plots above.\n\ncran_author_pkg_counts |>\n group_by(n_packages >= 4) |>\n summarise(\n n_developers = n(),\n n_pkgs_mean = mean(n_packages),\n n_pkgs_sd = sd(n_packages),\n n_pkgs_median = median(n_packages),\n n_pkgs_min = min(n_packages),\n n_pkgs_max = max(n_packages)\n )\n\n#> # A tibble: 2 × 7\n#> `n_packages >= 4` n_developers n_pkgs_mean n_pkgs_sd n_pkgs_…¹ n_pkg…² n_pkg…³\n#> \n#> 1 FALSE 27107 1.27 0.562 1 1 3\n#> 2 TRUE 2346 7.78 8.63 5 4 202\n#> # … with abbreviated variable names ¹​n_pkgs_median, ²​n_pkgs_min, ³​n_pkgs_max\n\n\n\n\nRole distributions\nNot every contribution to an R package involves code. For example, two authors of the wiad package were woodcutters! The package is for wood image analysis, so although it’s surprising a role like that exists, it makes a lot of sense in context. Anyways, neat factoids aside, the point of this section is to look at the distribution of different roles in R package development.\nTo start, let’s get an idea of how many people were involved in programming-related roles. 
This won’t be universally true, but most of the time the following roles will involve programming:\n\nprogramming_roles <-\n c(\"author\", \"creator\", \"contributor\", \"compiler\", \"programmer\")\n\nHere’s the count:\n\ncran_author_pkg_counts |>\n filter(if_any(!!programming_roles, \\(.x) .x > 0)) |>\n nrow()\n\n#> [1] 24170\n\n\nThere were also 5434 whose role was unknown (either because it wasn’t specified or wasn’t picked up by my regex method). Regardless, most people have been involved in programming-related roles, and although other roles occur they’re relatively rare.\nHere’s a plot to compliment this point:\n\ncran_authors_tidy |>\n summarise(across(analyst:unknown, function(.x) sum(.x, na.rm = TRUE))) |>\n pivot_longer(cols = everything(), names_to = \"role\", values_to = \"n\") |>\n arrange(desc(n)) |>\n ggplot(mapping = aes(x = n, y = reorder(role, n))) +\n geom_segment(aes(xend = 0, yend = role)) +\n geom_point() +\n labs(\n x = \"Count across packages\",\n y = \"Role\"\n )\n\n\n\n\n\n\nRanking contributions\nThe interactive table above already contains this information, but to compliment David Smith’s post from 5 years ago, here’s the current Top 20 most prolific authors on CRAN.\n\n\nThis is why Hadley is on the cover of Glamour magazine and we’re not. 
\n\ncran_author_pkg_counts |>\n # We don't want organizations or groups here\n filter(!(person %in% c(\"RStudio\", \"R Core Team\", \"Posit Software, PBC\"))) |>\n head(20) |>\n select(person, n_packages) |>\n gt() |>\n tab_header(\n title = \"Top 20 R Developers\",\n subtitle = \"Based on number of CRAN package authorships\"\n ) |>\n text_transform(\n \\(.x) str_to_title(str_replace_all(.x, \"_\", \" \")),\n locations = cells_column_labels()\n ) |>\n cols_width(person ~ px(140))\n\n\n\n\n\n \n \n \n \n \n \n Top 20 R Developers\n \n \n Based on number of CRAN package authorships\n \n \n Person\n N Packages\n \n \n \n Hadley Wickham\n159\n Jeroen Ooms\n89\n Gabor Csardi\n82\n Kurt Hornik\n78\n Scott Chamberlain\n76\n Dirk Eddelbuettel\n75\n Martin Maechler\n74\n Stephane Laurent\n73\n Achim Zeileis\n68\n Winston Chang\n51\n Max Kuhn\n50\n Yihui Xie\n47\n Jim Hester\n46\n Henrik Bengtsson\n45\n John Muschelli\n45\n Roger Bivand\n43\n Ben Bolker\n42\n Bob Rudis\n42\n Brian Ripley\n42\n Michel Lang\n41" + "text": "R developer statistics\nI’ll start with person-level stats, mainly because some of the other stats are further summaries of these statistics. Nothing fancy here, just the number of packages a person has contributed to, role counts, and nominal and percentile rankings. 
Both the ranking methods used here give every tie the same (smallest) value, so if two people tied for second place both their ranks would be 2, and the next person’s rank would be 4.\n\ncran_author_pkg_counts <- cran_authors_tidy |>\n group_by(person) |>\n summarise(\n n_packages = n(),\n across(analyst:unknown, function(.x) sum(.x, na.rm = TRUE))\n ) |>\n mutate(\n # Discretizing this for visualization purposes later on\n n_pkgs_fct = case_when(\n n_packages == 1 ~ \"One\",\n n_packages == 2 ~ \"Two\",\n n_packages == 3 ~ \"Three\",\n n_packages >= 4 ~ \"Four+\"\n ),\n n_pkgs_fct = factor(n_pkgs_fct, levels = c(\"One\", \"Two\", \"Three\", \"Four+\")),\n rank = min_rank(desc(n_packages)),\n percentile = percent_rank(n_packages) * 100,\n .after = n_packages\n ) |>\n arrange(desc(n_packages))\n\nHere’s an interactive gt table of the person-level stats so you can find yourself, or ask silly questions like how many other authors share a name with you. If you page or search through it you can also get an idea of the data quality (e.g., try “Posit” under the person column and you’ll see that they don’t use a consistent organization name across all packages, which creates some measurement error here).\n\n\nCode\ncran_author_pkg_counts |>\n select(-n_pkgs_fct) |>\n gt() |>\n tab_header(\n title = \"R Developer Contributions\",\n subtitle = \"CRAN Package Authorships and Roles\"\n ) |>\n text_transform(\n \\(.x) str_to_title(str_replace_all(.x, \"_\", \" \")),\n locations = cells_column_labels()\n ) |>\n fmt_number(\n columns = percentile\n ) |>\n fmt(\n columns = rank,\n fns = \\(.x) label_ordinal()(.x)\n ) |>\n cols_width(everything() ~ px(120)) |>\n opt_interactive(use_sorting = FALSE, use_filters = TRUE)\n\n\n\n\n\n\nR Developer Contributions\nCRAN Package Authorships and Roles\n\n\n\n\n\n\nSo there are around 29453 people who have some type of authorship on at least one currently available CRAN package at the time this post was published. 
I’ve emphasized “around” because of the measurement error from extracting names from the Author field of DESCRIPTION and from people writing their names in multiple ways across packages, but also because this number will fluctuate over time as new packages are published, unmaintained packages are archived, and so forth.\nTo try to put this number into perspective, Ben Ubah, Claudia Vitolo, and Rick Pack put together a dashboard with data on how many R users there are worldwide belonging to different R user groups. At the time of writing this post there were:\n\nAround 775,000 members of R user groups organized on Meetup\nAround 100,000 R-Ladies members\n\nThe R Consortium also states on their website that there are more than two million R users worldwide (although they don’t state when or where this number comes from). Regardless of the exact amount, it’s apparent that there are many more R users than R developers.\n\nPackage contributions\nThe title of this post probably gave this away, but around 90% of R developers have worked on one to three packages, and only around 10% have worked on four or more packages.\n\ncran_author_pkg_counts |>\n group_by(n_pkgs_fct) |>\n summarise(n_people = n()) |>\n ggplot(mapping = aes(x = n_pkgs_fct, y = n_people)) +\n geom_col() +\n scale_y_continuous(\n sec.axis = sec_axis(\n trans = \\(.x) .x / nrow(cran_author_pkg_counts),\n name = \"Percent of sample\",\n labels = label_percent(),\n breaks = c(0, .05, .10, .15, .70)\n )\n ) +\n labs(\n x = \"Package contributions\",\n y = \"People\"\n )\n\n\n\n\nNotably, in the group that have worked on four or more packages, the spread of package contributions is huge. 
This vast range is mostly driven by people who do R package development as part of their job (e.g., if you look at the cran_author_pkg_counts table above, most of the people at the very top are either professors of statistics or current or former developers from Posit, rOpenSci, or the R Core Team).\n\ncran_author_pkg_counts |>\n filter(n_pkgs_fct == \"Four+\") |>\n group_by(rank, n_packages) |>\n summarise(n_people = n()) |>\n ggplot(mapping = aes(x = n_packages, y = n_people)) +\n geom_segment(aes(xend = n_packages, yend = 0)) +\n geom_point() +\n scale_y_continuous(\n sec.axis = sec_axis(\n trans = \\(.x) .x / nrow(cran_author_pkg_counts),\n name = \"Percent of sample\",\n labels = label_percent()\n )\n ) +\n labs(\n x = \"Package contributions\",\n y = \"People\"\n )\n\n\n\n\nHere are some subsample summary statistics to complement the plots above.\n\ncran_author_pkg_counts |>\n group_by(n_packages >= 4) |>\n summarise(\n n_developers = n(),\n n_pkgs_mean = mean(n_packages),\n n_pkgs_sd = sd(n_packages),\n n_pkgs_median = median(n_packages),\n n_pkgs_min = min(n_packages),\n n_pkgs_max = max(n_packages)\n )\n\n#> # A tibble: 2 × 7\n#> `n_packages >= 4` n_developers n_pkgs_mean n_pkgs_sd n_pkgs_…¹ n_pkg…² n_pkg…³\n#> <lgl> <int> <dbl> <dbl> <dbl> <int> <int>\n#> 1 FALSE 27107 1.27 0.562 1 1 3\n#> 2 TRUE 2346 7.78 8.63 5 4 202\n#> # … with abbreviated variable names ¹​n_pkgs_median, ²​n_pkgs_min, ³​n_pkgs_max\n\n\n\n\nRole distributions\nNot every contribution to an R package involves code. For example, two authors of the wiad package were woodcutters! The package is for wood image analysis, so although it’s surprising a role like that exists, it makes a lot of sense in context. Anyways, neat factoids aside, the point of this section is to look at the distribution of different roles in R package development.\nTo start, let’s get an idea of how many people were involved in programming-related roles. 
This won’t be universally true, but most of the time the following roles will involve programming:\n\nprogramming_roles <-\n c(\"author\", \"creator\", \"contributor\", \"compiler\", \"programmer\")\n\nHere’s the count:\n\ncran_author_pkg_counts |>\n filter(if_any(!!programming_roles, \\(.x) .x > 0)) |>\n nrow()\n\n#> [1] 24170\n\n\nThere were also 5434 people whose role was unknown (either because it wasn’t specified or wasn’t picked up by my regex method). Regardless, most people have been involved in programming-related roles, and although other roles occur they’re relatively rare.\nHere’s a plot to complement this point:\n\ncran_authors_tidy |>\n summarise(across(analyst:unknown, function(.x) sum(.x, na.rm = TRUE))) |>\n pivot_longer(cols = everything(), names_to = \"role\", values_to = \"n\") |>\n arrange(desc(n)) |>\n ggplot(mapping = aes(x = n, y = reorder(role, n))) +\n geom_segment(aes(xend = 0, yend = role)) +\n geom_point() +\n labs(\n x = \"Count across packages\",\n y = \"Role\"\n )\n\n\n\n\n\n\nRanking contributions\nThe interactive table above already contains this information, but to complement David Smith’s post from 5 years ago, here’s the current Top 20 most prolific authors on CRAN.\n\n\nThis is why Hadley is on the cover of Glamour magazine and we’re not. 
\n\ncran_author_pkg_counts |>\n # We don't want organizations or groups here\n filter(!(person %in% c(\"RStudio\", \"R Core Team\", \"Posit Software, PBC\"))) |>\n head(20) |>\n select(person, n_packages) |>\n gt() |>\n tab_header(\n title = \"Top 20 R Developers\",\n subtitle = \"Based on number of CRAN package authorships\"\n ) |>\n text_transform(\n \\(.x) str_to_title(str_replace_all(.x, \"_\", \" \")),\n locations = cells_column_labels()\n ) |>\n cols_width(person ~ px(140))\n\n\n\n\n\n \n \n \n \n \n \n Top 20 R Developers\n \n \n Based on number of CRAN package authorships\n \n \n Person\n N Packages\n \n \n \n Hadley Wickham\n159\n Jeroen Ooms\n89\n Gabor Csardi\n82\n Kurt Hornik\n78\n Scott Chamberlain\n76\n Dirk Eddelbuettel\n75\n Martin Maechler\n74\n Stephane Laurent\n73\n Achim Zeileis\n68\n Winston Chang\n51\n Max Kuhn\n50\n Yihui Xie\n47\n Jim Hester\n46\n Henrik Bengtsson\n45\n John Muschelli\n45\n Roger Bivand\n43\n Ben Bolker\n42\n Bob Rudis\n42\n Brian Ripley\n42\n Michel Lang\n41" }, { "objectID": "posts/2023-05-03_r-developers/index.html#conclusion", @@ -706,6 +860,13 @@ "section": "", "text": "On the first day\nMan was granted a soul\nAnd with it, clarity\nOn the second day\nupon Earth was planted\nan irrevocable poison\nA soul-devouring demon\n\nDemon’s Souls is an action role-playing video game set in the dark fantasy kingdom of Boletaria, a land cursed with a deep, terrible fog brought forth by an ancient soul-devouring demon called the Old One. To lift the curse and mend the world players must slay and absorb the souls of five powerful archdemons, whereafter they can face the Old One and lull it back to slumber. Demon’s Souls is renowned for its challenge and design, and has made a lasting impact on the video game industry. It is also the progenitor of what has become one of my favourite video game franchises." 
}, + { + "objectID": "posts/2021-06-15_demons-souls/index.html#overview", + "href": "posts/2021-06-15_demons-souls/index.html#overview", + "title": "Go forth, slayer of Demons", + "section": "", + "text": "On the first day\nMan was granted a soul\nAnd with it, clarity\nOn the second day\nupon Earth was planted\nan irrevocable poison\nA soul-devouring demon\n\nDemon’s Souls is an action role-playing video game set in the dark fantasy kingdom of Boletaria, a land cursed with a deep, terrible fog brought forth by an ancient soul-devouring demon called the Old One. To lift the curse and mend the world players must slay and absorb the souls of five powerful archdemons, whereafter they can face the Old One and lull it back to slumber. Demon’s Souls is renowned for its challenge and design, and has made a lasting impact on the video game industry. It is also the progenitor of what has become one of my favourite video game franchises." + }, { "objectID": "posts/2021-06-15_demons-souls/index.html#theming-inspiration", "href": "posts/2021-06-15_demons-souls/index.html#theming-inspiration", @@ -718,21 +879,21 @@ "href": "posts/2021-06-15_demons-souls/index.html#prerequisites", "title": "Go forth, slayer of Demons", "section": "Prerequisites", - "text": "Prerequisites\n\nlibrary(tidyverse)\nlibrary(ggfx)\nlibrary(magick)\n\nI’ll be using PlayStation Network trophy data for my plot. The data contains statistics for the percent of players who have slain a given boss in Demon’s Souls out of all the players who have ever played the game. I have constructed the data manually since Sony does not provide an API to access PlayStation Network trophy data programmatically. 
Demon’s Souls was released on February 5, 2009, so it is unlikely these stats will change much in the future.\n\n# Tribbles are not just useful for scaring Klingons, they make it easy to\n# create tibbles too\ndemons_souls <- tribble(\n ~boss, ~boss_type, ~location, ~archstone, ~percent_completed,\n \"Phalanx\", \"Demon\", \"Boletarian Palace\", \"1-1\", 63.1, \n \"Tower Knight\", \"Demon\", \"Boletarian Palace\", \"1-2\", 46.6, \n \"Penetrator\", \"Demon\", \"Boletarian Palace\", \"1-3\", 30.3, \n \"False King\", \"Archdemon\", \"Boletarian Palace\", \"1-4\", 24.2, \n \"Armor Spider\", \"Demon\", \"Stonefang Tunnel\", \"2-1\", 43.9, \n \"Flamelurker\", \"Demon\", \"Stonefang Tunnel\", \"2-2\", 35.1, \n \"Dragon God\", \"Archdemon\", \"Stonefang Tunnel\", \"2-3\", 33.1, \n \"Fool’s Idol\", \"Demon\", \"Tower of Latria\", \"3-1\", 35.7, \n \"Maneater\", \"Demon\", \"Tower of Latria\", \"3-2\", 28.7, \n \"Old Monk\", \"Archdemon\", \"Tower of Latria\", \"3-3\", 27.7, \n \"Adjudicator\", \"Demon\", \"Shrine of Storms\", \"4-1\", 36.1, \n \"Old Hero\", \"Demon\", \"Shrine of Storms\", \"4-2\", 28.8, \n \"Storm King\", \"Archdemon\", \"Shrine of Storms\", \"4-3\", 28.1, \n \"Leechmonger\", \"Demon\", \"Valley of Defilement\", \"5-1\", 32.5, \n \"Dirty Colossus\", \"Demon\", \"Valley of Defilement\", \"5-2\", 27.2, \n \"Maiden Astraea\", \"Archdemon\", \"Valley of Defilement\", \"5-3\", 26.6\n) %>%\n mutate(across(boss_type:archstone, as_factor))\n\ndemons_souls" + "text": "Prerequisites\n\nlibrary(tidyverse)\nlibrary(ggfx)\nlibrary(magick)\n\nI’ll be using PlayStation Network trophy data for my plot. The data contains statistics for the percent of players who have slain a given boss in Demon’s Souls out of all the players who have ever played the game. I have constructed the data manually since Sony does not provide an API to access PlayStation Network trophy data programmatically. 
Demon’s Souls was released on February 5, 2009, so it is unlikely these stats will change much in the future.\n\n# Tribbles are not just useful for scaring Klingons, they make it easy to\n# create tibbles too\ndemons_souls <- tribble(\n ~boss, ~boss_type, ~location, ~archstone, ~percent_completed,\n \"Phalanx\", \"Demon\", \"Boletarian Palace\", \"1-1\", 63.1, \n \"Tower Knight\", \"Demon\", \"Boletarian Palace\", \"1-2\", 46.6, \n \"Penetrator\", \"Demon\", \"Boletarian Palace\", \"1-3\", 30.3, \n \"False King\", \"Archdemon\", \"Boletarian Palace\", \"1-4\", 24.2, \n \"Armor Spider\", \"Demon\", \"Stonefang Tunnel\", \"2-1\", 43.9, \n \"Flamelurker\", \"Demon\", \"Stonefang Tunnel\", \"2-2\", 35.1, \n \"Dragon God\", \"Archdemon\", \"Stonefang Tunnel\", \"2-3\", 33.1, \n \"Fool’s Idol\", \"Demon\", \"Tower of Latria\", \"3-1\", 35.7, \n \"Maneater\", \"Demon\", \"Tower of Latria\", \"3-2\", 28.7, \n \"Old Monk\", \"Archdemon\", \"Tower of Latria\", \"3-3\", 27.7, \n \"Adjudicator\", \"Demon\", \"Shrine of Storms\", \"4-1\", 36.1, \n \"Old Hero\", \"Demon\", \"Shrine of Storms\", \"4-2\", 28.8, \n \"Storm King\", \"Archdemon\", \"Shrine of Storms\", \"4-3\", 28.1, \n \"Leechmonger\", \"Demon\", \"Valley of Defilement\", \"5-1\", 32.5, \n \"Dirty Colossus\", \"Demon\", \"Valley of Defilement\", \"5-2\", 27.2, \n \"Maiden Astraea\", \"Archdemon\", \"Valley of Defilement\", \"5-3\", 26.6\n) %>%\n mutate(across(boss_type:archstone, as_factor))\n\ndemons_souls" }, { "objectID": "posts/2021-06-15_demons-souls/index.html#wrangle", "href": "posts/2021-06-15_demons-souls/index.html#wrangle", "title": "Go forth, slayer of Demons", "section": "Wrangle", - "text": "Wrangle\nThe data is already structured the way I want it for my plot, but there are still some interesting things to explore through wrangling and summary stats.\nWithin each location, players have to slay each demon in the order specified by the archstones. 
For example, in the Boletarian Palace a player cannot face the Tower Knight before they have slain the Phalanx. So each location has a first, second, and third boss (and the Boletarian Palace has a fourth that can only be faced after slaying all the other demons). This can be used to get an imperfect idea of player attrition in the game.\n\n# Detect the order of bosses based on archstone suffix\ndemons_souls <- demons_souls %>%\n mutate(\n archstone_boss = case_when(\n str_detect(archstone, \"-1\") ~ \"First\",\n str_detect(archstone, \"-2\") ~ \"Second\",\n str_detect(archstone, \"-3\") ~ \"Third\",\n str_detect(archstone, \"-4\") ~ \"Fourth (False King)\"\n ),\n archstone_boss = as_factor(archstone_boss),\n .after = archstone\n )\n\ndemons_souls\n\n\n\n \n\n\n\nNow, there are two ways to go about getting this imperfect idea of player attrition in the game. The first involves using the entire data set.\n\n# Calculate the average percent of players who have slain the first, second,\n# ..., archstone boss across locations. \ndemons_souls %>%\n group_by(archstone_boss) %>%\n summarise(average_completed = mean(percent_completed))\n\n\n\n \n\n\n\nThe second involves removing the Phalanx from the data set due to its influential pull on the average for the first archstone boss. It has a much higher completion percent (63.1%) than the other bosses in the game, and the reason for this is that the Phalanx is the first boss in the game. Players must slay it before they can go to face the first archstone boss from other locations in the game. 
Removing the Phalanx might give a more accurate picture of average completion for first archstone bosses.\n\n# Trophy earned: Slayer of Demon \"Phalanx\"\ndemons_souls %>%\n filter(boss != \"Phalanx\") %>%\n group_by(archstone_boss) %>%\n summarise(average_completed = mean(percent_completed))\n\n\n\n \n\n\n\nWith the Phalanx’s influence removed, it looks like there is roughly a 4% drop in average completion for each successive archstone boss. In order to face the False King players must first slay every other demon and archdemon in the game, so it is interesting the drop stays consistent there. Most players who made it far enough to slay their first archdemon then went on to slay the rest.\n\n\nUmbassa.\nAbout one quarter of Demon’s Souls players persisted to the end of the game. But three quarters did not. Assuming most players at least attempted each location, then averaging by location can give an imperfect idea of their overall difficulty for players during their first playthrough.\n\n# Calculate the average completion rate by location, arranged from \"easiest\" to\n# \"hardest\"\ndemons_souls %>%\n group_by(location) %>%\n summarise(average_completed = mean(percent_completed)) %>%\n arrange(desc(average_completed))\n\n\n\n \n\n\n\nIt looks like there are two clusters here, an easier one with the Boletarian Palace and Stonefang Tunnel, and a harder one with Shrine of Storms, Tower of Latria, and the Valley of Defilement. I finished my first playthrough of the game in 2012, so I only have distant memories to reflect on, but this ranking looks sound to me. For experienced players I think this ranking is less relevant. Once you’re experienced most of the variability in difficulty comes down to the character build you choose." 
+ "text": "Wrangle\nThe data is already structured the way I want it for my plot, but there are still some interesting things to explore through wrangling and summary stats.\nWithin each location, players have to slay each demon in the order specified by the archstones. For example, in the Boletarian Palace a player cannot face the Tower Knight before they have slain the Phalanx. So each location has a first, second, and third boss (and the Boletarian Palace has a fourth that can only be faced after slaying all the other demons). This can be used to get an imperfect idea of player attrition in the game.\n\n# Detect the order of bosses based on archstone suffix\ndemons_souls <- demons_souls %>%\n mutate(\n archstone_boss = case_when(\n str_detect(archstone, \"-1\") ~ \"First\",\n str_detect(archstone, \"-2\") ~ \"Second\",\n str_detect(archstone, \"-3\") ~ \"Third\",\n str_detect(archstone, \"-4\") ~ \"Fourth (False King)\"\n ),\n archstone_boss = as_factor(archstone_boss),\n .after = archstone\n )\n\ndemons_souls\n\n\n\n \n\n\n\nNow, there are two ways to go about getting this imperfect idea of player attrition in the game. The first involves using the entire data set.\n\n# Calculate the average percent of players who have slain the first, second,\n# ..., archstone boss across locations. \ndemons_souls %>%\n group_by(archstone_boss) %>%\n summarise(average_completed = mean(percent_completed))\n\n\n\n \n\n\n\nThe second involves removing the Phalanx from the data set due to its influential pull on the average for the first archstone boss. It has a much higher completion percent (63.1%) than the other bosses in the game, and the reason for this is that the Phalanx is the first boss in the game. Players must slay it before they can go to face the first archstone boss from other locations in the game. 
Removing the Phalanx might give a more accurate picture of average completion for first archstone bosses.\n\n# Trophy earned: Slayer of Demon \"Phalanx\"\ndemons_souls %>%\n filter(boss != \"Phalanx\") %>%\n group_by(archstone_boss) %>%\n summarise(average_completed = mean(percent_completed))\n\n\n\n \n\n\n\nWith the Phalanx’s influence removed, it looks like there is roughly a 4% drop in average completion for each successive archstone boss. In order to face the False King players must first slay every other demon and archdemon in the game, so it is interesting the drop stays consistent there. Most players who made it far enough to slay their first archdemon then went on to slay the rest.\n\n\nUmbassa.\nAbout one quarter of Demon’s Souls players persisted to the end of the game. But three quarters did not. Assuming most players at least attempted each location, then averaging by location can give an imperfect idea of their overall difficulty for players during their first playthrough.\n\n# Calculate the average completion rate by location, arranged from \"easiest\" to\n# \"hardest\"\ndemons_souls %>%\n group_by(location) %>%\n summarise(average_completed = mean(percent_completed)) %>%\n arrange(desc(average_completed))\n\n\n\n \n\n\n\nIt looks like there are two clusters here, an easier one with the Boletarian Palace and Stonefang Tunnel, and a harder one with Shrine of Storms, Tower of Latria, and the Valley of Defilement. I finished my first playthrough of the game in 2012, so I only have distant memories to reflect on, but this ranking looks sound to me. For experienced players I think this ranking is less relevant. Once you’re experienced most of the variability in difficulty comes down to the character build you choose." 
}, { "objectID": "posts/2021-06-15_demons-souls/index.html#visualize", "href": "posts/2021-06-15_demons-souls/index.html#visualize", "title": "Go forth, slayer of Demons", "section": "Visualize", - "text": "Visualize\n\n# Define aliases for plot fonts and colours\noptimus <- \"OptimusPrinceps\"\noptimus_b <- \"OptimusPrincepsSemiBold\"\nyellow <- \"#ffaf24\" # #fec056\n\nThe plot I want to make is inspired by this Tidy Tuesday plot by Georgios Karamanis. I used Georgios’ code as a starting point, then modified it to get the behaviour and result I wanted.\nThe centrepiece of the plot is the coloured text that shows the percent of Demon’s Souls players who have completed a given boss in yellow and who have not in red. This effect is achieved by applying a rectangular filter over the text that only allows the portion of the text within the filter’s borders to be shown. Doing this once for yellow text and once for red text allows the full string to appear, with the ratio of colours within a boss’s name reflecting the percent of players that have completed it. A few calculations are needed in order for the ratios to be accurate, and for the text to look aesthetically pleasing.\n\ndemons_souls_plot <- demons_souls %>%\n mutate(\n # Percentages need to be in decimal form for the calculations and plotting\n # to work properly\n percent_completed = percent_completed/100,\n boss = fct_reorder(toupper(boss), percent_completed),\n # In order to justify text to the same width, a ratio of how many times\n # each string would fit into the widest string needs to be calculated. 
This\n # can then be multiplied by an arbitrary value to determine the final size\n # for each string of text.\n str_width = strwidth(boss, family = optimus_b, units = \"inches\") * 25.4, # in millimetres\n str_ratio = max(str_width)/str_width,\n text_size = 4.9 * str_ratio,\n # The division here is arbitrary, its effect is reflected in the scale of the\n # y-axis\n tile_height = text_size / 10\n ) %>%\n # Bosses will appear from top to bottom based on completion ratios. The\n # calculation here accounts for the differences in text size for each string.\n arrange(percent_completed) %>%\n mutate(y = cumsum(lag(tile_height/2, default = 0) + tile_height/2))\n\nNow the plot can be constructed. The final code for the plot is roughly 100 lines long, so I’ve hidden it in the section below. However, there are a few parts of the code I want to highlight before showing the final plot.\n\n\nShow Code\n# The trick for geom spacing is to set the size of the plot from the start\nfile <- tempfile(fileext = '.png')\nragg::agg_png(file, width = 4, height = 5.5, res = 300, units = \"in\")\n\nggplot(demons_souls_plot) +\n # Make it easier to see where 50% is using a vertical line. geom_segment() is\n # used here instead of geom_vline() because the latter goes up into the title\n # text. 
An empty data frame is supplied so that only one copy of the geom is\n # drawn.\n geom_segment(aes(\n x = 0,\n xend = 0,\n y = 10.9,\n yend = 0,\n size = 0.6),\n data = data.frame(),\n alpha = 0.3,\n colour = \"grey\",\n lineend = \"round\",\n linetype = \"twodash\"\n ) +\n scale_alpha_identity() +\n \n # Set bounding box for yellow portion of centrepiece text\n as_reference(\n geom_rect(aes(\n xmin = -0.5,\n xmax = -0.5 + ((percent_completed)),\n ymin = y - (tile_height * 0.5),\n ymax = y + (tile_height * 0.5)\n )), \n id = \"demon_vanquished\"\n ) +\n # Only show the portion of yellow centrepiece text located within the\n # bounding box\n with_blend(\n geom_text(aes(\n x = 0,\n y = y,\n label = boss,\n size = text_size\n ),\n colour = yellow,\n family = optimus_b),\n bg_layer = \"demon_vanquished\",\n blend_type = \"in\"\n ) +\n # Set bounding box for red portion of centrepiece text\n as_reference(\n geom_rect(aes(\n xmin = 0.5 - ((1 - percent_completed)),\n xmax = 0.5,\n ymin = y - (tile_height * 0.5),\n ymax = y + (tile_height * 0.5)\n )), \n id = \"you_died\"\n ) +\n # Only show the portion of red centrepiece text located within the bounding\n # box\n with_blend(\n geom_text(aes(\n x = 0,\n y = y,\n label = boss,\n size = text_size\n ),\n colour = \"red\",\n family = optimus_b),\n bg_layer = \"you_died\",\n blend_type = \"in\"\n ) +\n \n # Draw \"axis\" for Demon Vanquished\n annotate(\n \"text\",\n x = -0.65,\n y = 7.75,\n label = \"demon vanquished\",\n angle = 90,\n size = 5,\n family = optimus,\n colour = yellow\n ) +\n geom_segment(aes(\n x = -0.645,\n xend = -0.645,\n y = 10.05,\n yend = 10.45),\n lineend = \"round\",\n colour = yellow,\n size = 0.3,\n arrow = arrow(angle = 45, length = unit(1, \"mm\"), type = \"open\")\n ) +\n # Draw \"axis\" for You Died\n annotate(\n \"text\",\n x = 0.65,\n y = 4.65,\n label = \"you died\",\n angle = 270,\n size = 5,\n family = optimus,\n colour = \"red\"\n ) +\n geom_segment(aes(\n x = 0.645,\n xend = 0.645,\n y = 
3.51,\n yend = 3.01),\n lineend = \"round\",\n colour = \"red\",\n size = 0.3,\n arrow = arrow(angle = 45, length = unit(1, \"mm\"), type = \"open\")\n ) +\n \n # Draw a title surrounded by line decorations at the top of the panel\n geom_segment(aes(\n x = -0.75,\n xend = 0.75,\n y = 13.2,\n yend = 13.2,\n size = 0.3),\n lineend = \"round\",\n colour = \"grey\"\n ) +\n annotate(\n \"text\",\n x = 0,\n y = 12.325,\n size = 7,\n family = optimus_b,\n colour = \"white\",\n lineheight = 0.75,\n label = \"DEMON’S SOULS\\nBOSS COMPLETION\"\n ) +\n geom_segment(aes(\n x = -0.025,\n xend = -0.75,\n y = 11.4,\n yend = 11.4,\n size = 0.3),\n lineend = \"round\",\n colour = \"grey\"\n ) +\n geom_segment(aes(\n x = 0.025,\n xend = 0.75,\n y = 11.4,\n yend = 11.4,\n size = 0.3),\n lineend = \"round\",\n colour = \"grey\"\n ) +\n annotate(\n \"point\",\n x = 0,\n y = 11.4,\n colour = \"grey\",\n shape = 5,\n size = 2,\n stroke = 0.6\n ) +\n annotate(\n \"point\",\n x = 0,\n y = 11.4,\n colour = \"grey\",\n shape = 5,\n size = 0.75\n ) +\n \n # Draw plot caption\n annotate(\n \"text\",\n x = 1,\n y = 10.33,\n angle = 270,\n hjust = 0,\n size = 3,\n alpha = 0.3,\n label = \"SOURCE: PLAYSTATION NETWORK | GRAPHIC: MICHAEL MCCARTHY\",\n family = optimus,\n color = \"grey\"\n ) +\n \n # Make sure the text size calculated for each string is used so that strings\n # are justified\n scale_size_identity() +\n # Take axis limits exactly from data so there's no spacing around the panel,\n # allow drawing outside of the panel for annotations, and set the axis limits\n # to match the limits of the text.\n coord_cartesian(expand = FALSE, clip = \"off\", xlim = c(-0.5, 0.5)) +\n # Specify the panel size manually. 
This makes it easier to position plot\n # elements with absolute positions.\n ggh4x::force_panelsizes(rows = unit(5, \"in\"), # height\n cols = unit(1.8, \"in\")) + # width\n theme_void() +\n theme(\n legend.position = \"none\",\n plot.margin = unit(c(0.5, 4, 0.5, 4), \"in\"),\n plot.background = element_rect(fill = \"black\", color = NA))\n\ninvisible(dev.off())\n\n# Apply a mask texture to the final image to mimic the style of the Demon's\n# Souls logo in the plot title\nmask <- image_read(\n here(\"posts\", \"2021-06-15_demons-souls\", \"images\", \"texture.png\")\n ) %>%\n image_transparent(\"white\") %>%\n image_threshold(\"black\", \"90%\")\n\nfinal_plot <- image_composite(image_read(file), mask, operator = \"Over\")\n\n\nFirst, the code behind the coloured centrepiece text. It uses ggfx::as_reference() and ggfx::with_blend() to selectively apply a filter over portions of the text, as I discussed earlier. The boundaries of the filter are defined by the ggplot2 geom inside of ggfx::as_reference(), then ggfx::with_blend() applies a filter specified by blend_type to the ggplot2 geom inside of it. 
By duplicating this process twice—once for yellow text and again for red text—but with different filter boundaries based on the percent completed and not completed, the entire boss name is displayed with accurate colour fills.\n\n # Set bounding box for yellow portion of centrepiece text\n as_reference(\n geom_rect(aes(\n xmin = -0.5,\n xmax = -0.5 + ((percent_completed)),\n ymin = y - (tile_height * 0.5),\n ymax = y + (tile_height * 0.5)\n )), \n id = \"demon_vanquished\"\n ) +\n # Only show the portion of yellow centrepiece text located within the\n # bounding box\n with_blend(\n geom_text(aes(\n x = 0,\n y = y,\n label = boss,\n size = text_size\n ),\n colour = yellow,\n family = optimus_b),\n bg_layer = \"demon_vanquished\",\n blend_type = \"in\"\n ) +\n # Set bounding box for red portion of centrepiece text\n as_reference(\n geom_rect(aes(\n xmin = 0.5 - ((1 - percent_completed)),\n xmax = 0.5,\n ymin = y - (tile_height * 0.5),\n ymax = y + (tile_height * 0.5)\n )), \n id = \"you_died\"\n ) +\n # Only show the portion of red centrepiece text located within the bounding\n # box\n with_blend(\n geom_text(aes(\n x = 0,\n y = y,\n label = boss,\n size = text_size\n ),\n colour = \"red\",\n family = optimus_b),\n bg_layer = \"you_died\",\n blend_type = \"in\"\n )\n\nSecond, the code behind the distressed, broken style of the title text. This one is actually quite simple. It uses magick::image_composite() to apply a texture mask I made in Krita over the composed plot. The mask has a transparent background with black lines located over the space where the plot title is. Both the composed plot and mask images have the same dimensions, so when they’re composed together the effect is applied exactly where I want it.\n\nimage_composite(plot, mask, operator = \"Over\")\n\nFinally, I just wanted to note that the decorative lines around the plot’s title text are actually made up of ggplot2 geoms. 
I used two ggplot2::geom_point() geoms with different sizes to create the diamond on the bottom line." + "text": "Visualize\n\n# Define aliases for plot fonts and colours\noptimus <- \"OptimusPrinceps\"\noptimus_b <- \"OptimusPrincepsSemiBold\"\nyellow <- \"#ffaf24\" # #fec056\n\nThe plot I want to make is inspired by this Tidy Tuesday plot by Georgios Karamanis. I used Georgios’ code as a starting point, then modified it to get the behaviour and result I wanted.\nThe centrepiece of the plot is the coloured text that shows the percent of Demon’s Souls players who have completed a given boss in yellow and who have not in red. This effect is achieved by applying a rectangular filter over the text that only allows the portion of the text within the filter’s borders to be shown. Doing this once for yellow text and once for red text allows the full string to appear, with the ratio of colours within a boss’s name reflecting the percent of players that have completed it. A few calculations are needed in order for the ratios to be accurate, and for the text to look aesthetically pleasing.\n\ndemons_souls_plot <- demons_souls %>%\n mutate(\n # Percentages need to be in decimal form for the calculations and plotting\n # to work properly\n percent_completed = percent_completed/100,\n boss = fct_reorder(toupper(boss), percent_completed),\n # In order to justify text to the same width, a ratio of how many times\n # each string would fit into the widest string needs to be calculated. This\n # can then be multiplied by an arbitrary value to determine the final size\n # for each string of text.\n str_width = strwidth(boss, family = optimus_b, units = \"inches\") * 25.4, # in millimetres\n str_ratio = max(str_width)/str_width,\n text_size = 4.9 * str_ratio,\n # The division here is arbitrary, its effect is reflected in the scale of the\n # y-axis\n tile_height = text_size / 10\n ) %>%\n # Bosses will appear from top to bottom based on completion ratios. 
The\n # calculation here accounts for the differences in text size for each string.\n arrange(percent_completed) %>%\n mutate(y = cumsum(lag(tile_height/2, default = 0) + tile_height/2))\n\nNow the plot can be constructed. The final code for the plot is roughly 100 lines long, so I’ve hidden it in the section below. However, there are a few parts of the code I want to highlight before showing the final plot.\n\n\nShow Code\n# The trick for geom spacing is to set the size of the plot from the start\nfile <- tempfile(fileext = '.png')\nragg::agg_png(file, width = 4, height = 5.5, res = 300, units = \"in\")\n\nggplot(demons_souls_plot) +\n # Make it easier to see where 50% is using a vertical line. geom_segment() is\n # used here instead of geom_vline() because the latter goes up into the title\n # text. An empty data frame is supplied so that only one copy of the geom is\n # drawn.\n geom_segment(aes(\n x = 0,\n xend = 0,\n y = 10.9,\n yend = 0,\n size = 0.6),\n data = data.frame(),\n alpha = 0.3,\n colour = \"grey\",\n lineend = \"round\",\n linetype = \"twodash\"\n ) +\n scale_alpha_identity() +\n \n # Set bounding box for yellow portion of centrepiece text\n as_reference(\n geom_rect(aes(\n xmin = -0.5,\n xmax = -0.5 + ((percent_completed)),\n ymin = y - (tile_height * 0.5),\n ymax = y + (tile_height * 0.5)\n )), \n id = \"demon_vanquished\"\n ) +\n # Only show the portion of yellow centrepiece text located within the\n # bounding box\n with_blend(\n geom_text(aes(\n x = 0,\n y = y,\n label = boss,\n size = text_size\n ),\n colour = yellow,\n family = optimus_b),\n bg_layer = \"demon_vanquished\",\n blend_type = \"in\"\n ) +\n # Set bounding box for red portion of centrepiece text\n as_reference(\n geom_rect(aes(\n xmin = 0.5 - ((1 - percent_completed)),\n xmax = 0.5,\n ymin = y - (tile_height * 0.5),\n ymax = y + (tile_height * 0.5)\n )), \n id = \"you_died\"\n ) +\n # Only show the portion of red centrepiece text located within the bounding\n # box\n 
with_blend(\n geom_text(aes(\n x = 0,\n y = y,\n label = boss,\n size = text_size\n ),\n colour = \"red\",\n family = optimus_b),\n bg_layer = \"you_died\",\n blend_type = \"in\"\n ) +\n \n # Draw \"axis\" for Demon Vanquished\n annotate(\n \"text\",\n x = -0.65,\n y = 7.75,\n label = \"demon vanquished\",\n angle = 90,\n size = 5,\n family = optimus,\n colour = yellow\n ) +\n geom_segment(aes(\n x = -0.645,\n xend = -0.645,\n y = 10.05,\n yend = 10.45),\n lineend = \"round\",\n colour = yellow,\n size = 0.3,\n arrow = arrow(angle = 45, length = unit(1, \"mm\"), type = \"open\")\n ) +\n # Draw \"axis\" for You Died\n annotate(\n \"text\",\n x = 0.65,\n y = 4.65,\n label = \"you died\",\n angle = 270,\n size = 5,\n family = optimus,\n colour = \"red\"\n ) +\n geom_segment(aes(\n x = 0.645,\n xend = 0.645,\n y = 3.51,\n yend = 3.01),\n lineend = \"round\",\n colour = \"red\",\n size = 0.3,\n arrow = arrow(angle = 45, length = unit(1, \"mm\"), type = \"open\")\n ) +\n \n # Draw a title surrounded by line decorations at the top of the panel\n geom_segment(aes(\n x = -0.75,\n xend = 0.75,\n y = 13.2,\n yend = 13.2,\n size = 0.3),\n lineend = \"round\",\n colour = \"grey\"\n ) +\n annotate(\n \"text\",\n x = 0,\n y = 12.325,\n size = 7,\n family = optimus_b,\n colour = \"white\",\n lineheight = 0.75,\n label = \"DEMON’S SOULS\\nBOSS COMPLETION\"\n ) +\n geom_segment(aes(\n x = -0.025,\n xend = -0.75,\n y = 11.4,\n yend = 11.4,\n size = 0.3),\n lineend = \"round\",\n colour = \"grey\"\n ) +\n geom_segment(aes(\n x = 0.025,\n xend = 0.75,\n y = 11.4,\n yend = 11.4,\n size = 0.3),\n lineend = \"round\",\n colour = \"grey\"\n ) +\n annotate(\n \"point\",\n x = 0,\n y = 11.4,\n colour = \"grey\",\n shape = 5,\n size = 2,\n stroke = 0.6\n ) +\n annotate(\n \"point\",\n x = 0,\n y = 11.4,\n colour = \"grey\",\n shape = 5,\n size = 0.75\n ) +\n \n # Draw plot caption\n annotate(\n \"text\",\n x = 1,\n y = 10.33,\n angle = 270,\n hjust = 0,\n size = 3,\n alpha = 0.3,\n label = 
\"SOURCE: PLAYSTATION NETWORK | GRAPHIC: MICHAEL MCCARTHY\",\n family = optimus,\n color = \"grey\"\n ) +\n \n # Make sure the text size calculated for each string is used so that strings\n # are justified\n scale_size_identity() +\n # Take axis limits exactly from data so there's no spacing around the panel,\n # allow drawing outside of the panel for annotations, and set the axis limits\n # to match the limits of the text.\n coord_cartesian(expand = FALSE, clip = \"off\", xlim = c(-0.5, 0.5)) +\n # Specify the panel size manually. This makes it easier to position plot\n # elements with absolute positions.\n ggh4x::force_panelsizes(rows = unit(5, \"in\"), # height\n cols = unit(1.8, \"in\")) + # width\n theme_void() +\n theme(\n legend.position = \"none\",\n plot.margin = unit(c(0.5, 4, 0.5, 4), \"in\"),\n plot.background = element_rect(fill = \"black\", color = NA))\n\ninvisible(dev.off())\n\n# Apply a mask texture to the final image to mimic the style of the Demon's\n# Souls logo in the plot title\nmask <- image_read(\n here(\"posts\", \"2021-06-15_demons-souls\", \"images\", \"texture.png\")\n ) %>%\n image_transparent(\"white\") %>%\n image_threshold(\"black\", \"90%\")\n\nfinal_plot <- image_composite(image_read(file), mask, operator = \"Over\")\n\n\nFirst, the code behind the coloured centrepiece text. It uses ggfx::as_reference() and ggfx::with_blend() to selectively apply a filter over portions of the text, as I discussed earlier. The boundaries of the filter are defined by the ggplot2 geom inside of ggfx::as_reference(), then ggfx::with_blend() applies a filter specified by blend_type to the ggplot2 geom inside of it. 
By duplicating this process twice—once for yellow text and again for red text—but with different filter boundaries based on the percent completed and not completed, the entire boss name is displayed with accurate colour fills.\n\n # Set bounding box for yellow portion of centrepiece text\n as_reference(\n geom_rect(aes(\n xmin = -0.5,\n xmax = -0.5 + ((percent_completed)),\n ymin = y - (tile_height * 0.5),\n ymax = y + (tile_height * 0.5)\n )), \n id = \"demon_vanquished\"\n ) +\n # Only show the portion of yellow centrepiece text located within the\n # bounding box\n with_blend(\n geom_text(aes(\n x = 0,\n y = y,\n label = boss,\n size = text_size\n ),\n colour = yellow,\n family = optimus_b),\n bg_layer = \"demon_vanquished\",\n blend_type = \"in\"\n ) +\n # Set bounding box for red portion of centrepiece text\n as_reference(\n geom_rect(aes(\n xmin = 0.5 - ((1 - percent_completed)),\n xmax = 0.5,\n ymin = y - (tile_height * 0.5),\n ymax = y + (tile_height * 0.5)\n )), \n id = \"you_died\"\n ) +\n # Only show the portion of red centrepiece text located within the bounding\n # box\n with_blend(\n geom_text(aes(\n x = 0,\n y = y,\n label = boss,\n size = text_size\n ),\n colour = \"red\",\n family = optimus_b),\n bg_layer = \"you_died\",\n blend_type = \"in\"\n )\n\nSecond, the code behind the distressed, broken style of the title text. This one is actually quite simple. It uses magick::image_composite() to apply a texture mask I made in Krita over the composed plot. The mask has a transparent background with black lines located over the space where the plot title is. Both the composed plot and mask images have the same dimensions, so when they’re composed together the effect is applied exactly where I want it.\n\nimage_composite(plot, mask, operator = \"Over\")\n\nFinally, I just wanted to note that the decorative lines around the plot’s title text are actually made up of ggplot2 geoms. 
I used two ggplot2::geom_point() geoms with different sizes to create the diamond on the bottom line." }, { "objectID": "posts/2021-06-15_demons-souls/index.html#final-graphic", @@ -790,6 +951,13 @@ "section": "", "text": "Projectile motion describes the motion of an object launched into the air whose trajectory after launch is influenced only by the force of gravity and for which air resistance is negligible. Projectile motion was first accurately described by Galileo Galilei in his book Two New Sciences, published in 1638. In what he dubbed compound motion, Galileo demonstrated that projectile motion can be broken down into independent horizontal and vertical components that can be analyzed separately to describe an object’s trajectory. He used this principle to prove that the trajectory of an object in projectile motion will always follow a curve in the shape of a parabola.\n\n\n\n\n\nProjectile motion of an object launched at the same height and velocity but different angles. The symmetrical U-shaped curve of each trajectory is known as a parabola.\n\n\n\n\nGalileo used an inclined plane to demonstrate his principle of compound motion. I’m going to use R." }, + { + "objectID": "posts/2022-06-16_projectile-motion/index.html#overview", + "href": "posts/2022-06-16_projectile-motion/index.html#overview", + "title": "On motion", + "section": "", + "text": "Projectile motion describes the motion of an object launched into the air whose trajectory after launch is influenced only by the force of gravity and for which air resistance is negligible. Projectile motion was first accurately described by Galileo Galilei in his book Two New Sciences, published in 1638. In what he dubbed compound motion, Galileo demonstrated that projectile motion can be broken down into independent horizontal and vertical components that can be analyzed separately to describe an object’s trajectory. 
He used this principle to prove that the trajectory of an object in projectile motion will always follow a curve in the shape of a parabola.\n\n\n\n\n\nProjectile motion of an object launched at the same height and velocity but different angles. The symmetrical U-shaped curve of each trajectory is known as a parabola.\n\n\n\n\nGalileo used an inclined plane to demonstrate his principle of compound motion. I’m going to use R." + }, { "objectID": "posts/2022-06-16_projectile-motion/index.html#post-inspiration", "href": "posts/2022-06-16_projectile-motion/index.html#post-inspiration", @@ -809,21 +977,21 @@ "href": "posts/2022-06-16_projectile-motion/index.html#simulate", "title": "On motion", "section": "Simulate", - "text": "Simulate\nThe equations for projectile motion use a common set of variables which are listed below. The equations assume that the force of air resistance is negligible\n\n\\begin{align*}\nV &\\leftarrow \\textrm{initial velocity}, \\\\\nV_x &\\leftarrow \\textrm{horizontal velocity}, \\\\\nV_y &\\leftarrow \\textrm{vertical velocity}, \\\\\n\\alpha &\\leftarrow \\textrm{launch angle}, \\\\\nh &\\leftarrow \\textrm{initial height}, \\\\\nt &\\leftarrow \\textrm{time of flight}, \\\\\nd &\\leftarrow \\textrm{distance (range)}, \\\\\nh_{\\textrm{max}} &\\leftarrow \\textrm{maximum height}, \\\\\ng &\\leftarrow \\textrm{gravity}.\n\\end{align*}\n\n\nHorizontal and vertical velocity\nThe horizontal velocity, V_x, and vertical velocity, V_y, of an object moving in projectile motion are given by the equations\n\n\\begin{align*}\nV_x &= V \\times \\cos(\\alpha), \\textrm{ and} \\\\\nV_y &= V \\times \\sin(\\alpha),\n\\end{align*}\n\nwhere V is the initial velocity and \\alpha is the launch angle. 
Horizontal and vertical velocity can be computed in R with the following functions.\n\nvelocity_x <- function(velocity, angle) {\n # Degrees need to be converted to radians in cos() since that is what the\n # function uses\n velocity * cos(angle * (pi/180))\n}\n\nvelocity_y <- function(velocity, angle) {\n # Degrees need to be converted to radians in sin() since that is what the\n # function uses\n velocity * sin(angle * (pi/180))\n}\n\n\n\nTime of flight\nThe time of flight, t, of an object moving in projectile motion is given by the equation\n\nt = \\left(V_y + \\sqrt{V_y^2 + 2 \\times g \\times h}\\right) \\div g,\n\nwhere V_y is the vertical velocity, g is the force of gravity, and h is the initial height the object is launched from. Time of flight is the time from when the object is launched to the time the object reaches the surface. It can be computed in R with the following function.\n\nflight_time <- function(velocity_y, height, gravity = 9.80665) {\n ( velocity_y + sqrt(velocity_y^2 + 2 * gravity * height) ) / gravity\n}\n\n\n\nDistance (range)\nThe distance, d, or range travelled by an object moving in projectile motion is given by the equation\n\nd = V_x \\times t,\n\nwhere V_x is the horizontal velocity and t is the time of flight. The range of the projectile is the total horizontal distance travelled during the time of flight. It can be computed in R with the following function.\n\ndistance <- function(velocity_x, velocity_y, height, gravity = 9.80665) {\n velocity_x * ( velocity_y + sqrt(velocity_y^2 + 2 * gravity * height) ) /\n gravity\n}\n\n\n\nMaximum height\nThe maximum height, h_{\\textrm{max}}, reached by an object moving in projectile motion is given by the equation\n\nh_{\\textrm{max}} = h + V_y^2 \\div (2 \\times g),\n\nwhere h is the initial height, V_y is the vertical velocity, and g is the force of gravity. The maximum height is reached when V_y = 0. 
It can be computed in R with the following function.\n\nheight_max <- function(velocity_y, height, gravity = 9.80665) {\n height + velocity_y^2 / (2 * gravity)\n}\n\n\n\nProjectile motion calculator\nNow to wrap all the components into a single function that will calculate the result for each component based on a set of parameters given to it. These results can then be used to determine the position and velocity of the projectile at any point in time during its trajectory, which I want to return as a data frame that can be used for plotting.\n\n#' nframes and fps can be used to animate the trajectory as close to real time as possible.\n#' There will be some rounding error though so it won't be exactly the same as the flight\n#' time.\nprojectile_motion <- function(velocity, angle, height, gravity = 9.80665, nframes = 30) {\n \n # Velocity components\n vx <- velocity_x(velocity, angle)\n vy <- velocity_y(velocity, angle)\n # Flight components\n t <- flight_time(vy, height, gravity)\n d <- distance(vx, vy, height, gravity)\n # Max height components\n hm <- height_max(vy, height, gravity)\n th <- vy / gravity\n hd <- vx * th\n \n # Calculate the position of the projectile in 2D space at a given point in\n # time to approximate its trajectory over time\n x_pos <- map_dbl(seq(0, t, length = nframes), ~ {\n vx * .x\n })\n \n y_pos <- map_dbl(seq(0, t, length = nframes), ~ {\n height + ( vy * .x + 0.5 * -gravity * .x^2 )\n })\n \n # Calculate the vertical velocity of the projectile at a given point in time\n vy_t <- map_dbl(seq(0, t, length = nframes), ~ {\n vy - gravity * .x\n })\n \n trajectory <- data.frame(\n x = x_pos,\n y = y_pos,\n vx = vx,\n vy = vy_t,\n second = seq(0, t, length = nframes)\n )\n \n # Return a list with all calculated values\n list(\n velocity_x = vx,\n velocity_y = vy,\n flight_time = t,\n distance = d,\n max_height = hm,\n max_height_time = th,\n max_height_dist = hd,\n trajectory = trajectory,\n nframes = nframes,\n fps = nframes/t\n )\n \n}" + 
"text": "Simulate\nThe equations for projectile motion use a common set of variables which are listed below. The equations assume that the force of air resistance is negligible.\n\n\\begin{align*}\nV &\\leftarrow \\textrm{initial velocity}, \\\\\nV_x &\\leftarrow \\textrm{horizontal velocity}, \\\\\nV_y &\\leftarrow \\textrm{vertical velocity}, \\\\\n\\alpha &\\leftarrow \\textrm{launch angle}, \\\\\nh &\\leftarrow \\textrm{initial height}, \\\\\nt &\\leftarrow \\textrm{time of flight}, \\\\\nd &\\leftarrow \\textrm{distance (range)}, \\\\\nh_{\\textrm{max}} &\\leftarrow \\textrm{maximum height}, \\\\\ng &\\leftarrow \\textrm{gravity}.\n\\end{align*}\n\n\nHorizontal and vertical velocity\nThe horizontal velocity, V_x, and vertical velocity, V_y, of an object moving in projectile motion are given by the equations\n\n\\begin{align*}\nV_x &= V \\times \\cos(\\alpha), \\textrm{ and} \\\\\nV_y &= V \\times \\sin(\\alpha),\n\\end{align*}\n\nwhere V is the initial velocity and \\alpha is the launch angle. Horizontal and vertical velocity can be computed in R with the following functions.\n\nvelocity_x <- function(velocity, angle) {\n # Degrees need to be converted to radians in cos() since that is what the\n # function uses\n velocity * cos(angle * (pi/180))\n}\n\nvelocity_y <- function(velocity, angle) {\n # Degrees need to be converted to radians in sin() since that is what the\n # function uses\n velocity * sin(angle * (pi/180))\n}\n\n\n\nTime of flight\nThe time of flight, t, of an object moving in projectile motion is given by the equation\n\nt = \\left(V_y + \\sqrt{V_y^2 + 2 \\times g \\times h}\\right) \\div g,\n\nwhere V_y is the vertical velocity, g is the force of gravity, and h is the initial height the object is launched from. Time of flight is the time from when the object is launched to the time the object reaches the surface. 
It can be computed in R with the following function.\n\nflight_time <- function(velocity_y, height, gravity = 9.80665) {\n ( velocity_y + sqrt(velocity_y^2 + 2 * gravity * height) ) / gravity\n}\n\n\n\nDistance (range)\nThe distance, d, or range travelled by an object moving in projectile motion is given by the equation\n\nd = V_x \\times t,\n\nwhere V_x is the horizontal velocity and t is the time of flight. The range of the projectile is the total horizontal distance travelled during the time of flight. It can be computed in R with the following function.\n\ndistance <- function(velocity_x, velocity_y, height, gravity = 9.80665) {\n velocity_x * ( velocity_y + sqrt(velocity_y^2 + 2 * gravity * height) ) /\n gravity\n}\n\n\n\nMaximum height\nThe maximum height, h_{\\textrm{max}}, reached by an object moving in projectile motion is given by the equation\n\nh_{\\textrm{max}} = h + V_y^2 \\div (2 \\times g),\n\nwhere h is the initial height, V_y is the vertical velocity, and g is the force of gravity. The maximum height is reached when V_y = 0. It can be computed in R with the following function.\n\nheight_max <- function(velocity_y, height, gravity = 9.80665) {\n height + velocity_y^2 / (2 * gravity)\n}\n\n\n\nProjectile motion calculator\nNow to wrap all the components into a single function that will calculate the result for each component based on a set of parameters given to it. 
These results can then be used to determine the position and velocity of the projectile at any point in time during its trajectory, which I want to return as a data frame that can be used for plotting.\n\n#' nframes and fps can be used to animate the trajectory as close to real time as possible.\n#' There will be some rounding error though so it won't be exactly the same as the flight\n#' time.\nprojectile_motion <- function(velocity, angle, height, gravity = 9.80665, nframes = 30) {\n \n # Velocity components\n vx <- velocity_x(velocity, angle)\n vy <- velocity_y(velocity, angle)\n # Flight components\n t <- flight_time(vy, height, gravity)\n d <- distance(vx, vy, height, gravity)\n # Max height components\n hm <- height_max(vy, height, gravity)\n th <- vy / gravity\n hd <- vx * th\n \n # Calculate the position of the projectile in 2D space at a given point in\n # time to approximate its trajectory over time\n x_pos <- map_dbl(seq(0, t, length = nframes), ~ {\n vx * .x\n })\n \n y_pos <- map_dbl(seq(0, t, length = nframes), ~ {\n height + ( vy * .x + 0.5 * -gravity * .x^2 )\n })\n \n # Calculate the vertical velocity of the projectile at a given point in time\n vy_t <- map_dbl(seq(0, t, length = nframes), ~ {\n vy - gravity * .x\n })\n \n trajectory <- data.frame(\n x = x_pos,\n y = y_pos,\n vx = vx,\n vy = vy_t,\n second = seq(0, t, length = nframes)\n )\n \n # Return a list with all calculated values\n list(\n velocity_x = vx,\n velocity_y = vy,\n flight_time = t,\n distance = d,\n max_height = hm,\n max_height_time = th,\n max_height_dist = hd,\n trajectory = trajectory,\n nframes = nframes,\n fps = nframes/t\n )\n \n}" }, { "objectID": "posts/2022-06-16_projectile-motion/index.html#wrangle", "href": "posts/2022-06-16_projectile-motion/index.html#wrangle", "title": "On motion", "section": "Wrangle", - "text": "Wrangle\nInstead of jumping straight into a visualization, I want to play around with the output of projectile_motion(). 
First to show what its output looks like, and second to explore the interesting ways it can be extended through wrangling.\n\nA simple trajectory\nFirst off, demonstrating projectile_motion() and its output. The function takes five arguments:\n\nvelocity in metres per second,\nangle in degrees,\nheight in metres,\ngravity in metres per second squared (this defaults to Earth’s gravity, 9.80665 m/s²), and\nnframes which represents how many points in time to record in the trajectory data frame.\n\n\nprojectile_motion(\n velocity = 11.4,\n angle = 52.1,\n height = 18,\n nframes = 10\n)\n\n#> $velocity_x\n#> [1] 7.002851\n#> \n#> $velocity_y\n#> [1] 8.995559\n#> \n#> $flight_time\n#> [1] 3.041533\n#> \n#> $distance\n#> [1] 21.29941\n#> \n#> $max_height\n#> [1] 22.12578\n#> \n#> $max_height_time\n#> [1] 0.9172917\n#> \n#> $max_height_dist\n#> [1] 6.423657\n#> \n#> $trajectory\n#> x y vx vy second\n#> 1 0.000000 1.800000e+01 7.002851 8.9955586 0.0000000\n#> 2 2.366601 2.048003e+01 7.002851 5.6814194 0.3379481\n#> 3 4.733201 2.184005e+01 7.002851 2.3672802 0.6758963\n#> 4 7.099802 2.208006e+01 7.002851 -0.9468589 1.0138444\n#> 5 9.466402 2.120007e+01 7.002851 -4.2609981 1.3517926\n#> 6 11.833003 1.920007e+01 7.002851 -7.5751373 1.6897407\n#> 7 14.199604 1.608006e+01 7.002851 -10.8892765 2.0276889\n#> 8 16.566204 1.184005e+01 7.002851 -14.2034156 2.3656370\n#> 9 18.932805 6.480029e+00 7.002851 -17.5175548 2.7035852\n#> 10 21.299405 3.552714e-15 7.002851 -20.8316940 3.0415333\n#> \n#> $nframes\n#> [1] 10\n#> \n#> $fps\n#> [1] 3.287815\n\n\nThe function returns calculations for each of the projectile motion equations I covered above, as well as some additional output that can be used for plotting and animation:\n\nmax_height_time and max_height_dist give the time (s) and distance (m) it takes for the projectile to reach its maximum height.\ntrajectory gives the horizontal and vertical position and velocity at a given moment during the projectile’s trajectory.\nfps gives the 
number of frames per second that are needed to animate the trajectory in real time based on nframes. Because it is impossible to have a fraction of a frame, there will be variance in how closely an animation’s duration matches the actual time of flight based on the value of nframes.\n\n\n\nLaunching a projectile on different planets\nGiven the inspiration for this post, a space-themed simulation seems appropriate. Here I want to test how the gravity of each planet in our solar system influences projectile motion, given a projectile is launched with the same velocity, angle, and height.\nFirst I need to construct a named vector of the gravity of each planet in our solar system. NASA provides these values as ratios of each planet’s gravity relative to Earth’s, so I had to multiply each one by Earth’s gravity to get the units correct.\n\n# All values are in metres per second squared\nplanets <- c(\n mercury = 3.7069137,\n venus = 8.8946315,\n earth = 9.80665,\n moon = 1.6279039,\n mars = 3.697107,\n jupiter = 23.143694,\n saturn = 8.9828914,\n uranus = 8.7181118,\n neptune = 10.983448,\n pluto = 0.6962721\n)\n\nThen I can create a named list of projectile motion calculations, one for each planet. Each planet has its own list of output from projectile_motion(), so the resulting list of projectile motion calculations is actually a list of lists. This can be tidied into a tibble to make it easier to work with.\n\n# Calculate projectile motion for each planet, given the same velocity,\n# angle, and height\nplanets_pm <- map(planets, ~{\n projectile_motion(\n velocity = 20,\n angle = 45,\n height = 35,\n gravity = .x,\n nframes = 100)\n})\n\n# Tidying the list of lists into a tibble makes it easier to work with. 
Note\n# that the trajectory column is a list column since it contains the trajectory\n# data frame for each planet.\nplanets_df <- planets_pm %>%\n enframe() %>%\n unnest_wider(value) %>%\n rename(planet = name)\n\nplanets_trajectory <- planets_df %>%\n select(planet, trajectory) %>%\n unnest(trajectory) %>% \n mutate(planet = factor(planet, levels = names(planets)))" + "text": "Wrangle\nInstead of jumping straight into a visualization, I want to play around with the output of projectile_motion(). First to show what its output looks like, and second to explore the interesting ways it can be extended through wrangling.\n\nA simple trajectory\nFirst off, demonstrating projectile_motion() and its output. The function takes five arguments:\n\nvelocity in metres per second,\nangle in degrees,\nheight in metres,\ngravity in metres per second squared (this defaults to Earth’s gravity, 9.80665 m/s²), and\nnframes which represents how many points in time to record in the trajectory data frame.\n\n\nprojectile_motion(\n velocity = 11.4,\n angle = 52.1,\n height = 18,\n nframes = 10\n)\n\n#> $velocity_x\n#> [1] 7.002851\n#> \n#> $velocity_y\n#> [1] 8.995559\n#> \n#> $flight_time\n#> [1] 3.041533\n#> \n#> $distance\n#> [1] 21.29941\n#> \n#> $max_height\n#> [1] 22.12578\n#> \n#> $max_height_time\n#> [1] 0.9172917\n#> \n#> $max_height_dist\n#> [1] 6.423657\n#> \n#> $trajectory\n#> x y vx vy second\n#> 1 0.000000 1.800000e+01 7.002851 8.9955586 0.0000000\n#> 2 2.366601 2.048003e+01 7.002851 5.6814194 0.3379481\n#> 3 4.733201 2.184005e+01 7.002851 2.3672802 0.6758963\n#> 4 7.099802 2.208006e+01 7.002851 -0.9468589 1.0138444\n#> 5 9.466402 2.120007e+01 7.002851 -4.2609981 1.3517926\n#> 6 11.833003 1.920007e+01 7.002851 -7.5751373 1.6897407\n#> 7 14.199604 1.608006e+01 7.002851 -10.8892765 2.0276889\n#> 8 16.566204 1.184005e+01 7.002851 -14.2034156 2.3656370\n#> 9 18.932805 6.480029e+00 7.002851 -17.5175548 2.7035852\n#> 10 21.299405 3.552714e-15 7.002851 -20.8316940 3.0415333\n#> 
\n#> $nframes\n#> [1] 10\n#> \n#> $fps\n#> [1] 3.287815\n\n\nThe function returns calculations for each of the projectile motion equations I covered above, as well as some additional output that can be used for plotting and animation:\n\nmax_height_time and max_height_dist give the time (s) and distance (m) it takes for the projectile to reach its maximum height.\ntrajectory gives the horizontal and vertical position and velocity at a given moment during the projectile’s trajectory.\nfps gives the number of frames per second that are needed to animate the trajectory in real time based on nframes. Because it is impossible to have a fraction of a frame, there will be variance in how closely an animation’s duration matches the actual time of flight based on the value of nframes.\n\n\n\nLaunching a projectile on different planets\nGiven the inspiration for this post, a space themed simulation seems appropriate. Here I want to test how the gravity of each planet in our solar system influences projectile motion, given a projectile is launched with the same velocity, angle, and height.\nFirst I need to construct a named vector of the gravity of each planet in our solar system. NASA provides these values as ratios of each planet’s gravity relative to Earth, so I had to multiply each one by Earth’s gravity to get the units correct.\n\n# All values are in metres per second squared\nplanets <- c(\n mercury = 3.7069137,\n venus = 8.8946315,\n earth = 9.80665,\n moon = 1.6279039,\n mars = 3.697107,\n jupiter = 23.143694,\n saturn = 8.9828914,\n uranus = 8.7181118,\n neptune = 10.983448,\n pluto = 0.6962721\n)\n\nThen I can create a named list of projectile motion calculations, one for each planet. Each planet has its own list of output from projectile_motion(), so the resulting list of projectile motion calculations is actually a list of lists. 
This can be tidied into a tibble to make it easier to work with.\n\n# Calculate projectile motion for each planet, given the same velocity,\n# angle, and height\nplanets_pm <- map(planets, ~{\n projectile_motion(\n velocity = 20,\n angle = 45,\n height = 35,\n gravity = .x,\n nframes = 100)\n})\n\n# Tidying the list of lists into a tibble makes it easier to work with. Note\n# that the trajectory column is a list column since it contains the trajectory\n# data frame for each planet.\nplanets_df <- planets_pm %>%\n enframe() %>%\n unnest_wider(value) %>%\n rename(planet = name)\n\nplanets_trajectory <- planets_df %>%\n select(planet, trajectory) %>%\n unnest(trajectory) %>% \n mutate(planet = factor(planet, levels = names(planets)))" }, { "objectID": "posts/2022-06-16_projectile-motion/index.html#visualize", "href": "posts/2022-06-16_projectile-motion/index.html#visualize", "title": "On motion", "section": "Visualize", - "text": "Visualize\nNow for visualization. First I’ll plot a simple trajectory, then a projectile launched on different planets.\n\nA simple trajectory\nThis is the same simple trajectory I showed the output for earlier, only with more frames to make the animation smoother.\n\nsimple_trajectory <- projectile_motion(\n velocity = 11.4,\n angle = 52.1,\n height = 18,\n nframes = 100\n)\n\n# Assign the data frame and max height parameters to objects to make the plot\n# code easier to read\ndf <- simple_trajectory$trajectory\nmax_height_dist <- simple_trajectory$max_height_dist\nmax_height_time <- simple_trajectory$max_height_time\nmax_height <- simple_trajectory$max_height\n\nI’m going to build the plot for this simple trajectory up in chunks to make the code easier to understand. The foundation of the plot is fairly standard. The only unusual thing here are the group aesthetics in geom_line() and geom_point(). 
These are used to tell gganimate which rows in the data correspond to the same graphic element.\n\np <- ggplot(df, aes(x = x, y = y)) +\n geom_line(\n aes(group = 1),\n linetype = \"dashed\",\n colour = \"red\",\n alpha = 0.5\n ) +\n geom_point(aes(group = 1), size = 2)\n\nFor the data I simulated, the projectile starts with a positive vertical velocity. However, at its maximum height, the vertical velocity of the projectile becomes 0 m/s for a brief moment, as it stops rising and starts falling. This happens Because gravity is constantly influencing the vertical velocity of the projectile. This is an important and interesting piece of information I want to communicate in my plot. This can be accomplished subtly by displaying the vertical velocity of the projectile at each point in time, or more overtly using a text annotation. I’m going to do both.\nFirst the text annotation. I’m using geom_curve() to draw an arrow between the annotation and the point at which the projectile is at its maximum height, and geom_text() to draw the annotation. I’ve supplied each geom with its own data frame containing a second column whose sole value corresponds to the time the projectile reaches its maximum height. This will control when the annotation appears in the animation. 
I’ve also given the pair a different group aesthetic from geom_line() and geom_point().\n\np <- p +\n geom_curve(\n data = data.frame(\n second = max_height_time\n ),\n aes(\n xend = max_height_dist,\n yend = max_height + 0.2,\n x = max_height_dist + 2,\n y = max_height + 2,\n group = 2\n ),\n curvature = 0.45,\n angle = 105,\n ncp = 15,\n arrow = arrow(length = unit(0.1,\"cm\"), type = \"closed\")\n ) +\n geom_text(\n data = data.frame(\n second = max_height_time\n ),\n aes(\n x = max_height_dist + 2.4,\n y = max_height + 2,\n group = 2\n ),\n hjust = \"left\",\n lineheight = 1,\n family = \"serif\",\n label = str_c(\n \"At its maximum height, the vertical velocity \\n\", \n \"of the projectile is 0 m/s for a brief moment, \\n\",\n \"as it stops rising and starts falling.\"\n )\n )\n\nSecond the vertical velocity. I’m displaying this in the plot’s subtitle along with the time elapsed since the projectile was launched. The displayed values are updated each frame using the value returned by the expression enclosed in glue braces for a frame. The variable frame_along is made available by gganimate::transition_along() (see below) and gives the position on the along-dimension (time in seconds in this case) that a frame corresponds to. Here I’m using frame_along to display the elapsed time, and to index the data frame df for the vertical velocity at a given second. The latter is a slight workaround because vy cannot be accessed directly in the glue braces.\n\np <- p +\n labs(\n title = str_c(\n \"Projectile motion of an object launched with \",\n #\"
    \",\n \"a speed of 11.4 m/s at an angle of 52.1°\"\n ),\n subtitle = str_c(\n \"Time: \",\n \"{formattable(frame_along, digits = 2, format = 'f')}s\",\n \"\\n\",\n \"Vertical velocity = \",\n \"{formattable(df$vy[df$second == frame_along], digits = 2, format = 'f')}\",\n \" m/s\"\n ),\n x = \"Distance (m)\",\n y = \"Height (m)\",\n caption = \"Graphic: Michael McCarthy\"\n )\n\nNow for theming. I wanted something minimalistic with a scientific feel—the classic theme paired with truncated axes courtesy of ggh4x does this nicely. Finally, I originally planned to use element_markdown() from ggtext to enable markdown text in the subtitle of the plot so that vertical velocity could be written like \\textrm{Velocity}_Y; however, this caused issues with the text spacing when rendering the animation to video, so I opted not to.1\n\np <- p +\n guides(x = \"axis_truncated\", y = \"axis_truncated\") +\n theme_classic(base_family = \"serif\")\n\nAnd finally, the animation code. Yes, that’s it. Animations can be tweaked and spiced up with other functions in gganimate, but I ran into issues making the ones I wanted to use work with transition_reveal().\n\n\nJust a note: The behaviour of transition_reveal() shown here was broken in v1.0.8 of gganimate.\n\nanim <- p +\n transition_reveal(second)\n\nanim\n\n\n\n\n\n\n\nLaunching a projectile on different planets\nNow to test how the gravity of each planet in our solar system influences projectile motion. As a reminder, I already simulated the projectile motion data in planets_trajectory, so now it’s just a matter of plotting it.\nSince the simulation is space themed, the plot should be too. Instead of using a simple point to represent the projectile, I’m going to use Font Awesome’s rocket icon by way of the emojifont package. 
To make it extra, I’ll also add propulsion and rotation to the rocket’s animation.\n\n# Make it so the propulsion is only present for first half of animation, so it\n# looks like the rockets are launching.\nrocket_propulsion <- planets_trajectory %>%\n group_by(planet) %>%\n mutate(retain = rep(c(TRUE, FALSE), each = 50)) %>%\n ungroup() %>%\n mutate(x = case_when(\n retain == FALSE ~ NA_real_,\n TRUE ~ x\n ))\n\nThe plotting code is mostly boilerplate, but I’ve added comments to highlight a few noteworthy points.\n\np <- ggplot(planets_trajectory, aes(x = x, y = y)) +\n geom_line(\n aes(colour = planet, group = planet),\n linetype = \"dashed\",\n alpha = 0.5,\n # Change the key glyph in the legend to a point, to represent a planet\n key_glyph = \"point\"\n ) +\n geom_point(\n data = rocket_propulsion,\n aes(group = planet),\n colour = \"orange\"\n ) +\n # Change the angle at different frames to rotate the rocket\n geom_text(\n aes(colour = planet, group = planet, label = fontawesome(\"fa-rocket\")),\n family='fontawesome-webfont',\n angle = rep(seq(0, 45, length = 100), 10),\n size = 6,\n # There is no rocket key glyph, so override this too\n key_glyph = \"point\"\n ) +\n scale_color_manual(\n values = c(\n \"#97979F\",\n \"#BBB7AB\",\n \"#8CB1DE\",\n \"#DAD9D7\",\n \"#E27B58\",\n \"#C88B3A\",\n \"#C5AB6E\",\n \"#93B8BE\",\n \"#6081FF\",\n \"#4390BA\"\n )\n ) +\n labs(\n title = str_c(\n \"projectile motion of an object launched on different planets in our solar system\"\n ),\n x = \"distance (m)\",\n y = \"height (m)\",\n caption = \"graphic: michael mccarthy\"\n ) +\n guides(\n x = \"axis_truncated\",\n y = \"axis_truncated\",\n colour = guide_legend(title.vjust = .7, nrow = 1, label.position = \"bottom\")\n ) +\n theme_classic(base_family = \"mono\") +\n theme(\n text = element_text(colour = \"white\"),\n axis.text = element_text(colour = \"white\"),\n rect = element_rect(fill = \"black\"),\n panel.background = element_rect(fill = \"black\"),\n axis.ticks = 
element_line(colour = \"white\"),\n axis.line = element_line(colour = \"white\"),\n legend.position = \"top\",\n legend.justification = \"left\"\n )\n\nFinally, the animation code. The shadow_wake() function is applied to the orange points used for rocket propulsion to really sell the effect.\n\nanim <- p +\n transition_reveal(second) +\n shadow_wake(wake_length = 0.1, size = 2, exclude_layer = c(1, 3))" + "text": "Visualize\nNow for visualization. First I’ll plot a simple trajectory, then a projectile launched on different planets.\n\nA simple trajectory\nThis is the same simple trajectory I showed the output for earlier, only with more frames to make the animation smoother.\n\nsimple_trajectory <- projectile_motion(\n velocity = 11.4,\n angle = 52.1,\n height = 18,\n nframes = 100\n)\n\n# Assign the data frame and max height parameters to objects to make the plot\n# code easier to read\ndf <- simple_trajectory$trajectory\nmax_height_dist <- simple_trajectory$max_height_dist\nmax_height_time <- simple_trajectory$max_height_time\nmax_height <- simple_trajectory$max_height\n\nI’m going to build the plot for this simple trajectory up in chunks to make the code easier to understand. The foundation of the plot is fairly standard. The only unusual things here are the group aesthetics in geom_line() and geom_point(). These are used to tell gganimate which rows in the data correspond to the same graphic element.\n\np <- ggplot(df, aes(x = x, y = y)) +\n geom_line(\n aes(group = 1),\n linetype = \"dashed\",\n colour = \"red\",\n alpha = 0.5\n ) +\n geom_point(aes(group = 1), size = 2)\n\nFor the data I simulated, the projectile starts with a positive vertical velocity. However, at its maximum height, the vertical velocity of the projectile becomes 0 m/s for a brief moment, as it stops rising and starts falling. This happens because gravity is constantly influencing the vertical velocity of the projectile. 
This is an important and interesting piece of information I want to communicate in my plot. This can be accomplished subtly by displaying the vertical velocity of the projectile at each point in time, or more overtly using a text annotation. I’m going to do both.\nFirst, the text annotation. I’m using geom_curve() to draw an arrow between the annotation and the point at which the projectile is at its maximum height, and geom_text() to draw the annotation. I’ve supplied each geom with its own data frame containing a second column whose sole value corresponds to the time the projectile reaches its maximum height. This will control when the annotation appears in the animation. I’ve also given the pair a different group aesthetic from geom_line() and geom_point().\n\np <- p +\n geom_curve(\n data = data.frame(\n second = max_height_time\n ),\n aes(\n xend = max_height_dist,\n yend = max_height + 0.2,\n x = max_height_dist + 2,\n y = max_height + 2,\n group = 2\n ),\n curvature = 0.45,\n angle = 105,\n ncp = 15,\n arrow = arrow(length = unit(0.1,\"cm\"), type = \"closed\")\n ) +\n geom_text(\n data = data.frame(\n second = max_height_time\n ),\n aes(\n x = max_height_dist + 2.4,\n y = max_height + 2,\n group = 2\n ),\n hjust = \"left\",\n lineheight = 1,\n family = \"serif\",\n label = str_c(\n \"At its maximum height, the vertical velocity \\n\", \n \"of the projectile is 0 m/s for a brief moment, \\n\",\n \"as it stops rising and starts falling.\"\n )\n )\n\nSecond, the vertical velocity. I’m displaying this in the plot’s subtitle along with the time elapsed since the projectile was launched. The displayed values are updated each frame using the value returned by the expression enclosed in glue braces for a frame. The variable frame_along is made available by gganimate::transition_reveal() (see below) and gives the position on the along-dimension (time in seconds in this case) that a frame corresponds to. 
Here I’m using frame_along to display the elapsed time, and to index the data frame df for the vertical velocity at a given second. The latter is a slight workaround because vy cannot be accessed directly in the glue braces.\n\np <- p +\n labs(\n title = str_c(\n \"Projectile motion of an object launched with \",\n #\" <br> \",\n \"a speed of 11.4 m/s at an angle of 52.1°\"\n ),\n subtitle = str_c(\n \"Time: \",\n \"{formattable(frame_along, digits = 2, format = 'f')}s\",\n \"\\n\",\n \"Vertical velocity = \",\n \"{formattable(df$vy[df$second == frame_along], digits = 2, format = 'f')}\",\n \" m/s\"\n ),\n x = \"Distance (m)\",\n y = \"Height (m)\",\n caption = \"Graphic: Michael McCarthy\"\n )\n\nNow for theming. I wanted something minimalistic with a scientific feel—the classic theme paired with truncated axes courtesy of ggh4x does this nicely. I had originally planned to use element_markdown() from ggtext to enable markdown text in the subtitle of the plot so that vertical velocity could be written like \\textrm{Velocity}_Y; however, this caused issues with the text spacing when rendering the animation to video, so I opted not to.1\n\np <- p +\n guides(x = \"axis_truncated\", y = \"axis_truncated\") +\n theme_classic(base_family = \"serif\")\n\nAnd finally, the animation code. Yes, that’s it. Animations can be tweaked and spiced up with other functions in gganimate, but I ran into issues making the ones I wanted to use work with transition_reveal().\n\n\nJust a note: The behaviour of transition_reveal() shown here was broken in v1.0.8 of gganimate.\n\nanim <- p +\n transition_reveal(second)\n\nanim\n\n\n\n\nLaunching a projectile on different planets\nNow to test how the gravity of each planet in our solar system influences projectile motion. As a reminder, I already simulated the projectile motion data in planets_trajectory, so now it’s just a matter of plotting it.\nSince the simulation is space themed, the plot should be too. 
Instead of using a simple point to represent the projectile, I’m going to use Font Awesome’s rocket icon by way of the emojifont package. To make it extra, I’ll also add propulsion and rotation to the rocket’s animation.\n\n# Make it so the propulsion is only present for first half of animation, so it\n# looks like the rockets are launching.\nrocket_propulsion <- planets_trajectory %>%\n group_by(planet) %>%\n mutate(retain = rep(c(TRUE, FALSE), each = 50)) %>%\n ungroup() %>%\n mutate(x = case_when(\n retain == FALSE ~ NA_real_,\n TRUE ~ x\n ))\n\nThe plotting code is mostly boilerplate, but I’ve added comments to highlight a few noteworthy points.\n\np <- ggplot(planets_trajectory, aes(x = x, y = y)) +\n geom_line(\n aes(colour = planet, group = planet),\n linetype = \"dashed\",\n alpha = 0.5,\n # Change the key glyph in the legend to a point, to represent a planet\n key_glyph = \"point\"\n ) +\n geom_point(\n data = rocket_propulsion,\n aes(group = planet),\n colour = \"orange\"\n ) +\n # Change the angle at different frames to rotate the rocket\n geom_text(\n aes(colour = planet, group = planet, label = fontawesome(\"fa-rocket\")),\n family='fontawesome-webfont',\n angle = rep(seq(0, 45, length = 100), 10),\n size = 6,\n # There is no rocket key glyph, so override this too\n key_glyph = \"point\"\n ) +\n scale_color_manual(\n values = c(\n \"#97979F\",\n \"#BBB7AB\",\n \"#8CB1DE\",\n \"#DAD9D7\",\n \"#E27B58\",\n \"#C88B3A\",\n \"#C5AB6E\",\n \"#93B8BE\",\n \"#6081FF\",\n \"#4390BA\"\n )\n ) +\n labs(\n title = str_c(\n \"projectile motion of an object launched on different planets in our solar system\"\n ),\n x = \"distance (m)\",\n y = \"height (m)\",\n caption = \"graphic: michael mccarthy\"\n ) +\n guides(\n x = \"axis_truncated\",\n y = \"axis_truncated\",\n colour = guide_legend(title.vjust = .7, nrow = 1, label.position = \"bottom\")\n ) +\n theme_classic(base_family = \"mono\") +\n theme(\n text = element_text(colour = \"white\"),\n axis.text = 
element_text(colour = \"white\"),\n rect = element_rect(fill = \"black\"),\n panel.background = element_rect(fill = \"black\"),\n axis.ticks = element_line(colour = \"white\"),\n axis.line = element_line(colour = \"white\"),\n legend.position = \"top\",\n legend.justification = \"left\"\n )\n\nFinally, the animation code. The shadow_wake() function is applied to the orange points used for rocket propulsion to really sell the effect.\n\nanim <- p +\n transition_reveal(second) +\n shadow_wake(wake_length = 0.1, size = 2, exclude_layer = c(1, 3))" }, { "objectID": "posts/2022-06-16_projectile-motion/index.html#section", @@ -853,6 +1021,13 @@ "section": "Session Info", "text": "Session Info\n\n\n\n\n\n\n─ Session info ───────────────────────────────────────────────────────────────\n setting value\n version R version 4.2.2 (2022-10-31)\n os macOS Mojave 10.14.6\n system x86_64, darwin17.0\n ui X11\n language (EN)\n collate en_CA.UTF-8\n ctype en_CA.UTF-8\n tz America/Vancouver\n date 2022-12-24\n pandoc 2.14.0.3 @ /Applications/RStudio.app/Contents/MacOS/pandoc/ (via rmarkdown)\n quarto 1.2.280 @ /usr/local/bin/quarto\n\n─ Packages ───────────────────────────────────────────────────────────────────\n package * version date (UTC) lib source\n dplyr * 1.0.10 2022-09-01 [1] CRAN (R 4.2.0)\n emojifont * 0.5.5 2021-04-20 [1] CRAN (R 4.2.0)\n forcats * 0.5.2 2022-08-19 [1] CRAN (R 4.2.0)\n formattable * 0.2.1 2021-01-07 [1] CRAN (R 4.2.0)\n gganimate * 1.0.7 2020-10-15 [1] CRAN (R 4.2.2)\n ggh4x * 0.2.3 2022-11-09 [1] CRAN (R 4.2.0)\n ggplot2 * 3.4.0 2022-11-04 [1] CRAN (R 4.2.0)\n glue * 1.6.2 2022-02-24 [1] CRAN (R 4.2.0)\n here * 1.0.1 2020-12-13 [1] CRAN (R 4.2.0)\n purrr * 0.3.5 2022-10-06 [1] CRAN (R 4.2.0)\n readr * 2.1.3 2022-10-01 [1] CRAN (R 4.2.0)\n sessioninfo * 1.2.2 2021-12-06 [1] CRAN (R 4.2.0)\n stringr * 1.5.0 2022-12-02 [1] CRAN (R 4.2.0)\n tibble * 3.1.8 2022-07-22 [1] CRAN (R 4.2.0)\n tidyr * 1.2.1 2022-09-08 [1] CRAN (R 4.2.0)\n tidyverse * 1.3.2 
2022-07-18 [1] CRAN (R 4.2.0)\n\n [1] /Users/Michael/Library/R/x86_64/4.2/library/__tidytales\n [2] /Library/Frameworks/R.framework/Versions/4.2/Resources/library\n\n──────────────────────────────────────────────────────────────────────────────" }, + { + "objectID": "posts/2022-06-16_projectile-motion/index.html#footnotes", + "href": "posts/2022-06-16_projectile-motion/index.html#footnotes", + "title": "On motion", + "section": "Footnotes", + "text": "Footnotes\n\n\nI didn’t look into it too deeply, but I’m guessing it’s related to this issue in ggtext. If you render to a gif instead you won’t have this issue and can use ggtext as normal.↩︎" + }, { "objectID": "posts/2021-06-19_distill/index.html", "href": "posts/2021-06-19_distill/index.html", @@ -872,14 +1047,14 @@ "href": "posts/2021-06-19_distill/index.html#page-and-article-metadata", "title": "What’s he building in there?", "section": "Page and Article Metadata", - "text": "Page and Article Metadata\nDistill comes equipped with a number of features to automatically enable richer sharing of article links on the web using article metadata. However, some of these features are not available for non-article pages on distill blogs (such as link preview images for the home page), and the automatic behaviour of these features limits how much they can be customized. Both of these limitations can be overcome using the metathis package by Garrick Aden-Buie.\nTom Mock has a great blog post diving into how metadata can be used to customize how links from a distill blog appear on social media. 
It’s a great resource and I followed it to add metadata and preview images to the home and about pages of Tidy Tales.\nHere is what the index.Rmd file for the Tidy Tales home page looks like.\n---\ntitle: \"Wrangling, Visualizing, Modelling, Communicating data\"\nsite: distill::distill_website\nlisting: posts\n---\n\n```{r, include=FALSE, results='asis'}\nlibrary(metathis)\n\nmeta() %>%\n meta_social(\n title = \"Tidy Tales\",\n description = \"Wrangling, Visualizing, Modelling, Communicating data\",\n url = \"https://tidytales.ca\",\n image = \"https://tidytales.ca/inst/images/twittercard.png\",\n image_alt = \"Tidy Tales logo\",\n og_type = \"website\",\n twitter_card_type = \"summary\",\n twitter_site = NULL\n )\n```\nWhen the site is built distill will automatically generate metadata for the home page, and the metathis code in index.Rmd will generate additional metadata for the home page. Here is what it looks like in HTML.\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nThere is some overlap between the tags generated by distill and metathis, however, the metadata tags generated by metathis seem to take precedence over those automatically generated by distill. For example, the Twitter card for the Tidy Tales home page displays “Tidy Tales” as its title, rather than “Tidy Tales | Michael McCarthy: Wrangling, Visualizing, Modelling, Communicating data”.\n\nArticle Metadata\nThe ability to override some of the metadata generated by distill using metathis is hacky, but it also affords more customization for distill blogs. One trick I’m taking full advantage of with this is to have unique preview images between posts on Tidy Tales and their social cards. Distill allows you to specify a preview image for a post using the preview chunk option.\n```{r, preview=TRUE}\nlibrary(ggplot2)\nggplot(diamonds, aes(carat, price)) +\n geom_smooth()\n```\nThis preview image will be used alongside post listings and in social cards. 
However, if a different image is specified in metathis::meta_social() that image will be used in social cards but the preview image specified in the post chunk will still be used alongside post listings. I’m using this on Tidy Tales to have branded images for social cards and plain images for post listings. Here’s an example of the branded social card image from my first post.\n\n\n\n\n\nThe branded social card image for my first post. Copy the post link into a tweet to see it in action." + "text": "Page and Article Metadata\nDistill comes equipped with a number of features to automatically enable richer sharing of article links on the web using article metadata. However, some of these features are not available for non-article pages on distill blogs (such as link preview images for the home page), and the automatic behaviour of these features limits how much they can be customized. Both of these limitations can be overcome using the metathis package by Garrick Aden-Buie.\nTom Mock has a great blog post diving into how metadata can be used to customize how links from a distill blog appear on social media. 
It’s a great resource and I followed it to add metadata and preview images to the home and about pages of Tidy Tales.\nHere is what the index.Rmd file for the Tidy Tales home page looks like.\n---\ntitle: \"Wrangling, Visualizing, Modelling, Communicating data\"\nsite: distill::distill_website\nlisting: posts\n---\n\n```{r, include=FALSE, results='asis'}\nlibrary(metathis)\n\nmeta() %>%\n meta_social(\n title = \"Tidy Tales\",\n description = \"Wrangling, Visualizing, Modelling, Communicating data\",\n url = \"https://tidytales.ca\",\n image = \"https://tidytales.ca/inst/images/twittercard.png\",\n image_alt = \"Tidy Tales logo\",\n og_type = \"website\",\n twitter_card_type = \"summary\",\n twitter_site = NULL\n )\n```\nWhen the site is built distill will automatically generate metadata for the home page, and the metathis code in index.Rmd will generate additional metadata for the home page. Here is what it looks like in HTML.\n<!-- Generated by distill -->\n<meta property=\"og:title\" content=\"Tidy Tales | Michael McCarthy: Wrangling, Visualizing, Modelling, Communicating data\">\n<meta property=\"og:type\" content=\"article\">\n<meta property=\"og:locale\" content=\"en_US\">\n<meta property=\"og:site_name\" content=\"Tidy Tales | Michael McCarthy\">\n<meta property=\"twitter:card\" content=\"summary\">\n<meta property=\"twitter:title\" content=\"Tidy Tales | Michael McCarthy: Wrangling, Visualizing, Modelling, Communicating data\">\n<meta property=\"twitter:site\" content=\"@propertidytales\">\n<meta property=\"twitter:creator\" content=\"@mccarthymg\">\n\n<!-- Generated by metathis -->\n<meta property=\"og:locale\" content=\"en_US\">\n<meta name=\"twitter:title\" content=\"Tidy Tales\">\n<meta name=\"twitter:description\" content=\"Wrangling, Visualizing, Modelling, Communicating data\">\n<meta name=\"twitter:url\" content=\"https://tidytales.ca\">\n<meta name=\"twitter:image:src\" content=\"https://tidytales.ca/inst/images/twittercard.png\">\n<meta 
name=\"twitter:image:alt\" content=\"Tidy Tales logo\">\n<meta name=\"twitter:card\" content=\"summary\">\n<meta property=\"og:title\" content=\"Tidy Tales\">\n<meta property=\"og:description\" content=\"Wrangling, Visualizing, Modelling, Communicating data\">\n<meta property=\"og:url\" content=\"https://tidytales.ca\">\n<meta property=\"og:image\" content=\"https://tidytales.ca/inst/images/twittercard.png\">\n<meta property=\"og:image:alt\" content=\"Tidy Tales logo\">\n<meta property=\"og:type\" content=\"website\">\nThere is some overlap between the <meta> tags generated by distill and metathis; however, the metadata tags generated by metathis seem to take precedence over those automatically generated by distill. For example, the Twitter card for the Tidy Tales home page displays “Tidy Tales” as its title, rather than “Tidy Tales | Michael McCarthy: Wrangling, Visualizing, Modelling, Communicating data”.\n\nArticle Metadata\nThe ability to override some of the metadata generated by distill using metathis is hacky, but it also affords more customization for distill blogs. One trick I’m taking full advantage of with this is to give posts on Tidy Tales different preview images in post listings and social cards. Distill allows you to specify a preview image for a post using the preview chunk option.\n```{r, preview=TRUE}\nlibrary(ggplot2)\nggplot(diamonds, aes(carat, price)) +\n geom_smooth()\n```\nThis preview image will be used alongside post listings and in social cards. However, if a different image is specified in metathis::meta_social(), that image will be used in social cards but the preview image specified in the post chunk will still be used alongside post listings. I’m using this on Tidy Tales to have branded images for social cards and plain images for post listings. Here’s an example of the branded social card image from my first post.\n\n\n\n\n\nThe branded social card image for my first post. Copy the post link into a tweet to see it in action."
}, { "objectID": "posts/2021-06-19_distill/index.html#utterances-comments", "href": "posts/2021-06-19_distill/index.html#utterances-comments", "title": "What’s he building in there?", "section": "Utterances Comments", - "text": "Utterances Comments\nDistill only supports Disqus comments officially. I did not want to use Disqus comments on Tidy Tales because it would add bloat to my posts, and because I do not want a third-party data mining and tracking Tidy Tales readers. Utterances is a lightweight alternative that uses GitHub issues for comments. Miles McBain shared an HTML script on his blog to add Utterances to a distill blog.\nHere is what the script for Tidy Tales looks like.\n\nThe script uses JavaScript to inject the Utterances