LaTeX: Document links are not created properly #64

Open
J-Moravec opened this issue Dec 16, 2024 · 3 comments

Comments

@J-Moravec

Expected behaviour that works in HTML:

  1. Labels are automatically created for headings (#, ##, ###)
  2. Labels can be linked simply with [name](#label)

However, both of these don't work in LaTeX:

  1. Labels are not automatically created
  2. When forced (using ## heading {#foo}), links to them still don't work.

After inspecting the generated .tex file (with keep_tex: true), I found the following:

  1. No label is created automatically
  2. When forced, the label is created with \label
  3. The link is created with \hyperlink

The problem seems to be that \hyperlink doesn't work with \label and requires \hypertarget (see here). Replacing \hyperlink with \hyperref might work, but that still doesn't solve the problem of the label not being created in the first place.
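To make the mismatch concrete, here is a minimal sketch of the two pairings that hyperref resolves: \hyperlink with \hypertarget, and \hyperref[...] with \label. This is not what litedown currently emits; the section names and the target names foo and bar are made up for illustration.

```latex
\documentclass{article}
\usepackage{hyperref}
\begin{document}

% Pairing 1: \hyperlink{foo}{...} jumps to a \hypertarget with the same name.
% Wrapping the heading in \hypertarget is one way to place the anchor on it.
\hypertarget{foo}{%
\subsection{Foo}}

A link via \hyperlink{foo}{Foo section}.

% Pairing 2: \hyperref[bar]{...} resolves against a \label instead.
\subsection{Bar}\label{bar}

A link via \hyperref[bar]{Bar section}.

\end{document}
```

Mixing the two (a \label on the heading but a \hyperlink in the text, as in the generated file) leaves the link without a destination.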

MRE:

rmd:

---
output:
    latex:
        latex_engine: pdflatex
        keep_tex: true
---

## Foo

This should create link to the [Foo section](#foo)

tex:

\documentclass[]{article}
\usepackage[T1]{fontenc}
\usepackage{graphicx,hyperref}





\begin{document}




\subsection{Foo}

This should create link to the \protect\hyperlink{foo}{Foo section}



\end{document}

<-- that's a looot of whitespace

(btw. I thought that adding additional output html would create both files as per documentation, but it seems that it doesn't?)

@yihui
Owner

yihui commented Dec 17, 2024

Labels are not automatically created for headings when the output format is latex. I feel the implementation will be ugly since LaTeX documents are much harder to parse than HTML. If you want to try it, please feel free to submit a PR. At the moment, if you want to cross-reference a heading, you have to manually assign an ID to it, which will be used to generate the \label{}. Then you can use the syntax @sec:ID to reference it. Full documentation is at https://yihui.org/litedown/#sec:cross-references

---
output:
    latex:
        latex_engine: pdflatex
        keep_tex: true
---

## Foo {#sec:foo}

This should create link to Section @sec:foo.

> <-- that's a looot of whitespace

If you don't like the blank lines, you can provide your own template. The blank lines are caused by the empty values of variables in the default template (https://github.com/yihui/litedown/blob/main/inst/resources/litedown.latex). On the other hand, these blank lines are harmless.

> (btw. I thought that adding additional output html would create both files as per documentation, but it seems that it doesn't?)

No, only the first output format is used by default. If you need both .tex and .html output, you have to call fuse() or mark() twice.

@J-Moravec
Author

> Labels are not automatically created for headings when the output format is latex. I feel the implementation will be ugly since LaTeX documents are much harder to parse than HTML. If you want to try it, please feel free to submit a PR.

Yeah, I will try to cook something up. I feel that this is a basic feature of Markdown that should be supported without any special hullabaloo.

Without having looked at the code, I am thinking about building a database of the links in the document and then creating labels as required. I will report back in a few days on whether this idea pans out.

@yihui
Owner

yihui commented Dec 18, 2024

The relevant code to add automatic IDs to headings in HTML is here:

litedown/R/utils.R

Lines 695 to 708 in 3241888

# add auto identifiers to headings
auto_identifier = function(x) {
  r = '<(h[1-6])([^>]*)>(.+?)</\\1>'
  match_replace(x, r, function(z) {
    z1 = sub(r, '\\1', z)  # tag
    z2 = sub(r, '\\2', z)  # attrs
    z3 = sub(r, '\\3', z)  # content
    i = !grepl(' id="[^"]*"', z2)  # skip headings that already have IDs
    p = ifelse(z1 == 'h1', 'chp:', 'sec:')  # h1 is chapter; h2+ are sections
    id = unique_id(paste0(p[i], alnum_id(z3[i])), 'section')
    z[i] = sprintf('<%s id="%s"%s>%s</%s>', z1[i], id, z2[i], z3[i], z1[i])
    z
  })
}

The complications with LaTeX are at least: 1) we can't reliably detect headings by regular expressions, e.g., \section{} may appear in a verbatim environment; 2) LaTeX output may be hard-wrapped (see the width argument on ?commonmark::markdown_latex), so a heading might span across multiple lines. These problems don't exist for HTML output.
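Both complications can be seen in a small, made-up .tex file. The headings and the regular expression mentioned in the comments are hypothetical and only illustrate the point; they are not taken from litedown.

```latex
\documentclass{article}
\begin{document}

% 1) A real heading that has been hard-wrapped onto two lines. A pattern
%    such as \\section\{(.+)\} applied line by line would not match it.
\section{A fairly long heading that the writer has
wrapped onto a second line}

% 2) Not a heading at all: the line below is printed literally, yet the
%    same pattern would treat it as a section.
\begin{verbatim}
\section{This is only example code}
\end{verbatim}

\end{document}
```

Handling both cases correctly means tracking environments and balancing braces across lines, which is more than a single regular expression can do.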

One possible way to solve this problem is to add IDs to the Markdown source, e.g., turn ## Hello world into ## Hello world {#sec:hello-world}. You would have to parse the Markdown via commonmark::markdown_xml() to find all headings.

Anyway, personally I feel it may not be worth the effort, but I am open to discussion. I tend to manually add IDs to headings (like this) even though I only need HTML output and not LaTeX. Manual IDs have two advantages: 1) They are stable (not subject to changes to heading text); 2) They can be terse (I don't want references like @sec:a-long-auto-generated-id-from-a-long-heading).
