Rewrite URLs in imported WXR files to avoid broken navigation links (white screen, errors, nested Playground) #1780

Open
bph opened this issue Sep 18, 2024 · 13 comments
Labels
[Aspect] Networking Blocked [Type] Bug An existing feature does not function as intended

Comments

@bph
Collaborator

bph commented Sep 18, 2024

On this Playground site, I get intermittent success when using the navigation menu. The first page load works; subsequent page loads show a white screen.
The content all works when I go to WP-admin > Pages and use the View link of each page. But sometimes one link works and then the next one doesn't.

The content and blueprint can be viewed in this repo.

Here is a video of me clicking around on the site.

Screen.Recording.2024-09-18.at.13.16.13.mov
@bph
Collaborator Author

bph commented Sep 18, 2024

When I navigate around the site without using the links in the navigation block (archive page, author page, single posts, etc.), the pages load consistently.
When I click one of the links in the navigation bar again, I get a white screen once more.

Screen.Recording.2024-09-18.at.14.51.42.mov

At the end of the video, you can see me click the Home link in the navigation; it actually loads another Playground instance inside the site.

Screenshot 2024-09-18 at 14 52 51

@bgrgicak
Collaborator

I'm wondering if these two could be related #349

This seems like a caching bug to me.
Playground is trying to load the page from cache while it should call PHP.
Screenshot 2024-09-18 at 14 14 58

@adamziel adamziel transferred this issue from WordPress/playground-tools Sep 18, 2024
@bph
Collaborator Author

bph commented Sep 19, 2024

Thank you @adamziel for setting me straight... Glad it was so easy to transfer.

@bph
Collaborator Author

bph commented Sep 19, 2024

The default instance of playground uses a URL like https://playground.wordpress.net/scope:0.5198681762892301/?page_id=2 (TT4, Sample page in Header)

On my site it only has the URL https://playground.wordpress.net/about-us

Is there a way for me to modify the URL in the Navigation space of my .xml file from relative links
<!-- wp:navigation-link {"label":"About Us","type":"page","description":"","id":28,"url":"/about-us/","kind":"post-type"} /-->
To something like https://playground.wordpress.net/scope:{somestring}/about-us?
The line of code is in the XML import, that I modified to remove the original site's absolute links to show only relative links.
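For illustration only, here is a rough Python sketch of the kind of rewrite being asked about: it prefixes root-relative `url` attributes of `wp:navigation-link` blocks with a scoped base URL. The function name `scope_nav_links` and the constants are hypothetical, and the scope value is just the one from the example URL above; a real scope is generated by Playground for each site, so a hard-coded value in an export file would not be expected to work as-is.

```python
import json
import re

# Hypothetical values for illustration; Playground generates the real scope per site.
SITE_BASE = "https://playground.wordpress.net"
SCOPE = "0.5198681762892301"

# Matches a self-closing navigation-link block comment and captures its JSON attributes.
NAV_LINK = re.compile(r"<!-- wp:navigation-link (\{.*?\}) /-->")


def scope_nav_links(markup: str, base: str = SITE_BASE, scope: str = SCOPE) -> str:
    """Prefix root-relative "url" attributes of navigation-link blocks with the scoped base."""

    def rewrite(match: re.Match) -> str:
        attrs = json.loads(match.group(1))
        url = attrs.get("url", "")
        # Only touch root-relative paths such as "/about-us/"; leave absolute URLs alone.
        if url.startswith("/") and not url.startswith("//"):
            attrs["url"] = f"{base}/scope:{scope}{url}"
        compact = json.dumps(attrs, separators=(",", ":"))
        return f"<!-- wp:navigation-link {compact} /-->"

    return NAV_LINK.sub(rewrite, markup)
```

Parsing the attributes as JSON rather than patching the string keeps the other attributes (`label`, `id`, `kind`) untouched.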

@bgrgicak bgrgicak self-assigned this Sep 20, 2024
@bgrgicak bgrgicak moved this from Inbox to In progress in Playground Board Sep 20, 2024
@bgrgicak bgrgicak added [Type] Bug An existing feature does not function as intended [Aspect] Networking labels Sep 20, 2024
@bgrgicak
Collaborator

bgrgicak commented Sep 20, 2024

This is definitely related to scope.

After the first load, the page is /.
When you click on a page like /patterns/ the referer (/) has scope so we avoid caching.
When you click on /news/ the referer (/patterns/) doesn't have scope and it goes to cache.

I think that there is an underlying problem because / gets a scope when used as a referer, while /patterns/ doesn't.
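The behavior described above can be sketched roughly as follows. This is a simplified model of the observed routing, not Playground's actual service worker code; `has_scope` and `route` are hypothetical names, and the scope value is the one from the example URL earlier in this thread.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# A scoped Playground URL has a "/scope:<id>/" path segment, as in the URLs quoted in this thread.
SCOPE_SEGMENT = re.compile(r"^/scope:[^/]+(/|$)")


def has_scope(url: str) -> bool:
    """Return True when the URL path starts with a scope segment."""
    return bool(SCOPE_SEGMENT.match(urlparse(url).path))


def route(request_url: str, referer: Optional[str]) -> str:
    """Return "php" when the request or its referer is scoped, else "cache" (simplified)."""
    if has_scope(request_url) or (referer is not None and has_scope(referer)):
        return "php"
    return "cache"


base = "https://playground.wordpress.net"
# First click: the referer is the scoped front page, so the request reaches PHP.
first = route(base + "/patterns/", base + "/scope:0.5198681762892301/")
# Second click: the referer is the unscoped /patterns/ page, so the request falls to the cache.
second = route(base + "/news/", base + "/patterns/")
```

Under this model, any unscoped link followed from an unscoped page loses the scope for good, which matches the white screens after the first click.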

@bgrgicak
Collaborator

Is there a way for me to modify the URL in the Navigation space of my .xml file from relative links

To something like https://playground.wordpress.net/scope:{somestring}/about-us?
The line of code is in the XML import, that I modified to remove the original site's absolute links to show only relative links.

Great research @bph! You are right about the root cause being imported URLs that aren't rewritten.

I'm not sure what's the best way to address this and will need to work with @adamziel and @brandonpayton on finding possible next steps.

@bgrgicak
Collaborator

bgrgicak commented Sep 20, 2024

It looks like we are attempting to add the scope to the URL if it doesn't exist, but that scope isn't used later by the browser or by our code (I still don't know which).

@bgrgicak
Collaborator

I see a few directions here, but I'm not sure what to do.

  • Rewrite all URLs upfront. In this case, ensure WXR imported URLs have a scope.
  • Find a way to inject scope into URLs after Playground loads.
  • Find another way to store and propagate scope instead of prepending it to the URL.

@bgrgicak bgrgicak moved this from In progress to Needs Triage/Our Reply in Playground Board Sep 20, 2024
@bgrgicak
Collaborator

I'm moving this to blocked until I get some feedback from @WordPress/playground-maintainers.

@bph
Collaborator Author

bph commented Sep 20, 2024

@bgrgicak thank you so much for pushing this forward.

This is actually also a problem when migrating sites to other servers, as absolute links need a search/replace pass. If Playground could do it out of the box, there would be no need for me to modify the original site export file for images and links, and two sections of my tutorial could be cut. 🤔

Seems you have enough information to tackle this. I just want to mention that this is not only a hiccup with the navigation block; it also happens with normal on-page links, as can be seen on the Templates page. Those don't work either.
The markup of the link is:

<li>a <a href="/page-no-title/" data-type="page" data-id="192">page  no title template</a> that allows for a Hero image or a Cover block directly on the top of the page. </li>
<!-- /wp:list-item -->

Screenshot 2024-09-20 at 11 39 49

@adamziel
Collaborator

@bph A proper resolution will take a few months. Is there a way you could ship that block without an absolute URL in the href=""? Maybe a relative one would work? Or maybe the block could handle only having a page ID?

Longer answer:

The imported WXR file contains this code:

<!-- wp:list-item -->
<li>a <a href="/page-no-title/" data-type="page" data-id="192">page  no title template</a> that allows for a Hero image or a Cover block directly on the top of the page. </li>
<!-- /wp:list-item -->

This is not rewritten by the WXR importer we're currently using. I'm not aware of a tool we could use in Playground today that would handle it correctly. I'm planning to fork/build a WXR importer and bake in the URL rewriting using the plumbing we've been exploring for the past year [1] [2]. Once it matures, I'll want to propose it for WordPress core.

[1] https://github.com/adamziel/site-transfer-protocol
[2] adamziel/wxr-normalize#1
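To make the gap concrete, here is a minimal Python sketch of what rewriting such links could look like. It is not the planned importer and not how the linked projects work; `rewrite_hrefs` is a hypothetical helper, and a regular expression is only a stand-in for a proper HTML and block-markup parser, which a real importer would need.

```python
import re

# Matches href="..." values that start with a single "/" (root-relative),
# but not "//" (protocol-relative) and not absolute URLs.
ROOT_RELATIVE_HREF = re.compile(r'href="(/(?!/)[^"]*)"')


def rewrite_hrefs(html: str, site_url: str) -> str:
    """Prefix every root-relative href in the markup with the target site URL."""
    base = site_url.rstrip("/")
    return ROOT_RELATIVE_HREF.sub(lambda m: f'href="{base}{m.group(1)}"', html)


# The scoped site URL below is an assumption for illustration only.
item = '<li>a <a href="/page-no-title/" data-type="page" data-id="192">page no title template</a></li>'
rewritten = rewrite_hrefs(item, "https://playground.wordpress.net/scope:0.5198681762892301/")
```

A regex like this misses links in block comment attributes, `srcset` values, inline CSS, and unquoted or single-quoted attributes, which is part of why a dedicated importer is the longer-term plan.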

@adamziel adamziel moved this from Needs Triage/Our Reply to Blocked in Playground Board Sep 23, 2024
@adamziel adamziel changed the title Intermittent white screen from navigation links Rewrite URLs in imported WXR files to avoid broken navigation links (white screen, errors, nested Playground) Sep 23, 2024
@bph
Collaborator Author

bph commented Sep 27, 2024

@adamziel thanks for looking into this again.

I am a bit confused as to what you see as an absolute link and a relative link.

Maybe a relative one would work?
Isn't <a href="/page-no-title/" a relative link? An absolute link would be something like https://wordpress67.local/page-no-title

@bph
Collaborator Author

bph commented Sep 27, 2024

So for the header navigation, the examples of how the theme Twenty-Twenty-Four works out of the box got me thinking.

If I added all the pages and was deliberate with the page parent selection, the theme's default navigation would probably work with the page list and create the submenus through some voodoo that is built into it. (voodoo = not entirely clear how it works)

So with the v2 blueprint and v2 content, I was able to get this part working.

Screen.Recording.2024-09-27.at.17.37.35.mov

In the video you can see that all links in the top navigation have a scope assigned and load pages from a virtual (or whatever you want to call it) directory. It works because I didn't create a custom navigation block; the automation built into WordPress takes care of it. But it seems Playground already rewrites links and adds scope to the URLs.

Next steps:
Before the next upload

  • get the image references fixed in the *.xml file, and
  • make the page links on the Templates page relative again.

@bgrgicak bgrgicak removed their assignment Nov 1, 2024
adamziel added a commit that referenced this issue Dec 11, 2024
…2058)

## Description

Adds the Data Liberation WXR importer as an option in the `importWxr`
step. The new importer is turned on by including the `"importer":
"data-liberation"` option:

```json
{
  "steps": [
    {
      "step": "importWxr",
      "file": {
        "resource": "url",
        "url": "https://raw.githubusercontent.com/wpaccessibility/a11y-theme-unit-test/master/a11y-theme-unit-test-data.xml"
      },
      "importer": "data-liberation"
    }
  ]
}
```

When the `importer` option is missing or set to "default," nothing
changes in the behavior of the step and it continues using the
https://github.com/humanmade/WordPress-Importer importer.

The new importer:

* Rewrites links in the imported content
* Downloads assets through Playground's CORS proxy
* Parallelizes the downloads
* Communicates progress

This PR is a part of
#1894

## Implementation details

This `importWxr` step fetches and includes the
`data-liberation-core.phar` file. The phar file is built with
[Box](https://box-project.github.io/box/configuration/) and contains the
importer library with its dependencies, which is a subset of the Data
Liberation library, a subset of the Blueprints library, and a few vendor
libraries.

This, unfortunately, means that any changes in the PHP files require
rebuilding the .phar file. Here's how you can do it:

```bash
nx build:phar playground-data-liberation
```

You can also build the entire Data Liberation package as a WordPress
plugin complete with a wp-admin page:

```bash
nx build:plugin playground-data-liberation
```

Both commands will output the built files to
`packages/playground/data-liberation/dist`

The progress updates are a first-class feature of the new importer. The
updated `importWxr` step receives them in real-time via a
`post_message_to_js()` call running after every import step. Then, it
passes them on to the progress bar UI.

### Other changes

* **TLS traffic now goes through the CORS proxy.** Since the new
importer uses `AsyncHTTP\Client` which deals with raw sockets,
Playground's [TLS-based network
bridge](#1926)
runs the outbound traffic through a CORS proxy. Technically,
`TCPOverFetchWebsocket` gets the `corsProxy` URL passed to the
`playground.boot()` call.
* A few composer dependencies were forked, downgraded to PHP 7.2 using
Rector, and bundled with this PR to keep the Data Liberation importer
working.

## Remaining work

- [x] PHP 7.2 compatibility. Done by forking and Rector-downgrading
dependencies that were incompatible with PHP 7.2.
- [x] Report the importer's progress on the overall Blueprint progress
bar
- [x] Enqueue the data liberation plugin files for downloading at the
blueprint compilation stage
- [x] Don't eagerly rewrite attachments URLs in `WP_Stream_Importer`.
Exposing this information to the API consumer requires an explicit
decision. Do we rewrite it? Or do we ignore it?
- [x] Fix the TLS errors at the intersection of Playground network
transport and the async HTTP client library
- [x] Separate the markdown importer and its dependencies (md parser,
frontmatter parser, Symfony libraries) from the core plugin
- [x] Ship the importer and its tree-shaken deps (URL parser) as a
minified zip/phar

## Follow-up work

- [ ] Reconsider the `WP_Import_Session` API – do we need so many
verbosely named methods? Can we achieve the same outcomes with fewer
methods?
- [ ] Investigate why there's a significant delay before media downloads
start on PHP 7.2 – 7.4. It's likely a PHP.wasm issue.

## Testing instructions

* Default importer – [Open this
link](http://localhost:5400/website-server/#{%20%22plugins%22:%20[],%20%22steps%22:%20[%20{%20%22step%22:%20%22importWxr%22,%20%22file%22:%20{%20%22resource%22:%20%22url%22,%20%22url%22:%20%22https://raw.githubusercontent.com/wpaccessibility/a11y-theme-unit-test/master/a11y-theme-unit-test-data.xml%22%20}%20}%20],%20%22preferredVersions%22:%20{%20%22php%22:%20%228.3%22,%20%22wp%22:%20%226.7%22%20},%20%22features%22:%20{%20%22networking%22:%20true%20},%20%22login%22:%20true%20})
and confirm it does what the current `importWxr` step does, that is, it
stays at "Importing content" for a moment, fails to fetch media files
(CORS issues in network tools), but inserts posts and pages.
* Data Liberation – [Open this
link](http://localhost:5400/website-server/#{%20%22plugins%22:%20[],%20%22steps%22:%20[%20{%20%22step%22:%20%22importWxr%22,%20%22importer%22:%20%22data-liberation%22,%20%22file%22:%20{%20%22resource%22:%20%22url%22,%20%22url%22:%20%22https://raw.githubusercontent.com/wpaccessibility/a11y-theme-unit-test/master/a11y-theme-unit-test-data.xml%22%20}%20}%20],%20%22preferredVersions%22:%20{%20%22php%22:%20%228.3%22,%20%22wp%22:%20%226.7%22%20},%20%22features%22:%20{%20%22networking%22:%20true%20},%20%22login%22:%20true%20}),
confirm the import progress is visible and that the content and media
indeed get imported:

![CleanShot 2024-12-08 at 14 54
49@2x](https://github.com/user-attachments/assets/a7da3244-a10f-43d2-8e94-43d305220a7e)
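The two test links above carry the Blueprint as percent-encoded JSON in the URL fragment. As a convenience, a small Python sketch like the following could generate such links. The helper names `blueprint_url` and `decode_blueprint` are hypothetical, and the Blueprint here is trimmed down (it omits `plugins` and `preferredVersions`) compared to the links above.

```python
import json
from urllib.parse import quote, unquote


def blueprint_url(playground_url: str, wxr_url: str, importer: str = "default") -> str:
    """Build a Playground URL whose fragment is a percent-encoded importWxr Blueprint."""
    step = {"step": "importWxr", "file": {"resource": "url", "url": wxr_url}}
    # Leave the option out for the default importer, mirroring the first test link.
    if importer != "default":
        step["importer"] = importer
    blueprint = {"steps": [step], "features": {"networking": True}, "login": True}
    return playground_url + "#" + quote(json.dumps(blueprint), safe=":/,{}[]")


def decode_blueprint(url: str) -> dict:
    """Recover the Blueprint dictionary from a URL produced by blueprint_url()."""
    fragment = url.split("#", 1)[1]
    return json.loads(unquote(fragment))


link = blueprint_url(
    "http://localhost:5400/website-server/",
    "https://raw.githubusercontent.com/wpaccessibility/a11y-theme-unit-test/master/a11y-theme-unit-test-data.xml",
    importer="data-liberation",
)
```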

## Related issues

* #1211 
* #2012 
* #1477 
* #1250 
* #1780
Projects
Status: Blocked
Development

No branches or pull requests

3 participants