This repository has been archived by the owner on Feb 27, 2024. It is now read-only.

Rewrite links #2

Open
sminnee opened this issue Nov 12, 2012 · 6 comments

Comments

@sminnee
Owner

sminnee commented Nov 12, 2012

Internal links within the site should be rewritten to point to the imported SilverStripe pages.

@sminnee
Owner Author

sminnee commented May 30, 2013

This is partially completed (there is a build task) but I think it's buggy.

@sminnee
Owner Author

sminnee commented Jun 10, 2013

It sounds like @phptek is going to finish this off.

Most of the code is in StaticSiteRewriteLinksTask. If my assessment is correct, the main source of bugs is that it's not clear which page a link should point to when the same site has been imported multiple times.

StaticSiteDataExtension defines a StaticSiteURL field. To provide a robust tool, I think StaticSiteURL should be unique within a single StaticSiteContentSource (StaticSiteContentSource is also a has_one relation added by StaticSiteDataExtension).

So, before importing a page, run something like the following. It would need to be abstracted out for different content types, and the query would be built using the ORM, but you get the idea:

 UPDATE SiteTree SET StaticSiteURL = NULL WHERE StaticSiteURL = 'this-url-im-about-to-add' AND StaticSiteContentSourceID = CurrentlyImportedContentSourceID
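
Here is a minimal sketch of that reset step in Python, using the standard-library sqlite3 module and a toy SiteTree table in place of the SilverStripe ORM (the module itself is PHP). The helper name reset_static_site_url is hypothetical. Only the table and column names come from the query above.

```python
import sqlite3


def reset_static_site_url(conn, url, source_id):
    # Hypothetical helper: clear StaticSiteURL on any page from the same
    # content source that already claims `url`, so the page about to be
    # imported becomes the only one holding it.
    cur = conn.execute(
        "UPDATE SiteTree SET StaticSiteURL = NULL "
        "WHERE StaticSiteURL = ? AND StaticSiteContentSourceID = ?",
        (url, source_id),
    )
    return cur.rowcount  # how many earlier pages lost the URL


# Toy table standing in for SiteTree; only the relevant columns are modelled.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SiteTree (ID INTEGER PRIMARY KEY, "
    "StaticSiteURL TEXT, StaticSiteContentSourceID INTEGER)"
)
insert = "INSERT INTO SiteTree (StaticSiteURL, StaticSiteContentSourceID) VALUES (?, ?)"
# Two earlier imports of /about from source 1, one from source 2.
conn.executemany(insert, [("/about", 1), ("/about", 1), ("/about", 2)])

cleared = reset_static_site_url(conn, "/about", 1)  # run before the import
conn.execute(insert, ("/about", 1))                 # the freshly imported page
```

Because the UPDATE is scoped to the content source, pages imported from a different source keep their URLs.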

The other issue I ran into is that a lot of URLs couldn't be rewritten. I'm not sure whether this is due to trivial differences that should be detected, such as case or character escaping. I'd aggregate the "link couldn't be rewritten" warnings into a single list so we can dig into what's going on with them.
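
One way to catch those trivial differences is to compare links on a normalised path and collect everything that still fails to match in a single dictionary. Here is a sketch in Python under stated assumptions. The names normalise and rewrite_links are hypothetical. The normalisation rules (decode percent-escapes, lower-case, drop the trailing slash) are guesses, and lower-casing is only safe if the source site treats paths case-insensitively.

```python
from collections import defaultdict
from urllib.parse import unquote, urlsplit


def normalise(url):
    # Assumed canonical form: keep only the path (host and query are dropped),
    # decode percent-escapes so "%20" and a literal space compare equal,
    # lower-case it, and strip any trailing slash.
    path = unquote(urlsplit(url).path).lower().rstrip("/")
    return path or "/"


def rewrite_links(links, page_map):
    # page_map: normalised URL -> imported page ID.
    rewritten = {}
    failures = defaultdict(list)  # normalised URL -> original spellings seen
    for link in links:
        key = normalise(link)
        if key in page_map:
            rewritten[link] = page_map[key]
        else:
            failures[key].append(link)
    return rewritten, dict(failures)


pages = {normalise(u): pid for u, pid in [("/About-Us/", 10), ("/news/Latest%20Items", 11)]}
done, missing = rewrite_links(
    ["/about-us", "/News/Latest Items", "/contact", "/Contact/"], pages
)
```

Grouping failures by their normalised form shows at a glance when several spellings of the same missing URL are involved.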

@phptek
Contributor

phptek commented Jun 23, 2013

OK, so to summarise what @sminnee has said above, here's what I'll do as a base to work from:

  • Test and fix existing URL rewriting task (SiteTree specific)
    • Run the UPDATE query (see above) via the ORM before the import runs
    • Add an error message when links cannot be rewritten
    • Write a CMS report listing imported pages that contain un-rewritten links
  • Based on data from the above, improve the "hit rate" and re-test
  • Replicate a similar process for asset shortcodes for imported File and Image objects (see Import files & images to assets #1)

@phptek
Contributor

phptek commented Jul 9, 2013

State of Play

  • File and image links are now being re-written
  • Added comprehensive logging to the link-rewrite BuildTask, complete with an initial summary of where failures lie:
    • Bad import
    • Junk links
    • Un-rewritable URL schemes (e.g. mailto:, tel:, etc.) or links that have already been rewritten
    • External URLs (those not matching $baseURL)
  • Moved common logging into a new StaticSiteUtils class, along with a resetStaticSiteURLs() method (as suggested by @sminnee)
  • resetStaticSiteURLs() calls have been added to StaticSiteFileTransformer and StaticSitePageTransformer but are commented out for now, as I haven't had time to check whether they are needed
  • @stojg has investigated further, using my link-rewrite branch, why so many import failures occur. These failures have obvious side effects here and cause link-rewrite failures. The MOSS-CMS we're scraping seems to treat URLs containing literal spaces and URL-encoded spaces differently: some return a 400 error and then redirect to a canonical URL, while others don't.
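
The sketch below, in Python for illustration, sorts a link that couldn't be rewritten into the four buckets above. BASE_HOST is a hypothetical stand-in for the host part of $baseURL. The bucket names and rules are assumptions and do not reproduce the BuildTask's actual logic.

```python
from urllib.parse import urlsplit

BASE_HOST = "www.example.com"  # hypothetical stand-in for the host in $baseURL


def classify_failure(link, imported_paths):
    """Return the failure bucket for a link, or None if it looks rewritable.

    Illustrative rules only, not the BuildTask's actual logic.
    """
    link = link.strip()
    if not link or link.startswith("#"):
        return "junk"  # empty links and bare fragments
    if link.startswith("["):
        # Assumed: already-rewritten links are stored as shortcodes,
        # e.g. SilverStripe's [sitetree_link,id=12].
        return "unrewritable_scheme"
    parts = urlsplit(link)
    if parts.scheme and parts.scheme not in ("http", "https"):
        return "unrewritable_scheme"  # mailto:, tel:, javascript:, etc.
    if parts.netloc and parts.netloc.lower() != BASE_HOST:
        return "external"
    if parts.path not in imported_paths:
        return "bad_import"  # internal link whose target was never imported
    return None
```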

Further investigation and work should occur next week.

@phptek
Contributor

phptek commented Mar 26, 2014

A large number of changes, refactorings, bug fixes and tests have been added to my fork (https://github.com/phptek/silverstripe-staticsiteconnector).

Link rewriting is much more effective now that each import is identified by an ID. The ID can be passed to the task, so it "knows" which duplicate to modify.
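
The disambiguation can be sketched in Python, assuming each imported page records the ID of the import run that created it. ImportedPage, import_id and build_lookup are hypothetical names, not the fork's API. The lookup is built only from pages belonging to the requested import, so a URL imported several times resolves to exactly one page.

```python
from dataclasses import dataclass


@dataclass
class ImportedPage:
    page_id: int
    static_site_url: str
    import_id: int  # hypothetical: ID of the import run that created the page


def build_lookup(pages, import_id):
    # Index only pages from the given import run; within one run a URL
    # should appear once, so a repeat signals a bad import.
    lookup = {}
    for page in pages:
        if page.import_id != import_id:
            continue
        if page.static_site_url in lookup:
            raise ValueError(
                f"duplicate URL within import {import_id}: {page.static_site_url}"
            )
        lookup[page.static_site_url] = page.page_id
    return lookup


pages = [
    ImportedPage(1, "/about", 1),  # first import of the site
    ImportedPage(2, "/about", 2),  # second import: same URL, new page
    ImportedPage(3, "/news", 2),
]
latest = build_lookup(pages, 2)
```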

phptek referenced this issue in phptek/silverstripe-staticsiteconnector Apr 8, 2014
@phptek
Contributor

phptek commented May 2, 2014

Update: this can now optionally be run automatically via the UI after an import. I have also added a DataObject-driven CMS report, derived from data gathered during this task. It shows in detail which links failed to be rewritten, breaks them down by type, and provides a per-imported-page count of each.
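
The report's breakdown (totals per failure type, plus a per-page count of each) amounts to a grouped count. Here is a minimal in-memory sketch in Python. The failure_report function is hypothetical and stands in for the DataObject-backed storage, and the failure-type labels are illustrative.

```python
from collections import Counter, defaultdict


def failure_report(failures):
    # failures: iterable of (page_title, failure_type) pairs, one per link
    # that could not be rewritten.
    totals = Counter()
    per_page = defaultdict(Counter)
    for page, kind in failures:
        totals[kind] += 1
        per_page[page][kind] += 1
    return totals, per_page


totals, per_page = failure_report([
    ("Home", "external"),
    ("Home", "junk"),
    ("Home", "external"),
    ("Contact", "unrewritable_scheme"),
])
```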
