Page not imported? #18
I have manually copied the page and converted it to Markdown: https://github.com/delph-in/docs/wiki/CambridgeSEM-I. You are right, my code must have missed that page, either because of the hyphen-like character in the name or because in MoinMoin we had two very similar pages:
The first was deleted, so maybe I did something wrong. Thank you for opening the issue. I fixed the links in https://github.com/delph-in/docs/wiki/CambridgeSchedule and https://github.com/delph-in/docs/wiki/RmrsDiscussions (needs some editing to improve the formatting).
Not sure at this stage how I can check that. In the dump from MoinMoin I have 1266 pages:
In the current wiki we have 1057 pages:
But many MoinMoin pages were intentionally removed:
We do have some weird names in the new wiki, but the content looks right:
In the MoinMoin dump we have
But some are garbage in MoinMoin; see the last two, whose content is an image. Many pages were correctly imported but renamed from
One more case similar to the CambridgeSEM-I page:
I have just manually created https://github.com/delph-in/docs/wiki/ToolsTop_converter. |
Pages
the first was deleted, the second is protected in MoinMoin. So I removed them from here:
|
Help needed! Can someone see any important page in the lists above that is not in the current wiki? |
Pages
were duplicated (related to #25). I fixed the name and merged the contents into https://github.com/delph-in/docs/wiki/SaabruckenTop. |
I think all those
>>> chr(int('2d', 16))  # convert base-16 int to character
'-'
>>> chr(int('2f', 16))
'/'
Although the one for SaarbrückenTop is strange:
>>> chr(int('c3bc', 16))
'쎼'
>>> hex(ord('ü'))  # going the other way
'0xfc'
Then the
>>> chr(int('28', 16))
'('
>>> chr(int('29', 16))
')'
It looks like all the ones with only (2f) (
$ cat moin.txt | sed -e 's/(2f)/_/g' -e 's/(2d)/-/g' -e 's/$/.md/' > moin-norm.txt
$ ls | grep "[^a-zA-Z0-9.]" | sort > current.txt
Then I can find which ones are not already ported:
$ comm -2 -3 moin-norm.txt current.txt  # compare the two lists, showing only lines unique to moin-norm.txt
It produces the following list, which I have manually sorted and annotated:
# System pages (I'm just guessing for the non-English titles)
(28)c396(29)nskadeSidor.md
(28)c396(29)vergivnaSidor.md
(28)c398(29)nskedeSider.md
Aktuelle(28)c384(29)nderungen.md
Aktuelle(c384)nderungen.md
Anv(28)c3a4(29)ndarInst(28)c3a4(29)llningar.md
Anv(c3a4)ndarInst(c3a4)llningar.md
(c396)nskadeSidor.md
(c396)vergivnaSidor.md
(c398)nskedeSider.md
ChangementsR(28)c3a9(29)cents.md
ChangementsR(c3a9)cents.md
F(28)c3b6(29)r(28)c3a4(29)ldrarl(28)c3b6(29)saSidor.md
F(c3b6)r(c3a4)ldrarl(c3b6)saSidor.md
For(28)c3a6(29)ldrel(28)c3b8(29)seSider.md
For(c3a6)ldrel(c3b8)seSider.md
MoinMoin_InstallationsAnleitung.md
MoinMoin_InstallDocs.md
MoinMoin_TextFormatting.md
PageAl(28)c3a9(29)atoire.md
PageAl(c3a9)atoire.md
PagesAbandonn(28)c3a9(29)es.md
PagesAbandonn(c3a9)es.md
PagesSouhait(28)c3a9(29)es.md
PagesSouhait(c3a9)es.md
Pr(28)c3a9(29)f(28)c3a9(29)rencesUtilisateur.md
Pr(c3a9)f(c3a9)rencesUtilisateur.md
S(28)c3b6(29)kSida.md
S(c3b6)kSida.md
SeitenGr(28)c3b6c39f(29)e.md
SeitenGr(c3b6c39f)e.md
Senaste(28)c384(29)ndringar.md
Senaste(c384)ndringar.md
SideSt(28)c3b8(29)rrelse.md
Tilf(28)c3a6(29)ldigSide.md
Tilf(c3a6)ldigSide.md
WikiSandL(28)c3a5(29)da.md
WikiSandL(c3a5)da.md
# Personal pages or accidental (?) pages
https(3a2f2f)students(2e)washington(2e)edu_olzama_ge.md
LtgOslo_Cristin.md
Tu(28)e1baa5(29)nAnhL(28)c3aa(29).md
Tu(e1baa5)nAnhL(c3aa).md
venue(28)2d(29)map(28)2e(29)png.md
venue-map(2e)png.md
Singapore(28)20(29)Top.md # see SingaporeTop
# Other duplicates from bad escaping
4(28)2d(29)16_Meeting_Notes.md
CambridgeSEM(28)2d(29)I.md
LtgOslo_Hank(28)c3b8(29).md
ToolsTop_converter(28)2e(29)html.md
WeSearch_Hank(28)c3b8(29)Schedule.md
WeSearch_Hank(28)c3b8(29)TheRest.md
# Potentially good pages; some already converted
4-16_Meeting_Notes.md
ClarinoTop_RelatedWork.md
ClarinoTop_RequirementsSurvey.md
ClarinoTop_TechnologySurvey.md
ErgProcessing_ExportExample.md
ErgSemantics_Fundamentals.md
ErgSemantics_NonScopalModifiers.md
ErgSemantics_RunOnConstruction.md
ItsdbTreebanking_ItsdbTrouble.md
KyotoTop_InterWiki.md
LapDevelopment_Abel.md
LapDevelopment_SeverDeployment.md
LapDevelopment_Tasks.md
LogonInstallation_CvsBasics.md
LogonInstallation_InstallationBasics.md
LogonMrs_InformationStructure.md
LogonMrs_MessageRelations.md
LtgOslo_Hank(c3b8).md # LtgOslo/Hankø
MatrixDoc_WhQ.md
ToolsTop_converter(2e)html.md # wiki actually had ".html" in the title; already imported as ToolsTop_converter
WeSearch_Berlin.md
WeSearch_Demonstrator.md
WeSearch_FeforTopics.md
WeSearch_Hank(c3b8)Schedule.md # WeSearch/HankøSchedule
WeSearch_Hank(c3b8)TheRest.md # WeSearch/HankøTheRest
WeSearch_Interface.md
WeSearch_PestExamples.md
WeSearch_RealisticTextParsing.md
WeSearch_StarSem_MrsCrawling.md
WeSearch_SuperTagging.md
WeSearch_Tokenization.md
WeSearch_TripleStores.md
WeSearch_UberTagging.md |
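The "strange" result for `c3bc` above has a likely explanation: `c3 bc` is the two-byte UTF-8 encoding of `ü`, so each parenthesized group reads as a run of UTF-8 bytes and not as a single code point. The sketch below decodes names on that assumption, and repeats the pass so that doubly escaped names such as `Hank(28)c3b8(29)` resolve as well. The function names (`unquote_once`, `unquote`, `only_in_first`) are hypothetical helpers for illustration, not part of MoinMoin or of the migration scripts.

```python
import re

# One escape group: parentheses around one or more pairs of hex digits.
_ESCAPE = re.compile(r"\(((?:[0-9a-fA-F]{2})+)\)")


def unquote_once(name):
    """Decode each '(hex)' group as a run of UTF-8 bytes (assumed encoding)."""
    def _decode(match):
        return bytes.fromhex(match.group(1)).decode("utf-8", errors="replace")
    return _ESCAPE.sub(_decode, name)


def unquote(name, max_passes=3):
    """Decode repeatedly, so names that were escaped twice are resolved too."""
    for _ in range(max_passes):
        decoded = unquote_once(name)
        if decoded == name:
            break
        name = decoded
    return name


def only_in_first(first, second):
    """Names in `first` that are absent from `second`, like `comm -2 -3`."""
    return sorted(set(first) - set(second))


print(unquote("CambridgeSEM(2d)I"))                  # CambridgeSEM-I
print(unquote("LtgOslo(2f)Hank(c3b8)"))              # LtgOslo/Hankø
print(unquote("WeSearch_Hank(28)c3b8(29)Schedule"))  # WeSearch_HankøSchedule
```

Repeated decoding would also rewrite a page title that legitimately contains literal text like `(2d)`, so the output is best treated as a hint for manual review of the list above.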
Thank you @goodmami, yes. As you noticed, I have already fixed many of the cases above. |
(edited) The case of See
|
That last comment looks like one by @oepen and at a guess we decided to delete the page/merge the content elsewhere. |
One more crazy page is
This new file is not a big problem: it is empty, and even if it generates an empty page here, we can easily delete it. The original page
So for me, nothing is wrong here: the page does not exist in http://moin.delph-in.net/wiki/OsloScopalNonScopal?action=fullsearch&context=180&value=notes&titlesearch=Titles, one extra clue that it was deleted. The content of rev 0000003 looks like a draft anyway:
But then in the current wiki I found https://github.com/delph-in/docs/wiki/notes. The name is not very informative, and it looks like a duplicate of https://github.com/delph-in/docs/wiki/OsloScopalNonScopal, but they are not identical. So I found in the
and this same message in the history of the http://moin.delph-in.net/wiki/OsloScopalNonScopal?action=info. So
During the process, to preserve the history of the changes, the migration process created the |
I realize that it was a mistake on my side not to have detected all these details during the migration. I am sorry for that. But no content was lost: I do have the dump, and we do have MoinMoin running in read-only mode. I still believe that for the majority of the pages, the final result is fine. So maybe we just need to be aware of those problems and try to solve the issues as we find them? |
The migration is such a huge job, @arademaker! Thank you for taking it on. I think that notes.md file was indeed spurious, and I see that OsloScopalNonScopal has survived the transition. It's too bad that the 'delete' actions aren't apparent (at least as far as I can tell) in the migrated data. |
The deletion of
The good news is that we do have a way to know all pages in MoinMoin that we renamed:
|
Is it possible to tell which pages were deleted during the MoinMoin days, though? |
Hmm, yes. For pages that are actually deleted, MoinMoin represents the deletion by increasing the version number without creating an actual revision in the proper subdirectory. Each page is represented as:
So if a page is deleted, the content of the
So the list of pages DELETED in MoinMoin is below. The renamed ones are not here:
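The deletion check described above can be sketched as follows. The directory layout is not preserved in this thread, so the sketch assumes that each page directory holds a `current` file naming the latest revision number and a `revisions/` subdirectory with one file per saved revision; a page then counts as deleted when `current` names a revision that has no file. The names `is_deleted` and `deleted_pages` are hypothetical.

```python
from pathlib import Path


def is_deleted(page_dir):
    """True if 'current' names a revision that has no file under 'revisions/'.

    Assumed layout: <page_dir>/current and <page_dir>/revisions/<number>.
    """
    page_dir = Path(page_dir)
    current = page_dir / "current"
    if not current.is_file():
        return False  # no pointer file: not a regular page directory
    revision = current.read_text().strip()
    return not (page_dir / "revisions" / revision).is_file()


def deleted_pages(pages_root):
    """Sorted names of page directories under `pages_root` that look deleted."""
    return sorted(
        entry.name
        for entry in Path(pages_root).iterdir()
        if entry.is_dir() and is_deleted(entry)
    )
```

Calling `deleted_pages` on the `pages` directory of a dump would give a candidate list to compare against the wiki, for example to confirm that a page missing here was already deleted in MoinMoin.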
|
I see ErgSemantics(2f)NonScopalModifiers there, confirming our decision to delete it in the github wiki. |
Ah, I now see your point @emilymbender. My #18 (comment) was wrong (I just edited it). The page |
The page http://moin.delph-in.net/wiki/LkbLexDb
But page https://github.com/delph-in/docs/wiki/LkbLexDB
This is very weird, since the page in this wiki is older than the page in the original frozen MoinMoin installation. The contents differ too. In the dump, the
|
It looks like this page didn't get imported:
http://moin.delph-in.net/wiki/CambridgeSEM-I
It's world readable, so I wonder if the problem is that the page name is a bit odd (has a hyphen) and if so, if there might be other pages that weren't imported.
@arademaker can you import it and also see if maybe there are others?
It also looks like links to the page will need to be updated. I discovered it was missing by looking here:
https://github.com/delph-in/docs/wiki/RmrsDiscussions