
Converts the Oracle Commerce (Endeca) thesaurus file from XML to JSON for easier migration.


heyomi/endeca-migrate-thesaurus


The task:

I need the thesaurus, stop words, phrases, and redirects from the previous system transferred to the new one. Rather than migrating them by hand, we wrote a script that performs the following conversions:

Sample one-way entry:

#!xml

<THESAURUS_ENTRY_ONEWAY>
	<THESAURUS_FORM_FROM>bed bugs</THESAURUS_FORM_FROM>
	<THESAURUS_FORM_TO>insect spray</THESAURUS_FORM_TO>
	<THESAURUS_FORM_TO>mattress covers</THESAURUS_FORM_TO>
</THESAURUS_ENTRY_ONEWAY>
#!json

{
    "searchTerms": "bed bugs",
    "synonyms": [
        "insect spray",
        "mattress covers"
    ],
    "type": "one-way"
}

Sample two-way (multi-way) entry:

#!xml

<THESAURUS_ENTRY>
	<THESAURUS_FORM>television</THESAURUS_FORM>
	<THESAURUS_FORM>tv</THESAURUS_FORM>
</THESAURUS_ENTRY>
#!json

{
    "synonyms": [
        "television",
        "tv"
    ],
    "type": "multi-way"
}
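
The two mappings above can be sketched in Python with the standard library's `xml.etree.ElementTree`. This is a minimal illustration and not necessarily how `app.py` does it; the function name `convert_entry` is hypothetical, and the sketch assumes each entry element contains only the child tags shown in the samples.

```python
import xml.etree.ElementTree as ET


def convert_entry(elem):
    """Map one thesaurus XML element to a dict shaped like the JSON samples.

    Hypothetical helper: assumes the element is either a
    THESAURUS_ENTRY_ONEWAY or a THESAURUS_ENTRY, as in the samples.
    """
    if elem.tag == "THESAURUS_ENTRY_ONEWAY":
        return {
            # The single "from" form becomes the search term.
            "searchTerms": elem.findtext("THESAURUS_FORM_FROM", default="").strip(),
            # Every "to" form becomes a synonym, in document order.
            "synonyms": [
                to.text.strip()
                for to in elem.findall("THESAURUS_FORM_TO")
                if to.text
            ],
            "type": "one-way",
        }
    if elem.tag == "THESAURUS_ENTRY":
        return {
            # All forms are equivalent, so they are all listed as synonyms.
            "synonyms": [
                form.text.strip()
                for form in elem.findall("THESAURUS_FORM")
                if form.text
            ],
            "type": "multi-way",
        }
    raise ValueError("Unexpected element: " + elem.tag)


# Example using the two-way sample from above.
sample = ET.fromstring(
    "<THESAURUS_ENTRY>"
    "<THESAURUS_FORM>television</THESAURUS_FORM>"
    "<THESAURUS_FORM>tv</THESAURUS_FORM>"
    "</THESAURUS_ENTRY>"
)
print(convert_entry(sample))
```

Raising on an unknown tag is a choice made for this sketch so that unsupported entry types surface early instead of being silently dropped.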

Final output:

#!json

{
    "ecr:createDate": "2016-08-26T10:52:18.211-05:00",
    "thesaurus-entries": [
        {
            "searchTerms": "bed bugs",
            "synonyms": [
                "insect spray",
                "mattress covers"
            ],
            "type": "one-way"
        },
        {
            "synonyms": [
                "television",
                "tv"
            ],
            "type": "multi-way"
        }
    ],
    "ecr:type": "thesaurus"
}
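
As a rough sketch of how the final document could be assembled, the snippet below wraps a list of already-converted entries in the envelope shown above. The function name `build_thesaurus_document` and its `created` parameter are hypothetical, not taken from `app.py`. The timestamp format is inferred from the sample (ISO 8601 with milliseconds and a UTC offset).

```python
import json
from datetime import datetime, timedelta, timezone


def build_thesaurus_document(entries, created=None):
    """Wrap converted entries in the envelope shown in the final output.

    Hypothetical helper: `created` must be a timezone-aware datetime;
    when omitted, the current local time is used.
    """
    if created is None:
        created = datetime.now(timezone.utc).astimezone()
    return {
        # Formats like 2016-08-26T10:52:18.211-05:00 for an aware datetime.
        "ecr:createDate": created.isoformat(timespec="milliseconds"),
        "thesaurus-entries": list(entries),
        "ecr:type": "thesaurus",
    }


# Example: reproduce the document above with a fixed timestamp.
entries = [
    {
        "searchTerms": "bed bugs",
        "synonyms": ["insect spray", "mattress covers"],
        "type": "one-way",
    },
    {"synonyms": ["television", "tv"], "type": "multi-way"},
]
stamp = datetime(2016, 8, 26, 10, 52, 18, 211000,
                 tzinfo=timezone(timedelta(hours=-5)))
document = build_thesaurus_document(entries, created=stamp)
print(json.dumps(document, indent=4))
```

Passing the timestamp in explicitly keeps the output reproducible, which helps when comparing a converted file against an expected result.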

TEST:

python app.py data-in/SAMPLE.thesaurus.xml -o -p

Notes:

  • Location of the AppName thesaurus XML:
/endeca/apps/AppName/config/pipeline/AppName.thesaurus.xml
  • To export thesaurus entries:
runcommand[.sh|.bat] IFCR exportContent thesaurus /endeca/apps/AppName/config/import/thesaurus true
  • To import thesaurus entries:
runcommand[.sh|.bat] IFCR importContent thesaurus /endeca/apps/AppName/config/import/thesaurus.zip
  • Old version:
/PlatformServices/6.1.0/bin/emgr_update --host localhost:8006 --action get_ws_settings --prefix appPrefix --dir ../ --app_name AppName
