Skip to content

Latest commit

 

History

History
109 lines (85 loc) · 1.83 KB

README.md

File metadata and controls

109 lines (85 loc) · 1.83 KB

The task:

I need the thesaurus, stop words, phrases and redirects from the previous system transferred over to the new one. Instead of taking the manual approach, we decided to create a script to do the following:

Sample one way:

#!xml

<THESAURUS_ENTRY_ONEWAY>
	<THESAURUS_FORM_FROM>bed bugs</THESAURUS_FORM_FROM>
	<THESAURUS_FORM_TO>insect spray</THESAURUS_FORM_TO>
	<THESAURUS_FORM_TO>mattress covers</THESAURUS_FORM_TO>
</THESAURUS_ENTRY_ONEWAY>
#!json

{
    "searchTerms": "bed bugs",
    "synonyms": [
    	"insect spray", 
    	"mattress covers"
	],
    "type": "one-way"
}

Sample two way:

#!xml

<THESAURUS_ENTRY>
	<THESAURUS_FORM>television</THESAURUS_FORM>
	<THESAURUS_FORM>tv</THESAURUS_FORM>
</THESAURUS_ENTRY>
#!json

{
    "synonyms": [
        "television",
        "tv"
    ],
    "type": "multi-way"
}

Final output:

#!json

{
    "ecr:createDate": "2016-08-26T10:52:18.211-05:00",
    "thesaurus-entries": [
	    {
		    "searchTerms": "bed bugs",
		    "synonyms": [
		    	"insect spray", 
		    	"mattress covers"
			],
		    "type": "one-way"
		},
		{
		    "synonyms": [
		        "television",
		        "tv"
		    ],
		    "type": "multi-way"
		}
    ],	
    "ecr:type": "thesaurus"
}

TEST:

python app.py data-in/SAMPLE.thesaurus.xml -o -p

Notes:

  • AppName thesaurus xml:
/endeca/apps/AppName/config/pipeline/AppName.thesaurus.xml
  • To export thesaurus entires:
runcommand[.sh|.bat] IFCR exportContent thesaurus /endeca/apps/AppName/config/import/thesaurus true
  • To import thesaurus entries:
runcommand[.sh|.bat] IFCR importContent thesaurus /endeca/apps/AppName/config/import/thesaurus.zip
  • Old version
/PlatformServices/6.1.0/bin/emgr_update --host localhost:8006 --action get_ws_settings --prefix appPrefix --dir ../ --app_name AppName