support for -custom-uri on import #443

Open
starhound opened this issue Aug 15, 2024 · 5 comments
@starhound

Please add support for specifying a custom URI when importing compressed manuals, or provide a proper way to override the default URI labeling behavior.

I have followed: https://docs.marklogic.com/guide/mlcp-guide/en/importing-content-into-marklogic-server/controlling-database-uris-during-ingestion/transforming-the-default-uri.html

Various configurations do not work. Perhaps my syntax is incorrect; it is not obvious from the documentation which regular expression flavor is used (Perl, PCRE?).

I need to import compressed manuals (all .zip for now, possibly gzip later) and make minor alterations to the URI, because it includes the .zip extension by default.

java.lang.IllegalArgumentException: Invalid option argument for output_uri_replace :Boeing 777 Test Manual.zip,TESTPATH43/Boeing_777_Test_Manual

My filename is Boeing 777 Test Manual.zip and it needs to become /USER_INPUT_ROOT/MANUAL_NAME/<files>.

I have a Python API that acts as a wrapper for MLCP, and it works without issue except for this behavior.

import os
import subprocess

# MLCP and invoke_mlcp() are defined elsewhere in the wrapper
def import_data(database, root_path, files, marklogic_connection):
    # normalize root_path once: ensure a trailing slash, drop a leading slash
    root_path = root_path if root_path.endswith("/") else f"{root_path}/"
    root_path = root_path[1:] if root_path.startswith("/") else root_path
    for file in files:
        # drop only the final extension (split(".")[0] would cut at the first dot),
        # then convert spaces to underscores
        file_name = os.path.splitext(file.filename)[0].replace(" ", "_")
        file_uri = f"{root_path}{file_name}"
        print(f"Importing {file.filename} to {file_uri}")
        cmd = [
            MLCP,
            "import",
            f"-host {marklogic_connection['host']}",
            f"-port {marklogic_connection['port']}",
            f"-database {database}",
            f"-username {marklogic_connection['username']}",
            f"-password {marklogic_connection['password']}",
            "-input_compressed true",
            "-mode local",
            "-base_path /",
            "-input_compression_codec zip",
            "-ssl false",
            f"-input_file_path '/tmp/{file.filename}'",
            f"-output_uri_replace '{file.filename},{file_uri}'"
        ]
        invoke_mlcp(cmd)
        # remove the file from the /tmp directory
        subprocess.run(["rm", f"/tmp/{file.filename}"])

invoke_mlcp() simply runs the provided mlcp bash script in a subprocess.
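How the "flag value" strings above end up as process arguments matters for the quoting. The sketch below is not the actual invoke_mlcp(); it assumes the parts are joined and handed to a POSIX shell, and uses shlex.split to imitate the shell's tokenization. The helper name to_argv and the "mlcp.sh" placeholder are hypothetical.

```python
import shlex

def to_argv(cmd_parts):
    """Tokenize each 'flag value' string the way a POSIX shell would."""
    argv = []
    for part in cmd_parts:
        argv.extend(shlex.split(part))
    return argv

parts = [
    "mlcp.sh",  # placeholder for the MLCP launcher path (assumption)
    "import",
    "-input_file_path '/tmp/Boeing 777 Test Manual.zip'",
    "-output_uri_replace 'Boeing 777 Test Manual.zip,TESTPATH43/Boeing_777_Test_Manual'",
]

argv = to_argv(parts)
# The shell consumes the single quotes, so the program sees the bare value
print(argv[-1])
```

Under that assumption, the last argument arrives with no quotes at all, which is the same text that appears in the IllegalArgumentException message above.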

Thank you

@starhound
Author

A temporary solution for us, and maybe for other users, could be to use the Python client library to update our URIs after uploading.

@starhound
Author

My apologies, you clearly state the syntax here; it is just difficult to find: https://docs.marklogic.com/guide/mlcp-guide/en/introduction-to-marklogic-content-pump/understanding-the-mlcp-command-line/regular-expression-syntax.html

Will continue trying to get my desired functionality.

@starhound
Author

starhound commented Aug 15, 2024

The solution to my issue was described here: https://stackoverflow.com/a/72952214

The -output_uri_replace argument needs to be enclosed in double quotes.
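A minimal sketch of what that could look like in a Python wrapper. It assumes the option value takes the form pattern,'replacement', with the replacement kept in single quotes and the whole value wrapped in double quotes so a shell passes the inner quotes through. The helper name uri_replace_option is hypothetical, and shlex.split only imitates POSIX shell tokenization.

```python
import shlex

def uri_replace_option(pattern, replacement):
    """Return a shell fragment: -output_uri_replace "pattern,'replacement'"."""
    # Characters that would break or change meaning inside shell double quotes
    unsafe = set("\"'$`\\")
    if unsafe & set(pattern + replacement):
        raise ValueError("pattern/replacement contain shell-special characters")
    return f"-output_uri_replace \"{pattern},'{replacement}'\""

option = uri_replace_option(
    "Boeing 777 Test Manual.zip",
    "TESTPATH43/Boeing_777_Test_Manual",
)
print(option)

# After shell tokenization the inner single quotes are still present
tokens = shlex.split(option)
print(tokens[1])
```

Note that the pattern is a regular expression, so the "." in ".zip" matches any character; that is harmless here but worth keeping in mind for other filenames.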

My request is now to expand your documentation on MLCP regarding this functionality as I've wasted a day of development time on this effort.

@abika5
Contributor

abika5 commented Aug 15, 2024

Hi @starhound, thanks for filing the issue. I will take a look and triage it for documentation.

@yunzvanessa

@abika5 Please create a ticket in Jira and address it in the next sprint.

Thanks,
Vanessa
