Exporting data
New in 3.3.0, ml-gradle provides several tasks for easily exporting any set of documents in MarkLogic to either a single file/zip or many files/zips. These tasks make use of the Data Movement SDK and the DMSDK Jobs provided by the ml-javaclient-util library.
The tasks provided by 3.3.0 are:
- mlExportToFile
- mlExportToZip
- mlExportBatchesToDirectory
- mlExportBatchesToZips
If none of these tasks meets your requirements for exporting data, you can likely use the Data Movement SDK directly to write a program that does.
Each of the tasks for exporting data can be configured via several properties. To see the available properties for any task, just run the task with "-PjobProperties" (no value needed) - for example:
gradle mlExportToFile -PjobProperties
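To review the properties of all four export tasks in one pass, the same flag can be applied in a loop. This is a minimal sketch: the `show_export_properties` function name and the `GRADLE` override variable are illustrative assumptions, not part of ml-gradle.

```shell
# Hypothetical helper: print the available properties for each export task.
# GRADLE is an assumed override so that a wrapper such as ./gradlew can be used.
show_export_properties() {
  for task in mlExportToFile mlExportToZip mlExportBatchesToDirectory mlExportBatchesToZips; do
    "${GRADLE:-gradle}" "$task" -PjobProperties || return 1
  done
}
```

Call `show_export_properties` from the project directory, or `GRADLE=./gradlew show_export_properties` if the project uses the Gradle wrapper.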
All documents selected by a query can be exported to a single file via mlExportToFile:
gradle mlExportToFile -PexportPath=export.xml -PwhereCollections=example
This task is similar to the other DMSDK Tasks in that a "where" property is required to specify the documents to export. "whereCollections", "whereUriPattern", and "whereUrisQuery" are the currently supported properties, e.g.:
gradle mlExportToFile -PwhereUriPattern="*.xml"
gradle mlExportToFile -PwhereUrisQuery="cts:element-value-query(xs:QName('hello'), 'world')"
This export capability is simply wrapping existing DMSDK functionality, specifically the ExportToWriterListener class. So you can utilize some of the properties on that class, e.g.:
gradle mlExportToFile -PrecordPrefix="<wrapper>" -PrecordSuffix="</wrapper>" -PwhereCollections=example
You can also specify content to be written to the beginning and end of the file:
gradle mlExportToFile -PwhereCollections=example -PfileHeader="<results>" -PfileFooter="</results>"
As of version 3.9.0 of ml-gradle, you can write JSON documents as a valid JSON array by using the "omitLastRecordSuffix" property:
gradle mlExportToFile -PwhereCollections=some-json-documents -PfileHeader="[" -PfileFooter="]" -PrecordSuffix="," -PomitLastRecordSuffix=true
This will result in a comma being written after every JSON document except for the last one, thus resulting in a valid JSON array.
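The effect of these four properties can be illustrated without MarkLogic. The sketch below only mimics the order in which the header, records, suffixes, and footer are written; `join_records` is a hypothetical function, and the exact whitespace the real task emits between records is not covered here.

```shell
# Illustrative only: mimics the order in which the header, records, suffixes,
# and footer are written. It does not call MarkLogic or ml-gradle.
# Usage: join_records HEADER FOOTER SUFFIX OMIT_LAST RECORD...
join_records() {
  header=$1 footer=$2 suffix=$3 omit_last=$4
  shift 4
  out=$header
  while [ "$#" -gt 0 ]; do
    out=$out$1
    # Append the suffix unless this is the last record and omit_last is "true".
    if [ "$#" -gt 1 ] || [ "$omit_last" != "true" ]; then
      out=$out$suffix
    fi
    shift
  done
  printf '%s\n' "$out$footer"
}

join_records "[" "]" "," true '{"a":1}' '{"b":2}'
# prints [{"a":1},{"b":2}]
join_records "[" "]" "," false '{"a":1}' '{"b":2}'
# prints [{"a":1},{"b":2},]  (trailing comma, so not valid JSON)
```

The second call shows why the property matters: without it, the suffix after the final record leaves a trailing comma that JSON parsers reject.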
With mlExportToFile, you can reference a REST API transform, which makes it possible to export data as CSV: write a transform that converts each document to the exact CSV that you want (you can load that transform with ml-gradle as well):
gradle mlExportToFile -Ptransform=my-csv-transform -PwhereCollections=example
You can use ml-gradle to stub out that transform first, setting "transformType" to one of sjs, xqy, or xsl:
gradle mlCreateTransform -PtransformName=my-csv-transform -PtransformType=sjs
Of course, the REST API transform can produce any content that you want.
All documents selected by a query can be exported to a single zip via mlExportToZip:
gradle mlExportToZip -PexportPath=export.zip -PwhereCollections=example
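When several collections each need their own zip, the command above can be repeated from a small wrapper. This is a sketch under assumptions: `export_collections_to_zips` and the `GRADLE` override are hypothetical names, and the collection names passed in are placeholders.

```shell
# Hypothetical helper: export each named collection to its own zip file.
# GRADLE is an assumed override (for example ./gradlew); it defaults to gradle.
export_collections_to_zips() {
  for collection in "$@"; do
    "${GRADLE:-gradle}" mlExportToZip \
      "-PexportPath=${collection}.zip" \
      "-PwhereCollections=${collection}" || return 1
  done
}
```

For example, `export_collections_to_zips example other` would run mlExportToZip twice, writing example.zip and other.zip.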
As with exporting to a file, you can apply a transform to each document:
gradle mlExportToZip -PexportPath=export.zip -PwhereCollections=example -Ptransform=my-transform
Each document's URI is used as the name of its zip entry. The URI can be "flattened", i.e. everything up to and including the last "/" will be dropped:
gradle mlExportToZip -PexportPath=export.zip -PwhereCollections=example -PflattenUri=true
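The flattening rule can be reproduced with shell parameter expansion. This is only an illustration of the rule as described; `flatten_uri` is a hypothetical function and the URIs are made-up examples.

```shell
# Illustrative only: derive a flattened zip entry name from a document URI
# by dropping everything up to and including the last "/".
flatten_uri() {
  printf '%s\n' "${1##*/}"
}

flatten_uri "/example/2024/doc1.xml"   # prints doc1.xml
flatten_uri "doc2.json"                # prints doc2.json (no "/" to strip)
```

Note that under this rule, documents in different directories that share a file name flatten to the same entry name. How the task handles such a collision is not described here, so it is worth checking before flattening a large export.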
You can also provide a prefix on each zip entry:
gradle mlExportToZip -PexportPath=export.zip -PwhereCollections=example -PuriPrefix=/my-prefix
Instead of writing all documents to a single file or zip, you can export each batch as a separate file to a given directory:
gradle mlExportBatchesToDirectory -PexportPath=/path/to/batches -PwhereCollections=example
The "batchSize" property controls the number of documents processed at one time by the task, and thus controls the number of documents written to each file.
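As a rough planning aid, the smallest number of files an export can produce follows from the document count and the batch size. The sketch below is plain arithmetic; `min_batch_files` is a hypothetical function, and it assumes every batch except possibly the last is full, which the task may not guarantee.

```shell
# Illustrative arithmetic only: the smallest number of batch files needed for
# TOTAL documents when each file holds at most BATCH_SIZE documents.
min_batch_files() {
  total=$1 batch_size=$2
  echo $(( (total + batch_size - 1) / batch_size ))
}

min_batch_files 1000 100   # prints 10
min_batch_files 1001 100   # prints 11
```

Treat the result as a lower bound: if some batches come back partially filled, the actual number of files will be higher. Assuming "batchSize" is passed with -P like the other properties, a run with a batch size of 50 would look like `gradle mlExportBatchesToDirectory -PexportPath=/path/to/batches -PwhereCollections=example -PbatchSize=50`.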
The following properties supported by mlExportToFile are also supported by mlExportBatchesToDirectory:
- fileHeader
- fileFooter
- recordPrefix
- recordSuffix
- transform
In addition, you can customize the name of each file that's written per batch - "filenamePrefix" defaults to "batch-" and "filenameExtension" defaults to ".xml":
gradle mlExportBatchesToDirectory -PfilenamePrefix=my-batch -PfilenameExtension=.json -PexportPath=/path/to/batches -PwhereCollections=example
Another option for exporting batches is to write each one to a zip:
gradle mlExportBatchesToZips -PexportPath=/path/to/zips -PwhereCollections=example
The following properties supported by mlExportToZip are also supported by mlExportBatchesToZips:
- flattenUri
- transform
- uriPrefix
And you can set "filenamePrefix" (defaults to "batch-") and "filenameExtension" (defaults to ".zip") just like you can for mlExportBatchesToDirectory:
gradle mlExportBatchesToZips -PfilenamePrefix=my-zip- -PfilenameExtension=.jar -PexportPath=/path/to/zips -PwhereCollections=example