Rob Rudin edited this page Apr 16, 2024 · 17 revisions

This page covers the following topics:

  • The two approaches for creating forests - either via properties or via payloads
  • Previewing the forests that will be created
  • Customizing the naming of forests when using the property-driven approach

ml-gradle provides two primary ways for creating forests and replicas:

  1. A "payload-driven" way, where you write the exact payload for every primary forest and replica forest you want for a particular database.
  2. A "property-driven" way, where several properties are available to configure the number and properties of forests and replicas. ml-gradle uses these properties to build forest payloads dynamically.

The payload-driven way means any forest configuration is possible, but it's also much more tedious than simply setting a handful of properties. The two approaches are described below.

Payload-driven

This approach is simple and is shown in this sample project. For each database whose forests you want to define via payloads, create a src/main/ml-config/forests/(name of database)/(any-filename-you-want).json file. As shown in the sample project, you can put many forest payloads into one file.

For example, if you wish to create custom forests for a database named my-database, you would add a JSON file to src/main/ml-config/forests/my-database containing the forests you desire. The example project linked to above demonstrates how this is done.

Properties-driven

This is the preferred approach - the payload-driven approach exists only for cases where the available properties can't meet your use case (and if you run into such a case, please file an issue describing it).

The Database and forest section of the Property Reference page covers all of the properties, a number of which were added in version 3.2.0. The list below provides a little more detail about which properties you'll want to use and when:

  1. To set how many forests are created on each host, use mlForestsPerHost. New in 3.7.0 - you can specify multiple data directories per host for a database; if you do this, then mlForestsPerHost specifies the number of forests per data directory per host (this is true for mlContentForestsPerHost as well). So if mlForestsPerHost is 2 for a database and that database has 3 data directories, you'll end up with 6 forests on each host.
  2. To specify that forests should only be created on one host for certain databases, use mlDatabasesWithForestsOnOneHost.
  3. To specify which hosts forests should be created on for certain databases, use mlDatabaseHosts.
  4. New in 3.3.0 - to specify which groups' hosts forests should be created on for certain databases, use mlDatabaseGroups. This takes precedence over mlDatabaseHosts, in the event that a database has an entry in both properties.
  5. To set default data/fast/large directories for all forests, regardless of the database, use mlForestDataDirectory, mlForestFastDataDirectory, and mlForestLargeDataDirectory.
  6. To set data/fast/large directories for specific databases (thus overriding the above properties), use mlDatabaseDataDirectories, mlDatabaseFastDataDirectories, and mlDatabaseLargeDataDirectories. New in 3.7.0 - you can specify multiple data directories per database.
  7. To create forest replicas for specific databases, use mlDatabaseNamesAndReplicaCounts.
  8. To set default data/fast/large directories for all replica forests, regardless of the database, use mlReplicaForestDataDirectory, mlReplicaForestFastDataDirectory, and mlReplicaForestLargeDataDirectory.
  9. To set data/fast/large directories for replicas for specific databases (thus overriding the above properties), use mlDatabaseReplicaDataDirectories, mlDatabaseReplicaFastDataDirectories, and mlDatabaseReplicaLargeDataDirectories.
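To make the list above concrete, here is a minimal gradle.properties sketch that combines several of these properties. The database names, host names, and directory paths are hypothetical placeholders. The value formats for mlForestsPerHost, mlDatabaseHosts, and mlDatabasesWithForestsOnOneHost are assumed to follow the same comma-delimited pattern as the mlDatabaseNamesAndReplicaCounts and mlDatabaseDataDirectories examples later on this page, so check the Property Reference page before relying on them.

```properties
# Hypothetical sketch; database names, hosts, and paths are placeholders.

# 2 forests per host (per data directory) for my-database
# (assumed format: database,count)
mlForestsPerHost=my-database,2

# Only create forests for my-database on these hosts
# (assumed format: database,host1|host2)
mlDatabaseHosts=my-database,host1|host2

# Keep another database's forests on a single host
# (assumed format: comma-delimited database names)
mlDatabasesWithForestsOnOneHost=my-other-database

# Default data directory for all primary forests
mlForestDataDirectory=/data/forests

# Override for one database; multiple directories are pipe-delimited (3.7.0+)
mlDatabaseDataDirectories=my-database,/path1|/path2

# 1 replica for each primary forest of my-database
mlDatabaseNamesAndReplicaCounts=my-database,1

# Default data directory for all replica forests
mlReplicaForestDataDirectory=/data/replicas
```

If these formats hold, host1 and host2 would each get 2 forests for each of the 2 data directories, or 4 primary forests per host for my-database. The mlPrintForestPlan task described below is a quick way to confirm what a given set of properties will actually produce.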

Replica forest creation

In addition to the properties above that control replica forest creation, the underlying ml-app-deployer library also has a ReplicaBuilderStrategy interface that defines how replicas are constructed. As of the 3.12.0 release of ml-gradle and ml-app-deployer, the default strategy properly distributes replicas across hosts.

If you run into any issues with how this works and would like to define your own implementation of ReplicaBuilderStrategy, you can do so by setting a different implementation on the AppConfig object:

ext {
  def myStrategy = new org.example.MyStrategy() // must be on buildscript classpath
  mlAppConfig.setReplicaBuilderStrategy(myStrategy)
}

An existing alternative is the implementation used prior to the 3.12.0 release:

ext {
  def myStrategy = new com.marklogic.appdeployer.command.forests.GroupedReplicaBuilderStrategy()
  mlAppConfig.setReplicaBuilderStrategy(myStrategy)
}

Previewing forest creation

New in 3.7.0 - you can use the mlPrintForestPlan task to see what forests and replicas will be created for a database before the database is created. (There's not yet support for seeing what replicas will be created for a database that already exists - if that's of interest, please file an issue!) Run the task like this:

gradle -Pdatabase=my-database mlPrintForestPlan

This task will use all of the above configuration properties to determine what forests and replicas will be created when you run "mlDeploy" (or a combination of "mlDeployDatabases" and "mlConfigureForestReplicas").

As an example, let's say you're connecting to a 3-host cluster (with host names of host1, host2, and host3), and you want to preview forests for a not-yet-created content database named "example-content". Since mlContentForestsPerHost defaults to 3, running the task with -Pdatabase=example-content will print out 9 forests (3 on each of the 3 hosts), each looking like this:

{
  "forest-name" : "example-content-1",
  "host" : "host1",
  "database" : "example-content"
}

Now let's add some replicas to the database - we'll add the following in gradle.properties:

mlDatabaseNamesAndReplicaCounts=example-content,2

Running the task again will still print out 9 primary forests, and each will now have 2 replicas:

{
  "forest-name" : "example-content-1",
  "host" : "host1",
  "database" : "example-content",
  "forest-replica" : [ {
    "host" : "host2",
    "replica-name" : "example-content-1-replica-1"
  }, {
    "host" : "host3",
    "replica-name" : "example-content-1-replica-2"
  } ]
}

Version 3.7.0 lets us specify multiple data directories per host - let's try that out by adding this to gradle.properties:

mlDatabaseDataDirectories=example-content,/path1|/path2

Running mlPrintForestPlan now returns 18 forests (3 forests for each of the 2 data directories on each of the 3 hosts), and ml-gradle will try to balance replicas across the different data directories as well:

{
  "forest-name" : "example-content-1",
  "host" : "host1",
  "database" : "example-content",
  "data-directory" : "/path1",
  "forest-replica" : [ {
    "host" : "host2",
    "replica-name" : "example-content-1-replica-1",
    "data-directory" : "/path2"
  }, {
    "host" : "host3",
    "replica-name" : "example-content-1-replica-2",
    "data-directory" : "/path1"
  } ]
}
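Putting the walkthrough together, the gradle.properties entries used above would look roughly like the sketch below. Setting mlContentForestsPerHost explicitly is optional here, since 3 is its default; it is included only to make the arithmetic visible.

```properties
# Default value, shown only for clarity
mlContentForestsPerHost=3

# 2 replicas for each primary forest of example-content
mlDatabaseNamesAndReplicaCounts=example-content,2

# 2 data directories per host for example-content (3.7.0+)
mlDatabaseDataDirectories=example-content,/path1|/path2

# On the 3-host cluster from the example:
# 3 hosts x 2 data directories x 3 forests = 18 primary forests, each with 2 replicas
```

Running gradle -Pdatabase=example-content mlPrintForestPlan after each change is a cheap way to check the plan before anything is deployed.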

If you're interested in reusing the code for calculating forests, just check out the source of the PrintForestPlanTask.

Customizing forest names

Starting in version 3.7.0, you can customize how forests are named when using the property-driven approach. Forests are created using the ml-app-deployer library, and that library uses an instance of the ForestNamingStrategy interface to name primary forests and replica forests.

To use your own implementation of ForestNamingStrategy, you'll need to add a script like the one below to your build.gradle file (you can of course reference an implementation of ForestNamingStrategy that's in an external jar). The important part is to associate an implementation of ForestNamingStrategy with a database name, as shown in the "ext" block in the script:

import com.marklogic.appdeployer.AppConfig;
import com.marklogic.appdeployer.command.forests.ForestNamingStrategy;

class MyNamingStrategy implements ForestNamingStrategy {

  // Name for a primary forest, e.g. "my-forest-example-content-1" for forest number 1
  String getForestName(String databaseName, int forestNumber, AppConfig appConfig) {
    return "my-forest-" + databaseName + "-" + forestNumber
  }

  // Name for a replica forest, built from the name of its primary forest
  String getReplicaName(String databaseName, String forestName, int forestReplicaNumber, AppConfig appConfig) {
    return "my-replica-" + forestName + "-" + forestReplicaNumber
  }
}

ext {
  mlAppConfig.forestNamingStrategies.put("example-content", new MyNamingStrategy())
}