diff --git a/modules/cli/pages/actions-anonymization.adoc b/modules/cli/pages/actions-anonymization.adoc
new file mode 100644
index 0000000..ee638fc
--- /dev/null
+++ b/modules/cli/pages/actions-anonymization.adoc
@@ -0,0 +1,145 @@
+= Action types for anonymization
+:description: Description of all the possible action types for anonymization
+
+== Overview
+Action types are mechanisms used to anonymize specific values in the exported data. A value can have one or multiple actions associated with it. If no action is explicitly specified in the configuration for a particular value, it defaults to a predefined action, as described in xref:configuration-for-anonymization.adoc[Configure the anonymization].
+
+Additionally, you can define a fallback mechanism in case the primary action does not work or does not find a match (for example, with the `regex_replace` action).
+
+== Hash - Hash a value
+
+Replaces the value with a hash. This removes readable values but keeps the cardinality. A safe hash function should be used (SHA-256, for instance).
+
+*Usage*: Values containing sensitive information on which you still want accurate statistics and the ability to see their impact on process execution.
+
+*Parameters*:
+
+*value*: The algorithm to use for hashing. Optional, defaults to SHA-512.
+
+Sample configuration:
+[source,yaml]
+----
+  arch_process_instance:
+    stringindex1:
+      actions:
+        - action: hash
+          value: SHA-256
+----
+
+== Replace - Replace a value
+
+Replaces the value with a specified value (which can be empty).
+
+*Usage*: Values containing sensitive information you want to hide.
+
+*Parameters*:
+
+*value*: The value used to replace the current field.
+Default value: empty.
+
+Sample configuration:
+[source,yaml]
+----
+  arch_flownode_instance:
+    displayname:
+      actions:
+        - action: REPLACE
+          value: 'hidden'
+----
+
+== Replace_with_other - Replace a value by another column value of the same table
+
+Replaces a value with the value of another column in the same table and row.
+
+*Usage*: Values containing sensitive information that you want to replace with another meaningful, non-sensitive value from the same table.
+
+*Parameters*:
+
+*value*: Name of the column whose value is used as the replacement. No default, mandatory.
+
+Sample configuration:
+[source,yaml]
+----
+  arch_flownode_instance:
+    displayname:
+      actions:
+        - action: REPLACE_WITH_OTHER
+          value: name
+----
+== Regex_replace - Replace a value matching a regular expression
+
+Replaces the value if the regular expression matches, using the captured groups to build the new value.
+
+If multiple parts of the value match, they are all replaced (behavior similar to `Matcher.replaceAll`).
+
+If none of the regular expressions match, you can configure a fallback, which can be any of the other actions.
+
+*Usage*: Values that mix sensitive and non-sensitive information, when you want to keep the non-sensitive part.
+
+*Parameters*:
+
+*pattern*: The regular expression to match. No default, mandatory.
+
+*value*: The replacement pattern (which can reference captured groups) used to build the new value. No default, mandatory.
+
+*fallback*: The action to apply if no regular expression matches.
+
+Sample configuration:
+[source,yaml]
+----
+  arch_process_comment:
+    content:
+      actions:
+        - action: regex_replace
+          pattern: contract (\d+) is ready for user (\S+)\.(\S+)
+          value: contract XXXX is ready for $2
+        - action: regex_replace
+          pattern: The task Allocate repair agent on car (\S+) (is now assigned to .*)
+          value: The task Allocate repair agent on car *** $2
+          fallback:
+            - action: replace
+              value: hidden comment
+----
+
+== Keep - Keep a value
+
+Keeps the value; no anonymization is done.
+
+*Parameters*: none
+
+Sample configuration:
+[source,yaml]
+----
+  arch_flownode_instance:
+    displayname:
+      actions:
+        - action: KEEP
+----
+
+== Remove_line - Remove a full line
+
+Removes the whole data line (only possible for contract data and comments).
+
+*Parameters*:
+An optional `where` clause, expressed as a regular expression matched against the value of the configured column.
+
+Sample configuration:
+[source,yaml]
+----
+  arch_contract_data:
+    val:
+      actions:
+        - action: REMOVE_LINE
+          where:
+            - column: name
+              regex: PurchasedLicenseInput\.bypassSysDate
+            - column: name
+              regex: PurchasedLicenseInput\.caseCounterStartDate
+            - column: name
+              regex: PurchasedLicenseInput\.description
+            - column: name
+              regex: PurchasedLicenseInput\.endDate
+            - column: name
+              regex: PurchasedLicenseInput\.name
+            - column: name
+              regex: PurchasedLicenseInput\.numberCases
+----
\ No newline at end of file
diff --git a/modules/cli/pages/configuration-for-anonymization.adoc b/modules/cli/pages/configuration-for-anonymization.adoc
index d804737..f25dab8 100644
--- a/modules/cli/pages/configuration-for-anonymization.adoc
+++ b/modules/cli/pages/configuration-for-anonymization.adoc
@@ -1,4 +1,116 @@
 = How-to configure the anonymization in the CLI
 :description: Learn how-to fine-tune the anonymization in the CLI
 
-IMPORTANT: this is a work in progress!!
+== Default anonymization
+Running the `export` command of the export tool automatically activates the anonymization.
+
+[NOTE]
+====
+If you don't want to activate the anonymization during your export, add the argument `-da=true` at the end of the `export` command.
+====
+
+To avoid leaking sensitive data, exported data is anonymized either with the default rules or with a configuration of your own.
+
+=== List of tables anonymized by default
+
+This section lists the exported tables and identifies the fields that may contain sensitive information. These fields are handled with specific rules to ensure data security.
+
+==== arch_process_instance
+
+The `arch_process_instance` table contains information related to process instances. While it primarily includes technical data necessary for BPI, certain fields within this table may contain sensitive information and require special handling:
+
+* **description**:
+** **Default Handling**: KEEP
+** **Details**: This field contains the description of the process instance, which is added by developers during the development phase. Although it might contain sensitive data (e.g., company names, departments, addresses), this is usually not the case.
+
+* **stringindex1, stringindex2, stringindex3, stringindex4, stringindex5**:
+** **Default Handling**: HASH
+** **Details**: These fields hold the search indexes defined during the development phase, which can be used to search cases through the APIs and the case list. They are usually filled by specific Groovy code, which may expose sensitive data. Therefore, they are hashed by default to maintain data security.
+
+==== arch_flownode_instance
+
+The `arch_flownode_instance` table contains information related to the execution of flow node instances (human and automatic tasks, gateways and events). While it primarily includes technical data necessary for BPI, certain fields within this table may contain sensitive information and require special handling:
+
+* **displayname**:
+** **Default Handling**: REPLACE("")
+** **Details**: This field contains the display name of the flow node instance, which is added by developers during the development phase and may include sensitive data. By default, this field is cleared.
+
+* **description**:
+** **Default Handling**: KEEP
+** **Details**: This field contains the description of the flow node, which is added by developers during the development phase. Although it might contain sensitive data (e.g., company names, departments, addresses), this is usually not the case.
+
+* **hitbys, loopdatainputref, loopdataoutputref, datainputitemref, dataoutputitemref**:
+** **Default Handling**: KEEP
+** **Details**: These fields hold technical data of the flow node instance. This data is not sensitive and can be used for statistical purposes.
+
+==== user_
+
+The `user_` table contains information related to the users. One specific field requires special handling:
+
+* **username**:
+** **Default Handling**: HASH
+** **Details**: This field contains the username of a user, which is sensitive data. By default, this field is hashed.
+
+==== actor
+
+The `actor` table contains information related to the actors of a process. It defines who can perform a task or start a process. While it primarily includes technical data necessary for BPI, certain fields within this table may contain sensitive information and require special handling:
+
+* **name, displayname, description**:
+** **Default Handling**: KEEP
+** **Details**: Normally, an actor shouldn't hold sensitive data because it represents a department, team, or job within a company. This data is not sensitive and can be used for statistical purposes.
+
+
+==== arch_process_comment
+
+The `arch_process_comment` table contains information about users who have interacted with specific flow nodes, along with other sensitive details. These comments can include the names of users.
+
+* **content**:
+** **Default Handling**: REMOVE_LINE
+** **Details**: Without a specific configuration, the default anonymization never exports the content of this table because of the sensitive data it contains. You need your own anonymization configuration to handle this data.
+
+==== arch_contract_data
+The `arch_contract_data` table contains information about contracts, including inputs and constraints. Due to the flexibility in specifying various types of inputs, this table often contains sensitive data.
+
+* **name, val**:
+** **Default Handling**: REMOVE_LINE
+** **Details**: Without a specific configuration, the default anonymization never exports the content of this table because of the sensitive data it contains. You need your own anonymization configuration to handle this data.
+
+
+== Advanced anonymization
+
+=== Generate a sample configuration for data anonymization
+
+Before performing a full export, you can configure the anonymization of specific fields. To assist with this, a command is available in the tool to generate a sample configuration file based on a default setup, allowing you to choose which columns and tables to anonymize.
+
+The command `gen_default_anon_conf` has been added to the export tool to streamline this process. If needed, you can use the `--output` argument to specify the location for the generated file.
+
+[NOTE]
+====
+The generated file is only a sample of the anonymization section of the configuration file. You need to copy and paste that section into the configuration file used by your export tool.
+====
+
+The generated configuration also contains all your data contract keys, giving you a convenient way to anonymize them.
+
+=== Handling contract data anonymization
+Process data can include contract data used within your processes, which may contain sensitive information.
+
+[WARNING]
+====
+By default, if you do not specify how to handle this contract data, the anonymization process will exclude it from export.
+====
+
+During the export, contract data is transformed into CSV lines in the `arch_contract_data.csv` file within the export zip file. Each line represents a key-value pair of contract data. The key is crucial because it lets you specify the exact type of anonymization you want for each contract data field.
+
+To specify which inputs of your contract data to anonymize, use the `where` clause in the configuration.
+
+For example, suppose you have a contract named `loanRequestInput` with a field `loanAmount`. If you want to keep this value because it is not sensitive and could be useful in BPI dashboards, you need to override the default removal setting. Specify a `KEEP` action using the `where` clause to retain `loanAmount`. Here is an example configuration extract:
+
+[source,yaml]
+----
+arch_contract_data:
+  val:
+    actions:
+      - action: KEEP
+        where:
+          name: loanRequestInput\.loanAmount
+----
\ No newline at end of file
diff --git a/modules/cli/pages/index.adoc b/modules/cli/pages/index.adoc
index 9ab4039..f823fd5 100644
--- a/modules/cli/pages/index.adoc
+++ b/modules/cli/pages/index.adoc
@@ -1,4 +1,40 @@
 = Command Line tool for Export
 :description: Explain how to use and configure the CLI to export data from a Bonita database
 
-IMPORTANT: this is a work in progress!!
+== Overview
+The Command Line tool is used to provision data extracted from Bonita instances into a Bonita Process Insights environment for deeper process analysis.
+
+The command line tool is delivered as a package containing basic documentation about the commands and a sample configuration file.
+
+== Export data
+=== Configuration
+Before exporting, you need to configure the connection to the Bonita instance from which you want to export the data.
+
+The export tool supports two types of databases: PostgreSQL and Oracle. The JDBC URL must be adapted to the type of database you use.
+
+* **Oracle**:
+[source,yaml]
+----
+jdbc-url: jdbc:oracle:thin:@${bonita.database.host}:${bonita.database.port}/${bonita.database.name}?oracle.net.disableOob=true
+----
+* **PostgreSQL**:
+[source,yaml]
+----
+jdbc-url: jdbc:postgresql://${bonita.database.host}:${bonita.database.port}/${bonita.database.name}
+----
+
+[NOTE]
+====
+After you finish configuring your file, place it in the same directory as the executable jar or in a subdirectory named `config`.
+====
+
+=== Exporting data and Importing to BPI
+To export your data, use the following command line:
+`pi-cli bonita export`
+
+You can add arguments such as `-output` to specify the exact path of the exported zip file.
+
+=== Anonymize exported data
+By default, your exported data is anonymized. You can deactivate the anonymization or provide your own configuration.
+
+For more details, see xref:configuration-for-anonymization.adoc[Configure the anonymization].
\ No newline at end of file
diff --git a/modules/cli/taxonomy.adoc b/modules/cli/taxonomy.adoc
index 160bf14..0a28826 100644
--- a/modules/cli/taxonomy.adoc
+++ b/modules/cli/taxonomy.adoc
@@ -1,2 +1,3 @@
 * xref:index.adoc[Command line for Exporting from Bonita]
 ** xref:configuration-for-anonymization.adoc[Configure the anonymization]
+*** xref:actions-anonymization.adoc[Action types for anonymization]