pip for role anonymizer #23592

Open: wants to merge 1 commit into base: master
120 changes: 120 additions & 0 deletions pip/pip-385-role-anonmyzer.md
@@ -0,0 +1,120 @@
# PIP-385: Role Anonymizer for Pulsar Logging
Member

PIP-385 is already taken.

Please pick another available PIP number. Currently, the way to find one is to search the dev mailing list discussions and check that nobody else has already used the number you picked. The first one to open the discussion about the PIP "wins".

You should also make the PR title match our naming convention: `[feat][pip] PIP-XXX: Role Anonymizer for Pulsar Logging` would be the way to name the PR. Start the thread on the dev mailing list as soon as possible after picking the PIP number so that you "win" the possible race condition in assigning the PIP number.

Member

At the current moment, PIP-393 is the next available PIP number.
Here's the search: https://lists.apache.org/[email protected]:lte=1y:PIP-393

I first looked at https://lists.apache.org/[email protected] and then started incrementing the PIP number until there were no search matches. That's how you find an available PIP number.

You could also check if someone has already created a PR with PIP-393 with this query "is:pr in:title PIP-393"
https://github.com/apache/pulsar/pulls?q=is%3Apr+in%3Atitle+PIP-393

That's also why you should follow the current conventions for starting the thread and for naming the PR for the PIP. We should include the proper steps in the PIP README so that it's clear to everyone how the PIP number gets assigned in practice.

Here's a good example of how to start the discussion thread: https://lists.apache.org/thread/9wddmj4o5mrdst427r40rr7phqb05y6s
And after the discussion, starting the voting thread:
https://lists.apache.org/thread/p4zvok4l6dxrm0hqbno5s21tq4s33f7s
There will soon be an example of closing the voting thread there.


# Background knowledge

In distributed systems, logging is a critical aspect of monitoring, troubleshooting, and auditing. However, it’s equally important to protect sensitive information, such as authentication roles, in logs. In Pulsar, token-based authentication is widely used, and the role associated with the token can appear in logs.

To enhance privacy and comply with security regulations, it’s necessary to anonymize authentication roles in logs. Anonymization ensures that sensitive details are hidden while still allowing meaningful analysis of logs for operational purposes.

This PIP introduces the **Role Anonymizer** feature in Pulsar, providing different levels of anonymization for roles before they are logged in the broker and proxy components. The anonymizer supports the following modes:
- **NONE**: No anonymization; roles are logged as-is.
- **REDACTED**: Replaces the role with `[REDACTED]`.
- **SHA256**: Hashes the role using the SHA-256 algorithm.
- **MD5**: Hashes the role using the MD5 algorithm.

This feature allows operators to configure the level of anonymization based on their compliance needs without changing the core logging infrastructure.

# Motivation

The current Pulsar logging mechanism logs authentication roles in plain text. This can expose sensitive information, especially in environments where logs are centrally aggregated or monitored by third parties. It is essential to anonymize these roles to prevent potential misuse or unauthorized access to role information from logs.

The main problem this proposal solves is the risk of exposing sensitive information (such as user roles) through logs. Anonymizing roles in logs reduces this risk while maintaining useful logs for debugging and operational monitoring.

# Goals

## In Scope

- Introduce a configurable anonymizer for roles in logs for both broker and proxy components.
- Support multiple anonymization strategies, including no anonymization, redaction, and hashing using SHA-256 and MD5.
- Ensure that the anonymization strategy is easily configurable through the existing configuration files.

## Out of Scope

- This proposal does not cover anonymization of other sensitive fields beyond roles in logs.
- No changes will be made to non-logging aspects of the authentication process.

# High Level Design

This feature adds a configurable anonymization layer to Pulsar’s logging mechanism. The anonymization logic will be applied to the role field during the logging of authentication information on both brokers and proxies.

The anonymization strategy will be defined through a configuration parameter (`authenticationRoleLoggingAnonymizer`) and can be set to one of the following values:
- `NONE`: Logs the role without modification.
- `REDACTED`: Logs `[REDACTED]` instead of the actual role.
- `hash:SHA256`: Logs a SHA-256 hash of the role.
- `hash:MD5`: Logs an MD5 hash of the role.

The default strategy is `NONE`, meaning no anonymization will be applied unless explicitly configured.

# Detailed Design

## Design & Implementation Details

The `DefaultAuthenticationRoleLoggingAnonymizer` class will be introduced to handle the anonymization of roles in logs. This class will accept a configuration parameter to select the anonymization strategy, and apply the corresponding transformation to the role before it is logged.

### Core Components:
1. **`DefaultRoleAnonymizerType` Enum**: Defines the available anonymization strategies (`NONE`, `REDACTED`, `SHA256`, `MD5`).
2. **`DefaultAuthenticationRoleLoggingAnonymizer` Class**: Handles the anonymization process by selecting and applying the chosen strategy based on the configuration.
3. **Broker and Proxy Configuration**: New configuration options will be added to both the broker and proxy configuration files, allowing administrators to specify the desired anonymization strategy.
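
The proposal does not show the body of `DefaultRoleAnonymizerType`. The following is a minimal sketch of how such an enum could implement the four strategies with the JDK's `MessageDigest`. The enum name `RoleAnonymizerTypeSketch`, the lowercase hex output, and the UTF-8 encoding of the role are assumptions made for illustration, not details taken from the PIP.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Illustrative sketch only: the name and output format are assumptions.
enum RoleAnonymizerTypeSketch {
    NONE {
        @Override
        String anonymize(String role) {
            return role; // logged as-is
        }
    },
    REDACTED {
        @Override
        String anonymize(String role) {
            return "[REDACTED]";
        }
    },
    SHA256 {
        @Override
        String anonymize(String role) {
            return hexDigest("SHA-256", role);
        }
    },
    MD5 {
        @Override
        String anonymize(String role) {
            return hexDigest("MD5", role);
        }
    };

    abstract String anonymize(String role);

    // Hashes the UTF-8 bytes of the role and renders the digest as lowercase hex.
    static String hexDigest(String algorithm, String role) {
        try {
            byte[] digest = MessageDigest.getInstance(algorithm)
                    .digest(role.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Not expected with the JDK's built-in providers; surfaced as a setup error.
            throw new IllegalStateException("Digest algorithm unavailable: " + algorithm, e);
        }
    }
}
```

With this shape, `RoleAnonymizerTypeSketch.REDACTED.anonymize("admin")` yields `[REDACTED]`, and the two hash strategies yield 64 and 32 hex characters respectively.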

### Code Example:
```java
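import java.util.Locale;

// Anonymizer logic: resolves the configured strategy once, then applies it to each role.
public final class DefaultAuthenticationRoleLoggingAnonymizer {
    private static final String HASH_PREFIX = "hash:";

    // Instance field rather than static, so one instance cannot overwrite another's strategy.
    private final DefaultRoleAnonymizerType anonymizerType;

    public DefaultAuthenticationRoleLoggingAnonymizer(String authenticationRoleLoggingAnonymizer) {
        if (authenticationRoleLoggingAnonymizer == null || authenticationRoleLoggingAnonymizer.isEmpty()) {
            // An unset value falls back to the documented default.
            anonymizerType = DefaultRoleAnonymizerType.NONE;
        } else if (authenticationRoleLoggingAnonymizer.startsWith(HASH_PREFIX)) {
            // "hash:SHA256" -> SHA256, "hash:MD5" -> MD5
            anonymizerType = DefaultRoleAnonymizerType.valueOf(authenticationRoleLoggingAnonymizer
                    .substring(HASH_PREFIX.length()).toUpperCase(Locale.ROOT));
        } else {
            anonymizerType = DefaultRoleAnonymizerType.valueOf(authenticationRoleLoggingAnonymizer);
        }
    }

    public String anonymize(String role) {
        return anonymizerType.anonymize(role);
    }
}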
```
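
One detail the constructor above leaves open is how strictly the setting is validated: because only the `hash:` prefix is stripped, a value such as `hash:NONE` resolves to `NONE`, and a typo is reported through the generic exception from `Enum.valueOf`. The sketch below shows one possible stricter interpretation of the documented values. The `StrictAnonymizerSetting` class is hypothetical, and it returns the strategy name as a string only to keep the example self-contained.

```java
import java.util.Locale;

// Hypothetical helper, not part of the proposal: accepts only the documented values.
final class StrictAnonymizerSetting {
    private static final String HASH_PREFIX = "hash:";

    private StrictAnonymizerSetting() {
    }

    // Returns NONE, REDACTED, SHA256, or MD5; rejects everything else with a descriptive message.
    static String parse(String configured) {
        if (configured == null || configured.trim().isEmpty()) {
            return "NONE"; // the default named in the proposal
        }
        String value = configured.trim();
        if (value.equals("NONE") || value.equals("REDACTED")) {
            return value;
        }
        if (value.startsWith(HASH_PREFIX)) {
            String algorithm = value.substring(HASH_PREFIX.length()).toUpperCase(Locale.ROOT);
            if (algorithm.equals("SHA256") || algorithm.equals("MD5")) {
                return algorithm;
            }
        }
        throw new IllegalArgumentException(
                "Unsupported authenticationRoleLoggingAnonymizer value '" + configured
                        + "'; expected NONE, REDACTED, hash:SHA256 or hash:MD5");
    }
}
```

Whether to fail at startup on an invalid value or to fall back to a safe strategy is a design choice the PIP would need to settle; this sketch fails fast.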

## Public-facing Changes

The following public-facing components will be affected:

### Public API

This PIP does not introduce changes to the public API. The anonymization functionality only affects the internal logging of the Pulsar broker and proxy components.

### Configuration

New configuration options will be added to both the broker and proxy configuration files to control the role anonymization strategy. These options are as follows:

**Broker Configuration:**
```yaml
authenticationRoleLoggingAnonymizer: "NONE"
# Options: NONE, REDACTED, hash:SHA256, hash:MD5
```

**Proxy Configuration:**
```yaml
authenticationRoleLoggingAnonymizer: "NONE"
# Options: NONE, REDACTED, hash:SHA256, hash:MD5
```

# Monitoring
Administrators can monitor anonymized logs to ensure that roles are being anonymized according to the configuration. Logs should be checked to verify the correct anonymization strategy is applied.

# Security Considerations
This feature strengthens security by preventing sensitive role information from being exposed in logs. However, care should be taken to select an anonymization strategy that balances security and operational needs. For example, SHA-256 is a stronger hash function than MD5, which is no longer considered collision resistant.
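
One caveat applies to both hash strategies, assuming the hash is computed over the role alone without a salt or key (the PIP does not mention one): the output is deterministic, so anyone who can guess a role name can confirm the guess by hashing it and comparing it with the logged value. The sketch below illustrates this with hypothetical role names; `RoleHashGuessDemo` is not part of the proposal.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;
import java.util.Optional;

// Hypothetical demo: shows that an unsalted hash of a guessable role can be confirmed.
final class RoleHashGuessDemo {
    private RoleHashGuessDemo() {
    }

    static String sha256Hex(String value) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(value.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns the first candidate whose hash equals the value seen in the log, if any.
    static Optional<String> recoverRole(String loggedHash, List<String> candidates) {
        return candidates.stream()
                .filter(candidate -> sha256Hex(candidate).equals(loggedHash))
                .findFirst();
    }
}
```

For deployments where role names are short or predictable, `REDACTED` is therefore the only listed mode that hides the role from a reader who can guess candidates.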

# Backward & Forward Compatibility

## Upgrade

No special upgrade instructions are needed. The new configuration parameter will default to NONE, ensuring backward compatibility.

## Downgrade / Rollback
No special rollback instructions are required. The anonymizer will only take effect when the configuration parameter is set, so downgrading will simply result in roles being logged in plain text.

# Alternatives
One alternative considered was redacting roles entirely without offering hashing options. This was rejected because it would reduce the usefulness of logs for operational monitoring, particularly in environments where roles need to be traced without revealing their actual values.
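
To illustrate the tracing argument: because the same role always hashes to the same value, log entries can still be grouped per role without the role itself appearing in the logs. The following is a minimal sketch with hypothetical role names and a hypothetical `HashedRoleCounter` helper, assuming an unsalted SHA-256 hash rendered as hex.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: groups events by hashed role, as a log analysis tool might.
final class HashedRoleCounter {
    private HashedRoleCounter() {
    }

    static String sha256Hex(String value) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(value.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Counts how many events each hashed role produced, in first-seen order.
    static Map<String, Integer> countByHashedRole(List<String> roles) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String role : roles) {
            counts.merge(sha256Hex(role), 1, Integer::sum);
        }
        return counts;
    }
}
```

With `REDACTED`, every entry would collapse into the single key `[REDACTED]`, which is the loss of traceability the paragraph above refers to.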