pip for role anonymizer #23592
Open: KannarFr wants to merge 1 commit into apache:master from KannarFr:optionallyPreventRoleLoggingPIP385
@@ -0,0 +1,120 @@
# PIP-385: Role Anonymizer for Pulsar Logging

# Background knowledge

In distributed systems, logging is critical for monitoring, troubleshooting, and auditing. It is equally important to keep sensitive information, such as authentication roles, out of logs. In Pulsar, token-based authentication is widely used, and the role associated with a token can appear in logs.

To enhance privacy and comply with security regulations, it is necessary to anonymize authentication roles in logs. Anonymization hides sensitive details while still allowing meaningful analysis of logs for operational purposes.

This PIP introduces the **Role Anonymizer** feature in Pulsar, which provides different levels of anonymization for roles before they are logged by the broker and proxy components. The anonymizer supports the following modes:
- **NONE**: No anonymization; roles are logged as-is.
- **REDACTED**: Replaces the role with `[REDACTED]`.
- **SHA256**: Replaces the role with its SHA-256 hash (configured as `hash:SHA256`).
- **MD5**: Replaces the role with its MD5 hash (configured as `hash:MD5`).

This feature allows operators to choose the level of anonymization that fits their compliance needs without changing the core logging infrastructure.

# Motivation

Pulsar currently logs authentication roles in plain text. This can expose sensitive information, especially where logs are centrally aggregated or monitored by third parties. Anonymizing these roles prevents misuse of, or unauthorized access to, role information obtained from logs.

The main problem this proposal addresses is the risk of exposing sensitive information, such as user roles, through logs. Anonymizing roles reduces this risk while keeping logs useful for debugging and operational monitoring.

# Goals

## In Scope

- Introduce a configurable anonymizer for roles in logs for both broker and proxy components.
- Support multiple anonymization strategies, including no anonymization, redaction, and hashing using SHA-256 and MD5.
- Ensure that the anonymization strategy is easily configurable through the existing configuration files.

## Out of Scope

- This proposal does not cover anonymization of other sensitive fields beyond roles in logs.
- No changes will be made to non-logging aspects of the authentication process.

# High Level Design

This feature adds a configurable anonymization layer to Pulsar’s logging mechanism. The anonymization logic will be applied to the role field during the logging of authentication information on both brokers and proxies.

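As a rough illustration of what "applied to the role field during logging" could mean at a call site, the sketch below passes the role through an anonymizer before it reaches the message text. The class `RoleLoggingSketch`, the method name, and the message wording are hypothetical and are not taken from the Pulsar code base; the anonymizer is modeled as a plain `UnaryOperator<String>` so that the example stays self-contained.

```java
import java.util.function.UnaryOperator;

// Hypothetical helper, for illustration only.
final class RoleLoggingSketch {

    // Builds the text of a log line; the raw role never reaches the message,
    // only the value returned by the anonymizer does.
    static String authenticatedMessage(String remoteAddress, String role,
                                       UnaryOperator<String> roleAnonymizer) {
        return String.format("[%s] Client authenticated with role %s",
                remoteAddress, roleAnonymizer.apply(role));
    }
}
```

With an anonymizer that always returns `[REDACTED]`, the message contains `[REDACTED]` in place of the role; with the identity function it matches the current plain-text behavior.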

The anonymization strategy will be defined through a configuration parameter (`authenticationRoleLoggingAnonymizer`) and can be set to one of the following values:
- `NONE`: Logs the role without modification.
- `REDACTED`: Logs `[REDACTED]` instead of the actual role.
- `hash:SHA256`: Logs a SHA-256 hash of the role.
- `hash:MD5`: Logs an MD5 hash of the role.

The default strategy is `NONE`, meaning no anonymization will be applied unless explicitly configured.

# Detailed Design

## Design & Implementation Details

The `DefaultAuthenticationRoleLoggingAnonymizer` class will be introduced to anonymize roles in logs. It accepts the configured strategy name and applies the corresponding transformation to the role before it is logged.

### Core Components:
1. **`DefaultRoleAnonymizerType` Enum**: Defines the available anonymization strategies (`NONE`, `REDACTED`, `SHA256`, `MD5`).
2. **`DefaultAuthenticationRoleLoggingAnonymizer` Class**: Handles the anonymization process by selecting and applying the chosen strategy based on the configuration.
3. **Broker and Proxy Configuration**: New configuration options will be added to both the broker and proxy configuration files, allowing administrators to specify the desired anonymization strategy.

### Code Example:
```java
// Anonymizer logic
public final class DefaultAuthenticationRoleLoggingAnonymizer {
    private static final String HASH_PREFIX = "hash:";

    // Held per instance: a static field would be overwritten by every new
    // instance, for example when a broker and a proxy run in the same JVM.
    private final DefaultRoleAnonymizerType anonymizerType;

    public DefaultAuthenticationRoleLoggingAnonymizer(String authenticationRoleLoggingAnonymizer) {
        if (authenticationRoleLoggingAnonymizer.startsWith(HASH_PREFIX)) {
            // "hash:SHA256" -> SHA256, "hash:MD5" -> MD5
            anonymizerType = DefaultRoleAnonymizerType.valueOf(authenticationRoleLoggingAnonymizer
                    .substring(HASH_PREFIX.length()).toUpperCase());
        } else {
            // "NONE" or "REDACTED"
            anonymizerType = DefaultRoleAnonymizerType.valueOf(authenticationRoleLoggingAnonymizer);
        }
    }

    public String anonymize(String role) {
        return anonymizerType.anonymize(role);
    }
}
```
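
The class above delegates to `anonymizerType.anonymize(role)`, but the proposal does not show the enum itself. The following is a minimal sketch of how `DefaultRoleAnonymizerType` might look. The lowercase hexadecimal encoding of the digest, the UTF-8 encoding of the role, and the `hexDigest` helper are assumptions made for illustration; the PIP does not specify an output format.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Sketch only: the enum name comes from the PIP, the bodies are assumed.
enum DefaultRoleAnonymizerType {
    NONE {
        @Override
        String anonymize(String role) {
            return role; // logged as-is
        }
    },
    REDACTED {
        @Override
        String anonymize(String role) {
            return "[REDACTED]";
        }
    },
    SHA256 {
        @Override
        String anonymize(String role) {
            return hexDigest("SHA-256", role);
        }
    },
    MD5 {
        @Override
        String anonymize(String role) {
            return hexDigest("MD5", role);
        }
    };

    abstract String anonymize(String role);

    // Hypothetical helper: hashes the UTF-8 bytes of the role and returns
    // the digest as lowercase hex (an assumed output format).
    static String hexDigest(String algorithm, String role) {
        try {
            byte[] digest = MessageDigest.getInstance(algorithm)
                    .digest(role.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Both algorithms are required on every Java platform.
            throw new IllegalStateException(algorithm + " is not available", e);
        }
    }
}
```

Under these assumptions a SHA-256 value is always 64 hex characters and an MD5 value 32, and the same role always maps to the same value, which is what allows a role to be followed across log lines without being shown.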

## Public-facing Changes

The following public-facing components will be affected:

### Public API

This PIP does not introduce changes to the public API. The anonymization functionality only affects the internal logging of the Pulsar broker and proxy components.

### Configuration

New configuration options will be added to both the broker and proxy configuration files to control the role anonymization strategy. These options are as follows:

**Broker Configuration:**
```properties
# Options: NONE, REDACTED, hash:SHA256, hash:MD5
authenticationRoleLoggingAnonymizer=NONE
```

**Proxy Configuration:**
```properties
# Options: NONE, REDACTED, hash:SHA256, hash:MD5
authenticationRoleLoggingAnonymizer=NONE
```

# Monitoring
Administrators can verify the feature by inspecting broker and proxy logs and confirming that role values appear in the form produced by the configured strategy.
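
One way to automate such a check is to test a role value extracted from a log line against the shape expected for the configured strategy. The sketch below is only an illustration: `AnonymizedRoleCheck` and its method are hypothetical names, and the patterns assume that hashes are logged as lowercase hexadecimal strings, which the PIP does not specify.

```java
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
final class AnonymizedRoleCheck {
    // Assumed output formats: lowercase hex digests.
    private static final Pattern SHA256_HEX = Pattern.compile("[0-9a-f]{64}");
    private static final Pattern MD5_HEX = Pattern.compile("[0-9a-f]{32}");

    // Returns true if the logged role value has the shape that the given
    // authenticationRoleLoggingAnonymizer setting is expected to produce.
    static boolean matchesConfiguredStrategy(String configuredValue, String loggedRole) {
        switch (configuredValue) {
            case "NONE":
                return true; // any value is acceptable, roles are not transformed
            case "REDACTED":
                return "[REDACTED]".equals(loggedRole);
            case "hash:SHA256":
                return SHA256_HEX.matcher(loggedRole).matches();
            case "hash:MD5":
                return MD5_HEX.matcher(loggedRole).matches();
            default:
                throw new IllegalArgumentException("Unknown strategy: " + configuredValue);
        }
    }
}
```

A shape check of this kind cannot prove that a value is a hash of the right role; it only flags log lines where a plain-text role slipped through.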
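
# Security Considerations
This feature strengthens security by keeping role names out of logs. Operators should still choose a strategy that balances security and operational needs. `REDACTED` reveals nothing about the role, but it also prevents correlating log entries that belong to the same role. The hashing strategies are deterministic, so the same role always produces the same value and can be traced across log lines. For the same reason, and assuming the hash is computed over the role alone without a salt or key, a role name that can be guessed can be confirmed by hashing the guess. Of the two hash options, SHA-256 is the stronger choice, because MD5 is no longer considered collision-resistant.
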
# Backward & Forward Compatibility

## Upgrade

No special upgrade instructions are needed. The new configuration parameter defaults to `NONE`, which preserves the existing logging behavior.

## Downgrade / Rollback
No special rollback instructions are required. The anonymizer only takes effect when the configuration parameter is set on a version that supports it, so downgrading simply results in roles being logged in plain text again.

# Alternatives
One alternative considered was redacting roles entirely without offering hashing options. This was rejected because it would reduce the usefulness of logs for operational monitoring, particularly in environments where roles need to be traced without revealing their actual values.
PIP-385 is already taken.
Please pick another available PIP number. The current way to find one is to search the dev mailing list discussions for a conflict with the number you picked. The first one to open the discussion about the PIP "wins".
You should also make the PR title match the convention we use:
[feat][pip] PIP-XXX: Role Anonymizer for Pulsar Logging
Start the thread on the dev mailing list as soon as possible after assigning the PIP number so that you "win" any race in claiming it.
At the current moment, PIP-393 is the next available PIP number.
Here's the search: https://lists.apache.org/[email protected]:lte=1y:PIP-393
I first looked at https://lists.apache.org/[email protected] and then started incrementing the PIP number until there were no search matches. That's how you find an available PIP number.
You could also check if someone has already created a PR with PIP-393 with this query "is:pr in:title PIP-393"
https://github.com/apache/pulsar/pulls?q=is%3Apr+in%3Atitle+PIP-393
That's also why you should follow the current conventions for starting the thread and for naming the PR for the PIP. We should include the proper steps in the PIP README so that it's clear to everyone how the PIP number gets assigned in practice.
Here's a good example of how to start the discussion thread: https://lists.apache.org/thread/9wddmj4o5mrdst427r40rr7phqb05y6s
And after the discussion, starting the voting thread:
https://lists.apache.org/thread/p4zvok4l6dxrm0hqbno5s21tq4s33f7s
There will soon be an example of closing the voting thread there.