fix: wrong encoding on a migration #650
Conversation
Walkthrough: The pull request modifies two SQL migration scripts. The first updates the character encoding used when filling the memento column from LATIN1 to UTF-8.
Force-pushed from 561306b to 6bbf5a9
Actionable comments posted: 0
🧹 Nitpick comments (1)
internal/storage/bucket/migrations/23-logs-fill-memento/up.sql (1)
Line range hint 1-38: Consider adding data validation before the conversion.
While the batched migration with progress tracking is well-structured, consider adding a validation step before the conversion to prevent potential data corruption.
Add a validation step before the conversion:
```diff
 do $$
 declare
     _batch_size integer := 1000;
     _count integer;
+    _invalid_count integer;
 begin
     set search_path = '{{.Schema}}';
+    -- Validate source data encoding
+    select count(seq)
+    from logs
+    where memento is null
+      and convert_from(convert_to(data::varchar, 'UTF-8')::bytea, 'UTF-8') != data::varchar
+    into _invalid_count;
+
+    if _invalid_count > 0 then
+        raise exception 'Found % records with potential encoding issues', _invalid_count;
+    end if;
+
     select count(seq)
```
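If you want to run the same check ad hoc before applying the migration, a standalone version of the suggested validation might look like the sketch below. It assumes the logs table exposes the seq, data, and memento columns used in the diff above; adjust names to the actual schema.
```sql
-- Standalone spot check, derived from the suggested validation above.
-- Assumes a logs table with seq, data and memento columns (adjust as needed).
select count(seq) as suspicious_rows
from logs
where memento is null
  and convert_from(convert_to(data::varchar, 'UTF-8'), 'UTF-8') != data::varchar;
```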
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
internal/storage/bucket/migrations/23-logs-fill-memento/up.sql (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: Tests
🔇 Additional comments (2)
internal/storage/bucket/migrations/23-logs-fill-memento/up.sql (2)
Line 23: LGTM! The change from LATIN1 to UTF-8 encoding is a good improvement.
The switch to UTF-8 encoding is generally safer, as it provides broader character support and is the de facto standard for character encoding.
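To make the practical difference concrete, here is a minimal sketch using PostgreSQL's convert_to, independent of the migration's actual statements: a payload containing characters outside the Latin-1 repertoire converts cleanly to UTF-8, while a LATIN1 conversion fails.
```sql
-- Minimal illustration only, not the migration itself.
-- The euro sign has no Latin-1 equivalent, so the LATIN1 conversion raises
-- a "has no equivalent in encoding" error, while UTF-8 keeps the character.
select convert_to('amount: 100 €', 'UTF-8');     -- ok, returns the UTF-8 bytes
-- select convert_to('amount: 100 €', 'LATIN1'); -- would raise an encoding error
```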
To ensure data integrity, please verify the source encoding by running this script to check for potential encoding issues in the data:
✅ Verification successful
The encoding change is safe and correctly implemented.
The UTF-8 encoding change in migration 23 is isolated to the memento column storage, while LATIN1 usage in other migrations is specifically for hash calculation consistency. These are separate concerns that don't conflict with each other.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
```bash
#!/bin/bash
# Description: Check for potential encoding issues in SQL files
# Look for any explicit LATIN1 or UTF8 encoding declarations and character patterns
# that might indicate encoding assumptions

# Search for encoding-related patterns in SQL files
rg -i "latin1|utf-?8|character set|charset|encoding" --type sql

# Look for potential non-ASCII content that might be affected
rg -P "[\x80-\xFF]" --type sql
```
Length of output: 578
Script:
```bash
#!/bin/bash
# Get the full content of the related migrations to understand their context and relationships
echo "=== Migration 11 ==="
cat internal/storage/bucket/migrations/11-make-stateless/up.sql
echo -e "\n=== Migration 23 ==="
cat internal/storage/bucket/migrations/23-logs-fill-memento/up.sql
echo -e "\n=== Migration 27 ==="
cat internal/storage/bucket/migrations/27-clean-database/up.sql
```
Length of output: 24638
Line range hint 1-38: Verify the migration with sample data.
Before applying this migration to production, ensure it is tested with a representative sample of data containing various character encodings.
Here's a script to help identify the types of data you're dealing with:
✅ Verification successful
Migration verified as safe for production deployment
The migration is safe to apply because:
- The codebase already handles UTF-8 encoding and bytea conversion for logs data
- The batch processing with progress tracking ensures safe data migration
- The non-validated constraint allows for recovery if needed
- Previous migrations show established patterns for safe binary data conversion
The existing implementation in the codebase demonstrates proper handling of various character encodings and binary data, making this migration a safe extension of existing functionality.
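As a lightweight complement to the checks above, a sample-data dry run could look like the following sketch. The scratch table and the conversion expression are assumptions for illustration (the real statements in up.sql may differ), but the round-trip idea is the same.
```sql
-- Hypothetical dry run against a scratch table, not the project's schema.
-- Assumes the migration effectively fills memento with convert_to(data, 'UTF-8').
create temporary table logs_sample (seq serial primary key, data varchar, memento bytea);

insert into logs_sample (data) values
    ('{"memo": "plain ascii"}'),
    ('{"memo": "café"}'),
    ('{"memo": "総勘定元帳 €"}');

-- Apply the assumed conversion to rows that have no memento yet
update logs_sample set memento = convert_to(data, 'UTF-8') where memento is null;

-- Every row should round-trip back to its original text
select bool_and(convert_from(memento, 'UTF-8') = data) as all_rows_roundtrip
from logs_sample;
```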
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
```bash
#!/bin/bash
# Description: Look for migration test files or sample data
# that might help in testing this encoding change

# Search for test files related to this migration
fd -e sql -e json -e yaml test.*23.*logs.*memento

# Look for sample data files
fd -e sql -e json -e yaml sample.*data
```
Length of output: 96
Script:
```bash
#!/bin/bash
# Search for test files related to logs table
fd -e sql test.*logs

# Search for migration test files
fd -e sql test.*migration

# Look for code that inserts into logs table
rg -l "INSERT INTO.*logs" --type sql

# Check table structure and existing data types
rg -l "CREATE TABLE.*logs" --type sql
```
Length of output: 133
Script:
```bash
#!/bin/bash
# Look for any files referencing the logs table
rg -l "logs.*data" -g '!*.md'

# Search for code that writes to logs table (case insensitive)
rg -l -i "logs.*data" --type go

# Look for encoding-related code
rg -l "convert_to|encode|UTF-8" --type go --type sql

# Search for test files with encoding
rg -l "encoding.*test" --type go
```
Length of output: 1742
Script:
```bash
#!/bin/bash
# Check test files for log data formats
rg -A 5 "data.*:=" test/e2e/api_logs_list_test.go internal/storage/ledger/logs_test.go

# Look at resource implementation
cat internal/storage/ledger/resource_logs.go

# Check other migrations using convert_to
cat internal/storage/bucket/migrations/11-make-stateless/up.sql
```
Length of output: 19492
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files:
```
@@            Coverage Diff             @@
##             main     #650      +/-   ##
==========================================
+ Coverage   81.69%   81.71%   +0.01%     
==========================================
  Files         131      131              
  Lines        7070     7059      -11     
==========================================
- Hits         5776     5768       -8     
+ Misses        991      990       -1     
+ Partials      303      301       -2     
```
☔ View full report in Codecov by Sentry.
No description provided.