Auto-generated Pull Request for refactor/uniprot-fetch #433
Merged
…e environment variable keys The import alias for the Redis package is changed from 'r' to 'rds' to enhance code readability and avoid confusion with other variables or packages. Additionally, the environment variable keys for Redis service host and port are simplified from 'REDIS_MASTER_SERVICE_HOST' and 'REDIS_MASTER_SERVICE_PORT' to 'REDIS_SERVICE_HOST' and 'REDIS_SERVICE_PORT' respectively, to make them more intuitive and consistent with common naming conventions.
…ty and efficiency The UniProt API URL and its parameters in `mapping.go` have been updated to use the new REST API endpoint. This simplifies URL construction and makes the query parameters explicit, which improves the readability and maintainability of the code. The response format has been changed from tab-separated values to JSON, and the number of results returned per request is explicitly set to 500.
Adding new structs `UniProtResponse`, `UniProtEntry`, `UniProtCrossReference`, and `UniProtCrossRefProperty` enables the application to deserialize JSON responses from the UniProt API effectively.
…rformance This update transitions the Redis client library from version 7 to version 9 across the project. The upgrade includes changes in the `go.mod` file to reflect the new version dependency, and updates in the source code to use the new `github.com/redis/go-redis/v9` import path.
… URLs The new function `extractNextPageURL` is introduced to handle the extraction of the 'next' page URL from the Link header in API responses.
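As a rough illustration of the Link header handling described above, the sketch below pulls the `rel="next"` URL out of a header value with a regular expression. Only the function name comes from the commit message; the regular expression, the return convention (empty string when there is no next page), and the sample URL with its cursor value are assumptions, not the project's actual code.

```go
package main

import (
	"fmt"
	"regexp"
)

// nextLinkPattern matches a Link header element of the form <URL>; rel="next".
// The pattern is an assumption made for this sketch.
var nextLinkPattern = regexp.MustCompile(`<([^>]+)>;\s*rel="next"`)

// extractNextPageURL returns the URL marked rel="next" in a Link header value,
// or an empty string when the header carries no next-page link.
func extractNextPageURL(linkHeader string) string {
	match := nextLinkPattern.FindStringSubmatch(linkHeader)
	if match == nil {
		return ""
	}
	return match[1]
}

func main() {
	// Illustrative header value; the cursor is made up.
	header := `<https://rest.uniprot.org/uniprotkb/search?cursor=abc&size=500>; rel="next"`
	fmt.Println(extractNextPageURL(header))
	fmt.Println(extractNextPageURL("") == "")
}
```

A caller would typically keep requesting the returned URL until the function yields an empty string.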
…eference info The function `handleGeneNames` is replaced with `extractCrossReferenceInfo` to streamline the process of extracting gene names and dictyBase IDs from UniProt entries. This change simplifies the code by removing redundant error handling and Redis operations, focusing instead on directly extracting and returning the necessary information.
…or better abstraction The function `handleGeneIDs` was replaced with `extractUniprotMaps` to improve the abstraction level and maintainability of the code. The new function directly constructs a list of `UniprotMap` structures from the `UniProtResponse`, which simplifies the handling of UniProt data by separating the concerns of data extraction and data storage.
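To make the extraction step more concrete, here is a minimal sketch of how `extractCrossReferenceInfo` and `extractUniprotMaps` might fit together. The type and function names come from the commit messages, but every field name, JSON tag, the `"dictyBase"` and `"GeneName"` string constants, the function signatures, and the sample record are assumptions for illustration; the sample values are made up, not real UniProt data.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Field names and JSON tags below are assumptions for this sketch.
type UniProtCrossRefProperty struct {
	Key   string `json:"key"`
	Value string `json:"value"`
}

type UniProtCrossReference struct {
	Database   string                    `json:"database"`
	ID         string                    `json:"id"`
	Properties []UniProtCrossRefProperty `json:"properties"`
}

type UniProtEntry struct {
	PrimaryAccession string                  `json:"primaryAccession"`
	CrossReferences  []UniProtCrossReference `json:"uniProtKBCrossReferences"`
}

type UniProtResponse struct {
	Results []UniProtEntry `json:"results"`
}

// UniprotMap is a hypothetical shape for one UniProt-to-gene mapping.
type UniprotMap struct {
	UniprotID string
	GeneID    string
	GeneName  string
}

// extractCrossReferenceInfo returns the dictyBase ID and gene name from the
// first dictyBase cross-reference of an entry, or empty strings if none exists.
func extractCrossReferenceInfo(entry UniProtEntry) (geneID, geneName string) {
	for _, ref := range entry.CrossReferences {
		if ref.Database != "dictyBase" {
			continue
		}
		for _, prop := range ref.Properties {
			if prop.Key == "GeneName" {
				geneName = prop.Value
			}
		}
		return ref.ID, geneName
	}
	return "", ""
}

// extractUniprotMaps builds the mapping list without touching any storage.
// Entries that lack a dictyBase cross-reference are skipped.
func extractUniprotMaps(resp UniProtResponse) []UniprotMap {
	maps := make([]UniprotMap, 0, len(resp.Results))
	for _, entry := range resp.Results {
		geneID, geneName := extractCrossReferenceInfo(entry)
		if geneID == "" {
			continue
		}
		maps = append(maps, UniprotMap{
			UniprotID: entry.PrimaryAccession,
			GeneID:    geneID,
			GeneName:  geneName,
		})
	}
	return maps
}

// sampleJSON is a made-up response with one mappable entry and one without.
const sampleJSON = `{"results":[
 {"primaryAccession":"P00000","uniProtKBCrossReferences":[
  {"database":"dictyBase","id":"DDB_G0000001","properties":[{"key":"GeneName","value":"abcA"}]}]},
 {"primaryAccession":"P00001","uniProtKBCrossReferences":[]}]}`

// mapsFromJSON decodes raw JSON and extracts the mappings from it.
func mapsFromJSON(raw string) ([]UniprotMap, error) {
	var resp UniProtResponse
	if err := json.Unmarshal([]byte(raw), &resp); err != nil {
		return nil, fmt.Errorf("error decoding sample: %w", err)
	}
	return extractUniprotMaps(resp), nil
}

func main() {
	maps, err := mapsFromJSON(sampleJSON)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("%+v\n", maps)
}
```

Because the extraction returns plain values, it can be tested without a Redis server, which is the separation of concerns the commit message refers to.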
…ponse to streamline data processing The readLine function was removed and replaced with decodeUniprotResponse to handle the response from the UniProt API more efficiently. This change allows the application to directly decode the gzip-compressed JSON response into a structured format, improving both the clarity and performance of the data processing workflow.
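The decoding step can be sketched with the standard library alone: wrap the response body in a gzip reader and stream it into a JSON decoder. The function name matches the commit message, but its signature, the trimmed-down structs, the error strings, and the `gzipReader` helper (which only exists so the example runs without a network call) are assumptions.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"encoding/json"
	"fmt"
	"io"
)

// Trimmed-down stand-ins; field names and JSON tags are assumptions.
type UniProtEntry struct {
	PrimaryAccession string `json:"primaryAccession"`
}

type UniProtResponse struct {
	Results []UniProtEntry `json:"results"`
}

// decodeUniprotResponse reads a gzip-compressed JSON body into a struct.
func decodeUniprotResponse(body io.Reader) (*UniProtResponse, error) {
	gz, err := gzip.NewReader(body)
	if err != nil {
		return nil, fmt.Errorf("error creating gzip reader: %w", err)
	}
	defer gz.Close()

	var resp UniProtResponse
	if err := json.NewDecoder(gz).Decode(&resp); err != nil {
		return nil, fmt.Errorf("error decoding JSON response: %w", err)
	}
	return &resp, nil
}

// gzipReader compresses raw in memory; a hypothetical helper for the demo.
func gzipReader(raw string) io.Reader {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write([]byte(raw)); err != nil {
		panic(err)
	}
	if err := zw.Close(); err != nil {
		panic(err)
	}
	return &buf
}

func main() {
	// Made-up accession, used only to exercise the decoder.
	resp, err := decodeUniprotResponse(gzipReader(`{"results":[{"primaryAccession":"P00000"}]}`))
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println(len(resp.Results), resp.Results[0].PrimaryAccession)
}
```

Streaming through `json.NewDecoder` avoids buffering the whole decompressed payload before parsing.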
…dis for enhanced data handling The function `handleIsoforms` is replaced by `loadUniprotMapsToRedis` to improve the efficiency and clarity of data handling in Redis. The new function uses a pipeline for batch processing Redis commands, which reduces the number of round trips to the server.
…security This change introduces several improvements to the Uniprot data processing in the Go application:
1. **Security Enhancements**: The URL validation ensures that only HTTPS requests are made to the expected domain, enhancing the security of data transfers.
2. **Code Simplification**: The removal of the `Count` struct and related logic simplifies the codebase, making it easier to maintain and understand.
3. **Efficiency Improvements**: By replacing the line-by-line parsing with JSON decoding, the process becomes more efficient and less error-prone.
4. **Pagination Support**: The addition of pagination handling allows the application to process large datasets that span multiple pages, ensuring complete data retrieval.
5. **Error Handling**: Improved error handling provides clearer error messages, making it easier to troubleshoot issues during data fetching and processing.
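The HTTPS and domain check described above could look roughly like the sketch below. The function name `validateUniprotURL`, the `allowedHost` constant, and the choice of `rest.uniprot.org` as the expected domain are assumptions; the commit message does not name the host it checks.

```go
package main

import (
	"fmt"
	"net/url"
)

// allowedHost is an assumption; the PR only says "the expected domain".
const allowedHost = "rest.uniprot.org"

// validateUniprotURL rejects URLs that are not HTTPS or point to another host.
func validateUniprotURL(raw string) error {
	u, err := url.Parse(raw)
	if err != nil {
		return fmt.Errorf("invalid URL %q: %w", raw, err)
	}
	if u.Scheme != "https" {
		return fmt.Errorf("URL scheme must be https, got %q", u.Scheme)
	}
	if u.Hostname() != allowedHost {
		return fmt.Errorf("unexpected host %q, want %q", u.Hostname(), allowedHost)
	}
	return nil
}

func main() {
	for _, raw := range []string{
		"https://rest.uniprot.org/uniprotkb/search?size=500",
		"http://rest.uniprot.org/uniprotkb/search",
		"https://example.org/uniprotkb/search",
	} {
		fmt.Println(raw, "->", validateUniprotURL(raw))
	}
}
```

Running the same check on every next-page URL, not only the first one, keeps pagination from following a link to an unexpected host.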
…ssage Adding context support to the Redis ping method allows for better control over timeouts and cancellations of database operations, which is crucial for maintaining the responsiveness and stability of the application. The error message formatting has been improved by using `%w` to wrap the error, which aids in error handling by allowing the error to be unwrapped in higher-level code.
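The effect of `%w` is easiest to see with a small stand-in. The sketch below uses a hypothetical `pinger` interface and a `fakeRedis` type in place of the real Redis client, so it runs with the standard library only; the two-second timeout, the error text, and all names are assumptions.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// errUnreachable is a hypothetical sentinel error for the demo.
var errUnreachable = errors.New("connection refused")

// pinger is a stand-in for the part of a Redis client used here.
type pinger interface {
	Ping(ctx context.Context) error
}

// fakeRedis returns a fixed error, or the context error if ctx is done.
type fakeRedis struct{ err error }

func (f fakeRedis) Ping(ctx context.Context) error {
	if err := ctx.Err(); err != nil {
		return err
	}
	return f.err
}

// checkRedis pings with a bounded timeout and wraps any failure with %w,
// so callers can still inspect the underlying error.
func checkRedis(ctx context.Context, p pinger) error {
	ctx, cancel := context.WithTimeout(ctx, 2*time.Second)
	defer cancel()
	if err := p.Ping(ctx); err != nil {
		return fmt.Errorf("error pinging redis: %w", err)
	}
	return nil
}

// runCheck is a convenience wrapper that uses a background context.
func runCheck(pingErr error) error {
	return checkRedis(context.Background(), fakeRedis{err: pingErr})
}

// isUnreachable reports whether err wraps errUnreachable.
func isUnreachable(err error) bool {
	return errors.Is(err, errUnreachable)
}

func main() {
	err := runCheck(errUnreachable)
	fmt.Println(err)
	fmt.Println(isUnreachable(err))
}
```

Had the message been built with `%v` instead, `errors.Is` would no longer find the original error in the chain.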
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@             Coverage Diff              @@
##           develop     #433       +/-   ##
============================================
- Coverage    87.91%    1.99%   -85.93%
============================================
  Files           15      549     +534
  Lines         1365    59672   +58307
============================================
- Hits          1200     1188      -12
- Misses         159    58478   +58319
  Partials         6        6
Standardizing the naming convention for 'ID' related variables and functions across various files enhances code readability and maintainability.
…odate larger projects Increasing the timeout for the linter from 5 minutes to 15 minutes allows for more comprehensive linting processes, especially beneficial for larger projects that may require more time to analyze fully.
Adding the 'importer' directory to the .gitignore file ensures that temporary or sensitive files within this directory are not tracked or uploaded to the version control system, maintaining the cleanliness and security of the repository.
…ot module Introduces a new file to handle command-line interface flags specifically for the uniprot module, laying the groundwork for future CLI enhancements and functionality.
Introduces a new client package for managing Redis connections within the application. This setup includes a function `SetRedisClient` that initializes a Redis client using configuration from the application context and verifies the connection by pinging the Redis server.
This change introduces a new function `UniprotFlags` to the CLI package, which provides flags for configuring the Uniprot URL and Redis service connection details. The `uniprotURL` is pre-configured to fetch specific data from the Uniprot database, enhancing the ease of use for the end-users by providing a ready-to-use URL. The Redis service host and port can now be configured via environment variables or command-line flags, improving the flexibility and configurability of the application.
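The "environment variable or command-line flag" behaviour can be approximated with the standard library `flag` package, which is used here instead of the project's own CLI library so the sketch stays self-contained. The flag names, the `localhost` default, and the helper names are assumptions; only the `REDIS_SERVICE_HOST` and `REDIS_SERVICE_PORT` keys come from the PR, and 6379 is Redis's default port.

```go
package main

import (
	"flag"
	"fmt"
	"net"
	"os"
)

// redisConfig holds the connection details; a hypothetical type.
type redisConfig struct {
	Host string
	Port string
}

// Addr joins host and port into a dialable address.
func (c redisConfig) Addr() string {
	return net.JoinHostPort(c.Host, c.Port)
}

// envOr returns the environment value for key, or fallback if unset or empty.
func envOr(key, fallback string) string {
	if v, ok := os.LookupEnv(key); ok && v != "" {
		return v
	}
	return fallback
}

// parseRedisFlags uses environment variables as defaults and lets explicit
// command-line flags override them.
func parseRedisFlags(args []string) (redisConfig, error) {
	var cfg redisConfig
	fs := flag.NewFlagSet("uniprot", flag.ContinueOnError)
	fs.StringVar(&cfg.Host, "redis-service-host",
		envOr("REDIS_SERVICE_HOST", "localhost"), "Redis service host")
	fs.StringVar(&cfg.Port, "redis-service-port",
		envOr("REDIS_SERVICE_PORT", "6379"), "Redis service port")
	if err := fs.Parse(args); err != nil {
		return redisConfig{}, err
	}
	return cfg, nil
}

func main() {
	cfg, err := parseRedisFlags(os.Args[1:])
	if err != nil {
		fmt.Println(err)
		os.Exit(2)
	}
	fmt.Println("redis address:", cfg.Addr())
}
```

With this layout a flag given on the command line always wins, and the environment variables only supply defaults.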
…hing and Redis storage Introduces a new module in the `internal/uniprot/cli` package for handling the fetching of UniProt data and storing it in Redis. This module includes functionality to validate URLs, make HTTP requests to the UniProt API, parse the JSON response, and store the mappings between UniProt IDs and gene names/IDs in Redis.
…up and error handling The removal of the `client.SetRedisClient` function and the direct use of `registry.GetRedisClient` simplifies the Redis client setup process. This change ensures that the Redis client is retrieved directly from the registry without additional setup overhead, enhancing code maintainability. Additionally, replacing generic error returns with `cli.Exit` provides clearer error messages and standardized exit codes, improving the command-line interface's usability and error management.
The loader command line application now supports loading Uniprot mappings, enhancing its functionality to handle more diverse data types. This update includes the necessary CLI flags and setup for Uniprot data, allowing users to manage and load Uniprot mappings directly through the command line interface.
…temporary loader files Adding 'loader' to the .gitignore file ensures that temporary files created by the loader process are not tracked by git, keeping the repository clean from unnecessary files.
Added detailed logging to the LoadUniprotMappings function to improve traceability and debugging capabilities. This includes logging at the start of processing each Uniprot page, after loading entries to Redis, and for each individual Uniprot entry loaded. Additionally, a summary log entry is added at the end to indicate the total number of entries processed.
…ogic into RedisUniprotLoader class This change introduces a new `RedisUniprotLoader` type that implements the `UniprotLoader` interface, encapsulating the Redis data loading logic. This refactoring improves the code structure by separating concerns, making the `LoadUniprotMappings` function cleaner and more focused on its primary responsibility. Loading through an interface also makes the system easier to extend or modify in the future, for example by adding different storage mechanisms.
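A minimal sketch of the interface-based design follows, with an in-memory loader standing in for `RedisUniprotLoader` so that it runs without a Redis server. The interface name comes from the commit message; the `Load` method signature, the `UniprotMap` fields, `memoryLoader`, and `loadAll` are all hypothetical.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// UniprotMap is a hypothetical shape for one mapping.
type UniprotMap struct {
	UniprotID string
	GeneID    string
	GeneName  string
}

// UniprotLoader abstracts where mappings are stored. The method signature
// is an assumption for this sketch.
type UniprotLoader interface {
	Load(ctx context.Context, maps []UniprotMap) error
}

// memoryLoader keeps mappings in a Go map; a Redis-backed type would
// satisfy the same interface.
type memoryLoader struct {
	geneIDs map[string]string
}

func newMemoryLoader() *memoryLoader {
	return &memoryLoader{geneIDs: make(map[string]string)}
}

// Load stores each mapping; a later entry overwrites an earlier one that
// has the same UniProt ID.
func (m *memoryLoader) Load(ctx context.Context, maps []UniprotMap) error {
	for _, um := range maps {
		if err := ctx.Err(); err != nil {
			return fmt.Errorf("loading cancelled: %w", err)
		}
		if um.UniprotID == "" {
			return errors.New("empty UniProt ID")
		}
		m.geneIDs[um.UniprotID] = um.GeneID
	}
	return nil
}

// loadAll depends only on the interface, not on a concrete store.
func loadAll(loader UniprotLoader, maps []UniprotMap) error {
	return loader.Load(context.Background(), maps)
}

func main() {
	loader := newMemoryLoader()
	// Made-up IDs, used only to show the overwrite behaviour.
	err := loadAll(loader, []UniprotMap{
		{UniprotID: "P00000", GeneID: "DDB_G0000001", GeneName: "abcA"},
		{UniprotID: "P00000", GeneID: "DDB_G0000002", GeneName: "abcB"},
	})
	fmt.Println(err, loader.geneIDs["P00000"])
}
```

Swapping the storage backend then only requires another type with a matching `Load` method.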
…module This change improves the code organization by moving all Redis-related functionalities into a dedicated module under `internal/uniprot/redis`. This separation of concerns makes the codebase easier to maintain and enhances the modularity of the application.
Added `github.com/alicebob/miniredis/v2` to support in-memory Redis testing, so tests can run without a live Redis server. `github.com/alicebob/gopher-json` is added alongside it; it is pulled in as a dependency of miniredis rather than as a general-purpose JSON library for the application.
…unctionality Introduces comprehensive unit tests for the RedisUniprotLoader to ensure its reliability in various scenarios, including handling single and multiple UniprotMap entries, error conditions, empty inputs, and duplicate entries.
…fetching directly Refactoring the test code in `redis_test.go` improves readability by breaking long function calls into multiple lines, making the code easier to read and maintain. Additionally, updating the test logic to use `HExists` instead of `HGet` for checking the existence of keys in Redis is more semantically correct and efficient, as it directly checks for the presence of the key without retrieving its value.
… duplicate entries The unit tests in `redis_test.go` are updated to accommodate the new logic for handling duplicate entries in the Redis cache. The tests now verify that the last entry correctly overwrites the previous ones for different keys, ensuring that the cache behaves as expected when duplicate UniprotIDs or GeneIDs are encountered.
…for clarity The output binary name is changed from 'content' to 'dataloader' to better reflect its functionality and purpose, enhancing clarity and maintainability of the Dockerfile.
Pulling 'refactor/uniprot-fetch' into develop. Please review and merge.