Auto-generated Pull Request for refactor/uniprot-fetch #433

Merged
merged 32 commits into develop from refactor/uniprot-fetch on Aug 27, 2024

github-actions[bot]
Contributor

Pulling 'refactor/uniprot-fetch' into 'develop'. Please review and merge.

…e environment variable keys

The import alias for the Redis package is changed from 'r' to 'rds' to
enhance code readability and avoid confusion with other variables or
packages. Additionally, the environment variable keys for Redis service
host and port are simplified from 'REDIS_MASTER_SERVICE_HOST' and
'REDIS_MASTER_SERVICE_PORT' to 'REDIS_SERVICE_HOST' and
'REDIS_SERVICE_PORT' respectively, to make them more intuitive and
consistent with common naming conventions.
…ty and efficiency

The Uniprot API URL and its parameters in `mapping.go` have been updated
to use the new REST API endpoint. This change simplifies the URL
construction and aligns with the updated API standards. The parameters
are now more explicitly defined, improving the readability and
maintainability of the code. The response format has been changed from
tab-separated values to JSON, and the page size is explicitly set to 500
results, which makes the responses easier to handle and parse.
Adding new structs `UniProtResponse`, `UniProtEntry`,
`UniProtCrossReference`, and `UniProtCrossRefProperty` enables the
application to deserialize JSON responses from the UniProt API
effectively.
…rformance

This update transitions the Redis client library from version 7 to
version 9 across the project. The upgrade includes changes in the
`go.mod` file to reflect the new version dependency, and updates in the
source code to use the new `github.com/redis/go-redis/v9` import path.
… URLs

The new function `extractNextPageURL` is introduced to handle the
extraction of the 'next' page URL from the Link header in API responses.
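A minimal sketch, assuming UniProt's paginated responses carry a header of the form `Link: <https://rest.uniprot.org/...&cursor=...>; rel="next"`. The signature is hypothetical, and the regular expression covers only that simple shape, not the full RFC 8288 grammar.

```go
package main

import (
	"fmt"
	"regexp"
)

// nextLinkRE matches a `<url>; rel="next"` entry in a Link header. It assumes
// rel is the first parameter after the URL; URLs may contain commas.
var nextLinkRE = regexp.MustCompile(`<([^>]+)>\s*;\s*rel="next"`)

// extractNextPageURL returns the rel="next" URL from a Link header value,
// or "" when there is no further page. The signature is assumed.
func extractNextPageURL(linkHeader string) string {
	m := nextLinkRE.FindStringSubmatch(linkHeader)
	if m == nil {
		return ""
	}
	return m[1]
}

func main() {
	link := `<https://rest.uniprot.org/uniprotkb/search?cursor=abc&size=500>; rel="next"`
	fmt.Println(extractNextPageURL(link))
}
```

An empty result signals the last page, so a fetch loop can keep requesting pages until `extractNextPageURL` returns "".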
…eference info

The function `handleGeneNames` is replaced with
`extractCrossReferenceInfo` to streamline the process of extracting gene
names and dictyBase IDs from UniProt entries. This change simplifies the
code by removing redundant error handling and Redis operations, focusing
instead on directly extracting and returning the necessary information.
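A sketch of the extraction step, assuming each entry lists its dictyBase link as a cross-reference whose `database` is `dictyBase` and whose gene name is stored in a property keyed `GeneName`. The trimmed-down structs, the return signature, and that property key are all assumptions.

```go
package main

import "fmt"

// Minimal, hypothetical versions of the PR's structs (JSON tags omitted).
type UniProtCrossRefProperty struct{ Key, Value string }

type UniProtCrossReference struct {
	Database   string
	ID         string
	Properties []UniProtCrossRefProperty
}

type UniProtEntry struct {
	PrimaryAccession string
	CrossReferences  []UniProtCrossReference
}

// extractCrossReferenceInfo returns the dictyBase ID and gene name from the
// first dictyBase cross-reference of entry, with ok=false if there is none.
// The signature and the "GeneName" property key are assumptions.
func extractCrossReferenceInfo(entry UniProtEntry) (geneID, geneName string, ok bool) {
	for _, xref := range entry.CrossReferences {
		if xref.Database != "dictyBase" {
			continue
		}
		for _, p := range xref.Properties {
			if p.Key == "GeneName" {
				geneName = p.Value
				break
			}
		}
		return xref.ID, geneName, true
	}
	return "", "", false
}

func main() {
	entry := UniProtEntry{
		PrimaryAccession: "TEST01",
		CrossReferences: []UniProtCrossReference{
			{Database: "EMBL", ID: "X00000"},
			{Database: "dictyBase", ID: "DDB_G0000001",
				Properties: []UniProtCrossRefProperty{{Key: "GeneName", Value: "geneX"}}},
		},
	}
	id, name, ok := extractCrossReferenceInfo(entry)
	fmt.Println(id, name, ok) // prints "DDB_G0000001 geneX true"
}
```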
…or better abstraction

The function `handleGeneIDs` was replaced with `extractUniprotMaps` to
improve the abstraction level and maintainability of the code. The new
function directly constructs a list of `UniprotMap` structures from the
`UniProtResponse`, which simplifies the handling of UniProt data by
separating the concerns of data extraction and data storage.
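The sketch below illustrates that separation: extraction turns a decoded response into plain `UniprotMap` values and knows nothing about Redis. The `UniprotMap` fields, the trimmed structs, and the rule of keeping only the first dictyBase reference per entry are assumptions.

```go
package main

import "fmt"

// Hypothetical, trimmed-down response types (JSON tags omitted).
type UniProtCrossRefProperty struct{ Key, Value string }

type UniProtCrossReference struct {
	Database   string
	ID         string
	Properties []UniProtCrossRefProperty
}

type UniProtEntry struct {
	PrimaryAccession string
	CrossReferences  []UniProtCrossReference
}

type UniProtResponse struct{ Results []UniProtEntry }

// UniprotMap is a hypothetical shape for one accession-to-gene mapping.
type UniprotMap struct {
	UniprotID string
	GeneID    string
	GeneName  string
}

// extractUniprotMaps turns a decoded response into plain mappings without
// touching storage. Entries lacking a dictyBase reference are skipped, and
// only the first dictyBase reference per entry is used (both assumptions).
func extractUniprotMaps(resp UniProtResponse) []UniprotMap {
	maps := make([]UniprotMap, 0, len(resp.Results))
	for _, entry := range resp.Results {
		for _, xref := range entry.CrossReferences {
			if xref.Database != "dictyBase" {
				continue
			}
			m := UniprotMap{UniprotID: entry.PrimaryAccession, GeneID: xref.ID}
			for _, p := range xref.Properties {
				if p.Key == "GeneName" {
					m.GeneName = p.Value
				}
			}
			maps = append(maps, m)
			break
		}
	}
	return maps
}

func main() {
	resp := UniProtResponse{Results: []UniProtEntry{
		{PrimaryAccession: "TEST01", CrossReferences: []UniProtCrossReference{
			{Database: "dictyBase", ID: "DDB_G0000001"},
		}},
		{PrimaryAccession: "TEST02"}, // no dictyBase reference: skipped
	}}
	fmt.Printf("%+v\n", extractUniprotMaps(resp))
}
```

Because the function is pure, it can be unit-tested with hand-built responses and no Redis server.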
…ponse to streamline data processing

The readLine function was removed and replaced with
decodeUniprotResponse to handle the response from the UniProt API more
efficiently. This change allows the application to directly decode the
gzip-compressed JSON response into a structured format, improving both
the clarity and performance of the data processing workflow.
…dis for enhanced data handling

The function `handleIsoforms` is replaced by `loadUniprotMapsToRedis` to
improve the efficiency and clarity of data handling in Redis. The new
function uses a pipeline for batch processing Redis commands, which
reduces the number of round trips to the server.
…security

This change introduces several improvements to the Uniprot data processing in the Go application:
1. **Security Enhancements**: The URL validation ensures that only HTTPS requests are made to the expected domain, enhancing the security of data transfers.
2. **Code Simplification**: The removal of the `Count` struct and related logic simplifies the codebase, making it easier to maintain and understand.
3. **Efficiency Improvements**: By replacing the line-by-line parsing with JSON decoding, the process becomes more efficient and less error-prone.
4. **Pagination Support**: The addition of pagination handling allows the application to process large datasets that span multiple pages, ensuring complete data retrieval.
5. **Error Handling**: Improved error handling provides clearer error messages, making it easier to troubleshoot issues during data fetching and processing.
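The URL check in point 1 could look roughly like this. The expected host `rest.uniprot.org` and the function name are assumptions; the PR only says that requests are restricted to HTTPS and the expected domain.

```go
package main

import (
	"fmt"
	"net/url"
)

// expectedHost is an assumption; the PR does not name the exact domain.
const expectedHost = "rest.uniprot.org"

// validateUniprotURL accepts only absolute https URLs on the expected host.
func validateUniprotURL(raw string) error {
	u, err := url.Parse(raw)
	if err != nil {
		return fmt.Errorf("invalid URL %q: %w", raw, err)
	}
	if u.Scheme != "https" {
		return fmt.Errorf("URL must use https, got %q", u.Scheme)
	}
	// Hostname() strips any port, so "host:443" still compares equal.
	if u.Hostname() != expectedHost {
		return fmt.Errorf("unexpected host %q", u.Hostname())
	}
	return nil
}

func main() {
	for _, raw := range []string{
		"https://rest.uniprot.org/uniprotkb/search?format=json",
		"http://rest.uniprot.org/uniprotkb/search",
		"https://rest.uniprot.org.example.com/uniprotkb/search",
	} {
		fmt.Println(raw, "->", validateUniprotURL(raw))
	}
}
```

Comparing `u.Hostname()` for exact equality, rather than checking a prefix or substring, rejects look-alike hosts such as `rest.uniprot.org.example.com`.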
…ssage

Adding context support to the Redis ping method allows for better
control over timeouts and cancellations of database operations, which is
crucial for maintaining the responsiveness and stability of the
application. The error message formatting has been improved by using
`%w` to wrap the error, which aids in error handling by allowing the
error to be unwrapped in higher-level code.

codecov bot commented Aug 24, 2024

Codecov Report

Attention: Patch coverage is 6.39269% with 205 lines in your changes missing coverage. Please review.

Project coverage is 1.99%. Comparing base (54b59e4) to head (08f514f).
Report is 153 commits behind head on develop.

| Files | Patch % | Lines |
|---|---|---|
| internal/uniprot/cli/action.go | 0.00% | 105 Missing ⚠️ |
| internal/load/stockcenter/gwdi.go | 0.00% | 43 Missing ⚠️ |
| internal/uniprot/cli/flag.go | 0.00% | 21 Missing ⚠️ |
| internal/uniprot/client/client.go | 0.00% | 14 Missing ⚠️ |
| cmd/loader/main.go | 0.00% | 7 Missing ⚠️ |
| internal/baserow/strain/functional_handler.go | 0.00% | 7 Missing ⚠️ |
| internal/registry/registry.go | 0.00% | 3 Missing ⚠️ |
| internal/baserow/cli/phenotype_action.go | 0.00% | 2 Missing ⚠️ |
| internal/baserow/cli/strain_action.go | 0.00% | 2 Missing ⚠️ |
| internal/baserow/strain/load.go | 0.00% | 1 Missing ⚠️ |

Additional details and impacted files
Additional details and impacted files


@@             Coverage Diff             @@
##           develop    #433       +/-   ##
===========================================
- Coverage    87.91%   1.99%   -85.93%     
===========================================
  Files           15     549      +534     
  Lines         1365   59672    +58307     
===========================================
- Hits          1200    1188       -12     
- Misses         159   58478    +58319     
  Partials         6       6               
| Files | Coverage Δ |
|---|---|
| internal/baserow/cli/table.go | 0.00% <ø> (ø) |
| internal/cli/cli.go | 0.00% <ø> (ø) |
| internal/uniprot/redis/redis.go | 100.00% <100.00%> (ø) |
| internal/baserow/strain/load.go | 0.00% <0.00%> (ø) |
| internal/baserow/cli/phenotype_action.go | 0.00% <0.00%> (ø) |
| internal/baserow/cli/strain_action.go | 0.00% <0.00%> (ø) |
| internal/registry/registry.go | 0.00% <0.00%> (ø) |
| cmd/loader/main.go | 0.00% <0.00%> (ø) |
| internal/baserow/strain/functional_handler.go | 0.00% <0.00%> (ø) |
| internal/uniprot/client/client.go | 0.00% <0.00%> (ø) |

... and 3 more

... and 528 files with indirect coverage changes

Standardizing the naming convention for 'ID'-related variables and
functions across various files enhances code readability and
maintainability.
…odate larger projects

Increasing the linter timeout from 5 minutes to 15 minutes gives the
linter enough time to finish on larger projects, which may take longer
to analyze fully, instead of failing with a timeout error.
Adding the 'importer' directory to the .gitignore file ensures that
temporary or sensitive files within this directory are not tracked or
uploaded to the version control system, maintaining the cleanliness and
security of the repository.
…ot module

Introduces a new file to handle command-line interface flags
specifically for the uniprot module, laying the groundwork for future
CLI enhancements and functionality.
Introduces a new client package for managing Redis connections within
the application. This setup includes a function `SetRedisClient` that
initializes a Redis client using configuration from the application
context and verifies the connection by pinging the Redis server.
This change introduces a new function `UniprotFlags` to the CLI package,
which provides flags for configuring the Uniprot URL and Redis service
connection details. The `uniprotURL` is pre-configured to fetch specific
data from the Uniprot database, enhancing the ease of use for the
end-users by providing a ready-to-use URL. The Redis service host and
port can now be configured via environment variables or command-line
flags, improving the flexibility and configurability of the application.
…hing and Redis storage

Introduces a new module in the `internal/uniprot/cli` package for
handling the fetching of UniProt data and storing it in Redis. This
module includes functionality to validate URLs, make HTTP requests to
the UniProt API, parse the JSON response, and store the mappings between
UniProt IDs and gene names/IDs in Redis.
…up and error handling

The removal of the `client.SetRedisClient` function and the direct use
of `registry.GetRedisClient` simplifies the Redis client setup process.
This change ensures that the Redis client is retrieved directly from the
registry without additional setup overhead, enhancing code
maintainability. Additionally, replacing generic error returns with
`cli.Exit` provides clearer error messages and standardized exit codes,
improving the command-line interface's usability and error management.
The loader command line application now supports loading Uniprot
mappings, enhancing its functionality to handle more diverse data types.
This update includes the necessary CLI flags and setup for Uniprot data,
allowing users to manage and load Uniprot mappings directly through the
command line interface.
…temporary loader files

Adding 'loader' to the .gitignore file ensures that temporary files
created by the loader process are not tracked by git, keeping the
repository clean from unnecessary files.
Added detailed logging to the LoadUniprotMappings function to improve
traceability and debugging capabilities. This includes logging at the
start of processing each Uniprot page, after loading entries to Redis,
and for each individual Uniprot entry loaded. Additionally, a summary
log entry is added at the end to indicate the total number of entries
processed.
…ogic into RedisUniprotLoader class

This change introduces a new `RedisUniprotLoader` type that implements
the `UniprotLoader` interface, encapsulating the Redis data loading
logic. This refactoring improves the code structure by separating
concerns, making the `LoadUniprotMappings` function cleaner and more
focused on its primary responsibility. The use of an interface for
loading also makes the system more flexible and easier to extend or
modify in the future, such as adding different storage mechanisms.
…module

This change improves the code organization by moving all Redis-related
functionalities into a dedicated module under `internal/uniprot/redis`.
This separation of concerns makes the codebase easier to maintain and
enhances the modularity of the application.
Added `github.com/alicebob/miniredis/v2` to facilitate in-memory Redis
testing, which allows tests to run without a live Redis setup. The
`github.com/alicebob/gopher-json` module is added alongside it as a
dependency of miniredis, which uses it to provide JSON support in its
Lua scripting engine.
…unctionality

Introduces comprehensive unit tests for the RedisUniprotLoader to ensure
its reliability in various scenarios, including handling single and
multiple UniprotMap entries, error conditions, empty inputs, and
duplicate entries.
…fetching directly

Refactoring the test code in `redis_test.go` improves readability by
breaking long function calls into multiple lines, making the code easier
to read and maintain. Additionally, updating the test logic to use
`HExists` instead of `HGet` for checking the existence of keys in Redis
is more semantically correct and efficient, as it directly checks for
the presence of the key without retrieving its value.
… duplicate entries

The unit tests in `redis_test.go` are updated to accommodate the new
logic for handling duplicate entries in the Redis cache. The tests now
verify that, when duplicate UniprotIDs or GeneIDs are encountered, the
last entry written overwrites the earlier ones, so the cache behaves as
expected.
…for clarity

The output binary name is changed from 'content' to 'dataloader' to
better reflect its functionality and purpose, enhancing clarity and
maintainability of the Dockerfile.
cybersiddhu merged commit 512e88e into develop on Aug 27, 2024
3 of 5 checks passed