Skip to content

Refactoring

Past due by 9 months 81% complete

General Guide

Since it is deployed directly from the repository, master branch will stay.
We will have a separate main branch and will merge refactored code into main.
Once we finish an integration tests and confirms that it is doing at least the same job as the current implementation,
we can set the main as a new master.

TL;DR: Open a pull request into main

General Guide

Since it is deployed directly from the repository, master branch will stay.
We will have a separate main branch and will merge refactored code into main.
Once we finish an integration tests and confirms that it is doing at least the same job as the current implementation,
we can set the main as a new master.

TL;DR: Open a pull request into main, not into master.

Higher Level TODOs

  • Create a package scicat_ingestor and gradually move all the existing code to the package.
    It not for deployment so it just needs to be accessible to the entry point script (The command or a script that is used for
    deployment)
  • Setting up smaller scope tests environments
  • Setting up automated continuous integration tests environments
  • Update documentation
  • Turn main branch the new master branch and archive master

Brainstorming what should be done for refactoring.
Feel free to shoot down the ideas : D

Encapsulating partial code into modules/functions

For example, these can be split into another module...?

parser = argparse.ArgumentParser()
parser.add_argument(
'-c','--cf','--config','--config-file',
default='config.20230125.json',
dest='config_file',
help='Configuration file name. Default": config.20230125.json',
type=str
)

Kafka Consumer/Message Handling

As it is carrying the core-logic, we maybe better wrap the confluent kafka interfaces so that we can easily test them as well...?
For example,

if message is None:
logger.info("Received empty message")
continue
if message.error():
logger.info("Consumer error: {}".format(message.error()))
continue

This kind of error-handling can be separated into a smaller function.
Or
Wrap this into a function or automate this somehow...?
kafka_config_run_time = {}
for k1,v in kafka_config.items():
if k1 in ["topics", "individual_message_commit"]:
continue
k2 = k1.replace("_",".")
kafka_config_run_time[k2] = v
logger.info(f" - {k2:<30}: {v}")
consumer = Consumer(kafka_config_run_time)

Logging

Logging seems like to require some configuration steps but despite of the configuration the logging interface logger.log(info/debug/error) should be the same.
But since the configuration of the logger requires file or link handling, it'd better be tested I guess...?

Ingestor

This guy definitely seems like to need some chopping.

def ingest_message(

For example,
I think this should be wrapped into a separate file handling interface so that we can test them.

if config["run_options"]["message_to_file"]:
message_file_path = ingestor_files_path
logger.info("message file will be saved in {}".format(message_file_path))
if os.path.exists(message_file_path):

Redefining Interfaces

Nexus structure path

METADATA_PROPOSAL_PATH = [
"children",
("children", "name", "entry"),
("config", "module", "dataset"),
(None, "name", "experiment_identifier", "values")
]

This mixture of filter and indexing can maybe refactored into more explicit way...?
But I also think it is already working conveniently already... hmm...

Ingestor

  • Maybe not redefining, but just better type-hinting or wrapping some of the arguments into a tighter data structure.

    def ingest_message(

  • Some of if statements can be merged into one if we let get_instrument to accept defaultInstrument like get_nested_value_with_default.

    if instrument_id or instrument_name:
    instrument = get_instrument(
    scClient,
    instrument_id,
    instrument_name.lower(),
    logger
    )
    if instrument is None:
    instrument = defaultInstrument

Documentation / Docstrings

Maybe we can collect the comments into docstrings. Copilot is good at it : D
And the documentation in the readme might be sufficient but we probably need to clean that up too.
Here is a list of some sections that might be needed in the documentation.

  • Deployment Environment/Method
  • Whole structure (briefly, maybe mermaid graph : D)
  • Logging options
  • API dependencies
  • Kafka Consumer/Message Handling
    It seems quite straight forward but since it's the core part of it. Maybe we can document it better.
    Shall we write some mermaid graph : D....??
Loading