
How-to create tests for module development purposes in plugin templates? #19

MariellaCC opened this issue Nov 29, 2023 · 83 comments
Labels: how-to (Request or outline for a how-to or tutorial type doc)

@MariellaCC

Are there best practices to keep in mind or advice on tools to use to create tests while developing modules in plugin templates?

@makkus (Collaborator) commented Nov 30, 2023

Good question. This is a first draft, and not finalized yet, so take the following with that in mind. Also, happy for any input on how to improve this, or other suggestions related to tests.

Basically, we have 2 types of tests. "Normal" unit tests go under the 'tests' folder in the project folder; just create those as you would create normal pytest tests.
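For illustration, a minimal unit test could look something like the following sketch; the helper function is just a placeholder (in a real plugin you would import your own utility code from the plugin package instead of defining it in the test file):

# tests/test_utils.py
import pytest


def normalize_column_name(name: str) -> str:
    # stand-in for a real utility function in your plugin
    return name.strip().lower().replace(" ", "_")


@pytest.mark.parametrize(
    "raw, expected",
    [
        ("Source", "source"),          # simple case
        ("  Target  ", "target"),      # surrounding whitespace
        ("Node Label", "node_label"),  # spaces replaced
    ],
)
def test_normalize_column_name(raw, expected):
    assert normalize_column_name(raw) == expected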

Then there are sort of end-to-end module tests, which are supposed to make it easy to specify an operation and inputs, then test that the results are as expected. Technically, they are also run as pytest tests, and they are kicked off in the test_job_descs.py file under tests, so don't change that file unless you know what you are doing.

It works like this:

  1. write a job description under examples/jobs

Those are yaml (or json) files that look like:

operation: logic.and
inputs:
  a: true
  b: false

There are two mandatory keys:

  • operation: the name of the operation you want to run, or a path to a pipeline description
  • inputs: the inputs for the operation

At the moment, only scalar input types are supported, so in most cases you want to provide a pipeline that contains the operation you want to test, as well as other operations that create the inputs for that target operation from scalars. For example, this pipeline imports csv files from a directory and converts them into a tables value:

pipeline_name: import.tables.from.csv_files
doc: |
  Create a tables value from a folder of csv files.

  Each file will represent a table in the database.

steps:
  - module_type: import.local.file_bundle
    module_config:
      include_file_types:
        - ".csv"
    step_id: import_csv_files
  - module_type: create.tables.from.file_bundle
    step_id: create_tables
    input_links:
      file_bundle: import_csv_files.file_bundle

input_aliases:
  import_csv_files.path: path

output_aliases:
  create_tables.tables: tables

This could be saved under pipelines/tables_from_csv_files.yaml, and then referenced in the job description like:

operation: "${this_dir}/../pipelines/tables_from_csv_files.yaml"
inputs:
  path: "${this_dir}/../data/journals/"

In this example I'm saving the job description as import_journal_tables.yaml. The ${this_dir} variable is the only one supported at the moment, and it gets resolved to the directory the job description file is in (the examples/jobs directory in this case).

This already gives us a basic form of testing: kiara will run all of the jobs in that folder when you push to Github via Github Actions, and if one of the jobs fails, the GH action will complain. So stuff like invalid/outdated input fields or processing errors inside the module will be caught.

The next step is to test the results against expected outputs. I will write about that later in another comment.

@makkus (Collaborator) commented Nov 30, 2023

2.) write expressions to test the results

You can test your jobs manually with the commandline (or Python, but that's a bit more tedious in my opinion):

kiara run examples/jobs/import_journal_tables.yaml

This will tell you whether your job will run successfully in principle, and mimic what will happen in the Github Action when pushing.

In most cases we will also want to test that the result we are getting is actually correct, not just available. This is done by adding a folder that is named exactly like the job description file (without extension) to tests/job_tests, so import_journal_tables in our case. Then, we create a file outputs.yaml in that folder, and add property keys and expected values to it, as a dictionary. In our case, that file looks like:

tables::properties::metadata.tables::tables::JournalEdges1902::rows: 321
tables::properties::metadata.tables::tables::JournalNodes1902::rows: 276

The test runner will test the result value against the expected value in this dictionary. The dictionary keys are assembled like:

<OUTPUT_FIELD_NAME>::properties::<PROPERTY_NAME>::<REST_OF_PATH_TO_VALUE>

You can check result properties, available keys, etc. in the cli:

kiara run --print-properties examples/jobs/import_journal_tables.yaml

(this will only work from kiara>=0.5.6 onwards)

This method of testing outputs does not support any non-scalar outputs, so in most cases testing properties is the only possible thing to do.

If you have a scalar result, you can test against it using a dictionary key like:

y::data: False

( <FIELD_NAME>::data: <PYTHON_REPRESENTATION_OF_VALUE>)

For more in-depth tests, you can do those in Python directly. For that, instead of (or in addition to) outputs.yaml, add a file called outputs.py in the folder named after the job to test. Then add one or several functions (name those whatever you like) that can have one or several arguments, named after the result fields you want to test. So for example if the result field to test is named tables, you'd do:

from kiara.models.values.value import Value
from kiara_plugin.tabular.models.tables import KiaraTables


def check_tables_result(tables: Value):

    # we can check properties here like we did in the outputs.yaml file
    # for that you need to look up the metadata Python classes, which is something that
    # is not documented yet, not sure how to best do that
    assert tables.get_property_data("metadata.tables").tables["JournalEdges1902"].rows == 321

    # more interestingly, we can test the data itself
    tables_data: KiaraTables = tables.data

    assert "JournalEdges1902" in tables_data.table_names
    assert "JournalNodes1902" in tables_data.table_names

    edges_table = tables_data.get_table("JournalEdges1902")
    assert "Source" in edges_table.column_names
    assert "Target" in edges_table.column_names

As I said, the testing (esp. the checking of results) is a work in progress, but it works reasonably well so far. I'd still like to get some input or more validation that the solution I have is reasonable before locking it in.

@makkus (Collaborator) commented Nov 30, 2023

One thing I forgot to mention: you can run the tests manually by doing either:

pytest tests

or

make test

@makkus (Collaborator) commented Nov 30, 2023

Ah, and another thing that is relevant in the context of generating documentation. I tried to design the testing around example job descriptions, because I think we should also render the examples in the plugin documentation itself. In what form exactly (yaml files, python API examples, ...) is up to us to decide, but they contain all the information (esp. if we add well-written 'doc' fields to them) to create a useful examples section for each plugin (or in other parts of the documentation).

@makkus (Collaborator) commented Dec 29, 2023

Some updates: to make writing those tests a bit more efficient, I've added support for a special 'init' example job description. This is a job description like the ones described above, but it is run before any other job description invocation. This lets you prepare the kiara context in which the test runs with some example data (using the save key), which in turn can then be used in your specific test job.
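To give a rough idea, such an init job description could look something like this sketch (re-using the import.local.file_bundle operation and the example data path from above; the alias under 'save' is just a placeholder):

operation: import.local.file_bundle
inputs:
  path: "${this_dir}/../data/journals/"
save:
  # 'file_bundle' is the output field name of the operation,
  # 'journals_file_bundle' is the alias the value gets saved under
  file_bundle: journals_file_bundle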

For reference, have a look at the kiara_plugin.tabular plugin, for example its examples/jobs/init.yaml.

Any questions, just ask.

@MariellaCC (Author) commented Jan 11, 2024

Thanks a lot for the info, I am now trying to experiment, and as a first step, I am recapping via a to-do.
Concerning the 1st type of tests @makkus, you wrote "just create those as you would create normal pytest tests", but could you please elaborate: does this apply in a module development scenario, and if it does, could you point to an example that would be relevant for this kind of scenario?

If I understand right, the other insights in the current discussion item relate to the 2nd type of tests, which are the end-to-end module tests? (I will experiment on those first and it's possible that I will have additional questions once I am there)

In the current question, I am trying to make sure that I understand the added value of creating unit tests (pytest / 1st type of tests you mentioned) in addition to the end-to-end module tests (2nd type of tests you mentioned).

@makkus (Collaborator) commented Jan 11, 2024

It's really not much different from when you write a normal Python application. You'd use pytest mainly to test utility functions, with different inputs & esp. edge cases. Since modules typically don't have a lot of code, you might not have any of those. End-to-end/integration tests are not that different really, and they are also run via pytest in our case; there is just this little framework around them that I described above to make it a bit easier. It's still important to test them with (meaningfully) different inputs & esp. edge cases.

@MariellaCC (Author)

Concerning the support added for the special 'init' example job description, could you please specify from which version of kiara it is available? Thanks a lot.

@makkus (Collaborator) commented Jan 16, 2024

Should be available with the latest, current version (0.5.9).

@MariellaCC (Author)

In the case of the current workflow I am working on, data are onboarded from external sources (see https://github.com/DHARPA-Project/kiara_plugin.topic_modelling/blob/develop/src/kiara_plugin/topic_modelling/modules/onboarding.py). Does that change anything in the testing approach you described above?

@makkus (Collaborator) commented Jan 17, 2024

What was your previous testing approach?
Basically, it just makes it easier to get data into the testing context, other than that nothing changes.

@MariellaCC (Author)

Well, so far we didn't have a testing procedure in place - this is precisely the scope of this thread.

@makkus (Collaborator) commented Jan 17, 2024

Ah well, then nothing changes I guess.

You can still do your testing the same way it was possible before, but if you need pre-loaded data to test your module, this makes it easier: instead of writing a pipeline, referring to it in your job description, and using a local file path or url or whatever as the job description input, you can just use the input alias(es) of your 'init-ed' values in your job description. For onboarding modules nothing would change, since those would not need such a pipeline and would have local paths/urls as job description inputs anyway.

As I said, the tabular plugin can serve as an example of how such tests can be specified.

@MariellaCC (Author) commented Jan 22, 2024

Ah well, then nothing changes, I guess.

My previous testing approach was to use and attach a Jupyter notebook, as I mentioned already in other threads, since Jupyter usage was, until now, my main focus for kiara usage as well as prototyping. This is a bit different now, as these modules are being prepared to be used in a functional front-end app.

From what I understand in general, for our users who are module creators, we should prompt them to do the tests as you described: 1) unit tests, if necessary, according to use cases, and 2) init job descriptions.

As our users are not necessarily software engineers, we will need to document these tests in a user-friendly way, assuming no previous knowledge of testing processes.

@makkus (Collaborator) commented Jan 22, 2024

Yes, I agree. Not sure who's going to do that, but ideally someone who is on the same level as our supposed target audience here, so they are aware of what information needs to be provided. I guess we can't really document everything concerning testing, it's a non-trivial area of work, so for the more fundamental stuff we have to find good tutorials/docs on the internet and link to them.

Anyone who's writing tests: make notes of the things that weren't clear to you or where you had difficulties when you wrote your first tests, so we can consider that for the content of this part of the documentation.

@MariellaCC (Author) commented Jan 22, 2024

Here's the result of my first experiment for this procedure (for an init job description test)

I tried adding a test for one single module by creating an init.yaml file in the examples/jobs directory of the plugin I am working on.
Here's how my init.yaml file looks (I replaced module/input names with generic ones here):

operation: "operation_name"
inputs:
  operation_name__input_name_1: "input_value"
  operation_name__input_name_2: "input_value"
save:
  # operation_name__output_name: "saved_output_name"  (this didn't work for me when prefixing with the op name)
  output_name: "saved_output_name"  # (this worked for me)

for running the operation:

this worked: kiara run init.yaml
this didn’t work: make test

I had the following error when trying it while in the jobs directory: "make: *** No rule to make target `test'. Stop."

And from the root directory of the plugin, here's the error:
"py.test
make: py.test: No such file or directory
make: *** [test] Error 1"

Is there anything I should do before trying a make test or is this only for unit tests maybe?

@makkus (Collaborator) commented Jan 22, 2024

Right, it seems you don't have pytest installed.

You can do that with a pip install like:

pip install -e '.[dev_utils]'

Run it in the project root. And yes, make commands always need to be run in the project root.

@MariellaCC (Author) commented Jan 22, 2024

Oh, ok, I assumed it was installed by default, sorry.
So, from what I understand, the command you wrote would be a first step for users who are using the plugin template and want to create tests:

pip install -e '.[dev_utils]'

(I was following this procedure for module development: https://dharpa.org/kiara.documentation/latest/extending_kiara/creating_modules/the_basics/#pre-loading-a-table-dataset )

@makkus (Collaborator) commented Jan 22, 2024

Yes, correct. Those dependencies are not included in a plugin's default dependencies, because if they were, they would also be installed whenever an end-user installs the plugin, which is something we don't want.

I guess this is one of the things that applies to any Python project, not just kiara plugins. So maybe we can find some tutorial or similar we can link to. Or we write our own recommended 'create a dev environment' doc section if we decide to have one.

@MariellaCC (Author)

Well, the tutorial I pointed to above is meant for users extending kiara (so module developers), so we just need to keep in mind to add this instruction when there is an updated version, that's all.

@makkus (Collaborator) commented Jan 22, 2024

Right, yeah.

@makkus (Collaborator) commented Jan 24, 2024

Concerning:

save:
  # operation_name__output_name: "saved_output_name"  (this didn't work for me when prefixing with the op name)
  output_name: "saved_output_name"  # (this worked for me)

The only valid names here are the output fields of the operation that the job uses (in the 'operation' field). You can get the available ones via kiara operation explain <your_operation_name_or_path_to_pipeline>.

operation_name__output_name might coincidentally be an actual valid name, but in most cases this is not a thing and would never work.

@MariellaCC (Author)

Thanks! Just recapping: when I tried, as documented in the experiment feedback above, I also did the same for the inputs (operation_name__input_name), so I would like to modify the generic example for single-operation testing below.

The correct version is:

operation: "operation_name"
inputs:
  input_name_1: "input_value"
  input_name_2: "input_value"
save:
  output_name: "saved_output_name"

Please correct if there's a mistake.

If others try it, I think it is worth noting that the input_name and output_name need to be exactly what kiara operation explain indicates. Some examples, like the one shared as a reference in this thread (https://github.com/DHARPA-Project/kiara_plugin.tabular/blob/develop/examples/jobs/init.yaml), may coincidentally look like the syntax is operation_name__output_name, but that is a coincidence, both for inputs and outputs.

@makkus (Collaborator) commented Jan 24, 2024

Yes, correct. One thing to note is that the 'operation' can also be a path to a pipeline file, in which case kiara will run the pipeline. Pipelines are just 'special' operations, and they also have a set of input- and output-fields.

Again, you can just use kiara operation explain <path_to_pipeline_file> to figure out what the output field names are. In case whoever wrote the pipeline did not adjust the input/output field names of the pipeline, the field names might look a bit like the long ones from above.

@MariellaCC (Author)

Yes, absolutely, and worth noting indeed. At the moment, this first experiment was specifically targeted at a single-operation testing scenario; I will recap the same way at a later stage for the pipeline scenario.

@MariellaCC (Author)

Is there a way to explore (from the CLI) data items created/saved via the save key of an examples/jobs/init.yaml file?

@MariellaCC (Author) commented Apr 9, 2024

maybe "table.augment_column" if I understand the concept right (I am borrowing your terminology with the "augment")? we would keep the initial column and augment the table with the new column, but maybe this is what the merge was?
or "table.add_column"
I think the purpose would be something like that if we understood each other:
table_pa.add_column(len(table_cols),"tokens", [tokens_array])
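To make that concrete, here is a minimal pyarrow sketch of the intended behaviour (table content and column names are just placeholders):

import pyarrow as pa

table = pa.table({"file_name": ["a.txt", "b.txt"], "content": ["foo bar", "baz"]})
tokens_array = pa.array([["foo", "bar"], ["baz"]])

# add_column returns a new table; the original table is left untouched
augmented = table.add_column(table.num_columns, "tokens", tokens_array)
print(augmented.column_names)  # ['file_name', 'content', 'tokens']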

@makkus (Collaborator) commented Apr 9, 2024

Ah, yes, sorry, you are right, we want to keep the old column, not replace it. I guess 'add_column' would be easiest to grasp for users, even though it's tempting to think of the table as still 'being' the old table, just with a new column, instead of a newly created table that was created from the old table and an extra column. It's a fine distinction, but in the context of kiara I'd really like users to understand that they never ever modify data, only 'transform' it into something new. But we have to balance that with usability, so I reckon 'add_column' would be good enough.

What would the interface for that module look like? Just the original table, the new column, and an optional index where the column should be inserted; if that is not provided, it'll be appended to the end?

And yes, that is possible with the table.merge module at the moment, it's just a bit more difficult to use than such an add_column operation: #18 (comment)

@MariellaCC (Author)

Great, such a module would be really helpful indeed to help users assess the output in the context of the data.

Indeed, the interface you described is precisely what seems to be needed. Optional index would be very useful for display purposes and, as you said, by default, appending the column at the end seems to make sense.

Alright, thanks! I will proceed with the table.merge module then. And I will replace it later on.

@makkus (Collaborator) commented Apr 9, 2024

Ok, great. Implementing the module itself should be simple, so I will have that ready sometime next week, hopefully along with proper releases of kiara itself and all the other plugins.

@MariellaCC (Author)

On my side I have the assemble.tables module (tabular plugin version 0.5.3) but I do not have the table.merge one, or maybe this is the one? But I don't see config options for arrays; it seems to be scoped to tables.

@makkus (Collaborator) commented Apr 9, 2024

Did you try the example code I linked in #18 (comment)? This example merges a table with a (single) array (even though it's a bit silly, because it attaches the same array to the table it was previously picked from. In real life there would be some processing happening, of course).

Seems to work fine for me. table.merge is an (unconfigured) module, not an operation, meaning it needs to be configured before it can be used (similar to what we talked about here), unlike pre-configured operations you would see in kiara operation list. Do you see table.merge in kiara module list?

@MariellaCC (Author)

Ah I see, I hadn't realized that such modules only appear via kiara module list. I was trying via kiara operation list indeed.

@makkus (Collaborator) commented May 8, 2024

Update for 0.5.10:

job failures

Tests can now also check for job failures. To do this, the job name (the name of the job description file) must contain the string 'fail'. If it does, kiara will expect an exception to be thrown, and fail the test if that doesn't happen. In addition, you can also check if the right exception is thrown (or the exception contains the right details), by adding one of the following files under job_tests/<job_name> (similar to checking successful outputs):

  • outputs.yaml with a key being either:
    • error::msg
    • error::msg_contains_<INTEGER>

An example to match the complete string of the exception error message would be a file containing:

error::msg: "not a string"

Or, if you want to check for the existence of one or multiple substrings, you can do it like:

error::msg_contains_1: "not"
error::msg_contains_2: "string"

Alternatively, you can also test using python code (again, similar to checking successful outputs), using the argument name error (and any function name you like):

from kiara.exceptions import KiaraProcessingException  # assumed import path


def check_job_result(error: Exception):

    assert "string" in str(error), "invalid error message"
    assert isinstance(error, KiaraProcessingException), "invalid exception type"

additional job location

Test jobs can now also live in tests/resources/jobs, in addition to examples/jobs. If each of those folders contains an init.yaml file, both of those init files will be run before each test.

Checking test results stays the same, using a subfolder under tests/job_tests named after the job name; this means jobs under examples/jobs and tests/resources/jobs should never have the same names.

This is done so we can have tests that check for exceptions (see above), without having them in the user-visible examples/jobs location.
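To illustrate, the resulting layout could look something like this (all file names except init.yaml and outputs.yaml are placeholders):

examples/jobs/
  init.yaml                      # optional, run before every test
  import_journal_tables.yaml     # user-visible example job
tests/
  resources/jobs/
    init.yaml                    # optional, also run before every test
    my_module_fail.yaml          # test-only job, expected to raise an exception
  job_tests/
    import_journal_tables/
      outputs.yaml               # expected output properties/values
    my_module_fail/
      outputs.yaml               # expected error message checks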

For examples of how this is done in practice, check the kiara_plugin.core_types repo.

@MariellaCC (Author) commented May 13, 2024

Hi,
This question is not related to the last feature you described in your last message, but to the impact of the comment/notes feature implementation on how to run a test job from the cli.
When I try something like (for example):
kiara run /Users/mariella.decrouychan/Documents/GitHub/kiara_plugin.topic_modelling/tests/resources/jobs/zenodo_get_subfolder.yaml

I get the error message:
No job metadata provided. You need to provide a 'comment' argument when running your job.

But I'm not sure where to add the comment in this scenario.
Thank you

@makkus (Collaborator) commented May 13, 2024

Ah, good catch, it looks like the --help cli arg doesn't work if there are invalid inputs:

❯ kiara run logic.xor --help

╭─ Run info: logic.xor ────────────────────────────────────────────────────────────────────────────────────╮
│                                                                                                          │
│ Can't run operation: invalid or insufficient input(s)                                                    │
│                                                                                                          │
│ ──────────────────────────────────────────────────────────────────────────────────────────────────────── │
│                                                                                                          │
│ Operation: logic.xor                                                                                     │
│                                                                                                          │
│ Returns 'True' if exactly one of it's two inputs is 'True'.                                              │
│                                                                                                          │
│ Inputs:                                                                                                  │
│                                                                                                          │
│   field name   status    type      description                              required   default           │
│  ──────────────────────────────────────────────────────────────────────────────────────────────          │
│   a            not set   boolean   A boolean describing this input state.   yes                          │
│   b            not set   boolean   A boolean describing this input state.   yes                          │
│                                                                                                          │
│                                                                                                          │
│ Outputs:                                                                                                 │
│                                                                                                          │
│   field name   type      description                                                                     │
│  ──────────────────────────────────────────────────────────────────────────────────────────────────────  │
│   y            boolean   A boolean describing the module output state.                                   │
│                                                                                                          │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────╯

But if the inputs are all there (as probably in your example), the --help would give you that information:

❯ kiara run logic.xor a=true b=true --help
                                                                                                            
 Usage: kiara run [OPTIONS] logic.xor [INPUTS]                                                              
                                                                                                            
 Returns 'True' if exactly one of it's two inputs is 'True'.                                                
                                                                                                            
╭─ Options ────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --module-config     -c  TEXT  (Optional) module configuration, only valid when run target is a module    │
│                               name.                                                                      │
│ --explain           -e        Display information about the selected operation and exit.                 │
│ --output            -o  TEXT  The output format and configuration.                                       │
│ --comment           -c  TEXT  Add comment metadata to the job you run.                                   │
│ --save              -s  TEXT  Save one or several of the outputs of this run. If the argument contains a │
│                               '=', the format is =, if not, the values will be saved as '-'.             │
│ --print-properties  -p        Also display the properties of the result values.                          │
│ --help              -h        Show this message and exit.                                                │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────╯

Anyway, long story short, you need the -c/--comment cli arg...

@MariellaCC (Author)

ok great, thank you!

@MariellaCC (Author) commented Jun 6, 2024

Hi,
Is there a recommended approach for where to put jobs that I would not want to be run when doing a make test?
My use case pipeline has onboarding scenarios where data are downloaded via github/zenodo. From what I remember, the corresponding tests should only be run on demand, to avoid triggering unnecessary calls.
But if I put the job files under examples/jobs, they will be run when doing something like make test, for example. Is that right?

@makkus (Collaborator) commented Jun 6, 2024

No best practice for that; you could just put them in a folder like examples_no_tests? I can add a feature so specific example jobs can be skipped; I've added an issue here: DHARPA-Project/kiara#74

@MariellaCC (Author)

Alright great, thanks a lot!

@MariellaCC (Author)

I created such a job (in a no-test folder), like so:

operation: topic_modelling.create_table_from_zenodo
inputs:
  doi: "4596345"
  file_name: "ChroniclItaly_3.0_original.zip"

when copying the path to this job and running it in the CLI, I am getting an error:


╭─ Run info: topic_modelling.create_table_from_zenodo ──────────────────────────────────────────────────────────────────────╮
│                                                                                                                           │
│ Can't run operation: Could not parse argument into data:                                                                  │
│                                                                                                                           │
│ ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── │
│                                                                                                                           │
│ Operation: topic_modelling.create_table_from_zenodo                                                                       │
│                                                                                                                           │
│ This module retrieves text files from a specified folder hosted on Zenodo.                                                │
│                                                                                                                           │
│ Outputs:                                                                                                                  │
│                                                                                                                           │
│   field name     type    description                                                                                      │
│  ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────  │
│   corpus_table   table   A table with two columns: file names and their contents.                                         │
│                                                                                                                           │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

Have you had this kind of error in the past? Do you have an idea why this may be?

@MariellaCC (Author)

(I am getting the same error for another job that I created the same way)

@makkus (Collaborator) commented Jun 6, 2024

And those work if the job is in the 'examples/jobs' folder?

@MariellaCC (Author)

I will try (this was after my last question); I am not saying that the location is the problem. I just don't know where the error could come from or where I should investigate.

@MariellaCC (Author)

one of the modules that I want to test doesn't work anymore in the notebook, and I wanted to investigate (this is why I created the job)

@MariellaCC (Author) commented Jun 6, 2024

Same error if I put it in the former location. The command that I run to test the job is:
kiara run /Users/mariella.decrouychan/Documents/GitHub/kiara_plugin.topic_modelling/examples/jobs/onboarding_zenodo.yaml -c = " "

@makkus (Collaborator) commented Jun 6, 2024

Right, sorry, just wanted to make sure I understand the situation. Did you try with DEV=true ?

@makkus (Collaborator) commented Jun 6, 2024

a.k.a

DEV=true kiara run /Users/mariella.decrouychan/Documents/GitHub/kiara_plugin.topic_modelling/examples/jobs/onboarding_zenodo.yaml -c = " "

That usually gives better error details etc.

@MariellaCC (Author)

Ok, I set it once after setting up the environment, but I don't set it every time. Trying that now, thanks.

@MariellaCC (Author)

I do indeed get much more info, thanks, but I'm still not sure where to investigate:


╭─ Exception details ───────────────────────────────────────────────────────────────────────────────────────────────────────╮
│                                                                                                                           │
│ ╭────────────────────────────────────────── Traceback (most recent call last) ──────────────────────────────────────────╮ │
│ │ /opt/miniconda3/envs/tm_2024_05/lib/python3.12/site-packages/kiara/utils/cli/__init__.py:336 in dict_from_cli_args    │ │
│ │                                                                                                                       │ │
│ │   333 │   │   │   assert isinstance(part_config, Mapping)                                                             │ │
│ │   334 │   │   else:                                                                                                   │ │
│ │   335 │   │   │   try:                                                                                                │ │
│ │ ❱ 336 │   │   │   │   part_config = json.loads(arg)                                                                   │ │
│ │   337 │   │   │   │   assert isinstance(part_config, Mapping)                                                         │ │
│ │   338 │   │   │   except Exception:                                                                                   │ │
│ │   339 │   │   │   │   raise Exception(f"Could not parse argument into data: {arg}")                                   │ │
│ │                                                                                                                       │ │
│ │ /opt/miniconda3/envs/tm_2024_05/lib/python3.12/json/__init__.py:346 in loads                                          │ │
│ │                                                                                                                       │ │
│ │   343 │   if (cls is None and object_hook is None and                                                                 │ │
│ │   344 │   │   │   parse_int is None and parse_float is None and                                                       │ │
│ │   345 │   │   │   parse_constant is None and object_pairs_hook is None and not kw):                                   │ │
│ │ ❱ 346 │   │   return _default_decoder.decode(s)                                                                       │ │
│ │   347 │   if cls is None:                                                                                             │ │
│ │   348 │   │   cls = JSONDecoder                                                                                       │ │
│ │   349 │   if object_hook is not None:                                                                                 │ │
│ │                                                                                                                       │ │
│ │ /opt/miniconda3/envs/tm_2024_05/lib/python3.12/json/decoder.py:337 in decode                                          │ │
│ │                                                                                                                       │ │
│ │   334 │   │   containing a JSON document).                                                                            │ │
│ │   335 │   │                                                                                                           │ │
│ │   336 │   │   """                                                                                                     │ │
│ │ ❱ 337 │   │   obj, end = self.raw_decode(s, idx=_w(s, 0).end())                                                       │ │
│ │   338 │   │   end = _w(s, end).end()                                                                                  │ │
│ │   339 │   │   if end != len(s):                                                                                       │ │
│ │   340 │   │   │   raise JSONDecodeError("Extra data", s, end)                                                         │ │
│ │                                                                                                                       │ │
│ │ /opt/miniconda3/envs/tm_2024_05/lib/python3.12/json/decoder.py:355 in raw_decode                                      │ │
│ │                                                                                                                       │ │
│ │   352 │   │   try:                                                                                                    │ │
│ │   353 │   │   │   obj, end = self.scan_once(s, idx)                                                                   │ │
│ │   354 │   │   except StopIteration as err:                                                                            │ │
│ │ ❱ 355 │   │   │   raise JSONDecodeError("Expecting value", s, err.value) from None                                    │ │
│ │   356 │   │   return obj, end                                                                                         │ │
│ │   357                                                                                                                 │ │
│ ╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯ │
│ JSONDecodeError: Expecting value: line 1 column 2 (char 1)                                                                │
│                                                                                                                           │
│ During handling of the above exception, another exception occurred:                                                       │
│                                                                                                                           │
│ ╭────────────────────────────────────────── Traceback (most recent call last) ──────────────────────────────────────────╮ │
│ │ /opt/miniconda3/envs/tm_2024_05/lib/python3.12/site-packages/kiara/utils/cli/run.py:211 in set_and_validate_inputs    │ │
│ │                                                                                                                       │ │
│ │   208 │   │   │   list_keys.append(name)                                                                              │ │
│ │   209 │                                                                                                               │ │
│ │   210 │   try:                                                                                                        │ │
│ │ ❱ 211 │   │   inputs_dict: Dict[str, Any] = dict_from_cli_args(*inputs, list_keys=list_keys)                          │ │
│ │   212 │   │   if base_inputs:                                                                                         │ │
│ │   213 │   │   │   for k, v in base_inputs.items():                                                                    │ │
│ │   214 │   │   │   │   if k not in inputs_dict.keys():                                                                 │ │
│ │                                                                                                                       │ │
│ │ /opt/miniconda3/envs/tm_2024_05/lib/python3.12/site-packages/kiara/utils/cli/__init__.py:339 in dict_from_cli_args    │ │
│ │                                                                                                                       │ │
│ │   336 │   │   │   │   part_config = json.loads(arg)                                                                   │ │
│ │   337 │   │   │   │   assert isinstance(part_config, Mapping)                                                         │ │
│ │   338 │   │   │   except Exception:                                                                                   │ │
│ │ ❱ 339 │   │   │   │   raise Exception(f"Could not parse argument into data: {arg}")                                   │ │
│ │   340 │   │                                                                                                           │ │
│ │   341 │   │   if list_keys is None:                                                                                   │ │
│ │   342 │   │   │   list_keys = []                                                                                      │ │
│ ╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯ │
│ Exception: Could not parse argument into data:                                                                            │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

@MariellaCC (Author)

Is it a problem in my module? (most probably yes, but can I understand it via the error above?)

@makkus (Collaborator) commented Jun 6, 2024

Not sure, tbh. Looks like something wrong with your job file maybe? Maybe it's not valid yaml? Have you tried to run it directly, like:

kiara run topic_modelling.create_table_from_zenodo doi=4596345 file_name=ChroniclItaly_3.0_original.zip

@makkus (Collaborator) commented Jun 6, 2024

This is probably not something for github issues. Slack instead?

@MariellaCC (Author)

thanks, I just tried and got the same error

@makkus (Collaborator) commented Jun 6, 2024

Just to put this into writing, in case someone else has the same error: the issue was calling the -c cli arg with whitespace before and after the = character. Should have been -c ' ' or --comment=' ' ...
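For example, the invocation from above would then look something like this (the comment text itself is arbitrary):

kiara run examples/jobs/onboarding_zenodo.yaml --comment 'test run'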
