How do I import a bundle of files into Kiara via the python API? #13

Open
3 tasks
caro401 opened this issue Nov 16, 2023 · 3 comments
Labels
how-to (Request or outline for a how-to or tutorial type doc); low-priority (Things we don't have resources to address right now); python API (Docs about how to use kiara via Python API, in jupyter or otherwise)

Comments

@caro401
Collaborator

caro401 commented Nov 16, 2023

Marked as low-priority, since I don't think we have a pressing use case for this right now, but capturing the discussion from #11, quoting @makkus:

What is a file bundle?

A data type that contains one or several files, each identified by an internal (relative) sub-path within the bundle. The contained files are usually related in some way that is relevant to the computations that will be done on them (for example, multiple text files belonging to the same corpus).
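Conceptually, a file bundle can be pictured as a mapping from relative sub-path to file contents. The sketch below builds such a mapping from a local directory using only `pathlib`. It illustrates the concept only and is not how kiara's `KiaraFileBundle` is implemented; `bundle_from_directory` is a hypothetical helper name.

```python
import tempfile
from pathlib import Path


def bundle_from_directory(root: str) -> dict[str, bytes]:
    """Hypothetical helper: map each file's path relative to `root` to its raw bytes."""
    base = Path(root)
    bundle: dict[str, bytes] = {}
    # rglob("*") walks the whole tree; directories are skipped, only files are kept
    for path in sorted(base.rglob("*")):
        if path.is_file():
            # as_posix() yields "letters/1850.txt"-style keys regardless of platform
            bundle[path.relative_to(base).as_posix()] = path.read_bytes()
    return bundle


# Demo: a tiny two-document corpus in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "letters").mkdir()
    Path(tmp, "intro.txt").write_text("first document")
    Path(tmp, "letters", "1850.txt").write_text("second document")
    corpus = bundle_from_directory(tmp)

print(sorted(corpus))  # ['intro.txt', 'letters/1850.txt']
```

The relative path is what identifies each file, so the same bundle can be moved or copied without its internal structure changing.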

When would you use that rather than just importing lots of files individually?

Whenever you have files that share that context and would be fed into a downstream operation at the same time. Otherwise, the downstream operation would need an input field for every individual file, which would be inefficient, and only possible if you know in advance exactly how many (sub-)files you will be dealing with.
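To make the "one input field per file" problem concrete, here is a plain-Python sketch (not a kiara module) of a hypothetical bundle-level operation. It takes the whole corpus as a single argument, so it works unchanged whether the bundle holds two files or two thousand. The function name `corpus_word_counts` and the whitespace tokenisation are assumptions for illustration.

```python
from collections import Counter
from typing import Mapping


def corpus_word_counts(bundle: Mapping[str, str]) -> Counter[str]:
    """Hypothetical bundle-level operation: one input, any number of files."""
    counts: Counter[str] = Counter()
    for text in bundle.values():
        # naive lower-cased whitespace tokenisation, good enough for illustration
        counts.update(text.lower().split())
    return counts


corpus = {
    "letters/1850.txt": "Dear sir the weather",
    "letters/1851.txt": "the weather is fine",
}
counts = corpus_word_counts(corpus)
print(counts["the"], counts["weather"])  # 2 2
```

A per-file design would instead need a signature like `f(file_1, file_2, ...)`, fixed to one corpus size.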

Is there anything you can do with a file bundle you can't do with a file or vice versa?

Technically not, I guess, but the real question is which operations make sense for a single file and also make sense for a file bundle. The only thing I can think of is doing the same operation on every sub-file of a bundle, which would be very inefficient and painful to do manually, so it'd be nice to have a module that takes a file bundle and applies that operation to all included files. But we haven't had a use case like that so far, if I remember right.
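The "same operation on every sub-file" idea can be sketched in plain Python as a map over the bundle that keeps each relative path attached to its result. This is a hypothetical helper (`map_bundle`), not an existing kiara module.

```python
from typing import Callable, Mapping, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def map_bundle(bundle: Mapping[str, T], func: Callable[[T], U]) -> dict[str, U]:
    """Hypothetical helper: apply `func` to every sub-file, keeping the relative paths."""
    return {rel_path: func(content) for rel_path, content in bundle.items()}


bundle = {"a.txt": "Hello world", "docs/b.txt": "one two three"}
# per-file result, still keyed by each file's internal path
word_counts = map_bundle(bundle, lambda text: len(text.split()))
print(word_counts)  # {'a.txt': 2, 'docs/b.txt': 3}
```

Because the output is keyed by the same sub-paths, it could itself be treated as a bundle of results.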

For kiara's purposes, a file and a file_bundle are two different data types, and a module that takes one as input can't be used with the other. If an operation you want to use has a single-file input, you'd first have to run a 'pick.file' operation on the file bundle, for example. Conversely, to convert a single file into a file_bundle, you'd have to 'augment' it with an internal relative path (which basically means adding information to the data), but that's not something we've had to do so far, I think.
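Both conversions described here can be illustrated with a dict-based model of a bundle. `pick_file` and `file_to_bundle` are hypothetical names that mirror the idea of 'pick.file' and of 'augmenting' a file with a relative path; they are not kiara's implementation.

```python
from typing import Mapping


def pick_file(bundle: Mapping[str, bytes], rel_path: str) -> bytes:
    """Hypothetical analogue of a 'pick.file' step: pull one sub-file out by relative path."""
    if rel_path not in bundle:
        available = ", ".join(sorted(bundle))
        raise KeyError(f"{rel_path!r} is not in the bundle (available: {available})")
    return bundle[rel_path]


def file_to_bundle(content: bytes, rel_path: str) -> dict[str, bytes]:
    """Hypothetical 'augment' step: attach a relative path, yielding a one-entry bundle."""
    return {rel_path: content}


bundle = {"nodes.csv": b"id\nA\nB\n", "edges.csv": b"source,target\nA,B\n"}
edges = pick_file(bundle, "edges.csv")            # single file out of a bundle
single = file_to_bundle(b"id\nA\n", "nodes.csv")  # single file into a bundle
print(sorted(single))  # ['nodes.csv']
```

The relative path is exactly the extra information that distinguishes a bundle entry from a bare file.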

  • Provide a code example of how to import a file bundle in the context of an actual use case (e.g. you have a corpus of documents).
  • Include an example of how to get individual files out of the bundle in case you need them.
  • Discuss when you should use a file bundle instead of multiple files, and what the tradeoffs are. E.g. why do the network analysis examples have nodes and edges CSVs but usually import them separately? (Is this wrong???)
caro401 added the low-priority, python API and how-to labels Nov 16, 2023
@makkus
Collaborator

makkus commented Nov 16, 2023

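from kiara.api import KiaraAPI
from kiara.models.filesystem import KiaraFile, KiaraFileBundle

api = KiaraAPI.instance()

# import every file below a local directory as one 'file_bundle' value
# (this is the author's local checkout; point it at your own directory)
inputs = {
    "path": "/home/markus/projects/kiara/kiara/src/"
}
results = api.run_job("import.local.file_bundle", inputs=inputs)
bundle = results["file_bundle"]

# optional: access the underlying Python model to list the relative sub-paths
bundle_data: KiaraFileBundle = bundle.data
print(bundle_data.included_files.keys())

# pick a single file out of the bundle by its relative path
inputs = {
    "file_bundle": bundle,
    "path": "kiara/version.txt"
}
results = api.run_job("file_bundle.pick.file", inputs=inputs)
file = results["file"]

# optional: read the picked file's content as text
data: KiaraFile = file.data
print(data.read_text())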

Rewrite according to whatever example code standards you choose. I included some extra type hints in case you want to explain how to access the actual data in Python, and which modules those classes live in. Just remove those lines/hints if appropriate.

@makkus
Collaborator

makkus commented Nov 16, 2023

Discuss when you should use a file bundle instead of multiple files, and what the tradeoffs are. E.g. why do the network analysis examples have nodes and edges CSVs but usually import them separately? (Is this wrong???)

It's usually fairly obvious which one you need when you design a module; I haven't really had to think about it. In some cases it might be beneficial to offer both input types, but that would mean module proliferation, so I'm not sure. It just seems to make sense to import nodes and edges separately, but maybe my design is wrong here; happy to change that module if necessary.
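The nodes/edges tradeoff can be made concrete with a plain-Python sketch (not the actual kiara network modules). The same computation is offered once with two explicit inputs and once with a single bundle input. The column names `id`, `source`, `target` and the sub-paths `nodes.csv`/`edges.csv` are assumptions for illustration.

```python
import csv
import io
from typing import Mapping


def build_adjacency(nodes_csv: str, edges_csv: str) -> dict[str, list[str]]:
    """Two explicit inputs: the caller decides which text is nodes and which is edges."""
    # assumed columns: 'id' in the nodes table, 'source'/'target' in the edges table
    adjacency = {row["id"]: [] for row in csv.DictReader(io.StringIO(nodes_csv))}
    for row in csv.DictReader(io.StringIO(edges_csv)):
        # assumes every edge's source also appears in the nodes table
        adjacency[row["source"]].append(row["target"])
    return adjacency


def build_adjacency_from_bundle(bundle: Mapping[str, str]) -> dict[str, list[str]]:
    """Bundle variant: one input, but it must assume fixed sub-paths inside the bundle."""
    return build_adjacency(bundle["nodes.csv"], bundle["edges.csv"])


nodes = "id\nA\nB\nC\n"
edges = "source,target\nA,B\nA,C\n"
print(build_adjacency(nodes, edges))  # {'A': ['B', 'C'], 'B': [], 'C': []}
```

The bundle variant saves one input field but bakes the file names into the operation. With a small, fixed number of differently-typed files, separate inputs keep each file's role explicit, which supports importing nodes and edges separately.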

@caro401
Collaborator Author

caro401 commented Nov 16, 2023

#11 (comment) - so can you not import a zip file via import.local.file? Do you have to use a file bundle? Is the same true for other archive formats (tar, etc.)?
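The thread leaves open what kiara's own importers do with archives. Independently of that, Python's standard library shows why an archive maps naturally onto a file bundle: its members already are files identified by internal relative paths. This stdlib-only sketch (hypothetical helpers `bundle_from_zip` and `bundle_from_tar`) says nothing about kiara's actual `import.local.file` behaviour.

```python
import io
import tarfile
import zipfile


def bundle_from_zip(data: bytes) -> dict[str, bytes]:
    """Hypothetical helper: read every regular file in a zip archive, keyed by its internal path."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return {info.filename: zf.read(info) for info in zf.infolist() if not info.is_dir()}


def bundle_from_tar(data: bytes) -> dict[str, bytes]:
    """Same idea for tar archives; mode 'r:*' auto-detects gzip/bz2/xz compression."""
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:*") as tf:
        return {
            member.name: tf.extractfile(member).read()
            for member in tf.getmembers()
            if member.isfile()
        }


# Demo: build a small zip archive in memory and read it back as a bundle
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("corpus/a.txt", "first")
    zf.writestr("corpus/b.txt", "second")
zip_bundle = bundle_from_zip(buf.getvalue())
print(sorted(zip_bundle))  # ['corpus/a.txt', 'corpus/b.txt']
```

Treated as a single opaque file, the archive would hide these sub-paths from downstream operations.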


2 participants