- Access your protected resources by setting passwords in a `.tuttlepass` file
- The dependency graph is now drawn from left to right, which is easier to read than top to bottom
- Logs can be accessed even if the process is not complete yet
- Link to the definition of the process that creates a resource
- Nicer durations in hours, minutes, seconds
- odbc resources and processor for handling any SQL database
- ftp resources, available for the download processor
- Download processor uses curl, which makes it more robust for long downloads
- Download processor can have multiple inputs, in order to ensure downloading into a subdirectory
- hdfs resources
- Tuttle can now run several processes in parallel while respecting dependency order. For example, `tuttle run --jobs 2` will run your workflow with two workers. The default is still 1.
- Live logs: you no longer need to wait until a process is complete to see its logs. As soon as a line is complete, it is displayed.
- With the `--keep-going` option, `tuttle run` doesn't stop at the first error but tries to process as much as it can, so multiple failures can occur. Also, running a failing process with `--keep-going` will try to run all remaining processes.
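For instance, both flags can be combined on the command line (a minimal sketch, assuming the options can be used together and that a tuttlefile is present in the current directory):

```
# Run up to 2 processes at a time and keep going after failures,
# so every process whose inputs are available gets a chance to run
tuttle run --jobs 2 --keep-going
```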
- New `check-integrity` option validates that no resource has changed since tuttle produced it (see the example below)
- Two processes without outputs can't have exactly the same inputs, because there would be no way to tell them apart
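A sketch of how the integrity check could be invoked, assuming the option is exposed as a `--check-integrity` flag of `tuttle run` (the exact spelling and command may differ):

```
# Verify that no resource has been modified outside of tuttle
# since it was produced, before running the workflow
tuttle run --check-integrity
```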
- Error message in report for failing (pre-)processes
- Version in the report and in dumps, so we can remember which version of tuttle the data was crafted with
- Interrupting tuttle with ^C will set running processes in error, marked as aborted
- Major refactoring of the invalidation system in order to make it easier to reason about
- Only one call to `exists()` per resource and per run, because checking whether an external piece of data exists can take a long time. Also, `signature()` is called at most once, because it can take a very long time
- Make sure every process that might have been created by processors is terminated after running the workflow
- Invalidation is now coherent for processes without outputs: once such a process has succeeded, it won't run again
- Fixed persistence of logs in the `.tuttle` directory when a process id changes (i.e. when its position changes in the tuttlefile)
- `--threshold` now takes into account the duration of processes that don't create outputs
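For example (a sketch only: the assumption here is that `--threshold` takes a duration in seconds of already-computed processing that tuttle may invalidate without aborting; check the documentation for the exact semantics):

```
# 600 is a hypothetical value: allow invalidation only if it destroys
# less than 600 seconds worth of processing
tuttle run --threshold 600
```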
- Running tuttle with a postgresql resource will fail with an explicit error message if it can't connect to the database, instead of saying that the resources don't exist
... To describe a workflow according to a configuration file or the content of a directory:
- 'preprocesses' are run before the workflow is executed
- you can add processes to a workflow with the new command `tuttle-extend-workflow` from a preprocess (see the sketch below)
- a new tutorial explains how it works in detail
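A rough sketch of calling the command from a preprocess (the template name `extension.tpl` and the parameter `nb_files` are hypothetical; the exact arguments and the preprocess syntax are described in the tutorial):

```
# Inside a preprocess: generate extra processes from a template,
# passing parameters as name=value pairs (names are hypothetical)
tuttle-extend-workflow extension.tpl nb_files=3
```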
- comma is DEPRECATED as a separator between resources in dependency definitions. You should now use a space instead
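For example, a process that depends on two resources now lists them separated by spaces (a minimal sketch, assuming plain `file://` resources and an indented shell section):

```
file://report.txt <- file://data_a.csv file://data_b.csv
    cat data_a.csv data_b.csv > report.txt
```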
- Docker images are available for using tuttle
- escape process ids in the report
- `file://` is not a valid resource!
- `shell` does not stand for processor `hell`
... To split a tuttle project into several files
The reference lists all the resources and processors available.
- PostgreSQL tables, views, functions and index resources
- PostgreSQL Processor
- https resources
- AWS s3 resources (experimental)
Part of tuttle's job is to connect to third-party tools. Integration tests must cover these tools, like PostgreSQL or a web server... Two methods have been developed:
- mock the third party tool with some python code (web server, s3 server)
- use the third party tool if it is installed on the machine (postgresql)
- Fixed a bug on install that required jinja2 before installing dependencies
- SQLite tables, views, triggers and index resources
- SQLite Processor
- http resources
- download processor (see the sketch after this list)
- Python Processor
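A minimal sketch of the download processor with an http resource (assuming the processor is selected with `! download` after the dependencies and that it fetches the input URL into the output file; the URL is a placeholder):

```
file://iris.csv <- http://example.com/iris.csv ! download
```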
The goal of 0.1 is to show the intended usage of tuttle, in terms of command-line workflow.