Releases: ibis-project/ibis

10.1.0

22 Feb 14:51

10.1.0 (2025-02-22)

Features

  • pyspark: add partitionBy argument to create_table (c99cc23), closes #8900
  • python: allow python 3.9 installations (#10859) (fbe8c8b)

Bug Fixes

  • bigquery: allow sane use of params with raw_sql (#10874) (0a684c3)
  • deps: update dependency datafusion to v44 (979cf59)
  • deps: update dependency sqlglot to >=23.4,<26.5 (#10807) (f09e8e2)
  • deps: update dependency sqlglot to >=23.4,<26.7 (15111f8)
  • dev-tools: ensure that bump is minimal so that later release sort properly (#10878) (39729c7)
  • duckdb: use the delta extension for reading deltalake data (#10833) (beeaa29), closes #10829
  • join: error in more places on colliding column names (#10778) (ec06e1e)
  • mssql: ensure that dots in database parameter to list_tables are used as path delineators (#10863) (cdbbcb9)
  • mssql: ensure that we only escape passwords if the password is not None (e589344)
  • mysql: explicitly handle the zero integer -> timestamp case (f5e8c4f)
  • pyspark: avoid potentially different field names produced by SQL by using python-native APIs (#10877) (9538d51)
  • snowflake: use get instead of get_path; get_path does not support columns with spaces (#10836) (50c978b), closes #10835
  • sqlglot: ensure that sge.Median is only accessed when it exists (dc6b7e0)
  • sqlite: avoid generating double-quoted string literals (#10873) (76b0114)

Documentation

  • add blogpost for Athena backend (#10796) (f2f09eb)
  • add information about reading from cloud buckets (32e82c7)
  • add udf rewriting blog post (c6ecf6b)
  • blog: add post on SQL understanding and Ibis (#10762) (94425ec)
  • blog: convert case to cases in blog posts (#10560) (bbf98de)
  • blog: use more reliable URL for geospatial data (72b7673)
  • fix reference to incorrect value (1945237)
  • move __getitem__ docs so that quarto publishes them (#10870) (269cdfe)
  • release-notes: fixup release notes (fb0798e)
  • remove incorrect parameters (#10876) (a707778)
  • update post date (ea0cc95)

Refactors

  • duckdb: remove the pyarrow read_parquet fallback (5fa0103)

10.0.0

06 Feb 21:07

10.0.0 (2025-02-06)

⚠ BREAKING CHANGES

  • api: change as_interval unit argument to be positional-only
  • api: change as_timestamp unit argument to be positional-only
  • api: standardize unnest and pivot_longer signatures
  • api: remove deprecated Table.relabel method
  • api: standardize StringValue method signatures
  • api: standardize NumericValue methods
  • api: make GeoSpatialValue.contains positional-only
  • api: make Table.describe quantile argument keyword-only
  • api: make Table.drop_null/Table.fill_null/Table.window_by/Table.alias argument positional-only
  • api: make Table.sample fraction argument positional-only
  • api: make Table.aggregate metrics argument positional-only
  • api: make Table set operation methods positional-only
  • api: make Table.cast and Table.try_cast methods positional-only
  • api: make nth positional-only
  • api: make isin/notin/cases/identical_to positional-only
  • api: make null-related methods and null function positional-only
  • api: make Value.cast and Value.try_cast positional-only
  • internals: make Value.name positional-only
  • internals: make Expr.pipe positional-only
  • internals: make Expr.equals positional-only
  • api: align signatures of to_json methods
  • api: align signatures of to_delta methods
  • api: align signatures of to_csv/to_csv_dir methods
  • api: align signatures of to_parquet/to_parquet_dir methods
  • api: align .sql method signatures across polars and sql as well as the Table method
  • api: top-level connect method now takes its first argument as positional-only
  • duckdb: align signatures of read_sqlite/read_mysql/read_postgres methods in the duckdb backend
  • api: align signatures of read_delta method; sources are positional-only, everything else is required-keyword
  • api: canonicalize has_operation backend method; single argument is positional-only
  • api: canonicalize read_kafka and to_kafka methods of the PySpark backend
  • api: canonicalize drop_table_or_view method of the impala backend
  • api: canonicalize to_geo signature of the DuckDB backend
  • api: canonicalize read_geo signature of the DuckDB backend
  • api: align signatures of list_catalogs; like argument is now keyword-only
  • bigquery: canonicalize set_database signature
  • api: make list_databases arguments all required-keyword
  • risingwave: canonicalize signatures of risingwave-specific create_* methods
  • polars: canonicalize signature of read_pandas method
  • api: align signatures of drop_table method; name is positional-only; the rest are keyword-only
  • api: align signatures of create_catalog and drop_catalog methods; name is positional-only; the rest are keyword-only
  • api: compile method is now the same across backends
  • api: align signatures of create_table method; name is positional-only; obj is positional-or-keyword; the rest are keyword-only
  • api: align signatures of create_view method; name is positional-only; obj is positional-or-keyword; the rest are keyword-only
  • api: align signatures of drop_view method; name is positional-only; the rest are keyword-only
  • api: align signatures of truncate_table method; name is positional-only; the rest are keyword-only
  • api: align signatures of insert method; name is positional-only; obj is positional-or-keyword; the rest are keyword-only
  • api: align signatures of read_json method; sources are positional-only, everything else is required-keyword
  • api: align signatures of read_csv method; sources are positional-only, everything else is required-keyword
  • api: align signatures of read_parquet method; sources are positional-only, everything else is required-keyword
  • api: align signatures of to_torch method
  • api: align signatures of to_polars method
  • api: align signatures of Backend.list_tables method; all arguments are now keyword-only
  • api: align signatures of Backend.table method; name is positional-only; everything else is required-keyword
  • api: align signatures of create_database and drop_database; name is positional-only; everything else is required-keyword
  • api: standardize MapValue method signatures
  • api: standardize ArrayValue method signatures
  • api: type argument of struct function is now required-keyword
  • api: standardize TemporalValue APIs
  • api: where argument of aggregate functions is now required-keyword
  • api: hashbytes and hexdigest are now positional-only
  • api: standardize how argument to join methods as keyword-only and standardize remaining arguments
  • api: ibis.coalesce/ibis.greatest/ibis.least are now positional-only
  • api: Expr.ifelse is now positional-only
  • api: top-level set operation functions are now positional-only
  • api: set_backend and get_backend functions are now positional-only
  • api: ntile function and method is now positional-only
  • api: ibis.preceding/ibis.following are now positional-only
  • api: expr argument of ibis.asc/ibis.desc is now positional-only; nulls_first is keyword-only
  • api: data argument of ibis.memtable is now positional-only; the rest are keyword-only
  • api: pairs argument of ibis.schema is now positional-only; the rest are keyword-only
  • api: ibis.param is now positional-only
  • api: n argument in Table.limit and Table.head is now required-positional
  • api: offset argument in Table.limit is now required-keyword
  • api: temporal window expression APIs now require all arguments as keywords
  • api: to_pyarrow and to_pyarrow_batches requires expr as positional-only and keyword for everything else
  • api: to_pandas_batches requires expr as positional-only
  • api: execute and to_pandas methods now require expr as positional-only
  • api: distance is now a required keyword argument for the d_within api
  • duckdb: The duckdb backend's read_csv method accepts only DuckDB types for the values components of the columns and types arguments. You may need to adjust existing code. For example, the string "float64" should be replaced with the string "double".
  • duckdb: The read_in_memory method is removed from the duckdb backend. Use ibis.memtable instead.
  • api: The how parameter of the Value.arbitrary method is removed. Call Value.first or Value.last explicitly.
  • api: The StringValue.initcap method is removed. Use StringValue.capitalize instead.
  • api: IntegerValue.label is redundant with the IntegerValue.cases method, use that instead. Replace expr.label(labels) with expr.cases(*enumerate(labels))
  • register: The deprecated register method has been removed. Please use the file-specific read_* methods instead. For in-memory objects, pass them to ibis.memtable or create_table.
  • duckdb: Special handling of the temp_directory argument passed to Ibis is removed in favor of passing the argument through directly to duckdb.connect. Interior nodes of directory trees must be created, e.g., using Path.mkdir(exist_ok=True, parents=True), mkdir -p, etc.
  • config: option_context is removed. Use contextlib.contextmanager to create your own version of this functionality if necessary.
  • duckdb: The DuckDB lower bound has been bumped to a version that has storage backwards compatibility. You may need to migrate your DuckDB database files.
  • api: has_name has always returned True since 9.0. It is safe to remove any calls to has_name.
  • backends: execute now returns non-numpy objects for scalar values.
  • api: ibis.negate is removed. Use the negate method on a
    specific column, instead.
  • api: All ibis.geo_* functions are removed. Equivalent
    methods are available on all geo columns.
  • api: where is removed. Use ibis.ifelse instead.
  • value: Value.greatest and Value.least are removed. Use
    ibis.greatest and ibis.least, instead.
  • joins: Passing a pyarrow.Table or a pandas.DataFrame as
    the right-hand-side of a join is no longer supported.

To join against in-memory data, you can pass the in-memory object to
ibis.memtable or con.create_table and use the resulting table object
instead.
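
The signature conventions that recur in the list above (positional-only, positional-or-keyword, keyword-only) map onto Python's `/` and `*` parameter markers. The sketch below uses a hypothetical stand-in function, not the actual Ibis implementation, to show which call styles such a signature accepts and rejects. The option names `database` and `overwrite` are illustrative assumptions.

```python
def create_table(name, /, obj=None, *, database=None, overwrite=False):
    """Hypothetical stand-in for the convention described above:
    name is positional-only, obj is positional-or-keyword,
    and everything else is keyword-only."""
    return {"name": name, "obj": obj, "database": database, "overwrite": overwrite}


def raises_type_error(call):
    """Return True if invoking `call` raises TypeError."""
    try:
        call()
    except TypeError:
        return True
    return False


# Accepted: name positionally, obj either way, options by keyword.
ok_positional = create_table("t", [1, 2, 3])
ok_keyword = create_table("t", obj=[1, 2, 3], overwrite=True)

# Rejected: passing the positional-only name by keyword.
name_by_keyword_fails = raises_type_error(lambda: create_table(name="t"))

# Rejected: passing a keyword-only option positionally.
option_positional_fails = raises_type_error(lambda: create_table("t", None, "db"))
```

Code that passed such arguments in the now-rejected style would need to switch to the accepted one when upgrading.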

Issues closed

  • api: Removed hierarchical usage of schema.
    Ibis uses the following naming conventions:

    • schema: a mapping of column names to datatypes
    • database: a collection of tables
    • catalog: a collection of databases
  • mysql: Ibis now uses the MySQLdb driver. You may need to install MySQL client libraries to build the extension.

  • padding: String padding operations now follow Python semantics and leave strings greater than the padding length untouched.

  • pandas: The pandas backend is removed. Note that pandas DataFrames are STILL VALID INPUTS AND OUTPUTS and will remain so for the foreseeable future. Please use one of the other local backends like DuckDB, Polars, or DataFusion to perform operations directly on pandas DataFrames.

  • dask: The dask backend is removed. Please use one of the
    other backends that Ibis supports.

  • api: remove deprecated where method (886b2d1)

  • api: remove top-level negate function (c8c37dd)

  • api: remove top-level geo functions ([6b18...
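
The padding note above can be illustrated with plain Python. The sketch assumes that "Python semantics" refers to the behavior of str.ljust and str.rjust (the note does not name the methods): a string already longer than the requested width comes back unchanged rather than truncated.

```python
short = "ab"
long_text = "abcdef"

# Shorter than the requested width: padded out to 4 characters.
padded_left = short.rjust(4, "*")
padded_right = short.ljust(4, "*")

# Longer than the requested width: returned untouched, not cut to 4 characters.
untouched_left = long_text.rjust(4, "*")
untouched_right = long_text.ljust(4, "*")
```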


9.5.0

11 Sep 20:57

9.5.0 (2024-09-11)

Features

  • api: add name argument to topk (1652076)
  • api: add name argument to value_counts (24be184)
  • api: add to_sqlglot method to Schema objects (#10063) (9488115)
  • mssql: add lpad and rpad ops (#10060) (77af14b)
  • mssql: add startswith and endswith ops (17a628c)

Bug Fixes

  • backends: pass kwargs to _from_url() in every case (#10003) (9ca92f0)
  • bigquery: handle column name mismatches and _TABLE_SUFFIX everywhere (5ade49e)
  • clickhouse: fix lstrip, rstrip, and strip (d2539c4)
  • datafusion: raise when attempting to create temp table (#10072) (1cf5439)
  • deps: update dependency fsspec to <2024.9.1 (#10036) (ea71719)
  • deps: update dependency sqlglot to >=23.4,<25.20 (#10010) (ba07da7)
  • deps: update dependency sqlglot to >=23.4,<25.21 (#10050) (422d361)
  • docs: update invalid read_parquet link (2ae9ef4)
  • duckdb: allow setting auto_detect to False by fixing translation of columns argument (#10065) (883d2d3)
  • duckdb: free memtables based on operation lifetime (#10042) (a121ab3)
  • duckdb: support version 1.1.0 (#10037) (3a37626)
  • flink: fix strip (01117a5)
  • impala: allow specifying temp=False in create_table (e29712c)
  • impala: fix lstrip, rstrip, strip (413df3b)
  • mssql: ensure that dot-sql can be executed when column names are not provided (#10028) (1936437), closes #10025
  • mssql: fix strip, lstrip, rstrip (f53feab)
  • oracle: fix lstrip, rstrip, and strip (3f5a304)
  • pandas: don't silently ignore result column name mismatches (48be246)
  • polars: support polars Enum type (#10017) (869829f)
  • sqlite: list temporary tables by default (#10058) (dfa55b6)
  • sql: properly parenthesize binary ops containing named expressions (5c2eadc)

Documentation

Refactors

Performance

  • backends: speed up most memtable existence checks (#10067) (a205ab7)
  • ir: don't recreate nodes in replace if their children haven't changed (ac79604)
  • sql: avoid parenthesizing chains of commutative operators (f86515c)

Deprecations

  • api: deprecate bool_val.negate()/-bool_val in favor of ~bool_val (499fc03)
  • api: deprecate filtering/expression projection in Table.__getitem__ (62c63d2)
  • selectors: deprecate c and r selectors in favor of cols and index (29b865e)

9.4.0

03 Sep 20:05

9.4.0 (2024-09-03)

Features

  • api: add approx_quantiles for computing approximate quantiles (dcdb7a7)
  • api: add DateValue.epoch api for computing days since epoch (#9856) (8b0fb66)
  • api: make the null function deferrable (0613ef1)
  • api: support SchemaLike in Backend.create_table() (#9885) (949fbea)
  • api: support deferred objects in literal (#9904) (0a07906)
  • clickhouse: partition kwargs for compile and execution in to_pyarrow and to_pandas (2dd2c3f)
  • clickhouse: support ms/us/ns truncate units (9881edb)
  • decompile: make the decompiler run on TPCH query 1 (#9779) (0268044)
  • exasol: implement approx_nunique, std, var (d9c3daa)
  • exasol: implement approx_nunique, std, var (63c20c0)
  • exasol: implement cov/corr (24f41b2)
  • exasol: implement median and approx_median (3cfc344)
  • exasol: implement quantile (ecbef94)
  • exasol: implement Table.nunique (a24200c)
  • exasol: implement Table.nunique (7ead7c7)
  • flink: array sort (ca85ae2)
  • flink: support ArrayValue.collect (eb857e6)
  • impala: add tbl_properties to create_table (#9839) (e3d02bd)
  • mssql: support connecting with a url (#9894) (8bb12e1), closes #9856
  • oracle: implement mode aggregation (#9914) (9ee910d)
  • output-formats: add support for to_parquet_dir (#9781) (80dfbe2)
  • polars: array sort (9a2563b)
  • polars: implement approx_nunique (3f3738d)
  • pyspark: support quantile (26d8516)
  • selectors: support naming deferreds in across (de1595c)
  • snowflake: implement interval arithmetic (#9794) (41e10ca), closes #9783
  • sql: enable cross-database joins (#9849) (c3ff6ae)
  • sql: fuse distinct with other select nodes when possible (c31412b)
  • sqlite: support most date/timestamp interval arithmetic (75f594d)
  • sql: load parsed but unsupported types as unknown (#9868) (a76acfc)
  • sql: support inserts with default constraints (#9844) (86a3c06)
  • timestamps: add support for timestamp/date +/- intervals for additional backends (#9799) (79cef68)
  • trino: support years and months in datetime arithmetic (1133973)
  • trino: wrap auth strings with BasicAuthentication (#9960) (e0f54c9)

Bug Fixes

  • bigquery: disallow column names longer than 300 characters (#9916) (ea97794), closes #8931
  • clickhouse: workaround EXCEPT and INTERSECT generation in sqlglot; add tpcds query 87 (#9959) (910b8f5)
  • datafusion: fix creation of SessionContext in datafusion 40.1.0 (eec5328)
  • datafusion: handle NULLs in array flatten (ecc199f)
  • deps: update dependency datafusion to v40 (4aa402a)
  • deps: update dependency sqlglot to >=23.4,<25.11 (#9805) (84bfeb5)
  • deps: update dependency sqlglot to >=23.4,<25.12 (#9834) (69a10d9)
  • deps: update dependency sqlglot to >=23.4,<25.13 (#9851) (6780a6b)
  • deps: update dependency sqlglot to >=23.4,<25.15 (#9864) (d182e9e)
  • deps: update dependency sqlglot to >=23.4,<25.16 (#9875) (0a6765b)
  • deps: update dependency sqlglot to >=23.4,<25.17 (#9907) (9e52edb)
  • deps: update dependency sqlglot to >=23.4,<25.18 (#9935) (ee5116d)
  • deps: update dependency sqlglot to >=23.4,<25.19 (#9962) (4c136d8)
  • dot-sql: ensure that CTEs can be used in .sql (b63e0fd)
  • duckdb: fix create_table() in databases with spaces in the name (#9817) (9da3c9f)
  • exasol: properly handle returning BIGINT values (e20bdad)
  • ir: convert analytic functions to window functions in filters (31295dd)
  • mssql: remove sort key to keep order (#9848) (3780a13)
  • mssql: support .cache() for caching tables (1de2f45)
  • oracle: avoid double cursor closing by removing unnecessary close...

9.3.0

07 Aug 19:56

9.3.0 (2024-08-07)

Features

  • api: support ignore_null in collect (71271dd)
  • api: support ignore_null in first/last (8d4f97f)
  • api: support order_by in order-sensitive aggregates (collect/group_concat/first/last) (#9729) (a18cb5d)
  • api: support quarterly truncation (#9715) (75b31c2), closes #9714
  • array: implement min, max, any, all, sum, mean (#9704) (793efbc)
  • bigquery: support timestamp bucket (fd61f2c)
  • datafusion: pivot_longer (2330b0c)
  • datafusion: enable array flatten, group concat, and timestamp now (4d110a0)
  • datafusion: struct literals (a63cee9)
  • datafusion: unnest (a706f54)
  • duckdb: add support for passing a subset of column types to read_csv (#9776) (c1dcf67)
  • duckdb: support arbitrary url prefixes (#9691) (11af489)
  • mssql: support case-sensitive collations (#9700) (9382a0e)
  • oracle: support group_concat operator (47d97ea)
  • pyspark: add support for pyarrow and python UDFs (#9753) (02a1d48)
  • snowflake: add userinfo URL parsing (524a2fa)
  • ux: allow window functions in predicates and compile to QUALIFY where possible (#9787) (0370bcb)

Bug Fixes

  • algolia: add parent class docstring to algolia index (#9739) (3bc9799)
  • bigquery: repr geospatial values in interactive mode (#9712) (bd8c93f)
  • case: fix dshape, error on noncomparable and empty cases (#9559) (ff2d019)
  • compiler-internals: define unsupported operations after simple operations (#9755) (d9b6264)
  • deps: update dependency atpublic to v5 (#9697) (a4f3940)
  • deps: update dependency sqlglot to >=23.4,<25.10 (#9774) (7144257)
  • deps: update dependency sqlglot to >=23.4,<25.8 (#9696) (d4a2ea2)
  • deps: update dependency sqlglot to >=23.4,<25.9 (#9719) (b1d8b2e)
  • drop: ignore order for DropColumns equality (#9677) (ae1e112)
  • druid: get basic timestamp functionality working (#9692) (6cd3eee)
  • duckdb: avoid literals casts that might defeat optimization (e4ff1bd)
  • duckdb: ensure that array remove doesn't remove NULLs (f0c3be4)
  • duckdb: use register directly instead of calling read_in_memory (597817f)
  • internals: ensure that CTEs are emitted in topological order (#9726) (acd7d82)
  • polars: fix polars std/var to properly handle sample/population (f83d84f)
  • polars: remove bogus minus-one-week truncation (ac519b2)
  • postgres: handle enums by delegating to the parent class (#9769) (3f01075), closes #9295
  • snowflake: bring back where filter support in group_concat; fix array_agg ordering (#9758) (6e7e4de)
  • sql: only return tables in current_database (#9748) (c7f5717)
  • types: fix histogram bin allocation (#9711) (6634864), closes #9687

Documentation

  • algolia: add custom attributes to backend and core methods (#9730) (d9473cf)
  • browser-repl: fix jupyterlite build (#9762) (f403aa1)
  • fix spelling in pivot_longer explanation (#9780) (3201d8b)
  • fix typo in drop method docstring (#9727) (4cf0014)
  • presentations: update overview slides (#9685) (d3a2c0c)
  • replace all double graves with single graves (#9679) (dd26d60)

Refactors

  • dependencies: pandas and numpy are now optional for non-backend installs (#9564) (cff210a)
  • duckdb: use replace to generate less sql (#9713) (f89aa32)
  • internals: remove unnecessary dynamism in drop method (#9682) (5ac84c5)
  • pandas: remove unreachable code in pandas backend (#9786) (dc6bfe2)
  • polars: delete some dead versioning code (b23c5a3)
  • polars: remove casting where possible; handle conversion on output (#9673) (8717629)
  • polars: remove extra backwards co...

9.2.0

22 Jul 23:23

9.2.0 (2024-07-22)

Features

  • api: accept more input types in ibis.range (#9659) (310ad30)
  • api: add nulls_first=False argument to order_by (#9385) (ce9011e)
  • api: add TableUnnest operation to support cross-join unnest semantics as well as offset (#9423) (3352a84)
  • api: add positional joins (#9533) (85ea9da)
  • api: allow grouping by scalar values (#9451) (14f1821)
  • api: support deferred or string column names in cov/corr methods (#9657) (4d135b3)
  • api: support selectors in window function order_by and group_by (#9649) (0ad47de)
  • backends: support creation from a DB-API con (#9603) (fc4d1e3)
  • bigquery: implement CountDistinctStar (#9470) (273e4bc)
  • caching: tie lifetime of cached tables to python refs (#9477) (f51546e)
  • datafusion: datafusion enhancements (#9544) (f11ca43)
  • dtypes: fall back to dt.unknown for unknown types (#9567) (6e0b5f5)
  • dtypes: fall back to dt.unknown for unknown types (#9576) (56a10d2)
  • duckdb: use delta_scan instead of reading pyarrow datasets (#9566) (0ff595e)
  • flink: create views from more mem data types (#9622) (b83fc2b)
  • geospatial: use geoarrow extension types when returning geometry columns as pyarrow (#9549) (cba7367)
  • polars: add more accurate type mapping for timestamps (#8954) (3eafac4)
  • polars: support version 1.0 and later (#9516) (62a1864)
  • postgres: support basic jsonb type and existing operations (#9630) (7179cc6)
  • pyarrow: support __arrow_c_schema__ on ibis.Schema objects (#9665) (00a776e)
  • pyspark: implement new experimental read/write directory methods (#9272) (adade5e)

Bug Fixes

  • api: add support for using deferreds in the argmin/argmax key argument (#9652) (3f05cbc)
  • bigquery: escape table names with spaces for bigquery backend (#9589) (ca21dbb)
  • bigquery: support microseconds in time literals (#9610) (c876abc), closes #9609
  • clickhouse: generate redundant aliases to workaround clickhouse naming behavior (#9525) (b44dac2), closes #9508
  • clickhouse: support Date32 database type (#9509) (efa6fb7)
  • datatypes: proper handling of srid in geospatial datatypes (#9519) (a3ceb59)
  • deps: update dependency datafusion to v39 (#9506) (21ef0a6)
  • deps: update dependency fsspec to <2024.6.2 (#9463) (8e225ec)
  • deps: update dependency geopandas to v1 (#9437) (fa1037b)
  • deps: update dependency numpy to v2 (#9395) (3cb39a5)
  • deps: update dependency pyarrow to v17 (#9614) (16998df)
  • deps: update dependency sqlglot to >=23.4,<25.3 (#9401) (bdc1b3f)
  • deps: update dependency sqlglot to >=23.4,<25.4 (#9427) (8e015b6)
  • deps: update dependency sqlglot to >=23.4,<25.5 (#9472) (f6f80da)
  • deps: update dependency sqlglot to >=23.4,<25.6 (#9523) (6a748c4)
  • deps: update dependency sqlglot to >=23.4,<25.7 (#9628) (f5207ff)
  • druid: handle typed nulls where possible (#9452) (33ec754)
  • fix and improve shape inference in many ops (7a0b21e)
  • ir: avoid deduplicating filters based solely on their name (#9476) (b35582e), closes #9474
  • ir: repr iterables when constructing name of operations (#9480) (f5a541c)
  • join: skip substitution of non-field references in join chains (#9595) (61ef0ed)
  • mssql: always pass port to pyodbc in host string (#9656) (2e3fd9a)
  • mssql: avoid calling .commit() unless a DDL operation is being performed (#9658) (69c5bf0), closes #9654
  • mssql: fix temporary table creation and implement cache (#9434) ([196d8a...

9.1.0

13 Jun 17:39

9.1.0 (2024-06-13)

Features

  • all: enable passing in-memory data to create_table (#9251) (fa15c7d), closes #6593 #8863
  • api: add Table.value_counts for easy group by count on multiple fields (aba913d)
  • api: isoyear method (#9034) (4707c44)
  • api: support type arg to ibis.null() (8db686e)
  • api: support wider range of types in where arg to column reductions (582165f)
  • api: support wider range of types in where arg to table reductions (7aba385)
  • bigquery: implement a few URL ops (#9210) (3d0f9bc)
  • bigquery: support filtering by _TABLE_SUFFIX when using a wildcard table name (#9375) (62a25c4), closes #9371
  • datafusion: use pyarrow for type conversion (#9299) (5bef96a)
  • drop Python 3.9 and test on Python 3.10/3.12 (#9213) (c06285e)
  • duckdb: add catalog support to create_table (#9147) (07331b5)
  • duckdb: allow to use named in-memory db (#9241) (67460aa), closes #9240
  • duckdb: support and test 1.0 (#9297) (395c8b5)
  • pandas,dask: implement ops.StructColumn (#9302) (ea81d85)
  • polars: accept list of CSVs to read_csv (#9232) (7a272e3), closes #9230
  • polars: implement create_view/drop_view/drop_table (#9263) (c4324f5)
  • postgres: provide translation for hash ops (#9348) (57e2348)
  • pyarrow: support Arrow PyCapsule interface on ibis.Table objects (1a262b9)
  • pyspark: builtin udf support (#9191) (142c105)
  • pyspark: provide a mode option to manage both batch and streaming connections (e425ad5)
  • pyspark: support reading from and writing to Kafka (#9266) (1c7c6e3)
  • selectors: parse Python types in s.of_type (#9356) (c0ebdc8)
  • snowflake: implement array map and array filter (#9178) (9b42751)
  • snowflake: implement support for asof_join API (#9180) (49c6ce3)
  • snowflake: implement Table.sample (#9071) (307334b)
  • ux: improve error message on unequal schemas during set ops (#9115) (5488896)

Bug Fixes

  • api: treat col == None or col == ibis.NA as col.isnull() (#9114) (711bf9f)
  • bigquery: only register memtable if obj is not None (#9268) (f175d0a)
  • bigquery: quote all parts of table names (#9141) (e1338d5)
  • bigquery: quote qualified memtable names (#9149) (878d0d5)
  • bigquery: strip whitespace from bigquery field names (#9160) (8e5cc3b), closes #9112
  • clickhouse: more explicitly disallow null structs (#9305) (fc1d00f)
  • convert the uint64's from some backends' hash() to the desired int64 (900ecca)
  • datatypes: manually cast the type of pos to int16 for table.info() (#9139) (9eb1ed1)
  • datatypes: manually cast the type of pos to int16 for table.describe() (#9314) (c7fcddf)
  • ddl: use column names, not position, for insertion order (#9264) (3506f40)
  • deps: remove pydruid sqlalchemy dependency (#9092) (a0df103)
  • deps: update dependency datafusion to v37 (#9189) (49ecf8d)
  • deps: update dependency datafusion to v38 (#9278) (77aaecd)
  • deps: update dependency fsspec to <2024.5.1 (#9201) (15a5257)
  • deps: update dependency fsspec to <2024.6.1 (#9304) (d600a0d)
  • deps: update dependency sqlglot to >=23.4,<23.14 (#9118) (d8119fb)
  • deps: update dependency sqlglot to >=23.4,<23.15 (#9151) (ac2201d)
  • deps: update dependency sqlglot to >=23.4,<23.17 (#9209) (82a5f93)
  • deps: update dependency sqlglot to >=23.4,<23.18 (#9212) (b92dd7b)
  • deps: update dependency sqlglot to >=23.4,<24.2 (#9277) (98cb460)
  • deps: update dependency sqlglot to >=23.4,<25.2 ([#9368](htt...

9.0.0

30 Apr 18:01

9.0.0 (2024-04-30)

⚠ BREAKING CHANGES

  • udf: The schema parameter for UDF definition has been removed. A new catalog parameter has been added. Ibis uses the word database to refer to a collection of tables, and the word catalog to refer to a collection of databases. You can use a combination of catalog and database to specify a hierarchical location for the UDF.
  • pyspark: Arguments to create_database, drop_database, and get_schema are now keyword-only except for the name args. Calls to these functions that have relied on positional argument ordering need to be updated.
  • dask: the dask backend no longer supports cov/corr with how="pop".
  • duckdb: Calling the get or contains method on NULL map
    values now returns NULL. Use coalesce(map.get(...), default) or
    coalesce(map.contains(), False) to get the previous behavior.
  • api: Integer inputs to select and mutate are now always interpreted as literals. Columns can still be accessed by their integer index using square-bracket syntax.
  • api: strings passed to table.mutate() are now interpreted as
    column references instead of literals; use ibis.literal(string) to
    pass the string as a literal
  • ir: Schema.apply_to() is removed, use ibis.formats.pandas.PandasConverter.convert_frame() instead
  • ddl: We are removing the word schema in its hierarchical
    sense. We use database to mean a collection of tables. The behavior of
    all *_database methods now applies only to collections of tables and
    never to collections of database (formerly schema)
  • CanListDatabases abstract methods now all refer to
    collections of tables.
  • CanCreateDatabases abstract methods now all refer to
    collections of tables.
  • list_databases now takes a kwarg catalog.
  • create_database now takes a kwarg catalog.
  • drop_database now takes a kwarg catalog.
  • current_database now refers to the current collection of tables.
  • CanCreateSchema is deprecated and create_schema, drop_schema,
    list_schemas, and current_schema are deprecated and redirect to the
    corresponding method/property ending in database.
  • We add a CanListCatalog and CanCreateCatalog that can list and
    create collections of databases, respectively.
    The new methods are list_catalogs, create_catalog, and drop_catalog.
  • There is a new current_catalog property.
  • api: timecontext feature is removed
  • api: The by argument from asof_join is removed. Calls to asof_join that previously used by should pass those arguments to predicates instead.
  • cleanup: Deprecated methods and properties op, output_dtype, and output_shape are removed. op is no longer needed; use .dtype and .shape in place of output_dtype and output_shape, respectively.
  • api: expr.topk(...) now includes null counts. The row count of the topk call will not differ, but the number of nulls counted will no longer be zero. To drop the null row use the dropna method.
  • api: ibis.rows_with_max_lookback() function and ibis.window(max_lookback) argument are removed
  • strings: Backends that previously used initcap (analogous to str.title) to implement StringValue.capitalize() will produce different results when the input string contains multiple words (a word's definition being backend-specific).
  • impala: Impala UDFs no longer require explicit registration. Remove any calls to Function.register. If you were passing database to Function.register, pass that to scalar_function or aggregate_function as appropriate.
  • pandas: the timecontext feature is not supported anymore
  • api: the on parameter of table.asof_join() now accepts only a
    single predicate; use predicates to supply additional
    join predicates.
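
The StringValue.capitalize() change noted above is easiest to see with Python's own string methods. The note describes initcap as analogous to str.title. The sketch assumes the new behavior resembles str.capitalize, which is an assumption based on the method name, and backends may differ in details.

```python
text = "hello big world"

# initcap-style behavior, analogous to str.title: every word is capitalized.
title_cased = text.title()

# str.capitalize: only the first character of the whole string is uppercased.
capitalized = text.capitalize()

# Single-word inputs give the same result either way.
same_for_one_word = "hello".title() == "hello".capitalize()
```

Only multi-word inputs are affected, which matches the note's remark that results differ when the input string contains multiple words.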

Features

  • add to_date function to StringValue (#9030) (0701978), closes #8908
  • api: add .as_scalar() method for turning expressions into scalar subqueries (#8350) (8130169)
  • api: add catalog and database kwargs to ibis.table (#8801) (7d593c4)
  • api: add describe method to compute summary stats of table expressions (#8739) (c8d98a1)
  • api: add ibis.today() for retrieving the current date (#8664) (5e10d17)
  • api: add a to_polars() method for returning query results as polars objects (53454c1)
  • api: add a uuid function for returning a new uuid (#8438) (965b6d9)
  • api: add API for unwrapping JSON values into backend-native values (#8958) (aebb5cf)
  • api: add disconnect method (#8341) (32665af), closes #5940
  • api: allow *arg syntax with GroupedTable methods (#8923) (489bb89)
  • api: count nulls with topk (#8531) (54c2c70)
  • api: expose common types in the top-level ibis namespace (#9008) (3f3ed27), closes #8717
  • api: include bad type in NotImplementedError (#8291) (36da06b)
  • api: natively support polars dataframes in ibis.memtable (464bebc)
  • api: support Table.order_by(*keys) (6ade4e9)
  • api: support all dtypes in MapGet and MapContains (#8648) (401e0a4)
  • api: support converting ibis types & schemas to/from polars types & schemas (73add93)
  • api: support Deferreds in Array.map and .filter (#8267) (8289d2c)
  • api: support the inner join convenience to not repeat fields known to be equal (#8127) (798088d)
  • api: support variadic arguments on Table.group_by() (#8546) (665bc4f)
  • backends: introducing ibish the infinite scale backend you always wanted (#8785) (1d51243)
  • bigquery: support polars memtables (26d103d)
  • common: add Dispatched base class for convenient visitor pattern implementation (f80c5b3)
  • common: add Node.find_below() methods to exclude the root node from filtering (#8861) (80d12a2)
  • common: add a memory efficient Node.map() implementation (e3f2217)
  • common: also traverse nodes used as dictionary keys (#9041) (02c6607)
  • common: introduce FrozenOrderedDict (#9081) (f926995), closes #9063
  • datafusion, flink, mssql: add uuid operation (#8545) (2f85a42)
  • datafusion: add array and strings functions ([#...

8.0.0

05 Feb 19:31

8.0.0 (2024-02-05)

⚠ BREAKING CHANGES

  • backends: Columns with Ibis date types are now returned as object dtype containing datetime.date objects when executing with the pandas backend.
  • impala: Direct HDFS integration and support for ingesting pandas DataFrames directly are both removed. The Impala backend still works with HDFS, but data in HDFS must be managed outside of ibis.
  • api: replace ibis.show_sql(expr) calls with print(ibis.to_sql(expr)) or, if using Jupyter or IPython, with ibis.to_sql(expr)
  • bigquery: nullifzero is removed; use nullif(0) instead
  • bigquery: zeroifnull is removed; use fillna(0) instead
  • bigquery: list_databases is removed; use list_schemas instead
  • bigquery: the bigquery current_database method returns the data_project instead of the dataset_id. Use current_schema to retrieve dataset_id. To explicitly list tables in a given project and dataset, you can use f"{con.current_database}.{con.current_schema}"
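
The two BigQuery replacements above, nullif(0) and fillna(0), are opposite operations. As a rough model in plain Python, with None standing in for SQL NULL, they can be sketched as follows. The helper names nullif_zero and fill_zero are hypothetical and are not part of Ibis; the sketch only illustrates the intended semantics.

```python
from typing import Optional


def nullif_zero(value: Optional[int]) -> Optional[int]:
    """Model of nullif(0): a value equal to 0 becomes NULL (None)."""
    return None if value == 0 else value


def fill_zero(value: Optional[int]) -> int:
    """Model of fillna(0): NULL (None) becomes 0."""
    return 0 if value is None else value


column = [3, 0, None]
print([nullif_zero(v) for v in column])  # [3, None, None]
print([fill_zero(v) for v in column])    # [3, 0, 0]
```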

Features

  • api: define RegexSplit operation and re_split API (07beaed)
  • api: support median and quantile on more types (#7810) (49c75a8)
  • clickhouse: implement RegexSplit (e3c507e)
  • datafusion: implement ops.RegexSplit using pyarrow UDF (37b6b7f)
  • datafusion: set ops (37abea9)
  • datatypes: add decimal and basic geospatial support to the sqlglot type parser/generator (59783b9)
  • datatypes: make intervals round trip through sqlglot type mapper (d22f97a)
  • duckdb-geospatial: add support for flipping coordinates (d47088b)
  • duckdb-geospatial: enable use of literals (23ad256)
  • duckdb: implement RegexSplit (229a1f4)
  • examples: add zones geojson example (#8040) (2d562b7), closes #7958
  • flink: add new temporal operators (dfef418)
  • flink: add primary key support (da04679)
  • flink: export result to pyarrow (9566263)
  • flink: implement array operators (#7951) (80e13b4)
  • flink: implement struct field, clean up literal, and adjust timecontext test markers (#7997) (2d5e108)
  • impala: rudimentary date support (d4bcf7b)
  • mssql: add hashbytes and test for binary output hash fns (#8107) (91f60cd), closes #8082
  • mssql: use odbc (f03ad0c)
  • polars: implement ops.RegexSplit using pyarrow UDF (a3bed10)
  • postgres: implement RegexSplit (c955b6a)
  • pyspark: implement RegexSplit (cfe0329)
  • risingwave: init impl for Risingwave (#7954) (351747a), closes #8038
  • snowflake: implement RegexSplit (2c1a726)
  • snowflake: implement insert method (2162e3f)
  • trino: implement RegexSplit (9d1295f)

Bug Fixes

  • api: deferred values are not truthy (00b3ece)
  • backends: ensure that returned date results are actually proper date values (0626fb2)
  • backends: preserve order_by position in window function when subsequent expressions are duplicated (#7943) (89056b9), closes #7940
  • common: do not convert callables to resolveable objects (9963705)
  • datafusion: work around lack of support for uppercase units in intervals (ebb6cde)
  • datatypes: ensure that array construction supports literals and infers their shape from its inputs (#8049) (899dce1), closes #8022
  • datatypes: fix bad references in to_numpy() (6fd4550)
  • deps: remove filelock from required dependencies (76dded5)
  • deps: update dependency black to v24 (425f7b1)
  • deps: update dependency datafusion to v34 (601f889)
  • deps: update dependency datafusion to v35 (#8224) (a34af25)
  • deps: update dependency oracledb to v2 (e7419ca)
  • deps: update dependency pyarrow to v15 (ef6a9bd)
  • deps: update dependency pyodbc to v5 (32044ea)
  • docs: surround executable code blocks with interactive mode on/off (4c660e0)
  • duckdb: allow table creation from expr with geospatial datatypes (#7818) (ecac322)
  • duckdb: ensure that casting to floating point values produces valid types in generated sql (424b206)
  • examples: use anonymous access when reading example data from GCS (8e5c0af)
  • impala: generate memtables using UNION ALL to work around sqlglot bug (399a5ef)
  • mutate/select: ensure that unsplatted dictionaries work in mutate and select APIs (#8014) (8ed19ea), closes #8013
  • mysql: catch PyMySQL OperationalError exception (#7919) (f2c2664), closes #6010 #7918
  • pandas: support non-string categorical columns (5de08c7)
  • polars: avoid using unnecessary subquery for schema inference (0f43667)
  • **p...

7.2.0

18 Dec 23:17

7.2.0 (2023-12-18)

Features

  • api: add ArrayValue.flatten method and operation (e6e995c)
  • api: add ibis.range function for generating sequences (f5a0a5a)
  • api: add timestamp range (c567fe0)
  • base: add to_pandas method to BaseBackend (3d1cf66)
  • clickhouse: implement array flatten support (d15c6e6)
  • common: node.replace() now supports mappings for quick lookup-like substitutions (bbc93c7)
  • common: add node.find_topmost() method to locate matching nodes without descending further to their children (15acf7d)
  • common: allow matching on dictionaries in possibly nested patterns (1d314f7)
  • common: expose node.__children__ property to access the flattened list of children of a node (2e91476)
  • duckdb: add initial support for geospatial functions (65f496c)
  • duckdb: add read_geo function (b19a8ce)
  • duckdb: enforce aswkb for projections, coerce to geopandas (33327dc)
  • duckdb: implement array flatten support (0a0eecc)
  • exasol: add exasol backend (295903d)
  • export: allow passing keyword arguments to PyArrow ParquetWriter and CSVWriter (40558fd)
  • flink: implement nested schema support (057fabc)
  • flink: implement windowed computations (256767f)
  • geospatial: add support for GeoTransform on duckdb (ec533c1)
  • geospatial: update read_geo to support url (3baf509)
  • pandas/dask: implement flatten (c2e8d9d)
  • polars: add streaming kwarg to to_pandas (703507f)
  • polars: implement array flatten support (19b2aa0)
  • pyspark: enable multiple values in .substitute (291a290)
  • pyspark: implement array flatten support (5d1fadf)
  • snowflake: implement array flatten support (d3c754f)
  • snowflake: read_csv with https (72752eb)
  • snowflake: support udf arguments for reading from staged files (529a3a2)
  • snowflake: use upstream array_sort (9624341)
  • sqlalchemy: support expressions in window bounds (5dbb3b1)
  • trino: implement array flatten support (0d1faaa)

Bug Fixes

  • api: avoid casting to bool for table.info() nullable column (3b3bd7b)
  • bigquery: escape the schema (project ID) for BQ builtin UDFs (8096552)
  • bigquery: fully qualified memtable names in compile (a81e432)
  • clickhouse: use backwards compatible methods of getting query metadata (975556f)
  • datafusion: bring back UDF registration (43084fa)
  • datafusion: ensure that non-matching re_search calls return bool values when patterns do not match (088b027)
  • datafusion: support computed group by when the aggregation is count distinct (18bdb7e)
  • decompile: handle isin (6857751)
  • deferred: don't pass expression in fstringified error message (724859d)
  • deps: update dependency datafusion to v33 (57047a2)
  • deps: update dependency sqlglot to v20 (13bc6e2)
  • duckdb: ensure that already quoted identifiers are not erased (45ee391)
  • duckdb: ensure that parameter names are unlikely to overlap with column names (d93dbe2)
  • duckdb: gate geoalchemy import in duckdb geospatial (8f012c4)
  • duckdb: render dates, times, timestamps and none literals correctly (5d8866a)
  • duckdb: use functions for temporal literals (b1407f8)
  • duckdb: use the UDF's signature instead of arguments' output type for generating a duckdb signature (233dce1)
  • flink: add more test (33e1a31)
  • flink: add os to the cache key (1b92b33)
  • flink: add test cases for recreate table (1413de9)
  • flink: customize the list of base identifiers (0b5d343)
  • flink: fix recreating table/view issue on flink backend (0c9791f)
  • flink: implement TypeMapper and SchemaMapper for Flink backend (f983bfa)
  • flink: use lazy import to prevent premature loading of pyflink during gen_matrix (d042402)
  • geospatial: pretty print data in interactive mode (afb04ed)
  • ir: ensure that join projection columns are all always nullable (f5f35c6)
  • ir: handle renaming for scalar operations (6f77f17)
  • ir: handle the case of non-overlapping data and add a test (1c9ae1b)
  • ir: implicitly convert None literals with dt.Null type to the requested type during value coercion (d51ec4e)
  • ir: merge window frames for bound analytic window functions with a subsequent over call (e12ce8d)
  • ir: raise if Concrete.copy() receives unexpected arguments (442199a)
  • memtable: ensure column names match provided data (faf99df)
  • memtables: disallow duplicat...