house cleaning. Fixes #356 (#368)
fdosani authored Jan 9, 2025
1 parent 23e3ab1 commit 2d7e8aa
Showing 5 changed files with 24 additions and 18 deletions.
4 changes: 3 additions & 1 deletion CONTRIBUTORS
@@ -4,4 +4,6 @@
 - Mark Zhou
 - Ian Whitestone
 - Faisal Dosani
-- Lorenzo Mercado
+- Lorenzo Mercado
+- Jacob Dawang
+- Raymond Haffar
18 changes: 13 additions & 5 deletions README.md
@@ -7,11 +7,19 @@
 ![PyPI - Downloads](https://img.shields.io/pypi/dm/datacompy)


-DataComPy is a package to compare two Pandas DataFrames. Originally started to
-be something of a replacement for SAS's ``PROC COMPARE`` for Pandas DataFrames
-with some more functionality than just ``Pandas.DataFrame.equals(Pandas.DataFrame)``
-(in that it prints out some stats, and lets you tweak how accurate matches have to be).
-Then extended to carry that functionality over to Spark Dataframes.
+DataComPy is a package to compare two DataFrames (or tables) such as Pandas, Spark, Polars, and
+even Snowflake. Originally it was created to be something of a replacement
+for SAS's ``PROC COMPARE`` for Pandas DataFrames with some more functionality than
+just ``Pandas.DataFrame.equals(Pandas.DataFrame)`` (in that it prints out some stats,
+and lets you tweak how accurate matches have to be). Supported types include:
+
+- Pandas
+- Polars
+- Spark
+- Snowflake (via snowpark)
+- Dask (via Fugue)
+- DuckDB (via Fugue)
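
The README text above contrasts strict ``DataFrame.equals`` with a comparison whose accuracy can be tuned. The following is a minimal sketch of that idea using plain pandas only. It is not DataComPy's implementation; the helper `columns_match` and its `abs_tol` parameter are hypothetical names chosen for illustration.

```python
import pandas as pd

# Two frames that differ only by a tiny floating-point amount in one row.
df1 = pd.DataFrame({"acct_id": [1, 2, 3], "balance": [100.00, 250.50, 75.25]})
df2 = pd.DataFrame({"acct_id": [1, 2, 3], "balance": [100.00, 250.5000001, 75.25]})

# Strict equality: any difference at all makes the frames unequal.
strict_equal = df1.equals(df2)


def columns_match(left: pd.Series, right: pd.Series, abs_tol: float = 0.0) -> pd.Series:
    """Hypothetical helper: element-wise match within an absolute tolerance."""
    return (left - right).abs() <= abs_tol


# Tolerance-based comparison: the tiny difference is treated as a match.
within_tol = columns_match(df1["balance"], df2["balance"], abs_tol=1e-4)

print(strict_equal)
print(bool(within_tol.all()))
```

With a tolerance of zero the helper behaves like a strict element-wise check, so the choice of tolerance decides whether small numeric noise counts as a mismatch.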


## Quick Installation

10 changes: 2 additions & 8 deletions ROADMAP.rst
@@ -2,13 +2,7 @@ datacompy Roadmap
 -----------------

 At this current time ``datacompy`` is in a stable state. We are planning on continuing to
-add features and functionality as the community of users asks for them, but there are no
+add features and functionality as the community of users asks for them, but there are no
 pressing issues which we are looking to add in immediately.

-There are some longer term issues which are open for people to work on, and some which are more of a nice to have.
-We are looking for contributors and also maintaners to help with the project.
-
-- Add in docs how to change the number of mismatches in report `#6 <https://github.com/capitalone/datacompy/issues/6>`_
-- Make duplicate handling better `#7 <https://github.com/capitalone/datacompy/issues/7>`_
-- Refactor Spark datacompy `#13 <https://github.com/capitalone/datacompy/issues/13>`_
-- Drop Python 3.7 suport `#173 <https://github.com/capitalone/datacompy/issues/173>`_
+Please feel free to check the issues section of the repository for the most up to date list.
4 changes: 2 additions & 2 deletions docs/source/pandas_usage.rst
@@ -5,7 +5,7 @@ Overview
 --------

 The main goal of ``datacompy`` is to provide a human-readable output describing
-differences between two dataframes. For example, if you have two dataframes
+differences between two dataframes. For example, if you have two dataframes
 containing data like:

 df1
@@ -289,4 +289,4 @@ There's a number of limitations with ``datacompy``:
 #Numpy testing
 npt.assert_array_equal(arr1, arr2)
-npt.assert_almost_equal(obj1, obj2)
+npt.assert_almost_equal(obj1, obj2)
6 changes: 4 additions & 2 deletions pyproject.toml
@@ -3,12 +3,14 @@ name = "datacompy"
 description = "Dataframe comparison in Python"
 readme = "README.md"
 authors = [
-{ name="Faisal Dosani", email="[email protected]" },
 { name="Ian Robertson" },
 { name="Dan Coates" },
+{ name="Faisal Dosani", email="[email protected]" },
 ]
 maintainers = [
-{ name="Faisal Dosani", email="[email protected]" }
+{ name="Faisal Dosani", email="[email protected]" },
+{ name="Jacob Dawang", email="[email protected]" },
+{ name="Raymond Haffar", email="[email protected]" },
 ]
 license = {text = "Apache Software License"}
 dependencies = ["pandas<=2.2.3,>=0.25.0", "numpy<=2.2.0,>=1.22.0", "ordered-set<=4.1.0,>=4.0.2", "polars[pandas]<=1.17.1,>=0.20.4"]