- Callers can constrain inference with hints (thanks, @neat-web!)
- DropTransformer can now be used to drop columns
- Anonymizing transformer for PII (thanks, @neat-web!)
- DateTime transformer
- Fix MinMax and StandardScaler to replace null with 0.0 when indicator column present
- Support for conditional sampling
- Fix bug with onehot encoding on single-category column
Thanks to @neat-web for both contributions!
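The null-handling fix for MinMax and StandardScaler listed above can be pictured with a small sketch. It is only an illustration of the described behavior, not the library's implementation: the function name `minmax_with_indicator`, the `[0, 1]` output range, and the caller-supplied `lower`/`upper` bounds are all assumptions made for this example.

```python
from typing import List, Optional, Tuple


def minmax_with_indicator(
    values: List[Optional[float]], lower: float, upper: float
) -> List[Tuple[float, int]]:
    """Hypothetical helper: scale values into [0, 1] and emit a null indicator.

    Each output row is (scaled_value, is_null). When the input is None,
    the scaled value is set to 0.0 and the indicator is set to 1, so the
    indicator column alone carries the "missing" information.
    """
    if upper <= lower:
        raise ValueError("upper must be greater than lower")
    span = upper - lower
    rows = []
    for v in values:
        if v is None:
            # Null input: placeholder 0.0 plus indicator flag.
            rows.append((0.0, 1))
        else:
            # Clip to the assumed bounds before scaling.
            clipped = min(max(v, lower), upper)
            rows.append(((clipped - lower) / span, 0))
    return rows


rows = minmax_with_indicator([10.0, None, 30.0], lower=10.0, upper=30.0)
```

Here the second row comes out as `(0.0, 1)`: the value slot holds a fixed 0.0 and only the indicator distinguishes it from a genuine minimum, which is why the indicator column has to be present for this replacement to be safe.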
- Update to OpenDP v0.6
- Update PAC-Synth to v0.6
- Fix dependencies to allow StandardScaler to run on Windows
This release is a breaking change from v0.2.x.
- Add Synthesizer.create() factory method as preferred way to create synthesizers. See Getting Started for new factory syntax.
- Add library for differentially private reversible data transforms. All synthesizers now accept a TableTransformer object, and infer one if none provided. See Data Transforms for more information.
- All synthesizers use safe differentially private preprocessing by default.
- Removed option for log_frequency from CTGAN synthesizers.
- Support for Apple Silicon
- Add diagnostics to GANs to show epsilon spend on preprocessor
- Support for pac-synth DP Marginals synthesizer (thanks, @rracanicci!)
- Bump Python requirement to 3.7+
- Support for measuring Cuboids. Cuboids include multiple disjoint queries that can be measured under a single iteration.
- Default iterations and query count adapt based on dimensionality of source data
- Support for measure-only MWEM, for small cubes with optimal query workloads
- Basic accountant keeps track of spent epsilon
- Removed bin edge support, since we delegate to preprocessor now
- Better handles cases where exponential mechanism can't find a query. Should always find queries to measure now
- Debug flag prints trace information
- Support for MST synthesizer.
- Re-enabled support for continuous values in GAN synthesizers.
- Fixed bug where MWEM was adding too much noise
- Fixed bug where CTGAN synthesizers could silently use a continuous column if called without the PytorchDPSynthesizer wrapper.
- Fixed bug in dpsgd synthesizers where final batch was not being counted against budget, potentially causing privacy leak
- Alert caller if continuous column is passed as a categorical column to CTGAN
- Warn if log_frequency for CTGAN is set to unsafe value. Spend a small fraction of epsilon to estimate frequencies for conditional sampling.
- Fixed DPCTGAN regression that was impairing utility
- MWEM allows sampling of arbitrary number of records
- Missing splits now handled correctly
- All synthesizers support numpy or pandas input
- Update to use newest CTGAN
- Initial release.
- Split smartnoise-sdk into 3 packages, switch to use opendp instead of smartnoise-core.