Releases: sb-ai-lab/RePlay
v0.18.0
RePlay 0.18.0 Release notes
- Highlights
- Backwards Incompatible Changes
- Improvements
Highlights
We are excited to announce the release of RePlay 0.18.0!
In this release, we added Python 3.11 support, updated dependencies to their latest versions, and improved the performance of the transformers (Bert4Rec, SasRec).
Backwards Incompatible Changes
No changes.
Improvements
Performance of the transformers
Inside the models, all the conditions for using the optimized implementation of torch.nn.MultiheadAttention are now met; you can read more about them in the torch.nn.MultiheadAttention class description. In addition, memory costs have decreased, so you can use a longer sequence length or a larger batch size during training.
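For reference, here is a minimal sketch (not RePlay's actual internals) of a configuration that satisfies the fast-path conditions documented for torch.nn.MultiheadAttention: self-attention, batch_first=True, need_weights=False, and inference mode. The shapes are illustrative.

```python
import torch

# Illustrative configuration meeting the documented fast-path conditions
# of torch.nn.MultiheadAttention.
attn = torch.nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)
attn.eval()  # the optimized path is used in inference mode

x = torch.randn(32, 128, 64)  # (batch, sequence, embedding)
with torch.inference_mode():
    # Self-attention with need_weights=False; the second return value
    # (the attention weights) is None in this case.
    out, _ = attn(x, x, x, need_weights=False)
print(out.shape)  # torch.Size([32, 128, 64])
```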
v0.17.1
RePlay 0.17.1 Release notes
- Highlights
- Backwards Incompatible Changes
- New Features
Highlights
We are excited to announce the release of RePlay 0.17.1!
In this release, we introduced the item undersampling filter QuantileItemsFilter in replay.preprocessing.filters.
Backwards Incompatible Changes
No changes.
New Features
Added the undersampling filter QuantileItemsFilter in replay.preprocessing.filters.
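A minimal usage sketch follows; the constructor parameter names (alpha_quantile, items_proportion) are assumptions based on the filter's described purpose, so check the class docstring for the exact signature.

```python
import pandas as pd
from replay.preprocessing.filters import QuantileItemsFilter

interactions = pd.DataFrame(
    {
        "query_id": [1, 1, 2, 2, 2, 3, 3, 3],
        "item_id":  [1, 2, 1, 1, 3, 1, 2, 1],
    }
)

# Hypothetical parameter values: undersample interactions of items whose
# popularity lies above the chosen quantile.
undersampler = QuantileItemsFilter(
    alpha_quantile=0.95,    # assumed parameter name
    items_proportion=0.5,   # assumed parameter name
    query_column="query_id",
    item_column="item_id",
)
filtered = undersampler.transform(interactions)
```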
v0.17.0
RePlay 0.17.0 Release notes
- Highlights
- Backwards Incompatible Changes
- Deprecations
- New Features
- Improvements
- Bug fixes
Highlights
We are excited to announce the release of RePlay 0.17.0!
The new version fixes serious bugs related to the performance of LabelEncoder and saving checkpoints in transformers. In addition, methods have been added to save splitters and SequentialTokenizer without using pickle.
Backwards Incompatible Changes
Changed SequentialDataset behavior
When training transformers on big data, a slowdown was detected that increased the epoch time from 5 minutes to 1 hour. It was caused by the fact that, by default, the model trainer saves checkpoints every 50 steps of the epoch, and while saving a checkpoint, not only the model but also the entire training dataset was implicitly saved. The behavior was corrected by changing SequentialDataset and the callbacks that use it. Therefore, SequentialDataset objects from older versions can no longer be used; otherwise, no interface changes were required.
Deprecations
Added a deprecation warning related to saving splitters and SequentialTokenizer using pickle. In future versions, this functionality will be removed.
New Features
A new strategy in the LabelEncoder
The drop strategy has been added. It allows you to drop tokens from the dataset that were not present at the training stage. If all rows are deleted, a corresponding warning will appear.
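A sketch of how the new strategy might be enabled; the exact spelling of the parameter that selects the strategy on LabelEncodingRule is an assumption here, so consult the documentation.

```python
import pandas as pd
from replay.preprocessing import LabelEncoder, LabelEncodingRule

train = pd.DataFrame({"item_id": ["a", "b", "c"]})
test = pd.DataFrame({"item_id": ["a", "b", "d"]})  # "d" was not seen during fit

# "drop": rows whose tokens were absent at the training stage are removed;
# the parameter name handle_unknown is assumed for illustration.
encoder = LabelEncoder([LabelEncodingRule("item_id", handle_unknown="drop")])
encoder.fit(train)
encoded_test = encoder.transform(test)  # the row with "d" is dropped
```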
New Linters
We keep up with the latest trends in code quality control, so the list of linters used to check code quality has been updated. Pylint and PyCodestyle have been removed; the linters Ruff, Black, and toml-sort have been added.
Improvements
PyArrow dependency
The dependency on PyArrow has been adjusted: RePlay now works with any version greater than 12.0.1.
Bug fixes
Performance fixes at the partial_fit stage in LabelEncoder
The slowdown occurred when using a DataFrame from Pandas: the partial_fit stage had quadratic running time. The bug has been fixed, and the time now depends linearly on the size of the dataset.
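For context, partial_fit incrementally extends an encoder that has already been fitted, and it is this step whose running time is now linear in the dataset size. A minimal sketch with a Pandas DataFrame:

```python
import pandas as pd
from replay.preprocessing import LabelEncoder, LabelEncodingRule

encoder = LabelEncoder([LabelEncodingRule("item_id")])
encoder.fit(pd.DataFrame({"item_id": ["a", "b"]}))

# Extend the existing mapping with previously unseen tokens.
# This step used to take quadratic time on Pandas input; it is now linear.
encoder.partial_fit(pd.DataFrame({"item_id": ["b", "c", "d"]}))
```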
Timestamp tokenization when using SasRec
Fixed an error that occurred when training a SasRec transformer with the ti_modification=True parameter.
Loading a checkpoint with a modified embedding in the transformers
The error occurred when loading the model on another device after the dimensions of the embeddings in the transformers had been changed. The example of working with embeddings in transformers has been updated.
v0.16.0
- Introduced support for dataframes from the polars package. It is available in the following modules: data (Dataset, SequenceTokenizer, SequentialDataset) for working with transformers, metrics, preprocessing, and splitters. The new format achieves a severalfold speed-up in calculations compared to the Pandas and PySpark dataframes. You can see more details about usage in the examples (a minimal sketch also follows this list).
- Removed the dependencies on seaborn and matplotlib. Removed the functions replay.utils.distributions.plot_item_dist and replay.utils.distributions.plot_user_dist.
- Added functions to get and set embeddings in transformers: get_all_embeddings, set_item_embeddings_by_size, set_item_embeddings_by_tensor, append_item_embeddings. You can see more details about their use in the examples.
- Added a QueryEmbeddingsPredictionCallback to get query embeddings at the inference stage in transformers. You can see more details about usage in the examples.
- Added support for numerical features in SequenceTokenizer and TorchSequentialDataset, making it possible to use numerical features inside transformers.
- Auto padding is supported at the inference stage of transformer-based models in single-user mode.
- Added a new KL-UCB model based on https://arxiv.org/pdf/1102.2490.pdf.
- Added a callback to calculate cardinality in TensorSchema. It is no longer necessary to pass the cardinality parameter; the value will be calculated automatically.
- Added the core_count parameter to replay.utils.session_handler.get_spark_session. If it is not specified, the env variables REPLAY_SPARK_CORE_COUNT and REPLAY_SPARK_MEMORY are taken into account; if they are not specified either, the value is set to -1 (see the second sketch after this list).
- Corrected the behavior of the item_count parameter in ValidationMetricsCallback. If you are not going to calculate the Coverage metric, you do not need to pass this parameter.
- Aligned the calculation of the Coverage metric on Pandas and PySpark.
- Removed the conversion from PySpark to Pandas in some models. Added the allow_collect_to_master parameter, False by default.
- Achieved 100% test coverage.
- Corrected handling of undetectable types during fit in LabelEncoder. The problem occurred when using multiple tuples with null values.
- Changes in the experimental part:
  - Python 3.10 is supported
  - Interface updates due to the d3rlpy version update
  - Added a DecisionTransformer
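A minimal sketch of the polars support, using LabelEncoder from the preprocessing module (one of the modules listed above); the toy data is illustrative.

```python
import polars as pl
from replay.preprocessing import LabelEncoder, LabelEncodingRule

interactions = pl.DataFrame(
    {
        "query_id": ["u1", "u1", "u2"],
        "item_id":  ["i1", "i2", "i1"],
    }
)

# Preprocessing now accepts polars dataframes directly, with no
# intermediate conversion to Pandas or PySpark.
encoder = LabelEncoder([LabelEncodingRule("query_id"), LabelEncodingRule("item_id")])
encoded = encoder.fit_transform(interactions)
```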
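And a sketch of the new core_count parameter, following the resolution order described above (explicit argument, then the REPLAY_SPARK_CORE_COUNT and REPLAY_SPARK_MEMORY env variables, then -1):

```python
from replay.utils.session_handler import get_spark_session

# An explicit value takes precedence; without it, the
# REPLAY_SPARK_CORE_COUNT and REPLAY_SPARK_MEMORY env variables are
# consulted, and if they are unset the value falls back to -1.
spark = get_spark_session(core_count=4)
```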
v0.15.0
- Aligned the naming of the Bert4Rec and SasRec interfaces with each other
- Minor naming changes in sasrec_example
v0.14.0
- Introduced support for various hardware configurations including CPU, GPU, Multi-GPU and Clusters (based on PySpark)
- Part of the library was moved to the experimental submodule for further stabilization and productization
- Preprocessing, splitters, and metrics now support pandas
- Introduced 2 SOTA models: BERT4Rec and SASRec transformers with online and offline inference
Let's start a new chapter of RePlay! 🚀🚀🚀