API
This is the API page of the package. For a general overview of its functionality, consult the README.
General Functionalities
StreamSampling.ReservoirSample - Type
ReservoirSample{T}([rng], method = AlgRSWRSKIP())
ReservoirSample{T}([rng], n::Int, method = AlgL(); ordered = false)
Initializes a reservoir sample which can then be fitted with fit!. The first signature represents a sample where only a single element is collected. If ordered is true, the reservoir sample values can be retrieved in the order they were collected with ordvalue.
See the Sampling Algorithms section for the supported methods.
StatsAPI.fit! - Function
fit!(rs::AbstractReservoirSample, el)
fit!(rs::AbstractReservoirSample, el, w)
Updates the reservoir sample by taking into account the element passed. If the sampling is weighted, the weight of the element must also be passed.
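A minimal usage sketch of the two fit! signatures, assuming StreamSampling exports ReservoirSample, fit!, value and the algorithm types as they are listed on this page. The seed, the stream 1:100, and the weights are illustrative choices only.

```julia
using StreamSampling
using Random

rng = Xoshiro(42)

# Unweighted reservoir of 5 elements (AlgL is the documented default method).
rs = ReservoirSample{Int}(rng, 5)
for x in 1:100
    fit!(rs, x)
end
picked = value(rs)               # 5 distinct elements drawn from 1:100

# Weighted reservoir: each element is passed together with its weight.
wrs = ReservoirSample{Int}(rng, 5, AlgAExpJ())
for x in 1:100
    fit!(wrs, x, Float64(x))     # illustrative weight: larger values are favored
end
wpicked = value(wrs)
```

Fitting one element at a time is what makes the reservoir usable on streams whose length is not known in advance.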
Base.merge! - Function
Base.merge!(rs::AbstractReservoirSample, rs::AbstractReservoirSample...)
Updates the first reservoir sample by merging its values with the values of the other samples. Currently only supported for samples with replacement.
Base.merge - Function
Base.merge(rs::AbstractReservoirSample...)
Creates a new reservoir sample by merging the values of the samples passed. Currently only supported for samples with replacement.
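The following sketch shows how two reservoirs fed from different parts of a stream might be combined. It assumes the with-replacement method AlgRSWRSKIP, since merging is documented as supported only for samples with replacement; the seeds, sizes, and streams are illustrative.

```julia
using StreamSampling
using Random

# Two reservoirs with replacement, each fed a different half of the stream.
rs1 = ReservoirSample{Int}(Xoshiro(1), 10, AlgRSWRSKIP())
rs2 = ReservoirSample{Int}(Xoshiro(2), 10, AlgRSWRSKIP())
for x in 1:500
    fit!(rs1, x)
end
for x in 501:1000
    fit!(rs2, x)
end

merged = merge(rs1, rs2)     # returns a new reservoir sample
merged_values = value(merged)

merge!(rs1, rs2)             # in-place variant: rs1 is updated with the merged sample
```

This pattern is useful when separate workers each sample their own chunk of the data and the results are combined at the end.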
Base.empty! - Function
Base.empty!(rs::AbstractReservoirSample)
Resets the reservoir sample to its initial state. Useful to avoid allocating a new sample in some cases.
OnlineStatsBase.value - Function
value(rs::AbstractReservoirSample)
Returns the elements collected in the sample at the current sampling stage.
Note that even though the sampling respects the scheme assigned when the ReservoirSample is instantiated, some orderings of the sample can be more probable than others. To make every ordering equally likely, call shuffle! on the result.
StreamSampling.ordvalue - Function
ordvalue(rs::AbstractReservoirSample)
Returns the elements collected in the sample at the current sampling stage in the order they were collected. This applies only when ordered = true is passed in ReservoirSample.
StatsAPI.nobs - Function
nobs(rs::AbstractReservoirSample)
Returns the total number of elements that have been observed so far during the sampling process.
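The accessors above can be exercised together as in this sketch, which assumes that value, ordvalue, nobs and empty! are available after `using StreamSampling`. Shuffling a copy (via collect) rather than the returned object is a cautious choice made here, not something the package requires.

```julia
using StreamSampling
using Random

rs = ReservoirSample{Int}(Xoshiro(7), 4; ordered = true)
for x in 1:50
    fit!(rs, x)
end

in_order = ordvalue(rs)                   # collection order; increasing here because the stream is
shuffled = shuffle!(collect(value(rs)))   # copy, then shuffle so that every ordering is equally likely
seen = nobs(rs)                           # number of elements observed so far

empty!(rs)                                # reset, so the same reservoir can be reused on a new stream
seen_after_reset = nobs(rs)
```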
StreamSampling.StreamSample - Type
StreamSample{T}([rng], iter, n, [N], method = AlgD())
Initializes a stream sample, which can then be iterated over to return the sampled elements of the iterable iter, which is assumed to have an eltype of T. The methods implemented in StreamSample require knowing the total number of elements in the stream N; if N is not provided, it is assumed to be available by calling length(iter).
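As a hedged illustration of the two ways of supplying N, the sketch below iterates over a StreamSample once with a range (whose length is known) and once with a filtered iterator that has no length method, passing N explicitly. The iterators, seed, and sample size are illustrative assumptions.

```julia
using StreamSampling
using Random

rng = Xoshiro(3)

# Length known: N is taken from length(iter).
picked = Int[]
for x in StreamSample{Int}(rng, 1:1000, 10)
    push!(picked, x)
end

# Length not available from the iterator itself: pass N explicitly.
evens = Iterators.filter(iseven, 1:2000)   # 1000 elements, but no length method
picked_evens = Int[]
for x in StreamSample{Int}(rng, evens, 10, 1000)
    push!(picked_evens, x)
end
```

Unlike a reservoir, the stream sample yields elements lazily while the underlying iterator is traversed, which is why N must be known up front.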
StreamSampling.itsample - Function
itsample([rng], iter, method = AlgRSWRSKIP())
itsample([rng], iter, wfunc, method = AlgWRSWRSKIP())
Returns a random element of the iterator, optionally specifying a rng (which defaults to Random.default_rng()) and a function wfunc which accepts each element as input and outputs the corresponding weight. If the iterator is empty, it returns nothing.
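A short sketch of the single-element signatures, assuming itsample is exported by StreamSampling. The weight function used here is a hypothetical example that simply favors larger elements.

```julia
using StreamSampling
using Random

rng = Xoshiro(11)

x = itsample(rng, 1:10)                      # uniform pick
y = itsample(rng, 1:10, el -> Float64(el))   # weighted pick: weight grows with the element
z = itsample(1:10)                           # no rng given: Random.default_rng() is used

# The result can be `nothing` for an empty iterator, so check before using it.
msg = x === nothing ? "empty stream" : "picked $x"
```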
itsample([rng], iter, n::Int, method = AlgL(); ordered = false)
itsample([rng], iter, wfunc, n::Int, method = AlgAExpJ(); ordered = false)
Returns a vector of n random elements of the iterator, optionally specifying a rng (which defaults to Random.default_rng()), a weight function wfunc and a method. ordered dictates whether an ordered sample (also called a sequential sample, i.e. a sample where items appear in the same order as in iter) must be collected.
If the iterator has fewer than n elements, in the case of sampling without replacement, it returns a vector of those elements.
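The multi-element signatures can be tried as in the following sketch. It assumes the positional and keyword arguments exactly as written in the signatures above; the iterators and the weight function are illustrative.

```julia
using StreamSampling
using Random

rng = Xoshiro(5)

s  = itsample(rng, 1:100, 5)                       # 5 elements without replacement (AlgL)
sw = itsample(rng, 1:100, el -> Float64(el), 5)    # weighted, without replacement (AlgAExpJ)
so = itsample(rng, 1:100, 5; ordered = true)       # elements appear in the same order as in the stream
sr = itsample(rng, 1:100, 5, AlgRSWRSKIP())        # with replacement: duplicates are possible

# A stream with fewer than n elements: all of its elements are returned.
few = itsample(rng, Iterators.filter(<=(3), 1:10), 5)
```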
itsample(rngs, iters, n::Int)
itsample(rngs, iters, wfuncs, n::Int)
Parallel implementation which returns a sample with replacement of size n from the multiple iterables. All arguments except n must be tuples.
Sampling Algorithms
StreamSampling.AlgR - Type
Implements random reservoir sampling without replacement. To be used with ReservoirSample or itsample.
Adapted from algorithm R described in "Random sampling with a reservoir, J. S. Vitter, 1985".
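For intuition, the classic form of algorithm R can be sketched in a few lines of plain Julia. This is an illustrative sketch, not the package's implementation; the function name reservoir_r is hypothetical.

```julia
using Random

# Textbook algorithm R: keep the first n elements, then let the i-th element
# replace a uniformly chosen slot with probability n / i.
function reservoir_r(rng::AbstractRNG, iter, n::Int)
    reservoir = Vector{eltype(iter)}()
    seen = 0
    for el in iter
        seen += 1
        if seen <= n
            push!(reservoir, el)
        else
            j = rand(rng, 1:seen)      # uniform index over everything seen so far
            if j <= n
                reservoir[j] = el
            end
        end
    end
    return reservoir
end

sample_r = reservoir_r(Xoshiro(1), 1:10_000, 5)
```

One random number is drawn per element after the reservoir is full, so the cost grows linearly with the stream length.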
StreamSampling.AlgL - Type
Implements random reservoir sampling without replacement. To be used with ReservoirSample or itsample.
Adapted from algorithm L described in "Reservoir-sampling algorithms of time complexity O(n(1+log(N/n))), K. H. Li, 1994".
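The idea behind algorithm L is to draw how many elements to skip instead of drawing a random number for every element. A plain-Julia sketch of the textbook formulation follows; it is not the package's code, the names reservoir_l and u01 are hypothetical, and n >= 1 is assumed.

```julia
using Random

# Uniform draw in (0, 1], so that log() never receives zero.
u01(rng) = 1 - rand(rng)

function reservoir_l(rng::AbstractRNG, iter, n::Int)
    reservoir = Vector{eltype(iter)}()
    w = 1.0      # running weight that controls the skip lengths
    skip = 0     # elements still to be passed over before the next replacement
    for el in iter
        if length(reservoir) < n
            push!(reservoir, el)
            if length(reservoir) == n
                # Reservoir just filled: draw the first skip length.
                w = exp(log(u01(rng)) / n)
                skip = floor(Int, log(u01(rng)) / log1p(-w))
            end
        elseif skip > 0
            skip -= 1
        else
            # Replace a uniformly chosen slot, then draw the next skip length.
            reservoir[rand(rng, 1:n)] = el
            w *= exp(log(u01(rng)) / n)
            skip = floor(Int, log(u01(rng)) / log1p(-w))
        end
    end
    return reservoir
end

sample_l = reservoir_l(Xoshiro(1), 1:10_000, 5)
```

Because the skips grow as the stream gets longer, far fewer random numbers are needed than when one is drawn per element.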
StreamSampling.AlgRSWRSKIP - Type
Implements random reservoir sampling with replacement. To be used with ReservoirSample or itsample.
Adapted from algorithm RSWR-SKIP described in "Reservoir-based Random Sampling with Replacement from Data Stream, B. Park et al., 2008".
StreamSampling.AlgARes - Type
Implements weighted random reservoir sampling without replacement. To be used with ReservoirSample or itsample.
Adapted from algorithm A-Res described in "Weighted random sampling with a reservoir, P. S. Efraimidis et al., 2006".
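The key idea of A-Res is to give each element the random key u^(1/w), with u uniform in [0, 1) and w the element's weight, and to keep the n elements with the largest keys. The plain-Julia sketch below illustrates this under the assumption of strictly positive weights; it is not the package's implementation, and it uses a linear scan for the smallest key where a heap would normally be used.

```julia
using Random

function weighted_reservoir_ares(rng::AbstractRNG, iter, wfunc, n::Int)
    items = Vector{eltype(iter)}()
    priorities = Float64[]
    for el in iter
        key = rand(rng)^(1 / wfunc(el))    # heavier elements tend to get keys closer to 1
        if length(items) < n
            push!(items, el)
            push!(priorities, key)
        else
            i = argmin(priorities)         # slot currently holding the smallest key
            if key > priorities[i]
                items[i] = el
                priorities[i] = key
            end
        end
    end
    return items
end

# Hypothetical weights: element 7 is made overwhelmingly heavier than the rest.
heavy(el) = el == 7 ? 1e12 : 1.0
sample_w = weighted_reservoir_ares(Xoshiro(1), 1:1000, heavy, 3)
```

With such a skewed weight, element 7 is all but guaranteed to be part of the sample, while the remaining slots are filled uniformly from the other elements.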
StreamSampling.AlgAExpJ - Type
Implements weighted random reservoir sampling without replacement. To be used with ReservoirSample or itsample.
Adapted from algorithm A-ExpJ described in "Weighted random sampling with a reservoir, P. S. Efraimidis et al., 2006".
StreamSampling.AlgWRSWRSKIP - Type
Implements weighted random reservoir sampling with replacement. To be used with ReservoirSample or itsample.
Adapted from algorithm WRSWR-SKIP described in "Weighted Reservoir Sampling with Replacement from Multiple Data Streams, A. Meligrana, 2024".
StreamSampling.AlgD - Type
Implements random sampling without replacement. To be used with StreamSample or itsample.
Adapted from algorithm D described in "An Efficient Algorithm for Sequential Random Sampling, J. S. Vitter, 1987".
StreamSampling.AlgORDSWR - Type
Implements random stream sampling with replacement. To be used with StreamSample or itsample.
Adapted from algorithm 4 described in "Generating Sorted Lists of Random Numbers, J. L. Bentley et al., 1980".