Vamp is an C/C++ plugin API for audio analysis / feature extraction plugins: https://www.vamp-plugins.org
This SDK for plugins and hosts targets performance-critical applications by:
- reducing memory allocations, no memory allocation during processing
- simplifying and restricting the plugin API
constexpr
evaluation for compile time errors instead of runtime errors
The SDK aims to be well tested, cross-platform and use modern C++. The plugin SDK is available as a single-header library (download as asset from latest release page).
Compiler support: GCC >= 10
, Clang >= 11
, MSVC >= 19.30
Note: Python bindings for the hostsdk are available via PyPI. Please check out the documentation.
- API documentation
- Examples
- Vamp ecosystem:
- Great collection of plugins
- Sonic Visualiser: Open-source software to visualize, analyze and annotate audio
- Sonic Annotator: Batch tool for feature extraction
- Audacity supports Vamp plugins
Following benchmarks compare the performance/overhead of the plugin SDKs based on a simple RMS plugin. The performance is measured as throughput (number of processed samples per second).
Results with an i7-9850H CPU (12 cores):
Throughput vs. block size | Multithreading |
---|---|
Results with an ARMv7 CPU: Throughput vs block size, Multithreading
The official SDK offers a convenient C++ plugin interface. But there are some drawbacks for real-time processing:
-
Huge amount of memory allocations due to the use of C++ containers like vectors and lists passed by value.
Let's have a look at the
process
method of theVamp::Plugin
class which does the main work:FeatureSet process(const float *const *inputBuffers, RealTime timestamp)
FeatureSet
is returned by value and is astd::map<int, FeatureList>
.FeatureList
is astd::vector<Feature>
andFeature
isstruct
containing the actual feature values as astd::vector<float>
. So in total, those are three nested containers, which are all heap allocated. -
The C++ API is a wrapper of the C API:
On the plugin side, the
PluginAdapter
class converts the C++ containers to C level (code). Therefore the C++ containers are temporary objects and will be deallocated shortly after creation.On the host side, the
PluginHostAdapter
converts again from the C to the C++ representation (code).
The rt-vamp-plugin-sdk
aims to to keep the overhead minimal but still provide an easy and safe to use API:
- Static plugin informations are provided as
static constexpr
variables to generate the C plugin descriptor at compile time. - The computed features are returned by reference (as a
std::span
) to prevent heap allocations during processing. - The input buffer is provided either as a
TimeDomainBuffer
(std::span<const float>
) or aFrequencyDomainBuffer
(std::span<const std::complex<float>>
). The process method takes astd::variant<TimeDomainBuffer, FrequencyDomainBuffer>
. A wrong input buffer type will result in an exception. The sized spans enable easy iteration over the input buffer data.
Following features of the Vamp API Vamp::Plugin
are restricted within the rt-vamp-plugin-sdk
:
-
OutputDescriptor::hasFixedBinCount == true
for every output. The number of values is constant for each feature during processing. This has the advantage, that memory for the feature vector can be preallocated. -
OutputDescriptor::SampleType == OneSamplePerStep
for every output. The plugin will generate one feature set for each input block.Following parameters are therefore negitable:
OutputDescriptor::sampleRate
OutputDescriptor::hasDuration
Feature::hasTimestamp
&Feature::timestamp
Feature::hasDuration
&Feature::duration
-
Only one input channel allowed:
getMinChannelCount() == 1
More examples can be found here: https://github.com/lukasberbuer/rt-vamp-plugin-sdk/tree/master/examples.
class ZeroCrossing : public rtvamp::pluginsdk::Plugin<1 /* one output */> {
public:
using Plugin::Plugin; // inherit constructor
static constexpr Meta meta{
.identifier = "zerocrossing",
.name = "Zero crossings",
.description = "Detect and count zero crossings",
.maker = "LB",
.copyright = "MIT",
.pluginVersion = 1,
.inputDomain = InputDomain::Time,
};
OutputList getOutputDescriptors() const override {
return {
OutputDescriptor{
.identifier = "counts",
.name = "Zero crossing counts",
.description = "The number of zero crossing points per processing block",
.unit = "",
.binCount = 1,
},
};
}
bool initialise(uint32_t stepSize, uint32_t blockSize) override {
initialiseFeatureSet(); // automatically resizes feature set to number of outputs and bins
return true;
}
void reset() override {
previousSample_ = 0.0f;
}
const FeatureSet& process(InputBuffer buffer, uint64_t nsec) override {
const auto signal = std::get<TimeDomainBuffer>(buffer);
size_t crossings = 0;
bool wasPositive = (previousSample_ >= 0.0f);
for (const auto& sample : signal) {
const bool isPositive = (sample >= 0.0f);
crossings += int(isPositive != wasPositive);
wasPositive = isPositive;
}
previousSample_ = signal.back();
auto& result = getFeatureSet();
result[0][0] = crossings; // first and only output, first and only bin
return result; // return span/view of the results
}
private:
float previousSample_ = 0.0f;
};
RTVAMP_ENTRY_POINT(ZeroCrossing)
// list all plugins keys (library:plugin)
for (auto&& key : rtvamp::hostsdk::listPlugins()) {
std::cout << key.get() << std::endl;
}
auto plugin = rtvamp::hostsdk::loadPlugin("minimal-plugin:zerocrossing", 48000 /* samplerate */);
plugin->initialise(4096 /* step size */, 4096 /* block size */);
std::vector<float> buffer(4096);
// fill buffer with data from audio file, sound card, ...
auto features = plugin->process(buffer, 0 /* timestamp nanoseconds */);
std::cout << "Zero crossings: " << features[0][0] << std::endl;