Commit

Added Python bindings.

siavashserver committed Feb 27, 2018
1 parent 470f68a commit 1a84a4e

Showing 4 changed files with 446 additions and 10 deletions.
115 changes: 105 additions & 10 deletions README.md
@@ -1,7 +1,9 @@
# Introduction

**neonrvm** is an experimental open source machine learning library for
performing regression tasks using the [RVM] technique. It is written in the C
programming language and comes with bindings for the Python programming
language.

Under the hood neonrvm uses an expectation-maximization fitting method, and allows
basis functions to be fed incrementally to the model. This helps to keep training
@@ -30,20 +32,38 @@ neonrvm requires a linear algebra library providing a CBLAS/LAPACKE
interface to do its magic. Popular ones are [Intel MKL], [OpenBLAS], and
the reference [Netlib LAPACKE].

Python bindings can be installed from the source package using the [Flit] Python
module by simply running:
```Shell
flit install
```

or from the [PyPI] software repository using the following command:
```Shell
pip install neonrvm
```

[CMake]: https://cmake.org/
[Intel MKL]: https://software.intel.com/mkl
[OpenBLAS]: http://www.openblas.net/
[Netlib LAPACKE]: http://www.netlib.org/lapack/
[Flit]: https://github.com/takluyver/flit
[PyPI]: https://pypi.python.org/pypi

---

# Using neonrvm

Congratulations, you survived the build process! Following are general tips
and steps to train your model and perform predictions using neonrvm.
Please have a look at `example.c` and `example.py` for working sample code.
At this point it's a good idea to grab the original RVM paper and other related
papers to get a feel for the inner workings of the RVM technique and its
different parameters.

To keep repetition in this document low, the Python bindings are only briefly
documented. Errors reported by the library will be raised as exceptions in
Python.

[Sparse Bayesian Models (and the RVM)](http://www.miketipping.com/sparsebayes.htm)

@@ -97,8 +117,11 @@ need to prepare a 2D `m*m` matrix.

The `neonrvm_cache` structure acts as a cache for storing a couple of
intermediate training results, and allows us to reuse memory as much as
possible during the learning process.

### C/C++

You can create one using the `neonrvm_create_cache` function described below:

```C
int neonrvm_create_cache(neonrvm_cache** cache, double* y, size_t count)
```
@@ -128,10 +151,26 @@ int neonrvm_destroy_cache(neonrvm_cache* cache)
- `NEONRVM_SUCCESS`: After successful execution.
- `NEONRVM_INVALID_Px`: When facing erroneous parameters.

### Python

You simply need to create a new `Cache` instance; there is no need for manual
memory management.

```Python
class Cache(y: numpy.ndarray)
```

*Returns*
- A new `Cache` instance.
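
A minimal sketch (the shape of `y` mirrors `example.py`, one target value per
training sample; the values here are made up):

```Python
import numpy as np

import neonrvm

# Target values, one per training sample.
y = np.linspace(-1.0, 1.0, 100).reshape(-1, 1)

c = neonrvm.Cache(y)  # memory is released when the instance is collected
```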

## Step 3: Creating training parameters

The `neonrvm_param` structure deals with training convergence conditions and
initial values.

### C/C++

Use the `neonrvm_create_param` function to create one:

```C
int neonrvm_create_param(neonrvm_param** param,
@@ -173,6 +212,18 @@ int neonrvm_destroy_param(neonrvm_param* param)
- `NEONRVM_SUCCESS`: After successful execution.
- `NEONRVM_INVALID_Px`: When facing erroneous parameters.

### Python

A new `Param` instance should be created:

```Python
class Param(alpha_init: float, alpha_max: float, alpha_tol: float,
beta_init: float, basis_percent_min: float, iter_max: int)
```

*Returns*
- A new `Param` instance.
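
For instance, `example.py` creates a coarse and a polishing parameter set like
this (the values are illustrative, not library defaults):

```Python
import neonrvm

# Arguments: alpha_init, alpha_max, alpha_tol, beta_init, basis_percent_min, iter_max
p1 = neonrvm.Param(1e-6, 1e3, 1e-1, 1e-6, 80.0, 100)  # quick, coarse pass
p2 = neonrvm.Param(1e-6, 1e3, 1e-2, 1e-6, 0.0, 300)   # final polishing pass
```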

## Step 4: Training the model

The `neonrvm_train` function requires a pair of training parameter structures, one
@@ -201,6 +252,7 @@ more relaxed convergence conditions. In other words, a highly polished and
sparse model isn't required.

### B) Finalized kernel parameters and model

When you are finished with tuning kernel parameters and trying different model
creation ideas, you need access to the best basis functions and finely tuned
weights associated with them in order to make accurate predictions.
@@ -210,6 +262,7 @@ parameters with a low basis function percentage and a high iteration count for the
polishing step in this case.

### 🍔) Big data sets

Memory and storage requirements quickly skyrocket when dealing with large
data sets. You don't necessarily need to feed the whole design matrix to
neonrvm all at once. It can also be fed in smaller chunks by loading different
@@ -220,6 +273,8 @@ process incrementally at a higher level through the caching mechanism provided. You
just need to make multiple `neonrvm_train` function calls and neonrvm will
store the useful basis functions in the given `neonrvm_cache` on the go.
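
As a minimal Python sketch of this incremental workflow (the toy sinc data and
RBF kernel are borrowed from `example.py`; the chunk size, kernel width, and
parameter values are illustrative assumptions):

```Python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

import neonrvm

# Toy data: 1000 noisy sinc samples.
x = np.linspace(-5.0, 5.0, 1000, False).reshape(-1, 1)
y = 5.0 * np.sinc(x) + 1e-3 * np.random.randn(1000, 1)

c = neonrvm.Cache(y)
p1 = neonrvm.Param(1e-6, 1e3, 1e-1, 1e-6, 80.0, 100)
p2 = neonrvm.Param(1e-6, 1e3, 1e-2, 1e-6, 0.0, 300)

# Feed 250 basis functions at a time instead of all 1000 at once; `cols`
# holds the global column indices of the current chunk, and the useful
# basis functions accumulate inside the cache across calls.
for start in range(0, 1000, 250):
    cols = np.arange(start, start + 250)
    phi_chunk = rbf_kernel(x, x[cols], 10.0)  # only this chunk's columns
    neonrvm.train(c, p1, p2, phi_chunk, cols, 100)
```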

### C/C++

Alright, now that we've covered the different use cases, it's time to get familiar
with the `neonrvm_train` function:

@@ -254,6 +309,16 @@ int neonrvm_train(neonrvm_cache* cache, neonrvm_param* param1, neonrvm_param* pa
equations.
- `NEONRVM_MATH_ERROR`: When `NaN` or `∞` numbers show up in the calculations.

### Python

```Python
def train(cache: Cache, param1: Param, param2: Param,
phi: numpy.ndarray, index: numpy.ndarray, batch_size_max: int)
```

*Returns*
- Nothing that I'm aware of.
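
For reference, `example.py` trains on the whole design matrix in one call:

```Python
# c: Cache, p1/p2: coarse and polishing Params, design_matrix: 2D array,
# index_basis: column indices of the basis functions, 100: max batch size.
neonrvm.train(c, p1, p2, design_matrix, index_basis, 100)
```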

[SciPy]: https://www.scipy.org/

## Step 5: Getting the training results
@@ -266,6 +331,8 @@ You should first get the useful basis function count, and then allocate enough
memory for the basis function index and weight vectors so neonrvm can fill
them for you.

### C/C++

```C
int neonrvm_get_training_stats(neonrvm_cache* cache, size_t* basis_count, bool* bias_used)
```
@@ -296,6 +363,20 @@ int neonrvm_get_training_results(neonrvm_cache* cache, size_t* index, double* mu
- `NEONRVM_SUCCESS`: After successful execution.
- `NEONRVM_INVALID_Px`: When facing erroneous parameters.

### Python

A single function call is enough:

```Python
def get_training_results(cache: Cache)
```

*Returns*
- `index`: `numpy.ndarray`
- `mu`: `numpy.ndarray`
- `basis_count`: `int`
- `bias_used`: `bool`
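
As used in `example.py`:

```Python
# Unpack the relevant basis function indices, their weights, the number of
# useful basis functions, and whether a bias term ended up being used.
index_relevant, mu, basis_count, bias_used = neonrvm.get_training_results(c)
```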

## Step 6: Profit!

Now that you have the indices and weights of the useful data in hand, you can
@@ -311,8 +392,11 @@ number of useful basis functions.

Predictions are made simply by multiplying the result matrix and the weights
vector. An output vector with a length equal to the number of new input data
samples contains the prediction outcomes.

### C/C++

You can use the `neonrvm_predict` function to make predictions.

```C
int neonrvm_predict(double* phi, double* mu,
@@ -334,6 +418,17 @@ int neonrvm_predict(double* phi, double* mu,
- `NEONRVM_INVALID_Px`: When facing erroneous parameters.
- `NEONRVM_MATH_ERROR`: When `NaN` or `∞` numbers show up in the calculations.

### Python

The number of `phi` columns and the length of `mu` should match.

```Python
def predict(phi: numpy.ndarray, mu: numpy.ndarray)
```

*Returns*
- `y`: `numpy.ndarray`
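
And as used in `example.py`:

```Python
# phi_rel: design matrix evaluated against the relevant vectors only,
# mu: weights from get_training_results; returns the predicted targets.
y_predicted = neonrvm.predict(phi_rel, mu)
```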

---

# License
76 changes: 76 additions & 0 deletions example.py
@@ -0,0 +1,76 @@
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.preprocessing import StandardScaler

import neonrvm


def generate_input_data(num_samples, range_x, scale_y, scale_noise):
    index = np.arange(num_samples)

    x = np.linspace(-range_x, range_x, num_samples, False).reshape(-1, 1)

    x_pp = StandardScaler().fit(x).transform(x)

    # Alternating +/- noise pattern.
    noise_y = np.empty(num_samples).reshape(-1, 1)
    for i in range(num_samples):
        noise_y[i] = 1.0 if (i % 2 == 0) else -1.0

    y = scale_y * np.sinc(x) + scale_noise * noise_y

    return index, x, x_pp, y


def generate_design_matrix(x, y, gamma, bias_used=False):
    # RBF kernel matrix between the two sample sets.
    design_matrix = rbf_kernel(x, y, gamma)

    # Append a constant column when a bias term is in use.
    if bias_used:
        design_matrix = np.append(design_matrix, np.ones_like(x), axis=1)

    return design_matrix


# Generate sample input data using sinc function
index_basis, x, x_pp, y = generate_input_data(500, 5.0, 5.0, 1e-3)

# Preparing the design matrix
gamma = 10.0
design_matrix = generate_design_matrix(x_pp, x_pp, gamma)

# Setting learning parameters
c = neonrvm.Cache(y)
p1 = neonrvm.Param(1e-6, 1e3, 1e-1, 1e-6, 80.0, 100)
p2 = neonrvm.Param(1e-6, 1e3, 1e-2, 1e-6, 00.0, 300)

# Starting the learning process
neonrvm.train(c, p1, p2, design_matrix, index_basis, 100)

# Getting back the training results
index_relevant, mu, basis_count, bias_used = neonrvm.get_training_results(c)

# Printing the useful input data, and their associated weights
if bias_used:
    index_relevant = index_relevant[:-1]

x_rel = x[index_relevant]
y_rel = y[index_relevant]

for i in range(index_relevant.size):
    index = index_relevant[i]
    print("item: {:2d}, index: {:3d}, mu: {:6.3f}, x: {:6.3f}, y: {:6.3f}".format(i, index_basis[index], mu[i],
                                                                                  x.item((index, 0)),
                                                                                  y.item((index, 0))))

if bias_used:
    print("item: bias, mu: {:6.3f}".format(mu[-1]))

# Test the model performance using the already preprocessed training data for the sake of simplicity
x_pp_rel = x_pp[index_relevant]

phi_rel = generate_design_matrix(x_pp, x_pp_rel, gamma, bias_used)

y_predicted = neonrvm.predict(phi_rel, mu).reshape(-1, 1)

# Printing prediction and training stats
print("Mean Absolute Percentage Error: {:.2f}".format(100.0 * np.mean(np.abs((y - y_predicted) / y))))
print("Percentage of Relevance Vectors: {:.2f}".format(100.0 * basis_count / x.size))
