Merge pull request #1 from rapidsai/branch-0.14
Merge updates
bsuryadevara authored May 5, 2020
2 parents 998cd9e + 5803d9a commit 8258e86
Showing 17 changed files with 1,035 additions and 79 deletions.
5 changes: 5 additions & 0 deletions CHANGELOG.md
@@ -2,11 +2,16 @@

## New Features
- PR #141 CUDA BERT Tokenizer
- PR #152 Local gpuCI build script
- PR #133 Phishing detection using BERT

## Improvements
- PR #149 Add Versioneer
- PR #151 README and CONTRIBUTING updates

## Bug Fixes
- PR #150 Fix splunk alert workflow test
- PR #154 Local gpuCI build fix

# clx 0.13.0 (Date TBD)

7 changes: 1 addition & 6 deletions CONTRIBUTING.md
@@ -85,7 +85,7 @@ To install CLX from source, ensure the dependencies are met and follow the steps
1) Clone the repository and submodules

```bash
# Set the localtion to CLX in an environment variable CLX_HOME
# Set the location to CLX in an environment variable CLX_HOME
export CLX_HOME=$(pwd)/clx

# Download the CLX repo
@@ -173,11 +173,6 @@ $ ./build.sh libclx -n # compile libclx but do not install

Note: This conda installation only applies to Linux and Python versions 3.6/3.7.

### Building and Testing on a gpuCI image locally

Before submitting a pull request, you can do a local build and test on your machine that mimics our gpuCI environment using the `ci/local/build.sh` script.
For detailed information on usage of this script, see [here](ci/local/README.md).

## Creating documentation

Python API documentation can be generated from [docs](docs) directory.
2 changes: 1 addition & 1 deletion Dockerfile
@@ -15,7 +15,7 @@ RUN apt update -y --fix-missing && \
apt install -y vim

RUN source activate rapids \
&& conda install -c pytorch pytorch==1.3.1 torchvision=0.4.2 datashader>=0.10.* panel=0.6.* geopandas>=0.6.* pyppeteer s3fs \
&& conda install -c pytorch pytorch==1.3.1 torchvision=0.4.2 datashader>=0.10.* panel=0.6.* geopandas>=0.6.* pyppeteer s3fs ipywidgets \
&& pip install "git+https://github.com/rapidsai/cudatashader.git"

# libclx build/install
112 changes: 59 additions & 53 deletions README.md
@@ -71,31 +71,69 @@ for rule in alerts_per_day_piv.columns:

```

## Installation
CLX is available in a Docker container, by building from source, and through Conda installation. There are multiple ways to start the CLX container, depending on whether you want a container with only RAPIDS and CLX or multiple containers that enable SIEM integration and data ingest.
## Getting Started With Workflows

In addition to traditional Python files and Jupyter notebooks, CLX also includes structure in the form of a workflow. A workflow is a series of data transformations performed on a [GPU dataframe](https://github.com/rapidsai/cudf) that contains raw cyber data, with the goal of surfacing meaningful cyber analytical output. Multiple I/O methods are available, including Kafka and on-disk file stores.

Example workflow reading from and writing to a file:

```python
from clx.workflow import netflow_workflow

source = {
"type": "fs",
"input_format": "csv",
"input_path": "/path/to/input",
"schema": ["firstname","lastname","gender"],
"delimiter": ",",
"required_cols": ["firstname","lastname","gender"],
"dtype": ["str","str","str"],
"header": "0"
}
dest = {
"type": "fs",
"output_format": "csv",
"output_path": "/path/to/output"
}
wf = netflow_workflow.NetflowWorkflow(source=source, destination=dest, name="my-netflow-workflow")
wf.run_workflow()
```
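
A Kafka-backed workflow follows the same pattern, swapping out the source and destination dictionaries. The sketch below is illustrative only; the `"type": "kafka"` value, broker address, topic names, and key names (for example `kafka_brokers`, `consumer_kafka_topics`, `publisher_kafka_topic`) are assumptions and should be verified against the CLX workflow I/O documentation.

```python
from clx.workflow import netflow_workflow

# Hypothetical Kafka source/destination configs -- the key names below are
# assumptions, not a confirmed CLX API; check the CLX I/O reader/writer docs.
source = {
    "type": "kafka",
    "kafka_brokers": "localhost:9092",
    "group_id": "clx-netflow",
    "batch_size": 100,
    "consumer_kafka_topics": ["netflow_input"],
    "time_window": 5
}
dest = {
    "type": "kafka",
    "kafka_brokers": "localhost:9092",
    "batch_size": 100,
    "publisher_kafka_topic": "netflow_output",
    "output_delimiter": ","
}
wf = netflow_workflow.NetflowWorkflow(source=source, destination=dest, name="my-kafka-netflow-workflow")
wf.run_workflow()
```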

For additional examples, browse our complete [API documentation](https://rapidsai.github.io/clx/), or check out our more detailed [notebooks](https://github.com/rapidsai/clx/tree/master/notebooks).

### Docker Container without SIEM Integration

#### Install via CLX Docker Container

## Getting CLX
### Intro
There are 3 ways to get CLX:
1. [Quick Start with CLX Docker Container](#quick)
1. [Conda Installation](#conda)
1. [Build from Source](#source)

<a name="quick"></a>

## Quick Start Docker Container

Prerequisites

* NVIDIA Pascal™ GPU architecture or better
* CUDA 9.2 or 10.0 compatible NVIDIA driver
* CUDA 10.0+ compatible NVIDIA driver
* Ubuntu 16.04/18.04 or CentOS 7
* Docker CE v18+
* nvidia-docker v2+

Pull the RAPIDS image suitable for your environment and build the CLX image.

```bash
docker pull rapidsai/rapidsai-dev-nightly:0.12-cuda9.2-devel-ubuntu18.04-py3.7
docker build --build-arg image=rapidsai/rapidsai-dev-nightly:0.12-cuda9.2-devel-ubuntu18.04-py3.7 -t clx:latest .
docker pull rapidsai/rapidsai-dev-nightly:0.14-cuda10.1-devel-ubuntu18.04-py3.7
docker build --build-arg image=rapidsai/rapidsai-dev-nightly:0.14-cuda10.1-devel-ubuntu18.04-py3.7 -t clx:latest .
```

Now start the container and the notebook server. There are multiple ways to do this, depending on what version of Docker you have.
### Docker Container without SIEM Integration

##### Preferred - Docker CE v19+ and nvidia-container-toolkit
Start the container and the notebook server. There are multiple ways to do this, depending on what version of Docker you have.

#### Preferred - Docker CE v19+ and nvidia-container-toolkit
```bash
docker run --gpus '"device=0"' \
--rm -d \
@@ -105,7 +143,7 @@
clx:latest
```

##### Legacy - Docker CE v18 and nvidia-docker2
#### Legacy - Docker CE v18 and nvidia-docker2
```bash
docker run --runtime=nvidia \
--rm -d \
@@ -117,61 +155,29 @@

### Docker Container with SIEM Integration

If you want a CLX container with SIEM integration (including data ingest), follow the steps above to pull and build the CLX container. Then use `docker-compose` to start multiple containers running CLX, Kafka, and Zookeeper.
If you want a CLX container with SIEM integration (including data ingest), follow the steps above to build the CLX image. Then use `docker-compose` to start multiple containers running CLX, Kafka, and Zookeeper.

```bash
docker-compose up
```

### Install from Source
You can install CLX from source on an existing RAPIDS container. A RAPIDS image suitable for your environment can be pulled from [https://hub.docker.com/r/rapidsai/rapidsai/](https://hub.docker.com/r/rapidsai/rapidsai/).

```bash
# Run tests
pip install pytest
pytest

# Build and install
python setup.py install
```
### Conda Install
You can conda install CLX on an existing RAPIDS container. A RAPIDS image suitable for your environment can be pulled from [https://hub.docker.com/r/rapidsai/rapidsai/](https://hub.docker.com/r/rapidsai/rapidsai/).
<a name="conda"></a>

```
conda install -c rapidsai-nightly -c rapidsai -c nvidia -c pytorch -c conda-forge -c defaults clx
```
## Conda Install
It is easy to install CLX using conda. You can get a minimal conda installation with Miniconda or get the full installation with Anaconda.

## Getting Started With Workflows

In addition to traditional Python files and Jupyter notebooks, CLX also includes structure in the form of a workflow. A workflow is a series of data transformations performed on a [GPU dataframe](https://github.com/rapidsai/cudf) that contains raw cyber data, with the goal of surfacing meaningful cyber analytical output. Multiple I/O methods are available, including Kafka and on-disk file stores.
Install and update CLX using the conda command:

Example workflow reading from and writing to a file:

```python
from clx.workflow import netflow_workflow

source = {
"type": "fs",
"input_format": "csv",
"input_path": "/path/to/input",
"schema": ["firstname","lastname","gender"],
"delimiter": ",",
"required_cols": ["firstname","lastname","gender"],
"dtype": ["str","str","str"],
"header": "0"
}
dest = {
"type": "fs",
"output_format": "csv",
"output_path": "/path/to/output"
}
wf = netflow_workflow.NetflowWorkflow(source=source, destination=dest, name="my-netflow-workflow")
wf.run_workflow()
```
conda install -c rapidsai-nightly -c nvidia -c pytorch -c conda-forge -c defaults clx
```
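
After the install completes, a quick check from a Python interpreter confirms the package is importable. The availability of `clx.__version__` is an assumption based on the Versioneer support added in this release (PR #149); if it is absent, the bare import still serves as a smoke test.

```python
# Minimal post-install smoke test; clx.__version__ assumes Versioneer metadata
# (added in PR #149) is present in the installed package.
import clx

print(clx.__version__)
```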

For additional examples, browse our complete [API documentation](https://rapidsai.github.io/clx/), or check out our more detailed [notebooks](https://github.com/rapidsai/clx/tree/master/notebooks).


## Contributing
<a name="source"></a>

## Building from Source and Contributing

For contributing guidelines, please reference our [guide for contributing](https://github.com/rapidsai/clx/blob/master/CONTRIBUTING.md).
For contributing guidelines, please reference our [guide for contributing](CONTRIBUTING.md).
2 changes: 1 addition & 1 deletion ci/cpu/clx/upload-anaconda.sh
@@ -23,4 +23,4 @@ fi

echo "Upload"
echo ${UPLOADFILE}
anaconda -t ${MY_UPLOAD_KEY} upload -u ${CONDA_USERNAME:-rapidsai} ${LABEL_OPTION} --force ${UPLOADFILE}
anaconda -t ${MY_UPLOAD_KEY} upload -u ${CONDA_USERNAME:-rapidsai} ${LABEL_OPTION} --skip-existing ${UPLOADFILE}
2 changes: 1 addition & 1 deletion ci/cpu/libclx/upload-anaconda.sh
@@ -29,7 +29,7 @@ if [ "$UPLOAD_LIBCLX" == "1" ]; then

echo "Upload"
echo ${UPLOADFILE}
anaconda -t ${MY_UPLOAD_KEY} upload -u ${CONDA_USERNAME:-rapidsai} ${LABEL_OPTION} --force ${UPLOADFILE}
anaconda -t ${MY_UPLOAD_KEY} upload -u ${CONDA_USERNAME:-rapidsai} ${LABEL_OPTION} --skip-existing ${UPLOADFILE}
else
echo "Skipping libclx upload"
fi
1 change: 1 addition & 0 deletions ci/gpu/build.sh
@@ -76,6 +76,7 @@ $WORKSPACE/build.sh clean libclx clx
if hasArg --skip-tests; then
logger "Skipping Tests..."
else
cd ${WORKSPACE}/python
py.test --ignore=ci --cache-clear --junitxml=${WORKSPACE}/junit-clx.xml -v
${WORKSPACE}/ci/gpu/test-notebooks.sh 2>&1 | tee nbtest.log
python ${WORKSPACE}/ci/utils/nbtestlog2junitxml.py nbtest.log
2 changes: 1 addition & 1 deletion ci/gpu/test-notebooks.sh
@@ -10,7 +10,7 @@ TOPLEVEL_NB_FOLDERS=$(find . -name *.ipynb |cut -d'/' -f2|sort -u)

# Add notebooks that should be skipped here
# (space-separated list of filenames without paths)
SKIPNBS="DGA_Detection.ipynb FLAIR_DNS_Log_Parsing.ipynb Alert_Analysis_with_CLX.ipynb cybert_example_training.ipynb CLX_Workflow_Notebook1.ipynb CLX_Workflow_Notebook2.ipynb CLX_Workflow_Notebook3.ipynb Network_Mapping_With_RAPIDS_And_CLX.ipynb"
SKIPNBS="DGA_Detection.ipynb FLAIR_DNS_Log_Parsing.ipynb Alert_Analysis_with_CLX.ipynb cybert_example_training.ipynb CLX_Workflow_Notebook1.ipynb CLX_Workflow_Notebook2.ipynb CLX_Workflow_Notebook3.ipynb Network_Mapping_With_RAPIDS_And_CLX.ipynb Phishing_Detection_using_Bert_CLX.ipynb"


## Check env
57 changes: 57 additions & 0 deletions ci/local/README.md
@@ -0,0 +1,57 @@
## Purpose

This script is designed for developer and contributor use. It mimics the actions of gpuCI on your local machine, allowing you to test and even debug your code inside a gpuCI base container before pushing it as a GitHub commit.
The script can be helpful in locally triaging and debugging RAPIDS continuous integration failures.

## Requirements

```
nvidia-docker
```

## Usage

```
bash build.sh [-h] [-H] [-s] [-r <repo_dir>] [-i <image_name>]
Build and test your local repository using a base gpuCI Docker image
where:
-H Show this help text
-r Path to repository (defaults to working directory)
-i Use Docker image (default is gpuci/rapidsai-base:cuda10.0-ubuntu16.04-gcc5-py3.6)
-s Skip building and testing and start an interactive shell in a container of the Docker image
```

Example Usage:
`bash build.sh -r ~/rapids/clx -i gpuci/rapidsai-base:cuda10.1-ubuntu16.04-gcc5-py3.6`

For a full list of available gpuCI docker images, visit our [DockerHub](https://hub.docker.com/r/gpuci/rapidsai-base/tags) page.

Style Check:
```bash
$ bash ci/local/build.sh -r ~/rapids/clx -s
$ source activate gdf #Activate gpuCI conda environment
$ cd rapids
$ flake8 python
```

## Information

There are some caveats to be aware of when using this script, especially if you plan on developing from within the container itself.


### Docker Image Build Repository

The Docker image will generate build artifacts in a folder on your machine, located in the root directory of the repository you passed to the script. For the above example, the directory is named `~/rapids/clx/build_rapidsai-base_cuda10.1-ubuntu16.04-gcc5-py3.6/`. Feel free to remove this directory after the script is finished.

*Note*: The script *will not* overwrite your local build repository. Your local environment stays intact.


### Where The User is Dumped

The script will build your repository and run all tests. If any tests fail, it dumps you into the Docker container itself so you can debug from within the container. If all the tests pass as expected, the container exits and is automatically removed. Remember to exit the container if tests fail and you do not wish to debug within the container itself.


### Container File Structure

Your repository will be located in the `/rapids/` folder of the container. This folder is volume mounted from the local machine. Any changes to the code in this repository are replicated onto the local machine. The `cpp/build` and `python/build` directories within your repository are on a separate mount to avoid conflicting with your local build artifacts.
