Developers
The following steps are only required for local development and testing. The containerized version is recommended for production use.
- Install the following packages using your OS package manager (apt, yum, homebrew, etc.):
  - make
  - shellcheck
  - shfmt
- Start by cloning this repository.

  ```sh
  git clone git@github.com:center-for-threat-informed-defense/tram.git
  ```
- Change to the TRAM directory.

  ```sh
  cd tram/
  ```
- Create a virtual environment and activate it.

  Mac and Linux:

  ```sh
  python3 -m venv venv
  source venv/bin/activate
  ```

  Windows:

  ```bat
  python -m venv venv
  venv\Scripts\activate.bat
  ```
- Install the Python application requirements.

  ```sh
  pip install -r requirements/requirements.txt
  pip install -r requirements/test-requirements.txt
  ```
- Install the pre-commit hooks.

  ```sh
  pre-commit install
  ```
- Set up the application database.

  ```sh
  tram makemigrations tram
  tram migrate
  ```
- Run the machine learning training.

  ```sh
  tram attackdata load
  tram pipeline load-training-data
  tram pipeline train --model nb
  tram pipeline train --model logreg
  tram pipeline train --model nn_cls
  ```
- Download the pre-trained tokenizer and BERT models.

  ```sh
  python3 -c "import os; import transformers; mdl = transformers.AutoTokenizer.from_pretrained('allenai/scibert_scivocab_uncased'); mdl.save_pretrained('data/ml-models/priv-allenai-scibert-scivocab-uncased')"
  curl -L "https://ctidtram.blob.core.windows.net/tram-models/single-label-202308303/config.json" \
      -o data/ml-models/bert_model/config.json
  curl -L "https://ctidtram.blob.core.windows.net/tram-models/single-label-202308303/pytorch_model.bin" \
      -o data/ml-models/bert_model/pytorch_model.bin
  ```
- Create a superuser (web login).

  ```sh
  tram createsuperuser
  ```
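  The command above prompts for credentials interactively. If you need to script this step, Django's `createsuperuser` also supports a non-interactive mode via environment variables; the sketch below assumes the `tram` command forwards to Django's management commands (as the `migrate` and `makemigrations` steps suggest), and the username, email, and password are placeholders:

  ```sh
  # Hypothetical scripted variant; replace the placeholder credentials.
  DJANGO_SUPERUSER_USERNAME=admin \
  DJANGO_SUPERUSER_EMAIL=admin@example.com \
  DJANGO_SUPERUSER_PASSWORD=change-me \
  tram createsuperuser --noinput
  ```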
- Run the application server.

  ```sh
  DJANGO_DEBUG=1 tram runserver
  ```
- Open the application in your web browser.
  - Navigate to http://localhost:8000 and log in with the superuser account.
- In a separate terminal window, run the ML pipeline.

  ```sh
  cd tram/
  source venv/bin/activate
  tram pipeline run
  ```
The repository includes a Makefile with handy shortcuts for common development tasks:
- Run the TRAM application: `make start-container`
- Stop the TRAM application: `make stop-container`
- View TRAM logs: `make container-logs`
- Build the Python virtualenv: `make venv`
- Install production Python dependencies: `make install`
- Install production and development Python dependencies: `make install-dev`
- Manually run the pre-commit hooks without performing a commit: `make pre-commit-run`
- Build the container image (by default, the container is tagged with a timestamp and the git hash of the codebase; see the note below about custom CA certificates in the Docker build): `make build-container`
- Run linting locally: `make lint`
- Run unit tests, safety, and bandit locally: `make test`
The automated test suite runs inside `tox`, which guarantees a consistent testing environment but also has considerable overhead. When writing code, it may be useful to run `pytest` directly, which is considerably faster and can also be used to run a specific test. Here are some useful pytest commands:
```sh
# Run the entire test suite:
$ pytest tests/

# Run tests in a specific file:
$ pytest tests/tram/test_models.py

# Run a test by name:
$ pytest tests/ -k test_mapping_repr_is_correct

# Run tests with code coverage tracking, and show which lines are missing coverage:
$ pytest --cov=tram --cov-report=term-missing tests/
```
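When you want the consistent environment described above (for example, before opening a pull request), run the suite through tox itself; this minimal invocation assumes the repository's existing tox configuration:

```sh
# Run the full test suite inside tox's managed environment
$ tox
```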
If you are building the container in an environment that intercepts SSL connections, you can specify a root CA certificate to inject into the container at build time. (This is only necessary for the TRAM application container. The TRAM Nginx container does not make outbound connections.)
Export the following two variables in your environment.
$ export TRAM_CA_URL="http://your.domain.com/root.crt"
$ export TRAM_CA_THUMBPRINT="C7:E0:F9:69:09:A4:A3:E7:A9:76:32:5F:68:79:9A:85:FD:F9:B3:BD"
The first variable is a URL to a PEM certificate containing a root certificate that you want to inject into the container. (If you use an `https` URL, then certificate checking is disabled.) The second variable is a SHA-1 certificate thumbprint that is used to verify that the correct certificate was downloaded. You can obtain the thumbprint with the following OpenSSL command:
```sh
$ openssl x509 -in <your-cert.crt> -fingerprint -noout
SHA1 Fingerprint=C7:E0:F9:69:09:A4:A3:E7:A9:76:32:5F:68:79:9A:85:FD:F9:B3:BD
```
After exporting these two variables, you can run `make build-container` as usual, and the TRAM container will contain your specified root certificate.
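As an illustrative sketch of what that verification amounts to (the Dockerfile's exact steps may differ), the same check can be reproduced locally:

```sh
# Illustrative only: fetch the certificate and compare its SHA-1 thumbprint
# against TRAM_CA_THUMBPRINT before trusting it.
curl -ksL "$TRAM_CA_URL" -o root.crt
actual=$(openssl x509 -in root.crt -fingerprint -noout | cut -d= -f2)
if [ "$actual" = "$TRAM_CA_THUMBPRINT" ]; then
    echo "Thumbprint OK"
else
    echo "Thumbprint mismatch" >&2
fi
```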
All source code related to machine learning is located in `src/tram/ml`.
TRAM has five machine learning models that can be used out-of-the-box:

- SKLearn models
  - LogisticRegressionModel - Uses SKLearn's Logistic Regression.
  - NaiveBayesModel - Uses SKLearn's Multinomial NB.
  - Multilayer Perceptron - Uses SKLearn's MLPClassifier.
  - DummyModel - Uses SKLearn's Dummy Classifier for testing purposes.
- Large Language Models (PyTorch)
  - BERT Classifier - Uses Huggingface's transformers library with a fine-tuned BERT model.
The SKLearn models are each implemented as an SKLearn Pipeline. Machine learning engineers will find that it's pretty easy to plug in a new SKLearn model (see Creating Your Own SKLearn Model).
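For readers unfamiliar with SKLearn Pipelines, the following standalone sketch shows the pattern the TRAM models follow; the example texts and ATT&CK-style labels are made-up placeholders, not TRAM training data:

```python
from sklearn.dummy import DummyClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import Pipeline

# A Pipeline chains a text featurizer and a classifier behind one fit/predict API.
pipeline = Pipeline([
    ("features", CountVectorizer(lowercase=True, stop_words="english")),
    ("clf", DummyClassifier(strategy="uniform")),
])

texts = [
    "the actor used powershell to download a second-stage payload",
    "the report contains no technique descriptions",
]
labels = ["T1059.001", "no-technique"]

pipeline.fit(texts, labels)
print(pipeline.predict(["powershell was used for execution"]))
```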
In order to write your own model, take the following steps:
- Create a subclass of `tram.ml.base.SKLearnModel` that implements the `get_model` function. See the existing ML models for examples that can be copied.

  ```python
  from sklearn.dummy import DummyClassifier
  from sklearn.feature_extraction.text import CountVectorizer
  from sklearn.pipeline import Pipeline

  from tram.ml.base import SKLearnModel


  class DummyModel(SKLearnModel):
      def get_model(self):
          # Your model goes here
          return Pipeline([
              ("features", CountVectorizer(lowercase=True, stop_words='english', min_df=3)),
              ("clf", DummyClassifier(strategy='uniform'))
          ])
  ```
- Add your model to the `ModelManager` registry.
  - Note: This method can be improved. Pull requests welcome!

  ```python
  class ModelManager(object):
      model_registry = {
          'dummy': DummyModel,
          'nb': NaiveBayesModel,
          'logreg': LogisticRegressionModel,
          # Your model on the line below
          'your-model': python.path.to.your.model
      }
  ```
- You can now train your model, and the model will appear in the application interface.

  ```sh
  tram pipeline train --model your-model
  ```
- If you are interested in sharing your model with the community, thank you! Please open a Pull Request with your model and include performance statistics in the Pull Request description.