readme.md

Code and Data for the Defects4J Dataset

First, preprocess the metadata to obtain the full dataset. To do this, run:

cd dataset
python preprocess.py
cd ..

After this, you should see train.json, valid.json, and test.json in the ./dataset folder.
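
As a quick sanity check of the preprocessing step, the short sketch below reports which of the three split files are absent or empty. It is not part of this repository: the helper name missing_splits and the default folder name "dataset" are assumptions for illustration, and the check looks only at file presence and size, not at the file contents.

```python
from pathlib import Path

# Split files that preprocess.py is expected to produce, per the text above.
EXPECTED_SPLITS = ("train.json", "valid.json", "test.json")


def missing_splits(dataset_dir="dataset", names=EXPECTED_SPLITS):
    """Return the expected split files that are absent or empty (hypothetical helper)."""
    root = Path(dataset_dir)
    missing = []
    for name in names:
        path = root / name
        # A file counts as missing if it does not exist or has zero bytes.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_splits()
    if missing:
        print("Missing or empty:", ", ".join(missing))
    else:
        print("All dataset splits are present.")
```

Run it from the repository root so that the relative path "dataset" resolves to ./dataset.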

Then you should build the tree-sitter parser by running:

cd parser
bash build.sh
cd ..

After this command, you should see the file my-languages.so in the ./parser folder.
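
To confirm that the build produced a usable shared library, the sketch below asks the operating system loader to open the file via ctypes. It is an illustration only: the helper name parser_library_loads is hypothetical, and a successful load shows that the file is a valid shared library for this platform, not that it contains the grammars the training scripts need.

```python
import ctypes
from pathlib import Path


def parser_library_loads(lib_path="parser/my-languages.so"):
    """Return True if lib_path exists and the OS loader accepts it (hypothetical helper)."""
    path = Path(lib_path)
    if not path.is_file():
        return False
    try:
        # CDLL raises OSError when the file is not a loadable shared library.
        ctypes.CDLL(str(path.resolve()))
    except OSError:
        return False
    return True


if __name__ == "__main__":
    if parser_library_loads():
        print("parser/my-languages.so loaded successfully.")
    else:
        print("parser/my-languages.so is missing or not loadable; rerun build.sh.")
```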

We provide shell scripts to train our models. To train SPACE, run:

bash run_adv.sh

In run_adv.sh, set codebert="1" to train on CodeBERT or codebert="0" to train on GraphCodeBERT.

To train the baselines, run:

bash run.sh

In run.sh, set codebert="1" to train on CodeBERT or codebert="0" to train on GraphCodeBERT.