PYSCRIBE (supplementary)

This repository provides the supplementary experiments for the PYSCRIBE model.

Runtime Environment

The runtime environment used in our experiments:

  • 4 NVIDIA 2080 Ti GPUs
  • Ubuntu >=16.04
  • CUDA >=10.0 (with CuDNN of the corresponding version)
  • Anaconda
    • Python >=3.7 (base environment)
    • Python 2.7 (virtual environment named python27)
  • PyTorch >=1.2.0 for Python 3.x
  • In addition, install our package with pip install my-lib-0.0.8.tar.gz in both the Python 3.x and the Python 2.7 environments (see the setup sketch after this list). The package can be downloaded from Google Drive.
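
A minimal setup sketch for the two environments, assuming Anaconda is already installed and my-lib-0.0.8.tar.gz has been downloaded to the working directory:

    # Create the Python 2.7 virtual environment named python27 (the base environment stays Python 3.x)
    conda create -n python27 python=2.7
    # Install the package in the base (Python 3.x) environment
    pip install my-lib-0.0.8.tar.gz
    # Install the same package inside the Python 2.7 environment
    conda activate python27
    pip install my-lib-0.0.8.tar.gz
    conda deactivate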

Dataset

We provide the supplementary experiments on two datasets: a Python dataset [1] and a Java dataset [2]. The complete Python and Java datasets can be downloaded from Google Drive.

Experiment on the Python Dataset

  1. Step into the directory src_code/python/:
    cd src_code/python
    
  2. Preprocess the train/valid/test data:
    python s1_preprocessor.py
    conda activate python27
    python s2_preprocessor_py27.py
    conda deactivate
    python s3_preprocessor.py
    
  3. Run the model for training and testing:
    python s4_model.py

After running, the performance is printed to the console, and the predicted results on the test data are saved to data/python/result/code2text_4_4_4_512.json, together with the ground truth and the corresponding code for comparison.
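
The structure of this JSON file is not documented here, so one simple way to inspect the saved predictions is to pretty-print the beginning of the file, for example:

    python -m json.tool data/python/result/code2text_4_4_4_512.json | head -n 40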

We have provided the results on the test dataset, so you can obtain the evaluation results directly by running:

python s5_eval_res.py

Note that:

  • All the parameters are set in src_code/python/config.py and src_code/python/config_py27.py.
  • If the model has already been trained, you can set the parameter "train_mode" on line 83 of config.py to "False". You can then predict the test data directly with the model saved in data/python/model/ (see the sketch after this list).
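
A minimal sketch of such a prediction-only run, assuming training has already finished and "train_mode" has been set to "False" in config.py:

    cd src_code/python
    python s4_model.py    # loads the saved model from data/python/model/ and predicts on the test data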

Experiment on the Java Dataset

  1. Step into the directory src_code/java/:
    cd src_code/java
    
  2. Preprocess the train/valid/test data:
    python s1_preprocessor.py
    
  3. Run the model for training and testing:
    python s2_model.py
    

After running, the performance is printed to the console, and the predicted results on the test data are saved to data/java/result/code2text_4_4_4_512.json, together with the ground truth and the corresponding code for comparison.

We have provided the results on the test dataset, so you can obtain the evaluation results directly by running:

python s3_eval_res.py

Note that:

  • All the parameters are set in src_code/java/config.py.
  • If the model has already been trained, you can set the parameter "train_mode" on line 117 of config.py to "False". You can then predict the test data directly with the model saved in data/java/model/.

[1] Wan, Y., Zhao, Z., Yang, M., Xu, G., Ying, H., Wu, J., Yu, P.S.: Improving automatic source code summarization via deep reinforcement learning. In: Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering, pp. 397–407 (2018).

[2] Hu, X., Li, G., Xia, X., Lo, D., Lu, S., Jin, Z.: Summarizing source code with transferred API knowledge. In: Proceedings of the 27th International Joint Conference on Artificial Intelligence (IJCAI'18), pp. 2269–2275 (2018).

This work is still under review; please do not distribute it.