Critic Training pre-processing steps #47
Hello, I also have a question about this section. I noticed that the generated programs and their evaluations are stored in the folders 'outputs/codes/' and 'outputs/test_results/'. The README also says: "For each training sample, we can follow the prior processes (generating programs and running unit tests) to obtain synthetic samples and their annotations of unit test outcomes." But why is the data in 'data/APPS/train' then used to train the critic model? I noticed that you asked a question in the same section, so maybe you can answer mine. Thanks a lot!
When training in critic mode, the dataset will load the generated solutions as well: see here in APPSBaseDataset. Hope this helps.
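In case it helps, here is a minimal sketch of what that loading step amounts to. The file name is the one discussed in this thread, but the 'code'/'result' fields are simplified assumptions rather than the exact APPSBaseDataset schema:

```python
import json
import os

def load_gen_solutions(prob_path):
    """Load generated solutions and their unit-test outcomes for one
    APPS problem directory (simplified; field names are assumptions)."""
    fname = os.path.join(prob_path, "gen_solutions.json")
    if not os.path.exists(fname):
        return []  # no generated solutions for this problem yet
    with open(fname) as f:
        solutions = json.load(f)
    # Each program is paired with its unit-test outcome, which becomes
    # the critic's training label.
    return [(s["code"], s.get("result")) for s in solutions]
```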
Thanks for helping! But after reading the code for generating programs and running unit tests, I noticed that the programs generated by the actor model are saved under 'outputs/codes/' and the evaluation results of these programs under 'outputs/test_results/'. So it seems that the 'gen_solutions.json' you mentioned under 'data/APPS/train/prob_path/' doesn't exist, since nothing is ever saved to that file during those two steps. Is there some code I missed that fills in 'gen_solutions.json'? Thanks again for your answer 🙏
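Just to make my question concrete, here is the kind of glue script I would expect this step to need. Everything here is a guess from reading this thread: the file layout under 'outputs/', the JSON schema, and the 'code'/'result' field names are all assumptions, not the repository's actual format.

```python
import glob
import json
import os

def build_gen_solutions(codes_dir, results_dir, train_dir):
    """Pair each generated program with its unit-test outcome and write
    a per-problem gen_solutions.json under the APPS train directory.

    NOTE: file extensions and JSON fields below are assumptions; the
    real CodeRL outputs may use a different layout (e.g. pickle files).
    """
    for code_file in sorted(glob.glob(os.path.join(codes_dir, "*.json"))):
        prob_id = os.path.splitext(os.path.basename(code_file))[0]
        with open(code_file) as f:
            programs = json.load(f)  # assumed: list of program strings

        with open(os.path.join(results_dir, prob_id + ".json")) as f:
            results = json.load(f)   # assumed: one test outcome per program

        # The program/outcome pairs are what the critic trains on.
        merged = [{"code": c, "result": r} for c, r in zip(programs, results)]

        out_path = os.path.join(train_dir, prob_id, "gen_solutions.json")
        os.makedirs(os.path.dirname(out_path), exist_ok=True)
        with open(out_path, "w") as f:
            json.dump(merged, f)

build_gen_solutions("outputs/codes", "outputs/test_results", "data/APPS/train")
```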
Hello,
Thanks for open-sourcing the code for this project, it is really great!
We are using CodeRL as a starting point for student projects, and we have some questions for understanding:
In the "Critic Training" section, you say that for each training sample we can follow the prior processes (generating programs and running unit tests) to obtain synthetic samples and their annotations of unit test outcomes. The solutions in the provided `gen_solutions.json` files look like "good" code, and sometimes there are fewer than `n=20` of them. However, when using the CodeT5-large-ntp-py model to generate solutions ourselves, there are always `n` solutions, where sometimes the model outputs code, but a lot of the time it produces no code at all but some other output, such as repeated natural language descriptions, e.g.:
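To make this concrete, here is roughly how we sample `n` solutions and how we currently separate code from non-code outputs. The prompt format and sampling hyperparameters are our own guesses, not necessarily the settings used in the paper; the compile-based filter is just a crude heuristic for the repeated natural-language outputs described above.

```python
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("Salesforce/codet5-large-ntp-py")
model = T5ForConditionalGeneration.from_pretrained("Salesforce/codet5-large-ntp-py")

# Hypothetical prompt; the actual APPS prompt format may differ.
prompt = "QUESTION:\n...problem description...\nANSWER:\n"
inputs = tokenizer(prompt, return_tensors="pt")

# Sample n=20 candidate solutions; temperature/top_p are illustrative values.
with torch.no_grad():
    outputs = model.generate(
        inputs.input_ids,
        do_sample=True,
        temperature=0.8,
        top_p=0.95,
        max_length=512,
        num_return_sequences=20,
    )

candidates = [tokenizer.decode(o, skip_special_tokens=True) for o in outputs]

def looks_like_code(src):
    """Crude filter: keep only samples that at least parse as Python."""
    try:
        compile(src, "<generated>", "exec")
        return True
    except (SyntaxError, ValueError):
        return False

solutions = [c for c in candidates if looks_like_code(c)]
```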