Title: Inconsistent Usage of "is_bug" Field in Dataset Files in autoFL Repository #7

Open
lyriccoder opened this issue Sep 19, 2024 · 5 comments

@lyriccoder

Hi guys, we're conducting a review of your dataset in the autoFL repository as part of our bug inspection process, and something looks suspiciously too good to be true.

In the dataset of the autoFL GitHub repository, there are 4 JSON files. Additionally, there is a similar field ("is_bug") in test_snippet.json, but this field is never used anywhere in the code.

Issues that need clarification:

  1. What criteria define whether "is_bug" should be true or false?
  2. Why is the "is_bug" field present if it's not actively used in any code logic?
  3. Is there an intended use for this field, or is it just hanging out in the data without a job?

Thanks in advance!

@smkang96
Contributor

Hello @lyriccoder,

  1. is_bug is determined relative to the Defects4J dataset: if a method is patched in the fix for a particular bug, it is set to True; otherwise it is set to False.
  2. The is_bug field is in fact used in lib/d4j_interface.py; I am not sure if you are referring to something else.
  3. As it is used in lib/d4j_interface.py, it is used to identify the buggy methods, which are in turn used to grade the AutoFL results.
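
As a rough illustration of the selection step in point 3, here is a minimal sketch. It is not the actual code in lib/d4j_interface.py: the function name `select_buggy_methods`, the `signature` key, and the sample records are all assumptions made for this example. Only the `is_bug` field itself comes from the dataset as discussed in this thread.

```python
import json

# Hypothetical excerpt of a snippet.json file. The record layout and the
# "signature" key are assumptions; only "is_bug" is taken from the thread.
SNIPPET_JSON = """
[
  {"signature": "com.example.Foo.bar(int)", "is_bug": true},
  {"signature": "com.example.Foo.baz()", "is_bug": false},
  {"signature": "com.example.Qux.run()", "is_bug": true}
]
"""


def select_buggy_methods(snippets):
    """Return the signatures of all methods labeled is_bug=True."""
    return [s["signature"] for s in snippets if s.get("is_bug") is True]


snippets = json.loads(SNIPPET_JSON)
buggy_methods = select_buggy_methods(snippets)
print(buggy_methods)
```

The resulting list plays the role of the ground truth that a fault localization answer is graded against.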

If you have any other questions, feel free to let us know :)

@lyriccoder
Author

lyriccoder commented Sep 20, 2024

Thank you for your answer.
I have a few more questions; could you please answer them as well?

  1. Empty field_snippet.json: The field_snippet.json file is always empty within the BugsInPy dataset. Is this intentional, or is there a bug causing it to remain empty?
  2. Missing is_bug = True in snippet.json: There are examples in snippet.json (e.g., Py_Snooper_1) where none of the functions have is_bug = True. This seems suspicious, as we would expect at least one function to be flagged as a bug. Is this an oversight or an issue within the dataset?
  3. Tests Without is_bug = True: There are samples in the BugsInPy dataset where all the test cases lack is_bug = True. This raises a red flag because logically, at least one test should indicate a bug. Is this a mistake in the dataset?
  4. Usage of is_bug Field: When referring to is_bug not being used, I meant that is_bug is not utilized for test methods. From our understanding, the is_bug field is mainly employed in lib/d4j_interface.py across ALL JSON files, including snippet.json and test_snippet.json. Could you confirm if this is correct?

@smkang96
Contributor

Hello @lyriccoder,

  1. field_snippet.json is empty for BugsInPy because unlike Java, Python does not natively have field declarations. This does not mean Python does not have fields - just that it is much more difficult to automatically detect them. As a result, after consideration we left it empty, instead of using what would be a complex set of heuristics to try and (incompletely) extract fields.
  2. For some bugs, the bug resides outside of any methods. For a good visualization, see Defects4J Math-104, where only a field is changed. In such cases, is_bug would not be True for any method.
  3. For test_snippet.json, the is_bug field has no effect. See 4.
  4. It is intentional that is_bug is not used for test methods; due to the construction process of the datasets that we use, the tests themselves are not assumed to have bugs. For both Defects4J and BugsInPy, only the methods with is_bug=True from snippet.json are selected as _buggy_methods. As a result, the label should have no effect on the behavior of AutoFL. The bug-reproducing tests are indicated by the failing_tests file instead.
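
To make point 1 more concrete, here is a small sketch of the kind of heuristic that field extraction in Python would need. It is not something AutoFL does; the sample class `Tracer` and the helper `find_field_candidates` are hypothetical and exist only for this illustration. It uses the standard `ast` module to collect class-body assignments and `self.<name>` assignment targets.

```python
import ast

# Hypothetical source code used only for this illustration.
SOURCE = '''
class Tracer:
    depth = 0                      # class-level attribute

    def __init__(self, output):
        self.output = output       # instance attribute set in __init__

    def enable(self):
        self.enabled = True        # attribute that only exists after enable()
'''


def find_field_candidates(source):
    """Collect names that look like fields of the classes in `source`."""
    candidates = set()
    for cls in ast.walk(ast.parse(source)):
        if not isinstance(cls, ast.ClassDef):
            continue
        # Plain assignments directly in the class body.
        for stmt in cls.body:
            if isinstance(stmt, ast.Assign):
                for target in stmt.targets:
                    if isinstance(target, ast.Name):
                        candidates.add(target.id)
        # Assignments of the form self.<name> = ... anywhere in the class.
        for node in ast.walk(cls):
            if (isinstance(node, ast.Attribute)
                    and isinstance(node.ctx, ast.Store)
                    and isinstance(node.value, ast.Name)
                    and node.value.id == "self"):
                candidates.add(node.attr)
    return candidates


fields = find_field_candidates(SOURCE)
print(sorted(fields))
```

Even this sketch misses annotated assignments, attributes set through `setattr`, and attributes attached from outside the class, which is the incompleteness referred to above.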

For the record, I believe it would be best if you refrain from using accusatory language ("too good to be true", "raises a red flag", etc.) which is out of proportion for the situation you are describing (some fields may not influence the behavior of the script). While we welcome constructive discussion and honest impressions of our work, we may not engage further if we feel your comments are unnecessarily inflammatory.

@lyriccoder
Author

> For the record, I believe it would be best if you refrain from using accusatory language ("too good to be true", "raises a red flag", etc.) which is out of proportion for the situation you are describing (some fields may not influence the behavior of the script). While we welcome constructive discussion and honest impressions of our work, we may not engage further if we feel your comments are unnecessarily inflammatory.

Thank you for your feedback, and I apologize if my previous comments came across as accusatory—that was not my intention. My goal is not to criticize but to understand the situation better. I genuinely appreciate your work, and I’m just trying to grasp how fault localization is managed, especially when there is no information about the bug’s location in the dataset (e.g., when field_snippet.json is empty and snippet.json contains only functions). I’m seeking clarification because I want to ensure that I accurately interpret your approach.

So, how do you perform fault localization when field_snippet.json is empty and snippet.json contains only functions? In other words, how do you define a sample as having a bug when there is no information about where the bug is located? This seems contradictory.

Thank you for your patience and assistance!

@smkang96
Contributor

Hm, I'll do my best to answer your question, but to be honest I'm not sure I fully grasp it 😅 Feel free to ask further questions for clarification.

  1. Perhaps your question is, if there exist bugs with no buggy methods, what happens? In our grading scheme, AutoFL would never get the answer right, no matter what it gives as an answer, as it would not be able to suggest the "correct" bug location.
  2. Or perhaps your question is, how can AutoFL perform localization with just the provided information? I would say the key is that we show the LLM the error message and the tests - refer to Listing 2 in our paper for an example of what information is given to the LLM.
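
The consequence described in point 1 can be sketched as follows. This is a simplified illustration, not the actual grading code: the function `is_localized`, the top-k hit criterion, and the method names are assumptions made for this example.

```python
def is_localized(ranked_predictions, buggy_methods, k=1):
    """Return True if any of the top-k predictions is a labeled buggy method."""
    return any(p in buggy_methods for p in ranked_predictions[:k])


# Hypothetical ranked answer from the localizer.
predictions = ["Foo.bar(int)", "Foo.baz()"]

# A bug with one labeled buggy method: the top prediction counts as a hit.
print(is_localized(predictions, {"Foo.bar(int)"}))

# A bug with no method labeled is_bug=True: no prediction can ever match.
print(is_localized(predictions, set()))
```

With an empty set of buggy methods the check is false for every possible answer, so such bugs can only count against the reported accuracy.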
