Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: presence of trailing comma (#416) #418

Open
wants to merge 3 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
18 changes: 17 additions & 1 deletion crates/core/src/inline_snippets.rs
Original file line number Diff line number Diff line change
Expand Up @@ -234,12 +234,28 @@ pub(crate) fn inline_sorted_snippets_with_offset(
}
}

let mut trailing_comma_lines = vec![];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already have a delete_hanging_comma function that should be used - we don't want two different pieces of code dealing with orphaned commas.

Any solution should either be there or in a replacement.

Copy link
Contributor Author

@abhishek818 abhishek818 Jul 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@morgante
Had tried that earlier. I mean, placing the code fix in delete_hanging_comma function or in sub functions called inside it. That resulted in failure of an existing edge test case. This happens since offsets are again calculated after calling delete_hanging_comma and then actual replacements happens.

So, I feel doing this fix during actual replacements is the best approach.

Pasting here the changes (or little variations of it, you will get the idea) I was trying with :
just manipulated the range indexes here.

fn delete_hanging_comma(
    code: &str,
    replacements: &mut [(EffectRange, String)],
    offset: usize,
) -> Result<(String, Vec<ReplacementInfo>)> {
    ....
    ....
    // Handle the case when after applying replacements, a single comma
    // gets left behind on a single line
    let mut temp_code = code.to_string();
    for (range, snippet) in replacements.iter_mut() {
        let mut adjusted_range = adjust_range(&range.effective_range(), offset, &temp_code)?;
        if adjusted_range.start > temp_code.len() || adjusted_range.end > temp_code.len() {
            bail!("Range {:?} is out of bounds for code:\n{}\n", adjusted_range, temp_code);
        }

        temp_code.replace_range(adjusted_range.clone(), snippet);

        let line_start = temp_code[..adjusted_range.start].rfind('\n').map_or(0, |pos| pos + 1);
        let line_end = temp_code[adjusted_range.start..].find('\n').map_or(temp_code.len(), |pos| adjusted_range.start + pos);
        let line_content = temp_code[line_start..line_end].trim();
        
        // Check for lines with a single comma after replacements
        if line_content == "," {
            // Ensure the next character is a newline
            if line_end < temp_code.len() && temp_code.chars().nth(line_end) == Some('\n') {
                adjusted_range.end = line_end + 1;
            }

            // Update the original range to replace the range with an empty string later.
            range.range.end += adjusted_range.end - line_start;     <--------------------------------------------
        }
    }

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should be able to do it in one of those two locations. Please don't make our handling even more spaghetti.

Copy link
Contributor Author

@abhishek818 abhishek818 Jul 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@morgante hey one doubt I have :

In below test case (existing), shouldnt we remove the empty newline?

#[test]
fn only_deduplicate_rewrites6() {
    run_test_expected({
        TestArgExpected {
            pattern: r#"
                engine marzano(0.1)
                language json
                pair($key, $value) => . where {
                    $key <: `"extends"`,
                }"#
            .to_owned(),
            source: r#"
            {
                "env": "bar",
                "extends": "foo",
                "baz": 1
            }"#
            .to_owned(),
            expected: r#"
            {
                "env": "bar",
                                            <----------------------------------
                "baz": 1
            }"#
            .to_owned(),
        }
    })
    .unwrap();
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it is not necessary for Grit to handle removing newlines. We rely on formatters to handle that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@morgante got it. This fix should be satisfactory, have a look..


for (range, snippet) in replacements {
let range = adjust_range(&range.effective_range(), offset, &code)?;
if range.start > code.len() || range.end > code.len() {
bail!("Range {:?} is out of bounds for code:\n{}\n", range, code);
}
code.replace_range(range, snippet);
code.replace_range(range.clone(), snippet);

// Check if the replacement results in any lines containing just a single comma
let line_start = code[..range.start].rfind('\n').map_or(0, |pos| pos + 1);
let line_end = code[range.start..].find('\n').map_or(code.len(), |pos| range.start + pos);

let line_content = code[line_start..line_end].trim();
if line_content == "," {
trailing_comma_lines.push(line_start..line_end+1);
}
}

// Remove lines containing a trailing comma
for range in trailing_comma_lines {
code.replace_range(range, "");
}

Ok((code, output_ranges, original_ranges))
Expand Down
98 changes: 98 additions & 0 deletions crates/core/src/test.rs
Original file line number Diff line number Diff line change
Expand Up @@ -12014,6 +12014,104 @@ fn trailing_comma_import_from_python_with_alias() {
.unwrap();
}

// refer https://github.com/getgrit/gritql/issues/416
#[test]
fn trailing_comma_after_argument_removal() {
run_test_expected({
TestArgExpected {
pattern: r#"
language python
`TaskMetadata($args)` where {
$args <: any {
contains `n_samples=$_` as $ns_kwarg where {
$ns_kwarg <: `n_samples = $ns_val` => .
},
contains `avg_character_length=$_` as $avg_kwarg where {
$avg_kwarg <: `avg_character_length = $avg_val` => `stats=GeneralDescriptiveStats(n_samples=$ns_val, avg_character_length=$avg_val)`
},
},
}
"#.to_owned(),
source: r#"
from pydantic import BaseModel


class TaskMetadata(BaseModel):
n_samples: dict[str, int]
avg_character_length: dict[str, float]


if __name__ == "__main__":
TaskMetadata(
name="TbilisiCityHallBitextMining",
dataset={
"path": "jupyterjazz/tbilisi-city-hall-titles",
"revision": "798bb599140565cca2dab8473035fa167e5ee602",
},
description="Parallel news titles from the Tbilisi City Hall website (https://tbilisi.gov.ge/).",
type="BitextMining",
category="s2s",
eval_splits=[_EVAL_SPLIT],
eval_langs=_EVAL_LANGS,
main_score="f1",
domains=["News"],
text_creation="created",
n_samples={_EVAL_SPLIT: 1820},
reference="https://huggingface.co/datasets/jupyterjazz/tbilisi-city-hall-titles",
date=("2024-05-02", "2024-05-03"),
form=["written"],
task_subtypes=[],
license="Not specified",
socioeconomic_status="mixed",
annotations_creators="derived",
dialect=[],
bibtex_citation="",
avg_character_length={_EVAL_SPLIT: 78},
)
"#
.to_owned(),
expected: r#"
from pydantic import BaseModel


class TaskMetadata(BaseModel):
n_samples: dict[str, int]
avg_character_length: dict[str, float]


if __name__ == "__main__":
TaskMetadata(
name="TbilisiCityHallBitextMining",
dataset={
"path": "jupyterjazz/tbilisi-city-hall-titles",
"revision": "798bb599140565cca2dab8473035fa167e5ee602",
},
description="Parallel news titles from the Tbilisi City Hall website (https://tbilisi.gov.ge/).",
type="BitextMining",
category="s2s",
eval_splits=[_EVAL_SPLIT],
eval_langs=_EVAL_LANGS,
main_score="f1",
domains=["News"],
text_creation="created",
reference="https://huggingface.co/datasets/jupyterjazz/tbilisi-city-hall-titles",
date=("2024-05-02", "2024-05-03"),
form=["written"],
task_subtypes=[],
license="Not specified",
socioeconomic_status="mixed",
annotations_creators="derived",
dialect=[],
bibtex_citation="",
stats=GeneralDescriptiveStats(n_samples={_EVAL_SPLIT: 1820}, avg_character_length={_EVAL_SPLIT: 78}),
)
"#
.to_owned(),
}
})
.unwrap();
}

#[test]
fn python_orphaned_from_imports() {
run_test_expected({
Expand Down
Loading