Gin LLM User Guide

This guide provides instructions on how to recreate the experiments detailed in the paper "Large Language Model-based Code Completion is an Effective Genetic Improvement Mutation Operator." accepted to the GI@ICSE2025 workshop. This guiude focuses solely on the experiments; for more information about GIN, please refer to the README.

Code Overview

Below is an explanation of the functionality related to the LLM-enhanced genetic improvement (GI) mutation operations.

Algorithms

src/main/java/gin/util/RandomSampler.java: Implements the random sampling algorithms.
src/main/java/gin/util/LocalSearchRuntime.java: Implements the local search algorithms (hill climber used in this paper).

LLM Utilities

src/main/java/gin/edit/llm/LLMConfig.java: Contains all LLM prompts and environment configurations for interacting with large language models.
src/main/java/gin/edit/llm/Ollama4jLLMQuery.java: Implements the interface for querying a locally hosted Ollama server.
src/main/java/gin/edit/llm/OpenAILLMQuery.java: Implements the interface for querying a remote OpenAI server.

LLM-Enhanced Mutation Operators

src/main/java/gin/edit/llm/LLMReplaceStatement.java: Implements the LLM-based replacement mutation operator, as proposed by Brownlee et al.
src/main/java/gin/edit/llm/LLMMaskedStatement.java: Implements the LLM-based masking mutation operator, proposed in this paper.

Getting Started

Prerequisites

To set up the environment, ensure the following software is installed:

JDK 17: Download from Oracle Java Downloads.
Gradle: Version 8.0.2 or higher. Install from Gradle.
Ollama: Version 0.1.48 or higher. Download from Ollama.

Download and Install GIN

Clone the GIN repository:

git clone https://anonymous.4open.science/r/gin-Masking-llm.git

Check out the LLM branch:
```
git checkout llm_combined_exp
```
Build the project and install dependencies by running:
```
gradle build
```

Setting Up LLM

The paper experiments with five LLMs, detailed in the table below:

Name	ID	Quantisation	Parameters	Size
Gemma2 2B	430ed3535049	Q4_0	2.61B	1.7 GB
Gemma2 9B	ff02c3702f32	Q4_0	9.24B	5.4 GB
Llama3.1 8B	91ab477bec9d	Q4_0	8.03B	4.7 GB
Mistral 7B	61e88e884507	Q4_0	7.24B	4.1 GB
Phi3 14B	cf611a26b048	Q4_0	14.0B	7.9 GB

To experiment with the Gemma2:2b model:

Download the model with the following command:
```
ollama run gemma2:2b
```

Setting Up the Benchmark

The experiments use five benchmark projects, detailed below:

Project	URL	Branch
JCodec	github.com/jcodec/jcodec	master (7e52834)
JUnit4	github.com/junit-team/junit4	r4.13.2
Gson	github.com/google/gson	gson-parent-2.10.1
Commons-Net	github.com/apache/commons-net	commons-net-3.10.0
Karate	github.com/karatelabs/karate	v1.4.1

To run an experiment using JCodec and Gemma2:2b:

Navigate to the examples folder in the GIN project:
```
cd examples
```

Clone the JCodec project:

git clone https://github.com/jcodec/jcodec.git

Build the JCodec project:
```
mvn build
```

Profile the project:

projectnameforgin='jcodec'; java -Dtinylog.level=trace -cp ../../build/gin.jar gin.util.Profiler -r 20 -mavenHome /PATH/TO/MAVEN -p $projectnameforgin -d . -o $projectnameforgin.Profiler_output.csv

Replace /PATH/TO/MAVEN with your Maven home directory.

Running Experiments

To run the experiments described in the paper, use the following commands:

Masking Random Search

projectnameforgin='jcodec'; LLM = 'gemma2:2b'; model=$LLM; java -Dtinylog.level=trace -cp ../../build/gin.jar gin.util.RandomSampler -j -p $projectnameforgin -d . -m $projectnameforgin.Profiler_output.csv -o results/$projectnameforgin.RandomSampler_1000_output.$model.csv -mavenHome /PATH/TO/MAVEN -timeoutMS 10000 -et gin.edit.llm.LLMMaskedStatement -mt $model -pt MASKED -pn 1000 &> results/$projectnameforgin.RandomSampler_COMBINED_1000_stderrstdout_.$model.txt`

Masking Local Search

projectnameforgin='jcodec'; LLM = 'gemma2:2b'; model=$LLM; java -Dtinylog.level=trace -cp ../../build/gin.jar gin.util.LocalSearchRuntime -j -p $projectnameforgin -d . -m $projectnameforgin.Profiler_output.csv -o results/$projectnameforgin.LocalSearchRuntime_COMBINED_50_output.$model.csv -mavenHome /PATH/TO/MAVEN -timeoutMS 10000 -et gin.edit.llm.LLMMaskedStatement -mt $LLM -pt MASKED -in 100 &> results/$projectnameforgin.LocalSearchRuntime_LLM_MASKED_50_stderrstdout.$model.txt`

Combined Random Search with 70% probability of choosing masking mutation

projectnameforgin='jcodec'; LLM = 'gemma2:2b'; model=$LLM; java -Dtinylog.level=trace -cp ../../build/gin.jar gin.util.RandomSampler -j -pb 0.7 -p $projectnameforgin -d . -m $projectnameforgin.Profiler_output.csv -o results/$projectnameforgin.RandomSampler_COMBINED_1000_output.$model.csv -mavenHome /PATH/TO/MAVEN -timeoutMS 10000 -et gin.edit.llm.LLMMaskedStatement,STATEMENT -mt $model -pt MASKED -pn 100 &> results/$projectnameforgin.RandomSampler_COMBINED_1000_stderrstdout_.$model.txt`

Combined Local Search with 70% probability of choosing masking mutation

projectnameforgin='jcodec';  LLM = 'gemma2:2b'; model=$LLM; java -Dtinylog.level=trace -cp ../../build/gin.jar gin.util.LocalSearchRuntime -j -pb 0.7 -p $projectnameforgin -d . -m $projectnameforgin.Profiler_output.csv -o results/$projectnameforgin.LocalSearchRuntime_COMBINED_50_output.$model.csv -mavenHome /PATH/TO/MAVEN -timeoutMS 10000 -et gin.edit.llm.LLMMaskedStatement,STATEMENT -mt $LLM -pt MASKED -in 100 &> results/$projectnameforgin.LocalSearchRuntime_LLM_MASKED_50_stderrstdout.$model.txt`

(NOTE: Replace the /PATH/TO/MAVEN with your maven home directory).

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
Profilers		Profilers
doc		doc
examples		examples
githooks		githooks
gradle/wrapper		gradle/wrapper
img		img
libs		libs
src		src
testgeneration		testgeneration
LICENSE.md		LICENSE.md
README.md		README.md
README_GIN.md		README_GIN.md
build.gradle		build.gradle
gradlew		gradlew
gradlew.bat		gradlew.bat

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Gin LLM User Guide

Code Overview

Algorithms

LLM Utilities

LLM-Enhanced Mutation Operators

Getting Started

Prerequisites

Download and Install GIN

Setting Up LLM

Setting Up the Benchmark

Running Experiments

Masking Random Search

Masking Local Search

Combined Random Search with 70% probability of choosing masking mutation

Combined Local Search with 70% probability of choosing masking mutation

About

Releases

Packages

Languages

License

SOLAR-group/gin-Masking-llm

Folders and files

Latest commit

History

Repository files navigation

Gin LLM User Guide

Code Overview

Algorithms

LLM Utilities

LLM-Enhanced Mutation Operators

Getting Started

Prerequisites

Download and Install GIN

Setting Up LLM

Setting Up the Benchmark

Running Experiments

Masking Random Search

Masking Local Search

Combined Random Search with 70% probability of choosing masking mutation

Combined Local Search with 70% probability of choosing masking mutation

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages