Fix Agent prompt and infra (#804)

Fixing some issues revealed by full agent experiments earlier:

1. [x] LLM-generated build scripts do not save the fuzz target binary into the correct path.
2. [x] Use the default build script in the code-fixing prompt in this scenario:
    1. The default build script builds successfully but fails other checks (i.e., reference), and
    2. The LLM-generated build script does not work.
3. [x] Selectively use the default build script or the LLM-generated build script, depending on which is better.
4. [x] Use different code-fixing prompts based on which build script is used and which result it produced:
    * Default or LLM build script.
    * No reference, no binary, or compilation failure.
5. [x] Back up the human-written `/src/build.sh` to `/src/build.bk.sh` in the agent's containers in case the LLM wants to reuse it in the new build script.
    * Create the same copy for fuzzing execution.
6. [x] Hide the compile command to prevent the LLM from reusing it in the inspection tool and being distracted by irrelevant errors. E.g.:
    * The inspection container always runs compile before LLM analysis. Rerunning it may fail in some projects due to an existing /src/<project>/build directory.
7. [x] Use an example fuzz target in the prompt that is written in the same language as the generated fuzz target (not the project).
    * Also dynamically adjust instructions in priming. Do not leave it to the LLM to judge which language the fuzz target is in.
8. [x] Remove the agent log when receiving fuzz targets.
9. [x] Do not restrict the LLM to sending one bash command per query.

Also need to:

1. [ ] Use SemanticAnalyzer in the agent workflow, at least to ensure the last Result is an Analysis Result.
2. [ ] Add an Enhancer in the agent workflow.
3. [ ] Use a service account in GKE; hopefully this will solve the [`Service Unavailable` problem](google/oss-fuzz#13042).
google · Feb 23, 2025 · 262dff0
1 parent 16bed89
commit 262dff0
Showing 13 changed files with 652 additions and 135 deletions.
diff --git a/agent/prototyper.py b/agent/prototyper.py
diff --git a/experiment/builder_runner.py b/experiment/builder_runner.py
@@ -922,17 +922,15 @@ def build_and_run_cloud(
         f'--real_project={project_name}',
     ]
 
-    # Temporarily comment out due to error in cached images.
-    # TODO(dongge): Add this back when the cached image works again.
-    # if oss_fuzz_checkout.ENABLE_CACHING and (
-    #     oss_fuzz_checkout.is_image_cached(project_name, 'address') and
-    #     oss_fuzz_checkout.is_image_cached(project_name, 'coverage')):
-    #   logger.info('Using cached image for %s', project_name)
-    #   command.append('--use_cached_image')
-
-    #   # Overwrite the Dockerfile to be caching friendly
-    #   oss_fuzz_checkout.rewrite_project_to_cached_project_chronos(
-    #       generated_project)
+    if oss_fuzz_checkout.ENABLE_CACHING and (
+        oss_fuzz_checkout.is_image_cached(project_name, 'address') and
+        oss_fuzz_checkout.is_image_cached(project_name, 'coverage')):
+      logger.info('Using cached image for %s', project_name)
+      command.append('--use_cached_image')
+
+      # Overwrite the Dockerfile to be caching friendly
+      oss_fuzz_checkout.rewrite_project_to_cached_project_chronos(
+          generated_project)
 
     if cloud_build_tags:
       command += ['--tags'] + cloud_build_tags
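
The re-enabled block above adds `--use_cached_image` only when caching is switched on and both the `address` and `coverage` images are cached. A minimal sketch of that condition in isolation follows; `cached_image_flags` and the injected `is_image_cached` callable are illustrative stand-ins, not the project's API:

```python
from typing import Callable


def cached_image_flags(project_name: str, enable_caching: bool,
                       is_image_cached: Callable[[str, str], bool]) -> list[str]:
  """Returns the extra flag only if every required sanitizer image is cached."""
  required_sanitizers = ('address', 'coverage')
  if enable_caching and all(
      is_image_cached(project_name, sanitizer)
      for sanitizer in required_sanitizers):
    return ['--use_cached_image']
  return []


# Hypothetical cache state: only the address image exists for 'demo-project'.
cached = {('demo-project', 'address')}
flags = cached_image_flags('demo-project', True, lambda p, s: (p, s) in cached)
print(flags)
```

Because the coverage image is missing in this made-up state, no flag is added and the build would fall back to the uncached path.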

diff --git a/experiment/evaluator.py b/experiment/evaluator.py
@@ -306,6 +306,8 @@ def create_ossfuzz_project(self,
                      os.path.basename('agent-build.sh')))
 
     # Add additional statement in dockerfile to overwrite with generated fuzzer
+    with open(os.path.join(generated_project_path, 'Dockerfile'), 'a') as f:
+      f.write('\nRUN cp /src/build.sh /src/build.bk.sh\n')
     with open(os.path.join(generated_project_path, 'Dockerfile'), 'a') as f:
       f.write('\nCOPY agent-build.sh /src/build.sh\n')
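
The order of the two appended Dockerfile statements matters: the backup `RUN cp` has to come before the `COPY` that overwrites `/src/build.sh`, otherwise the backup would contain the agent's script. A small sketch that reproduces the two writes against a throwaway Dockerfile (the `FROM`/`COPY` starting content and the helper name are assumptions for illustration):

```python
import os
import tempfile


def append_build_script_overrides(dockerfile_path: str) -> None:
  """Appends the backup step first, then the overwrite step."""
  with open(dockerfile_path, 'a') as f:
    f.write('\nRUN cp /src/build.sh /src/build.bk.sh\n')
    f.write('\nCOPY agent-build.sh /src/build.sh\n')


def demo() -> str:
  """Builds a hypothetical Dockerfile in a temp dir and returns its content."""
  with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'Dockerfile')
    with open(path, 'w') as f:
      f.write('FROM example-base\nCOPY build.sh /src/build.sh\n')
    append_build_script_overrides(path)
    with open(path) as f:
      return f.read()


content = demo()
print(content)
```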
 

diff --git a/experiment/oss_fuzz_checkout.py b/experiment/oss_fuzz_checkout.py
@@ -70,7 +70,7 @@ def _clone_oss_fuzz_repo():
   """Clones OSS-Fuzz to |OSS_FUZZ_DIR|."""
   clone_command = [
       'git', 'clone', 'https://github.com/google/oss-fuzz', '--depth', '1',
-      '--branch', 'target-exp-log-account', OSS_FUZZ_DIR
+      OSS_FUZZ_DIR
   ]
   proc = sp.Popen(clone_command,
                   stdout=sp.PIPE,

diff --git a/llm_toolkit/prompt_builder.py b/llm_toolkit/prompt_builder.py
@@ -28,6 +28,7 @@
 from experiment.benchmark import Benchmark, FileType
 from experiment.fuzz_target_error import SemanticCheckResult
 from llm_toolkit import models, prompts
+from results import BuildResult
 
 logger = logging.getLogger(__name__)
 
@@ -546,15 +547,22 @@ class PrototyperTemplateBuilder(DefaultTemplateBuilder):
   def __init__(self,
                model: models.LLM,
                benchmark: Benchmark,
-               template_dir: str = DEFAULT_TEMPLATE_DIR):
-    super().__init__(model)
-    self._template_dir = template_dir
+               template_dir: str = DEFAULT_TEMPLATE_DIR,
+               initial: Any = None):
+    super().__init__(model, benchmark, template_dir, initial)
     self.agent_templare_dir = AGENT_TEMPLATE_DIR
-    self.benchmark = benchmark
 
     # Load templates.
-    self.priming_template_file = self._find_template(self.agent_templare_dir,
-                                                     'prototyper-priming.txt')
+    if benchmark.is_c_target:
+      self.priming_template_file = self._find_template(
+          self.agent_templare_dir, 'prototyper-priming.c.txt')
+    elif benchmark.is_cpp_target:
+      self.priming_template_file = self._find_template(
+          self.agent_templare_dir, 'prototyper-priming.cpp.txt')
+    else:
+      self.priming_template_file = self._find_template(
+          self.agent_templare_dir, 'prototyper-priming.txt')
+
     self.cpp_priming_filler_file = self._find_template(
         template_dir, 'cpp-specific-priming-filler.txt')
     self.problem_template_file = self._find_template(template_dir,
@@ -568,11 +576,13 @@ def build(self,
             example_pair: list[list[str]],
             project_example_content: Optional[list[list[str]]] = None,
             project_context_content: Optional[dict] = None,
-            tool_guides: str = '') -> prompts.Prompt:
+            tool_guides: str = '',
+            project_dir: str = '') -> prompts.Prompt:
     """Constructs a prompt using the templates in |self| and saves it."""
     if not self.benchmark:
       return self._prompt
     priming = self._format_priming(self.benchmark)
+    priming = priming.replace('{PROJECT_DIR}', project_dir)
     final_problem = self.format_problem(self.benchmark.function_signature)
     final_problem += (f'You MUST call <code>\n'
                       f'{self.benchmark.function_signature}\n'
@@ -585,6 +595,54 @@ def build(self,
     return self._prompt
 
 
+class PrototyperFixerTemplateBuilder(PrototyperTemplateBuilder):
+  """Builder for the prompt that asks the agent to fix a failed build."""
+
+  def __init__(self,
+               model: models.LLM,
+               benchmark: Benchmark,
+               build_result: BuildResult,
+               compile_log: str,
+               template_dir: str = DEFAULT_TEMPLATE_DIR,
+               initial: Any = None):
+    super().__init__(model, benchmark, template_dir, initial)
+    # Load templates.
+    self.priming_template_file = self._find_template(self.agent_templare_dir,
+                                                     'prototyper-fixing.txt')
+    self.build_result = build_result
+    self.compile_log = compile_log
+
+  def build(self,
+            example_pair: list[list[str]],
+            project_example_content: Optional[list[list[str]]] = None,
+            project_context_content: Optional[dict] = None,
+            tool_guides: str = '',
+            project_dir: str = '') -> prompts.Prompt:
+    """Constructs a prompt using the templates in |self| and saves it."""
+    del (example_pair, project_example_content, project_context_content,
+         tool_guides)
+    if not self.benchmark:
+      return self._prompt
+
+    if self.build_result.build_script_source:
+      build_text = (f'<build script>\n{self.build_result.build_script_source}\n'
+                    '</build script>')
+    else:
+      build_text = 'Build script reuses `/src/build.bk.sh`.'
+
+    prompt = self._get_template(self.priming_template_file)
+    prompt = prompt.replace('{FUZZ_TARGET_SOURCE}',
+                            self.build_result.fuzz_target_source)
+    prompt = prompt.replace('{BUILD_TEXT}', build_text)
+    prompt = prompt.replace('{COMPILE_LOG}', self.compile_log)
+    prompt = prompt.replace('{FUNCTION_SIGNATURE}',
+                            self.benchmark.function_signature)
+    prompt = prompt.replace('{PROJECT_DIR}', project_dir)
+    self._prompt.append(prompt)
+
+    return self._prompt
+
+
 class DefaultJvmTemplateBuilder(PromptBuilder):
   """Default builder for JVM projects."""
 

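The language-dependent template choice and the `{PROJECT_DIR}` substitution in the diff above can be read as two small pure functions. This is a simplified sketch under that reading: the boolean arguments stand in for `benchmark.is_c_target` and `benchmark.is_cpp_target`, and both function names are hypothetical:

```python
def pick_priming_template(is_c_target: bool, is_cpp_target: bool) -> str:
  """Chooses the priming template file name by fuzz target language."""
  if is_c_target:
    return 'prototyper-priming.c.txt'
  if is_cpp_target:
    return 'prototyper-priming.cpp.txt'
  # Fallback for any other language.
  return 'prototyper-priming.txt'


def fill_project_dir(priming: str, project_dir: str) -> str:
  """Substitutes the literal {PROJECT_DIR} placeholder in a priming text."""
  return priming.replace('{PROJECT_DIR}', project_dir)


template_name = pick_priming_template(is_c_target=True, is_cpp_target=False)
line = fill_project_dir('Source code is in `{PROJECT_DIR}/`.', '/src/demo')
print(template_name, line)
```

Using plain `str.replace` rather than `str.format` keeps other braces in the prompt (for example C code in examples) from being interpreted as format fields.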
diff --git a/prompts/agent/prototyper-fixing.txt b/prompts/agent/prototyper-fixing.txt
@@ -0,0 +1,16 @@
+Failed to build fuzz target. Here is the fuzz target, build script, compilation command, and compilation output:
+<fuzz target>\n{FUZZ_TARGET_SOURCE}\n</fuzz target>
+{BUILD_TEXT}
+<compilation log>\n{COMPILE_LOG}\n</compilation log>
+YOU MUST first analyze the error messages with the fuzz target and the build script carefully to identify the root cause.
+YOU MUST NOT make any assumptions about the source code or build environment. Always confirm assumptions with source code evidence obtained via Bash commands.
+Once you are absolutely certain of the error root cause, output the FULL SOURCE CODE of the fuzz target (and FULL SOURCE CODE of build script, if /src/build.bk.sh is insufficient).
+TIPS:
+1. If needed, #include the necessary headers and #define required macros or constants in the fuzz target.
+2. Adjust compiler flags to link required libraries in the build script.
+3. After collecting information and analyzing the error root cause, YOU MUST take at least one step to validate your theory with source code evidence.
+4. Always use the source code from project source code directory `{PROJECT_DIR}/` to understand errors and how to fix them. For example, search for the key words (e.g., function name, type name, constant name) in the source code to learn how they are used. Similarly, learn from the other fuzz targets and the build script to understand how to include the correct headers.
+5. Once you have verified the error root cause, output the FULL SOURCE CODE of the fuzz target (and FULL SOURCE CODE of build script, if /src/build.bk.sh is insufficient).
+6. Focus on writing a compilable fuzz target that calls the function-under-test {FUNCTION_SIGNATURE}, don't worry about coverage or finding bugs. We can improve that later, but first try to ensure it calls the function-under-test {FUNCTION_SIGNATURE} and can compile successfully.
+7. If an error happens repeatedly and cannot be fixed, try to mitigate it. For example, replace or remove the line.
+
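
The placeholders in this template are filled by `PrototyperFixerTemplateBuilder.build` in `llm_toolkit/prompt_builder.py`. A standalone sketch of the same substitution follows, using an abbreviated stand-in template with real newlines; `render_fixing_prompt` and `FIXING_TEMPLATE` are illustrative names, not part of the repository:

```python
# Abbreviated stand-in for prototyper-fixing.txt (assumption, not the real file).
FIXING_TEMPLATE = (
    'Failed to build fuzz target.\n'
    '<fuzz target>\n{FUZZ_TARGET_SOURCE}\n</fuzz target>\n'
    '{BUILD_TEXT}\n'
    '<compilation log>\n{COMPILE_LOG}\n</compilation log>\n'
    'Call {FUNCTION_SIGNATURE}; sources are under `{PROJECT_DIR}/`.\n')


def render_fixing_prompt(template: str, fuzz_target_source: str,
                         build_script_source: str, compile_log: str,
                         function_signature: str, project_dir: str) -> str:
  """Fills the template; an empty build script means the backup is reused."""
  if build_script_source:
    build_text = f'<build script>\n{build_script_source}\n</build script>'
  else:
    build_text = 'Build script reuses `/src/build.bk.sh`.'
  replacements = {
      '{FUZZ_TARGET_SOURCE}': fuzz_target_source,
      '{BUILD_TEXT}': build_text,
      '{COMPILE_LOG}': compile_log,
      '{FUNCTION_SIGNATURE}': function_signature,
      '{PROJECT_DIR}': project_dir,
  }
  prompt = template
  for placeholder, value in replacements.items():
    prompt = prompt.replace(placeholder, value)
  return prompt


default_prompt = render_fixing_prompt(FIXING_TEMPLATE, 'int x;', '',
                                      'error: unknown type', 'int parse(void)',
                                      '/src/demo')
custom_prompt = render_fixing_prompt(FIXING_TEMPLATE, 'int x;', 'make',
                                     'error: unknown type', 'int parse(void)',
                                     '/src/demo')
print(default_prompt)
```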
diff --git a/prompts/agent/prototyper-priming.c.txt b/prompts/agent/prototyper-priming.c.txt
@@ -0,0 +1,141 @@
+<system>
+As a security testing engineer, you must write an `int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)` fuzz target in {LANGUAGE}.
+Objective: Your goal is to modify an existing fuzz target `{FUZZ_TARGET_PATH}` to write a minimum fuzz target of a given function-under-test that can build successfully.
+</system>
+
+<steps>
+Follow these steps to write a minimum fuzz target:
+
+Step 1. Determine the information you need to write an effective fuzz target.
+This includes:
+    * **Source code** of the function under test.
+    * **Custom Types and Dependencies** definitions and implementations.
+    * **Initialization and setup** requirements and steps.
+    * **Build details** and integration steps.
+    * Valid and edge-case input values.
+    * Environmental and runtime dependencies.
+
+Step 2. Collect information using the Bash tool.
+Use the bash tool (see <tool> section) and follow its rules to gather the necessary information. You can collect information from:
+    * The existing human written fuzz target at `{FUZZ_TARGET_PATH}`.
+    * The existing human written build script `/src/build.bk.sh`.
+    * The project source code directory `{PROJECT_DIR}/` cloned from the project repository.
+    * Documentation about the project, the function, and the variables/constants involved.
+    * Environment variables.
+    * Knowledge about OSS-Fuzz's build infrastructure: It will compile your fuzz target in the same way as the existing human written fuzz target with the build script.
+
+Step 3. Analyze the function and its parameters.
+Understand the function under test by analyzing its source code and documentation:
+    * **Purpose and functionality** of the function.
+    * **Input processing** and internal logic.
+    * **Dependencies** on other functions or global variables.
+    * **Error handling** and edge cases.
+
+Step 4. Understand initialization requirements.
+Identify what is needed to properly initialize the function:
+    * **Header files** and their relative paths used by include statements in the fuzz target.
+    * **Complex input parameters or objects** initialization.
+    * **Constructor functions** or initialization routines.
+    * **Global state** or configuration needs to be set up.
+    * **Mocking** external dependencies if necessary.
+
+Step 5. Understand Constraints and edge cases.
+For each input parameter, understand:
+    * Valid ranges and data types.
+    * Invalid or edge-case values (e.g., zero, NULL, predefined constants, maximum values).
+    * Special values that trigger different code paths.
+
+Step 6: Plan Fuzz Target Implementation.
+Decide how to implement the fuzz target:
+    * **Extract parameters** from the `data` and `size` variable of `LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)`.
+    * Handle fixed-size versus variable-size data.
+    * **Initialize function's parameters** by appropriately mapping the raw input bytes.
+    * Ensure that the fuzz target remains deterministic and avoids side effects.
+    * Avoid `goto` statements.
+
+Step 7: **Write** the fuzz target code.
+Implement the `LLVMFuzzerTestOneInput` function:
+    * Header files:
+        * Investigate how existing fuzz targets include headers.
+        * Investigate where they are located in the project.
+        * Collect all headers required by your fuzz target and their locations.
+        * Include their relative path in the same way as the existing fuzz targets.
+    * Macros or Constants:
+        * Include or define necessary macros or constants.
+    * Input Handling:
+        * Check that the input size is sufficient.
+        * Extract parameters from the input data.
+        * Handle any necessary conversions or validations.
+    * Function Invocation:
+        * Initialize required objects or state.
+        * Modify the existing fuzz target at `{FUZZ_TARGET_PATH}` to fuzz the function under test with the fuzzed parameters.
+        * Ensure proper error handling.
+    *
+    * Cleanup:
+        * Free any allocated resources.
+        * Reset any global state if necessary.
+
+Step 8 (Optional): **Modify** the Build Script.
+Write a new build script only if the existing one (`/src/build.bk.sh`) is insufficient:
+    * Decide if you need to modify the build script at `/src/build.bk.sh` to successfully build the new fuzz target.
+    * Include compilation steps for the project under test.
+    * Include compilation steps for the new fuzz target.
+    * Specify necessary compiler and linker flags.
+    * Ensure all dependencies are correctly linked.
+
+Step 9: Providing Your Conclusion:
+    * Provide your conclusion on the FULL new fuzz target and build script **ONLY AFTER** you have gathered all necessary information.
+    * **DO NOT SEND** any other content (e.g., bash tool commands) in the conclusion message. ALWAYS send other commands individually and ONLY SEND conclusion after collecting all information.
+    * Conclusion Format:
+        * Overall Description:
+            * Summarize your findings and describe your fuzz target design.
+            * Wrap this summary within <conclusion> and </conclusion> tags.
+        * Modified Fuzz Target:
+            * Provide the full code of the modified fuzz target.
+            * Wrap the code within <fuzz target> and </fuzz target> tags.
+        * Modified Build Script (if applicable):
+            * If you need to modify the build script, provide the full code.
+            * Wrap it within <build script> and </build script> tags.
+    * Format Example:
+        <conclusion>
+        I determined that the fuzz target needs to include specific header files and adjust the `LLVMFuzzerTestOneInput` function to call the new function-under-test. Additionally, the build script requires modification to link against the necessary libraries.
+        </conclusion>
+        <fuzz target>
+        [Your FULL fuzz target code here.]
+        </fuzz target>
+        <build script>
+        [Your FULL build script code here, if applicable.]
+        </build script>
+
+</steps>
+
+{TYPE_SPECIFIC_PRIMING}
+
+<instructions>
+1. Methodical Approach:
+    * Be systematic to cover all necessary aspects, such as:
+        * Understanding the function's parameters and dependencies.
+        * Identifying required header files and libraries.
+        * Recognizing any special initialization or environmental requirements.
+2. Utilizing Existing Examples:
+    * Use the existing fuzz target at `{FUZZ_TARGET_PATH}` and other fuzz targets with `LLVMFuzzerTestOneInput` in its parent directory as references.
+    * Pay special attention to:
+        * How header files are included.
+        * The structure and content of the `LLVMFuzzerTestOneInput` function.
+    * Typically, you only need to modify the content of `LLVMFuzzerTestOneInput`.
+3. Investigating Header Inclusions:
+    * Use the bash tool to find required headers and libraries.
+    * Examine library files built by `/src/build.bk.sh` to understand available functions and symbols.
+4. Modifying the Build Script (if necessary):
+    * Modify `/src/build.bk.sh` to build the necessary components or include required libraries if the function-under-test is not included.
+    * The project's directory may contain a `README.md` with build instructions (e.g., at `/src/<project-name>/README.md`).
+5. Do Not Compile:
+    * **Do not compile** the fuzz target during your investigation.
+    * Provide your conclusions based on the information gathered after you have a solution.
+6. Formatting Code Snippets:
+    * Do not wrap code snippets with triple backticks (```).
+    * Use the specified XML-style tags for wrapping code and other content.
+7. DO NOT send the <conclusion> early: Provide conclusions **only after** gathering all necessary information.
+8. Focus on Final Goals:
+    * Ensure that your fuzz target and build script aim to successfully build the fuzz target and fuzz the function-under-test.
+</instructions>
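
The priming prompt asks the model to wrap its answer in `<conclusion>`, `<fuzz target>`, and `<build script>` tags, with the build script optional. One possible way to pull those sections out of a response is sketched below; `extract_tag` is a hypothetical helper and not necessarily how the agent parses replies:

```python
import re


def extract_tag(response: str, tag: str) -> str:
  """Returns the stripped text inside <tag>...</tag>, or '' if absent."""
  escaped = re.escape(tag)
  match = re.search(rf'<{escaped}>(.*?)</{escaped}>', response, re.DOTALL)
  return match.group(1).strip() if match else ''


# Made-up model response that reuses the existing build script.
response = (
    '<conclusion>\nReuse the existing build script.\n</conclusion>\n'
    '<fuzz target>\n'
    'int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) '
    '{ return 0; }\n'
    '</fuzz target>\n')

conclusion = extract_tag(response, 'conclusion')
fuzz_target = extract_tag(response, 'fuzz target')
build_script = extract_tag(response, 'build script')
print(conclusion)
```

An empty `build_script` here corresponds to the case the fixing prompt describes as reusing `/src/build.bk.sh`.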