Releases: zaripych/gpt-refactor-bot
[email protected]
Patch Changes
- #16 54d866e Thanks @zaripych! - fix: if an identifier is not found, provide LLM with suggestion to reduce specificity
- #16 54d866e Thanks @zaripych! - feat: improve benchmarking command

  Introduces changes to the report generated by the refactor bot so that we can get better benchmark stats.

  The benchmark command now outputs `promptTokens` and `completionTokens`. The report generated by the benchmark command has been improved to include a difference comparison, outliers, and a list of the refactors with the lowest scores.

  Example:
  ```
  Benchmark results

  METRIC                  │ A         │ B         │ DIFF
  ────────────────────────┼───────────┼───────────┼──────────
  numberOfRuns            │ 9.00      │ 10.00     │
  score                   │ 0.83      │ 1.00      │ +17.28%
  acceptedRatio           │ 0.81      │ 1.00      │ +18.52%
  totalTokens             │ 44688.67  │ 50365.90  │ +12.70%
  totalPromptTokens       │ 40015.44  │ 48283.30  │ +20.66%
  totalCompletionTokens   │ 4673.22   │ 2082.60   │ -55.44%
  wastedTokensRatio       │ 0.09      │ 0.00      │ -9.49%
  durationMs              │ 286141.39 │ 171294.32 │ -40.14%
  ```
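  A note on reading the `DIFF` column: judging from the numbers above, ratio metrics (`score`, `acceptedRatio`, `wastedTokensRatio`) appear to be compared as absolute percentage-point differences, while token counts and durations appear as relative changes, with `numberOfRuns` excluded from the comparison. The displayed A/B values are rounded, so recomputing ratio rows from them gives slightly different digits. A minimal sketch of that interpretation (the formulas are inferred from the table, not taken from the refactor bot's code):

  ```ts
  // Sketch: one way to reproduce the DIFF column above (assumed formulas).
  const ratioMetrics = new Set(['score', 'acceptedRatio', 'wastedTokensRatio']);

  function diffPercent(metric: string, a: number, b: number): string {
      const value = ratioMetrics.has(metric)
          ? (b - a) * 100 // ratios: absolute percentage-point difference
          : ((b - a) / a) * 100; // counts and durations: relative change
      return `${value >= 0 ? '+' : ''}${value.toFixed(2)}%`;
  }

  // diffPercent('totalTokens', 44688.67, 50365.9) === '+12.70%'
  // diffPercent('durationMs', 286141.39, 171294.32) === '-40.14%'
  ```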
- #16 54d866e Thanks @zaripych! - fix: fail if eslint is not properly configured or installed instead of ignoring the errors

  If eslint is not properly configured or installed, the refactor bot would ignore the errors because it would fail to analyze the `stderr` of the `eslint` command. It now properly fails with a message that explains the problem.

  This should lead to better outcomes when configuring the refactor bot for the first time.
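  The gist of such a fail-fast check, as a minimal sketch using Node's `child_process` (illustrative only; the function name and error handling below are assumptions, not the bot's actual code):

  ```ts
  import { execFile } from 'node:child_process';
  import { promisify } from 'node:util';

  const execFileAsync = promisify(execFile);

  // Sketch: run eslint and surface configuration/installation problems
  // instead of silently ignoring them.
  async function runEslint(args: string[]): Promise<string> {
      try {
          const { stdout } = await execFileAsync('eslint', args);
          return stdout;
      } catch (err) {
          const { code, stderr } = err as { code?: number | string; stderr?: string };
          // eslint exits with code 1 when there are lint errors; anything else
          // (e.g. ENOENT or exit code 2) suggests it is missing or misconfigured.
          if (code === 1) {
              throw new Error('eslint reported lint errors');
          }
          throw new Error(
              `eslint could not run - is it installed and configured?\n${stderr ?? ''}`
          );
      }
  }
  ```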
- #18 1d26b8c Thanks @zaripych! - feat: introducing experimental chunky edit strategy

  This strategy allows the LLM to perform edits via find-replace operations, which reduces the total number of completion tokens. Completion tokens are typically priced at twice the cost of prompt tokens. In addition to reducing the price, this strategy also significantly improves the performance of the refactoring.
  Here are benchmark results for the `chunky-edit` strategy:

  ```
  METRIC                  │ A         │ B         │ DIFF
  ────────────────────────┼───────────┼───────────┼──────────
  numberOfRuns            │ 9.00      │ 10.00     │
  score                   │ 0.83      │ 1.00      │ +17.28%
  acceptedRatio           │ 0.81      │ 1.00      │ +18.52%
  totalTokens             │ 44688.67  │ 50365.90  │ +12.70%
  totalPromptTokens       │ 40015.44  │ 48283.30  │ +20.66%
  totalCompletionTokens   │ 4673.22   │ 2082.60   │ -55.44%
  wastedTokensRatio       │ 0.09      │ 0.00      │ -9.49%
  durationMs              │ 286141.39 │ 171294.32 │ -40.14%
  ```
  While it does seem to improve the score, this should just be considered variance introduced by the randomness of the LLM. The main outcome of this strategy is the reduction in the number of completion tokens and the improvement in performance.

  There might be some other side effects, probably depending on the type of the refactor. So this strategy is still experimental and must be selectively opted into via the `--experiment-chunky-edit-strategy` CLI option.
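  As a rough illustration of the idea behind the strategy: a find-replace edit only needs the LLM to emit the changed fragments rather than regenerate the whole file, which is where the completion-token savings come from. A minimal sketch (the `ChunkEdit` shape is hypothetical, not the bot's actual edit format):

  ```ts
  // Sketch: applying LLM-emitted find-replace chunks to a file's contents.
  interface ChunkEdit {
      find: string; // exact text expected in the current file
      replace: string; // text to substitute
  }

  function applyChunkEdits(contents: string, edits: ChunkEdit[]): string {
      return edits.reduce((text, { find, replace }) => {
          if (!text.includes(find)) {
              // Fail loudly so the edit can be retried or discarded rather
              // than silently producing a wrong file.
              throw new Error(`cannot apply edit, text not found: ${find}`);
          }
          return text.replace(find, replace);
      }, contents);
  }
  ```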
[email protected]
Patch Changes
- #14 05da890 Thanks @zaripych! - feat: evaluate refactor outcomes using an LLM to decide whether a file edit should be accepted or discarded

  This is a big change which adds extra steps to the refactor process. Every time an LLM produces a file edit, we will pass that edit through an evaluation algorithm to assess whether it should be accepted or discarded. Previously, this logic was only affected by the existence or absence of eslint errors. This will make the final result higher quality and more reliable.
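  A simplified sketch of what such an accept/discard gate could look like (all helper names below are hypothetical declarations, not the refactor bot's API):

  ```ts
  // Sketch: gate every LLM-produced file edit behind an evaluation step.
  declare function runEslintOn(filePath: string, contents: string): Promise<string[]>;
  declare function askLlmToEvaluate(input: {
      objective: string;
      contents: string;
  }): Promise<'accept' | 'discard'>;

  interface FileEdit {
      filePath: string;
      newContents: string;
  }

  async function shouldAcceptEdit(edit: FileEdit, objective: string): Promise<boolean> {
      // Previously the only signal was lint errors:
      const lintErrors = await runEslintOn(edit.filePath, edit.newContents);
      if (lintErrors.length > 0) {
          return false;
      }
      // Now the edit is additionally judged by an LLM against the objective:
      const verdict = await askLlmToEvaluate({
          objective,
          contents: edit.newContents,
      });
      return verdict === 'accept';
  }
  ```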
  The new behavior can be disabled by setting `evaluate: false` in the `goal.md` file.
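  For illustration, assuming the option lives in the YAML front matter of `goal.md` (the front-matter layout here is an assumption; only the `evaluate: false` setting itself comes from this release note):

  ```md
  ---
  # assumed front-matter layout; `evaluate: false` is the documented setting
  evaluate: false
  ---

  Describe the refactoring objective here, as usual.
  ```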
  In addition to that, this change also adds a new CLI command for internal use which allows us to compare the results of multiple refactor runs. This is useful for benchmarking purposes.
  To run the benchmark, use the following command:

  ```sh
  pnpm benchmark --config .refactor-bot/benchmarks/test-benchmark.yaml
  ```
  Where the config looks like this:

  ```yaml
  refactorConfig:
    name: test-refactoring
    ref: 8f1a3da55caeee3df75853042e57978c45513f18
    budgetCents: 100
    model: gpt-4-1106-preview
    objective:
      Replace all usages of `readFile` from `fs/promises` module with
      `readFileSync` from `fs` module in
      `packages/refactor-bot/src/refactor/planTasks.ts`,
      `packages/refactor-bot/src/refactor/loadRefactors.ts` and
      `packages/refactor-bot/src/refactor/discoverDependencies.ts`.

  numberOfRuns: 2

  variants:
    - name: 'A'
      ids: # ids of refactor runs to save money on
        - VRixXEwC
        - k0FmgQjU
        - IpSOtP7d
        - xqydSrSU
    - name: 'B'
  ```
  This will run multiple refactor runs and compare the results. At this moment no statistical analysis is performed, as I'm not convinced we can reach statistical significance with a number of runs that doesn't also make you poor.
[email protected]
Patch Changes
- #12 9131738 Thanks @zaripych! - fix: sanitize results of the function calls when they fail, removing full paths to the repository

- #12 9131738 Thanks @zaripych! - fix: default to gpt-3.5-turbo-1106 in the config

- #12 9131738 Thanks @zaripych! - fix: fail at the start of the refactor when prettier cannot be found