[Experiment] Add symbol navigation commands into the editor #5092

Closed
Changes from 4 commits
9 changes: 7 additions & 2 deletions openhands/agenthub/codeact_agent/function_calling.py
@@ -214,6 +214,8 @@ def __init__(self):
* The `create` command cannot be used if the specified `path` already exists as a file
* If a `command` generates a long output, it will be truncated and marked with `<response clipped>`
* The `undo_edit` command will revert the last edit made to the file at `path`
* The `jump_to_definition` command will print out possible FULL definition(s) of the class or function/method in the whole codebase. This can be useful for navigating large repositories
* The `find_references` command will print out possible references (or usages) to the class or function/method in the whole codebase. This can be useful to understand how a class or function/method is used in a repository

Notes for using the `str_replace` command:
* The `old_str` parameter should match EXACTLY one or more consecutive lines from the original file. Be mindful of whitespaces!
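To make concrete what the two new commands are expected to return, here is a minimal sketch of a name-based definition/reference lookup over a Python codebase, built on the standard-library `ast` module. It is not how openhands-aci implements `jump_to_definition` or `find_references` (that code lives in the ACI repo and is not part of this diff). `Hit`, `find_definitions`, and `find_references` are hypothetical names, and the sketch only handles `.py` files and bare-name or attribute references.

```python
import ast
from pathlib import Path
from typing import Iterator, NamedTuple


class Hit(NamedTuple):
    """One match: file, 1-based line number, and the matched source text."""

    path: Path
    lineno: int
    text: str


def _iter_trees(root: Path) -> Iterator[tuple[Path, str, ast.Module]]:
    """Yield (path, source, parsed module) for every parseable .py file under root."""
    for path in sorted(root.rglob("*.py")):
        source = path.read_text(encoding="utf-8", errors="replace")
        try:
            tree = ast.parse(source)
        except SyntaxError:
            continue  # skip files that are not valid Python
        yield path, source, tree


def find_definitions(root: Path, symbol: str) -> list[Hit]:
    """Return the full source of every class, function, or method named `symbol`."""
    kinds = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)
    hits = []
    for path, source, tree in _iter_trees(root):
        for node in ast.walk(tree):
            if isinstance(node, kinds) and node.name == symbol:
                # get_source_segment spans from `def`/`class` to the end of the body
                segment = ast.get_source_segment(source, node) or ""
                hits.append(Hit(path, node.lineno, segment))
    return hits


def find_references(root: Path, symbol: str) -> list[Hit]:
    """Return each line that uses `symbol` as a bare name or as an attribute."""
    hits = set()
    for path, source, tree in _iter_trees(root):
        lines = source.splitlines()
        for node in ast.walk(tree):
            is_name = isinstance(node, ast.Name) and node.id == symbol
            is_attr = isinstance(node, ast.Attribute) and node.attr == symbol
            if is_name or is_attr:
                # Report the whole source line once, even if it mentions symbol twice
                hits.add(Hit(path, node.lineno, lines[node.lineno - 1].strip()))
    return sorted(hits)
```

Matching on names alone is one reason the command descriptions say *possible* definitions and references: two unrelated methods that share a name both match, and in this sketch import statements (`from svc import fetch`) are not counted as references.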
@@ -235,7 +237,7 @@ def __init__(self):
'type': 'string',
},
'path': {
- 'description': 'Absolute path to file or directory, e.g. `/workspace/file.py` or `/workspace`.',
+ 'description': 'Absolute path to file or directory, e.g. `/workspace/file.py` or `/workspace`. Required for all commands except `jump_to_definition` and `find_references`.',
'type': 'string',
},
'file_text': {
@@ -259,8 +261,11 @@ def __init__(self):
'items': {'type': 'integer'},
'type': 'array',
},
'symbol_name': {
'description': 'Required parameter of `jump_to_definition` and `find_references` commands. The name of the class or function/method to search for.',
'type': 'string',
},
},
- 'required': ['command', 'path'],
+ 'required': ['command'],
},
),
)
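With `required` reduced to `['command']`, the schema no longer enforces `path`, so that check has to happen when the tool call is handled. A minimal sketch of such a per-command check follows, assuming only the rules stated in the parameter descriptions above (`path` for every command except the two navigation commands, which need `symbol_name`). `missing_arguments` and `NAVIGATION_COMMANDS` are hypothetical names, not part of OpenHands or openhands-aci, and other per-command parameters are deliberately left out.

```python
from typing import Any

# Hypothetical set of commands that take `symbol_name` instead of `path`.
NAVIGATION_COMMANDS = frozenset({"jump_to_definition", "find_references"})


def missing_arguments(args: dict[str, Any]) -> list[str]:
    """Return the required argument names absent from a str_replace_editor call.

    Only `command`, `path`, and `symbol_name` are checked; other per-command
    parameters (e.g. `file_text` for `create`) are out of scope for this sketch.
    """
    if not args.get("command"):
        # The one field the JSON schema itself still marks as required
        return ["command"]
    if args["command"] in NAVIGATION_COMMANDS:
        required = ["symbol_name"]
    else:
        required = ["path"]
    # Treat empty strings the same as missing values
    return [name for name in required if not args.get(name)]
```

For example, `{"command": "view"}` yields `["path"]`, while `{"command": "jump_to_definition", "symbol_name": "fetch_customer_data"}` yields `[]`. JSON Schema's `if`/`then` keywords could in principle express this rule inside the schema, but function-calling APIs may accept only a subset of JSON Schema, so a plain `required` list plus a runtime check is the more conservative option.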
145 changes: 143 additions & 2 deletions openhands/llm/fn_call_converter.py
@@ -50,7 +50,7 @@
IN_CONTEXT_LEARNING_EXAMPLE_PREFIX = """
Here's a running example of how to perform a task with the provided tools.

- --------------------- START OF EXAMPLE ---------------------
+ --------------------- START OF EXAMPLE 1 ---------------------

USER: Create a list of numbers from 1 to 10, and display them in a web page at port 5000.

@@ -213,7 +213,148 @@ def index():
<function=finish>
</function>

- --------------------- END OF EXAMPLE ---------------------
+ --------------------- END OF EXAMPLE 1 ---------------------


--------------------- START OF EXAMPLE 2 ---------------------

USER: I'm working on an issue about HTTP timeouts in our data ingestion pipeline. The logs show
TimeoutError is being raised in fetch_customer_data() but I'm not sure if we need to increase the
timeout or if there's a deeper issue. Can you help me trace this function?

ASSISTANT: I'll help track down the timeout issue in fetch_customer_data. Let me see its definition first:

<function=str_replace_editor>
<parameter=command>jump_to_definition</parameter>
<parameter=symbol_name>fetch_customer_data</parameter>
</function>

USER: EXECUTION RESULT of [str_replace_editor]:
Definition(s) of `fetch_customer_data`:
/workspace/services/data_ingestion.py:
...⋮...
252│async def fetch_customer_data(customer_id: str) -> Dict[str, Any]:
253│    '''Fetches customer data from multiple endpoints and aggregates results.
254│
255│    Args:
256│        customer_id: Unique identifier for customer
257│
258│    Returns:
259│        Combined customer data from all sources
260│    '''
261│    async with CustomerClient(timeout=DEFAULT_TIMEOUT) as client:
262│        # Core profile data has 30s timeout
263│        profile = await client.get_profile(customer_id)
264│
265│        # Transaction history needs more time
266│        transactions = await _fetch_transactions(
267│            customer_id,
268│            timeout=TRANSACTION_TIMEOUT
269│        )
270│
271│        return {
272│            "profile": profile,
273│            "transactions": transactions,
274│            "last_updated": datetime.now()
275│        }

ASSISTANT: Interesting: fetch_customer_data relies on _fetch_transactions, which has a separate timeout. Let me check the definition of that internal function:

<function=str_replace_editor>
<parameter=command>jump_to_definition</parameter>
<parameter=symbol_name>_fetch_transactions</parameter>
</function>

USER: EXECUTION RESULT of [str_replace_editor]:
Definition(s) of `_fetch_transactions`:
/workspace/services/data_ingestion.py:
...⋮...
67│async def _fetch_transactions(customer_id: str, timeout: int) -> List[Dict]:
68│    '''Fetches detailed transaction history for customer.
69│    Handles pagination and retries on network errors.
70│    '''
71│    results = []
72│    page = 1
73│
74│    while True:
75│        try:
76│            async with TransactionClient(timeout=timeout) as client:
77│                batch = await client.get_transactions(
78│                    customer_id,
79│                    page=page,
80│                    batch_size=BATCH_SIZE
81│                )
82│                if not batch:
83│                    break
84│
85│                results.extend(batch)
86│                page += 1
87│
88│        except TimeoutError:
89│            logger.error(
90│                f"Timeout fetching transactions for {customer_id} "
91│                f"on page {page}"
92│            )
93│            raise
94│
95│    return results

ASSISTANT: Now, let me see where fetch_customer_data is being called to understand the usage patterns:

<function=str_replace_editor>
<parameter=command>find_references</parameter>
<parameter=symbol_name>fetch_customer_data</parameter>
</function>

USER: EXECUTION RESULT of [str_replace_editor]:
References to `fetch_customer_data`:
/workspace/api/customers.py:
...⋮...
21│@router.get("/customers/{customer_id}/full")
22│async def get_full_customer_data(customer_id: str):
23│    try:
24│        data = await fetch_customer_data(customer_id)
25│        return data
26│    except TimeoutError:
27│        raise HTTPException(status_code=504, detail="Data fetch timed out")
...⋮...
/workspace/jobs/daily_report.py:
...⋮...
314│async def generate_customer_report(customer_ids: List[str]):
315│    results = []
316│    for id in customer_ids:
317│        try:
318│            data = await fetch_customer_data(id)
319│            results.append(summarize_customer_data(data))
320│        except TimeoutError:
321│            logger.warning(f"Skipping customer {id} due to timeout")
322│            continue
323│    return results
...⋮...
/workspace/tests/test_services.py:
...⋮...
60│@pytest.mark.asyncio
61│async def test_fetch_customer_data_large_history():
62│    # Test with customer having 50+ pages of transactions
63│    data = await fetch_customer_data("customer_with_large_history")
64│    assert len(data["transactions"]) > 500
...⋮...

ASSISTANT: I see the issue now. _fetch_transactions paginates through the entire transaction history, opening a new TransactionClient for every page. Each request is bounded by TRANSACTION_TIMEOUT, but nothing limits the number of pages or the total time spent across all of them, and the first request that times out re-raises TimeoutError.

Some customers might have hundreds of transaction pages, which makes the overall fetch_customer_data call take much longer than expected, and every extra page is one more chance for a single request to exceed its timeout. That would explain why we see timeouts in both the API endpoint and the daily report job.

We should probably:
1. Add a max_pages parameter to _fetch_transactions to limit total pagination
2. Add a total_timeout parameter separate from the per-request timeout
3. Consider implementing parallel page fetching for large transaction histories

Would you like me to implement any of these solutions for the issue?

... (continue the conversation)

--------------------- END OF EXAMPLE 2 ---------------------

Do NOT assume the environment is the same as in the examples above.
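The first two fixes the assistant proposes in Example 2 (a page cap, and a total timeout on top of the per-request one) can be sketched with `asyncio.wait_for`. This is an illustration under assumptions, not code from this PR: `fetch_all_pages` and the injected `fetch_page` coroutine are hypothetical stand-ins for `_fetch_transactions` and `TransactionClient.get_transactions` from the example, and parallel page fetching (the third suggestion) is not covered.

```python
import asyncio
from typing import Any, Awaitable, Callable

# Hypothetical page fetcher: takes a 1-based page number, returns a (possibly empty) batch.
FetchPage = Callable[[int], Awaitable[list[dict[str, Any]]]]


async def fetch_all_pages(
    fetch_page: FetchPage,
    *,
    per_request_timeout: float,
    total_timeout: float,
    max_pages: int,
) -> list[dict[str, Any]]:
    """Paginate until an empty batch, bounding each request, the page count, and the total time."""

    async def _paginate() -> list[dict[str, Any]]:
        results: list[dict[str, Any]] = []
        # Stops after max_pages even if more pages remain
        for page in range(1, max_pages + 1):
            # Per-request bound, playing the role of TRANSACTION_TIMEOUT in the example
            batch = await asyncio.wait_for(fetch_page(page), per_request_timeout)
            if not batch:
                break
            results.extend(batch)
        return results

    # Overall bound across all pages; raises asyncio.TimeoutError when exceeded
    # (an alias of the built-in TimeoutError on Python 3.11+)
    return await asyncio.wait_for(_paginate(), total_timeout)
```

Reaching `max_pages` silently returns a truncated list here; a real fix would probably want to log or flag that case so callers can tell partial data from complete data.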

23 changes: 14 additions & 9 deletions poetry.lock

4 changes: 3 additions & 1 deletion pyproject.toml
@@ -63,7 +63,7 @@ opentelemetry-exporter-otlp-proto-grpc = "1.25.0"
modal = "^0.64.145"
runloop-api-client = "0.7.0"
pygithub = "^2.5.0"
- openhands-aci = "^0.1.1"
+ openhands-aci = {git = "https://github.com/All-Hands-AI/openhands-aci.git", rev = "ht/jump-commands"}
Collaborator:
Don't forget to update this when `jump-commands` is merged in the ACI repo.

Collaborator:
Once this is updated and CI is fixed, I'd be happy to approve this PR.


[tool.poetry.group.llama-index.dependencies]
llama-index = "*"
@@ -95,6 +95,7 @@ reportlab = "*"
[tool.coverage.run]
concurrency = ["gevent"]


[tool.poetry.group.runtime.dependencies]
jupyterlab = "*"
notebook = "*"
@@ -125,6 +126,7 @@ ignore = ["D1"]
[tool.ruff.lint.pydocstyle]
convention = "google"


[tool.poetry.group.evaluation.dependencies]
streamlit = "*"
whatthepatch = "*"