Replace all gpt_index with llama_index (run-llama#1875)
* use llama_index

* replace in docs

* replace all tests

* replace other packages

* replace more docs

* more docs

* update others

* update setup

* wip

* wip
Disiok authored May 2, 2023
1 parent 7a9c570 commit 1e3ce0d
Showing 441 changed files with 1,514 additions and 1,590 deletions.
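
A rename of this scale is typically scripted rather than hand-edited. As a point of reference, here is a minimal sketch of the kind of one-off script that performs it — hypothetical and not part of this commit; the root path and extension list are illustrative:

```python
import os

# Hypothetical helper: walk the repo and rewrite every occurrence of the
# old package name in text files. Not included in this commit.
ROOT = "."  # repository root; illustrative
EXTS = (".py", ".md", ".rst", ".yml", ".in", ".cff", ".txt")

for dirpath, _dirnames, filenames in os.walk(ROOT):
    if ".git" in dirpath.split(os.sep):
        continue  # never touch git internals
    for name in filenames:
        if not name.endswith(EXTS):
            continue
        path = os.path.join(dirpath, name)
        with open(path, encoding="utf-8") as f:
            text = f.read()
        if "gpt_index" in text:
            with open(path, "w", encoding="utf-8") as f:
                f.write(text.replace("gpt_index", "llama_index"))
```
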
2 changes: 1 addition & 1 deletion .github/workflows/build_package.yml
@@ -36,5 +36,5 @@ jobs:
- name: Test import
working-directory: ${{ vars.RUNNER_TEMP }}
run: |
- python -c "import gpt_index"
+ python -c "import llama_index"
2 changes: 1 addition & 1 deletion .github/workflows/dev_docs.yml
@@ -15,7 +15,7 @@ jobs:
with:
source-directory: './docs'
destination-github-username: 'avb-is-me'
- destination-repository-name: 'gpt_index'
+ destination-repository-name: 'llama_index'
user-email: github-actions[bot]@users.noreply.github.com
target-branch: main
target-directory: docs
2 changes: 1 addition & 1 deletion CITATION.cff
@@ -7,4 +7,4 @@ authors:
title: "LlamaIndex"
doi: 10.5281/zenodo.1234
date-released: 2022-11-1
url: "https://github.com/jerryjliu/gpt_index"
url: "https://github.com/jerryjliu/llama_index"
44 changes: 22 additions & 22 deletions CONTRIBUTING.md
@@ -55,7 +55,7 @@ It is responsible for splitting text (via text splitters) and explicitly modelling
**Interface**: `get_nodes_from_documents` takes a sequence of `Document` objects as input, and outputs a sequence of `Node` objects.

**Examples**:
- * [Simple Node Parser](https://github.com/jerryjliu/llama_index/blob/main/gpt_index/node_parser/simple.py)
+ * [Simple Node Parser](https://github.com/jerryjliu/llama_index/blob/main/llama_index/node_parser/simple.py)

See [the API reference](https://gpt-index.readthedocs.io/en/latest/reference/node_parser.html) for full details.
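
For a sense of how this interface is used, a minimal sketch — the module paths assume the post-rename `llama_index` layout from this commit, and the sample document is invented:

```python
from llama_index import Document
from llama_index.node_parser import SimpleNodeParser

# Any sequence of Document objects works as input.
documents = [Document("LlamaIndex parses long documents into smaller Node objects.")]

parser = SimpleNodeParser()
nodes = parser.get_nodes_from_documents(documents)
print(len(nodes))
```
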

@@ -69,8 +69,8 @@ Text splitter splits a long text `str` into smaller text `str` chunks with desired
**Interface**: `split_text` takes a `str` as input, and outputs a sequence of `str`

**Examples**:
- * [Token Text Splitter](https://github.com/jerryjliu/llama_index/blob/main/gpt_index/langchain_helpers/text_splitter.py#L23)
- * [Sentence Splitter](https://github.com/jerryjliu/llama_index/blob/main/gpt_index/langchain_helpers/text_splitter.py#L239)
+ * [Token Text Splitter](https://github.com/jerryjliu/llama_index/blob/main/llama_index/langchain_helpers/text_splitter.py#L23)
+ * [Sentence Splitter](https://github.com/jerryjliu/llama_index/blob/main/llama_index/langchain_helpers/text_splitter.py#L239)
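
A matching sketch for `split_text` (default chunk settings assumed; the input text is invented):

```python
from llama_index.langchain_helpers.text_splitter import SentenceSplitter

splitter = SentenceSplitter()  # chunk size and overlap are configurable
chunks = splitter.split_text(
    "LlamaIndex is a data framework for LLM applications. "
    "It connects external data to language models."
)
for chunk in chunks:
    print(chunk)
```
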

---

@@ -95,9 +95,9 @@ These serve as the main data store and retrieval engine for our vector index.
* `query` retrieves top-k most similar entries given a query embedding.

**Examples**:
- * [Pinecone](https://github.com/jerryjliu/llama_index/blob/main/gpt_index/vector_stores/pinecone.py)
- * [Faiss](https://github.com/jerryjliu/llama_index/blob/main/gpt_index/vector_stores/faiss.py)
- * [Chroma](https://github.com/jerryjliu/llama_index/blob/main/gpt_index/vector_stores/chroma.py)
+ * [Pinecone](https://github.com/jerryjliu/llama_index/blob/main/llama_index/vector_stores/pinecone.py)
+ * [Faiss](https://github.com/jerryjliu/llama_index/blob/main/llama_index/vector_stores/faiss.py)
+ * [Chroma](https://github.com/jerryjliu/llama_index/blob/main/llama_index/vector_stores/chroma.py)
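
To make the `query` contract concrete, here is a deliberately naive, dependency-free illustration of top-k similarity lookup. It mirrors only the behaviour described above; it is not the actual `VectorStore` protocol under `llama_index/vector_stores/`:

```python
from typing import Dict, List, Tuple

def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(x * x for x in b) ** 0.5
    return dot / (norm_a * norm_b)

class NaiveVectorStore:
    """Toy store mapping node id -> embedding; `query` returns the top-k most similar ids."""

    def __init__(self) -> None:
        self._embeddings: Dict[str, List[float]] = {}

    def add(self, node_id: str, embedding: List[float]) -> None:
        self._embeddings[node_id] = embedding

    def query(self, query_embedding: List[float], top_k: int = 2) -> List[Tuple[str, float]]:
        scored = [(nid, cosine(query_embedding, emb)) for nid, emb in self._embeddings.items()]
        return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]

store = NaiveVectorStore()
store.add("a", [1.0, 0.0])
store.add("b", [0.0, 1.0])
print(store.query([0.9, 0.1], top_k=1))  # node "a" scores highest
```
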

**Ideas**:
* See a vector database out there that we don't support yet? Make a PR!
@@ -119,9 +119,9 @@ data if you wish.
- `retrieve` takes in a `str` or `QueryBundle` as input, and outputs a list of `NodeWithScore` objects

**Examples**:
- * [Vector Index Retriever](https://github.com/jerryjliu/llama_index/blob/main/gpt_index/indices/vector_store/retrievers.py)
- * [List Index Retriever](https://github.com/jerryjliu/llama_index/blob/main/gpt_index/indices/list/retrievers.py)
- * [Transform Retriever](https://github.com/jerryjliu/llama_index/blob/main/gpt_index/retrievers/transform_retriever.py)
+ * [Vector Index Retriever](https://github.com/jerryjliu/llama_index/blob/main/llama_index/indices/vector_store/retrievers.py)
+ * [List Index Retriever](https://github.com/jerryjliu/llama_index/blob/main/llama_index/indices/list/retrievers.py)
+ * [Transform Retriever](https://github.com/jerryjliu/llama_index/blob/main/llama_index/retrievers/transform_retriever.py)
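
A short end-to-end sketch of a retriever in use — this assumes the 0.6-era API this commit lands in, plus a configured OpenAI key for embeddings; the document text is invented:

```python
from llama_index import Document, GPTVectorStoreIndex

index = GPTVectorStoreIndex.from_documents(
    [Document("Tokyo is the capital of Japan.")]
)
retriever = index.as_retriever()

# `retrieve` accepts a plain string (or a QueryBundle) and returns NodeWithScore objects.
for node_with_score in retriever.retrieve("What is the capital of Japan?"):
    print(node_with_score.score, node_with_score.node.get_text())
```
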

**Ideas**:
* Besides the "default" retrievers built on top of each index, what about fancier retrievers? E.g. retrievers that take in other retrievers as input? Or other
@@ -141,8 +141,8 @@ They may take in other query engine classes as input too.
- `query` takes in a `str` or `QueryBundle` as input, and outputs a `Response` object.

**Examples**:
- - [Retriever Query Engine](https://github.com/jerryjliu/llama_index/blob/main/gpt_index/query_engine/retriever_query_engine.py)
- - [Transform Query Engine](https://github.com/jerryjliu/llama_index/blob/main/gpt_index/query_engine/transform_query_engine.py)
+ - [Retriever Query Engine](https://github.com/jerryjliu/llama_index/blob/main/llama_index/query_engine/retriever_query_engine.py)
+ - [Transform Query Engine](https://github.com/jerryjliu/llama_index/blob/main/llama_index/query_engine/transform_query_engine.py)
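
The analogous sketch for a query engine, under the same assumptions as the retriever example above:

```python
from llama_index import Document, GPTVectorStoreIndex

index = GPTVectorStoreIndex.from_documents(
    [Document("Tokyo is the capital of Japan.")]
)
query_engine = index.as_query_engine()

response = query_engine.query("What is the capital of Japan?")  # str or QueryBundle
print(response)               # the synthesized answer
print(response.source_nodes)  # the retrieved nodes behind it
```
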

---

@@ -153,8 +153,8 @@ This can be interpreted as a pre-processing stage, before the core index query logic
**Interface**: `run` takes in a `str` or `QueryBundle` as input, and outputs a transformed `QueryBundle`.

**Examples**:
- * [Hypothetical Document Embeddings](https://github.com/jerryjliu/llama_index/blob/main/gpt_index/indices/query/query_transform/base.py#L77)
- * [Query Decompose](https://github.com/jerryjliu/llama_index/blob/main/gpt_index/indices/query/query_transform/base.py#L124)
+ * [Hypothetical Document Embeddings](https://github.com/jerryjliu/llama_index/blob/main/llama_index/indices/query/query_transform/base.py#L77)
+ * [Query Decompose](https://github.com/jerryjliu/llama_index/blob/main/llama_index/indices/query/query_transform/base.py#L124)

See [guide](https://gpt-index.readthedocs.io/en/latest/how_to/query/query_transformations.html#hyde-hypothetical-document-embeddings) for more information.
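
A sketch of wiring a transform into a query engine, using the HyDE transform linked above (import paths follow this commit's layout; the sample document is invented):

```python
from llama_index import Document, GPTVectorStoreIndex
from llama_index.indices.query.query_transform.base import HyDEQueryTransform
from llama_index.query_engine.transform_query_engine import TransformQueryEngine

index = GPTVectorStoreIndex.from_documents(
    [Document("The author studied art after college.")]
)

# HyDE first generates a hypothetical answer document and embeds that,
# rather than embedding the raw question directly.
hyde = HyDEQueryTransform(include_original=True)
query_engine = TransformQueryEngine(index.as_query_engine(), query_transform=hyde)
print(query_engine.query("What did the author do after college?"))
```
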

@@ -165,7 +165,7 @@ A token usage optimizer refines the retrieved `Nodes` to reduce token usage during
**Interface**: `optimize` takes in the `QueryBundle` and a text chunk `str`, and outputs a refined text chunk `str` that yields a more optimized response

**Examples**:
- * [Sentence Embedding Optimizer](https://github.com/jerryjliu/llama_index/blob/main/gpt_index/optimization/optimizer.py)
+ * [Sentence Embedding Optimizer](https://github.com/jerryjliu/llama_index/blob/main/llama_index/optimization/optimizer.py)
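
A usage sketch, with the caveats that `optimize` calls an embedding model under the hood (so an API key is assumed) and the `percentile_cutoff` value is purely illustrative:

```python
from llama_index.indices.query.schema import QueryBundle
from llama_index.optimization.optimizer import SentenceEmbeddingOptimizer

# Keep roughly the most query-relevant half of the sentences in the chunk.
optimizer = SentenceEmbeddingOptimizer(percentile_cutoff=0.5)
shortened = optimizer.optimize(
    QueryBundle("What is the capital of Japan?"),
    "Tokyo is the capital of Japan. Here is some unrelated trivia about llamas.",
)
print(shortened)
```
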

---
#### Node Postprocessors
@@ -175,9 +175,9 @@ A node postprocessor refines a list of retrieved nodes given configuration and context


**Examples**:
- * [Keyword Postprocessor](https://github.com/jerryjliu/llama_index/blob/main/gpt_index/indices/postprocessor/node.py#L32): filters nodes based on keyword match
- * [Similarity Postprocessor](https://github.com/jerryjliu/llama_index/blob/main/gpt_index/indices/postprocessor/node.py#L62): filters nodes based on similarity threshold
- * [Prev Next Postprocessor](https://github.com/jerryjliu/llama_index/blob/main/gpt_index/indices/postprocessor/node.py#L135): fetches additional nodes to augment context based on node relationships.
+ * [Keyword Postprocessor](https://github.com/jerryjliu/llama_index/blob/main/llama_index/indices/postprocessor/node.py#L32): filters nodes based on keyword match
+ * [Similarity Postprocessor](https://github.com/jerryjliu/llama_index/blob/main/llama_index/indices/postprocessor/node.py#L62): filters nodes based on similarity threshold
+ * [Prev Next Postprocessor](https://github.com/jerryjliu/llama_index/blob/main/llama_index/indices/postprocessor/node.py#L135): fetches additional nodes to augment context based on node relationships.
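
A sketch of attaching a postprocessor at query time — this assumes `as_query_engine` forwards `node_postprocessors` as in the 0.6-era API, and the cutoff value is illustrative:

```python
from llama_index import Document, GPTVectorStoreIndex
from llama_index.indices.postprocessor.node import SimilarityPostprocessor

index = GPTVectorStoreIndex.from_documents(
    [Document("Berlin is the capital of Germany.")]
)

# Drop any retrieved node scoring below the cutoff before response synthesis.
query_engine = index.as_query_engine(
    node_postprocessors=[SimilarityPostprocessor(similarity_cutoff=0.75)]
)
print(query_engine.query("What is the capital of Germany?"))
```
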

---
#### Output Parsers
@@ -188,15 +188,15 @@ An output parser enables us to extract structured output from the plain text output
* `parse`: takes a `str` (from LLM response) as input, and gives a parsed structured output (optionally also validated, error-corrected).

**Examples**:
- * [Guardrails Output Parser](https://github.com/jerryjliu/llama_index/blob/main/gpt_index/output_parsers/guardrails.py)
- * [Langchain Output Parser](https://github.com/jerryjliu/llama_index/blob/main/gpt_index/output_parsers/langchain.py)
+ * [Guardrails Output Parser](https://github.com/jerryjliu/llama_index/blob/main/llama_index/output_parsers/guardrails.py)
+ * [Langchain Output Parser](https://github.com/jerryjliu/llama_index/blob/main/llama_index/output_parsers/langchain.py)

See [guide](https://gpt-index.readthedocs.io/en/latest/how_to/output_parsing.html) for more information.
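
To illustrate the two-method contract, a toy hand-rolled parser — not one of the linked integrations, which layer validation and error correction on top:

```python
import json
from typing import Any

class ToyJSONOutputParser:
    """Minimal output parser: `format` appends formatting instructions to the
    prompt, and `parse` extracts a JSON object from the LLM's raw text."""

    def format(self, prompt_template: str) -> str:
        return prompt_template + "\n\nReturn your answer as a JSON object."

    def parse(self, llm_output: str) -> Any:
        # Tolerate prose around the payload by slicing to the outermost braces.
        start, end = llm_output.find("{"), llm_output.rfind("}") + 1
        return json.loads(llm_output[start:end])

parser = ToyJSONOutputParser()
print(parser.format("Summarize the document."))
print(parser.parse('Sure! {"summary": "A short summary."}'))
```
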

---

### 2. 🐛 Fix Bugs
- Most bugs are reported and tracked in the [Github Issues Page](https://github.com/jerryjliu/gpt_index/issues).
+ Most bugs are reported and tracked in the [Github Issues Page](https://github.com/jerryjliu/llama_index/issues).
We try our best to triage and tag these issues:
* Issues tagged as `bug` are confirmed bugs.
* New contributors may want to start with issues tagged with `good first issue`.
@@ -222,7 +222,7 @@ We would love your help in making the project cleaner, more robust, and more understandable
LlamaIndex is a Python package. We've tested primarily with Python versions >= 3.8. Here's a quick
and dirty guide to getting your environment set up.

- First, create a fork of LlamaIndex by clicking the "Fork" button on the [LlamaIndex Github page](https://github.com/jerryjliu/gpt_index).
+ First, create a fork of LlamaIndex by clicking the "Fork" button on the [LlamaIndex Github page](https://github.com/jerryjliu/llama_index).
Follow [these steps](https://docs.github.com/en/get-started/quickstart/fork-a-repo) for more details
on how to fork the repo and clone the forked repo.

@@ -284,7 +284,7 @@ pytest tests
For changes that involve entirely new features, it may be worth adding an example Jupyter notebook to showcase
this feature.

- Example notebooks can be found in this folder: https://github.com/jerryjliu/gpt_index/tree/main/examples.
+ Example notebooks can be found in this folder: https://github.com/jerryjliu/llama_index/tree/main/examples.


### Creating a pull request
4 changes: 2 additions & 2 deletions MANIFEST.in
@@ -1,3 +1,3 @@
- include gpt_index/py.typed
- include gpt_index/VERSION
+ include llama_index/py.typed
+ include llama_index/VERSION
include LICENSE
3 changes: 0 additions & 3 deletions MANIFEST_llama.in

This file was deleted.

2 changes: 1 addition & 1 deletion Makefile
@@ -15,4 +15,4 @@ test:

# Docs
watch-docs: ## Build and watch documentation
- sphinx-autobuild docs/ docs/_build/html --open-browser --watch $(GIT_ROOT)/gpt_index/
+ sphinx-autobuild docs/ docs/_build/html --open-browser --watch $(GIT_ROOT)/llama_index/
4 changes: 2 additions & 2 deletions benchmarks/struct_indices/spider/evaluate.py
@@ -8,11 +8,11 @@

from langchain.chat_models import ChatOpenAI
from langchain.schema import HumanMessage
- from gpt_index.response.schema import Response
+ from llama_index.response.schema import Response
from spider_utils import create_indexes, load_examples
from tqdm import tqdm

- from gpt_index.indices.struct_store.sql import GPTSQLStructStoreIndex, SQLQueryMode
+ from llama_index.indices.struct_store.sql import GPTSQLStructStoreIndex, SQLQueryMode

logging.getLogger("root").setLevel(logging.WARNING)

2 changes: 1 addition & 1 deletion benchmarks/struct_indices/spider/generate_sql.py
@@ -11,7 +11,7 @@
from sqlalchemy import create_engine, text
from tqdm import tqdm

- from gpt_index import GPTSQLStructStoreIndex, LLMPredictor, SQLDatabase
+ from llama_index import GPTSQLStructStoreIndex, LLMPredictor, SQLDatabase
from typing import Any, cast

logging.getLogger("root").setLevel(logging.WARNING)
2 changes: 1 addition & 1 deletion benchmarks/struct_indices/spider/spider_utils.py
@@ -8,7 +8,7 @@
from langchain.chat_models import ChatOpenAI
from sqlalchemy import create_engine, text

- from gpt_index import GPTSQLStructStoreIndex, LLMPredictor, SQLDatabase
+ from llama_index import GPTSQLStructStoreIndex, LLMPredictor, SQLDatabase


def load_examples(spider_dir: str) -> Tuple[list, list]:
2 changes: 1 addition & 1 deletion docs/conf.py
@@ -17,7 +17,7 @@

sys.path.insert(0, os.path.abspath("../"))

with open("../gpt_index/VERSION") as f:
with open("../llama_index/VERSION") as f:
version = f.read()

# -- Project information -----------------------------------------------------
2 changes: 1 addition & 1 deletion docs/getting_started/installation.md
@@ -8,7 +8,7 @@ pip install llama-index
```

### Installation from Source
- Git clone this repository: `git clone git@github.com:jerryjliu/gpt_index.git`. Then do:
+ Git clone this repository: `git clone git@github.com:jerryjliu/llama_index.git`. Then do:

- `pip install -e .` if you want to do an editable install (you can modify source files) of just the package itself.
- `pip install -r requirements.txt` if you want to install optional dependencies + dependencies used for development (e.g. unit testing).
6 changes: 3 additions & 3 deletions docs/getting_started/starter_example.md
@@ -8,18 +8,18 @@ LlamaIndex examples can be found in the `examples` folder of the LlamaIndex repo
We first want to download this `examples` folder. An easy way to do this is to just clone the repo:

```bash
- $ git clone https://github.com/jerryjliu/gpt_index.git
+ $ git clone https://github.com/jerryjliu/llama_index.git
```

Next, navigate to your newly-cloned repository, and verify the contents:

```bash
- $ cd gpt_index
+ $ cd llama_index
$ ls
LICENSE data_requirements.txt tests/
MANIFEST.in examples/ pyproject.toml
Makefile experimental/ requirements.txt
- README.md gpt_index/ setup.py
+ README.md llama_index/ setup.py
```

We now want to navigate to the following folder:
2 changes: 1 addition & 1 deletion docs/guides/notebooks.rst
@@ -3,4 +3,4 @@ Notebooks

We offer a wide variety of example notebooks. They are referenced throughout the documentation.

- Example notebooks are found `here <https://github.com/jerryjliu/gpt_index/tree/main/examples>`_.
+ Example notebooks are found `here <https://github.com/jerryjliu/llama_index/tree/main/examples>`_.
2 changes: 1 addition & 1 deletion docs/guides/tutorials/building_a_chatbot.md
@@ -150,7 +150,7 @@ decompose_transform = DecomposeQueryTransform(
)

# define custom retrievers
- from gpt_index.query_engine.transform_query_engine import TransformQueryEngine
+ from llama_index.query_engine.transform_query_engine import TransformQueryEngine

custom_query_engines = {}
for index in index_set.values():
4 changes: 2 additions & 2 deletions docs/guides/tutorials/sql_guide.md
@@ -230,8 +230,8 @@ stores the context on the generated context container.
You can then build the context container, and pass it to the index during query-time!

```python
- from gpt_index import GPTSQLStructStoreIndex, SQLDatabase, GPTVectorStoreIndex
- from gpt_index.indices.struct_store import SQLContextContainerBuilder
+ from llama_index import GPTSQLStructStoreIndex, SQLDatabase, GPTVectorStoreIndex
+ from llama_index.indices.struct_store import SQLContextContainerBuilder

sql_database = SQLDatabase(engine)
# build a vector index from the table schema information
2 changes: 1 addition & 1 deletion docs/guides/tutorials/terms_definitions_tutorial.md
@@ -300,7 +300,7 @@ If you play around with the app a bit now, you might notice that it stopped following

This is due to the concept of "refining" answers in Llama Index. Since we are querying across the top 5 matching results, sometimes all the results do not fit in a single prompt! OpenAI models typically have a max input size of 4097 tokens. So, Llama Index accounts for this by breaking up the matching results into chunks that will fit into the prompt. After Llama Index gets an initial answer from the first API call, it sends the next chunk to the API, along with the previous answer, and asks the model to refine that answer.

- So, the refine process seems to be messing with our results! Rather than appending extra instructions to the `query_str`, remove that, and Llama Index will let us provide our own custom prompts! Let's create those now, using the [default prompts](https://github.com/jerryjliu/llama_index/blob/main/gpt_index/prompts/default_prompts.py) and [chat specific prompts](https://github.com/jerryjliu/llama_index/blob/main/gpt_index/prompts/chat_prompts.py) as a guide. Using a new file `constants.py`, let's create some new query templates:
+ So, the refine process seems to be messing with our results! Rather than appending extra instructions to the `query_str`, remove that, and Llama Index will let us provide our own custom prompts! Let's create those now, using the [default prompts](https://github.com/jerryjliu/llama_index/blob/main/llama_index/prompts/default_prompts.py) and [chat specific prompts](https://github.com/jerryjliu/llama_index/blob/main/llama_index/prompts/chat_prompts.py) as a guide. Using a new file `constants.py`, let's create some new query templates:

```python
from langchain.chains.prompt_selector import ConditionalPromptSelector, is_chat_model
14 changes: 7 additions & 7 deletions docs/guides/tutorials/unified_query.md
@@ -67,7 +67,7 @@ that solves a distinct use case.
We will first define a vector index over the documents of each city.

```python
- from gpt_index import GPTVectorStoreIndex, ServiceContext, StorageContext
+ from llama_index import GPTVectorStoreIndex, ServiceContext, StorageContext
from langchain.llms.openai import OpenAIChat

# set service context
@@ -127,7 +127,7 @@ Next, we compose a keyword table on top of these vector indexes, with these indexes


```python
- from gpt_index.indices.composability import ComposableGraph
+ from llama_index.indices.composability import ComposableGraph

graph = ComposableGraph.from_indices(
GPTSimpleKeywordTableIndex,
@@ -152,13 +152,13 @@ An example is shown below.

```python
# define decompose_transform
- from gpt_index.indices.query.query_transform.base import DecomposeQueryTransform
+ from llama_index.indices.query.query_transform.base import DecomposeQueryTransform
decompose_transform = DecomposeQueryTransform(
llm_predictor_chatgpt, verbose=True
)

# define custom query engines
- from gpt_index.query_engine.transform_query_engine import TransformQueryEngine
+ from llama_index.query_engine.transform_query_engine import TransformQueryEngine
custom_query_engines = {}
for index in vector_indices.values():
query_engine = index.as_query_engine(service_context=service_context)
@@ -204,7 +204,7 @@ First, we define the query engines for the set of indexes/graph that we want to


```python
- from gpt_index.tools.query_engine import QueryEngineTool
+ from llama_index.tools.query_engine import QueryEngineTool

query_engine_tools = []

@@ -231,8 +231,8 @@ Now, we can define the routing logic and overall router query engine.
Here, we use the `LLMSingleSelector`, which uses an LLM to choose an underlying query engine to route the query to.

```python
- from gpt_index.query_engine.router_query_engine import RouterQueryEngine
- from gpt_index.selectors.llm_selectors import LLMSingleSelector
+ from llama_index.query_engine.router_query_engine import RouterQueryEngine
+ from llama_index.selectors.llm_selectors import LLMSingleSelector


router_query_engine = RouterQueryEngine(
2 changes: 1 addition & 1 deletion docs/how_to/analysis/cost_analysis.md
@@ -116,4 +116,4 @@ response = query_engine.query(
```


- [Here is an example notebook](https://github.com/jerryjliu/gpt_index/blob/main/examples/cost_analysis/TokenPredictor.ipynb).
+ [Here is an example notebook](https://github.com/jerryjliu/llama_index/blob/main/examples/cost_analysis/TokenPredictor.ipynb).
2 changes: 1 addition & 1 deletion docs/how_to/customization/custom_llms.md
@@ -197,4 +197,4 @@ Using this method, you can use any LLM. Maybe you have one running locally, or r

Note that you may have to adjust the internal prompts to get good performance. Even then, you should be using a sufficiently large LLM to ensure it's capable of handling the complex queries that LlamaIndex uses internally, so your mileage may vary.

- A list of all default internal prompts is available [here](https://github.com/jerryjliu/llama_index/blob/main/gpt_index/prompts/default_prompts.py), and chat-specific prompts are listed [here](https://github.com/jerryjliu/llama_index/blob/main/gpt_index/prompts/chat_prompts.py). You can also implement your own custom prompts, as described [here](https://gpt-index.readthedocs.io/en/latest/how_to/customization/custom_prompts.html).
+ A list of all default internal prompts is available [here](https://github.com/jerryjliu/llama_index/blob/main/llama_index/prompts/default_prompts.py), and chat-specific prompts are listed [here](https://github.com/jerryjliu/llama_index/blob/main/llama_index/prompts/chat_prompts.py). You can also implement your own custom prompts, as described [here](https://gpt-index.readthedocs.io/en/latest/how_to/customization/custom_prompts.html).