Replace all gpt_index with llama_index (run-llama#1875)
* use llama_index

* replace in docs

* replace all tests

* replace other packages

* replace more docs

* more docs

* update others

* update setup

* wip

* wip
Disiok authored May 2, 2023
1 parent 7a9c570 commit 1e3ce0d
Showing 441 changed files with 1,514 additions and 1,590 deletions.
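
A rename of this scale is typically scripted rather than hand-edited. As a point of reference, here is a minimal sketch of the kind of one-off script that performs it — hypothetical and not part of this commit; the root path and extension list are illustrative:

```python
import os

# Hypothetical helper: walk the repo and rewrite every occurrence of the
# old package name in text files. Not included in this commit.
ROOT = "."  # repository root; illustrative
EXTS = (".py", ".md", ".rst", ".yml", ".in", ".cff", ".txt")

for dirpath, _dirnames, filenames in os.walk(ROOT):
    if ".git" in dirpath.split(os.sep):
        continue  # never touch git internals
    for name in filenames:
        if not name.endswith(EXTS):
            continue
        path = os.path.join(dirpath, name)
        with open(path, encoding="utf-8") as f:
            text = f.read()
        if "gpt_index" in text:
            with open(path, "w", encoding="utf-8") as f:
                f.write(text.replace("gpt_index", "llama_index"))
```
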
2 changes: 1 addition & 1 deletion .github/workflows/build_package.yml
@@ -36,5 +36,5 @@ jobs:
- name: Test import
working-directory: ${{ vars.RUNNER_TEMP }}
run: |
- python -c "import gpt_index"
+ python -c "import llama_index"
2 changes: 1 addition & 1 deletion .github/workflows/dev_docs.yml
@@ -15,7 +15,7 @@ jobs:
with:
source-directory: './docs'
destination-github-username: 'avb-is-me'
- destination-repository-name: 'gpt_index'
+ destination-repository-name: 'llama_index'
user-email: github-actions[bot]@users.noreply.github.com
target-branch: main
target-directory: docs
2 changes: 1 addition & 1 deletion CITATION.cff
@@ -7,4 +7,4 @@ authors:
title: "LlamaIndex"
doi: 10.5281/zenodo.1234
date-released: 2022-11-1
url: "https://github.com/jerryjliu/gpt_index"
url: "https://github.com/jerryjliu/llama_index"
44 changes: 22 additions & 22 deletions CONTRIBUTING.md
@@ -55,7 +55,7 @@ It is responsible for splitting text (via text splitters) and explicitly modelling
**Interface**: `get_nodes_from_documents` takes a sequence of `Document` objects as input, and outputs a sequence of `Node` objects.

**Examples**:
- * [Simple Node Parser](https://github.com/jerryjliu/llama_index/blob/main/gpt_index/node_parser/simple.py)
+ * [Simple Node Parser](https://github.com/jerryjliu/llama_index/blob/main/llama_index/node_parser/simple.py)

See [the API reference](https://gpt-index.readthedocs.io/en/latest/reference/node_parser.html) for full details.
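
For a sense of how this interface is used, a minimal sketch — the module paths assume the post-rename `llama_index` layout from this commit, and the sample document is invented:

```python
from llama_index import Document
from llama_index.node_parser import SimpleNodeParser

# Any sequence of Document objects works as input.
documents = [Document("LlamaIndex parses long documents into smaller Node objects.")]

parser = SimpleNodeParser()
nodes = parser.get_nodes_from_documents(documents)
print(len(nodes))
```
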

@@ -69,8 +69,8 @@ Text splitter splits a long text `str` into smaller text `str` chunks with desired
**Interface**: `split_text` takes a `str` as input, and outputs a sequence of `str`

**Examples**:
- * [Token Text Splitter](https://github.com/jerryjliu/llama_index/blob/main/gpt_index/langchain_helpers/text_splitter.py#L23)
- * [Sentence Splitter](https://github.com/jerryjliu/llama_index/blob/main/gpt_index/langchain_helpers/text_splitter.py#L239)
+ * [Token Text Splitter](https://github.com/jerryjliu/llama_index/blob/main/llama_index/langchain_helpers/text_splitter.py#L23)
+ * [Sentence Splitter](https://github.com/jerryjliu/llama_index/blob/main/llama_index/langchain_helpers/text_splitter.py#L239)
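
A matching sketch for `split_text` (default chunk settings assumed; the input text is invented):

```python
from llama_index.langchain_helpers.text_splitter import SentenceSplitter

splitter = SentenceSplitter()  # chunk size and overlap are configurable
chunks = splitter.split_text(
    "LlamaIndex is a data framework for LLM applications. "
    "It connects external data to language models."
)
for chunk in chunks:
    print(chunk)
```
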

---

@@ -95,9 +95,9 @@ These serve as the main data store and retrieval engine for our vector index.
* `query` retrieves top-k most similar entries given a query embedding.

**Examples**:
- * [Pinecone](https://github.com/jerryjliu/llama_index/blob/main/gpt_index/vector_stores/pinecone.py)
- * [Faiss](https://github.com/jerryjliu/llama_index/blob/main/gpt_index/vector_stores/faiss.py)
- * [Chroma](https://github.com/jerryjliu/llama_index/blob/main/gpt_index/vector_stores/chroma.py)
+ * [Pinecone](https://github.com/jerryjliu/llama_index/blob/main/llama_index/vector_stores/pinecone.py)
+ * [Faiss](https://github.com/jerryjliu/llama_index/blob/main/llama_index/vector_stores/faiss.py)
+ * [Chroma](https://github.com/jerryjliu/llama_index/blob/main/llama_index/vector_stores/chroma.py)
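
To make the `query` contract concrete, here is a deliberately naive, dependency-free illustration of top-k similarity lookup. It mirrors only the behaviour described above; it is not the actual `VectorStore` protocol under `llama_index/vector_stores/`:

```python
from typing import Dict, List, Tuple

def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(x * x for x in b) ** 0.5
    return dot / (norm_a * norm_b)

class NaiveVectorStore:
    """Toy store mapping node id -> embedding; `query` returns the top-k most similar ids."""

    def __init__(self) -> None:
        self._embeddings: Dict[str, List[float]] = {}

    def add(self, node_id: str, embedding: List[float]) -> None:
        self._embeddings[node_id] = embedding

    def query(self, query_embedding: List[float], top_k: int = 2) -> List[Tuple[str, float]]:
        scored = [(nid, cosine(query_embedding, emb)) for nid, emb in self._embeddings.items()]
        return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]

store = NaiveVectorStore()
store.add("a", [1.0, 0.0])
store.add("b", [0.0, 1.0])
print(store.query([0.9, 0.1], top_k=1))  # node "a" scores highest
```
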

**Ideas**:
* See a vector database out there that we don't support yet? Make a PR!
@@ -119,9 +119,9 @@ data if you wish.
- `retrieve` takes in a `str` or `QueryBundle` as input, and outputs a list of `NodeWithScore` objects

**Examples**:
- * [Vector Index Retriever](https://github.com/jerryjliu/llama_index/blob/main/gpt_index/indices/vector_store/retrievers.py)
- * [List Index Retriever](https://github.com/jerryjliu/llama_index/blob/main/gpt_index/indices/list/retrievers.py)
- * [Transform Retriever](https://github.com/jerryjliu/llama_index/blob/main/gpt_index/retrievers/transform_retriever.py)
+ * [Vector Index Retriever](https://github.com/jerryjliu/llama_index/blob/main/llama_index/indices/vector_store/retrievers.py)
+ * [List Index Retriever](https://github.com/jerryjliu/llama_index/blob/main/llama_index/indices/list/retrievers.py)
+ * [Transform Retriever](https://github.com/jerryjliu/llama_index/blob/main/llama_index/retrievers/transform_retriever.py)
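
A short end-to-end sketch of a retriever in use — this assumes the 0.6-era API this commit lands in, plus a configured OpenAI key for embeddings; the document text is invented:

```python
from llama_index import Document, GPTVectorStoreIndex

index = GPTVectorStoreIndex.from_documents(
    [Document("Tokyo is the capital of Japan.")]
)
retriever = index.as_retriever()

# `retrieve` accepts a plain string (or a QueryBundle) and returns NodeWithScore objects.
for node_with_score in retriever.retrieve("What is the capital of Japan?"):
    print(node_with_score.score, node_with_score.node.get_text())
```
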

**Ideas**:
* Besides the "default" retrievers built on top of each index, what about fancier retrievers? E.g. retrievers that take in other retrievers as input? Or other
@@ -141,8 +141,8 @@ They may take in other query engine classes as input too.
- `query` takes in a `str` or `QueryBundle` as input, and outputs a `Response` object.

**Examples**:
- - [Retriever Query Engine](https://github.com/jerryjliu/llama_index/blob/main/gpt_index/query_engine/retriever_query_engine.py)
- - [Transform Query Engine](https://github.com/jerryjliu/llama_index/blob/main/gpt_index/query_engine/transform_query_engine.py)
+ - [Retriever Query Engine](https://github.com/jerryjliu/llama_index/blob/main/llama_index/query_engine/retriever_query_engine.py)
+ - [Transform Query Engine](https://github.com/jerryjliu/llama_index/blob/main/llama_index/query_engine/transform_query_engine.py)
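
The analogous sketch for a query engine, under the same assumptions as the retriever example above:

```python
from llama_index import Document, GPTVectorStoreIndex

index = GPTVectorStoreIndex.from_documents(
    [Document("Tokyo is the capital of Japan.")]
)
query_engine = index.as_query_engine()

response = query_engine.query("What is the capital of Japan?")  # str or QueryBundle
print(response)               # the synthesized answer
print(response.source_nodes)  # the retrieved nodes behind it
```
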

---

@@ -153,8 +153,8 @@ This can be interpreted as a pre-processing stage, before the core index query logic
**Interface**: `run` takes in a `str` or `QueryBundle` as input, and outputs a transformed `QueryBundle`.

**Examples**:
- * [Hypothetical Document Embeddings](https://github.com/jerryjliu/llama_index/blob/main/gpt_index/indices/query/query_transform/base.py#L77)
- * [Query Decompose](https://github.com/jerryjliu/llama_index/blob/main/gpt_index/indices/query/query_transform/base.py#L124)
+ * [Hypothetical Document Embeddings](https://github.com/jerryjliu/llama_index/blob/main/llama_index/indices/query/query_transform/base.py#L77)
+ * [Query Decompose](https://github.com/jerryjliu/llama_index/blob/main/llama_index/indices/query/query_transform/base.py#L124)

See [guide](https://gpt-index.readthedocs.io/en/latest/how_to/query/query_transformations.html#hyde-hypothetical-document-embeddings) for more information.
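
A sketch of wiring a transform into a query engine, using the HyDE transform linked above (import paths follow this commit's layout; the sample document is invented):

```python
from llama_index import Document, GPTVectorStoreIndex
from llama_index.indices.query.query_transform.base import HyDEQueryTransform
from llama_index.query_engine.transform_query_engine import TransformQueryEngine

index = GPTVectorStoreIndex.from_documents(
    [Document("The author studied art after college.")]
)

# HyDE first generates a hypothetical answer document and embeds that,
# rather than embedding the raw question directly.
hyde = HyDEQueryTransform(include_original=True)
query_engine = TransformQueryEngine(index.as_query_engine(), query_transform=hyde)
print(query_engine.query("What did the author do after college?"))
```
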

@@ -165,7 +165,7 @@ A token usage optimizer refines the retrieved `Nodes` to reduce token usage during
**Interface**: `optimize` takes in the `QueryBundle` and a text chunk `str`, and outputs a refined text chunk `str` that yields a more optimized response

**Examples**:
- * [Sentence Embedding Optimizer](https://github.com/jerryjliu/llama_index/blob/main/gpt_index/optimization/optimizer.py)
+ * [Sentence Embedding Optimizer](https://github.com/jerryjliu/llama_index/blob/main/llama_index/optimization/optimizer.py)
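
A usage sketch, with the caveats that `optimize` calls an embedding model under the hood (so an API key is assumed) and the `percentile_cutoff` value is purely illustrative:

```python
from llama_index.indices.query.schema import QueryBundle
from llama_index.optimization.optimizer import SentenceEmbeddingOptimizer

# Keep roughly the most query-relevant half of the sentences in the chunk.
optimizer = SentenceEmbeddingOptimizer(percentile_cutoff=0.5)
shortened = optimizer.optimize(
    QueryBundle("What is the capital of Japan?"),
    "Tokyo is the capital of Japan. Here is some unrelated trivia about llamas.",
)
print(shortened)
```
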

---
#### Node Postprocessors
@@ -175,9 +175,9 @@ A node postprocessor refines a list of retrieved nodes given configuration and context


**Examples**:
- * [Keyword Postprocessor](https://github.com/jerryjliu/llama_index/blob/main/gpt_index/indices/postprocessor/node.py#L32): filters nodes based on keyword match
- * [Similarity Postprocessor](https://github.com/jerryjliu/llama_index/blob/main/gpt_index/indices/postprocessor/node.py#L62): filters nodes based on similarity threshold
- * [Prev Next Postprocessor](https://github.com/jerryjliu/llama_index/blob/main/gpt_index/indices/postprocessor/node.py#L135): fetches additional nodes to augment context based on node relationships.
+ * [Keyword Postprocessor](https://github.com/jerryjliu/llama_index/blob/main/llama_index/indices/postprocessor/node.py#L32): filters nodes based on keyword match
+ * [Similarity Postprocessor](https://github.com/jerryjliu/llama_index/blob/main/llama_index/indices/postprocessor/node.py#L62): filters nodes based on similarity threshold
+ * [Prev Next Postprocessor](https://github.com/jerryjliu/llama_index/blob/main/llama_index/indices/postprocessor/node.py#L135): fetches additional nodes to augment context based on node relationships.
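
A sketch of attaching a postprocessor at query time — this assumes `as_query_engine` forwards `node_postprocessors` as in the 0.6-era API, and the cutoff value is illustrative:

```python
from llama_index import Document, GPTVectorStoreIndex
from llama_index.indices.postprocessor.node import SimilarityPostprocessor

index = GPTVectorStoreIndex.from_documents(
    [Document("Berlin is the capital of Germany.")]
)

# Drop any retrieved node scoring below the cutoff before response synthesis.
query_engine = index.as_query_engine(
    node_postprocessors=[SimilarityPostprocessor(similarity_cutoff=0.75)]
)
print(query_engine.query("What is the capital of Germany?"))
```
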

---
#### Output Parsers
@@ -188,15 +188,15 @@ An output parser enables us to extract structured output from the plain text output
* `parse`: takes a `str` (from LLM response) as input, and gives a parsed structured output (optionally also validated, error-corrected).

**Examples**:
- * [Guardrails Output Parser](https://github.com/jerryjliu/llama_index/blob/main/gpt_index/output_parsers/guardrails.py)
- * [Langchain Output Parser](https://github.com/jerryjliu/llama_index/blob/main/gpt_index/output_parsers/langchain.py)
+ * [Guardrails Output Parser](https://github.com/jerryjliu/llama_index/blob/main/llama_index/output_parsers/guardrails.py)
+ * [Langchain Output Parser](https://github.com/jerryjliu/llama_index/blob/main/llama_index/output_parsers/langchain.py)

See [guide](https://gpt-index.readthedocs.io/en/latest/how_to/output_parsing.html) for more information.
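
To illustrate the two-method contract, a toy hand-rolled parser — not one of the linked integrations, which layer validation and error correction on top:

```python
import json
from typing import Any

class ToyJSONOutputParser:
    """Minimal output parser: `format` appends formatting instructions to the
    prompt, and `parse` extracts a JSON object from the LLM's raw text."""

    def format(self, prompt_template: str) -> str:
        return prompt_template + "\n\nReturn your answer as a JSON object."

    def parse(self, llm_output: str) -> Any:
        # Tolerate prose around the payload by slicing to the outermost braces.
        start, end = llm_output.find("{"), llm_output.rfind("}") + 1
        return json.loads(llm_output[start:end])

parser = ToyJSONOutputParser()
print(parser.format("Summarize the document."))
print(parser.parse('Sure! {"summary": "A short summary."}'))
```
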

---

### 2. 🐛 Fix Bugs
- Most bugs are reported and tracked in the [Github Issues Page](https://github.com/jerryjliu/gpt_index/issues).
+ Most bugs are reported and tracked in the [Github Issues Page](https://github.com/jerryjliu/llama_index/issues).
We try our best to triage and tag these issues:
* Issues tagged as `bug` are confirmed bugs.
* New contributors may want to start with issues tagged with `good first issue`.
@@ -222,7 +222,7 @@ We would love your help in making the project cleaner, more robust, and more understandable
LlamaIndex is a Python package. We've tested primarily with Python versions >= 3.8. Here's a quick
and dirty guide to getting your environment set up.

- First, create a fork of LlamaIndex by clicking the "Fork" button on the [LlamaIndex Github page](https://github.com/jerryjliu/gpt_index).
+ First, create a fork of LlamaIndex by clicking the "Fork" button on the [LlamaIndex Github page](https://github.com/jerryjliu/llama_index).
Follow [these steps](https://docs.github.com/en/get-started/quickstart/fork-a-repo) for more details
on how to fork the repo and clone the forked repo.

@@ -284,7 +284,7 @@ pytest tests
For changes that involve entirely new features, it may be worth adding an example Jupyter notebook to showcase
this feature.

- Example notebooks can be found in this folder: https://github.com/jerryjliu/gpt_index/tree/main/examples.
+ Example notebooks can be found in this folder: https://github.com/jerryjliu/llama_index/tree/main/examples.


### Creating a pull request
4 changes: 2 additions & 2 deletions MANIFEST.in
@@ -1,3 +1,3 @@
- include gpt_index/py.typed
- include gpt_index/VERSION
+ include llama_index/py.typed
+ include llama_index/VERSION
include LICENSE
3 changes: 0 additions & 3 deletions MANIFEST_llama.in

This file was deleted.

2 changes: 1 addition & 1 deletion Makefile
@@ -15,4 +15,4 @@ test:

# Docs
watch-docs: ## Build and watch documentation
- sphinx-autobuild docs/ docs/_build/html --open-browser --watch $(GIT_ROOT)/gpt_index/
+ sphinx-autobuild docs/ docs/_build/html --open-browser --watch $(GIT_ROOT)/llama_index/
4 changes: 2 additions & 2 deletions benchmarks/struct_indices/spider/evaluate.py
@@ -8,11 +8,11 @@

from langchain.chat_models import ChatOpenAI
from langchain.schema import HumanMessage
- from gpt_index.response.schema import Response
+ from llama_index.response.schema import Response
from spider_utils import create_indexes, load_examples
from tqdm import tqdm

- from gpt_index.indices.struct_store.sql import GPTSQLStructStoreIndex, SQLQueryMode
+ from llama_index.indices.struct_store.sql import GPTSQLStructStoreIndex, SQLQueryMode

logging.getLogger("root").setLevel(logging.WARNING)

2 changes: 1 addition & 1 deletion benchmarks/struct_indices/spider/generate_sql.py
@@ -11,7 +11,7 @@
from sqlalchemy import create_engine, text
from tqdm import tqdm

- from gpt_index import GPTSQLStructStoreIndex, LLMPredictor, SQLDatabase
+ from llama_index import GPTSQLStructStoreIndex, LLMPredictor, SQLDatabase
from typing import Any, cast

logging.getLogger("root").setLevel(logging.WARNING)
2 changes: 1 addition & 1 deletion benchmarks/struct_indices/spider/spider_utils.py
@@ -8,7 +8,7 @@
from langchain.chat_models import ChatOpenAI
from sqlalchemy import create_engine, text

- from gpt_index import GPTSQLStructStoreIndex, LLMPredictor, SQLDatabase
+ from llama_index import GPTSQLStructStoreIndex, LLMPredictor, SQLDatabase


def load_examples(spider_dir: str) -> Tuple[list, list]:
2 changes: 1 addition & 1 deletion docs/conf.py
@@ -17,7 +17,7 @@

sys.path.insert(0, os.path.abspath("../"))

with open("../gpt_index/VERSION") as f:
with open("../llama_index/VERSION") as f:
version = f.read()

# -- Project information -----------------------------------------------------
2 changes: 1 addition & 1 deletion docs/getting_started/installation.md
@@ -8,7 +8,7 @@ pip install llama-index
```

### Installation from Source
- Git clone this repository: `git clone git@github.com:jerryjliu/gpt_index.git`. Then do:
+ Git clone this repository: `git clone git@github.com:jerryjliu/llama_index.git`. Then do:

- `pip install -e .` if you want to do an editable install (you can modify source files) of just the package itself.
- `pip install -r requirements.txt` if you want to install optional dependencies + dependencies used for development (e.g. unit testing).
6 changes: 3 additions & 3 deletions docs/getting_started/starter_example.md
@@ -8,18 +8,18 @@ LlamaIndex examples can be found in the `examples` folder of the LlamaIndex repo
We first want to download this `examples` folder. An easy way to do this is to just clone the repo:

```bash
- $ git clone https://github.com/jerryjliu/gpt_index.git
+ $ git clone https://github.com/jerryjliu/llama_index.git
```

Next, navigate to your newly-cloned repository, and verify the contents:

```bash
- $ cd gpt_index
+ $ cd llama_index
$ ls
LICENSE data_requirements.txt tests/
MANIFEST.in examples/ pyproject.toml
Makefile experimental/ requirements.txt
- README.md gpt_index/ setup.py
+ README.md llama_index/ setup.py
```

We now want to navigate to the following folder:
2 changes: 1 addition & 1 deletion docs/guides/notebooks.rst
@@ -3,4 +3,4 @@ Notebooks

We offer a wide variety of example notebooks. They are referenced throughout the documentation.

- Example notebooks are found `here <https://github.com/jerryjliu/gpt_index/tree/main/examples>`_.
+ Example notebooks are found `here <https://github.com/jerryjliu/llama_index/tree/main/examples>`_.
2 changes: 1 addition & 1 deletion docs/guides/tutorials/building_a_chatbot.md
@@ -150,7 +150,7 @@ decompose_transform = DecomposeQueryTransform(
)

# define custom retrievers
- from gpt_index.query_engine.transform_query_engine import TransformQueryEngine
+ from llama_index.query_engine.transform_query_engine import TransformQueryEngine

custom_query_engines = {}
for index in index_set.values():
4 changes: 2 additions & 2 deletions docs/guides/tutorials/sql_guide.md
@@ -230,8 +230,8 @@ stores the context on the generated context container.
You can then build the context container, and pass it to the index during query-time!

```python
- from gpt_index import GPTSQLStructStoreIndex, SQLDatabase, GPTVectorStoreIndex
- from gpt_index.indices.struct_store import SQLContextContainerBuilder
+ from llama_index import GPTSQLStructStoreIndex, SQLDatabase, GPTVectorStoreIndex
+ from llama_index.indices.struct_store import SQLContextContainerBuilder

sql_database = SQLDatabase(engine)
# build a vector index from the table schema information
2 changes: 1 addition & 1 deletion docs/guides/tutorials/terms_definitions_tutorial.md
@@ -300,7 +300,7 @@ If you play around with the app a bit now, you might notice that it stopped following

This is due to the concept of "refining" answers in Llama Index. Since we are querying across the top 5 matching results, sometimes all the results do not fit in a single prompt! OpenAI models typically have a max input size of 4097 tokens. So, Llama Index accounts for this by breaking up the matching results into chunks that will fit into the prompt. After Llama Index gets an initial answer from the first API call, it sends the next chunk to the API, along with the previous answer, and asks the model to refine that answer.

- So, the refine process seems to be messing with our results! Rather than appending extra instructions to the `query_str`, remove that, and Llama Index will let us provide our own custom prompts! Let's create those now, using the [default prompts](https://github.com/jerryjliu/llama_index/blob/main/gpt_index/prompts/default_prompts.py) and [chat specific prompts](https://github.com/jerryjliu/llama_index/blob/main/gpt_index/prompts/chat_prompts.py) as a guide. Using a new file `constants.py`, let's create some new query templates:
+ So, the refine process seems to be messing with our results! Rather than appending extra instructions to the `query_str`, remove that, and Llama Index will let us provide our own custom prompts! Let's create those now, using the [default prompts](https://github.com/jerryjliu/llama_index/blob/main/llama_index/prompts/default_prompts.py) and [chat specific prompts](https://github.com/jerryjliu/llama_index/blob/main/llama_index/prompts/chat_prompts.py) as a guide. Using a new file `constants.py`, let's create some new query templates:

```python
from langchain.chains.prompt_selector import ConditionalPromptSelector, is_chat_model
14 changes: 7 additions & 7 deletions docs/guides/tutorials/unified_query.md
@@ -67,7 +67,7 @@ that solves a distinct use case.
We will first define a vector index over the documents of each city.

```python
- from gpt_index import GPTVectorStoreIndex, ServiceContext, StorageContext
+ from llama_index import GPTVectorStoreIndex, ServiceContext, StorageContext
from langchain.llms.openai import OpenAIChat

# set service context
@@ -127,7 +127,7 @@ Next, we compose a keyword table on top of these vector indexes, with these indexes


```python
- from gpt_index.indices.composability import ComposableGraph
+ from llama_index.indices.composability import ComposableGraph

graph = ComposableGraph.from_indices(
GPTSimpleKeywordTableIndex,
@@ -152,13 +152,13 @@ An example is shown below.

```python
# define decompose_transform
- from gpt_index.indices.query.query_transform.base import DecomposeQueryTransform
+ from llama_index.indices.query.query_transform.base import DecomposeQueryTransform
decompose_transform = DecomposeQueryTransform(
llm_predictor_chatgpt, verbose=True
)

# define custom query engines
- from gpt_index.query_engine.transform_query_engine import TransformQueryEngine
+ from llama_index.query_engine.transform_query_engine import TransformQueryEngine
custom_query_engines = {}
for index in vector_indices.values():
query_engine = index.as_query_engine(service_context=service_context)
@@ -204,7 +204,7 @@ First, we define the query engines for the set of indexes/graph that we want to


```python
- from gpt_index.tools.query_engine import QueryEngineTool
+ from llama_index.tools.query_engine import QueryEngineTool

query_engine_tools = []

@@ -231,8 +231,8 @@ Now, we can define the routing logic and overall router query engine.
Here, we use the `LLMSingleSelector`, which uses an LLM to choose an underlying query engine to route the query to.

```python
- from gpt_index.query_engine.router_query_engine import RouterQueryEngine
- from gpt_index.selectors.llm_selectors import LLMSingleSelector
+ from llama_index.query_engine.router_query_engine import RouterQueryEngine
+ from llama_index.selectors.llm_selectors import LLMSingleSelector


router_query_engine = RouterQueryEngine(
2 changes: 1 addition & 1 deletion docs/how_to/analysis/cost_analysis.md
@@ -116,4 +116,4 @@ response = query_engine.query(
```


- [Here is an example notebook](https://github.com/jerryjliu/gpt_index/blob/main/examples/cost_analysis/TokenPredictor.ipynb).
+ [Here is an example notebook](https://github.com/jerryjliu/llama_index/blob/main/examples/cost_analysis/TokenPredictor.ipynb).
2 changes: 1 addition & 1 deletion docs/how_to/customization/custom_llms.md
@@ -197,4 +197,4 @@ Using this method, you can use any LLM. Maybe you have one running locally, or r

Note that you may have to adjust the internal prompts to get good performance. Even then, you should be using a sufficiently large LLM to ensure it's capable of handling the complex queries that LlamaIndex uses internally, so your mileage may vary.

- A list of all default internal prompts is available [here](https://github.com/jerryjliu/llama_index/blob/main/gpt_index/prompts/default_prompts.py), and chat-specific prompts are listed [here](https://github.com/jerryjliu/llama_index/blob/main/gpt_index/prompts/chat_prompts.py). You can also implement your own custom prompts, as described [here](https://gpt-index.readthedocs.io/en/latest/how_to/customization/custom_prompts.html).
+ A list of all default internal prompts is available [here](https://github.com/jerryjliu/llama_index/blob/main/llama_index/prompts/default_prompts.py), and chat-specific prompts are listed [here](https://github.com/jerryjliu/llama_index/blob/main/llama_index/prompts/chat_prompts.py). You can also implement your own custom prompts, as described [here](https://gpt-index.readthedocs.io/en/latest/how_to/customization/custom_prompts.html).