Improve index benefit estimation with more accurate query execution (with index) cost calculations #6

YanRong-au · 2024-12-05T15:24:30Z

This pull request includes updates to two files to enhance the accuracy of index benefit estimation by improving query execution cost calculations:

bin/mindexer:
- Integrated the improved query execution cost calculation logic into the main script.
mindexer/utils/query.py:
- Added a new function to generate queries designed to determine the number of index keys that will be examined for a particular query and index configuration.


rueckstiess

Hi Yan, I left a few suggestions for improvements inline. Can you take a look?

Also, the PR includes the PDF of your thesis. I'm creating a section in the README and will link directly to the PDF in your GitHub repo, so we can remove the PDF from the PR. I'd like to avoid having large binary files in there, because everyone who installs the tool would otherwise download the PDF as well.

rueckstiess · 2024-12-09T04:53:03Z

mindexer/utils/query.py

@@ -161,6 +161,25 @@ def index_intersect(self, index):

        return query

+    def index_number_key_qurey(self, index):


Typo in the function name. Can you change it here and where this is called?

Suggested change

-    def index_number_key_qurey(self, index):
+    def index_number_key_query(self, index):

rueckstiess · 2024-12-09T05:06:31Z

mindexer/utils/query.py

+    def index_number_key_qurey(self, index):
+        """
+            return the query that can be used to determine the number of
+            index keys that need to be examined.
+        """
+        query = Query()
+
+        for field in index:
+            if field in self.filter:
+                if isinstance(self.filter[field], dict):
+                    query.add_predicate({field: self.filter[field]})
+                    break
+                else:
+                    query.add_predicate({field: self.filter[field]})
+            else:
+                break
+
+        return query
+


Can you add a few unit tests under tests/test_query.py to confirm this function does what it is supposed to?
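A self-contained sketch of tests along these lines. So that it runs without the mindexer package installed, it uses a hypothetical stand-in `Query` class (a `filter` dict plus an `add_predicate` method); its `index_number_key_query` is an equivalent restructuring of the function in the diff, with the name typo fixed. In tests/test_query.py the tests would presumably use the real `Query` from `mindexer.utils.query` instead of the stand-in.

```python
import unittest


class Query:
    """Hypothetical minimal stand-in for mindexer's Query class.

    Assumes a query is a `filter` dict and that `add_predicate` merges a
    single-field predicate into it.
    """

    def __init__(self, filter_dict=None):
        self.filter = dict(filter_dict or {})

    def add_predicate(self, predicate):
        self.filter.update(predicate)

    def index_number_key_query(self, index):
        """Same logic as the diff: walk the index fields in order, keep
        equality predicates, include the first dict (range/operator)
        predicate and stop, and stop at the first field not in the filter."""
        query = Query()
        for field in index:
            if field not in self.filter:
                break
            query.add_predicate({field: self.filter[field]})
            if isinstance(self.filter[field], dict):
                break
        return query


class TestIndexNumberKeyQuery(unittest.TestCase):
    def test_equality_prefix(self):
        q = Query({"a": 1, "b": 2, "c": 3})
        self.assertEqual(q.index_number_key_query(["a", "b"]).filter, {"a": 1, "b": 2})

    def test_stops_after_range_predicate(self):
        q = Query({"a": 1, "b": {"$gt": 5}, "c": 3})
        self.assertEqual(
            q.index_number_key_query(["a", "b", "c"]).filter,
            {"a": 1, "b": {"$gt": 5}},
        )

    def test_stops_at_missing_field(self):
        q = Query({"a": 1, "c": 3})
        self.assertEqual(q.index_number_key_query(["a", "b", "c"]).filter, {"a": 1})

    def test_leading_field_missing(self):
        q = Query({"b": 2})
        self.assertEqual(q.index_number_key_query(["a", "b"]).filter, {})
```

The cases cover a pure equality prefix, a range predicate that ends the prefix, a gap in the index fields, and a leading index field absent from the filter. They can be run with the standard `unittest` runner.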

rueckstiess · 2024-12-09T05:16:34Z

bin/mindexer

@@ -1,5 +1,6 @@
 #!/usr/bin/env python

+from ast import Not


This doesn't seem to be used; it can be removed again.

rueckstiess · 2024-12-09T05:17:45Z

bin/mindexer

-                # collscan = 1.0 relative to other costs)
-                benefit = estimator.get_cardinality() * COLLSCAN_COST - est * cost
+                # calculating index benefit by subtracting index cost from collection scan cost
+                index_cost = (IXSCAN_COST+(len(candidate)-1)*0.05) * index_key_scanned_est


What does the 0.05 factor do here? Maybe define it at the top with the other constants?

rueckstiess · 2024-12-09T05:18:58Z

bin/mindexer

-                # benefit of the index over a collection scan (assuming cost of
-                # collscan = 1.0 relative to other costs)
-                benefit = estimator.get_cardinality() * COLLSCAN_COST - est * cost
+                # calculating index benefit by subtracting index cost from collection scan cost


Can you update the comment and explain how the index_key_scanned_est factors in? Feel free to use multiple comment lines, more explicit is better :-)
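A sketch of how the cost calculation could read with the 0.05 factor pulled into a named constant and the role of the keys-examined estimate spelled out in comments. This is not the merged code: `IXSCAN_PER_EXTRA_FIELD_COST` and `index_benefit` are hypothetical names, the collection-scan and index-scan costs are passed in as arguments because their values are not shown in this thread, and reading 0.05 as a per-extra-field surcharge is an assumption based on the `(len(candidate)-1)` term.

```python
# Hypothetical name for the 0.05 factor from the diff (assumed to be an extra
# per-key cost for each field after the first in a compound index).
IXSCAN_PER_EXTRA_FIELD_COST = 0.05


def index_benefit(cardinality, index_keys_examined, num_index_fields,
                  collscan_cost, ixscan_cost):
    """Estimated benefit of an index over a collection scan (sketch)."""
    # A collection scan reads every document, so its cost scales with the
    # collection's cardinality.
    collscan_total = cardinality * collscan_cost

    # Examining one index key costs ixscan_cost, plus the assumed surcharge
    # for each additional field in a compound index.
    per_key_cost = ixscan_cost + (num_index_fields - 1) * IXSCAN_PER_EXTRA_FIELD_COST

    # The index scan only walks the keys inside the bounds set by the query's
    # index prefix (leading equality fields plus at most one range field).
    # index_keys_examined estimates that count, e.g. as the cardinality of
    # the query built by index_number_key_query, so the index cost scales
    # with it rather than with the collection size.
    index_total = per_key_cost * index_keys_examined

    # Positive result: the index is estimated to be cheaper than a collection scan.
    return collscan_total - index_total
```

For example, with a cardinality of 1000, 100 keys examined on a single-field index, and illustrative costs of 1.0 (collection scan) and 0.5 (index scan), the benefit is 1000 - 50 = 950.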

rueckstiess

Hi Yan, thanks, the code changes are perfect!

I saw you removed the thesis PDF in a commit, but the next commit added it back again. If you could remove the PDF again then we can merge the PR.

I made a separate PR to add a Contributors section to the README, where I link to your thesis in your GitHub repo, so people will still be able to find it.

rueckstiess

LGTM!

YanRong6616 and others added 4 commits December 6, 2024 02:05

Improve index benefit estimation with more accurate query execution (with index) cost calculations

3f4b91e

Honours Project Thesis

9af908d

Delete THESIS_YAN_RONG.pdf

8ae481c

Honours Project Thesis

7d0f7d1

rueckstiess requested changes Dec 9, 2024

Make changes according to PR comments

1702ff2

rueckstiess reviewed Jan 20, 2025

Remove THESIS_YAN_RONG.pdf from the project repository

925bdfa

rueckstiess approved these changes Jan 21, 2025

rueckstiess merged commit 946a999 into mongodb-labs:main Jan 21, 2025
2 checks passed
