-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve index benefit estimation with more accurate query execution (with index) cost calculations #6
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Yan, I left a few suggestions for improvements inline. Can you take a look?
Also the PR includes the PDF of your thesis. I'm creating a section on the README and will link directly to the PDF in your github repo, so we can remove the PDF from the PR. I'd like to avoid having large binary files in there because everyone who installs the tool would otherwise download the PDF as well.
mindexer/utils/query.py
Outdated
@@ -161,6 +161,25 @@ def index_intersect(self, index): | |||
|
|||
return query | |||
|
|||
def index_number_key_qurey(self, index): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo in the function name. Can you change it here and where this is called?
def index_number_key_qurey(self, index): | |
def index_number_key_query(self, index): |
mindexer/utils/query.py
Outdated
def index_number_key_qurey(self, index): | ||
""" | ||
return the query that can be used to determine the number of | ||
index key needs to be examined. | ||
""" | ||
query = Query() | ||
|
||
for field in index: | ||
if field in self.filter: | ||
if isinstance(self.filter[field], dict): | ||
query.add_predicate({field: self.filter[field]}) | ||
break | ||
else: | ||
query.add_predicate({field: self.filter[field]}) | ||
else: | ||
break | ||
|
||
return query | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a few unit tests under tests/test_query.py
to confirm this function does what it is supposed to?
bin/mindexer
Outdated
@@ -1,5 +1,6 @@ | |||
#!/usr/bin/env python | |||
|
|||
from ast import Not |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't seem to be used, can be removed again.
bin/mindexer
Outdated
# collscan = 1.0 relative to other costs) | ||
benefit = estimator.get_cardinality() * COLLSCAN_COST - est * cost | ||
# calculating index benefit by substract index cost from collection scan cost | ||
index_cost = (IXSCAN_COST+(len(candidate)-1)*0.05) * index_key_scanned_est |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does the 0.05 factor do here? Maybe define at the top with the other constants?
bin/mindexer
Outdated
# benefit of the index over a collection scan (assuming cost of | ||
# collscan = 1.0 relative to other costs) | ||
benefit = estimator.get_cardinality() * COLLSCAN_COST - est * cost | ||
# calculating index benefit by substract index cost from collection scan cost |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you update the comment and explain how the index_key_scanned_est factors in? Feel free to use multiple comment lines, more explicit is better :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Yan, thanks, the code changes are perfect!
I saw you removed the thesis PDF in a commit, but the next commit added it back again. If you could remove the PDF again then we can merge the PR.
I made a separate PR to add a Contributors section in the readme where I link to your thesis in your github repo, so people will still be able to find it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
This pull request includes updates to two files to enhance the accuracy of index benefit estimation by improving query execution cost calculations:
bin/mindexer
:mindexer/utils/query.py
: