-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
release-24.3: ui, apiutil, server: surface multiple indexes within a range in the hot ranges page #134988
release-24.3: ui, apiutil, server: surface multiple indexes within a range in the hot ranges page #134988
Conversation
Historically, to estimate which tables and indexes existed within the span of a range, we used the Collection's `GetAllDescriptors` function and returned the values which fell in between the start and end keys of the range's span. This approach had the benefit of being precise, but the drawback of being computationally expensive - since the system can have hundreds of nodes and theoretically millions of tables, using `GetAllDescriptors` begins to become a bottleneck. This approach was modified so that instead of pulling all descriptors for the system when computing range contents, the system instead used the start key of the range to identify the exact table + index at that point within the keyspace ([change](https://github.com/cockroachdb/cockroach/pull/77277/files)). This works if each table, index has their own range, but quickly breaks down if tables share a range. Consider the following layout of data: ``` T1=Table 1 T2=Table 2 R1=Range1 └─────T1─────┴─────T2────┴─────T3─────┘ └────────┴─────────R1───────┴─────────┘ ``` Since the start key of the range falls with Table 1, the system associates the range with only Table 1, despite it containing Tables 2 and 3. Using this information, it becomes necessary to identify a set of descriptors within a certain span. This PR introduces the `ScanDescriptorsInSpans` function which does just that, allows the user to specify a set of spans whose descriptors are important and then return a catalog including those descriptors. It does this by translating the span keys into descriptor span keys and scanning them from the descriptors table. For example given a span `[/Table/Users/PKEY/1, /Table/Users/SECONDARY/chicago]` where the ID for the Users table is `5`, it will generate a descriptor span `[/Table/Descriptors/PKEY/5, /Table/Descriptors/PKEY/6]`. This descriptor too comes with its drawbacks in that within the descriptor space, keys are scoped by table, and not necessarily indexes. That means in a following PR, the status server will be responsible for taking these descriptors, which include all indexes in the tables pulled, and filtering it down to only the indexes which appear in the specified range. The bulk of the changeset is in updating the datadriven tests to test this behavior, the primary area of focus for review should be the `pkg/sql/catalog/internal/catkv/catalog_reader.go` file (~75 LOC). Epic: CRDB-43151 Fixes: cockroachdb#130997 Release note: None
Previously each range correlated to a single table, or even a single index in a database, so all that was required to identify which tables, indexes were in the range were to look at the start key of the range and map it accordingly. With range coalescing however, it's possible for one, or many, tables, indexes and the like to reside within the same range. To properly identify the contents of a range, this PR adds the following utilities: 1. A utility function which turns a range into a span, and clamps it to its tenant's table space. 2. A utility function which takes the above spans and uses the catalog and new descriptor by span utility to turn those spans into a set of table descriptors ordered by id. 3. A utility function which transforms those table descriptors into a set of (database, table, index) names which deduplicate and identify each index uniquely. 4. A utility function, which merges the ranges and indexes into a map keyed by RangeID whose values are the above index names. 5. A primary entrypoint for consumers from which a set of ranges can be passed in and a mapping from those ranges to indexes can be returned. A variety of caveats come with this approach. It attempts to scan the desciptors all at once, but it still will scan a sizable portion of the descriptors table if the request is large enough. This makes no attempt to describe system information which does not have a descriptor. It will describe system tables which appear in the descriptors table, but it will not try to explain "tables" which do not have descriptors (example tsdb), or any other information stored in the keyspace without a descriptor (PseudoTableIDs, GossipKeys for example). Throughout this work, many existing utilities were duplicated, and then un-duplicated (`keys.TableDataMin`, `roachpb.Span.Overlap`, etc). If you see anything that seems to already exist, feel free to point it out accordingly. Epic: CRDB-43151 Fixes: cockroachdb#130997 Release note: None
Previously each range correlated to a single table, or even a single index in a database, so all that was required to identify which tables, indexes were in the range were to look at the start key of the range and map it accordingly. With range coalescing however, it's possible for one, or many, tables, indexes and the like to reside within the same range. To properly identify the contents of a range, this PR adds the following utilities: 1. A utility function which turns a range into a span, and clamps it to its tenant's table space. 2. A utility function which takes the above spans and uses the catalog and new descriptor by span utility to turn those spans into a set of table descriptors ordered by id. 3. A utility function which transforms those table descriptors into a set of (database, table, index) names which deduplicate and identify each index uniquely. 4. A utility function, which merges the ranges and indexes into a map keyed by RangeID whose values are the above index names. 5. A primary entrypoint for consumers from which a set of ranges can be passed in and a mapping from those ranges to indexes can be returned. A variety of cavets come with this approach. It attempts to scan the desciptors all at once, but it still will scan a sizable portion of the descriptors table if the request is large enough. This makes no attempt to describe system information which does not have a descriptor. It will describe system tables which appear in the descriptors table, but it will not try to explain "tables" which do not have descriptors (example tsdb), or any other information stored in the keyspace without a descriptor (PseudoTableIDs, GossipKeys for example). Throughout this work, many existing utilities were duplicated, and then un-duplicated (`keys.TableDataMin`, `roachpb.Span.Overlap`, etc). If you see anything that seems to already exist, feel free to point it out accordingly. Epic: none Fixes: cockroachdb#130997 Release note: None
This change is the last in a set of commits to change the hot ranges page from only showing one table, index per range to many. It builds on top of the changes in the catalog reader (10b9ee0) and the range utilities (109219d) to surface a set of tables, indexes for each range. The primary changes in this commit specifically are the modification of the status server to use the new `rangeutil` utilities, and changing the wire, presentation format of the information. Epic: CRDB-43151 Fixes: cockroachdb#130997 Release note (bug fix): changes the table, index contents of the hot ranges page in DB console.
Thanks for opening a backport. Please check the backport criteria before merging:
If your backport adds new functionality, please ensure that the following additional criteria are satisfied:
Also, please add a brief release justification to the body of your PR to justify this |
Your pull request contains more than 1000 changes. It is strongly encouraged to split big PRs into smaller chunks. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
@angles-n-daemons can you explain why the same commit appears twice in this PR? It looks like the UI PR contained a slightly different version of a commit that was already merged. That's a bit confusing here. |
I'm not entirely sure. My original thought is that this could have been caused by a rebase onto a moving head. If PR 3 branched off of 2 where a different commit hash existed due to 2's own amending, rebasing, it's possible that PR 3 believed there were multiple commits for 2 which it should include. That being said, I tried to recreate this locally and had little luck. It could also have been a merge instead of a rebase, but it's tricky to tell. I notice some code that was removed at one point is added as one of the commits. It doesn't affect any functionality, but it is strange. |
I was unable to reproduce the duplicate commit locally. Looking into the individual commits themselves, the second one seems to add in some code which was removed in an amend. It could be a strange rebasing issue where some information was removed, which created changed the commit references. |
string database_name = 5; | ||
// index_name indicates the index name for current range. | ||
// index_name has been deprecated in favor of indexes = 17; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a reason that only one field has deprecated = true
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No this was an accident. I'll make the corresponding change to master.
480af61
into
cockroachdb:release-24.3
Contained with the three linked PRs are the changes required to surface all of the contents within a range in the hot ranges DB console page.
Backport:
Please see individual PRs for details.
/cc @cockroachdb/release
Fixes: #130997
Release justification: fixes an important bug within the db console