Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

release-24.3: ui, apiutil, server: surface multiple indexes within a range in the hot ranges page #134988

Conversation

angles-n-daemons
Copy link
Contributor

Contained with the three linked PRs are the changes required to surface all of the contents within a range in the hot ranges DB console page.

Backport:

Please see individual PRs for details.

/cc @cockroachdb/release

Fixes: #130997

Release justification: fixes an important bug within the db console

Historically, to estimate which tables and indexes existed within the
span of a range, we used the Collection's `GetAllDescriptors` function
and returned the values which fell in between the start and end keys of
the range's span. This approach had the benefit of being precise, but
the drawback of being computationally expensive - since the system can
have hundreds of nodes and theoretically millions of tables, using
`GetAllDescriptors` begins to become a bottleneck.

This approach was modified so that instead of pulling all descriptors
for the system when computing range contents, the system instead used
the start key of the range to identify the exact table + index at that
point within the keyspace
([change](https://github.com/cockroachdb/cockroach/pull/77277/files)).
This works if each table, index has their own range, but quickly breaks
down if tables share a range. Consider the following layout of data:

```
T1=Table 1
T2=Table 2
R1=Range1

└─────T1─────┴─────T2────┴─────T3─────┘
└────────┴─────────R1───────┴─────────┘
```

Since the start key of the range falls with Table 1, the system
associates the range with only Table 1, despite it containing Tables 2
and 3.

Using this information, it becomes necessary to identify a set of
descriptors within a certain span. This PR introduces the
`ScanDescriptorsInSpans` function which does just that, allows the user
to specify a set of spans whose descriptors are important and then
return a catalog including those descriptors.

It does this by translating the span keys into descriptor span keys and
scanning them from the descriptors table. For example given a span
`[/Table/Users/PKEY/1, /Table/Users/SECONDARY/chicago]` where the ID for
the Users table is `5`, it will generate a descriptor span
`[/Table/Descriptors/PKEY/5, /Table/Descriptors/PKEY/6]`.

This descriptor too comes with its drawbacks in that within the
descriptor space, keys are scoped by table, and not necessarily indexes.
That means in a following PR, the status server will be responsible for
taking these descriptors, which include all indexes in the tables
pulled, and filtering it down to only the indexes which appear in the
specified range.

The bulk of the changeset is in updating the datadriven tests to test
this behavior, the primary area of focus for review should be the
`pkg/sql/catalog/internal/catkv/catalog_reader.go` file (~75 LOC).

Epic: CRDB-43151
Fixes: cockroachdb#130997

Release note: None
Previously each range correlated to a single table, or even a single
index in a database, so all that was required to identify which tables,
indexes were in the range were to look at the start key of the range and
map it accordingly.

With range coalescing however, it's possible for one, or many, tables,
indexes and the like to reside within the same range. To properly
identify the contents of a range, this PR adds the following utilities:

 1. A utility function which turns a range into a span, and clamps it
    to its tenant's table space.
 2. A utility function which takes the above spans and uses the catalog
    and new descriptor by span utility to turn those spans into a set of
    table descriptors ordered by id.
 3. A utility function which transforms those table descriptors into a
    set of (database, table, index) names which deduplicate and identify
    each index uniquely.
 4. A utility function, which merges the ranges and indexes into a map
    keyed by RangeID whose values are the above index names.
 5. A primary entrypoint for consumers from which a set of ranges can be
    passed in and a mapping from those ranges to indexes can be
    returned.

A variety of caveats come with this approach. It attempts to scan the
desciptors all at once, but it still will scan a sizable portion of the
descriptors table if the request is large enough. This makes no attempt
to describe system information which does not have a descriptor. It will
describe system tables which appear in the descriptors table, but it
will not try to explain "tables" which do not have descriptors (example
tsdb), or any other information stored in the keyspace without a
descriptor (PseudoTableIDs, GossipKeys for example).

Throughout this work, many existing utilities were duplicated, and then
un-duplicated (`keys.TableDataMin`, `roachpb.Span.Overlap`, etc). If you
see anything that seems to already exist, feel free to point it out
accordingly.

Epic: CRDB-43151
Fixes: cockroachdb#130997

Release note: None
Previously each range correlated to a single table, or even a single
index in a database, so all that was required to identify which tables,
indexes were in the range were to look at the start key of the range and
map it accordingly.

With range coalescing however, it's possible for one, or many, tables,
indexes and the like to reside within the same range. To properly
identify the contents of a range, this PR adds the following utilities:

 1. A utility function which turns a range into a span, and clamps it
		to its tenant's table space.
 2. A utility function which takes the above spans and uses the catalog
		and new descriptor by span utility to turn those spans into a set of
		table descriptors ordered by id.
 3. A utility function which transforms those table descriptors into a
		set of (database, table, index) names which deduplicate and identify
		each index uniquely.
 4. A utility function, which merges the ranges and indexes into a map
		keyed by RangeID whose values are the above index names.
 5. A primary entrypoint for consumers from which a set of ranges can be
		passed in and a mapping from those ranges to indexes can be
		returned.

A variety of cavets come with this approach. It attempts to scan the
desciptors all at once, but it still will scan a sizable portion of the
descriptors table if the request is large enough. This makes no attempt
to describe system information which does not have a descriptor. It will
describe system tables which appear in the descriptors table, but it
will not try to explain "tables" which do not have descriptors (example
tsdb), or any other information stored in the keyspace without a
descriptor (PseudoTableIDs, GossipKeys for example).

Throughout this work, many existing utilities were duplicated, and then
un-duplicated (`keys.TableDataMin`, `roachpb.Span.Overlap`, etc). If you
see anything that seems to already exist, feel free to point it out
accordingly.

Epic: none
Fixes: cockroachdb#130997

Release note: None
This change is the last in a set of commits to change the hot ranges
page from only showing one table, index per range to many. It builds on
top of the changes in the catalog reader
(10b9ee0) and the range utilities
(109219d) to surface a set of tables,
indexes for each range.

The primary changes in this commit specifically are the modification of
the status server to use the new `rangeutil` utilities, and changing the
wire, presentation format of the information.

Epic: CRDB-43151
Fixes: cockroachdb#130997

Release note (bug fix): changes the table, index contents of the hot ranges page
in DB console.
@angles-n-daemons angles-n-daemons requested review from a team as code owners November 12, 2024 18:26
@angles-n-daemons angles-n-daemons requested review from kyle-a-wong and removed request for a team November 12, 2024 18:26
Copy link

blathers-crl bot commented Nov 12, 2024

Thanks for opening a backport.

Please check the backport criteria before merging:

  • Backports should only be created for serious
    issues
    or test-only changes.
  • Backports should not break backwards-compatibility.
  • Backports should change as little code as possible.
  • Backports should not change on-disk formats or node communication protocols.
  • Backports should not add new functionality (except as defined
    here).
  • Backports must not add, edit, or otherwise modify cluster versions; or add version gates.
  • All backports must be reviewed by the owning areas TL. For more information as to how that review should be conducted, please consult the backport
    policy
    .
If your backport adds new functionality, please ensure that the following additional criteria are satisfied:
  • There is a high priority need for the functionality that cannot wait until the next release and is difficult to address in another way.
  • The new functionality is additive-only and only runs for clusters which have specifically “opted in” to it (e.g. by a cluster setting).
  • New code is protected by a conditional check that is trivial to verify and ensures that it only runs for opt-in clusters. State changes must be further protected such that nodes running old binaries will not be negatively impacted by the new state (with a mixed version test added).
  • The PM and TL on the team that owns the changed code have signed off that the change obeys the above rules.
  • Your backport must be accompanied by a post to the appropriate Slack
    channel (#db-backports-point-releases or #db-backports-XX-X-release) for awareness and discussion.

Also, please add a brief release justification to the body of your PR to justify this
backport.

@blathers-crl blathers-crl bot added the backport Label PR's that are backports to older release branches label Nov 12, 2024
Copy link

blathers-crl bot commented Nov 12, 2024

Your pull request contains more than 1000 changes. It is strongly encouraged to split big PRs into smaller chunks.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@dhartunian
Copy link
Collaborator

@angles-n-daemons can you explain why the same commit appears twice in this PR? It looks like the UI PR contained a slightly different version of a commit that was already merged. That's a bit confusing here.

@angles-n-daemons
Copy link
Contributor Author

I'm not entirely sure. My original thought is that this could have been caused by a rebase onto a moving head. If PR 3 branched off of 2 where a different commit hash existed due to 2's own amending, rebasing, it's possible that PR 3 believed there were multiple commits for 2 which it should include.

That being said, I tried to recreate this locally and had little luck. It could also have been a merge instead of a rebase, but it's tricky to tell. I notice some code that was removed at one point is added as one of the commits. It doesn't affect any functionality, but it is strange.

@rafiss rafiss removed the request for review from a team December 2, 2024 16:06
@angles-n-daemons
Copy link
Contributor Author

I was unable to reproduce the duplicate commit locally. Looking into the individual commits themselves, the second one seems to add in some code which was removed in an amend. It could be a strange rebasing issue where some information was removed, which created changed the commit references.

string database_name = 5;
// index_name indicates the index name for current range.
// index_name has been deprecated in favor of indexes = 17;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a reason that only one field has deprecated = true?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No this was an accident. I'll make the corresponding change to master.

@angles-n-daemons angles-n-daemons merged commit 480af61 into cockroachdb:release-24.3 Dec 6, 2024
20 of 21 checks passed
@angles-n-daemons angles-n-daemons deleted the backport24.3-133190-133840-134106 branch December 6, 2024 20:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport Label PR's that are backports to older release branches
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants