Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve book library page query performance on title, titleIgnorePrefix, and addedAt sort orders. #3952

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

mikiher
Copy link
Contributor

@mikiher mikiher commented Feb 8, 2025

Brief summary

Significantly improves book library Sequelize page queries for the following sort orders:

  • title (with or without ignorePrefix)
  • addedAt

Which issue is fixed?

This partially fixes #2073 (resolving the book library load times, but not the podcast library load times)

In-depth Description

After digging more into the details of the issues people were complaining about in #2073 and doing additional performance analysis in Sequelize query to bring the page here, I made the following observations:

  1. Due the the current architecture of the database (specifically, the separate libraryItems and books tables), SQLite cannot sort the results of the main query efficiently, even when an index on the sorted book column exists (more details below)
  2. The main query performs a join with the feeds table, which somehat complicates the query (especialy if there are many feeds)
  3. A query for the total count of results is run every time, even though it is usually unnecessary as the total count doesn't change during a scrolling over the library.

1 is by far the most serious problem, and also causes significant degradation in query performance as the offset becomes larger.
When the main query is sorting by title:

SELECT book.*, libraryItem.*, [libraryItem->feeds].*
  FROM books AS book
       INNER JOIN
       libraryItems AS libraryItem ON book.id = libraryItem.mediaId AND 
                                      (libraryItem.libraryId = 'b3bf8580-edba-4d7d-90eb-3b54206d16c3' AND 
                                       libraryItem.mediaType = 'book') 
       LEFT OUTER JOIN
       feeds AS [libraryItem->feeds] ON libraryItem.id = [libraryItem->feeds].entityId AND 
                                        [libraryItem->feeds].entityType = 'libraryItem'
 ORDER BY book.title ASC
 LIMIT 0, 35;

it's evident in the query plan that the query engine cannot use the existing book.title index, and needs to build a temporary tree for sorting.

SEARCH libraryItem USING INDEX library_items_library_id_media_type_size (libraryId=? AND mediaType=?)
SEARCH book USING INDEX sqlite_autoindex_books_1 (id=?)
SCAN libraryItem->feeds LEFT-JOIN
USE TEMP B-TREE FOR ORDER BY

Even when you remove the feeds table join from the query:

SELECT book.*, libraryItem.*
  FROM books AS book
       INNER JOIN
       libraryItems AS libraryItem ON book.id = libraryItem.mediaId AND 
                                      (libraryItem.libraryId = 'b3bf8580-edba-4d7d-90eb-3b54206d16c3' AND 
                                       libraryItem.mediaType = 'book') 
 ORDER BY book.title ASC
 LIMIT 0, 35;

The query plan still doesn't make use of the book.title index:

SEARCH libraryItem USING INDEX library_items_library_id_media_type_size (libraryId=? AND mediaType=?)
SEARCH book USING INDEX sqlite_autoindex_books_1 (id=?)
USE TEMP B-TREE FOR ORDER BY

The significant boost in performance can come only if the title column is put in the libraryItems table, and an index on (libraryId, mediaType, title) is built. This way, filtering and sorting happens at the same time, and the index can be traveresed very quickly to reach the required offset without needing to look at the tables themselves.

So with a query like this:

SELECT book.*, libraryItem.*
  FROM books AS book
       INNER JOIN
       libraryItems AS libraryItem ON book.id = libraryItem.mediaId AND 
                                      (libraryItem.libraryId = 'b3bf8580-edba-4d7d-90eb-3b54206d16c3' AND 
                                       libraryItem.mediaType = 'book') 
 ORDER BY libraryItem.title ASC
 LIMIT 0, 35;

We get the following query plan:

SEARCH libraryItem USING INDEX library_items_library_id_media_type_title (libraryId=? AND mediaType=?)
SEARCH book USING INDEX sqlite_autoindex_books_1 (id=?)

Which is optimal! (or, to be more precise, optimal given the current architecture)

Resolution

The following changes were made:

  1. title and titleIgnorePrefix columns were added to libraryItems
  2. Triggers were put in place to update the new column values whenever the corresponding values in books change.
  3. The following Indices were added to libraryItems:
    • (libraryId, mediaType, title)
    • (libraryId, mediaType, titleIgnorePrefix)
    • (libraryId, mediaType, createdAt)
  4. The feeds query was separated from the main query
  5. A total count cache was introduced.
    • If the count for a given query is found in the cache, findAll is called instead of findAndCountAll
    • The cache is invalidated whenever book records are added or removed.

How have you tested this?

I tested on loading 72 consecutive page of 35 books each, on each of the above sorting orders, on an ABS docker container running on a Synology 920+ NAS (I wanted to test on a hardware that was much weaker than my dev machine).

Results

All measurements are in ms.

Summary
Overall, we see a 94-95% drop (!) in mean and median query time.
Standard deviaion also reduces drastically from ~400 to ~10.

Note how the steady rise in query time (as the the requested offset grows) which is quite visible before, is not noticable after.

Sorting by title - before:

[2025-02-08 09:16:21.962] INFO: [bound findAndCountAll] histogram values: [
   191,  261,  257,  314,  296,  328,  336,  376,  398,
   453,  476,  535,  513,  594,  585,  618,  638,  672,
   666,  700,  736,  715,  745,  757,  810,  864,  863,
   849,  897,  898,  962,  985, 1003, 1063, 1086, 1084,
  1114, 1131, 1152, 1147, 1197, 1198, 1224, 1236, 1265,
  1318, 1345, 1216, 1313, 1315, 1284, 1302, 1309, 1353,
  1381, 1360, 1348, 1445, 1322, 1423, 1390, 1426, 1473,
  1550, 1512, 1513, 1495, 1539, 1550, 1573, 1614, 1629
]
[2025-02-08 09:16:21.962] INFO: [bound findAndCountAll] histogram: Histogram {
  min: 191,
  max: 1629,
  mean: 1006.75,
  exceeds: 0,
  stddev: 410.08172457856904,
  count: 72,
  percentiles: SafeMap(9) [Map] {
    0 => 191,
    50 => 1086,
    75 => 1345,
    87.5 => 1473,
    93.75 => 1550,
    96.875 => 1573,
    98.4375 => 1614,
    99.21875 => 1629,
    100 => 1629
  }
}

Sorting by title - after:

[2025-02-08 07:58:39.193] INFO: [findAndCountAll] histogram values: [
  87, 104, 70, 70, 51, 57, 77, 49, 54, 60, 57, 58,
  60,  47, 87, 47, 57, 68, 49, 66, 50, 55, 64, 45,
  48,  54, 67, 46, 65, 53, 79, 44, 72, 75, 85, 52,
  61,  65, 56, 60, 58, 67, 70, 63, 53, 52, 50, 55,
  53,  62, 54, 58, 50, 52, 67, 53, 58, 55, 65, 50,
  56,  57, 66, 75, 48, 57, 56, 52, 49, 75, 65, 87
]
[2025-02-08 07:58:39.193] INFO: [findAndCountAll] histogram: Histogram {
  min: 44,
  max: 104,
  mean: 60.541666666666664,
  exceeds: 0,
  stddev: 11.6665922616675,
  count: 72,
  percentiles: SafeMap(9) [Map] {
    0 => 44,
    50 => 57,
    75 => 66,
    87.5 => 75,
    93.75 => 85,
    96.875 => 87,
    98.4375 => 87,
    99.21875 => 104,
    100 => 104
  }
}

Sorting by titleIgnorePrefix - before:

[2025-02-08 09:28:31.290] INFO: [bound findAndCountAll] histogram values: [
   191,  257,  286,  297,  321,  357,  382,  432,  405,
   470,  513,  516,  560,  543,  609,  639,  646,  748,
   670,  757,  819,  790,  814,  866,  899,  969,  970,
  1033, 1033, 1026, 1073, 1109, 1099, 1120, 1165, 1203,
  1184, 1317, 1280, 1270, 1305, 1355, 1336, 1329, 1367,
  1413, 1401, 1406, 1441, 1520, 1433, 1471, 1569, 1455,
  1495, 1509, 1519, 1513, 1527, 1539, 1568, 1559, 1605,
  1561, 1627, 1637, 1642, 1670, 1693, 1701, 1729, 1695
]
[2025-02-08 09:28:31.291] INFO: [bound findAndCountAll] histogram: Histogram {
  min: 191,
  max: 1729,
  mean: 1100.388888888889,
  exceeds: 0,
  stddev: 453.3125778200694,
  count: 72,
  percentiles: SafeMap(9) [Map] {
    0 => 191,
    50 => 1184,
    75 => 1509,
    87.5 => 1569,
    93.75 => 1670,
    96.875 => 1695,
    98.4375 => 1701,
    99.21875 => 1729,
    100 => 1729
  }
}

Sorting by titleIgnorePrefix - after:

[2025-02-08 08:21:02.593] INFO: [findAndCountAll] histogram values: [
   70, 83, 71, 51, 78, 77, 53, 59, 54, 49, 79, 49,
   48, 51, 64, 48, 48, 51, 54, 58, 50, 49, 59, 53,
  103, 58, 57, 59, 55, 56, 46, 49, 59, 67, 50, 53,
   87, 53, 56, 72, 57, 52, 56, 87, 66, 58, 83, 53,
   62, 59, 71, 53, 61, 57, 53, 61, 66, 49, 65, 52,
   63, 54, 58, 64, 74, 74, 70, 72, 53, 67, 88, 62
]
[2025-02-08 08:21:02.593] INFO: [findAndCountAll] histogram: Histogram {
  min: 46,
  max: 103,
  mean: 61.19444444444444,
  exceeds: 0,
  stddev: 11.67456743058851,
  count: 72,
  percentiles: SafeMap(9) [Map] {
    0 => 46,
    50 => 58,
    75 => 67,
    87.5 => 74,
    93.75 => 83,
    96.875 => 87,
    98.4375 => 88,
    99.21875 => 103,
    100 => 103
  }
}

Sorting by addedAt - before:

[2025-02-08 09:43:29.037] INFO: [bound findAndCountAll] histogram values: [
   277,  278,  305,  334,  368,  428,  464,  505,  503,
   581,  626,  636,  674,  694,  775,  771,  822,  829,
   859,  805,  858,  817,  941,  953,  985, 1021,  991,
  1029, 1111, 1094, 1131, 1130, 1169, 1216, 1225, 1166,
  1215, 1253, 1198, 1272, 1365, 1313, 1298, 1336, 1330,
  1407, 1382, 1406, 1496, 1420, 1578, 1551, 1429, 1403,
  1488, 1407, 1443, 1448, 1484, 1420, 1474, 1473, 1496,
  1478, 1476, 1518, 1514, 1520, 1568, 1577, 1554, 1579
]
[2025-02-08 09:43:29.038] INFO: [bound findAndCountAll] histogram: Histogram {
  min: 277,
  max: 1579,
  mean: 1110.2777777777778,
  exceeds: 0,
  stddev: 388.1921138868623,
  count: 72,
  percentiles: SafeMap(9) [Map] {
    0 => 277,
    50 => 1215,
    75 => 1443,
    87.5 => 1496,
    93.75 => 1554,
    96.875 => 1577,
    98.4375 => 1578,
    99.21875 => 1579,
    100 => 1579
  }
}

Sorting by addedAt - after:

[2025-02-08 08:07:23.636] INFO: [findAndCountAll] histogram values: [
  94, 83, 65, 70, 79, 51, 51, 61, 63, 63, 50, 76,
  48, 65, 59, 55, 73, 48, 63, 51, 56, 44, 59, 52,
  69, 56, 67, 55, 64, 54, 65, 54, 68, 62, 54, 72,
  53, 72, 49, 52, 69, 59, 55, 62, 56, 75, 48, 59,
  59, 50, 72, 44, 48, 47, 54, 65, 51, 65, 50, 47,
  85, 55, 49, 54, 45, 50, 49, 62, 60, 43, 53, 49
]
[2025-02-08 08:07:23.636] INFO: [findAndCountAll] histogram: Histogram {
  min: 43,
  max: 94,
  mean: 58.80555555555556,
  exceeds: 0,
  stddev: 10.483636573368477,
  count: 72,
  percentiles: SafeMap(9) [Map] {
    0 => 43,
    50 => 56,
    75 => 65,
    87.5 => 72,
    93.75 => 76,
    96.875 => 83,
    98.4375 => 85,
    99.21875 => 94,
    100 => 94
  }
}

@mikiher mikiher marked this pull request as ready for review February 8, 2025 13:35
@mikiher
Copy link
Contributor Author

mikiher commented Feb 8, 2025

Note: I suspect that some of the existing indices in the database are now obsolete, but I'm keeping them right now. I'll have to do a thorough survery of what's currently being used and where.

server/models/Book.js Outdated Show resolved Hide resolved
@mikiher
Copy link
Contributor Author

mikiher commented Feb 8, 2025

Note 2: @advplyr, you already mentioned over discussion in Discord the more fundamental change of merging the libraryItems, books and podcasts tables into a single table. I think this is the right way to go, although this is more of a revolution that needs to be done carefully.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug]: Loading items a lot slower than pre-2.4.0 versions
2 participants