Improve book library page query performance on title, titleIgnorePrefix, and addedAt sort orders. #3952

mikiher · 2025-02-08T13:33:34Z

Brief summary

Significantly improves book library Sequelize page queries for the following sort orders:

title (with or without ignorePrefix)
addedAt

Which issue is fixed?

This partially fixes #2073 (resolving the book library load times, but not the podcast library load times)

In-depth Description

After digging more into the details of the issues people were complaining about in #2073 and doing additional performance analysis in Sequelize query to bring the page here, I made the following observations:

Due the the current architecture of the database (specifically, the separate libraryItems and books tables), SQLite cannot sort the results of the main query efficiently, even when an index on the sorted book column exists (more details below)
The main query performs a join with the feeds table, which somehat complicates the query (especialy if there are many feeds)
A query for the total count of results is run every time, even though it is usually unnecessary as the total count doesn't change during a scrolling over the library.

1 is by far the most serious problem, and also causes significant degradation in query performance as the offset becomes larger.
When the main query is sorting by title:

SELECT book.*, libraryItem.*, [libraryItem->feeds].*
  FROM books AS book
       INNER JOIN
       libraryItems AS libraryItem ON book.id = libraryItem.mediaId AND 
                                      (libraryItem.libraryId = 'b3bf8580-edba-4d7d-90eb-3b54206d16c3' AND 
                                       libraryItem.mediaType = 'book') 
       LEFT OUTER JOIN
       feeds AS [libraryItem->feeds] ON libraryItem.id = [libraryItem->feeds].entityId AND 
                                        [libraryItem->feeds].entityType = 'libraryItem'
 ORDER BY book.title ASC
 LIMIT 0, 35;

it's evident in the query plan that the query engine cannot use the existing book.title index, and needs to build a temporary tree for sorting.

SEARCH libraryItem USING INDEX library_items_library_id_media_type_size (libraryId=? AND mediaType=?)
SEARCH book USING INDEX sqlite_autoindex_books_1 (id=?)
SCAN libraryItem->feeds LEFT-JOIN
USE TEMP B-TREE FOR ORDER BY

Even when you remove the feeds table join from the query:

SELECT book.*, libraryItem.*
  FROM books AS book
       INNER JOIN
       libraryItems AS libraryItem ON book.id = libraryItem.mediaId AND 
                                      (libraryItem.libraryId = 'b3bf8580-edba-4d7d-90eb-3b54206d16c3' AND 
                                       libraryItem.mediaType = 'book') 
 ORDER BY book.title ASC
 LIMIT 0, 35;

The query plan still doesn't make use of the book.title index:

SEARCH libraryItem USING INDEX library_items_library_id_media_type_size (libraryId=? AND mediaType=?)
SEARCH book USING INDEX sqlite_autoindex_books_1 (id=?)
USE TEMP B-TREE FOR ORDER BY

The significant boost in performance can come only if the title column is put in the libraryItems table, and an index on (libraryId, mediaType, title) is built. This way, filtering and sorting happens at the same time, and the index can be traveresed very quickly to reach the required offset without needing to look at the tables themselves.

So with a query like this:

SELECT book.*, libraryItem.*
  FROM books AS book
       INNER JOIN
       libraryItems AS libraryItem ON book.id = libraryItem.mediaId AND 
                                      (libraryItem.libraryId = 'b3bf8580-edba-4d7d-90eb-3b54206d16c3' AND 
                                       libraryItem.mediaType = 'book') 
 ORDER BY libraryItem.title ASC
 LIMIT 0, 35;

We get the following query plan:

SEARCH libraryItem USING INDEX library_items_library_id_media_type_title (libraryId=? AND mediaType=?)
SEARCH book USING INDEX sqlite_autoindex_books_1 (id=?)

Which is optimal! (or, to be more precise, optimal given the current architecture)

Resolution

The following changes were made:

title and titleIgnorePrefix columns were added to libraryItems
Triggers were put in place to update the new column values whenever the corresponding values in books change.
The following Indices were added to libraryItems:
- (libraryId, mediaType, title)
- (libraryId, mediaType, titleIgnorePrefix)
- (libraryId, mediaType, createdAt)
The feeds query was separated from the main query
A total count cache was introduced.
- If the count for a given query is found in the cache, findAll is called instead of findAndCountAll
- The cache is invalidated whenever book records are added or removed.

How have you tested this?

I tested on loading 72 consecutive page of 35 books each, on each of the above sorting orders, on an ABS docker container running on a Synology 920+ NAS (I wanted to test on a hardware that was much weaker than my dev machine).

Results

All measurements are in ms.

Summary
Overall, we see a 94-95% drop (!) in mean and median query time.
Standard deviaion also reduces drastically from ~400 to ~10.

Note how the steady rise in query time (as the the requested offset grows) which is quite visible before, is not noticable after.

Sorting by title - before:

[2025-02-08 09:16:21.962] INFO: [bound findAndCountAll] histogram values: [
   191,  261,  257,  314,  296,  328,  336,  376,  398,
   453,  476,  535,  513,  594,  585,  618,  638,  672,
   666,  700,  736,  715,  745,  757,  810,  864,  863,
   849,  897,  898,  962,  985, 1003, 1063, 1086, 1084,
  1114, 1131, 1152, 1147, 1197, 1198, 1224, 1236, 1265,
  1318, 1345, 1216, 1313, 1315, 1284, 1302, 1309, 1353,
  1381, 1360, 1348, 1445, 1322, 1423, 1390, 1426, 1473,
  1550, 1512, 1513, 1495, 1539, 1550, 1573, 1614, 1629
]
[2025-02-08 09:16:21.962] INFO: [bound findAndCountAll] histogram: Histogram {
  min: 191,
  max: 1629,
  mean: 1006.75,
  exceeds: 0,
  stddev: 410.08172457856904,
  count: 72,
  percentiles: SafeMap(9) [Map] {
    0 => 191,
    50 => 1086,
    75 => 1345,
    87.5 => 1473,
    93.75 => 1550,
    96.875 => 1573,
    98.4375 => 1614,
    99.21875 => 1629,
    100 => 1629
  }
}

Sorting by title - after:

[2025-02-08 07:58:39.193] INFO: [findAndCountAll] histogram values: [
  87, 104, 70, 70, 51, 57, 77, 49, 54, 60, 57, 58,
  60,  47, 87, 47, 57, 68, 49, 66, 50, 55, 64, 45,
  48,  54, 67, 46, 65, 53, 79, 44, 72, 75, 85, 52,
  61,  65, 56, 60, 58, 67, 70, 63, 53, 52, 50, 55,
  53,  62, 54, 58, 50, 52, 67, 53, 58, 55, 65, 50,
  56,  57, 66, 75, 48, 57, 56, 52, 49, 75, 65, 87
]
[2025-02-08 07:58:39.193] INFO: [findAndCountAll] histogram: Histogram {
  min: 44,
  max: 104,
  mean: 60.541666666666664,
  exceeds: 0,
  stddev: 11.6665922616675,
  count: 72,
  percentiles: SafeMap(9) [Map] {
    0 => 44,
    50 => 57,
    75 => 66,
    87.5 => 75,
    93.75 => 85,
    96.875 => 87,
    98.4375 => 87,
    99.21875 => 104,
    100 => 104
  }
}

Sorting by titleIgnorePrefix - before:

[2025-02-08 09:28:31.290] INFO: [bound findAndCountAll] histogram values: [
   191,  257,  286,  297,  321,  357,  382,  432,  405,
   470,  513,  516,  560,  543,  609,  639,  646,  748,
   670,  757,  819,  790,  814,  866,  899,  969,  970,
  1033, 1033, 1026, 1073, 1109, 1099, 1120, 1165, 1203,
  1184, 1317, 1280, 1270, 1305, 1355, 1336, 1329, 1367,
  1413, 1401, 1406, 1441, 1520, 1433, 1471, 1569, 1455,
  1495, 1509, 1519, 1513, 1527, 1539, 1568, 1559, 1605,
  1561, 1627, 1637, 1642, 1670, 1693, 1701, 1729, 1695
]
[2025-02-08 09:28:31.291] INFO: [bound findAndCountAll] histogram: Histogram {
  min: 191,
  max: 1729,
  mean: 1100.388888888889,
  exceeds: 0,
  stddev: 453.3125778200694,
  count: 72,
  percentiles: SafeMap(9) [Map] {
    0 => 191,
    50 => 1184,
    75 => 1509,
    87.5 => 1569,
    93.75 => 1670,
    96.875 => 1695,
    98.4375 => 1701,
    99.21875 => 1729,
    100 => 1729
  }
}

Sorting by titleIgnorePrefix - after:

[2025-02-08 08:21:02.593] INFO: [findAndCountAll] histogram values: [
   70, 83, 71, 51, 78, 77, 53, 59, 54, 49, 79, 49,
   48, 51, 64, 48, 48, 51, 54, 58, 50, 49, 59, 53,
  103, 58, 57, 59, 55, 56, 46, 49, 59, 67, 50, 53,
   87, 53, 56, 72, 57, 52, 56, 87, 66, 58, 83, 53,
   62, 59, 71, 53, 61, 57, 53, 61, 66, 49, 65, 52,
   63, 54, 58, 64, 74, 74, 70, 72, 53, 67, 88, 62
]
[2025-02-08 08:21:02.593] INFO: [findAndCountAll] histogram: Histogram {
  min: 46,
  max: 103,
  mean: 61.19444444444444,
  exceeds: 0,
  stddev: 11.67456743058851,
  count: 72,
  percentiles: SafeMap(9) [Map] {
    0 => 46,
    50 => 58,
    75 => 67,
    87.5 => 74,
    93.75 => 83,
    96.875 => 87,
    98.4375 => 88,
    99.21875 => 103,
    100 => 103
  }
}

Sorting by addedAt - before:

[2025-02-08 09:43:29.037] INFO: [bound findAndCountAll] histogram values: [
   277,  278,  305,  334,  368,  428,  464,  505,  503,
   581,  626,  636,  674,  694,  775,  771,  822,  829,
   859,  805,  858,  817,  941,  953,  985, 1021,  991,
  1029, 1111, 1094, 1131, 1130, 1169, 1216, 1225, 1166,
  1215, 1253, 1198, 1272, 1365, 1313, 1298, 1336, 1330,
  1407, 1382, 1406, 1496, 1420, 1578, 1551, 1429, 1403,
  1488, 1407, 1443, 1448, 1484, 1420, 1474, 1473, 1496,
  1478, 1476, 1518, 1514, 1520, 1568, 1577, 1554, 1579
]
[2025-02-08 09:43:29.038] INFO: [bound findAndCountAll] histogram: Histogram {
  min: 277,
  max: 1579,
  mean: 1110.2777777777778,
  exceeds: 0,
  stddev: 388.1921138868623,
  count: 72,
  percentiles: SafeMap(9) [Map] {
    0 => 277,
    50 => 1215,
    75 => 1443,
    87.5 => 1496,
    93.75 => 1554,
    96.875 => 1577,
    98.4375 => 1578,
    99.21875 => 1579,
    100 => 1579
  }
}

Sorting by addedAt - after:

[2025-02-08 08:07:23.636] INFO: [findAndCountAll] histogram values: [
  94, 83, 65, 70, 79, 51, 51, 61, 63, 63, 50, 76,
  48, 65, 59, 55, 73, 48, 63, 51, 56, 44, 59, 52,
  69, 56, 67, 55, 64, 54, 65, 54, 68, 62, 54, 72,
  53, 72, 49, 52, 69, 59, 55, 62, 56, 75, 48, 59,
  59, 50, 72, 44, 48, 47, 54, 65, 51, 65, 50, 47,
  85, 55, 49, 54, 45, 50, 49, 62, 60, 43, 53, 49
]
[2025-02-08 08:07:23.636] INFO: [findAndCountAll] histogram: Histogram {
  min: 43,
  max: 94,
  mean: 58.80555555555556,
  exceeds: 0,
  stddev: 10.483636573368477,
  count: 72,
  percentiles: SafeMap(9) [Map] {
    0 => 43,
    50 => 56,
    75 => 65,
    87.5 => 72,
    93.75 => 76,
    96.875 => 83,
    98.4375 => 85,
    99.21875 => 94,
    100 => 94
  }
}

…rt order

mikiher · 2025-02-08T16:17:10Z

Note: I suspect that some of the existing indices in the database are now obsolete, but I'm keeping them right now. I'll have to do a thorough survery of what's currently being used and where.

server/models/Book.js

mikiher · 2025-02-08T16:33:48Z

Note 2: @advplyr, you already mentioned over discussion in Discord the more fundamental change of merging the libraryItems, books and podcasts tables into a single table. I think this is the right way to go, although this is more of a revolution that needs to be done carefully.

mikiher added 3 commits February 8, 2025 10:41

Add a profile utility function

52bb286

Improve page load queries on title, titleIgnorePrefix, and addedAt so…

a131432

…rt order

Add index on (libraryId, mediaType, createdAt)

3d08a35

mikiher marked this pull request as ready for review February 8, 2025 13:35

advplyr reviewed Feb 8, 2025

View reviewed changes

server/models/Book.js Outdated Show resolved Hide resolved

Update server/models/Book.js

9a26119

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve book library page query performance on title, titleIgnorePrefix, and addedAt sort orders. #3952

Improve book library page query performance on title, titleIgnorePrefix, and addedAt sort orders. #3952

mikiher commented Feb 8, 2025 •

edited

Loading

mikiher commented Feb 8, 2025

mikiher commented Feb 8, 2025 •

edited

Loading

Improve book library page query performance on title, titleIgnorePrefix, and addedAt sort orders. #3952

Are you sure you want to change the base?

Improve book library page query performance on title, titleIgnorePrefix, and addedAt sort orders. #3952

Conversation

mikiher commented Feb 8, 2025 • edited Loading

Brief summary

Which issue is fixed?

In-depth Description

Resolution

How have you tested this?

Results

mikiher commented Feb 8, 2025

mikiher commented Feb 8, 2025 • edited Loading

mikiher commented Feb 8, 2025 •

edited

Loading

mikiher commented Feb 8, 2025 •

edited

Loading