Large dataset performance #11

protich · 2018-08-21T16:03:04Z

Please see last commit and let's chat when you get a minute.

The joins for the queues should not yield duplicate records. Therefore distinct counts should not be necessary. This saves the overhead of sorting the records to be counted to ensure duplicated rows are not counted multiple times.
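As a rough illustration of the point above (not the project's actual queries), the sketch below uses Python's built-in `sqlite3` module with a hypothetical two-table schema. The table and column names are assumptions made for the example. Because each ticket joins to exactly one status row, the join cannot duplicate tickets, so a plain count and a distinct count agree.

```python
import sqlite3


def count_both_ways():
    """Count open tickets with and without DISTINCT on a hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE status (id INTEGER PRIMARY KEY, state TEXT);
        CREATE TABLE ticket (ticket_id INTEGER PRIMARY KEY, status_id INTEGER);
        INSERT INTO status VALUES (1, 'open'), (2, 'closed');
        INSERT INTO ticket VALUES (1, 1), (2, 1), (3, 2), (4, 1);
        """
    )
    # Each ticket matches exactly one status row, so the join is one-to-one
    # from the ticket's point of view and yields no duplicate tickets.
    base = """
        FROM ticket t
        JOIN status s ON s.id = t.status_id
        WHERE s.state = 'open'
    """
    plain = conn.execute("SELECT COUNT(*) " + base).fetchone()[0]
    distinct = conn.execute(
        "SELECT COUNT(DISTINCT t.ticket_id) " + base
    ).fetchone()[0]
    conn.close()
    return plain, distinct
```

If a join could match several rows per ticket (a one-to-many relation), the two counts would diverge and the distinct count would still be needed.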

This removes two or three joins from queries that check ticket access, such as those behind the queue pages, by checking `object_type` and `object_id` directly. In general, this makes it easier for the MySQL query optimizer to realize that the joins are not necessary at all. It would be nice if the ORM recognized that a join made only to check the primary key value of the foreign table can stop one table short in the join path.

If an SQL statement has annotations but uses no aggregate functions (such as SUM or COUNT), a GROUP BY clause is not technically required. Using one implies sorting the results to ensure uniqueness before sorting them again according to the ORDER BY clause.

For reasons that are not clear, on large datasets (more than 1M tickets, say), MySQL can choose an index that does not give the best performance. As systems age, they generally accumulate far more closed tickets than open ones. It should therefore be safe to assume that scanning the `status_id` index on the ticket table for `open` tickets is the fastest way to arrive at the relatively short list of tickets that may need to be aged.

If APCu is available, then the queue counts can be cached between requests. They are automatically cleared and recalculated if the status of a ticket changes or if a queue or saved search is edited. Otherwise, the queue counts will expire after an hour and be recalculated anyway.
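The caching behavior described above can be sketched as follows. This is a minimal in-process stand-in written in Python, not the APCu-backed code in this branch. The class name, method names, and the injectable clock are all hypothetical and exist only for illustration.

```python
import time


class QueueCountCache:
    """Hypothetical cache: counts expire after a TTL or on explicit clear."""

    def __init__(self, ttl_seconds=3600, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so expiry can be tested
        self._entries = {}           # queue_id -> (count, expires_at)

    def get(self, queue_id, compute):
        """Return the cached count, recomputing it if missing or expired."""
        now = self._clock()
        entry = self._entries.get(queue_id)
        if entry is not None and entry[1] > now:
            return entry[0]
        count = compute(queue_id)
        self._entries[queue_id] = (count, now + self._ttl)
        return count

    def clear(self):
        """Drop all counts, e.g. after a ticket status change or queue edit."""
        self._entries.clear()
```

A caller would invoke `clear()` from whatever code path changes a ticket's status or edits a queue or saved search. The one-hour default TTL covers any change that slips past those hooks.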

This changes the queue counts shown at the bottom of the page so that they are no longer calculated with MySQL's SQL_CALC_FOUND_ROWS, which is very slow for large recordsets. Instead, a rough count is computed from the total number of tickets in the queue, without regard to staff access. This is the fastest way to get an upper bound on the number of tickets that could be shown. The pagination interface should be changed to show only NEXT and PREVIOUS links, with the rough count used to estimate whether another page of data is available. Furthermore, if APCu is available, the rough count is stashed between requests so that it does not need to be re-tallied until a ticket state change would alter it. Another optimization might be to increment and decrement the rough queue counts when tickets are created or change state. In that case, the queues the old ticket was in could be identified (and their counts decremented), along with the queues the updated ticket is in (and their counts incremented).
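Two of the ideas above, estimating whether a NEXT link is needed and adjusting rough counts incrementally, can be sketched in a few lines. This is an illustrative Python sketch under stated assumptions, not code from this branch. Both function names and the plain dictionary of counts are hypothetical.

```python
def has_next_page(rough_count, page, page_size):
    """Guess whether another page exists; `page` is 1-based.

    The rough count ignores staff access, so it is an upper bound:
    a True result means a next page is possible, not guaranteed.
    """
    return page * page_size < rough_count


def apply_ticket_change(counts, old_queues, new_queues):
    """Adjust rough counts in place when a ticket moves between queues.

    `old_queues` are the queues the ticket matched before the change
    (empty for a new ticket); `new_queues` are the queues it matches now.
    """
    old_queues, new_queues = set(old_queues), set(new_queues)
    for queue_id in old_queues - new_queues:
        # Never let a rough count drop below zero.
        counts[queue_id] = max(0, counts.get(queue_id, 0) - 1)
    for queue_id in new_queues - old_queues:
        counts[queue_id] = counts.get(queue_id, 0) + 1
    return counts
```

Queues that contain the ticket both before and after the change are left untouched, so only the difference between the two sets needs to be computed.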

For its own reasons, MySQL seems to pick a better index when the join between ticket and user is a left join.

Prefer the agent's queue count over the rough count when paginating tickets. This makes the initial queue load expensive, but has the added advantage that queue counts are available thereafter for drop-downs. This commit also adds an entry to auto-cron to keep queue counts more up to date in the background. When APCu is not available, SESSION is used to cache the counts.

This adds an advanced option to the queue sort configuration: an index can be specified for the sorting operation. In some cases, the MySQL query optimizer cannot select the most efficient index when sorting large querysets. This feature, if enabled, allows an administrator to specify the index MySQL should use for the sort. To use the feature, an `extra` column must be added to the `%queue_sort` table to hold the index name.
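One way such an option could translate into SQL is a MySQL index hint on the table being sorted. The sketch below is a hypothetical Python helper, not the ORM code in this branch. The function name, the choice of `USE INDEX` rather than `FORCE INDEX`, and the table and column names in the usage note are assumptions.

```python
import re

# Accept only plain identifier characters, since the index name comes from
# configuration and is interpolated into the SQL text.
_IDENTIFIER = re.compile(r"[A-Za-z0-9_]+")


def build_sorted_query(table, order_by, index_name=None):
    """Build a SELECT with an optional MySQL index hint for the sort."""
    if not _IDENTIFIER.fullmatch(table):
        raise ValueError("invalid table name")
    hint = ""
    if index_name:
        if not _IDENTIFIER.fullmatch(index_name):
            raise ValueError("invalid index name")
        hint = f" USE INDEX (`{index_name}`)"
    return f"SELECT * FROM `{table}`{hint} ORDER BY {order_by}"
```

For example, `build_sorted_query("ticket", "created DESC", "created")` yields a statement of the form ``SELECT * FROM `ticket` USE INDEX (`created`) ORDER BY created DESC``. With no index name configured, the statement is left for the optimizer to plan on its own.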

…e/large-dataset-performance

This is useful to avoid a blank page when `getCount` on a queue fails.

…e/large-dataset-performance Conflicts: include/class.orm.php include/class.search.php include/staff/templates/queue-tickets.tmpl.php

greezybacon and others added 19 commits August 15, 2018 03:18

queues: distinct counts are not necessary

3a3824e

The joins for the queues should not yield duplicate records. Therefore distinct counts should not be necessary. This saves the overhead of sorting the records to be counted to ensure duplicated rows are not counted multiple times.

queues: MySQL prefers a left join on user

e45d13a

For its own reasons, MySQL seems to pick a better index when the join between ticket and user is a left join.

queues: distinct counts are not necessary

1a1c457

The joins for the queues should not yield duplicate records. Therefore distinct counts should not be necessary. This saves the overhead of sorting the records to be counted to ensure duplicated rows are not counted multiple times.

queues: MySQL prefers a left join on user

26f2cd2

For its own reasons, MySQL seems to pick a better index when the join between ticket and user is a left join.

Merge remote branch 'jared/issue/large-dataset-performance' into issu…

2c1f9e9

…e/large-dataset-performance

queues: Catch DB exceptions on queue count

e3e3bf4

This is useful to avoid a blank page when `getCount` on a queue fails.

greezybacon force-pushed the issue/large-dataset-performance branch 2 times, most recently from 827d39f to dcc58bc Compare August 24, 2018 00:33

Merge remote branch 'jared/issue/large-dataset-performance' into issu…

2f6cf14

…e/large-dataset-performance Conflicts: include/class.orm.php include/class.search.php include/staff/templates/queue-tickets.tmpl.php
