Large dataset performance #11

Open
wants to merge 20 commits into
base: issue/large-dataset-performance


@protich protich commented Aug 21, 2018

Please see last commit and let's chat when you get a minute.

greezybacon and others added 19 commits August 15, 2018 03:18
The joins for the queues should not yield duplicate records, so distinct
counts should not be necessary. This saves the overhead of sorting the records
to be counted to ensure that duplicated rows are not counted multiple times.
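
The reasoning above can be checked on a toy schema: when every ticket joins to
exactly one row, COUNT(*) and COUNT(DISTINCT ...) agree. This is a minimal
sketch using Python's sqlite3 rather than MySQL; the `ticket` and `status`
tables are hypothetical stand-ins, not the project's actual schema.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical ticket/status schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE status (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE ticket (id INTEGER PRIMARY KEY, status_id INTEGER);
        INSERT INTO status VALUES (1, 'open'), (2, 'closed');
        INSERT INTO ticket VALUES (1, 1), (2, 1), (3, 2);
        """
    )
    return conn


def count_both_ways(conn):
    """Return (plain count, distinct count) over a join with no duplicate rows."""
    join = "FROM ticket t JOIN status s ON s.id = t.status_id"
    plain = conn.execute(f"SELECT COUNT(*) {join}").fetchone()[0]
    distinct = conn.execute(f"SELECT COUNT(DISTINCT t.id) {join}").fetchone()[0]
    return plain, distinct
```

Because each ticket matches a single status row, both counts come out the same,
and the plain count avoids the de-duplication step.
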
This removes two or three joins from queries that check ticket access, such as
the queue pages, by checking the `object_type` and `object_id` directly. In
general, this makes it easier for the MySQL query optimizer to realize that the
joins are not necessary at all. It would be nice if the ORM would realize that
a join to check the primary key value of the foreign table should actually stop
one table short in the join path.

If an SQL statement has annotations but uses no aggregate functions (such as
SUM or COUNT), then a GROUP BY clause is not technically required. Using one
implies sorting the results to ensure uniqueness before sorting them again
according to the requested sort in the ORDER BY clause.
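
As a small illustration of the point above, the sketch below runs an annotated
query with and without GROUP BY and returns the rows for comparison. It uses
Python's sqlite3 instead of MySQL, and the `ticket` table and `is_open`
annotation are hypothetical names chosen for the example.

```python
import sqlite3


def build_ticket_db():
    """Create an in-memory database with a hypothetical ticket table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE ticket (id INTEGER PRIMARY KEY, status_id INTEGER);
        INSERT INTO ticket VALUES (1, 1), (2, 1), (3, 2);
        """
    )
    return conn


def annotated_rows(conn, group_by):
    """Select an annotation (a computed column) with or without GROUP BY."""
    sql = "SELECT id, (status_id = 1) AS is_open FROM ticket"
    if group_by:
        # Grouping on the primary key cannot merge rows; it only adds work.
        sql += " GROUP BY id"
    sql += " ORDER BY id DESC"
    return conn.execute(sql).fetchall()
```

With no aggregate in the select list, grouping on the primary key leaves the
result set unchanged, so the clause can be dropped.
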
On large datasets (more than 1M tickets, for example), MySQL can misjudge which
index will provide the best performance. As systems age, they generally
accumulate significantly more closed tickets than open ones. It should
therefore be safe to assume that scanning the `status_id` index on the ticket
table for `open` tickets is the fastest way to arrive at the relatively short
list of tickets that might need to be aged.

If APCu is available, the queue counts can be cached between requests. They are
automatically cleared and recalculated when the status of a ticket changes or
when a queue or saved search is edited. Otherwise, the queue counts expire
after an hour and are recalculated anyway.
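
The caching behavior described above (reuse between requests, clear on change,
expire after an hour) can be sketched as a small time-to-live cache. This is an
in-process Python stand-in for illustration only, not the APCu-backed code in
the commit; `QueueCountCache` and its methods are hypothetical names.

```python
import time


class QueueCountCache:
    """Hypothetical TTL cache for queue counts, keyed by queue id."""

    def __init__(self, ttl_seconds=3600, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested
        self._entries = {}  # queue_id -> (count, expires_at)

    def get(self, queue_id, compute):
        """Return the cached count, recomputing it if missing or expired."""
        now = self._clock()
        entry = self._entries.get(queue_id)
        if entry is not None and entry[1] > now:
            return entry[0]
        count = compute()
        self._entries[queue_id] = (count, now + self._ttl)
        return count

    def clear(self):
        """Drop all counts, e.g. after a ticket status or queue change."""
        self._entries.clear()
```

Calling `clear()` from the code paths that change a ticket's status or edit a
queue keeps the counts fresh; the one-hour expiry is only a fallback.
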
This changes the queue counts shown at the bottom of the page so that they are
no longer calculated using the SQL_CALC_FOUND_ROWS method of MySQL, which is
very slow for large recordsets. Instead, a rough count is computed based on the
total number of tickets in the queue without respect for staff access. This is
the fastest way to get a maximum number of possible tickets to be shown. The
pagination interface should be changed to show only NEXT and PREVIOUS pages,
where the rough estimate can be used to provide a rough idea of whether or not
another page of data would be available.
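
A NEXT/PREVIOUS pager driven by the rough count might look like the sketch
below. The function name, parameters, and return shape are assumptions made for
illustration; because the rough count ignores staff access, NEXT may
occasionally lead to an empty page.

```python
def pager_links(page, page_size, rough_count):
    """Decide which pager links to show from an upper-bound row count.

    `page` is 1-based. `rough_count` is the maximum number of tickets that
    could be in the queue, so `next` is a hint rather than a guarantee.
    """
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    return {
        "previous": page > 1,
        "next": page * page_size < rough_count,
    }
```
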

Furthermore, if APCu is available, the rough count is stashed and kept between
requests so that the rough counts do not need to be re-tallied until a ticket
state change would alter them.

Another optimization might be to increment and decrement the queue rough counts
when tickets are created or change states. In such a case, it could be
identified which queues the old ticket would have been in (and decrement those
counts) and which queues the updated ticket would be in (and increment those
counts).
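
The incremental idea above could be sketched as follows. This is not code from
the pull request; `adjust_counts` and the queue names are hypothetical, and the
caller is assumed to already know which queues matched the ticket before and
after the change.

```python
def adjust_counts(counts, old_queues, new_queues):
    """Update rough queue counts in place after a ticket changes.

    `old_queues` and `new_queues` are the sets of queue ids the ticket
    matched before and after the change. A newly created ticket has an
    empty `old_queues`.
    """
    for queue_id in old_queues - new_queues:
        # The ticket left this queue; never let a rough count go negative.
        counts[queue_id] = max(0, counts.get(queue_id, 0) - 1)
    for queue_id in new_queues - old_queues:
        # The ticket entered this queue.
        counts[queue_id] = counts.get(queue_id, 0) + 1
    return counts
```

Queues that match the ticket both before and after the change are left alone,
so only the set differences need to be touched.
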
For reasons of its own, MySQL seems to pick a better index when the join
between ticket and user is a left join.

Prefer the agent's queue count instead of the rough count when paginating the
tickets. This makes the initial queue load expensive but has the added
advantage of having queue counts available thereafter for drop-downs.

This commit also adds an entry to auto-cron to keep queue counts more up to
date in the background. When APCu is not available, SESSION is used to cache
the counts.

This adds an advanced option to the queue sort configuration. An index can be
specified to be used for the sorting operation. In some cases, the MySQL query
optimizer cannot select the most efficient index when dealing with large
querysets and sorting. This feature, if enabled, allows an administrator to
specify an index which MySQL should use when applying the sort.

To use the feature, an `extra` column must be added to the `%queue_sort` table to
receive the index name.
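
One way such a stored index name could be applied is through a MySQL index
hint on the sorted query. The sketch below is an assumption about the general
shape, not the code in this commit: `sorted_ticket_query`, the `ticket` table,
and the `ticket_id` column are hypothetical, and whether the real change emits
USE INDEX or FORCE INDEX is not stated in the message.

```python
import re

# Only plain identifiers are accepted, since the index name is
# administrator-supplied text that ends up inside the SQL string.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def sorted_ticket_query(order_column, index_name=None):
    """Build a sorted SELECT, adding a MySQL index hint when one is given."""
    if not _IDENTIFIER.match(order_column):
        raise ValueError("invalid column name")
    hint = ""
    if index_name:
        if not _IDENTIFIER.match(index_name):
            raise ValueError("invalid index name")
        # MySQL syntax: the hint follows the table reference.
        hint = f" USE INDEX ({index_name})"
    return f"SELECT ticket_id FROM ticket{hint} ORDER BY {order_column}"
```

Leaving the index name empty produces the unhinted query, so the optimizer's
own choice remains the default.
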
This is useful to avoid a blank page caused by `getCount` on a queue.
@greezybacon greezybacon force-pushed the issue/large-dataset-performance branch 2 times, most recently from 827d39f to dcc58bc Compare August 24, 2018 00:33
…e/large-dataset-performance

Conflicts:
	include/class.orm.php
	include/class.search.php
	include/staff/templates/queue-tickets.tmpl.php