Improve this project's readme (#80) [BB-2994]
bradenmacdonald authored Nov 17, 2020
1 parent a71ab4f commit c3b1c1a
Showing 3 changed files with 63 additions and 56 deletions.
117 changes: 62 additions & 55 deletions README.rst
@@ -3,56 +3,76 @@ openedx-completion-aggregator

|pypi-badge| |travis-badge| |codecov-badge| |pyversions-badge| |license-badge|

openedx-completion-aggregator is a Django app that aggregates block level
completion data for different block types for Open edX.
openedx-completion-aggregator is a Django app that aggregates block-level completion data for different block types for Open edX.

Overview
--------
What does that mean?

openedx-completion-aggregator uses the pluggable django app pattern to
ease installation. To use in edx-platform, do the following:
A standard Open edX installation can track the completion of individual XBlocks in a course, which is done using the `completion library <https://github.com/edx/completion#completion>`_. This completion tracking is what powers the green checkmarks shown in the course outline and course navigation as the learner completes each unit in the course:

1. Install the app into your virtualenv.
.. image:: docs/completion.png
:width: 100%

..code_block::
When completion tracking is enabled (and green checkmarks are showing, as seen above), it is only tracked at the XBlock level. You can use the Course Blocks API to check the completion status of any individual XBlock in the course, for a single user. For example, to get the completion of the XBlock with usage ID ``block-v1:OpenCraft+completion+demo+type@html+block@demo_block`` on the LMS instance ``courses.opencraft.com`` by the user ``MyUsername``, you could call this REST API::

$ pip install openedx-completion-aggregator
GET https://courses.opencraft.com/api/courses/v1/blocks/block-v1:OpenCraft+completion+demo+type@html+block@demo_block?username=MyUsername&requested_fields=completion

2. [Optional] You may override the set of registered aggregator block types in
your lms.env.json file::
The response will include a ``completion`` value between ``0`` and ``1``.
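
As a minimal sketch, the request URL above can be assembled in Python using only the standard library. The helper name ``build_block_completion_url`` is hypothetical and not part of any Open edX package. Sending the request would also need authentication, which depends on the installation and is not shown here.

```python
from urllib.parse import quote, urlencode


def build_block_completion_url(lms_host, usage_id, username):
    """Build the Course Blocks API URL that returns one block's completion.

    Hypothetical helper for illustration; it only assembles the URL shown
    in the README and does not send any request.
    """
    # Usage IDs normally contain ":", "+" and "@"; leave those unescaped.
    path = "/api/courses/v1/blocks/" + quote(usage_id, safe=":+@")
    query = urlencode({"username": username, "requested_fields": "completion"})
    return "https://{}{}?{}".format(lms_host, path, query)


url = build_block_completion_url(
    "courses.opencraft.com",
    "block-v1:OpenCraft+completion+demo+type@html+block@demo_block",
    "MyUsername",
)
print(url)
```

The printed URL matches the ``GET`` example above.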

...
"COMPLETION_AGGREGATOR_BLOCK_TYPES": {
"course",
"chapter",
"subsection",
"vertical"
},
...
However, what if you want to know the overall % completion of an entire course? ("Alex, you have completed 45% of Introduction to Statistics") Or what if you as an instructor want to get a report of how much of Section 1 every student in a course has completed? Those queries are either not possible or too slow using the APIs built in to the LMS and ``completion``.

This Open edX plugin, ``openedx-completion-aggregator``, watches course activity and asynchronously updates database tables with "aggregate" completion data. "Aggregate" data means completion data summed over all XBlocks in a course and aggregated at higher levels, such as the subsection, section, and course levels. The completion aggregator provides a REST API that can give near-instant answers to queries such as:

* What % complete are each of the courses that I'm enrolled in?
* What % of each section in Course X have my students completed?
* What is the average completion % among all enrolled students in a course?

Notes:

* This service only provides data, via a REST API. There is no user interface.
* On production instances, the answers to these "aggregate" questions may be slightly out of date, because they are computed asynchronously (see below). How often they are updated is configurable.

Synchronous vs. Asynchronous calculations
-----------------------------------------

openedx-completion-aggregator operates in one of two modes: synchronous or asynchronous.

With synchronous aggregation, each time a student completes a block, the aggregator code will re-calculate the aggregate completion values immediately. You will always have the freshest results from this API, but at a huge performance cost. Synchronous aggregation is only for development purposes and is not suitable for production. **Synchronous aggregation can cause deadlocks when users complete XBlocks, leading to a partial outage of the LMS. Do not use it on a production site.**

With asynchronous aggregation, the aggregator code will re-calculate the aggregate completion values asynchronously, at periodic intervals (e.g. every hour). How often the update can and should be run depends on many factors - you will have to experiment and find what works best and what is possible for your specific Open edX installation. (Running this too often can clog the celery tasks queue, which might require manual intervention.)

It's important to note that in both modes the single-user, single-course API endpoints will always return up-to-date data. However, data that covers multiple users or multiple courses can be slightly out of date, until the aggregates are updated asynchronously.

API Details
-----------

3. By default, completion is aggregated with each created or updated
BlockCompletion. In most production instances, you will want to calculate
aggregations asynchronously. To enable asynchronous calculation for your
installation, set the following in your lms.env.json file::
For details about how the completion aggregator's REST APIs can be used, please refer to `the docstrings in views.py <https://github.com/open-craft/openedx-completion-aggregator/blob/master/completion_aggregator/api/v1/views.py#L24>`_.

Installation and Configuration
------------------------------

openedx-completion-aggregator uses the pluggable django app pattern to ease installation. To use in edx-platform, do the following:

1. Install the app into your virtualenv::

$ pip install openedx-completion-aggregator

2. By default, aggregate data is re-computed synchronously (with each created or updated BlockCompletion). While that is often useful for development, in most production instances, you will want to calculate aggregations asynchronously as explained above. To enable asynchronous calculation for your installation, set the following in your ``lms.yml`` file::

...
"COMPLETION_AGGREGATOR_ASYNC_AGGREGATION": true,
COMPLETION_AGGREGATOR_ASYNC_AGGREGATION: true
...

Then configure up a pair of cron jobs to run `./manage.py
run_aggregator_service` and `./manage.py run_aggregator_cleanup` as often
as desired.
Then configure a pair of cron jobs to run ``./manage.py run_aggregator_service`` and ``./manage.py run_aggregator_cleanup`` as often as desired. (Start with hourly and daily, respectively, if you are unsure.) The ``run_aggregator_service`` task updates any aggregate completion values that have gone out of date since it was last run (it in turn enqueues celery tasks to do the actual updating). The cleanup task deletes old database entries that were used to coordinate the aggregation updates; these build up over time but are no longer needed.

Note that if operating on a Hawthorne-or-later release of edx-platform, you may
override the settings in `EDXAPP_ENV_EXTRA` instead.
3. If the aggregator is installed on an existing instance, then it's sometimes desirable to fill "Aggregate" data for the existing courses. There is the ``reaggregate_course`` management command, which prepares data that will be aggregated during the next ``run_aggregator_service`` run. However, the process of aggregating data for existing courses can place extremely high loads on both your celery workers and your MySQL database, so on large instances this process must be planned with great care. For starters, we recommend you disable any associated cron jobs, scale up your celery worker pool significantly, and scale up your database cluster and storage.

Design
------

Design: Technical Details
-------------------------

The completion aggregator is designed to facilitate working with course-level,
chapter-level, and other aggregated percentages of course completion as
represented by the [BlockCompletion model](https://github.com/edx/completion/blob/master/completion/models.py#L173) (from the edx-completion djangoapp).
represented by the `BlockCompletion model <https://github.com/edx/completion/blob/e1db6a137423f6/completion/models.py#L175>`_ (from the edx-completion djangoapp).
By storing these values in the database, we are able to quickly return
information for all users in a course.

@@ -84,18 +104,18 @@ by each aggregator, and values are summed recursively from the course block on
down. Values for every node in the whole tree can be calculated in a single
traversal. These calculations can either be performed "read-only" (to get the
latest data for each user), or "read-write" to store that data in the
[`completion_aggregator.Aggregator` model](https://github.com/open-craft/openedx-completion-aggregator/blob/master/completion_aggregator/models.py#L199).
`completion_aggregator.Aggregator model <https://github.com/open-craft/openedx-completion-aggregator/blob/a71ab4f077/completion_aggregator/models.py#L199>`_.
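
The single-traversal summation described above can be sketched as follows. This is an illustration only, not the aggregator's actual code: the ``Block`` class, the ``aggregate`` function, and the choice to count every leaf block as one possible point are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Block:
    """Hypothetical stand-in for a node in the course graph."""
    block_id: str
    children: List["Block"] = field(default_factory=list)
    completion: float = 0.0  # only meaningful for leaf blocks (0.0 to 1.0)


def aggregate(block: Block, results: Dict[str, Tuple[float, float]]) -> Tuple[float, float]:
    """Return (earned, possible) for a block, recording each aggregate visited.

    Assumption: each leaf block contributes one possible point.
    """
    if not block.children:
        return block.completion, 1.0
    earned = possible = 0.0
    for child in block.children:
        child_earned, child_possible = aggregate(child, results)
        earned += child_earned
        possible += child_possible
    # Every non-leaf node gets its totals in the same depth-first pass.
    results[block.block_id] = (earned, possible)
    return earned, possible


course = Block("course", [
    Block("chapter1", [Block("html1", completion=1.0), Block("html2", completion=0.0)]),
    Block("chapter2", [Block("video1", completion=1.0), Block("problem1", completion=0.5)]),
])
results: Dict[str, Tuple[float, float]] = {}
aggregate(course, results)
for block_id, (earned, possible) in results.items():
    print(block_id, earned, possible, "{:.1%}".format(earned / possible))
```

A "read-only" use would compute ``results`` and return it to the caller, while a "read-write" use would also store each entry in the database.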

During regular course interaction, a learner will calculate aggregations on the
fly to get the latest information. However, on-the-fly calculations are too
expensive when performed for all users in a course, so periodically (every hour
or less), a task is run to calculate all aggregators that have gone out of
date in the previous hour, and store those values in the database. These
stored values are then used for reporting on course-wide completion (for course
admin views).

By tracking which blocks have been changed recently (in the [`StaleCompletion` table](https://github.com/open-craft/openedx-completion-aggregator/blob/master/completion_aggregator/models.py#L272)
table) These stored values can also be used to shortcut calculations for
expensive when performed for all users in a course, so periodically (e.g. every
hour, but this is configurable), a task is run to calculate all aggregators that
have gone out of date since the last run, and store those values in the database.
These stored values are then used for reporting on course-wide completion (for
course admin views).

By tracking which blocks have been changed recently (in the `StaleCompletion table <https://github.com/open-craft/openedx-completion-aggregator/blob/a71ab4f077a/completion_aggregator/models.py#L272>`_), these stored values can also be used to shortcut calculations for
portions of the course graph that are known to be up to date. If a user has
only completed blocks in chapter 3 of a three-chapter course since the last
time aggregations were stored, there is no need to redo the calculation for
@@ -105,11 +125,7 @@ value for chapter 3.

Currently, the major bottleneck in these calculations is creating the course
graph for each user. We are caching the graph locally to speed things up, but
this stresses the memory capabilities of the servers. My understanding is that
more recent versions of edx-platform do a better job caching course graphs
site-wide, which should improve performance, and allow us to bypass the local
calculation, though this will need to be evaluated when our client (which is
currently on ginkgo) upgrades.
this stresses the memory capabilities of the servers.

License
-------
@@ -126,15 +142,6 @@ Contributions are very welcome.

Please read `How To Contribute <https://github.com/edx/edx-platform/blob/master/CONTRIBUTING.rst>`_ for details.

Even though they were written with ``edx-platform`` in mind, the guidelines
should be followed for Open edX code in general.

PR description template should be automatically applied if you are sending PR from github interface; otherwise you
can find it it at `PULL_REQUEST_TEMPLATE.md <https://github.com/open-craft/openedx-completion-aggregator/blob/master/.github/PULL_REQUEST_TEMPLATE.md>`_

Issue report template should be automatically applied if you are sending it from github UI as well; otherwise you
can find it at `ISSUE_TEMPLATE.md <https://github.com/open-craft/openedx-completion-aggregator/blob/master/.github/ISSUE_TEMPLATE.md>`_

Reporting Security Issues
-------------------------

Binary file added docs/completion.png
2 changes: 1 addition & 1 deletion tox.ini
@@ -2,7 +2,7 @@
envlist = py35-django22,quality,docs

[doc8]
max-line-length = 120
ignore = D001

[pycodestyle]
exclude = .git,.tox,migrations