New exercise: building suffix arrays. #27

Open
wants to merge 7 commits into
base: master

Conversation

fpottier
Collaborator

This exercise explains the concept of a suffix array and proposes a series of relatively easy questions that lead to a naive algorithm for building a suffix array. Then, it moves on to Manber and Myers' algorithm, which is much more subtle. Several building blocks are given so that the student has a reasonable chance of success. Still, this is definitely a difficult algorithm. The difficulty probably lies more in the algorithmic aspects than in the actual programming task. I have placed several assertions in the code in advance so that the student has a good chance of detecting their own mistakes.
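
For readers who have not met suffix arrays before, here is a minimal sketch of the naive construction mentioned above. It is not the code of the exercise: the name `suffix_array` is an assumption, and the approach (sort the start indices by comparing whole suffixes) is simply the most direct one, with a worst-case cost of roughly *O(n² log n)* because each comparison may inspect up to *n* characters.

```ocaml
(* Naive construction of a suffix array (illustrative sketch).
   [suffix_array] is a hypothetical name, not taken from the exercise. *)
let suffix_array (s : string) : int array =
  let n = String.length s in
  (* The suffix of [s] that begins at index [i]. *)
  let suffix i = String.sub s i (n - i) in
  (* Start with the indices 0, 1, ..., n-1. *)
  let a = Array.init n (fun i -> i) in
  (* Sort the indices according to the lexicographic order of the suffixes. *)
  Array.sort (fun i j -> String.compare (suffix i) (suffix j)) a;
  a
```

On the string `"mississippi"`, this yields the array `[|10; 7; 4; 1; 0; 9; 8; 6; 3; 5; 2|]`: the shortest suffix `"i"` (index 10) comes first and `"ssissippi"` (index 2) comes last.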

move during this stage. Indeed, these suffixes have fewer than *h* characters,
so, as far as they are concerned, *h*-order and *2h*-order are the same thing.
Because *a* is *h*-sorted, each of these suffixes has already reached its
final position in the array, and inhabits a singleton *h*-bucket.
Collaborator

@yurug yurug Dec 28, 2020

Suggested change
- final position in the array, and inhabits a singleton *h*-bucket.
+ final position in the array, and inhabits a singleton *h*-bucket in the final array.

Collaborator Author

I am not sure about this suggestion. I think each index inhabits a singleton h-bucket both in the current array and in the final array. I am not sure if adding "in the final array" clarifies anything. Is something presently unclear?
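
To make the point about short suffixes concrete, here is a small sketch. It assumes that *h*-order means "compare two suffixes on their first *h* characters only", which is how the quoted text appears to use the term; the names `prefix` and `h_compare` are hypothetical and do not come from the exercise.

```ocaml
(* [prefix s h i] is the suffix of [s] that begins at index [i],
   truncated to at most [h] characters. (Hypothetical helper.) *)
let prefix (s : string) (h : int) (i : int) : string =
  let n = String.length s in
  String.sub s i (min h (n - i))

(* [h_compare s h i j] compares the suffixes at indices [i] and [j]
   according to their first [h] characters only. (Hypothetical helper.) *)
let h_compare (s : string) (h : int) (i : int) (j : int) : int =
  String.compare (prefix s h i) (prefix s h j)
```

In `"mississippi"` with *h* = 2, the suffix at index 10 is `"i"`, which has fewer than 2 characters. Truncating it to 2 or to 4 characters gives the same string, so `h_compare s 2 10 j` and `h_compare s 4 10 j` have the same sign for every `j`. By contrast, the suffixes at indices 1 and 4 are equal under 4-order (both begin with `"issi"`) and are only separated at a later stage.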


Thus, a naïve idea would be to implement stage *h* simply by sorting the array
*a* using an off-the-shelf sorting algorithm, such as Heapsort or Mergesort.
The total complexity of this stage would then be *O(n log n)*, which is not
Collaborator

Is your grader able to detect if the student has submitted such an algorithm?

Collaborator Author

No, the grader does not perform an explicit complexity check. The student's code is tested with relatively long strings (up to several hundred thousand characters) so its complexity must not be awful, or it will time out.

It's not clear how we would reliably measure the theoretical complexity of the code, as we do not have any obvious hooks. (Perhaps we could override the primitive array access operations and count them? But we would also need to prevent the student from calling library functions, such as List.sort, which perform array accesses without going through this wrapper.)
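
As a rough illustration of the counting idea floated above, here is a sketch of an instrumented array module. Everything in it is hypothetical: the names `CountingArray`, `reads`, `writes`, and `reset` are assumptions, and nothing like this exists in the grader as described in this thread.

```ocaml
(* A hypothetical wrapper that counts array reads and writes.
   None of these names come from the exercise or its grader. *)
module CountingArray = struct
  let reads = ref 0
  let writes = ref 0

  (* Reset both counters before running the code under test. *)
  let reset () =
    reads := 0;
    writes := 0

  (* Counted versions of the primitive access operations. *)
  let get (a : 'a array) (i : int) : 'a =
    incr reads;
    Array.get a i

  let set (a : 'a array) (i : int) (x : 'a) : unit =
    incr writes;
    Array.set a i x

  (* Uncounted operations, re-exported unchanged. *)
  let length = Array.length
  let make = Array.make
end

(* Example client: summing an array costs exactly one read per cell. *)
let sum (a : int array) : int =
  let total = ref 0 in
  for i = 0 to CountingArray.length a - 1 do
    total := !total + CountingArray.get a i
  done;
  !total
```

After `CountingArray.reset ()`, a call to `sum [|1; 2; 3|]` leaves `!CountingArray.reads` at 3 and `!CountingArray.writes` at 0. The limitation raised in the comment still applies: any code that reaches the standard library's sorting functions directly bypasses these counters, so the counts would only be meaningful if such calls were also forbidden.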

boundary between the slots that have been written already and
the slots that remain available.)

At a high level of abstraction, the algorithm can be described as follows:
Collaborator

If that's easy to do, you could show how each of the algorithm's steps performs on the "Mississippi" running example.

Collaborator Author

I am not sure I am capable of doing this, and not sure it would be easy to follow... (I always have difficulty following animations, especially in the case of a complex algorithm like this.)
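
One lightweight alternative to an animation might be a textual trace that prints the array after each stage. The sketch below is only an assumption about what such a trace could look like: it obtains each *h*-sorted array with an off-the-shelf stable sort, not with the bucket-based stage of Manber and Myers' algorithm, so it shows the result of each stage but not the individual steps inside it. The names `stages` and `print_stages` are hypothetical.

```ocaml
(* [stages s] returns the list of pairs (h, a) where [a] is the array of
   indices sorted on the first [h] characters of each suffix, for
   h = 1, 2, 4, ... up to the first h that is at least the length of [s].
   This uses a library sort at every stage (illustrative only). *)
let stages (s : string) : (int * int array) list =
  let n = String.length s in
  (* The suffix at index [i], truncated to at most [h] characters. *)
  let prefix h i = String.sub s i (min h (n - i)) in
  let a = Array.init n (fun i -> i) in
  let rec loop h acc =
    (* A stable sort keeps the relative order of indices in the same h-bucket. *)
    Array.stable_sort
      (fun i j -> String.compare (prefix h i) (prefix h j)) a;
    let acc = (h, Array.copy a) :: acc in
    if h >= n then List.rev acc else loop (2 * h) acc
  in
  loop 1 []

(* Print one line per stage. *)
let print_stages (s : string) : unit =
  List.iter
    (fun (h, a) ->
      Printf.printf "h = %2d:" h;
      Array.iter (fun i -> Printf.printf " %d" i) a;
      print_newline ())
    (stages s)
```

Calling `print_stages "mississippi"` prints five lines, for *h* = 1, 2, 4, 8, and 16. The last line is the complete suffix array, since 16 exceeds the length of the string.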

2 participants