New exercise: building suffix arrays. #27
move during this stage. Indeed, these suffixes have fewer than *h* characters,
so, as far as they are concerned, *h*-order and *2h*-order are the same thing.
Because *a* is *h*-sorted, each of these suffixes has already reached its
final position in the array, and inhabits a singleton *h*-bucket.
Suggested change:
- final position in the array, and inhabits a singleton *h*-bucket.
+ final position in the array, and inhabits a singleton *h*-bucket in the final array.
I am not sure about this suggestion. I think each index inhabits a singleton h-bucket both in the current array and in the final array. I am not sure if adding "in the final array" clarifies anything. Is something presently unclear?
Thus, a naïve idea would be to implement stage *h* simply by sorting the array
*a* using an off-the-shelf sorting algorithm, such as Heapsort or Mergesort.
The total complexity of this stage would then be *O(n log n)*, which is not
Is your grader able to detect if the student has submitted such an algorithm?
No, the grader does not perform an explicit complexity check. The student's code is tested with relatively long strings (up to several hundred thousand characters) so its complexity must not be awful, or it will time out.
It's not clear how we would reliably measure the theoretical complexity of the code, as we do not have any obvious hooks. (Perhaps we could override the primitive array access operations and count them? But we would also need to prevent the student from calling library functions such as `List.sort`, which perform array accesses without going through this wrapper.)
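To make the counting idea above concrete, here is a minimal sketch of what such a wrapper might look like. The module name `CountedArray` and all of its functions are hypothetical, introduced only for illustration; nothing like this exists in the exercise or the grader at present.

```ocaml
(* Hypothetical instrumented array: every read and write goes through
   [get] and [set], which increment a global counter. The type is kept
   abstract so that client code cannot hand the underlying array
   directly to library functions such as [Array.sort]. *)
module CountedArray : sig
  type 'a t
  val make : int -> 'a -> 'a t
  val length : 'a t -> int
  val get : 'a t -> int -> 'a
  val set : 'a t -> int -> 'a -> unit
  val accesses : unit -> int   (* number of reads and writes so far *)
  val reset : unit -> unit     (* reset the counter to zero *)
end = struct
  type 'a t = 'a array
  let counter = ref 0
  let make = Array.make
  let length = Array.length
  let get a i = incr counter; Array.get a i
  let set a i x = incr counter; Array.set a i x
  let accesses () = !counter
  let reset () = counter := 0
end

(* Example use: fill an array, read one slot, report the count. *)
let () =
  CountedArray.reset ();
  let a = CountedArray.make 4 0 in
  for i = 0 to CountedArray.length a - 1 do
    CountedArray.set a i (i * i)
  done;
  let _ = CountedArray.get a 2 in
  Printf.printf "%d accesses\n" (CountedArray.accesses ())
```

Making the type abstract only addresses part of the concern: it keeps the raw array out of reach, but the grader would still have to decide what access count is acceptable for a given input length.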
boundary between the slots that have been written already and
the slots that remain available.)

At a high level of abstraction, the algorithm can be described as follows:
If that's easy to do, you could show how each of the algorithm's steps performs on the "Mississippi" running example.
I am not sure I am capable of doing this, and not sure it would be easy to follow... (I always have difficulty following animations, especially in the case of a complex algorithm like this.)
Co-authored-by: Yann Régis Gianas <[email protected]>
This exercise explains the concept of a suffix array and proposes a series of relatively easy questions that lead to a naive algorithm for building one. It then moves on to Manber and Myers' algorithm, which is much more subtle. Several building blocks are provided so that the student has a reasonable chance of success. Still, this is definitely a difficult algorithm, and the difficulty probably lies more in the algorithmic aspects than in the programming task itself. I have placed several assertions in the code in advance so that the student has a good chance of detecting their own mistakes.
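For readers unfamiliar with the naive approach mentioned in this description, the following is a minimal sketch, not the exercise's actual solution. The function names `compare_suffixes` and `naive_suffix_array` are assumptions made for illustration. The idea is simply to sort the suffix start offsets using a direct character-by-character comparison of suffixes.

```ocaml
(* Compare the suffixes of [s] that start at offsets [i] and [j],
   character by character, without allocating substrings. *)
let compare_suffixes (s : string) (i : int) (j : int) : int =
  let n = String.length s in
  let rec loop i j =
    if i = n && j = n then 0
    else if i = n then -1   (* the exhausted suffix is a proper prefix *)
    else if j = n then 1
    else
      let c = Char.compare s.[i] s.[j] in
      if c <> 0 then c else loop (i + 1) (j + 1)
  in
  loop i j

(* Naive construction: start from the offsets 0 .. n-1 and sort them
   with the standard library sort, using suffix comparison as the order. *)
let naive_suffix_array (s : string) : int array =
  let a = Array.init (String.length s) (fun i -> i) in
  Array.sort (compare_suffixes s) a;
  a
```

Each comparison may inspect up to *n* characters, so this sketch can take on the order of *n² log n* character comparisons in the worst case, which is why a more subtle algorithm is worth studying.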