New exercise: building suffix arrays. #27

fpottier · 2020-08-26T13:03:54Z

This exercise explains the concept of a suffix array and proposes a series of relatively easy questions that lead to a naive algorithm for building one. It then moves on to Manber and Myers' algorithm, which is much more subtle. Several building blocks are provided so that the student has a reasonable chance of success. Still, this is definitely a difficult algorithm. The difficulty probably lies more in the algorithmic aspects than in the actual programming task. I have placed several assertions in the code in advance so that the student has a good chance of detecting their own mistakes.
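For readers who have not opened the exercise, the naive construction mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the names `suffix` and `naive_suffix_array` are hypothetical, indices are 0-based, and the empty suffix is left out; the exercise's own definitions may differ.

```ocaml
(* Hypothetical helper: the suffix of [s] that begins at index [i]. *)
let suffix (s : string) (i : int) : string =
  assert (0 <= i && i <= String.length s);
  String.sub s i (String.length s - i)

(* Naive construction (an assumption about what "naive" means here):
   sort the start indices 0..n-1 by comparing whole suffixes.
   Each comparison may cost O(n), so this is far from optimal. *)
let naive_suffix_array (s : string) : int array =
  let n = String.length s in
  let a = Array.init n (fun i -> i) in
  Array.sort (fun i j -> String.compare (suffix s i) (suffix s j)) a;
  a

(* Print each index next to the suffix that starts there. *)
let () =
  let s = "mississippi" in
  Array.iter
    (fun i -> Printf.printf "%2d %s\n" i (suffix s i))
    (naive_suffix_array s)
```

The result is a permutation of the indices, stored in an array, in which the suffixes appear in increasing lexicographic order.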

yurug · 2020-12-28T17:14:44Z

exercises/building_suffix_arrays/descr.md

+move during this stage. Indeed, these suffixes have fewer than *h* characters,
+so, as far as they are concerned, *h*-order and *2h*-order are the same thing.
+Because *a* is *h*-sorted, each of these suffixes has already reached its
+final position in the array, and inhabits a singleton *h*-bucket.


Suggested change

-final position in the array, and inhabits a singleton *h*-bucket.
+final position in the array, and inhabits a singleton *h*-bucket in the final array.

fpottier · 2021-01-10

I am not sure about this suggestion. I think each index inhabits a singleton *h*-bucket both in the current array and in the final array. I am not sure that adding "in the final array" clarifies anything. Is something presently unclear?
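To make the point about singleton *h*-buckets concrete, here is a minimal OCaml sketch. It assumes that *h*-order compares the first *h* characters of two suffixes (or the whole suffix when it is shorter than *h*), and that an *h*-bucket is a maximal group of indices that are equal in *h*-order. The names `prefix_h`, `compare_h`, `h_buckets` and `short_suffixes_are_singletons` are hypothetical and do not come from the exercise.

```ocaml
(* The first [h] characters of the suffix at [i], or the whole suffix
   if it has fewer than [h] characters. Hypothetical helper. *)
let prefix_h (s : string) (h : int) (i : int) : string =
  String.sub s i (min h (String.length s - i))

(* h-order (as assumed above): compare suffixes on at most [h] characters. *)
let compare_h (s : string) (h : int) (i : int) (j : int) : int =
  String.compare (prefix_h s h i) (prefix_h s h j)

(* Sort the indices by h-order, then group h-equal neighbours into buckets. *)
let h_buckets (s : string) (h : int) : int list list =
  let n = String.length s in
  let sorted = List.stable_sort (compare_h s h) (List.init n (fun i -> i)) in
  (* [current] is the bucket being built, in reverse order. *)
  let rec group acc current = function
    | [] ->
        List.rev (match current with [] -> acc | _ -> List.rev current :: acc)
    | i :: rest ->
        (match current with
         | j :: _ when compare_h s h i j = 0 -> group acc (i :: current) rest
         | [] -> group acc [i] rest
         | _ :: _ -> group (List.rev current :: acc) [i] rest)
  in
  group [] [] sorted

(* Check the claim: every suffix with fewer than [h] characters sits
   alone in its h-bucket. *)
let short_suffixes_are_singletons (s : string) (h : int) : bool =
  let n = String.length s in
  List.for_all
    (fun bucket ->
       List.length bucket = 1 || List.for_all (fun i -> n - i >= h) bucket)
    (h_buckets s h)
```

With `"mississippi"` and *h* = 2, the only suffix shorter than *h* is the one at index 10, and it is alone in its bucket. A suffix shorter than *h* is compared in full, so nothing can be *h*-equal to it except itself, which is why its bucket is a singleton in the current array as well as in the final one.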

yurug · 2020-12-28T17:17:17Z

exercises/building_suffix_arrays/descr.md

+
+Thus, a naïve idea would be to implement stage *h* simply by sorting the array
+*a* using an off-the-shelf sorting algorithm, such as Heapsort or Mergesort.
+The total complexity of this stage would then be *O(n log n)*, which is not


Is your grader able to detect if the student has submitted such an algorithm?

fpottier · 2021-01-10

No, the grader does not perform an explicit complexity check. The student's code is tested with relatively long strings (up to several hundred thousand characters), so its complexity must not be awful, or it will time out.

It's not clear how we would reliably measure the theoretical complexity of the code, as we do not have any obvious hooks. (Perhaps we could override the primitive array access operations and count them? But we would also need to prevent the student from calling library functions such as List.sort which perform array accesses without going through this wrapper.)
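The idea of overriding the primitive array operations and counting them could look roughly like the sketch below. Everything here is an assumption made for illustration: the module `CountedArray`, its signature, and the sample function `sum` are hypothetical and are not part of the exercise or of its grader.

```ocaml
(* Hypothetical instrumented array module. The type is abstract, so
   client code cannot pass these arrays to [Array.sort] directly. *)
module CountedArray : sig
  type 'a t
  val make : int -> 'a -> 'a t
  val length : 'a t -> int
  val get : 'a t -> int -> 'a
  val set : 'a t -> int -> 'a -> unit
  val accesses : unit -> int   (* number of get/set calls so far *)
  val reset : unit -> unit
end = struct
  type 'a t = 'a array
  let counter = ref 0
  let make = Array.make
  let length = Array.length
  let get a i = incr counter; Array.get a i
  let set a i x = incr counter; Array.set a i x
  let accesses () = !counter
  let reset () = counter := 0
end

(* Sample client code: reads every element exactly once. *)
let sum (a : int CountedArray.t) : int =
  let total = ref 0 in
  for i = 0 to CountedArray.length a - 1 do
    total := !total + CountedArray.get a i
  done;
  !total

let () =
  CountedArray.reset ();
  let a = CountedArray.make 1000 1 in
  assert (sum a = 1000);
  Printf.printf "accesses: %d\n" (CountedArray.accesses ())
```

The abstract type only partly addresses the concern raised above: a student could still copy the contents into an ordinary list or array with a linear number of counted reads and then sort that copy with a library function, so the counter alone would not expose the real cost.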

yurug · 2020-12-30T08:01:23Z

exercises/building_suffix_arrays/descr.md

+boundary between the slots that have been written already and
+the slots that remain available.)
+
+At a high level of abstraction, the algorithm can be described as follows:


If that's easy to do, you could show how each of the algorithm's steps performs on the "Mississippi" running example.

fpottier · 2021-01-10

I am not sure I am capable of doing this, and I am not sure it would be easy to follow... (I always have difficulty following animations, especially in the case of a complex algorithm like this.)
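As a possible middle ground, a textual trace of the array after each stage could be generated rather than drawn. The sketch below does this for any string, using the naive idea of re-sorting the array with an off-the-shelf sort at each stage *h* = 1, 2, 4, ...; it is not Manber and Myers' bucket-based stage, and the names `prefix_h`, `h_sort`, `stages` and `print_stages` are hypothetical.

```ocaml
(* The first [h] characters of the suffix at [i], or all of it if shorter. *)
let prefix_h (s : string) (h : int) (i : int) : string =
  String.sub s i (min h (String.length s - i))

(* Return an h-sorted copy of [a], using a library sort. This is the
   naive stage, assumed here only for the purpose of tracing. *)
let h_sort (s : string) (h : int) (a : int array) : int array =
  let b = Array.copy a in
  Array.stable_sort
    (fun i j -> String.compare (prefix_h s h i) (prefix_h s h j)) b;
  b

(* The array after each stage h = 1, 2, 4, ..., stopping at the first
   stage where h >= n, at which point the array is fully sorted. *)
let stages (s : string) : (int * int array) list =
  let n = String.length s in
  let rec loop h a acc =
    let a = h_sort s h a in
    let acc = (h, a) :: acc in
    if h >= n then List.rev acc else loop (2 * h) a acc
  in
  loop 1 (Array.init n (fun i -> i)) []

(* Print one line per stage. *)
let print_stages (s : string) : unit =
  List.iter
    (fun (h, a) ->
       let cells = Array.to_list (Array.map string_of_int a) in
       Printf.printf "h = %2d: %s\n" h (String.concat " " cells))
    (stages s)

let () = print_stages "mississippi"
```

Such a trace shows only the state of the array between stages, not the individual moves inside a stage, so it would complement the high-level description rather than replace it.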

Co-authored-by: Yann Régis Gianas <[email protected]>

New exercise: building suffix arrays.

6ea527c

yurug reviewed Dec 6, 2020

yurug reviewed Dec 6, 2020

yurug reviewed Dec 28, 2020

yurug reviewed Dec 28, 2020

yurug reviewed Dec 28, 2020

yurug reviewed Dec 28, 2020

yurug reviewed Dec 30, 2020

yurug reviewed Dec 30, 2020

yurug force-pushed the master branch from e231f97 to 4625001 Compare December 30, 2020 15:37

fpottier and others added 6 commits January 10, 2021 16:49

Add assertions in [prefix] and [suffix].

10d0ca6

Point out that we have an example of a permutation stored in an array.

109d4f4

Add a dot.

cc29791

Co-authored-by: Yann Régis Gianas <[email protected]>

Add a pointer to pigeonhole sort.

c57523f

Co-authored-by: Yann Régis Gianas <[email protected]>

Typo.

cb1928d

Co-authored-by: Yann Régis Gianas <[email protected]>

Typo.

45fb181

Co-authored-by: Yann Régis Gianas <[email protected]>
