SplitAt implementation #316

Open
atifaziz wants to merge 11 commits into master
Conversation

atifaziz
Member

This PR is an implementation of SplitAt as proposed in #315.

@atifaziz atifaziz requested a review from fsateler June 14, 2017 17:21
@leandromoh
Collaborator

leandromoh commented Jun 15, 2017

A test for an infinite sequence is missing. Something like:

[Test]
public void SplitAtWithInfiniteSequence()
{
	var naturals = MoreEnumerable.Generate(1, x => x + 1);

	var (first, second) = naturals.SplitAt(10);

	Assert.That(first, Is.EquivalentTo(Enumerable.Range(1, 10)));
	Assert.That(second.Take(10), Is.EquivalentTo(Enumerable.Range(11, 10)));
}

@atifaziz
Member Author

@leandromoh That test can never work. Note the remarks comments:

        /// <remarks>
        /// This method uses deferred execution semantics but iterates the
        /// <paramref name="source"/> entirely when either of the sequences
        /// resulting from split are iterated.
        /// </remarks>

Since the sequence will be iterated entirely, an infinite one will never end, or will only stop once memory has been exhausted.
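
To illustrate, here is a rough sketch of the eager, list-buffering shape those remarks describe (illustrative only; the PR defers this work until either part is first iterated, but the effect on an infinite source is the same):

public static (IEnumerable<T> First, IEnumerable<T> Second) EagerSplitAt<T>(
    this IEnumerable<T> source, int index)
{
    if (source == null) throw new ArgumentNullException(nameof(source));

    var first = new List<T>();
    var second = new List<T>();
    var i = 0;

    foreach (var item in source)                  // drains the *entire* source...
        (i++ < index ? first : second).Add(item);

    return (first, second);                       // ...so an infinite source never gets here
}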

@leandromoh
Collaborator

leandromoh commented Jun 15, 2017

I replaced your definition with the following and all tests passed, including the one I mentioned for the infinite sequence. Would the following version be preferred after some optimizations?

public static TResult SplitAt<T, TResult>(
	this IEnumerable<T> source, int index,
	Func<IEnumerable<T>, IEnumerable<T>, TResult> resultSelector)
{
	if (source == null) throw new ArgumentNullException(nameof(source));
	if (resultSelector == null) throw new ArgumentNullException(nameof(resultSelector));

	if (index <= 0)
		return resultSelector(Enumerable.Empty<T>(), source);

	if (source is ICollection<T> col && index >= col.Count)
		return resultSelector(source, Enumerable.Empty<T>());

	var list =  new List<T>();
	var cached = list.Concat(source.AsEnumerateOnce().Pipe(x => list.Add(x)));

	return resultSelector(cached.Take(index), cached.Skip(index));
}

where AsEnumerateOnce is defined as:

private static IEnumerable<T> AsEnumerateOnce<T>(this IEnumerable<T> source)
{
	IEnumerator<T> ector = null;

	return _(); IEnumerable<T> _()
	{
		ector = ector ?? source.GetEnumerator();

		while (ector.MoveNext())
			yield return ector.Current;

		ector.Dispose();
	}
}
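
For anyone else trying to follow the draft, here is an annotated restatement of its core with a concrete source, say inside a test (it assumes Pipe and the AsEnumerateOnce helper above are in scope, and adds no new behaviour):

var source = MoreEnumerable.Generate(1, x => x + 1);    // infinite: 1, 2, 3, ...
const int index = 10;

// `list` is the cache. `cached` replays whatever is already cached, then keeps
// pulling from the single shared enumerator, appending each new item via Pipe.
var list = new List<int>();
var cached = list.Concat(source.AsEnumerateOnce().Pipe(x => list.Add(x)));

var first = cached.Take(index);     // pulls at most 10 items from source, caching them
var second = cached.Skip(index);    // replays the cache, skips those 10, then
                                    // continues pulling (and caching) new items

Console.WriteLine(string.Join(", ", first));            // 1, 2, ..., 10
Console.WriteLine(string.Join(", ", second.Take(5)));   // 11, 12, ..., 15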

@atifaziz
Member Author

atifaziz commented Jun 15, 2017

@leandromoh So you completely changed the implementation to make it work with infinite sources. I'd say it's a little detail you missed out earlier. 😆 I have a hard time reasoning about the code now due to all the side-effects (Pipe) and will have to inspect it more closely. I also see a mini Memoize in there too. Perhaps we should finish #211 that seems to have stalled again? It's a noble goal to try and work with infinite sequences but your proposal does make the second part somewhat inefficient due to the SkipWhile. It may be acceptable for small values of index but not otherwise. Also, the implementation I'm proposing returns lists for the two parts, which will be efficient for any operators optimized for lists or collections. However, I'm not sure anyone should count on that; I consider it an implementation detail at the moment.

Thanks for your suggestions & ideas. Keep 'em coming!

@leandromoh
Collaborator

leandromoh commented Jun 15, 2017

@leandromoh So you completely changed the implementation to make it work with infinite sources.

The idea was to create the simplest draft that passes all the tests. Complex code becomes hard to reason about. As the C.A.R. Hoare quote goes: “There are two ways of constructing a software design: one way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult.”

I have a hard time reasoning about the code now due to all the side-effects (Pipe) and will have to inspect it more closely.

Since my draft passed all the written tests, I suppose it is a perfectly valid implementation.

It's a noble goal to try and work with infinite sequences

I think all LINQ operators should work with infinite sequences, except aggregation functions (Count, Max, Min, Aggregate, etc.); otherwise the method has a limitation it should not have. LINQ operators should be as lazy as possible.

your proposal does make the second part somewhat inefficient due to the SkipWhile

As I said, it's only a draft without all possible optimizations. If we take it as the current implementation for this PR, we can add more optimizations to it, and in the worst case, we can apply the mini memoize to it too and make it efficient.

Since the sequence will be iterated entirely

I think it should be avoided, and the sequence should be iterated as lazily as possible. Imagine that the source is long, say 10,000 elements, and the user just splits it at 10 and consumes only its first 25 elements (10 from the first part and 15 from the second). The current implementation iterates 9,975 elements for nothing. That doesn't make sense, and users of course don't expect it. It is also not performant.

the implementation I'm proposing returns lists for the two parts, which will be efficient for any operators optimized for lists or collections. However, I'm not sure anyone should count on that; I consider it an implementation detail at the moment.

I think it is more important that methods work with infinite sequences than to have optimizations. Users are not too concerned about optimizations, but they do expect a method to work with an infinite sequence if they are not iterating it entirely.

Thanks for your suggestions & ideas. Keep 'em coming!

You are welcome, I am here to help and contribute 😆

I also see a mini Memoize in there too. Perhaps we should finish #211 that seems to have stalled again?

To be honest, the fact that #211 was submitted 6 or 7 months ago and is still not acceptable for merge has made me a bit exhausted. I will be happy when it is released, but for now, I think it is better not to count on it or wait for it.

@atifaziz
Member Author

Love that quote by Hoare every time I read it.

Since my draft passed all the written tests, I suppose it is a perfectly valid implementation.

True, but aren't you assuming that the tests are perfect? 😃 Look how in #319 I made a capital mistake because I was missing a test (not a problem of coverage).

I agree with all your points about staying as lazy as possible. I realized the limitations and implementation choices, which is why I called them out in the remarks section. I'll reconsider based on your feedback.

To be honest, the fact that #211 was submitted 6 or 7 months ago and is still not acceptable for merge has made me a bit exhausted. I will be happy when it is released, but for now, I think it is better not to count on it or wait for it.

Sorry to hear that. If it's any consolation, it's been very consuming to review and fix.

I will be happy when it is released, but for now, I think it is better not to count on it or wait for it.

That's a shame because a lot of work & thought went into it. I feel now that my initial inclination to just go with the version in System.Interactive may have been the right call.

@leandromoh
Collaborator

leandromoh commented Sep 4, 2017

Sorry to hear that. If it's any consolation, it's been very consuming to review and fix.

No problem. We all have lives outside GitHub, with duties and priorities.

I feel now that my initial inclination to just go with the version in System.Interactive may have been the right call.

I think the problem is not about the work we have spent, but about defining an MVP. The Memoize PR's release has been stalled for 4 months only because of "Dispose should really mean Reset".

@leandromoh
Collaborator

leandromoh commented Oct 8, 2017

@atifaziz Any news?

@atifaziz
Member Author

atifaziz commented Oct 9, 2017

@leandromoh Not much news, I'm afraid. 😞 I don't know if you followed #337, but I think the idea of using a reader monad to express partial consumption would work here with large and infinite sequences. For example:

var source = MoreEnumerable.Generate(0, x => x + 1);
var (xs, ys) = source.SplitAt(10, ys => ys.Take(25));

So xs will have ≤ 10 elements and ys will have ≤ 25 and at most 35 elements of source will have been consumed even though it's an infinite sequence.

@atifaziz
Member Author

atifaziz commented Apr 5, 2018

2a094a8 updates the implementation based on my revised proposal for #315, where the split parts are returned as sub-sequences instead of a pivoted couple (a 2-tuple) of parts.

📝 Notes:

  • Clean up the implementation
  • Consider changing the offsets type to IEnumerable<int> (params int[] was out of laziness)
  • Add more test cases with various offsets
  • Optionally, optimise for corner cases where the source can be returned verbatim, though this can be done post-merge of this PR
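
If it helps to visualise the revised shape, a hypothetical usage might look like the following (the exact signature and the treatment of offsets as absolute split positions are assumptions based on the notes above, not the final API):

// Hypothetical: offsets taken as absolute split positions, with the parts
// yielded as sub-sequences.
var parts = Enumerable.Range(1, 10).SplitAt(3, 7);

foreach (var part in parts)
    Console.WriteLine(string.Join(", ", part));

// Expected output under this reading:
// 1, 2, 3
// 4, 5, 6, 7
// 8, 9, 10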

@leandromoh
Collaborator

leandromoh commented Apr 6, 2018

@atifaziz I liked this new implementation a lot. How about receiving a predicate (Func<T, int, bool>) to indicate a split? This would allow users to split by other criteria. You can write the index overload in terms of this new overload by wrapping the int[] offsets with the predicate.

Ex.: (e, index) => offsets.Contains(index).
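
A rough sketch of that wrapping, assuming a hypothetical predicate-based overload (both signatures below are illustrative, not the PR's API):

// Illustrative only: the offset-based overload written on top of a
// hypothetical predicate-based overload.
public static IEnumerable<IEnumerable<T>> SplitAt<T>(
    this IEnumerable<T> source, params int[] offsets) =>
    source.SplitAt((e, index) => offsets.Contains(index));

// where the hypothetical predicate overload would be something like:
// public static IEnumerable<IEnumerable<T>> SplitAt<T>(
//     this IEnumerable<T> source, Func<T, int, bool> splitPredicate) { ... }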

@atifaziz
Member Author

I liked this new implementation a lot.

@leandromoh Glad to hear.

How about receiving a predicate (Func<T, int, bool>) to indicate a split? This would allow users to split by other criteria.

It's an idea, but we have to be careful about avoiding confusion with Split and Segment.

@leandromoh
Collaborator

I did not know that Segment can receive a predicate. Why not write SplitAt in terms of Segment?

@atifaziz
Member Author

Why not write SplitAt in terms of Segment?

It would almost work, but it can't be used because Segment doesn't always produce a sub-sequence. With SplitAt, if the source sequence is too short, you will still get a blank sub-sequence.
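
To illustrate the distinction (expected behaviour, not verified output):

var xs = new[] { 1, 2, 3 };

// SplitAt still yields a (blank) second part when the source is too short:
//   xs.SplitAt(5)                 ->  [1, 2, 3], []
// Segment only yields segments that actually contain elements:
//   xs.Segment((e, i) => i == 5)  ->  [1, 2, 3]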

@Orace
Contributor

Orace commented Dec 3, 2019

Dispatch #633 may have been used to implement this.
