SplitAt implementation #316

Open
atifaziz wants to merge 11 commits into master
Conversation

atifaziz
Member

This PR is an implementation of SplitAt as proposed in #315.

@atifaziz atifaziz requested a review from fsateler June 14, 2017 17:21
@leandromoh
Collaborator

leandromoh commented Jun 15, 2017

A test for an infinite sequence is missing. Something like:

[Test]
public void SplitAtWithInfiniteSequence()
{
	var naturals = MoreEnumerable.Generate(1, x => x + 1);

	var (first, second) = naturals.SplitAt(10);

	Assert.That(first, Is.EquivalentTo(Enumerable.Range(1, 10)));
	Assert.That(second.Take(10), Is.EquivalentTo(Enumerable.Range(11, 10)));
}

@atifaziz
Member Author

@leandromoh That test can never work. Note the remarks comments:

        /// <remarks>
        /// This method uses deferred execution semantics but iterates the
        /// <paramref name="source"/> entirely when either of the sequences
        /// resulting from split are iterated.
        /// </remarks>

Since the sequence will be iterated entirely, an infinite one will never end, or will only stop once memory has been exhausted.
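
To illustrate, here is a rough sketch of the eager, list-buffering shape those remarks describe (illustrative only; the PR defers this work until either part is first iterated, but the effect on an infinite source is the same):

public static (IEnumerable<T> First, IEnumerable<T> Second) EagerSplitAt<T>(
    this IEnumerable<T> source, int index)
{
    if (source == null) throw new ArgumentNullException(nameof(source));

    var first = new List<T>();
    var second = new List<T>();
    var i = 0;

    foreach (var item in source)                  // drains the *entire* source...
        (i++ < index ? first : second).Add(item);

    return (first, second);                       // ...so an infinite source never gets here
}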

@leandromoh
Collaborator

leandromoh commented Jun 15, 2017

I replaced your definition with the following and all tests passed, including the one I mentioned for the infinite sequence. Would the following version be preferred after some optimizations?

public static TResult SplitAt<T, TResult>(
	this IEnumerable<T> source, int index,
	Func<IEnumerable<T>, IEnumerable<T>, TResult> resultSelector)
{
	if (source == null) throw new ArgumentNullException(nameof(source));
	if (resultSelector == null) throw new ArgumentNullException(nameof(resultSelector));

	if (index <= 0)
		return resultSelector(Enumerable.Empty<T>(), source);

	if (source is ICollection<T> col && index >= col.Count)
		return resultSelector(source, Enumerable.Empty<T>());

	var list =  new List<T>();
	var cached = list.Concat(source.AsEnumerateOnce().Pipe(x => list.Add(x)));

	return resultSelector(cached.Take(index), cached.Skip(index));
}

where AsEnumerateOnce is defined as:

private static IEnumerable<T> AsEnumerateOnce<T>(this IEnumerable<T> source)
{
	IEnumerator<T> ector = null;

	return _(); IEnumerable<T> _()
	{
		ector = ector ?? source.GetEnumerator();

		while (ector.MoveNext())
			yield return ector.Current;

		ector.Dispose();
	}
}
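
For anyone else trying to follow the draft, here is an annotated restatement of its core with a concrete source, say inside a test (it assumes Pipe and the AsEnumerateOnce helper above are in scope, and adds no new behaviour):

var source = MoreEnumerable.Generate(1, x => x + 1);    // infinite: 1, 2, 3, ...
const int index = 10;

// `list` is the cache. `cached` replays whatever is already cached, then keeps
// pulling from the single shared enumerator, appending each new item via Pipe.
var list = new List<int>();
var cached = list.Concat(source.AsEnumerateOnce().Pipe(x => list.Add(x)));

var first = cached.Take(index);     // pulls at most 10 items from source, caching them
var second = cached.Skip(index);    // replays the cache, skips those 10, then
                                    // continues pulling (and caching) new items

Console.WriteLine(string.Join(", ", first));            // 1, 2, ..., 10
Console.WriteLine(string.Join(", ", second.Take(5)));   // 11, 12, ..., 15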

@atifaziz
Member Author

atifaziz commented Jun 15, 2017

@leandromoh So you completely changed the implementation to make it work with infinite sources. I'd say it's a little detail you missed out earlier. 😆 I have a hard time reasoning about the code now due to all the side-effects (Pipe) and will have to inspect it more closely. I also see a mini Memoize in there too. Perhaps we should finish #211 that seems to have stalled again? It's a noble goal to try and work with infinite sequences but your proposal does make the second part somewhat inefficient due to the SkipWhile. It may be acceptable for small values of index but not otherwise. Also, the implementation I'm proposing returns lists for the two parts, which will be efficient for any operators optimized for lists or collections. However, I'm not sure anyone should count on that; I consider it an implementation detail at the moment.

Thanks for your suggestions & ideas. Keep 'em coming!

@leandromoh
Collaborator

leandromoh commented Jun 15, 2017

@leandromoh So you completely changed the implementation to make it work with infinite sources.

The idea was to create the simplest draft that passes all the tests. Complex code becomes hard to reason about. As the C.A.R. Hoare quote goes: “There are two ways of constructing a software design: one way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult.”

I have a hard time reasoning about the code now due to all the side-effects (Pipe) and will have to inspect it more closely.

Since my draft passed all the written tests, I suppose it is a perfectly valid implementation.

It's a noble goal to try and work with infinite sequences

I think all LINQ operators should work with infinite sequences, except aggregation functions (Count, Max, Min, Aggregate, etc.); otherwise the method has a limitation it should not have. LINQ operators should be as lazy as possible.

your proposal does make the second part somewhat inefficient due to the SkipWhile

As I said, it's only a draft without all possible optimizations. If we take it as the current implementation for this PR, we can add more optimizations to it, and in the worst case, we can apply the mini memoize to it too and make it efficient.

Since the sequence will be iterated entirely

I think it should be avoided, and the sequence should be iterated as lazily as possible. Imagine that the source is long, say 10,000 elements, and the user just splits it at 10 and consumes only its first 25 elements (10 from the first part and 15 from the second). The current implementation iterates 9,975 elements for nothing. That doesn't make sense, and users of course don't expect it. It is also not performant.

the implementation I'm proposing returns lists for the two parts, which will be efficient for any operators optimized for lists or collections. However, I'm not sure anyone should count on that; I consider it an implementation detail at the moment.

I think it is more important that methods work with infinite sequences than to have optimizations. Users are not too concerned about optimizations, but they do expect a method to work with an infinite sequence if they are not iterating it entirely.

Thanks for your suggestions & ideas. Keep 'em coming!

You are welcome, I am here to help and contribute 😆

I also see a mini Memoize in there too. Perhaps we should finish #211 that seems to have stalled again?

To be honest, the fact that #211 was submitted 6 or 7 months ago and is still not acceptable for merge has made me a bit exhausted. I will be happy when it is released, but for now, I think it is better not to count on it or wait for it.

@atifaziz
Member Author

Love that quote by Hoare every time I read it.

Since my draft passed all the written tests, I suppose it is a perfectly valid implementation.

True, but aren't you assuming that the tests are perfect? 😃 Look how in #319 I made a capital mistake because I was missing a test (not a problem of coverage).

I agree with all your points about staying as lazy as possible. I realized the limitations and implementation choices, which is why I called them out in the remarks section. I'll reconsider based on your feedback.

To be honest, the fact that #211 was submitted 6 or 7 months ago and is still not acceptable for merge has made me a bit exhausted. I will be happy when it is released, but for now, I think it is better not to count on it or wait for it.

Sorry to hear that. If it's any consolation, it's been very consuming to review and fix.

I will be happy when it is released, but for now, I think it is better not to count on it or wait for it.

That's a shame because a lot of work & thought went into it. I feel now that my initial inclination to just go with the version in System.Interactive may have been the right call.

@leandromoh
Collaborator

leandromoh commented Sep 4, 2017

Sorry to hear that. If it's any consolation, it's been very consuming to review and fix.

No problem. We all have lives outside GitHub, with duties and priorities.

I feel now that my initial inclination to just go with the version in System.Interactive may have been the right call.

I think the problem is not about the work we have spent, but about defining an MVP. The Memoize PR's release has been stalled for 4 months only because of "Dispose should really mean Reset".

@leandromoh
Collaborator

leandromoh commented Oct 8, 2017

@atifaziz Any news?

@atifaziz
Member Author

atifaziz commented Oct 9, 2017

@leandromoh Not much news, I'm afraid. 😞 I don't know if you followed #337, but I think the idea of using a reader monad to express partial consumption would work here with large and infinite sequences. For example:

var source = MoreEnumerable.Generate(0, x => x + 1);
var (xs, ys) = source.SplitAt(10, ys => ys.Take(25));

So xs will have ≤ 10 elements and ys will have ≤ 25 and at most 35 elements of source will have been consumed even though it's an infinite sequence.

@atifaziz
Member Author

atifaziz commented Apr 5, 2018

2a094a8 updates the implementation based on my revised proposal for #315, where the split parts are returned as sub-sequences instead of a pivoted couple (a 2-tuple) of parts.

📝 Notes:

  • Clean up the implementation
  • Consider changing the offsets type to IEnumerable<int> (params int[] was out of laziness)
  • Add more test cases with various offsets
  • Optionally, optimise for corner cases where the source can be returned verbatim, though this can be done post-merge of this PR
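
If it helps to visualise the revised shape, a hypothetical usage might look like the following (the exact signature and the treatment of offsets as absolute split positions are assumptions based on the notes above, not the final API):

// Hypothetical: offsets taken as absolute split positions, with the parts
// yielded as sub-sequences.
var parts = Enumerable.Range(1, 10).SplitAt(3, 7);

foreach (var part in parts)
    Console.WriteLine(string.Join(", ", part));

// Expected output under this reading:
// 1, 2, 3
// 4, 5, 6, 7
// 8, 9, 10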

@leandromoh
Collaborator

leandromoh commented Apr 6, 2018

@atifaziz I liked this new implementation a lot. How about receiving a predicate (Func<T, int, bool>) to indicate a split? This would allow users to split by other criteria. You can write the index overload in terms of this new overload by wrapping the int[] offsets with the predicate.

Ex.: (e, index) => offsets.Contains(index).
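
A rough sketch of that wrapping, assuming a hypothetical predicate-based overload (both signatures below are illustrative, not the PR's API):

// Illustrative only: the offset-based overload written on top of a
// hypothetical predicate-based overload.
public static IEnumerable<IEnumerable<T>> SplitAt<T>(
    this IEnumerable<T> source, params int[] offsets) =>
    source.SplitAt((e, index) => offsets.Contains(index));

// where the hypothetical predicate overload would be something like:
// public static IEnumerable<IEnumerable<T>> SplitAt<T>(
//     this IEnumerable<T> source, Func<T, int, bool> splitPredicate) { ... }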

@atifaziz
Member Author

I liked this new implementation a lot.

@leandromoh Glad to hear.

How about receiving a predicate (Func<T, int, bool>) to indicate a split? This would allow users to split by other criteria.

It's an idea, but we have to be careful about avoiding confusion with Split and Segment.

@leandromoh
Collaborator

I did not know that Segment can receive a predicate. Why not write SplitAt in terms of Segment?

@atifaziz
Member Author

Why not write SplitAt in terms of Segment?

It would almost work, but it can't be used because Segment doesn't always produce a sub-sequence. With SplitAt, if the source sequence is too short, you will still get a blank sub-sequence.
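
To illustrate the distinction (expected behaviour, not verified output):

var xs = new[] { 1, 2, 3 };

// SplitAt still yields a (blank) second part when the source is too short:
//   xs.SplitAt(5)                 ->  [1, 2, 3], []
// Segment only yields segments that actually contain elements:
//   xs.Segment((e, i) => i == 5)  ->  [1, 2, 3]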

@Orace
Contributor

Orace commented Dec 3, 2019

Dispatch #633 may have been used to implement this.
