add `array_permutations()` #1013

ronnodas · 2025-01-08T12:29:35Z

Completes a variant of #1001.

phimuemue

Hi there, thanks for your effort.

I fear Permutations has some quirks that I do not want to entrench in our code. I think the main problem boils down to the fact that it uses many different states instead of always working on indices. (I see that it may be nice to sometimes avoid constructing it, but it appears to complicate the surrounding code.)

We should first try to clean up Permutations to the usual pattern where we "operate on indices and map indices onto elements", so that we can - afterwards - go with the existing PoolIndex (instead of a new PermIter).

Unless this clean-up is infeasible, I want to avoid new abstractions. Thus, I'd like to postpone this PR until we have evidence that Permutations is as simple as possible.
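As a rough illustration of the "operate on indices and map indices onto elements" pattern, here is a minimal sketch. It uses a plain slice as the pool instead of the crate's LazyBuffer, and the names `index_permutations` and `permutations_of` are made up for this example; it is not the crate's implementation.

```rust
/// Collects every length-`k` arrangement of the indices `0..n`,
/// in lexicographic order. Only indices are handled here.
fn index_permutations(n: usize, k: usize) -> Vec<Vec<usize>> {
    fn go(n: usize, k: usize, current: &mut Vec<usize>, out: &mut Vec<Vec<usize>>) {
        if current.len() == k {
            out.push(current.clone());
            return;
        }
        for i in 0..n {
            if !current.contains(&i) {
                current.push(i);
                go(n, k, current, out);
                current.pop();
            }
        }
    }
    let mut out = Vec::new();
    go(n, k, &mut Vec::with_capacity(k), &mut out);
    out
}

/// Maps each index permutation onto clones of the pooled elements.
/// This is the only place where elements are touched.
fn permutations_of<T: Clone>(pool: &[T], k: usize) -> Vec<Vec<T>> {
    index_permutations(pool.len(), k)
        .iter()
        .map(|indices| indices.iter().map(|&i| pool[i].clone()).collect())
        .collect()
}
```

In this shape, `k == 0` needs no special case: the index step yields a single empty index list, and mapping it gives a single empty item.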

phimuemue · 2025-01-08T19:57:12Z

src/permutations.rs

+pub type ArrayPermutations<I, const K: usize> = PermutationsGeneric<I, [usize; K]>;
+/// Iterator for const generic permutations returned by
+/// [`.array_permutations()`](crate::Itertools::array_permutations)
+pub type Permutations<I> = PermutationsGeneric<I, Vec<<I as Iterator>::Item>>;


Having the second generic parameter Vec<<I as Iterator>::Item> seems odd when PermutationsGeneric has [usize; K].

[Array]Combinations have Vec<usize> and [usize; K], respectively. Unless there's a good reason, we should adopt this pattern.

This was an error, array_permutations() was only working on Iterator<Item = usize> before. Fixed now.
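For reference, a simplified sketch of the pattern that [Array]Combinations follow, where the second generic parameter is the index container (`Vec<usize>` or `[usize; K]`) and the item type is derived from it. The trait below is a hypothetical stand-in for PoolIndex that takes a plain slice rather than a LazyBuffer, so the signatures are assumptions.

```rust
/// Hypothetical stand-in for the crate's `PoolIndex`: the implementor is
/// the index container, and `Item` is what gets yielded for it.
trait PoolIndex<T> {
    type Item;
    fn extract_item(&self, pool: &[T]) -> Self::Item;
}

/// Dynamically sized indices yield a `Vec<T>`.
impl<T: Clone> PoolIndex<T> for Vec<usize> {
    type Item = Vec<T>;
    fn extract_item(&self, pool: &[T]) -> Vec<T> {
        self.iter().map(|&i| pool[i].clone()).collect()
    }
}

/// Fixed-size indices yield a `[T; K]`.
impl<T: Clone, const K: usize> PoolIndex<T> for [usize; K] {
    type Item = [T; K];
    fn extract_item(&self, pool: &[T]) -> [T; K] {
        std::array::from_fn(|j| pool[self[j]].clone())
    }
}
```

With this shape, the two aliases would differ only in the index type, and the element type never appears in the second parameter.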

phimuemue · 2025-01-08T19:57:55Z

src/permutations.rs

    vals: LazyBuffer<I>,
    state: PermutationState,
+    _item: PhantomData<Item>,


I think the decision Vec<usize> vs [usize; K] should be propagated onto PermutationState.

With the current algorithm, PermutationState::Loaded.cycles could have that type, but that field is not directly being used to index in the buffer, so it's a "coincidence".

Yes, using a PoolIndex for cycles itself is wrong, but we already exploit the fact that the length is k (https://docs.rs/itertools/latest/src/itertools/permutations.rs.html#112). So it should possibly be PoolIndex<usize>::Item.

phimuemue · 2025-01-08T20:04:01Z

src/permutations.rs

-                        .chain(once(*min_n))
-                        .map(|i| vals[i].clone())
-                        .collect();
+                    let item = Item::extract_start(vals, *k, *min_n);


Having a separate variant Buffered prevents us from using PoolIndex. Maybe always constructing indices might allow PoolIndex here, simplifying implementation and improving maintenance.

phimuemue · 2025-01-08T20:13:03Z

src/permutations.rs

        match state {
            PermutationState::Start { k: 0 } => {
                *state = PermutationState::End;
-                Some(Vec::new())
+                Some(Item::extract_start(vals, 0, 0))


I see that calling extract_start with 0 here has the merit of saving one function, but it requires us to account for the special case len == 0 in extract_start, right? Would unconditionally working on indices simplify this and get rid of the special cases?

The issue is that whatever code goes here needs to generically produce a [T; K] even though we know it will only ever be called if K == 0. We could make this a separate function that does whatever arbitrary thing for K > 0 (say buf.get_array([0; K])) since that code will never get called. The first thing I tried that doesn't work is the following:

impl PoolIndex for [usize; K]{ ... fn empty_item() -> Option<Self::Item> { if K == 0 { Some([]) } else { None } } }

Correct, this attempt won't work. However, we could introduce PoolIndex::from_fn to do whatever we want and return either a Vec or an array.
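The attempt above fails because `Some([])` has type `Option<[T; 0]>`, which the compiler does not unify with `Option<[T; K]>` even inside the `K == 0` branch. A from_fn-style constructor avoids the problem, since `std::array::from_fn` never calls its closure when K is 0. The sketch below shows one possible shape; the trait name `ItemFromFn` and the method signature are assumptions, not the API that was eventually merged.

```rust
/// Hypothetical helper: build an item of `len` elements from a closure.
trait ItemFromFn<T>: Sized {
    fn item_from_fn(len: usize, f: impl FnMut(usize) -> T) -> Self;
}

impl<T> ItemFromFn<T> for Vec<T> {
    fn item_from_fn(len: usize, f: impl FnMut(usize) -> T) -> Self {
        (0..len).map(f).collect()
    }
}

impl<T, const K: usize> ItemFromFn<T> for [T; K] {
    fn item_from_fn(len: usize, f: impl FnMut(usize) -> T) -> Self {
        // The length is fixed by the type; `len` is only checked in debug builds.
        debug_assert_eq!(len, K);
        // For K == 0 the closure is never called, so no `T` has to exist.
        std::array::from_fn(f)
    }
}
```

The empty item then becomes `item_from_fn(0, |_| unreachable!())` for both the Vec and the array case, without a branch on K.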

phimuemue · 2025-01-08T20:14:52Z

src/permutations.rs

+    }
+
+    fn extract_from_prefix<I: Iterator<Item = T>>(buf: &LazyBuffer<I>, indices: &[usize]) -> Self {
+        buf.get_array_from_fn(|i| indices[i])


Isn't this essentially buf.get_array(indices)? (Would support my theory that PermItem should actually be part of PoolIndex.)

Note that when this function is called, there does not exist an instance of [usize; K] to call buf.get_array() on. We could do something like buf.get_array(indices.try_into().unwrap()) instead if that seems better.
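To make the two options concrete, here is a small sketch of both on a plain slice pool (LazyBuffer is replaced by `&[T]`, and both function names are invented for illustration). The first indexes through the slice directly; the second first converts the slice into a `[usize; K]`.

```rust
use std::convert::TryInto;

/// Reads `indices[j]` on the fly; no `[usize; K]` is ever materialized.
/// Panics if `indices` has fewer than `K` entries, ignores extra ones.
fn array_via_from_fn<T: Clone, const K: usize>(pool: &[T], indices: &[usize]) -> [T; K] {
    std::array::from_fn(|j| pool[indices[j]].clone())
}

/// Converts the slice into a fixed-size index array first.
/// Panics unless `indices` has exactly `K` entries.
fn array_via_try_into<T: Clone, const K: usize>(pool: &[T], indices: &[usize]) -> [T; K] {
    let fixed: [usize; K] = indices.try_into().expect("prefix must have length K");
    fixed.map(|i| pool[i].clone())
}
```

Both return the same array when the slice has exactly K entries; the difference is the intermediate copy of the indices and how a length mismatch is reported.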

phimuemue · 2025-01-08T20:15:28Z

src/permutations.rs

+
+/// A type that can be picked out from a pool or buffer of items from an inner iterator
+/// and in a generic way given their indices.
+pub trait PermItem<T> {


Is this new trait really necessary? Couldn't we re-use/generalize/extend PoolIndex?

That is what I started with but it seems that the operations required for ArrayPermutations are different from the ones for {Array}Combinations. Even in the existing Permutations implementation you can see that there are as many instances of directly indexing the buffer as calling buf.get_at(). We could create an intermediate slice/array of indices and call versions of these methods moved to the PoolIndex trait but that seems like an unnecessary level of indirection.

phimuemue · 2025-01-08T20:16:07Z

src/permutations.rs

+        if len == 0 {
+            Vec::new()
+        } else {
+            (0..len - 1)
+                .chain(once(last))
+                .map(|i| buf[i].clone())
+                .collect()
+        }


Special cases that might go away if we always worked on indices.

ronnodas · 2025-01-08T20:53:26Z

My original intention was to reuse PoolIndex (as I wrote in #1001 (comment)) but that seems to actually be more complicated without tweaking the algorithm used to generate the indices. The crucial difference is that PermutationState::Loaded.indices does not have length K/k but n, the length of the original iterator. So you can't just make this field into a PoolIndex. The actual array of indices that are looked up in the buffer is the length k prefix of this slice, as visible in the call extract_from_prefix(vals, &indices[0..k]).

I can think about modifying the algorithm to only keep the k relevant indices but if someone else wants to make that modification to Permutations I'm happy to redo the PR based on that change.
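The length mismatch can be seen in a stripped-down sketch of a cycles-based generator in the style of Python's itertools.permutations. The function names are illustrative, and whether this matches the crate's code line for line is an assumption. Note that `indices` has length n, `cycles` has length k, and only the prefix `indices[..k]` is ever emitted.

```rust
/// Moves to the next k-permutation. Returns `true` once all are exhausted.
/// `indices.len()` is n, `cycles.len()` is k.
fn advance(indices: &mut [usize], cycles: &mut [usize]) -> bool {
    let n = indices.len();
    for i in (0..cycles.len()).rev() {
        if cycles[i] == 0 {
            // Position i has cycled through all its options: reset and carry.
            cycles[i] = n - i - 1;
            indices[i..].rotate_left(1);
        } else {
            let swap_index = n - cycles[i];
            indices.swap(i, swap_index);
            cycles[i] -= 1;
            return false;
        }
    }
    true
}

/// Collects the length-k index prefixes for all k-permutations of `0..n`.
fn k_permutation_indices(n: usize, k: usize) -> Vec<Vec<usize>> {
    if k > n {
        return Vec::new();
    }
    let mut indices: Vec<usize> = (0..n).collect(); // length n
    let mut cycles: Vec<usize> = (n - k..n).rev().collect(); // length k
    let mut out = vec![indices[..k].to_vec()];
    while !advance(&mut indices, &mut cycles) {
        out.push(indices[..k].to_vec());
    }
    out
}
```

In this sketch the tail `indices[k..]` is what lets each step find the next unused index with a swap or rotation, which is why dropping it would require changing the algorithm rather than just the field type.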

phimuemue · 2025-01-09T13:27:38Z

Thanks for your reply. I experimented a bit (https://github.com/phimuemue/rust-itertools/tree/array_permutations_via_poolindex) and I think we should extend (and probably rename) PoolIndex.

ronnodas · 2025-01-09T13:47:10Z

Thanks!

Does it make sense to flip PoolIndex and instead of:

trait PoolIndex<T> {
    type Item;
    
    fn extract_item(&self, pool) -> Self::Item
}

have something like:

trait PoolOutput<T, Idx> {
    fn extract(pool, idx: &Idx) -> Self;
}

Then you can have

impl PoolOutput<T, [usize]> for Vec<T> { ... }
impl PoolOutput<T, [usize; K]> for [T; K] { ... }

and even

struct IndexFn<F>;

impl<F: FnMut(usize) -> usize> PoolOutput<T, IndexFn<F>> for [T; K] {...}

impl<F: FnMut(usize) -> usize> PoolOutput<T, (IndexFn<F>, usize)> for Vec<T> {...}

instead of a from_fn method.

A big disadvantage would be that CombinationsGeneric would now need both the Idx and Item as parameters, since it has a field of type Idx and needs to have Item as an associated type to implement Iterator.
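A compilable sketch of the flipped trait, to make the trade-off easier to judge. The pool is a plain slice rather than a LazyBuffer, the `Idx: ?Sized` bound is added here so that `[usize]` can be used as the index type, and the IndexFn variants are left out; all of that is an assumption on top of the proposal above.

```rust
/// Flipped design: the output type is the implementor, and the index
/// type is a trait parameter instead of `Self`.
trait PoolOutput<T, Idx: ?Sized> {
    fn extract(pool: &[T], idx: &Idx) -> Self;
}

impl<T: Clone> PoolOutput<T, [usize]> for Vec<T> {
    fn extract(pool: &[T], idx: &[usize]) -> Self {
        idx.iter().map(|&i| pool[i].clone()).collect()
    }
}

impl<T: Clone, const K: usize> PoolOutput<T, [usize; K]> for [T; K] {
    fn extract(pool: &[T], idx: &[usize; K]) -> Self {
        std::array::from_fn(|j| pool[idx[j]].clone())
    }
}
```

Because the index type is no longer tied to the output through an associated type, every call site has to name both, for example `<Vec<char> as PoolOutput<char, [usize]>>::extract(..)`. That is the same duplication that a generic iterator struct would carry in its type parameters.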

phimuemue · 2025-01-09T14:15:22Z

I can't reliably predict how simple the outcome would be, but my gut feeling is that PoolIndex<T> is simpler than PoolOutput<T, ...> and PoolIndex::from_fn is simpler than impl PoolOutput<T, IndexFn>.

ronnodas · 2025-01-09T17:09:16Z

I've tried to make some simplifications at https://github.com/ronnodas/itertools/tree/array-permutations-attempt-2. I do agree that PoolIndex is not the best name any more (especially with it being the type of PermutationState::Loaded.cycles), but don't have a good alternative.

phimuemue · 2025-01-13T11:05:46Z

I do agree that PoolIndex is not the best name any more (especially with it being the type of PermutationState::Loaded.cycles), but don't have a good alternative.

Maybe ArrayOrVecHelper. I'm unsure if you'd really want impl ArrayOrVecHelper for [usize; K] or rather struct ArrayHelper; impl ArrayOrVecHelper for ArrayHelper, i.e. introduce dedicated structs for arrays and Vecs, respectively.

Note if you'd like to convert this into a PR (please excuse this note you did not ask for, but I want to clarify things before huge PRs): a916d37 is hard to review, because it lumps different things into one commit (moving PoolIndex to another file, introducing fn start and fn item_from_fn, changing Buffered::k to Buffered::indices, ...). #790 shows how large PRs could be structured.

ronnodas · 2025-01-13T12:32:44Z

@phimuemue Thanks! Broke up into smaller commits and filed as #1014.

add array_permutations()

0e4a563

phimuemue requested changes Jan 8, 2025

ronnodas added 2 commits January 8, 2025 22:25

fixed type alias

76c5ce6

test permutations on a type other than usize

1959799

ronnodas closed this Jan 13, 2025

ronnodas deleted the array-methods branch January 13, 2025 12:33
