Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add array_permutations() #1013

Closed
wants to merge 3 commits into from

Conversation

ronnodas
Copy link
Contributor

@ronnodas ronnodas commented Jan 8, 2025

Completes a variant of #1001.

Copy link
Member

@phimuemue phimuemue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi there, thanks for you effort.

I fear Permutations has some quirks that I do not want to entrench in our code. I think the main problem boils down to the fact that it uses many different states instead of always working on indices. (I see that it may be nice to sometimes avoid constructing it, but it appears to complicate the surrounding code.)

We should first try to clean up Permutations to the usual pattern where we "operate on indices and map indices onto elements", so that we can - afterwards - go with the existing PoolIndex (instead of a new PermIter).

Unless this clean-up is infeasible, I want to avoid new abstractions. Thus, I'd like to postpone this PR until we have evidence that Permutations is as simple as possible.

pub type ArrayPermutations<I, const K: usize> = PermutationsGeneric<I, [usize; K]>;
/// Iterator for const generic permutations returned by
/// [`.array_permutations()`](crate::Itertools::array_permutations)
pub type Permutations<I> = PermutationsGeneric<I, Vec<<I as Iterator>::Item>>;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having the second generic parameter Vec<<I as Iterator>::Item> seems odd when PermutationsGeneric has [usize; K].

[Array]Combinations have Vec<usize> and [usize; K], respectively. Unless there's a good reason, we should adopt this pattern.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was an error, array_permutations() was only working on Iterator<Item = usize> before. Fixed now.

vals: LazyBuffer<I>,
state: PermutationState,
_item: PhantomData<Item>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the decision Vec<usize> vs [usize; K] should be propagated onto PermutationState.

Copy link
Contributor Author

@ronnodas ronnodas Jan 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the current algorithm, PermutationState::Loaded.cycles could have that type, but that field is not directly being used to index in the buffer, so it's a "coincidence".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, using a PoolIndex for cycles itself is wrong, but we already exploit the fact that the length is k (https://docs.rs/itertools/latest/src/itertools/permutations.rs.html#112). So it should possibly be PoolIdx<usize>::Item.

.chain(once(*min_n))
.map(|i| vals[i].clone())
.collect();
let item = Item::extract_start(vals, *k, *min_n);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having a separate variant Buffered prevents us from using PoolIndex. Maybe always constructing indices might allow PoolIndex here, simplifying implementation and improving maintenance.

match state {
PermutationState::Start { k: 0 } => {
*state = PermutationState::End;
Some(Vec::new())
Some(Item::extract_start(vals, 0, 0))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see calling extract_start with 0 here has the merit to save one function, but it requires us to account for the special case len==0 in extract_start, right? Would unconditionally working on indices simplify and get rid of special cases?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue is that whatever code goes here needs to generically produce a [T; K] even though we know it will only ever be called if K == 0. We could make this a separate function that does whatever arbitrary thing for K > 0 (say buf.get_array([0; K])) since that code will never get called. The first thing I tried that doesn't work is the following:

impl PoolIndex for [usize; K]{
    ...
    fn empty_item() -> Option<Self::Item> {
        if K == 0 {
            Some([])
        } else {
            None
        }
    }
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, this attempt won't work. However, we could introduce PoolIndex::from_fn to do whatever we want and return either a Vec or an array.

}

fn extract_from_prefix<I: Iterator<Item = T>>(buf: &LazyBuffer<I>, indices: &[usize]) -> Self {
buf.get_array_from_fn(|i| indices[i])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this essentially buf.get_array(indices)? (Would support my theory that PermItem should actually be part of PoolIndex.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that when this function is called, there does not exist an instance of [usize; K] to call buf.get_array() on. We could do something like buf.get_array(indices.try_into().unwrap()) instead if that seems better.


/// A type that can be picked out from a pool or buffer of items from an inner iterator
/// and in a generic way given their indices.
pub trait PermItem<T> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this new trait really necessary? Couldn't we re-use/generalize/extend PoolIndex?

Copy link
Contributor Author

@ronnodas ronnodas Jan 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is what I started with but it seems that the operations required for ArrayPermutations are different from the ones for {Array}Combinations. Even in the existing Permutations implementation you can see that there are as many instances of directly indexing the buffer as calling buf.get_at(). We could create an intermediate slice/array of indices and call versions of these methods moved to the PoolIndex trait but that seems like an unnecessary level of indirection.

Comment on lines +227 to +234
if len == 0 {
Vec::new()
} else {
(0..len - 1)
.chain(once(last))
.map(|i| buf[i].clone())
.collect()
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Special cases that might go away if we always worked on indices.

@ronnodas
Copy link
Contributor Author

ronnodas commented Jan 8, 2025

My original intention was to reuse PoolIndex (as I wrote in #1001 (comment)) but that seems to actually be more complicated without tweaking the algorithm used to generate the indices. The crucial difference is that PermutationState::Loaded.indices does not have length K/k but n, the length of the original iterator. So you can't just make this field into an IndexPool. The actual array of indices that are looked up in the buffer is the length k prefix of this slice, as visible in the call extract_from_prefix(vals, &indices[0..k]).

I can think about modifying the algorithm to only keep the k relevant indices but if someone else wants to make that modification to Permutations I'm happy to redo the PR based on that change.

@phimuemue
Copy link
Member

Thanks for your reply. I experimented a bit (https://github.com/phimuemue/rust-itertools/tree/array_permutations_via_poolindex) and I think we should extend (and probably rename) PoolIndex.

@ronnodas
Copy link
Contributor Author

ronnodas commented Jan 9, 2025

Thanks!

Does it make sense to flip PoolIndex and instead of:

trait PoolIndex<T> {
    type Item;
    
    fn extract_item(&self, pool) -> Self::Item
}

have something like:

trait PoolOutput<T, Idx> {
    fn extract(pool, idx: &Self::Idx) -> Self;
}

Then you can have

impl PoolOutput<T, [usize]> for Vec<T> { ... }
impl PoolOutput<T, [usize; K]> for [T; K] { ... }

and even

struct IndexFn<F>;

impl<F: FnMut(usize) -> usize> PoolOutput<T, IndexFn<F>> for [T; K] {...}

impl<F: FnMut(usize) -> usize> PoolOutput<T, (IndexFn<F>, usize)> for Vec<T> {...}

instead of a from_fn method.

A big disadvantage would be that CombinationsGeneric would now need both the Idx and Item as parameters, since it has a field of type Idx and needs to have Item as an associated type to implement Iterator.

@phimuemue
Copy link
Member

Thanks!

Does it make sense to flip PoolIndex and instead of:

trait PoolIndex<T> {
    type Item;
    
    fn extract_item(&self, pool) -> Self::Item
}

have something like:

trait PoolOutput<T, Idx> {
    fn extract(pool, idx: &Self::Idx) -> Self;
}

Then you can have

impl PoolOutput<T, [usize]> for Vec<T> { ... }
impl PoolOutput<T, [usize; K]> for [T; K] { ... }

and even

struct IndexFn<F>;

impl<F: FnMut(usize) -> usize> PoolOutput<T, IndexFn<F>> for [T; K] {...}

impl<F: FnMut(usize) -> usize> PoolOutput<T, (IndexFn<F>, usize)> for Vec<T> {...}

instead of a from_fn method.

A big disadvantage would be that CombinationsGeneric would now need both the Idx and Item as parameters, since it has a field of type Idx and needs to have Item as an associated type to implement Iterator.

I can't reliably predict how simple the outcome would be, but my gut feeling is that PoolIndex is simpler than PoolIndex<T, ...> and PoolIndex::from_fn is simpler than impl PoolOutput<T, IndexFn>.

@ronnodas
Copy link
Contributor Author

ronnodas commented Jan 9, 2025

I've tried to make some simplifications at https://github.com/ronnodas/itertools/tree/array-permutations-attempt-2. I do agree that PoolIndex is not the best name any more (especially with it being the type of PermutationState::Loaded.cycles), but don't have a good alternative.

@phimuemue
Copy link
Member

phimuemue commented Jan 13, 2025

I do agree that PoolIndex is not the best name any more (especially with it being the type of PermutationState::Loaded.cycles), but don't have a good alternative.

Maybe ArrayOrVecHelper. I'm unsure if you'd really want impl ArrayOrVectorHelper for [usize; K] or rather struct ArrayHelper; impl ArrayOrVectorHelper for ArrayHelper, i.e. introduce dedicated structs for arrays and Vecs, respectively.

Note if you'd like to convert this into a PR (please excuse this note you did not ask for, but I want to clarify things before huge PRs): a916d37 is hard to review, because it lumps different things into one commit (moving PoolIndex to another file, introducing fn start and fn item_from_fn, changing Buffered::k to Buffered::indices, ...). #790 shows how öarge PRs could be structured.

@ronnodas ronnodas closed this Jan 13, 2025
@ronnodas
Copy link
Contributor Author

@phimuemue Thanks! Broke up into smaller commits and filed as #1014.

@ronnodas ronnodas deleted the array-methods branch January 13, 2025 12:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants