Feat: implement Random Projections #332
Contains two algorithms based on variants of the Johnson-Lindenstrauss lemma:
- Random projections with Gaussian coefficients
- Sparse random projections with +/- 1 coefficients (multiplied by a scaling factor)
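To make the two variants concrete, here is a minimal, dependency-free sketch of how such projection matrices could be built and applied. It is not the code from this PR: the `SplitMix64` generator, the function names, and the density parameter `s` are assumptions made for illustration. The Gaussian variant draws entries from N(0, 1/n_components); the sparse variant follows the usual Achlioptas-style scheme, where an entry is +sqrt(s/n_components) or -sqrt(s/n_components) with probability 1/(2s) each and 0 otherwise.

```rust
use std::f64::consts::PI;

/// Tiny SplitMix64 generator, used only to keep this sketch dependency-free.
/// (Assumption for illustration; the PR itself uses a `rand`-compatible RNG.)
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Uniform sample in [0, 1).
    fn next_f64(&mut self) -> f64 {
        (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// Dense matrix (n_features x n_components) with entries drawn from N(0, 1 / n_components).
fn gaussian_projection(n_features: usize, n_components: usize, rng: &mut SplitMix64) -> Vec<Vec<f64>> {
    let std_dev = 1.0 / (n_components as f64).sqrt();
    let mut matrix = vec![vec![0.0; n_components]; n_features];
    for row in matrix.iter_mut() {
        for value in row.iter_mut() {
            // Box-Muller transform; u1 lies in (0, 1] so the logarithm is finite.
            let u1 = 1.0 - rng.next_f64();
            let u2 = rng.next_f64();
            *value = std_dev * (-2.0 * u1.ln()).sqrt() * (2.0 * PI * u2).cos();
        }
    }
    matrix
}

/// Sparse matrix (n_features x n_components) with entries in {+scale, 0, -scale},
/// where scale = sqrt(s / n_components) and each non-zero sign has probability 1 / (2s).
fn sparse_projection(n_features: usize, n_components: usize, s: f64, rng: &mut SplitMix64) -> Vec<Vec<f64>> {
    let scale = (s / n_components as f64).sqrt();
    let mut matrix = vec![vec![0.0; n_components]; n_features];
    for row in matrix.iter_mut() {
        for value in row.iter_mut() {
            let u = rng.next_f64();
            *value = if u < 0.5 / s {
                scale
            } else if u < 1.0 / s {
                -scale
            } else {
                0.0
            };
        }
    }
    matrix
}

/// Projects each sample (a row of `data`) onto the lower-dimensional space.
fn project(data: &[Vec<f64>], projection: &[Vec<f64>]) -> Vec<Vec<f64>> {
    let n_components = projection.first().map_or(0, |row| row.len());
    data.iter()
        .map(|sample| {
            (0..n_components)
                .map(|j| sample.iter().zip(projection).map(|(x, row)| x * row[j]).sum::<f64>())
                .collect::<Vec<f64>>()
        })
        .collect()
}
```

Calling `project(&data, &sparse_projection(n_features, n_components, 3.0, &mut rng))` reduces each sample from `n_features` to `n_components` values; with `s = 3.0`, roughly two thirds of the matrix entries are zero.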
Codecov Report
Attention: Patch coverage is
@@            Coverage Diff             @@
##           master     #332      +/-   ##
==========================================
- Coverage   36.18%   35.87%   -0.32%
==========================================
  Files          92       96       +4
  Lines        6218     6303      +85
==========================================
+ Hits         2250     2261      +11
- Misses       3968     4042      +74
I've done a quick review and this looks good to me, but I have asked @bytesnake to also take a look, as he is probably more familiar with the algorithm side of things.
Thank you for reviewing, @relf and @quietlychris.
The RNG defaults to Xoshiro256Plus if the user does not provide one. Also added tests for the minimum dimension using values from scikit-learn.
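For reference, the minimum dimension in scikit-learn comes from the Johnson-Lindenstrauss bound n_components >= 4 ln(n_samples) / (eps^2 / 2 - eps^3 / 3). The sketch below is a standalone illustration of that formula; the function name is borrowed from scikit-learn and the `Option` return type is an assumption, so neither is necessarily what this PR exposes.

```rust
/// Minimum number of components that guarantees an eps-embedding of
/// `n_samples` points, following the Johnson-Lindenstrauss bound
///     n_components >= 4 ln(n_samples) / (eps^2 / 2 - eps^3 / 3).
///
/// Returns `None` when the inputs are outside the valid range
/// (hypothetical error handling chosen for this sketch).
fn johnson_lindenstrauss_min_dim(n_samples: usize, eps: f64) -> Option<usize> {
    if n_samples == 0 || !(eps > 0.0 && eps < 1.0) {
        return None;
    }
    let denominator = eps.powi(2) / 2.0 - eps.powi(3) / 3.0;
    // Truncate towards zero, as scikit-learn does when casting to an integer.
    Some((4.0 * (n_samples as f64).ln() / denominator) as usize)
}
```

With one million samples, a distortion of `eps = 0.5` gives 663 components and `eps = 0.1` gives 11841, which are the values listed in the scikit-learn documentation for the same inputs.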
Thank you for the reviews, and thanks to @relf for the suggestions; I have implemented them. Changes:
Thanks for your contribution and the changes. The Gaussian and sparse random projection code now looks very similar. I am wondering whether you could refactor even further by using zero-sized types and a single generic RandomProjection type, something like:
// Zero-sized marker types selecting the projection method.
// They are `pub` because they appear in the public type aliases below.
pub struct Gaussian;
pub struct Sparse;

pub struct RandomProjectionValidParams<RandomMethod, R: Rng + Clone> {
    pub params: RandomProjectionParamsInner,
    pub rng: Option<R>,
    pub method: std::marker::PhantomData<RandomMethod>,
}

pub struct RandomProjectionParams<RandomMethod, R: Rng + Clone>(
    pub(crate) RandomProjectionValidParams<RandomMethod, R>,
);

pub struct RandomProjection<RandomMethod, F: Float> {
    projection: Array2<F>,
    method: std::marker::PhantomData<RandomMethod>,
}

// Type aliases (not new structs) for the two concrete projections.
pub type GaussianRandomProjection<F> = RandomProjection<Gaussian, F>;
pub type SparseRandomProjection<F> = RandomProjection<Sparse, F>;

impl<F, Rec, T, R> Fit<Rec, T, ReductionError> for RandomProjectionValidParams<Gaussian, R>
where
    F: Float,
    Rec: Records<Elem = F>,
    StandardNormal: Distribution<F>,
    R: Rng + Clone,
{
    type Object = RandomProjection<Gaussian, F>;

    fn fit(&self, dataset: &linfa::DatasetBase<Rec, T>) -> Result<Self::Object, ReductionError> {...}
}

impl<F, Rec, T, R> Fit<Rec, T, ReductionError> for RandomProjectionValidParams<Sparse, R>
where
    F: Float,
    Rec: Records<Elem = F>,
    StandardNormal: Distribution<F>,
    R: Rng + Clone,
{
    type Object = RandomProjection<Sparse, F>;

    fn fit(&self, dataset: &linfa::DatasetBase<Rec, T>) -> Result<Self::Object, ReductionError> {...}
}
...
What do you think?
I think that's a very good suggestion; it will be easier to maintain than the previous approach, which used a macro to avoid code duplication. 6b9c2a4 implements a variation of this idea: all the logic has been refactored, and behavior depending on the projection method has been encapsulated in the
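As a rough illustration of the general pattern discussed here, the sketch below shows one way method-specific behavior can sit behind zero-sized marker types so that the fitting logic is written only once. It is not taken from commit 6b9c2a4: the `ProjectionMethod` trait, its `coefficient` function, the closure-based uniform source, and the fixed sparse density of 1/3 are all hypothetical choices made for this example.

```rust
use std::f64::consts::PI;
use std::marker::PhantomData;

/// Hypothetical trait capturing the only thing that differs between methods:
/// how two uniform samples in [0, 1) are turned into one matrix coefficient.
pub trait ProjectionMethod {
    fn coefficient(u1: f64, u2: f64, n_components: usize) -> f64;
}

/// Zero-sized marker types; they carry no data, only a type-level choice.
pub struct Gaussian;
pub struct Sparse;

impl ProjectionMethod for Gaussian {
    fn coefficient(u1: f64, u2: f64, n_components: usize) -> f64 {
        // Box-Muller transform, scaled so that entries follow N(0, 1 / n_components).
        let radius = (-2.0 * (1.0 - u1).ln()).sqrt();
        radius * (2.0 * PI * u2).cos() / (n_components as f64).sqrt()
    }
}

impl ProjectionMethod for Sparse {
    fn coefficient(u1: f64, _u2: f64, n_components: usize) -> f64 {
        // +scale or -scale with probability 1/6 each, 0 otherwise (density 1/3).
        let scale = (3.0 / n_components as f64).sqrt();
        if u1 < 1.0 / 6.0 {
            scale
        } else if u1 < 1.0 / 3.0 {
            -scale
        } else {
            0.0
        }
    }
}

/// One generic model type; the marker only exists at compile time.
pub struct RandomProjection<M: ProjectionMethod> {
    projection: Vec<Vec<f64>>,
    method: PhantomData<M>,
}

pub type GaussianRandomProjection = RandomProjection<Gaussian>;
pub type SparseRandomProjection = RandomProjection<Sparse>;

impl<M: ProjectionMethod> RandomProjection<M> {
    /// Shared fitting logic: only `M::coefficient` depends on the method.
    pub fn fit(n_features: usize, n_components: usize, mut uniform: impl FnMut() -> f64) -> Self {
        let mut projection = vec![vec![0.0; n_components]; n_features];
        for row in projection.iter_mut() {
            for value in row.iter_mut() {
                let (u1, u2) = (uniform(), uniform());
                *value = M::coefficient(u1, u2, n_components);
            }
        }
        Self { projection, method: PhantomData }
    }

    /// Shape of the projection matrix as (n_features, n_components).
    pub fn shape(&self) -> (usize, usize) {
        (self.projection.len(), self.projection.first().map_or(0, |row| row.len()))
    }
}
```

Because the markers are zero-sized, `RandomProjection<Gaussian>` and `RandomProjection<Sparse>` have the same memory layout as the bare matrix, and adding a third method only requires a new marker type and one trait implementation.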
This PR implements random projection techniques for dimensionality reduction, as seen in the sklearn.random_projection module of scikit-learn.