module scaling

Operators that scale input distributions in the phenotype simulation.

Classes:

  • Clip: A node that clips the input to be greater than or equal to some minimum value and/or less than or equal to some maximum value.

  • MinMaxScaler: A node that scales the input to be between 0 and 1 using the minimum and maximum values of the input.

  • StandardScaler: A node that scales the input to have mean 0 and standard deviation 1 using the mean and standard deviation of the input.

  • RobustScaler: A node that scales the input to have median 0 and interquartile range 1 using the median and interquartile range of the input.


class Clip

Operator node that clips the input to a minimum and/or maximum value.

If a minimum value is provided, then all values less than the minimum value are set to the minimum value. If a maximum value is provided, then all values greater than the maximum value are set to the maximum value.

Example:

        >>> vals = np.array([
                [1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]
        ])

        >>> clip = Clip("clip", "vals", min_val=2, max_val=8)
        >>> clip(vals)
        array([[2, 2, 3],
                [4, 5, 6],
                [7, 8, 8]])

        >>> clip = Clip("clip", "vals", min_val=4)
        >>> clip(vals)
        array([[4, 4, 4],
                [4, 5, 6],
                [7, 8, 9]])

        >>> clip = Clip("clip", "vals", max_val=6)
        >>> clip(vals)
        array([[1, 2, 3],
                [4, 5, 6],
                [6, 6, 6]])

method __init__

__init__(
    alias: str,
    input_alias: str,
    min_val: float = None,
    max_val: float = None
)

Initialize Clip node.

Args:

  • alias: The alias of the node.
  • input_alias: The alias of the input node.
  • min_val (float, default None): The minimum value to clip to.
  • max_val (float, default None): The maximum value to clip to.

method run

run(input_vals)

Return the input clipped to the min and/or max value(s).
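The clipping rule described for this class can be reproduced in plain NumPy. The following is only a minimal sketch, assuming the node behaves like `np.clip`; the helper name `clip_sketch` is hypothetical and is not part of the module.

```python
import numpy as np


def clip_sketch(values, min_val=None, max_val=None):
    """Hypothetical stand-in for Clip.run (an assumption, not the module's code)."""
    values = np.asarray(values)
    if min_val is None and max_val is None:
        # No bounds given, so there is nothing to clip.
        return values
    # np.clip accepts None for either bound, which leaves that side unbounded.
    return np.clip(values, min_val, max_val)


vals = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
both = clip_sketch(vals, min_val=2, max_val=8)   # bounded on both sides
lower_only = clip_sketch(vals, min_val=4)        # only a lower bound
upper_only = clip_sketch(vals, max_val=6)        # only an upper bound
```

With the same input as the class example, the three results should match the arrays shown there.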


class MinMaxScaler

Operator node that scales the input to be between 0 and 1.

The scaling is based on the minimum and maximum values of the input, such that the minimum value of the input is mapped to 0 and the maximum value of the input is mapped to 1. All other values are scaled linearly between 0 and 1.

Scaling is done either by feature or among all features based on the 'by_feat' argument.

Example:

        >>> vals = np.array([
                [1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]
        ])

        >>> mms_by_feat = MinMaxScaler("mms", "vals", by_feat=True)
        >>> mms_all = MinMaxScaler("mms", "vals", by_feat=False)

        >>> mms_by_feat(vals)
        array([[0. , 0.5, 1. ],
                [0. , 0.5, 1. ],
                [0. , 0.5, 1. ]])
        >>> mms_all(vals)
        array([[0.   , 0.125, 0.25 ],
                [0.375, 0.5  , 0.625],
                [0.75 , 0.875, 1.   ]])

method __init__

__init__(alias: str, input_alias: str, by_feat: bool = True)

Initialize MinMaxScaler node.

Args:

  • alias: The alias of the node.
  • input_alias: The alias of the input node.
  • by_feat (bool, default True): Whether to scale by feature or among all features.

method run

run(input_vals)

Return the input scaled to be between 0 and 1.

Args:

  • input_vals: The input values to scale.

Returns: The input scaled to be between 0 and 1.
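The mapping described for this class is x' = (x - min) / (max - min). The following is a minimal NumPy sketch of that formula, not the module's implementation. Judging from the class example, `by_feat=True` appears to scale each row independently; that reading of the axis is an assumption, and the helper name `min_max_sketch` is hypothetical.

```python
import numpy as np


def min_max_sketch(values, by_feat=True):
    """Hypothetical stand-in for MinMaxScaler.run (assumes 2-D input)."""
    values = np.asarray(values, dtype=float)
    # Assumption: "by feature" means per row (axis=1), as the class example suggests.
    axis = 1 if by_feat else None
    lo = values.min(axis=axis, keepdims=True)
    hi = values.max(axis=axis, keepdims=True)
    # Constant input (hi == lo) would divide by zero; this sketch does not handle it.
    return (values - lo) / (hi - lo)


vals = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
by_feat_scaled = min_max_sketch(vals, by_feat=True)
all_scaled = min_max_sketch(vals, by_feat=False)
```

Using `keepdims=True` keeps the minimum and maximum broadcastable against the input in both modes, so the same expression serves the per-row and the global case.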


class StandardScaler

Operator that scales input to have mean 0 and standard deviation 1.

Scaling is either done by feature or among all features based on the 'by_feat' argument.

Example:

        >>> vals = np.array([
                [1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]
        ])
        >>> std_scaler_by_feat = StandardScaler(
                "std_scaler", "vals", by_feat=True
        )
        >>> std_scaler_all = StandardScaler(
                "std_scaler", "vals", by_feat=False
        )

        >>> by_feat_out = std_scaler_by_feat(vals)
        >>> all_out = std_scaler_all(vals)

        >>> by_feat_out
        array([[-1.225,  0.   ,  1.225],
                [-1.225,  0.   ,  1.225],
                [-1.225,  0.   ,  1.225]])
        >>> all_out
        array([[-1.549, -1.162, -0.775],
                [-0.387,  0.   ,  0.387],
                [ 0.775,  1.162,  1.549]])
        
        >>> by_feat_out.mean(1)
        array([0., 0., 0.])
        >>> by_feat_out.std(1)
        array([1., 1., 1.])

        >>> all_out.mean(1)
        array([-1.162,  0.   ,  1.162])
        >>> all_out.mean()
        0.0
        >>> all_out.std()
        1.0
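The numbers in the example are consistent with the usual z-score, z = (x - mean) / std, computed with the population standard deviation (`ddof=0`). The following is a minimal NumPy sketch under that assumption, with rows treated as features as the example output suggests; the helper name `standard_sketch` is hypothetical and is not the module's code.

```python
import numpy as np


def standard_sketch(values, by_feat=True):
    """Hypothetical stand-in for StandardScaler.run (assumes 2-D input)."""
    values = np.asarray(values, dtype=float)
    # Assumption: "by feature" means per row (axis=1), as the example suggests.
    axis = 1 if by_feat else None
    mean = values.mean(axis=axis, keepdims=True)
    # NumPy's default ddof=0 gives the population standard deviation.
    std = values.std(axis=axis, keepdims=True)
    return (values - mean) / std


vals = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
by_feat_z = standard_sketch(vals, by_feat=True)
all_z = standard_sketch(vals, by_feat=False)
```

When scaling among all features, only the global mean and standard deviation are normalized, which is why the per-row means of the `by_feat=False` output in the example are not zero.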


method __init__

__init__(alias: str, input_alias: str, by_feat: bool = True)

Initialize StandardScaler node.

Args:

  • alias: The alias of the node.
  • input_alias: The alias of the input node.
  • by_feat (bool, default True): Whether to scale by feature or among all features.

method run

run(input_vals)

Scale the input to have mean 0 and standard deviation 1.


class RobustScaler

Operator that scales input to have median 0 and interquartile range 1.

Output interquartile range can be changed by using the 'out_iqr' argument. Output median can be changed by using the 'out_median' argument.

Scaling is either done by feature or among all features based on the 'by_feat' argument.

Example:

        >>> vals = np.array([
                [1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]
        ])
        >>> robust_by_feat = RobustScaler("robust", "vals", by_feat=True)
        >>> robust_all = RobustScaler("robust", "vals", by_feat=False)

        >>> robust_by_feat(vals)
        array([[-1.,  0.,  1.],
                [-1.,  0.,  1.],
                [-1.,  0.,  1.]])
        >>> robust_all(vals)
        array([[-1.  , -0.75, -0.5 ],
                [-0.25,  0.  ,  0.25],
                [ 0.5 ,  0.75,  1.  ]])

        >>> extreme_vals = np.array(
                np.random.normal(0, 10, 1000).tolist() + [-1000000000000]
        )
        >>> robust_by_feat(extreme_vals)
        array([[-7.454e-01, -6.908e-01, ..., -1.848e+00, -7.676e+10]])

        >>> vals = np.random.uniform(0, 10, size=(1000, 5))
        >>> robust_scaled = RobustScaler(
                "robust", "vals", out_iqr=.5, out_median=1
        )
        >>> scaled_vals = robust_scaled(vals)
        >>> np.median(scaled_vals, axis=0)
        array([1., 1., 1., 1., 1.])
        >>> iqr(scaled_vals, axis=0)
        array([0.608, 0.546, 0.528, 0.483, 0.571])

method __init__

__init__(
    alias: str,
    input_alias: str,
    by_feat: bool = True,
    out_iqr: float = 1.0,
    out_median: float = 0.0
)

Initialize RobustScaler node.

Args:

  • alias: The alias of the node.
  • input_alias: The alias of the input node.
  • by_feat (bool, default True): Whether to scale by feature or among all features.
  • out_iqr (float, default 1.0): The interquartile range of the output.
  • out_median (float, default 0.0): The median of the output.

method run

run(input_vals)

Scale the input to have median 'out_median' and interquartile range 'out_iqr' (0 and 1 by default).
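The robust scaling described for this class can be written as x' = (x - median) / IQR * out_iqr + out_median. The following is a minimal NumPy sketch of that formula, not the module's implementation. It assumes rows are treated as features when `by_feat=True` (as the first example suggests) and uses NumPy's default linear percentile interpolation; the helper name `robust_sketch` is hypothetical.

```python
import numpy as np


def robust_sketch(values, by_feat=True, out_iqr=1.0, out_median=0.0):
    """Hypothetical stand-in for RobustScaler.run (assumes 2-D input)."""
    values = np.asarray(values, dtype=float)
    # Assumption: "by feature" means per row (axis=1), as the example suggests.
    axis = 1 if by_feat else None
    median = np.median(values, axis=axis, keepdims=True)
    # 75th and 25th percentiles; their difference is the interquartile range.
    q75, q25 = np.percentile(values, [75, 25], axis=axis, keepdims=True)
    iqr = q75 - q25
    # Center on the median, normalize by the IQR, then rescale and shift.
    return (values - median) / iqr * out_iqr + out_median


vals = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
by_feat_robust = robust_sketch(vals, by_feat=True)
all_robust = robust_sketch(vals, by_feat=False)
shifted = robust_sketch(vals, out_iqr=0.5, out_median=1.0)
```

Because the median and the quartiles barely move when a few extreme values are added, this scaling is less sensitive to outliers than scaling by mean and standard deviation.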


This file was automatically generated via lazydocs.