[Bug] <static data imputation issue> #113
Comments
Thank you for reporting this issue! In the first instance, could you please try installing temporai in a conda environment with Python 3.10 (rather than 3.11, which we haven't fully tested yet)? https://conda.io/projects/conda/en/latest/user-guide/install/windows.html

Then:

conda create -n temporai-env python=3.10 -y
conda activate temporai-env
pip install temporai

Let us know if the problem still happens.
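To confirm that the notebook kernel is really running inside the new environment (the traceback in the original report still shows `Python311` paths), a quick check of the interpreter and package versions can help. This is a minimal sketch using only the standard library; `environment_report` is a hypothetical helper, and it assumes the PyPI distribution names are `temporai`, `hyperimpute` and `pandas`.

```python
import sys
from importlib.metadata import PackageNotFoundError, version


def environment_report(packages=("temporai", "hyperimpute", "pandas")):
    """Return the interpreter version plus the installed version of each package."""
    report = {"python": ".".join(map(str, sys.version_info[:3]))}
    for name in packages:
        try:
            report[name] = version(name)  # looks up installed distribution metadata
        except PackageNotFoundError:
            report[name] = "not installed"
    return report


for key, value in environment_report().items():
    print(f"{key}: {value}")
```

If the `python` entry still starts with `3.11`, the kernel is not using the `temporai-env` environment.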
Thank you for your feedback. I have now set up a conda environment with Python 3.10 and rerun my code, but the issue persists. To narrow it down, I shrank the static part of the data frame to a single column. If the column is numeric, this is my code to show the data:
The result was:
Missing value count: 16
StaticSamples with data:
sample_idx  MedianIncomePerACS
828 rows × 1 columns
I got the following error:
However, when I added a categorical column with no missing values to the static data (Missing value count: 0), I got the following error when I tried to impute it:
When I combined these two columns into the new static part, the error was very similar to the one above:
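Because the failure appears as soon as a categorical column is involved, even one with no missing values, it can help to list each static column's dtype next to its missing count. A minimal pandas sketch, assuming the static part is available as a plain DataFrame (for instance via `dataset.static.dataframe()`, as in the reporter's code); `summarize_static` is a hypothetical helper, and the `Region` column and all values are made up to mimic the two-column experiment.

```python
import pandas as pd


def summarize_static(df: pd.DataFrame) -> pd.DataFrame:
    """Per-column dtype, missing-value count, and whether the dtype is pandas 'category'."""
    return pd.DataFrame(
        {
            "dtype": df.dtypes.astype(str),
            "missing": df.isnull().sum(),
            "is_category": df.dtypes.apply(lambda t: isinstance(t, pd.CategoricalDtype)),
        }
    )


# Hypothetical static frame: one numeric column with a gap, one complete categorical column.
static = pd.DataFrame(
    {
        "MedianIncomePerACS": [52000.0, None, 61000.0],
        "Region": pd.Categorical(["north", "south", "north"]),
    }
)
print(summarize_static(static))
```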
Any suggestions on how I should debug my code?
Not sure quite yet, but will look into this over the next week or so and hopefully have this solved!
Describe the bug
I created a TemporalPredictionDataset following the tutorial. However, whenever I try static data imputation, an error is raised. I tried different values of static_imputer ("mean", "MissForest", and "HyperImpute"), but they all gave me the same error message.
I followed your imputation tutorial, using the following code:
from tempor import plugin_loader
dataset = my_datasource(with_missing=True, random_state=42).load()
print(dataset)
model = plugin_loader.get("preprocessing.imputation.static.static_tabular_imputer", static_imputer="mean")
print(model)
# Note missingness in static data.
print("Missing value count:", dataset.static.dataframe().isnull().sum().sum()) # type: ignore
dataset.static
# Note no more missingness in static data.
dataset = model.fit_transform(dataset) # Or call fit() then transform().
print("Missing value count:", dataset.static.dataframe().isnull().sum().sum()) # type: ignore
dataset.static
TypeError Traceback (most recent call last)
Cell In[59], line 3
1 # Note no more missingness in static data.
----> 3 dataset = model.fit_transform(dataset) # Or call fit() then transform().
5 print("Missing value count:", dataset.static.dataframe().isnull().sum().sum()) # type: ignore
7 dataset.static
File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\pydantic\deprecated\decorator.py:55, in validate_arguments.<locals>.validate.<locals>.wrapper_function(*args, **kwargs)
53 @wraps(_func)
54 def wrapper_function(*args: Any, **kwargs: Any) -> Any:
---> 55 return vd.call(*args, **kwargs)
File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\pydantic\deprecated\decorator.py:150, in ValidatedFunction.call(self, *args, **kwargs)
148 def call(self, *args: Any, **kwargs: Any) -> Any:
149 m = self.init_model_instance(*args, **kwargs)
--> 150 return self.execute(m)
File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\pydantic\deprecated\decorator.py:222, in ValidatedFunction.execute(self, m)
220 return self.raw_function(*args_, **kwargs, **var_kwargs)
221 else:
--> 222 return self.raw_function(**d, **var_kwargs)
File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\tempor\methods\core_base_transformer.py:64, in BaseTransformer.fit_transform(self, data, *args, **kwargs)
53 """Fit the method to the data and transform it. Equivalent to calling `fit` and then `transform`.
54
55 Args:
(...)
61 dataset.BaseDataset: The transformed dataset.
62 """
63 self.fit(data, *args, **kwargs)
---> 64 return self.transform(data, *args, **kwargs)
File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\tempor\methods\core_base_transformer.py:42, in BaseTransformer.transform(self, data, *args, **kwargs)
31 """Transforms the given data.
32
33 Args:
(...)
39 Any: The transformed data.
40 """
41 logger.debug(f"Calling _transform() implementation on {self.__class__.__name__}")
---> 42 transformed_data = self._transform(data, *args, **kwargs)
44 return transformed_data
File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\tempor\methods\preprocessing\imputation\static\plugin_static_tabular_imputer.py:82, in StaticTabularImputer._transform(self, data, *args, **kwargs)
80 if data.static is not None:
81 static_data = data.static.dataframe()
---> 82 imputed_static_data = self.imputer.transform(static_data)
83 imputed_static_data.columns = static_data.columns
84 imputed_static_data.index = static_data.index
File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\hyperimpute\plugins\core\base_plugin.py:132, in Plugin.transform(self, X)
130 def transform(self, X: pd.DataFrame) -> pd.DataFrame:
131 X = cast.to_dataframe(X)
--> 132 return pd.DataFrame(self._transform(X))
File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\hyperimpute\plugins\imputers\plugin_ice.py:86, in IterativeChainedEquationsPlugin._transform(self, X)
85 def _transform(self, X: pd.DataFrame) -> pd.DataFrame:
---> 86 return self._model.transform(X)
File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\hyperimpute\plugins\core\base_plugin.py:132, in Plugin.transform(self, X)
130 def transform(self, X: pd.DataFrame) -> pd.DataFrame:
131 X = cast.to_dataframe(X)
--> 132 return pd.DataFrame(self._transform(X))
File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\hyperimpute\plugins\imputers\plugin_hyperimpute.py:128, in HyperImputePlugin._transform(self, X)
127 def _transform(self, X: pd.DataFrame) -> pd.DataFrame:
--> 128 return self.model.fit_transform(X)
File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\pydantic\deprecated\decorator.py:55, in validate_arguments.<locals>.validate.<locals>.wrapper_function(*args, **kwargs)
53 @wraps(_func)
54 def wrapper_function(*args: Any, **kwargs: Any) -> Any:
---> 55 return vd.call(*args, **kwargs)
File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\pydantic\deprecated\decorator.py:150, in ValidatedFunction.call(self, *args, **kwargs)
148 def call(self, *args: Any, **kwargs: Any) -> Any:
149 m = self.init_model_instance(*args, **kwargs)
--> 150 return self.execute(m)
File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\pydantic\deprecated\decorator.py:222, in ValidatedFunction.execute(self, m)
220 return self.raw_function(*args_, **kwargs, **var_kwargs)
221 else:
--> 222 return self.raw_function(**d, **var_kwargs)
File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\hyperimpute\plugins\imputers\_hyperimpute_internals.py:918, in IterativeErrorCorrection.fit_transform(self, X)
915 @validate_arguments(config=dict(arbitrary_types_allowed=True))
916 def fit_transform(self, X: pd.DataFrame) -> pd.DataFrame:
917 # Run imputation
--> 918 X = self._setup(X)
920 Xt_init = self._initial_imputation(X)
921 Xt_init.columns = X.columns
File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\pydantic\deprecated\decorator.py:55, in validate_arguments.<locals>.validate.<locals>.wrapper_function(*args, **kwargs)
53 @wraps(_func)
54 def wrapper_function(*args: Any, **kwargs: Any) -> Any:
---> 55 return vd.call(*args, **kwargs)
File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\pydantic\deprecated\decorator.py:150, in ValidatedFunction.call(self, *args, **kwargs)
148 def call(self, *args: Any, **kwargs: Any) -> Any:
149 m = self.init_model_instance(*args, **kwargs)
--> 150 return self.execute(m)
File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\pydantic\deprecated\decorator.py:222, in ValidatedFunction.execute(self, m)
220 return self.raw_function(*args_, **kwargs, **var_kwargs)
221 else:
--> 222 return self.raw_function(**d, **var_kwargs)
File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\hyperimpute\plugins\imputers\_hyperimpute_internals.py:703, in IterativeErrorCorrection._setup(self, X)
700 existing_vals = X[col][X[col].notnull()]
702 le = LabelEncoder()
--> 703 X.loc[X[col].notnull(), col] = le.fit_transform(existing_vals).astype(
704 int
705 )
706 self.encoders[col] = le
708 self.limits = {}
File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\pandas\core\indexing.py:885, in _LocationIndexer.__setitem__(self, key, value)
882 self._has_valid_setitem_indexer(key)
884 iloc = self if self.name == "iloc" else self.obj.iloc
--> 885 iloc._setitem_with_indexer(indexer, value, self.name)
File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\pandas\core\indexing.py:1893, in _iLocIndexer._setitem_with_indexer(self, indexer, value, name)
1890 # align and set the values
1891 if take_split_path:
1892 # We have to operate column-wise
-> 1893 self._setitem_with_indexer_split_path(indexer, value, name)
1894 else:
1895 self._setitem_single_block(indexer, value, name)
File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\pandas\core\indexing.py:1937, in _iLocIndexer._setitem_with_indexer_split_path(self, indexer, value, name)
1933 self._setitem_with_indexer_2d_value(indexer, value)
1935 elif len(ilocs) == 1 and lplane_indexer == len(value) and not is_scalar(pi):
1936 # We are setting multiple rows in a single column.
-> 1937 self._setitem_single_column(ilocs[0], value, pi)
1939 elif len(ilocs) == 1 and 0 != lplane_indexer != len(value):
1940 # We are trying to set N values into M entries of a single
1941 # column, which is invalid for N != M
1942 # Exclude zero-len for e.g. boolean masking that is all-false
1944 if len(value) == 1 and not is_integer(info_axis):
1945 # This is a case like df.iloc[:3, [1]] = [0]
1946 # where we treat as df.iloc[:3, 1] = 0
File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\pandas\core\indexing.py:2095, in _iLocIndexer._setitem_single_column(self, loc, value, plane_indexer)
2091 self.obj.isetitem(loc, value)
2092 else:
2093 # set value into the column (first attempting to operate inplace, then
2094 # falling back to casting if necessary)
-> 2095 self.obj._mgr.column_setitem(loc, plane_indexer, value)
2097 self.obj._clear_item_cache()
File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\pandas\core\internals\managers.py:1308, in BlockManager.column_setitem(self, loc, idx, value, inplace_only)
1306 col_mgr.setitem_inplace(idx, value)
1307 else:
-> 1308 new_mgr = col_mgr.setitem((idx,), value)
1309 self.iset(loc, new_mgr._block.values, inplace=True)
File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\pandas\core\internals\managers.py:399, in BaseBlockManager.setitem(self, indexer, value)
395 # No need to split if we either set all columns or on a single block
396 # manager
397 self = self.copy()
--> 399 return self.apply("setitem", indexer=indexer, value=value)
File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\pandas\core\internals\managers.py:354, in BaseBlockManager.apply(self, f, align_keys, **kwargs)
352 applied = b.apply(f, **kwargs)
353 else:
--> 354 applied = getattr(b, f)(**kwargs)
355 result_blocks = extend_blocks(applied, result_blocks)
357 out = type(self).from_blocks(result_blocks, self.axes)
File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\pandas\core\internals\blocks.py:1758, in EABackedBlock.setitem(self, indexer, value, using_cow)
1755 check_setitem_lengths(indexer, value, values)
1757 try:
-> 1758 values[indexer] = value
1759 except (ValueError, TypeError) as err:
1760 _catch_deprecated_value_error(err)
File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\pandas\core\arrays\_mixins.py:253, in NDArrayBackedExtensionArray.__setitem__(self, key, value)
251 def __setitem__(self, key, value) -> None:
252 key = check_array_indexer(self, key)
--> 253 value = self._validate_setitem_value(value)
254 self._ndarray[key] = value
File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\pandas\core\arrays\categorical.py:1560, in Categorical._validate_setitem_value(self, value)
1557 def _validate_setitem_value(self, value):
1558 if not is_hashable(value):
1559 # wrap scalars and hashable-listlikes in list
-> 1560 return self._validate_listlike(value)
1561 else:
1562 return self._validate_scalar(value)
File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\pandas\core\arrays\categorical.py:2277, in Categorical._validate_listlike(self, value)
2274 # no assignments of values not in categories, but it's always ok to set
2275 # something to np.nan
2276 if len(to_add) and not isna(to_add).all():
-> 2277 raise TypeError(
2278 "Cannot setitem on a Categorical with a new "
2279 "category, set the categories first"
2280 )
2282 codes = self.categories.get_indexer(value)
2283 return codes.astype(self._ndarray.dtype, copy=False)
TypeError: Cannot setitem on a Categorical with a new category, set the categories first
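The last frames show where the categorical case fails: hyperimpute's `IterativeErrorCorrection._setup` label-encodes a column's non-null values and writes the integer codes back with `X.loc[...] = ...`; when that column has pandas `category` dtype, pandas refuses values that are not among the existing categories. The sketch below reproduces that step with pandas alone and tries one possible workaround, casting category columns to `object` before imputation. Assumptions: `np.unique` stands in for scikit-learn's `LabelEncoder` (both assign codes to the sorted unique values); `label_encode_in_place`, `try_encode` and the `Region` column are hypothetical; and the workaround has not been verified against TemporAI itself, since it presumes you control the dtypes of the static DataFrame before building the dataset.

```python
import numpy as np
import pandas as pd


def label_encode_in_place(X: pd.DataFrame, col: str) -> None:
    """Mimic the encoding step from the traceback: replace the non-null
    entries of `col` with integer codes (np.unique stands in for LabelEncoder)."""
    mask = X[col].notnull()
    existing_vals = X.loc[mask, col].to_numpy()
    _, codes = np.unique(existing_vals, return_inverse=True)
    X.loc[mask, col] = codes.astype(int)


def try_encode(X: pd.DataFrame, col: str):
    """Return None if encoding succeeds, otherwise the error message."""
    try:
        label_encode_in_place(X, col)
        return None
    except (TypeError, ValueError) as err:  # traceback above shows TypeError; ValueError caught to be safe
        return str(err)


# Hypothetical static data with a pandas 'category' column and one missing value.
static = pd.DataFrame({"Region": pd.Categorical(["north", "south", None, "north"])})

# 1) Reproduce: the integer codes are not categories of 'Region', so pandas rejects them.
print(try_encode(static.copy(), "Region"))

# 2) Possible workaround: cast category columns to object dtype before imputing.
fixed = static.copy()
for c in fixed.select_dtypes(include="category").columns:
    fixed[c] = fixed[c].astype(object)
print(try_encode(fixed, "Region"))  # None (no error)
print(fixed)
```

Casting to `object` keeps the original labels and missing values intact, so the imputer's own encoder can assign codes without pandas enforcing a fixed category set.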
Please help me understand how to solve this. I appreciate your hard work on developing such a great tool, and I hope I can use it in my work.