Management of pandas object #140

lionelkusch · 2025-01-23T17:15:46Z

For the moment, the management of pandas is not handled by all the methods. In the refactoring, I remove the management of and replace with only a transformation of data to numpy arrays.
In the long term, we need to provide a homogeneous way to handle the pandas's objects.

bthirion · 2025-01-23T20:50:59Z

Can you a be a bit more precise ?
The basic answer is that we should support pandas arrays everywhere sklearn supports it.

lionelkusch · 2025-01-24T09:28:49Z

For the moment, I am not that the support of pandas array is everywhere.
However, to be more precise, what support we want to offer.

The minimal support is to support pandas has input array and transform everything into numpy arrays.
A more elaborate support is that if the input is a pandas array at this moment, the output is also a pandas array. Additionally, if the input is a data frame, it could be nice to return a data frame with the correct name for the columns and rows.

bthirion · 2025-01-24T21:13:59Z

Thx for clarifying. When you say input/output, you think of CPI ?

Until recently, I would have said: we consider only numpy arrays (and thus convert dataframes in a first step), but I increasinlgy think that X can be provided as a DataFrame, because it handles heterogeneous data, e.g. strings, floats etc.
By contrast, the importance scores and p-values are always floats. So your suggestion is to reuse rows and columns names when the inputs were provided as dfs ?

lionelkusch · 2025-01-27T08:32:42Z

Thx for clarifying. When you say input/output, you think of CPI ?

I was thinking in general.

So your suggestion is to reuse rows and columns names when the inputs were provided as dfs ?

Yes, this is one proposition. However, I am not an expert on pandas and I don't know yet the subtlety between the different types of pandas. From a quick look at pandas, I see that the main type is DataFrame. In this case, if we want to support pandas, we should handle DataFrame.

X can be provided as a DataFrame, because it handles heterogeneous data, e.g. strings, floats etc.

A short comment, numpy can also handle heterogeneous data even if it's not really targeting it.

lionelkusch added method implementation Question regarding methods implementations coding style question regarding formatting and declaration of functions labels Jan 23, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Management of pandas object #140

Management of pandas object #140

lionelkusch commented Jan 23, 2025

bthirion commented Jan 23, 2025

lionelkusch commented Jan 24, 2025

bthirion commented Jan 24, 2025

lionelkusch commented Jan 27, 2025 •

edited

Loading

Management of pandas object #140

Management of pandas object #140

Comments

lionelkusch commented Jan 23, 2025

bthirion commented Jan 23, 2025

lionelkusch commented Jan 24, 2025

bthirion commented Jan 24, 2025

lionelkusch commented Jan 27, 2025 • edited Loading

lionelkusch commented Jan 27, 2025 •

edited

Loading