BUG: NotImplementedError: Cannot apply ufunc <ufunc 'hyp2f1'> to mixed DataFrame and Series inputs. #46138

Closed
2 of 3 tasks
timmy-ops opened this issue Feb 24, 2022 · 5 comments
Labels
Bug, Needs Info (Clarification about behavior needed to assess issue)

Comments

@timmy-ops

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

#imports
!pip install -U -q PyDrive
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
import pandas as pd
import numpy as np
import math  # needed for math.ceil below
from datetime import datetime
!pip install lifetimes
from lifetimes import ParetoNBDFitter, GammaGammaFitter

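#data
# Assumption: `drive` was created with the usual Colab PyDrive setup,
# which the original snippet did not show.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)
f_and_t = drive.CreateFile({'id': '1sXcv0SUUygyFvjVEtdV3kk8zyjp4GGQC'})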
f_and_t.GetContentFile('f_and_t.csv')
f_and_t = pd.read_csv('f_and_t.csv')

#reproducible example

time_days = 126
time_months = int(math.ceil(time_days / 30.0))

#column-selection

summary = f_and_t[['customer_id', 'frequency_btyd', 'recency', 'T',
                 'monetary_btyd']]
                 
summary.columns = ['customer_id', 'frequency', 'recency', 'T',
                     'monetary_value']
summary = summary.set_index('customer_id')

actual_df = f_and_t[['customer_id', 'frequency_btyd', 'monetary_dnn',
                     'target_monetary']]
actual_df.columns = ['customer_id', 'train_frequency', 'train_monetary',
                       'act_target_monetary']

#PARETO/NBD fitter
paretof = ParetoNBDFitter(penalizer_coef= 0.01)
paretof.fit(summary['frequency'], summary['recency'], summary['T'])

#Gamma Gamma Fitter

ggf = GammaGammaFitter(penalizer_coef=0)
ggf.fit(summary['frequency'], summary['monetary_value'])

#pareto predict

pareto_pred = paretof.predict(time_days,
                              summary['frequency'].values,
                              summary['recency'],
                              summary['T'])

trans_pred = pareto_pred.fillna(0)

#gg predict

predicted_value = ggf.customer_lifetime_value(paretof,
                                                summary['frequency'],#.values,
                                                summary['recency'],
                                                summary['T'],
                                                summary['monetary_value'],
                                                time=time_months,
                                                discount_rate= 0.01)


### Issue Description

I was using the lifetimes library to calculate CLV for a list of customers. This issue appeared from one day to the next. I work on Google Colab with pandas 1.3.5 (their current version). The error below appears for both functions: paretof.predict and ggf.customer_lifetime_value.

I already found a post about this issue from half a year ago (https://stackoverflow.com/questions/69071130/lifetimes-library-issue-of-calculating-clv-when-using-function-customer-lifet). The suggested solution of using ".values" only worked for the paretof.predict function. I am still stuck at the ggf.customer_lifetime_value function.


NotImplementedError Traceback (most recent call last)
in ()
58 summary['monetary_value'],
59 time=time_months,
---> 60 discount_rate=discount_rate)
61
62

6 frames
/usr/local/lib/python3.7/dist-packages/lifetimes/fitters/gamma_gamma_fitter.py in customer_lifetime_value(self, transaction_prediction_model, frequency, recency, T, monetary_value, time, discount_rate, freq)
294
295 return _customer_lifetime_value(
--> 296 transaction_prediction_model, frequency, recency, T, adjusted_monetary_value, time, discount_rate, freq=freq
297 )

/usr/local/lib/python3.7/dist-packages/lifetimes/utils.py in _customer_lifetime_value(transaction_prediction_model, frequency, recency, T, monetary_value, time, discount_rate, freq)
496 # since the prediction of number of transactions is cumulative, we have to subtract off the previous periods
497 expected_number_of_transactions = transaction_prediction_model.predict(
--> 498 i, frequency, recency, T
499 ) - transaction_prediction_model.predict(i - factor, frequency, recency, T)
500 # sum up the CLV estimates of all of the periods and apply discounted cash flow

/usr/local/lib/python3.7/dist-packages/lifetimes/fitters/pareto_nbd_fitter.py in conditional_expected_number_of_purchases_up_to_time(self, t, frequency, recency, T)
277 r, alpha, s, beta = params
278
--> 279 likelihood = self._conditional_log_likelihood(params, x, t_x, T)
280 first_term = (
281 gammaln(r + x) - gammaln(r) + r * log(alpha) + s * log(beta) - (r + x) * log(alpha + T) - s * log(beta + T)

/usr/local/lib/python3.7/dist-packages/lifetimes/fitters/pareto_nbd_fitter.py in _conditional_log_likelihood(params, freq, rec, T)
212
213 A_1 = gammaln(r + x) - gammaln(r) + r * log(alpha) + s * log(beta)
--> 214 log_A_0 = ParetoNBDFitter._log_A_0(params, x, rec, T)
215
216 A_2 = logaddexp(-(r + x) * log(alpha + T) - s * log(beta + T), log(s) + log_A_0 - log(r_s_x))

/usr/local/lib/python3.7/dist-packages/lifetimes/fitters/pareto_nbd_fitter.py in _log_A_0(params, freq, recency, age)
179
180 rsf = r + s + freq
--> 181 p_1 = hyp2f1(rsf, t, rsf + 1.0, abs_alpha_beta / (max_of_alpha_beta + recency))
182 q_1 = max_of_alpha_beta + recency
183 p_2 = hyp2f1(rsf, t, rsf + 1.0, abs_alpha_beta / (max_of_alpha_beta + age))

/usr/local/lib/python3.7/dist-packages/pandas/core/generic.py in array_ufunc(self, ufunc, method, *inputs, **kwargs)
2030 self, ufunc: np.ufunc, method: str, *inputs: Any, **kwargs: Any
2031 ):
-> 2032 return arraylike.array_ufunc(self, ufunc, method, *inputs, **kwargs)
2033
2034 # ideally we would define this to avoid the getattr checks, but

/usr/local/lib/python3.7/dist-packages/pandas/core/arraylike.py in array_ufunc(self, ufunc, method, *inputs, **kwargs)
292 raise NotImplementedError(
293 "Cannot apply ufunc {} to mixed DataFrame and Series "
--> 294 "inputs.".format(ufunc)
295 )
296 axes = self.axes

NotImplementedError: Cannot apply ufunc <ufunc 'hyp2f1'> to mixed DataFrame and Series inputs.




### Expected Behavior

Sometimes it works, but mostly it doesn't anymore. No error should appear.

### Installed Versions

<details>

/usr/local/lib/python3.7/dist-packages/psycopg2/__init__.py:144: UserWarning:

The psycopg2 wheel package will be renamed from release 2.8; in order to keep installing from binary please use "pip install psycopg2-binary" instead. For details see: <http://initd.org/psycopg/docs/install.html#binary-install-from-pypi>.


INSTALLED VERSIONS
------------------
commit           : 66e3805b8cabe977f40c05259cc3fcf7ead5687d
python           : 3.7.12.final.0
python-bits      : 64
OS               : Linux
OS-release       : 5.4.144+
Version          : #1 SMP Tue Dec 7 09:58:10 PST 2021
machine          : x86_64
processor        : x86_64
byteorder        : little
LC_ALL           : None
LANG             : en_US.UTF-8
LOCALE           : en_US.UTF-8

pandas           : 1.3.5
numpy            : 1.21.5
pytz             : 2018.9
dateutil         : 2.8.2
pip              : 21.1.3
setuptools       : 57.4.0
Cython           : 0.29.28
pytest           : 3.6.4
hypothesis       : None
sphinx           : 1.8.6
blosc            : None
feather          : 0.4.1
xlsxwriter       : None
lxml.etree       : 4.2.6
html5lib         : 1.0.1
pymysql          : None
psycopg2         : 2.7.6.1 (dt dec pq3 ext lo64)
jinja2           : 2.11.3
IPython          : 5.5.0
pandas_datareader: 0.9.0
bs4              : 4.6.3
bottleneck       : 1.3.2
fsspec           : None
fastparquet      : None
gcsfs            : None
matplotlib       : 3.2.2
numexpr          : 2.8.1
odfpy            : None
openpyxl         : 3.0.9
pandas_gbq       : 0.13.3
pyarrow          : 6.0.1
pyxlsb           : None
s3fs             : None
scipy            : 1.4.1
sqlalchemy       : 1.4.31
tables           : 3.7.0
tabulate         : 0.8.9
xarray           : 0.18.2
xlrd             : 1.1.0
xlwt             : 1.3.0
numba            : 0.51.2

</details>
timmy-ops added the Bug and Needs Triage (Issue that has not been reviewed by a pandas team member) labels Feb 24, 2022
@jreback
Contributor

jreback commented Feb 24, 2022

pls show a minimal copy pastable and reproducible example w/o any external dependencies

@timmy-ops
Author

> pls show a minimal copy pastable and reproducible example w/o any external dependencies

Hi jreback,

Yes, I am sorry. I tried to produce one, but the problem is that the whole model cannot work without this bigger dataset.

@mroeschke
Member

It will be difficult to determine whether there is a true bug here without a more minimal example: https://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports
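
For reference, the error message itself can be triggered with pandas and NumPy alone. The following is a minimal sketch, not the reporter's actual failure path: the toy `df` and `s` values and the `try_ufunc` helper are hypothetical, and `np.logaddexp` merely stands in for `hyp2f1` as a ufunc that pandas does not route to an operator method. It assumes a pandas version (such as the 1.3.5 reported here) that raises `NotImplementedError` for this input mix.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; not the reporter's dataset.
df = pd.DataFrame({"a": [1.0, 2.0, 3.0]})
s = pd.Series([0.5, 1.5, 2.5])


def try_ufunc(x, y):
    """Apply np.logaddexp and return (result, error message or None)."""
    try:
        return np.logaddexp(x, y), None
    except NotImplementedError as exc:
        return None, str(exc)


# One DataFrame and one Series: pandas declines to align the two.
mixed_result, mixed_error = try_ufunc(df, s)
print(mixed_error)

# Plain NumPy arrays: pandas is not involved, so the ufunc simply runs.
array_result, array_error = try_ufunc(df["a"].values, s.values)
print(array_result)
```

This only shows which combination of input types produces the message; it does not establish where a DataFrame enters the lifetimes call chain in the report above.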

mroeschke added the Needs Info (Clarification about behavior needed to assess issue) label and removed the Needs Triage (Issue that has not been reviewed by a pandas team member) label Feb 24, 2022
@ColtAllen

Hey @timmy-ops ,

This is not an issue with pandas, but rather the lifetimes library. Please repost this issue in the lifetimes repository.

The scipy.special.hyp2f1 function near the end of your error trace is a lifetimes dependency that expects to receive NumPy arrays as inputs. When using any of the lifetimes modeling methods, it is important to always use the df['COL_NAME'].values syntax for all of the arguments; otherwise hyp2f1 will receive slices of a pandas DataFrame instead of arrays, which creates the unstable behavior you are seeing.
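
A minimal sketch of that conversion, calling `scipy.special.hyp2f1` directly rather than going through lifetimes: the toy `summary` table and the `to_arrays` helper are hypothetical, and the arguments passed to `hyp2f1` are chosen only so the call is easy to check, not to match what lifetimes computes internally.

```python
import numpy as np
import pandas as pd
from scipy.special import hyp2f1

# Hypothetical toy table; column names mirror the issue, values are made up.
summary = pd.DataFrame(
    {"frequency": [1.0, 1.0, 1.0], "recency": [0.1, 0.25, 0.5]},
    index=pd.Index([101, 102, 103], name="customer_id"),
)


def to_arrays(frame, columns):
    """Return the requested columns as plain 1-D float NumPy arrays."""
    return [frame[col].to_numpy(dtype=float) for col in columns]


frequency, recency = to_arrays(summary, ["frequency", "recency"])

# With ndarray inputs, the ufunc call does not go through pandas at all.
values = hyp2f1(frequency, 1.0, frequency + 1.0, recency)

# Reattach the index afterwards if labelled output is needed.
result = pd.Series(values, index=summary.index, name="hyp2f1")
print(result)
```

Converting at the call boundary and rebuilding the Series at the end keeps the customer labels while the numeric work runs on plain arrays. `Series.to_numpy()` is used here in place of `.values`; both return an ndarray for float columns.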

Unfortunately, in the case of the lifetimes.GammaGammaFitter.customer_lifetime_value method, Pandas slices are being used in the internal operations. It's an easy fix, but the lifetimes project is no longer being actively maintained. Some other contributors and I are planning a Zoom meeting in a few weeks to discuss taking over development of this library. If you wish to contribute, please let us know in this issue link:

CamDavidsonPilon/lifetimes#414

@swasthikshettyhcl

@timmy-ops did you find any solution for this?
[error] Cannot apply ufunc <ufunc 'hyp2f1'> to mixed DataFrame and Series input
