Bug: pct_change with frequency set as 'BM' throws value error #28664

Closed
min2bro opened this issue Sep 28, 2019 · 6 comments · Fixed by #28681
Labels
Algos (Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff), Frequency (DateOffsets)

min2bro commented Sep 28, 2019

Code Sample, a copy-pastable example if possible

import numpy as np
import pandas as pd

# Create a daily time-series index
index = pd.date_range('01/13/2020', periods=70, freq='D')

# Create the DataFrame
df = pd.DataFrame({"A": np.random.uniform(low=0.5, high=13.3, size=(70,)),
                   "B": np.random.uniform(low=10.5, high=45.3, size=(70,)),
                   "C": np.random.uniform(low=70.5, high=85, size=(70,)),
                   "D": np.random.uniform(low=50.5, high=65.7, size=(70,))}, index=index)

# Raises ValueError on pandas 0.25.1
df.pct_change(freq='BM')

Problem description

For time-series data, df.pct_change(freq='BM') does not work and throws the following error:

ValueError: cannot reindex from a duplicate axis

Expected Output

df.asfreq('BM').pct_change()
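
As a rough illustration of the expected output, here is a minimal sketch of the asfreq-then-pct_change approach. It makes two assumptions that are not in the report: the random columns are replaced by a single deterministic column "A" so the numbers can be checked by hand, and the offset object pd.offsets.BMonthEnd() is used in place of the 'BM' string alias.

```python
import numpy as np
import pandas as pd

# Deterministic stand-in for the random data in the report (values 1.0 .. 70.0).
index = pd.date_range("2020-01-13", periods=70, freq="D")
df = pd.DataFrame({"A": np.arange(1, 71, dtype=float)}, index=index)

# Business-month-end offset object, used here instead of the 'BM' alias.
bm = pd.offsets.BMonthEnd()

# Keep only the business-month-end rows, then compute the change between them.
expected = df.asfreq(bm).pct_change()
print(expected)
```

With this index, only 2020-01-31 and 2020-02-28 fall on a business month end, so the result has two rows: a NaN for the first and 47/19 - 1 for the second.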

Output of pd.show_versions()

INSTALLED VERSIONS

commit : None
python : 3.6.3.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 142 Stepping 9, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.None

pandas : 0.25.1
numpy : 1.15.4
pytz : 2017.2
dateutil : 2.6.1
pip : 18.0
setuptools : 39.1.0
Cython : 0.26.1
pytest : 5.1.2
hypothesis : None
sphinx : 1.6.3
blosc : None
feather : None
xlsxwriter : 1.0.2
lxml.etree : 4.1.0
html5lib : 0.9999999
pymysql : None
psycopg2 : 2.7.5 (dt dec pq3 ext lo64)
jinja2 : 2.9.6
IPython : 6.1.0
pandas_datareader: None
bs4 : 4.6.0
bottleneck : 1.2.1
fastparquet : None
gcsfs : None
lxml.etree : 4.1.0
matplotlib : 2.1.0
numexpr : 2.6.2
odfpy : None
openpyxl : 2.4.8
pandas_gbq : None
pyarrow : 0.8.0
pytables : None
s3fs : None
scipy : 1.0.0
sqlalchemy : 1.1.13
tables : 3.4.2
xarray : 0.10.7
xlrd : 1.1.0
xlwt : 1.3.0
xlsxwriter : 1.0.2

min2bro changed the title from "pct_change with frequency set as 'BM' doesn't work" to "pct_change with frequency set as 'BM' throws value error" on Sep 28, 2019
@dongho-jung
Contributor

Hello, I found that this happens when the offset built from freq is anchored (offset.isAnchored()).

So I propose that, when the freq is anchored, the data be converted to that frequency first, before the following line:
rs = data.div(data.shift(periods=periods, freq=freq, axis=axis, **kwargs)) - 1

--- a/pandas/core/generic.py
+++ b/pandas/core/generic.py
@@ -10413,6 +10413,8 @@ class NDFrame(PandasObject, SelectionMixin):
         else:
             data = self.fillna(method=fill_method, limit=limit, axis=axis)

+        if freq and to_offset(freq).isAnchored():
+            data = data.asfreq(freq)
         rs = data.div(data.shift(periods=periods, freq=freq, axis=axis, **kwargs)) - 1
         rs = rs.reindex_like(data)
         if freq is None:
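
One plausible reading of why the unpatched code fails: shifting a daily index by a month-end offset rolls many different days onto the same date, so the shifted frame has a duplicated index and cannot be aligned with the original. The sketch below only illustrates that effect on the index from the report. Adding the offset to the index directly is a simplification chosen for this example, not the code path quoted above.

```python
import pandas as pd

# Same daily index as in the report.
index = pd.date_range("2020-01-13", periods=70, freq="D")

# Adding a business-month-end offset rolls each day forward to the next
# business month end, so many distinct days collapse onto the same date.
shifted = index + pd.offsets.BMonthEnd(1)

print(shifted.is_unique)  # False: the shifted index contains duplicates
print(shifted.unique())   # only three distinct dates remain
```

Seventy distinct days map onto just three dates (2020-01-31, 2020-02-28 and 2020-03-31), which is consistent with the "cannot reindex from a duplicate axis" message.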

@dongho-jung
Contributor

However, I haven't been able to run the tests yet, because a lot of them fail on the latest commit, c4489c.

min2bro changed the title from "pct_change with frequency set as 'BM' throws value error" to "Bug: pct_change with frequency set as 'BM' throws value error" on Sep 29, 2019
Author

min2bro commented Sep 29, 2019

@0xF4D3C0D3 Can you change the label of this issue to Bug?

Contributor

dongho-jung commented Sep 29, 2019

No, I don't think I can, because I'm neither a contributor nor a member.

Contributor

dongho-jung commented Sep 30, 2019

Oh, c4489c was innocent; the failures were my own mistake. I have now run the tests with the change above.

(screenshot)

Now I'm running pytest pandas.

@dongho-jung
Contributor

Good! The tests passed. I'll create the PR.

dongho-jung added a commit to dongho-jung/pandas that referenced this issue Sep 30, 2019
pct_change didn't work when the freq is anchored(like 1W, 1M, BM)
so when the freq is anchored, use data.asfreq(freq) instead of the raw
data.
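
The idea in this commit message can be tried out in user code without patching pandas. The sketch below is an illustration under stated assumptions, not the merged implementation: it uses a deterministic single-column frame instead of the random data from the report, the offset object pd.offsets.BMonthEnd() instead of the 'BM' alias, and it omits the fill_method handling that pct_change performs.

```python
import numpy as np
import pandas as pd

# Deterministic stand-in for the data in the report (values 1.0 .. 70.0).
index = pd.date_range("2020-01-13", periods=70, freq="D")
df = pd.DataFrame({"A": np.arange(1, 71, dtype=float)}, index=index)
bm = pd.offsets.BMonthEnd()

# Step 1 (the proposed change): move the data onto the anchored frequency first.
data = df.asfreq(bm)

# Step 2 (the existing logic): divide by the frequency-shifted data, then
# restore the shape of `data`. The index is unique now, so alignment works.
rs = data.div(data.shift(periods=1, freq=bm)) - 1
rs = rs.reindex_like(data)
print(rs)
```

On this data the result matches df.asfreq(bm).pct_change(), which is the expected output given in the report.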
jreback added the Algos (Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff) and Frequency (DateOffsets) labels on Oct 1, 2019
jreback added this to the 1.0 milestone on Nov 14, 2019
3 participants