Resample discards timezone information #13238

Closed
jeremywhelchel opened this issue May 20, 2016 · 11 comments · Fixed by #20510
Labels
Bug good first issue Regression Functionality that used to work in a prior pandas version Resample resample method Timezones Timezone data dtype
Milestone

Comments

@jeremywhelchel

It appears that resample is now dropping timezone information on the index. Is this expected?

import pandas as pp
import pytz

a = pp.Series(index=pp.DatetimeIndex(['2013-01-01 06:00', '2013-01-01 07:00', '2013-01-02 06:00'],
                                     tz=pytz.timezone('America/Los_Angeles')),
              data=[1, 2, 3])
b = a.resample('D', how='max')  # pre-0.18 resample syntax
print(repr(a.index.tz))
print(repr(b.index.tz))
print(a.index.tz == b.index.tz)

produces the following in 0.16.1. Notably the timezones for the original series and the resampled series are the same.

<DstTzInfo 'America/Los_Angeles' LMT-1 day, 16:07:00 STD>
<DstTzInfo 'America/Los_Angeles' LMT-1 day, 16:07:00 STD>
True

However, that's not the case in 0.18.1. One caveat: there we're using the new resample syntax, b = a.resample('D').max()

<DstTzInfo 'America/Los_Angeles' LMT-1 day, 16:07:00 STD>
<DstTzInfo 'America/Los_Angeles' PST-1 day, 16:00:00 STD>
False
@jreback
Contributor

jreback commented May 20, 2016

Rewriting your example the way we test:

In [10]: s = pd.Series(index=pd.DatetimeIndex(['2013-01-01 06:00', '2013-01-01 07:00', '2013-01-02 06:00'],
                                     tz='America/Los_Angeles'),
              data=[1, 2, 3])

In [11]: s
Out[11]: 
2013-01-01 06:00:00-08:00    1
2013-01-01 07:00:00-08:00    2
2013-01-02 06:00:00-08:00    3
dtype: int64

In [12]: s.index
Out[12]: DatetimeIndex(['2013-01-01 06:00:00-08:00', '2013-01-01 07:00:00-08:00', '2013-01-02 06:00:00-08:00'], dtype='datetime64[ns, America/Los_Angeles]', freq=None)

In [13]: s.resample('D').max()
Out[13]: 
2013-01-01 00:00:00-08:00    2
2013-01-02 00:00:00-08:00    3
Freq: D, dtype: int64

In [14]: s.resample('D').max().index
Out[14]: DatetimeIndex(['2013-01-01 00:00:00-08:00', '2013-01-02 00:00:00-08:00'], dtype='datetime64[ns, America/Los_Angeles]', freq='D')

Resampling on datetimes with a tz is not tested very much. Pull requests that add more of these tests are welcome (the fix is very straightforward).

@jreback jreback added Bug Difficulty Novice Timezones Timezone data dtype Resample resample method labels May 20, 2016
@jreback jreback added this to the 0.18.2 milestone May 20, 2016
@jeremywhelchel
Author

Thanks for the quick look!

Note that your re-written example doesn't display the problem. You need to look at repr(...index.tz) to see the difference (LMT-1 day, 16:07:00 STD vs PST-1 day, 16:00:00 STD)
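
The difference in the two reprs can be reproduced with pytz alone, without pandas. The following is only a sketch of that pytz behavior, assuming pytz is installed; the variable names are illustrative. A zone object obtained from pytz.timezone() and the tzinfo attached to a localized datetime carry the same zone name, but they are distinct objects.

```python
from datetime import datetime

import pytz

# The zone object as returned by pytz.timezone(); its repr shows the
# zone's default (first) offset entry.
zone = pytz.timezone('America/Los_Angeles')

# The tzinfo attached after localizing a concrete wall time; its repr
# shows the offset in effect at that moment.
localized_tz = zone.localize(datetime(2013, 1, 1, 6, 0)).tzinfo

print(repr(zone))
print(repr(localized_tz))

# Distinct objects, same zone name.
same_object = zone is localized_tz
same_zone_name = zone.zone == localized_tz.zone
print(same_object, same_zone_name)
```

Comparing the zone names, rather than the tzinfo objects themselves, is one way to check whether two indexes are in the same zone.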

@jreback
Contributor

jreback commented May 20, 2016

Have another look at [12] vs. [14]: the time changed (look at the hour). The issue is that the re-localization is off; the actual tz is correct.

@jorisvandenbossche jorisvandenbossche added the Regression Functionality that used to work in a prior pandas version label Aug 21, 2016
@jreback jreback modified the milestones: 0.19.0, Next Major Release Sep 28, 2016
@jaredsnyder
Contributor

I'm going to attempt fixing this one

@jaredsnyder
Contributor

jaredsnyder commented May 22, 2017

Wouldn't we expect the index values to be at midnight on each day given that we're aggregating up to the day level? If we run the code above without the timezones we get a similar output:

>>> s_notz = pd.Series(index=pd.DatetimeIndex(['2013-01-01 06:00', '2013-01-01 07:00', '2013-01-02 06:00']),
...               data=[1, 2, 3])
>>> s_notz.index
DatetimeIndex(['2013-01-01 06:00:00', '2013-01-01 07:00:00',
               '2013-01-02 06:00:00'],
              dtype='datetime64[ns]', freq=None)
>>> s_notz.resample("D").max()
2013-01-01    2
2013-01-02    3
Freq: D, dtype: int64
>>> s_notz.resample("D").max().index
DatetimeIndex(['2013-01-01', '2013-01-02'], dtype='datetime64[ns]', freq='D')
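
The comparison above can be written as a small check. This is a sketch only, assuming a pandas version in which tz-aware daily resampling bins on local midnight; the variable names are illustrative and not taken from the pandas test suite.

```python
import pandas as pd

stamps = ['2013-01-01 06:00', '2013-01-01 07:00', '2013-01-02 06:00']

# Same wall-clock timestamps, once tz-aware and once naive.
aware = pd.Series([1, 2, 3],
                  index=pd.DatetimeIndex(stamps, tz='America/Los_Angeles'))
naive = pd.Series([1, 2, 3], index=pd.DatetimeIndex(stamps))

aware_daily = aware.resample('D').max()
naive_daily = naive.resample('D').max()

# Daily labels are expected at midnight (local midnight for the aware case).
aware_at_midnight = bool((aware_daily.index.hour == 0).all())
naive_at_midnight = bool((naive_daily.index.hour == 0).all())

# Stripping the tz from the aware labels should give the naive labels.
same_wall_times = aware_daily.index.tz_localize(None).equals(naive_daily.index)

print(aware_at_midnight, naive_at_midnight, same_wall_times)
```

So midnight labels are expected in both cases; what this issue reports is the tzinfo object on the resampled index, not the label times.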

@rockg
Contributor

rockg commented Jan 17, 2018

Any progress on this? Had to spend about 20 minutes figuring this out and it impacts all join operations as the timezones are not treated as equal despite having the same zone name. So all binary operations will result in losing timezone information ([13] below). Using tz_convert brings the timezone back in line, but shouldn't be necessary. Some more examples:

In [11]: s = pd.Series([2], index=pd.date_range('2017-01-01', periods=48, freq="H", tz="US/Eastern"))

In [12]: ss = s / s.resample("D").mean()

In [13]: ss.dropna()
Out[13]: 
2017-01-01 05:00:00+00:00    1.0
2017-01-02 05:00:00+00:00    1.0
dtype: float64

In [14]: ss1 = s / s.resample("D").mean().tz_convert("US/Eastern")

In [15]: ss1.dropna()
Out[15]: 
2017-01-01 00:00:00-05:00    1.0
2017-01-02 00:00:00-05:00    1.0
dtype: float64

In [16]: s.resample("D").mean()
Out[16]: 
2017-01-01 00:00:00-05:00    2
2017-01-02 00:00:00-05:00    2

In [22]: s.resample("D").mean().index.tz
Out[22]: <DstTzInfo 'US/Eastern' EST-1 day, 19:00:00 STD>

In [23]: s.index.tz
Out[23]: <DstTzInfo 'US/Eastern' LMT-1 day, 19:04:00 STD>

@jreback
Contributor

jreback commented Jan 17, 2018

I think #18596 will fix this, if you want to give that a try.

@rockg
Contributor

rockg commented Jan 17, 2018

Yes, that will fix the join issue. Thanks.

@jreback
Contributor

jreback commented Jan 17, 2018

Oh, great.

Let me rebase that tomorrow.

I suspect this will close a number of other issues as well. If you could have a look through them, that would be great.

@jreback
Contributor

jreback commented Jan 17, 2018

@rockg xref #19281, though I don't think this actually fixes it: the tz is still converted incorrectly. An investigation would be welcome.

@rockg
Contributor

rockg commented Jan 17, 2018

Yeah, I agree it doesn't fix the resample, just the join issue ([13] in my example). It works despite resample returning a different tzinfo, so half the battle.

@jreback jreback modified the milestones: Next Major Release, 0.23.0 Mar 28, 2018
6 participants