Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: add .normalize() to Timestamp? #8794

Closed
jorisvandenbossche opened this issue Nov 12, 2014 · 19 comments
Closed

ENH: add .normalize() to Timestamp? #8794

jorisvandenbossche opened this issue Nov 12, 2014 · 19 comments
Labels
API Design Compat pandas objects compatability with Numpy or Python functions Datetime Datetime data dtype

Comments

@jorisvandenbossche
Copy link
Member

From http://stackoverflow.com/questions/26882499/reset-time-part-of-a-pandas-timestamp#26882499

There is now a DatetimeIndex.normalize() method to set times to midnight.
Would it be useful to also have this method available on the individual Timestamp object?

Further, this method could also be added to the .dt accessor (udpate: for this, there already is a similar issue: #5502).

@jreback
Copy link
Contributor

jreback commented Nov 12, 2014

sure though I don't think is used much in the docs / maybe should have an example
(also using replace to do the same thing)

@jorisvandenbossche
Copy link
Member Author

yes, indeed, Timestamp.replace(hour=0, minute=0, second=0) does the same, so maybe this is not needed.
But replace is then not available on DatetimeIndex, so it would be nice you can use the same method (one of both) on both Timestamp as DatetimeIndex/.dt accessor

@jreback jreback added API Design Compat pandas objects compatability with Numpy or Python functions Datetime Datetime data dtype labels Nov 12, 2014
@jreback jreback added this to the 0.16.0 milestone Nov 12, 2014
@jreback jreback added Good as first PR Compat pandas objects compatability with Numpy or Python functions and removed Compat pandas objects compatability with Numpy or Python functions labels Nov 12, 2014
@rockg
Copy link
Contributor

rockg commented Nov 12, 2014

I was thinking this the other day. Be careful because Timestamp.replace(hour=0, minute=0, second=0) does not do the right thing on DST days as it does not change the timezone offset. I was going to change normalize_date in tslib to do the right thing (see my comment in #5172).

@rockg
Copy link
Contributor

rockg commented Nov 12, 2014

DatetimeIndex.normalize() does do the right thing on these transition days so it would make sense to replicate this good behavior.

@jorisvandenbossche
Copy link
Member Author

I think the fact that Timestamp.replace does not fully what is desired on DST transition days, that is an incentive to add a normalize method to Timestamp?

@rockg
Copy link
Contributor

rockg commented Nov 12, 2014

Agree, and that would be done with tslib.normalize_date (or at least the name suggests so) which is used in many places and does a straight replace.

@rockg
Copy link
Contributor

rockg commented Nov 12, 2014

Although looking further I see date_normalize (used by DatetimeIndex), normalize_date which I see used by individual Timestamps.

@jorisvandenbossche
Copy link
Member Author

I would think it should not do a straight replace, as this is causing the timezone issue?

@rockg
Copy link
Contributor

rockg commented Nov 12, 2014

Yes, I was going to change this as a part of #5172.

@rockg
Copy link
Contributor

rockg commented Nov 12, 2014

Although I think it belongs as it's own change so please feel free to go ahead (the solution is to replace tzinfo=None, replace as is done, then localize with the timezone again). Although you could probably also call date_normalize as that is used by DatetimeIndex. Might be good to add some documentation for these.

@ischwabacher
Copy link
Contributor

Several things:

  • I don't like normalize as the name of this method because it collides conceptually with the pytz method of the same name. The rails name for this is midnight; maybe that would be better?
  • Temporarily setting tzinfo=None in the middle of an operation that takes an aware Timestamp and returns an aware Timestamp makes me very uneasy. I think we should make it so that aware Timestamps behave in a way that makes this rarely necessary.
  • In this case, stripping the tzinfo results in AmbiguousTimeErrors in jurisdictions where DST changes from 1:00 to 0:00. (Try pd.Timestamp('2013-11-03 00:00:00-0400', tz='America/Havana').) However, this is one of the few cases where it makes sense to do it this way because we almost certainly want the first occurrence of midnight in such a case.

@rockg
Copy link
Contributor

rockg commented Nov 12, 2014

You will have to do it at some level (I think you are suggesting to override replace so that it does the right thing, but that is a larger API change). This method of stripping tzinfo, replacing, and relocalizing is also used in the DateOffset code as relativedelta uses replace as well. Fixing replace would alleviate the need for this method, but I'm not sure how people feel about that. Also, it's not immediately clear that something similar would not need to be done in that case.

@ischwabacher
Copy link
Contributor

Some of my strong feelings about how we should be doing this stuff have been bumping up against reality lately. I'm still mulling point 2.

@rockg
Copy link
Contributor

rockg commented Nov 12, 2014

Don't get me wrong, I agree that it doesn't feel right doing it this way. I'm open to suggestions about how to do normalize and deal with datetime's replace in the context of DateOffset's and relativedelta.

@rockg
Copy link
Contributor

rockg commented Nov 18, 2014

@ischwabacher Any thoughts/suggestions? I'm inclined to try and fix replace and that may make this all simpler but interested to see what you are thinking so begin implementation.

@jorisvandenbossche
Copy link
Member Author

I would personally leave replace as is (and keeping compatibility with datetime, although the behaviour is maybe not the best), but fix normalize_date (and use that to implement Timestamp.normalize?) And then at internal places where we need this, don't use replace but our own correct implementation.

And tring to reuse the DatetimeIndex normalizing machinery, is that possible? As this already works correctly?

In [10]: t = pd.Timestamp('2014-10-26 10:00:00')

In [11]: t = t.tz_localize('Europe/Brussels')

In [12]: t
Out[12]: Timestamp('2014-10-26 10:00:00+0100', tz='Europe/Brussels')

In [13]: t.replace(hour=0, minute=0, second=0)
Out[13]: Timestamp('2014-10-26 00:00:00+0100', tz='Europe/Brussels')

In [14]: pd.tslib.normalize_date(t)
Out[14]: Timestamp('2014-10-26 00:00:00+0100', tz='Europe/Brussels')

In [15]: pd.DatetimeIndex([t]).normalize()[0]
Out[15]: Timestamp('2014-10-26 00:00:00+0200', tz='Europe/Brussels')   # <-- correct

@rockg
Copy link
Contributor

rockg commented Nov 18, 2014

I guess I'm referring to holistically as in #5172 needs to do some hacks because dateutil uses replace and that doesn't work correctly around tz boundaries. I suggested doing the same here in normalize_date but @ischwabacher was opposed to that. I think we can easily use the regular date_normalize which is what DatetimeIndex uses. normalize_date should also be fixed but the question is how (doing it by fixing replace would make life so much easier elsewhere).

@ischwabacher
Copy link
Contributor

I really, really wanted to be able to just say, "The ambiguous kwarg is reserved for converting naive time representations into aware time representations, and everywhere else things should Just Work." But this isn't going to happen, because many of the operations that we want to do on local times are ambiguous. So Timestamp.replace should ask what to do in an ambiguous case. Timestamp.normalize (looking at how many times normalize appears in the code, I think I have to give up on renaming it to midnight), however, should call (or be equivalent to calling) Timestamp.replace(ambiguous=True), so that the first instance of midnight is chosen.

Given that some operations on aware time representations do have to ask how to resolve ambiguities, I'm inclined to split point 2:

  • The user should be able to ignore that ubiquitous piece of advice, "store everything in UTC and convert from/to local time only for I/O" and have everything still Just Work, as long as no operations that explicitly care about the local representation of the time are applied.
  • The user should never have to convert between aware and naive time representations except at I/O boundaries.
  • Maybe the library should pass all handling of ambiguity through a single bottleneck, probably in tz_localize? This would mean that every operation which needs to deal with ambiguity operates by converting to a naive time, operating, and then relocalizing.

@jorisvandenbossche Try your example with this:

t = pd.Timestamp('2013-11-03 06:00:00', tz='America/Havana')

@jreback jreback modified the milestones: 0.16.0, Next Major Release Mar 6, 2015
@jreback
Copy link
Contributor

jreback commented Mar 11, 2015

closed by #9623

@jreback jreback closed this as completed Mar 11, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API Design Compat pandas objects compatability with Numpy or Python functions Datetime Datetime data dtype
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants