ENH: add rounding method to DatetimeIndex/TimedeltaIndex #4314

Closed
jreback opened this issue Jul 22, 2013 · 14 comments · Fixed by #11690
Labels
API Design Dtype Conversions Unexpected or buggy dtype conversions Enhancement Indexing Related to indexing on series/frames, not to indexes themselves Timedelta Timedelta data type

@jreback
Contributor

jreback commented Jul 22, 2013

xref #8184

to say round by minute
http://stackoverflow.com/questions/17781159/round-pandas-datetime-index

by hour
#4314

might be simpler to use numpy astype here
http://stackoverflow.com/questions/23315163/merging-two-pandas-timeseries-shifted-by-1-second

In [75]: index = pd.DatetimeIndex([ pd.Timestamp('20120827 12:05:00.002'), pd.Timestamp('20130101 12:05:01'), pd.Timestamp('20130712 15:10:00'), pd.Timestamp('20130712 15:10:00.000004') ])

In [79]: index.values
Out[79]: 
array(['2012-08-27T08:05:00.002000000-0400',
       '2013-01-01T07:05:01.000000000-0500',
       '2013-07-12T11:10:00.000000000-0400',
       '2013-07-12T11:10:00.000004000-0400'], dtype='datetime64[ns]')

In [78]: pd.DatetimeIndex(((index.asi8/(1e9*60)).round()*1e9*60).astype(np.int64)).values
Out[78]: 
array(['2012-08-27T08:05:00.000000000-0400',
       '2013-01-01T07:05:00.000000000-0500',
       '2013-07-12T11:10:00.000000000-0400',
       '2013-07-12T11:10:00.000000000-0400'], dtype='datetime64[ns]')
@jreback jreback modified the milestones: 0.15.0, 0.14.0 Feb 26, 2014
@jreback jreback changed the title ENH: add rounding method to DatetimeIndex ENH: add rounding method to DatetimeIndex/TimedeltaIndex Sep 11, 2014
@jreback jreback added the Timedelta Timedelta data type label Sep 11, 2014
@jorisvandenbossche
Member

What should the API look like for this one?

Add a DatetimeIndex.round method? But there is no round method on the other Index types, so wouldn't that be strange?

It is also not really compatible with Series.round (which is modeled after numpy, I think), which does not handle datetimes.

@jreback
Contributor Author

jreback commented Nov 30, 2014

No, indexes DO have a round method!

But it is probably not checked properly.

The numeric indexes (float/int) should support this (it might work already), as it's a numpy method.

The Timedelta/DatetimeIndex should simply call the new one (which does sensible rounding).

Further, the .dt accessor should allow the round methods as well. (A side issue emerges: calling Series.round() ultimately calls np.round rather than the method on the values if it exists, so we need to deal with that at some point, but it is just an easy change in Series.round, which already exists.)

e.g.

DatetimeIndex.round() -> call the new round
Series.round() (of a DatetimeIndex as its values), should actually call Series.dt.round() (e.g the new round method)

@jorisvandenbossche
Member

At the moment there is no round method on the index, at least I can't find it.
But agreed that all should have it.

About the .dt: if we can change Series.round to handle datetime64 values, I don't think it is needed to also provide this under .dt? (or for discoverability?)
The problem is also that you want other keyword arguments for datetime rounding, so that is maybe difficult in a general Series.round?

@jreback
Contributor Author

jreback commented Nov 30, 2014

there WAS a round method prior to 0.15 implicitly as it was an ndarray subclass
it should be added back to NumericIndex. And provide a generic Index.round with a NotImplementedError.

Series.round should take an argument and dispatch to .dt (catching the exception), and just call np.round otherwise.
That way you don't have to infer things yet again; that is the point of .dt.

TimedeltaIndex also supports round, so you don't necessarily want to handle this explicitly in Series.round.
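
The dispatch idea described above could look roughly like the sketch below. It assumes a pandas version in which Series.dt.round exists (it did not when this comment was written), and round_series is a hypothetical helper name, not a pandas API.

```python
import pandas as pd


def round_series(s, arg):
    """Hypothetical dispatcher, not part of pandas.

    Datetime-like Series are rounded to a frequency via .dt;
    everything else falls back to the numeric Series.round.
    """
    try:
        # Accessing .dt raises AttributeError for non-datetime-like values.
        return s.dt.round(arg)
    except AttributeError:
        # Numeric path: here arg is the number of decimals.
        return s.round(arg)
```

With this shape the caller passes a frequency string for datetime-like data and a decimal count for numeric data, which is the keyword mismatch raised earlier in the thread.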

@jreback jreback modified the milestones: 0.16.0, Next Major Release Mar 6, 2015
@shoyer
Member

shoyer commented Apr 9, 2015

This does seem like a good idea.

@jreback
Contributor Author

jreback commented Apr 9, 2015

straightforward too :)

@TomAugspurger
Contributor

Very useful. Would we have an option to always round down? I've needed something like postgres' DATE_TRUNC a few times.

@jreback
Contributor Author

jreback commented Apr 9, 2015

note that I had originally implemented this for Timedelta, see here as needed for partial-string indexing. But Timestamp and both index types need an impl.

@shoyer
Member

shoyer commented Apr 9, 2015

I've been looking into the rounding down part. In theory, I think this might be what DatetimeIndex.snap is supposed to do, but it doesn't seem to work for me. We might also call this asfreq.

Here's what I came up with:

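def asfreq(t, freq, how='start'):
    # t is a DatetimeIndex; map the asfreq-style `how` onto a resample aggregation
    resample_how = {'start': 'first', 's': 'first', 'end': 'last', 'e': 'last'}[how]
    resampled = pd.Series(t, t).resample(freq, how=resample_how).index
    # for each original timestamp, pick the resampled bin label at or before it
    return resampled[resampled.get_indexer(t, method='ffill')]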

I've been doing this with a combination of to_period()/to_timestamp() and astype('datetime64[%s]' % freq), but neither of those handles uneven timesteps well.
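
For reference, the astype('datetime64[%s]' % freq) route mentioned above can be sketched with plain NumPy. The sample values are made up for illustration, and the minute unit is just one possible choice.

```python
import numpy as np

# Illustrative timestamps at nanosecond resolution (assumed sample data).
stamps = np.array(
    ["2013-07-12T15:10:59.5", "2013-01-01T12:05:01"], dtype="datetime64[ns]"
)

# Casting to a coarser unit drops the sub-minute part;
# casting back restores nanosecond resolution.
floored = stamps.astype("datetime64[m]").astype("datetime64[ns]")
```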

@TomAugspurger
Contributor

Yeah I've been doing something similar. Thankfully my data has had a regular interval.

@jorisvandenbossche
Member

Yes indeed a good idea! I already wanted to do this for some time, but never found the time ..

@shoyer for the rounding down part, can't you also do the same as above, but use floor instead of round:

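def datetime_round(dt, freq):
    from pandas.tseries.frequencies import to_offset
    nanos = to_offset(freq).nanos  # length of one freq step in nanoseconds
    # floor each int64 nanosecond value to a multiple of the step
    return pd.DatetimeIndex((np.floor(dt.asi8 / float(nanos)) * nanos).astype(np.int64))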
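
A variant of the same idea that stays in integer arithmetic and covers floor, ceil and round in one place might look like the sketch below. align_to_freq and its how argument are hypothetical names, the input is assumed to be tz-naive, and ties in the "round" mode go up, which differs from np.round's round-half-to-even.

```python
import numpy as np
import pandas as pd
from pandas.tseries.frequencies import to_offset


def align_to_freq(index, freq, how="floor"):
    """Hypothetical helper: snap a tz-naive DatetimeIndex to multiples of freq."""
    unit = to_offset(freq).nanos  # step length in nanoseconds
    # Force nanosecond resolution so the integers are ns since the epoch.
    ns = index.values.astype("datetime64[ns]").astype(np.int64)
    if how == "floor":
        out = ns // unit * unit
    elif how == "ceil":
        out = -(-ns // unit) * unit
    elif how == "round":
        out = (ns + unit // 2) // unit * unit  # ties round up
    else:
        raise ValueError("how must be 'floor', 'ceil' or 'round'")
    return pd.DatetimeIndex(out.view("datetime64[ns]"))
```

Integer floor division avoids the float round trip, so no precision is lost for int64 nanosecond values at any frequency.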

@jorisvandenbossche
Member

BTW, never heard of DatetimeIndex.snap !
But indeed, it does not really seem to work as I expected ..

@jreback
Contributor Author

jreback commented Oct 3, 2015

for a Timestamp

In [25]: def round(t, freq):
    # round a Timestamp down to a multiple of the specified freq (integer division)
    return pd.Timestamp((t.value // freq.delta.value) * freq.delta.value)
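
For rounding a single Timestamp to the nearest multiple rather than down, a sketch along the same lines is shown here. round_timestamp is a hypothetical name, the Timestamp is assumed to be tz-naive, and ties go up.

```python
import pandas as pd
from pandas.tseries.frequencies import to_offset


def round_timestamp(t, freq):
    """Hypothetical helper: round a tz-naive Timestamp to the nearest multiple of freq."""
    unit = to_offset(freq).nanos  # step length in nanoseconds
    # t.value is the timestamp as integer nanoseconds since the epoch.
    return pd.Timestamp((t.value + unit // 2) // unit * unit)
```

Adding half a step before the floor division is what turns truncation into round-to-nearest.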

@jreback jreback modified the milestones: 0.17.0, Next Major Release Oct 3, 2015
@jreback jreback modified the milestones: 0.17.1, 0.17.0 Oct 3, 2015
@jreback jreback modified the milestones: Next Major Release, 0.17.1 Nov 13, 2015
@jreback jreback modified the milestones: 0.18.0, Next Major Release Nov 29, 2015