Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement scalar shift_month mirroring tslib.shift_months #18218

Merged
merged 8 commits into from
Nov 12, 2017

Conversation

jbrockmendel
Copy link
Member

replace relativedelta usage in relevant cases.

This should be orthogonal to other ongoing offsets PRs.

Ran asv repeatedly overnight, posting results below.

@pep8speaks
Copy link

pep8speaks commented Nov 10, 2017

Hello @jbrockmendel! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on November 12, 2017 at 00:22 Hours UTC

@jbrockmendel
Copy link
Member Author

We would expect the Month, Quarter, Year (and Business/Custom) offsets to be affected by this:

taskset 4 asv continuous -f 1.1 -E virtualenv master HEAD -b timeseries -b period -b timedelta
[...]
       before           after         ratio
     [ca737aca]       [3a2dc531]
+         760±1ns          862±3ns     1.13  period.Properties.time_is_leap_year
+     6.36±0.01μs      7.15±0.01μs     1.12  timeseries.DatetimeIndex.time_timestamp_tzinfo_cons
-      16.7±0.1μs       15.0±0.1μs     0.90  timeseries.Offsets.time_custom_bday_apply
-     11.7±0.06μs      10.4±0.05μs     0.89  timeseries.Offsets.time_timeseries_year_incr
-       152±0.4μs          132±2μs     0.87  timeseries.Offsets.time_custom_bmonthbegin_incr_n
-         131±1μs        104±0.1μs     0.80  timeseries.Offsets.time_custom_bmonthbegin_decr_n
-       135±0.3μs         86.3±1μs     0.64  timeseries.Offsets.time_custom_bmonthend_incr
-       161±0.6μs        101±0.5μs     0.63  timeseries.Offsets.time_custom_bmonthend_decr_n
-       160±0.5μs       95.0±0.2μs     0.59  timeseries.Offsets.time_custom_bmonthend_incr_n
-      56.2±0.1μs      21.0±0.07μs     0.37  timeseries.SemiMonthOffset.time_end_decr_n
-      53.0±0.1μs      19.7±0.04μs     0.37  timeseries.SemiMonthOffset.time_end_incr_n
-      55.1±0.2μs      20.4±0.07μs     0.37  timeseries.SemiMonthOffset.time_begin_decr_n
-      52.1±0.1μs      19.2±0.07μs     0.37  timeseries.SemiMonthOffset.time_begin_incr_n
-     52.0±0.06μs      18.9±0.04μs     0.36  timeseries.SemiMonthOffset.time_begin_decr
-      46.8±0.1μs      16.0±0.05μs     0.34  timeseries.SemiMonthOffset.time_begin_incr
-      56.5±0.3μs      19.0±0.09μs     0.34  timeseries.SemiMonthOffset.time_end_decr
-     47.9±0.07μs      15.8±0.03μs     0.33  timeseries.SemiMonthOffset.time_end_incr
-      43.3±0.3μs      12.7±0.04μs     0.29  timeseries.SemiMonthOffset.time_begin_apply
-     45.1±0.08μs      12.0±0.05μs     0.27  timeseries.SemiMonthOffset.time_end_apply

taskset 4 asv continuous -f 1.1 -E virtualenv master HEAD -b timeseries -b period -b timedelta
[...]
       before           after         ratio
     [ca737aca]       [3a2dc531]
+        456±10ms         578±10ms     1.27  timeseries.ToDatetime.time_format_no_exact
+        27.1±2ms           31.8ms     1.17  timeseries.DatetimeIndex.time_dti_factorize
+      23.8±0.1μs      27.6±0.07μs     1.16  timeseries.Offsets.time_custom_bday_cal_incr_n
+      28.7±0.2μs       32.5±0.2μs     1.13  timeseries.Offsets.time_custom_bday_decr
+      15.2±0.2μs      16.9±0.08μs     1.12  timeseries.Offsets.time_custom_bday_apply
+        6.73±0μs      7.50±0.02μs     1.11  timeseries.DatetimeIndex.time_timestamp_tzinfo_cons
+           35.9s            39.7s     1.11  gil.nogil_datetime_fields.time_period_to_datetime
-       165±0.3μs        126±0.7μs     0.76  timeseries.Offsets.time_custom_bmonthbegin_incr_n
-         153±3μs         97.8±4μs     0.64  timeseries.Offsets.time_custom_bmonthend_incr_n
-       161±0.2μs        101±0.3μs     0.63  timeseries.Offsets.time_custom_bmonthend_decr_n
-       136±0.2μs       80.2±0.4μs     0.59  timeseries.Offsets.time_custom_bmonthend_incr
-      51.1±0.1μs      19.4±0.08μs     0.38  timeseries.SemiMonthOffset.time_begin_decr
-      55.7±0.5μs      20.6±0.07μs     0.37  timeseries.SemiMonthOffset.time_begin_decr_n
-      51.7±0.3μs       18.9±0.2μs     0.37  timeseries.SemiMonthOffset.time_end_incr_n
-        53.0±1μs       19.2±0.1μs     0.36  timeseries.SemiMonthOffset.time_end_decr
-      60.0±0.6μs      21.7±0.07μs     0.36  timeseries.SemiMonthOffset.time_end_decr_n
-     47.5±0.08μs      16.6±0.06μs     0.35  timeseries.SemiMonthOffset.time_begin_incr
-     53.9±0.06μs      18.7±0.08μs     0.35  timeseries.SemiMonthOffset.time_begin_incr_n
-      49.3±0.1μs      16.6±0.06μs     0.34  timeseries.SemiMonthOffset.time_end_incr
-      43.7±0.1μs      12.9±0.07μs     0.29  timeseries.SemiMonthOffset.time_begin_apply
-      43.8±0.1μs      12.1±0.04μs     0.28  timeseries.SemiMonthOffset.time_end_apply

taskset 4 asv continuous -f 1.1 -E virtualenv master HEAD -b timeseries -b period -b timedelta
[...]
       before           after         ratio
     [ca737aca]       [3a2dc531]
+        301±10ms          405±7ms     1.35  timeseries.AsOfDataFrame.time_asof
+       132±0.2ms        154±0.8ms     1.16  timeseries.ToDatetime.time_iso8601_tz_spaceformat
+     26.8±0.09μs       30.4±0.2μs     1.13  timeseries.Offsets.time_custom_bday_cal_incr_neg_n
-     3.78±0.01ms      3.43±0.01ms     0.91  period.Constructor.time_from_pydatetime
-     24.3±0.07μs       21.6±0.1μs     0.89  timeseries.AsOf.time_asof_single
-     7.24±0.02μs       6.41±0.2μs     0.89  timeseries.DatetimeIndex.time_timestamp_tzinfo_cons
-         904±5ns          790±3ns     0.87  period.Properties.time_quarter
-       154±0.5μs        127±0.3μs     0.83  timeseries.Offsets.time_custom_bmonthbegin_incr_n
-       129±0.3μs        104±0.6μs     0.81  timeseries.Offsets.time_custom_bmonthbegin_decr_n
-         567±2ms        459±0.8ms     0.81  timeseries.ToDatetime.time_format_exact
-       153±0.2μs         95.5±1μs     0.62  timeseries.Offsets.time_custom_bmonthend_incr_n
-       164±0.2μs        102±0.1μs     0.62  timeseries.Offsets.time_custom_bmonthend_decr_n
-       136±0.1μs         76.8±1μs     0.56  timeseries.Offsets.time_custom_bmonthend_incr
-     53.6±0.07μs       21.9±0.1μs     0.41  timeseries.SemiMonthOffset.time_begin_decr_n
-      51.0±0.1μs      19.6±0.04μs     0.38  timeseries.SemiMonthOffset.time_begin_decr
-     53.7±0.09μs      20.6±0.06μs     0.38  timeseries.SemiMonthOffset.time_end_decr_n
-      51.8±0.1μs       18.8±0.1μs     0.36  timeseries.SemiMonthOffset.time_end_decr
-      51.8±0.1μs      18.4±0.03μs     0.36  timeseries.SemiMonthOffset.time_end_incr_n
-     52.0±0.09μs      18.4±0.06μs     0.35  timeseries.SemiMonthOffset.time_begin_incr_n
-     47.4±0.06μs      15.6±0.04μs     0.33  timeseries.SemiMonthOffset.time_end_incr
-      47.7±0.3μs       15.7±0.1μs     0.33  timeseries.SemiMonthOffset.time_begin_incr
-     43.3±0.06μs      12.8±0.03μs     0.30  timeseries.SemiMonthOffset.time_begin_apply
-     45.4±0.03μs      12.3±0.03μs     0.27  timeseries.SemiMonthOffset.time_end_apply

taskset 4 asv continuous -f 1.1 -E virtualenv master HEAD -b timeseries -b period -b timedelta
[...]
       before           after         ratio
     [ca737aca]       [3a2dc531]
+         720±2ns          841±6ns     1.17  period.Properties.time_day
+           33.8s            39.3s     1.16  gil.nogil_datetime_fields.time_period_to_datetime
+      24.8±0.1μs       28.3±0.2μs     1.14  timeseries.Offsets.time_custom_bday_cal_incr
+         458±1ms          515±1ms     1.13  timeseries.ToDatetime.time_format_exact
+     27.4±0.08μs       30.1±0.2μs     1.10  timeseries.Offsets.time_custom_bday_cal_incr_neg_n
-         432±1ms          385±2ms     0.89  timeseries.SeriesArithmetic.time_add_offset_slow
-      28.6±0.2μs       24.9±0.1μs     0.87  timeseries.Offsets.time_custom_bday_cal_incr_n
-       149±0.3μs        128±0.2μs     0.86  timeseries.Offsets.time_custom_bmonthbegin_incr_n
-       129±0.4μs        109±0.2μs     0.85  timeseries.Offsets.time_custom_bmonthbegin_decr_n
-         879±4ns        732±0.7ns     0.83  period.PeriodProperties.time_hour
-       157±0.4μs         99.4±1μs     0.64  timeseries.Offsets.time_custom_bmonthend_decr_n
-       152±0.3μs       95.6±0.2μs     0.63  timeseries.Offsets.time_custom_bmonthend_incr_n
-       140±0.2μs       79.7±0.6μs     0.57  timeseries.Offsets.time_custom_bmonthend_incr
-      55.4±0.2μs      21.4±0.06μs     0.39  timeseries.SemiMonthOffset.time_begin_decr_n
-     51.6±0.07μs      19.8±0.02μs     0.38  timeseries.SemiMonthOffset.time_begin_incr_n
-     58.5±0.08μs      21.5±0.07μs     0.37  timeseries.SemiMonthOffset.time_end_decr_n
-      52.5±0.1μs      19.0±0.04μs     0.36  timeseries.SemiMonthOffset.time_begin_decr
-     51.9±0.07μs       18.6±0.1μs     0.36  timeseries.SemiMonthOffset.time_end_decr
-      53.6±0.1μs      18.2±0.07μs     0.34  timeseries.SemiMonthOffset.time_end_incr_n
-     48.6±0.05μs      15.6±0.03μs     0.32  timeseries.SemiMonthOffset.time_end_incr
-      50.0±0.1μs      15.4±0.04μs     0.31  timeseries.SemiMonthOffset.time_begin_incr
-     43.8±0.08μs      12.7±0.06μs     0.29  timeseries.SemiMonthOffset.time_end_apply
-      45.4±0.1μs      12.8±0.07μs     0.28  timeseries.SemiMonthOffset.time_begin_apply
-         290±4ms       27.2±0.2ms     0.09  timeseries.AsOfDataFrame.time_asof_nan
-         298±2ms       23.9±0.2ms     0.08  timeseries.AsOfDataFrame.time_asof

taskset 4 asv continuous -f 1.1 -E virtualenv master HEAD -b timeseries -b period -b timedelta
[...]
       before           after         ratio
     [ca737aca]       [3a2dc531]
+         745±7ns          978±4ns     1.31  period.Properties.time_second
+         762±2ns          956±3ns     1.26  period.Properties.time_is_leap_year
+         743±4ns          908±6ns     1.22  period.Properties.time_week
+      17.7±0.1μs       21.0±0.1μs     1.18  timeseries.Offsets.time_timeseries_day_apply
+      27.0±0.2μs      30.6±0.08μs     1.14  timeseries.Offsets.time_custom_bday_cal_decr
-     18.3±0.09μs       16.6±0.1μs     0.90  timeseries.AsOf.time_asof_single_early
-         816±3ns          734±2ns     0.90  period.PeriodProperties.time_minute
-     23.9±0.04μs      21.5±0.06μs     0.90  timeseries.AsOf.time_asof_single
-         294±5ms          258±6ms     0.88  timeseries.AsOfDataFrame.time_asof
-         298±7ms          255±6ms     0.86  timeseries.AsOfDataFrame.time_asof_nan
-         132±1μs        105±0.8μs     0.79  timeseries.Offsets.time_custom_bmonthbegin_decr_n
-       163±0.5μs        126±0.2μs     0.77  timeseries.Offsets.time_custom_bmonthbegin_incr_n
-         159±1μs        101±0.2μs     0.64  timeseries.Offsets.time_custom_bmonthend_decr_n
-         149±2μs      93.6±0.04μs     0.63  timeseries.Offsets.time_custom_bmonthend_incr_n
-       135±0.2μs       77.2±0.1μs     0.57  timeseries.Offsets.time_custom_bmonthend_incr
-      55.1±0.1μs      21.0±0.08μs     0.38  timeseries.SemiMonthOffset.time_end_decr_n
-     52.4±0.08μs      19.7±0.03μs     0.38  timeseries.SemiMonthOffset.time_begin_decr
-      53.7±0.1μs       20.2±0.1μs     0.38  timeseries.SemiMonthOffset.time_begin_decr_n
-     54.2±0.09μs      20.0±0.06μs     0.37  timeseries.SemiMonthOffset.time_end_decr
-     52.4±0.09μs       18.6±0.1μs     0.36  timeseries.SemiMonthOffset.time_begin_incr_n
-      53.6±0.4μs      18.2±0.07μs     0.34  timeseries.SemiMonthOffset.time_end_incr_n
-     48.4±0.06μs      15.4±0.05μs     0.32  timeseries.SemiMonthOffset.time_end_incr
-      48.5±0.2μs      15.4±0.03μs     0.32  timeseries.SemiMonthOffset.time_begin_incr
-     44.2±0.06μs      13.2±0.04μs     0.30  timeseries.SemiMonthOffset.time_end_apply
-      44.8±0.2μs      12.8±0.05μs     0.29  timeseries.SemiMonthOffset.time_begin_apply
-        26.9±2ms       7.22±0.2ms     0.27  timeseries.DatetimeIndex.time_dti_tz_factorize
-        26.6±2ms       6.65±0.2ms     0.25  timeseries.DatetimeIndex.time_dti_factorize

@@ -252,6 +253,8 @@ def apply_index(self, i):
"applied vectorized".format(kwd=kwd))

def isAnchored(self):
# TODO: Does this make sense for the general case? It would help
# if there were a canonical docstring for what isAnchored means.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback : Thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah to be honested I am not sure isAnchored is really necessary, but that's orthogonal

@@ -721,6 +724,7 @@ def apply(self, other):

return result
else:
# TODO: Figure out the end of this sente
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I presume you're going to figure this out beforehand?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean what the end of the error message should be? That's orthogonal to this PR, but merits a reminder.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah not sure, @sinhrks wrote this originally.

@gfyoung gfyoung added Clean Internals Related to non-user accessible pandas implementation labels Nov 10, 2017
@jreback
Copy link
Contributor

jreback commented Nov 10, 2017

this looks reasonable. pls add a whatsnew note on improved perf of offsets, add all PRs that touch offsets.pyx to this issue.

@@ -252,6 +253,8 @@ def apply_index(self, i):
"applied vectorized".format(kwd=kwd))

def isAnchored(self):
# TODO: Does this make sense for the general case? It would help
# if there were a canonical docstring for what isAnchored means.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah to be honested I am not sure isAnchored is really necessary, but that's orthogonal

@@ -721,6 +724,7 @@ def apply(self, other):

return result
else:
# TODO: Figure out the end of this sente
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah not sure, @sinhrks wrote this originally.

@@ -15,7 +15,7 @@ np.import_array()

from util cimport is_string_object

from pandas._libs.tslib import pydt_to_i8
from pandas._libs.tslib import pydt_to_i8, monthrange
Copy link
Contributor

@jreback jreback Nov 10, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prob should move monthrange and everything in its impl to offsets (and then you can cimport these to tslib.pyx)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eventually. We've got a few more of these left to go.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

k, add to the list

@jreback
Copy link
Contributor

jreback commented Nov 10, 2017

doesn't this supersede #18183 ?

@jbrockmendel
Copy link
Member Author

this looks reasonable. pls add a whatsnew note on improved perf of offsets, add all PRs that touch offsets.pyx to this issue.

Will do.

@jbrockmendel
Copy link
Member Author

doesn't this supersede #18183 ?

Yes. Just closed that one.

int year, month, day
int dim, dy

(dy, month) = divmod(stamp.month + months, 12)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cython doesn't seem to optimize this, instead write the two ops directly

dy = (stamp.month + months) // 12
month = (stamp.month + months) % 12

@codecov
Copy link

codecov bot commented Nov 11, 2017

Codecov Report

Merging #18218 into master will decrease coverage by 0.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #18218      +/-   ##
==========================================
- Coverage    91.4%   91.38%   -0.02%     
==========================================
  Files         163      163              
  Lines       50091    50091              
==========================================
- Hits        45788    45778      -10     
- Misses       4303     4313      +10
Flag Coverage Δ
#multiple 89.19% <100%> (-0.01%) ⬇️
#single 40.36% <0%> (-0.06%) ⬇️
Impacted Files Coverage Δ
pandas/tseries/offsets.py 97.11% <100%> (ø) ⬆️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/frame.py 97.8% <0%> (-0.1%) ⬇️
pandas/core/indexes/datetimes.py 95.38% <0%> (-0.1%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b36dab5...21216a8. Read the comment docs.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rebase

@@ -70,7 +70,7 @@ Performance Improvements
~~~~~~~~~~~~~~~~~~~~~~~~

- Indexers on Series or DataFrame no longer create a reference cycle (:issue:`17956`)
-
- DateOffset arithmetic performance is improved (:issue:`18218`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use a :class`DateOffset` (its now defined

elif day_opt == 'end':
day = dim
else:
# assume this is an integer (and a valid day)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

? why is day_opt anything else?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and if it is, then explicity put it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

? why is day_opt anything else?

For e.g. semi-month offsets we may be shifting to a particular day other than the first or last of the month.

and if it is, then explicity put it.

You mean assert it? OK. I'll go ahead and write a docstring too.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check if it’s an integer

the else clause should raise

@jbrockmendel
Copy link
Member Author

Clipboard.

@jreback jreback added this to the 0.22.0 milestone Nov 12, 2017
@jreback jreback merged commit bd62014 into pandas-dev:master Nov 12, 2017
@jreback
Copy link
Contributor

jreback commented Nov 12, 2017

thanks!

No-Stream pushed a commit to No-Stream/pandas that referenced this pull request Nov 28, 2017
@jbrockmendel jbrockmendel deleted the tslibs-offsets-shift_month branch December 8, 2017 19:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Clean Internals Related to non-user accessible pandas implementation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants