-
-
Notifications
You must be signed in to change notification settings - Fork 17.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP BUG: Inconsistent date parsing of to_datetime #35428
WIP BUG: Inconsistent date parsing of to_datetime #35428
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I think the warning approach is reasonable.
Might be preferable to the user if they just get 1 warning along the lines of "Parsed {date} as DD/MM/YYY and {other date} as MM/DD/YYY. Provide format
or specify infer_datetime_format=True for consistent parsing"
Sounds good. I'll keep working on this |
@arw2019 doc build failing
|
@arw2019 are you still working on this? |
@simonjayhawkins yes, will push what I have shortly. Sorry about keeping this idle |
Closing this for now. I'll circle back when I'm ready for review. |
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff
This PR is an attempt to address concerns about datetime parsing. I'll update the docs if people approve this change.
Original problem
From #12585:
The issue is that the first item is parsed as
DD\MM\YYYY
whereas for the second the format isMM\DD\YYYY
, and there is no message to alert the user of inconsistency in the output.The problem lies with the
dayfirst
argument that selects betweenDD\MM\YYYY
andMM\DD\YYYY
. In the parser:pandas/pandas/_libs/tslibs/parsing.pyx
Lines 94 to 163 in 04e9e0a
we try
DD\MM\YYYY
orMM\DD\YYYY
first depending ondayfirst
but if the result is invalid we try the other.Proposed change
I'd like to raise warnings whenever we parse in contradiction to the directive of
dayfirst
. The OP, wheredayfirst
defaults toFalse
, now raises a Warning:A downside of this change is that sometimes a consistent output produces unnecessary warnings (due to mismatch with the default value of
dayfirst
):but these can be silenced with
infer_datetime_format
:For invalid input (if it cannot be parsed according to a single format) we will always raise a warning, even with
infer_datetime_format=True
:Remaining issues
Personally I can see a case for raising an error with
since there's no valid way to process that. I also think that
shouldn't be raising warnings, but I feel much less strongly about that. That said, changing this behaviour will require more heavy-handed alterations to the codebase (and maybe performance implications - not sure).
I think that the warnings I added here will allow users to figure out the origin of these problems when they turn up and that should do away with most of the headaches.