Bug: inconsistent date parsing by to_datetime #7348
Comments
@aschilling If you want to have dates parsed with the day first, you can use This is a problem with
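A minimal sketch of day-first parsing on the dates from the original report, using the `dayfirst` parameter of `pd.to_datetime` that comes up later in this thread. As noted further down, in the versions discussed here `dayfirst` acts as a preference rather than a strict rule, so it fixes this particular input but does not make parsing raise on inconsistent data.

```python
import pandas as pd

dates = ['04/07/2013 21:40', '22/06/2013 02:40']

# dayfirst=True asks pandas to read ambiguous numeric dates as DD/MM/YYYY,
# so '04/07/2013' becomes 4 July rather than 7 April.
parsed = pd.to_datetime(dates, dayfirst=True)
print(parsed)
```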
I understand, but don't you think pandas should then raise an exception, as suggested by Wes? Especially since this has been filed for more than a year?
I think you mean Andy :-) (@hayd). But maybe the docs could state more clearly that if you want strict parsing, you should use
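One documented way to get uniform, strict parsing is an explicit `format` string (this sketch is not necessarily what the comment above originally recommended). With a fixed format every element gets the same interpretation, and a string that does not fit raises `ValueError` under the default `errors='raise'`, or becomes `NaT` with `errors='coerce'`.

```python
import pandas as pd

dates = ['04/07/2013 21:40', '22/06/2013 02:40']
fmt = '%d/%m/%Y %H:%M'

# An explicit format applies one day-first interpretation to every element.
strict = pd.to_datetime(dates, format=fmt)

# A month-first string does not fit the format; instead of silently
# swapping day and month, pandas raises ValueError.
try:
    pd.to_datetime(['06/22/2013 02:40'], format=fmt)
    raised = False
except ValueError:
    raised = True

# errors='coerce' turns strings that do not match the format into NaT.
coerced = pd.to_datetime(['06/22/2013 02:40'], format=fmt, errors='coerce')
print(strict, raised, coerced)
```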
This is mentioned in the docs: http://pandas.pydata.org/pandas-docs/stable/timeseries.html#converting-to-timestamps I agree, it sucks, but the solution is non-trivial. I've looked into fixing this part of dateutil in the past (so many if blocks). Alternatively, using another (C) library for date parsing has been discussed (not sure of the issue number, but I think that one was Wes). Would love to see this solved! Perhaps it would mitigate some of this to default dayfirst based on your locale? :s Closing as a dupe of #3341.
@hayd It is not really mentioned (although it is shown in the example) that you can have mixed formats, so each date is parsed individually (which would seem best), and that if you want uniform parsing for the full column you should pass Is what I am saying correct? I can add a clarification.
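To see what "each date is parsed individually" means in practice, here is a sketch that calls `dateutil.parser.parse` on each string separately. It reproduces the inconsistency from the original report: dateutil reads the first string month-first (7 April), but because `22` cannot be a month it silently reads the second one day-first (22 June). Whether a given pandas version routes `to_datetime` through exactly this path is not assumed here.

```python
from dateutil import parser

dates = ['04/07/2013 21:40', '22/06/2013 02:40']

# Each string is parsed on its own: dateutil defaults to month-first,
# but falls back to day-first when the leading field is greater than 12.
per_element = [parser.parse(s) for s in dates]
for s, d in zip(dates, per_element):
    print(s, '->', d)
```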
I was linking to the Warning and the example; it notes format just below. Do you mean adding an example like I've actually come across in spreadsheets where an inconsistent format was used (potentially with inconsistent dayfirsts, but best to ignore that!)? Here format doesn't really help.
No, the example already has a mixed format: Do I make myself clearer now?
Not quite sure I understand what you're saying! The lack of strictness is definitely a bug: you should be able to parse mixed formats provided they are dayfirst (and potentially ambiguous*). I've looked at this before: the dateutil code is IMO super hairy. IIRC there is some flag for dayfirst which looked like it was supposed to control this (but it falls through when it shouldn't); I couldn't understand how it worked (or why it didn't), but potentially this could be fixed...

*That is, 1st Feb and Feb 1st should both parse in either case (as they're not ambiguous), but 1/14 should not parse if dayfirst (as it's ambiguously/incorrectly ordered).
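The strict dayfirst semantics described in the footnote can be sketched with the standard library alone. `parse_strict_dayfirst` and its format list are hypothetical, not part of pandas or dateutil: purely numeric dates must be day-before-month, textual months are accepted in either order, and anything else raises instead of being reinterpreted. `%b` assumes the default C/English locale.

```python
from datetime import datetime

# Hypothetical format list: numeric dates are only accepted day-first,
# while textual months are unambiguous and may come in either order.
STRICT_DAYFIRST_FORMATS = (
    '%d/%m/%Y',  # 01/02/2013 -> 1 Feb 2013
    '%d %b %Y',  # 1 Feb 2013 (assumes English month abbreviations)
    '%b %d %Y',  # Feb 1 2013
)


def parse_strict_dayfirst(text):
    """Hypothetical helper: parse day-first or raise ValueError, never swap."""
    for fmt in STRICT_DAYFIRST_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue  # this format does not fit; try the next one
    raise ValueError(f'{text!r} does not parse as a day-first date')


print(parse_strict_dayfirst('1 Feb 2013'), parse_strict_dayfirst('Feb 1 2013'))
```

With this design `'1/14/2013'` raises `ValueError`, because `14` can never be a month in a day-first reading.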
Hmm, it can be difficult to put things into words sometimes... :-) Another attempt:
I still think this would be 100% solved if dayfirst were strict, as it would raise here... so we're in agreement? Or do you have something in mind here (without tweaking dateutil)? PS: I really like the flexible schema approach; I often see messy "date" columns.
See #12585
Hi everybody,
Let's take the following dates:
dates = ['04/07/2013 21:40','22/06/2013 02:40']
If I call pd.to_datetime on these two date strings, it produces incorrect results: for the first date, '7' is interpreted as the day value, while for the second date, '22' is interpreted as the day value. Please don't get me wrong: I have no problem with the first string being parsed wrongly on its own, but with the fact that the correct interpretation scheme is not propagated after reading the second string.
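The "propagation" asked for here amounts to choosing one interpretation for the whole column. A minimal sketch, assuming a small hypothetical list of candidate formats and a hypothetical helper `parse_column_consistently`: each format is tried against every string, and the first one that fits all of them is used for all of them.

```python
from datetime import datetime

# Hypothetical candidates, tried in order: month-first, then day-first.
CANDIDATE_FORMATS = ('%m/%d/%Y %H:%M', '%d/%m/%Y %H:%M')


def parse_column_consistently(strings, formats=CANDIDATE_FORMATS):
    """Hypothetical helper: return (format, parsed values) for the first
    format that fits every string, so one interpretation covers the column."""
    for fmt in formats:
        try:
            return fmt, [datetime.strptime(s, fmt) for s in strings]
        except ValueError:
            continue  # this format fails on at least one string
    raise ValueError('no single candidate format fits every string')


dates = ['04/07/2013 21:40', '22/06/2013 02:40']
fmt, parsed = parse_column_consistently(dates)
print(fmt, parsed)
```

Here month-first fails on `'22/06/2013'`, so the day-first format is chosen and `'04/07/2013'` becomes 4 July. If every value were ambiguous (all fields 12 or less), the first candidate would win, so the order of the list is itself a policy decision.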
Thanks