Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: mixture of timezone-aware string and timezone-naive datetime treated differently for ISO vs non-ISO paths #50254

Closed
3 tasks done
MarcoGorelli opened this issue Dec 14, 2022 · 1 comment · Fixed by #50242
Labels
Bug Needs Triage Issue that has not been reviewed by a pandas team member
Milestone

Comments

@MarcoGorelli
Copy link
Member

MarcoGorelli commented Dec 14, 2022

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

# ISO:
In [1]: pd.to_datetime(["2020-01-01 01:00:00-01:00", datetime(2020, 1, 1, 3, 0)], format='%Y-%m-%d %H:%M:%S%z')
Out[1]: DatetimeIndex(['2020-01-01 01:00:00-01:00', '2020-01-01 02:00:00-01:00'], dtype='datetime64[ns, UTC-01:00]', freq=None)

# non-ISO:
In [2]: pd.to_datetime(["2020-01-01 01:00:00-01:00", datetime(2020, 1, 1, 3, 0)], format='%Y-%d-%m %H:%M:%S%z')
Out[2]: Index([2020-01-01 01:00:00-01:00, 2020-01-01 03:00:00], dtype='object')

Issue Description

This is inconsistent. I think I prefer the latter - datetime(2020, 1, 1, 3, 0) is timezone-naive, so it seems right to do the same thing we'd do if there were mixed offsets, i.e. return an Index of dtype object

As far as I can tell, the only place this is tested is

- A mix of timezone-aware and timezone-naive inputs is converted to
a timezone-aware :class:`DatetimeIndex` if the offsets of the timezone-aware
are constant:
>>> from datetime import datetime
>>> pd.to_datetime(["2020-01-01 01:00:00-01:00", datetime(2020, 1, 1, 3, 0)])
DatetimeIndex(['2020-01-01 01:00:00-01:00', '2020-01-01 02:00:00-01:00'],
dtype='datetime64[ns, UTC-01:00]', freq=None)

Timezone-naive and timezone-aware are not the same format, so if we now expect elements to be parsed with the same format, I don't think this should fly

Expected Behavior

Same output in both cases, I think this should be:

In [1]: pd.to_datetime(["2020-01-01 01:00:00-01:00", datetime(2020, 1, 1, 3, 0)], format='%Y-%m-%d %H:%M:%S%z')  # ISO: naive dt.datetime is converted
Out[1]: Index([2020-01-01 01:00:00-01:00, 2020-01-01 03:00:00], dtype='object')

In [2]: pd.to_datetime(["2020-01-01 01:00:00-01:00", datetime(2020, 1, 1, 3, 0)], format='%Y-%d-%m %H:%M:%S%z')  # non-ISO: dt.datetime is left as-is, object Index returned
Out[2]: Index([2020-01-01 01:00:00-01:00, 2020-01-01 03:00:00], dtype='object')

Installed Versions

INSTALLED VERSIONS

commit : 113bdb3
python : 3.8.15.final.0
python-bits : 64
OS : Linux
OS-release : 5.10.102.1-microsoft-standard-WSL2
Version : #1 SMP Wed Mar 2 00:30:59 UTC 2022
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_GB.UTF-8
LOCALE : en_GB.UTF-8

pandas : 2.0.0.dev0+918.g113bdb3769
numpy : 1.23.5
pytz : 2022.6
dateutil : 2.8.2
setuptools : 65.5.1
pip : 22.3.1
Cython : 0.29.32
pytest : 7.2.0
hypothesis : 6.61.0
sphinx : 4.5.0
blosc : None
feather : None
xlsxwriter : 3.0.3
lxml.etree : 4.9.1
html5lib : 1.1
pymysql : 1.0.2
psycopg2 : 2.9.3
jinja2 : 3.1.2
IPython : 8.7.0
pandas_datareader: None
bs4 : 4.11.1
bottleneck : 1.3.5
brotli :
fastparquet : 2022.12.0
fsspec : 2021.11.0
gcsfs : 2021.11.0
matplotlib : 3.6.2
numba : 0.56.4
numexpr : 2.8.3
odfpy : None
openpyxl : 3.0.10
pandas_gbq : None
pyarrow : 9.0.0
pyreadstat : 1.2.0
pyxlsb : 1.0.10
s3fs : 2021.11.0
scipy : 1.9.3
snappy :
sqlalchemy : 1.4.45
tables : 3.7.0
tabulate : 0.9.0
xarray : 2022.12.0
xlrd : 2.0.1
zstandard : 0.19.0
tzdata : None
qtpy : None
pyqt5 : None
None

@MarcoGorelli MarcoGorelli added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Dec 14, 2022
@jorisvandenbossche
Copy link
Member

Resolved by #50242

@jorisvandenbossche jorisvandenbossche added this to the 2.0 milestone Jan 2, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Needs Triage Issue that has not been reviewed by a pandas team member
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants