
to_sql Out of bounds nanosecond timestamp #26761

Closed
gonghwa opened this issue Jun 10, 2019 · 13 comments · Fixed by #33140
Labels
Datetime Datetime data dtype IO SQL to_sql, read_sql, read_sql_query Timezones Timezone data dtype

Comments

@gonghwa

gonghwa commented Jun 10, 2019

Code Sample, a copy-pastable example if possible

df.to_sql(TableName, index=False, if_exists="replace")

Problem description

I upgraded my pandas to 0.24.2 recently and encountered the error below, which did not occur before.
I am using Python to transfer data to SQL Server. Some dates are set to 9999-12-31, and this causes the error: OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 9999-12-31 00:00:00

Expected Output

We should be able to transfer those dates. Thanks

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.7.0.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 69 Stepping 1, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.24.2
pytest: 3.8.0
pip: 19.0.2
setuptools: 40.2.0
Cython: 0.28.5
numpy: 1.16.2
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 6.5.0
sphinx: 1.7.9
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: 1.2.1
tables: 3.4.4
numexpr: 2.6.8
feather: None
matplotlib: 2.2.3
openpyxl: 2.5.6
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.1.0
lxml.etree: 4.2.5
bs4: 4.6.3
html5lib: 1.0.1
sqlalchemy: 1.2.11
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None

@thiviyanT
Contributor

thiviyanT commented Jun 10, 2019

Here is my attempt at reproducing the issue with two different versions of pandas.

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine('mysql+pymysql://username:password@localhost/database?host=localhost?port=3306', echo=False)

df = pd.DataFrame({'DOB': {0: '26/1/2016 ', 2: '31/12/9999 '}})

df.DOB = pd.to_datetime(df.DOB)  # This line throws the error
df.to_sql('users', con=engine, index=False, if_exists='replace')

Here is what I got with pandas v0.24.2:

Traceback (most recent call last):
  File "/Users/user/anaconda2/envs/pandas-test/lib/python3.6/site-packages/pandas/core/arrays/datetimes.py", line 1861, in objects_to_datetime64ns
    values, tz_parsed = conversion.datetime_to_datetime64(data)
  File "pandas/_libs/tslibs/conversion.pyx", line 200, in pandas._libs.tslibs.conversion.datetime_to_datetime64
TypeError: Unrecognized value type: <class 'str'>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "test.py", line 8, in <module>
    df.DOB = pd.to_datetime(df.DOB)  # This line throws the error
  File "/Users/user/anaconda2/envs/pandas-test/lib/python3.6/site-packages/pandas/core/tools/datetimes.py", line 592, in to_datetime
    values = convert_listlike(arg._values, True, format)
  File "/Users/user/anaconda2/envs/pandas-test/lib/python3.6/site-packages/pandas/core/tools/datetimes.py", line 302, in _convert_listlike_datetimes
    allow_object=True)
  File "/Users/user/anaconda2/envs/pandas-test/lib/python3.6/site-packages/pandas/core/arrays/datetimes.py", line 1866, in objects_to_datetime64ns
    raise e
  File "/Users/user/anaconda2/envs/pandas-test/lib/python3.6/site-packages/pandas/core/arrays/datetimes.py", line 1857, in objects_to_datetime64ns
    require_iso8601=require_iso8601
  File "pandas/_libs/tslib.pyx", line 460, in pandas._libs.tslib.array_to_datetime
  File "pandas/_libs/tslib.pyx", line 680, in pandas._libs.tslib.array_to_datetime
  File "pandas/_libs/tslib.pyx", line 676, in pandas._libs.tslib.array_to_datetime
  File "pandas/_libs/tslib.pyx", line 628, in pandas._libs.tslib.array_to_datetime
  File "pandas/_libs/tslibs/conversion.pyx", line 391, in pandas._libs.tslibs.conversion.convert_datetime_to_tsobject
  File "pandas/_libs/tslibs/np_datetime.pyx", line 120, in pandas._libs.tslibs.np_datetime.check_dts_bounds
pandas._libs.tslibs.np_datetime.OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 9999-12-31 00:00:00

Key points:

  • From the traceback, it is clear that the issue arises from converting the strings into datetime objects (i.e. pd.to_datetime(df.DOB)), not from the .to_sql() method
  • I also tried the same code with pandas version 0.25.0.dev0+692.gfb330cf38 (currently in development) and I encountered the same issue.

@jreback @jorisvandenbossche Is this the expected behaviour or is this a bug?

@jbrockmendel
Member

Is this the expected behaviour or is this a bug?

This is the expected behavior. pd.Timestamp supports datetimes from 1677-09-21 to 2262-04-11.

Is 12/31/9999 a placeholder for missing data? If so, consider replacing it with "NaT", which pd.to_datetime will understand.
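As a minimal sketch of that suggestion (the sentinel value and date format here are assumptions based on the repro above):

```python
import pandas as pd

# Hypothetical placeholder convention: 31/12/9999 marks "no date".
s = pd.Series(['26/1/2016', '31/12/9999'])

# Swap the sentinel for "NaT" before parsing; pd.to_datetime understands it.
parsed = pd.to_datetime(s.replace('31/12/9999', 'NaT'), dayfirst=True)
print(parsed)
```

The first value parses to a normal Timestamp; the sentinel becomes a missing value instead of raising OutOfBoundsDatetime.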

@gonghwa
Author

gonghwa commented Jun 12, 2019

Thanks for looking at this.
Yes, I know it is the expected behavior when using pd.Timestamp. But when I downgrade pandas to 0.23.4, to_sql works perfectly with year 7229.

I use:

df = pd.read_sql_query('select * from <table>', conn)  # looped over a list of tables

followed by:

df.to_sql(TableName, index=False, if_exists="replace")

It is impossible for me to replace out-of-range dates with "NaT" because I can't tell which columns are dates across more than 100 tables, and the year can be 9999 or any other value past the 2262 limit.
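One generic workaround (a sketch of an alternative approach, not what pandas 0.23 did) is to let pd.to_datetime coerce out-of-bounds values to NaT instead of raising; string dates are used here purely for illustration:

```python
import pandas as pd

# Column of date values where one exceeds the nanosecond bounds.
raw = pd.Series(['2016-01-26', '9999-12-31'])

# errors='coerce' turns unparseable or out-of-bounds values into NaT
# instead of raising OutOfBoundsDatetime.
coerced = pd.to_datetime(raw, errors='coerce')
print(coerced)
```

This still loses the 9999-12-31 sentinel, so it only helps if NaT is an acceptable stand-in.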

@TomAugspurger
Contributor

@gonghwa we need a reproducible example from you. http://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports

If you're able to use sqlite that's best since it's included with python. If you can't reproduce with sqlite then you'll need to provide the database info (database and SQLAlchemy driver you're using)
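As a sketch of the kind of self-contained example being requested, here is an in-memory sqlite round trip (table name and layout are illustrative); the date is kept as text, which also sidesteps the nanosecond bounds entirely:

```python
import sqlite3
import pandas as pd

conn = sqlite3.connect(':memory:')

# Keep the out-of-range date as plain text so no Timestamp conversion occurs.
df = pd.DataFrame({'date': ['9999-12-31']})
df.to_sql('users', conn, index=False, if_exists='replace')

out = pd.read_sql_query('select * from users', conn)
print(out['date'].iloc[0])
```

With a real database, the same skeleton with a SQLAlchemy engine in place of the sqlite3 connection would make the report reproducible.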

@gonghwa
Author

gonghwa commented Jun 13, 2019

Hi, Tom,

Here is code that reproduces the error, but the issue only occurs with MSSQL, not sqlite3.

import pandas as pd
import datetime
from sqlalchemy import create_engine

connapt = create_engine("mssql+pyodbc://SQLEXPRESSAPT")

df = pd.DataFrame({'date': datetime.datetime(9999, 1, 1)}, index=[0])
df.to_sql('pd024', connapt, index=False, if_exists="replace")

With 0.23.4 it finishes without error; with 0.24.2 it reports: OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 9999-01-01 00:00:00

Thank you.

@TomAugspurger
Contributor

TomAugspurger commented Jun 13, 2019 via email

@gonghwa
Author

gonghwa commented Jul 2, 2019

I got time to look into the details and found that the error comes from a change in pandas\io\sql.py, line 877.
In pandas 0.24 it is:

if col_type == 'datetime64' or col_type == 'datetime':
    # GH 9086: TIMESTAMP is the suggested type if the column contains
    # timezone information
    try:
        if col.dt.tz is not None:
            return TIMESTAMP(timezone=True)
    except AttributeError:
        # The column is actually a DatetimeIndex
        if col.tz is not None:
            return TIMESTAMP(timezone=True)
    return DateTime

In pandas 0.23 it is:

if col_type == 'datetime64' or col_type == 'datetime':
    try:
        tz = col.tzinfo  # noqa
        return DateTime(timezone=True)
    except:
        return DateTime

The col.dt.tz call raises the error.

Shall we add one more except clause to handle OutOfBoundsDatetime:

if col_type == 'datetime64' or col_type == 'datetime':
    # GH 9086: TIMESTAMP is the suggested type if the column contains
    # timezone information
    try:
        if col.dt.tz is not None:
            return TIMESTAMP(timezone=True)
    except AttributeError:
        # The column is actually a DatetimeIndex
        if col.tz is not None:
            return TIMESTAMP(timezone=True)
    except OutOfBoundsDatetime:
        return DateTime
    return DateTime

Do you see anywhere in to_sql where we convert object-dtype columns to datetime?

@mroeschke
Member

Do you have a small example with data where the OutOfBoundsDatetime is raised? I find it strange that a col.dt.tz call would raise that error since it should just be accessing an attribute.

@gonghwa
Author

gonghwa commented Jul 3, 2019

import pandas as pd
import datetime

temp = pd.DataFrame({'date': datetime.datetime(9999, 1, 1)}, index=[0])
col = temp.date
col.dt.tz

The traceback includes:

data, inferred_tz = objects_to_datetime64ns(data, dayfirst=dayfirst, yearfirst=yearfirst)

so I guess the error comes from calling objects_to_datetime64ns, which has the limitation of only supporting datetimes from 1677-09-21 to 2262-04-11.

@mroeschke
Member

mroeschke commented Jul 3, 2019

Thanks for finding the root issue. That definitely appears to be a bug. Since that specific branch is really just checking for potential timezones in the date schema columns (which columns raising OutOfBoundsDatetime won't tend to have), I think adding your exception case is reasonable.

PRs are always welcome.

@mroeschke mroeschke added Datetime Datetime data dtype Timezones Timezone data dtype labels Jul 3, 2019
@gonghwa
Author

gonghwa commented Jul 4, 2019

My pleasure.

I found that a date outside the 1677-09-21 to 2262-04-11 range with a timezone raises a different exception: ValueError: Tz-aware datetime.datetime cannot be converted to datetime64 unless utc=True

import pandas as pd
import datetime

temp = pd.DataFrame(
    {'date': datetime.datetime(9999, 1, 1, tzinfo=datetime.timezone(datetime.timedelta(hours=12)))},
    index=[0],
)
col = temp.date
col.dt.tz

So in conclusion, when a date is out of bounds:

  • without a timezone, the exception is OutOfBoundsDatetime
  • with a timezone, the exception is ValueError

@jbrockmendel jbrockmendel added the IO SQL to_sql, read_sql, read_sql_query label Oct 16, 2019
@Arjun-Arvindakshan

Hi,
I seem to get this error while running my code:

Error while fetching data from PostgreSQL: Out of bounds nanosecond timestamp: 1-03-22 00:00:00

Does anyone have any idea of the possible reason behind this?

@gonghwa
Author

gonghwa commented Dec 5, 2019

hi..
I seem to get this error while running my code.

Error while fetching data from PostgreSQL Out of bounds nanosecond timestamp: 1-03-22 00:00:00
Anyone have any idea on what might be the possible reason behind this?

I am not sure whether it is fixed in a newer release, because I haven't used the same data in pandas for a long time. Try downgrading pandas to 0.23.4. If the error persists, you will have to find the root cause yourself.

6 participants