BUG: Inconsistent to_datetime behaviour #57051

Closed
2 of 3 tasks
abhijeetbodas2001 opened this issue Jan 24, 2024 · 6 comments · Fixed by #57548
Labels
Needs Info Clarification about behavior needed to assess issue

Comments

@abhijeetbodas2001

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

>>> import pandas as pd

>>> pd.to_datetime("1704660000", unit="s", origin="unix")
<stdin>:1: FutureWarning: The behavior of 'to_datetime' with 'unit' when parsing strings is deprecated. In a future version, strings will be parsed as datetime strings, matching the behavior without a 'unit'. To retain the old behavior, explicitly cast ints or floats to numeric type before calling to_datetime.
Timestamp('2024-01-07 20:39:28')

>>> pd.to_datetime(1704660000, unit="s", origin="unix")
Timestamp('2024-01-07 20:40:00')

Issue Description

Incorrect datetime object generated when unix timestamp is passed as string.

Expected Behavior

The timestamp string should be converted to the correct datetime regardless of the dtype of the input (string or int). Or, an error should be raised if string inputs are illegal.

The current deprecation warning seems to imply that the behaviour is working as expected now, but won't be available in future versions.

Installed Versions

[~] $ pip list | rg "pandas|numpy"
numpy                             1.26.1
pandas                            2.2.0
pandas-stubs                      2.1.4.231218
abhijeetbodas2001 added the Bug and Needs Triage labels Jan 24, 2024
@MarcoGorelli
Member

The current deprecation warning seems to imply that the behaviour is working as expected now, but won't be available in future versions.

indeed, it'll error in the future

not sure what you're asking, sorry - could you clarify please? what's your expected output?

MarcoGorelli added the Needs Info label and removed the Needs Triage and Bug labels Jan 24, 2024
@abhijeetbodas2001
Author

Thanks for your response.

not sure what you're asking, sorry - could you clarify please? what's your expected output?

to_datetime currently gives different outputs when the same unix timestamp is passed as a string vs as an int. Question is, should it not return the same datetime object in both the cases?

@MarcoGorelli
Member

MarcoGorelli commented Jan 25, 2024

thanks - sorry, I'd missed the minutes/seconds difference

I agree, but as the first one's deprecated this probably isn't worth fixing

Thanks anyway for the report! 🙏

@abhijeetbodas2001
Author

I had hoped a fix could have made it to the next minor release (ie, before it is removed), but I can understand if it does not make the cut. Anyways, thanks!
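For readers hitting this on pandas 2.x, the FutureWarning quoted above already names a workaround: cast the string to a numeric type before calling `to_datetime`. A minimal sketch, assuming the strings hold whole epoch seconds (the variable names `raw` and `col` are illustrative, not from the report):

```python
import pandas as pd

raw = "1704660000"  # epoch seconds that arrived as text

# Cast to int first, so to_datetime never takes the deprecated
# string-with-unit code path described in the warning.
ts = pd.to_datetime(int(raw), unit="s", origin="unix")
print(ts)

# For a whole column of strings, pd.to_numeric performs the same cast.
col = pd.Series(["1704660000", "1704663600"])
parsed = pd.to_datetime(pd.to_numeric(col), unit="s")
print(parsed)
```

This sidesteps the precision loss rather than fixing it, and it matches what the deprecation asks callers to do anyway.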

@QuLogic
Contributor

QuLogic commented Feb 21, 2024

Do note that this was working correctly in 1.5.3; so while it is deprecated, this is a regression introduced in Pandas 2.0.0.

@QuLogic
Contributor

QuLogic commented Feb 21, 2024

Since I have a good and a bad commit, I can bisect the change, which points to 83c2a5f.

I believe the issue is that the string is being cast to float32 instead of float64 somewhere:

>>> import numpy as np
>>> import pandas as pd
>>> pd.to_datetime(np.float32("1704660000"), unit="s", origin="unix")
Timestamp('2024-01-07 20:39:28')
>>> pd.to_datetime(np.float64("1704660000"), unit="s", origin="unix")
Timestamp('2024-01-07 20:40:00')
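The 32-second gap between the two results is consistent with float32 rounding, and the arithmetic can be checked without pandas. The sketch below uses only the standard library; `round_to_float32` is a helper written for this illustration, not a pandas function.

```python
import struct
from datetime import datetime, timezone


def round_to_float32(x: float) -> float:
    """Round a 64-bit Python float to the nearest 32-bit float and back."""
    return struct.unpack("f", struct.pack("f", x))[0]


ts = 1704660000
ts32 = round_to_float32(float(ts))

# float32 has a 24-bit significand, so for a 31-bit integer like ts the
# neighbouring representable values are 2**(31 - 24) = 128 apart.
spacing = 2 ** (ts.bit_length() - 24)

print(spacing)          # gap between adjacent float32 values near ts
print(int(ts32) - ts)   # rounding error in seconds
print(datetime.fromtimestamp(ts32, tz=timezone.utc))
print(datetime.fromtimestamp(ts, tz=timezone.utc))
```

The timestamp is not a multiple of 128, so it cannot be stored exactly in 32 bits and is rounded to the nearest multiple, which is 32 seconds earlier.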

QuLogic added a commit to QuLogic/pandas that referenced this issue Feb 21, 2024
In Pandas 1.5.3, the `float(val)` cast was inline to the
`cast_from_unit` call in `array_with_unit_to_datetime`. This caused the
intermediate (unnamed) value to be a Python float.

Since pandas-dev#50301, a temporary variable was added to avoid multiple casts,
but with explicit type `cdef float`, which defines a _Cython_ float.
This type is 32-bit, and causes a loss of precision, and a regression in
parsing from 1.5.3.

Since `cast_from_unit` takes an `object`, not a more specific Cython
type, remove the explicit type from the temporary `fval` variable
entirely. This will cause it to be a (64-bit) Python float, and thus not
lose precision.

Fixes pandas-dev#57051
QuLogic added a commit to QuLogic/pandas that referenced this issue Feb 22, 2024
In Pandas 1.5.3, the `float(val)` cast was inline to the
`cast_from_unit` call in `array_with_unit_to_datetime`. This caused the
intermediate (unnamed) value to be a Python float.

Since pandas-dev#50301, a temporary variable was added to avoid multiple casts,
but with explicit type `cdef float`, which defines a _Cython_ float.
This type is 32-bit, and causes a loss of precision, and a regression in
parsing from 1.5.3.

So widen the explicit type of the temporary `fval` variable to (64-bit)
`double`, which will not lose precision.

Fixes pandas-dev#57051
mroeschke pushed a commit that referenced this issue Mar 27, 2024
In Pandas 1.5.3, the `float(val)` cast was inline to the
`cast_from_unit` call in `array_with_unit_to_datetime`. This caused the
intermediate (unnamed) value to be a Python float.

Since #50301, a temporary variable was added to avoid multiple casts,
but with explicit type `cdef float`, which defines a _Cython_ float.
This type is 32-bit, and causes a loss of precision, and a regression in
parsing from 1.5.3.

So widen the explicit type of the temporary `fval` variable to (64-bit)
`double`, which will not lose precision.

Fixes #57051
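The effect of widening the temporary from a C `float` to a C `double` can be imitated in plain Python with `ctypes`. This is only a rough sketch: `seconds_to_ns` is a hypothetical stand-in for the typed `fval` temporary, and it does not reproduce what `cast_from_unit` actually does in pandas.

```python
import ctypes

NS_PER_SECOND = 1_000_000_000


def seconds_to_ns(val: str, c_type) -> int:
    """Hypothetical helper: store float(val) in a C-typed slot, then scale to ns."""
    # c_float keeps only 32 bits of the value; c_double keeps all 64.
    fval = c_type(float(val)).value
    return int(fval) * NS_PER_SECOND


before = seconds_to_ns("1704660000", ctypes.c_float)   # like `cdef float fval`
after = seconds_to_ns("1704660000", ctypes.c_double)   # like `cdef double fval`

print(before // NS_PER_SECOND)
print(after // NS_PER_SECOND)
print((after - before) // NS_PER_SECOND)  # seconds lost by the 32-bit temporary
```

With the 64-bit slot the value round-trips exactly, which is why changing the declared type alone is enough to restore the 1.5.3 result.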