BUG: Inconsistent `to_datetime` behaviour #57051

abhijeetbodas2001 · 2024-01-24T11:53:55Z

Pandas version checks

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

>>> import pandas as pd

>>> pd.to_datetime("1704660000", unit="s", origin="unix")
<stdin>:1: FutureWarning: The behavior of 'to_datetime' with 'unit' when parsing strings is deprecated. In a future version, strings will be parsed as datetime strings, matching the behavior without a 'unit'. To retain the old behavior, explicitly cast ints or floats to numeric type before calling to_datetime.
Timestamp('2024-01-07 20:39:28')

>>> pd.to_datetime(1704660000, unit="s", origin="unix")
Timestamp('2024-01-07 20:40:00')

Issue Description

Incorrect datetime object generated when unix timestamp is passed as string.

Expected Behavior

The timestamp string should be converted to the correct datetime regardless of the dtype of the input (string or int). Alternatively, raise an error if string inputs are not allowed.

The current deprecation warning seems to imply that the behaviour is working as expected now, but won't be available in future versions.
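The warning text itself suggests casting to a numeric type before calling `to_datetime`. A minimal sketch of that workaround, assuming the input is an integer number of seconds given as a string (the helper name `parse_unix_seconds` is hypothetical, not part of pandas):

```python
import pandas as pd


def parse_unix_seconds(value):
    """Hypothetical helper: parse a Unix timestamp in whole seconds.

    Accepts an int or a string of digits. The explicit int() cast means
    pandas never sees a string together with `unit`, so the deprecated
    string code path is not used.
    """
    return pd.to_datetime(int(value), unit="s", origin="unix")


from_str = parse_unix_seconds("1704660000")
from_int = parse_unix_seconds(1704660000)
print(from_str, from_int)
```

With the cast in place, both inputs go through the integer path and should yield the same `Timestamp`. Fractional strings such as `"1704660000.5"` would need `float()` instead of `int()`.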

Installed Versions

[~] $ pip list | rg "pandas|numpy"
numpy                             1.26.1
pandas                            2.2.0
pandas-stubs                      2.1.4.231218


MarcoGorelli · 2024-01-24T18:58:58Z

> The current deprecation warning seems to imply that the behaviour is working as expected now, but won't be available in future versions.

indeed, it'll error in the future

not sure what you're asking, sorry - could you clarify please? what's your expected output?

abhijeetbodas2001 · 2024-01-25T07:35:24Z

Thanks for your response.

> not sure what you're asking, sorry - could you clarify please? what's your expected output?

`to_datetime` currently gives different outputs when the same unix timestamp is passed as a string vs. as an int. The question is: should it not return the same datetime object in both cases?

MarcoGorelli · 2024-01-25T08:46:03Z

thanks - sorry, I'd missed the minutes/seconds difference

I agree, but as the first one's deprecated this probably isn't worth fixing

Thanks anyway for the report! 🙏

abhijeetbodas2001 · 2024-01-25T09:29:38Z

I had hoped a fix could make it into the next minor release (i.e., before the behaviour is removed), but I can understand if it does not make the cut. Anyway, thanks!

QuLogic · 2024-02-21T09:17:03Z

Do note that this was working correctly in 1.5.3; so while it is deprecated, this is a regression introduced in Pandas 2.0.0.

QuLogic · 2024-02-21T10:41:44Z

Since I have a good and a bad commit, I can bisect the change, which points to 83c2a5f.

I believe the issue is that the string is being cast to float32 instead of float64 somewhere:

>>> import numpy as np
>>> import pandas as pd
>>> pd.to_datetime(np.float32("1704660000"), unit="s", origin="unix")
Timestamp('2024-01-07 20:39:28')
>>> pd.to_datetime(np.float64("1704660000"), unit="s", origin="unix")
Timestamp('2024-01-07 20:40:00')
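The 32-second shift can be reproduced without pandas. The sketch below uses only the standard library and rounds the timestamp to single precision through `struct` (the helper name `round_to_float32` is made up for illustration). Near 1.7e9, adjacent float32 values are 128 apart, so 1704660000 cannot be stored exactly.

```python
import struct
from datetime import datetime, timezone


def round_to_float32(x: float) -> float:
    """Round a Python float (64-bit) to the nearest 32-bit float value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


ts = 1704660000
as_f32 = round_to_float32(float(ts))

# Interpret both values as seconds since the Unix epoch, in UTC.
exact = datetime.fromtimestamp(ts, tz=timezone.utc)
shifted = datetime.fromtimestamp(as_f32, tz=timezone.utc)

print(ts - as_f32)   # seconds lost by the float32 round trip
print(exact, shifted)
```

The round trip lands on 1704659968, which is 32 seconds earlier and matches the `20:39:28` result reported above.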

In Pandas 1.5.3, the `float(val)` cast was inline to the `cast_from_unit` call in `array_with_unit_to_datetime`. This caused the intermediate (unnamed) value to be a Python float. Since pandas-dev#50301, a temporary variable was added to avoid multiple casts, but with explicit type `cdef float`, which defines a _Cython_ float. This type is 32-bit, and causes a loss of precision, and a regression in parsing from 1.5.3. Since `cast_from_unit` takes an `object`, not a more specific Cython type, remove the explicit type from the temporary `fval` variable entirely. This will cause it to be a (64-bit) Python float, and thus not lose precision. Fixes pandas-dev#57051

In Pandas 1.5.3, the `float(val)` cast was inline to the `cast_from_unit` call in `array_with_unit_to_datetime`. This caused the intermediate (unnamed) value to be a Python float. Since pandas-dev#50301, a temporary variable was added to avoid multiple casts, but with explicit type `cdef float`, which defines a _Cython_ float. This type is 32-bit, and causes a loss of precision, and a regression in parsing from 1.5.3. So widen the explicit type of the temporary `fval` variable to (64-bit) `double`, which will not lose precision. Fixes pandas-dev#57051
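As a rough illustration of why widening to `double` is enough, the sketch below emulates a C `float` and a C `double` temporary with `ctypes`. This is not the pandas or Cython code, only an analogy; the helper names are hypothetical. A 32-bit float represents every integer exactly only up to 2**24 (about 194 days' worth of seconds), while a 64-bit double does so up to 2**53.

```python
import ctypes


def through_c_float(x: float) -> float:
    """Store x in a 32-bit C float and read it back."""
    return ctypes.c_float(x).value


def through_c_double(x: float) -> float:
    """Store x in a 64-bit C double and read it back."""
    return ctypes.c_double(x).value


ts = 1704660000.0

# The double keeps the timestamp intact; the float does not.
double_ok = through_c_double(ts) == ts
float_ok = through_c_float(ts) == ts

# First integer that a 32-bit float cannot hold exactly.
first_lossy = float(2**24 + 1)
float_at_limit = through_c_float(first_lossy)

print(double_ok, float_ok, float_at_limit)
```

Any present-day Unix timestamp in seconds is far above 2**24, so a 32-bit temporary will generally round it, whereas a 64-bit one has ample headroom.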

abhijeetbodas2001 added the Bug and Needs Triage labels Jan 24, 2024

MarcoGorelli added the Needs Info label and removed the Needs Triage and Bug labels Jan 24, 2024

MarcoGorelli closed this as completed Jan 25, 2024

QuLogic mentioned this issue Feb 21, 2024: Fix accidental loss-of-precision for to_datetime(str, unit=...) #57548 (Merged)
