BUG: to_csv casting datetimes in categorical to int #44930

Merged: 10 commits, Dec 22, 2021

phofl
Member

@phofl phofl commented Dec 16, 2021

@phofl phofl added the Categorical (Categorical Data Type) and IO CSV (read_csv, to_csv) labels Dec 16, 2021
@jbrockmendel
Member

I think the underlying problem is in Categorical.astype(object).
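For readers following along, here is a minimal sketch of the behaviour under discussion. It assumes a pandas version that already contains this fix (the PR is on the 1.4 milestone), and the variable names are illustrative only. With the fix, casting a Categorical with datetime categories to object should give Timestamp objects rather than integers.

```python
import pandas as pd

# A Categorical whose categories are datetime64 values.
cat = pd.Categorical(pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-01"]))

# Cast to object dtype; each element should keep its datetime-like type.
as_object = cat.astype(object)

# Collect the Python type names of the resulting elements.
element_types = {type(value).__name__ for value in as_object}
print(element_types)
```

On a version without the fix, the same cast is what the PR title describes as producing ints.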

@phofl phofl marked this pull request as draft December 16, 2021 19:27
@phofl
Member Author

phofl commented Dec 16, 2021

I was not sure whether this was intentional in the astype call. I will have another look.

@@ -532,7 +533,12 @@ def astype(self, dtype: AstypeArg, copy: bool = True) -> ArrayLike:
        else:
            # GH8628 (PERF): astype category codes instead of astyping array
            try:
                new_cats = np.asarray(self.categories)
                if is_datetime64_dtype(self.categories):
                    values = ensure_wrapped_if_datetimelike(np.asarray(self.categories))

Member

self.categories._values should be what you want here. That's probably also an improvement if is_datetime64tz_dtype(self.categories.dtype).

Member Author

tz-aware datetimes are already handled correctly.

Member

I was thinking it might be a perf improvement for dt64tz, but either way it's fine to consider that out of scope for this PR.
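To illustrate why the review suggests working with the wrapped values instead of np.asarray(self.categories), here is a rough sketch. It uses the public `.array` accessor of a DatetimeIndex as a stand-in for the internal `_values` attribute mentioned above; that substitution, and all variable names, are assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Pin nanosecond resolution explicitly so the NumPy behaviour is deterministic.
raw = np.array(["2021-01-01", "2021-01-02"], dtype="datetime64[ns]")
idx = pd.DatetimeIndex(raw)

# Plain NumPy path: datetime64[ns] cannot be represented as datetime.datetime,
# so casting to object yields Python ints (nanoseconds since the epoch).
numpy_objects = np.asarray(idx).astype(object)

# Wrapped path: the DatetimeArray behind the index casts to Timestamp objects.
wrapped_objects = idx.array.astype(object)

print(type(numpy_objects[0]).__name__, type(wrapped_objects[0]).__name__)
```

The difference between the two casts is the int-versus-datetime discrepancy this PR is about.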

@phofl
Member Author

phofl commented Dec 17, 2021

So I think I figured the issue out now.

Categorical.astype was incorrect for object dtype, as you suggested. But fixing that does not directly help us in the to_csv case, because to_csv has to respect a date_format in to_native_types. We have to convert the values to a DatetimeArray before doing the actual conversion.
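A small sketch of the to_csv case described here, assuming a pandas version that includes this change. The column name `when` and the chosen date_format are made up for the example. The datetimes inside the categorical column should be written as formatted dates, not as integers.

```python
import pandas as pd

# A frame with a single categorical column holding datetime values.
dates = pd.to_datetime(["2021-01-01", "2021-01-02"])
df = pd.DataFrame({"when": pd.Categorical(dates)})

# Write to a CSV string; date_format should also apply to the categorical column.
csv_text = df.to_csv(index=False, date_format="%d/%m/%Y")

# Split into rows so the check does not depend on the line terminator.
rows = csv_text.splitlines()
print(rows)
```

The first row is the header and the remaining rows are the dates rendered with the requested format.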

@phofl phofl marked this pull request as ready for review December 17, 2021 14:34
@jreback jreback added this to the 1.4 milestone Dec 17, 2021
@jreback jreback merged commit 079289c into pandas-dev:master Dec 22, 2021
@jreback
Contributor

jreback commented Dec 22, 2021

thanks @phofl

@phofl phofl deleted the 40754 branch December 22, 2021 09:00
Labels: Categorical (Categorical Data Type), IO CSV (read_csv, to_csv)
3 participants