ENH: Make corrwith ignore string columns when finding correlation with a Series #18570

Closed
tdpetrou opened this issue Nov 30, 2017 · 2 comments · Fixed by #18651
Labels
Dtype Conversions (Unexpected or buggy dtype conversions), Numeric Operations (Arithmetic, Comparison, and Logical operations)

Comments

@tdpetrou
Contributor

Code Sample, a copy-pastable example if possible

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({'a': np.random.rand(5),
...                    'b': np.random.rand(5),
...                    'string_col': 'some string'})
>>> df

          a         b   string_col
0  0.376004  0.761471  some string
1  0.402352  0.865937  some string
2  0.450365  0.715527  some string
3  0.445317  0.017645  some string
4  0.687363  0.903298  some string

>>> s = pd.Series(np.random.rand(100))

>>> df.corrwith(s)
TypeError: ("unsupported operand type(s) for /: 'str' and 'int'", 'occurred at index string_col')

Problem description

corrwith should silently drop the string columns. For now, you must select the numeric columns yourself, which gives the expected output:

Expected Output

>>> df.select_dtypes('number').corrwith(s)
a    0.161006
b   -0.000233
dtype: float64
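
A self-contained version of the workaround above, as a minimal sketch assuming a reasonably recent pandas and NumPy. The seeded generator and the helper name `corrwith_numeric` are hypothetical additions (not part of pandas) so the run is reproducible; the helper keeps only numeric columns via `select_dtypes('number')` before calling `corrwith`, which is the behavior this issue asks for.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded so the example is reproducible

df = pd.DataFrame({
    "a": rng.random(5),
    "b": rng.random(5),
    "string_col": "some string",  # scalar broadcast to every row
})
s = pd.Series(rng.random(100))


def corrwith_numeric(frame: pd.DataFrame, other: pd.Series) -> pd.Series:
    """Hypothetical helper: correlate only the numeric columns of `frame` with `other`."""
    numeric = frame.select_dtypes("number")  # drops 'string_col'
    # corrwith aligns `other` on the index, so only labels 0..4 take part here.
    return numeric.corrwith(other)


result = corrwith_numeric(df, s)
print(result)
```

Because the Series has 100 rows and the frame only 5, the correlation is computed on the 5 overlapping index labels; the string column never reaches the arithmetic.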

@gfyoung
Member

gfyoung commented Dec 1, 2017

@tdpetrou: Thanks for reporting this! While your workaround isn't hard, I don't see why we shouldn't just drop columns that have non-numeric dtypes. We already do that for DataFrame.describe.
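For comparison, a minimal sketch of the DataFrame.describe behavior referred to here, assuming default arguments: on a frame with mixed dtypes, describe() summarizes only the numeric columns unless non-numeric ones are requested with include="all". The toy values are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "a": [0.1, 0.4, 0.2, 0.9, 0.5],
    "b": [1.0, 2.0, 3.0, 4.0, 5.0],
    "string_col": "some string",
})

# With mixed dtypes, describe() summarizes only the numeric columns by default.
summary = df.describe()
print(summary.columns.tolist())  # ['a', 'b']

# Non-numeric columns are included only when requested explicitly.
full = df.describe(include="all")
print(full.columns.tolist())
```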

@jreback @jorisvandenbossche: Thoughts?

@jreback
Contributor

jreback commented Dec 2, 2017

Sure, this could be done: .corrwith is a numeric-only operation. Would take a community pull request.
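To illustrate what "numeric-only" means for a Series argument, a minimal sketch assuming pandas' Series.corr semantics (inner index alignment, Pearson by default): drop the non-numeric columns, then correlate each remaining column with the Series. The helper `numeric_corrwith` is hypothetical and is not the change that was merged in #18651; it is only cross-checked against `select_dtypes('number').corrwith(s)`.

```python
import numpy as np
import pandas as pd


def numeric_corrwith(frame: pd.DataFrame, other: pd.Series) -> pd.Series:
    """Hypothetical sketch of a numeric-only corrwith against a Series.

    Non-numeric columns are dropped first; each remaining column is then
    correlated with `other` via Series.corr, which aligns on the index
    (inner join) and uses Pearson correlation by default.
    """
    numeric = frame.select_dtypes("number")
    return pd.Series(
        {col: numeric[col].corr(other) for col in numeric.columns},
        dtype="float64",
    )


rng = np.random.default_rng(42)  # seeded for reproducibility
df = pd.DataFrame({"a": rng.random(5), "b": rng.random(5), "string_col": "some string"})
s = pd.Series(rng.random(100))

manual = numeric_corrwith(df, s)
builtin = df.select_dtypes("number").corrwith(s)
print(manual)
```

Doing the column selection up front means the string column is never handed to Series.corr, so no string arithmetic is attempted.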

jreback added the Difficulty Novice, Dtype Conversions, and Numeric Operations labels and removed the API Design label on Dec 2, 2017
jreback added this to the Next Major Release milestone on Dec 2, 2017
jreback modified the milestones: Next Major Release, 0.21.1, 0.22.0 on Dec 6, 2017

3 participants