BUG/COMPAT: assert_* functions failing with nested arrays and latest numpy #50360

Open
jorisvandenbossche opened this issue Dec 20, 2022 · 3 comments
Labels
Compat (pandas objects compatability with Numpy or Python functions), Testing (pandas testing functions or related to the test suite)

Comments

@jorisvandenbossche
Member

jorisvandenbossche commented Dec 20, 2022

When using the latest nightly numpy (1.25.0.dev0+..), we are getting some errors in the pyarrow test suite that come from pandas functions (assert_series_equal -> array_equivalent). Those errors don't happen with numpy 1.24.

The examples we encountered in the pyarrow tests all seem to boil down to the same situation: there are multiple levels of nesting (an array of arrays of arrays), where one side is a numpy array of nested numpy arrays and the other is a numpy array of nested lists.

import numpy as np
import pandas as pd
import pandas._testing as tm

# one level of nesting -> ok
s1 = pd.Series([np.array([1, 2, 3]), np.array([4, 5])], dtype=object)
s2 = pd.Series([[1, 2, 3], [4, 5]], dtype=object)

>>> tm.assert_series_equal(s1, s2)

# >1 level of nesting -> fails
s1 = pd.Series([np.array([[1, 2, 3], [4, 5]], dtype=object), np.array([[6], [7, 8]], dtype=object)], dtype=object)
s2 = pd.Series([[[1, 2, 3], [4, 5]], [[6], [7, 8]]], dtype=object)

>>> tm.assert_series_equal(s1, s2)
...
File ~/scipy/repos/pandas-build-arrow/pandas/core/dtypes/missing.py:581, in _array_equivalent_object(left, right, strict_nan)
    579 else:
    580     try:
--> 581         if np.any(np.asarray(left_value != right_value)):
    582             return False
    583     except TypeError as err:

ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (2,) + inhomogeneous part.

Another example of essentially the same situation, which gives a different error message (from numpy):

# construct the inner array by explicitly setting its shape (otherwise numpy will infer it as 2D)
arr = np.empty(2, dtype=object)
arr[:] = [np.array([None, 'b'], dtype=object), np.array(['c', 'd'], dtype=object)]
s1 = pd.Series(np.array([arr, None], dtype=object))
s2 = pd.Series(np.array([list([[None, 'b'], ['c', 'd']]), None], dtype=object))

In [32]: tm.assert_series_equal(s1, s2)
...
File ~/scipy/repos/pandas-build-arrow/pandas/core/dtypes/missing.py:581, in _array_equivalent_object(left, right, strict_nan)
    579 else:
    580     try:
--> 581         if np.any(np.asarray(left_value != right_value)):
    582             return False
    583     except TypeError as err:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

Another case with nesting of dictionaries (with the same error traceback):

s1 = pd.Series([{'f1': 1, 'f2': np.array(['a', 'b'], dtype=object)}], dtype=object)
s2 = pd.Series([{'f1': 1, 'f2': ['a', 'b']}], dtype=object)

>>> tm.assert_series_equal(s1, s2)
...
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
@jorisvandenbossche jorisvandenbossche added Compat pandas objects compatability with Numpy or Python functions Testing pandas testing functions or related to the test suite labels Dec 20, 2022
@jorisvandenbossche jorisvandenbossche added this to the 1.5.3 milestone Dec 22, 2022
jorisvandenbossche added a commit to jorisvandenbossche/arrow that referenced this issue Dec 22, 2022
@jorisvandenbossche
Member Author

I suppose this is related to https://numpy.org/devdocs/release/1.25.0-notes.html#and-warnings-finalized (numpy/numpy#22707).

Previously, such comparisons (incorrectly) returned a scalar bool:

>>> np.array([np.array(['a', 'b']), np.array(['c'])], dtype=object) != [['a', 'b'], ['c']]
<stdin>:1: DeprecationWarning: elementwise comparison failed; this will raise an error in the future.
True

and now with that deprecation enforced, this raises an error:

>>> np.array([np.array(['a', 'b']), np.array(['c'])], dtype=object) != [['a', 'b'], ['c']]
...
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (2,) + inhomogeneous part.

>>> np.array([np.array(['a', 'b']), np.array(['c'])], dtype=object) != np.array([np.array(['a', 'b'], dtype=object), np.array(['c'], dtype=object)], dtype=object)
...
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
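
To illustrate the kind of comparison that sidesteps this, here is a minimal sketch of a recursive, element-by-element equality check that never applies `!=` to two ragged containers at once. The helper name `nested_equal` is hypothetical and is not part of pandas or numpy; it is not the approach pandas takes internally, and it ignores NaN-equality semantics that pandas' own testing functions handle.

```python
import numpy as np

# Container types that are compared element by element.
_SEQUENCE_TYPES = (list, tuple, np.ndarray)


def nested_equal(left, right):
    """Hypothetical helper: recursively compare nested lists, arrays and dicts.

    Lists, tuples and numpy arrays are treated as interchangeable sequences.
    NaN values are NOT treated as equal to each other here.
    """
    # Unwrap 0-d arrays so that len() and iteration below are safe.
    if isinstance(left, np.ndarray) and left.ndim == 0:
        left = left.item()
    if isinstance(right, np.ndarray) and right.ndim == 0:
        right = right.item()

    # Dictionaries: same keys, and equal values for every key.
    if isinstance(left, dict) or isinstance(right, dict):
        if not (isinstance(left, dict) and isinstance(right, dict)):
            return False
        if left.keys() != right.keys():
            return False
        return all(nested_equal(left[key], right[key]) for key in left)

    # Sequences: same length, and equal items position by position.
    if isinstance(left, _SEQUENCE_TYPES) or isinstance(right, _SEQUENCE_TYPES):
        if not (
            isinstance(left, _SEQUENCE_TYPES)
            and isinstance(right, _SEQUENCE_TYPES)
        ):
            return False
        if len(left) != len(right):
            return False
        return all(nested_equal(a, b) for a, b in zip(left, right))

    # Scalars: plain equality, coerced to a Python bool.
    return bool(left == right)


left = np.array([np.array(['a', 'b']), np.array(['c'])], dtype=object)
right = [['a', 'b'], ['c']]
print(nested_equal(left, right))  # True
```

Because the recursion only ever compares scalars with `==`, numpy never has to build an array from a ragged sequence, which is the step that raises in the examples above.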

jorisvandenbossche added a commit to jorisvandenbossche/pandas that referenced this issue Jan 3, 2023
@datapythonista datapythonista modified the milestones: 1.5.3, 1.5.4 Jan 18, 2023
@datapythonista datapythonista modified the milestones: 1.5.4, 2.0 Feb 27, 2023
@JonasKlotz

It also fails for unit tests:

import pandas as pd
import unittest


class list_test(unittest.TestCase):
    def test_list_test(self):
        list_of_lists = [[1,2,3], [4,5], [6]]
        list_series = pd.Series(list_of_lists)
        self.assertEqual(list_series.array, list_of_lists)

This results in True with:
python 3.9.12, pandas 1.4.2, numpy 1.22.3

And fails with a ValueError with:
python 3.9.5, pandas 1.5.2, numpy 1.24.1
python 3.9.5, pandas 1.5.3, numpy 1.24.1

The error:
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (3,) + inhomogeneous part.
The error happens at rvalues = np.asarray(rvalues) in /pandas/core/ops/array_ops.py. A possible fix would be to check whether rvalues contains lists and, if so, use rvalues = np.asarray(rvalues, dtype='object') instead.
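
A rough sketch of that idea, written as a standalone function rather than as a patch to pandas: try the plain conversion first and fall back to `dtype=object` only when numpy rejects the ragged input. The function name `asarray_with_object_fallback` is hypothetical, and this is only an illustration of the suggestion above, not the change that pandas made or would make.

```python
import numpy as np


def asarray_with_object_fallback(values):
    """Hypothetical helper: convert to an ndarray, tolerating ragged input."""
    try:
        # Regular (rectangular) input converts as usual.
        return np.asarray(values)
    except ValueError:
        # Recent numpy raises for inhomogeneous shapes; an object array
        # keeps each inner list as a single element instead.
        return np.asarray(values, dtype=object)


ragged = asarray_with_object_fallback([[1, 2, 3], [4, 5], [6]])
print(ragged.dtype, ragged.shape)  # object (3,)
```

Rectangular input such as `[[1, 2], [3, 4]]` still takes the first branch and keeps its numeric dtype, so the fallback only changes behavior for input that would otherwise raise.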

@MarcoGorelli
Member

removing this from the 2.0 milestone

@MarcoGorelli MarcoGorelli removed this from the 2.0 milestone Mar 27, 2023