Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: is ser[:2] with Int64Index positional or label-based #49612

Closed
jbrockmendel opened this issue Nov 10, 2022 · 9 comments · Fixed by #50283
Closed

BUG: is ser[:2] with Int64Index positional or label-based #49612

jbrockmendel opened this issue Nov 10, 2022 · 9 comments · Fixed by #50283
Labels
Bug Needs Triage Issue that has not been reviewed by a pandas team member

Comments

@jbrockmendel
Copy link
Member

jbrockmendel commented Nov 10, 2022

ser = pd.Series([3, 4, 5])

>>> ser[:2]  # <- should this behave like loc or iloc?
>>> ser.loc[:2]
>>> ser.iloc[:2]

Focusing only on Integer-dtype indexes, i.e. Int64Index, UInt64Index, RangeIndex, and NumericIndex[inty].

ser[2] and ser[[2]] each treat the indexing as label-based instead of positional. But ser[:2] treats the slice as positional instead of label-based.

We've deprecated some of these cases in #45162, but as I go to enforce the deprecation I think I screwed up and issued the warning for too few cases. In particular, to keep the number of warnings down I excluded some cases

                elif start is None and stop is None:
                    # label-based vs positional is irrelevant
                    pass
                elif isinstance(self, ABCRangeIndex) and self._range == range(
                    len(self)
                ):
                    # In this case there is no difference between label-based
                    #  and positional, so nothing will change.
                    pass
                elif (
                    self.dtype.kind in ["i", "u"]
                    and self._is_strictly_monotonic_increasing
                    and len(self) > 0
                    and self[0] == 0
                    and self[-1] == len(self) - 1
                ):
                    # We are range-like, e.g. created with Index(np.arange(N))
                    pass

The elif start is None and stop is None case is correct, but the other two are wrong bc with .loc slicing is right-inclusive.

If I enforce the deprecation for these extra two cases, I get 555 test failures, so this isn't trivial.

@jbrockmendel jbrockmendel added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Nov 10, 2022
@jbrockmendel
Copy link
Member Author

@jreback

@jreback
Copy link
Contributor

jreback commented Nov 16, 2022

@jbrockmendel your logic seems correct here. agree that s[:2] should be treated as location and warn that it's not positional

@jbrockmendel
Copy link
Member Author

@mroeschke thoughts on what to do here?

@mroeschke
Copy link
Member

I know you wanted to focus on int indexes, but It would be nice is this was positional to align with Series(..., index=non_numeric)[:int] according to this example: https://pandas.pydata.org/pandas-docs/stable/user_guide/dsintro.html#series-is-ndarray-like

Unless you had come across this and wanted to change this too

@jbrockmendel
Copy link
Member Author

IIUC what you're suggesting would make/keep int-slicing-with-intindex inconsistent with other int-indexing-with-intindex, but possibly in a way that has fewer corner cases than the status quo?

(Long-term (i.e. 3.0) im increasingly thinking we should just deprecate Series.__getitem__ and Series.__setitem__ entirely and tell users to just use loc/iloc, so there's never any ambiguity.)

@mroeschke
Copy link
Member

mroeschke commented Nov 21, 2022

IIUC what you're suggesting would make/keep int-slicing-with-intindex inconsistent with other int-indexing-with-intindex, but possibly in a way that has fewer corner cases than the status quo?

True, int-slicing and int-indexing would diverge in this case for this index type, but int-slicing would be consistent regardless of index type if int-slicing was always positional. I guess there are multiple ways to cut the consistency pie (by indexer, Index type, or both), and I haven't been keeping up well in which direction we should be/are headed so I don't have super strong opinions on what feels more right.

(Long-term (i.e. 3.0) im increasingly thinking we should just deprecate Series.getitem and Series.setitem entirely and tell users to just use loc/iloc, so there's never any ambiguity.)

+10.

@jbrockmendel
Copy link
Member Author

So I guess for 2.0 we just revert the deprecation?

@mroeschke
Copy link
Member

Probably good to get more thoughts from others who have looked closer at indexing behavior @phofl @jorisvandenbossche

@jbrockmendel
Copy link
Member Author

Actually just reverting wouldn't quite get to the consistent-ish behavior you suggested. we'd also need to change the behavior with Float64Index, which ATM always treats slicing as label-based. Doing that breaks 5 tests locally, though only one directly testing this behavior.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Needs Triage Issue that has not been reviewed by a pandas team member
Projects
None yet
3 participants