BUG: "index_col=False" not working when "usecols" is specified in read_csv #9082

Closed
jseabold opened this issue Dec 15, 2014 · 4 comments
Labels: Bug, IO CSV (read_csv, to_csv), Regression (functionality that used to work in a prior pandas version)
Milestone: 0.16.0

@jseabold
Contributor

jseabold commented Dec 15, 2014

Wes' old blog post says that you can read the malformed FEC data by passing index_col=False, and the read_csv docstring seems to say the same. This no longer appears to work when usecols is also specified.

```
In [9]: cat test.csv
cmte_id,cand_id,cand_nm,contbr_nm,contbr_city,contbr_st,contbr_zip,contbr_employer,contbr_occupation,contb_receipt_amt,contb_receipt_dt,receipt_desc,memo_cd,memo_text,form_tp,file_num,tran_id,election_tp
C00410118,"P20002978","Bachmann, Michele","HARVEY, WILLIAM","MOBILE","AL","366010290","RETIRED","RETIRED",250,20-JUN-11,"","","","SA17A","736166","A1FDABC23D2D545A1B83","P2012",
C00410118,"P20002978","Bachmann, Michele","HARVEY, WILLIAM","MOBILE","AL","366010290","RETIRED","RETIRED",50,23-JUN-11,"","","","SA17A","736166","A899B9B0E223743EFA63","P2012",
```

```python
import pandas as pd

cols = ['cand_nm', 'contbr_st', 'contbr_employer', 'contb_receipt_amt',
        'contbr_occupation', 'contb_receipt_dt']

# raises an error
pd.read_csv("test.csv", usecols=cols, index_col=False)

# gets "incorrect" (misaligned) columns
pd.read_csv("test.csv", usecols=cols)
```

@jorisvandenbossche
Member

A smaller example that makes it easier to follow what happens:

```python
from io import StringIO

import pandas as pd

s = """a,b,c,d
1,2,3,4
5,6,7,8"""

# same data, but with an extra delimiter at the end of each data row
s_malformed = """a,b,c,d
1,2,3,4,
5,6,7,8,"""
```
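The only difference between the two strings is the trailing comma on each data row. Counting fields with Python's standard csv module makes the mismatch visible: the header has four fields, while every data row has five, the last one empty. This is a minimal diagnostic sketch; field_counts is a hypothetical helper, not part of pandas.

```python
import csv
from io import StringIO

s_malformed = """a,b,c,d
1,2,3,4,
5,6,7,8,"""


def field_counts(text):
    """Return the header width and the width of each data row.

    Hypothetical helper for illustration; it only uses the stdlib csv module.
    """
    rows = list(csv.reader(StringIO(text)))
    header, data = rows[0], rows[1:]
    return len(header), [len(row) for row in data]


header_width, row_widths = field_counts(s_malformed)
print(header_width, row_widths)  # 4 [5, 5]
```

A data row that is exactly one field wider than the header is the situation the parser interprets as "the first column holds the row labels".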
```
In [19]: cols = ['a','c','d']

In [20]: pd.read_csv(StringIO(s), usecols=cols)
Out[20]:
   a  c  d
0  1  3  4
1  5  7  8

In [21]: pd.read_csv(StringIO(s_malformed), usecols=cols)
Out[21]:
   a  c   d
1  2  4 NaN
5  6  8 NaN

In [22]: pd.read_csv(StringIO(s_malformed), usecols=cols, index_col=False)
...
TypeError: 'bool' object has no attribute '__getitem__'
```

To quote Wes in the blog post:

pandas’s file parsers by default will treat the first column as the DataFrame’s row names if the data have 1 too many columns, which is very useful in a lot of cases. Not so much here. So I made it so you can indicate index_col=False which results on the last column being dropped as desired.

and the read_csv docstring on index_col:

If you have a malformed file with delimiters at the end of each line, you might consider index_col=False to force pandas to not use the first column as the index (row names)
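The rule described in both quotes can be sketched in plain Python. The helper split_row below is hypothetical and only illustrates the two behaviors; it is not how pandas' parser is implemented. When a row has exactly one surplus field, the default (index_col=None) promotes the leading field to a row label, so every value lands one column to the left and the empty trailing field ends up under d (shown as NaN in the outputs in this thread). With index_col=False, the fields stay aligned with the header and the empty trailing field is dropped.

```python
def split_row(header, row, index_col=None):
    """Map one parsed CSV row onto the header names.

    Hypothetical helper that mimics the behavior described in the quotes;
    it is not pandas' actual parser code.
    Returns (row_label, {column_name: value}).
    """
    has_extra_field = len(row) == len(header) + 1
    if has_extra_field and index_col is None:
        # Default: the surplus leading field is promoted to a row label,
        # so every remaining value lands one column to the left.
        return row[0], dict(zip(header, row[1:]))
    # index_col=False (or a well-formed row): keep fields aligned with the
    # header; zip() silently drops the surplus empty trailing field, if any.
    return None, dict(zip(header, row))


header = ["a", "b", "c", "d"]
row = "1,2,3,4,".split(",")  # ['1', '2', '3', '4', '']

print(split_row(header, row))
# ('1', {'a': '2', 'b': '3', 'c': '4', 'd': ''})
print(split_row(header, row, index_col=False))
# (None, {'a': '1', 'b': '2', 'c': '3', 'd': '4'})
```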

Probably this combination (usecols together with index_col=False) wasn't tested.
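A regression test for the combination could look roughly like the following. It is a sketch that assumes pytest-style test discovery and a pandas release containing the fix (this thread was closed by #9176 under the 0.16.0 milestone); the test name is hypothetical. It checks that index_col=False plus usecols yields the same frame for the well-formed and the malformed input.

```python
from io import StringIO

import pandas as pd

s = """a,b,c,d
1,2,3,4
5,6,7,8"""

# Same data, but with an extra delimiter at the end of each data row.
s_malformed = """a,b,c,d
1,2,3,4,
5,6,7,8,"""


def test_usecols_with_index_col_false():
    # Hypothetical regression test: with index_col=False, both inputs should
    # produce the same frame when only a subset of columns is requested.
    expected = pd.DataFrame({"a": [1, 5], "c": [3, 7], "d": [4, 8]})
    for data in (s, s_malformed):
        result = pd.read_csv(StringIO(data), usecols=["a", "c", "d"], index_col=False)
        pd.testing.assert_frame_equal(result, expected)
```

Parametrizing over both strings matters: the well-formed case alone would not have caught this regression.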

Without using usecols, it does still work as described:

```
In [22]: pd.read_csv(StringIO(s))
Out[22]:
   a  b  c  d
0  1  2  3  4
1  5  6  7  8

In [23]: pd.read_csv(StringIO(s_malformed))
Out[23]:
   a  b  c   d
1  2  3  4 NaN
5  6  7  8 NaN

In [24]: pd.read_csv(StringIO(s_malformed), index_col=False)
Out[24]:
   a  b  c  d
0  1  2  3  4
1  5  6  7  8
```
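Because index_col=False still behaves correctly on its own, one workaround on affected versions is to leave out usecols, parse every column, and select the subset afterwards. A minimal sketch using the small example's data:

```python
from io import StringIO

import pandas as pd

s_malformed = """a,b,c,d
1,2,3,4,
5,6,7,8,"""

cols = ["a", "c", "d"]

# Parse all columns with index_col=False (which handles the trailing
# delimiter), then keep only the wanted subset.
df = pd.read_csv(StringIO(s_malformed), index_col=False)[cols]
print(df)
```

The trade-off is that every column is parsed and held in memory before the subset is taken, which matters more for a large file such as the full FEC dataset than for this toy input.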

jorisvandenbossche added the Bug, IO CSV and Regression labels Dec 15, 2014
jorisvandenbossche added this to the 0.16.0 milestone Dec 15, 2014
jorisvandenbossche changed the title from 'usecols + index_col not working as documented in read_csv' to 'BUG: "index_col=False" not working when "usecols" is specified in read_csv' Dec 17, 2014
@awhan

awhan commented Dec 17, 2014

#9098 (similar issue)

@jorisvandenbossche
Member

cc @mdmueller @selasley if you are interested

@jreback
Contributor

jreback commented Jan 10, 2015

closed by #9176

jreback closed this as completed Jan 10, 2015
4 participants