Error when writing non-ascii allowed characters to Stata dta #7286

bquistorff · 2014-05-30T20:12:56Z

When trying to write a Stata dataset with strings containing upper latin-1 characters (which are allowed by the Stata format), I get an encoding error.

import pandas.io.stata as sta
sr = sta.StataReader('pandas/pandas/io/tests/data/stata1_encoding.dta')
df = sr.data()
sw = sta.StataWriter('stata1_encoding_dup.dta', df)
sw.write_file()

I get the following output:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\site-packages\pandas\io\stata.py", line 1242, in write_file
    self._write_data_nodates()
  File "C:\Python27\lib\site-packages\pandas\io\stata.py", line 1326, in _write_data_nodates
    self._write(var)
  File "C:\Python27\lib\site-packages\pandas\io\stata.py", line 1104, in _write
    self._file.write(to_write)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 1: ordinal not in range(128)
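
For context, the traceback is consistent with a text string reaching a byte-oriented write without an explicit encoding, so Python 2 falls back to the ascii codec. The sketch below reproduces only the codec behavior, not the pandas code path. It is written for Python 3 (the report above is from Python 2.7), and the variable names are illustrative.

```python
# The character from the traceback: u'\xfc' is U+00FC (u with diaeresis).
text = "\xfc"

# Latin-1 maps U+0000..U+00FF to single bytes, so this succeeds.
latin1_bytes = text.encode("latin-1")

# ASCII covers only U+0000..U+007F, so the same character is rejected.
ascii_failed = False
ascii_error = None
try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    ascii_failed = True
    ascii_error = str(exc)

print(latin1_bytes)
print(ascii_failed)
```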

Machine info:

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.6.final.0
python-bits: 32
OS: Windows
OS-release: 7
machine: AMD64
processor: AMD64 Family 16 Model 6 Stepping 3, AuthenticAMD
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.14.0
nose: 1.3.3
Cython: 0.20.1
numpy: 1.8.1
scipy: None
statsmodels: None
IPython: None
sphinx: None
patsy: None
scikits.timeseries: None
dateutil: 2.2
pytz: 2014.3
bottleneck: 0.8.0
tables: None
numexpr: None
matplotlib: None
openpyxl: 1.8.6
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
bq: None
apiclient: None
rpy2: None
sqlalchemy: None
pymysql: None
psycopg2: None


jreback · 2014-05-30T22:13:09Z

cc @bashtage

Hmm, it seems the data is not being encoded to the passed encoding, which is odd (and is not tested).

Reading, however, works.

FYI, you can/should use:

df = pd.read_stata(.......,encoding='latin-1')
df.to_stata(......,encoding='latin-1')
(though this last call seems to hit a bug)

bashtage · 2014-06-13T22:31:14Z

A simple fix - the encoding was never used when writing a file.
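
As a rough illustration of what "using the encoding when writing" can look like, here is a minimal sketch. It is not the code from the actual fix: `write_str_field` is a hypothetical helper, and the fixed-width, null-padded layout is an assumption made for the example.

```python
import io


def write_str_field(buf, value, width, encoding="latin-1"):
    """Hypothetical helper: encode explicitly, then null-pad to a fixed width."""
    raw = value.encode(encoding)  # explicit encoding, no implicit ascii fallback
    if len(raw) > width:
        raise ValueError("encoded value exceeds field width")
    buf.write(raw.ljust(width, b"\x00"))


buf = io.BytesIO()
write_str_field(buf, "M\xfcller", 8)  # 6 characters, padded to 8 bytes
data = buf.getvalue()
print(data)
```

The point of the sketch is only that the text is converted to bytes with the caller's encoding before it reaches the binary file object.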

jreback · 2014-06-13T22:38:42Z

I assume that Stata itself can read the encoded file?

IIRC they always encode in cp1252 or something like that.

bashtage · 2014-06-13T22:41:40Z

The original file opens fine in Stata, so latin-1 seems to be OK. Unicode isn't supported, though.
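
To make the latin-1 versus cp1252 question concrete, the following sketch checks which sample strings each codec can represent. The helper name `encodable` and the sample strings are made up for illustration, and the sketch says nothing about which codec Stata itself uses.

```python
def encodable(text, encoding):
    """Return True if text can be represented in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# U+00FC is in both codecs; the euro sign (U+20AC) is only in cp1252;
# a CJK character (U+4E2D) is in neither.
samples = ["M\xfcller", "\u20ac5", "\u4e2d"]
results = {s: (encodable(s, "latin-1"), encodable(s, "cp1252")) for s in samples}
print(results)
```

For characters such as U+00FC the two codecs produce the same byte, so the distinction matters mainly for the 0x80 to 0x9F range, where cp1252 defines printable characters.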


jreback added Bug labels May 30, 2014

jreback added this to the 0.14.1 milestone May 30, 2014

jreback added Stata and removed Data IO labels Jun 5, 2014

jreback closed this as completed in 904933a Jun 16, 2014
