UnicodeEncodeError: 'ascii' codec can't encode character u'\xbb' in position 20: ordinal not in range(128) #18

Open
marcoippolito opened this issue Feb 3, 2014 · 1 comment

Comments

@marcoippolito

Hi,
when running this code on my Ubuntu 12.04 micro-instance:

#!/usr/bin/python

import boilerpipe

from boilerpipe.extract import Extractor
extractor = Extractor(extractor='ArticleExtractor', url="http://europe.wsj.com/home-page")
extracted_text = extractor.getText()
print extracted_text
extracted_html = extractor.getHTML()

I get this error:
python boilerpipeTrial.py
Traceback (most recent call last):
  File "boilerpipeTrial.py", line 9, in
    print extracted_text
UnicodeEncodeError: 'ascii' codec can't encode character u'\xbb' in position 20: ordinal not in range(128)

where line 9 is: print extracted_text

Could you please give me some hints on how to solve it?

Kind regards.
Marco
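The failure does not come from boilerpipe itself but from `print`. In Python 2, printing a `unicode` object encodes it with the stream's encoding. When that encoding is ASCII (which can happen under a `C`/`POSIX` locale) or unset (when output is piped, Python 2 falls back to the ASCII default), any character outside `range(128)`, such as `u'\xbb'` (»), raises `UnicodeEncodeError`. A minimal sketch of that implicit encode step, assuming an ASCII target; `try_encode` and the sample string are hypothetical names for illustration, not part of boilerpipe:

```python
# Minimal sketch of the implicit encode step behind the traceback.
# Assumption: on Python 2, `print some_unicode` encodes the text with the
# stream's encoding, which is ASCII in the failing environment.

def try_encode(text, encoding):
    """Encode `text`; return (bytes, None) on success or (None, error) on failure."""
    try:
        return text.encode(encoding), None
    except UnicodeEncodeError as exc:
        return None, exc

# Hypothetical sample with U+00BB (right-pointing double angle quote) at index 5.
sample = u'Home \xbb World'

data, err = try_encode(sample, 'ascii')
if err is not None:
    # Same failure as print on an ASCII stdout: the codec stops at the first
    # character outside range(128) and reports its index in the string.
    print('%s codec failed at index %d' % (err.encoding, err.start))

# Encoding with UTF-8 succeeds, because UTF-8 covers this character.
utf8_data, utf8_err = try_encode(sample, 'utf-8')
```

The "position 20" in the real traceback is the same kind of index: the offset of the first non-ASCII character in `extracted_text`.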

@marcoippolito
Author

I solved this issue by adding:
extracted_text_u = extracted_text.encode('utf-8','replace')
print extracted_text_u

Are there any drawbacks to adding this?
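To weigh the workaround, it helps to see what `.encode('utf-8', 'replace')` returns. A minimal sketch with a made-up sample string (not actual boilerpipe output): the result is a UTF-8 byte string, so `print` no longer has to encode anything, and the `'replace'` handler practically never fires because UTF-8 can represent the text. The main caveat is the reader of the output: the bytes only display correctly where the terminal, file, or pipe consumer expects UTF-8.

```python
# Sketch of what extracted_text.encode('utf-8', 'replace') does, using a
# hypothetical sample string instead of real boilerpipe output.
sample = u'Home \xbb World'

# UTF-8 can represent every (non-surrogate) code point, so the 'replace'
# error handler is not triggered here: the result equals a plain encode.
utf8_bytes = sample.encode('utf-8', 'replace')

# With an ASCII target, 'replace' substitutes '?' and the character is lost.
ascii_bytes = sample.encode('ascii', 'replace')

# Drawback of printing raw UTF-8 bytes: a consumer that decodes them with a
# different encoding (here Latin-1) shows two wrong characters (mojibake).
misread = utf8_bytes.decode('latin-1')

print(repr(utf8_bytes))
print(repr(ascii_bytes))
print(repr(misread))
```

An alternative that avoids encoding each string by hand is to set the `PYTHONIOENCODING` environment variable (honored by Python 2.6 and later), e.g. `PYTHONIOENCODING=utf-8 python boilerpipeTrial.py`, so that the original `print extracted_text` encodes with UTF-8 itself.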
