Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Crawler for Valor Econômico and ZH #90

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

veniciusgrjr
Copy link

I fixed the problem related to line breaking, to the new style string formatting, to print statements and to unnecessary imports, as suggested for @flavioamieiro.
I also created a crawler for ZH.
Please, take a look at these codes.

… formatting, to print statements and to unnecessary imports, as suggested for @flavioamieiro. Please take a look at this code.
ARTICLES.insert(article, w=1)
logger.info("Saved")
else:
logger.info("It already exists")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@veniciusgrjr it is good style to end files with a new linecharacter. can you please fix this?
if you are curious why, read this: http://unix.stackexchange.com/questions/18743/whats-the-point-in-adding-a-new-line-to-the-end-of-a-file

@fccoelho
Copy link
Member

We need tests for these crawlers. @veniciusgrjr, are you familiar with unit testing in Python?

@fccoelho fccoelho closed this Sep 22, 2015
@fccoelho fccoelho reopened this Sep 22, 2015
soup = BeautifulSoup(index, "lxml")
news_index = soup.find(id="block-valor_capa_automatica-central_automatico").find_all('h2')
news_urls = news_urls + ['http://www.valor.com.br' + BeautifulSoup( art.encode('utf8') , "lxml" ).find('a').attrs['href'] for art in news_index]
return set(news_urls )
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for the sake of style I'd remove this space before the close parenthesis. (Also on the line, in the BeautifulSoup instantiation and in some other places bellow).

@flavioamieiro
Copy link
Member

I think the original issues are covered (except for adding the newline at the end of files, please do that).

I agree with @fccoelho that it would be great to have unit tests for these. If we can do that, we should.

@veniciusgrjr
Copy link
Author

I fixed de problems related to new linecharacter at the end of the files and to some unnecessary spaces.
I'm working on the unity tests.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants