Crawler for Valor Econômico and ZH #90
        ARTICLES.insert(article, w=1)
        logger.info("Saved")
    else:
        logger.info("It already exists")
@veniciusgrjr, it is good style to end files with a newline character. Can you please fix this?
If you are curious why, read this: http://unix.stackexchange.com/questions/18743/whats-the-point-in-adding-a-new-line-to-the-end-of-a-file
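One way to check this before pushing is a small helper like the one below. It is only a sketch: the function name `ends_with_newline` is made up for illustration and is not part of this repository.

```python
from pathlib import Path


def ends_with_newline(path):
    """Return True if the file is empty or its last byte is a newline.

    Hypothetical helper for illustration; not part of this project.
    """
    data = Path(path).read_bytes()
    return not data or data.endswith(b"\n")
```

Calling it on each changed `.py` file and listing the ones that return `False` shows which files still need the trailing newline.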
We need tests for these crawlers. @veniciusgrjr, are you familiar with unit testing in Python?
soup = BeautifulSoup(index, "lxml")
news_index = soup.find(id="block-valor_capa_automatica-central_automatico").find_all('h2')
news_urls = news_urls + ['http://www.valor.com.br' + BeautifulSoup( art.encode('utf8') , "lxml" ).find('a').attrs['href'] for art in news_index]
return set(news_urls )
Just for the sake of style, I'd remove this space before the closing parenthesis. (Also on the line above, in the BeautifulSoup instantiation, and in some other places below.)
I think the original issues are covered (except for adding the newline at the end of files, please do that). I agree with @fccoelho that it would be great to have unit tests for these. If we can do that, we should.
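A minimal sketch of what such a unit test could look like, assuming the URL extraction is split out of the download step so it can be fed fixed HTML. The names `extract_article_urls`, `_HeadlineLinkParser` and `SAMPLE_INDEX` are hypothetical, and the parsing here uses the standard library's `html.parser` as a stand-in for the BeautifulSoup code in this PR, only so that the example runs without extra dependencies.

```python
import unittest
from html.parser import HTMLParser

BASE_URL = "http://www.valor.com.br"
BLOCK_ID = "block-valor_capa_automatica-central_automatico"


class _HeadlineLinkParser(HTMLParser):
    """Collect hrefs of <a> tags inside <h2> tags of the front-page block."""

    def __init__(self):
        super().__init__()
        self.block_depth = 0  # > 0 while inside the <div> with BLOCK_ID
        self.in_h2 = False
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            if self.block_depth:
                self.block_depth += 1  # nested <div> inside the block
            elif attrs.get("id") == BLOCK_ID:
                self.block_depth = 1
        elif tag == "h2" and self.block_depth:
            self.in_h2 = True
        elif tag == "a" and self.in_h2 and "href" in attrs:
            self.hrefs.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "div" and self.block_depth:
            self.block_depth -= 1
        elif tag == "h2":
            self.in_h2 = False


def extract_article_urls(index_html):
    """Return the set of absolute article URLs found in the index HTML."""
    parser = _HeadlineLinkParser()
    parser.feed(index_html)
    return {BASE_URL + href for href in parser.hrefs}


# Made-up fixture: one duplicate headline and two links that must be ignored.
SAMPLE_INDEX = """
<div id="other"><h2><a href="/ignored">x</a></h2></div>
<div id="block-valor_capa_automatica-central_automatico">
  <div><h2><a href="/brasil/1">One</a></h2></div>
  <h2><a href="/brasil/1">One again</a></h2>
  <h2><a href="/mundo/2">Two</a></h2>
  <p><a href="/not-a-headline">skip</a></p>
</div>
"""


class ExtractArticleUrlsTest(unittest.TestCase):
    def test_returns_unique_absolute_urls(self):
        expected = {BASE_URL + "/brasil/1", BASE_URL + "/mundo/2"}
        self.assertEqual(extract_article_urls(SAMPLE_INDEX), expected)
```

Saved as a test module, this can be run with `python -m unittest`. Keeping the network request in a separate function means the parsing can be tested against a saved copy of the index page without hitting the site.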
I fixed the problems related to the newline character at the end of the files and to some unnecessary spaces.
I fixed the problems related to line breaking, new-style string formatting, print statements, and unnecessary imports, as suggested by @flavioamieiro.
I also created a crawler for ZH.
Please take a look at this code.