Add DailyNewsTZ and HabariLeo #669

Open: 5 commits to merge into base: master

51 changes: 51 additions & 0 deletions docs/supported_publishers.md
@@ -1398,6 +1398,57 @@
</table>


## TZ-Publishers

<table class="publishers tz">
<thead>
<tr>
<th>Class&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;</th>
<th>Name&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;</th>
<th>URL&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;</th>
<th>Missing&#160;Attributes</th>
<th>Additional&#160;Attributes&#160;&#160;&#160;&#160;</th>
</tr>
</thead>
<tbody>
<tr>
<td>
<code>DailyNewsTZ</code>
</td>
<td>
<div>Daily News (Tanzania)</div>
</td>
<td>
<a href="https://www.dailynews.co.tz/">
<span>www.dailynews.co.tz</span>
</a>
</td>
<td>
<code>topics</code>
</td>
<td>&#160;</td>
</tr>
<tr>
<td>
<code>HabariLeo</code>
</td>
<td>
<div>Habari Leo</div>
</td>
<td>
<a href="https://www.habarileo.co.tz/">
<span>www.habarileo.co.tz</span>
</a>
</td>
<td>
<code>topics</code>
</td>
<td>&#160;</td>
</tr>
</tbody>
</table>


## UK-Publishers

<table class="publishers uk">
2 changes: 2 additions & 0 deletions src/fundus/publishers/__init__.py
@@ -17,6 +17,7 @@
from fundus.publishers.na import NA
from fundus.publishers.no import NO
from fundus.publishers.tr import TR
from fundus.publishers.tz import TZ
from fundus.publishers.uk import UK
from fundus.publishers.us import US

@@ -69,3 +70,4 @@ class PublisherCollection(metaclass=PublisherCollectionMeta):
    ca = CA
    es = ES
    jp = JP
    tz = TZ
32 changes: 32 additions & 0 deletions src/fundus/publishers/tz/__init__.py
@@ -0,0 +1,32 @@
from fundus.publishers.base_objects import Publisher, PublisherGroup
from fundus.scraping.filter import inverse, regex_filter
from fundus.scraping.url import NewsMap, RSSFeed, Sitemap

from .daily_news_tz import DailyNewsTZParser


class TZ(metaclass=PublisherGroup):
    DailyNewsTZ = Publisher(
        name="Daily News (Tanzania)",
        domain="https://www.dailynews.co.tz/",
        parser=DailyNewsTZParser,
        sources=[
            Sitemap(
                "https://www.dailynews.co.tz/sitemap_index.xml",
                sitemap_filter=inverse(regex_filter("post-sitemap")),
                reverse=True,
            ),
        ],
    )
    HabariLeo = Publisher(
        name="Habari Leo",
        domain="https://www.habarileo.co.tz/",
        parser=DailyNewsTZParser,
        sources=[
            Sitemap(
                "https://www.habarileo.co.tz/sitemap_index.xml",
                sitemap_filter=inverse(regex_filter("post-sitemap")),
                reverse=True,
            ),
        ],
    )
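
Both sources pass `sitemap_filter=inverse(regex_filter("post-sitemap"))`. As a rough sketch of what that expression does, assuming Fundus filters return `True` for URLs that should be skipped, the stand-ins below re-implement the two helpers with the standard library only. The function bodies, the `keep_sitemaps` helper, and the sitemap URLs are illustrative assumptions, not the library's actual code or the sites' real sitemap listings.

```python
import re
from typing import Callable, List

URLFilter = Callable[[str], bool]


def regex_filter(pattern: str) -> URLFilter:
    """Stand-in (assumed semantics): returns True, i.e. "skip", when the URL matches."""
    compiled = re.compile(pattern)

    def url_filter(url: str) -> bool:
        return bool(compiled.search(url))

    return url_filter


def inverse(filter_func: URLFilter) -> URLFilter:
    """Stand-in (assumed semantics): negates the decision of another filter."""

    def inverted(url: str) -> bool:
        return not filter_func(url)

    return inverted


def keep_sitemaps(urls: List[str], sitemap_filter: URLFilter) -> List[str]:
    """Hypothetical helper: keep only the sitemap URLs the filter does not skip."""
    return [url for url in urls if not sitemap_filter(url)]


# Hypothetical entries of a sitemap index, for illustration only.
candidate_sitemaps = [
    "https://www.dailynews.co.tz/post-sitemap.xml",
    "https://www.dailynews.co.tz/post-sitemap2.xml",
    "https://www.dailynews.co.tz/page-sitemap.xml",
    "https://www.dailynews.co.tz/category-sitemap.xml",
]

kept = keep_sitemaps(candidate_sitemaps, inverse(regex_filter("post-sitemap")))
print(kept)
```

Under these assumptions only the `post-sitemap` files survive, so page, category, and other non-article sitemaps are never fetched.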
37 changes: 37 additions & 0 deletions src/fundus/publishers/tz/daily_news_tz.py
@@ -0,0 +1,37 @@
import re
from datetime import datetime
from typing import List, Optional

from lxml.cssselect import CSSSelector
from lxml.etree import XPath

from fundus.parser import ArticleBody, BaseParser, ParserProxy, attribute, utility


class DailyNewsTZParser(ParserProxy):
    class V1(BaseParser):
        _summary_selector = CSSSelector("div.cs-entry__subtitle")
        # Subheadlines: <strong> inside a <span> of <p> elements that have no direct text
        # node and are not the first <p> of their parent
        _subheadline_selector = XPath("//div[@class='entry-content']//p[not(text() or position()=1)]//span//strong")
        # Paragraphs: <p> elements with a direct text node, plus the first <p> of each parent
        _paragraph_selector = XPath("//div[@class='entry-content']//p[text() or position()=1]")

        @attribute
        def title(self) -> Optional[str]:
            # Strip the " - Daily News" / " - Habari Leo" site suffix from og:title
            return re.sub(r"(?i)\s*-\s*(daily\s*news|habari\s*leo)\s*", "", self.precomputed.meta.get("og:title") or "")

        @attribute
        def body(self) -> Optional[ArticleBody]:
            article_body = utility.extract_article_body_with_selector(
                self.precomputed.doc,
                summary_selector=self._summary_selector,
                subheadline_selector=self._subheadline_selector,
                paragraph_selector=self._paragraph_selector,
            )
            return article_body

        @attribute
        def authors(self) -> List[str]:
            return utility.generic_author_parsing(self.precomputed.meta.get("twitter:data1"))

        @attribute
        def publishing_date(self) -> Optional[datetime]:
            return utility.generic_date_parsing(self.precomputed.ld.bf_search("datePublished"))
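
The `title` attribute relies on a single regular expression to serve both publishers. The following is a minimal standalone sketch of that substitution, using the same pattern but hypothetical `og:title` strings (the real pages may format the suffix differently). `SITE_SUFFIX` and `strip_site_suffix` are names introduced here for illustration only.

```python
import re
from typing import Optional

# Same pattern as in DailyNewsTZParser.V1.title.
SITE_SUFFIX = re.compile(r"(?i)\s*-\s*(daily\s*news|habari\s*leo)\s*")


def strip_site_suffix(og_title: Optional[str]) -> str:
    """Remove a ' - Daily News' / ' - Habari Leo' style suffix; None becomes ''."""
    return SITE_SUFFIX.sub("", og_title or "")


# Hypothetical og:title values, for illustration only.
samples = [
    "Solar-powered streetlights boost Lindi women’s businesses - Daily News",
    "Watu 18 wanaswa kwa makosa mbalimbali Shinyanga - HabariLeo",
    None,
]

for sample in samples:
    print(repr(strip_site_suffix(sample)))
```

Two properties are worth noting when reviewing. A missing `og:title` yields an empty string rather than `None`, even though the attribute is annotated `Optional[str]`. And because the pattern is not anchored to the end of the string, a title containing " - Daily News" in the middle would also be altered; whether that matters depends on the titles the sites actually emit.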
79 changes: 79 additions & 0 deletions tests/resources/parser/test_data/tz/DailyNewsTZ.json
@@ -0,0 +1,79 @@
{
"V1": {
"authors": [
"Sauli Giliard"
],
"body": {
"summary": [],
"sections": [
{
"headline": [],
"paragraphs": [
"LINDI: IT’s around 8:00pm but Aisha Mohamed is still busy with her business close to Lindi Bus Stand’s exit door.",
"The mother of three says she is confident of doing business until late hours, thanks to the solar-powered streetlights installed in Lindi town and public areas. Advertisement",
"ALSO READ: Solar firm to boost clean energy uptake"
]
},
{
"headline": [
"Lighting businesses"
],
"paragraphs": [
"According to Amina Salum, 34, after the 2016/2017 fiscal year, small women traders’ businesses not only started growing steadily but also, they gained confidence in trading during later hours. “It is because most of us, women, feel that nothing can harm us in places where there is light,” says the mother of four who vends food in nearly 500 metres away from Lindi Bus Terminal.",
"Counting her fingers, Amina says now she can create a profit of 10,000/- to 15,000/- per day. She reveals that the key factor for the success is that her business is “visible” by the courtesy of light from solar powered lights in her locality."
]
},
{
"headline": [
"Affordability of solar power"
],
"paragraphs": [
"In the financial year of 2016/17, the Lindi Municipal Council decided, among other issues, to install solar powered lights in streets.",
"The reasons are not only to turn the old town look attractive but also to boost security and improve the working environment especially for the people who work or trade in the late hours. The Municipal’s economist Jeremiah Mbelu reveals that the council agreed to install 155 poles of solar lights.",
"“Each pole cost 4m/-,” says Jeremiah who is working in the Municipal’s planning office. This means that the first phase of the project cost 620m/-.",
"He adds that the next phase will be accomplished immediately after the ongoing projects construction of the street roads of Lindi town.",
"Asked why the Municipal decided to spend more than 600m/- per single project, a bit higher than gas and hydropower, the economist replies, “Yes, it is expensive to install solar powered lights in the streets. The initial cost is too high” but “we don’t have another cost after installation. No electricity bills because we use solar as a source of energy.”",
"Before this project which covers distance of 6.2km, the Lindi Municipal Director, Jomaary Satura says, there were four streetlights powered by electricity around Bus Stand and Sokoine Street. “At the end of the month, the bill was 4m/- entailing that each streetlight was ‘eating’ an average of 1m/- per month,” he adds.",
"After the efficiency of the solar powered lights in Lindi town, Regional Commissioner of Lindi, Godfrey Zambi says the programme will be implemented in all towns. Zambi who doubles as region’s security committee chairman believes that the Lindi town is safe from hooligans’ acts because they can no longer hide in darkness as they used to.",
"The regional government is opting for solar powered light over other sources as Energy Access Situation Report, 2016 Tanzania Mainland report indicates solar is second reliable source of energy for lighting.",
"While the 2012 Population and Housing Census report indicates that Lindi has a population of 864,652, the Mainland’s energy report reveals that 14.8 percent of residents are relying on solar as a source of energy for lighting. Apart from solar, the report adds, other sources of energy for lighting used by Lindi residents are Electricity (5.4%), generator (0%), kerosene (12.8%), candle (2.6%), rechargeable (63.6%), firewood (1.5%) and charcoal (0.5%)."
]
},
{
"headline": [
"Sustainable Development Goals 7, 11"
],
"paragraphs": [
"In lighting the town through solar energy, Lindi municipality is implementing two Sustainable Development Goals (SDGs) at ago. These are SDGs 7 and 11 which focus on affordable and clean energy and Sustainable Cities and Communities respectively.",
"According to SDG 11, half of humanity (3.5 billion people), including Amina and Mariam, lives in cities/town today and 5 billion people are projected to live in cities by 2030. Lindi traders are now thriving, growing and eliminating poverty in the safe environment as Tanzania is eyeing to join the club of the middle-income countries.",
"The world’s cities occupy just 3 per cent of the Earth’s land, but account for 60-80 per cent of energy consumption and 75 per cent of carbon emissions, turning them not better place to live and work. But the Lindi Municipality has opted renewable energy to make the fastest growing town not only a safe place to work but also reducing bills, maintenance cost while conserving environment through usage of renewable energy."
]
},
{
"headline": [
"Safer Cities for inclusive development"
],
"paragraphs": [
"As SGD 7 predicts that half of population of the world will be in cities and town by 2030, researcher Dr Kalpana Viswanath writes in an article published in UN Habitat website titled ‘Creating Engagement in Public Spaces for Safer Cities for Women’ that “Women face the fear of sexual violence as a constant threat to their ability to move around, to work and their general well-being.”",
"While the researcher is calling for affirmative actions to make cities and towns better place for women to live, Mr Deogratius Temba, who is working with Tanzania Gender Networking Programme (TGNP) as Programme Officer, Mobilization and Outreach, says lights in town or cities is a solution for minimizing crimes against girls and women.",
"In the darkness, Mr Temba says, women can be harassed, attacked by robbers or raped and suppose they report such issues to police, no clear evidence can be presented because, “they can’t mark them in darkness.”",
"He adds, when towns, like Lindi, install light infrastructures, act as security guard because there is evidence of low level of attacks to women and girls in places where there is light. Another advantage, according to Temba, is that female students can walk freely in areas with where there is light without be harmed.",
"He adds some students are being forced to walk long distance to their respective schools. He says due to distance, they are being forced to go to school early and return home. So, if they pass in areas where there is light, he adds, they are safe."
]
},
{
"headline": [
"Street lighting"
],
"paragraphs": [
"According to study conducted by Teri Allery of North Dakota State University and his fellows titled ‘Solar Street Lighting: Using Renewable Energy for Safety for the Turtle Mountain Band of Chippewa’ in 2018, street lighting has been around “since humans began living together.”",
"In their study, researchers explain different ways used to light the streets as early as 500 BC saying “the ancient Romans used oil lamps filled with vegetable oil in front of their homes. In 1802, William Murdock used a gas light fuelled with coal gas.”",
"Although Lindi has various sources of power, including gas, for the case of sustainability, the municipal has changed from traditional way of lighting its streets to embark on renewable energy."
]
}
]
},
"publishing_date": "2019-04-24 06:59:25+00:00",
"title": "Solar-powered streetlights boost Lindi women’s businesses"
}
}
Binary file not shown.
27 changes: 27 additions & 0 deletions tests/resources/parser/test_data/tz/HabariLeo.json
@@ -0,0 +1,27 @@
{
"V1": {
"authors": [
"Rahimu Fadhili"
],
"body": {
"summary": [],
"sections": [
{
"headline": [],
"paragraphs": [
"JESHI la Polisi mkoani Shinyanga limefanikiwa kukamata watu 81 kwa kuhutumiwa makosa mbalimbali katika kipindi cha kuanzia mwezi Novemba hadi Desemba mwaka huu.",
"Kamanda wa Jeshi la Polisi mkoani hapa, Janeth Magomi amesema hayo leo mbele ya waandishi wa habari nakueleza kuanza oparesheni hiyo iliyokuwa na lengo la kubaini na kuzuia uhalifu .",
"Kamanda Magomi amesema watu hao licha ya kukamatwa kwa kutuhumiwa kwa makosa wapo waliokutwa na madini ya dhahabu bandia gramu 250, bangi gramu 9000, mirungi bunda tisa pamoja na pombe ya Moshi lita 113. Advertisement",
"Kamanda Magomi amesema polisi waliokuwa doria walifanikiwa kukamata silaha aina ya Gobore katika kijiji cha Bugomba A kata ya Ulewe Halmashauri ya Ushetu.",
"“Gobore hilo lilikuwa limepakizwa kwenye baiskeli na mtu ambaye alikuwa akiendesha alitelekeza nakufanikiwa kukimbia lakini wanaendelea kumtafuta mtuhumiwa,” amesema Magomi.",
"Kamanda Magomi alisema kesi 18 zimepata mafanikio ambapo watuhumiwa wanne wamefungwa maisha huku kesi moja ya ubakaji mshtakiwa alihukumiwa miaka 30 kwenda jela.",
"Kamanda Magomi amesema wamefanikiwa kukamata makosa 5,376 ya usalama barabarani ikiwa makosa ya kwenye magari ni 3,767 na makosa ya bajaji na pikipiki ni 1,609.",
"“Tumefanya mikutano 97 ya utoaji elimu juu ya kuzuia uhalifu na kuondoa ukatili kupitia vyombo vya habari tukiwataka wananchi kushirikiana na jeshi hili,” amesema Magomi"
]
}
]
},
"publishing_date": "2024-12-23 12:15:27+00:00",
"title": "Watu 18 wanaswa kwa makosa mbalimbali Shinyanga"
}
}
Binary file not shown.
10 changes: 10 additions & 0 deletions tests/resources/parser/test_data/tz/meta.info
@@ -0,0 +1,10 @@
{
"DailyNewsTZ_2024_12_23.html.gz": {
"url": "https://dailynews.co.tz/solar-powered-streetlights-boost-lindi-womens-businesses/",
"crawl_date": "2024-12-23 14:07:12.655046"
},
"HabariLeo_2024_12_23.html.gz": {
"url": "https://habarileo.co.tz/watu-18-wanaswa-kwa-makosa-mbalimbali-shinyanga/",
"crawl_date": "2024-12-23 14:06:36.419551"
}
}
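
As a quick sanity check on this file, the sketch below parses the same content and confirms that the date embedded in each fixture file name matches its `crawl_date`. It assumes the naming convention `<Publisher>_YYYY_MM_DD.html.gz`, which is inferred from the two entries here; `date_from_fixture_name` is a hypothetical helper, not part of the test suite.

```python
import json
from datetime import datetime

# Content copied from meta.info above.
META_INFO = """
{
  "DailyNewsTZ_2024_12_23.html.gz": {
    "url": "https://dailynews.co.tz/solar-powered-streetlights-boost-lindi-womens-businesses/",
    "crawl_date": "2024-12-23 14:07:12.655046"
  },
  "HabariLeo_2024_12_23.html.gz": {
    "url": "https://habarileo.co.tz/watu-18-wanaswa-kwa-makosa-mbalimbali-shinyanga/",
    "crawl_date": "2024-12-23 14:06:36.419551"
  }
}
"""


def date_from_fixture_name(file_name: str) -> str:
    """Extract 'YYYY-MM-DD' from '<Publisher>_YYYY_MM_DD.html.gz' (assumed convention)."""
    stem = file_name[: -len(".html.gz")]
    _publisher, year, month, day = stem.rsplit("_", 3)
    return f"{year}-{month}-{day}"


meta = json.loads(META_INFO)
for file_name, entry in meta.items():
    crawl_date = datetime.fromisoformat(entry["crawl_date"])
    # The file-name date and the recorded crawl date should agree.
    assert crawl_date.date().isoformat() == date_from_fixture_name(file_name)
    print(file_name, "->", entry["url"])
```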