Initial commit
andrewheiss committed Dec 27, 2018
0 parents commit 684f088
Showing 25 changed files with 616 additions and 0 deletions.
4 changes: 4 additions & 0 deletions .Rbuildignore
@@ -0,0 +1,4 @@
^.*\.Rproj$
^\.Rproj\.user$
^CONDUCT\.md$
^data-raw$
4 changes: 4 additions & 0 deletions .gitignore
@@ -0,0 +1,4 @@
.Rproj.user
.Rhistory
.RData
.Ruserdata
25 changes: 25 additions & 0 deletions CONDUCT.md
@@ -0,0 +1,25 @@
# Contributor Code of Conduct

As contributors and maintainers of this project, we pledge to respect all people who
contribute through reporting issues, posting feature requests, updating documentation,
submitting pull requests or patches, and other activities.

We are committed to making participation in this project a harassment-free experience for
everyone, regardless of level of experience, gender, gender identity and expression,
sexual orientation, disability, personal appearance, body size, race, ethnicity, age, or religion.

Examples of unacceptable behavior by participants include the use of sexual language or
imagery, derogatory comments or personal attacks, trolling, public or private harassment,
insults, or other unprofessional conduct.

Project maintainers have the right and responsibility to remove, edit, or reject comments,
commits, code, wiki edits, issues, and other contributions that are not aligned to this
Code of Conduct. Project maintainers who do not follow the Code of Conduct may be removed
from the project team.

Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by
opening an issue or contacting one or more of the project maintainers.

This Code of Conduct is adapted from the Contributor Covenant
(http://contributor-covenant.org), version 1.0.0, available at
http://contributor-covenant.org/version/1/0/0/
18 changes: 18 additions & 0 deletions DESCRIPTION
@@ -0,0 +1,18 @@
Package: quRan
Type: Package
Title: Complete text of the Qur'an
Version: 0.1.0
Authors@R: person("Andrew", "Heiss", email = "[email protected]", role = c("aut", "cre"))
Description: Full text, in data frames containing one row per verse, of the
    Qur'an in Arabic (with and without vowels) and in English (the Yusuf Ali
    and Saheeh International translations), formatted to be convenient for
    text analysis.
URL: https://github.com/andrewheiss/quRan
BugReports: https://github.com/andrewheiss/quRan/issues
Depends: R (>= 3.0.0)
Suggests: dplyr,
    testthat
License: MIT + file LICENSE
Encoding: UTF-8
LazyData: true
RoxygenNote: 6.0.1
51 changes: 51 additions & 0 deletions LICENSE
@@ -0,0 +1,51 @@
MIT License

Copyright (c) 2018 Andrew Heiss

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

--------------------------------------------------------------------------------

Tanzil Quran License

- This quran text is distributed under the terms of a Creative Commons
Attribution 3.0 License.

- Permission is granted to copy and distribute verbatim copies of this text,
but CHANGING IT IS NOT ALLOWED.

- This quran text can be used in any website or application, provided its source
(Tanzil.net) is clearly indicated, and a link is made to http://tanzil.net to
enable users to keep track of changes.

- This copyright notice shall be included in all verbatim copies of the text,
and shall be reproduced appropriately in all files derived from or containing
substantial portion of this text.

--------------------------------------------------------------------------------

Text taken from AlQuran.cloud and the Islamic Network is open source

https://alquran.cloud/terms-and-conditions

--------------------------------------------------------------------------------

Saheeh International Translation provided by the Qur'an Project with no rights reserved

See https://archive.org/stream/QuranTranslationBySaheehInternational/Quran%20-%20Translation%20by%20Saheeh%20International_djvu.txt
2 changes: 2 additions & 0 deletions NAMESPACE
@@ -0,0 +1,2 @@
# Generated by roxygen2: do not edit by hand

3 changes: 3 additions & 0 deletions NEWS.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
# quRan 0.1.0

- Initial release of full text of the Qur'an in Arabic and English
133 changes: 133 additions & 0 deletions R/quran.R
@@ -0,0 +1,133 @@
#' The Qur'an
#'
#' This package contains the complete text of the Qur'an in Arabic (with and
#' without vowels) and in English (the Yusuf Ali and Saheeh International
#' translations), formatted to be convenient for text analysis.
#' @docType package
#' @name quRan
#' @aliases quRan quRan-package
NULL

#' The text of the Qur'an (Arabic, all vowels)
#'
#' A dataset containing the Arabic text of the Qur'an, with vowels.
#'
#' Because Surahs 89 and 113 are both translated as "The Dawn," Surah 113 has
#' been retitled "The Daybreak."
#'
#' @source Tanzil (\url{http://tanzil.net/docs/download}) and Al Quran Cloud (\url{https://alquran.cloud/})
#' @format A data frame with 6236 rows and 18 columns:
#' \describe{
#' \item{\code{surah_id}}{Unique id number for the surah}
#' \item{\code{ayah_id}}{Unique id number for the ayah}
#' \item{\code{surah_title_ar}}{Name of the surah (Arabic)}
#' \item{\code{surah_title_en}}{Name of the surah (English; transliterated)}
#' \item{\code{surah_title_en_trans}}{Name of the surah (English; translated)}
#' \item{\code{revelation_type}}{Type of the surah (Meccan or Medinan)}
#' \item{\code{text}}{Ayah text}
#' \item{\code{surah}}{Surah}
#' \item{\code{ayah}}{Ayah}
#' \item{\code{ayah_title}}{Combined surah and ayah (e.g. 2:242)}
#' \item{\code{juz}}{Juz'}
#' \item{\code{manzil}}{Manzil}
#' \item{\code{page}}{Page number}
#' \item{\code{hizb_quarter}}{Maqra}
#' \item{\code{sajda}}{Binary indicator for presence of a sajdah}
#' \item{\code{sajda_id}}{Unique id number for the sajdah}
#' \item{\code{sajda_recommended}}{Binary indicator for whether a sajdah is recommended}
#' \item{\code{sajda_obligatory}}{Binary indicator for whether a sajdah is obligatory}
#' }
"quran_ar"

#' The text of the Qur'an (Arabic, no vowels)
#'
#' A dataset containing the Arabic text of the Qur'an, without vowels.
#'
#' Because Surahs 89 and 113 are both translated as "The Dawn," Surah 113 has
#' been retitled "The Daybreak."
#'
#' @source Tanzil (\url{http://tanzil.net/docs/download}) and Al Quran Cloud (\url{https://alquran.cloud/})
#' @format A data frame with 6236 rows and 18 columns:
#' \describe{
#' \item{\code{surah_id}}{Unique id number for the surah}
#' \item{\code{ayah_id}}{Unique id number for the ayah}
#' \item{\code{surah_title_ar}}{Name of the surah (Arabic)}
#' \item{\code{surah_title_en}}{Name of the surah (English; transliterated)}
#' \item{\code{surah_title_en_trans}}{Name of the surah (English; translated)}
#' \item{\code{revelation_type}}{Type of the surah (Meccan or Medinan)}
#' \item{\code{text}}{Ayah text}
#' \item{\code{surah}}{Surah}
#' \item{\code{ayah}}{Ayah}
#' \item{\code{ayah_title}}{Combined surah and ayah (e.g. 2:242)}
#' \item{\code{juz}}{Juz'}
#' \item{\code{manzil}}{Manzil}
#' \item{\code{page}}{Page number}
#' \item{\code{hizb_quarter}}{Maqra}
#' \item{\code{sajda}}{Binary indicator for presence of a sajdah}
#' \item{\code{sajda_id}}{Unique id number for the sajdah}
#' \item{\code{sajda_recommended}}{Binary indicator for whether a sajdah is recommended}
#' \item{\code{sajda_obligatory}}{Binary indicator for whether a sajdah is obligatory}
#' }
"quran_ar_min"

#' The Yusuf Ali translation of the Qur'an (English)
#'
#' A dataset containing the English text of the Yusuf Ali translation of the Qur'an.
#'
#' Because Surahs 89 and 113 are both translated as "The Dawn," Surah 113 has
#' been retitled "The Daybreak."
#'
#' @source Tanzil (\url{http://tanzil.net/docs/download}) and Al Quran Cloud (\url{https://alquran.cloud/})
#' @format A data frame with 6236 rows and 18 columns:
#' \describe{
#' \item{\code{surah_id}}{Unique id number for the surah}
#' \item{\code{ayah_id}}{Unique id number for the ayah}
#' \item{\code{surah_title_ar}}{Name of the surah (Arabic)}
#' \item{\code{surah_title_en}}{Name of the surah (English; transliterated)}
#' \item{\code{surah_title_en_trans}}{Name of the surah (English; translated)}
#' \item{\code{revelation_type}}{Type of the surah (Meccan or Medinan)}
#' \item{\code{text}}{Ayah text}
#' \item{\code{surah}}{Surah}
#' \item{\code{ayah}}{Ayah}
#' \item{\code{ayah_title}}{Combined surah and ayah (e.g. 2:242)}
#' \item{\code{juz}}{Juz'}
#' \item{\code{manzil}}{Manzil}
#' \item{\code{page}}{Page number}
#' \item{\code{hizb_quarter}}{Maqra}
#' \item{\code{sajda}}{Binary indicator for presence of a sajdah}
#' \item{\code{sajda_id}}{Unique id number for the sajdah}
#' \item{\code{sajda_recommended}}{Binary indicator for whether a sajdah is recommended}
#' \item{\code{sajda_obligatory}}{Binary indicator for whether a sajdah is obligatory}
#' }
"quran_en_yusufali"

#' The Saheeh International translation of the Qur'an (English)
#'
#' A dataset containing the English text of the Saheeh International translation of the Qur'an.
#'
#' Because Surahs 89 and 113 are both translated as "The Dawn," Surah 113 has
#' been retitled "The Daybreak."
#'
#' @source Tanzil (\url{http://tanzil.net/docs/download}) and Al Quran Cloud (\url{https://alquran.cloud/})
#' @format A data frame with 6236 rows and 18 columns:
#' \describe{
#' \item{\code{surah_id}}{Unique id number for the surah}
#' \item{\code{ayah_id}}{Unique id number for the ayah}
#' \item{\code{surah_title_ar}}{Name of the surah (Arabic)}
#' \item{\code{surah_title_en}}{Name of the surah (English; transliterated)}
#' \item{\code{surah_title_en_trans}}{Name of the surah (English; translated)}
#' \item{\code{revelation_type}}{Type of the surah (Meccan or Medinan)}
#' \item{\code{text}}{Ayah text}
#' \item{\code{surah}}{Surah}
#' \item{\code{ayah}}{Ayah}
#' \item{\code{ayah_title}}{Combined surah and ayah (e.g. 2:242)}
#' \item{\code{juz}}{Juz'}
#' \item{\code{manzil}}{Manzil}
#' \item{\code{page}}{Page number}
#' \item{\code{hizb_quarter}}{Maqra}
#' \item{\code{sajda}}{Binary indicator for presence of a sajdah}
#' \item{\code{sajda_id}}{Unique id number for the sajdah}
#' \item{\code{sajda_recommended}}{Binary indicator for whether a sajdah is recommended}
#' \item{\code{sajda_obligatory}}{Binary indicator for whether a sajdah is obligatory}
#' }
"quran_en_sahih"
143 changes: 143 additions & 0 deletions data-raw/clean_data.R
@@ -0,0 +1,143 @@
library(dplyr)
library(tidyr)
library(stringr)
library(purrr)
library(jsonlite)


# Download raw data from alquran.cloud ------------------------------------

# Arabic, all vowels
download.file("http://api.alquran.cloud/quran/quran-simple",
              "./data-raw/quran-simple.json")
quran_ar_raw <- read_json("./data-raw/quran-simple.json")

# Arabic, no vowels
download.file("http://api.alquran.cloud/quran/quran-simple-clean",
              "./data-raw/quran-simple-clean.json")
quran_clean_ar_raw <- read_json("./data-raw/quran-simple-clean.json")

# English, Yusuf Ali
download.file("http://api.alquran.cloud/quran/en.yusufali",
              "./data-raw/en.yusufali.json")
quran_en_yusufali_raw <- read_json("./data-raw/en.yusufali.json")

# English, Saheeh International
download.file("http://api.alquran.cloud/quran/en.sahih",
              "./data-raw/en.sahih.json")
quran_en_sahih_raw <- read_json("./data-raw/en.sahih.json")



# Load and clean raw data -------------------------------------------------

# There are 15 sajdas, and each has additional metadata (like whether it is
# recommended or obligatory) that is in a nested list. If we just use
# as_data_frame(), ayahs with sajdas turn into a three-row data frame that can't
# be bound with the sajda-free ayahs. So, we have to treat sajda ayahs a little
# differently, flattening them and renaming some of the columns
extract_ayahs <- function(x) {
  n_sajda <- length(x$sajda)

  if (n_sajda == 1) {
    as_data_frame(x)
  } else {
    as_data_frame(purrr::flatten(x)) %>%
      mutate(sajda = TRUE) %>%
      rename(sajda_id = id, sajda_recommended = recommended,
             sajda_obligatory = obligatory)
  }
}

# Extract all the surah and ayah details
quran_ar <- data_frame(surah_id = 1:114) %>%
  mutate(surah_details =
           map(surah_id, ~ as_data_frame(quran_ar_raw$data$surahs[[.x]]))) %>%
  unnest(surah_details) %>%
  mutate(ayahs = ayahs %>% map(~ extract_ayahs(.x))) %>%
  unnest(ayahs) %>%
  mutate(ayah_title = paste0(surah_id, ":", numberInSurah),
         surah = surah_id) %>%
  select(surah_id, ayah_id = number1,
         surah_title_ar = name, surah_title_en = englishName,
         surah_title_en_trans = englishNameTranslation,
         revelation_type = revelationType,
         text, surah, ayah = numberInSurah, ayah_title,
         juz, manzil, page, hizb_quarter = hizbQuarter,
         sajda, sajda_id, sajda_recommended, sajda_obligatory) %>%
  mutate(surah_title_ar = str_replace(surah_title_ar, "سورة ", "")) %>%
  # There are two "The Dawns" so make 113 Al-Falaq be "The Daybreak"
  mutate(surah_title_en_trans = ifelse(surah_id == 113, "The Daybreak", surah_title_en_trans)) %>%
  mutate_at(vars(surah_title_ar, surah_title_en, surah_title_en_trans),
            funs(forcats::fct_inorder))

quran_ar_min <- data_frame(surah_id = 1:114) %>%
  mutate(surah_details =
           map(surah_id, ~ as_data_frame(quran_clean_ar_raw$data$surahs[[.x]]))) %>%
  unnest(surah_details) %>%
  mutate(ayahs = ayahs %>% map(~ extract_ayahs(.x))) %>%
  unnest(ayahs) %>%
  mutate(ayah_title = paste0(surah_id, ":", numberInSurah),
         surah = surah_id) %>%
  select(surah_id, ayah_id = number1,
         surah_title_ar = name, surah_title_en = englishName,
         surah_title_en_trans = englishNameTranslation,
         revelation_type = revelationType,
         text, surah, ayah = numberInSurah, ayah_title,
         juz, manzil, page, hizb_quarter = hizbQuarter,
         sajda, sajda_id, sajda_recommended, sajda_obligatory) %>%
  mutate(surah_title_ar = str_replace(surah_title_ar, "سورة ", "")) %>%
  # There are two "The Dawns" so make 113 Al-Falaq be "The Daybreak"
  mutate(surah_title_en_trans = ifelse(surah_id == 113, "The Daybreak", surah_title_en_trans)) %>%
  mutate_at(vars(surah_title_ar, surah_title_en, surah_title_en_trans),
            funs(forcats::fct_inorder))

quran_en_yusufali <- data_frame(surah_id = 1:114) %>%
  mutate(surah_details =
           map(surah_id, ~ as_data_frame(quran_en_yusufali_raw$data$surahs[[.x]]))) %>%
  unnest(surah_details) %>%
  mutate(ayahs = ayahs %>% map(~ extract_ayahs(.x))) %>%
  unnest(ayahs) %>%
  mutate(ayah_title = paste0(surah_id, ":", numberInSurah),
         surah = surah_id) %>%
  select(surah_id, ayah_id = number1,
         surah_title_ar = name, surah_title_en = englishName,
         surah_title_en_trans = englishNameTranslation,
         revelation_type = revelationType,
         text, surah, ayah = numberInSurah, ayah_title,
         juz, manzil, page, hizb_quarter = hizbQuarter,
         sajda, sajda_id, sajda_recommended, sajda_obligatory) %>%
  mutate(surah_title_ar = str_replace(surah_title_ar, "سورة ", "")) %>%
  # There are two "The Dawns" so make 113 Al-Falaq be "The Daybreak"
  mutate(surah_title_en_trans = ifelse(surah_id == 113, "The Daybreak", surah_title_en_trans)) %>%
  mutate_at(vars(surah_title_ar, surah_title_en, surah_title_en_trans),
            funs(forcats::fct_inorder))

quran_en_sahih <- data_frame(surah_id = 1:114) %>%
  mutate(surah_details =
           map(surah_id, ~ as_data_frame(quran_en_sahih_raw$data$surahs[[.x]]))) %>%
  unnest(surah_details) %>%
  mutate(ayahs = ayahs %>% map(~ extract_ayahs(.x))) %>%
  unnest(ayahs) %>%
  mutate(ayah_title = paste0(surah_id, ":", numberInSurah),
         surah = surah_id) %>%
  select(surah_id, ayah_id = number1,
         surah_title_ar = name, surah_title_en = englishName,
         surah_title_en_trans = englishNameTranslation,
         revelation_type = revelationType,
         text, surah, ayah = numberInSurah, ayah_title,
         juz, manzil, page, hizb_quarter = hizbQuarter,
         sajda, sajda_id, sajda_recommended, sajda_obligatory) %>%
  mutate(surah_title_ar = str_replace(surah_title_ar, "سورة ", "")) %>%
  # There are two "The Dawns" so make 113 Al-Falaq be "The Daybreak"
  mutate(surah_title_en_trans = ifelse(surah_id == 113, "The Daybreak", surah_title_en_trans)) %>%
  mutate_at(vars(surah_title_ar, surah_title_en, surah_title_en_trans),
            funs(forcats::fct_inorder))


# Add to package ----------------------------------------------------------

devtools::use_data(quran_ar, overwrite = TRUE)
devtools::use_data(quran_ar_min, overwrite = TRUE)
devtools::use_data(quran_en_yusufali, overwrite = TRUE)
devtools::use_data(quran_en_sahih, overwrite = TRUE)
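
Reviewer note (not part of the commit): the sajda handling described in the comment above `extract_ayahs()` can be illustrated with a small base R sketch. The two toy records are hypothetical stand-ins for the API's ayah entries, not real data, and `flatten_ayah()` is an illustrative helper that is assumed here rather than taken from the script. It swaps the script's `purrr::flatten()` and `rename()` steps for plain list assignment, so it shows the idea rather than the author's exact method.

```r
# Hypothetical toy records mimicking the nested shape described in the
# script's comment: `sajda` is FALSE for most ayahs and a nested list of
# metadata for the few ayahs that carry a sajda.
plain_ayah <- list(number = 1L, text = "first", sajda = FALSE)
sajda_ayah <- list(number = 2L, text = "second",
                   sajda = list(id = 1L, recommended = TRUE,
                                obligatory = FALSE))

# Illustrative helper (an assumption, not from the commit): lift the nested
# sajda metadata into top-level fields so every record has the same six
# scalar fields and becomes a one-row data frame.
flatten_ayah <- function(x) {
  if (is.list(x$sajda)) {
    meta <- x$sajda
    x$sajda <- TRUE
    x$sajda_id <- meta$id
    x$sajda_recommended <- meta$recommended
    x$sajda_obligatory <- meta$obligatory
  } else {
    # No sajda: fill the metadata fields with NA so rows can be bound
    x$sajda_id <- NA_integer_
    x$sajda_recommended <- NA
    x$sajda_obligatory <- NA
  }
  as.data.frame(x, stringsAsFactors = FALSE)
}

# One row per ayah, with identical columns for both kinds of record
ayahs <- do.call(rbind, lapply(list(plain_ayah, sajda_ayah), flatten_ayah))
print(ayahs)
```

Without the flattening step, the three-element `sajda` list would be recycled against the scalar fields, which is the three-row problem the script's comment describes.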
1 change: 1 addition & 0 deletions data-raw/en.sahih.json

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions data-raw/en.yusufali.json

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions data-raw/quran-simple-clean.json

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions data-raw/quran-simple.json

Large diffs are not rendered by default.

Binary file added data/quran_ar.rda
Binary file not shown.
Binary file added data/quran_ar_min.rda
Binary file not shown.
Binary file added data/quran_en_sahih.rda
Binary file not shown.
Binary file added data/quran_en_yusufali.rda
Binary file not shown.