Add further education type when not "Not applicable" to SEND dataset #14

Open. Wants to merge 3 commits into base: main
91 changes: 51 additions & 40 deletions data/edubasealldata20250109-establishment-types.csv
@@ -1,40 +1,51 @@
type-of-establishment-code,type-of-establishment-name,establishment-type-group-code,establishment-type-group-name
1,Community school,4,Local authority maintained schools
2,Voluntary aided school,4,Local authority maintained schools
3,Voluntary controlled school,4,Local authority maintained schools
5,Foundation school,4,Local authority maintained schools
6,City technology college,3,Independent schools
7,Community special school,5,Special schools
8,Non-maintained special school,5,Special schools
10,Other independent special school,5,Special schools
11,Other independent school,3,Independent schools
12,Foundation special school,5,Special schools
14,Pupil referral unit,4,Local authority maintained schools
15,Local authority nursery school,4,Local authority maintained schools
18,Further education,1,Colleges
24,Secure units,9,Other types
25,Offshore schools,9,Other types
26,Service children's education,9,Other types
27,Miscellaneous,9,Other types
28,Academy sponsor led,10,Academies
29,Higher education institutions,2,Universities
30,Welsh establishment,6,Welsh schools
31,Sixth form centres,1,Colleges
32,Special post 16 institution,9,Other types
33,Academy special sponsor led,10,Academies
34,Academy converter,10,Academies
35,Free schools,11,Free Schools
36,Free schools special,11,Free Schools
37,British schools overseas,9,Other types
38,Free schools alternative provision,11,Free Schools
39,Free schools 16 to 19,11,Free Schools
40,University technical college,11,Free Schools
41,Studio schools,11,Free Schools
42,Academy alternative provision converter,10,Academies
43,Academy alternative provision sponsor led,10,Academies
44,Academy special converter,10,Academies
45,Academy 16-19 converter,10,Academies
46,Academy 16 to 19 sponsor led,10,Academies
49,Online provider,13,Online provider
56,Institution funded by other government department,9,Other types
57,Academy secure 16 to 19,10,Academies
establishment-type-group-code,establishment-type-group-name,type-of-establishment-code,type-of-establishment-name,further-education-type-name-applicable
1,Colleges,18,Further education,
1,Colleges,18,Further education,"Art, Design and Performing Arts College"
1,Colleges,18,Further education,General Further Education College
1,Colleges,18,Further education,Land-Based College
1,Colleges,18,Further education,Sixth Form College (General)
1,Colleges,18,Further education,Sixth Form College (Voluntary Aided)
1,Colleges,18,Further education,Sixth Form College (Voluntary Controlled)
1,Colleges,18,Further education,Specialist Designated College
1,Colleges,18,Further education,Tertiary College
1,Colleges,31,Sixth form centres,
2,Universities,29,Higher education institutions,
2,Universities,29,Higher education institutions,"Art, Design and Performing Arts College"
3,Independent schools,6,City technology college,
3,Independent schools,11,Other independent school,
4,Local authority maintained schools,1,Community school,
4,Local authority maintained schools,2,Voluntary aided school,
4,Local authority maintained schools,3,Voluntary controlled school,
4,Local authority maintained schools,5,Foundation school,
4,Local authority maintained schools,14,Pupil referral unit,
4,Local authority maintained schools,15,Local authority nursery school,
5,Special schools,7,Community special school,
5,Special schools,8,Non-maintained special school,
5,Special schools,10,Other independent special school,
5,Special schools,12,Foundation special school,
6,Welsh schools,30,Welsh establishment,
9,Other types,24,Secure units,
9,Other types,25,Offshore schools,
9,Other types,26,Service children's education,
9,Other types,27,Miscellaneous,
9,Other types,27,Miscellaneous,Specialist Designated College
9,Other types,32,Special post 16 institution,
9,Other types,32,Special post 16 institution,Specialist Designated College
9,Other types,37,British schools overseas,
9,Other types,56,Institution funded by other government department,Specialist Designated College
10,Academies,28,Academy sponsor led,
10,Academies,33,Academy special sponsor led,
10,Academies,34,Academy converter,
10,Academies,42,Academy alternative provision converter,
10,Academies,43,Academy alternative provision sponsor led,
10,Academies,44,Academy special converter,
10,Academies,45,Academy 16-19 converter,
10,Academies,46,Academy 16 to 19 sponsor led,
10,Academies,57,Academy secure 16 to 19,
11,Free Schools,35,Free schools,
11,Free Schools,36,Free schools special,
11,Free Schools,38,Free schools alternative provision,
11,Free Schools,39,Free schools 16 to 19,
11,Free Schools,40,University technical college,
11,Free Schools,41,Studio schools,
13,Online provider,49,Online provider,
157 changes: 87 additions & 70 deletions src/witan/gias.clj
@@ -545,20 +545,19 @@
- CSV file to read: via `::edubaseall-file-path` or `::edubaseall-resource-file-name` (for files in resource folder).
[Defaults to `::edubaseall-resource-file-name` of `default-edubaseall-resource-file-name`.]
- Additional or over-riding options for `->dataset`."
([] (edubaseall->ds {}))
([{::keys [edubaseall-resource-file-name edubaseall-file-path]
:or {edubaseall-resource-file-name default-edubaseall-resource-file-name}
:as options}]
(with-open [in (-> (or edubaseall-file-path (io/resource edubaseall-resource-file-name))
io/file
io/input-stream)]
(ds/->dataset in (merge {:file-type :csv
:separator ","
:dataset-name (or edubaseall-file-path edubaseall-resource-file-name)
:header-row? true
:key-fn edubaseall-csv-key-fn
:parser-fn edubaseall-parser-fn}
options)))))
[& {::keys [edubaseall-resource-file-name edubaseall-file-path]
:or {edubaseall-resource-file-name default-edubaseall-resource-file-name}
:as options}]
(with-open [in (-> (or edubaseall-file-path (io/resource edubaseall-resource-file-name))
io/file
io/input-stream)]
(ds/->dataset in (merge {:file-type :csv
:separator ","
:dataset-name (or edubaseall-file-path edubaseall-resource-file-name)
:header-row? true
:key-fn edubaseall-csv-key-fn
:parser-fn edubaseall-parser-fn}
options))))
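
Review note: replacing the two-arity definition with `[& {... :as options}]` appears to rely on Clojure 1.11's keyword-argument handling, under which a function declared this way accepts either a single trailing map or bare key-value pairs. A minimal sketch of that calling convention follows. `describe-source`, `default-file-name` and the keys used are hypothetical stand-ins for illustration, not the real `edubaseall->ds`.

```clojure
;; Hypothetical stand-in for `default-edubaseall-resource-file-name`.
(def default-file-name "default.csv")

(defn describe-source
  "Illustrative only: destructures namespaced keys the way the new signature does,
  and returns what would be read plus any remaining options."
  [& {::keys [resource-file-name file-path]
      :or    {resource-file-name default-file-name}
      :as    options}]
  {:source (or file-path resource-file-name)
   :extra  (dissoc options ::resource-file-name ::file-path)})

(describe-source)                                     ; no arguments: the :or default applies
(describe-source {::file-path "a.csv"})               ; single trailing map (Clojure 1.11+)
(describe-source ::file-path "a.csv" :separator ";")  ; bare key-value pairs
```

Call sites that already pass an options map should keep working on Clojure 1.11 or later. On earlier versions a lone map passed to a `& {...}` parameter list is rejected, so the change assumes the project targets 1.11+.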

(comment
(defn- csv-ds-column-info
@@ -588,13 +587,20 @@

(comment
;; Write distinct establishment types to data file
(-> (edubaseall->ds {:column-allowlist (map (update-vals edubaseall-columns :csv-col-name) [:type-of-establishment-code :type-of-establishment-name
:establishment-type-group-code :establishment-type-group-name])
:dataset-name (str (re-find #".+(?=\.csv$)" default-edubaseall-resource-file-name) "-establishment-types" ".csv")})
(tc/unique-by)
(tc/order-by [:type-of-establishment-code])
(as-> $
(tc/write! $ (str "./data/" (tc/dataset-name $)))))
(let [establishment-type-cols [:establishment-type-group-code :establishment-type-group-name
:type-of-establishment-code :type-of-establishment-name
:further-education-type-name]]
(-> (edubaseall->ds {:column-allowlist (map (update-vals edubaseall-columns :csv-col-name)
establishment-type-cols)
:dataset-name (str (re-find #".+(?=\.csv$)" default-edubaseall-resource-file-name)
"-establishment-types" ".csv")})
(tc/map-columns :further-education-type-name-applicable [:further-education-type-name]
#(when (not= % "Not applicable") %))
(tc/drop-columns :further-education-type-name)
tc/unique-by
(tc/reorder-columns establishment-type-cols)
(#(tc/order-by % (tc/column-names %)))
(as-> $ (tc/write! $ (str "./data/" (tc/dataset-name $))))))

)
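
Review note: the new export derives `:further-education-type-name-applicable`, drops the raw column, de-duplicates and sorts. The same idea can be sketched in plain Clojure over a sequence of maps, without tablecloth. The sample rows are hypothetical (their values are borrowed from the CSV above), and the sort here uses only two keys, whereas the PR orders by every column.

```clojure
(defn applicable-fe-type
  "Returns the further education type, or nil when it is \"Not applicable\"."
  [fe-type]
  (when (not= fe-type "Not applicable") fe-type))

(defn distinct-establishment-types
  "Derives the applicable column, drops the raw one, de-duplicates and sorts."
  [rows]
  (->> rows
       (map (fn [row]
              (-> row
                  (assoc :further-education-type-name-applicable
                         (applicable-fe-type (:further-education-type-name row)))
                  (dissoc :further-education-type-name))))
       distinct
       ;; `compare` sorts nil before any string, so the blank row comes first.
       (sort-by (juxt :type-of-establishment-code
                      :further-education-type-name-applicable))))

;; Hypothetical sample rows.
(def sample-rows
  [{:type-of-establishment-code 18 :further-education-type-name "Tertiary College"}
   {:type-of-establishment-code 18 :further-education-type-name "Not applicable"}
   {:type-of-establishment-code 18 :further-education-type-name "Tertiary College"}
   {:type-of-establishment-code 1  :further-education-type-name "Not applicable"}])

(distinct-establishment-types sample-rows)
```

Mapping "Not applicable" to nil is what produces the empty trailing field on most rows of the regenerated CSV.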

@@ -604,14 +610,19 @@
(def edubaseall-send-columns
(as-> [:urn
:last-changed-date
;; Local Authority
:la-code
:la-name
;; Establishment
:ukprn
:establishment-number
:establishment-name
:type-of-establishment-code
:type-of-establishment-name
:establishment-type-group-code
:establishment-type-group-name
:la-code
:la-name
:further-education-type-name
:further-education-type-name-applicable ; derived
;; Status
:establishment-status-name
:open-date
@@ -620,7 +631,9 @@
:phase-of-education-name
:statutory-low-age
:statutory-high-age
:further-education-type-name
#_:nursery-provision-name
#_:official-sixth-form-code
#_:official-sixth-form-name
;; Overall capacity & NOR
:school-census-date
:school-capacity
@@ -634,70 +647,75 @@
:sen-no-stat
;; RP & SENU Provision
#_:type-of-resourced-provision-name
:sen-unit? ; derived
:sen-unit? ; derived
:sen-unit-capacity
:sen-unit-on-roll
:resourced-provision? ; derived
:resourced-provision? ; derived
:resourced-provision-capacity
:resourced-provision-on-roll
;; SEN provision types
:sen-provision-types-vec ; derived
:sen-provision-types-vec ; derived
] $
;; Add order and (for columns coming from CSV file) column details
(map-indexed (fn [idx k] {k (merge {:col-idx idx
(map-indexed (fn [idx k] {k (merge {:col-idx idx
:col-name k}
(select-keys (edubaseall-columns k) [:csv-col-name :col-label]))}) $)
(into {} $)
;; Add details for derived columns
(merge-with merge $ {:sen-provision-types-vec {:derived? true
:col-label "SEN Provision Types (derived)"}
:sen-unit? {:derived? true
:col-label "SEN Unit? (derived)"}
:resourced-provision? {:derived? true
:col-label "Resourced Provision? (derived)"}})
(merge-with merge $ {:further-education-type-name-applicable {:derived? true
:col-label "Further education type (when applicable)"}
:sen-unit? {:derived? true
:col-label "SEN Unit? (derived)"}
:resourced-provision? {:derived? true
:col-label "Resourced Provision? (derived)"}
:sen-provision-types-vec {:derived? true
:col-label "SEN Provision Types (derived)"}})
;; Order the map
(into (sorted-map-by (partial compare-mapped-keys (update-vals $ :col-idx))) $)))
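
Review note: the `as->` pipeline above indexes the column keywords, merges in details for derived columns, and then rebuilds the result as a map sorted by column index. A reduced sketch of that pattern is below. `compare-by-index` is a hypothetical stand-in for `compare-mapped-keys`, whose definition is not part of this diff, so its behaviour here is an assumption. `update-vals` needs Clojure 1.11+.

```clojure
(defn compare-by-index
  "Assumed behaviour: compares two keys by the index each maps to."
  [key->idx k1 k2]
  (compare (key->idx k1) (key->idx k2)))

(def column-specs
  (as-> [:urn :la-code :further-education-type-name-applicable :sen-unit?] $
    ;; Record each column's position and name
    (map-indexed (fn [idx k] {k {:col-idx idx :col-name k}}) $)
    (into {} $)
    ;; Add details for derived columns without losing :col-idx
    (merge-with merge $ {:further-education-type-name-applicable
                         {:derived? true
                          :col-label "Further education type (when applicable)"}
                         :sen-unit?
                         {:derived? true
                          :col-label "SEN Unit? (derived)"}})
    ;; Order the map by column index
    (into (sorted-map-by (partial compare-by-index (update-vals $ :col-idx))) $)))

(keys column-specs)
```

Because the sorted map drives `tc/select-columns` later on, adding the new key to the vector is what fixes its position in the output dataset.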

(defn edubaseall-send->ds
"Read SEND related columns from GIAS edubaseall \"all establishment\" data from CSV file into a dataset
with default column names, with additional derived columns:
- `:sen-provision-types-vec` - vector of (upper-case) SEN provision type abbreviations extracted from \"SEN1\"-\"SEN13\"
- `:resourced-provision?` - Boolean indicating if `:type-of-resourced-provision-name` indicates estab. has RP.
- `:further-education-type-name-applicable` - with contents of `:further-education-type-name` where not \"Not applicable\"
- `:sen-unit?` - Boolean indicating if `:type-of-resourced-provision-name` indicates estab. has a SENU.
- `:resourced-provision?` - Boolean indicating if `:type-of-resourced-provision-name` indicates estab. has RP.
- `:sen-provision-types-vec` - vector of (upper-case) SEN provision type abbreviations extracted from \"SEN1\"-\"SEN13\"
Use optional `options` map to specify:
- CSV file to read: via `::edubaseall-file-path` or `::edubaseall-resource-file-name` (for files in resource folder).
[Defaults to `::edubaseall-resource-file-name` of `default-edubaseall-resource-file-name`.]
- Additional or over-riding options for `->dataset`
(though note that any `:column-allowlist`, `:column-blocklist` or `:key-fn` will be ignored)."
([] (edubaseall-send->ds {}))
([options]
(let [sen-provision-type-columns (map (comp keyword (partial format "sen-provision-type-%,d")) (range 1 14))
columns-to-read ((comp distinct concat)
(keys edubaseall-send-columns)
sen-provision-type-columns
[:type-of-resourced-provision-name])
csv-columns-to-read (keep (update-vals edubaseall-columns :csv-col-name) columns-to-read)]
(-> (edubaseall->ds (-> options
(dissoc :key-fn :column-blocklist)
(assoc :column-allowlist csv-columns-to-read)))
;; Parse `:type-of-resourced-provision-name` into separate booleans for RP & SENU
(tc/map-columns :resourced-provision?
[:type-of-resourced-provision-name]
#({"Not applicable" false
"Resourced provision" true
"Resourced provision and SEN unit" true
"SEN unit" false} % %))
(tc/map-columns :sen-unit?
[:type-of-resourced-provision-name]
#({"Not applicable" false
"Resourced provision" false
"Resourced provision and SEN unit" true
"SEN unit" true} % %))
;; Pack non-nil SEN provision type abbreviations into a vector
(tc/map-columns :sen-provision-types-vec sen-provision-type-columns #(filterv some? %&))
;; Arrange dataset
(tc/select-columns (keys edubaseall-send-columns))
(as-> $ (tc/set-dataset-name $ (str (tc/dataset-name $) " (SEND columns)")))))))
[& {:as options}]
(let [sen-provision-type-columns (map (comp keyword (partial format "sen-provision-type-%,d")) (range 1 14))
columns-to-read ((comp distinct concat)
(keys edubaseall-send-columns)
sen-provision-type-columns
[:type-of-resourced-provision-name])
csv-columns-to-read (keep (update-vals edubaseall-columns :csv-col-name) columns-to-read)]
(-> (edubaseall->ds (-> options
(dissoc :key-fn :column-blocklist)
(assoc :column-allowlist csv-columns-to-read)))
;; Add `:further-education-type-name-applicable` with contents of `:further-education-type-name` when not "Not applicable"
(tc/map-columns :further-education-type-name-applicable [:further-education-type-name]
#(when (not= % "Not applicable") %))
;; Parse `:type-of-resourced-provision-name` into separate booleans for RP & SENU
(tc/map-columns :resourced-provision?
[:type-of-resourced-provision-name]
#({"Not applicable" false
"Resourced provision" true
"Resourced provision and SEN unit" true
"SEN unit" false} % %))
(tc/map-columns :sen-unit?
[:type-of-resourced-provision-name]
#({"Not applicable" false
"Resourced provision" false
"Resourced provision and SEN unit" true
"SEN unit" true} % %))
;; Pack non-nil SEN provision type abbreviations into a vector
(tc/map-columns :sen-provision-types-vec sen-provision-type-columns #(filterv some? %&))
;; Arrange dataset
(tc/select-columns (keys edubaseall-send-columns))
(as-> $ (tc/set-dataset-name $ (str (tc/dataset-name $) " (SEND columns)"))))))
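
Review note: two small idioms in this function are easy to misread. The `#({...} % %)` form calls a literal map as a function with the input as its own default, so unrecognised values pass through unchanged rather than becoming nil. The `#(filterv some? %&)` form packs the non-nil arguments into a vector. A plain-Clojure sketch, with hypothetical function names:

```clojure
(def resourced-provision-lookup
  {"Not applicable"                   false
   "Resourced provision"              true
   "Resourced provision and SEN unit" true
   "SEN unit"                         false})

(defn resourced-provision?
  "Looks the value up, falling back to the value itself when it is not a known key."
  [v]
  (resourced-provision-lookup v v))

(defn pack-non-nil
  "Collects the non-nil arguments into a vector, preserving order."
  [& vs]
  (filterv some? vs))

(resourced-provision? "Resourced provision")   ; known key
(resourced-provision? "Some new category")     ; unknown key passes through
(pack-non-nil "SLCN" nil "ASD")                ; nils are dropped
```

One consequence of the pass-through default is that a new category appearing in the source CSV would leave a string in an otherwise Boolean column, which makes it visible rather than silently false.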

(defn edubaseall-send->map
"Read SEND related columns from GIAS edubaseall \"all establishment\" data from CSV file and return as a map keyed by URN
@@ -710,11 +728,10 @@
[Defaults to `::edubaseall-resource-file-name` of `default-edubaseall-resource-file-name`.]
- Additional or over-riding options for `->dataset`
(though note that any `:column-allowlist`, `:column-blocklist` or `:key-fn` will be ignored)."
([] (edubaseall-send->map {}))
([options]
(let [edubaseall-send-ds (edubaseall-send->ds options)]
(zipmap (edubaseall-send-ds :urn)
(tc/rows edubaseall-send-ds :as-maps)))))
[& {:as options}]
(let [edubaseall-send-ds (edubaseall-send->ds options)]
(zipmap (edubaseall-send-ds :urn)
(tc/rows edubaseall-send-ds :as-maps))))
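
Review note: `edubaseall-send->map` pairs the `:urn` column with the dataset's rows as maps. The shape of the result can be sketched with plain maps. The function name and the sample rows below are hypothetical.

```clojure
(defn rows->map-by-urn
  "Keys each row map by its :urn. If a URN repeats, the later row wins."
  [rows]
  (zipmap (map :urn rows) rows))

;; Hypothetical sample rows.
(def sample-send-rows
  [{:urn 100001 :establishment-name "Example College"
    :further-education-type-name-applicable "Tertiary College"}
   {:urn 100002 :establishment-name "Example School"
    :further-education-type-name-applicable nil}])

(def by-urn (rows->map-by-urn sample-send-rows))

(get-in by-urn [100001 :further-education-type-name-applicable])
```

With the derived column in place, callers can test for a further education type with a simple nil check instead of comparing against the "Not applicable" string.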

(comment ; Examine structure of edubaseall-send dataset
(-> (edubaseall-send->ds