🐧🌾🗒️🫅 (Linux kernel changelog rex)
Fetches and parses Linux kernel ChangeLogs from https://kernel.org/pub/linux/kernel/ into Elixir structs that can later be handled with Ecto.
First, check which ChangeLogs are available.
{ :ok, available } = Changelogr.Fetcher.fetch_available("v6.x")
available is a %FetchOp{} (Fetch Operation) struct that contains the HREFs and dates of all ChangeLogs from the directory listing of https://kernel.org/pub/linux/kernel/v6.x/.
iex> IO.inspect available
%Changelogr.FetchOp{
url: "https://cdn.kernel.org/pub/linux/kernel/v6.x/",
timestamp: ~U[2023-11-19 10:43:10Z],
status: 200,
body: "<html>\r\n<head><title>Index of /pub/linux/kernel/v6.x/</title></head> ...",
hrefs: %{
"6.3.3" => "https://cdn.kernel.org/pub/linux/kernel/v6.x/ChangeLog-6.3.3",
"6.5.6" => "https://cdn.kernel.org/pub/linux/kernel/v6.x/ChangeLog-6.5.6",
...
},
dates: %{
"6.3.3" => ~N[2023-05-17 12:09:00],
"6.5.6" => ~N[2023-10-06 11:24:00],
...
},
errors: nil
}
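Because :hrefs and :dates are both keyed by kernel version, they can be joined by version. The following is a minimal sketch that picks the most recently published ChangeLog. It uses a plain map as a stand-in for the %FetchOp{} struct (an assumption made so the snippet runs without the library); the field names are taken from the output above.

```elixir
# Stand-in for the :hrefs and :dates fields of a %Changelogr.FetchOp{}.
# A plain map is used here so the snippet runs without the library.
available = %{
  hrefs: %{
    "6.3.3" => "https://cdn.kernel.org/pub/linux/kernel/v6.x/ChangeLog-6.3.3",
    "6.5.6" => "https://cdn.kernel.org/pub/linux/kernel/v6.x/ChangeLog-6.5.6"
  },
  dates: %{
    "6.3.3" => ~N[2023-05-17 12:09:00],
    "6.5.6" => ~N[2023-10-06 11:24:00]
  }
}

# Find the version whose directory-listing date is the most recent.
# Passing NaiveDateTime as the sorter makes Enum.max_by/3 compare
# with NaiveDateTime.compare/2 instead of structural comparison.
{latest_version, latest_date} =
  Enum.max_by(available.dates, fn {_version, date} -> date end, NaiveDateTime)

# Look up the matching URL by the same version key.
latest_url = Map.fetch!(available.hrefs, latest_version)

IO.puts("#{latest_version} (#{latest_date}): #{latest_url}")
```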
To be able to fetch their content, we convert available into a list of %ChangeLog{} structs.
{ :ok, fetchable } = Changelogr.Fetcher.fetchop_to_changelogs(available)
fetchable is a list of %ChangeLog{} structs. The :body attribute of each struct is still nil.
iex> IO.inspect fetchable
[
%Changelogr.ChangeLog{
kernel_version: "6.1.35",
url: "https://cdn.kernel.org/pub/linux/kernel/v6.x/ChangeLog-6.1.35",
date: ~N[2023-06-21 14:12:00],
timestamp: nil,
body: nil,
commits: nil
},
%Changelogr.ChangeLog{
kernel_version: "6.1.40",
url: "https://cdn.kernel.org/pub/linux/kernel/v6.x/ChangeLog-6.1.40",
date: ~N[2023-07-23 12:00:00],
timestamp: nil,
body: nil,
commits: nil
},
...
]
Next, we'll fetch them in sequence.
changelogs = Enum.map(fetchable, &Changelogr.Fetcher.fetch_changelog/1)
Each element of changelogs is a tuple of a status and a %ChangeLog{} struct. If the fetch was successful, the status is :ok and the :body of the struct now contains the ChangeLog text, but it is still unparsed (:commits is still nil).
For example:
iex> hd changelogs
{:ok,
%Changelogr.ChangeLog{
kernel_version: "6.1.40",
url: "https://cdn.kernel.org/pub/linux/kernel/v6.x/ChangeLog-6.1.40",
date: ~N[2023-07-23 12:00:00],
timestamp: ~U[2023-11-19 10:44:24Z],
body: "commit 75389113731bb629fa5e971baf58e422414c8d23\nAuthor: Greg Kroah-Hartman <[email protected]>\nDate: Sun Jul 23 13:49:51 2023 +0200\n\n Linux 6.1.40\n \n Link: https://lore.kernel.org/ ...",
commits: nil
}}
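Since every fetch returns a tagged tuple, the successes and failures can be separated in one pass. This is a minimal sketch using Enum.split_with/2 on stand-in data: the plain maps replace %ChangeLog{} structs, and the {:error, reason} shape of a failed fetch is an assumption, since only the :ok case is shown above.

```elixir
# Stand-in results. Plain maps replace %Changelogr.ChangeLog{} structs,
# and {:error, :timeout} is an assumed shape for a failed fetch.
results = [
  {:ok, %{kernel_version: "6.1.35", body: "(ChangeLog text)"}},
  {:error, :timeout},
  {:ok, %{kernel_version: "6.1.40", body: "(ChangeLog text)"}}
]

# Split into tuples tagged :ok and everything else, keeping both halves.
{succeeded, failed} = Enum.split_with(results, &match?({:ok, _}, &1))

# Unwrap the successful tuples into bare changelog values.
fetched = Enum.map(succeeded, fn {:ok, changelog} -> changelog end)

IO.puts("fetched #{length(fetched)}, failed #{length(failed)}")
```

Keeping the failed half around makes it possible to log or retry those fetches instead of silently dropping them.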
Next, we'll process each one of these %ChangeLog{} structs into %Commit{} structs.
- First, we filter the list to keep only those with status :ok.
- Then, we convert it into a list of %ChangeLog{} structs by discarding the status.
- Next, we parse the :body of each %ChangeLog{} struct into a list of %Commit{} structs in parallel.
- Then, we select only those results with status :ok, ...
- ...and we flatten the list.
commits =
changelogs
|> Enum.filter(fn {status, _} -> status == :ok end)
|> Enum.map(fn {_, x} -> x end)
|> Task.async_stream(&Changelogr.Parser.changelog_to_commits/1)
|> Enum.to_list()
|> Enum.filter(fn {status, _} -> status == :ok end)
|> Enum.map(fn {_, x} -> x end)
|> List.flatten()
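Task.async_stream/3 uses a default timeout of 5000 ms per task, and with the default on_timeout: :exit a timed-out task takes the calling process down with it. The sketch below shows one way to make the same pipeline tolerant of slow parses. ParseSketch.to_commits/1 is a hypothetical stand-in for Changelogr.Parser.changelog_to_commits/1, the input data is made up, and the 30-second timeout is an arbitrary choice.

```elixir
defmodule ParseSketch do
  # Hypothetical stand-in for Changelogr.Parser.changelog_to_commits/1.
  # It keeps one map per line that starts with "commit ".
  def to_commits(%{kernel_version: version, body: body}) do
    for "commit " <> hash <- String.split(body, "\n") do
      %{kernel_version: version, commit: hash}
    end
  end
end

# Made-up fetch results; plain maps replace %ChangeLog{} structs.
changelogs = [
  {:ok, %{kernel_version: "6.1.35", body: "commit aaa\nAuthor: A\n\ncommit bbb\nAuthor: B\n"}},
  {:error, :not_found},
  {:ok, %{kernel_version: "6.1.40", body: "commit ccc\nAuthor: C\n"}}
]

commits =
  changelogs
  # Keep and unwrap the successful fetches in a single step.
  |> Enum.flat_map(fn
    {:ok, changelog} -> [changelog]
    _other -> []
  end)
  # Parse in parallel. A task that exceeds the timeout is killed and
  # reported as {:exit, :timeout} instead of crashing the caller.
  |> Task.async_stream(&ParseSketch.to_commits/1,
    max_concurrency: System.schedulers_online(),
    timeout: 30_000,
    on_timeout: :kill_task
  )
  # Flatten the per-changelog lists and skip tasks that did not finish.
  |> Enum.flat_map(fn
    {:ok, parsed} -> parsed
    {:exit, _reason} -> []
  end)

IO.inspect(commits)
```

Enum.flat_map/2 replaces each filter, map, and flatten sequence of the original pipeline with a single pass; the results keep their input order because Task.async_stream/3 is ordered by default.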
Each item of the commits list is a %Commit{} struct in which :body and :commit are filled in, together with the metadata of the ChangeLog that contains the commit: its kernel version, its URL, its date, and the timestamp of when it was fetched.
iex> hd commits
%Changelogr.Commit{
kernel_version: "6.1.35",
changelog_url: "https://cdn.kernel.org/pub/linux/kernel/v6.x/ChangeLog-6.1.35",
fetched_timestamp: ~U[2023-11-19 11:23:01Z],
changelog_timestamp: ~N[2023-06-21 14:12:00],
commit: "commit e84a4e368abe42cf359fe237f0238820859d5044\n",
author: nil,
date: nil,
body: "Author: Greg Kroah-Hartman <[email protected]>\nDate: Wed Jun 21 16:01:03 2023 +0200\n\n ...",
reported_by: nil,
tested_by: nil,
message_id: nil,
noticed_by: nil,
suggested_by: nil,
fixes: nil,
reviewed_by: nil,
closes: nil,
cc: nil,
acked_by: nil,
signed_off_by: nil,
link: nil,
upstream_commit: nil
}
At this point :commit and :body still hold raw text and all other attributes are nil. To fill in the remaining attributes and clean up the existing ones, we extract every field from each commit. This is done in parallel.
processed =
commits
|> Task.async_stream(&Changelogr.Parser.extract_all_fields/1)
|> Enum.to_list()
|> Enum.map(fn {:ok, x} -> x end)
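To illustrate the kind of work field extraction involves, here is a minimal sketch that collects "Key: value" trailer lines such as Tested-by from a commit body. TrailerSketch is a hypothetical module written for this example, the names and addresses in the sample body are made up, and nothing here is meant to describe how Changelogr.Parser.extract_all_fields/1 is actually implemented.

```elixir
defmodule TrailerSketch do
  # Collects the values of all "Key: value" lines for one key.
  # Returns nil when the key does not occur, mirroring the nil
  # fields of a %Commit{} struct.
  def extract(body, key) do
    prefix = key <> ": "

    values =
      for line <- String.split(body, "\n"),
          # ChangeLog bodies are indented, so trim before matching.
          trimmed = String.trim(line),
          String.starts_with?(trimmed, prefix) do
        String.replace_prefix(trimmed, prefix, "")
      end

    if values == [], do: nil, else: values
  end
end

# Made-up commit body in the indented layout of a kernel ChangeLog.
body = """
    Linux 6.1.35

    Tested-by: Alice Example <alice@example.org>
    Tested-by: Bob Example <bob@example.org>
    Signed-off-by: Carol Example <carol@example.org>
"""

IO.inspect(TrailerSketch.extract(body, "Tested-by"))
IO.inspect(TrailerSketch.extract(body, "Fixes"))
```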
And so we end up with a list of parsed and processed commits. Here's what such a "finished" commit looks like:
iex> hd processed
%Changelogr.Commit{
kernel_version: "6.1.35",
changelog_url: "https://cdn.kernel.org/pub/linux/kernel/v6.x/ChangeLog-6.1.35",
fetched_timestamp: ~U[2023-11-19 11:23:01Z],
changelog_timestamp: ~N[2023-06-21 14:12:00],
commit: "e84a4e368abe42cf359fe237f0238820859d5044",
author: "Greg Kroah-Hartman <[email protected]>",
date: #DateTime<2023-06-21 16:01:03+02:00 +02 Etc/UTC+2>,
body: ["Linux 6.1.35"],
reported_by: nil,
tested_by: ["Florian Fainelli <[email protected]>",
"Markus Reichelt <[email protected]>",
"Salvatore Bonaccorso <[email protected]>",
"Linux Kernel Functional Testing <[email protected]>",
"Chris Paterson (CIP) <[email protected]>",
"Jon Hunter <[email protected]>", "Ron Economos <[email protected]>",
"Conor Dooley <[email protected]>",
"Sudip Mukherjee <[email protected]>",
"Takeshi Ogasawara <[email protected]>",
"Allen Pais <[email protected]>",
"Shuah Khan <[email protected]>",
"Guenter Roeck <[email protected]>"],
message_id: nil,
noticed_by: nil,
suggested_by: nil,
fixes: nil,
reviewed_by: nil,
closes: nil,
cc: nil,
acked_by: nil,
signed_off_by: ["Greg Kroah-Hartman <[email protected]>"],
link: ["https://lore.kernel.org/r/[email protected]"],
upstream_commit: nil
}
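Once the fields are lists or nil, the processed commits can be summarized with ordinary Enum functions. The sketch below counts commits per kernel version and Tested-by entries per person. The data is made up and plain maps stand in for %Commit{} structs, so only the field names come from the output above.

```elixir
# Made-up stand-ins for processed %Changelogr.Commit{} structs.
processed = [
  %{
    kernel_version: "6.1.35",
    tested_by: ["Alice Example <alice@example.org>", "Bob Example <bob@example.org>"]
  },
  %{kernel_version: "6.1.35", tested_by: nil},
  %{kernel_version: "6.1.40", tested_by: ["Alice Example <alice@example.org>"]}
]

# Number of commits per kernel version.
per_version = Enum.frequencies_by(processed, & &1.kernel_version)

# Number of Tested-by entries per person. A nil field means the
# commit has no such trailer, so it is treated as an empty list.
tester_counts =
  processed
  |> Enum.flat_map(&(&1.tested_by || []))
  |> Enum.frequencies()

IO.inspect(per_version)
IO.inspect(tester_counts)
```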