Skip to content

Fetches and parses Linux kernel ChangeLogs from https://kernel.org/pub/linux/kernel/ into Elixir structs that can later be handled with Ecto.

Notifications You must be signed in to change notification settings

waseigo/Changelogrex

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

71 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Changelogrex

🐧🌾🗒️🫅 (Linux kernel changelog rex)

Fetches and parses Linux kernel ChangeLogs from https://kernel.org/pub/linux/kernel/ into Elixir structs that can later be handled with Ecto.

Front-end stuff

changelogr_fetch_and_process-2023-11-22_02.57.57.mp4

Back-end stuff

Batch fetch-and-process Linux kernel v6.x ChangeLogs into properly parsed commits

First, check which ChangeLogs are available.

{ :ok, available } = Changelogr.Fetcher.fetch_available("v6.x")

available is a %FetchOp{} (Fetch Operation) struct that contains HREFs and dates of all ChangeLogs from the directory listing of https://kernel.org/pub/linux/kernel/v6.x/.

iex> IO.inspect available
%Changelogr.FetchOp{
  url: "https://cdn.kernel.org/pub/linux/kernel/v6.x/",
  timestamp: ~U[2023-11-19 10:43:10Z],
  status: 200,
  body: "<html>\r\n<head><title>Index of /pub/linux/kernel/v6.x/</title></head> ...",
  hrefs: %{
    "6.3.3" => "https://cdn.kernel.org/pub/linux/kernel/v6.x/ChangeLog-6.3.3",
    "6.5.6" => "https://cdn.kernel.org/pub/linux/kernel/v6.x/ChangeLog-6.5.6",
    ...
  },
  dates: %{
    "6.3.3" => ~N[2023-05-17 12:09:00],
    "6.5.6" => ~N[2023-10-06 11:24:00],
    ...
  },
  errors: nil
}

To be able to fetch their content, we convert available into a list of %ChangeLog{} structs.

{ :ok, fetchable } = Changelogr.Fetcher.fetchop_to_changelogs(available)

fetchable is a list of %ChangeLog{} structs. The :body attribute of the struct is still nil.

iex> IO.inspect fetchable
[
  %Changelogr.ChangeLog{
    kernel_version: "6.1.35",
    url: "https://cdn.kernel.org/pub/linux/kernel/v6.x/ChangeLog-6.1.35",
    date: ~N[2023-06-21 14:12:00],
    timestamp: nil,
    body: nil,
    commits: nil
  },
  %Changelogr.ChangeLog{
    kernel_version: "6.1.40",
    url: "https://cdn.kernel.org/pub/linux/kernel/v6.x/ChangeLog-6.1.40",
    date: ~N[2023-07-23 12:00:00],
    timestamp: nil,
    body: nil,
    commits: nil
  },
...
]

Next, we'll fetch them in sequence.

changelogs = Enum.map(fetchable, &Changelogr.Fetcher.fetch_changelog/1)

If the fetch was successful, the :body of each %ChangeLog{} struct in the list now contains the ChangeLog text, but it is still unparsed (:commit is still nil).

For example:

iex> hd changelogs
{:ok,
 %Changelogr.ChangeLog{
   kernel_version: "6.1.40",
   url: "https://cdn.kernel.org/pub/linux/kernel/v6.x/ChangeLog-6.1.40",
   date: ~N[2023-07-23 12:00:00],
   timestamp: ~U[2023-11-19 10:44:24Z],
   body: "commit 75389113731bb629fa5e971baf58e422414c8d23\nAuthor: Greg Kroah-Hartman <[email protected]>\nDate:   Sun Jul 23 13:49:51 2023 +0200\n\n    Linux 6.1.40\n    \n    Link: https://lore.kernel.org/ ...",
   commits: nil
 }}

Next, we'll process each one of these %ChangeLog{} structs into %Commit{} structs.

  1. First, we filter the list to keep only those with status :ok.
  2. Then, we convert it into a list of %ChangeLog{} structs by discarding the status.
  3. Finally, we parse the :body of each %ChangeLog{} structs into a list of %Commit{} structs in parallel.
  4. Then, we select only those with status :ok, ...
  5. ...and we flatten the list.
commits =
  changelogs
  |> Enum.filter(fn {status, _} -> status == :ok end)
  |> Enum.map(fn {_, x} -> x end)
  |> Task.async_stream(&Changelogr.Parser.changelog_to_commits/1)
  |> Enum.to_list()
  |> Enum.filter(fn {status, _} -> status == :ok end)
  |> Enum.map(fn {_, x} -> x end)
  |> List.flatten()

Each item of the commits list is a %Commit{} struct in which :body and :commit are filled in, as well as anything else known from the ChangeLog that contains the commit, and timestamps of when it was fetched, and where it was fetched from.

iex> hd commits
%Changelogr.Commit{
  kernel_version: "6.1.35",
  changelog_url: "https://cdn.kernel.org/pub/linux/kernel/v6.x/ChangeLog-6.1.35",
  fetched_timestamp: ~U[2023-11-19 11:23:01Z],
  changelog_timestamp: ~N[2023-06-21 14:12:00],
  commit: "commit e84a4e368abe42cf359fe237f0238820859d5044\n",
  author: nil,
  date: nil,
  body: "Author: Greg Kroah-Hartman <[email protected]>\nDate:   Wed Jun 21 16:01:03 2023 +0200\n\n    ...",
  reported_by: nil,
  tested_by: nil,
  message_id: nil,
  noticed_by: nil,
  suggested_by: nil,
  fixes: nil,
  reviewed_by: nil,
  closes: nil,
  cc: nil,
  acked_by: nil,
  signed_off_by: nil,
  link: nil,
  upstream_commit: nil
}

To fill in the rest of the attributes and to process the existing fields, we need to extract and process them. Processing is done in parallel.

processed =
 commits
 |> Task.async_stream(&Changelogr.Parser.extract_all_fields/1)
 |> Enum.to_list()
 |> Enum.map(fn {:ok, x} -> x end)

And so we end up with a list of parsed and processed commits. Here's what such a "finished" commit looks like:

iex> hd processed
%Changelogr.Commit{
  kernel_version: "6.1.35",
  changelog_url: "https://cdn.kernel.org/pub/linux/kernel/v6.x/ChangeLog-6.1.35",
  fetched_timestamp: ~U[2023-11-19 11:23:01Z],
  changelog_timestamp: ~N[2023-06-21 14:12:00],
  commit: "e84a4e368abe42cf359fe237f0238820859d5044",
  author: "Greg Kroah-Hartman <[email protected]>",
  date: #DateTime<2023-06-21 16:01:03+02:00 +02 Etc/UTC+2>,
  body: ["Linux 6.1.35"],
  reported_by: nil,
  tested_by: ["Florian Fainelli <[email protected]>",
   "Markus Reichelt <[email protected]>",
   "Salvatore Bonaccorso <[email protected]>",
   "Linux Kernel Functional Testing <[email protected]>",
   "Chris Paterson (CIP) <[email protected]>",
   "Jon Hunter <[email protected]>", "Ron Economos <[email protected]>",
   "Conor Dooley <[email protected]>",
   "Sudip Mukherjee <[email protected]>",
   "Takeshi Ogasawara <[email protected]>",
   "Allen Pais <[email protected]>",
   "Shuah Khan <[email protected]>",
   "Guenter Roeck <[email protected]>"],
  message_id: nil,
  noticed_by: nil,
  suggested_by: nil,
  fixes: nil,
  reviewed_by: nil,
  closes: nil,
  cc: nil,
  acked_by: nil,
  signed_off_by: ["Greg Kroah-Hartman <[email protected]>"],
  link: ["https://lore.kernel.org/r/[email protected]"],
  upstream_commit: nil
}

About

Fetches and parses Linux kernel ChangeLogs from https://kernel.org/pub/linux/kernel/ into Elixir structs that can later be handled with Ecto.

Topics

Resources

Stars

Watchers

Forks