Improve the way we hold releases on GitHub #11

Open
at055612 opened this issue May 25, 2017 · 5 comments

@at055612
Member

Currently a release on GitHub represents a new version of a single content pack. Finding the release you want will quickly become a nightmare as more packs are added and existing packs get updated to new versions.

A better approach may be to get travis to build all packs whenever a tag is created, and have travis add all content pack zips to that release whether they have changed since the last release or not. If we implement some form of versioning of packs then this could include all versions of each pack.

The tag for the release could be the name and version of the thing(s) that have changed or some arbitrary version number for the packs as a whole.

@burnalting

As content packs will be the most diverse from a contribution standpoint, can we get some documentation at some point on how and what a contributor should do?

  • Should the package use standard pipelines once they are released?
  • Should the package come with a test/validation data set?
  • Should we require the contributor to document the package (in-line or a separate README.md file or both)?
  • Once there is sufficient content how will it be offered? A tagged flat directory? Some hierarchy (with inferred tags)?

I'm guessing all this should be contained within (or linked from) the stroom-content/README.md file.

@at055612
Member Author

This change may not be needed as I have now added links to the current releases for each pack to the root readme, saving people from having to trawl through the releases page to find the latest release of a pack.

@at055612
Member Author

After a chat with @stroomdev66 we agreed that the current mandraulic process is prone to error and should be removed. That process is: manually running the gradle build when we have tagged a single pack at a new version, then manually creating a release for that tag in github and manually adding the build zips for the pack into it.

It should be possible for a travis build to detect that it is a tagged commit, extract the pack key from the git tag (assuming we follow a convention like pack-name-vx.y.z), then run the gradle build and finally release the zip(s) for that pack to github.
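As a rough sketch of the tag handling described above, the pack key and version could be pulled out of a tag like this. The exact pattern (lower kebab case pack name, then `-v` and a three-part version) and the names `TAG_PATTERN`, `PackTag` and `parse_pack_tag` are assumptions for illustration, not an agreed convention.

```python
import re
from typing import NamedTuple, Optional

# Assumed convention: <pack-name>-v<x>.<y>.<z>, e.g. stroom-101-v1.0.0
TAG_PATTERN = re.compile(
    r"^(?P<pack>[a-z0-9]+(?:-[a-z0-9]+)*)-(?P<version>v\d+\.\d+\.\d+)$"
)


class PackTag(NamedTuple):
    pack: str      # the pack key, e.g. "stroom-101"
    version: str   # the pack version, e.g. "v1.0.0"


def parse_pack_tag(tag: str) -> Optional[PackTag]:
    """Split a release tag into pack key and version.

    Returns None when the tag does not follow the assumed convention,
    so the build can skip the release step for unrelated tags.
    """
    match = TAG_PATTERN.match(tag)
    if match is None:
        return None
    return PackTag(match.group("pack"), match.group("version"))
```

In a travis build the tag name would come from the `TRAVIS_TAG` environment variable, which is empty for untagged commits, so `parse_pack_tag(os.environ.get("TRAVIS_TAG", ""))` returning `None` would mean "build only, do not release".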

Discussed the fact that a tag applies to the whole repo, but really applies to a single pack. While a bit odd, it still ensures we can point to the source for a version of a pack.

Discussed having a single manifest file in the root of the repo (probably json) that defines all the packs, their versions, the download urls for the versions and compatibility with stroom versions. e.g.

{
    "packs": [
        {
            "name": "stroom-101",
            "description": "some wordy stuff",
            "versions": [
                {
                    "version": "v2.0.0",
                    "releaseDate": "20180228",
                    "compatibleStroomVersions": [ "v6.0" ],
                    "zipUrl": "https://github.com/gchq/stroom-content/releases/download/stroom-101-v1.0/stroom-101-v1.0.zip",
                    "zipWithDepsUrl": "https://github.com/gchq/stroom-content/releases/download/stroom-101-v1.0/stroom-101-v1.0-all.zip"
                },
                {
                    "version": "v1.0.0",
                    "releaseDate": "20180228",
                    "compatibleStroomVersions": [ "v5.0", "v5.1" ],
                    "zipUrl": "https://github.com/gchq/stroom-content/releases/download/stroom-101-v1.0/stroom-101-v1.0.zip",
                    "zipWithDepsUrl": "https://github.com/gchq/stroom-content/releases/download/stroom-101-v1.0/stroom-101-v1.0-all.zip"
                }
            ]
        }
    ]
}

Maybe this file could also include dependency information for each pack version, e.g. pack X v1.2 depends on pack Y v3.4.

With a bit of static javascript and github pages we could easily render this into something more readable for the web.

The alternative would be for each pack to define its own manifest file and then have a process that collates them together into one big easily queryable file.

With a manifest file like this stroom could be changed so that when provided with a link to the manifest file (i.e. hosted on raw.github.com for an appropriate branch) it could then present the user with a list of packs to get.
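To illustrate how a consumer such as stroom might use a manifest of this shape, here is a minimal sketch that picks, for each pack, the first listed version compatible with a given stroom version. The function name `compatible_packs` and the assumption that versions are listed newest first are mine; the field names follow the example manifest.

```python
import json
from typing import Dict

# A cut-down manifest in the shape of the example above (valid JSON).
MANIFEST_JSON = """
{
  "packs": [
    {
      "name": "stroom-101",
      "description": "some wordy stuff",
      "versions": [
        {"version": "v2.0.0", "compatibleStroomVersions": ["v6.0"]},
        {"version": "v1.0.0", "compatibleStroomVersions": ["v5.0", "v5.1"]}
      ]
    }
  ]
}
"""


def compatible_packs(manifest: dict, stroom_version: str) -> Dict[str, dict]:
    """Map pack name to the first listed version entry that supports stroom_version.

    Assumes each pack lists its versions newest first, so the first
    match is the most recent compatible release.
    """
    result: Dict[str, dict] = {}
    for pack in manifest.get("packs", []):
        for entry in pack.get("versions", []):
            if stroom_version in entry.get("compatibleStroomVersions", []):
                result[pack["name"]] = entry
                break
    return result


manifest = json.loads(MANIFEST_JSON)
for name, entry in compatible_packs(manifest, "v5.1").items():
    print(name, entry["version"])
```

A pack with no compatible version is simply absent from the result, which would let the UI hide packs that cannot be imported into the running stroom version.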

We need to branch the repo as stroom-v5.0 and stroom-v5.1 to give us the ability to support older pack versions. Currently all packs on master (with the exception of the latest internal stat packs) are v5.x compatible.

@at055612
Member Author

at055612 commented Oct 22, 2021

Further to the above, the thinking now is to move to one repo per pack. The following was posted elsewhere:

Been doing some more thinking about the development of content packs going forward and have discussed with @stroomdev66 and @gcdev373 . The current idea is for each content pack to have its own repo on github. A pack would be per log producing system/app, e.g. squid, apache etc. and would typically handle one log format. Some things like windows produce multiple log formats so for these it would make sense to bundle them into one pack as you are likely to always want all formats for a system.
Each pack repo could be owned/maintained by anyone, e.g. @burnalting could create the squid pack in his org's github. This reduces the need for us to be a blocker on everything and the people with the most interest in a pack are responsible for maintaining it. It would contain all the stroom content along with docs and any supporting scripts/config/etc. It would have its own lifecycle and would be tagged/released as changes are made to it.
Each pack repo would need the means (i.e. scripts, github actions) to validate and package up a pack so these would be maintained in a single repo probably owned by us that each pack repo could make use of via git sub-modules. The current approach for pulling in dependency packs would need to change so that instead it fetched released pack zips from github releases and used the content from them. Each pack repo would need to conform to some defined structure so the scripts would work on any repo. We would continue the practice of releasing a zip with no deps along with a fat zip with all deps in it. It would also make sense for packs to include some meta file that defines their name, version, all the deps they have to other packs, import format version (i.e. v5, v6, v7, etc.) and maybe some description. e.g.

./meta.yml
./CHANGELOG.md
./README.md # root readme describing the pack
./content # all the stroom content files (xslts, pipes, etc.)
./clientArtefacts # any supporting scripts/config for doing the logging
./docs # any docs that don't fit in the root readme
./framework # git submodule link to the central repo that contains the pack build scripts

We would then maintain a central directory of packs in some (probably gchq) repo which would have links to all the pack repos that people have created along with released versions, compatibility matrices and such like. If this directory was held in some structured form, e.g. yaml, then it could in theory be read by stroom to pull in packs in a more friendly way.
There are still some unanswered questions around dependency conflicts and resolving them that I think can only be answered by stroom having an understanding of what a pack is and the deps between them. This is a much bigger problem that is not going to get fixed in the short term though.

The meta.yml could look like

---
id: gchq/stroom-content-101/v1.2.3 #Not sure we need this if we have version and repo
repo: gchq/stroom-content-101
name: Stroom 101
description: some wordy stuff
version: v1.2.3
releaseDate: 20180228
compatibleStroomVersions: 
  - v6.0
  - v6.1
  - v7.0
dependencies:
  - gchq/stroom-content-standard-pipelines/v0.4
  - gchq/stroom-content-template-pipelines/v0.3
# Maybe include the urls for the pack release zips in case we want to support non github hosted release artefacts.

If each pack is identified by <org>/<repo>/<version> and each repo follows a convention for releasing using tags that match the version number in the id then one pack can easily fetch the dependency artefacts from github. The central directory could then just be a yaml list of pack IDs which it could parse, fetch all the pack metas and then render into some nice html with names, descriptions, version and links. e.g.

---
packs:
  - gchq/stroom-content-standard-pipelines:
    - v0.3
    - v0.4
  - gchq/stroom-content-template-pipelines:
    - v0.2
    - v0.3
  - otherorg/stroom-content-squid-proxy:
    - v1.0
# Maybe for each pack ver include the url of its meta.yml file in case we want to support non github repos.
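As a sketch of how a pack id could be turned into download locations, assuming ids of the form org/repo/version as in the meta.yml example and release tags equal to the version: the asset naming (`<repo>-<version>.zip` and `<repo>-<version>-all.zip`) and the helper names are assumptions for illustration only.

```python
from typing import NamedTuple


class PackId(NamedTuple):
    org: str
    repo: str
    version: str


def parse_pack_id(pack_id: str) -> PackId:
    """Split an id such as gchq/stroom-content-101/v1.2.3 into its three parts."""
    parts = pack_id.split("/")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected org/repo/version, got {pack_id!r}")
    return PackId(*parts)


def release_zip_url(pack_id: str, with_deps: bool = False) -> str:
    """Build the github release asset url for a pack zip.

    The asset file name is a hypothetical convention; only the
    /releases/download/<tag>/<asset> layout is github's own.
    """
    pid = parse_pack_id(pack_id)
    suffix = "-all" if with_deps else ""
    asset = f"{pid.repo}-{pid.version}{suffix}.zip"
    return f"https://github.com/{pid.org}/{pid.repo}/releases/download/{pid.version}/{asset}"


def meta_url(pack_id: str) -> str:
    """Build the raw url of the meta.yml at the tagged version."""
    pid = parse_pack_id(pack_id)
    return f"https://raw.githubusercontent.com/{pid.org}/{pid.repo}/{pid.version}/meta.yml"


print(release_zip_url("gchq/stroom-content-standard-pipelines/v0.4"))
print(meta_url("otherorg/stroom-content-squid-proxy/v1.0"))
```

With this in place the central directory only needs to hold ids; everything else can be derived or fetched from each pack's own meta.yml.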

Also, maybe compatibleStroomVersions ought to just be minimumStroomVersion?

The specification for a pack repo would be:

  • Apache 2.0 licence
  • Hosted publicly on github
  • Repo name in lower kebab case prefixed with stroom-content-
  • Version tags of the form v[0-9]+\.[0-9]+(-(alpha|beta))?\.[0-9]+
  • meta.yml conforming to above structure in root of repo
  • dir structure as above
  • CHANGELOG.md in the root of the repo
  • Documentation in github markdown format
  • Packs released against version tags using github releases
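The naming rules in this specification are mechanical enough to check in a script. A minimal sketch follows; the version pattern is copied from the list above, while the repo name pattern and the function `check_pack_repo` are my own reading of "lower kebab case prefixed with stroom-content-".

```python
import re
from typing import Iterable, List

# Assumed reading of "lower kebab case prefixed with stroom-content-".
REPO_NAME = re.compile(r"^stroom-content-[a-z0-9]+(?:-[a-z0-9]+)*$")
# Version tag pattern as given in the specification, anchored at both ends.
VERSION_TAG = re.compile(r"^v[0-9]+\.[0-9]+(-(alpha|beta))?\.[0-9]+$")


def check_pack_repo(repo_name: str, tags: Iterable[str]) -> List[str]:
    """Return a list of human readable problems; an empty list means the names conform."""
    problems = []
    if not REPO_NAME.match(repo_name):
        problems.append(f"repo name {repo_name!r} is not stroom-content-<kebab-case>")
    for tag in tags:
        if not VERSION_TAG.match(tag):
            problems.append(f"tag {tag!r} does not match the version tag pattern")
    return problems


print(check_pack_repo("stroom-content-squid-proxy", ["v1.0.0", "v1.1-beta.2", "v0.4"]))
```

Note that the pattern as written places the pre-release marker between the minor and patch numbers (`v1.1-beta.2`) and requires a patch number, so two-part versions such as `v0.4` used in the earlier examples would be rejected.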

@at055612
Member Author

A further evolution of how this all could work:

Pack manifest file

stroom-101_v1.2.3.yml

---
uuid: 19e3fab7-3929-4c6e-bbdf-7944965715e4 # A uuid for the pack, used as entity uuid in stroom, maybe?
repo: stroom-content # Should the pack know what repo it is in?
name: stroom-101 # enforce name pattern for pack names, e.g. [-a-zA-Z0-9]+
version: v1.2.3 # pack version
description: some wordy stuff # This could become the Description tab of the pack in stroom (in markdown)
releaseDate: 2023-08-25T13:04:01+01:00
checksum: "addf120b430021c36c232c99ef8d926aea2acd6b" # Hash of all files in the pack (except this yaml file)
minimumStroomVersion: v7.2
packFormatVersion: v1.0 # Version of the structure of the pack and this yaml, so it can be parsed/imported appropriately
files: # relative to pack manifest file, so stroom knows where to download files from
  - STROOM_101.Pipeline.38a86873-1365-4173-bb7d-1e41eaca72a8.data.xml
  - STROOM_101.Pipeline.38a86873-1365-4173-bb7d-1e41eaca72a8.node
  - STROOM_101.Pipeline.38a86873-1365-4173-bb7d-1e41eaca72a8.xml
  - etc.
dependencies:
  - repo: "stroom-content" # A unique name for a repo. Stroom would need to have this dep repo configured
    name: "template-pipelines"
    version: v0.3
  - repo: "burns-content" # Another repo
    name: "squid-proxy"
    version: v0.2.1

Content repo manifest file

content-pack-repo.yml

This file could be generated by crawling a directory containing packs

---
uuid: 19e3fab7-3929-4c6e-bbdf-7944965715e4 # A uuid for the repo, maybe
name: "stroom-content" # Unique across all repos
description: some wordy stuff # Description of the repo, would be displayed in stroom next to the repo (in markdown)
repoFormatVersion: v1.0 # Version of the structure of the repo and this yaml, so it can be parsed/imported appropriately
packs:
  - name: "stroom-101"
    version: v1.2.3
    location: "stroom-101/v1.2.3/stroom-101_v1.2.3.yml" # rel path to the pack manifest
    checksum: "addf120b430021c36c232c99ef8d926aea2acd6b" # The pack's hash
  - name: "template-pipelines"
    version: v0.3
    location: "template-pipelines/v0.3/template-pipelines_v0.3.yml"
    checksum: "f572d396fae9206628714fb2ce00f72e94f2258f" # The pack's hash
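A minimal sketch of the crawl that could generate the `packs` list, assuming the `<pack>/<version>/<pack>_<version>.yml` layout shown elsewhere in this comment. It only derives name, version and location from the paths; reading the checksum out of each pack manifest is left out because that needs a yaml parser. The function name `crawl_packs` is hypothetical.

```python
from pathlib import Path
from typing import Dict, List


def crawl_packs(repo_root: Path) -> List[Dict[str, str]]:
    """Find pack manifests under repo_root and describe them for the repo manifest.

    A file counts as a pack manifest only if it sits at
    <pack>/<version>/<pack>_<version>.yml relative to repo_root.
    """
    entries = []
    for manifest in sorted(repo_root.glob("*/*/*.yml")):
        version_dir = manifest.parent
        pack_dir = version_dir.parent
        if manifest.name != f"{pack_dir.name}_{version_dir.name}.yml":
            continue  # some other yaml file, not a pack manifest
        entries.append({
            "name": pack_dir.name,
            "version": version_dir.name,
            "location": manifest.relative_to(repo_root).as_posix(),
        })
    return entries
```

The resulting list could then be dumped as the `packs` section of content-pack-repo.yml by whatever yaml writer the build uses.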

Stroom explorer tree

Repos, packs and all their content are special things in the explorer tree, under their own special root Content Packs and distinct from user created content in System.
Alternatively they could be displayed on their own screen, but it is probably easier to have them in one place for the user.

+ Favourites
+ System
+ Content Packs
  + stroom-content  # A content repo
    + stroom-101  # A pack in a repo
    + template-pipelines
      - Indexing.Pipeline.fcef1b20-083e-436c-ab95-47a6ce453435.data.xml
      - Indexing.Pipeline.fcef1b20-083e-436c-ab95-47a6ce453435.node
      - Indexing.Pipeline.fcef1b20-083e-436c-ab95-47a6ce453435.xml
      - etc.
  + burns-content  # Another content repo
    + squid-proxy  # A pack in this repo

All content in a pack would be read-only unless the pack has been set to writable.

Content Repo entity

UI has:

  • Location
    • Repo manifest file url - e.g. https://raw.githubusercontent.com/gchq/stroom-content/master/content-pack-repo.yml
  • Settings
    • Version
  • Description - comes from manifest

URL could be on a http server or a shared file system, e.g. file://shared-storage/stroom-content-repo/content-pack-repo.yml
Rest of repo info obtained from that file.
Displays a list of all packs (and their versions) in the repo on its entity page.
Each one has a button to import the pack.
Importing a version of a pack where another version of that pack is already imported will prompt the user to confirm overwriting the existing pack.
Multiple repos can be added in stroom, but can't add two with the same name.
A pack repo is like a special kind of folder in the exp tree.
You can only add/remove children by importing/removing packs in the repo's entity page or via the context menu.

Importing a pack with dependencies would require stroom to have already loaded the repo(s) for the dependency packs, and it would prompt the user to confirm import of the dependency pack(s), which may in turn overwrite existing versions.
Removing a pack that is used by another installed pack or where its content is referenced by non-pack content would prompt the user with a warning.
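The import behaviour described above could be sketched as a planning step that runs before anything is changed, so the user can be shown everything that will be installed or overwritten. The data shapes and the name `plan_import` are assumptions; stroom has no such API today.

```python
from typing import Dict, List, Tuple

Pack = Tuple[str, str, str]      # (repo name, pack name, version)
Installed = Dict[Tuple[str, str], str]  # (repo name, pack name) -> installed version


def plan_import(target: Pack,
                available: Dict[Pack, List[Pack]],
                installed: Installed) -> List[Tuple[str, Pack]]:
    """Return the ordered steps needed to import target, dependencies first.

    available maps every pack known from the loaded repos to its dependencies.
    Each step is ("install", pack) or ("overwrite", pack); packs already
    installed at the requested version produce no step.
    Raises LookupError if a dependency is not in any loaded repo.
    """
    steps: List[Tuple[str, Pack]] = []
    seen = set()

    def visit(pack: Pack) -> None:
        if pack in seen:
            return  # already handled, also guards against cycles
        seen.add(pack)
        if pack not in available:
            raise LookupError(f"{pack} not found in any loaded repo")
        for dep in available[pack]:
            visit(dep)
        repo, name, version = pack
        current = installed.get((repo, name))
        if current == version:
            return
        steps.append(("install" if current is None else "overwrite", pack))

    visit(target)
    return steps
```

Every "overwrite" step corresponds to a confirmation prompt, and a `LookupError` corresponds to the case where the repo for a dependency has not been added to stroom yet.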

Content Pack entity

Read-only by default.
Importing a pack from a repo creates the pack entity as a child of the repo entity in the tree.
ALL files in the pack are descendants of the pack entity in the tree.
A pack may contain folders to sub-divide its content.

UI has:

  • Location
    • Repo manifest file url - e.g. https://raw.githubusercontent.com/gchq/stroom-content/master/content-pack-repo.yml
  • Settings
    • Editable (true/false) - false by default. Setting to true changes the version to UNVERSIONED
    • Version
    • Dependencies - List of versioned packs that this one depends on.
  • Description - comes from manifest

Editing of packs is only there to allow the development of packs.
Content in a pack can ONLY depend on entities that are in a versioned pack that is included in its dependencies and installed in stroom.
This is to ensure you cannot publish a pack that has broken dependencies.
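A publish-time check for this rule might look like the sketch below. It assumes stroom can supply, for a pack, the uuids its content references and a lookup from entity uuid to owning pack name (for example via a pack column on the doc table); both inputs and the name `undeclared_references` are hypothetical.

```python
from typing import Dict, Iterable, List, Optional


def undeclared_references(pack_name: str,
                          referenced_uuids: Iterable[str],
                          entity_owner: Dict[str, Optional[str]],
                          declared_deps: Iterable[str]) -> List[str]:
    """Return referenced entity uuids that break the dependency rule.

    A reference is allowed only if the entity belongs to this pack or to
    a pack named in declared_deps. Entities with no owning pack (user
    content) or unknown uuids are reported as violations.
    """
    allowed = set(declared_deps) | {pack_name}
    return sorted(
        uuid for uuid in set(referenced_uuids)
        if entity_owner.get(uuid) not in allowed
    )
```

An empty result would mean the pack is safe to publish; anything else lists the references to fix or the dependencies to declare.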

If you need to set properties on a pipeline belonging to a pack (e.g. to set an output feed that is different to the input) then create a non-pack pipeline that extends the pack one and edit that.

Stroom might need a new column on the doc table to hold the pack uuid, which can be used to determine the read-only state of pack entities.

Repo/Pack file structure

/
  /content-pack-repo.yml  # repo manifest file
  /stroom-101
    /v1.2.3
      ...
  /template-pipelines
    /v0.2
      /template-pipelines_v0.2.yml  # pack manifest file
      /content
        /Indexing.Pipeline.fcef1b20-083e-436c-ab95-47a6ce453435.data.xml
        /Indexing.Pipeline.fcef1b20-083e-436c-ab95-47a6ce453435.node
        /Indexing.Pipeline.fcef1b20-083e-436c-ab95-47a6ce453435.xml
        /etc.
      /clientArtefacts  # Dir for any non-content files, e.g. scripts
    /v0.3
      /template-pipelines_v0.3.yml  # pack manifest file
      /content
        /Indexing.Pipeline.fcef1b20-083e-436c-ab95-47a6ce453435.data.xml
        /Indexing.Pipeline.fcef1b20-083e-436c-ab95-47a6ce453435.node
        /Indexing.Pipeline.fcef1b20-083e-436c-ab95-47a6ce453435.xml
        /etc.
      /clientArtefacts  # Dir for any non-content files, e.g. scripts

Doesn't use any git versioning for simplicity.
This means the pack repo could be a git repo, a simple http server or a shared file server.
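Because pack locations are relative to the repo manifest, resolving them works the same way for an http server and a shared file system. A small sketch using the standard library; the function name `pack_manifest_url` is hypothetical.

```python
from urllib.parse import urljoin


def pack_manifest_url(repo_manifest_url: str, location: str) -> str:
    """Resolve a pack's relative location against the repo manifest url."""
    return urljoin(repo_manifest_url, location)


location = "stroom-101/v1.2.3/stroom-101_v1.2.3.yml"
print(pack_manifest_url(
    "https://raw.githubusercontent.com/gchq/stroom-content/master/content-pack-repo.yml",
    location,
))
print(pack_manifest_url(
    "file://shared-storage/stroom-content-repo/content-pack-repo.yml",
    location,
))
```

The same call could resolve each entry in a pack manifest's `files` list against the pack manifest url, since those paths are also relative.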
