HPCC-Platform

Description / Rationale

HPCC Systems offers an enterprise-ready, open-source supercomputing platform to solve big data problems. Compared to Hadoop, the platform analyzes big data with less code and fewer nodes for greater efficiency, and offers a single programming language, a single platform, and a single architecture for efficient processing. HPCC Systems is a technology division of LexisNexis Risk Solutions.

Getting Started

Release + Support Policy

In general, a new version of the HPCC Platform is released every 3 months. These releases can be either Major (with breaking changes) or Minor (with new features). Maintenance and security releases (point releases) are typically made weekly, and may occasionally include technical previews.

Maintenance releases are supported for the current and previous release, while security releases are supported for the current and previous two releases:

---
displayMode: compact
---
gantt
    title Release Schedule
    axisFormat %Y-Q%q
    tickInterval 3month
    dateFormat YYYY-MM-DD
    section v8.12.x
        Active:          active, 2023-02-07, 5M
        Critical:        3M
        Security:        6M
    section v9.0.x
        Active:          active, 2023-04-03, 6M
        Critical:        3M
        Security:        6M
    section v9.2.x
        Active:          active, 2023-07-04, 9M
        Critical:        3M
        Security:        3M
    section v9.4.x
        Active:          active, 2023-10-04, 9M
        Critical:        3M
        Security:        3M
    section v9.6.x
        Active:          active, 2024-04-04, 6M
        Critical:        3M
        Security:        3M
    section v9.8.x
        Active:          active, 2024-07-02, 6M
        Critical:        3M
        Security:        3M
    section v9.10.x
        Active:          active, 2024-10-01, 6M
        Critical:        3M
        Security:        3M

Architecture

The HPCC Systems architecture incorporates the Thor and Roxie clusters as well as common middleware components, an external communications layer, client interfaces which provide both end-user services and system management tools, and auxiliary components to support monitoring and to facilitate loading and storing of filesystem data from external sources. An HPCC environment can include only Thor clusters, or both Thor and Roxie clusters. Each of these cluster types is described in more detail in the following sections.

Thor

Thor (the Data Refinery Cluster) is responsible for consuming vast amounts of data and for transforming, linking and indexing that data. It functions as a distributed file system with parallel processing power spread across the nodes. A cluster can scale from a single node to thousands of nodes.

  • Single-threaded
  • Distributed parallel processing
  • Distributed file system
  • Powerful parallel processing programming language (ECL)
  • Optimized for Extraction, Transformation, Loading, Sorting, Indexing and Linking
  • Scales from 1-1000s of nodes
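
For a flavor of the kind of work Thor runs, here is a minimal ECL sketch of a refinery step; the logical file names and record layout are hypothetical, chosen only for illustration.

IMPORT Std;

RawRec := RECORD
    STRING20 firstName;
    STRING20 lastName;
    STRING10 zip;
END;

// Read a logical file previously sprayed onto the cluster (hypothetical name).
rawPeople := DATASET('~tutorial::raw::people', RawRec, FLAT);

// Clean and deduplicate; Thor executes this in parallel across its nodes.
cleaned := PROJECT(rawPeople,
                   TRANSFORM(RawRec,
                             SELF.lastName := Std.Str.ToUpperCase(LEFT.lastName);
                             SELF := LEFT));
deduped := DEDUP(SORT(cleaned, lastName, firstName, zip),
                 lastName, firstName, zip);

// Write the refined file back to the distributed file system.
OUTPUT(deduped, , '~tutorial::refined::people', OVERWRITE);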

Roxie

Roxie (the Query Cluster) provides separate high-performance online query processing and data warehouse capabilities. Roxie (Rapid Online XML Inquiry Engine) is the data delivery engine used in HPCC to serve data quickly and can support many thousands of requests per node per second.

  • Multi-threaded
  • Distributed parallel processing
  • Distributed file system
  • Powerful parallel processing programming language (ECL)
  • Optimized for concurrent query processing
  • Scales from 1-1000s of nodes
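
As a hedged illustration of what a Roxie query can look like, the sketch below defines a published-query style keyed lookup; the index name, key layout, and parameter name are all hypothetical.

// Hypothetical payload index, built earlier (typically on Thor) and deployed with the query.
peopleKey := INDEX({STRING20 lastName},
                   {STRING20 firstName, STRING10 zip},
                   '~tutorial::key::people_by_lastname');

// STORED turns this definition into a query parameter supplied by callers.
STRING20 searchName := '' : STORED('LastName');

// Keyed lookup against the index; Roxie serves many such requests concurrently.
OUTPUT(CHOOSEN(peopleKey(lastName = searchName), 200), NAMED('Matches'));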

ECL

ECL (Enterprise Control Language) is the powerful programming language that is ideally suited for the manipulation of Big Data.

  • Transparent and implicitly parallel programming language
  • Non-procedural and dataflow oriented
  • Modular, reusable, extensible syntax
  • Combines data representation and algorithm implementation
  • Easily extended using C++ libraries
  • ECL is compiled into optimized C++
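
A short sample makes the declarative style easier to see than a description; the record layout and values below are invented for illustration.

PersonRec := RECORD
    STRING20  name;
    UNSIGNED2 age;
END;

// Definitions describe a dataflow; the compiler turns them into optimized,
// implicitly parallel C++.
people := DATASET([{'Jane', 34}, {'John', 28}, {'Ada', 36}], PersonRec);

adults := people(age >= 30);      // filter
byAge  := SORT(adults, -age);     // sort, descending by age
total  := COUNT(adults);          // aggregate

OUTPUT(byAge, NAMED('Adults'));
OUTPUT(total, NAMED('AdultCount'));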

ECL IDE

ECL IDE is a modern IDE used to code, debug and monitor ECL programs.

  • Access to shared source code repositories
  • Complete development, debugging and testing environment for developing ECL dataflow programs
  • Access to the ECLWatch tool is built-in, allowing developers to watch job graphs as they are executing
  • Access to current and historical job workunits

ESP

ESP (Enterprise Services Platform) provides an easy-to-use interface to access ECL queries using XML, HTTP, SOAP and REST.

  • Standards-based interface to access ECL functions
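
To make the round trip concrete, the hedged sketch below uses ECL's SOAPCALL to invoke a query published behind ESP's WsECL service; the host, port, query name and field layouts are assumptions for illustration, not taken from this repository.

// Hypothetical output layout of the published query being called.
OutRec := RECORD
    STRING20 firstName;
    STRING20 lastName;
    STRING10 zip;
END;

// Assumed WsECL SOAP endpoint pattern; adjust host, port and query name to your setup.
results := SOAPCALL('http://esp.example.com:8002/WsEcl/soap/query/roxie/peoplesearch',
                    'peoplesearch',
                    {STRING20 LastName := 'DOE'},
                    DATASET(OutRec));
OUTPUT(results);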

Developer documentation

The following links describe the structure of the system and detail some of the key components:

Regression test

cd /opt/HPCCSystems/testing/regress
./ecl-test query --target thor nlppp.ecl
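
Other engines can be exercised with the same runner; for example (the roxie target here is an assumption based on typical ecl-test usage, not taken from this README):

./ecl-test query --target roxie nlppp.ecl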
