DataHub Interactive Education (DINE)

Overview

Data Hub INteractive Education (DINE) is educational content for SAP Data Hub. Its hands-on exercises show you how to use SAP Data Hub features.

SAP Data Hub lets you connect to different data sources such as SAP HANA, SAP ERP, SAP BW, Oracle, DB2, SQL Server, and many more. It can process structured, semi-structured, and unstructured data using Kafka, a streaming engine, text and image analysis, and so on. SAP Data Hub brings all your data together so you can work across it seamlessly. You can quickly develop a prototype on SAP Data Hub and readily turn the result into a production-level system, since SAP Data Hub takes care of execution, orchestration, scheduling, and monitoring. SAP Data Hub is built on Kubernetes and can therefore be deployed on premise or in the cloud. It runs on a distributed execution engine and is designed for the Big Data world, providing an understanding of metadata across a Big Data landscape.

See also the official documentation for SAP Data Hub.

DINE makes it easy to learn how to build pipelines in SAP Data Hub using its operators. It acts as a reference for application developers and showcases the features of Data Hub in an easy-to-understand business scenario. This demo content comes complete with:

  • Sample data
  • Code snippets
  • Tutorials

Prerequisites

SAP Data Hub Setup - Follow the Installation Guide for SAP Data Hub and set up your SAP Data Hub environment.

You can also use SAP Data Hub Developer Edition or SAP Data Hub Trial Edition.

Scenarios

We will learn SAP Data Hub through the scenarios below, which are based on a dummy entity called SAP Data Hub Market Place: an e-commerce platform, developed for demo and learning purposes, where customers across the globe make thousands of purchases every day.

The scenarios are detailed below:

  • Sentiment Analyser: This scenario categorizes products based on the reviews submitted by customers. It is implemented in Python and uses the VORA text analysis engine to find the 5 most popular products based on customer reviews. Follow the tutorial to implement this scenario.

  • Product Recommender: This scenario recommends products that are frequently bought together, based on sales history. It is implemented using Python machine learning libraries. Follow the tutorial to implement this scenario.
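To give a feel for what "frequently bought together" means in the Product Recommender scenario, here is a minimal sketch that counts how often pairs of products appear in the same sales order. This is plain co-occurrence counting with the Python standard library, not the machine learning approach used in the tutorial. The order and product IDs and the function names `count_pairs` and `recommend` are hypothetical and exist only for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sales history: sales order ID -> product IDs in that order.
# These IDs are made up for illustration; they are not taken from the DINE data.
orders = {
    "SO-1": ["P-100", "P-200", "P-300"],
    "SO-2": ["P-100", "P-200"],
    "SO-3": ["P-200", "P-300"],
    "SO-4": ["P-100", "P-200", "P-400"],
}


def count_pairs(orders):
    """Count how many orders contain each unordered pair of products."""
    pair_counts = Counter()
    for items in orders.values():
        # sorted(set(...)) drops duplicates and gives each pair one canonical order
        for pair in combinations(sorted(set(items)), 2):
            pair_counts[pair] += 1
    return pair_counts


def recommend(product_id, pair_counts, top_n=3):
    """Return up to top_n products most often bought together with product_id."""
    partners = Counter()
    for (a, b), n in pair_counts.items():
        if a == product_id:
            partners[b] += n
        elif b == product_id:
            partners[a] += n
    return [p for p, _ in partners.most_common(top_n)]


pair_counts = count_pairs(orders)
print(recommend("P-100", pair_counts))
```

In this toy data, P-200 shares three orders with P-100, so it is ranked first. A real implementation would read the order items from the sales tables rather than from a hard-coded dictionary.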

Datasets

The dataset for the above scenarios comprises 6 files, which contain customer, product, and sales information.

  • The CUSTOMER table holds customer details. Its ADDRESSID column maps to the ADDRESS table, where customers' addresses are stored.

  • When a customer buys a product, a sales order is generated (SO_HEADER), and each sales order has multiple order items (SO_ITEM).

  • SO_HEADER has PARTNERID, a foreign key that links to the CUSTOMER table.

  • SO_ITEM has SALESORDERID, a foreign key that links to SO_HEADER.

  • Each SO_ITEM has a PRODUCTID, which maps to the PRODUCT table, where product details are stored.

  • Customer reviews of the products are stored in the REVIEW table.

  • In total, there are 6 tables: CUSTOMER, ADDRESS, SO_HEADER, SO_ITEM, PRODUCT, and REVIEW.
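The relationships described above can be sketched with an in-memory SQLite database. The table names and the key columns ADDRESSID, PARTNERID, SALESORDERID, and PRODUCTID come from the description. Everything else is an assumption made for illustration: the NAME and CITY columns, the use of PARTNERID as the key of CUSTOMER, and all sample rows. The REVIEW table is left out because its columns are not described here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Schema sketch: only the key columns are taken from the README text.
# NAME and CITY are hypothetical placeholder columns.
conn.executescript("""
CREATE TABLE ADDRESS   (ADDRESSID INTEGER PRIMARY KEY, CITY TEXT);
CREATE TABLE CUSTOMER  (PARTNERID INTEGER PRIMARY KEY, NAME TEXT,
                        ADDRESSID INTEGER REFERENCES ADDRESS(ADDRESSID));
CREATE TABLE PRODUCT   (PRODUCTID TEXT PRIMARY KEY, NAME TEXT);
CREATE TABLE SO_HEADER (SALESORDERID INTEGER PRIMARY KEY,
                        PARTNERID INTEGER REFERENCES CUSTOMER(PARTNERID));
CREATE TABLE SO_ITEM   (SALESORDERID INTEGER REFERENCES SO_HEADER(SALESORDERID),
                        PRODUCTID TEXT REFERENCES PRODUCT(PRODUCTID));

-- Made-up sample rows, not taken from the DINE data files.
INSERT INTO ADDRESS   VALUES (1, 'Walldorf');
INSERT INTO CUSTOMER  VALUES (10, 'Alice', 1);
INSERT INTO PRODUCT   VALUES ('P-100', 'Laptop'), ('P-200', 'Mouse');
INSERT INTO SO_HEADER VALUES (500, 10);
INSERT INTO SO_ITEM   VALUES (500, 'P-100'), (500, 'P-200');
""")

# Follow the foreign keys from each order item to its customer, address, and product.
rows = conn.execute("""
    SELECT c.NAME, a.CITY, p.NAME
    FROM SO_ITEM i
    JOIN SO_HEADER h ON h.SALESORDERID = i.SALESORDERID
    JOIN CUSTOMER  c ON c.PARTNERID    = h.PARTNERID
    JOIN ADDRESS   a ON a.ADDRESSID    = c.ADDRESSID
    JOIN PRODUCT   p ON p.PRODUCTID    = i.PRODUCTID
    ORDER BY p.NAME
""").fetchall()

for customer, city, product in rows:
    print(customer, city, product)
```

Each output row pairs one purchased product with the customer who bought it and that customer's city, which is the chain SO_ITEM → SO_HEADER → CUSTOMER → ADDRESS plus SO_ITEM → PRODUCT.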

This is a synthetic dataset derived from SHINE and enriched to suit our use cases.

ER Diagram

[Image: ER diagram of the tables described above]

To access the datasets, explore the data folder in this repository.

Known issues

None

Support

Please use GitHub issues to report any bugs.

License

Copyright (c) 2018 SAP SE or an SAP affiliate company. All rights reserved. This file is licensed under SAP Sample Code License Agreement, except as noted otherwise in the LICENSE file.