%\documentclass[letter,6pt]{article}
\documentclass[A0,6pt]{article}
\usepackage[tagged]{accessibility}
\usepackage[margin=1cm,landscape]{geometry}
\usepackage{multicol}
\usepackage{tikz,lipsum,lmodern}
\usepackage[most]{tcolorbox}
\input{format/packages}
\input{format/macros}
\usepackage{pdfcomment}
\usepackage[explicit]{titlesec}
\usepackage[normalem]{ulem}
\setlength{\columnseprule}{0.4pt}
\date{}
\hypersetup{
pdfauthor={Gregor von Laszewski,
Henry Bobeck,
Anthony Orlowski,
Richard Otten},
pdfsubject={UROC},
pdftitle={Towards an AI Service Catalogue and Registry},
pdfkeywords={multi-cloud, hybrid cloud, AI services, REST, cloudmesh}
}
\begin{document}
\thispagestyle{empty}
\vspace{-3.5cm}
\begin{figure}[!h]
\begin{tcolorbox}[enhanced,
frame style,
colback=red!10,
colframe=red!75!black]
{\Large\bf Towards an AI Service Catalogue and Registry}
{\em Gregor von Laszewski$^{1,2}$,
Henry Bobeck$^{2,*}$,
Anthony Orlowski$^2$
$^1$University of Virginia,
$^2$Indiana University, $^*$IU UROC Students \hfill {\footnotesize Contact: [email protected], \url{https://laszewski.github.io}}
}
\end{tcolorbox}
\vspace{-24pt}
\end{figure}
\begin{multicols}{3}
\makeatletter
\renewcommand\section{\@startsection{section}{1}{\z@}%
{-3.25ex\@plus -1ex \@minus -.2ex}%
{-1.5ex \@plus -.2ex}% Formerly 1.5ex \@plus .2ex
{\normalfont\normalsize\bfseries\underline}}
\makeatother
\footnotesize
\vspace{-24pt}
\section*{Introduction.}
Cloud providers offer a significant number of analytics services to their user communities. Typically, a provider advertises its own capabilities without regard to what other providers offer. In such a heterogeneous ecosystem it is difficult to quickly identify candidate cloud services that can be used to bootstrap a sophisticated analytics workflow addressing a particular task. This includes the reuse of services from different providers in the same workflow.
Developers require two lookup mechanisms. First, they require a high-level {\bf catalogue} description of services to identify whether they could be of potential use. Second, they require a {\bf registry} of the services that they use as part of their development. In previous work we developed a framework called GAS \cite{las-2021-gas} that allows the easy creation of REST-based analytics services from Python code. We expand this framework here by allowing analytics services that have a REST interface to be published into a registry, while still allowing GAS to be used to create REST services and host them in the cloud.
Through this architecture users can
(1) quickly find potential analytics services, (2) find the details of the services, (3) register instances of the services in a registry so they can be reused pragmatically, (4) enhance the catalogue and registry with custom-designed analytics services, and (5) build customized catalogues and registries.
\section*{Architecture.}
We base our architecture on cloudmesh, an open-source hybrid multi-cloud toolkit. We integrated two new components that provide data scientists with the ability to catalogue and register REST services. Building on our previous effort, we also allow the integration of analytics functions to create a REST service that can be provisioned and thus reused. This offers an easy pathway to integrate services offered by various cloud providers, such as AWS, Azure, Google, and others (see Fig.~1).
\begin{center}
\pdftooltip{\includegraphics[width=1.0\columnwidth]{images/arch-registry.pdf}
}{This figure shows the architecture of the catalogue and registry services, as well as the possibility to create new analytics services with GAS.}
\end{center}
\vspace{-12pt}
{\em Fig.~1. Layered architecture of the cloudmesh Open\-API framework.}
We distinguish the catalogue from the registry because the catalogue supports findability without going into too much detail, while the registry records the details of how to interact with a service and, most importantly, the instantiation of such a service, to encourage reusability. Furthermore, our implementation of the catalogue entries is independent of a REST specification, whereas the registry makes explicit use of one.
With this division we can first identify candidates from the plethora of services available to us and, in a second step, evaluate the details of integrating them into an analytics workflow enhanced by REST services. Such workflows can also be represented as high-level REST services and thus become subject to reuse and deployment in larger analytics service ecosystems.
\section*{AI Service Catalogue.}
To make it easy for users to obtain a quick overview of available AI cloud services, we have limited the initial set of attributes to those that support findability. The attributes are listed in the following table. The catalogue is organized as a list of entries, where each entry contains a number of attributes. These attributes may be required or optional.
\resizebox{0.9\columnwidth}{!}{%
\begin{tabular}{p{3cm}p{8cm}p{1cm}}
Name & Description & Required \\
\hline
ID & UUID, globally unique & \OK \\
Name & Name of the service & \OK \\
Title & Human readable title & \OK \\
Public & True if public
(needs a use case to delineate what public/private means) & \OK \\
Description & Human readable short Description of the Service & \OK \\
Version & The version number or tag of the service & \OK \\
License & The license description & \OK \\
Microservice & Yes/No/Mixed & \OK \\
Protocol & REST & \OK \\
Owner & Name of the distributing entity, organization or individual. It could be a vendor. & \OK \\
Modified & Modification timestamp (initially the same as Created) & \OK \\
Created & Date on which the entry was first created & \OK \\
Documentation & Link to a URL with detailed description of the service & O \\
Source & Link to the source code if available & O \\
Tag(s) & Human-readable common tags that are associated with the service and used to identify it & O \\
Category(s) & A category that this service belongs to (NLP, Finance, ...) & O \\
Specification/ Schema & Pointer to where the schema is located & O \\
Additional metadata & Pointer to where additional metadata is located, including the metadata listed here & O \\
Endpoint & The endpoint of the service & O \\
SLA/Cost & Service level agreement including cost & O \\
Authors & Contact details of the people or organization responsible for the service (freeform string) & O \\
Data & Description of how data is managed & O \\
\hline
\multicolumn{3}{l}{\OK = Required; O = Optional}
\end{tabular}
}
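
As a minimal sketch of how such an entry might be represented in Python, the required attributes can become mandatory fields while a few of the optional ones receive empty defaults. The class \verb|CatalogueEntry|, its field names, and all values below are hypothetical illustrations and not the cloudmesh API.
\begin{verbatim}
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional
from uuid import uuid4


def now() -> datetime:
    return datetime.now(timezone.utc)


@dataclass
class CatalogueEntry:
    # required attributes that must be given
    name: str
    title: str
    public: bool
    description: str
    version: str
    license: str
    microservice: str  # "yes", "no", or "mixed"
    protocol: str
    owner: str
    # required attributes that are generated when omitted
    id: str = field(default_factory=lambda: str(uuid4()))
    created: datetime = field(default_factory=now)
    modified: Optional[datetime] = None
    # a subset of the optional attributes
    documentation: Optional[str] = None
    endpoint: Optional[str] = None
    tags: List[str] = field(default_factory=list)
    categories: List[str] = field(default_factory=list)

    def __post_init__(self):
        # a new entry starts with modified equal to created
        if self.modified is None:
            self.modified = self.created


# hypothetical example entry
entry = CatalogueEntry(
    name="example-sentiment",
    title="Example Sentiment Analysis",
    public=True,
    description="Hypothetical text sentiment service",
    version="0.1.0",
    license="Apache-2.0",
    microservice="yes",
    protocol="REST",
    owner="Example Organization",
    categories=["NLP"],
)
print(entry.id, entry.categories)
\end{verbatim}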
\section*{AI Service Registry.}
The goal of an analytics service registry is to locate and consume analytics services
with persistent identifiers across organizations. It can serve as a public, private, or federated registry. In a federated registry, more than one registry is joined to give the user the impression of a single registry.
Within the analytics services we distinguish two classes. The first class comprises instantiated (running) services that are offered by a service provider and allow direct reuse. The second class comprises library providers that distribute analytics activity not as an instantiated service, but as a source code library that can be deployed as a service. The list of attributes is presented in the next table.
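
The federated view described above could be sketched in Python as follows, assuming each registry is simply a list of dictionaries with an \verb|id| and an optional \verb|categories| list. The function \verb|federated_find| and the sample entries are hypothetical and only illustrate the idea of searching several registries as if they were one.
\begin{verbatim}
from itertools import chain


def federated_find(registries, category):
    """Yield matching entries from several registries as one."""
    seen = set()
    for entry in chain.from_iterable(registries):
        found = category in entry.get("categories", [])
        if found and entry["id"] not in seen:
            seen.add(entry["id"])  # report each identifier once
            yield entry


# two hypothetical registries, each a list of entries
private = [
    {"id": "a1", "name": "forecast", "categories": ["Finance"]},
]
public = [
    {"id": "b2", "name": "sentiment", "categories": ["NLP"]},
    {"id": "a1", "name": "forecast", "categories": ["Finance"]},
]

matches = federated_find([private, public], "Finance")
print([entry["name"] for entry in matches])
\end{verbatim}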
\section*{Implementation.}
The implementation of the catalogue and registry is available on GitHub as an open-source project \cite{github-cloudmesh-catalogue}. It forms a basis for discussions taking place in the
NIST Big Data Working Group.
\resizebox{0.9\columnwidth}{!}{%
\begin{tabular}{p{3cm}p{11cm}p{0.5cm}p{0.5cm}}
Name & Description & \rot{\shortstack{Service\\ provider}} & \rot{\shortstack{Library\\ provider}} \\
\hline
ID & UUID, globally unique & \OK & \OK \\
Name & Name of the service & \OK & \OK \\
Title & Human readable title & \OK & \OK \\
Public & True if public
(needs a use case to delineate what public/private means) & \OK & \OK \\
Description & Human readable short Description of the Service & \OK & \OK \\
Endpoint & The endpoint of the service & \OK & N/A \\
List of Input Parameters &
A list of parameters to the service. Each parameter has the form name, function, type, value, access. The type indicates the data type. The access indicates whether the parameter is a data stream, database, single value/function, or event.
The function identifies which function the parameter belongs to in case the service provides multiple functions. & \OK & \OK \\
List of Output Parameters
style (event, stream, data)
value
timestamp &
List of responses returned by the service. Each response has the form function, name, type, value, access, timestamp. The type indicates the data type. The access indicates whether the response is a data stream, database, single value/function, or event.
The function identifies which function produced the response in case the service provides multiple functions. & \OK & \OK \\
Version & The version number or tag of the service & \OK & \OK \\
License & The license description & \OK & \OK \\
Protocol & REST & \OK & \OK \\
Modified & Modification Timestamp & \OK& \OK \\
Owner & Name of the distributing entity, organization or individual. It could be a vendor. & \OK & O \\
Author & Contact details of the people or organization responsible for the service & O & \OK \\
Tags & Human-readable common tags that are associated with the service and used to identify it & O & O \\
Categories & A category that this service belongs to (NLP, Finance, ...) & O & O \\
Created & Date and time at which the analytics service was created or instantiated & \OK & \OK \\
Heartbeat & State and timestamp of the last check when the service was active & O & N/A \\
Documentation & Link to a URL with detailed description of the service & O & O \\
Source & Link to the source code if available & O & O \\
Specification & Pointer to where specification schema is located & O & O \\
AdditionalMetadata & Pointer to where additional metadata is located, including the metadata listed here & O & O \\
SLA & Service level agreement including cost & O & O \\
CachingInterval & If a service is accessed frequently, the caching interval can be used to place a limit on the response with an LRU cache & O & N/A \\
DataIntegration & In the case of big data, the data cannot be provided as a parameter to the analysis function. Instead, the data needs to be provided as an endpoint. Often the data needs to be uploaded or can be downloaded. In this case this field provides the upload and download endpoints and the protocol to access the data & O & O \\
\hline
\multicolumn{4}{l}{\OK = Required; O = Optional; N/A = Not applicable}
\end{tabular}
}
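
To illustrate how the two provider classes differ, the following Python sketch checks a registry entry against the required and not applicable attributes from the table. It is a hypothetical illustration under simplifying assumptions: entries are plain dictionaries, and the key names, the function \verb|check|, and the sample endpoint are not part of the cloudmesh implementation.
\begin{verbatim}
# attributes required for both provider classes
COMMON = {
    "id", "name", "title", "public", "description",
    "inputs", "outputs", "version", "license",
    "protocol", "modified", "created",
}
REQUIRED = {
    "service": COMMON | {"endpoint", "owner"},
    "library": COMMON | {"author"},
}
NOT_APPLICABLE = {
    "service": set(),
    "library": {"endpoint", "heartbeat", "caching_interval"},
}


def check(entry, kind):
    """Return a list of problems found in a registry entry."""
    keys = set(entry)
    missing = sorted(REQUIRED[kind] - keys)
    invalid = sorted(NOT_APPLICABLE[kind] & keys)
    return ([f"missing: {key}" for key in missing]
            + [f"not applicable: {key}" for key in invalid])


# a hypothetical, incomplete library-provider entry
entry = {
    "id": "c3",
    "name": "clustering",
    "endpoint": "http://localhost:8080",
}
for problem in check(entry, "library"):
    print(problem)
\end{verbatim}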
\section*{Conclusion.}
A service catalogue and registry can publicize and improve end-user access to data from different sources by overcoming some of the challenges inherent in describing and distributing document content and format. Publication and discovery of information resources that are enriched with metadata enable the findability and reusability of analytics services. By describing the interfaces and allowing for the instantiation of services or the reuse of already instantiated ones, we address accessibility and interoperability requirements. With respect to analytics as a service, end users should be able to find various analytics services and similar services without having to individually search multiple `locations' or databases, each built to operate on its own with unique storage and retrieval constructs. Through these descriptions, automated service integration can be provisioned that targets not only the functionality involved but also allows service-level considerations to be addressed. Furthermore, such analytics services could have significant security implications, such as protecting a database by exposing only a subset of analytics functions that are approved for execution on the data sets. This includes partial and controlled sharing of data mashups that can be made available to the community and registered to make reuse easier without requiring service replication.
{\tiny
\section*{\footnotesize Acknowledgment.}
We would like to thank Lamara DeChelle Warren for initiating contact with the UROC students so that they could take part in this effort.
\bibliographystyle{IEEEtran}
\bibliography{references}
}
\end{multicols}
\end{document}