Commit

improved edu section
laszewsk committed Feb 15, 2023
1 parent a57c98b commit 759ddeb
Showing 8 changed files with 361 additions and 124 deletions.
Binary file modified images/nist-analytics-processes.pptx
61 changes: 61 additions & 0 deletions section-dev.tex
@@ -0,0 +1,61 @@
\section{Insights into Development from the Earthquake Code}

The original code was developed by a single researcher with the goal to create a DL method called {\em tevelop} that applies spatial timeseries evolution to multiple applications, including earthquake, hydrology, and COVID prediction. The code was developed in a large Python Jupyter notebook on Google Colab. The total number of lines of code was \TODO{line number}. All definitions of variables and hyperparameters were included in the code itself.


The original notebook had the following characteristics:

\begin{itemize}
\item It was easy to develop by the author and contained many
  experimental aspects, but it was difficult for others to maintain
  and to understand.
\item All variables were defined in the code and not in a
  configuration file.
\item It produced lots of graphical outputs to support interactive
  development.
\item It made no use of libraries and only limited use of functions.
\item If conditions distinguished the different science applications.
\item \TODO{How many lines of code?}
\end{itemize}


The large code was too difficult to maintain in Colab. A tool such as
{\em papermill} allows notebooks to be parameterized and executed
outside of the interactive browser interface. As MLCommons focuses on
one science application at a time, and students could not comprehend
the original code, the code was rewritten to focus only on the
earthquake application and to place selected hyperparameters into a
configuration file.
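To illustrate the rewrite, the following minimal sketch shows how hyperparameters that were previously hard-coded in the notebook can be moved into a configuration file. The parameter names and values are hypothetical, and the actual project uses a cloudmesh YAML configuration rather than the stdlib JSON shown here:

```python
import json

# Hypothetical hyperparameters that were previously hard-coded in the
# notebook; the real project keeps them in a cloudmesh YAML config.
config = {
    "application": "earthquake",
    "epochs": 2,
    "batch_size": 32,
    "time_steps": 13,
}

# Write the configuration once ...
with open("config.json", "w") as f:
    json.dump(config, f, indent=2)

# ... and read it back in the training program, so that experiments can
# be changed without editing the code.
with open("config.json") as f:
    hyperparameters = json.load(f)

print(hyperparameters["epochs"])  # → 2
```

This separation is what allows multiple runs with different settings without touching the program itself.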


The rewritten code includes
\begin{itemize}
\item the setup for training, validation, and comparing the output,
\item still not much use of libraries, and
\item choices.
\end{itemize}

Two designs were considered. First, multiple runs can be developed
within the program based on the variation of additional time-based
internal hyperparameters; this leads to a long runtime but requires no
changes to the evaluation section of the code. Second, these
parameters can be taken out and placed in a configuration file; then
multiple runs are needed and the comparison has to be separated from
the program, which requires many changes to the program, but each
individual run is shorter.


The rewritten code uses libraries for MLCommons benchmarking and
cloudmesh, provides a portable way to define data locations via the
configuration file, and supports experiment permutation over
hyperparameters. This includes
\begin{itemize}
\item repeated experiments,
\item a separate evaluation and comparison of accuracy, which was not
  part of the original code, and
\item a comparison of accuracy across different hyperparameter
  searches.
\end{itemize}
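The experiment permutation over hyperparameters can be sketched as follows; the grid values are made up for illustration and are not the benchmark's actual hyperparameters:

```python
from itertools import product

# Hypothetical hyperparameter grid; the real benchmark varies
# time-based internal hyperparameters of the model.
grid = {
    "learning_rate": [1e-3, 1e-4],
    "batch_size": [16, 32],
    "repeat": [1, 2, 3],  # repeated experiments
}

# Every permutation corresponds to one independent benchmark run whose
# results can later be compared in a separate evaluation step.
runs = [dict(zip(grid, values)) for values in product(*grid.values())]

print(len(runs))  # → 12 (2 * 2 * 3)
```

Because each run is described by its own parameter dictionary, the evaluation and accuracy comparison can be performed as a separate step over all runs.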
33 changes: 16 additions & 17 deletions section-earthquake.tex
@@ -51,36 +51,35 @@ \subsection{Earthquake Data}
benchmark \cite{las-22-mlcommons-science}.


\begin{table}
\caption{Summary of the Earthquake {\em tevelop} Benchmark}\label{tab:eq-summary}
% \resizebox{1.0\textwidth}{!}{
\begin{center}
{\footnotesize
\begin{tabular}{p{0.2\columnwidth}p{0.2\columnwidth}p{0.45\columnwidth}}
\hline
{\bf Area} & \multicolumn{2}{l}{Earthquake Forecasting~\cite{fox2022-jm,TFT-21,eq-code,eq-data}.}\\
\hline
{\bf Objectives} & \multicolumn{2}{l}{Improve the quality of Earthquake forecasting in a region of Southern California.}\\
\hline
{\bf Metrics} & \multicolumn{2}{l}{Normalized Nash-Sutcliffe model efficiency coefficient (NNSE) with $0.8\leq NNSE\leq 0.99$}\\
\hline
{\bf Data} & Type: & Richter Measurements with spatial and temporal information (Events). \\
 & Input: & Earthquakes since 1950.\\
 & Size: & 11.3GB (Uncompressed), 21.3MB (Compressed)\\
 & Training samples: & 2,400 spatial bins\\
 & Validation samples: & 100 spatial bins\\
 & Source: & USGS Servers~\cite{eq-data}\\
\hline
{\bf Reference Implementation} & \cite{eq-code} & \\
\hline
\end{tabular}
}
\end{center}
% }
\end{table}
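The metric in the summary table, the normalized Nash-Sutcliffe model efficiency (NNSE), is commonly obtained from the standard Nash-Sutcliffe efficiency (NSE) via $NNSE = 1/(2-NSE)$, which maps NSE in $(-\infty,1]$ onto $(0,1]$. The following sketch illustrates the computation; the observed and predicted values are made up and not taken from the benchmark:

```python
# Nash-Sutcliffe efficiency (NSE) and its normalized form
# NNSE = 1 / (2 - NSE). Data values are illustrative only.
def nse(observed, predicted):
    mean = sum(observed) / len(observed)
    num = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    den = sum((o - mean) ** 2 for o in observed)
    return 1.0 - num / den

def nnse(observed, predicted):
    return 1.0 / (2.0 - nse(observed, predicted))

observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print(round(nnse(observed, predicted), 3))  # → 0.98
```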


\subsection{Implementation}
169 changes: 169 additions & 0 deletions section-edu-ml.tex
@@ -0,0 +1,169 @@
\section{Insights of Machine Learning in Education}

Before we start with the insights from MLCommons, we would like to
first review some of our experience regarding topics taught in
educational activities in machine learning in general. We distinguish
machine learning {\em methods}, {\em applications} that use or can use
machine learning, the {\em libraries} that are used to apply these
methods for applications, software development {\em tools}, and the
{\em infrastructure} that is needed to execute them.

\subsection{Methods}

We list a number of keywords that relate to typical methods in machine
learning (ML) and artificial intelligence (AI) that may be taught in
classes. This includes clustering (exemplified via k-means), image
classification, sentiment analysis, time series prediction, surrogates
(a new topic that is often not yet taught), and neural networks (with
variants such as CNNs and ANNs), as well as methods such as KNN and
SVM. In addition, general topics of interest are supervised and
unsupervised learning. More traditional methods include random
forests, decision trees, and genetic algorithms.

From this small list, we can already see that this cannot be taught
in a one-semester course in sufficient depth, but needs to span the
duration of a student's curriculum.

\subsection{Libraries}

Many libraries and tools exist that support AI. We list here a subset
of frequently used software libraries and tools that enable the
machine learning engineer and student to write ML applications.

First, we note that at the university level, the predominant language
in machine learning and data science has become Python. This is
explainable due to the availability of sophisticated libraries such as
scikit-learn, PyTorch, and TensorFlow.

More recently, we see a trend that PyTorch has become more popular at
the university level than TensorFlow. Although the learning curve of
these tools is significant, they provide invaluable opportunities
when applied to many different applications.

In contrast, other specialized classes that focus on the development
of faster GPU-based methods typically use C++ code leveraging the
vendor's specialized libraries to interface with the GPUs, such as
NVIDIA CUDA.

\subsection{Tools}\label{sec:tools}

In order to efficiently use the methods and libraries, and also the
infrastructure which we discuss later, students need a basic
understanding of software engineering tools. This includes an editor
and a code management system.

The common choice for managing code is Git, typically hosted on
GitHub; alternatively, one also finds the use of GitLab. These code
management systems are key to implementing teams that share the
developed code and allow collaborative code management. However, they
come with a significant learning curve.

Another important aspect is the use of a capable editor that supports
Python syntax with code highlighting and code inspection. The use of
tools such as Notepad++, IDLE, or other simplistic editors is
insufficient. Instead, students ought to use tools such as {\em
PyCharm} or, as an alternative choice, {\em VSCode}. These editors
provide sophisticated features to improve code quality and also
integrate with Git. One of the strengths of PyCharm is that it has a
sophisticated code inspector and auto-completion, making writing
reliable code faster. VSCode may be known to some students, but its
default features seem not to match those of PyCharm.

An additional tool is Jupyter, with its successor JupyterLab. It
provides a web browser interface to interactive Python notebooks
(ipynb). The strength here is a rich external ecosystem that allows
running programs in an interactive fashion while integrating graphics
components and data frames to conduct data analysis. The biggest
disadvantage we saw in using notebooks is that the code developed by
the students does not follow proper software engineering practices
such as defining and using functions, classes, and self-defined
installable Python libraries that make code management sustainable and
easier. Often we found that code developed as a Jupyter notebook is
also poorly documented, although the integration of markdown as a
documentation feature is built in. This relates to a general problem
at the university level. While the material taught in ML fills more
than a semester, students often come ill-prepared for ML classes, as
typical classes that teach Python deal only with language aspects but
not with a sustainable {\em practical} software engineering approach,
for example, in Python.
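As a small illustration of the software engineering practice referred to above, a computation that would typically live in the global scope of a notebook cell can be moved into a documented, testable function. The function and values below are hypothetical and only stand in for such a refactoring:

```python
# A typical notebook cell mixes loading, computation, and plotting in
# global scope. Extracting the computation into a small, documented
# function makes it testable, reusable, and installable as a library.

def normalize(values):
    """Scale values linearly to the range [0, 1]; constant input maps to 0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

print(normalize([2.0, 4.0, 6.0]))  # → [0.0, 0.5, 1.0]
```

Such functions can then be collected into a self-defined installable Python package and exercised by unit tests, which is difficult when the same code is scattered across notebook cells.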

\subsection{Infrastructure}

An additional aspect ML students are exposed to is the need for access
to computational resources due to the computational needs posed by the
ML implementations for applications. One common way of dealing with
this is to use Google Colab, which is easy to access and to use,
in a limited fashion, for free (larger computational needs will require
a paid subscription). However, as Colab is based on Jupyter, we
experience the same disadvantages as discussed in
Section~\ref{sec:tools}.

Other resources for machine learning can be found in the cloud. This
may include IaaS and PaaS offerings from Amazon, Azure, Google Cloud,
Salesforce, and others. In addition to the computational needs for
executing neural networks and deep learning algorithms, we also find
services that can be accessed mainly through REST APIs to be
integrated into application research. The most popular such tools
focus on natural language processing, such as translation and, more
recently, text analysis and responses through ChatGPT and Bard.

However, many academic institutions have access to campus-level and
national-level computing resources in HPC centers. This includes
resources from DOE and NSF. Such computing resources are accessed
mostly through traditional batch queues (such as SLURM) to allow
sharing of the limited resources with the large user community. For
this reason, centers often implement a scheduling policy putting
significant restrictions on the computational resources that can be
used at the same time, and/or for a particular period. The number of
files and the access to a local disk on the compute nodes constituting
the HPC resource may also be limited. This presents a potentially very
high entry barrier, as these policy restrictions may not be integrated
into the application design from the start. Moreover, in some cases,
these restrictions may impose a significant performance penalty when
data is placed in a slow NFS file system instead of directly in memory
(often the data does not fit in memory) or in NVMe storage if it
exists on the compute nodes. It is also important to understand that
such nodes may be shared with other users, and it is important to
provision the requirements with regard to computation time, memory
footprint, and file storage accurately so that scheduling can be
performed most expediently. Furthermore, the software on these systems
is managed by the computing staff, and it is best to develop with the
versions provided, which may already be outdated. Container
technologies limit this issue by allowing users, at centers that
support this, to develop their own software stack in containers. The
most popular choice for this is Singularity, but some centers also
offer Docker. As the images developed can be rather large, it is not
sufficient to just copy the image from a local computer; the center
must allow the ability to create the image within the HPC
infrastructure. This is especially true when the university requires
all resources to be accessed through a VPN, where one can often see a
factor of 10 or more slowdown in transfer and access speeds.
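As a concrete illustration of the batch-queue access described above, a minimal SLURM batch script for a GPU run might look as follows. This is a sketch only: the partition, module, environment, and file names are assumptions and vary from center to center:

```bash
#!/bin/bash
# Minimal sketch of a SLURM batch script for a PyTorch run.
# Partition, account, module, and file names are hypothetical
# and center-specific.
#SBATCH --job-name=earthquake-bench
#SBATCH --partition=gpu
#SBATCH --gres=gpu:1
#SBATCH --time=04:00:00
#SBATCH --mem=32G
#SBATCH --output=%x-%j.out

module load python/3.10        # software versions are managed by center staff
source activate benchmark      # or a venv set up within the allocation
python train.py --config config.yaml
```

Even such a short script already encodes the scheduling-policy concerns discussed above: the requested time, memory, and GPU count must be provisioned accurately for the job to be scheduled expediently.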

All this has to be learned and takes considerable time. Hence, using
HPC resources has to be introduced with specialized educational
efforts, often provided by the HPC center. However, sometimes these
courses are not targeted specifically at running a particular version
of PyTorch or TensorFlow with cuDNN, but only at the general aspects
of accessing the queues.

Furthermore, customized queues demanding specific allocations,
partitions, and resource requirements may not be published, and a
burden is placed on the faculty member to integrate this accurately
into the course curriculum.

Access to national-scale infrastructure is often restricted to
research projects and requires following a detailed application
process. This process is handled by the faculty supervisor and not the
student. Background checks and review of the project may delay the
application. Additional security requirements, such as the use of DUO
and multifactor authentication, have to be carefully taught.

In case a benchmark includes environmental monitoring, such as
temperatures on the CPU/GPU and power consumption, access may be
enabled through default libraries and can be generalized while
monitoring the environmental controls over time. However, HPC centers
may not allow access to the overall power consumption of entire
compute racks, as it is often very tightly controlled and only
accessible to the HPC operational support staff.

12 changes: 6 additions & 6 deletions section-edu.tex → section-edu-mlcommons.tex
@@ -1,6 +1,5 @@
\section{Insights of MLCommons in Education}
\label{sec:edu-insights}

The MLCommons benchmarks provide a valuable starting point for
educational material addressing various aspects of the machine and
@@ -70,7 +69,7 @@ \section{Educational Insights}
software related to big data systems. This includes setting up
Python beyond the use of conda and Colab notebooks, the use of
queueing systems, containers, and cloud computing software for
AI, DL, and HPC experiments as well as other advanced aspects of
software engineering.

\item {\bf Execution ecosystem.} While in-class problems typically
@@ -99,7 +98,7 @@ \section{Educational Insights}
DL algorithms that require a large number of fast IO interactions.

\item {\bf Data Analysis.} The examples provide valuable input to
further enhance abilities to conduct non-trivial data analysis
through advanced Python scripts while integrating them in
coordinated runs to analyze log files that are created to
validate the numerical stability of the benchmarks. This obviously
@@ -117,9 +116,10 @@ \section{Educational Insights}

\item {\bf Benefits to Society.} The MLCommons benchmarks are
including opportunities to improve the quality of ML algorithms
that can be applied to societal tasks. Obviously, improving
benchmarks such as earthquake forecasting is beneficial to
society and can motivate students to participate in such
educational opportunities.

\end{itemize}

38 changes: 24 additions & 14 deletions section-mlcommons.tex
@@ -1,11 +1,17 @@

\section{MLCommons}

MLCommons is a non-profit organization that has the goal to
accelerate machine learning innovation to benefit everyone with the
help of more than 70 members from industry, academia, and government
\cite{www-mlcommons}. Its main focus is developing standardized
benchmarks for measuring the performance of systems using machine
learning while applying them to various applications. This includes,
but is not limited to, application areas from healthcare, automotive,
image analysis, and natural language processing. MLCommons is
concerned with benchmarking training \cite{mlperf-training} and
validation algorithms to measure progress over time. Through this,
MLCommons investigates machine learning efforts in the areas of
benchmarking, datasets in support of benchmarking, and best practices
that leverage machine learning.

@@ -18,19 +24,13 @@ \section{MLCommons}
Medical, Science, Storage. The science working group is concerned
with improving the science, not just providing a static benchmark
\cite{las-22-mlcommons-science}.

A list of selected benchmarks for the working groups focusing on
inference, training, and science is shown in
Table~\ref{tab:mlcommons-benchmarks}.

\subsection{MLCommons Selected Benchmarks}

\begin{table}[htb]
\caption{MLCommons Benchmarks}
\label{tab:mlcommons-benchmarks}
\bigskip

\resizebox{\linewidth}{!}{
@@ -60,3 +60,13 @@ \subsection{MLCommons Selected Benchmarks}
}

\end{table}

Due to the strong affiliation with industry, as well as the
integration of National Labs and academic high-performance computing
centers, MLCommons provides a well-positioned starting point for
academic participation. Over the last years, we have participated
significantly in its efforts and integrated efforts from MLCommons
into our educational activities. Since its inception, we have
leveraged the MLCommons activities and obtained a number of important
educational insights that we discuss in this paper.
