+Except where otherwise noted, content is published under a [CC-BY-4.0 international license](https://creativecommons.org/licenses/by/4.0/).
-## More Jekyll!
@@ -0,0 +1,59 @@
+layout: post
+title: "Can I teach my computer to do some research for me?"
+date: 2018-06-25 11:52:15 +0200
+author: Susan Branchett
+image: 2018-06-25-neural-network.png
+**This is the question that brought 80 scientific researchers together for a hands-on Deep Learning Workshop.**
+Deep learning is a type of machine learning that trains a computer to perform human-like tasks, such as making predictions and recognising speech and images. With the buzz around artificial intelligence, there’s increasing interest in this subject and what it can offer researchers.
+# The Workshop
+The first part of the workshop covered the basic theory of Deep Learning. TU Delft’s [Dr. Jan van Gemert](https://www.tudelft.nl/en/eemcs/current/humans-of-eemcs/jan-van-gemert/) explained the differences between Deep Learning and other types of Machine Learning. He introduced the concept of a ‘feed-forward’ network and worked through the algebra to train a network to produce the correct result for a simple problem (an exclusive-or gate). He then introduced the convolutional network and explained how it can be used to solve image recognition problems, the topic of his own research. Jan has an ‘active learning’ style and the participants took full advantage of the chance to ask questions.
+For the second part of the workshop [Osman Kayhan](https://www.tudelft.nl/en/eemcs/the-faculty/departments/intelligent-systems/pattern-recognition-bioinformatics/pattern-recognition-bioinformatics-computer-vision-lab/people/osman-semih-kayhan/) and [Paul van Gent](https://www.tudelft.nl/en/ceg/about-faculty/departments/transport-planning/staff/personal-pages/gent-p-van/) prepared hands-on examples for the participants to experiment with their own feed-forward networks and to give them practice with convolutional networks. The participants had come well prepared. They all had at least a basic knowledge of Python, NumPy and Jupyter Notebooks. They had also pre-installed the required software (Python, Jupyter Notebooks, TensorFlow and Keras, using Anaconda). This meant that Osman and Paul could spend more of their time helping the participants to understand the Deep Learning concepts.
+![Participants]({{ "/assets/img/2018-06-25-participants.jpg" | absolute_url }})
+# So how did this workshop come about?
+The Innovation Department of TU Delft central ICT Department was set up to support our academic staff in their primary processes of scientific research and education. The workshop was an activity the department organised as an experiment, to see if researchers would appreciate support with Deep Learning.
+The answer was a definite: YES! The workshop had a maximum capacity of 80 participants and registration had to be closed within 24 hours of the initial announcement.
+![Full]({{ "/assets/img/2018-06-25-full.jpg" | absolute_url }})
+When we asked participants how the department could support them in the future, their ideas included: programming language courses (Python, C++, Arduino, etc.), coding helpdesk, machine learning course, more in-depth deep learning course, more targeted seminars per faculty, course on time series analysis, GPU computing, cloud computing, build a GPU cloud, ...
+Plenty to think about for the future.
+# What’s next?
+Jan, Osman and Paul very generously donated their time for this workshop. If you have ideas about how we could carry out further workshops of this nature, for example by teaming up with the Graduate School or the Delft Data Science institute, then feel free to send them to me. See 'About the Author' below.
+If you’re interested in the subject of Deep Learning, you might like the free online book: . Jan used chapters 6, 6.1, 4.3, 5.9 for this workshop.
+If you’re in Delft, you could also consider signing up for Jan’s Master’s course:
+If you’d like to enrol for upcoming workshops, please consult the ‘Upcoming events’ section of our website:
+# About the Author
+Susan Branchett is Expert Research Data Innovation in the ICT-Innovation department of the TU Delft. She has a Ph.D. in physics and many years’ experience in software development and IT.
+Find her at
+[TU Delft](https://www.tudelft.nl/staff/s.e.branchett/) or
+[LinkedIn](https://linkedin.com/in/sebranchett) or
+This blog expresses the views of the author.
+# Acknowledgements
+Thank you to colleague [Julie Beardsell](https://www.tudelft.nl/staff/j.a.beardsell/) for editing this blog and adding the introductory paragraph.
+The neural network image is from: and is used under the [Creative Commons Attribution 2.5 international license](https://creativecommons.org/licenses/by/2.5/).
+The pie-chart was created using Google forms.
+The full sign image is from: and is used under the [Creative Commons Public Domain license Mark 1.0](https://creativecommons.org/publicdomain/mark/1.0/).
+Except where otherwise noted this blog is available under a [Creative Commons Attribution 4.0 international license](https://creativecommons.org/licenses/by/4.0/).
+layout: post
+title: "Using Jupyter to study Earth"
+date: 2018-12-06 11:52:15 +0200
+author: Susan Branchett
+image: 2018-12-10-thelatestfrom-jupiter.jpg
+**How TU Delft’s ICT-Innovation department is providing hands-on help to researchers in order to understand their IT requirements better.**
+# How did it come about?
+Earlier this year I was reading through the [‘TU Delft Strategic Framework 2018-2024’](https://d1rkab7tlqy5f1.cloudfront.net/TUDelft/Over_TU_Delft/Strategie/Towards%20a%20new%20strategy/TU%20Delft%20Strategic%20Framework%202018-2024%20%28EN%29.pdf) and buried deep within its pages I found this hidden gem:
+> We strengthen the social cohesion and interaction within the organisation, by:
+> * Supporting mobility across the campus. For example through interfaculty micro-sabbaticals.
+> * Stimulating joint activities and knowledge exchange across the various faculties and service departments.
+> * Strengthening relations between academic staff members and support staff.
+![Hidden Gem]({{ "/assets/img/2018-12-10-secret-diamond-wedding-band.jpg" | absolute_url }})
+This seemed especially relevant to our ICT-Innovation department. We are continually on the look-out for ways to support the primary processes of the university, research and education, by applying IT solutions. I decided to find myself a suitable micro-sabbatical.
+Since October 2018 I’ve been spending one day a week in the group of [Prof.dr.ir. Nick van de Giesen](https://www.tudelft.nl/en/staff/n.c.vandegiesen/) and [Dr.ir. Rolf Hut](https://www.tudelft.nl/en/staff/r.w.hut/), working with their bright, new Ph.D. student, Jerom Aerts, on the eWaterCycle II project.
+# What’s it about?
+eWaterCycle II aims to understand water movement on a global scale in order to predict floods, droughts and the effect of land use on water. You can read more about it here or here .
+![Sacramento River Delta]({{ "/assets/img/2018-12-10-Islands_Sacramento_River_Delta_California.jpg" | absolute_url }})
+Hydrologists are encouraged to use their own local models within a global hydrological model.
+In order to test whether their model is working properly, the project team is developing a Python [Jupyter notebook](https://jupyter.org/) that makes it easy for hydrologists to produce the graphs and statistics that they are familiar with.
+During my micro-sabbatical, I am contributing to the development of this [Jupyter notebook](https://github.com/eWaterCycle/hydro-analyses/blob/master/eosc_pilot/forecast_ensemble_analyses.ipynb).
+# What did I learn?
+* Wi-Fi is an essential service for researchers and needs to be reliable
+* Standard TU Delft laptops are not adequate for research
+* Data for this project is hosted in Poland due to the collaboration with many partners and funding from [EOSC](https://ec.europa.eu/research/openscience/index.cfm?pg=open-science-cloud)
+* The team initially hosted their forecasting site on AWS, because AWS is quick to set up and it works in all the countries involved. For the minimum viable product of the global hydrology model they moved to the [SURFsara HPC Cloud](https://userinfo.surfsara.nl/systems/hpc-cloud)
+* If data is not open, then researchers are hesitant to use it. Their work can’t be reproduced easily, leading to fewer quality checks and less publicity
+* In the face of bureaucracy, cramped conditions and an ever growing number of extra required activities, our researchers’ determination and passion for their field of expertise are truly magnificent
+![TU Delft light bulb]({{ "/assets/img/2018-12-10-TU-Delft-light-bulb.jpg" | absolute_url }})
+I shall be using these insights to guide my work within the ICT-Innovation department and to feed our conversations with the Shared Service Center.
+# What next?
+From 1st April 2019 I’ll be moving on to my next micro-sabbatical at the Chemical Engineering department of the Applied Sciences faculty. There I shall be installing molecular simulation software on a computer cluster and getting it up and running.
+My ambition is to cover all 8 faculties of the TU Delft within 4 years. In October 2019 I shall be available for the next micro-sabbatical. If you have any suggestions, please do not hesitate to get in touch.
+# About the Author
+Susan Branchett is Expert Research Data Innovation in the ICT-Innovation department of the TU Delft. She has a Ph.D. in physics and many years’ experience in software development and IT.
+Find her at
+[TU Delft](https://www.tudelft.nl/staff/s.e.branchett/) or
+[LinkedIn](https://linkedin.com/in/sebranchett) or
+[Twitter](https://twitter.com/sebranchett) or
+This blog expresses the views of the author.
+# Acknowledgements
+The image of Jupiter is from [here](https://www.jpl.nasa.gov/spaceimages/details.php?id=pia21974). Image credit: NASA/JPL-Caltech/SwRI/MSSS/Kevin M. Gill. [License](https://www.jpl.nasa.gov/imagepolicy/).
+The hidden gem image is from [here](https://www.macintyres.co.uk/diamond-fancy-wedding-rings-/8075-18ct-yellow-gold-secret-diamond-wedding-band.html) and is reproduced by kind permission of Macintyres.
+The Sacramento River Delta image is from [here](https://commons.wikimedia.org/wiki/File:Islands,_Sacramento_River_Delta,_California.jpg) and is reproduced under a [CC-BY-2.0](https://creativecommons.org/licenses/by/2.0/) license.
+Except where otherwise noted this blog is available under a [CC-BY-4.0 international license](https://creativecommons.org/licenses/by/4.0/).
+layout: post
+title: "Results of the TU Delft Research Data Storage survey"
+date: 2019-05-21 00:00:00 +0000
+author: Susan Branchett
+image: 2019-05-21_survey_header.png
+**Survey results presented at the Second Data Champions meeting.**
+On the 21st May 2019, the TU Delft Data Champions held their second meeting. I was invited to present the results of the Research Data Storage survey, held earlier in the year.
+You can read all about the Data Champions meeting, and find my presentation, in this [blog post](https://openworking.wordpress.com/2019/07/03/the-second-tu-delft-data-champions-meeting/)
+by Esther Plomp, Maria Cruz, Marta Teperek, Santosh Ilamparuthi and Yasemin Turkyilmaz-van der Velden.
+The results are also publicly available on [GitHub](https://github.com/sebranchett/2019-Survey-Data-Storage-Requirements).
+# About the Author
+Susan Branchett is Expert Research Data Innovation in the ICT-Innovation department of the TU Delft. She has a Ph.D. in physics and many years’ experience in software development and IT.
+Find her at
+[TU Delft](https://www.tudelft.nl/staff/s.e.branchett/) or
+[LinkedIn](https://linkedin.com/in/sebranchett) or
+[Twitter](https://twitter.com/sebranchett) or
+This blog is made available under a [CC-BY-4.0 international license](https://creativecommons.org/licenses/by/4.0/).
+layout: post
+title: "Helping Researchers do IT"
+date: 2019-08-29 00:00:00 +0100
+author: Susan Branchett
+image: 2019-08-29-cc-sb-illustration.png
+Want to know more about the work of the TU Delft ICT-Innovation team? Read this blog about how [Susan Branchett](https://www.tudelft.nl/staff/s.e.branchett/) is helping researchers do IT, written by [Connie Clare](https://www.linkedin.com/in/connie-clare/):
+Connie was a PhD student at the University of Nottingham who spent 3 months in the summer of 2019 on an internship at TU Delft, working on the Data Champions Programme. Read more of her blogs here:
+Illustration by Connie Clare.
+layout: post
+title: "How can the TU Delft support your research computing needs?"
+date: 2020-03-19 00:00:00 +0200
+author: Susan Branchett
+image: 2020-03-19_compute_header.jpg
+**Laptop can’t keep up with your research needs? Try these resources…**
+## Why this blog?
+Sometimes you just can’t solve your research computing problems with a laptop alone. Maybe you’re trying to capture data from an experiment that’s running over several days. Maybe your computer simulation needs more computer memory than you can fit into your laptop. Or maybe several members of your group need to use the same software installation. Perhaps you need to host a website for your research project.
+Whatever the reason, this blog post lists resources you may find useful, and people that may be able to help you further.
+To access some of the links, you may have to log in with your TU Delft NetID.
+## Human help
+![Header]({{ "/assets/img/2020-03-19_compute_alpaca.jpg" | absolute_url }})
+*Do you have the kind of prickly problem we like to sink our teeth into?*
+Each faculty has its own Faculty IT Manager (FIM). You can find your FIM [here](https://intranet.tudelft.nl/en/-/faculty-it-manager) (log into intranet.tudelft.nl first and then click on the 'here' link). If you need to discuss your IT requirements, your FIM is your first point of contact.
+Is your problem more data related? Each faculty has a Data Steward. You can find your Data Steward [here](https://www.tudelft.nl/en/library/current-topics/research-data-management/r/support/data-stewardship/contact/).
+Perhaps one of your fellow researchers has already solved your type of problem. Maybe it’s worth contacting a [Data Champion close to home](https://www.tudelft.nl/en/library/current-topics/research-data-management/r/data-stewardship/data-champions/our-data-champions/).
+Still no joy? Please feel free to contact me or one of my colleagues at ICT-Innovation. Chances are that you have just the kind of problem we like to sink our teeth into. You can find us [here](https://www.tudelft.nl/ict-innovation/about-innovation/).
+## Computer resources
+![Header]({{ "/assets/img/2020-03-19_compute_server.jpg" | absolute_url }})
+*Is the biggest threat to your research at this moment, the cleaner who needs to plug in the vacuum cleaner?*
+# Nonstandard laptops, desktops and specials
+Please discuss this with your FIM first and then order [here](https://intranet.tudelft.nl/-/ordering-hardware-laptop-desktop-and-paraphernalia-) (log into intranet.tudelft.nl first and then click on the 'here' link).
+# Research group servers and clusters
+Some groups and departments at the TU Delft have their own locally maintained computer clusters. Ask your colleagues or your FIM. There are also a number of centrally maintained computer clusters. You can find out more about these clusters, and apply for access [here](https://hpcwiki.tudelft.nl/index.php/Introduction).
+# Virtual Machines (VM) / Hosted Servers
+If you are looking for a maintained, ‘always on’ computer with an operating system, network access, backup and not much else, then a [hosted server](https://intranet.tudelft.nl/en/-/hosting-servers) could be what you are looking for (log into intranet.tudelft.nl first and then click on the 'hosted server' link).
+Personal anecdote: I find this particularly useful. I have a standard Windows laptop and I work on a number of different projects at the same time for which Linux is more appropriate. I apply for a hosted server for the duration of the project and within a few days I have what I need, with thanks to my friendly FIM.
+## DHPC – Delft High Performance Computing
+Hopefully coming at the beginning of 2021!
+## SURF National Computer Facilities
+If you need more computational power, then you might consider the national computer facilities provided by SURF, [here]( https://www.surf.nl/en/which-compute-service-for-which-research-question).
+- [Cartesius – large scale parallel](https://userinfo.surfsara.nl/systems/cartesius)
+- [Lisa – ‘friendly’ processing power](https://userinfo.surfsara.nl/systems/lisa)
+- [HPC cloud – for individual or group](https://userinfo.surfsara.nl/systems/hpc-cloud)
+- [Grid – extremely large datasets](https://userinfo.surfsara.nl/systems/grid)
+For ‘large’ requests, [applications go through NWO](https://www.nwo.nl/en/funding/our-funding-instruments/enw/computing-time-on-national-computer-facilities/access-to-the-national-computer-facilities-for-regular-projects.html).
+For ‘small’ requests, [applications go through SURF](https://www.surf.nl/en/applying-for-access-to-compute-services)
+Alternatively, your FIM can arrange access to Cartesius and Lisa on a pay per use basis: see [here](https://userinfo.surfsara.nl/systems/shared/rccs).
+## Cloud services
+Of course, if you are not working with (privacy) sensitive data, it’s always possible to use commercial providers such as Microsoft Azure, Amazon AWS, Google GCS, etc. Please discuss this with your FIM first.
+If you have privacy or security concerns please use the TU Delft [Cloud Advice Service](https://intranet.tudelft.nl/en/-/cloud-advice-service-1) (log into intranet.tudelft.nl first and then click on the 'Cloud Advice Service' link).
+## Website Hosting
+If you need to host a website, you can apply for a LAMP environment [here](https://intranet.tudelft.nl/en/-/application-for-lamp-website) (log into intranet.tudelft.nl first and then click on the 'here' link).
+Apparently there’s a new improved LAMP environment coming soon!
+You could also consider using GitHub Pages for GitHub project websites. See [here](https://pages.github.com/) and [here](https://help.github.com/en/articles/using-jekyll-as-a-static-site-generator-with-github-pages).
+If you would like a TU Delft domain name for your website, you can apply for one [here](https://tudelft.topdesk.net/tas/public/ssp/). Log into the self-service portal and then navigate to SOFTWARE & AUTORISATIONS > IT FOR COMMUNICATION > REQUEST DOMAIN NAMES
+Do you need to host a Data Portal? Please consider using the [4TU data repository](https://researchdata.4tu.nl/en/), [DANS](https://dans.knaw.nl/en), [Zenodo](https://zenodo.org/), or a domain specific repository first. If you have to build your own, consider using the [CKAN Data Portal Platform](https://ckan.org).
+Do you need to host a Research Software Directory? Please consider using the [Dutch national directory](https://www.research-software.nl/). If you need to build your own, consider reusing the [software used to build the national directory](https://github.com/research-software-directory/research-software-directory/).
+## Can’t find what you need?
+![Header]({{ "/assets/img/2020-03-19_compute_train.jpg" | absolute_url }})
+*Things can move quickly in the world of scientific computing*
+By the time you read this blog, a lot of the links may no longer work and the information will probably be outdated. No problem! If you can’t find what you are looking for, or if you have a problem not mentioned here, or if you know of additional useful resources, please don’t hesitate to get in touch.
+You can find me at [TU Delft](https://www.tudelft.nl/staff/s.e.branchett/) and on [LinkedIn](https://linkedin.com/in/sebranchett) and on [Twitter](https://twitter.com/sebranchett) and on [GitHub](https://github.com/sebranchett).
+## Acknowledgements
+This blog expresses the views of the author, [Susan Branchett](https://www.tudelft.nl/staff/s.e.branchett/).
+The FIMs, Data Stewards and ICT-Innovation staff are kindly acknowledged for their help in collecting the information provided above.
+[Binary code](https://pixabay.com/photos/binary-binary-code-binary-system-2910663/), [alpaca](https://pixabay.com/photos/alpaca-cactus-teeth-tooth-eat-3647011/) and [train](https://pixabay.com/photos/train-station-transportation-people-2593687/) images are used under the [Pixabay license](https://pixabay.com/service/license/).
+[Server](https://commons.wikimedia.org/wiki/File:Random_Linux_Servers_(2005).jpeg) image by Flickr user Phil! Gold / [CC BY-SA](https://creativecommons.org/licenses/by-sa/2.0).
+This blog was originally published [here](https://www.tudelft.nl/en/ict-innovation/articles/how-can-the-tu-delft-support-your-research-computing-needs/).
+Except where otherwise noted this blog is published under a [CC-BY-4.0 international license](https://creativecommons.org/licenses/by/4.0/).
+layout: post
+title: "Harvesting the benefits of Raspberry Pi Clusters"
+date: 2020-08-03 14:00:00 +0200
+author: Susan Branchett
+image: 2020-08-03_raspberries.jpg
+**Preparing for Delft High Performance Computing (DHPC)**
+# What’s it all about?
+The [TU Delft]( https://www.tudelft.nl/) is currently setting up its High Performance Computing facility ([DHPC]( https://www.tudelft.nl/2019/dcse/grand-opening-go-dhpc-center/)).
+What will this mean for researchers at the TU Delft? How does a cluster of computers actually work? How do you go about taking advantage of such a facility? What do I need to know about parallel computing, before I can support TU Delft researchers using DHPC?
+DHPC is basically a large number of computers, connected together in a clever way. So, in order to answer some of my questions, I decided to prototype my own computing facility, using [Raspberry Pi computers]( https://www.raspberrypi.org/help/what-%20is-a-raspberry-pi/). The DLPC (Delft LOW performance cluster) was born.
+![DLPC]({{ "/assets/img/2020-08-03_DLPC.jpg" | absolute_url }})
+I spent several happy hours following the instructions in this excellent book: “Raspberry Pi Supercomputing and Scientific Programming: MPI4PY, NumPy, and SciPy for Enthusiasts”, by Ashwin Pajankar, ISBN 978-1-4842-2877-7.
+Without too much trouble I managed to cobble together a computer cluster from 8 Raspberry Pi 3 model B, a fast ethernet desktop switch, a pile of cables and power supplies, and a roll of hook-and-loop fastener. The only aspect I managed to waste a lot of time on was the network. This was due to a faulty cable, my typing mistakes and me not really knowing enough about computer networks.
+Reading the instructions again and again, switching cables and checking the network interface configuration files many, many times finally paid off.
+![network]({{ "/assets/img/2020-08-03_networks_maller.jpg" | absolute_url }})
+I was left wondering if that little plastic clip you are afraid will break off could be the most robust part of a computer network.
+Since there is a lot of focus on learning Python at the TU Delft, I decided to concentrate on [mpi4py](https://mpi4py.readthedocs.io). This is an implementation of the Message Passing Interface standard for the Python language. Basically, it enables my Python scripts to run in parallel on a cluster, spreading the workload over different processors and giving me results faster … if I use it correctly.
+# When you know it should work
+A classic example of when parallel computing should speed up a calculation is throwing darts to calculate π. If you have a square dart board with the largest quarter circle possible painted on it, you can estimate the value of π from the number of throws landing inside and outside the circle. One line of maths for those who are interested:
+![equation]({{ "/assets/img/2020-08-03_equation.jpg" | absolute_url }})
+In principle, the more throws, the more accurate your estimation:
+![network]({{ "/assets/img/2020-08-03_dart_boards.jpg" | absolute_url }})
+As you can see, with 1,000 dart throws, the value of π is already approaching the true value (≈3.14159). I decided to stick with 1,000,000 dart throws for the next part.
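The dart-throwing estimate can be sketched in a few lines of plain Python. This is a minimal serial sketch, not the script used for the figures; the function name `estimate_pi` and the fixed seed are assumptions for illustration. It relies on the fraction of throws landing inside the quarter circle approaching π/4.

```python
import random


def estimate_pi(n_throws, seed=None):
    """Estimate pi by throwing darts at a unit square.

    A throw (x, y) lands inside the quarter circle when x^2 + y^2 <= 1.
    The quarter circle covers pi/4 of the square, so pi is roughly
    4 * inside / n_throws.
    """
    rng = random.Random(seed)  # seeded generator, so runs are repeatable
    inside = 0
    for _ in range(n_throws):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / n_throws


if __name__ == "__main__":
    # More throws should, on average, give a closer estimate.
    for n in (10, 1_000, 100_000):
        print(n, estimate_pi(n, seed=42))
```

The estimate is random, so a single run with few throws can land far from π; only the trend over many throws is reliable.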
+The DLPC has 8 Raspberry Pi computers, each with 4 processors. This means that I have 32 processors, or ranks, to play with. I set up calculations to distribute the 1,000,000 dart points over 1, 2, 3, … 32 processors. I was expecting that if I used 2 processors it would take half the time; and with 10 processors a tenth of the time (and with N processors, (time for 1 processor)/N). This is what I got:
+![network]({{ "/assets/img/2020-08-03_pi_with_coords.jpg" | absolute_url }})
+As you can see, I wasn’t getting the performance speed-up I was expecting (1/N). I removed a load of print statements that I used for debugging, but that didn’t help much.
+I started to discuss this with a nearby scientist and tried to convince him that this was due to unavoidable communication between the processors. He didn’t believe a word! Not being able to convince him I was right, I investigated further and eventually discovered what the problem was.
+I really wanted to show you how increasing the number of dart throws increases the accuracy of the estimate for π, so I wrote back all the dart point positions (x- and y-coordinates) and the inside/outside statuses from the worker processors to the central manager processor. This is how I could make the dart board figures above. It turns out that this communication was killing my performance. A classic beginner’s mistake. Once I wrote back only the number of dart points inside the quarter circle from each worker processor to the central manager processor, things got a lot better:
+![network]({{ "/assets/img/2020-08-03_pi_without_coords.jpg" | absolute_url }})
+This was great, except for that strange point at 20 processors. Not sure what happened. Could have been that I accidentally pulled on the network cable, or that a Raspberry Pi got too warm, or that I got impatient and took a look at what was going on, which slowed everything down, or a bug really did crawl into the DLPC.
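Sending back one number per processor instead of every coordinate might look roughly like this with mpi4py. This is a sketch under assumptions, not the author's script (those are linked in the Resources section): the function names, the way the throws are split and the per-rank seeding are all hypothetical. The `try`/`except` fallback is also an addition, so that the same file still runs on a single processor when mpi4py is not installed.

```python
import random

try:
    from mpi4py import MPI  # expected to be available on the cluster
    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()
except ImportError:  # assumption: fall back to a single serial "rank"
    comm, rank, size = None, 0, 1


def throws_for_rank(total, rank, size):
    """Split `total` throws as evenly as possible over `size` ranks."""
    base, extra = divmod(total, size)
    return base + (1 if rank < extra else 0)


def count_inside(n_throws, seed):
    """Count the throws that land inside the quarter circle."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(n_throws):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return inside


def parallel_pi(total_throws):
    """Each rank throws its share; only the counts travel to rank 0."""
    local = count_inside(throws_for_rank(total_throws, rank, size), seed=rank)
    if comm is None:
        total_inside = local
    else:
        # One integer per rank is sent, instead of every (x, y) pair.
        total_inside = comm.reduce(local, op=MPI.SUM, root=0)
    if rank == 0:
        return 4.0 * total_inside / total_throws
    return None  # worker ranks have nothing to report


if __name__ == "__main__":
    estimate = parallel_pi(1_000_000)
    if rank == 0:
        print(f"pi is roughly {estimate} using {size} rank(s)")
```

With MPI installed, this would typically be started with something like `mpirun -n 4 python pi_counts.py`. The file name is made up, and the exact launcher and host options depend on how the cluster is set up.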
+Now I was left with the question: when shouldn’t you invest your time in making something work in parallel?
+# When you know it shouldn’t work
+Parallel computing doesn’t work well for inherently sequential problems, but what does that mean?
+I decided to simulate an experiment that takes measurements at regular intervals and writes them to log files. These log files then have to be analysed. The fact that the log files are produced sequentially makes it impossible to process them all in parallel, in real time, hence this is an inherently sequential problem.
+My simulation consists of measuring the temperature of the manager Raspberry Pi (using the `vcgencmd` command), writing the time and the temperature to a log file and then waiting for 5 seconds before taking the next measurement.
+The analysis is independent of the measurements and consists of reading a log file as soon as possible, extracting the date, time and temperature and then waiting for 1 second (to simulate the analysis). I did this ‘analysis’ for 20 log files in total.
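The measurement side of this simulation could be sketched as follows. The post names only the `vcgencmd` command; the `measure_temp` subcommand, its `temp=48.3'C` output format, the one-file-per-measurement layout and all function and file names here are assumptions made for illustration, not the author's script.

```python
import re
import subprocess
import time
from datetime import datetime


def parse_temp(output):
    """Extract degrees Celsius from output such as "temp=48.3'C"."""
    match = re.search(r"temp=([0-9.]+)", output)
    if match is None:
        raise ValueError(f"unexpected output: {output!r}")
    return float(match.group(1))


def read_temp():
    """Ask the Raspberry Pi firmware for the current temperature."""
    result = subprocess.run(
        ["vcgencmd", "measure_temp"],
        capture_output=True, text=True, check=True,
    )
    return parse_temp(result.stdout)


def log_line(when, temp_c):
    """Format one measurement as 'date time temperature'."""
    return f"{when:%Y-%m-%d %H:%M:%S} {temp_c:.1f}\n"


def run_experiment(n_measurements=20, interval_s=5, prefix="temp"):
    """Write one log file per measurement, waiting between measurements."""
    for i in range(n_measurements):
        with open(f"{prefix}_{i:02d}.log", "w") as handle:
            handle.write(log_line(datetime.now(), read_temp()))
        time.sleep(interval_s)
```

`run_experiment()` only works on a Raspberry Pi, because `vcgencmd` is not available elsewhere; `parse_temp` and `log_line` can be tried on any machine.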
+![network]({{ "/assets/img/2020-08-03_log_1_second.jpg" | absolute_url }})
+It takes 100 seconds to take 20 measurements, so a total time of 101 seconds is the minimum time needed. As expected, increasing the number of processors doesn’t speed up the total time. You just have to wait for the experiment to produce the numbers, before you can analyse them.
+My final step was to find the point where parallel processing would break down and adding more processors wouldn’t help. For this I kept the ‘experimental temperature logging’ the same (5 seconds between each measurement for 20 measurements), but changed the analysis time from 1 to 60 seconds.
+![network]({{ "/assets/img/2020-08-03_log_60_second.jpg" | absolute_url }})
+The time taken to measure the temperature is the same as before (100 seconds). The difference is that it takes 60 seconds to analyse the measurement, so sometimes you have to wait for a processor to become available before you can analyse the next measurement.
+Here you can see that 5 processors already give a good performance improvement and more than 10 is a waste of processors. The optimal number of processors depends very much on the problem you’re trying to solve.
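The waiting effect can be illustrated with a small toy model. It is a sketch under simplifying assumptions, not the code behind the plots: every processor is assumed to do analysis, communication costs nothing, and log file number i is assumed to become available 5 × i seconds after the start. The function name `total_time` is hypothetical.

```python
import heapq


def total_time(n_workers, n_logs=20, interval=5.0, analysis=1.0):
    """Time until the last log file has been analysed.

    Log file i (counting from 1) becomes available at i * interval.
    Each analysis occupies one worker for `analysis` seconds, and log
    files are handed out in order to whichever worker is free first.
    """
    free_at = [0.0] * n_workers  # time at which each worker is next free
    heapq.heapify(free_at)
    finish = 0.0
    for i in range(1, n_logs + 1):
        arrival = i * interval
        # Start when the log exists AND a worker is available.
        start = max(arrival, heapq.heappop(free_at))
        end = start + analysis
        heapq.heappush(free_at, end)
        finish = max(finish, end)
    return finish


if __name__ == "__main__":
    for workers in (1, 2, 5, 10, 20):
        print(workers, total_time(workers, analysis=60.0))
```

In this toy model a 1-second analysis finishes after 101 seconds however many processors are used, which matches the minimum quoted above. For the 60-second analysis the model shows the same qualitative behaviour of diminishing returns, but its exact numbers should not be expected to reproduce the measured plots.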
+For those of you who are curious, here’s what happened to the temperature of the DLPC during these experiments:
+![network]({{ "/assets/img/2020-08-03_temperature_plot.png" | absolute_url }})
+You can clearly see the heating up at different rates, depending on the processing and communication loads, and the cooling down periods, when the DLPC was waiting for its next task.
+# Lessons Learned
+Communication is hard, time consuming and error prone, whether it’s between:
+* humans – trying to write a blog that is informative for both researchers and research support staff
+* humans and computers – learning a new kind of software programming
+* computers – getting processors to share just enough data at the right moment
+Managing hardware and networks is hard, time consuming and error prone. Getting the right information and software packages on the 8 Raspberry Pi-s and keeping them up-to-date is something I could only achieve through automation. I now have even more respect for my colleagues at IT.
+Running in parallel can dramatically improve performance, but you have to choose your problem wisely and be careful about your implementation.
+You don’t have to run in parallel on a cluster. You can run everything on one processor; it just might take longer.
+# Resources and Acknowledgements
+If you want to build your own Raspberry Pi cluster, then this book has very clear instructions: “Raspberry Pi Supercomputing and Scientific Programming: MPI4PY, NumPy, and SciPy for Enthusiasts”, by Ashwin Pajankar, ISBN 978-1-4842-2877-7.
+If you’re interested in MPI, but not necessarily in Python, then [this TU Delft course is excellent](https://www.tudelft.nl/cse/education/courses/mpi-course/). I’m hoping there will be so much demand that [Kees Vuik](https://www.tudelft.nl/ewi/over-de-faculteit/afdelingen/applied-mathematics/numerical-analysis/people/c-vuik/) and [Kees Lemmens](https://www.tudelft.nl/ewi/over-de-faculteit/afdelingen/applied-mathematics/mathematical-physics/people/kees-lemmens/) will let me be a helper next time, so please sign up!
+This PRACE training [Parallel and GPU Programming in Python](https://events.prace-ri.eu/event/946/overview) takes a broader look at speeding up your Python code. Well worth checking their website for future events.
+You can find my scripts [here](https://github.com/sebranchett/DLPC).
+[Laurens Siebbeles](https://www.tudelft.nl/tnw/over-faculteit/afdelingen/chemical-engineering/people/laurens-siebbeles/) is gratefully acknowledged for relentless discussions.
+[Mark Schenk](https://www.tudelft.nl/staff/m.m.a.schenk/) is gratefully acknowledged for paying for this adventure and sharing my enthusiasm.
+[Raspberry image](https://pixabay.com/photos/raspberry-berry-detail-food-fresh-2276/) by [PublicDomainPictures](https://pixabay.com/users/publicdomainpictures-14/) on Pixabay.
+This blog expresses the views of the author, [Susan Branchett](https://www.tudelft.nl/staff/s.e.branchett/).
+This article is published under a [CC-BY-4.0 international license](https://creativecommons.org/licenses/by/4.0/).
+layout: post
+title: "Supporting Artificial Intelligence for Research –
+the (machine) learning process"
+author: Susan Branchett
+image: 2021-06-30_IDE.png
+**Hands-on Machine Learning with Campus Wi-Fi data**
+# Introduction
+At the TU Delft [ICT Innovation department](https://www.tudelft.nl/ict-innovation), we are looking into ways to support researchers using Artificial Intelligence (AI). After talks with our first 16 [AI labs](https://www.tudelft.nl/en/ai/research/tu-delft-ai-labs), I realised I need to know a lot more about AI.
+Machine Learning is one of the most popular forms of AI, so I decided to start there. For Machine Learning to work well, you need lots of data.
+In June 2021, my colleague [Lolke Boonstra](https://www.tudelft.nl/staff/l.boonstra) and David Šálek from SURF launched the TU Delft ICT data platform. This is a streaming data platform that streams anonymised campus Wi-Fi connection data. As a contributor to this project, I have access.
+This blog documents my first steps in Machine Learning. I follow Chapter 2 of ‘Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow’ by Aurélien Géron ([accessible here](https://learning-oreilly-com.tudelft.idm.oclc.org/library/view/hands-on-machine-learning/9781492032632/) for TU Delft staff). I use campus Wi-Fi connection data from the ICT data platform. My goal is to predict how busy a building on campus will be, at a given date and time.
+# Discover the Data
+The first step is to discover the data. Each building on campus has several Wi-Fi access points. The IT department monitors these access points every 5 minutes to make sure everything is working properly. The ICT data platform collects this data and live-streams it.
+![First steps aren’t always easy]({{ "/assets/img/2021-06-30_first_steps.jpg" | absolute_url }})
+First steps aren’t always easy
+To make my life easier, I first captured a week of data in a text file. I then cleaned it up so I was left with rows containing:
+* a time-stamp
+* the building where the access point is located
+* the number of connections
+Not every access point is monitored at the exact same millisecond, so I added up the total number of connections per building, in each five-minute interval. I now had a clean dataset to work with.
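+As a rough illustration of this aggregation step (a standard-library sketch, not the code from my repository), each timestamp can be floored to the start of its five-minute interval and the connections summed per building. The function name, the tuple layout and the sample rows are hypothetical.
```python
from collections import defaultdict

FIVE_MINUTES_MS = 5 * 60 * 1000  # interval width in milliseconds


def total_per_building(rows):
    """Sum connections per (building, five-minute interval).

    rows: iterable of (timestamp_ms, building, connections) tuples.
    """
    totals = defaultdict(int)
    for timestamp_ms, building, connections in rows:
        # Floor the timestamp to the start of its five-minute interval
        interval_start = timestamp_ms - (timestamp_ms % FIVE_MINUTES_MS)
        totals[(building, interval_start)] += connections
    return dict(totals)


# Hypothetical sample: two Aula access points polled two seconds apart
rows = [
    (1_621_000_800_000, "Aula", 12),
    (1_621_000_802_000, "Aula", 8),
    (1_621_000_801_000, "Library", 30),
    (1_621_001_100_000, "Aula", 5),
]
totals = total_per_building(rows)
```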
+Before going any further, I set aside 20% of the data, stratified (selected proportionately) over the different buildings and randomly selected in time. This is the test set, which I’ll use at the end to test how well my models are working. The remaining 80% is my training set. This is what some of it looks like:
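+A minimal sketch of such a stratified split, assuming rows that can be grouped by building. It uses only the standard library and is not necessarily how the split in my repository was made. The function name, the `seed` value and the sample rows are hypothetical.
```python
import random
from collections import defaultdict


def stratified_split(rows, key, test_fraction=0.2, seed=42):
    """Split rows into (train, test), sampling test_fraction within each group."""
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row)
    train, test = [], []
    for group in groups.values():
        shuffled = group[:]
        rng.shuffle(shuffled)  # random selection in time within one building
        n_test = round(len(shuffled) * test_fraction)
        test.extend(shuffled[:n_test])
        train.extend(shuffled[n_test:])
    return train, test


# Hypothetical rows: (building, minute) pairs, ten per building
rows = [(building, minute) for building in ("Aula", "Library") for minute in range(0, 50, 5)]
train, test = stratified_split(rows, key=lambda row: row[0])
```
+Because the sampling happens per building, a building with few access points is represented in the test set in the same proportion as a busy one.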
+![Discover the data]({{ "/assets/img/2021-06-30_discover_data.png" | absolute_url }})
+The data starts on a Friday afternoon and runs for a week. You can see that there is a daily rhythm with what looks like a lunchtime dip.
+The top row has data for the Aula and the Library. These two buildings are next to each other in the centre of the campus. At the Library, it looks like people come back to study after their evening meal, whereas the Aula is looking a little neglected.
+On the bottom row, there is data from Applied Sciences (AS) South and Aerospace Engineering (AE). Again, two buildings close to each other, this time at the south of the campus. It looks like AE takes weekends very seriously, whilst at AS South they seem to be more afternoon people. Their highest peak is after the lunch dip.
+I was wondering why there were so few connections on the Friday afternoon at the start of this one week period, compared to the Friday afternoon at the end of the period. The first Friday was the day after Ascension and was a collective free day. That could explain it.
+I’m led to believe that a person on campus has, on average, more than 2 Wi-Fi connections at a given moment. It’s also important to note that some connections are from devices that are not associated with an individual. The number of connections doesn’t tell us how many people are in a building, but it does give an indication of how busy the building is.
+# Preparing the Data
+Géron warns that preparing data takes a lot of human effort, compared to applying a Machine Learning model. He was not joking.
+![Preparation]({{ "/assets/img/2021-06-30_preparation.jpg" | absolute_url }})
+Preparation is the key to success
+The timestamps in this data are measured in milliseconds, starting from midnight UTC, 1st January 1970. I decided to introduce a personal bias by adding some extra attributes (also called features). The idea is to help the Machine Learning model discover useful patterns in the data. I added:
+* day of the week
+* time of day
+* weekend or not
+Inspired by the low numbers for the collective free day and the after dinner numbers at the library, I also added:
+* University/National holiday
+* [academic year category](https://www.tudelft.nl/en/student/education/academic-calendar)
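+The timestamp-derived attributes (day of the week, time of day, weekend or not) could be computed along these lines. This is a sketch that keeps everything in UTC for simplicity; converting to local campus time would be an extra step. The function name and dictionary keys are hypothetical.
```python
from datetime import datetime, timezone


def time_attributes(timestamp_ms):
    """Derive day of week, time of day and a weekend flag from a UTC timestamp in ms."""
    moment = datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc)
    day_of_week = moment.weekday()  # Monday is 0, Sunday is 6
    time_of_day = moment.hour + moment.minute / 60  # fractional hours
    return {
        "day_of_week": day_of_week,
        "time_of_day": time_of_day,
        "weekend": day_of_week >= 5,
    }


# 1_621_000_800_000 ms after the epoch is Friday 14 May 2021, 14:00 UTC
friday = time_attributes(1_621_000_800_000)
saturday = time_attributes(1_621_000_800_000 + 24 * 60 * 60 * 1000)
```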
+Following Chapter 2, I:
+* checked there was no missing data
+* used a ‘One Hot Encoder’, to convert the building and the academic year roster category into a convenient matrix form
+* used a ‘standard scaler’ for all other attributes. This helps the models when numerical attributes have different scales (e.g. day of week from 0 to 6, number of connections ~100s)
+* wrote a transformation pipeline to add attributes and do transformations ‘automatically’
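+To make the two transformations in this list concrete, here is a hand-rolled sketch using only the standard library. The book itself uses Scikit-Learn for this; the function names below are hypothetical, and the scaler assumes the values are not all identical.
```python
from statistics import mean, pstdev


def one_hot(values):
    """Return (categories, rows); each row has a single 1 in its category's column."""
    categories = sorted(set(values))
    rows = [[1 if value == category else 0 for category in categories] for value in values]
    return categories, rows


def standard_scale(values):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # assumed non-zero, i.e. the values are not all equal
    return [(value - mu) / sigma for value in values]


categories, encoded = one_hot(["Aula", "Library", "Aula"])
scaled = standard_scale([100, 200, 300])
```
+After scaling, a day of the week (0 to 6) and a connection count in the hundreds end up on comparable scales.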
+Writing a pipeline seems like a lot of extra work at the time. However, when I had to repeat these steps, in the right order, to test parts of the data, create graphs for you and move to a larger dataset, I saw the wisdom of Géron’s advice.
+![Wise words]({{ "/assets/img/2021-06-30_wise_words.jpg" | absolute_url }})
+Automate your process with a pipeline. Wise words indeed
+# Train a Model and Validate
+Now we finally get to train and evaluate Machine Learning models. Following Chapter 2 again, I chose two different models:
+* Linear Regression - which models relationships by drawing the best straight line
+* Decision Tree Regression - which creates a kind of decision flowchart to predict a value
+I used ‘cross-validation’ to get a feel for how well the 2 models work. You can skip the rest of this paragraph, if it’s too confusing. Cross-validation involves dividing up the training data into batches, setting one batch aside, training the model on the remaining batches and calculating the root mean square error for the set-aside batch, thus validating the model. Cross-validation then cycles onto the next batch, until each of the batches has been used for validation. Géron explains it better.
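+For readers who prefer code to prose, the batch-cycling idea can be sketched as follows. This is a simplified standard-library version, not the Scikit-Learn routine from the book. The `fit` and `predict` arguments stand in for any model, and the toy "predict the training mean" model is purely hypothetical.
```python
from math import sqrt
from statistics import mean


def rmse(actual, predicted):
    """Root mean square error between two equally long sequences."""
    return sqrt(mean([(a - p) ** 2 for a, p in zip(actual, predicted)]))


def cross_validate(features, targets, fit, predict, n_batches=5):
    """Return one RMSE per batch, each computed on rows the model did not train on."""
    indices = list(range(len(targets)))
    scores = []
    for batch in range(n_batches):
        held_out = indices[batch::n_batches]  # every n-th row forms one batch
        training = [i for i in indices if i % n_batches != batch]
        model = fit([features[i] for i in training], [targets[i] for i in training])
        predicted = [predict(model, features[i]) for i in held_out]
        scores.append(rmse([targets[i] for i in held_out], predicted))
    return scores


# Toy model (hypothetical): always predict the mean of the training targets
def fit_mean(train_features, train_targets):
    return mean(train_targets)


def predict_mean(model, feature_row):
    return model


features = [[hour] for hour in range(10)]
targets = [10.0] * 10
scores = cross_validate(features, targets, fit_mean, predict_mean)
```
+The mean and standard deviation of `scores` correspond to the kind of "average error" and spread quoted for the two models.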
+For Linear Regression, the average error was 99 connections. To be more precise the mean root mean square error was 99, with a standard deviation of 3.5.
+For Decision Tree Regression, the average error was 5 connections. Again, to be more precise, the mean root mean square error was 5, with a standard deviation of 0.17.
+Decision Tree Regression seems to do a lot better than Linear Regression, but what does it actually look like? Without my extra attributes:
+![Aula]({{ "/assets/img/2021-06-30_Aula.png" | absolute_url }})
+Linear Regression can’t do any better than a straight line. Decision Tree Regression is doing very well at fitting the data points, probably too well, or overfitting. That could explain why the Decision Tree Regression model is not so good at predicting the future.
+Does adding my extra attributes help?
+![Aula extra attributes]({{ "/assets/img/2021-06-30_Aula_extra_attributes.png" | absolute_url }})
+Well, Linear Regression is doing a bit better, except for the negative number of connections and the future predictions. On the other hand, Decision Tree Regression predictions are looking a lot better.
+The best way to reduce overfitting is to add more data, so I collected 3 more weeks. Here are the data points and the predictions for the Aula, with my extra attributes:
+![Aula four weeks]({{ "/assets/img/2021-06-30_Aula_four_weeks.png" | absolute_url }})
+Okay, that’s not looking bad. The weekly and daily patterns, and peak number of connections are looking quite good for the Decision Tree Regression model. If you look carefully at the bottom of the Linear Regression plot, there is a slightly increasing trend. This could be real, as COVID restrictions are being eased. On the other hand, it could be a result of the collective free day and Whit-Monday leading to more holidays in the first half of the data collection period.
+…and what happened in the Aula on the 3rd June?
+…and which of my extra attributes is responsible for improving these models?
+Following Chapter 2, I applied the Grid Search technique. This technique is usually used for hyperparameters, which I need to learn more about. I used this technique to turn my extra attributes off and on.
+Grid Search calculated that adding the weekend status and the academic year roster doesn’t improve the model, but adding each of the other attributes does. As I only have 4 weeks of data, I’m guessing it’s difficult to find patterns associated with the academic year roster. Also, because the model already has the day of the week, adding the weekend status probably doesn’t add any extra useful information.
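+Turning attributes off and on amounts to scoring every on/off combination and keeping the best one. The sketch below shows only that enumeration. The `toy_score` function is a made-up stand-in for a cross-validated error and does not reproduce my results; all names are hypothetical.
```python
from itertools import product


def attribute_grid(optional_attributes):
    """Yield every on/off combination of the optional attributes as a tuple of names."""
    for switches in product([False, True], repeat=len(optional_attributes)):
        yield tuple(name for name, on in zip(optional_attributes, switches) if on)


def best_combination(optional_attributes, score):
    """Return the combination with the lowest score (for example a cross-validated RMSE)."""
    return min(attribute_grid(optional_attributes), key=score)


def toy_score(combination):
    # Made-up error values, only to demonstrate the search itself
    error = 10.0
    if "holiday" in combination:
        error -= 3.0
    if "weekend" in combination:
        error += 1.0
    return error


best = best_combination(["weekend", "holiday"], toy_score)
```
+With n optional attributes there are 2^n combinations, so this brute-force approach is only practical for a handful of attributes.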
+Using the best combination of extra attributes for the Decision Tree Regression model gives these results:
+![Best model]({{ "/assets/img/2021-06-30_best_model.png" | absolute_url }})
+I can now return to the test set, the 20% I set aside at the beginning. Evaluating my best model on the data points in this test set, I get an average error of slightly more than 7 connections. Given that the number of connections varies between zero and a few hundred, I’m quite pleased with that.
+I like the way the model sees a difference between weekdays and the weekend and didn’t get confused by Whit-Monday. I’m also pleased that the model predicts a lunch dip for all 4 buildings shown, but only the Library has an after dinner peak.
+My biggest hope is that by September 2021, this model will be completely useless.
+# Lessons learned
+* Géron’s book is a great way to get your hands dirty with Machine Learning
+* I still have a lot to learn. It took me around 7 days over the past six weeks to get to this point. Running the models only takes a few minutes
+* The ICT data platform is a really interesting, real-time, anonymised source of campus data. I did not do it justice in this blog
+* This is not the way you should do machine learning on time-series data. There is a whole section on this in Chapter 15, but I haven’t got that far yet
+* Even with limited knowledge, simplified data and basic Machine Learning models, you can make a feasible looking prediction of how busy a building is going to be
+# Wrap up
+If you want to see exactly what I did, [here is my repository](https://github.com/sebranchett/wifi_blog).
+If you would like to know more about the ICT data platform, or use the data for your research, please contact [Lolke Boonstra](https://www.tudelft.nl/staff/l.boonstra).
+If you would like me to put you in contact with someone who does know what they are doing with AI, or you want to help me improve my AI skills, or you want to join me on this journey of discovery, or you think I could help you with your research in any other way, [I would be delighted to hear from you](https://www.tudelft.nl/staff/s.e.branchett).
+This blog expresses the views of the author, [Susan Branchett](https://www.tudelft.nl/staff/s.e.branchett).
+This article is published under a [CC-BY-4.0 international license](https://creativecommons.org/licenses/by/4.0/).
+# Image credits
+[TU Delft IDE building courtesy of Gerd Kortuem.](https://twitter.com/kortuem/status/908930023549267968)
+[First steps aren’t always easy.](https://unsplash.com/photos/ALzOa_AtV7o)
+[Preparation is the key to success.](https://pixabay.com/illustrations/success-key-gold-gold-colored-1433400/)
+[Wise words indeed.](https://pixabay.com/photos/woman-human-read-learn-book-4135301/)
+layout: post
+title: "I wonder how my experiment is going...
+delivering IoT device data to the right people"
+author: Susan Branchett
+image: 2023-05-31-wires.jpg
+**Creating an IoT Prototype**
+# How it came about
+I’ve been fascinated by the Internet of Things (IoT) for some time now, but have always wondered how much of it is hype and how it works in practice.
+In May 2023, two useful problems occurred in the same week. Firstly, the [TU Delft](https://www.tudelft.nl/en/) [Cloud4Research](https://tu-delft-ict-innovation.github.io/Cloud4Research/) team were organising an [‘IoT services through Cloud4Research’ event](https://www.eventbrite.nl/e/iot-services-through-cloud4research-tickets-624213448227) for 13 June 2023, together with [AWS](https://aws.amazon.com/). The AWS team wanted to give a live demo using a low cost camera. They needed a way to connect the camera to the internet while on the TU Delft campus.
+The second useful problem was a power outage at the south of the TU Delft campus. In one of the Chemical Engineering labs, a moving mirror is used to control a laser experiment. When the power stopped, the mirror stopped moving. When the power resumed, unfortunately, the mirror didn’t start moving again. After some tweaking, they got it working, but it was a bit unreliable for a while. Wouldn’t it be nice if you could keep an eye on the mirror when you’re not in the lab?
+I decided to investigate.
+# Hardware
+![Destroy!]({{ "/assets/img/2023-05-31-power_only.jpg" | absolute_url }})
+Following the AWS team’s lead, I got myself an ESP32-CAM. This is a microcontroller board with a camera, Wi-Fi and Bluetooth on board, …and all for around €10! The model I chose uses the 'AI Thinker ESP32-CAM' board manager, but there are others.
+As this was the first time I’d used one of these boards, I found [this Random Nerd tutorial](https://randomnerdtutorials.com/program-upload-code-esp32-cam/) and [this DroneBot Workshop video](https://www.youtube.com/watch?v=visj0KE5VtY) extremely useful.
+Some things I wish I’d known in advance:
+1. Most ESP32-CAMs need a USB serial port adapter. This board connects the ESP32-CAM to your laptop so you can program your ESP32-CAM
+2. Some USB cables are for charging only. These are useless if you are trying to program an ESP32-CAM. If you find one, destroy it immediately or mark it very clearly!
+3. You can only select a serial port when the ESP32-CAM is connected
+4. Connecting and disconnecting the IO0 pin to ground and pressing the reset button is tricky and essential. I started with my ESP32-CAM on a breadboard, which made pressing the reset button difficult. This is the IO0-reset dance that finally worked for me...
+![The Dance of the IO0 pin]({{ "/assets/img/2023-05-31-pin_dance.jpg" | absolute_url }})
+# Network
+If you connect a new device to the Wi-Fi at home, you need the SSID, or name of your Wi-Fi network, and a long and difficult to remember key, or password.
+At the TU Delft Campus, you use your NetID to connect your laptop or phone to the Wi-Fi.
+Most IoT devices, like the ESP32-CAM, connect using an SSID and key, which can cause problems on campus.
+Fortunately, there are two helpful services provided by the TU Delft IT department. The first gives you a licence which allows you to attach up to three devices.
+If you have many devices, the second service provides you with an interface to manage your own set of licences, for your lab or group.
+I sent an email to my [Service Desk](https://www.tudelft.nl/en/student/ict/service-desk) with the text:
+> Please could you make it possible for me to attach 2 IoT devices to the TUD-facility network?
+They sent me a manual and a link to register and administer my devices.
+To register a device, you need to know its [MAC address](https://en.wikipedia.org/wiki/MAC_address). I found mine using [this Random Nerd tutorial](https://randomnerdtutorials.com/get-change-esp32-esp8266-mac-address-arduino/). Each device gets its own key. Only registered MAC addresses can join the network.
+**Bonus:** In [this article (in Dutch only)](https://www.surf.nl/iotroam-veilig-en-herleidbaar-aansluiten-van-alle-iot-apparaten), you can read how [SURF](https://www.surf.nl/en) is working on an IoT equivalent of [Eduroam](https://eduroam.org/). The TU Delft campus solution may well become part of this larger initiative.
+![FrankenThing]({{ "/assets/img/2023-05-31-FrankenThing.jpg" | absolute_url }})
+The result of an afternoon with a cereal packet and a soldering iron: FrankenThing
+# Syncing and Sharing
+The next challenge was to get photos from the camera to a place where I could control who has access to them. I decided to use [SURFdrive](https://www.surf.nl/en/surfdrive-store-and-share-your-files-securely-in-the-cloud), but this solution would also work for [ResearchDrive](https://www.surf.nl/en/research-drive-securely-and-easily-store-and-share-research-data), or any other cloud storage service with a [WebDAV](https://en.wikipedia.org/wiki/WebDAV) interface.
+SURFdrive gives me the possibility to share the folder full of photos with other SURFdrive users, such as the people working in the lab.
+SURFdrive [has an excellent description](https://wiki.surfnet.nl/display/SURFdrive/Accessing+files+via+WebDAV) of how to generate a token that you need to use the WebDAV interface. The `curl` examples (with added `--verbose` flag) were very handy for debugging.
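+A WebDAV upload is essentially an HTTP PUT with credentials. As a desktop-side illustration (the ESP32-CAM itself runs Arduino code, not Python), the sketch below builds such a request without sending it. It assumes HTTP Basic authentication with the username and token; the base URL, path and credentials are placeholders, not real SURFdrive values.
```python
import base64
import urllib.request


def build_webdav_put(base_url, remote_path, payload, username, token):
    """Build (but do not send) an HTTP PUT request for a WebDAV upload."""
    # Assumption: the service accepts HTTP Basic authentication with username and token
    credentials = base64.b64encode(f"{username}:{token}".encode()).decode()
    request = urllib.request.Request(
        url=base_url.rstrip("/") + "/" + remote_path.lstrip("/"),
        data=payload,
        method="PUT",
    )
    request.add_header("Authorization", f"Basic {credentials}")
    request.add_header("Content-Type", "image/jpeg")
    return request


# Placeholder values only
request = build_webdav_put(
    "https://example.org/webdav/", "photos/test.jpg", b"fake-jpeg-bytes", "user", "secret"
)
```
+Passing the result to `urllib.request.urlopen` would perform the upload, which is handy for checking a token from a laptop before debugging the device.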
+To ensure you are talking to the real SURFdrive, you will also need the HTTPS certificate. If you are using Google Chrome, you can download this certificate by clicking on the lock to the left of the address bar, selecting ‘Connection is secure’, then ‘Certificate is valid’, then ‘Details’, then selecting ‘USERtrust RSA Certification Authority’ at the top of the ‘Certificate Hierarchy’, and finally ‘Export…’. You can open the downloaded `.crt` file with a text editor. You’ll need to add some quotes and line endings to get it into the correct format for the ESP32-CAM. [Here is the example I followed.](https://github.com/espressif/arduino-esp32/blob/master/libraries/HTTPClient/examples/BasicHttpsClient/BasicHttpsClient.ino)
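+Adding the quotes and line endings by hand is error-prone, so a small helper can do it. This is a sketch that assumes the exported file is PEM text (Base64 lines between BEGIN and END markers); it relies on C joining adjacent string literals. The function name is hypothetical and the three-line "certificate" is a placeholder, not a real one.
```python
def pem_to_c_literal(pem_text, variable_name="rootCACertificate"):
    """Wrap each PEM line in double quotes, ending each with an escaped newline."""
    lines = [line.strip() for line in pem_text.splitlines() if line.strip()]
    # Each certificate line becomes one C string literal ending in a literal backslash-n
    quoted = ['"' + line + '\\n"' for line in lines]
    return "const char* " + variable_name + " =\n" + "\n".join(quoted) + ";\n"


# Placeholder content standing in for the exported .crt file
pem = "-----BEGIN CERTIFICATE-----\nABC\n-----END CERTIFICATE-----\n"
literal = pem_to_c_literal(pem)
```
+The resulting text can be pasted into the Arduino sketch in place of the example's certificate.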
+# Lessons learned
+![Proof the Prototype Works]({{ "/assets/img/2023-05-31-photo_series.jpg" | absolute_url }})
+One photo per minute - zoom in to check out the clock top right
+* It’s possible to connect an IoT device to a Wi-Fi network at the TU Delft campus, without misusing your NetID – so please don’t misuse your NetID
+* It’s possible to send data directly from an IoT device to a secure cloud solution, such as SURFdrive or SURF’s Research Drive
+* There are a million ways to waste an afternoon!
+If you would like to reuse my solution, you can find it [here](https://github.com/sebranchett/ESP32_photo_HttpsClient). You’ll have to check the `rootCACertificate`, add the Wi-Fi `ssid` and `password`, add the SURFdrive `username` and `token`, and check the paths (`webdavUrl`, `hostname`, `serverPath`).
+# What next?
+* Get a more robust enclosure. [Thingiverse](https://www.thingiverse.com/search?q=esp32-cam&page=1&type=things&sort=relevant) has some nice examples to 3D print
+* Find a way to ensure that people don’t get accidentally photographed
+* Work on energy efficiency. If I upload a photo every minute, a 9V battery only lasts a couple of hours. Note that my ESP32-CAM had a built-in voltage regulator to accept up to 12V. My temporary fix was to use an adapted phone charger, but [this link looks very useful](https://randomnerdtutorials.com/esp32-deep-sleep-arduino-ide-wake-up-sources)
+* Find a way to alert a researcher automatically when the experiment isn’t going the way it should. I’m hoping the [‘IoT services through Cloud4Research’ event](https://www.eventbrite.nl/e/iot-services-through-cloud4research-tickets-624213448227) will provide some inspiration for this
+# Last, but not least
+I am very grateful to:
+* [Random Nerd](https://randomnerdtutorials.com/program-upload-code-esp32-cam/), [DroneBot Workshop](https://www.youtube.com/watch?v=visj0KE5VtY), [Random Nerd again](https://randomnerdtutorials.com/get-change-esp32-esp8266-mac-address-arduino/), [Random Nerd yet again](https://randomnerdtutorials.com/esp32-cam-http-post-php-arduino/) and [Espressif](https://github.com/espressif/arduino-esp32/blob/master/libraries/HTTPClient/examples/BasicHttpsClient/BasicHttpsClient.ino) for their excellent examples
+* [Michiel Fokke](https://nl.linkedin.com/in/michielfokke) and [Laurens Siebbeles](https://www.tudelft.nl/tnw/over-faculteit/afdelingen/chemical-engineering/principal-scientists/laurens-siebbeles) for very useful problems and helpful discussions
+* [Lolke Boonstra](https://www.tudelft.nl/en/staff/l.boonstra), [Fred Roeling](https://www.tudelft.nl/en/staff/f.q.c.roeling) and [Mark Schenk](https://www.tudelft.nl/en/staff/m.m.a.schenk) for helpful discussions
+* [Niket Agrawal](https://www.tudelft.nl/en/staff/n.agrawal) for inspiring me to blog again
+If you have any suggestions to improve this solution, or you would like some help reproducing this solution for your research problem, or you have some other research and IT related problem we could work on together, [I would be delighted to hear from you](https://www.tudelft.nl/en/staff/s.e.branchett).
+This blog expresses the views of the author, [Susan Branchett](https://www.tudelft.nl/en/staff/s.e.branchett).
+This blog is published under a [CC-BY-4.0 international license](https://creativecommons.org/licenses/by/4.0/).
