List of Most Popular Haas Compute Services

Custom JupyterHub for Courses
https://jupyterhub.haas.berkeley.edu
Our most popular service provides custom JupyterHub environments for coursework. This service offers numerous advantages for instructors:

High-Performance Computing (HPC) Cluster
Our HPC cluster is designed for researchers and research groups, featuring:
Available Software:
Python
R
Julia
FORTRAN
Perl
PHP
Stata
SAS
JupyterLab
Our computing cluster primarily handles batch jobs and accommodates a diverse range of research needs. Some researchers run jobs that need many cores, while others submit thousands of single-core jobs. To serve this varied user base, we migrated our data to Google Cloud in 2020 and replicated our servers using virtual machines. Tony then started a one-on-one training program to guide researchers in using the new cluster. However, this process proved time-consuming. After only two weeks of training, with just seven of the fifty users transitioned, the estimated monthly Google Cloud costs reached $17,000. Tony promptly alerted the manager to this escalating expense and was told to shut down the project immediately. Tony still runs other projects on Google Cloud and works with research groups to use services available only in the cloud, such as BigQuery.

Browser-Based Jupyter Environment
https://jupyter.haastech.org
Access Python, R, Julia, and Stata through a familiar Jupyter interface:

GPU Computing Environment
https://engineering.haas.berkeley.edu
Specialized GPU-enabled JupyterHub featuring:

Tutorial Platform
https://tutorials.haas.berkeley.edu
A collaborative platform developed by Richard Huntsinger, Thomas Lee, and Tony Cricelli offering:

Custom Research Hubs
https://jupyterhub.haas.berkeley.edu
These are custom research JupyterHubs used by research groups who prefer the Jupyter platform for

Nielsen Data Available on the HPC Cluster
Tony Cricelli has been appointed as the **Data Steward** for the Nielsen Data. In this role, Tony is responsible for:

Open OnDemand (OOD)
Tony is working with Charles on a new test cluster to set up Open OnDemand services. The plan is to set up and test a small cluster where users can log in and request resources. If the resources are available locally, the requested program is launched there. If they are not (for example, a request for 100,000 GPU cores), the user is connected to a cloud service that has the resources, and the job is launched there. We will thoroughly investigate how to budget for users to prevent runaway costs on the cloud. Currently, the cluster on Google Cloud, with no users, is going to cost about $3,000 per month. As we add storage and users, the cost will rise quickly. We are also investigating other cloud providers, such as Azure and AWS. In general, GCP seems to be the least expensive, although comparing like with like is difficult because the providers use different naming conventions and offer different services.

Kubernetes Cluster Insight
Our JupyterHub cluster has served over 10 million requests since August 2023. The cluster often hosts multiple courses with over a hundred students each, and the exact number of courses varies from semester to semester. It is likely that we have served well over 100 million requests since starting the service. The only significant outages occurred when campus turned the power off. Although these outages were scheduled, they happened in the middle of a semester. They were not under our control, and we did our best to notify users and keep them informed. The EWMBA folks are frequent users of our tutorials. We also allow anyone with an @berkeley.edu address to use our tutorials.

As a side note, we initially experimented with Google Cloud and AWS, but students found the performance unsatisfactory. The high cost of keeping idle servers running in the cloud and the significant startup time (up to 5 minutes) hurt the user experience, especially during live lectures. Cloud resources are better suited to targeted research projects: start a server, run the analysis, and turn the server off.
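
The Open OnDemand plan described above (launch locally when resources are available, otherwise hand the job to a cloud service, with a budget to prevent runaway costs) can be sketched in Python. This is only an illustration of the decision logic, not the Open OnDemand API or our actual configuration. Every name here (`ResourceRequest`, `LocalCapacity`, `route_request`, the cost estimate field, and the budget check) is hypothetical, and denying a request that exceeds the remaining budget is an assumed policy, since the budgeting approach is still under investigation.

```python
from dataclasses import dataclass


@dataclass
class ResourceRequest:
    """A user's resource request (hypothetical structure)."""
    user: str
    cpu_cores: int
    gpu_cores: int
    est_cloud_cost_usd: float  # assumed to come from some cost estimate


@dataclass
class LocalCapacity:
    """Resources currently free on the local cluster (hypothetical)."""
    free_cpu_cores: int
    free_gpu_cores: int


def route_request(req: ResourceRequest,
                  local: LocalCapacity,
                  remaining_budget_usd: float) -> str:
    """Decide where a request runs: 'local', 'cloud', or 'denied'."""
    fits_locally = (req.cpu_cores <= local.free_cpu_cores
                    and req.gpu_cores <= local.free_gpu_cores)
    if fits_locally:
        # Prefer local resources whenever they can satisfy the request.
        return "local"
    if req.est_cloud_cost_usd <= remaining_budget_usd:
        # Fall back to the cloud only while the user's budget covers it.
        return "cloud"
    # Assumed policy: refuse rather than overrun the budget.
    return "denied"


if __name__ == "__main__":
    local = LocalCapacity(free_cpu_cores=64, free_gpu_cores=0)
    small = ResourceRequest("alice", cpu_cores=8, gpu_cores=0,
                            est_cloud_cost_usd=0.0)
    huge = ResourceRequest("bob", cpu_cores=8, gpu_cores=100_000,
                           est_cloud_cost_usd=500.0)
    print(route_request(small, local, remaining_budget_usd=100.0))
    print(route_request(huge, local, remaining_budget_usd=100.0))
```

Checking the budget before any cloud launch, rather than after the fact, is one way to keep an oversized request from generating charges in the first place.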