Frequently Asked Questions - Haas Research Computing

General questions

What research services does Haas offer?

Research computing at Haas is managed by two departments: one manages the Windows-based research computers and the other manages the Linux-based computers. Please click here to view the services offered to researchers and instructors.

Who should be contacted to report issues with this site?

In general, anytime you have a question or an issue to report, you can email helpdesk@haas.berkeley.edu. If you wish, you can contact tony_cricelli@berkeley.edu directly since he is the web developer and database administrator for this site.

What research resources does Haas offer?

Haas maintains four general-use clusters:

Ubuntu-based Linux Cluster
Microsoft-based Windows Cluster
Kubernetes Jupyterhub-based Cluster
Analytic Environment on Demand (AEoD) for custom Virtual Machines

What software is available on the Haas Clusters?

Matlab
SAS
Python
R
C, C++, Fortran
Julia

Does Haas offer programming help?

In general, Haas has systems administrators, not programmers. The Haas administrators' main duties are to keep the computer systems updated and available 24/7. Systems administrators also ensure that research software applications are fully functioning. Still, it does not hurt to ask: the Haas administrators have helped thousands of students debug code over the years. Send an email to helpdesk@haas.berkeley.edu and ask for assistance from a Haas Research Computing staff member.

The Berkeley Data Lab (D-Lab) provides consulting for research applications and programming languages. To schedule a consulting appointment, go to D-Lab Consulting.

What is Jupyterhub?

Jupyterhub is a web interface to other software such as Python, R, SAS, and MATLAB. R and Python are the two main languages used.

What is Kubernetes?

Kubernetes is the software we use to manage our Jupyterhub teaching workload. It manages containerized workloads and services automatically.

Is the Haas Kubernetes cluster backed up?

Yes, we use Kopia to take encrypted snapshots every 12 hours. The snapshots are stored in Google Cloud.

Is the Haas HPC cluster backed up?

Yes, we use Kopia to take encrypted snapshots every 12 hours. The snapshots are stored in Google Cloud. We keep the last 60 snapshots, which allows us to go back 30 days.

Are the MySQL and PostgreSQL databases backed up?

Yes, we use Kopia to take encrypted snapshots every 12 hours. The snapshots are stored in Google Cloud. We keep the last 60 snapshots, which allows us to go back 30 days.
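
The 30-day figure follows directly from the snapshot count and interval. A minimal sketch of the arithmetic (the function name retention_days is only illustrative, not part of any Haas tooling):

```shell
#!/bin/sh
# Days of history covered by a fixed number of snapshots taken at a fixed interval.
# Arguments: number of snapshots kept, hours between snapshots.
retention_days() {
    echo $(( $1 * $2 / 24 ))
}

retention_days 60 12   # prints 30
```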

Do you have a disaster recovery plan?

Yes, we snapshot the research data located on our HPC cluster to Google Cloud every 12 hours. In an emergency, we can spin up a cluster on Google Cloud and restore the latest snapshot. Our specific procedure is:

1. Allocate a Google Filestore large enough to hold the restored data.
2. Allocate a Linux-based (Ubuntu 20.+) node and mount our Kopia snapshot on it.
3. Mount the /etc snapshot to recover all accounts and passwords.
4. Mount the /apps folder to recover all installed apps.
5. Mount the /home snapshot, which gives immediate access to all research data.
6. Mount the Google Filestore and start copying the research files from the mounted /home snapshot.

We estimate this would take 4 hours to complete.

Linux Cluster FAQs

How do I request access to the research cluster?

Please email helpdesk@haas.berkeley.edu and request access to the Haas Linux research cluster.

How do I connect to the Haas research cluster?

In order to access the Haas research cluster, you must be on the U.C. Berkeley network or have the Berkeley VPN connected. SSH and SFTP are the only two connection methods supported.

Which SSH or SFTP Clients do you recommend?

There are many SSH and SFTP clients available, so the choice is mostly personal preference.

macOS ships with ssh. Many researchers open a terminal window and enter: ssh haas-hpc00.haas.berkeley.edu
Windows has two main clients that researchers tend to use, PuTTY and MobaXterm, both freely downloadable. Download PuTTY here: PuTTY Downloads. Download MobaXterm here: MobaXterm Download
FastX is an excellent SSH client that accelerates GUI-based applications such as XSTATA, RStudio, Spyder, and others. FastX is available for Windows, macOS, and Linux. Download here: FastX Downloads
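
If you use the command-line ssh client, a host alias can save some typing. The sketch below appends an entry to an SSH config file. The alias haas-hpc and the function name add_haas_host are made up for illustration, and the host name is the one from the ssh example above.

```shell
#!/bin/sh
# Append a host alias for the cluster login node to an SSH config file.
# The alias "haas-hpc" and this function name are illustrative only.
add_haas_host() {
    config_file=$1   # for example "$HOME/.ssh/config"
    user_name=$2     # your cluster user name
    cat >> "$config_file" <<EOF
Host haas-hpc
    HostName haas-hpc00.haas.berkeley.edu
    User $user_name
EOF
}

# Usage (commented out so nothing is written by accident):
# add_haas_host "$HOME/.ssh/config" your_username
# ssh haas-hpc
# sftp haas-hpc
```

With the entry in place, both ssh and sftp accept the short alias in place of the full host name.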

May I request that custom software or libraries be installed?

Yes. Open a ticket by emailing helpdesk@haas.berkeley.edu. You can also contact tony@haas.berkeley.edu or zane@berkeley.edu directly. We will do our best to install any package that is available to us and does not break the system.

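May I install my own Python libraries?

Yes! You may create your own Python environment with conda. Run conda init bash once to set up your shell, then create and activate an environment with conda create -n and conda activate.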

May I use Jupyterhub on the HPC Cluster?

Yes! There are several ways to run it. Below are the setup directions:

Two preparation steps that only have to be done once before launching jupyter:

1. Create your virtual environment:
conda init bash
conda create -n my_new_environment_name
conda activate my_new_environment_name

2. Pick a browser that you normally do not use on your computer. Some browser names
are Chrome, Brave, Dissenter, Firefox, Opera, Edge, etc.

This browser is going to be set up differently than your normal browsers.
It is going to be set up so that it uses a "proxy". The proxy is
going to be created by ssh.

I chose Firefox as my jupyterhub browser. After starting the browser:

a. Click on the 3 horizontal lines on the upper right of the browser (often called the "hamburger")
b. Click on Preferences
c. Click on General on the left
d. Scroll to the bottom of the page
e. Click on Settings under Network Settings
f. Select Manual proxy configuration
g. Click in the SOCKS Host box and enter 127.0.0.1
h. In the Port box, enter a number, like 3456
i. Click OK to save it.

What we just did is force the browser to "surf" through localhost port 3456.

Next are the steps to launch a jupyterhub-notebook job to the cluster and use ssh
to connect your local browser to the HPC computer node running jupyterhub.

1. ssh to hpc.haastech.org:

ssh -C -D 3456 username@hpc.haastech.org

2. launch jupyterhub:
/apps/bin/jupyterhub

(make note of JobID)

3. Wait about 10 seconds for jupyterhub to start, then run the bpeek command with your JobID:
bpeek jobid

Towards the bottom of the output, you will see something like this:

http://haas-hpc10:2225/?token=72abceasyas1230b3b52a2220055eb1662f628e12707c5e3

4. The final step is to enter the above URL into your home browser that is being proxied.
But you will run into a problem because your home computer does not know what
haas-hpc10 is. From home you have to put the IP address, so haas-hpc10 becomes 10.10.10.20.

Examples:
haas-hpc01 becomes 10.10.10.11
haas-hpc07 becomes 10.10.10.17
haas-hpc08 becomes 10.10.10.18
haas-hpc09 becomes 10.10.10.19
haas-hpc10 becomes 10.10.10.20

So the output of the third step was:

http://haas-hpc10:2225/?token=72abceasyas1230b3b52a2220055eb1662f628e12707c5e3
but you change it to this:
http://10.10.10.20:2225/?token=72abceasyas1230b3b52a2220055eb1662f628e12707c5e3

5. Enjoy your custom jupyterhub! Since you are in a virtual environment,
you may add custom languages and custom libraries. You can close your
browser at any time and return days later and pick up where you left off.
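
The URL rewrite in step 4 can be scripted. The sketch below is a minimal helper and rests on one assumption: that the pattern in the examples above (haas-hpcNN maps to 10.10.10.<10 + NN>) holds for every node. The function names are illustrative.

```shell
#!/bin/sh
# Pull the last notebook URL (a line containing "token=") out of text on stdin.
extract_jupyter_url() {
    sed -n 's|.*\(http://[^ ]*token=[^ ]*\).*|\1|p' | tail -n 1
}

# Replace the node name haas-hpcNN with its IP address.
# ASSUMPTION: node NN always maps to 10.10.10.<10 + NN>, as in the examples.
node_url_to_ip_url() {
    url=$1
    node=$(printf '%s\n' "$url" | sed -n 's|^http://haas-hpc\([0-9][0-9]*\)[:/].*|\1|p')
    if [ -z "$node" ]; then
        printf '%s\n' "$url"    # no node name found; leave the URL unchanged
        return 0
    fi
    node=$(printf '%s\n' "$node" | sed 's/^0*//')   # "07" -> "7" so it is not read as octal
    octet=$(( ${node:-0} + 10 ))
    printf '%s\n' "$url" | sed "s|^http://haas-hpc[0-9][0-9]*|http://10.10.10.$octet|"
}

node_url_to_ip_url 'http://haas-hpc10:2225/?token=72abceasyas1230b3b52a2220055eb1662f628e12707c5e3'
# prints http://10.10.10.20:2225/?token=72abceasyas1230b3b52a2220055eb1662f628e12707c5e3

# Usage on the cluster (commented out because it needs LSF):
# node_url_to_ip_url "$(bpeek jobid | extract_jupyter_url)"
```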

What queues are available on the cluster?

The cluster offers several queues for jobs of varying sizes. To list them, use the command

bqueues

To submit a job to a specific queue, use the bsub command with the -q option. For example:

bsub -q queue_name < scriptname
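
To see just the queue names, the bqueues output can be filtered. This is a sketch that assumes the default bqueues layout: one header line, then one queue per line with the name in the first column. The function name queue_names is illustrative.

```shell
#!/bin/sh
# Print only the queue names from `bqueues` output read on stdin.
# ASSUMPTION: default layout, i.e. a header line followed by one queue
# per line with the queue name in the first column.
queue_names() {
    awk 'NR > 1 && NF > 0 { print $1 }'
}

# Usage on the cluster (commented out because it needs LSF):
# bqueues | queue_names
# bsub -q special < R_program.bsub
```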

I have access to the Linux cluster. What can I do with it?

This question comes up from time to time and is tricky to answer. It is akin to asking, "I have access to a Boeing 747; how do I fly it?" The answer could be, "First, sit in the pilot's seat..." The cluster is a computer: you can edit files, store files, write programs in many languages, run interactive jobs, run batch jobs, and so on.

What software is available on the Linux cluster?

The Linux cluster has the following software installed:

STATA, SAS, Matlab

R, Python, Julia, C, C++, Fortran

Bash, Perl, Awk

Emacs, Vi, RStudio, Spyder

Jupyter Notebook, JupyterHub

Is there an example R job script for the cluster?

Yes, put the following in a file, but change the cd command, the error output file name and the console output file name.

#!/bin/sh
#BSUB -q special                   # Queue to submit job to
#BSUB -n 1                         # Ask for 1 core
#BSUB -e product_server_err.txt    # Error output from script
#BSUB -o product_server_out.txt    # Console output from script
#BSUB -Is

#
# Change your working folder
#
cd /home/tony

#
# Run system R with my program (-f reads the script from a file)
#
/apps/src/R-4.1.2/bin/R --no-save -f tony.R

Then to submit the job enter this command:

bsub < scriptname

I try to use descriptive names, like R_program.bsub, but it is totally up to you.
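
If you want to keep the job ID for later commands such as bpeek, you can pick it out of the confirmation line that bsub prints, which normally looks like "Job <12345> is submitted to queue <special>." The helper below is a sketch. The function name is illustrative, and the commented usage assumes a non-interactive submission (a script without the -Is line), since an interactive job would keep bsub attached until it finishes.

```shell
#!/bin/sh
# Extract the numeric job ID from bsub's confirmation line on stdin,
# e.g. "Job <12345> is submitted to queue <special>."
job_id_from_bsub() {
    sed -n 's/^Job <\([0-9][0-9]*\)>.*/\1/p'
}

# Usage on the cluster (commented out because it needs LSF):
# job_id=$(bsub < R_program.bsub | job_id_from_bsub)
# bjobs "$job_id"    # check the job's status
# bpeek "$job_id"    # look at its output so far
```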

How do I request more than 1 core in a job script?

It's really easy. In your job script, use the -n parameter.


#!/bin/sh
#BSUB -q special  # Queue to submit job to
#BSUB -n 8      # Ask for 8 cores
#BSUB -e error.txt  # Error output from script
#BSUB -o screen.txt # Console output from script
#BSUB -Is

# Change your working folder
cd $HOME

# Run system R with my program
/apps/src/R-4.1.2/bin/R --no-save -f my_program.R

Then, to submit the job, enter this command:


bsub < scriptname

I try to use descriptive names, like R_program.bsub, but it's totally up to you.

How do I request 16 cores and use 4 servers so each server uses 4 cores?

It's really easy. In your job script, use the -n parameter together with the -R "span[ptile=4]" parameter.


#!/bin/sh
#BSUB -q special          # Queue to submit job to
#BSUB -n 16               # Ask for 16 cores
#BSUB -R "span[ptile=4]"  # Place 4 cores on each server
#BSUB -e error.txt        # Error output from script
#BSUB -o screen.txt       # Console output from script
#BSUB -Is

# Change your working folder
cd $HOME

# Run system R with my program
/apps/src/R-4.1.2/bin/R --no-save -f my_program.R

Then, to submit the job, enter this command:


bsub < scriptname

With the above settings you will request 4 cores on each of 4 servers, 16 cores in total. Your code needs something like OpenMPI to make use of cores on more than one server.
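
The number of servers a job spans is the core count divided by the ptile value, rounded up. A small sketch of that arithmetic (the function name hosts_needed is illustrative):

```shell
#!/bin/sh
# Number of servers used when asking for `cores` cores with span[ptile=per_host].
# Rounds up, since a partly filled server still counts as a server.
hosts_needed() {
    cores=$1
    per_host=$2
    echo $(( (cores + per_host - 1) / per_host ))
}

hosts_needed 16 4   # prints 4
hosts_needed 8 4    # prints 2
```

So 16 cores at 4 per server spans 4 servers, while asking for only 8 cores would span 2.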

How do I install FastX?

The FastX clients are here:
StarNet Download Site
Once you have FastX installed, start it and set up a connection:

1. Click the + and add your information.
2. Click OK, then double-click on the line that shows up.
3. On the next screen you will see another + sign. Use it to start a session. I use XFCE as my GUI.

All of the setup is saved, so the next time you start FastX it will pick up where you left off. The setup above is done just once.
If a window related to the FastX web server pops up, you may safely ignore it. We do not run the web server part of FastX.

How do I install MobaXterm?

MobaXterm is SSH/X11 software for MS Windows (an alternative to FastX).
MobaXterm Home Edition Download Site
This program allows you to ssh into the Haas HPC and run graphical programs from the HPC.
It is easy to use: download, install, and run it.
Click on SSH to create a session. The DISPLAY variable will be set automatically.

Windows Cluster FAQs