
Haas Research Computing FAQ

Cluster Overview

The Haas OpenLava cluster provides high‑performance batch computing for research workloads.

  • Master / login node: hpc.haastech.org (haas-hpc00). Submit jobs here; do not run heavy workloads directly on this node
  • Compute nodes: haas-hpc01 through haas-hpc10
  • Special server: haas-hpc11 (SQL / JupyterHub support)
  • Two queues: normal for standard jobs  |  hi-mem for large-memory workloads

Node Specifications

Use this table when deciding how many cores to request or which queue to target.

| Node       | CPU Cores | Available RAM | Max Cores / Job | Queue  |
|------------|-----------|---------------|-----------------|--------|
| haas-hpc01 | 64        | 755 GiB       | 40              | hi-mem |
| haas-hpc02 | 64        | 755 GiB       | 40              | hi-mem |
| haas-hpc03 | 64        | 661 GiB       | 40              | hi-mem |
| haas-hpc04 | 64        | 251 GiB       | 40              | normal |
| haas-hpc05 | 40        | 251 GiB       | 30              | normal |
| haas-hpc06 | 40        | 251 GiB       | 30              | normal |
| haas-hpc07 | 40        | 251 GiB       | 30              | normal |
| haas-hpc08 | 40        | 251 GiB       | 30              | normal |
| haas-hpc09 | 40        | 251 GiB       | 30              | normal |
| haas-hpc10 | 40        | 251 GiB       | 30              | normal |
💡 Max Cores / Job is the per-job slot limit set in OpenLava (MXJ in lsb.hosts). Requesting more cores than this limit will cause your job to wait indefinitely in the queue.

Queue Guide β€” Which Queue Should I Use?

🟢 normal (default)

  • Nodes: hpc04 – hpc10
  • RAM per node: up to 251 GiB
  • Max cores per job: 30–40
  • Max jobs per user: 200
  • Max jobs in queue: 1,000
  • Best for: most research workloads, array jobs, standard parallel jobs

🔵 hi-mem

  • Nodes: hpc01, hpc02, hpc03
  • RAM per node: 661 – 755 GiB
  • Max cores per job: 40
  • Best for: large in-memory datasets, genome assembly, ML model training requiring > 251 GiB RAM
💡 Not sure? Start with the normal queue. Switch to hi-mem only if your job runs out of memory or you know you need more than ~200 GiB of RAM.
⚠️ Job limits: The normal queue enforces a limit of 200 concurrent jobs per user and 1,000 jobs total in the queue. Plan large array jobs accordingly.
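Within those limits, one way to run a large parameter sweep is to split it into chunked array jobs. The sketch below assumes a hypothetical 5,000-case sweep and a `run_case.sh` worker script; `SUBMIT=echo` makes it a dry run that only prints the commands it would issue:

```shell
#!/bin/bash
# Split a 5,000-case sweep into array jobs that fit the 1,000-jobs-per-queue cap.
# %50 throttles each array to 50 concurrently running elements (LSF-style syntax).
# SUBMIT defaults to echo (dry run); set SUBMIT=bsub on the cluster to submit.
SUBMIT=${SUBMIT:-echo}
TOTAL=5000
CHUNK=1000
for start in $(seq 1 "$CHUNK" "$TOTAL"); do
  end=$(( start + CHUNK - 1 ))
  [ "$end" -gt "$TOTAL" ] && end=$TOTAL
  "$SUBMIT" -J "sweep[${start}-${end}]%50" ./run_case.sh
done
```

Run as shown it prints five `bsub` command lines, one per chunk of 1,000 cases.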

Targeting a Specific Queue


# Submit to the normal queue (this is the default)
bsub -q normal ./my_job.sh

# Submit to the hi-mem queue
bsub -q hi-mem ./my_job.sh

Check Current Queue Status


bqueues

How do I SSH into the cluster?

From macOS or Linux:


ssh yourusername@hpc.haastech.org

Windows users should use MobaXterm or PuTTY.

🔒 Off campus? Connect to the UC Berkeley VPN at vpn.berkeley.edu before logging in.
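Optionally, a host alias in `~/.ssh/config` shortens the login command (the alias name `haas` here is just an example):

```
# ~/.ssh/config
Host haas
    HostName hpc.haastech.org
    User yourusername
```

With that entry in place, `ssh haas` is equivalent to the full command above.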

FastX Remote Desktop

FastX provides a full graphical desktop session on the cluster, accessible from your browser or a native desktop client. It is ideal for running GUI applications such as RStudio, MATLAB, or any graphical tool without needing a local installation.

Step 1 β€” Download the FastX Client

  • 🍎 macOS (macOS 10.13+): Download
  • 🪟 Windows (Windows 10/11): Download
  • 🐧 Linux (RPM & DEB packages): Download
💡 Browser option: You can also connect in a modern web browser without installing any client. Navigate to the FastX web URL provided by Haas IT and log in with your cluster credentials.

Step 2 β€” Install the Client

  • macOS: Open the .dmg and drag FastX to your Applications folder.
  • Windows: Run the .exe installer and follow the prompts.
  • Linux (DEB): sudo dpkg -i fastx-client_*.deb
  • Linux (RPM): sudo rpm -i fastx-client_*.rpm

Step 3 β€” Connect to the Cluster

  • Host: hpc.haastech.org
  • Port: 3300 (default FastX port)
  • Username: your Haas cluster username
  • Authentication: Password or SSH key (same credentials as SSH access)
🔒 Off-campus users: You must be connected to the UC Berkeley VPN before launching a FastX session.

Step 4 β€” Start a Desktop Session

Click + to create a new session. Choose XFCE or KDE for best performance. Your session persists if you close the client window; you can resume it later without losing your work.

Resuming or Ending a Session

Existing sessions appear in the FastX session list when you connect; click one to resume. To fully terminate a session and free resources, right-click it and choose Terminate. Simply closing the window suspends without terminating.

Submitting Jobs with OpenLava

Basic Job Submission


# Submit a script to the default (normal) queue
bsub ./my_job.sh

Request Multiple Cores on a Single Node


bsub -n 16 -R "span[hosts=1]" ./my_job.sh

Submit to the hi-mem Queue


bsub -q hi-mem -n 32 -R "span[hosts=1]" ./my_job.sh
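When memory rather than cores is the constraint, LSF-family schedulers also accept a per-job memory reservation via `rusage`. Whether reservation is enforced on this cluster, and the unit (MB by default in LSF-derived schedulers), depends on local configuration, so treat this as a sketch:

```shell
# Reserve roughly 400 GB of memory (rusage[mem=...] is in MB by default)
bsub -q hi-mem -n 32 -R "span[hosts=1] rusage[mem=400000]" ./my_job.sh
```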

Set a Runtime Limit (recommended)


# Request 4 cores, limit to 24 hours
bsub -n 4 -W 24:00 ./my_job.sh

Check Queues, Nodes, and Running Jobs


bqueues        # queue summary and load
bhosts         # compute node status
bjobs          # your running and pending jobs
bjobs -u all   # all users' jobs

Job Script Template

Copy this template as a starting point. Lines beginning with #BSUB are directives read by OpenLava; they are not ordinary comments.


#!/bin/bash
#BSUB -J my_analysis          # Job name
#BSUB -q normal                # Queue: normal or hi-mem
#BSUB -n 8                     # Number of CPU cores
#BSUB -R "span[hosts=1]"       # Keep all cores on one node
#BSUB -W 12:00                 # Wall-clock limit (HH:MM)
#BSUB -o output_%J.log         # Stdout  (%J = job ID)
#BSUB -e error_%J.log          # Stderr

# --- Load your environment ---
source ~/.bashrc
conda activate myenv            # or module load, etc.

# --- Your commands ---
python run_analysis.py --input data.csv --output results/
💡 Always set a -W wall-clock limit. Jobs without a limit can hold node resources indefinitely if something goes wrong.
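Note that in LSF-family schedulers the #BSUB directives are typically honored only when the script is fed to bsub on standard input; passing the script as an argument runs it but ignores the embedded directives. Assuming the template above is saved as my_job.sh:

```shell
# Redirect the script into bsub so the #BSUB lines are parsed
bsub < my_job.sh
```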

Hi-mem Variant


#!/bin/bash
#BSUB -J big_model
#BSUB -q hi-mem
#BSUB -n 32
#BSUB -R "span[hosts=1]"
#BSUB -W 48:00
#BSUB -o output_%J.log
#BSUB -e error_%J.log

source ~/.bashrc
conda activate myenv
python train_model.py

bjobs / bkill Cheat Sheet

Viewing Jobs


# Your jobs (running + pending)
bjobs

# All users' jobs
bjobs -u all

# Detailed info for a specific job
bjobs -l 12345

# Show only running jobs
bjobs -r

# Show only pending jobs
bjobs -p

# Show finished / exited jobs
bjobs -d
bjobs -x

Canceling Jobs


# Kill a specific job by ID
bkill 12345

# Kill all your pending and running jobs
bkill 0

# Kill all jobs with a specific name
bkill -J my_analysis

Checking Node Load


# Node status summary
bhosts

# Detailed info for one node
bhosts -l haas-hpc01

# Live load (refreshes every few seconds)
lsload

Understanding bjobs Status Codes

| Status | Meaning |
|--------|---------|
| RUN    | Job is actively running on a compute node |
| PEND   | Job is queued, waiting for available slots |
| DONE   | Job completed successfully (exit code 0) |
| EXIT   | Job exited with an error; check your error_%J.log |
| SSUSP  | Job suspended by the system (e.g., load threshold exceeded) |
| USUSP  | Job suspended by the user via bstop |
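The user-suspended state can be toggled manually with bstop and bresume (12345 is a placeholder job ID):

```shell
# Suspend a running job (its status becomes USUSP), then resume it
bstop 12345
bresume 12345
```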

Advanced OpenLava Examples

Job Array (parameter sweep)


# Submit 100 jobs; $LSB_JOBINDEX (1–100) is available inside the script
bsub -J "sweep[1-100]" ./run_case.sh
💡 Job arrays share a single job ID with an index suffix, e.g. 9876[42]. Use bjobs 9876 to see all array elements at once.
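Inside each array element, the scheduler exports $LSB_JOBINDEX so the script can select its own case. A minimal run_case.sh sketch (the params/ input layout is hypothetical):

```shell
#!/bin/bash
# run_case.sh: per-element worker for the sweep[1-100] array above.
# OpenLava sets LSB_JOBINDEX to this element's index (1-100);
# fall back to 1 so the script can be smoke-tested outside the scheduler.
IDX=${LSB_JOBINDEX:-1}
INPUT="params/case_${IDX}.txt"   # hypothetical per-case input file
echo "processing case ${IDX} from ${INPUT}"
```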

Interactive Session


# Start an interactive shell on a compute node (the cluster has only the
# normal and hi-mem queues, so no -q interactive is needed)
bsub -Is bash

Job Dependency (run B after A succeeds)


# Submit job A and capture its ID
JOB_A=$(bsub ./job_a.sh | grep -oP '(?<=Job <)\d+')

# Submit job B to run only after A finishes successfully
bsub -w "done($JOB_A)" ./job_b.sh

Linux Bash Essentials


ls -lh          # list files with sizes
du -sh *        # disk usage per item
top             # live process monitor
df -h           # filesystem disk usage
htop            # interactive process viewer (if installed)

Making Conda / Mamba Work Properly


conda init bash
source ~/.bashrc
mamba create -n research python=3.11
conda activate research
💡 Always activate your conda environment inside your job script (after source ~/.bashrc) so compute nodes use the correct Python and packages.
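To make an environment reproducible for collaborators or after a reinstall, it can be exported and recreated; environment.yml is just the conventional filename:

```shell
# Snapshot the environment's packages to a file...
mamba env export -n research > environment.yml
# ...and rebuild the same environment elsewhere from that file
mamba env create -f environment.yml
```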

MariaDB Database Server (haas-hpc11)

haas-hpc11 hosts a shared MariaDB relational database server available to all Haas cluster users. It is the right tool when your research involves structured data that benefits from SQL queries, joins across large tables, or sharing a dataset with collaborators on the cluster without copying files.

Available Datasets

| Dataset | Description | Access |
|---------|-------------|--------|
| OpenAlex | A fully open catalog of the global research system: over 250 million scholarly works, authors, institutions, journals, and citation relationships. Useful for bibliometrics, citation network analysis, and science-of-science research. | Request access from Haas IT |
| Nielsen Scanner & Panel Data | Retail point-of-sale and consumer panel data. See the Nielsen FAQ below for details. | Restricted; see Nielsen FAQ |

Connecting to the Database

Connect from any cluster node using the standard MariaDB client:


mysql -h haas-hpc11 -u your_username -p

Or specify a database directly:


mysql -h haas-hpc11 -u your_username -p openalex

Connecting from Python


import pymysql
import pandas as pd

conn = pymysql.connect(
    host="haas-hpc11",
    user="your_username",
    password="your_password",
    database="openalex"
)

df = pd.read_sql("SELECT * FROM works LIMIT 100", conn)
conn.close()

Connecting from R


library(DBI)
library(RMariaDB)

con <- dbConnect(MariaDB(),
  host     = "haas-hpc11",
  user     = "your_username",
  password = "your_password",
  dbname   = "openalex"
)

df <- dbGetQuery(con, "SELECT * FROM works LIMIT 100")
dbDisconnect(con)

Requesting Access

Database accounts are not created automatically. To request access, contact Haas IT and include:

  • Your cluster username
  • Which dataset(s) you need access to
  • A brief description of your research use
💡 OpenAlex is free to use for all Haas cluster users; just request an account. No additional approvals or licensing are required.
⚠️ The database server is a shared resource. Avoid running very large unoptimized queries during peak hours. Use LIMIT while developing queries, and add WHERE clauses and indexes where possible before running full-table scans.
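For example, a development-time query can be run through the command-line client with a tight filter and LIMIT. The works table and its columns here are illustrative; check the real schema with SHOW TABLES and DESCRIBE:

```shell
# Keep exploratory queries cheap: filter and LIMIT before any full scan
mysql -h haas-hpc11 -u your_username -p openalex \
  -e "SELECT id, title FROM works WHERE publication_year = 2020 LIMIT 10;"
```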

Nielsen Scanner & Panel Data

The Haas cluster hosts a licensed copy of two Nielsen datasets widely used in marketing, economics, and consumer behavior research. Access is restricted to authorized researchers due to licensing requirements.

What is Nielsen Scanner Data?

Nielsen Scanner Data (also called Retail Scanner Data or RMS, for Retail Measurement Services) captures weekly point-of-sale transaction records from a large national sample of retail stores, including grocery, drug, and mass-merchandise outlets. Each record contains:

  • UPC-level product information (brand, size, category)
  • Weekly unit sales and revenue by store
  • Price and promotional flag (feature ad, display, temporary price reduction)
  • Store identifiers with market and channel type

Scanner data is well suited for studying pricing strategy, promotional effectiveness, market competition, and demand elasticity at the product and category level.

What is Nielsen Panel Data?

Nielsen Panel Data (also called HMS, the Homescan Consumer Panel) tracks the purchasing behavior of a nationally representative panel of households over time. Panelists scan all their retail purchases at home using a handheld scanner. Each record includes:

  • Household demographics (income, household size, age, education; anonymized)
  • Every UPC purchased, the store visited, price paid, and any coupon use
  • Purchase date and quantity

Panel data is especially useful for studying household brand loyalty, switching behavior, coupon responsiveness, and the effect of marketing on individual consumers over time.

How Scanner and Panel Data Complement Each Other

Scanner data tells you what sold and at what price across stores. Panel data tells you who bought it and where they shopped. Together they are a powerful combination for linking supply-side pricing decisions to demand-side consumer responses.

⚠️ Access is restricted. Nielsen data is licensed and may only be used for approved academic research. You must have a signed data use agreement on file before accessing these tables. Do not copy, export, or share raw data outside the cluster environment.

Requesting Access

To request access to the Nielsen datasets, contact Haas IT with the following:

  • Your name, cluster username, and faculty sponsor (if applicable)
  • A brief description of your research project and intended use of the data
  • Confirmation that you have read and agree to the Nielsen data use terms

Haas IT will verify your eligibility and provision database access to the relevant Nielsen schemas on haas-hpc11.

💡 If you are unsure whether Nielsen data is appropriate for your project, reach out to Haas IT or your faculty advisor before requesting access.

Windows Cluster Access

Use Remote Desktop (RDP) to connect to the Windows research environment. Ensure VPN is active when off campus.

AEoD Virtual Desktop

AEoD provides virtual desktops through Citrix Workspace. Request access through Haas IT.