Cluster Overview
The Haas OpenLava cluster provides high-performance batch computing for research workloads.
- Master / login node: hpc.haastech.org (haas-hpc00). Submit jobs here; do not run heavy workloads directly on this node
- Compute nodes: haas-hpc01 through haas-hpc10
- Special server: haas-hpc11 (SQL / JupyterHub support)
- Two queues: normal (standard jobs) and hi-mem (large-memory workloads)
Node Specifications
Use this table when deciding how many cores to request or which queue to target.
| Node | CPU Cores | Available RAM | Max Cores / Job | Queue |
|---|---|---|---|---|
| haas-hpc01 | 64 | 755 GiB | 40 | hi-mem |
| haas-hpc02 | 64 | 755 GiB | 40 | hi-mem |
| haas-hpc03 | 64 | 661 GiB | 40 | hi-mem |
| haas-hpc04 | 64 | 251 GiB | 40 | normal |
| haas-hpc05 | 40 | 251 GiB | 30 | normal |
| haas-hpc06 | 40 | 251 GiB | 30 | normal |
| haas-hpc07 | 40 | 251 GiB | 30 | normal |
| haas-hpc08 | 40 | 251 GiB | 30 | normal |
| haas-hpc09 | 40 | 251 GiB | 30 | normal |
| haas-hpc10 | 40 | 251 GiB | 30 | normal |
Note: each node's per-job core limit is lower than its physical core count (set by MXJ in lsb.hosts). Requesting more cores than this limit will cause your job to wait indefinitely in the queue.
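The specification table above can be encoded as a small lookup to sanity-check a core request before submitting. This is only an illustrative sketch (the `request_fits` helper is not part of any cluster tooling); the node names and limits are taken from the table:

```python
# Per-node job-slot limits, copied from the node specification table.
NODE_LIMITS = {
    "haas-hpc01": {"max_cores": 40, "queue": "hi-mem"},
    "haas-hpc02": {"max_cores": 40, "queue": "hi-mem"},
    "haas-hpc03": {"max_cores": 40, "queue": "hi-mem"},
    "haas-hpc04": {"max_cores": 40, "queue": "normal"},
    "haas-hpc05": {"max_cores": 30, "queue": "normal"},
    "haas-hpc06": {"max_cores": 30, "queue": "normal"},
    "haas-hpc07": {"max_cores": 30, "queue": "normal"},
    "haas-hpc08": {"max_cores": 30, "queue": "normal"},
    "haas-hpc09": {"max_cores": 30, "queue": "normal"},
    "haas-hpc10": {"max_cores": 30, "queue": "normal"},
}

def request_fits(queue: str, n_cores: int) -> bool:
    """True if at least one node in the queue can run an n_cores job."""
    return any(
        spec["queue"] == queue and n_cores <= spec["max_cores"]
        for spec in NODE_LIMITS.values()
    )

print(request_fits("normal", 40))   # True: haas-hpc04 allows up to 40 cores
print(request_fits("normal", 64))   # False: such a job would pend forever
```

A request that fails this check (for example `-n 64` on the normal queue) never matches any node and sits in PEND indefinitely, which is exactly the failure mode described above.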
Queue Guide: Which Queue Should I Use?
normal (default)
- Nodes: haas-hpc04 through haas-hpc10
- RAM per node: up to 251 GiB
- Max cores per job: 30–40
- Max jobs per user: 200
- Max jobs in queue: 1,000
- Best for: most research workloads, array jobs, standard parallel jobs
hi-mem
- Nodes: hpc01, hpc02, hpc03
- RAM per node: 661–755 GiB
- Max cores per job: 40
- Best for: large in-memory datasets, genome assembly, ML model training requiring > 251 GiB RAM
Targeting a Specific Queue
# Submit to the normal queue (this is the default)
bsub -q normal ./my_job.sh

# Submit to the hi-mem queue
bsub -q hi-mem ./my_job.sh
Check Current Queue Status
bqueues
How do I SSH into the cluster?
From macOS or Linux:
ssh yourusername@hpc.haastech.org
Windows users should use MobaXterm or PuTTY.
FastX Remote Desktop
FastX provides a full graphical desktop session on the cluster, accessible from your browser or a native desktop client. It is ideal for running GUI applications such as RStudio, MATLAB, or any graphical tool without needing a local installation.
Step 1: Download the FastX Client
Step 2: Install the Client
- macOS: Open the .dmg and drag FastX to your Applications folder.
- Windows: Run the .exe installer and follow the prompts.
- Linux (DEB): sudo dpkg -i fastx-client_*.deb
- Linux (RPM): sudo rpm -i fastx-client_*.rpm
Step 3: Connect to the Cluster
- Host: hpc.haastech.org
- Port: 3300 (default FastX port)
- Username: your Haas cluster username
- Authentication: Password or SSH key (same credentials as SSH access)
Step 4: Start a Desktop Session
Click + to create a new session. Choose XFCE or KDE for best performance. Your session persists if you close the client window; you can resume it later without losing your work.
Resuming or Ending a Session
Existing sessions appear in the FastX session list when you connect; click one to resume. To fully terminate a session and free resources, right-click it and choose Terminate. Simply closing the window suspends the session without terminating it.
Submitting Jobs with OpenLava
Basic Job Submission
# Submit a script to the default (normal) queue
bsub ./my_job.sh
Request Multiple Cores on a Single Node
bsub -n 16 -R "span[hosts=1]" ./my_job.sh
Submit to the hi-mem Queue
bsub -q hi-mem -n 32 -R "span[hosts=1]" ./my_job.sh
Set a Runtime Limit (recommended)
# Request 4 cores, limit to 24 hours
bsub -n 4 -W 24:00 ./my_job.sh
Check Queues, Nodes, and Running Jobs
bqueues        # queue summary and load
bhosts         # compute node status
bjobs          # your running and pending jobs
bjobs -u all   # all users' jobs
Job Script Template
Copy this template as a starting point. Lines beginning with #BSUB are directives read by OpenLava; they are not ordinary comments.
#!/bin/bash
#BSUB -J my_analysis          # Job name
#BSUB -q normal               # Queue: normal or hi-mem
#BSUB -n 8                    # Number of CPU cores
#BSUB -R "span[hosts=1]"      # Keep all cores on one node
#BSUB -W 12:00                # Wall-clock limit (HH:MM)
#BSUB -o output_%J.log        # Stdout (%J = job ID)
#BSUB -e error_%J.log         # Stderr

# --- Load your environment ---
source ~/.bashrc
conda activate myenv          # or module load, etc.

# --- Your commands ---
python run_analysis.py --input data.csv --output results/
Always set a -W wall-clock limit. Jobs without a limit can hold node resources indefinitely if something goes wrong.
Hi-mem Variant
#!/bin/bash
#BSUB -J big_model
#BSUB -q hi-mem
#BSUB -n 32
#BSUB -R "span[hosts=1]"
#BSUB -W 48:00
#BSUB -o output_%J.log
#BSUB -e error_%J.log

source ~/.bashrc
conda activate myenv
python train_model.py
bjobs / bkill Cheat Sheet
Viewing Jobs
# Your jobs (running + pending)
bjobs

# All users' jobs
bjobs -u all

# Detailed info for a specific job
bjobs -l 12345

# Show only running jobs
bjobs -r

# Show only pending jobs
bjobs -p

# Show finished / exited jobs
bjobs -d
bjobs -x
Canceling Jobs
# Kill a specific job by ID
bkill 12345

# Kill all your pending and running jobs
bkill 0

# Kill all jobs with a specific name
bkill -J my_analysis
Checking Node Load
# Node status summary
bhosts

# Detailed info for one node
bhosts -l haas-hpc01

# Current load on each node
lsload
Understanding bjobs Status Codes
| Status | Meaning |
|---|---|
| RUN | Job is actively running on a compute node |
| PEND | Job is queued, waiting for available slots |
| DONE | Job completed successfully (exit code 0) |
| EXIT | Job exited with an error; check your error_%J.log |
| SSUSP | Job suspended by the system (e.g., load threshold exceeded) |
| USUSP | Job suspended by the user via bstop |
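For quick scripting, the status column can be tallied straight from captured bjobs output. A minimal sketch, assuming the default bjobs column layout where STAT is the third field (the sample output below is fabricated for illustration):

```python
from collections import Counter

def status_counts(bjobs_output: str) -> Counter:
    """Count jobs per status code, assuming STAT is the third column."""
    counts = Counter()
    for line in bjobs_output.strip().splitlines()[1:]:  # skip header row
        fields = line.split()
        if len(fields) >= 3:
            counts[fields[2]] += 1
    return counts

sample = """JOBID   USER    STAT  QUEUE    FROM_HOST   EXEC_HOST   JOB_NAME  SUBMIT_TIME
101     alice   RUN   normal   haas-hpc00  haas-hpc05  my_job    Oct  1 10:00
102     alice   PEND  normal   haas-hpc00              my_job    Oct  1 10:01
103     alice   PEND  hi-mem   haas-hpc00              big_job   Oct  1 10:02"""

print(status_counts(sample))  # Counter({'PEND': 2, 'RUN': 1})
```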
Advanced OpenLava Examples
Job Array (parameter sweep)
# Submit 100 jobs; $LSB_JOBINDEX (1-100) is available inside the script
bsub -J "sweep[1-100]" ./run_case.sh
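Inside run_case.sh, each array element sees its own index in the LSB_JOBINDEX environment variable and can use it to pick its parameters. A minimal sketch of that mapping (the learning-rate/batch-size grid here is purely hypothetical):

```python
# Map $LSB_JOBINDEX (1-based) to one parameter combination of a sweep.
import os

learning_rates = [0.001, 0.01, 0.1]      # hypothetical sweep values
batch_sizes = [32, 64, 128, 256]         # 3 x 4 = 12 combinations total

def case_for_index(index: int) -> dict:
    """1-based array index -> one (lr, batch) combination."""
    i = index - 1
    return {
        "lr": learning_rates[i % len(learning_rates)],
        "batch": batch_sizes[i // len(learning_rates)],
    }

# On the cluster, LSB_JOBINDEX is set by OpenLava; default to 1 for testing.
index = int(os.environ.get("LSB_JOBINDEX", "1"))
params = case_for_index(index)
print(params)
```

With a 12-combination grid like this, the matching submission would be `bsub -J "sweep[1-12]"`, so every element runs exactly one case.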
Array elements appear in bjobs with IDs like 9876[42]. Use bjobs 9876 to see all array elements at once.
Interactive Session
# Start an interactive shell on a compute node
bsub -Is bash
Job Dependency (run B after A succeeds)
# Submit job A and capture its ID
JOB_A=$(bsub ./job_a.sh | grep -oP '(?<=Job <)\d+')

# Submit job B to run only after A finishes successfully
bsub -w "done($JOB_A)" ./job_b.sh
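The grep pattern above relies on bsub printing a confirmation line of the form `Job <12345> is submitted to queue <normal>.` For longer pipeline scripts, the same extraction can be done in Python (a small sketch; `job_id` is just an illustrative helper name):

```python
import re

def job_id(bsub_output: str) -> str:
    """Extract the numeric job ID from a bsub confirmation line."""
    match = re.search(r"Job <(\d+)>", bsub_output)
    if match is None:
        raise ValueError(f"no job ID found in: {bsub_output!r}")
    return match.group(1)

print(job_id("Job <9876> is submitted to queue <normal>."))  # 9876
```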
Linux Bash Essentials
ls -lh     # list files with sizes
du -sh *   # disk usage per item
top        # live process monitor
df -h      # filesystem disk usage
htop       # interactive process viewer (if installed)
Making Conda / Mamba Work Properly
conda init bash
source ~/.bashrc
mamba create -n research python=3.11
conda activate research
Activate your environment inside the job script (after source ~/.bashrc) so compute nodes use the correct Python and packages.
MariaDB Database Server (haas-hpc11)
haas-hpc11 hosts a shared MariaDB relational database server available to all Haas cluster users. It is the right tool when your research involves structured data that benefits from SQL queries, joins across large tables, or sharing a dataset with collaborators on the cluster without copying files.
Available Datasets
| Dataset | Description | Access |
|---|---|---|
| OpenAlex | A fully open catalog of the global research system: over 250 million scholarly works, authors, institutions, journals, and citation relationships. Useful for bibliometrics, citation network analysis, and science-of-science research. | Request access from Haas IT |
| Nielsen Scanner & Panel Data | Retail point-of-sale and consumer panel data. See the Nielsen FAQ below for details. | Restricted; see Nielsen FAQ |
Connecting to the Database
Connect from any cluster node using the standard MariaDB client:
mysql -h haas-hpc11 -u your_username -p
Or specify a database directly:
mysql -h haas-hpc11 -u your_username -p openalex
Connecting from Python
import pymysql
import pandas as pd
conn = pymysql.connect(
host="haas-hpc11",
user="your_username",
password="your_password",
database="openalex"
)
df = pd.read_sql("SELECT * FROM works LIMIT 100", conn)
conn.close()
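For tables too large to pull into memory at once, pandas can stream results in chunks via the chunksize parameter of read_sql. A runnable sketch, using an in-memory SQLite database as a stand-in for the MariaDB connection (on the cluster you would pass the pymysql connection instead; the tiny works table here is made up for illustration):

```python
import sqlite3
import pandas as pd

# Stand-in database; on the cluster, pass the pymysql connection instead.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE works (id INTEGER, cited_by_count INTEGER);
    INSERT INTO works VALUES (1, 10), (2, 0), (3, 25), (4, 3);
""")

total_cited = 0
# chunksize makes read_sql yield DataFrames instead of one big frame
for chunk in pd.read_sql("SELECT * FROM works", conn, chunksize=2):
    total_cited += int(chunk["cited_by_count"].sum())

print(total_cited)  # 38
conn.close()
```

Chunked reads keep memory bounded, which matters on the login node and in normal-queue jobs capped at 251 GiB.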
Connecting from R
library(DBI)
library(RMariaDB)

con <- dbConnect(
  MariaDB(),
  host = "haas-hpc11",
  user = "your_username",
  password = "your_password",
  dbname = "openalex"
)

df <- dbGetQuery(con, "SELECT * FROM works LIMIT 100")
dbDisconnect(con)
Requesting Access
Database accounts are not created automatically. To request access, contact Haas IT and include:
- Your cluster username
- Which dataset(s) you need access to
- A brief description of your research use
Tip: these tables can be very large. Use LIMIT while developing queries, and add WHERE clauses and indexes where possible before running full-table scans.
Nielsen Scanner & Panel Data
The Haas cluster hosts a licensed copy of two Nielsen datasets widely used in marketing, economics, and consumer behavior research. Access is restricted to authorized researchers due to licensing requirements.
What is Nielsen Scanner Data?
Nielsen Scanner Data (also called Retail Scanner Data or RMS, Retail Measurement Services) captures weekly point-of-sale transaction records from a large national sample of retail stores, including grocery, drug, and mass-merchandise outlets. Each record contains:
- UPC-level product information (brand, size, category)
- Weekly unit sales and revenue by store
- Price and promotional flag (feature ad, display, temporary price reduction)
- Store identifiers with market and channel type
Scanner data is well suited for studying pricing strategy, promotional effectiveness, market competition, and demand elasticity at the product and category level.
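As one illustration of the elasticity use case: regressing log units on log price recovers a constant price elasticity. A toy sketch with synthetic weekly store data (the numbers below are invented, not Nielsen data):

```python
import numpy as np

# Synthetic weekly observations: shelf price and units sold (NOT real data)
price = np.array([2.00, 2.25, 2.50, 2.75, 3.00])
units = np.array([1000,  880,  790,  715,  655])

# Fit log(units) = a + b * log(price); the slope b estimates price elasticity
b, a = np.polyfit(np.log(price), np.log(units), 1)
print(round(b, 2))  # negative: demand falls as price rises
```

In real analyses the regression would also control for promotions, store, and seasonality; this sketch only shows the basic log-log relationship.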
What is Nielsen Panel Data?
Nielsen Panel Data (also called HMS, the Homescan Consumer Panel) tracks the purchasing behavior of a nationally representative panel of households over time. Panelists scan all their retail purchases at home using a handheld scanner. Each record includes:
- Household demographics (income, household size, age, education; anonymized)
- Every UPC purchased, the store visited, price paid, and any coupon use
- Purchase date and quantity
Panel data is especially useful for studying household brand loyalty, switching behavior, coupon responsiveness, and the effect of marketing on individual consumers over time.
How Scanner and Panel Data Complement Each Other
Scanner data tells you what sold and at what price across stores. Panel data tells you who bought it and where they shopped. Together they are a powerful combination for linking supply-side pricing decisions to demand-side consumer responses.
Requesting Access
To request access to the Nielsen datasets, contact Haas IT with the following:
- Your name, cluster username, and faculty sponsor (if applicable)
- A brief description of your research project and intended use of the data
- Confirmation that you have read and agree to the Nielsen data use terms
Haas IT will verify your eligibility and provision database access to the relevant Nielsen schemas on haas-hpc11.
Windows Cluster Access
Use Remote Desktop (RDP) to connect to the Windows research environment. Ensure VPN is active when off campus.
AEoD Virtual Desktop
AEoD provides virtual desktops through Citrix Workspace. Request access through Haas IT.