Getting Started with RoseLab Servers
Overview
The RoseLab servers, owned and managed by the UCSD CSE Rose Lab, provide a robust platform for machine learning research. These servers offer:
- Development and execution of models within Linux Containers
- Real-time machine metrics tracking via Grafana
- Data sharing and backup through Seafile
- S3 dataset hosting with MinIO
- Online markdown collaboration using Hedgedoc
- Self-hosted experiment tracking via WandB (contact admin for invitation email)
- Lab-shared ChatGPT service frontend BetterGPT (contact admin for backend API access)
Additional web applications are planned to support future research needs.
Hardware
Located in Rack C05 of the CSE server room 1215, the RoseLab servers include:
- roselab1: Gigabyte G292-Z40: 2x AMD EPYC 7453 CPU, 512 GB RAM, 4 TB home SSD, 2 x NVIDIA A100 80GB PCIe GPUs, 2 x NVIDIA A100 40GB PCIe GPUs
- roselab2: Asus ESC8000A: 2x AMD EPYC 7282 CPU, 512 GB RAM, 4 TB home SSD, 8x NVIDIA GeForce RTX 4090 GPUs
- roselab3: Asus ESC8000A: 2x AMD EPYC 7282 CPU, 512 GB RAM, 4 TB home SSD, 8x NVIDIA GeForce RTX 4090 GPUs
- roselab4: Gigabyte G482-Z54: 2x AMD EPYC 7313 CPU, 1 TB RAM, 8 TB home SSD, 8 x NVIDIA L40S GPUs
- rosedata: Supermicro 12-bay Storage server (6x 20TB hard drives)
Regarding the hardware difference, referring to the Lambda Labs GPU benchmark.
Note
The RoseLab servers are in early development. User feedback is appreciated. More hardware additions are planned. For information on the rationale behind the servers, see the Why RoseLab section.
Quick Start
1. Apply for Access
Submit your request using the Account Request Form.
- Don't worry about preset image, port mappings, or resource quota; these can be adjusted later by contacting the admin.
- Don't worry about your initial machine assignment; you can create containers on other machines once you have access, as described in the Moving between Machines guide.
- You'll have root permission to install or remove software (except the NVIDIA driver and kernel modules).
WARNING
The host and container NVIDIA driver versions must match. (You cannot change host driver versions, which are in the config table) If you see Failed to initialize NVML: Driver/library version mismatch
, contact the admin.
2. Access Your Container
Upon approval, you'll receive an email with two tables:
Port mappings table:
<id>
is a 3-digit number.Container Port → Host Port 22 (SSH) <id>
003389 (RDP) <id>
015900 (VNC) <id>
0280 (HTTP) <id>
03443 (HTTPS) <id>
048080 (Web server, e.g. Tomcat) <id>
058888 (Jupyter) <id>
068889 (Backup) <id>
078890 (Backup) <id>
08Account credentials table:
description account password System, Remote Desktop ubuntu <token1>
S3 Object Storage <name>
<token2>
Seafile <email>
<token3>
Hedgedoc <email>
<token4>
Jupyter <token5>
SSH ubuntu <keyfile>
Note
- If Jupyter password is not provided, it is at line 991 of
~/.jupyter/jupyter_lab_config.py
. - If S3 credential is not provided, the name is the same as your container name, and the password is the same as Seafile.
- Keep the tables in a secure place and do not share with others.
Note
Each container appears as a dedicated machine from the inside. The default username is ubuntu
. Changing this username is not recommended due to static configurations.
3. SSH Login
Assuming access to roselab1
(replace with your assigned machine if different):
For Unix-based systems (macOS/Linux):
Move the private key to
~/.ssh
and set permissions:bashmv path/to/keyfile ~/.ssh/ chmod 600 ~/.ssh/keyfile
Connect via SSH:
bashssh ubuntu@roselab1.ucsd.edu -p <id>00 -i ~/.ssh/keyfile
For Windows:
If you're using PowerShell or Command Prompt:
- Move the keyfile to a known location (e.g.,
C:\Users\YourUsername\.ssh\
) - You may need to follow this StackOverflow guide to set the correct permissions.
- Move the keyfile to a known location (e.g.,
Connect via SSH:
powershellssh ubuntu@roselab1.ucsd.edu -p <id>00 -i C:\Users\YourUsername\.ssh\keyfile
If you're using PuTTY:
- Convert the keyfile to PPK format using PuTTYgen
- In PuTTY, set the hostname to
roselab1.ucsd.edu
, port to<id>00
- Under Connection > SSH > Auth, browse to your PPK file
- Click 'Open' to connect
Note
While the login does not require a VPN, the UCSD_GUEST network may block SSH. Please use a different network if you encounter issues.
VSCode RemoteSSH (Recommended)
For a more integrated development experience:
Edit
~/.ssh/config
:Host roselab1 HostName roselab1.ucsd.edu User ubuntu Port <id>00 IdentityFile ~/.ssh/keyfile
Install the "Remote - SSH" VSCode extension.
Open the Command Palette (Ctrl+Shift+P or Cmd+Shift+P) and search for "Remote-SSH: Connect to Host".
Select "roselab1" from the list of configured SSH hosts.
Troubleshooting: SSH Known Host Issues
If you encounter an SSH connection failure with a message about host key verification, e.g.,
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@ WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
it's likely due to changes in the network architecture or server configuration. This is common when you delete your container and create a new one (see Moving between Machines). To resolve this:
Remove the old host key from your known_hosts file:
bashssh-keygen -R [roselab1.ucsd.edu]:[id]00
Replace
[id]00
with your assigned SSH port.After removing the old key, try connecting again. You'll be prompted to add the new host key:
The authenticity of host '[roselab1.ucsd.edu]:[id]00 ([IP_ADDRESS])' can't be established. ED25519 key fingerprint is SHA256:XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX. Are you sure you want to continue connecting (yes/no/[fingerprint])?
Type 'yes' to add the new key to your known_hosts file.
Note
If you're still experiencing connection issues after this step, please contact the RoseLab administrator for further assistance. There might be additional network or configuration changes that need to be addressed.
4. Explore Your Container
Now let's check the resources assigned to you. First, use lscpu
to check the CPU cores. Although the CPU indices may differ, you should see 12 online CPU cores. Here's an example output:
$ lscpu
...
CPU(s): 56
On-line CPU(s) list: 4,6,11,13,18,19,23,29,34,38,41,42
Off-line CPU(s) list: 0-3,5,7-10,12,14-17,20-22,24-28,30-33,35-37,39,40,43-55
...
Next, you can inspect the memory assigned to you using the /proc/meminfo
file. You should see around 128 GB of total RAM.
$ cat /proc/meminfo
MemTotal: 125000000 kB
MemFree: 96093828 kB
MemAvailable: 96883860 kB
To see the file system, run df -H
. You would see
- the system SSD with around 256 GB of available space,
- a 5 TB private data HDD mounted under
/data
that is only accessible to you, and - a 5 TB public data HDD mounted under
/public
that is accessible to everyone.
Note
The system SSD is faster but has limited space and is local to the machine. The HDD is synchronized between machines and is suitable for large datasets. When you modify files in /data
in one container, the changes will be reflected in your containers on all other machines. Use the HDD for larger or archived project data.
$ df -H
Filesystem Size Used Avail Use% Mounted on
zfs-pool/containers/account 290G 43G 248G 15% /
...
data/account-vol 5.0T 263k 5.0T 1% /data
data/public 5.0T 263k 5.0T 1% /public
It is recommended to use soft links to access your data files on the /data
HDD. For example, instead of downloading your data files to /data/project1/sample...pt
and hard-coding their absolute paths, you can create a soft link under the code folder using the ln -s /data/project1/ /home/ubuntu/project1/data/
command. Then, you can refer to the data files as if they and the code are in the same project structure.
5. Verify Web Application Access
Refer to the documentation for Seafile, HedgeDoc, and MinIO to access these services. For Jupyter Lab or Remote Desktop, check the respective pages for login instructions.
What's Next?
Congratulations on setting up your RoseLab container! Here are some next steps:
- Create additional containers on other machines referring to the Moving between Machines guide.
- Review the differences between containers and dedicated servers.
- Enhance your security by updating passwords and keys.
- If you encounter performance issues, consult the Troubleshooting section.
Support
For questions or assistance, contact the admin Zihao Zhou (ziz244@ucsd.edu).