Limitations
A Linux container works much like a bare-metal machine that you own. However, you may occasionally notice that something does not work as intended. This section lists some common cases. You are welcome to contribute more to it.
GPU Passthrough
Contributed by Ray
After running a Python CUDA script, the processes cannot be seen in the output of nvidia-smi.
ubuntu@account$ nvidia-smi
...
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
This is because we use LXC passthrough to share every GPU with multiple containers. Read this article for details. Your /dev/nvidia* devices are host devices, and nvidia-smi inside a container cannot show processes running on the host's kernel. The admin can view the running processes from the host.
ubuntu@admin$ nvidia-smi
...
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 3835077 C python3 4898MiB |
| 0 N/A N/A 3835078 C python3 4898MiB |
...
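To confirm which device nodes were passed through to your container, you can list them directly. The following is a minimal sketch: `list_gpu_devices` is a hypothetical helper name introduced here for illustration, and the exact set of nodes you see depends on how the admin configured the passthrough.

```shell
#!/usr/bin/env bash
# list_gpu_devices is a hypothetical helper, not part of any NVIDIA tool.
# It prints every NVIDIA device node found under a directory (default: /dev)
# and returns non-zero when none are present.
list_gpu_devices() {
    local dev_dir="${1:-/dev}"
    local node
    local found=0
    for node in "$dev_dir"/nvidia*; do
        # When nothing matches, the glob stays literal, so check existence.
        [ -e "$node" ] || continue
        echo "$node"
        found=1
    done
    if [ "$found" -eq 0 ]; then
        echo "no NVIDIA device nodes found in $dev_dir" >&2
        return 1
    fi
}
```

Running `list_gpu_devices` with no argument inspects `/dev`. Seeing the nodes only tells you the devices are present in the container; it does not make host-side processes visible to nvidia-smi.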
Trivia
NVIDIA limits its consumer GPUs to being passed through to at most three concurrent containers. However, our GPUs are professional GPUs that do not have such a limit.
Monitor GPU Process
To view the status of your GPU processes, you can visit the Grafana GPU processes page. The page visualizes the information fetched from the host and lets you monitor your memory and computation usage without nvidia-smi. It also lets you track your previous runs. For instance, by selecting To=now-4h, you can view the processes that were running four hours ago.
Kill GPU Process
Even though you cannot see the running processes in nvidia-smi, you can still locate them in ps aux. For example, to kill all your Python processes, run
kill $(ps aux | grep '[p]ython' | awk '{print $2}')
WARNING
A process has different, unrelated PIDs in the container OS and in the host OS owing to implementation details, so you cannot kill a process using the PID shown on Grafana. Killing all python processes will also restart your Jupyter Lab. A better approach is to use ps aux | grep '[p]ython' to identify the PID first.
ubuntu@account$ ps aux | grep '[p]ython'
ubuntu 1266303 ... python3 run.py ...
ubuntu@account$ kill -9 1266303
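If you would rather give the process a chance to exit cleanly before forcing it, one possible refinement is sketched below. `stop_process` is a hypothetical helper name and the 5-second default timeout is an arbitrary assumption; the helper sends SIGTERM, polls until the PID disappears, and falls back to SIGKILL only if the process is still alive after the timeout.

```shell
#!/usr/bin/env bash
# stop_process is a hypothetical helper: ask a PID to exit with SIGTERM,
# wait up to a timeout, and escalate to SIGKILL only if it is still alive.
stop_process() {
    local pid="$1"
    local timeout="${2:-5}"   # seconds to wait before escalating (assumed default)
    local waited=0

    # If the signal cannot be delivered (process gone or not yours), stop here.
    kill -TERM "$pid" 2>/dev/null || return 0

    # kill -0 sends no signal; it only checks that the PID still exists.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            kill -KILL "$pid" 2>/dev/null
            break
        fi
        sleep 1
        waited=$((waited + 1))
    done
}
```

With the PID from the example above, this would be called as `stop_process 1266303`. The PID must be the one reported by ps aux inside the container, not the one shown on Grafana.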
Nested Container
It is technically feasible to create a Docker container inside your container, but it is not allowed by default. Most Docker containers require intercepting and emulating system calls, which is not permitted in the default unprivileged containers. While the host can grant such permission, a privileged container is not always root-safe. In other words, if a privileged container is compromised, an attacker may escape from the container and breach the host OS.
LXC upstream's position is that those containers aren't and cannot be root-safe. They are still valuable in an environment where you are running trusted workloads or where no untrusted task is running as root in the container. —— Linux Containers - LXC - Security
Our position is that Docker-based development should be deployed to the Nautilus cluster. However, you may request privileges on a short-term basis with a valid reason (e.g., to build a Docker image for your accepted paper). Note that Docker's overlay2 storage driver does not work well with LXC, so you may need to downgrade Docker to 2.3.0 in order to use the removed overlay driver. See Troubleshooting for more information.
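To check which mode your container is currently running in, one common heuristic is to read the UID mapping of your user namespace. The sketch below assumes the standard Linux `/proc/self/uid_map` format, where each line is `<inside-uid> <outside-uid> <count>`; `container_privilege` is a hypothetical helper name. An identity mapping (0 to 0) means root inside is root outside, as in a privileged container or on the host itself, while any other mapping indicates an unprivileged container.

```shell
#!/usr/bin/env bash
# container_privilege is a hypothetical helper. It reads a uid_map file
# (default: /proc/self/uid_map) and reports whether UID 0 is shifted.
container_privilege() {
    local map_file="${1:-/proc/self/uid_map}"
    local inside outside count
    # Only the first mapping line is inspected; it covers UID 0 in the common case.
    read -r inside outside count < "$map_file" || return 1
    if [ "$inside" = "0" ] && [ "$outside" = "0" ]; then
        echo "privileged"      # root inside is root outside: no UID shift
    else
        echo "unprivileged"    # root inside maps to a non-root outside UID
    fi
}
```

This only reports the UID mapping. It does not tell you whether Docker will work, which also depends on the storage driver issue described above.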
Firewall
Your container resides on the internal network managed by the host OS. When you host a web service, such as Jupyter Lab at localhost:8888, it is not directly accessible from the outside. The host OS listens for requests on an external port (for example, roselab1.ucsd.edu:18888) and forwards them to the container.
As a result, when you run a service that listens only on localhost, you may be surprised to find that it is still accessible from anywhere through the forwarded port.
# ip=0.0.0.0 means the server accepts requests from anywhere
# ip=localhost means the server accepts requests only from the local machine
$ jupyter notebook --ip=localhost
Jupyter Notebook is running at: http://localhost:8888
# Able to see it at roselab1.ucsd.edu:18888
This is because a request made by an outsider, after being forwarded by the host, appears to come from the host. While LXC supports the PROXY protocol to let a service see the source IP address, the Jupyter service does not honor this protocol. Similarly, most firewall services cannot inspect the source IP either.
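To see which address a service is actually bound to inside the container, you can filter the output of `ss -tln` (listening TCP sockets, numeric addresses). The sketch below is illustrative only: `listen_addresses` is a hypothetical helper name, and it assumes the usual `ss` column layout in which the fourth column is the local address and port.

```shell
#!/usr/bin/env bash
# listen_addresses is a hypothetical helper. It reads `ss -tln`-style output
# on stdin and prints the local address:port entries for the given port.
listen_addresses() {
    local port="$1"
    awk -v port="$port" '
        $1 == "LISTEN" {
            n = split($4, parts, ":")      # the port is the last ":"-separated field
            if (parts[n] == port) print $4
        }'
}
```

A typical call is `ss -tln | listen_addresses 8888`. Even if the only entry printed is a loopback address such as 127.0.0.1:8888, the service can still be reachable through the host's forwarded port for the reason given above.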
As a result, we do not recommend setting up your own firewall inside the container. If you want to control access to your service, for example to allow only your work PC to access your Jupyter Lab, please contact the admin to add a firewall rule on the host OS.