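Python Environment Setup

Notes on the environment setup I currently use for Python.

Basic Information

The Python and CUDA versions on my local machine and on the GPU server are as follows: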

Windows (and WSL):
- OS: Windows 11
- miniconda
- Python 3.12.1
- Highest CUDA version supported by the driver: 12.8
- CUDA Toolkit 12.4 (from nvcc --version)

GPU server:
- OS: Ubuntu 20.04
- anaconda
- Python 3.12.7
- Highest CUDA version supported by the driver: 12.4
- CUDA Toolkit 12.1

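To keep the local machine and the server as consistent as possible, I use Python 3.12.x and CUDA 12.4, the newest CUDA version that both drivers support. (The PyTorch pip wheels bundle their own CUDA runtime, so the driver limit matters more than the locally installed Toolkit version.) I did not try to match the Python patch version and simply took the default on each machine, since the packages differ considerably between platforms anyway.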
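The version choice above boils down to "take the newest build that every driver can run". A minimal sketch of that rule follows; the function names are made up for illustration, and the candidate list is an assumption taken from the CUDA builds this post lists for PyTorch 2.6.0.

```python
def parse_version(text):
    """Turn '12.4' into (12, 4) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def pick_common_cuda(driver_limits, candidates):
    """Return the newest candidate that every driver supports, or None."""
    ceiling = min(parse_version(v) for v in driver_limits.values())
    usable = [c for c in candidates if parse_version(c) <= ceiling]
    return max(usable, key=parse_version) if usable else None


# Driver limits copied from the list above.
driver_limits = {"windows": "12.8", "gpu-server": "12.4"}
# Assumed candidates: the CUDA builds this post lists for PyTorch 2.6.0.
candidates = ["12.4", "12.6"]

print(pick_common_cuda(driver_limits, candidates))  # prints 12.4
```

Comparing tuples of integers rather than raw strings avoids the usual trap where "12.10" would sort before "12.4".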

Common Packages

Some packages I use regularly:

# Essentials
conda install numpy scipy pandas matplotlib seaborn scikit-learn sympy jupyter

# Allows importing .ipynb files as modules
pip install import-ipynb

# Progress bars
conda install tqdm
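After installing, a quick check that everything is importable can save some confusion later. The sketch below uses only the standard library. The mapping from package name to import name is my own transcription: scikit-learn is imported as sklearn, import-ipynb as import_ipynb, and jupyter_core is used as a stand-in for the jupyter metapackage, which is an assumption.

```python
from importlib.util import find_spec

# Package name -> import name (the two differ for some packages).
IMPORT_NAMES = {
    "numpy": "numpy",
    "scipy": "scipy",
    "pandas": "pandas",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "scikit-learn": "sklearn",
    "sympy": "sympy",
    "jupyter": "jupyter_core",  # assumption: stand-in for the metapackage
    "import-ipynb": "import_ipynb",
    "tqdm": "tqdm",
}


def missing_packages(import_names):
    """Return the package names whose import name cannot be found."""
    return [pkg for pkg, mod in import_names.items() if find_spec(mod) is None]


if __name__ == "__main__":
    missing = missing_packages(IMPORT_NAMES)
    print("missing:", ", ".join(missing) if missing else "none")
```

find_spec only locates the module without importing it, so the check stays fast even for heavy packages.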

VSCode Python Configuration

The Python-related VSCode extensions I use are listed below. (Microsoft has split them up far too finely.)

Python
Python Debugger
Pylance: the Python language server; provides auto-completion, code hints, and so on
Black Formatter: code formatting
Flake8: static code analysis
Jupyter extension pack:
- Jupyter
- Jupyter Keymap
- Jupyter Cell Tags
- Jupyter Notebook Renderers
- Jupyter Slide Show

Below is my current VSCode configuration for Python (as of March 5, 2025):

"[python]": {
    "editor.defaultFormatter": "ms-python.black-formatter",
    "editor.formatOnType": true,
    "editor.unicodeHighlight.allowedLocales": {
        "zh-hans": true,
        "zh-hant": true
    }
},

//[[Python]]
"python.terminal.activateEnvironment": false,
"python.languageServer": "Pylance",
"python.analysis.typeCheckingMode": "basic",
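// Python formatting uses black (requires its own VSCode extension plus the black module installed via conda)
// Python static analysis uses flake8 (requires its own VSCode extension plus the flake8 module installed via conda)
// Severity mapping for flake8 diagnostics: flake8.severity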
"flake8.severity": {
    "E": "Hint",
    "F": "Warning"
},
"black-formatter.args": [
    "--verbose"
],
"flake8.showNotifications": "onError",
"python.analysis.diagnosticSeverityOverrides": {
    "reportGeneralTypeIssues": "none"
},
"jupyter.askForKernelRestart": false,
"notebook.cellToolbarLocation": {
    "default": "right",
    "jupyter-notebook": "right"
},
"notebook.markup.fontSize": 16,
"notebook.output.textLineLimit": 50,
"notebook.output.scrolling": true,
"notebook.lineNumbers": "on",
"notebook.outline.showCodeCells": true,
"notebook.outline.showMarkdownHeadersOnly": false,
"notebook.diff.ignoreMetadata": true,

Most of these are minor details. The one thing to note is that the black and flake8 packages need to be installed separately:

conda install black flake8

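If syntax highlighting or similar features misbehave, it is best to clear the relevant configuration caches and restart VSCode first; this fixes quite a few problems.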
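For reference, the flake8.severity mapping in the settings above keys on the first letter of each flake8 code. The snippet below is a made-up file (not from my projects) that triggers one code of each class. With those settings, the F code should show up as a warning and the E code only as a hint.

```python
# F401 "imported but unused" is an F code, so it is shown as a Warning.
import os


def area(width, height):
    # E225 "missing whitespace around operator" is an E code: only a Hint.
    result=width * height
    return result


print(area(3, 4))  # prints 12
```

Formatting the file with Black would rewrite that line as result = width * height, which makes the E225 hint disappear, while the unused import stays flagged.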

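PyTorch Installation Requirements / Installation Commands

The current stable PyTorch release on the official site is 2.6.0. Its requirements:

Python >= 3.9
Builds are available for CUDA 12.4, CUDA 12.6, or CPU only
Installing with pip is recommended; the site lists only pip commands and no longer provides conda commands

The installation commands currently listed on the official site are: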

# (*) windows cuda=12.4
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124

# windows cuda=12.6
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126

# windows cpu
pip3 install torch torchvision torchaudio

# (*) linux cuda=12.4
pip3 install torch torchvision torchaudio

# linux cuda=12.6
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126

# linux cpu
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu

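Some explanation:

torch: the core PyTorch library, providing tensor computation, automatic differentiation, and other basic functionality.
torchvision: a companion package to PyTorch (installed separately from torch), dedicated to computer-vision tasks.
torchaudio: another companion package, dedicated to audio-processing tasks.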
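The six commands above differ only in whether an --index-url is appended and which tag it ends with. A small lookup sketch makes the pattern explicit. The table is transcribed from the commands above for PyTorch 2.6.0, and pip_command is a hypothetical helper rather than anything PyTorch provides.

```python
INDEX_BASE = "https://download.pytorch.org/whl/"
PACKAGES = "torch torchvision torchaudio"

# (platform, build) -> index tag, or None when the default index is used.
# Transcribed from the commands above; valid for PyTorch 2.6.0 only.
INDEX_TAGS = {
    ("windows", "cu124"): "cu124",
    ("windows", "cu126"): "cu126",
    ("windows", "cpu"): None,
    ("linux", "cu124"): None,
    ("linux", "cu126"): "cu126",
    ("linux", "cpu"): "cpu",
}


def pip_command(platform, build):
    """Build the pip command for a (platform, build) pair listed above."""
    key = (platform, build)
    if key not in INDEX_TAGS:
        raise ValueError(f"no command recorded for {key}")
    tag = INDEX_TAGS[key]
    command = f"pip3 install {PACKAGES}"
    if tag is not None:
        command += f" --index-url {INDEX_BASE}{tag}"
    return command


print(pip_command("windows", "cu124"))
print(pip_command("linux", "cu124"))
```

The asymmetry is worth noticing: the default index serves the CPU build on Windows but the CUDA 12.4 build on Linux, so the same bare command installs different things on the two platforms.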

conda Environment Setup Log

Setup log for the conda environment myenv-pytorch on Windows:

# Create the environment
conda create --name myenv-pytorch python=3.12.1

# Activate the environment
conda activate myenv-pytorch

# Make sure pip is run inside the activated conda environment
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124

# Install common packages
conda install numpy scipy pandas matplotlib seaborn scikit-learn sympy jupyter

Setup log for the conda environment myenv-pytorch-linux on the GPU server:

# Create the environment
conda create --name myenv-pytorch-linux python=3.12.7

# Activate the environment
conda activate myenv-pytorch-linux

# Make sure pip is run inside the activated conda environment
pip3 install torch torchvision torchaudio

# Install common packages
conda install numpy scipy pandas matplotlib seaborn scikit-learn sympy jupyter

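Notes:

Do not upgrade numpy casually; doing so may stop PyTorch from working.
The two conda environments above actually ended up with different numpy versions, probably because of the Python version and platform differences.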
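Since the two environments resolved different numpy versions, it can help to print the installed versions in each one and compare them before upgrading anything. A minimal sketch using only the standard library follows; the package list is just an example.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each package name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    wanted = ["numpy", "torch", "torchvision", "torchaudio"]
    for name, ver in installed_versions(wanted).items():
        print(f"{name}: {ver or 'not installed'}")
```

Running this in both environments and diffing the output gives a quick record of what each one actually contains.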

PyTorch Test Script

A short Python script can check whether the PyTorch installed in the current environment works, whether it can use CUDA, and what GPUs are available:

import torch

def print_gpu_info():
    if torch.cuda.is_available():
        num_gpus = torch.cuda.device_count()
        print(f"CUDA is available! Number of GPUs: {num_gpus}\n")

        for i in range(num_gpus):
            prop = torch.cuda.get_device_properties(i)
            print(f"GPU {i} Name: {prop.name}")
            print(f"GPU {i} Total Memory: {prop.total_memory / (1024 ** 3):.2f} GB")
            print(f"GPU {i} Compute Capability: {prop.major}.{prop.minor}\n")
    else:
        print("CUDA is not available.")

if __name__ == "__main__":
    print_gpu_info()

For example, the output on my personal laptop:

CUDA is available! Number of GPUs: 1

GPU 0 Name: NVIDIA GeForce RTX 4060 Laptop GPU
GPU 0 Total Memory: 8.00 GB
GPU 0 Compute Capability: 8.9

Output on the GPU server:

CUDA is available! Number of GPUs: 6

GPU 0 Name: NVIDIA RTX A6000
GPU 0 Total Memory: 47.53 GB
GPU 0 Compute Capability: 8.6

GPU 1 Name: NVIDIA RTX A6000
GPU 1 Total Memory: 47.53 GB
GPU 1 Compute Capability: 8.6

GPU 2 Name: NVIDIA RTX A6000
GPU 2 Total Memory: 47.53 GB
GPU 2 Compute Capability: 8.6

GPU 3 Name: NVIDIA RTX A6000
GPU 3 Total Memory: 47.53 GB
GPU 3 Compute Capability: 8.6

GPU 4 Name: NVIDIA RTX A6000
GPU 4 Total Memory: 47.53 GB
GPU 4 Compute Capability: 8.6

GPU 5 Name: NVIDIA RTX A6000
GPU 5 Total Memory: 47.53 GB
GPU 5 Compute Capability: 8.6

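To make PyTorch use the GPU, code usually contains a statement like this:

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

With multiple GPUs, you can append an index to choose a specific one, e.g. cuda:0, cuda:1, cuda:2:

device = torch.device('cuda:2' if torch.cuda.is_available() else 'cpu')

By default, cuda and cuda:0 both refer to the first GPU, but the two device objects still compare as unequal, because torch.device('cuda') carries no index while torch.device('cuda:0') has index 0.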
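One way to sidestep the comparison pitfall is to make the index explicit before comparing. The sketch below works on plain device strings so that it runs without PyTorch. normalize_device and same_device are hypothetical helpers, not part of PyTorch, and the default index of 0 is an assumption that holds only while the current CUDA device has not been changed.

```python
def normalize_device(name, current_index=0):
    """Return a device string with an explicit index, e.g. 'cuda' -> 'cuda:0'.

    current_index is an assumption: the GPU that a bare 'cuda' refers to.
    """
    if name == "cpu" or ":" in name:
        return name
    return f"{name}:{current_index}"


def same_device(a, b):
    """Compare two device strings after making the index explicit."""
    return normalize_device(a) == normalize_device(b)


print(same_device("cuda", "cuda:0"))    # prints True
print(same_device("cuda:0", "cuda:1"))  # prints False
print(same_device("cpu", "cpu"))        # prints True
```

With real objects, the strings can be obtained from str(device), for example str(tensor.device).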

Docker Setup for Dive into Deep Learning

To study Mu Li's Dive into Deep Learning (d2l), the matching environment can be set up with Docker. (See leaning_d2l.)

A reference Dockerfile:

FROM nvcr.io/nvidia/cuda:12.4.0-runtime-ubuntu20.04

# Set labels
LABEL maintainer="fenglielie"
LABEL version="1.0"
LABEL description="This is a dockerfile for building a container with CUDA 12.4.0 runtime on Ubuntu 20.04 for learning d2l"

# Set environment variables
ENV LANG=C.UTF-8
ENV DEBIAN_FRONTEND=noninteractive

# Set the working directory
WORKDIR /root

# Switch the APT mirror and refresh the package lists
RUN cp -a /etc/apt/sources.list /etc/apt/sources.list.bak && \
    sed -i "s@http://.*archive.ubuntu.com@http://mirrors.huaweicloud.com@g" /etc/apt/sources.list && \
    sed -i "s@http://.*security.ubuntu.com@http://mirrors.huaweicloud.com@g" /etc/apt/sources.list && \
    apt-get update && \
    apt-get install -y --no-install-recommends software-properties-common && \
    add-apt-repository -y ppa:deadsnakes/ppa && \
    apt-get update

# Configure SSH to allow remote access
RUN mkdir -p /etc/ssh && \
    echo "PermitRootLogin yes" >> /etc/ssh/sshd_config && \
    echo "PasswordAuthentication yes" >> /etc/ssh/sshd_config

# Install Python 3.9 and related dependencies
RUN apt-get install -y --no-install-recommends \
    python3.9 \
    python3.9-dev \
    python3.9-distutils \
    python3.9-venv \
    build-essential \
    vim \
    wget \
    openssh-server && \
    apt-get clean && rm -rf /var/lib/apt/lists/*

# Make Python 3.9 the default python3
RUN update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.9 1 && \
    update-alternatives --config python3

# Install pip and configure the PyPI mirror
RUN wget -O get-pip.py https://bootstrap.pypa.io/get-pip.py && \
    python3 get-pip.py && \
    pip config set global.index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple

# Install the Python dependencies
RUN pip install --no-cache-dir \
    torch==1.12.0 \
    torchvision==0.13.0 \
    d2l==0.17.6 \
    jupyter

# Expose the SSH port
EXPOSE 22

# Run bash as the default command
CMD ["/bin/bash"]

Build the Docker image with:

docker build -t learn-d2l:1.0 .

Start the container on the server with:

docker run -itd --name myd2l \
    --net=host --ipc=host --gpus=all \
    --mount type=bind,source=$HOME/learn-d2l,target=/learn-d2l \
    learn-d2l:1.0 /bin/bash

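Notes:

--net=host: uses the host's network directly, so port mappings such as -p 8888:8888 are no longer needed;
--ipc=host: shares the host's IPC namespace, including the shared memory that PyTorch's multi-process data loading relies on, which can improve performance;
--gpus=all: makes all GPUs available to the container;
--mount: bind-mounts a directory; the host directory $HOME/learn-d2l must already exist.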
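Because a bind mount given with --mount fails when the source directory is missing, it can be convenient to create the directory and assemble the command in one place. The following is only a sketch. docker_run_args is a hypothetical helper, and the demo writes to a temporary directory instead of $HOME/learn-d2l so that it has no lasting side effects.

```python
import tempfile
from pathlib import Path


def docker_run_args(host_dir, name="myd2l", image="learn-d2l:1.0"):
    """Build the `docker run` argument list shown above.

    Creates host_dir first, because a bind mount needs an existing source.
    """
    source = Path(host_dir).expanduser()
    source.mkdir(parents=True, exist_ok=True)
    return [
        "docker", "run", "-itd", "--name", name,
        "--net=host", "--ipc=host", "--gpus=all",
        "--mount", f"type=bind,source={source},target=/learn-d2l",
        image, "/bin/bash",
    ]


if __name__ == "__main__":
    # Demo only: on the server, pass Path.home() / "learn-d2l" instead.
    with tempfile.TemporaryDirectory() as tmp:
        print(" ".join(docker_run_args(Path(tmp) / "learn-d2l")))
```

The returned list can be handed to subprocess.run as-is, which avoids shell quoting problems if the path contains spaces.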

Start Jupyter Notebook inside the container:

jupyter notebook --no-browser --notebook-dir=/learn-d2l --port=8888 --allow-root

On the local machine, set up SSH local port forwarding:

ssh -N -L 8888:localhost:8888 user@hostname

Then open localhost:8888 in a local browser.
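If the page does not load, it helps to first check whether anything is listening on the forwarded port at all. A minimal standard-library sketch, where port_is_open is a hypothetical helper and the defaults simply mirror the port used above:

```python
import socket


def port_is_open(host="localhost", port=8888, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    state = "reachable" if port_is_open() else "not reachable"
    print(f"localhost:8888 is {state}")
```

A "not reachable" result on the local machine usually points at the SSH tunnel. The same check run on the server tells you whether Jupyter itself is up.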

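A closing remark: using Docker here is still too complicated. It drags in networking between the container and the outside world, directory mounts, user permissions, and so on, and that extra complexity far exceeds the effort of setting up the environment itself.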