Last updated: 2021-02-12
BIOS
The NVIDIA BIOS is encrypted, so no BIOS mod is available.
NVIDIA SLI (Scalable Link Interface)
* Requires graphics cards of the same model
- 3-way SLI
- Quad SLI
Install NVIDIA
Check HW Installed
lspci | grep NVIDIA
01:00.0 VGA compatible controller: NVIDIA Corporation Device 1c03 (rev a1)
01:00.1 Audio device: NVIDIA Corporation Device 10f1 (rev a1)
lshw -C display
...
  *-display
       description: VGA compatible controller
       product: NVIDIA Corporation
       vendor: NVIDIA Corporation
       physical id: 0
       bus info: pci@0000:01:00.0
       version: a1
       width: 64 bits
       clock: 33MHz
       capabilities: pm msi pciexpress vga_controller bus_master cap_list rom
       configuration: driver=nvidia latency=0
       resources: irq:42 memory:fd000000-fdffffff memory:c0000000-cfffffff memory:d0000000-d1ffffff ioport:e000(size=128) memory:fe000000-fe07ffff
Install Driver
apt-get install build-essential
# The Ubuntu package nvidia-384 does not include nvidia-settings or OpenCL, so download the installer manually
wget http://us.download.nvidia.com/XFree86/Linux-x86_64/384.111/NVIDIA-Linux-...
chmod 700 NVIDIA-Linux-x86_64-384.111.run
./NVIDIA-Linux-x86_64-384.111.run
P.S.
If you install the latest kernel on Ubuntu (linux-image-4.15.0-999-generic_4.15.0-999.201801292100_amd64.deb), you also need to install the following packages:
- linux-headers-4.15.0-999_4.15.0-999.201801292100_all.deb
- linux-headers-4.15.0-999-generic_4.15.0-999.201801292100_amd64.deb
Check Driver Version
# After installation these devices should appear; card0 is probably the onboard display
ls -1 /dev/dri/card*
# Version
cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module 384.111 Tue Dec 19 23:51:45 PST 2017
GCC version: gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.5)
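When a script needs only the driver version number, it can be cut out of that file. A minimal sketch, assuming the "NVRM version: ... Kernel Module <version>" line layout shown above (other driver builds may word this line differently); the function name nvidia_driver_version is made up for illustration:

```shell
#!/bin/sh
# Print the driver version (e.g. 384.111) found in the NVRM version line.
# $1 is an optional path, so the parser can be tried on a saved copy.
nvidia_driver_version() {
    sed -n 's/^NVRM version:.*Kernel Module  *\([0-9][0-9.]*\).*/\1/p' \
        "${1:-/proc/driver/nvidia/version}"
}

# Usage on a machine with the driver loaded:
#   nvidia_driver_version
```

If the line does not match the assumed layout, the function prints nothing.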
Error msg
ERROR: The Nouveau kernel driver is currently in use by your system. This driver is incompatible with the NVIDIA driver, and must be disabled before proceeding.
cat /etc/modprobe.d/nvidia-installer-disable-nouveau.conf
blacklist nouveau
options nouveau modeset=0
update-initramfs -u
Or
kernel parameter
# for Nvidia graphics
nomodeset nouveau.modeset=0
nomodeset
instructs the kernel to not load video drivers and use BIOS modes instead until X is loaded.
nouveau.modeset=0
The nouveau project aims to build high-quality, free/libre software drivers for nVidia cards.
The xxx.modeset=0 disables kernel mode setting for the hardware.
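Before re-running the installer it can help to confirm that one of the two methods above is actually in place. A minimal sketch, assuming the blacklist lives in a *.conf file under /etc/modprobe.d and that the running kernel's parameters are readable from /proc/cmdline; both function names are made up for illustration:

```shell
#!/bin/sh
# Succeeds if any *.conf file in the directory blacklists nouveau.
# $1 is an optional directory (default /etc/modprobe.d).
nouveau_is_blacklisted() {
    grep -qsE '^[[:space:]]*blacklist[[:space:]]+nouveau([[:space:]]|$)' \
        "${1:-/etc/modprobe.d}"/*.conf
}

# Succeeds if the kernel command line contains nomodeset or nouveau.modeset=0.
# $1 is an optional file (default /proc/cmdline).
cmdline_disables_nouveau() {
    grep -qE '(^|[[:space:]])(nomodeset|nouveau\.modeset=0)([[:space:]]|$)' \
        "${1:-/proc/cmdline}"
}

# Usage:
#   nouveau_is_blacklisted   && echo "nouveau blacklisted in modprobe.d"
#   cmdline_disables_nouveau && echo "modesetting disabled on kernel command line"
```

Note that the blacklist file only takes effect after update-initramfs -u and a reboot, so a positive result here does not prove nouveau is currently unloaded.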
Driver with Cuda Version
# Check current cuda version
nvidia-smi
List
NVIDIA-Linux-x86_64-455.45.01 # CUDA Version: 11.1
nvidia-settings
After lightdm and Xorg have started successfully
# List GPU unit in X
DISPLAY=:0 XAUTHORITY=/var/run/lightdm/root/:0 nvidia-settings -c :0 -q gpus
1 GPU on home:0

    [0] home:0[gpu:0] (GeForce GTX 1060 6GB)

      Has the following names:
        GPU-0
        GPU-x-x-x-x-x
# Check which performance level the GPU is currently in:
DISPLAY=:0 XAUTHORITY=/var/run/lightdm/root/:0 nvidia-settings -q '[gpu:0]/GPUCurrentPerfLevel'
Attribute 'GPUCurrentPerfLevel' (home:0[gpu:0]): 2.
  'GPUCurrentPerfLevel' is an integer attribute.
  'GPUCurrentPerfLevel' is a read-only attribute.
  'GPUCurrentPerfLevel' can use the following target types: X Screen, GPU.
# Check how fast the GPU can run (available performance modes)
DISPLAY=:0 XAUTHORITY=/var/run/lightdm/root/:0 nvidia-settings -q '[gpu:0]/GPUPerfModes'
Attribute 'GPUPerfModes' (home:0[gpu:0]): perf=0, nvclock=139, nvclockmin=139, nvclockmax=607, nvclockeditable=1, memclock=405, memclockmin=405, memclockmax=405, memclockeditable=1, memTransferRate=810, memTransferRatemin=810, memTransferRatemax=810, memTransferRateeditable=1 ; perf=1, nvclock=139, nvclockmin=139, nvclockmax=1911, nvclockeditable=1, memclock=810, memclockmin=810, memclockmax=810, memclockeditable=1, memTransferRate=1620, memTransferRatemin=1620, memTransferRatemax=1620, memTransferRateeditable=1 ; perf=2, nvclock=202, nvclockmin=202, nvclockmax=1974, nvclockeditable=1, memclock=3802, memclockmin=3802, memclockmax=3802, memclockeditable=1, memTransferRate=7604, memTransferRatemin=7604, memTransferRatemax=7604, memTransferRateeditable=1 ; perf=3, nvclock=202, nvclockmin=202, nvclockmax=1974, nvclockeditable=1, memclock=4004, memclockmin=4004, memclockmax=4004, memclockeditable=1, memTransferRate=8008, memTransferRatemin=8008, memTransferRatemax=8008, memTransferRateeditable=1
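The GPUPerfModes value packs every performance level into a single line, which is hard to read. A minimal sketch that splits it into one line per level and keeps only the maximum clocks, assuming the "key=value, ... ; key=value, ..." layout shown above; the function name perf_mode_table is made up for illustration:

```shell
#!/bin/sh
# Read a GPUPerfModes line on stdin and print, for each performance level,
# its index plus the maximum graphics and memory clocks.
perf_mode_table() {
    # Drop the "Attribute ... ): " prefix, then put each level on its own line.
    sed 's/^.*): //' | tr ';' '\n' | awk -F', *' '
        {
            perf = clk = mem = ""
            for (i = 1; i <= NF; i++) {
                gsub(/^ +| +$/, "", $i)          # trim padding around the pair
                split($i, kv, "=")
                if (kv[1] == "perf")        perf = kv[2]
                if (kv[1] == "nvclockmax")  clk  = kv[2]
                if (kv[1] == "memclockmax") mem  = kv[2]
            }
            if (perf != "")
                printf "perf=%s nvclockmax=%s memclockmax=%s\n", perf, clk, mem
        }'
}

# Usage:
#   DISPLAY=:0 XAUTHORITY=/var/run/lightdm/root/:0 \
#   nvidia-settings -q '[gpu:0]/GPUPerfModes' | grep perf= | perf_mode_table
```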
OC
Fan
# Check
DISPLAY=:0 XAUTHORITY=/var/run/lightdm/root/:0 \
nvidia-settings -c :0 -q 'GPUTargetFanSpeed'
-c CTRL-DISPLAY # Control the specified X display.
# Set speed to 70%
DISPLAY=:0 XAUTHORITY=/var/run/lightdm/root/:0 \
nvidia-settings -c :0 \
-a 'GPUFanControlState=1' \
-a 'GPUTargetFanSpeed=70'
# Fan control with multiple GPUs
#!/bin/bash
DISPLAY=:0 XAUTHORITY=/var/run/lightdm/root/:0 \
nvidia-settings -c :0 \
-a '[gpu:0]/GPUFanControlState=1' \
-a '[fan:0]/GPUTargetFanSpeed=70'
DISPLAY=:0 XAUTHORITY=/var/run/lightdm/root/:0 \
nvidia-settings -c :0 \
-a '[gpu:1]/GPUFanControlState=1' \
-a '[fan:1]/GPUTargetFanSpeed=60'
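With more than a couple of cards, repeating the block per GPU gets long. A minimal sketch of the same idea as a loop, assuming each GPU has exactly one fan and that fan N belongs to GPU N (as in the script above); the function name set_all_fans and the NVIDIA_SETTINGS override are made up for illustration:

```shell
#!/bin/sh
# set_all_fans COUNT SPEED
# Enable manual fan control on GPUs 0..COUNT-1 and set every fan to SPEED %.
# NVIDIA_SETTINGS is a hypothetical override, useful for a dry run.
set_all_fans() {
    count=$1
    speed=$2
    set --                      # reuse "$@" as the argument list being built
    i=0
    while [ "$i" -lt "$count" ]; do
        set -- "$@" \
            -a "[gpu:$i]/GPUFanControlState=1" \
            -a "[fan:$i]/GPUTargetFanSpeed=$speed"
        i=$((i + 1))
    done
    DISPLAY=:0 XAUTHORITY=/var/run/lightdm/root/:0 \
        "${NVIDIA_SETTINGS:-nvidia-settings}" -c :0 "$@"
}

# Usage: set four GPUs to 70 %
#   set_all_fans 4 70
```

All assignments go into one nvidia-settings call, so a wrong index on any card is reported in the same run.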
GPUGraphicsClockOffset & GPUMemoryTransferRateOffset
# You cannot overclock without PowerMizer in Xorg; you need to enable Coolbits.
# There is currently no way to overclock an NVIDIA GPU without X: nvidia-settings is the only way to overclock 10xx cards.
nvidia-smi -ac MEM_CLOCK,GRAPHICS_CLOCK <-- does not work
Powermizer with coolbits
# Use nvidia-xconfig to generate a minimal xorg.conf
# and set Coolbits in it as needed to enable overclocking
nvidia-xconfig -a --enable-all-gpus \
--allow-empty-initial-configuration \
--cool-bits=31 \
--use-display-device="DFP-0" \
--connected-monitor="DFP-0"
.........
Section "Device"
    Identifier     "Device0"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BoardName      "GeForce GTX 1060 6GB"
    BusID          "PCI:1:0:0"
EndSection

Section "Screen"
    Identifier     "Screen0"
    Device         "Device0"
    Monitor        "Monitor0"
    DefaultDepth    24
    Option         "AllowEmptyInitialConfiguration" "True"
    Option         "UseDisplayDevice" "DFP-0"
    Option         "Coolbits" "31"
    Option         "ConnectedMonitor" "DFP-0"
    SubSection     "Display"
        Depth       24
    EndSubSection
EndSection
/etc/init.d/lightdm restart
Checking
nvidia-smi
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     11964      G   /usr/lib/xorg/Xorg                            12MiB |
|    1     11964      G   /usr/lib/xorg/Xorg                             6MiB |
|    2     11964      G   /usr/lib/xorg/Xorg                             6MiB |
|    3     11964      G   /usr/lib/xorg/Xorg                             6MiB |
+-----------------------------------------------------------------------------+
PowerMizer
* values are offsets
# check
DISPLAY=:0 XAUTHORITY=/var/run/lightdm/root/:0 \
nvidia-settings -c :0 -q 'GPUCurrentClockFreqsString'
# set
#!/bin/bash
DISPLAY=:0 XAUTHORITY=/var/run/lightdm/root/:0 \
nvidia-settings -a '[gpu:0]/GPUPowerMizerMode=1' \
-a '[gpu:0]/GPUGraphicsClockOffset[3]=50' \
-a '[gpu:0]/GPUMemoryTransferRateOffset[3]=400'
Be wary of very high overclocks: they are often reached only because error correction is kicking in, and performance drops off.
To find the right amount, overclock in small steps and benchmark each stage until performance drops.
When it does, back off; that is your actual overclock.
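The step-and-benchmark procedure can be sketched as a loop. This is only an illustration under several assumptions: it tunes the graphics clock offset of GPU 0 at performance level 3 (the same attribute as in the script above), the benchmark prints a single integer score where higher is better, and the names apply_offset, run_benchmark, find_best_offset and BENCHMARK_CMD are all made up here:

```shell
#!/bin/sh
# Apply a graphics clock offset (MHz) to GPU 0, performance level 3.
apply_offset() {
    DISPLAY=:0 XAUTHORITY=/var/run/lightdm/root/:0 \
    nvidia-settings -c :0 -a "[gpu:0]/GPUGraphicsClockOffset[3]=$1" >/dev/null
}

# Hypothetical benchmark hook: BENCHMARK_CMD must name a program that
# prints one integer score (higher is better).
run_benchmark() {
    "${BENCHMARK_CMD:?set BENCHMARK_CMD to a program that prints an integer score}"
}

# find_best_offset START STEP MAX
# Raise the offset step by step; stop as soon as the score no longer improves,
# re-apply the last offset that did improve it, and print that offset.
find_best_offset() {
    offset=$1; step=$2; max=$3
    best_offset=$offset; best_score=0
    while [ "$offset" -le "$max" ]; do
        apply_offset "$offset" || break
        score=$(run_benchmark) || break
        [ "$score" -gt "$best_score" ] || break
        best_offset=$offset; best_score=$score
        offset=$((offset + step))
    done
    apply_offset "$best_offset"
    echo "$best_offset"
}

# Usage: try offsets 0, 25, 50, ... up to 200 MHz
#   BENCHMARK_CMD=/path/to/benchmark find_best_offset 0 25 200
```

A real run should also watch for crashes and Xid errors in dmesg at each step, which this sketch does not do.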
Remark
... \
-a '[gpu:0]/GPUGraphicsClockOffsetAllPerformanceLevels=1100' \
-a '[gpu:0]/GPUMemoryTransferRateOffsetAllPerformanceLevels=-400'
Card Hang
[215304.057110] Process accounting resumed
[240674.872126] NVRM: GPU at PCI:0000:05:00: GPU-6c4480a9-?-?-?-?
[240674.872133] NVRM: GPU Board Serial Number:
[240674.872138] NVRM: Xid (PCI:0000:05:00): 31, Ch 00000013, engmask 00000101, intr 10000000
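On a multi-GPU box it is useful to pull just the card address and Xid code out of the kernel log to see which card hung. A minimal sketch, assuming the "NVRM: Xid (PCI:<address>): <code>," line format shown above; the function name xid_events is made up for illustration:

```shell
#!/bin/sh
# Read kernel log lines on stdin and print "<pci-address> <xid-code>"
# for every NVRM Xid event.
xid_events() {
    sed -n 's/.*NVRM: Xid (PCI:\([^)]*\)): \([0-9][0-9]*\),.*/\1 \2/p'
}

# Usage:
#   dmesg | xid_events
```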
OC technology
GPU Boost requires no configuration. It only increases the core clock, not the memory clock.
GPU Boost 1 HW: GTX 680 ....
GPU Boost 2 HW: GTX 700 ....
GPU Boost 3 HW: GTX 1080
The frequency offset is no longer fixed; each voltage point has its own frequency offset
nvidia-smi
# -i, --id=ID Display data for a single specified GPU or Unit.
# --query-gpu must be used together with --format
nvidia-smi -i 0 --format=csv --query-gpu=power.limit
power.limit [W]
120.00 W
# -lms ms, --loop-ms=ms: Same as -l,--loop but in milliseconds.
nvidia-smi -i 0 -lms 500 --format=csv,noheader --query-gpu=temperature.gpu
37
37
37
...
Other "--query-gpu" options (see --help-query-gpu)
- index # Zero based index of the GPU. Can change at each boot.
- count # The number of NVIDIA GPUs in the system.
- pstate # P0 (maximum performance) to P12 (minimum performance)
- memory.used
- utilization.memory
- timestamp # YYYY/MM/DD HH:MM:SS.msec
- pci.bus_id # PCI bus id as "domain:bus:device.function" in hex. ( 00000000:07:00.0 )
- pci.bus # PCI bus number, in hex. (0x07)
- pci.device # PCI device number, in hex. (0x00)
- serial # [Not Supported]
nvidia-smi --format=csv,noheader --query-gpu=index,pci.bus_id
Useful checking
nvidia-smi -l 1 --format=csv,noheader --query-gpu=index,utilization.gpu,temperature.gpu,fan.speed,power.draw
0, 100 %, 68, 67 %, 92.75 W
1, 100 %, 63, 48 %, 90.93 W
2, 100 %, 66, 60 %, 92.56 W
3, 100 %, 63, 50 %, 91.15 W
4, 100 %, 67, 64 %, 91.69 W
5, 100 %, 59, 35 %, 89.09 W
6, 100 %, 66, 58 %, 92.77 W
7, 100 %, 61, 42 %, 89.66 W
8, 100 %, 63, 51 %, 91.53 W
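The CSV output is easy to filter in a script, for example to list the cards running hotter than a threshold. A minimal sketch, assuming the column order index, utilization.gpu, temperature.gpu, fan.speed, power.draw used above; the function name hot_gpus is made up for illustration:

```shell
#!/bin/sh
# hot_gpus LIMIT
# Read "index, util, temp, fan, power" CSV lines on stdin and print the
# index of every GPU whose temperature is above LIMIT (degrees C).
hot_gpus() {
    awk -F', *' -v limit="$1" '$3 + 0 > limit + 0 { print $1 }'
}

# Usage:
#   nvidia-smi --format=csv,noheader \
#       --query-gpu=index,utilization.gpu,temperature.gpu,fan.speed,power.draw \
#       | hot_gpus 65
```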
nvidia-smi -q -d PERFORMANCE
Timestamp                           : Mon Jan 29 21:53:05 2018
Driver Version                      : 384.111

Attached GPUs                       : 1
GPU 00000000:01:00.0
    Performance State               : P2
    Clocks Throttle Reasons
        Idle                        : Not Active
        Applications Clocks Setting : Not Active
        SW Power Cap                : Not Active
        HW Slowdown                 : Not Active
        Sync Boost                  : Not Active
        SW Thermal Slowdown         : Not Active
nvidia-smi -q -d CLOCK
    Clocks
        Graphics                    : 139 MHz
        SM                          : 139 MHz
        Memory                      : 405 MHz
        Video                       : 544 MHz
    ...
    Max Clocks
        Graphics                    : 1974 MHz
        SM                          : 1974 MHz
        Memory                      : 4004 MHz
        Video                       : 1708 MHz
    ...
nvidia-smi -q -d SUPPORTED_CLOCKS
Timestamp                           : Tue Jan 30 19:29:25 2018
Driver Version                      : 390.25

Attached GPUs                       : 1
GPU 00000000:01:00.0
    Supported Clocks                : N/A
# monitoring of detail stats such as power
nvidia-smi stats -i 0 -d pwrDraw
Viewing PCIe information
# The current PCI-E link width. These may be reduced when the GPU is not in use.
nvidia-smi -i 0 --format=csv,noheader --query-gpu=pcie.link.width.current
16
# 1 = 1x
# 16 = 16x
# The current PCI-E link generation. These may be reduced when the GPU is not in use.
nvidia-smi -i 0 --format=csv,noheader --query-gpu="pcie.link.gen.current"
2
For other PCIe information, see
https://datahunter.org/pcie
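The link generation and width together give a rough ceiling for transfer speed. A minimal sketch that turns the two values into an approximate per-direction bandwidth, using the usual per-lane figures after encoding overhead (about 250, 500, 985 and 1969 MB/s for Gen 1 to 4); the function name pcie_bandwidth_mbs is made up for illustration, and real throughput will be lower:

```shell
#!/bin/sh
# pcie_bandwidth_mbs GEN WIDTH
# Print the approximate one-direction link bandwidth in MB/s.
pcie_bandwidth_mbs() {
    gen=$1
    width=$2
    case $gen in
        1) per_lane=250 ;;    # 2.5 GT/s, 8b/10b encoding
        2) per_lane=500 ;;    # 5 GT/s, 8b/10b encoding
        3) per_lane=985 ;;    # 8 GT/s, 128b/130b encoding
        4) per_lane=1969 ;;   # 16 GT/s, 128b/130b encoding
        *) echo "unknown PCIe generation: $gen" >&2; return 1 ;;
    esac
    echo $((per_lane * width))
}

# Usage with the values queried above (Gen 2, x16):
#   pcie_bandwidth_mbs 2 16
```

Because the link may be downgraded while the GPU is idle, query the values under load before drawing conclusions.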
Setting
# Set Power Limit
nvidia-smi -pl 130
Power limit for GPU 00000000:01:00.0 was set to 130.00 W from 120.00 W.
All done.
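A requested limit has to fall inside the range the card supports, so it can be worth clamping the value first. A minimal sketch, assuming the range is read with the power.min_limit and power.max_limit query fields and printed as "min, max"; the function name clamp_power_limit is made up for illustration:

```shell
#!/bin/sh
# clamp_power_limit WANT
# Read "min, max" (watts) on stdin and print WANT clamped into that range,
# truncated to whole watts.
clamp_power_limit() {
    awk -F', *' -v want="$1" '{
        w = want + 0
        if (w < $1) w = $1
        if (w > $2) w = $2
        printf "%d\n", w
    }'
}

# Usage:
#   range=$(nvidia-smi -i 0 --format=csv,noheader,nounits \
#               --query-gpu=power.min_limit,power.max_limit)
#   nvidia-smi -i 0 -pl "$(echo "$range" | clamp_power_limit 130)"
```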
# GPU Accounting
# keep track of usage of resources throughout lifespan of a single process.
# 0|DISABLED or 1|ENABLED
-am, --accounting-mode=MODE
# Enable persistence mode for the GPU
# Clock and power settings get reset between program runs unless you enable persistence mode (PM) for the driver.
# Persistence mode keeps the NVIDIA driver loaded even when no applications are accessing the cards
nvidia-smi -pm 1
Adding a display card
nvidia-xconfig --enable-all-gpus
nvidia-xconfig --cool-bits=31
This creates the following in /etc/X11/xorg.conf
Section "Device"
    Identifier     "Device0"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BoardName      "GeForce GTX 1060 6GB"
    BusID          "PCI:1:0:0"
EndSection

Section "Device"
    Identifier     "Device1"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BoardName      "GeForce GTX 1070"
    BusID          "PCI:2:0:0"
EndSection

Section "Screen"
    Identifier     "Screen0"
    Device         "Device0"
    Monitor        "Monitor0"
    DefaultDepth    24
    Option         "Coolbits" "31"
    SubSection     "Display"
        Depth       24
    EndSubSection
EndSection

Section "Screen"
    Identifier     "Screen1"
    Device         "Device1"
    Monitor        "Monitor1"
    DefaultDepth    24
    Option         "Coolbits" "31"
    SubSection     "Display"
        Depth       24
    EndSubSection
EndSection
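When checking or hand-editing the BusID lines, note that lspci and nvidia-smi print the PCI address in hexadecimal, while xorg.conf expects decimal numbers. A minimal sketch of the conversion; the function name pci_to_busid is made up for illustration:

```shell
#!/bin/sh
# pci_to_busid ADDRESS
# Convert "bus:device.function" (hex, optionally prefixed with a PCI domain)
# into the decimal "PCI:bus:device:function" form used by BusID.
pci_to_busid() {
    addr=$1
    # Drop an optional leading PCI domain such as "0000:" or "00000000:".
    case $addr in
        *:*:*) addr=${addr#*:} ;;
    esac
    bus=${addr%%:*}
    dev=${addr#*:}
    dev=${dev%%.*}
    fn=${addr##*.}
    printf 'PCI:%d:%d:%d\n' "$((0x$bus))" "$((0x$dev))" "$((0x$fn))"
}

# Usage:
#   pci_to_busid 01:00.0
#   nvidia-smi --format=csv,noheader --query-gpu=pci.bus_id |
#       while read -r id; do pci_to_busid "$id"; done
```

The difference only shows from bus 0a upwards, which is exactly where a many-GPU board is likely to end up.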
Additional settings for /etc/X11/xorg.conf
Section "Screen"
    Option         "ConnectedMonitor" "DFP-0"
    Option         "AllowEmptyInitialConfiguration" "True"
    Option         "Interactive" "False"
    ...
EndSection
Remark
Option "ConnectedMonitor" "string"
# string: "CRT" (cathode ray tube), "DFP" (digital flat panel), or "TV" (television);
Allows you to override what the NVIDIA kernel module detects is connected to your graphics card.
Option "Interactive" "boolean"
# Default: on
This option controls the behavior of the driver's watchdog,
which attempts to detect and terminate GPU programs that get stuck,
in order to ensure that the GPU remains available for other processes.
Option "AllowEmptyInitialConfiguration" "boolean"
# Default: off.
Normally, the NVIDIA X driver will fail to start if it cannot find any display devices connected to the NVIDIA GPU.
AllowEmptyInitialConfiguration overrides that behavior so that the X server will start anyway, even if no display devices are connected.
/etc/init.d/lightdm restart
ps aux | grep X