GPUd automates monitoring, diagnostics, and issue identification for GPUs.
It is a single, simple tool for monitoring your GPU/CPU machines and running workloads.
GPUd is designed to be self-contained and to integrate seamlessly with other systems such as Docker, containerd, Kubernetes, and the NVIDIA ecosystem.
GPUd is GPU-centric, providing a unified view of critical GPU metrics and issues.
GPUd is a self-contained binary with a low footprint that runs on any machine.
GPUd is used in Lepton AI's production infrastructure.