Explore
10 results found
NVIDIA DGX BasePOD
In this learning path, we cover NVIDIA's DGX systems and BasePOD infrastructure, detailing the setup, licensing, and management of Base Command Manager and DGX OS for high-performance AI workloads. We explain hardware requirements, network configurations, and system provisioning, emphasizing efficient resource management, scalability, and optimized AI model training across NVIDIA's cutting-edge computing platforms.
Learning Path
•Fundamentals
NVIDIA Run:ai for Platform Engineers
Welcome to the NVIDIA Run:ai for Platform Engineers Learning Path! This learning path is designed to build both foundational knowledge and practical skills for platform engineers and administrators responsible for managing GPU resources at scale. It begins by introducing learners to the key components of the NVIDIA Run:ai platform, including its Control Plane and Cluster, and explains how NVIDIA Run:ai extends Kubernetes to orchestrate AI workloads efficiently. The learning path then covers essential topics such as authentication and role-based access, organizational management through projects and departments, and workload operations using assets, templates, and policies. Learners will also explore GPU fractioning to understand how NVIDIA Run:ai maximizes GPU utilization and ensures fair resource allocation across teams. All this builds toward a hands-on lab experience designed to reinforce your learning and give you practical experience working directly with NVIDIA Run:ai.
Learning Path
•Fundamentals
ATC+
NVIDIA AI Enterprise
NVIDIA AI Enterprise (NVAIE) offers a robust suite of AI tools for applications including reasoning, speech and translation, biomedical research, content generation, and route planning, featuring community, NVIDIA, and custom models. NVAIE provides essential microservices such as NIM and CUDA-X, along with security advisories, enterprise support, cluster management, and infrastructure optimization. Designed for cloud, data center, workstation, and edge environments, NVAIE enables scalable, secure, and efficient AI deployment.
Learning Path
•Fundamentals
ATC+
NVIDIA DGX SuperPOD and DGX BasePOD Day 2 Operations
This Learning Series was created for NVIDIA DGX admins and operators to explore the Day 2 tasks involved in administering NVIDIA DGX SuperPOD and BasePOD environments with BCM (Base Command Manager). It details how to update firmware, patch systems, run jobs against the infrastructure, and integrate other components into BCM (switches, Active Directory, cloud, etc.).
Learning Path
•Intermediate
ATC+
Building Cisco RoCE fabric for AI/ML using NEXUS Dashboard
In this learning path, you will learn the components of RoCE (RDMA over Converged Ethernet) and why it is essential for clean, fast, and reliable AI/ML compute communication.
Learning Path
•Fundamentals
HPE Private Cloud AI
In this learning path we will take you through HPE Private Cloud AI (HPE PCAI). We will guide you through all the components that make up the solution, such as HPE GreenLake, Private Cloud AI, HPE Morpheus VM Essentials, the GreenLake for Files storage array, HPE Ezmeral Container Platform, and Aruba/NVIDIA switches. You will also interact with hands-on labs that take you into both of our physical HPE Private Cloud AI environments, including a small and a medium setup.
Learning Path
•Intermediate
ATC+
NVIDIA DGX SuperPOD and DGX BasePOD Day 3 Operations
This Learning Series was created for NVIDIA DGX admins and operators to explore the Day 3 tasks involved in administering NVIDIA DGX SuperPOD and BasePOD environments with BCM (Base Command Manager). It covers advanced topics including cmshell, cloud bursting from BCM, high availability (HA) for head nodes, InfiniBand (IB) setup and testing of worker nodes, Active Directory integration, and advanced workload topics such as deploying Kubernetes from Base Command Manager.
Learning Path
•Advanced
AI High-Performance Computing
High-performance computing (HPC) is a rapidly evolving field that enables researchers, scientists, and engineers to solve complex problems and drive innovation across many domains. As the demand for computational power continues to grow, professionals with HPC skills are becoming increasingly valuable in today's job market. This learning path is designed to provide you with a comprehensive understanding of HPC concepts, technologies, and best practices, empowering you to harness the power of supercomputers and parallel processing to tackle the most challenging computational tasks.
Learning Path
•Fundamentals
Introduction to NVIDIA NIM for LLM
This learning path introduces NVIDIA NIM for LLM microservices, covering its purpose, formats, and benefits. You'll explore deployment options via API Catalog, Docker, and Kubernetes, and complete hands-on labs for Docker and Kubernetes-based inference workflows—building skills to deploy, scale, and integrate GPU-optimized LLMs into enterprise applications.
Learning Path
•Fundamentals
High Performance Storage for AI
Explore the critical role of high-performance storage in AI infrastructure. Gain insights into storage requirements for AI/ML workloads, architectures like distributed file systems and all-flash arrays, and strategies to optimize storage for model training and inference. Stay ahead with emerging trends shaping the future of AI storage solutions.
Learning Path
•Fundamentals