Resource-efficient Execution of Deep Learning Computations

Author: Deepak Narayanan
Publisher:
Total Pages: 0
Release: 2021
ISBN-10: 9798494452825
ISBN-13:
Rating: 4/5 ( Downloads)

Book Synopsis Resource-efficient Execution of Deep Learning Computations by: Deepak Narayanan

Download or read book Resource-efficient Execution of Deep Learning Computations written by Deepak Narayanan and published by . This book was released on 2021 with total page 0 pages. Available in PDF, EPUB and Kindle.

Book excerpt: Deep Learning models have enabled state-of-the-art results across a broad range of applications. Training these models, however, is extremely time- and resource-intensive, taking weeks on clusters with thousands of expensive accelerators in the extreme case. As Moore's Law slows down, numerous parallel accelerators have been introduced to meet this new computational demand. This dissertation shows how model- and hardware-aware optimizations in software systems can help intelligently navigate this heterogeneity. In particular, it demonstrates how careful automated scheduling of computation across levels of the software stack can be used to perform distributed training and resource allocation more efficiently.

In the first part of this dissertation, we study pipelining, a technique commonly used as a performance optimization in various systems, as a way to perform more efficient distributed model training, both for models with small training footprints and for those with footprints larger than the memory capacity of a single GPU. For certain types of models, pipeline parallelism can facilitate model training with lower communication overhead than previous methods. We introduce new strategies for pipeline parallelism, with different tradeoffs between training throughput, memory footprint, and weight update semantics; these outperform existing methods in certain settings. Pipeline parallelism can also be used in conjunction with other forms of parallelism, creating a richer search space of parallelization strategies. By partitioning the training graph across accelerators in a model-aware way, pipeline parallelism combined with data parallelism can be up to 5x faster than data parallelism in isolation.
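The benefit of pipelining described above can be illustrated with a toy step-count model (this is an illustrative sketch, not the dissertation's implementation): with S pipeline stages and M microbatches, naive model parallelism processes one microbatch end-to-end at a time, while a pipelined schedule overlaps stages after an initial fill phase.

```python
# Toy cost model for pipeline parallelism, measured in "stage-steps"
# (one stage processing one microbatch). Illustrative only.

def naive_model_parallel_steps(num_stages: int, num_microbatches: int) -> int:
    # Without pipelining, each microbatch traverses all stages
    # before the next one starts: S * M steps.
    return num_stages * num_microbatches

def pipelined_steps(num_stages: int, num_microbatches: int) -> int:
    # With pipelining, the first microbatch takes S steps to fill the
    # pipeline; each remaining microbatch then completes one step later:
    # S + M - 1 steps.
    return num_stages + num_microbatches - 1

# Example: 4 stages, 8 microbatches.
print(naive_model_parallel_steps(4, 8))  # 32
print(pipelined_steps(4, 8))             # 11
```

As the number of microbatches grows relative to the number of stages, the fill/drain overhead amortizes and per-step utilization approaches that of ideal overlap; the memory-footprint and weight-update-semantics tradeoffs mentioned in the synopsis arise from how many in-flight microbatches this overlap requires.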
We also use a principled combination of pipeline parallelism, tensor model parallelism, and data parallelism to efficiently scale training to language models with a trillion parameters on 3072 A100 GPUs (aggregate throughput of 502 petaFLOP/s, which is 52% of peak device throughput).

In the second part of this dissertation, we show how heterogeneous compute resources (e.g., different GPU generations such as NVIDIA K80 and V100 GPUs) in a shared cluster, whether a private deployment or the public cloud, should be partitioned among multiple users to optimize objectives specified over one or more training jobs. By formulating existing policies as optimization problems over the allocation, and then using a concept we call effective throughput, policies can be extended to be heterogeneity-aware. A policy-agnostic scheduling mechanism then realizes the heterogeneity-aware allocations returned by these policies in practice. Using these heterogeneity-aware policies, we can improve various scheduling objectives, such as average job completion time, makespan, or cloud computing resource cost, by up to 3.5x. Towards the end of this dissertation, we also touch on how dynamic pricing information for spot instances can be plugged into this heterogeneity-aware policy framework to optimize cost objectives in the public cloud, helping reduce cost compared to using more expensive on-demand instances alone.
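The "effective throughput" idea above can be sketched as a small helper: a job's throughput under a heterogeneous allocation is the time-weighted sum of its measured throughput on each GPU type. The GPU names, throughput numbers, and function below are illustrative assumptions, not values or code from the dissertation.

```python
# Hedged sketch of effective throughput for heterogeneity-aware scheduling.
# throughputs: samples/sec the job achieves on each GPU type.
# allocation:  fraction of wall-clock time the job spends on each type.

def effective_throughput(throughputs: dict, allocation: dict) -> float:
    # Time-weighted average over GPU types; types with no allocated
    # time contribute nothing.
    return sum(allocation.get(gpu, 0.0) * tput
               for gpu, tput in throughputs.items())

# Example: a job that runs 5x faster on a V100 than on a K80,
# given half its time on each.
job_throughputs = {"V100": 100.0, "K80": 20.0}
alloc = {"V100": 0.5, "K80": 0.5}
print(effective_throughput(job_throughputs, alloc))  # 60.0
```

A heterogeneity-aware policy can then be phrased as an optimization over the allocation fractions (e.g., maximize the minimum effective throughput across jobs), which is what lets existing policies be extended without changing the scheduling mechanism that enforces the resulting allocations.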

Resource-efficient Execution of Deep Learning Computations Related Books

Efficient Processing of Deep Neural Networks
Language: en
Pages: 254
Authors: Vivienne Sze
Categories: Technology & Engineering
Type: BOOK - Published: 2022-05-31 - Publisher: Springer Nature


This book provides a structured treatment of the key principles and techniques for enabling efficient processing of deep neural networks (DNNs). DNNs are curren
Towards Efficient Inference and Improved Training Efficiency of Deep Neural Networks
Language: en
Pages: 0
Authors: Ravi Shanker Raju (Ph.D.)
Categories:
Type: BOOK - Published: 2022 - Publisher:


In recent years, deep neural networks have surpassed human performance on image classification tasks and speech recognition. While current models can reach
Deep Learning at Scale
Language: en
Pages: 404
Authors: Suneeta Mall
Categories: Computers
Type: BOOK - Published: 2024-06-18 - Publisher: "O'Reilly Media, Inc."


Bringing a deep-learning project into production at scale is quite challenging. To successfully scale your project, a foundational understanding of full stack d
Resource-efficient Deep Learning
Language: en
Pages: 0
Authors: Dongkuan Xu
Categories:
Type: BOOK - Published: 2022 - Publisher:


The phenomenal success of deep learning in the past decade has been mostly driven by the construction of increasingly large deep neural network models. These mo