Resource-efficient Execution of Deep Learning Computations

Author: Deepak Narayanan
Publisher:
Total Pages: 0
Release: 2021
ISBN-10: 9798494452825
ISBN-13:
Rating: 4/5 ( Downloads)

Book Synopsis Resource-efficient Execution of Deep Learning Computations by: Deepak Narayanan

Download or read book Resource-efficient Execution of Deep Learning Computations written by Deepak Narayanan and published by . This book was released on 2021 with total page 0 pages. Available in PDF, EPUB and Kindle.

Book excerpt: Deep Learning models have enabled state-of-the-art results across a broad range of applications. Training these models, however, is extremely time- and resource-intensive, taking weeks on clusters with thousands of expensive accelerators in the extreme case. As Moore's Law slows down, numerous parallel accelerators have been introduced to meet this new computational demand. This dissertation shows how model- and hardware-aware optimizations in software systems can help intelligently navigate this heterogeneity. In particular, it demonstrates how careful automated scheduling of computation across levels of the software stack can be used to perform distributed training and resource allocation more efficiently.

In the first part of this dissertation, we study pipelining, a technique commonly used as a performance optimization in various systems, as a way to perform more efficient distributed model training, both for models with small training footprints and for those with footprints larger than the memory capacity of a single GPU. For certain types of models, pipeline parallelism can facilitate model training with lower communication overhead than previous methods. We introduce new strategies for pipeline parallelism, with different tradeoffs between training throughput, memory footprint, and weight update semantics; these outperform existing methods in certain settings. Pipeline parallelism can also be used in conjunction with other forms of parallelism, creating a richer search space of parallelization strategies. By partitioning the training graph across accelerators in a model-aware way, pipeline parallelism combined with data parallelism can be up to 5x faster than data parallelism in isolation.
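The benefit of pipelining described above can be illustrated with a toy step-count model (this is an illustrative sketch, not the dissertation's implementation): with S pipeline stages and M microbatches, naive model parallelism processes one microbatch end-to-end at a time, while a pipelined schedule overlaps stages after an initial fill phase.

```python
# Toy cost model for pipeline parallelism, measured in "stage-steps"
# (one stage processing one microbatch). Illustrative only.

def naive_model_parallel_steps(num_stages: int, num_microbatches: int) -> int:
    # Without pipelining, each microbatch traverses all stages
    # before the next one starts: S * M steps.
    return num_stages * num_microbatches

def pipelined_steps(num_stages: int, num_microbatches: int) -> int:
    # With pipelining, the first microbatch takes S steps to fill the
    # pipeline; each remaining microbatch then completes one step later:
    # S + M - 1 steps.
    return num_stages + num_microbatches - 1

# Example: 4 stages, 8 microbatches.
print(naive_model_parallel_steps(4, 8))  # 32
print(pipelined_steps(4, 8))             # 11
```

As the number of microbatches grows relative to the number of stages, the fill/drain overhead amortizes and per-step utilization approaches that of ideal overlap; the memory-footprint and weight-update-semantics tradeoffs mentioned in the synopsis arise from how many in-flight microbatches this overlap requires.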
We also use a principled combination of pipeline parallelism, tensor model parallelism, and data parallelism to efficiently scale training to language models with a trillion parameters on 3072 A100 GPUs (aggregate throughput of 502 petaFLOP/s, which is 52% of peak device throughput).

In the second part of this dissertation, we show how heterogeneous compute resources (e.g., different GPU generations such as NVIDIA K80 and V100 GPUs) in a shared cluster, whether a private deployment or the public cloud, should be partitioned among multiple users to optimize objectives specified over one or more training jobs. By formulating existing policies as optimization problems over the allocation, and then using a concept we call effective throughput, policies can be extended to be heterogeneity-aware. A policy-agnostic scheduling mechanism then realizes the heterogeneity-aware allocations returned by these policies in practice. Using these heterogeneity-aware policies, we can improve various scheduling objectives, such as average job completion time, makespan, or cloud computing resource cost, by up to 3.5x. Towards the end of this dissertation, we also touch on how dynamic pricing information for spot instances can be plugged into this heterogeneity-aware policy framework to optimize cost objectives in the public cloud, helping reduce cost compared to using more expensive on-demand instances alone.
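The "effective throughput" idea above can be sketched as a small helper: a job's throughput under a heterogeneous allocation is the time-weighted sum of its measured throughput on each GPU type. The GPU names, throughput numbers, and function below are illustrative assumptions, not values or code from the dissertation.

```python
# Hedged sketch of effective throughput for heterogeneity-aware scheduling.
# throughputs: samples/sec the job achieves on each GPU type.
# allocation:  fraction of wall-clock time the job spends on each type.

def effective_throughput(throughputs: dict, allocation: dict) -> float:
    # Time-weighted average over GPU types; types with no allocated
    # time contribute nothing.
    return sum(allocation.get(gpu, 0.0) * tput
               for gpu, tput in throughputs.items())

# Example: a job that runs 5x faster on a V100 than on a K80,
# given half its time on each.
job_throughputs = {"V100": 100.0, "K80": 20.0}
alloc = {"V100": 0.5, "K80": 0.5}
print(effective_throughput(job_throughputs, alloc))  # 60.0
```

A heterogeneity-aware policy can then be phrased as an optimization over the allocation fractions (e.g., maximize the minimum effective throughput across jobs), which is what lets existing policies be extended without changing the scheduling mechanism that enforces the resulting allocations.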

Resource-efficient Execution of Deep Learning Computations Related Books

Efficient Processing of Deep Neural Networks
Language: en
Pages: 254
Authors: Vivienne Sze
Categories: Technology & Engineering
Type: BOOK - Published: 2022-05-31 - Publisher: Springer Nature


This book provides a structured treatment of the key principles and techniques for enabling efficient processing of deep neural networks (DNNs). DNNs are curren
Towards Efficient Inference and Improved Training Efficiency of Deep Neural Networks
Language: en
Pages: 0
Authors: Ravi Shanker Raju (Ph.D.)
Categories:
Type: BOOK - Published: 2022 - Publisher:


In recent years, deep neural networks have surpassed human performance on image classification tasks and speech recognition. While current models can reach
Deep Learning at Scale
Language: en
Pages: 404
Authors: Suneeta Mall
Categories: Computers
Type: BOOK - Published: 2024-06-18 - Publisher: "O'Reilly Media, Inc."


Bringing a deep-learning project into production at scale is quite challenging. To successfully scale your project, a foundational understanding of full stack d
Resource-efficient Deep Learning
Language: en
Pages: 0
Authors: Dongkuan Xu
Categories:
Type: BOOK - Published: 2022 - Publisher:


The phenomenal success of deep learning in the past decade has been mostly driven by the construction of increasingly large deep neural network models. These mo