In this article, the author discusses the need for efficient ways to distribute computational workloads in large-scale machine learning. They introduce two key techniques, model parallelism and data parallelism, and explain the strengths, weaknesses, and ideal use cases of each. Model parallelism distributes different parts of the machine learning model across multiple computing resources, while data parallelism splits the dataset into smaller chunks processed in parallel across multiple resources, each holding a full copy of the model. The article explains how both techniques work, examines their challenges, and emphasizes that combining them often yields the best results. The author concludes that understanding these techniques is crucial for developing and deploying large-scale machine learning models, and that they are essential for scalability and efficiency in ML workloads.
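To make the data-parallelism idea concrete, here is a minimal sketch (not taken from the article; all names are hypothetical). Each simulated "worker" holds a full copy of the model parameter and a shard of the batch; the per-shard gradients are averaged, which reproduces the full-batch gradient, so every worker can apply the same update:

```python
def grad(w, shard):
    """Mean gradient of the squared error (w*x - y)**2 w.r.t. w over one shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

def data_parallel_step(w, data, n_workers, lr=0.05):
    """One gradient-descent step with the batch split evenly across workers."""
    size = len(data) // n_workers
    shards = [data[i * size:(i + 1) * size] for i in range(n_workers)]
    # Each worker computes a local gradient; in a real system an
    # all-reduce would average them across devices.
    g = sum(grad(w, s) for s in shards) / n_workers
    return w - lr * g

# Toy dataset generated by y = 3x, so the fitted weight should approach 3.
data = [(x, 3.0 * x) for x in [1.0, 2.0, 3.0, 4.0]]
w = 0.0
for _ in range(50):
    w = data_parallel_step(w, data, n_workers=2)
print(round(w, 3))  # → 3.0
```

Model parallelism, by contrast, would place different layers or parameter slices on different devices, with activations flowing between them; that requires partitioning the model itself rather than the batch.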
Source: Model vs. Data Parallelism – Towards AI