Apache Spark Training Chennai

Apache Spark Training Overview

Apache Spark is the next-generation successor to MapReduce. Spark is a powerful, open source processing engine for data in a Hadoop cluster, optimized for speed, ease of use, and sophisticated analytics. The Spark framework supports streaming data processing and complex, iterative algorithms, and for workloads that fit in memory it can run applications up to 100x faster than traditional Hadoop MapReduce programs.



The Apache Spark course enables participants to build complete, unified Big Data applications that combine batch, streaming, and interactive analytics on all their data. With Spark, developers can write sophisticated parallel applications for faster business decisions and better user outcomes, applied to a wide variety of use cases, architectures, and industries.



This course is best suited to developers and software engineers. Course examples and exercises are presented in Python and Scala, so knowledge of one of these programming languages is required. Basic knowledge of Linux is assumed. Prior knowledge of Hadoop is not required.


Course Outline

1. Why Spark?

  • Problems with Traditional Large-Scale Systems

  • Introducing Spark

2. Spark Basics

  • What is Apache Spark?

  • Using the Spark Shell

  • Resilient Distributed Datasets (RDDs)

  • Functional Programming with Spark

3. Working with RDDs

  • RDD Operations

  • Key-Value Pair RDDs

  • MapReduce and Pair RDD Operations

4. The Hadoop Distributed File System

  • Why HDFS?

  • HDFS Architecture

  • Using HDFS

5. Running Spark on a Cluster

  • A Spark Standalone Cluster

  • The Spark Standalone Web UI

6. Parallel Programming with Spark

  • RDD Partitions and HDFS Data Locality

  • Working with Partitions

  • Executing Parallel Operations

7. Caching and Persistence

  • RDD Lineage

  • Caching Overview

  • Distributed Persistence

8. Writing Spark Applications

  • Spark Applications vs. Spark Shell

  • Creating the SparkContext

  • Configuring Spark Properties

  • Building and Running a Spark Application

  • Logging

9. Spark, Hadoop, and the Enterprise Data Center

  • Spark and the Hadoop Ecosystem

  • Spark and MapReduce

10. Spark Streaming

  • Example: Streaming Word Count

  • Other Streaming Operations

  • Sliding Window Operations

  • Developing Spark Streaming Applications

11. Common Spark Algorithms

  • Iterative Algorithms

  • Graph Analysis

  • Machine Learning

12. Improving Spark Performance

  • Shared Variables: Broadcast Variables

  • Shared Variables: Accumulators

  • Common Performance Issues



Contact us

Mail: info@bigdatatraining.in
Call: +91 9789968765 / 044 – 42645495

Weekdays / Fast Track / Weekends / Corporate Training modes available

Apache Spark training is also available across India in Bangalore, Pune, Hyderabad, Mumbai, Kolkata, Ahmedabad, Delhi, Gurgaon, Noida, Kochi, Trivandrum, Goa, Vizag, Mysore, Coimbatore, Madurai, Trichy, and Guwahati.

On-demand, fast-track Apache Spark training is also available globally in Singapore, Dubai, Malaysia, London, San Jose, Beijing, Shenzhen, Shanghai, Ho Chi Minh City, Boston, Wuhan, San Francisco, and Chongqing.