What Do the Numbers on the Progress Bar Mean in Spark-Shell?

When you run Apache Spark jobs in the Spark shell (`spark-shell`), a progress bar is displayed in the console. It gives a visual indication of job execution status, letting you monitor your Spark job as it runs. Below is a small example that makes the bar appear, followed by an explanation of what the numbers mean.
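
For example, any action over a dataset with several partitions will show the bar while the stage runs. The data size and partition count below are arbitrary illustrative values, and `sc` is the SparkContext that spark-shell creates for you:

// 30 partitions => the stage has 30 tasks, so the total shown in the bar is 30.
sc.parallelize(1 to 10000000, 30)
  .map(_ * 2L)   // a narrow transformation, so it stays in the same stage
  .count()       // an action that triggers the job and the progress bar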

Explanation of Progress Bar Numbers

The progress bar contains multiple pieces of information, typically in the following format:


[Stage <stageId>:<progressInfo> (<numTasksCompleted> + <numTasksActive>) / <totalNumTasks>]

Here’s a breakdown of the different elements:

  • <stageId>: The ID of the stage currently being executed. Each stage corresponds to a set of tasks that run the same computation on different partitions of the data.
  • <progressInfo>: A visual bar (a run of = characters ending in >) that fills up as tasks in the stage complete.
  • <numTasksCompleted>: The number of tasks that have finished successfully in the current stage.
  • <numTasksActive>: The number of tasks currently running in the stage; tasks that have not started yet are not counted here.
  • <totalNumTasks>: The total number of tasks in the current stage, which normally equals the number of partitions processed by that stage.
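
The same counters can also be read programmatically through the status tracker. This is a minimal sketch, assuming a job is currently running and `sc` is the shell's SparkContext (for example, run it from a second thread while a long job executes):

// Print "completed + active / total" for every stage that is currently active.
sc.statusTracker.getActiveStageIds().foreach { stageId =>
  sc.statusTracker.getStageInfo(stageId).foreach { info =>
    println(s"Stage $stageId: ${info.numCompletedTasks()} + ${info.numActiveTasks()} / ${info.numTasks()}")
  }
}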

Example Progress Bar Description

Let’s illustrate this with an example. Suppose the progress bar looks like this:


[Stage 1:=====>                   (20 + 10) / 30]

The interpretation of this progress bar would be:

  • Stage 1: The ID of the currently executing stage is 1.
  • =====>: The visual bar showing how far the stage has progressed.
  • (20 + 10) / 30: Out of a total of 30 tasks, 20 have completed and 10 are currently running. Here 20 + 10 = 30, so no tasks are still waiting to start.

Understanding these elements lets you gauge the status and efficiency of your Spark job execution at a glance.
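
If you prefer not to see the bar at all, it is controlled by the `spark.ui.showConsoleProgress` configuration property, which is normally enabled in the shell. A quick way to check the current session, assuming the default SparkContext `sc`:

// Report whether the console progress bar is configured for this session
// (falls back to true, the usual shell default, if the key is not set explicitly).
// To disable it, relaunch with: spark-shell --conf spark.ui.showConsoleProgress=false
println(sc.getConf.getBoolean("spark.ui.showConsoleProgress", true))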
