How Do I Stop Info Messages from Displaying on the Spark Console?

When working with Apache Spark, the default logging level is often set to INFO, which floods the console with informational messages. These can obscure the output you actually care about, such as warnings and errors. To suppress them, you can change the logging level either programmatically or through a log4j configuration file.

Method 1: Using a Configuration File (log4j.properties)

One way to suppress INFO messages is by modifying the `log4j.properties` file. Follow these steps:

1. Create a file named `log4j.properties` if it doesn’t already exist (Spark’s `conf` directory includes a `log4j.properties.template` you can copy as a starting point).
2. Add the following lines to the file to set the logging level to “WARN”:


# Set everything to be logged to the console
log4j.rootCategory=WARN, console

# settings to quiet third-party logs that are too verbose
log4j.logger.org.apache.spark=WARN
log4j.logger.org.eclipse.jetty=WARN
log4j.logger.org.eclipse.jetty.server=WARN

log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

3. Pass the path of this `log4j.properties` file to your Spark application using the `--conf` option of `spark-submit`:


spark-submit --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=file:/path/to/log4j.properties" \
             --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:/path/to/log4j.properties" \
             your_spark_application.py
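
The properties syntax above is for Log4j 1.x, which Spark uses through version 3.2. Spark 3.3 and later moved to Log4j 2, where the file is named `log4j2.properties` (a template lives at `conf/log4j2.properties.template`) and is passed with `-Dlog4j.configurationFile=...` instead. Below is a minimal sketch of the newer syntax, modeled on Spark’s template; verify it against the template shipped with your version:


# Minimal log4j2.properties sketch for Spark 3.3+ (compare with
# conf/log4j2.properties.template in your Spark distribution)
rootLogger.level = warn
rootLogger.appenderRef.stdout.ref = console

appender.console.type = Console
appender.console.name = console
appender.console.target = SYSTEM_ERR
appender.console.layout.type = PatternLayout
appender.console.layout.pattern = %d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

Also note that the `file:/path/to/log4j.properties` given to the executors must exist at that path on every worker node. If it doesn’t, you can ship the file with `--files /path/to/log4j.properties` and reference it by bare file name (`-Dlog4j.configuration=log4j.properties`) in `spark.executor.extraJavaOptions`.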

Method 2: Programmatically Using PySpark

If you prefer to change the logging level programmatically within your Spark application, call `setLogLevel` on the SparkContext. (Adjusting only the Python `py4j` logger, a common suggestion, quiets Python-side gateway chatter but does not silence Spark’s JVM-side INFO output.) Below is an example using PySpark:


from pyspark.sql import SparkSession
import logging

# Create a Spark session
spark = SparkSession.builder \
    .appName("Suppress INFO messages") \
    .getOrCreate()

# Suppress INFO messages from Spark's JVM-side loggers
spark.sparkContext.setLogLevel("WARN")

# Optionally quiet the Python-side py4j gateway logger as well
logging.getLogger("py4j").setLevel(logging.WARNING)

# Your Spark code here
data = [("John", 28), ("Smith", 35), ("Adam", 50)]
columns = ["Name", "Age"]
df = spark.createDataFrame(data, columns)
df.show()

# Stop the Spark session
spark.stop()
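
`setLogLevel` is the documented SparkContext method for this and overrides the levels set in any log4j configuration. Per the Spark docs it accepts ALL, DEBUG, ERROR, FATAL, INFO, OFF, TRACE, and WARN, so you can tighten the filter further if warnings are also noise for you:


# Show only ERROR and FATAL messages; INFO and WARN output is hidden
spark.sparkContext.setLogLevel("ERROR")

One limitation: because the call only runs after `getOrCreate()` returns, the INFO lines printed while the session itself starts up still appear; use the configuration file from Method 1 to silence those as well.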

Method 3: Programmatically Using Scala

Similarly, you can change the logging level in Scala via the Log4j API. Setting the levels before the session is built also suppresses the INFO messages printed during startup:


import org.apache.spark.sql.SparkSession
import org.apache.log4j.{Level, Logger}

// Suppress INFO messages; doing this before the session is built
// also quiets the startup chatter
Logger.getLogger("org").setLevel(Level.WARN)
Logger.getLogger("akka").setLevel(Level.WARN)

// Initialize Spark session
val spark = SparkSession.builder()
                        .appName("Suppress INFO messages")
                        .getOrCreate()

// Your Spark code here
val data = Seq(("John", 28), ("Smith", 35), ("Adam", 50))
val columns = Seq("Name", "Age")
val df = spark.createDataFrame(data).toDF(columns: _*)
df.show()

// Stop the Spark session
spark.stop()
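
The `spark.sparkContext.setLogLevel("WARN")` call shown in the PySpark example works the same way in Scala, so you can use it instead of (or alongside) the Log4j calls above; the Log4j approach has the advantage of taking effect before the session is created.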

Output of the Example Code


+-----+---+
| Name|Age|
+-----+---+
| John| 28|
|Smith| 35|
| Adam| 50|
+-----+---+

By following any of these methods, you can stop INFO messages from cluttering the Spark console, allowing you to focus on more critical log output.
