When working with Apache Spark, the default logging level is often set to “INFO,” which floods the console with informational messages that can drown out more important output such as warnings and errors. To suppress these INFO messages, you can change the logging level either programmatically or by modifying the log4j properties file.
Method 1: Using a Configuration File (log4j.properties)
One way to suppress INFO messages is by modifying the `log4j.properties` file (if you are on Spark 3.3 or later, which uses Log4j 2, see the note after these steps). Follow these steps:
1. Create a file named `log4j.properties` if it doesn’t already exist.
2. Add the following lines to the file to set the logging level to “WARN”:
# Set everything to be logged to the console
log4j.rootCategory=WARN, console
# settings to quiet third-party logs that are too verbose
log4j.logger.org.apache.spark=WARN
log4j.logger.org.eclipse.jetty=WARN
log4j.logger.org.eclipse.jetty.server=WARN
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
3. Pass the path of this `log4j.properties` file to your Spark application using the `--conf` option on the command line:
spark-submit --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=file:/path/to/log4j.properties" \
--conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:/path/to/log4j.properties" \
your_spark_application.py
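Note for Spark 3.3 and later: these versions ship with Log4j 2 and expect a `log4j2.properties` file in Log4j 2's own properties syntax rather than the file above. The following is a rough, equivalent sketch of the same configuration; the logger ids (`spark`, `jetty`) are arbitrary names chosen for this example:
# Root logger prints WARN and above to the console
rootLogger.level = warn
rootLogger.appenderRef.stdout.ref = console
appender.console.type = Console
appender.console.name = console
appender.console.target = SYSTEM_ERR
appender.console.layout.type = PatternLayout
appender.console.layout.pattern = %d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
# Quiet particularly verbose third-party packages
logger.spark.name = org.apache.spark
logger.spark.level = warn
logger.jetty.name = org.eclipse.jetty
logger.jetty.level = warn
With Log4j 2 the JVM option changes as well: pass `-Dlog4j2.configurationFile=file:/path/to/log4j2.properties` instead of `-Dlog4j.configuration=...`. In cluster deployments you may also need to ship the file to the executors, for example with spark-submit's `--files` option, so that the executor-side option can resolve it.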
Method 2: Programmatically Using PySpark
If you prefer to change the logging level programmatically within your Spark application, you can achieve this by configuring the logging level directly in the code. Below is an example using PySpark:
from pyspark.sql import SparkSession
import logging
# Create a Spark session
spark = SparkSession.builder \
    .appName("Suppress INFO messages") \
    .getOrCreate()
# Suppress INFO messages from Spark's JVM-side log4j logger
spark.sparkContext.setLogLevel("WARN")
# Quiet the py4j gateway logger on the Python side as well
logging.getLogger("py4j").setLevel(logging.WARNING)
# Your Spark code here
data = [("John", 28), ("Smith", 35), ("Adam", 50)]
columns = ["Name", "Age"]
df = spark.createDataFrame(data, columns)
df.show()
# Stopping the Spark session
spark.stop()
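The level passed to `setLogLevel` is not limited to “WARN”; valid values include ALL, DEBUG, ERROR, FATAL, INFO, OFF, TRACE, and WARN, and the level can be changed again at any point in the job. A minimal sketch, reusing the `spark` session and `df` DataFrame from the example above:
# Temporarily raise the threshold around a particularly noisy action,
# then drop back to WARN afterwards
spark.sparkContext.setLogLevel("ERROR")
df.count()  # only ERROR-and-above messages are printed while this runs
spark.sparkContext.setLogLevel("WARN")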
Method 3: Programmatically Using Scala
Similarly, you can change the logging level in Scala as follows:
import org.apache.spark.sql.SparkSession
import org.apache.log4j.{Level, Logger}
// Initialize Spark session
val spark = SparkSession.builder()
  .appName("Suppress INFO messages")
  .getOrCreate()
// Suppress INFO messages
Logger.getLogger("org").setLevel(Level.WARN)
Logger.getLogger("akka").setLevel(Level.WARN)
// Your Spark code here
val data = Seq(("John", 28), ("Smith", 35), ("Adam", 50))
val columns = Seq("Name", "Age")
val df = spark.createDataFrame(data).toDF(columns: _*)
df.show()
// Stopping the Spark session
spark.stop()
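As in PySpark, the per-package log4j calls can be replaced with a single call on the SparkContext once the session exists. A minimal Scala sketch, reusing the `spark` session from the example above:
// Set the log level for the whole application in one call
// (valid values include ALL, DEBUG, ERROR, FATAL, INFO, OFF, TRACE, WARN)
spark.sparkContext.setLogLevel("WARN")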
Output of the Example Code
+-----+---+
| Name|Age|
+-----+---+
| John| 28|
|Smith| 35|
| Adam| 50|
+-----+---+
By following any of these methods, you can suppress INFO messages in the Spark console and focus on the more critical warnings and errors.