To display the full content of a column in a Spark DataFrame, you usually need to override the default truncation behavior: `show` cuts off string values longer than 20 characters. Below is how you can achieve this in PySpark and Scala.
Method 1: Using `show` Method with `truncate` Parameter
The simplest way to display full column content is to use the `show` method with the `truncate` parameter set to `False`.
Example in PySpark
```python
from pyspark.sql import SparkSession

# Initialize SparkSession
spark = SparkSession.builder.appName("Display Full Column").getOrCreate()

# Sample data
data = [("Alice", "Engineering and Science"), ("Bob", "Arts and Humanities")]
columns = ["Name", "Department"]

# Create DataFrame
df = spark.createDataFrame(data, columns)

# Show DataFrame with full content
df.show(truncate=False)
```
Expected Output:
```
+-----+-----------------------+
|Name |Department             |
+-----+-----------------------+
|Alice|Engineering and Science|
|Bob  |Arts and Humanities    |
+-----+-----------------------+
```
Example in Scala
```scala
import org.apache.spark.sql.SparkSession

// Initialize SparkSession
val spark = SparkSession.builder.appName("Display Full Column").getOrCreate()

// Sample data
val data = Seq(("Alice", "Engineering and Science"), ("Bob", "Arts and Humanities"))
val columns = Seq("Name", "Department")

// Create DataFrame
val df = spark.createDataFrame(data).toDF(columns: _*)

// Show DataFrame with full content
df.show(truncate = false)
```
Expected Output:
```
+-----+-----------------------+
|Name |Department             |
+-----+-----------------------+
|Alice|Engineering and Science|
|Bob  |Arts and Humanities    |
+-----+-----------------------+
```
Method 2: Using `toPandas` Method in PySpark
If you are working with PySpark, another method to display the full content of columns is by converting the DataFrame to a Pandas DataFrame using the `toPandas()` method.
Example in PySpark
```python
# Create DataFrame (reusing spark, data, and columns from above)
df = spark.createDataFrame(data, columns)

# Convert to a Pandas DataFrame (pulls all rows into driver memory,
# so use this only for results small enough to collect)
pdf = df.toPandas()

# Display DataFrame
print(pdf)
```
Expected Output:
```
    Name               Department
0  Alice  Engineering and Science
1    Bob      Arts and Humanities
```
Method 3: Setting Spark Configuration
A common related setting is `spark.sql.debug.maxToStringFields`. Note that it does not change how `show` truncates cell values; it controls how many fields Spark includes when rendering schemas and query plans as strings, and raising it silences the "Truncated the string representation of a plan" warning on very wide DataFrames. You still need `truncate=False` for full cell content, as the examples below do.
Example in PySpark
```python
# Modify Spark configuration
spark.conf.set("spark.sql.debug.maxToStringFields", "100")

# Create DataFrame
df = spark.createDataFrame(data, columns)

# Display DataFrame (truncate=False is still what disables cell truncation)
df.show(truncate=False)
```
Example in Scala
```scala
// Modify Spark configuration
spark.conf.set("spark.sql.debug.maxToStringFields", "100")

// Create DataFrame
val df = spark.createDataFrame(data).toDF(columns: _*)

// Display DataFrame
df.show(truncate = false)
```
These are the various methods to display full column content in a Spark DataFrame, each suited to different scenarios and use cases.