I'm trying to migrate from Spark 1.6.1 to Spark 2.0.0 and I am getting a weird error when trying to read a csv file into SparkSQL. Previously, when I would read a file from local disk in pyspark I would do:
Spark 1.6
df = sqlContext.read \
    .format('com.databricks.spark.csv') \
    .option('header', 'true') \
    .load('file:///C:/path/to/my/file.csv', schema=mySchema)
In the latest release I think it should look like this:
Spark 2.0
spark = SparkSession.builder \
    .master('local[*]') \
    .appName('My App') \
    .getOrCreate()

df = spark.read \
    .format('csv') \
    .option('header', 'true') \
    .load('file:///C:/path/to/my/file.csv', schema=mySchema)
But I am getting this error no matter how many different ways I try to adjust the path:
IllegalArgumentException: 'java.net.URISyntaxException: Relative path in
absolute URI: file:/C:/path//to/my/file/spark-warehouse'
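Worth noting: the error mentions the spark-warehouse directory, not the CSV path itself, so Spark seems to be choking on the default warehouse location it derives from the working directory on Windows. A commonly suggested workaround (assuming that is the cause; the warehouse path below is just an illustrative placeholder) is to set spark.sql.warehouse.dir to an explicit file URI when building the session:

```python
from pyspark.sql import SparkSession

# Hypothetical workaround: give Spark an explicit, well-formed file URI for
# the warehouse directory so it does not construct a malformed default one
# from the Windows working directory. The path used here is a placeholder.
spark = SparkSession.builder \
    .master('local[*]') \
    .appName('My App') \
    .config('spark.sql.warehouse.dir', 'file:///C:/tmp/spark-warehouse') \
    .getOrCreate()

# Reading the CSV should then work as before (mySchema is the schema
# object from the original code above).
df = spark.read \
    .format('csv') \
    .option('header', 'true') \
    .load('file:///C:/path/to/my/file.csv', schema=mySchema)
```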
Not sure if this is just an issue with Windows or if there is something I am missing. I was excited that the spark-csv package is now part of Spark out of the box, but I can't seem to get it to read any of my local files anymore. Any ideas?