Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

scala - How can I select a column depending on another column's content or on the column's name?

I have the following three dataframes:

val df_1 = spark.sparkContext.parallelize(Seq(
  ("FIDs", "123456")
)).toDF("subComponentName", "FID_HardVer")

val df_2 = spark.sparkContext.parallelize(Seq(
  ("CLDs", "123456")
)).toDF("subComponentName", "CLD_HardVer")

val df_3 = spark.sparkContext.parallelize(Seq(
  ("ANYs", "123456")
)).toDF("subComponentName", "ANY_HardVer")

I want to write a function that returns a dataframe with an added column named HardVer, holding the content of whichever of FID_HardVer, CLD_HardVer, or ANY_HardVer is present.

Example output would look like this:

df_1:

+----------------+-----------+-------+
|subComponentName|FID_HardVer|HardVer|
+----------------+-----------+-------+
|            FIDs|     123456| 123456|
+----------------+-----------+-------+

df_2:

+----------------+-----------+-------+
|subComponentName|CLD_HardVer|HardVer|
+----------------+-----------+-------+
|            CLDs|     123456| 123456|
+----------------+-----------+-------+

This is the code I have tried so far, but it seems Spark can't handle this type of request: it validates every referenced column, even in branches whose condition does not match.

def addHardVer(spark: SparkSession, df: DataFrame): DataFrame = {
  import spark.implicits._
  df.withColumn("HardVer",
    when($"subComponentName" === "FIDs", $"FID_HardVer")
      .when($"subComponentName" === "CLDs", $"CLD_HardVer")
      .when($"subComponentName" === "ANYs", $"ANY_HardVer")
      .otherwise(lit("unknown")))
}

This throws an exception:

Exception in thread "main" org.apache.spark.sql.AnalysisException: cannot resolve 'CLD_HardVer' given input columns: [subComponentName, FID_HardVer];;

question from:https://stackoverflow.com/questions/66050330/how-can-i-select-a-column-dependent-of-a-different-columns-content-or-the-name-o

Battle the dragon too long and you become the dragon yourself; gaze into the abyss too long and the abyss will gaze back…

1 Answer

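How about adding a check of whether the column exists in the dataframe before referencing it? The if/else is ordinary Scala that runs while the expression is being built, so a column missing from df is never handed to Spark's analyzer.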

import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{lit, when}

def addHardVer(spark: SparkSession, df: DataFrame): DataFrame = {
  import spark.implicits._
  // Reference each *_HardVer column only if df actually has it.
  df.withColumn("HardVer",
    when($"subComponentName" === "FIDs", if (df.columns.contains("FID_HardVer")) $"FID_HardVer" else lit("unknown"))
      .when($"subComponentName" === "CLDs", if (df.columns.contains("CLD_HardVer")) $"CLD_HardVer" else lit("unknown"))
      .when($"subComponentName" === "ANYs", if (df.columns.contains("ANY_HardVer")) $"ANY_HardVer" else lit("unknown"))
      .otherwise(lit("unknown")))
}
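
Since the title also asks about selecting by the name of the column, here is a minimal sketch of that variant. It is not part of the original answer: the names HardVerByName, hardVerColumn and addHardVerByName are made up for illustration, and it assumes each dataframe carries at most one column whose name ends in "_HardVer". Unlike the version above, it ignores the content of subComponentName and decides purely from df.columns.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, lit}

object HardVerByName {
  // Hypothetical helper: name of the first column ending in "_HardVer", if any.
  def hardVerColumn(columns: Seq[String]): Option[String] =
    columns.find(_.endsWith("_HardVer"))

  // Copies that column into "HardVer"; falls back to the literal "unknown"
  // when the dataframe has no such column. Assumes plain column names
  // (no dots or other characters that would need backtick quoting).
  def addHardVerByName(df: DataFrame): DataFrame = {
    val source = hardVerColumn(df.columns.toSeq)
      .map(name => col(name))
      .getOrElse(lit("unknown"))
    df.withColumn("HardVer", source)
  }
}
```

Calling HardVerByName.addHardVerByName(df_2) would then copy CLD_HardVer into HardVer without the function having to list every possible prefix. The trade-off is that a row whose subComponentName does not match the column prefix still gets a value.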
