How to convert yyyyMMdd to yyyy-MM-dd format with an appropriate schema in Spark

programming beginner girl

I have a DataFrame with a date column in the format yyyyMMdd. How can I convert it to yyyy-MM-dd with an appropriate schema in Spark?

This post answers the question above.

Converting yyyyMMdd to yyyy-MM-dd format with an appropriate schema in Spark

1. Prepare the DataFrame

This is a simple DataFrame with a single Date column of String type, holding values in yyyyMMdd format.

I used the following methods.

  • org.apache.spark.sql Dataset
    • toDF
      Converts this strongly typed collection of data to generic DataFrame with columns renamed.
    • show
      Displays the Dataset in a tabular form.
    • printSchema
      Prints the schema to the console in a nice tree format.
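
The preparation step can be sketched as follows. This is a minimal example, not the exact code from the original post: the sample values, the application name, and the local master setting are assumptions chosen for illustration. In spark-shell, a `spark` session already exists, so the builder lines can be skipped there.

```scala
import org.apache.spark.sql.SparkSession

// Assumed local session for illustration; spark-shell already provides `spark`.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("date-format-example")
  .getOrCreate()
import spark.implicits._

// Hypothetical sample data: dates stored as yyyyMMdd strings.
val df = Seq("20200101", "20201231").toDF("Date")

df.show()        // prints the rows as a table
df.printSchema() // the Date column is reported as a string at this point
```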

2. Convert yyyyMMdd to yyyy-MM-dd format

I used the following methods.

  • org.apache.spark.sql Dataset
    • withColumn
      Returns a new Dataset by adding a column or replacing the existing column that has the same name.
  • org.apache.spark.sql Column
    • cast
      Casts the column to a different data type, using the canonical string representation of the type.
  • org.apache.spark.sql functions
    • concat
      Concatenates multiple input columns together into a single column.
    • split
      Splits str around pattern (pattern is a regular expression).
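
One way to combine these functions is sketched below. It is an assumption about how the pieces fit together, not necessarily the code from the original post: the column is cast to a string (in case it was loaded as an integer), split on an empty pattern into single characters, and the first eight characters are concatenated again with "-" separators. The sample data is hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, concat, lit, split}

// Assumed local session for illustration; spark-shell already provides `spark`.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("date-format-example")
  .getOrCreate()
import spark.implicits._

// Hypothetical sample data: dates stored as yyyyMMdd strings.
val df = Seq("20200101", "20201231").toDF("Date")

// Split the string on an empty pattern so each character becomes an array element.
val chars = split(col("Date").cast("string"), "")

// Rebuild the value as yyyy-MM-dd; the column is still a string after this step.
val formatted = df.withColumn("Date", concat(
  chars(0), chars(1), chars(2), chars(3), lit("-"),
  chars(4), chars(5), lit("-"),
  chars(6), chars(7)))

formatted.show()
```

Reusing the name "Date" in `withColumn` replaces the existing column rather than adding a second one.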

3. Convert the column from String type into Date type

I used the to_date method from org.apache.spark.sql functions, which converts the column into a DateType column.
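
A minimal sketch of this last step, assuming the column already holds yyyy-MM-dd strings (the sample data is hypothetical). The two-argument form of `to_date`, which takes an explicit pattern, is available from Spark 2.2. The last lines show a possible shortcut that parses the original yyyyMMdd strings directly; it is offered as an alternative and is not described in the post itself.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, to_date}

// Assumed local session for illustration; spark-shell already provides `spark`.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("date-format-example")
  .getOrCreate()
import spark.implicits._

// Hypothetical input: strings already reformatted to yyyy-MM-dd.
val formatted = Seq("2020-01-01", "2020-12-31").toDF("Date")

// Parse the strings into a DateType column.
val typed = formatted.withColumn("Date", to_date(col("Date"), "yyyy-MM-dd"))
typed.printSchema() // the Date column is now reported as a date

// Alternative: parse the raw yyyyMMdd strings in a single step.
val direct = Seq("20200101", "20201231").toDF("Date")
  .withColumn("Date", to_date(col("Date"), "yyyyMMdd"))
direct.printSchema()
```

A DateType column is displayed as yyyy-MM-dd by `show`, so the shortcut gives the requested format and schema without the intermediate string manipulation.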

That’s all. Thank you.

toge

If you are new to Spark, I recommend O'Reilly Safari online learning.

