A couple of things I learned recently:

1: If you carry out a SELECT column1, column2… INTO dbo.NewTable, the newly created table won’t have any of the indexes, foreign keys, primary keys etc. of the source/original table, BUT it will retain any IDENTITY specification on its columns. The source/original table I was looking at had an INTEGER IDENTITY column as the surrogate primary key, and this was brought over to NewTable.
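A minimal sketch of the behaviour (the table and column names here are made up for illustration):

```sql
-- dbo.SourceTable has an INT IDENTITY column as its surrogate primary key
SELECT Id, ProductName
INTO dbo.NewTable
FROM dbo.SourceTable;

-- NewTable gets none of the source's indexes, keys or constraints,
-- but Id is still an IDENTITY column:
SELECT COLUMNPROPERTY(OBJECT_ID('dbo.NewTable'), 'Id', 'IsIdentity');  -- returns 1
```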

2: By default BCP ignores the values supplied for any column where IDENTITY is specified and generates new ones instead. So when I was attempting to BCP data into NewTable all the columns came across as per the source apart from the IDENTITY column, which was repopulated from a seed of 1 with an increment of 1. Once I removed the IDENTITY specification the BCP worked out OK.
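As an alternative to removing the IDENTITY specification, bcp has a -E switch that tells it to keep the identity values from the data file rather than generate new ones. A sketch, with made-up server, database and file names:

```shell
# Load NewTable, keeping the identity values present in the data file.
# -E = keep identity values; -T = trusted connection; -c = character mode
bcp dbo.NewTable in newtable.dat -S MyServer -d MyDatabase -T -c -E
```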

Spark SQL filter equal values


When filtering a DataFrame in Scala you need to use a triple equals (i.e. ===) for the equality comparison:

So you must use:

ss.filter(ss("ProductKey") === 68325).show()

Because using

ss.filter(ss("ProductKey") == 68325).show()

will return the following error:

<console>:25: error: overloaded method value filter with alternatives:
  (conditionExpr: String)org.apache.spark.sql.DataFrame <and>
  (condition: org.apache.spark.sql.Column)org.apache.spark.sql.DataFrame
 cannot be applied to (Boolean)
              ss.filter(ss("ProductKey") == 68325).show()
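Note the first overload listed in that error: filter also accepts a SQL-style string expression, so this form works as well:

```scala
ss.filter("ProductKey = 68325").show()
```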


ss.filter(ss("ProductKey") = 68325).show()

will return the following error, because Scala treats ss("ProductKey") = 68325 as a call to an update method, and you cannot update a DataFrame:

<console>:25: error: value update is not a member of org.apache.spark.sql.DataFrame
              ss.filter(ss("StopSaleOnPropertyKey") = 68325).show()