Interactive programming (IAP for short) is an upgraded version of the REPL that satisfies the Immediate Feedback Principle.
Synonyms: on-the-fly programming, just-in-time programming, conversational programming.
IAP means that while the programmer works on a source file, every time the file is saved the toolchain provides on-the-fly feedback that helps the ongoing development.
The IAP of each language is implemented by a specific toolchain (mostly written in the same language): for example, ghcid for Haskell, RStudio for R, Spyder for Python, and sbt for Scala.
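For example, sbt gives this kind of feedback out of the box: prefixing a task with ~ makes it re-run on every file save. A minimal illustration (assuming any existing sbt project; the prompt shows your own project's name):
$ sbt
sbt:your-project> ~compile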
Unit Test with FunSuite
Based on ScalaTest's FunSuite and spark-testing-base, we can write a unit test for Spark with a SparkContext:
$ take sparkdemo
$ mkdir -p src/{main/{resources,scala},test/{resources,scala}}
$ cat << EOF > src/test/scala/Test1.scala
import org.scalatest.FunSuite
import com.holdenkarau.spark.testing.SharedSparkContext
class Test1 extends FunSuite with SharedSparkContext {
  test("test with SparkContext") {
    val list = List(1, 2, 3, 4)
    val rdd = sc.parallelize(list)
    assert(rdd.count === list.length)
  }
}
EOF
$ cat << EOF > build.sbt
import scala.sys.process._
name := "Simple Project"
version := "0.0.1"
scalaVersion := "2.11.8"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.3.1"
libraryDependencies += "com.holdenkarau" %% "spark-testing-base" % "2.3.1_0.10.0" % "test"
fork in Test := true
parallelExecution in Test := false
javaOptions ++= Seq("-Xms512M", "-Xmx2048M", "-XX:MaxPermSize=2048M", "-XX:+CMSClassUnloadingEnabled")
// custom task: submit the packaged jar to Spark via spark-submit
lazy val submit = taskKey[Unit]("submit to Spark")
submit := { "spark-submit --class SimpleApp target/scala-2.11/simple-project_2.11-0.0.1.jar" ! }
// rs ("run spark") chains clean, package and submit
addCommandAlias("rs", "; clean; package; submit")
EOF
$ sbt
sbt:Simple Project> ~ test
spark-testing-base provides SharedSparkContext, so there is no need to initialize and destroy the SparkContext manually.
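For comparison, here is a minimal sketch of what the manual version would look like with ScalaTest's BeforeAndAfterAll (the class name and the master/app-name settings are illustrative assumptions):
import org.apache.spark.{SparkConf, SparkContext}
import org.scalatest.{BeforeAndAfterAll, FunSuite}
class ManualContextTest extends FunSuite with BeforeAndAfterAll {
  private var sc: SparkContext = _
  override def beforeAll(): Unit = {
    // create a local SparkContext before any test runs
    sc = new SparkContext(new SparkConf().setMaster("local[*]").setAppName("manual-test"))
  }
  override def afterAll(): Unit = {
    // stop it afterwards so the JVM can shut down cleanly
    if (sc != null) sc.stop()
  }
  test("test with a manually managed SparkContext") {
    val list = List(1, 2, 3, 4)
    assert(sc.parallelize(list).count === list.length)
  }
}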
spark-testing-base also has many other useful traits and generators, such as DataFrameGenerator. With it and ScalaCheck, you can implement property-based tests, like QuickCheck in Haskell.
See its wiki for the full list.
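As a minimal sketch of the idea, here is a property-based test written with plain ScalaCheck generators against the shared SparkContext (it does not use DataFrameGenerator; see the wiki for that API):
import org.scalacheck.Gen
import org.scalacheck.Prop.forAll
import org.scalatest.FunSuite
import org.scalatest.prop.Checkers
import com.holdenkarau.spark.testing.SharedSparkContext
class PropertyTest extends FunSuite with SharedSparkContext with Checkers {
  test("parallelize preserves the number of elements") {
    // for any generated list of ints, the RDD built from it has the same count
    val property = forAll(Gen.listOf(Gen.choose(-1000, 1000))) { xs: List[Int] =>
      sc.parallelize(xs).count() == xs.length.toLong
    }
    check(property)
  }
}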
Integration Test
You can run a Spark application line by line in the spark-shell or pyspark shell.
But the following implementation is file based, which means that in each iteration of the loop, a complete Spark job (a jar file) is submitted to the Spark cluster and returns a result.
With the scaffold above, add a source file and run the integration test with the user-defined command rs (run spark):
$ cat << 'EOF' > src/main/scala/SimpleApp.scala
import org.apache.spark.sql.SparkSession
object SimpleApp {
  def main(args: Array[String]): Unit = {
    val logFile = "/home/leo/apps/spark-2.2.0-bin-hadoop2.7/README.md"
    val spark = SparkSession.builder.appName("Simple Application").getOrCreate()
    val logData = spark.read.textFile(logFile).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println(s"Lines with a: $numAs, Lines with b: $numBs")
    spark.stop()
  }
}
EOF
$ sbt
sbt:Simple Project> ~ rs
Now change some text in SimpleApp.scala and save it; sbt will clean the build, regenerate the jar file, and submit it to Spark.
P.S.: if the contents of build.sbt are changed, run reload in the sbt console.