Interactive Mode
Follow Quick Start of Spark 1.0.2 on Apache Spark website.
# hadoop fs -put alarm_data_for_explore-0501-0505.txt alarm_data_for_explore-0501-0505.txt
# hadoop fs -ls
...
-rw-r--r-- 3 root root 260780698 2014-08-15 16:45 alarm_data_for_explore-0501-0505.txt
# wc -l alarm_data_for_explore-0501-0505.txt
1362005
# head -1 alarm_data_for_explore-0501-0505.txt
-2117657102|102|1000012276|License...|3|4|102|
# grep 007-002-00-000592 alarm_data_for_explore-0501-0505.txt | wc -l
12
# mv alarm_data_for_explore-0501-0505.txt aaa.txt // this local file is unnecessary any more
# spark-shell
scala> val textFile = sc.textFile("alarm_data_for_explore-0501-0505.txt")
textFile: org.apache.spark.rdd.RDD[String] = MappedRDD[1] at textFile at <console>:12
scala> textFile.count()
...
res5: Long = 1362005
scala> textFile.first()
...
res8: String = -2117657102|102|1000012276|License...|3|4|102|
scala> textFile.filter(line => line.contains("007-002-00-000592")).count()
...
res12: Long = 12
scala> import java.lang.Math
import java.lang.Math
scala> textFile.map(line => line.split("|").size).reduce((a, b) => Math.max(a, b))
...
res13: Int = 372
Run Spark Script
$ cat wfp-spark
val MIN_SUP = 0.0003
val MIN_CONF = 0
val MAX_RELATION_ORDER = 3
val DATA_FILE = "input"
val textFile = sc.textFile(DATA_FILE)
val weight = textFile.map(x => x -> x.split(",")(5).toFloat).cache
val data = textFile.groupBy(x => x.split(",")(0))
$ spark-shell -i wfp-spark
...
Now you are in a spark shell, all variables such as MIN_SUP, MIN_CONF are accessible.
Batch Mode
Follow Running Spark Applications in CDH 5 Installation Guide.
# locate spark-defaults.conf // find out where is $SPARK_HOME
/etc/spark/conf.cloudera.spark/spark-defaults.conf
...
# ll /etc/spark/conf.cloudera.spark
...
-rw-r--r-- 1 root root 883 Aug 14 10:12 spark-env.sh
# cat /etc/spark/conf.cloudera.spark/spark-env.sh
...
export SPARK_HOME=/opt/cloudera/parcels/CDH-5.1.0-1.cdh5.1.0.p0.53/lib/spark
...
# source /etc/spark/conf.cloudera.spark/spark-env.sh // load $SPARK_HOME, etc
# spark-submit --class org.apache.spark.examples.SparkPi --deploy-mode client --master yarn $SPARK_HOME/examples/lib/spark-examples_2.10-1.0.0-cdh5.1.0.jar 10