DarkMatter in Cyberspace
  • Home
  • Categories
  • Tags
  • Archives

Spark任务自动执行脚本


当前目录下创建两个脚本,运行脚本runJobs.sh和WFP任务脚本模板wfp-origin:

runJobs:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
#!/bin/bash

BMIN_CNT_WEI_LIST='0.5 0.3'
MIN_SUP_LIST='0.004 0.0002'

rm -rf {result,script}
mkdir {result,script}
origin_job=wfp-origin
for bmin_cnt_wei in $BMIN_CNT_WEI_LIST; do
    for min_sup in $MIN_SUP_LIST; do
        echo MIN_SUP is: $min_sup , Bmin count weight is: $bmin_cnt_wei
        new_script_name=wfp-bmin-${bmin_cnt_wei}-sup-${min_sup}
        sed "4s/xxx/$min_sup/" $origin_job > script/tmp
        sed "11s/xxx/$bmin_cnt_wei/" script/tmp > script/$new_script_name
        rm script/tmp
        echo Run shell script/$new_script_name
        spark-shell -i script/$new_script_name
        echo ------Calc is over------
    done
done

wfp-origin:

import Math.ceil
import scala.io

val MIN_SUP = xxx
val MIN_CONF = 0.1
val MAX_RELATION_ORDER = 3
val DATA_FILE = "input"
val SEP = "\001"
val WEI_IDX = 5
val MIN_INT_ID_LEN = 5
val BMIN_CNT_WEI = xxx

val RES_FILE = "result/wfp-result-bmin-" + BMIN_CNT_WEI.toString + "-sup-" + MIN_SUP.toString

val rawData = sc.textFile(DATA_FILE).distinct
val data = rawData.filter(x => x.split(SEP)(1).split("_")(0).size > MIN_INT_ID_LEN)   
val item_count = data.map(_.split(SEP)(1)).map(w => (w,1)).reduceByKey(_+_)
val wids = data.map(_.split(SEP)(0))
val MAX_ITEM = wids.map(w => (w,1)).reduceByKey(_+_).map(_._2).max
val T = wids.distinct.count

val weight = data.map(x => (x.split(SEP)(1), x.split(SEP)(WEI_IDX).toFloat)).distinct
val rule_sets = weight.top(2000)
scala.tools.nsc.io.File(RES_FILE).writeAll(rule_sets.map(_.toString).reduce(_ + "\n" + _))
exit

这里的wfp-orgin只是一个示例,根据自己的脚本调整具体内容,但最后一行"exit"必须有,否则不能退出spark shell执行后面的任务。

需要修改的参数用空格分开定义在一个字符串里,例如上面的BMIN_CNT_WEI_LIST和MIN_SUP_LIST, 脚本创建script目录,保存新生成的脚本,以及result目录,保存计算结果。

配合tmux,可以在tmux上运行runJobs.sh,下班时从tmux上detach出来,第二天上班时在attach上去看结果。



Published

Nov 3, 2014

Last Updated

Nov 3, 2014

Category

Tech

Tags

  • shell 46
  • spark 21

Contact

  • Powered by Pelican. Theme: Elegant by Talha Mansoor