DarkMatter in Cyberspace
  • Home
  • Categories
  • Tags
  • Archives

用aggregate方法处理MongoDB数据


下面的aggregate方法对Fair0606 collection做如下处理:

用$match(类似于filter)筛选出recurrence长度不小于2的document (fair).

$unwind类似于flatmap, 将每一个recurrence数组中的元素变成一个document;

[\(project](http://bit.ly/1UE0BTQ)类似于map,去掉原document的_id, 保留nameZHCN, website, 创建新字段time,值为recurrence.timeStart, 以及address,值为recurrence.address (路径的写法是路径名前面加\),详见Field Path and System Variables in Aggregation Pipeline Quick Reference).

如果希望保留recurrence.timeStart的结构,写成"recurrence.timeStart": 1,.

$limit要求只转换20个document, $out指定输出collection.

db.Fair0606.aggregate([
  {$match: {'recurrence.1': {$exists: true}}},
  {$unwind: "$recurrence"},
  {$project: {
    _id: 0,
    nameZHCN: 1,
    website: 1,
    time: "$recurrence.timeStart",
    address: "$recurrence.address"}},
  {$limit: 20},
  {$out: "out1"}])

通过unwind和project,可以实现对MongoDB schema的转换,下一步可以导出为csv 进行更复杂的数据分析,例如使用Incanter.

mongoexport --type=csv -d test -c out1 \
--fields nameZHCN,website,time,address -o fairs.csv

参考:

https://docs.mongodb.com/manual/reference/operator/aggregation/

Clojure for Data Science



Published

Jun 8, 2016

Last Updated

Jun 8, 2016

Category

Tech

Tags

  • aggregate 1
  • mongodb 24

Contact

  • Powered by Pelican. Theme: Elegant by Talha Mansoor