下面的aggregate方法对Fair0606 collection做如下处理:
用$match
(类似于filter)筛选出recurrence长度不小于2的document (fair).
$unwind类似于flatmap, 将每一个recurrence数组中的元素变成一个document;
[\(project](http://bit.ly/1UE0BTQ)类似于map,去掉原document的_id
,
保留nameZHCN, website,
创建新字段time,值为recurrence.timeStart,
以及address,值为recurrence.address
(路径的写法是路径名前面加\),详见Field Path and System Variables in
Aggregation Pipeline Quick Reference).
如果希望保留recurrence.timeStart的结构,写成"recurrence.timeStart": 1,
.
$limit
要求只转换20个document, $out
指定输出collection.
db.Fair0606.aggregate([
{$match: {'recurrence.1': {$exists: true}}},
{$unwind: "$recurrence"},
{$project: {
_id: 0,
nameZHCN: 1,
website: 1,
time: "$recurrence.timeStart",
address: "$recurrence.address"}},
{$limit: 20},
{$out: "out1"}])
通过unwind和project,可以实现对MongoDB schema的转换,下一步可以导出为csv 进行更复杂的数据分析,例如使用Incanter.
mongoexport --type=csv -d test -c out1 \
--fields nameZHCN,website,time,address -o fairs.csv
参考:
https://docs.mongodb.com/manual/reference/operator/aggregation/