When I synchronize data from a mongodb to a elasticsearch server with
mongo-connector -m 192.168.100.3:27017 -t http://192.168.100.24:9200
-d elastic_doc_manager --admin-username root --password xxx
,
the mongo-connector process quit with the following logs in
"mongo-connector.log" (all texts behind "BulkIndexError" are in the same line,
I rearrange the line for easy to understand):
2016-01-09 19:05:52,457 [CRITICAL] mongo_connector.oplog_manager:543 - Exception during collection dump
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/mongo_connector/oplog_manager.py", line 495, in do_dump upsert_all(dm)
...
BulkIndexError: (u'10 document(s) failed to index.',
[
{
u'index': {
u'status': 400,
u'_type': u'Fair',
u'_id': u'4zWhZTJnqPCd2RK93',
u'error': {
u'caused_by': {
u'reason': u'unknown property [latitude]',
u'type': u'illegal_argument_exception'},
u'reason': u'failed to parse [recurrence.location]',
u'type': u'mapper_parsing_exception'},
u'_index': u'staging'
}
},
...
]
When mongo-connector parse a document with _id: 4zWhZTJnqPCd2RK93
in Collection "Fair", it "failed to parse [recurrence.location]".
From _index: staging
, we know collection Fair is in mongodb database "staging".
List all "recurrence.location" of each document in staging database:
$ mongo 192.168.100.3:27017/staging -u xxx -p xxx
foba:PRIMARY> db.Fair.find({}, {"recurrence.location": 1})
{ "_id" : "h7Lo3hGLrFGEb6BK7", "recurrence" : [ { "location" : { "latitude" : 22.53, "longitude" : 114.06 } } ] }
{ "_id" : "Kd2e9w58P7vAScLDL", "recurrence" : [ { "location" : "" } ] }
{ "_id" : "4zWhZTJnqPCd2RK93", "recurrence" : [ { "location" : { "latitude" : 22.53, "longitude" : 114.06 } } ] }
It's now clear that the exception in synchronization is caused by different structure of "recurrence.location".
To guarantee mongo-connector synchronizing successfully, you must keep all document in a collection share the same schema.