Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
1.6k views
in Technique[技术] by (71.8m points)

mongodb - Using map/reduce for mapping the properties in a collection

Update: follow-up to MongoDB Get names of all keys in collection.

As pointed out by Kristina, one can use Mongodb 's map/reduce to list the keys in a collection:

db.things.insert( { type : ['dog', 'cat'] } );
db.things.insert( { egg : ['cat'] } );
db.things.insert( { type :  [] }); 
db.things.insert( { hello : []  } );

mr = db.runCommand({"mapreduce" : "things",
"map" : function() {
    for (var key in this) { emit(key, null); }
},  
"reduce" : function(key, stuff) { 
   return null;
}}) 

db[mr.result].distinct("_id")

//output: [ "_id", "egg", "hello", "type" ]

As long as we want to get only the keys located at the first level of depth, this works fine. However, it will fail retrieving those keys that are located at deeper levels. If we add a new record:

db.things.insert({foo: {bar: {baaar: true}}})

And we run again the map-reduce +distinct snippet above, we will get:

[ "_id", "egg", "foo", "hello", "type" ] 

But we will not get the bar and the baaar keys, which are nested down in the data structure. The question is: how do I retrieve all keys, no matter their level of depth? Ideally, I would actually like the script to walk down to all level of depth, producing an output such as:

["_id","egg","foo","foo.bar","foo.bar.baaar","hello","type"]      

Thank you in advance!

Question&Answers:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

OK, this is a little more complex because you'll need to use some recursion.

To make the recursion happen, you'll need to be able to store some functions on the server.

Step 1: define some functions and put them server-side

isArray = function (v) {
  return v && typeof v === 'object' && typeof v.length === 'number' && !(v.propertyIsEnumerable('length'));
}

m_sub = function(base, value){
  for(var key in value) {
    emit(base + "." + key, null);
    if( isArray(value[key]) || typeof value[key] == 'object'){
      m_sub(base + "." + key, value[key]);
    }
  }
}

db.system.js.save( { _id : "isArray", value : isArray } );
db.system.js.save( { _id : "m_sub", value : m_sub } );

Step 2: define the map and reduce functions

map = function(){
  for(var key in this) {
    emit(key, null);
    if( isArray(this[key]) || typeof this[key] == 'object'){
      m_sub(key, this[key]);
    }
  }
}

reduce = function(key, stuff){ return null; }

Step 3: run the map reduce and look at results

mr = db.runCommand({"mapreduce" : "things", "map" : map, "reduce" : reduce,"out": "things" + "_keys"});
db[mr.result].distinct("_id");

The results you'll get are:

["_id", "_id.isObjectId", "_id.str", "_id.tojson", "egg", "egg.0", "foo", "foo.bar", "foo.bar.baaaar", "hello", "type", "type.0", "type.1"]

There's one obvious problem here, we're adding some unexpected fields here: 1. the _id data 2. the .0 (on egg and type)

Step 4: Some possible fixes

For problem #1 the fix is relatively easy. Just modify the map function. Change this:

emit(base + "." + key, null); if( isArray...

to this:

if(key != "_id") { emit(base + "." + key, null); if( isArray... }

Problem #2 is a little more dicey. You wanted all keys and technically "egg.0" is a valid key. You can modify m_sub to ignore such numeric keys. But it's also easy to see a situation where this backfires. Say you have an associative array inside of a regular array, then you want that "0" to appear. I'll leave the rest of that solution up to you.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...