Tuesday, May 7, 2013

MongoDB 2.4 - Text Indexes

Full-text indexes were added in MongoDB 2.4. I am working with MongoDB 2.4.3 and have tested the functionality on my local Windows box, but I have not tested its performance.

Consider an "employees" collection whose documents typically look like this:


{
  "_id" : ObjectId("5189056ab38f56933e7224bb"),
  "_class" : "com.icfi.mongo.data.model.Employee",
  "address" : {
    "_id" : null,
    "addressLine1" : "227 Clifton Ave #-4",
    "city" : "Darby",
    "county" : "Delaware",
    "state" : "PA",
    "zipCode" : "19023"
  },
  "employeeId" : "28241",
  "hireDate" : ISODate("1988-01-28T05:00:00Z"),
  "department" : {
    "_id" : "d004",
    "name" : "Production",
    "managerId" : "110420"
  },
  "title" : "Senior Engineer",
  "salary" : 82927,
  "lastName" : "Baik",
  "firstName" : "Yuguang",
  "middleName" : "M",
  "birthDate" : ISODate("1959-01-04T05:00:00Z")
}

I want to run full-text searches on the "title" field, so the first step is to make sure that field has a "text" index. The following "ensureIndex" command creates it:
db["employees"].ensureIndex({"title":"text"})
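
The index key document is simply a map from field name to the string "text". The sketch below builds that document for one or more fields; `textIndexKeys` is a hypothetical helper, not part of the shell. As I understand the 2.4 beta, a collection can hold only one text index (several fields go into that single index), and text search has to be enabled on the server, for example with `mongod --setParameter textSearchEnabled=true`, before the index can be created or queried.

```javascript
// Hypothetical helper: build the key document for a single text index
// covering every field in `fields`. Dotted paths such as "department.name"
// reach into embedded documents.
function textIndexKeys(fields) {
  if (!Array.isArray(fields) || fields.length === 0) {
    throw new Error("at least one field is required");
  }
  var keys = {};
  fields.forEach(function (field) {
    keys[field] = "text";
  });
  return keys;
}

// In the 2.4 shell (text search enabled), the index used in this post is:
//   db.employees.ensureIndex(textIndexKeys(["title"]))
// and one text index over two fields would be:
//   db.employees.ensureIndex(textIndexKeys(["title", "department.name"]))
```
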

Next, text searches are run from the mongo shell with the "text" command, which takes this form:


db.collection.runCommand( "text", { search: <string>,
                                    filter: <document>,
                                    project: <document>,
                                    limit: <number>,
                                    language: <string> } )
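
Only `search` is required; `filter`, `project`, `limit` and `language` are optional. Here is a minimal sketch of assembling the options document so that unset fields are simply left out. `textCommandOptions` is a hypothetical name, and the filter and projection in the usage comment are illustrative values based on the sample document.

```javascript
// Hypothetical helper: assemble the options document for the 2.4 "text"
// command. `search` is required; filter, project, limit and language are
// copied only when the caller supplies them.
function textCommandOptions(search, opts) {
  if (typeof search !== "string" || search.length === 0) {
    throw new Error("search must be a non-empty string");
  }
  opts = opts || {};
  var options = { search: search };
  ["filter", "project", "limit", "language"].forEach(function (name) {
    if (opts[name] !== undefined) {
      options[name] = opts[name];
    }
  });
  return options;
}

// Example shell usage (assumes the "title" text index exists):
//   db.employees.runCommand("text", textCommandOptions("senior", {
//     filter: { "department.name": "Production" }, // ordinary query document
//     project: { firstName: 1, lastName: 1, title: 1 },
//     limit: 10
//   }))
```
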

My search command is:

db.employees.runCommand( "text", { search: "senior" } )

This runs a case-insensitive full-text search against the indexed "title" field and returns the documents that contain the word "senior". By default the command returns at most 100 documents; the "limit" argument overrides that default. Using the Java driver, the same command looks something like this:

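import com.mongodb.BasicDBObject;
import com.mongodb.CommandResult;
import com.mongodb.DBObject;

// db is an open com.mongodb.DB handle for the database holding "employees"
final DBObject command = new BasicDBObject();
command.put("text", "employees");  // collection to search
command.put("search", "SeNiOr");   // matching is case-insensitive
// command.put("limit", 2);        // optional: override the default of 100
final CommandResult result = db.command(command);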
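
The command's reply is a single document rather than a cursor. In the 2.4 beta the matches appear, as far as I can tell, in a `results` array whose entries carry a relevance `score` and the matched document as `obj`; treat that shape as an assumption and check it against your own output. A small sketch (hypothetical helper `matchedDocs`) that pulls out the matched documents, highest score first:

```javascript
// Hypothetical helper for the 2.4 "text" command reply. It ASSUMES the reply
// looks like { results: [ { score: <number>, obj: <document> }, ... ], ok: 1 }.
function matchedDocs(reply) {
  if (!reply || reply.ok !== 1) {
    throw new Error("text command failed: " + (reply && reply.errmsg));
  }
  return (reply.results || [])
    .slice() // copy so the caller's array is not reordered
    .sort(function (a, b) { return b.score - a.score; })
    .map(function (r) { return r.obj; });
}

// Shell usage:
//   var reply = db.employees.runCommand("text", { search: "senior", limit: 2 });
//   matchedDocs(reply).forEach(function (d) { print(d.lastName + ": " + d.title); });
```
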

10gen warns that text indexes can grow very large and can adversely affect performance. At the time of writing, both the text index and the "text" command are in beta and are not recommended for production use.
