Tag Archive for Document Validation

Adding Document Validation Rules Using MongoDB Compass 1.5

This post looks at a new feature in MongoDB Compass 1.5 (in beta at the time of writing) which allows document validation rules to be added from the GUI rather than from the mongo shell command line. This makes it easy to create and modify rules that ensure that all documents written to a collection contain the data you expect to be there.

Introduction

One of MongoDB’s primary attractions for developers is that it gives them the ability to start application development without first needing to define a formal schema. Operations teams appreciate the fact that they don’t need to perform a time-consuming schema upgrade operation every time the developers need to store a different attribute. For business leaders, the application gets launched much faster, and new features can be rolled out more frequently. MongoDB powers agility.

Many projects reach a point where it’s necessary to enforce rules on what’s being stored in the database – for example, that specific attributes are guaranteed to be present in every document in a particular collection. Reasons for this include:

  • Different development teams can work with the same data, each needing to know what they can expect to find in a particular collection.
  • Development teams working on different applications can be spread over multiple sites, which means that a clear agreement on the format of shared data is important.
  • Development teams from different companies may be working with the same collections; misunderstandings about what data should be present can lead to issues.

As an example, an e-commerce website may centralize product catalog feeds from multiple vendors into a single collection. If one of the vendors alters the format of its product catalog, global catalog searches could fail.

To date, this has resulted in developers building their own validation logic – either within the application code (possibly multiple times for different applications) or by adding middleware such as Mongoose.

To address the challenges discussed above, while at the same time maintaining the benefits of a dynamic schema, MongoDB 3.2 introduced document validations. Adding and viewing validation rules required understanding the correct commands to run from the mongo shell’s command line.

MongoDB Compass 1.5 allows users to view, add, and modify document validation rules through its GUI, making them more accessible to both developers and DBAs.

Validating Documents in MongoDB

Document Validation provides significant flexibility to customize which parts of the documents are and are not validated for any collection. For any attribute it might be appropriate to check:

  • That the attribute exists
  • If an attribute does exist, that it is of the correct type
  • That the value is in a particular format (e.g., regular expressions can be used to check if the contents of the string matches a particular pattern)
  • That the value falls within a given range

Further, it may be necessary to combine these checks – for example that the document contains the user’s name and either their email address or phone number, and if the email address does exist, then it must be correctly formed.

Adding the validation checks from the command line is intuitive to developers or DBAs familiar with the MongoDB query language as it uses the same expression syntax as a find query to search the database. For others, it can be a little intimidating.

As an example, the following snippet adds a validator to the contacts collection that checks:

  • The year of birth is no later than 1994
  • The document contains a phone number and/or an email address
  • When present, the phone number and email address are strings

db.runCommand({
   collMod: "contacts",
   validator: {
      $and: [
        // Year of birth must be 1994 or earlier
        {yearOfBirth: {$lte: 1994}},
        // At least one contact method must be stored as a string
        {$or: [
                  {"contact.phone": { $type: "string"}},
                  {"contact.email": { $type: "string"}}
              ]}]
    }})

Note that types can be specified using either a number or a string alias.
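
To make the number/alias equivalence concrete, here is a small sketch in plain JavaScript. The map lists only a handful of common BSON types, and typeAlias is a hypothetical helper written for this illustration, not part of the mongo shell.

```javascript
// Partial map of BSON type numbers to the string aliases accepted by $type.
// Only a few common types are listed here.
const BSON_TYPE_ALIASES = {
  1: "double",
  2: "string",
  3: "object",
  4: "array",
  8: "bool",
  9: "date",
  10: "null",
  16: "int",
  18: "long",
};

// Hypothetical helper: return the alias for a type number, or the number
// itself when this partial map does not know it.
function typeAlias(typeNumber) {
  return BSON_TYPE_ALIASES[typeNumber] || typeNumber;
}

// These two sub-rules express the same check.
const byNumber = { "contact.phone": { $type: 2 } };
const byAlias = { "contact.phone": { $type: typeAlias(2) } };
console.log(JSON.stringify(byNumber), JSON.stringify(byAlias));
```

The alias form tends to be easier to read back later, which matters when the rule is reviewed in a GUI.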

Wouldn’t it be nice to be able to define these rules through a GUI rather than from the command line?

Using MongoDB Compass to Add Document Validation Rules

If you don’t already have MongoDB Compass 1.5 (or later) installed, download it and start the application. You’ll then be asked to provide details on how to connect to your database.

MongoDB Compass is free for evaluation and for use in development; for production, a MongoDB Professional or MongoDB Enterprise Advanced subscription is required.

If you don’t have a database to test this on, the simplest option is to create a new MongoDB Atlas cluster. Details on launching a MongoDB Atlas cluster can be found in this post.

Note that MongoDB Compass currently only accepts a single server address rather than the list of replica set members in the standard Atlas connect string and so it’s necessary to explicitly provide Compass with the address of the current primary – find that by clicking on the cluster in the Atlas GUI (Figure 1).

Figure 1: Identify the replica set primary

Figure 2: Connect MongoDB Compass to MongoDB Atlas

The connection panel can then be populated as shown in Figure 2.

Load Data and Check in MongoDB Compass

If you don’t already have a populated MongoDB collection, create one now. For example, use curl to download a pre-prepared JSON file containing contact data and use mongoimport to load it into your database:

curl -o contacts.json http://clusterdb.com/upload/contacts.json
mongoimport -h cluster0-shard-00-00-qfovx.mongodb.net -d clusterdb -c contacts --ssl -u billy -p SECRET --authenticationDatabase admin contacts.json

Connect MongoDB Compass to your database (Figure 3).

Figure 3: Connect MongoDB Compass to database

Select the contacts data and browse the schema (Figure 4).

Figure 4: Check schema in MongoDB Compass

Browse some documents (Figure 5).

Figure 5: Browse documents using MongoDB Compass

Add Document Validation Rules

In this section, we build the document validation rule shown earlier.

Navigate to the Validation tab in the MongoDB Compass GUI and select the desired validation action and validation level. The effects of these settings are shown in Figure 6. Any warnings generated by the rules are written to the MongoDB log.

Figure 6: MongoDB document validation configuration parameters

When adding document validation rules to an existing collection, you may want to start with fairly permissive rules so that existing applications aren’t broken before you have a chance to clean things up. Once you’re confident that all applications are following the rules, you can then become stricter. Figure 7 shows a possible life cycle for a collection.

Figure 7: Life cycle of a MongoDB collection
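
One way to picture a permissive-to-strict progression is as a sequence of collMod command documents. The sketch below is an assumption about what such a sequence could look like (the exact stages in the figure are not reproduced here), and rolloutCommands is a hypothetical helper. The collMod, validator, validationLevel and validationAction keys are the real command options.

```javascript
// Hypothetical helper: build the collMod command documents for a staged rollout.
// Each returned object could be passed to db.runCommand() in the mongo shell.
function rolloutCommands(collection, validator) {
  const stages = [
    // Log violations only; nothing is rejected.
    { validationLevel: "moderate", validationAction: "warn" },
    // Reject invalid inserts and invalid updates to valid documents;
    // updates to documents that were already invalid are not checked.
    { validationLevel: "moderate", validationAction: "error" },
    // Reject every insert or update that breaks the rules.
    { validationLevel: "strict", validationAction: "error" },
  ];
  return stages.map((stage) => ({ collMod: collection, validator, ...stage }));
}

const commands = rolloutCommands("contacts", { yearOfBirth: { $lte: 1994 } });
commands.forEach((cmd) => console.log(JSON.stringify(cmd)));
```

Each stage would be applied only once the logs (or a query for offending documents) show that applications are complying with the previous one.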

This post is starting with a new collection and so you can go straight to error/strict as shown in Figure 8.

Figure 8: Set document validation to error/strict

Multiple rules for the document can then be added using the GUI (Figure 9). Note that the rule for the email address uses a regular expression (^([a-zA-Z0-9_\\-\\.]+)@((\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.)|(([a-zA-Z0-9\\-]+\\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\\]?)$) to test that the address is properly formatted – going further than the original rule.

Figure 9: Add new document validation rule through MongoDB Compass
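
Before relying on a regular expression like this in a rule, it can help to try it against a few sample values. The sketch below does that in plain JavaScript; the sample addresses are made up, and it assumes JavaScript's regex engine treats this particular pattern the same way the server does.

```javascript
// The pattern from the rule, written as a JavaScript string
// (hence the doubled backslashes).
const EMAIL_PATTERN =
  "^([a-zA-Z0-9_\\-\\.]+)@((\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.)|(([a-zA-Z0-9\\-]+\\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\\]?)$";
const emailRegex = new RegExp(EMAIL_PATTERN);

// Made-up sample addresses, for illustration only.
const samples = [
  "billy@example.com",
  "billy@example",
  "not-an-email",
  "billy@example.museum",
];

for (const address of samples) {
  console.log(address, emailRegex.test(address));
}
```

Note that the {2,4} bound on the final label means addresses with a top-level domain longer than four letters do not match, which is worth knowing before enforcing the rule in error mode.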

Clicking UPDATE applies the change and then you can review it by pressing the JSON button (Figure 10).

Figure 10: JSON view of new document validation rule

At this point, a problem appears. Compass has combined the three sub-rules with an $and relationship, but our intent was to test that the document contained either an email address or a phone number, and that yearOfBirth was no later than 1994. Fortunately, for these more complex checks, the JSON can be altered directly within Compass:

{
  "$and": [
    {"yearOfBirth": {"$lte": 1994}}, 
    {
      "$or": [
        {"contact.email": {
          "$regex": "^([a-zA-Z0-9_\\-\\.]+)@((\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.)|(([a-zA-Z0-9\\-]+\\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\\]?)$",
          "$options": ""
          }
        },
        {
          "$and": [
            {"contact.phone": {"$type": 2}},
            {"contact.email": {"$exists": false}}
          ]
        }
      ]
    }
  ]
}

Paste the refined rule into Compass and press UPDATE (Figure 11).

Figure 11: Manually edit validation rules in MongoDB Compass

Recall that this rule checks that the yearOfBirth is no later than 1994 and that there is a phone number (formatted as a string) or a properly formatted email address.

Test The Rules

However you write to the database, the document validation rules are applied in the same way – through any of the drivers, or through the mongo shell. For this post we can test the rules directly from the MongoDB Compass GUI, from the DOCUMENTS tab. Select a document and try changing the yearOfBirth to a year later than 1994 as shown in Figure 12.

Figure 12: Change fails document validation

Find the Offending Documents Already in the Collection

There are a number of ways to track down existing documents that don’t meet the new rules. A very simple option is to query the database using the negation of the rule definition by wrapping the validation document in a $nor clause:

{"$nor": [
  {
    "$and": [
      {"yearOfBirth": {"$lte": 1994}}, 
      {
        "$or": [
          {"contact.email": {
            "$regex": "^([a-zA-Z0-9_\\-\\.]+)@((\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.)|(([a-zA-Z0-9\\-]+\\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\\]?)$",
            "$options": ""
            }
          },
          {"contact.phone": {"$type": 2}}
        ]
      }
    ]
  }]
}

The query document can be pasted into the query bar in Compass and then pressing APPLY reveals that there are 206 documents with yearOfBirth greater than 1994 – Figure 13.
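
The wrapping step itself is mechanical, so it can be sketched as a tiny helper. negateValidator is a hypothetical name, and the validator used here is a cut-down version of the rule for illustration.

```javascript
// Hypothetical helper: wrap a validator document in $nor so that a find()
// using the result returns the documents that FAIL the validator.
function negateValidator(validator) {
  return { $nor: [validator] };
}

// A cut-down validator, for illustration.
const validator = {
  $and: [
    { yearOfBirth: { $lte: 1994 } },
    { $or: [{ "contact.phone": { $type: 2 } }, { "contact.email": { $type: 2 } }] },
  ],
};

const offendersQuery = negateValidator(validator);
console.log(JSON.stringify(offendersQuery, null, 2));

// In the mongo shell, the same object could be used as a filter, e.g.
//   db.contacts.find(offendersQuery).count()
```

Building the query from the validator, rather than retyping it, avoids the two drifting apart when the rule is edited later.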

Figure 13: Find documents not matching the validation rules

Cleaning up Offending Documents

Potentially more problematic is how to clean up the existing documents which do not match the validation rules, as you need to decide what should happen to them. The good news is that the same $nor document used above can be used as a filter when executing your chosen action.

For example, if you decided that the offending documents should not be in the collection at all then this command can be run from the mongo shell command line to delete them:

use clusterdb
db.contacts.remove(
{"$nor": [
  {
    "$and": [
      {"yearOfBirth": {"$lte": 1994}}, 
      {
        "$or": [
          {"contact.email": {
            "$regex": "^([a-zA-Z0-9_\\-\\.]+)@((\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.)|(([a-zA-Z0-9\\-]+\\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\\]?)$",
            "$options": ""
            }
          },
          {"contact.phone": {"$type": 2}}
        ]
      }
    ]
  }]
})

Another Example – Coping with Multiple Schema Versions

A tricky problem to solve with RDBMSs is the versioning of data models; with MongoDB it’s very straightforward to set up validations that can cope with different versions of documents, with each version having a different set of checks applied. In the example validation checks below, the following logic is applied:

  • If the document is unversioned (possibly dating to the time before validations were added), then no checks are applied
  • For version 1, the document is checked to make sure that the Name key exists
  • For version 2 documents, the type of the Name key is also validated to ensure that it is a string

{"$or": [
  {"version": {"$exists": false}},
  {"$and": [
    {"version": 1},
    {"Name": {"$exists": true}}
  ]},
  {"$and": [
    {"version": 2},
    {"Name": {"$exists": true, "$type": "string"}}
  ]}
]}

In this way, multiple versions of documents can exist within the same collection, and the application can lazily up-version them over time. Note that the version attribute is user-defined.
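
As more versions accumulate, the validator can be generated rather than hand-edited. The following is a minimal sketch under that assumption; buildVersionedValidator is a hypothetical helper, and the per-version checks are the ones from the example above.

```javascript
// Hypothetical helper: combine per-version checks into one validator document.
// rulesByVersion maps a version number to the checks for that version.
function buildVersionedValidator(rulesByVersion) {
  const branches = Object.entries(rulesByVersion).map(([version, checks]) => ({
    $and: [{ version: Number(version) }, checks],
  }));
  // Unversioned documents pass without any checks.
  return { $or: [{ version: { $exists: false } }, ...branches] };
}

const validator = buildVersionedValidator({
  1: { Name: { $exists: true } },
  2: { Name: { $exists: true, $type: "string" } },
});
console.log(JSON.stringify(validator, null, 2));
```

Adding a version 3 then means adding one entry to the map, and the unversioned escape hatch stays in a single place.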

Where MongoDB Document Validation Excels (vs. RDBMSs)

In MongoDB, document validation is simple to set up – especially now that it can be done through the MongoDB Compass GUI. You can avoid the maintenance headache of stored procedures – which for many types of validation would be required in an RDBMS – and because the familiar MongoDB query language is used, there is no new syntax to learn.

The functionality is very flexible and it can enforce constraints on as little or as much of the schema as required. You get the best of both worlds – a dynamic schema for rapidly changing, polymorphic data, with the option to enforce strict validation checks against specific attributes from the outset of your project, or much later on. You can also use the Compass GUI to find and modify individual, pre-existing documents that don’t follow any new rules. If you initially have no validations defined, they can still be added later – even once in production, across thousands of servers.

It is always a concern whether adding extra checks will impact the performance of the system; in our tests, document validation adds a negligible overhead.

So, is all Data Validation Now Done in the Database?

The answer is “probably not” – either because there’s a limit to what can be done in the database or because there will always be a more appropriate place for some checks. Here are some areas to consider:

  • For a good user experience, checks should be made as early and as high up the stack as is sensible. For example, the format of an entered email address should be first checked in the browser rather than waiting for the request to be processed and an attempt made to write it to the database.
  • Any validations which need to compare values between keys, other documents, or external information cannot currently be implemented within the database.
  • Many checks are best made within the application’s business logic – for example “is this user allowed to use these services in their home country”; the checks in the database are primarily there to protect against coding errors.
  • If you need information on why the document failed validation, the developer or application will need to check the document against each of the sub-rules within the collection’s validation rule in turn as the error message doesn’t currently give this level of detail.
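
Checking a document against each sub-rule in turn can be sketched on the client side. The evaluator below is a deliberately simplified approximation that handles only the operators used in this post ($and, $or, $nor, $exists, string $type, $lte, $regex); it is not MongoDB's matcher, and all function names are hypothetical.

```javascript
// Look up a dotted path such as "contact.email" in a plain object.
function getPath(doc, path) {
  return path.split(".").reduce(
    (value, key) => (value !== null && typeof value === "object" ? value[key] : undefined),
    doc
  );
}

// Evaluate the handful of operators used in this post against one field value.
function matchesField(value, condition) {
  if (condition === null || typeof condition !== "object") {
    return value === condition; // plain equality, e.g. {version: 1}
  }
  return Object.entries(condition).every(([op, arg]) => {
    switch (op) {
      case "$exists":
        return (value !== undefined) === arg;
      case "$type":
        if (arg !== 2 && arg !== "string") {
          throw new Error("Only the string type is supported in this sketch");
        }
        return typeof value === "string";
      case "$lte":
        return typeof value === "number" && value <= arg;
      case "$regex":
        return typeof value === "string" && new RegExp(arg).test(value);
      case "$options":
        return true; // ignored here; only empty options are expected
      default:
        throw new Error("Unsupported operator: " + op);
    }
  });
}

// Evaluate a rule built from $and, $or, $nor and field conditions.
function matches(doc, rule) {
  return Object.entries(rule).every(([key, arg]) => {
    if (key === "$and") return arg.every((r) => matches(doc, r));
    if (key === "$or") return arg.some((r) => matches(doc, r));
    if (key === "$nor") return !arg.some((r) => matches(doc, r));
    return matchesField(getPath(doc, key), arg);
  });
}

// Report which top-level sub-rules of an $and validator a document fails.
function failedSubRules(doc, validator) {
  const subRules = validator.$and || [validator];
  return subRules.filter((rule) => !matches(doc, rule));
}

const validator = {
  $and: [
    { yearOfBirth: { $lte: 1994 } },
    { $or: [{ "contact.phone": { $type: 2 } }, { "contact.email": { $type: 2 } }] },
  ],
};
const doc = { yearOfBirth: 2001, contact: { phone: "555-0100" } };
console.log(JSON.stringify(failedSubRules(doc, validator)));
```

For the sample document, only the yearOfBirth sub-rule is reported, which is the level of detail the generic server error does not provide.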

Summary

In this post, we’ve taken a look at the powerful document validation functionality that was added back in MongoDB 3.2. We then explored how MongoDB Compass 1.5 adds the convenience of being able to define these rules through an intuitive GUI.

This is just one of the recent enhancements to MongoDB Compass; a summary of the enhancements added in MongoDB Compass 1.4 and 1.5 can be found in MongoDB 3.4 – What’s New.





Document Validation – Adding Just the Right Amount of Control Over Your MongoDB Documents

This post looks at Document Validation, a new feature in MongoDB 3.2. It introduces the feature together with its benefits and then goes on to step through a tutorial on how to introduce validation to an existing, live MongoDB deployment. This material was originally published on the MongoDB blog.

Disclaimer

MongoDB’s future product plans are for informational purposes only. MongoDB’s plans may change and you should not rely on them for delivery of a specific feature at a specific time.

Introduction

One of MongoDB’s primary attractions for developers is that it gives them the ability to start application development without first needing to define a formal schema. Operations teams appreciate the fact that they don’t need to perform a time-consuming schema upgrade operation every time the developers need to store a different attribute (as an example, The Weather Channel is now able to launch new features in hours whereas it used to take weeks). For business leaders, the application gets launched much faster, and new features can be rolled out more frequently. MongoDB powers agility.

Many projects reach a point where it’s necessary to enforce rules on what’s being stored in the database – for example, that for any document in a particular collection, you can be assured that certain attributes are present. Reasons for this include:

  • Different development teams working with the same data; each one needing to know what they can expect to find in a particular collection
  • Development teams working on different applications, spread over multiple sites, where a clear understanding of shared data is important
  • Development teams from different companies where misunderstandings about what data should be present can lead to issues

As an example, an e-commerce website may centralize a product catalog feed from each of its vendors into a single collection. If one of the vendors alters the format of its product catalog, the global catalog search could fail.

This has resulted in developers building their own validation logic – either within the application code (possibly multiple times for different applications) or by adding middleware such as Mongoose.

If the database doesn’t enforce rules about the data, development teams need to implement this logic in their applications. However, use of multiple development languages makes it hard to add a validation layer across multiple applications.

To address the challenges discussed above, while at the same time maintaining the benefits of a dynamic schema, MongoDB 3.2 introduces document validation.

Validating Documents in MongoDB 3.2

Note that at the time of writing, MongoDB 3.2 is not yet released, but this functionality can be tried out in a pre-release build of MongoDB 3.2, which is available for testing only, not production.

Document Validation provides significant flexibility to customize which parts of the documents are and are not validated for any collection. For any key it might be appropriate to check:

  • That a key exists
  • If a key does exist, that it is of the correct type
  • That the value is in a particular format (e.g., regular expressions can be used to check if the contents of the string matches a particular pattern)
  • That the value falls within a given range

Further, it may be necessary to combine these checks – for example that the document contains the user’s name and either their email address or phone number, and if the email address does exist, then it must be correctly formed.

Adding the validation checks to a collection is very intuitive to any developer or DBA familiar with MongoDB as it uses the same expression syntax as a find query to search the database. As an example, the following snippet adds a validator to the contacts collection that checks:

  • The year of birth is no later than 1994
  • The document contains a phone number and/or an email address
  • When present, the phone number and email addresses are strings

db.runCommand({
   collMod: "contacts",
   validator: {
      $and: [
        // Year of birth must be 1994 or earlier
        {year_of_birth: {$lte: 1994}},
        // At least one contact method must be stored as a string
        {$or: [
                  {phone: { $type: "string"}},
                  {email: { $type: "string"}}
              ]}]
    }})

When and How to Add Document Validation

Proponents of waterfall development processes would assert that all of the validations should be added right at the start of the project – certainly before going into production. This is possible, but in more agile approaches, the first version may deploy with no validations and future releases will add new data and checks. Fortunately, MongoDB 3.2 provides a great deal of flexibility in this area.

For existing data, we want to allow the application to continue to operate as we introduce validation into our collections. Therefore, we want to allow updates and simply log failed validations so we can take corrective measures separately if necessary, or take no action.

For new data, we want to ensure the data is valid and therefore return an error if the validation fails.

For each collection, developers or the DBA can specify validation rules and indicate whether failed validations result in a hard error or just a warning – Table 1 shows the available permutations.

Configuration options for controlling how document validations are applied to a collection

Table 1: Configuration Options for Document Validation
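
The way the two settings interact can also be written down as a small decision function. This is a sketch of the documented behaviour of validationLevel and validationAction, not server code; validationOutcome and its parameter names are hypothetical.

```javascript
// Sketch of how validationLevel and validationAction combine for one write.
//   level:            "off" | "moderate" | "strict"
//   action:           "warn" | "error"
//   operation:        "insert" | "update"
//   newDocValid:      does the document being written satisfy the validator?
//   existingDocValid: for updates, did the stored document satisfy it beforehand?
function validationOutcome({ level, action, operation, newDocValid, existingDocValid }) {
  // No validation at all, or nothing wrong with the document.
  if (level === "off" || newDocValid) return "accepted";
  // Under "moderate", updates to documents that were already invalid are not checked.
  if (level === "moderate" && operation === "update" && !existingDocValid) return "accepted";
  // Otherwise the action decides between logging a warning and rejecting the write.
  return action === "warn" ? "accepted, warning logged" : "rejected";
}

console.log(validationOutcome({
  level: "moderate", action: "error", operation: "insert", newDocValid: false,
}));
console.log(validationOutcome({
  level: "moderate", action: "error", operation: "update",
  newDocValid: false, existingDocValid: false,
}));
```

The two calls correspond to the cases that matter when introducing rules to a live collection: new invalid documents are refused, while legacy documents that already break the rule can still be updated.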

Figure 1 illustrates one possible timeline for how the application is developed.

Lifecycle for introducing document validation

Figure 1: Aligning document validation with application lifecycle

Of course, as applications evolve they require additional pieces of data and it will often make sense to add to the document validation rules to check that this data is always included. Figure 2 illustrates an example timeline of how this could be managed.

Figure 2: Introducing New Data Together with Validations

Coping with Multiple Schema Versions

A tricky problem to solve with RDBMSs is the versioning of data models; with MongoDB it’s very straightforward to set up validations that can cope with different versions of documents, with each version having a different set of checks applied. In the example validation checks below, the following logic is applied:

  • If the document is unversioned (possibly dating to the time before validations were added), then no checks are applied
  • For version 1, the document is checked to make sure that the Name key exists
  • For version 2 documents, the type of the Name key is also validated to ensure that it is a string

db.runCommand({
   collMod: "contacts",
   validator:
     {$or: [{version: {"$exists": false}},
            {version: 1,
             $and: [{Name: {"$exists": true}}]
            },
            {version: 2,
             $and: [{Name: {"$exists": true, "$type": 2}}]
            }
          ]
      } 
})

In this way, multiple versions of documents can exist within the same collection, and the application can lazily up-version them over time. Note that the version attribute is user-defined.

Document Validation Limitations in MongoDB 3.2

This is the first release of Document Validation and so it’s inevitable that there are still some things that would be great to add:

  • The current error message is very generic and doesn’t pick out which part of your document failed validation (note that the validation rule for a collection may check several things across many attributes). Jira ticket
  • The validation checks cannot compare one key’s value against another (whether in the same or different documents). For example {salary: {$gte: startingSalary}} is not possible. Jira ticket
  • It is the application or DBA’s responsibility to bring legacy data into compliance with new rules (there are no audits or tools) – the tutorial in this post attempts to show how this can be done.

Where MongoDB Document Validation Excels (vs. RDBMSs)

In MongoDB, Document Validation is simple to set up. There is no need for stored procedures – which for many types of validation would be required in an RDBMS – and because the familiar MongoDB query language is used, there is no new syntax to learn.

The functionality is very flexible and it can enforce constraints on as little or as much of the schema as required. You get the best of both worlds – a dynamic schema for rapidly changing, polymorphic data, with the option to enforce strict validation checks against specific attributes from the outset of your project, or much later on. If you initially have no validations defined, they can still be added later – even once in production, across thousands of servers.

It is always a concern whether adding extra checks will impact the performance of the system; in our tests, document validation adds a negligible overhead.

So, is all Data Validation Now Done in the Database?

The answer is ‘probably not’ – either because there’s a limit to what can be done in the database or because there will always be a more appropriate place for some checks. Here are some areas to consider:

  • For a good user experience, checks should be made as high up the stack as is sensible. For example, the format of an entered email address should be first checked in the browser rather than waiting for the request to be processed and an attempt made to write it to the database.
  • Any validations which need to compare values between keys, other documents, or external information cannot currently be implemented within the database.
  • Many checks are best made within the application’s business logic – for example “is this user allowed to use these services in their home country”; the checks in the database are primarily there to protect against coding errors.
  • If you need information on why the document failed validation, then the application will need to check it against each of the sub-rules within the collection’s validation rule, as the error message does not currently give this level of detail.

Tutorial

The intent of this section is to step you through exactly how document validation can be introduced into an existing production deployment in such a way that there is no impact to your users. It covers:

  • Setting up some test data (not needed for a real deployment)
  • Using MongoDB Compass and the mongo shell to reverse engineer the de facto data model and identify anomalies in the existing documents
  • Defining the appropriate document validation rules
  • Preventing new documents being added which don’t follow the new rules
  • Bringing existing documents “up to spec” against the new rules

This section looks at taking an existing, deployed database which currently has no document validations defined. It steps through understanding what the current document structure looks like; deciding on what rules to add and then rolling out those new rules.

As a pre-step add some data to the database (obviously, this isn’t needed if working with your real deployment).

// Start from an empty clusterdb database
use clusterdb;
db.dropDatabase();
use clusterdb;
db.inventory.insert({ "_id" : 1, "sku" : "abc", 
    "description" : "product 1", "instock" : 120 });
db.inventory.insert({ "_id" : 2, "sku" : "def", 
    "description" : "product 2", "instock" : 80 });
db.inventory.insert({ "_id" : 3, "sku" : "ijk", 
    "description" : "product 3", "instock" : 60 });
db.inventory.insert({ "_id" : 4, "sku" : "jkl", 
    "description" : "product 4", "instock" : 70 });
db.inventory.insert({ "_id" : 5, "sku" : null, 
    "description" : "Incomplete" });
db.inventory.insert({ "_id" : 6 });

for (i=1000; i<2000; i++) {
  db.orders.insert({
    _id: i,
    item: "abc", 
    price: i % 50,
    quantity: i % 5
  });
};

for (i=2000; i<3000; i++) {
  db.orders.insert({
    _id: i,
    item: "jkl", 
    price: i % 30,
    quantity: Math.floor(10 * Math.random()) + 1
  });
};

for (i=3000; i<3200; i++) {
  db.orders.insert({
    _id: i,
    price: i % 30,
    quantity: Math.floor(10 * Math.random()) + 1
  });
};

for (i=3200; i<3500; i++) {
  db.orders.insert({
    _id: i,
    item: null,
    price: i % 30,
    quantity: Math.floor(10 * Math.random()) + 1
  });
};

for (i=3500; i<4000; i++) {
  db.orders.insert({
    _id: i,
    item: "abc",
    price: "free",
    quantity: Math.floor(10 * Math.random()) + 1
  });
};

for (i=4000; i<4250; i++) {
  db.orders.insert({
    _id: i,
    item: "abc",
    price: "if you have to ask....",
    quantity: Math.floor(10 * Math.random()) + 1
  });
};

The easiest way to start understanding the de facto schema for your database is to use MongoDB Compass. Simply connect Compass to your mongod (or mongos if you’re using sharding) and select the database/collection you’d like to look into. To see MongoDB Compass in action – view this demo video.

As shown in Figure 3, there are typically four keys in each document from the clusterdb.orders collection:

  • _id is always present and is a number
  • item is normally present and is a string (either “abc” or “jkl”) but is occasionally null or missing altogether (undefined)
  • price is always present and is in most cases a number (the histogram shows how the values are distributed between 0 and 49) but in some cases it’s a string
  • quantity is always present and is a number

Figure 3: Viewing the Document Schema using MongoDB Compass

For this tutorial, we’ll focus on the price. By clicking on the string label, Compass will show us more information about the string content for price – this is shown in Figure 4.

Figure 4: Drilling Down into string Values

Compass shows us that:

  • For those instances of price which are strings, the common values are “free” and “if you have to ask....”.
  • If you click on one of those values, a query expression is formed; clicking “Apply” runs that query, and Compass then shows information only for that subset of documents, for example where price == "if you have to ask...." (see Figure 5).
  • By selecting multiple attributes, you can build up fairly complex queries.
  • The query you build visually is printed at the top so you can easily copy/paste into other contexts like the shell.

Figure 5: Formulating Search Expressions with MongoDB Compass

If applications are to work with the price from these documents then it would be simpler if it was always set to a numerical value, and so this is something that should be fixed.

Before cleaning up the existing documents, the application should be updated to ensure numerical values are stored in the price field. We can do this by adding a new validation rule to the collection. We want this rule to:

  • Allow changes to existing invalid documents
  • Prevent inserts of new documents which violate validation rules
  • Check that price exists and contains a double – see the enumeration of MongoDB BSON types

These steps should be run from the mongo shell:

db.orders.runCommand("collMod", 
                   {validationLevel: "moderate", 
                    validationAction: "error"});

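// A $type check only matches when the field is present, so a separate
// {$exists: true} clause is not needed (repeating the price key in one
// object literal would silently keep only the last value).
db.runCommand({collMod: "orders", 
               validator: {
                  price: {$type: 1}
                }
              });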

The validation rules for this collection can now be checked:

db.getCollectionInfos({name:"orders"})
[
  {
    "name": "orders",
    "options": {
      "validator": {
        "price": {
          "$type": 1
        }
      },
      "validationLevel": "moderate",
      "validationAction": "error"
    }
  }
]

Now that this has been set up, it’s possible to check that we can’t add a new document that breaks the rule:

db.orders.insert({
    "_id": 6666, 
    "item": "jkl", 
    "price": "rogue",
    "quantity": 1 });

Document failed validation
WriteResult({
  "nInserted": 0,
  "writeError": {
    "code": 121,
    "errmsg": "Document failed validation"
  }
})

But it’s OK to modify an existing document that does break the rule:

db.orders.findOne({price: {$type: 2}});

{
  "_id": 3500,
  "item": "abc",
  "price": "free",
  "quantity": 5
}

db.orders.update(
    {_id: 3500},
    {$set: {quantity: 12}});

Updated 1 existing record(s) in 5ms
WriteResult({
  "nMatched": 1,
  "nUpserted": 0,
  "nModified": 1
})

Now that the application is no longer able to store new documents that break the new rule, it’s time to clean up the “legacy” documents. At this point, it’s important to point out that Compass works on a random sample of the documents in a collection (this is what allows it to be so quick). To make sure that we’re fixing all of the documents, we check from the mongo shell (as the following commands could consume significant resources, it may make sense to run them on a secondary):

secondary> db.orders.aggregate([
    {$match: {
      price: {$type: 2}}},
    {$group: {
      _id: "$price", 
      count: {$sum:1}}}
  ])

{ "_id" : "if you have to ask....", "count" : 250 }
{ "_id" : "free", "count" : 500 }
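
For readers less familiar with the aggregation framework, the same match-then-group logic can be mirrored in plain JavaScript over an in-memory array. This is only an illustration of what the pipeline computes; countStringPrices is a hypothetical helper and the sample documents are made up.

```javascript
// Plain-JavaScript equivalent of the $match + $group stages, run over an
// in-memory array instead of a collection.
function countStringPrices(orders) {
  const counts = new Map();
  for (const order of orders) {
    // $match: keep only documents whose price is a string
    if (typeof order.price !== "string") continue;
    // $group: count documents per distinct price value
    counts.set(order.price, (counts.get(order.price) || 0) + 1);
  }
  return [...counts].map(([price, count]) => ({ _id: price, count }));
}

// Made-up sample documents, for illustration only.
const sample = [
  { _id: 1, price: 12 },
  { _id: 2, price: "free" },
  { _id: 3, price: "free" },
  { _id: 4, price: "if you have to ask...." },
];
console.log(countStringPrices(sample));
```

On a real collection the server-side pipeline is the right tool, since it avoids pulling every document to the client.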

The number of exceptions isn’t too high and so it is safe to go ahead and fix up the data without consuming too many resources:

db.orders.update(
    {price:"free"},
    {$set: {price: 0}},
    {multi: true});

db.orders.update(
    {price:"if you have to ask...."},
    {$set: {price: 1000000}},
    {multi: true});

At this point it’s now safe to enter the strict mode where any inserts or updates will cause an error if the document being stored doesn’t follow the rules:

db.orders.runCommand("collMod", 
                   {validationLevel: "strict", 
                    validationAction: "error"});

Next Steps

Hopefully this has given you a sense of what the Document Validation functionality offers and started you thinking about how it could be applied to your application and database. I’d encourage you to read up more on the topic.





Free Webinar: Document Validation in MongoDB 3.2

I’ll be presenting a free webinar on Thursday 29th October covering the new Document Validation feature coming in MongoDB 3.2.

Thursday, October 29, 2015
9am PDT | 12pm EDT | 4pm GMT

One of MongoDB’s primary attractions for developers is that it gives them the ability to start application development without needing to define a formal, up-front schema. Operations teams appreciate the fact that they don’t need to perform a time-consuming schema upgrade operation every time the developers need to store a different attribute.

Some projects reach a point where it’s necessary to define rules on what’s being stored in the database. This webinar explains how MongoDB 3.2 allows that document validation work to be performed by the database rather than in the application code.

This webinar focuses on the benefits of using document validation: how to set up the rules using the familiar MongoDB Query Language and how to safely roll it out into an existing, mature production environment.

During the webinar, you will get the chance to submit your questions and get them answered by the experts.

The webinar is free but you need to register in advance here.