The Modern Application Stack – Part 4: Building a Client UI Using Angular 2 (formerly AngularJS) & TypeScript

Introduction

This is the fourth in a series of blog posts examining technologies such as Angular that are driving the development of modern web and mobile applications.

“Modern Application Stack – Part 1: Introducing The MEAN Stack” introduced the technologies making up the MEAN (MongoDB, Express, Angular, Node.js) and MERN (MongoDB, Express, React, Node.js) Stacks, why you might want to use them, and how to combine them to build your web application (or your native mobile or desktop app).

The remainder of the series is focussed on working through the end to end steps of building a real (albeit simple) application. – MongoPop. Part 2: Using MongoDB With Node.js created an environment where we could work with a MongoDB database from Node.js; it also created a simplified interface to the MongoDB Node.js Driver. Part 3: Building a REST API with Express.js built on Part 2 by using Express.js to add a REST API which will be used by the clients that we implement in the final posts.

This post demonstrates how to use Angular 2 (the evolution of Angular.js) to implement a remote web-app client for the Mongopop application.

Angular 2 (recap)

Angular, originally created and maintained by Google, runs your JavaScript code within the user’s web browsers to implement a reactive user interface (UI). A reactive UI gives the user immediate feedback as they give their input (in contrast to static web forms where you enter all of your data, hit “Submit” and wait.

Reactive Angular 2 application

Version 1 of Angular was called AngularJS but it was shortened to Angular in Angular 2 after it was completely rewritten in Typescript (a superset of JavaScript) – Typescript is now also the recommended language for Angular apps to use.

You implement your application front-end as a set of components – each of which consists of your JavaScript (Typescript) code and an HTML template that includes hooks to execute and use the results from your Typescript functions. Complex application front-ends can be crafted from many simple (optionally nested) components.

Angular application code can also be executed on the back-end server rather than in a browser, or as a native desktop or mobile application.

MEAN Stack Architecture

Downloading, running, and using the Mongopop application

The Angular client code is included as part if the Mongopop package installed in Part 2: Using MongoDB With Node.js.

The back-end application should be run in the same way as in parts 2 & 3. The client software needs to be transpiled from Typescript to JavaScript – the client software running in a remote browser can then download the JavaScript files and execute them.

The existing package.json file includes a script for transpiling the Angular 2 code:

  "scripts": {
        ...
    "tsc:w": "cd public && npm run tsc:w",
        ...  
},

That tsc:w delegates the work to a script of the same name defined in public/package.json;

  "scripts": {
        ...
    "tsc:w": "tsc -w",
        ...  
},

tsc -w continually monitors the client app’s Typescript files and reruns the transpilation every time they are edited.

To start the continual transpilation of the Angular 2 code:

npm run rsc:w

Component architecture of the Mongopop Angular UI

Angular applications (both AngularJS and Angular2) are built from one or more, nested components – Mongopop is no exception:

Mongopop Angular2 Components

The main component (AppComponent)contains the HTML and logic for connecting to the database and orchestrating its sub-components. Part of the definition of AppComponent is meta data/decoration to indicate that it should be loaded at the point that a my-app element (<my-app></my-app>) appears in the index.html file (once the component is running, its output replaces whatever holding content sits between <my-app> and </my-app>). AppComponent is implemented by:

  • A Typescript file containing the AppComponent class (including the data members, initialization code, and member functions
  • A HTML file containing
    • HTML layout
    • Rendering of data members
    • Elements to be populated by sub-components
    • Data members to be passed down for use by sub-components
    • Logic (e.g. what to do when the user changes the value in a form)
  • (Optionally) a CSS file to customise the appearance of the rendered content

Mongopop is a reasonably flat application with only one layer of sub-components below AppComponent, but more complex applications may nest deeper.

Changes to a data value by a parent component will automatically be propagated to a child – it’s best practice to have data flow in this direction as much as possible. If a data value is changed by a child and the parent (either directly or as a proxy for one of its other child components) needs to know of the change, then the child triggers an event. That event is processed by a handler registered by the parent – the parent may then explicitly act on the change, but even if it does nothing explicit, the change flows to the other child components.

This table details what data is passed from AppComponent down to each of its children and what data change events are sent back up to AppComponent (and from there, back down to the other children):

Flow of data between Angular components
Child component Data passed down Data changes passed back up
AddComponent
Data service Collection name
Collection name
Mockaroo URL
CountComponent
Data service Collection name
Collection name
UpdateComponent
Data service Collection name
Collection name
SampleComponent
Data service Collection name
Collection name

What are all of these files?

To recap, the files and folders covered earlier in this series:

  • package.json: Instructs the Node.js package manager (npm) what it needs to do; including which dependency packages should be installed
  • node_modues: Directory where npm will install packages
  • node_modues/mongodb: The MongoDB driver for Node.js
  • node_modues/mongodb-core: Low-level MongoDB driver library; available for framework developers (application developers should avoid using it directly)
  • javascripts/db.js: A JavaScript module we’ve created for use by our Node.js apps (in this series, it will be Express) to access MongoDB; this module in turn uses the MongoDB Node.js driver.
  • config.js: Contains the application–specific configuration options
  • bin/www: The script that starts an Express application; this is invoked by the npm start script within the package.json file. Starts the HTTP server, pointing it to the app module in app.js
  • app.js: Defines the main back-end application module (app). Configures:
    • That the application will be run by Express
    • Which routes there will be & where they are located in the file system (routes directory)
    • What view engine to use (Jade in this case)
    • Where to find the views to be used by the view engine (views directory)
    • What middleware to use (e.g. to parse the JSON received in requests)
    • Where the static files (which can be read by the remote client) are located (public directory)
    • Error handler for queries sent to an undefined route
  • views: Directory containing the templates that will be used by the Jade view engine to create the HTML for any pages generated by the Express application (for this application, this is just the error page that’s used in cases such as mistyped routes (“404 Page not found”))
  • routes: Directory containing one JavaScript file for each Express route
    • routes/pop.js: Contains the Express application for the /pop route; this is the implementation of the Mongopop REST API. This defines methods for all of the supported route paths.
  • public: Contains all of the static files that must be accessible by a remote client (e.g., our Angular to React apps).

Now for the new files that implement the Angular client (note that because it must be downloaded by a remote browser, it is stored under the public folder):

  • public/package.json: Instructs the Node.js package manager (npm) what it needs to do; including which dependency packages should be installed (i.e. the same as /package.json but this is for the Angular client app)
  • public/index.html: Entry point for the application; served up when browsing to http://<backend-server>/. Imports public/system.config.js
  • public/system.config.js: Configuration information for the Angular client app; in particular defining the remainder of the directories and files:
    • public/app: Source files for the client application – including the Typescript files (and the transpiled JavaScript files) together the HTML and any custom CSS files. Combined, these define the Angular components.
      • public/app/main.ts: Entry point for the Angular app. Bootstraps public/app/app.module.ts
      • public/app/app.module.ts: Imports required modules, declares the application components and any services. Declares which component to bootstrap (AppComponent which is implemented in public/app/app.component.*)
      • public/app/app.component.html: HTML template for the top-level component. Includes elements that are replaced by sub-components
      • public/app/app.component.ts: Implements the AppComponent class for the top-level component
      • public/app/X.component.html: HTML template for sub-component X
      • public/app/X.component.ts: Implements the class for sub-component X
      • AddDocsRequest.ts, ClientConfig.ts, CountDocsRequest.ts, MongoResult.ts, MongoReadResult.ts, SampleDocsRequest.ts, & UpdateDocsRequest.ts: Classes that match the request parameters and response formats of the REST API that’s used to access the back-end
      • data.service.ts: Service used to access the back-end REST API (mostly used to access the database)
      • X.js* & *X.js.map: Files which are generated by the transpilation of the Typescript files.
    • public/node-modules: Node.js modules used by the Angular app (as opposed to the Express, server-side Node.js modules)
    • public/styles.css: CSS style sheet (imported by public/index.html) – applies to all content in the home page, not just content added by the components
    • public/stylesheets/styles.css: CSS style sheet (imported by public/app/app.component.ts and the other components) – note that each component could have their own, specialized style sheet instead

“Boilerplate” files and how they get invoked

This is an imposing number of new files and this is one of the reasons that Angular is often viewed as the more complex layer in the application stack. One of the frustrations for many developers, is the number of files that need to be created and edited on the client side before your first line of component/application code is executed. The good news is that there is a consistent pattern and so it’s reasonable to fork you app from an existing project – the Mongopop app can be cloned from GitHub or, the Angular QuickStart can be used as your starting point.

As a reminder, here is the relationship between these common files (and our application-specific components):

Angular2 boilerplate files

Contents of the “boilerplate” files

This section includes the contents for each of the non-component files and then remarks on some of the key points.

public/package.json

The scripts section defines what npm should do when you type npm run <command-name> from the command line. Of most interest is the tsc:w script – this is how the transpiler is launched. After transpiling all of the .ts Typescript files, it watches them for changes – retranspiling as needed.

Note that the dependencies are for this Angular client. They will be installed in public/node_modules when npm install is run (for Mongopop, this is done automatically when building the full project ).

public/index.html

Focussing on the key lines, the application is started using the app defined in systemjs.config.js:

And the output from the application replaces the placeholder text in the my-app element:

<my-app>Loading MongoPop client app...</my-app>

public/systemjs.config.js

packages.app.main is mapped to public/app/main.js – note that main.js is referenced rather than main.ts as it is always the transpiled code that is executed. This is what causes main.ts to be run.

public/app/main.ts

This simply imports and bootstraps the AppModule class from public/app/app.module.ts (actually app.module.js)

public/app/app.module.ts

This is the first file to actually reference the components which make up the Mongopop application!

Note that NgModule is the core module for Angular and must always be imported; for this application BrowserModule, HttpModule, and FormsModule are also needed.

The import commands also bring in the (.js) files for each of the components as well as the data service.

Following the imports, the @NgModule decorator function takes a JSON object that tells Angular how to run the code for this module (AppModule) – including the list of imported modules, components, and services as well as the module/component needed to bootstrap the actual application (AppComponent).

Typescript & Observables (before getting into component code)

As a reminder from “Modern Application Stack – Part 1: Introducing The MEAN Stack”; the most recent, widely supported version is ECMAScript 6 – normally referred to as /ES6/. ES6 is supported by recent versions of Chrome, Opera, Safari, and Node.js). Some platforms (e.g. Firefox and Microsoft Edge) do not yet support all features of ES6. These are some of the key features added in ES6:

  • Classes & modules
  • Promises – a more convenient way to handle completion or failure of synchronous function calls (compared to callbacks)
  • Arrow functions – a concise syntax for writing function expressions
  • Generators – functions that can yield to allow others to execute
  • Iterators
  • Typed arrays

Typescript is a superset of ES6 (JavaScript); adding static type checking. Angular 2 is written in Typescript and Typescript is the primary language to be used when writing code to run in Angular 2.

Because ES6 and Typescript are not supported in all environments, it is common to transpile the code into an earlier version of JavaScript to make it more portable. tsc is used to transpile Typescript into JavaScript.

And of course, JavaScript is augmented by numerous libraries. The Mongopop Angular 2 client uses Observables from the RxJS reactive libraries which greatly simplify making asynchronous calls to the back-end (a pattern historically referred to as AJAX).

RxJS Observables fulfil a similar role to ES6 promises in that they simplify the code involved with asynchronous function calls (removing the need to explicitly pass callback functions). Promises are more contained than Observables, they make a call and later receive a single signal that the asynchronous activity triggered by the call succeeded or failed. Observables can have a more complex lifecycle, including the caller receiving multiple sets of results and the caller being able to cancel the Observable.

The Mongopop application uses two simple patterns when calling functions that return an Observable; the first is used within the components to digest the results from our own data service:

In Mongopop’s use of Observables, we don’t have anything to do in the final arrow function and so don’t use it (and so it could have used the second pattern instead – but it’s interesting to see both).

The second pattern is used within the data service when making calls to the Angular 2 http module (this example also shows how we return an Observable back to the components):

Calling the REST API

The DataService class hides the communication with the back-end REST API; serving two purposes:

  • Simplifying all of the components’ code
  • Shielding the components’ code from any changes in the REST API signature or behavior – that can all be handled within the DataService

By adding the @Injectable decorator to the class definition, any member variables defined in the arguments to the class constructor function will be automatically instantiated (i.e. there is no need to explicitly request a new Http object):

After the constructor has been called, methods within the class can safely make use of the http data member.

As a reminder from Part 3: Building a REST API with Express.js, this is the REST API we have to interact with:

Express routes implemented for the Mongopop REST API
Route Path HTTP Method Parameters Response Purpose

                      
/pop/
GET
{
"AppName": "MongoPop",
"Version": 1.0
}
        
Returns the version of the API.
/pop/ip
GET
{"ip": string}
Fetches the IP Address of the server running the Mongopop backend.
/pop/config
GET
{
mongodb: {
    defaultDatabase: string,
    defaultCollection: string,
    defaultUri: string
},
mockarooUrl: string
}
        
Fetches client-side defaults from the back-end config file.
/pop/addDocs
POST
{
MongoDBURI: string;
collectionName: string;
dataSource: string;
numberDocs: number;
unique: boolean;
}
        
{
success: boolean;
count: number;
error: string;
}
        
Add `numberDocs` batches of documents, using documents fetched from `dataSource`
/pop/sampleDocs
POST
{
MongoDBURI: string;
collectionName: string;
numberDocs: number;
}
        
{
success: boolean;   
documents: string;
error: string;
}
        
Read a sample of the documents from a collection.
/pop/countDocs
POST
{
MongoDBURI: string; 
collectionName: string;
}
        
{
success: boolean;   
count: number;
error: string;
}
        
Counts the number of documents in the collection.
/pop/updateDocs
POST
{
MongoDBURI: string;
collectionName: string;
matchPattern: Object;
dataChange: Object;
threads: number;
}
        
{
success: boolean;
count: number;
error: string;
}
        
Apply an update to all documents in a collection
which match a given pattern

Most of the methods follow a very similar pattern and so only a few are explained here; refer to the DataService class to review the remainder.

The simplest method retrieves a count of the documents for a given collection:

This method returns an Observable, which in turn delivers an object of type MongoResult. MongoResult is defined in MongoResult.ts:

The pop/count PUT method expects the request parameters to be in a specific format (see earlier table); to avoid coding errors, another Typescript class is used to ensure that the correct parameters are always included – CountDocsRequest:

http.post returns an Observable. If the Observable achieves a positive outcome then the map method is invoked to convert the resulting data (in this case, simply parsing the result from a JSON string into a Typescript/JavaScript object) before automatically passing that updated result through this method’s own returned Observable.

The timeout method causes an error if the HTTP request doesn’t succeed or fail within 6 minutes.

The catch method passes on any error from the HTTP request (or a generic error if error.toString() is null) if none exists.

The updateDBDocs method is a little more complex – before sending the request, it must first parse the user-provided strings representing:

  • The pattern identifying which documents should be updated
  • The change that should be applied to each of the matching documents

This helper function is used to parse the (hopefully) JSON string:

If the string is a valid JSON document then tryParseJSON returns an object representation of it; if not then it returns an error.

A new class (UpdateDocsRequest) is used for the update request:

updateDBDocs is the method that is invoked from the component code:

After converting the received string into objects, it delegates the actual sending of the HTTP request to sendUpdateDocs:

A simple component that accepts data from its parent

Recall that the application consists of five components: the top-level application which contains each of the add, count, update, and sample components.

When building a new application, you would typically start by designing the the top-level container and then work downwards. As the top-level container is the most complex one to understand, we’ll start at the bottom and then work up.

A simple sub-component to start with is the count component:

Mongopop Angular2 component
public/app/count.component.html defines the elements that define what’s rendered for this component:

You’ll recognise most of this as standard HTML code.

The first Angular extension is for the single input element, where the initial value (what’s displayed in the input box) is set to {{MongoDBCollectionName}}. Any name contained within a double pair of braces refers to a data member of the component’s class (public/app/count.component.ts).

When the button is clicked, countDocs (a method of the component’s class) is invoked with CountCollName.value (the current contents of the input field) passed as a parameter.

Below the button, the class data members of DocumentCount and CountDocError are displayed – nothing is actually rendered unless one of these has been given a non-empty value. Note that these are placed below the button in the code, but they would still display the resulting values if they were moved higher up – position within the HTML file doesn’t impact logic flow. Each of those messages is given a class so that they can be styled differently within the component’s CSS file:

Angular 2 success message

Angular 2 error message

The data and processing behind the component is defined in public/app/count.component.ts:

Starting with the @component decoration for the class:

This provides meta data for the component:

  • selector: The position of the component within the parent’s HTML should be defined by a <my-count></my-count> element.
  • templateUrl: The HMTL source file for the template (public/app/count.component.ts in this case – public is dropped as the path is relative)
  • styleUrls: The CSS file for this component – all components in this application reference the same file: public/stylesheets/style.css

The class definition declares that it implements the OnInit interface; this means that its ngOnInit() method will be called after the browser has loaded the component; it’s a good place to perform any initialization steps. In this component, it’s empty and could be removed.

The two data members used for displaying success/failure messages are initialized to empty strings:

this.DocumentCount = "";
this.CountDocError = "";

Recall that data is passed back and forth between the count component and its parent:

Flow of data between Angular components
Child component Data passed down Data changes pased back up
CountComponent
Data service Collection name
Collection name

To that end, two class members are inherited from the parent component – indicated by the @Input() decoration:

// Parameters sent down from the parent component (AppComponent)
@Input() dataService: DataService;
@Input() MongoDBCollectionName: string;

The first is an instance of the data service (which will be used to request the document count); the second is the collection name that we used in the component’s HTML code. Note that if either of these are changed in the parent component then the instance within this component will automatically be updated.

When the name of the collection is changed within this component, the change needs to be pushed back up to the parent component. This is achieved by declaring an event emitter (onCollection):

Recall that the HTML for this component invokes a member function: countDocs(CountCollName.value) when the button is clicked; that function is implemented in the component class:

After using the data service to request the document count, either the success or error messages are sent – depending on the success/failure of the requested operation. Note that there are two layers to the error checking:

  1. Was the network request successful? Errors such as a bad URL, out of service back-end, or loss of a network connection would cause this check to fail.
  2. Was the back-end application able to execute the request successfully? Errors such as a non-existent collection would cause this check to fail.

Note that when this.CountDocError or this.DocumentCount are written to, Angular will automatically render the new values in the browser.

Passing data down to a sub-component (and receiving changes back)

We’ve seen how CountComponent can accept data from its parent and so the next step is to look at that parent – AppComponent.

The HTML template app.component.html includes some of its own content, such as collecting database connection information, but most of it is delegation to other components. For example, this is the section that adds in CountComponent:

Angular will replace the <my-count></my-count> element with CountComponent; the extra code within that element passes data down to that sub-component. For passing data members down, the syntax is:

[name-of-data-member-in-child-component]="name-of-data-member-in-this-component"

As well as the two data members, a reference to the onCollection event handler is passed down (to allow CountComponent to propagate changes to the collection name back up to this component). The syntax for this is:

(name-of-event-emitter-in-child-component)="name-of-event-handler-in-this-component($event)"

As with the count component, the main app component has a Typescript class – defined in app.component.ts – in addition to the HTML file. The two items that must be passed down are the data service (so that the count component can make requests of the back-end) and the collection name – these are both members of the AppComponent class.

The dataService object is implicitly created and initialized because it is a parameter of the class’s constructor, and because the class is decorated with @Injectable:

MongoDBCollectionName is set during component initialization within the ngOnInit() method by using the data service to fetch the default client configuration information from the back-end:

Finally, when the collection name is changed in the count component, the event that it emits gets handled by the event handler called, onCollection, which uses the new value to update its own data member:

Conditionally including a component

It’s common that a certain component should only be included if a particular condition is met. Mongopop includes a feature to allow the user to apply a bulk change to a set of documents – selected using a pattern specified by the user. If they don’t know the typical document structure for the collection then it’s unlikely that they’ll make a sensible change. Mongopop forces them to first retrieve a sample of the documents before they’re given the option to make any changes.

The ngIf directive can be placed within the opening part of an element (in this case a <div>) to make that element conditional. This approach is used within app.component.html to only include the update component if the DataToPlayWith data member is TRUE:

Note that, as with the count component, if the update component is included then it’s passed the data service and collection name and that it also passes back changes to the collection name.

Angular includes other directives that can be used to control content; ngFor being a common one as it allows you to iterate through items such as arrays:

Returning to app.component.html, an extra handler (onSample) is passed down to the sample component:

sample.component.html is similar to the HTML code for the count component but there is an extra input for how many documents should be sampled from the collection:

On clicking the button, the collection name and sample size are passed to the sampleDocs method in sample.component.ts which (among other things) emits an event back to the AppComponent‘s event handler using the onSample event emitter:

Other code highlights

Returning to app.component.html; there is some content there in addition to the sub-components:

Most of this code is there to allow a full MongoDB URI/connection string to be built based on some user-provided attributes. Within the input elements, two event types (keyup & change) make immediate changes to other values (without the need for a page refresh or pressing a button):

Reactive Angular 2 Component

The actions attached to each of these events call methods from the AppComponent class to set the data members – for example the setDBName method (from app.component.ts):

In addition to setting the dBInputs.MongoDBDatabaseName value, it also invokes the data service method calculateMongoDBURI (taken from data.service.ts ):

This method is run by the handler associated with any data member that affects the MongoDB URI (base URI, database name, socket timeout, connection pool size, or password). Its purpose is to build a full URI which will then be used for accessing MongoDB; if the URI contains a password then a second form of the URI, MongoDBURIRedacted has the password replaced with **********.

It starts with a test as to whether the URI has been left to the default localhost:27017 – in which case it’s assumed that there’s no need for a username or password (obviously, this shouldn’t be used in production). If not, it assumes that the URI has been provided by the MongoDB Atlas GUI and applies these changes:

  • Change the database name from <DATATBASE> to the one chosen by the user.
  • Replace <PASSWORD> with the real password (and with ********** for the redacted URI).
  • Add the socket timeout parameter.
  • Add the connection pool size parameter.

Testing & debugging the Angular application

Now that the full MEAN stack application has been implemented, you can test it from within your browser:

Debugging the Angular 2 client is straightforward using the Google Chrome Developer Tools which are built into the Chrome browser. Despite the browser executing the transpiled JavaScript the Dev Tools allows you to browse and set breakpoints in your Typescript code:

Summary & what’s next in the series

Previous posts stepped through building the Mongopop application back-end. This post describes how to build a front-end client using Angular 2. At this point, we have a complete, working, MEAN stack application.

The coupling between the front and back-end is loose; the client simply makes remote, HTTP requests to the back-end service – using the interface created in Part 3: Building a REST API with Express.js.

This series will finish out by demonstrating alternate methods to implement front-ends; using ReactJS for another browser-based UI (completing the MERN stack) and then more alternative methods.

Continue following this blog series to step through building the remaining stages of the Mongopop application:





The Modern Application Stack – Part 3: Building a REST API Using Express.js

Introduction

This is the third in a series of blog posts examining the technologies that are driving the development of modern web and mobile applications.

Part 1: Introducing The MEAN Stack (and the young MERN upstart) introduced the technologies making up the MEAN (MongoDB, Express, Angular, Node.js) and MERN (MongoDB, Express, React, Node.js) Stacks, why you might want to use them, and how to combine them to build your web application (or your native mobile or desktop app).

The remainder of the series is focused on working through the end to end steps of building a real (albeit simple) application. – MongoPop. Part 2: Using MongoDB With Node.js created an environment where we could work with a MongoDB database from Node.js; it also created a simplified interface to the MongoDB Node.js Driver.

This post builds on from those first posts by using Express to build a REST API so that a remote client can work with MongoDB. You will be missing a lot of context if you have skipped those posts – it’s recommended to follow through those first.

The REST API

A Representational State Transfer (REST) interface provides a set of operations that can be invoked by a remote client (which could be another service) over a network, using the HTTP protocol. The client will typically provide parameters such as a string to search for or the name of a resource to be deleted.

Many services provide a REST API so that clients (their own and those of 3rd parties) and other services can use the service in a well defined, loosely coupled manner. As an example, the Google Places API can be used to search for information about a specific location:

Breaking down the URI used in that curl request:

  • No method is specified and so the curl command defaults to a HTTP GET.
  • maps.googleapis.com is the address of the Google APIs service.
  • /maps/api/place/details/json is the route path to the specific operation that’s being requested.
  • placeid=ChIJKxSwWSZgAUgR0tWM0zAkZBc is a parameter (passed to the function bound to this route path), identifying which place we want to read the data for.
  • key=AIzaSyC53qhhXAmPOsxc34WManoorp7SVN_Qezo is a parameter indicating the Google API key, verifying that it’s a registered application making the request (Google will also cap, or bill for, the number of requests made using this key).

There’s a convention as to which HTTP method should be used for which types of operation:

  • GET: Fetches data
  • POST: Adds new data
  • PUT: Updates data
  • DELETE: Removes data

Mongopop’s REST API breaks this convention and uses POST for some read requests (as it’s simpler passing arguments than with GET).

These are the REST operations that will be implemented in Express for Mongopop:

Express routes implemented for the Mongopop REST API
Route Path HTTP Method Parameters Response Purpose

                      
/pop/
GET
{
"AppName": "MongoPop",
"Version": 1.0
}
        
Returns the version of the API.
/pop/ip
GET
{"ip": string}
Fetches the IP Address of the server running the Mongopop backend.
/pop/config
GET
{
mongodb: {
    defaultDatabase: string,
    defaultCollection: string,
    defaultUri: string
},
mockarooUrl: string
}
        
Fetches client-side defaults from the back-end config file.
/pop/addDocs
POST
{
MongoDBURI: string;
collectionName: string;
dataSource: string;
numberDocs: number;
unique: boolean;
}
        
{
success: boolean;
count: number;
error: string;
}
        
Add `numberDocs` batches of documents, using documents fetched from `dataSource`
/pop/sampleDocs
POST
{
MongoDBURI: string;
collectionName: string;
numberDocs: number;
}
        
{
success: boolean;   
documents: string;
error: string;
}
        
Read a sample of the documents from a collection.
/pop/countDocs
POST
{
MongoDBURI: string; 
collectionName: string;
}
        
{
success: boolean;   
count: number;
error: string;
}
        
Counts the number of documents in the collection.
/pop/updateDocs
POST
{
MongoDBURI: string;
collectionName: string;
matchPattern: Object;
dataChange: Object;
threads: number;
}
        
{
success: boolean;
count: number;
error: string;
}
        
Apply an update to all documents in a collection
which match a given pattern

Express

Express is the web application framework that runs your back-end application (JavaScript) code. Express runs as a module within the Node.js environment.

Express can handle the routing of requests to the right functions within your application (or to different apps running in the same environment).

You can run the app’s full business logic within Express and even use an optional view engine to generate the final HTML to be rendered by the user’s browser. At the other extreme, Express can be used to simply provide a REST API – giving the front-end app access to the resources it needs e.g., the database.

The Mongopop application uses Express to perform two functions:

  • Send the front-end application code to the remote client when the user browses to our app
  • Provide a REST API that the front-end can access using HTTP network calls, in order to access the database

Downloading, running, and using the application

The application’s Express code is included as part of the Mongopop package installed in Part 2: Using MongoDB With Node.js.

What are all of these files?

A reminder of the files described in Part 2:

  • package.json: Instructs the Node.js package manager (npm) on what it needs to do; including which dependency packages should be installed
  • node_modues: Directory where npm will install packages
  • node_modues/mongodb: The MongoDB driver for Node.js
  • node_modues/mongodb-core: Low-level MongoDB driver library; available for framework developers (application developers should avoid using it directly)
  • javascripts/db.js: A JavaScript module we’ve created for use by our Node.js apps (in this series, it will be Express) to access MongoDB; this module in turn uses the MongoDB Node.js driver.

Other files and directories that are relevant to our Express application:

  • config.js: Contains the application–specific configuration options
  • bin/www: The script that starts an Express application; this is invoked by the npm start script within the package.json file. Starts the HTTP server, pointing it to the app module in app.js
  • app.js: Defines the main application module (app). Configures:
    • That the application will be run by Express
    • Which routes there will be & where they are located in the file system (routes directory)
    • What view engine to use (Jade in this case)
    • Where to find the /views/ to be used by the view engine (views directory)
    • What middleware to use (e.g. to parse the JSON received in requests)
    • Where the static files (which can be read by the remote client) are located (public directory)
    • Error handler for queries sent to an undefined route
  • views: Directory containing the templates that will be used by the Jade view engine to create the HTML for any pages generated by the Express application (for this application, this is just the error page that’s used in cases such as mistyped routes (“404 Page not found”))
  • routes: Directory containing one JavaScript file for each Express route
    • routes/pop.js: Contains the Express application for the /pop route; this is the implementation of the Mongopop REST API. This defines methods for all of the supported route paths.
  • public: Contains all of the static files that must be accessible by a remote client (e.g., our Angular to React apps). This is not used for the REST API and so can be ignored until Parts 4 and 5.

The rest of the files and directories can be ignored for now – they will be covered in later posts in this series.

Architecture

REST AIP implemented in Express.js

The new REST API (implemented in routes/pop.js) uses the javascripts/db.js database layer implemented in Part 2 to access the MongoDB database via the MongoDB Node.js Driver. As we don’t yet have either the Angular or React clients, we will user the curl command-line tool to manually test the REST API.

Code highlights

config.js

The config module can be imported by other parts of the application so that your preferences can be taken into account.

expressPort is used by bin/www to decide what port the web server should listen on; change this if that port is already in use.

client contains defaults to be used by the client (Angular or React). It’s important to create your own schema at Mockaroo.com and replace client.mockarooUrl with your custom URL (the one included here will fail if used too often).

bin/www

This is mostly boiler-plate code to start Express with your application. This code ensures that it is our application, app.js, that is run by the Express server:

This code uses the expressPort from config.js as the port for the server to listen on; it will be overruled if the user sets the PORT environment variable:

app.js

This file defines the app module ; much of the contents are boilerplate (and covered by comments in the code) but we look here at a few of the lines that are particular to this application.

Make this an Express application:

Define where the views (templates used by the Jade view engine to generate the HTML code) and static files (files that must be accessible by a remote client) are located:

Create the /pop route and associate it with the file containing its code (routes/pop.js):

routes/pop.js

This file implements each of the operations provided by the Mongopop REST API. Because of the the /pop route defined in app.js Express will direct any URL of the form http://<mongopop-server>:3000/pop/X here. Within this file a route handler is created in order direct incoming requests to http://<mongopop-server>:3000/pop/X to the appropriate function:

As the /pop route is only intended for the REST API, end users shouldn’t be browsing here but we create a top level handler for the GET method in case they do:

Results of browsing to the top-route for the Mongopop MongoDB application

This is the first time that we see how to send a response to a request; res.json(testObject); converts testObject into a JSON document and sends it back to the requesting client as part of the response message.

The simplest useful route path is for the GET method on /pop/ip which sends a response containing the IP address of the back-end server. This is useful to the Mongopop client as it means the user can see it and add it to the MongoDB Atlas whitelist. The code to determine and store publicIP is left out here but can be found in the full source file for pop.js.

Fetching the IP address for the MongoDB Mongopop back-end using REST API

We’ve seen that it’s possible to test GET methods from a browser’s address bar; that isn’t possible for POST methods and so it’s useful to be able to test using the curl command-line command:

The GET method for /pop/config is just as simple – responding with the client-specific configuration data:

The results of the request are still very simple but the output from curl is already starting to get messy; piping it through python -mjson.tool makes it easier to read:

The simplest operation that actually accesses the database is the POST method for the /pop/countDocs route path:

database is an instance of the object prototype defined in javascripts/db (see The Modern Application Stack – Part 2: Using MongoDB With Node.js) and so all this method needs to do is use that object to:

  • Connect to the database (using the address of the MongoDB server provided in the request body). The results from the promise returned by database.connect is passed to the function(s) in the first .then clause. Refer back to Part 2: Using MongoDB With Node.js if you need a recap on using promises.
  • The function in the .then clause handles the case where the database.connect promise is resolved (success). This function requests a count of the documents – the database connection information is now stored within the database object and so only the collection name needs to be passed. The promise returned by database.countDocuments is passed to the next .then clause. Note that there is no second (error) function provided, and so if the promise from database.connect is rejected, then that failure passes through to the next .then clause in the chain.
  • The second .then clause has two functions:
    • The first is invoked if and when the promise is resolved (success) and it returns a success response (which is automatically converted into a resolved promise that it passed to the final .then clause in the chain). count is the value returned when the promise from the call to database.countDocuments was resolved.
    • The second function handles the failure case (could be from either database.connect or database.countDocuments) by returning an error response object (which is converted to a resolved promise).
  • The final .then clause closes the database connection and then sends the HTTP response back to the client; the response is built by converting the resultObject (which could represent success or failure) to a JSON string.

Once more, curl can be used from the command-line to test this operation; as this is a POST request, the --data option is used to pass the JSON document to be included in the request:

curl can also be used to test the error paths. Cause the database connection to fail by using the wrong port number in the MongoDB URI:

Cause the count to fail by using the name of a non-existent collection:

The POST method for the pop/sampleDocs route path works in a very similar way:

Testing this new operation:

The POST method for pop/updateDocs is a little more complex as the caller can request multiple update operations be performed. The simplest way to process multiple asynchronous, promise-returning function calls in parallel is to build an array of the tasks and pass it to the Promise.all method which returns a promise that either resolves after all of the tasks have succeeded or is rejected if any of the tasks fail:

Testing with curl:

The final method uses example data from a service such as Mockaroo to populate a MongoDB collection. A helper function is created that makes the call to that external service:

That function is then used in the POST method for /pop/addDocs:

This method is longer than the previous ones – mostly because there are two paths:

  • In the first path, the client has requested that a fresh set of 1,000 example documents be used for each pass at adding a batch of documents. This path is much slower and will eat through your Mockaroo quota much faster.
  • In the second path, just one batch of 1,000 example documents is fetched from Mockaroo and then those same documents are repeatedly added. This path is faster but it results in duplicate documents (apart from a MongoDB-created _id field). This path cannot be used if the _id is part of the example documents generated by Mockaroo.

So far, we’ve used the Chrome browser and the curl command-line tool to test the REST API. A third approach is to use the Postman Chrome app:

Testing MongoDB Mongopop REST API with Postman Chrome app

Debugging Tips

One way to debug a Node.js application is to liberally sprinkle console.log messages throughout your code but that takes extra effort and bloats your code base. Every time you want to understand something new, you must add extra logging to your code and then restart your application.

Developers working with browser-side JavaScript benefit from the excellent tools built into modern browsers – for example, Google’s Chrome Developer Tools which let you:

  • Browse code (e.g. HTML and JavaScript)
  • Add breakpoints
  • View & alter contents of variables
  • View and modify css styles
  • View network messages
  • Access the console (view output and issue commands)
  • Check security details
  • Audit memory use, CPU, etc.

You open the Chrome DevTools within the Chrome browser using “View/Developer/Developer Tools”.

Fortunately, you can use the node-debug command of node-inspector to get a very similar experience for Node.js back-end applications. To install node-inspector:

node-inspector can be used to debug the Mongopop Express application by starting it with node-debug via the express-debug script in package.json:

To run the Mongopop REST API with node-debug, kill the Express app if it’s already running and then execute:

Note that this automatically adds a breakpoint at the start of the app and so you will need to skip over that to run the application.

Using Chrome Developer Tools with MongoDB Express Node.js application

Depending on your version of Node.js, you may see this error:

If you do, apply this patch to /usr/local/lib/node_modules/node-inspector/lib/InjectorClient.js.

Summary & what’s next in the series

Part 1: Introducing The MEAN Stack provided an overview of the technologies that are used by modern application developers – in particular, the MERN and MEAN stacks. Part 2: Using MongoDB With Node.js set up Node.js and the MongoDB Driver and then used them to build a new Node.js module to provide a simplified interface to the database.

This post built upon the first two of the series by stepping through how to implement a REST API using Express. We also looked at three different ways to test this API and how to debug Node.js applications. This REST API is required by both the Angular (Part 4) and React (Part 5) web app clients, as well as by the alternative UIs explored in Part 6.

The next part of this series implements the Angular client that makes use of the REST API – at the end of that post, you will understand the end-to-end steps required to implement an application using the MEAN stack.

Continue to follow this blog series to step through building the remaining stages of the MongoPop application:





The Modern Application Stack – Part 2: Using MongoDB With Node.js

Introduction

This is the second in a series of blog posts examining the technologies that are driving the development of modern web and mobile applications.

“Modern Application Stack – Part 1: Introducing The MEAN Stack” introduced the technologies making up the MEAN (MongoDB, Express, Angular, Node.js) and MERN (MongoDB, Express, React, Node.js) Stacks, why you might want to use them, and how to combine them to build your web application (or your native mobile or desktop app).

The remainder of the series is focussed on working through the end to end steps of building a real (albeit simple) application – MongoPop.

This post demonstrates how to use MongoDB from Node.js.

MongoDB (recap)

MongoDB provides the persistence for your application data.

MongoDB is an open-source, document database designed with both scalability and developer agility in mind. MongoDB bridges the gap between key-value stores, which are fast and scalable, and relational databases, which have rich functionality. Instead of storing data in rows and columns as one would with a relational database, MongoDB stores JSON documents in collections with dynamic schemas.

MongoDB’s document data model makes it easy for you to store and combine data of any structure, without giving up sophisticated validation rules, flexible data access, and rich indexing functionality. You can dynamically modify the schema without downtime – vital for rapidly evolving applications.

It can be scaled within and across geographically distributed data centers, providing high levels of availability and scalability. As your deployments grow, the database scales easily with no downtime, and without changing your application.

MongoDB Atlas is a database as a service for MongoDB, letting you focus on apps instead of ops. With MongoDB Atlas, you only pay for what you use with a convenient hourly billing model. With the click of a button, you can scale up and down when you need to, with no downtime, full security, and high performance.

Our application will access MongoDB via the JavaScript/Node.js driver which we install as a Node.js module.

Node.js (recap)

Node.js is a JavaScript runtime environment that runs your back-end application (via Express).

Node.js is based on Google’s V8 JavaScript engine which is used in the Chrome browser. It also includes a number of modules that provides features essential for implementing web applications – including networking protocols such as HTTP. Third party modules, including the MongoDB driver, can be installed, using the npm tool.

Node.js is an asynchronous, event-driven engine where the application makes a request and then continues working on other useful tasks rather than stalling while it waits for a response. On completion of the requested task, the application is informed of the results via a callback (or a promise or Observable. This enables large numbers of operations to be performed in parallel – essential when scaling applications. MongoDB was also designed to be used asynchronously and so it works well with Node.js applications.

The application – Mongopop

MongoPop is a web application that can be used to help you test out and exercise MongoDB. After supplying it with the database connection information (e.g., as displayed in the MongoDB Atlas GUI), MongoPop provides these features:

  • Accept your username and password and create the full MongoDB connect string – using it to connect to your database
  • Populate your chosen MongoDB collection with bulk data (created with the help of the Mockeroo service)
  • Count the number of documents
  • Read sample documents
  • Apply bulk changes to selected documents

Mongopop Demo

Downloading, running, and using the Mongopop application

Rather than installing and running MongoDB ourselves, it’s simpler to spin one up in MongoDB Atlas:

Create MongoDB Atlas Cluster

To get the application code, either download and extract the zip file or use git to clone the Mongopop repo:

git clone git@github.com:am-MongoDB/MongoDB-Mongopop.git
cd MongoDB-Mongopop

If you don’t have Node.js installed then that needs to be done before building the application; it can be downloaded from nodejs.org .

A file called package.json is used to control npm (the package manager for Node.js); here is the final version for the application:

The scripts section defines a set of shortcuts that can be executed using npm run <script-name>. For example npm run debug runs the Typescript transpiler (tsc) and then the Express framework in debug mode. start is a special case and can be executed with npm start.

Before running any of the software, the Node.js dependencies must be installed (into the node_modules directory):

npm install

Note the list of dependencies in package.json – these are the Node.js packages that will be installed by npm install. After those modules have been installed, npm will invoke the postinstall script (that will be covered in Part 4 of this series). If you later realise that an extra package is needed then you can install it and add it to the dependency list with a single command. For example, if the MongoDB Node.js driver hadn’t already been included then it could be added with npm install --save mongodb – this would install the package as well as saving the dependency in package.json.

The application can then be run:

npm start

Once running, browse to http://localhost:3000/ to try out the application. When browsing to that location, you should be rewarded with the IP address of the server where Node.js is running (useful when running the client application remotely) – this IP address must be added to the IP Whitelist in the Security tab of the MongoDB Atlas GUI. Fill in the password for the MongoDB user you created in MongoDB Atlas and you’re ready to go. Note that you should get your own URL, for your own data set using the Mockaroo service – allowing you to customise the format and contents of the sample data (and avoid exceeding the Mockaroo quota limit for the example URL).

What are all of these files?

  • package.json: Instructs the Node.js package manager (npm) what it needs to do; including which dependency packages should be installed
  • node_modues: Directory where npm will install packages
  • node_modues/mongodb: The MongoDB driver for Node.js
  • node_modues/mongodb-core: Low-level MongoDB driver library; available for framework developers (application developers should avoid using it directly)
  • javascripts/db.js: A JavaScript module we’ve created for use by our Node.js apps (in this series, it will be Express) to access MongoDB; this module in turn uses the MongoDB Node.js driver.

The rest of the files and directories can be ignored for now – they will be covered in later posts in this series.

Architecture

Using the JavaScript MongoDB Node.js Driver

The MongoDB Node.js Driver provides a JavaScript API which implements the network protocol required to read and write from a local or remote MongoDB database. If using a replica set (and you should for production) then the driver also decides which MongoDB instance to send each request to. If using a sharded MongoDB cluster then the driver connects to the mongos query router, which in turn picks the correct shard(s) to direct each request to.

We implement a shallow wrapper for the driver (javascripts/db.js) which simplifies the database interface that the application code (coming in the next post) is exposed to.

Code highlights

javascripts/db.js defines an /object prototype/ (think class from other languages) named DB to provide access to MongoDB.

Its only dependency is the MongoDB Node.js driver:

var MongoClient = require('mongodb').MongoClient;

The prototype has a single property – db which stores the database connection; it’s initialised to null in the constructor:

function DB() {
    this.db = null;         // The MongoDB database connection
}

The MongoDB driver is asynchronous (the function returns without waiting for the requested operation to complete); there are two different patterns for handling this:

  1. The application passes a callback function as a parameter; the driver will invoke this callback function when the operation has run to completion (either on success or failure)
  2. If the application does not pass a callback function then the driver function will return a promise

This application uses the promise-based approach. This is the general pattern when using promises:

The methods of the DB object prototype we create are also asynchronous and also return promises (rather than accepting callback functions). This is the general pattern for returning and then subsequently satisfying promises:

db.js represents a thin wrapper on top of the MongoDB driver library and so (with the background on promises under our belt) the code should be intuitive. The basic interaction model from the application should be:

  1. Connect to the database
  2. Perform all of the required database actions for the current request
  3. Disconnect from the database

Here is the method from db.js to open the database connection:

One of the simplest methods that can be called to use this new connection is to count the number of documents in a collection:

Note that the collection method on the database connection doesn’t support promises and so a callback function is provided instead.

And after counting the documents; the application should close the connection with this method:

Note that then also returns a promise (which is, in turn, resolved or rejected). The returned promise could be created in one of 4 ways:

  1. The function explicitly creates and returns a new promise (which will eventually be resolved or rejected).
  2. The function returns another function call which, in turn, returns a promise (which will eventually be resolved or rejected).
  3. The function returns a value – which is automatically turned into a resolved promise.
  4. The function throws an error – which is automatically turned into a rejected promise.

In this way, promises can be chained to perform a sequence of events (where each step waits on the resolution of the promise from the previous one). Using those 3 methods from db.js, it’s now possible to implement a very simple application function:

That function isn’t part of the final application – the actual code will be covered in the next post – but jump ahead and look at routes/pop.js if your curious).

It’s worth looking at the sampleCollection prototype method as it uses a database /cursor/ . This method fetches a “random” selection of documents – useful when you want to understand the typical format of the collection’s documents:

Note that [collection.aggregate](http://mongodb.github.io/ node-mongodb-native/2.0/api/Collection.html#aggregate “MongoDB aggregation from JavaScript Node.js driver”) doesn’t actually access the database – that’s why it’s a synchronous call (no need for a promise or a callback) – instead, it returns a cursor. The cursor is then used to read the data from MongoDB by invoking its toArray method. As toArray reads from the database, it can take some time and so it is an asynchronous call, and a callback function must be provided (toArray doesn’t support promises).

The rest of these database methods can be viewed in db.js but they follow a similar pattern. The Node.js MongoDB Driver API documentation explains each of the methods and their parameters.

Summary & what’s next in the series

This post built upon the first, introductory, post by stepping through how to install and use Node.js and the MongoDB Node.js driver. This is our first step in building a modern, reactive application using the MEAN and MERN stacks.

The blog went on to describe the implementation of a thin layer that’s been created to sit between the application code and the MongoDB driver. The layer is there to provide a simpler, more limited API to make application development easier. In other applications, the layer could add extra value such as making semantic data checks.

The next part of this series adds the Express framework and uses it to implement a REST API to allow clients to send requests of the MongoDB database. That REST API will subsequently be used by the client application (using Angular in Part 4 or React in Part 5).

Continue following this blog series to step through building the remaining stages of the MongoPop application:





The Modern Application Stack – Part 1: Introducing The MEAN Stack

Introducing the MEAN and MERN stacks

This is the first in a series of blog posts examining the technologies that are driving the development of modern web and mobile applications, notably the MERN and MEAN stacks. The series will go on to step through tutorials to build all layers of an application.

Users increasingly demand a far richer experience from web sites – expecting the same level of performance and interactivity they get with native desktop and mobile apps. At the same time, there’s pressure on developers to deliver new applications faster and continually roll-out enhancements, while ensuring that the application is highly available and can be scaled appropriately when needed. Fortunately, there’s a (sometimes bewildering) set of enabling technologies that make all of this possible.

If there’s one thing that ties these technologies together, it’s JavaScript and its successors (ES6, TypeScript, JSX, etc.) together with the JSON data format. The days when the role of JavaScript was limited to adding visual effects like flashing headers or pop-up windows are past. Developers now use JavaScript to implement the front-end experience as well as the application logic and even to access the database. There are two dominant JavaScript web app stacks – MEAN (MongoDB, Express, Angular, Node.js) and MERN (MongoDB, Express, React, Node.js) and so we’ll use those as paths to guide us through the ever expanding array of tools and frameworks.

This first post serves as a primer for many of these technologies. Subsequents posts in the series take a deep dive into specific topics – working through the end-to-end development of Mongopop – an application to populate a MongoDB database with realistic data and then perform other operations on that data.

The MEAN Stack

We’ll start with MEAN as it’s the more established stack but most of what’s covered here is applicable to MERN (swap Angular with React).

MEAN is a set of Open Source components that together, provide an end-to-end framework for building dynamic web applications; starting from top (code running in the browser) to the bottom (database). The stack is made up of:

  • Angular (formerly Angular.js, now also known as Angular 2): Front-end web app framework; runs your JavaScript code in the users browser, allowing your application UI to be dynamic
  • Express (sometimes referred to as Express.js): Back-end web application framework running on top of Node.js
  • Node.js : JavaScript runtime environment – lets you implement your application back-end in JavaScript
  • MongoDB : Document database – used by your back-end application to store its data as JSON (JavaScript Object Notation) documents

A common theme in the MEAN stack is JavaScript – every line of code you write can be in the same language. You even access the database using MongoDB’s native, Idiomatic JavaScript/Node.js driver. What do we mean by idiomatic? Using the driver feels natural to a JavaScript developer as all interaction is performed using familiar concepts such as JavaScript objects and asynchronous execution using either callback functions or promises (explained later). Here’s an example of inserting an array of 3 JavaScript objects:

myCollection.insertMany([
    {name: {first: "Andrew", last: "Morgan"},
    {name: {first: "Elvis"}, died: 1977},
    {name: {last: "Mainwaring", title: "Captain"}, born: 1885}
])
.then(
    function(results) {
        resolve(results.insertedCount);
    },
    function(err) {
        console.log("Failed to insert Docs: " + err.message);
        reject(err);
    }
)

Angular 2

Angular, originally created and maintained by Google, runs your JavaScript code within the user’s web browsers to implement a reactive user interface (UI). A reactive UI gives the user immediate feedback as they give their input (in contrast to static web forms where you enter all of your data, hit “Submit” and wait.

Reactive web application

Version 1 of Angular was called AngularJS but it was shortened to Angular in Angular 2 after it was completely rewritten in Typescript (a superset of JavaScript) – Typescript is now also the recommended language for Angular apps to use.

You implement your application front-end as a set of components – each of which consists of your JavaScript (TypeScript) code and an HTML template that includes hooks to execute and use the results from your TypeScript functions. Complex application front-ends can be crafted from many simple (optionally nested) components.

Angular application code can also be executed on the back-end server rather than in a browser, or as a native desktop or mobile application.

MEAN Stack architecture

Express

Express is the web application framework that runs your back-end application (JavaScript) code. Express runs as a module within the Node.js environment.

Express can handle the routing of requests to the right parts of your application (or to different apps running in the same environment).

You can run the app’s full business logic within Express and even generate the final HTML to be rendered by the user’s browser. At the other extreme, Express can be used to simply provide a REST API – giving the front-end app access to the resources it needs e.g., the database.

In this blog series, we will use Express to perform two functions:

  • Send the front-end application code to the remote browser when the user browses to our app
  • Provide a REST API that the front-end can access using HTTP network calls, in order to access the database

Node.js

Node.js is a JavaScript runtime environment that runs your back-end application (via Express).

Node.js is based on Google’s V8 JavaScript engine which is used in the Chrome browsers. It also includes a number of modules that provides features essential for implementing web applications – including networking protocols such as HTTP. Third party modules, including the MongoDB driver, can be installed, using the npm tool.

Node.js is an asynchronous, event-driven engine where the application makes a request and then continues working on other useful tasks rather than stalling while it waits for a response. On completion of the requested task, the application is informed of the results via a callback. This enables large numbers of operations to be performed in parallel which is essential when scaling applications. MongoDB was also designed to be used asynchronously and so it works well with Node.js applications.

MongoDB

MongoDB is an open-source, document database that provides persistence for your application data and is designed with both scalability and developer agility in mind. MongoDB bridges the gap between key-value stores, which are fast and scalable, and relational databases, which have rich functionality. Instead of storing data in rows and columns as one would with a relational database, MongoDB stores JSON documents in collections with dynamic schemas.

MongoDB’s document data model makes it easy for you to store and combine data of any structure, without giving up sophisticated validation rules, flexible data access, and rich indexing functionality. You can dynamically modify the schema without downtime – vital for rapidly evolving applications.

It can be scaled within and across geographically distributed data centers, providing high levels of availability and scalability. As your deployments grow, the database scales easily with no downtime, and without changing your application.

MongoDB Atlas is a database as a service for MongoDB, letting you focus on apps instead of ops. With MongoDB Atlas, you only pay for what you use with a convenient hourly billing model. With the click of a button, you can scale up and down when you need to, with no downtime, full security, and high performance.

Our application will access MongoDB via the JavaScript/Node.js driver which we install as a Node.js module.

What’s Done Where?

tl;dr – it’s flexible.

There is clear overlap between the features available in the technologies making up the MEAN stack and it’s important to decide “who does what”.

Perhaps the biggest decision is where the application’s “hard work” will be performed. Both Express and Angular include features to route to pages, run application code, etc. and either can be used to implement the business logic for sophisticated applications. The more traditional approach would be to do it in the back-end in Express. This has several advantages:

  • Likely to be closer to the database and other resources and so can minimise latency if lots of database calls are made
  • Sensitive data can be kept within this more secure environment
  • Application code is hidden from the user, protecting your intellectual property
  • Powerful servers can be used – increasing performance

However, there’s a growing trend to push more of the functionality to Angular running in the user’s browser. Reasons for this can include:

  • Use the processing power of your users’ machines; reducing the need for expensive resources to power your back-end. This provides a more scalable architecture, where every new user brings their own computing resources with them.
  • Better response times (assuming that there aren’t too many trips to the back-end to access the database or other resources)
  • Progressive Applications. Continue to provide (probably degraded) service when the client application cannot contact the back-end (e.g. when the user has no internet connection). Modern browsers allow the application to store data locally and then sync with the back-end when connectivity is restored.

Perhaps, a more surprising option for running part of the application logic is within the database. MongoDB has a sophisticated aggregation framework which can perform a lot of analytics – often more efficiently than in Express or Angular as all of the required data is local.

Another decision is where to validate any data that the user supplies. Ideally this would be as close to the user as possible – using Angular to check that a provided password meets security rules allows for instantaneous feedback to the user. That doesn’t mean that there isn’t value in validating data in the back-end as well, and using MongoDB’s document validation functionality can guard against buggy software writing erroneous data.

ReactJS – Rise of the MERN Stack

MERN Stack architecture with React

An alternative to Angular is React (sometimes referred to as ReactJS), a JavaScript library developed by Facebook to build interactive/reactive user interfaces. Like Angular, React breaks the front-end application down into components. Each component can hold its own state and a parent can pass its state down to its child components and those components can pass changes back to the parent through the use of callback functions.

React components are typically implemented using JSX – an extension of JavaScript that allows HTML syntax to be embedded within the code:

class HelloMessage extends React.Component {
  render() {
    return <div>Hello {this.props.name}</div>;
  }
}

React is most commonly executed within the browser but it can also be run on the back-end server within Node.js, or as a mobile app using React Native.

So should you use Angular 2 or React for your new web application? A quick google search will find you some fairly deep comparisons of the two technologies but in summary, Angular 2 is a little more powerful while React is easier for developers to get up to speed with and use. This blog series will build a near-identical web app using first the MEAN and then the MERN stack – hopefully these posts will help you find a favorite.

The following snapshot from Google Trends suggests that Angular has been much more common for a number of years but that React is gaining ground:

Comparing React/ReactJS popularity vs. Angular and Angular 2

Why are these stacks important?

Having a standard application stack makes it much easier and faster to bring in new developers and get them up to speed as there’s a good chance that they’ve used the technology elsewhere. For those new to these technologies, there exist some great resources to get you up and running.

From MongoDB upwards, these technologies share a common aim – look after the critical but repetitive stuff in order to free up developers to work where they can really add value: building your killer app in record time.

These are the technologies that are revolutionising the web, building web-based services that look, feel, and perform just as well as native desktop or mobile applications.

The separation of layers, and especially the REST APIs, has led to the breaking down of application silos. Rather than an application being an isolated entity, it can now interact with multiple services through public APIs:

  1. Register and log into the application using my Twitter account
  2. Identify where I want to have dinner using Google Maps and Foursquare
  3. Order an Uber to get me there
  4. Have Hue turn my lights off and Nest turn my heating down
  5. Check in on Facebook

Variety & Constant Evolution

Even when constraining yourself to the JavaScript ecosystem, the ever-expanding array of frameworks, libraries, tools, and languages is both impressive and intimidating at the same time. The great thing is that if you’re looking for some middleware to perform a particular role, then the chances are good that someone has already built it – the hardest part is often figuring out which of the 5 competing technologies is the best fit for you.

To further complicate matters, it’s rare for the introduction of one technology not to drag in others for you to get up to speed on: Node.js brings in npm; Angular 2 brings in Typescript, which brings in tsc; React brings in ES6, which brings in Babel; ….

And of course, none of these technologies are standing still and new versions can require a lot of up-skilling to use – Angular 2 even moved to a different programming language!

The Evolution of JavaScript

The JavaScript language itself hasn’t been immune to change.

Ecma International was formed to standardise the language specification for JavaScript (and similar language forks) to increase portability – the ideal being that any “JavaScript” code can run in any browser or other JavaScript runtime environment.

The most recent, widely supported version is ECMAScript 6 – normally referred to as ES6. ES6 is supported by recent versions of Chrome, Opera, Safari, and Node.js). Some platforms (e.g. Firefox and Microsoft Edge) do not yet support all features of ES6. These are some of the key features added in ES6:

  • Classes & modules
  • Promises – a more convenient way to handle completion or failure of synchronous function calls (compared to callbacks)
  • Arrow functions – a concise syntax for writing function expressions
  • Generators – functions that can yield to allow others to execute
  • Iterators
  • Typed arrays

Typescript is a superset of ES6 (JavaScript); adding static type checking. Angular 2 is written in Typescript and Typescript is the primary language to be used when writing code to run in Angular 2.

Because ES6 and Typescript are not supported in all environments, it is common to transpile the code into an earlier version of JavaScript to make it more portable. In this series’ Angular post, tsc is used to transpile Typescript into JavaScript while the React post uses Babel (via react-script) to transpile our ES6 code.

And of course, JavaScript is augmented by numerous libraries. The Angular 2 post in this series uses Observables from the RxJS reactive libraries which greatly simplify making asynchronous calls to the back-end (a pattern historically referred to as AJAX).

Summary & What’s Next in the Series

This post has introduced some of the technologies used to build modern, reactive, web applications – most notably the MEAN and MERN stacks. If you want to learn exactly how to use these then please continue to follow this blog series which steps through building the MongoPop application:

As already covered in this post, the MERN and MEAN stacks are evolving rapidly and new JavaScript frameworks are being added all of the time. Inevitably, some of the details in this series will become dated but the concepts covered will remain relevant.





What’s New in MongoDB 3.4 Webinar Replay

MongoDB 3.4 webinarLast week, Jason Ma and myself presented a webinar on what’s new in MongoDB 3.4 – the replay is now available here.

Overview

In the age of digital transformation and disruption, your ability to thrive depends on how you adapt to the constantly changing environment. MongoDB 3.4 is the latest release of the leading database for modern applications, a culmination of native database features and enhancements that will allow you to easily evolve your solutions to address emerging challenges and use cases.

In this webinar, we introduce you to what’s new, including:

  • Multimodel Done Right. Native graph computation, faceted navigation, rich real-time analytics, and powerful connectors for BI and Apache Spark bring additional multimodel database support right into MongoDB.
  • Mission-Critical Applications. Geo-distributed MongoDB zones, elastic clustering, tunable consistency, and enhanced security controls bring state-of-the-art database technology to your most mission-critical applications.
  • Modernized Tooling. Enhanced DBA and DevOps tooling for schema management, fine-grained monitoring, and cloud-native integration allow engineering teams to ship applications faster, with less overhead and higher quality.

Slides





Adding Document Validation Rules Using MongoDB Compass 1.5

Adding Document Validation Rules Using MongoDB Compass 1.5

This post looks at a new feature in MongoDB Compass 1.5 (in beta at the time of writing) which allows document validation rules to be added from the GUI rather from the mongo shell command line. This makes it easy to create and modify rules that ensure that all documents written to a collection contain the data you expect to be there.

Introduction

One of MongoDB’s primary attractions for developers is that it gives them the ability to start application development without first needing to define a formal schema. Operations teams appreciate the fact that they don’t need to perform a time-consuming schema upgrade operation every time the developers need to store a different attribute. For business leaders, the application gets launched much faster, and new features can be rolled out more frequently. MongoDB powers agility.

Many projects reach a point where it’s necessary to enforce rules on what’s being stored in the database – for example, that for any document in a particular collection, you can be certain that specific attributes are present in every document. Reasons for this include:

  • Different development teams can work with the same data, each needing to know what they can expect to find in a particular collection.
  • Development teams working on different applications can be spread over multiple sites, which means that a clear agreement on the format of shared data is important.
  • Development teams from different companies may be working with the same collections; misunderstandings about what data should be present can lead to issues.

As an example, an e-commerce website may centralize product catalog feeds from multiple vendors into a single collection. If one of the vendors alters the format of its product catalog, global catalog searches could fail.

To date, this resulted in developers building their own validation logic – either within the application code (possibly multiple times for different applications) or by adding middleware such as Mongoose.

To address the challenges discussed above, while at the same time maintaining the benefits of a dynamic schema, MongoDB 3.2 introduced document validations. Adding and viewing validation rules required understanding the correct commands to run from the mongo shell’s command line.

MongoDB Compass 1.5 allows users to view, add, and modify document validation rules through its GUI, making them more accessible to both developers and DBAs.

Validating Documents in MongoDB

Document Validation provides significant flexibility to customize which parts of the documents are and are not validated for any collection. For any attribute it might be appropriate to check:

  • That the attribute exists
  • If an attribute does exist, that it is of the correct type
  • That the value is in a particular format (e.g., regular expressions can be used to check if the contents of the string matches a particular pattern)
  • That the value falls within a given range

Further, it may be necessary to combine these checks – for example that the document contains the user’s name and either their email address or phone number, and if the email address does exist, then it must be correctly formed.

Adding the validation checks from the command line is intuitive to developers or DBAs familiar with the MongoDB query language as it uses the same expression syntax as a find query to search the database. For others, it can be a little intimidating.

As an example, the following snippet adds validations to the contacts collection that validates:

  • The year of birth is no later than 1994
  • The document contains a phone number and/or an email address
  • When present, the phone number and email address are strings
db.runCommand({
   collMod: "contacts",
   validator: { 
      $and: [
        {yearOfBirth: {$lte: 1994}},
        {$or: [ 
                  {"contact.phone": { $type: "string"}}, 
                  {"email": { $type: "string"}}
              ]}]
    }})

Note that types can be specified using either a number or a string alias.

Wouldn’t it be nice to be able to define these rules through a GUI rather than from the command line?

Using MongoDB Compass to Add Document Validation Rules

If you don’t already have MongoDB Compass 1.5 (or later) installed, download it and start the application. You’ll then be asked to provide details on how to connect to your database.

MongoDB Compass is free for evaluation and for use in development, for production, a MongoDB Professional of MongoDB Enterprise Advanced subscription is required.

If you don’t have a database to test this on, the simplest option is to create a new MongoDB Atlas cluster. Details on launching a MongoDB Atlas cluster can be found in this post.

Note that MongoDB Compass currently only accepts a single server address rather than the list of replica set members in the standard Atlas connect string and so it’s necessary to explicitly provide Compass with the address of the current primary – find that by clicking on the cluster in the Atlas GUI (Figure 1).

Identify the replica set primary

Figure 1: Identify the replica set primary

Connect MongoDB Compass to MongoDB Atlas

Figure 2: Connect MongoDB Compass to MongoDB Atlas

The connection panel can then be populated as shown in Figure 2.

Load Data and Check in MongoDB Compass

If you don’t already have a populated MongoDB collection, create one now. For example, use curl to download a pre-prepared JSON file containing contact data and use mongoimport to load it into your database:

curl -o contacts.json http://clusterdb.com/upload/contacts.json
mongoimport -h cluster0-shard-00-00-qfovx.mongodb.net -d clusterdb -c contacts --ssl -u billy -p SECRET --authenticationDatabase admin contacts.json

Connect MongoDB Compass to your database (Figure 3).

Connect MongoDB Compass to database

Figure 3: Connect MongoDB Compass to database

Select the contacts data and browse the schema (Figure 4).

Check schema in MongoDB Compass

Figure 4: Check schema in MongoDB Compass

Browse some documents (Figure 5).

Browse documents using MongoDB Compass

Figure 5: Browse documents using MongoDB Compass

Add Document Validation Rules

In this section, we build the document validation rule shown earlier.

Navigate to the Validation tab in MongoDB Compass GUI and select the desired validation action and validation level. The effects of these settings are shown in Figure 6. Any warnings generated by the rules are written to the MongoDB log.

MongoDB document validation configuration parameters

Figure 6: MongoDB document validation configuration parameters

When adding document validation rules to an existing collection, you may want to start with fairly permissive rules so that existing applications aren’t broken before you have chance to clean things up. Once you’re confident that all applications are following the rules you could then become stricter. Figure 7 shows a possible life cycle for a collection.

Life cycle of a MongoDB collection

Figure 7: Life cycle of a MongoDB collection

This post is starting with a new collection and so you can go straight to error/strict as shown in Figure 8.

Set document validation to error/strict

Figure 8: Set document validation to error/strict

Multiple rules for the document can then be added using the GUI (Figure 9). Note that the rule for the email address uses a regular expression (^([a-zA-Z0-9_\\-\\.]+)@((\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.)|(([a-zA-Z0-9\\-]+\\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\\]?)$) to test that the address is properly formatted – going further than the original rule.

Add new document validation rule through MongoDB Compass

Figure 9: Add new document validation rule through MongoDB Compass

Clicking UPDATE applies the change and then you can review it by pressing the JSON button (Figure 10).

JSON view of new document validation rule

Figure 10: JSON view of new document validation rule

At this point, a problem appears. Compass has combined the 3 sub-rules with an and relationship but our intent was to test that the document contained either an email address or a phone number and that yearOfBirth was no later than 1994. Fortunately, for these more complex checks, the JSON can be altered directly within Compass:

{
  "$and": [
    {"yearOfBirth": {"$lte": 1994}}, 
    {
      "$or": [
        {"contact.email": {
          "$regex": "^([a-zA-Z0-9_\\-\\.]+)@((\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.)|(([a-zA-Z0-9\\-]+\\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\\]?)$",
          "$options": ""
          }
        },
        {
          "$and": [
            {"contact.phone": {"$type": 2}},
            {"contact.email": {"$exists": false}}
          ]
        }
      ]
    }
  ]
}

Paste the refined rule into Compass and press UPDATE (Figure 11).

Manually edit validation rules in MongoDB Compass

Figure 11: Manually edit validation rules in MongoDB Compass

Recall that this rule checks that the yearOfBirth is no later than 1994 and that there is a phone number (formatted as a string)or a properly formatted email address.

Test The Rules

However you write to the database, the document validation rules are applied in the same way – through any of the drivers, or through the mongo shell. For this post we can test the rules directly from the MongoDB Compass GUI, from the DOCUMENTS tab. Select a document and try changing the yearOfBirth to a year later than 1994 as shown in Figure 12.

hange fails document validation

Figure 12: Change fails document validation

Find the Offending Documents Already in the Collection

There are a number of ways to track down existing documents that don’t meet the new rules. A very simple option is to query the database using the negation of the rule definition by wrapping the validation document in a $nor clause:

{"$nor": [
  {
    "$and": [
      {"yearOfBirth": {"$lte": 1994}}, 
      {
        "$or": [
          {"contact.email": {
            "$regex": "^([a-zA-Z0-9_\\-\\.]+)@((\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.)|(([a-zA-Z0-9\\-]+\\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\\]?)$",
            "$options": ""
            }
          },
          {"contact.phone": {"$type": 2}}
        ]
      }
    ]
  }]
}

The query document can be pasted into the query bar in Compass and then pressing APPLY reveals that there are 206 documents with yearOfBirth greater than 1994 – Figure 13.

Find documents not matching the validation rules

Figure 13: Find documents not matching the validation rules

Cleaning up Offending Documents

Potentially more problematic is how to clean up the existing documents which do not match the validation rules, as you need to decide what should happen to them. The good news is that the same $nor document used above can be used as a filter when executing your chosen action.

For example, if you decided that the offending documents should not be in the collection at all then this command can be run from the mongo shell command line to delete them:

use clusterdb
db.contacts.remove(
{"$nor": [
  {
    "$and": [
      {"yearOfBirth": {"$lte": 1994}}, 
      {
        "$or": [
          {"contact.email": {
            "$regex": "^([a-zA-Z0-9_\\-\\.]+)@((\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.)|(([a-zA-Z0-9\\-]+\\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\\]?)$",
            "$options": ""
            }
          },
          {"contact.phone": {"$type": 2}}
        ]
      }
    ]
  }]
})

Another Example – Coping with Multiple Schema Versions

A tricky problem to solve with RDBMSs is the versioning of data models; with MongoDB it’s very straight-forward to set up validations that can cope with different versions of documents, with each version having a different set of checks applied. In the example validation checks below, the following logic is applied:

  • If the document is unversioned (possibly dating to the time before validations were added), then no checks are applied
  • For version 1, the document is checked to make sure that the name key exists
  • For version 2 documents, the type of the name key is also validated to ensure that it is a string
{"$or": [
  {version: {"$exists": false}},
  {"$and": [
    {version: 1},
    {Name: {"$exists": true}}
  ]},
  {"$and": [
    {version: 2},
    {Name: {"$exists": true, "$type": "string"}}
  ]}
]} 

In this way, multiple versions of documents can exist within the same collection, and the application can lazily up-version them over time. Note that the version attribute is user-defined.

Where MongoDB Document Validation Excels (vs. RDBMSs)

In MongoDB, document validation is simple to set up – especially now that it can be done through the MongoDB Compass GUI. You can avoid the maintenance headache of stored procedures – which for many types of validation would be required in an RDBMS – and because the familiar MongoDB query language is used, there is no new syntax to learn.

The functionality is very flexible and it can enforce constraints on as little or as much of the schema as required. You get the best of both worlds – a dynamic schema for rapidly changing, polymorphic data, with the option to enforce strict validation checks against specific attributes from the onset of your project, or much later on. You can also use the Compass GUI to find and modify individual, pre-existing documents that don’t follow any new rules. If you initially have no validations defined, they can still be added later – even once in production, across thousand of servers.

It is always a concern whether adding extra checks will impact the performance of the system; in our tests, document validation adds a negligible overhead.

So, is all Data Validation Now Done in the Database?

The answer is “probably not” – either because there’s a limit to what can be done in the database or because there will always be a more appropriate place for some checks. Here are some areas to consider:

  • For a good user experience, checks should be made as early and as high up the stack as is sensible. For example, the format of an entered email address should be first checked in the browser rather than waiting for the request to be processed and an attempt made to write it to the database.
  • Any validations which need to compare values between keys, other documents, or external information cannot currently be implemented within the database.
  • Many checks are best made within the application’s business logic – for example “is this user allowed to use these services in their home country”; the checks in the database are primarily there to protect against coding errors.
  • If you need information on why the document failed validation, the developer or application will need to check the document against each of the sub-rules within the collection’s validation rule in turn as the error message doesn’t currently give this level of detail.

Summary

In this post, we’ve taken a look at the powerful document validation functionality that was added back in MongoDB 3.2. We then explored how MongoDB Compass 1.5 adds the convenience of being able to define these rules through an intuitive GUI.

This is just one of the recent enhancements to MongoDB Compass; others include:

A summary of the enhancements added in MongoDB Compass 1.4 & 1.5 can be found in MongoDB 3.4 – What’s New.





Powering Microservices with MongoDB, Docker, Kubernetes & Kafka

Andrew Morgan presenting on Microservices at MongoDB Europe 2016The slides and recording from my session at MongoDB Europe 2016, are now available. The presentation covers Microservices and some of the key technologies that enable them.

Session Summary

Organisations are building their applications around microservice architectures because of the flexibility, speed of delivery, and maintainability they deliver.

Want to try out MongoDB on your laptop? Execute a single command and you have a lightweight, self-contained sandbox; another command removes all trace when you’re done. Need an identical copy of your application stack in multiple environments? Build your own container image and then your entire development, test, operations, and support teams can launch an identical clone environment.

Containers are revolutionising the entire software lifecycle: from the earliest technical experiments and proofs of concept through development, test, deployment, and support. Orchestration tools manage how multiple containers are created, upgraded and made highly available. Orchestration also controls how containers are connected to build sophisticated applications from multiple, microservice containers.

This session introduces you to technologies such as Docker, Kubernetes & Kafka which are driving the microservices revolution. Learn about containers and orchestration – and most importantly how to exploit them for stateful services such as MongoDB.

Recording

Slides





Language-Specific Views in MongoDB 3.4

Language-Specific Views in MongoDB 3.4

Introduction

This post shows you how to create multiple language-specific views on top of a common collection. Each view is optimized for its language with a collated index which only presents entries for documents in that language. Additionally, each view excludes some fields from the underlying collection – further limiting the data that can be seen through the that view. Finally, user-defined roles are created to restrict users to just the view(s) they should be able to see, ensuring they can only access the data that they’re entitled to.

In the course of setting up this environment, a number of features are demonstrated:

  • Read-Only Views (New in MongoDB 3.4): DBAs can define non-materialized views that expose only a subset of data from an underlying collection, i.e. a view that filters out entire documents or specific fields, such as Personally Identifiable Information (PII) from sales data or health records. As a result, risks of data exposure are dramatically reduced. DBAs can define a view of a collection that’s generated from an aggregation over another collection(s) or view.
  • Multiple Language Collations (New in MongoDB 3.4): Applications addressing global audiences require handling content that spans many languages. Each language has different rules governing the comparison and sorting of data. MongoDB collations allow users to build applications that adhere to these language-specific comparison rules for over 100 different languages and locales. Developers can specify collations for collections, indexes, views, or for individual operations.
  • Partial Indexes: Partial indexes balance delivering good query performance while consuming fewer system resources. For example, consider an order processing application. The order collection is frequently queried by the application to display all incomplete orders for a particular user. Building an index on the userID field of the collection is necessary for good performance. However, only a small percentage of orders are in progress at a given time. Limiting the index on userID to contain only orders that are in the “active” state could reduce the number of index entries from millions to thousands, saving working set memory and disk space, while speeding up queries even further as smaller indexes result in faster searches.
  • User Defined Roles: User-defined roles, enable administrators to assign finely-grained privileges to users and applications, based on the specific functionality they require. MongoDB provides the ability to specify user privileges at both the database, collection, and view levels.
  • MongoDB Compass: MongoDB Compass is the easiest way for DBAs to explore and manage MongoDB data. As the GUI for MongoDB, Compass enables users to visually explore their data, and run ad-hoc queries in seconds – all with zero knowledge of MongoDB’s query language. The latest Compass release expands functionality to allow users to manipulate documents directly from the GUI, optimize performance, and create data governance controls.

The Data Set

The example used in this post is built on a collection containing documents for customers from multiple countries – one of the fields indicates a customer’s country, but there is no field that identifies their spoken language. To fix that, we infer their language from their country to create a new language field in each document:

db.customers.updateMany(
    {country: "China"},
    {$set: {language: "Chinese"}})

db.customers.updateMany(
    {country: "Germany"},
    {$set: {language: "German"}})

db.customers.updateMany(
    {country: {$in: ["USA", "Canada", "United Kingdom"]}},
    {$set: {language: "English"}})

A typical document now looks like this:

db.customers.findOne()
{
    "_id" : ObjectId("57fb8fbb99b01440193088eb"),
    "first_name" : "Ben",
    "last_name" : "Dixon",
    "country" : "Germany",
    "avatar" : "https://robohash.org/quiseumquam.bmp?size=50x50&set=set1",
    "ip_address" : "10.102.15.35",
    "dependents" : [
        {
            "name" : "Ben",
            "birthday" : "12-Apr-1994"
        },
        {
            "name" : "Lucas",
            "birthday" : "22-Jun-2016"
        },
        {
            "name" : "Erik",
            "birthday" : "05-Jul-2005"
        }
    ],
    "birthday" : "02-Jul-1964",
    "salary" : "£910070.80",
    "skills" : [
        {
            "skill" : "Cvent"
        },
        {
            "skill" : "TKI"
        }
    ],
    "gender" : "Male",
    "language" : "German"
}

You might ask why we need to add this extra field rather than simply calculating the language each time it’s needed? The answer is that multiple countries share the same language and partial indexes don’t allow us to use the $or or $in operators.

At this stage, the only index on the collection is on the _id field:

db.customers.getIndexes()
[
    {
        "v" : 2,
        "key" : {
            "_id" : 1
        },
        "name" : "_id_",
        "ns" : "production.customers"
    }
]

If you want to work through this example for yourself then the following steps will populate a collection called “customers” in a database called production:

curl -o customers.tgz http://clusterdb.com/upload/customers.tgz
tar fxz customers.tgz
mongorestore

There should be 111,000 documents in the collection after running mongorestore:

use production
db.customers.findOne()
{
    "_id" : ObjectId("57fb8fbb99b01440193088eb"),
    "first_name" : "Ben",
    "last_name" : "Dixon",
    "country" : "Germany",
    "avatar" : "https://robohash.org/quiseumquam.bmp?size=50x50&set=set1",
    "ip_address" : "10.102.15.35",
    "dependents" : [
        {
            "name" : "Ben",
            "birthday" : "12-Apr-1994"
        },
        {
            "name" : "Lucas",
            "birthday" : "22-Jun-2016"
        },
        {
            "name" : "Erik",
            "birthday" : "05-Jul-2005"
        }
    ],
    "birthday" : "02-Jul-1964",
    "salary" : "£910070.80",
    "skills" : [
        {
            "skill" : "Cvent"
        },
        {
            "skill" : "TKI"
        }
    ],
    "gender" : "Male",
    "language" : "German"
}

db.customers.count()
111000

Adding Indexes

Collations – allow values to be compared and sorted using rules specific to a local language. In this example, we are supporting 3 languages: English, German, and Chinese. For each of these languages, a collated index will later be used to correctly sort the customers based on their last and first name.

To this end, collation-specific, compound (last_name + first_name) indexes are created:

db.customers.createIndex( 
    {last_name: 1, first_name : 1 }, 
    {name: "chinese_name_index",
     collation: {locale: "zh" },
     partialFilterExpression: { language: "Chinese" } 
    }
);

db.customers.createIndex( 
    {last_name: 1, first_name : 1 }, 
    {name: "english_name_index",
     collation: {locale: "en" },
     partialFilterExpression: { language: "English" } 
    }
);

db.customers.createIndex( 
    {last_name: 1, first_name : 1 }, 
    {name: "german_name_index",
     collation: {locale: "de" },
     partialFilterExpression: { language: "German" } 
    }
);

The exact behaviour of comparisons and sorting using the collated index can be further refined by including additional parameters alongside the locale in the collation documentation. Details of these optional parameters can be found in the collation documentation.

Note that each of those indexes is partial, only containing entries for document where language is set to the matching value. This saves memory and disk space, and speeds up both reads and writes.

This is the final set of indexes:

db.customers.getIndexes()
[
    {
        "v" : 2,
        "key" : {
            "_id" : 1
        },
        "name" : "_id_",
        "ns" : "production.customers"
    },
    {
        "v" : 2,
        "key" : {
            "last_name" : 1,
            "first_name" : 1
        },
        "name" : "german_name_index",
        "ns" : "production.customers",
        "partialFilterExpression" : {
            "language" : "German"
        },
        "collation" : {
            "locale" : "de",
            "caseLevel" : false,
            "caseFirst" : "off",
            "strength" : 3,
            "numericOrdering" : false,
            "alternate" : "non-ignorable",
            "maxVariable" : "punct",
            "normalization" : false,
            "backwards" : false,
            "version" : "57.1"
        }
    },
    {
        "v" : 2,
        "key" : {
            "last_name" : 1,
            "first_name" : 1
        },
        "name" : "chinese_name_index",
        "ns" : "production.customers",
        "partialFilterExpression" : {
            "language" : "Chinese"
        },
        "collation" : {
            "locale" : "zh",
            "caseLevel" : false,
            "caseFirst" : "off",
            "strength" : 3,
            "numericOrdering" : false,
            "alternate" : "non-ignorable",
            "maxVariable" : "punct",
            "normalization" : false,
            "backwards" : false,
            "version" : "57.1"
        }
    },
    {
        "v" : 2,
        "key" : {
            "last_name" : 1,
            "first_name" : 1
        },
        "name" : "english_name_index",
        "ns" : "production.customers",
        "partialFilterExpression" : {
            "language" : "English"
        },
        "collation" : {
            "locale" : "en",
            "caseLevel" : false,
            "caseFirst" : "off",
            "strength" : 3,
            "numericOrdering" : false,
            "alternate" : "non-ignorable",
            "maxVariable" : "punct",
            "normalization" : false,
            "backwards" : false,
            "version" : "57.1"
        }
    }
]

Create Views

A view is created for each language to:

  • Filter out any documents where the language field doesn’t match that of the view
  • Remove the salary, country, and language fields
  • Indicate which collation should be used
db.createView(
    "chineseSpeakersRedacted",
    "customers",
    [
        {$match: {
            language: "Chinese",
            last_name: {$exists: true}
        }},
        {$project: {
            salary: 0, 
            country: 0,
            language: 0
            }
        }
    ],
    {collation: {locale: "zh"}}
)

db.createView(
    "englishSpeakersRedacted",
    "customers",
    [
        {$match: {
            language: "English",
            last_name: {$exists: true}
        }},
        {$project: {
            salary: 0, 
            country: 0,
            language: 0
            }
        }
    ],
    {collation: {locale: "en"}}
)

db.createView(
    "germanSpeakersRedacted",
    "customers",
    [
        {$match: {
            language: "German",
            last_name: {$exists: true}
        }},
        {$project: {
            salary: 0, 
            country: 0,
            language: 0
            }
        }
    ],
    {collation: {locale: "de"}}
)

You might ask why last_name: {$exists: true} is included in the $match stage? The reason is that it encourages the optimizer to use our language-specific partial indexes when using these views.

Note that this is using the MongoDB Aggregation Framework and so you could add lots of other operations, including: unwinding arrays, looking up values from other collections, grouping data, and adding new, computed fields.

The views now appear like collections and can be queried in the same manner (note that they are ready-only):

show collections

chineseSpeakersRedacted
customers
englishSpeakersRedacted
germanSpeakersRedacted
system.views

db.germanSpeakersRedacted.find({last_name: "Cole"}, {first_name:1, _id:0, gender:1}).sort({first_name: 1})
{ "first_name" : "Amelie", "gender" : "Female" }
{ "first_name" : "Amelie", "gender" : "Female" }
{ "first_name" : "Amelie", "gender" : "Female" }
{ "first_name" : "Amelie", "gender" : "Female" }
{ "first_name" : "Amelie", "gender" : "Female" }
{ "first_name" : "Amelie", "gender" : "Female" }
{ "first_name" : "Amelie", "gender" : "Female" }
{ "first_name" : "Amelie", "gender" : "Female" }
{ "first_name" : "Amelie", "gender" : "Female" }
{ "first_name" : "Anna", "gender" : "Female" }
{ "first_name" : "Anna", "gender" : "Female" }
{ "first_name" : "Anna", "gender" : "Female" }
{ "first_name" : "Anna", "gender" : "Female" }
{ "first_name" : "Anna", "gender" : "Female" }
{ "first_name" : "Anna", "gender" : "Female" }
{ "first_name" : "Anna", "gender" : "Female" }
{ "first_name" : "Anna", "gender" : "Female" }
{ "first_name" : "Anton", "gender" : "Male" }
{ "first_name" : "Anton", "gender" : "Male" }
{ "first_name" : "Anton", "gender" : "Male" }

The query above searches for all documents where the last_name is Cole (because this query is using the German view, behind the scenes, all non-German documents have already been filtered out), discards all but the first_name and gender fields, and then sorts by the first_name (using the German collation).

explain() confirms that the German collation index was used:

db.germanSpeakersRedacted.find({last_name: "Cole"}, {first_name:1, _id:0, gender:1}).sort({first_name: 1}).explain()
{
    "stages" : [
        {
            "$cursor" : {
                "query" : {
                    "$and" : [
                        {
                            "language" : "German",
                            "last_name" : {
                                "$exists" : true
                            }
                        },
                        {
                            "last_name" : "Cole"
                        }
                    ]
                },
                "fields" : {
                    "first_name" : 1,
                    "gender" : 1,
                    "_id" : 0
                },
                "queryPlanner" : {
                    "plannerVersion" : 1,
                    "namespace" : "production.customers",
                    "indexFilterSet" : false,
                    "parsedQuery" : {
                        "$and" : [
                            {
                                "language" : {
                                    "$eq" : "German"
                                }
                            },
                            {
                                "last_name" : {
                                    "$eq" : "Cole"
                                }
                            },
                            {
                                "last_name" : {
                                    "$exists" : true
                                }
                            }
                        ]
                    },
                    "collation" : {
                        "locale" : "de",
                        "caseLevel" : false,
                        "caseFirst" : "off",
                        "strength" : 3,
                        "numericOrdering" : false,
                        "alternate" : "non-ignorable",
                        "maxVariable" : "punct",
                        "normalization" : false,
                        "backwards" : false,
                        "version" : "57.1"
                    },
                    "winningPlan" : {
                        "stage" : "FETCH",
                        "filter" : {
                            "$and" : [
                                {
                                    "last_name" : {
                                        "$exists" : true
                                    }
                                },
                                {
                                    "language" : {
                                        "$eq" : "German"
                                    }
                                }
                            ]
                        },
                        "inputStage" : {
                            "stage" : "IXSCAN",
                            "keyPattern" : {
                                "last_name" : 1,
                                "first_name" : 1
                            },
                            "indexName" : "german_name_index",
                            "collation" : {
                                "locale" : "de",
                                "caseLevel" : false,
                                "caseFirst" : "off",
                                "strength" : 3,
                                "numericOrdering" : false,
                                "alternate" : "non-ignorable",
                                "maxVariable" : "punct",
                                "normalization" : false,
                                "backwards" : false,
                                "version" : "57.1"
                            },
                            "isMultiKey" : false,
                            "multiKeyPaths" : {
                                "last_name" : [ ],
                                "first_name" : [ ]
                            },
                            "isUnique" : false,
                            "isSparse" : false,
                            "isPartial" : true,
                            "indexVersion" : 2,
                            "direction" : "forward",
                            "indexBounds" : {
                                "last_name" : [
                                    "[\"-E?1\u0001\b\u0001\u0007\", \"-E?1\u0001\b\u0001\u0007\"]"
                                ],
                                "first_name" : [
                                    "[MinKey, MaxKey]"
                                ]
                            }
                        }
                    },
                    "rejectedPlans" : [ ]
                }
            }
        },
        {
            "$project" : {
                "language" : false,
                "country" : false,
                "salary" : false
            }
        },
        {
            "$sort" : {
                "sortKey" : {
                    "first_name" : 1
                }
            }
        },
        {
            "$project" : {
                "_id" : false,
                "gender" : true,
                "first_name" : true
            }
        }
    ],
    "ok" : 1

User-Defined Roles – Limiting Access to the Views

One of the reasons for creating the views was to protect some of the data (the customers’ salaries) as not all users should see this information. At this point, all users can still access the base “customers” collection and so we’ve fallen short of that objective. User-defined roles to the rescue!

We create an admin user that has the built in root role and so can access any database, create new users, and perform any other activity:

use admin
db.createUser({
    user: "admin",
    pwd: "secret",
    roles: [
        {role:"root",db:"admin"}
        ]
    })

The next step is to create a role that only gives its members read access to the germanSpeakersRedacted collection (within the production database):

use admin
db.createRole(
   {
     role: "germanViewer",
     privileges: [
       { resource: { db: "production", collection: "germanSpeakersRedacted" },  actions: [ "find" ] }
     ],
     roles: []
   }
)

You can then create one or more users that have germanViewer within their defined roles:

use admin
db.createUser({
    user: "germanIT",
    pwd: "secret",
    roles: [
        {role:"germanViewer",db:"admin"}
        ]
    })

Additional privileges can be added to existing roles using grantPrivilegesToRole and extra roles can be assigned to existing users using grantRolesToUser.

For these access controls to work, users must be created with appropriate permissions and the MongoDB server process must be started with the --auth option:

mongod --auth

When connecting to the database as our newly-created admin user, the base customers collection is still accessible:

mongo -u admin -p secret --authenticationDatabase admin

use production
db.customers.findOne()

{
    "_id" : ObjectId("57fb8fbb99b01440193088eb"),
    "first_name" : "Ben",
    "last_name" : "Dixon",
    "country" : "Germany",
    "avatar" : "https://robohash.org/quiseumquam.bmp?size=50x50&set=set1",
    "ip_address" : "10.102.15.35",
    "dependents" : [
        {
            "name" : "Ben",
            "birthday" : "12-Apr-1994"
        },
        {
            "name" : "Lucas",
            "birthday" : "22-Jun-2016"
        },
        {
            "name" : "Erik",
            "birthday" : "05-Jul-2005"
        }
    ],
    "birthday" : "02-Jul-1964",
    "salary" : "£910070.80",
    "skills" : [
        {
            "skill" : "Cvent"
        },
        {
            "skill" : "TKI"
        }
    ],
    "gender" : "Male",
    "language" : "German"
}

When connecting as the germanIT user, only the German view can be accessed:

mongo -u germanIT -p secret --authenticationDatabase admin

use production

show collections
2016-10-28T10:24:03.765+0100 E QUERY    [main] Error: listCollections failed: {
    "ok" : 0,
    "errmsg" : "not authorized on production to execute command { listCollections: 1.0, filter: {} }",
    "code" : 13,
    "codeName" : "Unauthorized"
} :
_getErrorWithCode@src/mongo/shell/utils.js:25:13
DB.prototype._getCollectionInfosCommand@src/mongo/shell/db.js:805:1
DB.prototype.getCollectionInfos@src/mongo/shell/db.js:817:19
DB.prototype.getCollectionNames@src/mongo/shell/db.js:828:16
shellHelper.show@src/mongo/shell/utils.js:748:9
shellHelper@src/mongo/shell/utils.js:645:15
@(shellhelp2):1:1

db.customers.findOne()
2016-10-21T14:40:19.477+0100 E QUERY    [main] Error: error: {
    "ok" : 0,
    "errmsg" : "not authorized on production to execute command { find: \"customers\", filter: {}, limit: 1.0, singleBatch: true }",
    "code" : 13,
    "codeName" : "Unauthorized"
} :
_getErrorWithCode@src/mongo/shell/utils.js:25:13
DBCommandCursor@src/mongo/shell/query.js:702:1
DBQuery.prototype._exec@src/mongo/shell/query.js:117:28
DBQuery.prototype.hasNext@src/mongo/shell/query.js:288:5
DBCollection.prototype.findOne@src/mongo/shell/collection.js:294:10
@(shell):1:1

db.germanSpeakersRedacted.findOne()
{
    "_id" : ObjectId("57fb8fbb99b01440193088eb"),
    "first_name" : "Ben",
    "last_name" : "Dixon",
    "avatar" : "https://robohash.org/quiseumquam.bmp?size=50x50&set=set1",
    "ip_address" : "10.102.15.35",
    "dependents" : [
        {
            "name" : "Ben",
            "birthday" : "12-Apr-1994"
        },
        {
            "name" : "Lucas",
            "birthday" : "22-Jun-2016"
        },
        {
            "name" : "Erik",
            "birthday" : "05-Jul-2005"
        }
    ],
    "birthday" : "02-Jul-1964",
    "skills" : [
        {
            "skill" : "Cvent"
        },
        {
            "skill" : "TKI"
        }
    ],
    "gender" : "Male"
}

MongoDB Compass – Viewing Views Graphically

While the mongo shell is very powerful and flexible, it is often easier to understand and navigate your data graphically, this is where MongoDB Compass is invaluable. The good news is that MongoDB Compass handles views in exactly the same way as it does collections.

In Figure 1, we can view the documents in the base, customers, collection. Note that the salary value is visible.

View data in MongoDB base customers collection

Figure 1: View data in base customers collection

Figure 2 confirms that the salary field has been removed from the German view.

Salary has been redacted from the German view

Figure 2: Salary has been redacted from the German view

In Figure 3, we see that only Chinese documents have been included in the Chinese view.

Chinese view contains only Chinese documents

Figure 3: Chinese view contains only Chinese documents

Next Steps

Collation and read-only views are just 2 of the exciting new features added in MongoDB 3.4 – read more and these and everything else that’s new in MongoDB 3.4: What’s New.





Webinar Replay (EMEA) – Data Streaming with Apache Kafka & MongoDB

MongoDB with Apache Kafka

The replay from the MongoDB/Apache Kafka webinar that I co-presented with David Tucker from Confluent earlier this week is now available:

The replay is now available: Data Streaming with Apache Kafka & MongoDB.

Abstract

A new generation of technologies is needed to consume and exploit today’s real time, fast moving data sources. Apache Kafka, originally developed at LinkedIn, has emerged as one of these key new technologies.

This webinar explores the use-cases and architecture for Kafka, and how it integrates with MongoDB to build sophisticated data-driven applications that exploit new sources of data.

Watch the webinar to learn:

  • What MongoDB is and where it’s used
  • What data streaming is and where it fits into modern data architectures
  • How Kafka works, what it delivers, and where it’s used
  • How to operationalize the Data Lake with MongoDB & Kafka
  • How MongoDB integrates with Kafka – both as a producer and a consumer of event – data

Slides





Processing Data Streams with Amazon Kinesis and MongoDB Atlas

This post provides an introduction to Amazon Kinesis: its architecture, what it provides, and how it’s typically used. It goes on to step through how to implement an application where data is ingested by Amazon Kinesis before being processed and then stored in MongoDB Atlas.

This is part of a series of posts which examine how to use MongoDB Atlas with a number of complementary technologies and frameworks.

Introduction to Amazon Kinesis

The role of Amazon Kinesis is to get large volumes of streaming data into AWS where it can then be processed, analyzed, and moved between AWS services. The service is designed to ingest and store terabytes of data every hour, from multiple sources. Kinesis provides high availability, including synchronous replication within an AWS region. It also transparently handles scalability, adding and removing resources as needed.

Once the data is inside AWS, it can be processed or analyzed immediately, as well as being stored using other AWS services (such as S3) for later use. By storing the data in MongoDB, it can be used both to drive real-time, operational decisions as well as for deeper analysis.

As the number, variety, and velocity of data sources grow, new architectures and technologies are needed. Technologies like Amazon Kinesis and Apache Kafka are focused on ingesting the massive flow of data from multiple fire hoses and then routing it to the systems that need it – optionally filtering, aggregating, and analyzing en-route.

AWS Kinesis Architecture

Figure 1: AWS Kinesis Architecture

Typical data sources include:

  • IoT assets and devices(e.g., sensor readings)
  • On-line purchases from an ecommerce store
  • Log files
  • Video game activity
  • Social media posts
  • Financial market data feeds

Rather than leave this data to fester in text files, Kinesis can ingest the data, allowing it to be processed to find patterns, detect exceptions, drive operational actions, and provide aggregations to be displayed through dashboards.

There are actually 3 services which make up Amazon Kinesis:

  • Amazon Kinesis Firehose is the simplest way to load massive volumes of streaming data into AWS. The capacity of your Firehose is adjusted automatically to keep pace with the stream throughput. It can optionally compress and encrypt the data before it’s stored.
  • Amazon Kinesis Streams are similar to the Firehose service but give you more control, allowing for:
    • Multi-stage processing
    • Custom stream partitioning rules
    • Reliable storage of the stream data until it has been processed.
  • Amazon Kinesis Analytics is the simplest way to process the data once it has been ingested by either Kinesis Firehose or Streams. The user provides SQL queries which are then applied to analyze the data; the results can then be displayed, stored, or sent to another Kinesis stream for further processing.

This post focuses on Amazon Kinesis Streams, in particular, implementing a consumer that ingests the data, enriches it, and then stores it in MongoDB.

Accessing Kinesis Streams – the Libraries

There are multiple ways to read (consume) and write (produce) data with Kinesis Streams:

  • Amazon Kinesis Streams API
  • Amazon Kinesis Producer Library (KPL)
    • Easy to use and highly configurable Java library that helps you put data into an Amazon Kinesis stream. Amazon Kinesis Producer Library (KPL) presents a simple, asynchronous, high throughput, and reliable interface.
  • Amazon Kinesis Agent
    • The agent continuously monitors a set of files and sends new entries to your Stream or Firehose.
  • Amazon Kinesis Client Library (KCL)
    • A Java library that helps you easily build Amazon Kinesis Applications for reading and processing data from an Amazon Kinesis stream. KCL handles issues such as adapting to changes in stream volume, load-balancing streaming data, coordinating distributed services, providing fault-tolerance, and processing data.
  • Amazon Kinesis Client Library MultiLangDemon
    • The MultiLangDemon is used as a proxy by non-Java applications to use the Kinesis Client Library.
  • Amazon Kinesis Connector Library
    • A library that helps you easily integrate Amazon Kinesis with other AWS services and third-party tools.
  • Amazon Kinesis Storm Spout
    • A library that helps you easily integrate Amazon Kinesis Streams with Apache Storm.

The example application in this post use the Kinesis Agent and the Kinesis Client Library MultiLangDemon (with Node.js).

Role of MongoDB Atlas

MongoDB is a distributed database delivering a flexible schema for rapid application development, rich queries, idiomatic drivers, and built in redundancy and scale-out. This makes it the go-to database for anyone looking to build modern applications.

MongoDB Atlas is a hosted database service for MongoDB. It provides all of the features of MongoDB, without the operational heavy lifting required for any new application. MongoDB Atlas is available on demand through a pay-as-you-go model and billed on an hourly basis, letting you focus on what you do best.

It’s easy to get started – use a simple GUI to select the instance size, region, and features you need. MongoDB Atlas provides:

  • Security features to protect access to your data
  • Built in replication for always-on availability, tolerating complete data center failure
  • Backups and point in time recovery to protect against data corruption
  • Fine-grained monitoring to let you know when to scale. Additional instances can be provisioned with the push of a button
  • Automated patching and one-click upgrades for new major versions of the database, enabling you to take advantage of the latest and greatest MongoDB features
  • A choice of regions and billing options

Like Amazon Kinesis, MongoDB Atlas is a natural fit for users looking to simplify their development and operations work, letting them focus on what makes their application unique rather than commodity (albeit essential) plumbing. Also like Kinesis, you only pay for MongoDB Atlas when you’re using it with no upfront costs and no charges after you terminate your cluster.

Example Application

The rest of this post focuses on building a system to process log data. There are 2 sources of log data:

  1. A simple client that acts as a Kinesis Streams producer, generating sensor readings and writing them to a stream
  2. Amazon Kinesis Agent monitoring a SYSLOG file and sending each log event to a stream

In both cases, the data is consumed from the stream using the same consumer, which adds some metadata to each entry and then stores it in MongoDB Atlas.

Create Kinesis IAM Policy in AWS

From the IAM section of the AWS console use the wizard to create a new policy. The policy should grant permission to perform specific actions on a particular stream (in this case “ClusterDBStream”) and the results should look similar to this:

Next, create a new user and associate it with the new policy. Important: Take a note of the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY.

Create MongoDB Atlas Cluster

Register with MongoDB Atlas and use the simple GUI to select the instance size, region, and features you need (Figure 2).

create mongodb atlas cluster

Create a user with read and write privileges for just the database that will be used for your application, as shown in Figure 3.

Creating an Application user in MongoDB Atlas

Figure 3: Creating an Application user in MongoDB Atlas

You must also add the IP address of your application server to the IP Whitelist in the MongoDB Atlas security tab (Figure 4). Note that if multiple application servers will be accessing MongoDB Atlas then an IP address range can be specified in CIDR format (IP Address/number of significant bits).

Add App Server IP Address(es) to MongoDB Atlas

Figure 4: Add App Server IP Address(es) to MongoDB Atlas

If your application server(s) are running in AWS, then an alternative to IP Whitelisting is to configure a VPC (Virtual Private Cloud) Peering relationship between your MongoDB Atlas group and the VPC containing your AWS resources. This removes the requirement to add and remove IP addresses as AWS reschedules functions, and is especially useful when using highly dynamic services such as AWS Lambda.

Click the “Connect” button and make a note of the URI that should be used when connecting to the database (note that you will substitute the user name and password with ones that you’ve just created).

App Part 1 – Kinesis/Atlas Consumer

The code and configuration files in Parts 1 & 2 are based on the sample consumer and producer included with the client library for Node.js (MultiLangDaemon).

Install the Node.js client library:

git clone https://github.com/awslabs/amazon-kinesis-client-nodejs.git
cd amazon-kinesis-client-nodejs
npm install

Install the MongoDB Node.js Driver:

npm install --save mongodb

Move to the consumer sample folder:

cd samples/basic_sample/consumer/

Create a configuration file (“logging_consumer.properties”), taking care to set the correct stream and application names and AWS region:

The code for working with MongoDB can be abstracted to a helper file (“db.js”):

Create the application Node.js file (“logging_consumer_app.js”), making sure to replace the database user and host details in “mongodbConnectString” with your own:

Note that this code adds some metadata to the received object before writing it to MongoDB. At this point, it is also possible to filter objects based on any of their fields.

Note also that this Node.js code logs a lot of information to the “application log” file (including the database password!); this is for debugging and would be removed from a real application.

The simplest way to have the application use the user credentials (noted when creating the user in AWS IAM) is to export them from the shell where the application will be launched:

export AWS_ACCESS_KEY_ID=????????????????????
export AWS_SECRET_ACCESS_KEY=????????????????????????????????????????

Finally, launch the consumer application:

../../../bin/kcl-bootstrap --java /usr/bin/java -e -p ./logging_consumer.properties

Check the “application.log” file for any errors.

App Part 2 – Kinesis Producer

As for the consumer, export the credentials for the user created in AWS IAM:

cd amazon-kinesis-client-nodejs/samples/basic_sample/producer

export AWS_ACCESS_KEY_ID=????????????????????
export AWS_SECRET_ACCESS_KEY=????????????????????????????????????????

Create the configuration file (“config.js”) and ensure that the correct AWS region and stream are specified:

Create the producer code (“logging_producer.js”):

The producer is launched from “logging_producer_app.js”:

Run the producer:

node logging_producer_app.js

Check the consumer and producer “application.log” files for errors.

At this point, data should have been written to MongoDB Atlas. Using the connection string provided after clicking the “Connect” button in MongoDB Atlas, connect to the database and confirm that the documents have been added:

App Part 3 – Capturing Live Logs Using Amazon Kinesis Agent

Using the same consumer, the next step is to stream real log data. Fortunately, this doesn’t require any additional code as the Kinesis Agent can be used to monitor files and add every new entry to a Kinesis Stream (or Firehose).

Install the Kinesis Agent:

sudo yum install –y aws-kinesis-agent

and edit the configuration file to use the correct AWS region, user credentials, and stream in “/etc/aws-kinesis/agent.json”:

“/var/log/messages” is a SYSLOG file and so a “dataProcessingOptions” field is included in the configuration to automatically convert each log into a JSON document before writing it to the Kinesis Stream.

The agent will not run as root and so the permissions for “/var/log/messages” need to be made more permissive:

sudo chmod og+r /var/log/messages

The agent can now be started:

sudo service aws-kinesis-agent start

Monitor the agent’s log file to see what it’s doing:

sudo tail -f /var/log/aws-kinesis-agent/aws-kinesis-agent.log

If there aren’t enough logs being generated on the machine then extra ones can be injected manually for testing:

logger -i This is a test log

This will create a log with the “program” field set to your username (in this case, “ec2-user”). Check that the logs get added to MongoDB Atlas:

Checking the Data with MongoDB Compass

To visually navigate through the MongoDB schema and data, download and install MongoDB Compass. Use your MongoDB Atlas credentials to connect Compass to your MongoDB database (the hostname should refer to the primary node in your replica set or a “mongos” process if your MongoDB cluster is sharded).

Navigate through the structure of the data in the “clusterdb” database (Figure 5) and view the JSON documents.

Explore Schema Using MongoDB Compass

Figure 5: Explore Schema Using MongoDB Compass

Clicking on a value builds a query and then clicking “Apply” filters the results (Figure 6).

View Filtered Documents in MongoDB Compass

Figure 6: View Filtered Documents in MongoDB Compass

Add Document Validation Rules

One of MongoDB’s primary attractions for developers is that it gives them the ability to start application development without first needing to define a formal schema. Operations teams appreciate the fact that they don’t need to perform a time-consuming schema upgrade operation every time the developers need to store a different attribute.

This is well suited to the application built in this post as logs from different sources are likely to include different attributes. There are however some attributes that we always expect to be there (e.g., the metadata that the application is adding). For applications reading the documents from this collection to be able to rely on those fields being present, the documents should be validated before they are written to the database. Prior to MongoDB 3.2, those checks had to be implemented in the application but they can now be performed by the database itself.

Executing a single command from the “mongo” shell adds the document checks:

The above command adds multiple checks:

  • The “program” field exists and contains a string
  • There’s a sub-document called “metadata” containing at least 2 fields:
  • “mongoLabel” which must be a string
  • “timeAdded” which must be a date

Test that the rules are correctly applied when attempting to write to the database:

Cleaning Up (IMPORTANT!)

Remember that you will continue to be charged for the services even when you’re no longer actively using them. If you no longer need to use the services then clean up:

  • From the MongoDB Atlas GUI, select your Cluster, click on the ellipses and select “Terminate”.
  • From the AWS management console select the Kinesis service, then Kinesis Streams, and then delete your stream.
  • From the AWS management console select the DynamoDB service, then tables, and then delete your table.

Using MongoDB Atlas with Other Frameworks and Services

We have detailed walkthroughs for using MongoDB Atlas with several programming languages and frameworks, as well as generic instructions that can be used with others. They can all be found in Using MongoDB Atlas From Your Favorite Language or Framework.