Saturday, April 28, 2012

MongoDB - Jongo and Morphia

When presented with the opportunity to store data into a database without schema restrictions, some of us are hesitant to leap into the NoSQL void.  I'm not.  I am tired of having to work with RDBMS restrictions for simple data changes in my applications.  I am tired of having to cram non-relational domain models into relational schemas.  I want an easier way to update my application data structures without having to constantly change my database, while still maintaining backwards compatibility, and providing minimal data transformation.  Lately I have been working with MongoDB and a couple of APIs (Jongo and Morphia) as well as MongoVUE.  What follows is my odyssey into MongoDB.

First I started with running MongoDB on my laptop.  On my Windows 7 machine I am using MongoDB 2.0.4 (mongodb-win32-x86_64-2.0.4).  The MongoDB daemon start-up script I am using is:

call mongod.exe --dbpath "C:/Tools/mongodb-win32-x86_64-2.0.4/data/db"

Once it is running, I can immediately connect to it with MongoVUE.  MongoVUE is a GUI that allows me to view the database, edit data, and perform other database-level tasks.



The cool thing about MongoVUE is that most tasks that you execute in the GUI are read out in the log.  This has the added benefit of helping newcomers to MongoDB more quickly learn the shell commands that are used to manage the database.  Without the GUI, I would use the MongoDB shell.  In Windows this is launched by mongo.exe.

The Jongo API

Jongo allows Java developers to write programs that use the MongoDB Java Driver, while using MongoDB shell style JSON queries.  To see how it works, I will first use the MongoDB Java Driver API to insert an Employee object into the database.  First I start by connecting to the database:
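A minimal connection sketch with the 2.x Java driver API looks like the following; the database name "hrdb" is my assumption for illustration:

```java
import com.mongodb.DB;
import com.mongodb.Mongo;

public class MongoDriver {
    public static void main(String[] args) throws Exception {
        // Connect to the local mongod started above (default port 27017)
        Mongo mongo = new Mongo("localhost", 27017);

        // Get a handle to the database; it is created lazily on first insert
        DB db = mongo.getDB("hrdb");
        System.out.println("Connected to " + db.getName());

        mongo.close();
    }
}
```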




Next I get an Employee object.  Note:  MongoDriver is just the name of the Java class that I use to run this code.  I then serialize the Employee object into JSON via the FlexJSON API.  There is a MongoDB JSON API that includes a serializer, but it does not know how to serialize objects that are not intrinsic to MongoDB.  I wanted to serialize the object graph into JSON to experiment with how I would push JSON into MongoDB from my application.
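The FlexJSON step is roughly as follows; buildEmployee() is a hypothetical helper, and the Employee fields are assumptions:

```java
import flexjson.JSONSerializer;

// Employee is the application domain class, with nested Address and Department
Employee employee = buildEmployee();  // hypothetical factory method

// deepSerialize walks the whole object graph; exclude("*.class") strips
// FlexJSON's class-name metadata from the generated JSON
String json = new JSONSerializer()
        .exclude("*.class")
        .deepSerialize(employee);
```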


Finally I parse the JSON into an Object and cast it to the com.mongodb.DBObject, and insert it into the "employees" collection.
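That parse-and-insert step, sketched with the driver's JSON utility (db is the DB handle from the connection code):

```java
import com.mongodb.DBCollection;
import com.mongodb.DBObject;
import com.mongodb.util.JSON;

// Parse the FlexJSON output back into a driver object...
DBObject doc = (DBObject) JSON.parse(json);

// ...and insert it; the "employees" collection is created on first insert
DBCollection employees = db.getCollection("employees");
employees.insert(doc);
```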


To verify the insert I can view the employees collection in the MongoVUE GUI.  The interesting thing here is that the two date elements in the Employee document are serialized as longs and stored in the document store as numbers.  This seems counterintuitive to how dates should be stored in MongoDB (ISO-Date).  In fact when working with the com.mongodb.util.JSON parser, I found a bug where longs were being improperly cast and truncated to ints.  This happened when the dates serialized were before 1/1/1970.  This bug is fixed in the MongoDB Java Driver version 2.8.0.  I am using version 2.7.3, so I had to fix it locally.
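The truncation is easy to reproduce in plain Java: any pre-1970 date is a negative epoch-millis value whose magnitude is too large to survive a cast to int.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.TimeZone;

public class PreEpochTruncation {
    public static void main(String[] args) {
        // A hire date before 1/1/1970 serializes to a negative epoch-millis long
        Calendar cal = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
        cal.clear();
        cal.set(1960, Calendar.JANUARY, 1);
        long millis = cal.getTimeInMillis();      // -315619200000

        // The parser bug cast this long to an int, silently corrupting the date
        int truncated = (int) millis;
        System.out.println(truncated == millis);  // false: the value no longer fits
    }
}
```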


At this point the reader should notice that the Employee object contains two member objects:  Address and Department.  The JSON for this Employee object is seen below:


Now I can read the employee document using the Jongo API using a JSON query syntax:
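A Jongo read looks roughly like this, wrapping the driver's DB handle from earlier; the query fields (lastName, department.name) and their values are illustrative assumptions:

```java
import org.jongo.Jongo;
import org.jongo.MongoCollection;

// Wrap the existing com.mongodb.DB handle
Jongo jongo = new Jongo(db);
MongoCollection employees = jongo.getCollection("employees");

// Shell-style JSON query, mapped straight back onto the domain class
Employee found = employees.findOne("{lastName: 'Smith'}").as(Employee.class);

// Queries can reach into embedded documents with dot notation
Iterable<Employee> dept =
        employees.find("{'department.name': 'Engineering'}").as(Employee.class);
```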


The output is seen below:


The Morphia API

According to Google Code, "Morphia is a lightweight, type-safe library for mapping Java objects to/from MongoDB."  Below is an example that I wrote to stuff a collection of Employee objects into the database as JSON documents.


Using Morphia requires a few Morphia annotations (@Entity, @Id, @Embedded) in the model objects (Employee, Address, Department).  In the Employee class I use all three.  The @Entity annotation is similar to the JPA @Entity.  The "employees" argument tells the Morphia API that this MongoDB document will be added to the employees collection.

The interesting thing about MongoDB collections is that if they do not yet exist in the database, they will be created in-line when documents are inserted.  This is also true of the database itself.  If the database specified by the API does not yet exist in the MongoDB environment, it too will be created on the first document insert.

The @Id annotation is needed for the "auto-generated" ObjectId field.  @Embedded is used on the Address and Department members to add them as embedded documents to the Employee document.  @Embedded is also on each of the Address and Department classes.
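Put together, the mapping and the insert look roughly like this under the Google Code-era Morphia packages; the field names and the buildEmployees() helper are assumptions, and getters/setters are omitted:

```java
import java.util.Date;
import org.bson.types.ObjectId;
import com.google.code.morphia.Datastore;
import com.google.code.morphia.Morphia;
import com.google.code.morphia.annotations.Embedded;
import com.google.code.morphia.annotations.Entity;
import com.google.code.morphia.annotations.Id;

@Entity("employees")                         // target the employees collection
public class Employee {
    @Id private ObjectId id;                 // auto-generated on save
    private String firstName;
    private String lastName;
    private Date hireDate;                   // mapped to an ISODate
    @Embedded private Address address;       // stored as an embedded document
    @Embedded private Department department;
}

// Elsewhere, stuffing a collection of employees into the database:
Morphia morphia = new Morphia();
morphia.map(Employee.class);
Datastore ds = morphia.createDatastore(mongo, "hrdb");
for (Employee e : buildEmployees()) {        // buildEmployees() is hypothetical
    ds.save(e);
}
```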







After execution I can again view the documents in MongoVUE.  In this Morphia insert, I have correctly mapped the java.util.Date to the ISODate MongoDB Date/Time data type.




Next I will investigate a scalable solution for MongoDB as well as Couchbase.

NVSS NFJS 2012 - Day 2

So today I started with a great session from Craig Walls about Spring Data (Templates and Repositories) focusing on integration with RDBMS, MongoDB, Neo4j, and Redis.  He seemed to be keen on Neo4j.  I am still looking into how the very schema-oriented graph database would work for my needs.

It was a very interesting session; I especially liked the content on Spring Data Repositories.  I liked the convention of generating implementations by naming queries with method names.  If that is not good enough, or your methods don't adhere to what the Spring Data engine is expecting, you can use @Query annotations to specify the queries that Spring should implement.  This approach seems to still need a JPA provider, so for now, Hibernate will be used by our teams.  I am not 100% sold on the performance of Hibernate under the covers, but it does make implementation easy.

Next I sat through another MongoDB session with Ken Sipe.  I am partial to MongoDB, due to its schema-less approach and its scalability.  Schema-less is a big departure from what most of us are used to.  I am eager to model one of our applications into MongoDB collections of documents.  I think that it gives the most flexibility with the best web application performance, though I am still looking into Couchbase.  I am not necessarily worried about schema-less storage; the biggest issue would be the data transformation needed for other service layers and other relational data stores.  I think we can manage that with service layers in our application.  I actually see schema-less storage with a schema-oriented services layer as the best of both worlds.

I am still reviewing the limitations in MongoDB, like the 24,000-namespace limit.  Namespaces include collections and indexes.  I should not bump into the collection limitations, but I am concerned about any limit on the number of documents in a single collection, or, for that matter, limitations on the total number of documents in a database.

Finally, I ended the day with another Ken Sipe session on Web Application Security and ESAPI.

Friday, April 27, 2012

NVSS NFJS 2012 - Day 1

So I am up in Reston, VA for the Northern Virginia Software Symposium, No Fluff Just Stuff conference.  I came up here early for a Gradle session given by Tim Berglund.  I had looked at Gradle before, but his session was a good starter session for Gradle beginners.  We are already Maven and Groovy users, having moved from Ant.  Gradle lets you use Ant, Maven, Ivy, Java, Groovy, etc.  In fact, Gradle seems to elevate the art of writing builds to first-class programming, if it wasn't there already.  I will be looking into how we can better control our builds using Gradle, Maven, and Groovy.

In the afternoon I attended a few of Tim's sessions on NoSQL.  I have been researching several NoSQL solutions to date:  Cassandra, MongoDB, Couchbase.  After a survey of multiple NoSQL solutions, followed by a deep dive into Cassandra and Neo4j, I feel that I am better positioned to decide on the appropriate solution.

I really like the flexibility of schema-less storage in NoSQL databases like MongoDB.  However, my solutions need schema binding.  I am currently prototyping a solution that combines the schema-less, document-based storage of NoSQL with application layer schema validation and data transformation.  MongoDB and Couchbase seem to be good candidates due to their native handling of JSON.

One of the really cool things is that this year NFJS is not handing out paper session guides and agendas.  Instead, every attendee receives an iPad that they can use for the duration of the conference.  All the slides and conference documentation are on this iPad in a custom NFJS application.  It is convenient.

Thursday, April 12, 2012

Speaking at TriJUG in May 2012

On May 21st I will be speaking at the May 2012 TriJUG meeting.  My talk will be based on my blog entry:  Dynamic Groovy Edges and Regression Isolation.  More info will follow as the date draws near.

Tuesday, February 28, 2012

PMI CVC PMP 2012 Spring Workshop - Framework Talk

I will be presenting on March 17th for the PMI Central Va Chapter during their Spring 2012 PMI Certification Workshop.  My topic will be the Project Management Framework.  My talk goes from 10:15 AM to 11:45 AM and touches on items found in sections 1 & 2, chapters 1, 2, & 3, of the PMBOK, version 4.

In my session I will be discussing these topics and more:

  • Projects, Portfolios, and Programs
  • Process Groups, Knowledge Areas, and Processes
  • PMO
  • Project Life Cycle vs. Product Life Cycle
  • Stakeholder Management
  • Organizational Structure
Ever wondered how PMI keeps the PMP exam current and relevant?  This year we have also added information on the PMI Role Delineation Study and the Crosswalk.

This workshop is a great way to come up to speed for the PMP exam as well as gain valuable study tips from fellow project managers that have already passed the exam.  The workshop is also a great opportunity to gain the PDUs needed to maintain existing PMP certifications.  Best of all, attendees receive copies of all the slides presented at the workshop as well as other resources to help them study for the exam.

Sunday, January 22, 2012

Cronyism

If you really had the choice, would you pick cronyism over meritocracy?  Even in today's enlightened society, where productivity and efficiency are very desirable traits in both the public and private sectors, one can still find cronyism.  In my opinion, cronyism is the undesirable side effect of relying too much on who you know, or who knows you.

Many of us have heard that it is not as much what you know that gets you a job, but instead, who you know.  So we network to increase our chances of success when seeking employment.  Living in Richmond, VA, I can attest to this notion.  The Richmond IT community is very small and somewhat incestuous when it comes to employment.  It's hard to find someone new in our IT community that a 1st, 2nd, or 3rd degree contact has not at least heard of or knows outright.

I have had more success researching and networking opportunities through friends than applying coldly through an online job site.  In fact, online job sites are mainly used for finding the jobs.  Once the job is located, one is better off researching where the job is and skipping the online job site, or at least following up through networking channels.

As long as a candidate is qualified for a position, checking his or her credentials through mutual contacts is not necessarily a bad move.  LinkedIn and Facebook have become the de facto methods of choice for researching contacts.  In fact, my current position was the result of a LinkedIn search, followed by a check on me by a recruiter.  The recruiter looked at my LinkedIn profile and asked folks in her office if they knew me.  Luckily, someone that she solicited feedback from not only knew me and my work, but was able to give me a good recommendation.  This all happened unbeknownst to me, but I would still consider it networking, albeit anonymous and unsolicited networking.

While personal contacts are valuable, they are certainly no substitute for technical interviews.  However, I have witnessed numerous times where interviews are not even granted for candidates that do not pass the litmus test that anonymous networking has become.

On the flip side, this approach goes horribly wrong when it devolves into cronyism.  Cronyism is when a person exhibits unwarranted partiality towards a friend of theirs and places that friend into a job or other position of authority based solely (or mainly) on that friendship and not on what should be considered:  technical and cultural fitness for the position. 

I recently witnessed unbridled cronyism.  It was so blatant that it was both sickening and comical at the same time.  In order to protect the innocent (and the guilty) I will not name names here.  Suffice it to say that it was a commercial, C-level executive that passed on an otherwise very qualified individual, in order to satisfy a false sense of loyalty to a peer that was in fact instrumental in the cronyism.  This peer was another executive that was staffing an already ailing IT organization with only folks that he knew.  No one outside his circle of contacts had a chance.  This even went as far as involving HR to interview folks, while the executives knew all along that only candidates from within the so-called "inner circle" were worthy.  At the least this was disrespectful to the candidates that came in for a false interview.  At the most, it was immoral and possibly even illegal.  Plausible deniability would most likely prevent the latter from being true.

At the end of the day, the process is broken when cronyism is involved.  We have all witnessed it in the news, and we have frowned on those political figures that got caught in cronyism-related scandals.  Yet, this doesn't stop some folks from exercising their rights to compromise their own integrity and that of their organizations by participating in practices that they know are both morally bankrupt and, at the end of the day, counterproductive to the successful long-term execution of human resource management and staffing.

These folks no longer deserve the mantle of authority and leadership that has so falsely been granted them.  If all is right in the world, this poor decision making will, in the end, bring the house of cards that was built so haphazardly, by embracing poor hiring practices, down on their collectively empty skulls.  Moreover, they will find themselves without redress when they have to explain to their management why they ignored qualified candidates to hire poor performing friends.  It may take a while for this to happen, but "nature abhors a vacuum", and equilibrium will be achieved.  One can only hope to be around to see the imbalance between stupidity and intelligence righted.

Thursday, January 12, 2012

Lotus Notes: One of the First NoSQL Databases - No Really!

Depending on who you talk to, NoSQL means NO SQL at all or Not-Only SQL.  I like the latter meaning as I find it makes the technology more flexible and adoptable for those that have primarily worked with SQL databases.  Regardless of the meaning, there are over 100 databases that can be labeled as NoSQL.  These databases range from wide-column, document storage, and key-value tuple stores to object stores and graph databases, with several types in between.  Included in this list of 100+ NoSQL databases is Lotus Notes/Domino.

I started working with Lotus Notes in the early 90's, right after I started with HTML and VB.  Back then it was a great tool to allow remote users to collaborate on common data, via replication.  Over the years (more than a decade) Notes evolved to include the Domino server-side technology, while including a host of other technology integration:  HTML, Java, C, C++, JavaScript, XML, Web Services, etc.  In fact, Lotus Notes was my springboard to move to Java.

To me, the two biggest strengths of Lotus Notes/Domino were its approach to non-structured data storage, and its replication technologies.  There are a whole host of other Domino features that make the platform very robust, such as agents, routing, and security, not to mention the robust HTTP stack.

Notes has been much maligned over the years as being mostly a productivity tool or email tool.  However, I was very happy working with Lotus Notes/Domino.  I think there are several reasons for the bad press surrounding Lotus Notes.  My top four reasons are below:
1.  IBM's incredible (dare I say colossal) mismanagement of the Lotus brand, especially in the wake of the WebSphere brand
2.  The lack of standards and design patterns, as well as documented ALM for Lotus Notes development (at one point I heard Notes referred to as a virus).
3.  The lack of forward momentum on modernizing client and web UI tools
4.  The lack of management understanding what Lotus Notes was and wasn't, while developers tried to make every solution a nail to be handled by the Notes hammer.

However much I lament the downward slide of Lotus Notes, there is truth to my NoSQL claim.

Notes stores data in documents, or more precisely, notes.  There are also design notes (forms, views, etc.) and administrative notes (ACL and replication). Based on that alone I would label it a document-based NoSQL database.

In Notes, the container model is quite simple.  Items contain data, notes (document notes) contain items, and databases (NSF files) contain notes.  Items are of two types, summary and non-summary.  For the most part, only summary data can be displayed in a Notes view.  Non-summary data, while accessible at the note level, is generally not seen in a Notes view.  Notes views don't really contain document notes as much as they collect and index them.  Using views, designers can build views into Notes data based on the needs of users and the data contained in document notes.  In this way, they are somewhat similar to views found in relational databases.

Non-summary items are those items usually bigger than 60K and/or composite, like rich text items.  These items can contain rich text, file attachments, images, multimedia, etc.

Notes forms are generally used to create, edit, and view document notes in the UI.  To that end, forms are designed for those functions, with fields added to the forms to contain the data when document notes are displayed or edited in a form.  Forms contain fields; however, fields are not the same as items.  Items contain data; fields temporarily display, and allow editing of, item data when the form is used to open a document note.  However, document notes can and do contain items that don't exist on any form.  Multiple forms can be used to view or edit document notes.  In this fashion, forms are actually decoupled from document notes.  While document notes are item, and therefore data, containers, forms are merely the templates used to view, edit, or create documents.

Forms are mostly UI concerns.  In fact, items contained in document notes can be, and often are, mutated, without forms.  Moreover, document notes can be created without ever using a form.

So, in the grand scheme of things, Lotus Notes, at its core, is a document-based NoSQL database for unstructured data.  You can store data within documents.  These documents can be related (or not) by data contained in items.  These items can be summary (simple) data or non-summary (composite) data.  Notes forms and views can be used to enter data (including tags and metadata) and view collections of related documents.  Notes replication ensures that documents contained in one database instance can be eventually synchronized with data in another Notes database with the same replica ID.

Given these features, which are comparable to many other NoSQL databases, Lotus Notes as a NoSQL database is a reality, now.