GeoSpatial Indexing in MongoDB

Is DIRT simple! Consider this code, which searches an item collection for items near a specific location, given as longitude/latitude (longitude first, latitude second, the same order the GeoJSON spec uses):
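this.findItems = function(db, type, longitude, latitude, index, pageSize, callback){
    db.collection("items", function(err, collection){
        if (err) throw err;
        // The index spec is a list of [field, type] pairs; a flat ["location","2d"]
        // would be read as two separate fields named "location" and "2d".
        collection.ensureIndex([["location", "2d"]], false, function(err){
            if (err) throw err;
            var search = {}, paging = {};
            if (type) search.type = type;
            // $near takes [longitude, latitude] and returns the closest items first.
            if (longitude != null && latitude != null)
                search.location = {"$near": [parseFloat(longitude), parseFloat(latitude)]};

            // index is the zero-based page number.
            if (index >= 0 && pageSize >= 0){
                paging.skip = index * pageSize;
                paging.limit = pageSize;
            }

            collection.find(search, paging, function(err, cursor){
                if (err) throw err;
                cursor.toArray(function(err, items){
                    if (err) throw err;
                    callback(items);
                });
            });
        });
    });
};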
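
For what it's worth, the query-building half of findItems can be pulled out into a pure function, which makes it easy to poke at without a database. This is only a sketch: buildNearQuery is a made-up name, and the range check on the coordinates is my own assumption about what counts as valid input, not something the driver requires.

```javascript
"use strict";

// Hypothetical helper: builds the filter and paging options for a $near search.
function buildNearQuery(type, longitude, latitude, index, pageSize) {
  const search = {};
  const paging = {};

  if (type) search.type = type;

  if (longitude != null && latitude != null) {
    const lon = parseFloat(longitude);
    const lat = parseFloat(latitude);
    // Assumed validation: reject anything that is not a plausible coordinate.
    if (Number.isNaN(lon) || Number.isNaN(lat) ||
        lon < -180 || lon > 180 || lat < -90 || lat > 90) {
      throw new RangeError("invalid longitude/latitude");
    }
    search.location = { $near: [lon, lat] }; // longitude first, latitude second
  }

  // index is the zero-based page number, so page 2 of size 10 skips 20 items.
  if (index >= 0 && pageSize >= 0) {
    paging.skip = index * pageSize;
    paging.limit = pageSize;
  }

  return { search: search, paging: paging };
}

const query = buildNearQuery("cafe", "-73.97", "40.77", 2, 10);
console.log(JSON.stringify(query));
```

The result has the same shape as the search and paging objects that findItems hands to collection.find.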

It's that simple (minus hours of learning how to do that, and scratching my head over stupid errors that I caused but didn't know I was causing [we computer scientists are quick to blame the other guy :) ]).

If you can decipher that, good for you. On to step 3 of a million on my little code project.

MongoDB and Node.js

I am starting a joint venture with my very good friend, where I am doing the coding, and I decided that I will be doing it in Node.js with a MongoDB backend. I was thinking about why these two technologies, and how I would explain my choices to another techy. I think I have my explanation figured out...

First and foremost, JavaScript. I have come to love everything about it! For my side projects, I used to use Java, and wrote a pretty decent web middle tier and rolled my own ORM too. You are witnessing it in action by reading this post. I could make an object like "Book", add properties to it, give them properties in an XML file (a series of them, if you know how Java web apps work ;), and my ORM would create the table, foreign keys, etc. It was particularly magical in how it handled foreign keys and loading referenced objects in one SQL query, knowing which type of join to apply and everything. However, this was all a daunting task, made even more so by the fact that Java allows abstraction to the Nth degree. If I could remember that code, I would be much more specific about how it worked, but I had an object for connecting to the database, and an object in another project for building SQL, because I thought I could abstract that out, rewrite it if I had to, and swap in SQL building engines. That was the main cause of pain: I would think, "If I just wanted to swap in another X, it would be easy." I initially made plans to have it go to SQL or XML or JSON, whatever you wanted, and you would just swap in an engine in the config file. It was heavily reflection based :)

So, I've been writing JavaScript a lot. Only lately, but before discovering Node.js, have I come to realize its power... anonymous functions and objects, JSON, functions as first-class citizens. Of course there are the libraries, like jQuery, and Google APIs, like Maps for instance. There's the sheer fact that you don't have to think ahead about every possibility for an object before creating it. Like, my Book object would have an author, title, etc. If later I wanted to add the ISBN or something, in Java I would have to update any IReadable interfaces (well, considering that you could read a cereal box that would undoubtedly NOT have an ISBN, this example is falling apart, but you know what I'm talking about :P ), then update the Book class, update the list to enable searching by an ISBN, etc. Tons of stuff. JavaScript:

var Book = function(opt) { for (var i in opt) { this[i] = opt[i]; } }

Imagine I start calling it with
var book = new Book({title: "Brainiac", author: "Ken Jennings", ISBN: "some string of numbers"});

I can now just get the ISBN in other places by calling "book.ISBN".

Node.js

I've grown weary of writing server side code in Java and C#. ASP.NET is a CF with its control design. JSF was a CF with its configs and my ginormous web framework, which is almost 6 years old now; I never tried updating past JSF 1.1, so I don't know if it's gotten any better. But Java web development was nuts: you always needed 34 external libraries, most of which came from Apache's Jakarta project (which is awesome though), intimate knowledge of how to set up Tomcat or your favorite application server, and you had to know how to write servlets and JSPs, JSTL, JSTL configs, and how to read a catalina.out file. I'm sure I'm forgetting something worth hours of learning...

Node.js is simple. Create a server, attach a listener to the request event. It's so barebones that requesting anything will not work at all on the server you just created until you write the handling yourself. I researched some libraries for web app development, and I decided, F it, I'm not falling in that trap again. I've dealt with enough code in my life, I could write one. One that takes my favorite things from the frameworks I know, and works on that. It is nearly done...
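
To give an idea of how barebones that is, here is roughly what "create a server and attach a listener" looks like with Node's built-in http module. It's a minimal sketch, not my framework: handleRequest, the single "/" route and the response text are all made up for illustration.

```javascript
"use strict";
const http = require("http");

// Hypothetical handler: answers one path and returns 404 for everything else.
// Nothing is routed or served for you; whatever you want handled, you write.
function handleRequest(req, res) {
  if (req.method === "GET" && req.url === "/") {
    res.writeHead(200, { "Content-Type": "text/plain" });
    res.end("Hello from Node\n");
  } else {
    res.writeHead(404, { "Content-Type": "text/plain" });
    res.end("Not found\n");
  }
}

// createServer registers handleRequest as the listener for the "request" event.
const server = http.createServer(handleRequest);
```

Calling server.listen(3000) (the port is arbitrary) starts accepting connections. Everything a framework normally gives you, like routing, static files and templates, would have to be layered on top of that one function.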

MongoDB

MongoDB is clearly the only choice for me on Node.js. JSON documents. That should about clear it up, if you were still wondering. If still... imagine we need an ISBN on a book. There's no updating 14 stored procedures that perform CRUD ops on Book to now also include ISBN. No alter table statements... There's simply, Book has ISBN now... db.books.save({title: "Brainiac", author: "Ken Jennings", ISBN: "some number"});

There'll be other books without ISBN, but that's simple... if (book.ISBN != null) (or simply if (book.ISBN) of course). So you think of the basic stuff you need to get off the ground running, and you run. You can run, and you can run fast and recklessly, because if you think of something to add, you put it in. There's very minimal pain, or slowing down, in change.
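
Here's a small sketch of what living with that looks like in code. The documents and the describe function are hypothetical; the point is just that old and new shapes sit side by side and the check for the missing field is a one-liner.

```javascript
"use strict";

// Hypothetical documents, as they might come back from a books collection.
const books = [
  { title: "Brainiac", author: "Ken Jennings", ISBN: "some number" },
  { title: "An older entry", author: "Unknown" } // saved before ISBN existed
];

// Handle both shapes without any schema change or migration.
function describe(book) {
  return book.ISBN
    ? book.title + " (ISBN " + book.ISBN + ")"
    : book.title + " (no ISBN)";
}

books.map(describe).forEach(function (line) {
  console.log(line);
});
```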

Node and MongoDB are built for scaling, but I'm not too concerned about that at the moment. Then why, you ask, am I using them? Simply because I do not know them! Although it is 10% for the learning factor, 90% for the cool factor. If it's not cool and I don't know it, then I won't go out of my way to learn it. You might ask, if you're trying to make something awesome, why don't you do it in what you know first, then convert it once you start making money? I'd rather do it in something awesome. Knowing these technologies will be more valuable to me than selling the site to Google for a billion dollars! Ok, that's not true. But if it hits that point, and I didn't do it in Node + MongoDB, then I would not have learned them, and I would not have much reason to NOW, being a billionaire, would I? :P

Oh yeah, Happy Birthday to me today :)

JSON and Two Smoking Stacks

JSON is something that, to me, is so simple to use in its native environment that I would never have considered writing a parser for it. I would just use it in its natural environment, if I ever had to use it. Well, I found a case where that doesn't work!

In my dealings with geocoding locations for clients, I've come across many instances where a limit on the number of geocoding calls was reached, and I would have to wait until the next day to geocode some more locations. I could write a geocoding mega-program that abstractly geocodes addresses with all of the free services available, using one until it reaches its limit, then moving on to the next service! Fun stuff.
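
The fallback idea is simple enough to sketch. Everything here is hypothetical: the shape of a service object (a name plus a geocode function), the QuotaExceededError a wrapper would throw when its limit is hit, and the exhausted flag. Real geocoding services each signal their limits differently, so each wrapper would have to translate that.

```javascript
"use strict";

// Hypothetical error a service wrapper throws once its daily quota is used up.
class QuotaExceededError extends Error {}

// Try each service in order; move on only when one reports its quota is gone.
async function geocodeWithFallback(address, services) {
  for (const service of services) {
    if (service.exhausted) continue; // already hit its limit earlier
    try {
      const location = await service.geocode(address);
      return { provider: service.name, location: location };
    } catch (err) {
      // Any other failure (bad address, network error) is not a reason to switch.
      if (!(err instanceof QuotaExceededError)) throw err;
      service.exhausted = true; // skip this one until its quota resets
    }
  }
  throw new Error("all geocoding services are over their limit");
}
```

Resetting the exhausted flags the next day is left out; a timestamp per service would do it.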

The problem is that the other geocoders out there do not let you specify which format you would like the response to be in. XML is easy to do in C#, but I hadn't researched a JSON parser, so I tasked myself with writing one from scratch, with no previous parsing experience.

JSON is a scary beast, for someone like me with limited syntax parsing experience, and no compiler courses taken in college... SCARY!


Then again, it only has a handful of syntax elements: { } [ ] , : and the quote, as far as I know. Quotes ("") contain strings, curlies {} contain objects, squares [] contain arrays. Arrays can contain objects, objects can contain arrays. Objects are made of field : value pairs. So here's the basic structure:


private Stack syntaxStack;
private Stack tokenStack;

StringBuilder sb = new StringBuilder(); // string builder for catching data

for (loop through json){
    switch (char){
        case '"':
            we're either in a string or just out of a string (set a boolean so we can check in the other cases)
            if it's a " preceded by a \, add it to the string buffer
            break;
        case '[':
            if we're not in a string
                push a '[' onto the syntax stack
                push an array token onto the token stack
            break;
        case '{':
            if we're not in a string
                push a '{' onto the syntax stack
                push an object token onto the token stack
            break;
        case ']':
            if we're not in a string
                get the last value in the array, if there is one
                (it could have also been an array of arrays, and we're closing the outer array, so there won't be a value)
                add to the children of the last token in the token stack
                pop from each stack, we're out of the current array
            break;
        case '}':
            if we're not in a string and we're in an object
                get the last value in the object, if there is one
                set the value in the last object
                pop from each stack, we are out of that object
            break;
        case ',':
            if we're not in a string
                if we're in an object, a value was just specified. if it's a string value, set the last field's value to the data in the string builder
                clear the string buffer
            break;
        case ':':
            if we're not in a string
                we should be in an object, and the previous string was the field, so create a field token and add it to the current node
                clear the string buffer
            break;
        default:
            append the character to the data string buffer
            break;
    }
}

bool InArray
    = this.syntaxStack.Peek() == '['

bool InObject
    = this.syntaxStack.Peek() == '{'
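
To show the two-stack idea actually running, here is a compact sketch in JavaScript (yes, where JSON.parse already exists; it's only here to make the outline concrete). It is not the C# project: the names parseJson, attach and flush are mine, nested containers are attached to their parent as soon as they open, and it is lenient about malformed input in ways a real parser shouldn't be.

```javascript
"use strict";

const ESCAPES = { n: "\n", t: "\t", r: "\r", b: "\b", f: "\f" };

function parseJson(text) {
  const syntaxStack = []; // '[' or '{' for every container still open
  const tokenStack = [];  // the array or object being filled at each level
  let root;               // the finished top-level value
  let buffer = "";        // characters of the value or field name being read
  let inString = false;   // true between an opening and a closing quote
  let wasString = false;  // the buffer holds a quoted string, not a literal
  let field = null;       // field name waiting for its value (objects only)

  const inObject = () => syntaxStack[syntaxStack.length - 1] === "{";

  // Store a value in the innermost open container, or as the root.
  function attach(value) {
    if (tokenStack.length === 0) { root = value; return; }
    const top = tokenStack[tokenStack.length - 1];
    if (inObject()) {
      if (field === null) throw new SyntaxError("value without a field name");
      top[field] = value;
      field = null;
    } else {
      top.push(value);
    }
  }

  // Turn whatever is buffered into a value and attach it, if anything is there.
  function flush() {
    if (wasString) attach(buffer);
    else if (buffer === "true") attach(true);
    else if (buffer === "false") attach(false);
    else if (buffer === "null") attach(null);
    else if (buffer !== "") {
      const n = Number(buffer);
      if (Number.isNaN(n)) throw new SyntaxError("unexpected token " + buffer);
      attach(n);
    }
    buffer = "";
    wasString = false;
  }

  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      if (ch === "\\") {            // escape: consume the next character too
        const next = text[++i];
        if (next === "u") {
          buffer += String.fromCharCode(parseInt(text.slice(i + 1, i + 5), 16));
          i += 4;
        } else {
          buffer += ESCAPES[next] || next;
        }
      } else if (ch === '"') {
        inString = false;
      } else {
        buffer += ch;
      }
      continue;
    }
    switch (ch) {
      case '"':
        inString = true; wasString = true; buffer = "";
        break;
      case "[":
      case "{": {
        const container = ch === "[" ? [] : {};
        attach(container);          // hook it into its parent right away
        syntaxStack.push(ch);
        tokenStack.push(container);
        break;
      }
      case "]":
      case "}":
        flush();                    // last value before the close, if any
        if (syntaxStack.pop() !== (ch === "]" ? "[" : "{")) {
          throw new SyntaxError("mismatched " + ch);
        }
        tokenStack.pop();
        break;
      case ",":
        flush();
        break;
      case ":":
        if (!inObject() || !wasString) throw new SyntaxError("unexpected ':'");
        field = buffer; buffer = ""; wasString = false;
        break;
      default:
        if (!/\s/.test(ch)) buffer += ch; // ignore whitespace outside strings
    }
  }
  flush();                          // a bare top-level value such as 42
  if (syntaxStack.length > 0) throw new SyntaxError("unclosed " + syntaxStack.pop());
  return root;
}

const parsed = parseJson('{"a": [1, 2, {"b": "c"}]}');
console.log(parsed.a[2].b); // prints "c"
```

The syntax stack is what answers "am I in an array or an object right now?", and the token stack is what the values get added to; they are always pushed and popped together.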


Here is the Visual Studio 2008 project with C# code for parsing your own JSON. I think mine looks decent compared to others out there.