Today's Joke...

"I better do the lawn tomorrow before people start thinking a bunch of homeless people live here"

I kill me.

Can you break it?

<script type="text/javascript">
function cleanseField(str){
    // Park the allowed tags as _tag_ placeholders, strip every other tag, then restore them.
    str = str.replace(/<(\/?)(b|i|u|em)>/ig,"_$1$2_").replace(/<.*?>/g,"").replace(/_(\/?)(b|i|u|em)_/ig,"<$1$2>");
    return str;
}

function doit(){
    var str = "<h3>Header</h3>Hello, my name is <eM>jason connell</Em>. <b>You</B> are awesome, like <i>italics</i><<scr"+"ipt>script type='text/javascript'>alert('sup');</sc"+"ript </scr"+"ipt>><u>underline</u>";

    var spn = document.getElementById("test");
    spn.innerHTML += "<h1>Original</h1>" + str;
    spn.innerHTML += "<h1>Cleansed</h1>" + cleanseField(str);
}
</script>

<a href="javascript:void(0);" onclick="doit();">Doit</a>
<span id="test"></span>


I'm trying to cleanse HTML but keep certain elements for now. It works in Chrome and IE 8, but is there a string that will still put out a valid script tag? It's very important.
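
For what it's worth, here is a sketch of two inputs that seem to slip through, assuming the regexes are written with the slash escaped as \/? so that they parse. Both rely on ordinary regex behavior: "." does not match a newline, and /<.*?>/ never matches a tag that has no closing ">". Script tags added through innerHTML generally don't run, so the examples use an img with an onerror handler instead. The escapeThenAllow function is a made-up alternative (not part of the original code, and not a complete sanitizer) that escapes everything first and then un-escapes only the bare allowed tags.

```javascript
// Copy of cleanseField with the slashes escaped so the regex literals parse.
function cleanseField(str) {
  return str
    .replace(/<(\/?)(b|i|u|em)>/ig, "_$1$2_")
    .replace(/<.*?>/g, "")
    .replace(/_(\/?)(b|i|u|em)_/ig, "<$1$2>");
}

// 1. "." does not match "\n", so a tag split across lines is never stripped.
var multiline = "<img\nsrc=x onerror=alert('sup')>";

// 2. A tag with no closing ">" is never matched by /<.*?>/,
//    and the restored <b> later supplies the ">" that closes the img tag.
var unclosed = "<img src=x onerror=alert('sup') <b>";

// Hypothetical alternative: escape everything, then un-escape only the
// bare allowed tags. Anything with attributes stays escaped.
function escapeThenAllow(str) {
  return str
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/&lt;(\/?)(b|i|u|em)&gt;/ig, "<$1$2>");
}

var survivedMultiline = cleanseField(multiline) === multiline;
var survivedUnclosed = cleanseField(unclosed) === unclosed;
var escapedMultiline = escapeThenAllow(multiline);
```

Both survived flags come out true with the code above, while escapedMultiline no longer contains a raw "<" at all.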

If only there were a way in HTML to specify that "scripts are only valid inside this segment of the page". It's a shame that a script can execute anywhere.

Borrow the good, discard the bad

In my many years of web development, I've come across a lot of good ways that platform authors did stuff, and a lot of bad ways. So I'm writing my version of a web platform on Node.js, and I decided to keep the good stuff, and get rid of what I didn't like. It wasn't easy but I'm pretty much finished by now.

As with most things I develop, I'll decide on an architecture that allows changes to be made in a way that makes sense, but I start with what I want the code to look like. When I wrote my ORM, I started with the simple line db.save(obj); (it turns out that's how you do it in MongoDB, so I didn't have to write an ORM with Mongo :) ). When starting a web platform, I started out the same way.

I wanted to write:

<list value="${page.someListVariable}" var="item">
    Details for ${item.name}
    <include value="/template/item-template.html" item="item" />
</list>


Obvious features here are code and presentation separation, SSIs, simple variable replacement with ${} syntax.
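
To make the ${} replacement concrete, here is a minimal sketch of how such substitution could work. It is not the platform's actual implementation; renderVariables and its dotted-path lookup are assumptions made up for illustration.

```javascript
// Hypothetical helper: replaces ${a.b.c} with the matching value from context.
// Unknown or null values render as an empty string.
function renderVariables(template, context) {
  return template.replace(/\$\{([^}]+)\}/g, function (match, path) {
    var value = path.trim().split(".").reduce(function (obj, key) {
      return obj == null ? undefined : obj[key];
    }, context);
    return value == null ? "" : String(value);
  });
}

var rendered = renderVariables("Details for ${item.name}", {
  item: { name: "Widget" }
});
// rendered is "Details for Widget"
```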

There aren't a lot of tags in my platform. There's an if, which you can use to decide whether to output something. There's an include, to which you can pass variables from the main page so you can reuse it on many pages. The one above takes an "item" object, which it will refer to in its own code with ${item}.

Recently I added a layout concept, so you can keep your layout HTML in another file and have each page's own HTML just fill in the named content areas. For instance, you might request the file index.html, which would look like this:

<layout name="main">
    <content name="left-column">
        <include value="/template/navigation.html" />
    </content>
    <content name="main-column">
        <include value="/template/home-content.html" />
    </content>
</layout>


JavaServer Faces used a two-way data binding mechanism, which was really helpful. But then you need controls, like input[type=text] or whatever. My pages will not have two-way data binding, but you can use plain HTML, which I like better. (However, those controls were very simple to swap, thanks to Java's generous use of interfaces and documentation that pretty much mandates them. For example, if your Java code used ValueHolder instead of TextBox and you later made the control a "select" or input[type=hidden], the Java code would not have to change. Having to change it is one thing I absolutely hate about ASP.NET.)

I borrow nothing from PHP.

ASP.NET pretty much does nothing that I like, other than it's easy to keep track of what code gets run when you go to /default.aspx: the code in /default.aspx.cs, whatever Page class that inherits from, and whatever master page it's on. In JavaServer Faces you're scrounging through XML files to see which session bean got named "mybean".

My platform is similar to ASP.NET in that for /index.html there's a /site/pages/index.js (have I mentioned that it's built on Node.js?) that can optionally exist and can implement one or two functions, "load" and "handlePost", if your page is so inclined to handle posts. Another option is to have this file exist, implement neither load nor handlePost, and just have properties in it. It's up to you. Well, me.
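
As a rough sketch of how a platform could drive such a page file: the runPage dispatcher below is entirely hypothetical, and it assumes handlePost takes the same (site, query, finishedCallback) arguments as load, which the post does not actually say.

```javascript
// Hypothetical dispatcher: not the platform's real code, just one way it could work.
function runPage(page, method, site, query, done) {
  var handler = method === "POST" ? page.handlePost : page.load;
  if (typeof handler !== "function") {
    // Properties-only page (or no handler for this method): nothing to run.
    return done({});
  }
  handler.call(page, site, query, function (result) {
    done(result || {});
  });
}

// A page that implements load but not handlePost.
var indexPage = {
  title: "Home",
  load: function (site, query, finishedCallback) {
    this.greeting = "Hello from " + site.hostUrl;
    finishedCallback();
  }
};

var loadResult = null;
runPage(indexPage, "GET", { hostUrl: "example.com" }, {}, function (result) {
  loadResult = result;
});
```

Calling the handler with the page as "this" keeps the properties the page sets on itself available to the template afterwards.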

Here's a sample sitemap page for generating a Google Sitemap xml file:

Html:

<?xml version="1.0" encoding="UTF-8"?>

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <url>
        <loc>http://${config.hostUrl}/index</loc>
        <lastmod>2011-06-16</lastmod>
        <changefreq>monthly</changefreq>
        <priority>0.5</priority>
    </url>
    <jsn:foreach value="${page.entries}" var="entry">
        <url>
            <loc>${entry.loc}</loc>
            <lastmod>${entry.lastmod}</lastmod>
            <changefreq>${entry.changefreq}</changefreq>
            <priority>${entry.priority}</priority>
        </url>
    </jsn:foreach>
</urlset>


I use the jsn prefix, which just stands for (now, anyway) Javascript Node. I wasn't creative. I guess I can call it "Jason's Site N..." I can't think of an N.

And the javascript:

var date = require("dates"), common = require("../common");

this.entries = [];

this.load = function(site, query, finishedCallback){
    var self = this;
    var now = new Date();
    // Note: despite the name, this is midnight at the start of the current day.
    var yesterday = new Date(now.getFullYear(), now.getMonth(), now.getDate());
    var yesterdayFormat = date.formatDate("YYYY-MM-dd", yesterday);
    // populateCities calls back with the states, each carrying its cities.
    common.populateCities(site.db, function(states){
        for (var i = 0; i < states.length; i++){
            states[i].cities.forEach(function(city){
                // One sitemap entry per city page.
                var entry = {
                    loc: "http://" + site.hostUrl + "/metro/" + city.state.toLowerCase() + "/" + city.key,
                    lastmod: yesterdayFormat,
                    changefreq: "daily",
                    priority: "1"
                };
                self.entries.push(entry);
            });
        }
        // Tell the platform the page is ready and should be served as XML.
        finishedCallback({contentType: "text/xml"});
    });
};


My finishedCallback function can take more parameters. Say, for handling a JSON request, I could pass {contentType: "text/plain", content: JSON.stringify(obj)}.
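
A small sketch of what a JSON-style page could look like under that convention. The page object, the query.city parameter, and the payload are made up for illustration; only the shape of the finishedCallback argument comes from the description above.

```javascript
// Hypothetical page object standing in for a /site/pages/*.js file.
var page = {};

page.load = function (site, query, finishedCallback) {
  var obj = { city: query.city, ok: true };   // made-up payload
  // Hand the serialized object back instead of rendering a template.
  finishedCallback({ contentType: "text/plain", content: JSON.stringify(obj) });
};

var response = null;
page.load({}, { city: "denver" }, function (result) {
  response = result;
});
// response.content is '{"city":"denver","ok":true}'
```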

That's about all there is to it! It's pretty easy to work with so far :) My site will launch soon!

The Non-Blocking Nature of Node.js

This can lead to some pretty sweet code. For one thing, always add a callback as a parameter to functions you create, to keep with the non-blocking nature. The next thing you need to know is that you will back yourself into a corner!

Take the following code:

collection.find(search, {sort: sort}, function(err, cursor){
    cursor.toArray(function(err, messages){
        for (var i = 0; i < messages.length; i++){
            db.dereference(messages[i].from, function(err, result){
                messages[i].from_deref = result;
            });
        }
        callback(messages);
    });
});


Backstory: I'm using MongoDB as the backend, I have a message collection, a user collection, and messages have a "from" property that is a DBRef to a user.

If you run this code with at least one message, you will probably find that "from_deref" is missing on every message, which means the callback at the end was called before the dereferences finished. That is if you're lucky enough to not get an error stating that the code "can't set the property from_deref of undefined", which means that by the time the callback for db.dereference runs, the loop has finished and "i" is already equal to the length of the array, one past the last message. If it's not obvious, I'm dereferencing the user's DBRef and storing it in the message's from_deref property.
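
The effect is easy to reproduce without a database. In this sketch, fakeDereference and flush are made-up stand-ins: callbacks are queued and only run when flush() is called, which mimics the event loop running them after the current code has finished.

```javascript
// Stand-in for the event loop: callbacks are queued and only run on flush().
var queue = [];
function fakeDereference(ref, cb) {
  queue.push(function () { cb(null, "user:" + ref); });
}
function flush() {
  while (queue.length) queue.shift()();
}

var messages = [{ from: "a" }, { from: "b" }];
var seenIndexes = [];

for (var i = 0; i < messages.length; i++) {
  fakeDereference(messages[i].from, function (err, result) {
    // "i" is shared by every callback, and the loop has already finished.
    seenIndexes.push(i);
  });
}

// This is what callback(messages) would see: nothing has been dereferenced yet.
var atCallbackTime = messages.map(function (m) { return m.from_deref; });

flush();
// seenIndexes is [2, 2]: both callbacks see i === messages.length
```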

This is because of the non-blocking nature of Node.js. It's interesting because it makes me think in new ways. Anything that makes you think differently is good in my opinion. So how do we accomplish this and not break anything? Consider the following code as a solution:

collection.find(search, {sort: sort}, function(err, cursor){
    cursor.toArray(function(err, messages){
        // Number of dereferences still in flight.
        var remaining = messages.length;
        for (var i = 0; i < messages.length; i++){
            (function(messages, index){
                db.dereference(messages[index].from, function(err, result){
                    messages[index].from_deref = result;

                    // Callbacks may complete out of order, so count them down.
                    if (--remaining == 0)
                        callback(messages);
                });
            })(messages, i);
        }

        if (messages.length == 0) callback(messages);
    });
});


Javascript is awesome. This is basically an anonymous function that I define and call in the same block. The definition is everything inside (function(x,y){}) and the call is in the parentheses following: (messages, i); So this calls the inner block with the value of i that I'm hoping it will (or rather than hoping, I'm confident it will!). The dereferences can come back in any order, so instead of checking for the last index I count down: the remaining variable starts at the number of messages, and when it hits zero I know every dereference is done.
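
Here is a self-contained sketch of the same idea with a countdown of outstanding callbacks, which holds up even when the dereferences complete out of order. fakeDereference, flushReversed and derefAll are made-up stand-ins for illustration; forEach gives each callback its own message in the same way the define-and-call function does.

```javascript
// Stand-in for db.dereference: callbacks are queued instead of run immediately.
var queue = [];
function fakeDereference(ref, cb) {
  queue.push(function () { cb(null, "user:" + ref); });
}
// Run queued callbacks last-first to mimic out-of-order completion.
function flushReversed() {
  while (queue.length) queue.pop()();
}

function derefAll(messages, callback) {
  var remaining = messages.length;            // dereferences still in flight
  if (remaining === 0) return callback(messages);
  messages.forEach(function (message) {
    fakeDereference(message.from, function (err, result) {
      message.from_deref = result;
      // Only the callback that finishes last reports back.
      if (--remaining === 0) callback(messages);
    });
  });
}

var finished = null;
derefAll([{ from: "a" }, { from: "b" }, { from: "c" }], function (messages) {
  finished = messages.map(function (m) { return m.from_deref; });
});
flushReversed();
// finished is ["user:a", "user:b", "user:c"]
```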

Of course, this doesn't take advantage of the nextObject function on the cursor object in the node-mongodb-native library. That would totally solve this without javascript magic, one message at a time:

cursor.nextObject(function(err, message){
    db.dereference(message.from, function(err, result){
        message.from_deref = result;
    });
});
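
To walk the whole cursor this way, the call has to repeat until the cursor runs out. The sketch below uses a made-up fakeCursor and fakeDereference in place of the real driver, and it assumes nextObject hands back null once there are no more documents.

```javascript
// Hypothetical stand-ins for the driver's cursor and db.dereference.
function fakeCursor(docs) {
  var pos = 0;
  return {
    nextObject: function (cb) {
      cb(null, pos < docs.length ? docs[pos++] : null);
    }
  };
}
function fakeDereference(ref, cb) {
  cb(null, "user:" + ref);
}

// Pull one message at a time; only ask for the next one after the
// current dereference has finished, so no counter is needed.
function collect(cursor, callback) {
  var messages = [];
  (function next() {
    cursor.nextObject(function (err, message) {
      if (err || message === null) return callback(err, messages);
      fakeDereference(message.from, function (err, result) {
        message.from_deref = result;
        messages.push(message);
        next();
      });
    });
  })();
}

var collected = null;
collect(fakeCursor([{ from: "a" }, { from: "b" }]), function (err, messages) {
  collected = messages;
});
```

The trade-off is that the dereferences run one after another instead of all at once.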


However, I like the Array...

So there you have it.