Archive for the ‘Internet’ category

Designing web application systems that ease constraints and increase request throughput

March 8th, 2012

In 2012 I never really expected to be having conversations about why storing static file data in a database within a web app is generally a bad idea, but i have. So, as i provided my reasoning, it became clear that i could blog about this too. Comments most welcome.

The goals of software development within a multi-tiered system

To me, the aims of software development within a multi-tiered system are to constantly simplify the system, plus reduce the overhead on the bottleneck parts within the system. Web apps in particular have so many interdependent parts, that it’s essential that a developer doesn’t add to the complexity. On top of this, as web apps generally have 2 bottlenecks (the application server and the database server), which both govern the overall throughput of requests through the entire system (impacting performance and cost – because we have to scale as these bottlenecks become blocked), it’s our aim to constantly take load off of these parts, to free them up to focus on the stuff that only they can do. When we do this, we gain immediate performance benefits, plus we bake scalability and stability into the design.

We’ve all seen this with the use of page and fragment caching. We use these techniques because we understand that serving content from the web-server and file system is super fast, plus easy to make faster.

Why storing static files in a database is against the goal

It’s these inherent rules and our experience that scare us when we hear about moving activity from non-bottleneck parts of the system (the filesystem and web server) to the bottleneck parts. It’s almost instinctive that we don’t even need to dig any deeper, because it just makes sense – if you start overloading the bottleneck parts of the system, you are working against the overall goal of the architecture – increase throughput without increasing costs. Also, when you take this step away from what’s normally considered the aim of the design, you end up in new territory. How will you resolve performance issues when they arise (and they will)? You can no longer use the techniques designed to take load off the bottlenecks, because you’re designed the system to overload the bottlenecks. It’s anti-performance by design.

So, when considering serving files to end user’s browsers, certain technologies have become the norm because they offer a risk free, high performance solution, and are known now to be the least constrained parts within the system (on average). Serving files from a file system via Apache or some other web server will always be the fastest and simplest solution, and the attention within the industry to make this faster is ongoing. The industry really cares about making this stuff fast. Also, serving content from the file system puts 0 overhead on the application, meaning that doing so frees up the system and reduces the need to add tin as more users upload and download files from your app. The beauty here is that it no longer really matters how many files have been uploaded to your web app, nor does it even matter much how many people want to concurrently download those files. Why? Because the file system and web server are not system bottlenecks; they really can handle the traffic and both are easy to scale if you need to. Now imagine moving those same files to the database, with each download streamed via the application server (because every byte would need to be streamed through the application). What do you think will happen to the bottlenecks? What will happen to overall performance as more users upload and download files? All of a sudden, you have landed in a really challenging place, where the only really obvious solution is to add tin (be it virtual or physical). Your database would need to be scaled (increasing costs), as would your application server (further increasing costs). Plus, unless you have bottomless pockets, you’ve effectively baked in a ceiling into your app. At some point, you are going to have to start bouncing requests, because the cost of serving them is going to be too high.

So, are there any situations when storing files in database on a web app make sense? The only one i can find in google is when you’re uploading images and you need absolute integrity of the metadata. Storing everything in the database is the only way to achieve this, because there can be some discrepancy between file system metadata and the data of the linked file in the database (such as the created time stamp).

Things to consider if you choose to store static files in postgres

Other than this case, it just doesn’t make sense to put strain on the bottleneck parts of the system.

On top of this, when storing file data in a postgres database, you have a few other things to consider:

  • When using bytea columns, you need to be careful about storing “large” files.
    A “large” file means one that exceeds the amount of memory you have in the database. This is caused by the fact that you have to assemble the whole SQL INSERT command with all the file data, which may consume a lot of memory (and the need to escape the BYTEA data makes this even worse). For example in PHP (because that’s the example i have to hand), there is a memory_limit option, usually set to 8MB (which is quite reasonable size), and you’ll exceed this limit with a 2MB file (you have to escape the data before using it in INSERT). You may increase the memory_limit value a little, but on shared server the admin won’t be very happy about that (and I completely understand his reasons).
  • When using blobs, you need to be careful about blob integrity
    You may delete a BLOB that is still referenced from a table. You may create AFTER UPDATE and AFTER DELETE triggers to remove orphaned BLOBs, or use “lo” contrib package. Anyway none of these solutions enforces the referential integrity.

Summary

All in all, I feel that a system designer needs to focus solidly on request throughput within a system. I don’t believe in early system optimisations, because such optimisations can’t be considered optimisations at all until you know what your bottlenecks are. Instead, I feel that following a few simple rules are the way to go:

  • Keep the design simple – if it *feels* wrong and less-than-simple, it probably is.
  • Use the “best tool for the job”. To enable access to uploaded files, file systems and webservers are simply great at serving static content and reduce overhead on your application and database. It just “makes sense”.

Using Cucumber, Webrat and Selenium to test ajax form field validations

September 4th, 2009

I have an app that fires an ajax request on a form field to validate its contents when I take focus off the field.

I am also using Cucumber, Webrat and Selenium for my integration tests.

I needed my integration tests to test the Ajax responses and the tests weren’t receiving a response from the Ajax requests.

The Problem

I found that by simply completing the web form, the ajax request was not being fired and my test was therefore failing when i checked for the existance of the Ajax response. It soon became clear that selenium doesn’t really interact with the form in the sense of selecting fields and entering values; It simply enters values. As such, the Ajax request was not firing and my test was failing.

The Solution

The solution is Selenium’s fireEvent method, which you can pass a form field id and the blur method:

selenium_session.fireEvent(”field”, “blur”);

In Webrat, this is ever simpler:

fire_event(”field”,”blur”)

On using this in a Cucumber step, the ajax is fired as the blur command tells the browser to take focus off the field.

Lovely!

Awesome tools for rapid UX prototypes – Letting you focus on the solution!

June 18th, 2009

Comic Life is great for creating story flows in a rough and ready way, with a little style.

Balsamiq is an excellent tool for rapidly creating purposefully low-fi wireframe mockups

Napkee enables you to import Balsamiq mockups and turn them into HTML prototypes! Lovely!

Axure is excellent for rapidly creating interactive prototypes.

Liferay Portal is a pretty awesome portlet container that, with a bit of UX (HTML, CSS and JSP) hacking, enables you to rapidly produce fully functional portals. It comes with a vast array of portlets out of the box, saving you a whole load of time.

JQueryUI is a lovely toolkit for quickly developing interactive prototypes. I’m not completely convinced by it as a production tool (heavy JS? but i could be wrong), but excellent for prototyping

iPhone 3.0 Beta 4 Bluetooth tethering results: Less than 1/3 the speed of USB

April 29th, 2009

So, it’s rather slower – 96kbps.

To test it, i tried the ThinkBroadband speed tester, (that i successfully used to test USB tethering), but it failed to complete the test.

I then tried the Broadband speed tester site and got the above results, as well as a 74kbps upload speed.

When (if!) i find time, i’ll go digging and find out if there’s an obvious reason for this – it’s clearly shouldn’t be bottlenecked by the performance of BlueTooth, as it should run up to 3 Mbit/s (http://en.wikipedia.org/wiki/Bluetooth)

iPhone 3.0 Beta 4 + iTunes 8.2 Pre-release = USB Tethering Active Again :)

April 29th, 2009

Yep, USB tethering died in Betas 2 and 3, but with iTunes 8.2 Pre-release and Beta 4, all is working again and it feels wonderfully fast. Well, it’s only running at 336.58 Kbps, but it feels great! As a note, i think the fix was actually iTunes rather than iPhone Beta 4.

Lovely :)

Now to test the Bluetooth speeds