In 2012 I never really expected to be having conversations about why storing static file data in a database within a web app is generally a bad idea, but i have. So, as i provided my reasoning, it became clear that i could blog about this too. Comments most welcome.
The goals of software development within a multi-tiered system
To me, the aims of software development within a multi-tiered system are to constantly simplify the system, plus reduce the overhead on the bottleneck parts within the system. Web apps in particular have so many interdependent parts, that it’s essential that a developer doesn’t add to the complexity. On top of this, as web apps generally have 2 bottlenecks (the application server and the database server), which both govern the overall throughput of requests through the entire system (impacting performance and cost – because we have to scale as these bottlenecks become blocked), it’s our aim to constantly take load off of these parts, to free them up to focus on the stuff that only they can do. When we do this, we gain immediate performance benefits, plus we bake scalability and stability into the design.
We’ve all seen this with the use of page and fragment caching. We use these techniques because we understand that serving content from the web-server and file system is super fast, plus easy to make faster.
Why storing static files in a database is against the goal
It’s these inherent rules and our experience that scare us when we hear about moving activity from non-bottleneck parts of the system (the filesystem and web server) to the bottleneck parts. It’s almost instinctive that we don’t even need to dig any deeper, because it just makes sense – if you start overloading the bottleneck parts of the system, you are working against the overall goal of the architecture – increase throughput without increasing costs. Also, when you take this step away from what’s normally considered the aim of the design, you end up in new territory. How will you resolve performance issues when they arise (and they will)? You can no longer use the techniques designed to take load off the bottlenecks, because you’re designed the system to overload the bottlenecks. It’s anti-performance by design.
So, when considering serving files to end user’s browsers, certain technologies have become the norm because they offer a risk free, high performance solution, and are known now to be the least constrained parts within the system (on average). Serving files from a file system via Apache or some other web server will always be the fastest and simplest solution, and the attention within the industry to make this faster is ongoing. The industry really cares about making this stuff fast. Also, serving content from the file system puts 0 overhead on the application, meaning that doing so frees up the system and reduces the need to add tin as more users upload and download files from your app. The beauty here is that it no longer really matters how many files have been uploaded to your web app, nor does it even matter much how many people want to concurrently download those files. Why? Because the file system and web server are not system bottlenecks; they really can handle the traffic and both are easy to scale if you need to. Now imagine moving those same files to the database, with each download streamed via the application server (because every byte would need to be streamed through the application). What do you think will happen to the bottlenecks? What will happen to overall performance as more users upload and download files? All of a sudden, you have landed in a really challenging place, where the only really obvious solution is to add tin (be it virtual or physical). Your database would need to be scaled (increasing costs), as would your application server (further increasing costs). Plus, unless you have bottomless pockets, you’ve effectively baked in a ceiling into your app. At some point, you are going to have to start bouncing requests, because the cost of serving them is going to be too high.
So, are there any situations when storing files in database on a web app make sense? The only one i can find in google is when you’re uploading images and you need absolute integrity of the metadata. Storing everything in the database is the only way to achieve this, because there can be some discrepancy between file system metadata and the data of the linked file in the database (such as the created time stamp).
Things to consider if you choose to store static files in postgres
Other than this case, it just doesn’t make sense to put strain on the bottleneck parts of the system.
On top of this, when storing file data in a postgres database, you have a few other things to consider:
- When using bytea columns, you need to be careful about storing “large” files.
A “large” file means one that exceeds the amount of memory you have in the database. This is caused by the fact that you have to assemble the whole SQL INSERT command with all the file data, which may consume a lot of memory (and the need to escape the BYTEA data makes this even worse). For example in PHP (because that’s the example i have to hand), there is a memory_limit option, usually set to 8MB (which is quite reasonable size), and you’ll exceed this limit with a 2MB file (you have to escape the data before using it in INSERT). You may increase the memory_limit value a little, but on shared server the admin won’t be very happy about that (and I completely understand his reasons).
- When using blobs, you need to be careful about blob integrity
You may delete a BLOB that is still referenced from a table. You may create AFTER UPDATE and AFTER DELETE triggers to remove orphaned BLOBs, or use “lo” contrib package. Anyway none of these solutions enforces the referential integrity.
All in all, I feel that a system designer needs to focus solidly on request throughput within a system. I don’t believe in early system optimisations, because such optimisations can’t be considered optimisations at all until you know what your bottlenecks are. Instead, I feel that following a few simple rules are the way to go:
- Keep the design simple – if it *feels* wrong and less-than-simple, it probably is.
- Use the “best tool for the job”. To enable access to uploaded files, file systems and webservers are simply great at serving static content and reduce overhead on your application and database. It just “makes sense”.