Thursday, October 7, 2010

10+ ways to screw up your database design

Database developers are people too — you make mistakes just like everyone else. The problem is, users are the ones who pay for those mistakes when they have to work with slow, clunky applications. If you’re part of your company’s IT team, a mistake could cost you your job. If you’re a freelance developer, a bad database usually means a lost client — and even more lost business, if word gets around. Here are some common design pitfalls to watch out for.

1: The data’s unimportant; it’s the architecture that matters

It doesn’t matter if your code sings, if you don’t know the data. You want the two to work in harmony and that means spending some time with the people who use and manipulate all that data. This is probably the most important rule: Before you do anything, you absolutely must get intimate with the data. Without that firsthand knowledge, you might make errors in judgment that have far-reaching consequences — like dragging the whole application to a complete halt.

A nonchalant attitude that the data isn’t important isn’t a sign of laziness. It’s a mistaken perception that anything that doesn’t work quite right early on can be fixed later. I just don’t agree. Doing it right from the bottom up will produce a foundation that can grow and accommodate change quickly. Without that foundation, any database is just a few Band-Aids away from disaster.

2: I can do anything with a little code

Some developers are so skilled that they can make just about anything happen with a bit of code. But you can take a good thing too far. One of my biggest complaints about the developers’ psyche is that they want to solve everything with code, even when a system feature exists to handle the need. They claim it’s just easier — easier for them, maybe, but not necessarily easier for those maintaining the database. My recommendation is to use the built-in features, when available, unless you don’t know the RDBMS well enough. If that’s the case, see #3.

3: I can use whatever RDBMS you have

This brings me to the next point: Developers who think the system is unimportant because their coding ability is the only magic they need. Wrong! Unless your hands are tied, choose the best system for the job. You’ll save your client time and money and build a reputation as an honest and comprehensive developer. You might not have a choice, of course. If you find yourself stuck with a system you’re not familiar with or that’s just not right for the job, you might want to excuse yourself from the project. You’re going to take the fall for that decision eventually, even if you didn’t make it.

4: That doesn’t need an index

Very little affects performance like failing to apply an index or applying an index incorrectly. It isn’t rocket science, and there are guidelines that help. But many developers still avoid the task altogether. Without proper indexing, your database will eventually slow down and irritate users. Perhaps the only thing that causes as much trouble as no indexing is too much indexing.
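The effect is easy to see for yourself. Here's a minimal sketch using SQLite from Python (the table and data are invented for illustration): the same query goes from a full table scan to an index search once an index exists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
conn.executemany("INSERT INTO orders (customer_id, total) VALUES (?, ?)",
                 [(i % 1000, i * 1.5) for i in range(10000)])

# Without an index, filtering on customer_id forces a scan of the whole table.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE customer_id = 42").fetchone()
print(plan[-1])  # the plan detail typically reports a SCAN of orders

# With an index, the engine can seek directly to the matching rows.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE customer_id = 42").fetchone()
print(plan[-1])  # now a SEARCH using idx_orders_customer
```

Whatever your engine, run its equivalent of EXPLAIN on your slowest queries before and after indexing; the plan tells you whether the index is actually being used.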

5: This database doesn’t require referential integrity

Enforcing referential integrity protects the validity of your data by eliminating orphans (foreign keys that have no related primary key entity). For instance, in a sales database, you might have an ordered item that doesn’t point to a customer — not a good idea. If your RDBMS supports referential integrity, I recommend that you use it.
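Turning enforcement on is usually a one-liner. A small sketch with a hypothetical customers/orders schema in SQLite (which, notably, requires `PRAGMA foreign_keys = ON` per connection before it will enforce anything):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id))""")

conn.execute("INSERT INTO customers (name) VALUES ('Acme')")
conn.execute("INSERT INTO orders (customer_id) VALUES (1)")  # fine: customer 1 exists

try:
    conn.execute("INSERT INTO orders (customer_id) VALUES (99)")  # no such customer
except sqlite3.IntegrityError as e:
    print("orphan rejected:", e)
```

The engine now refuses to create the orphan for you, no application code required.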

6: Natural keys are best

Relational database theory relies on keys, primary and foreign. Natural keys are based on data, which of course has meaning within the context of the database’s purpose. But natural keys are largely obsolete now that systems can generate sequential values, known as surrogate keys. A surrogate key has little purpose beyond identifying the entity; it’s usually an auto-incrementing data type.

Natural versus surrogate keys is a hotly debated topic. Just bring it up in your favorite development list or forum, sit back, and watch the show. Here’s the nitty-gritty though:

* Natural keys can be unwieldy and awkward to maintain. It might take several columns to create a unique key for each record. It’s doable, but do you really want to accommodate that kind of structure if you don’t have to?
* Primary keys are supposed to be stable; nothing about data is stable. It changes all the time. In contrast, there’s no reason to ever change a surrogate key. You might delete one, but if you have to change a surrogate key, something’s wrong with your design.

The biggest argument for natural keys is one of association. Proponents insist that you need to be able to associate the key to the actual record. Why? Keys are used by the RDBMS, not users. The other most commonly heard argument is that surrogate keys allow duplicate records. My response is to apply indexes appropriately to avoid duplicate records.
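That approach is simple to sketch: pair the surrogate primary key with a UNIQUE constraint on the natural columns. A hypothetical SQLite example (the employees table is invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The surrogate key never changes; the UNIQUE constraint on the
# natural column still rejects duplicate records.
conn.execute("""CREATE TABLE employees (
    id INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
    ssn TEXT NOT NULL,
    hired DATE,
    UNIQUE (ssn))""")

conn.execute("INSERT INTO employees (ssn, hired) VALUES ('123-45-6789', '2010-01-04')")
try:
    conn.execute("INSERT INTO employees (ssn, hired) VALUES ('123-45-6789', '2010-02-01')")
except sqlite3.IntegrityError:
    print("duplicate natural data rejected; surrogate key untouched")
```

You get the stability of the surrogate and the duplicate protection of the natural data, without making the volatile data your primary key.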

I recommend surrogate keys — always, which is an invitation to hate mail, but it’s my recommendation just the same. I can think of no circumstance where a natural key would be preferable to a surrogate.

7: Normalization is a waste of time

Just writing that title hurts. Unfortunately, I do run into developers who don’t take normalization seriously enough. Normalization is the process of moving repeating groups and redundant data into related tables; it’s the process the whole relational model is built on. Without normalization, an RDBMS is doomed. Despite its importance, many developers make a cursory pass through the data and normalize very little, and that’s a mistake you should avoid. Take the time to break down your data, normalizing at least to 2nd or 3rd Normal Form.
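As a minimal illustration (the customer and order data are invented), moving the repeated customer columns into their own table means a change touches exactly one row instead of every order:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Unnormalized, the customer's name and city would repeat on every order row.
# Normalized, customer data lives in one place and orders reference it.
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY,
                     customer_id INTEGER REFERENCES customers(id),
                     total REAL);
INSERT INTO customers VALUES (1, 'Acme', 'Louisville');
INSERT INTO orders VALUES (1, 1, 19.95), (2, 1, 5.00);
""")

# The city changes in one row; every order sees the update through the join.
conn.execute("UPDATE customers SET city = 'Lexington' WHERE id = 1")
rows = conn.execute("""SELECT o.id, c.name, c.city, o.total
                       FROM orders o JOIN customers c ON c.id = o.customer_id""").fetchall()
print(rows)
```

In an unnormalized table, that same update would have to find and fix every copy of the city, and missing one leaves the data inconsistent.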

8: You can’t normalize enough

The previous point may seem to imply that normalization is the panacea of database design. But like code, too much of a good thing can slow things down. The more tables and joins involved in pulling data together into meaningful information, the slower the database will perform. Don’t overdo it — be thorough without being obsessed.

If your normalization scheme requires several tables to generate a common view, you’ve gone too far (probably). In short, if performance slows and there’s nothing wrong with the connection, the query, and so on, excessive normalization might be the culprit.

9: It’ll perform just as well with real data

Failing to test a database for scalability is a huge mistake. During the development stage, it’s acceptable to work with a scant amount of data. On the other hand, a few rows of test data just can’t provide a realistic view of how the database will perform in a production environment. Before going live, be sure to test your database with real data, and lots of it. Doing so will expose bottlenecks and vulnerabilities.
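Generating realistic volume doesn't have to be hard. A hypothetical sketch that loads 100,000 generated rows into SQLite and times a lookup (the accounts schema is invented):

```python
import random
import sqlite3
import string
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, email TEXT, balance REAL)")

def random_email():
    user = "".join(random.choices(string.ascii_lowercase, k=8))
    return f"{user}@example.com"

# Load production-sized data, not five hand-typed rows.
conn.executemany("INSERT INTO accounts (email, balance) VALUES (?, ?)",
                 ((random_email(), random.uniform(0, 1000)) for _ in range(100_000)))

start = time.perf_counter()
conn.execute("SELECT * FROM accounts WHERE email = ?", ("nobody@example.com",)).fetchall()
print(f"lookup over 100k rows took {time.perf_counter() - start:.4f}s")
```

Queries that felt instant against ten rows will show their true cost here, and the missing indexes from #4 will announce themselves.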

You can blame the database engine for choking on real data — nice work if you can get it (and the client believes you).

10: Only the most elegant code is good enough for my clients

This attitude is another example of how too much of a good thing can be bad. We all want to write the best code possible, but sometimes, good enough is, well, good enough. Time spent optimizing routines that already perform well and accurately can be money down the drain for your client. If the database runs great with a bit of ugly code, so what? Is the trade-off worth the extra time and money you’ll spend to optimize the code to its fullest? I’m betting your client would answer in the negative. I’m not saying write clunky code. Nor am I suggesting that you write code that performs poorly because doing so makes your job easier. I’m saying, don’t put your client’s money into optimizing something that works fine as is. Put that time into good design and a solid foundation — that’s what will support the best performance.

11: You can back it up later

If the data is important enough to store, it’s important enough to protect. Hardware breaks. Mistakes happen. A backup plan should be part of your development process, not an afterthought (“I meant to do that!”). How often should you back up the database, and where will you store those backups? Answer these questions up front, not after your client loses important data.
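Most engines offer an online backup facility, so there's little excuse. A minimal sketch using SQLite's backup API from Python (the invoices table is invented, and a real job would write dated files to separate storage rather than a second in-memory database):

```python
import sqlite3

source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE invoices (id INTEGER PRIMARY KEY, amount REAL)")
source.execute("INSERT INTO invoices (amount) VALUES (250.0)")
source.commit()

# An in-memory target keeps this sketch self-contained; in production you'd
# pass sqlite3.connect("backup-2010-10-07.db") and run this on a schedule.
backup = sqlite3.connect(":memory:")
source.backup(backup)  # online backup: safe while the source is in use

print(backup.execute("SELECT COUNT(*) FROM invoices").fetchone())
```

The point is that the copy is made by a scheduled script, not by someone remembering to do it.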

12: You promised that wouldn’t change

The client promised that a specific business rule would never change and you believed it. Never believe them! Don’t take the easy way out on this one; apply the best design and logic so that change is easy. The truth is, once users become accustomed to the database, they’ll want more — and that means change. It’s just about the only part of the whole development process you can depend on.

13: Yes, I can give you the moon

Some developers are so ambitious. Wanting to give users everything they want in the first version is a nice sentiment, but it’s also impractical. Unless the project is small with a specific focus, producing a foundation version that can go into production quickly is preferable. Users won’t get everything they asked for, but they’ll have a production database much sooner. You can add features in subsequent versions. The client gets a working database quickly and you get job security.

Friday, October 1, 2010

You're a bad Programmer. Embrace it!!!

How many developers think they're good programmers? We're not. We're all fairly bad at what we do. We can't remember all the methods our code needs to call, so we use autocompleting IDEs to remind us. And the languages we've spent years, in some cases decades, learning? We can't even type the syntax properly, so we have compilers that check and correct it for us.

Don't even get me started on the correctness of the logic in the code... I'm looking at a chart that shows failure rates for software projects. "Small" projects (about a year for 8 people) have a 46% success rate. That's almost half. Any longer project just plummets down to a 2% rate for projects costing 10 million or more. In other words, our projects nearly always fail.

We constantly write code that breaks existing features. We add features in ways that won't scale. We create features nobody wants.

In short, as an industry, we all stink.

You can fight this. You can summon your "inner lawyer", the voice in your head that always defends you and tells you how great you are. You can take the typical developer's attitude that "it's always someone else's fault".

Or you can embrace it. Admit it... humans aren't good at programming. It's a task that requires extraordinary complexity and detail from brains that don't work that way. It's not how we tick. We might enjoy programming, we might get "in the zone" and lose track of time, we might spend our nights and weekends writing code for fun... but after years of practice, we still need these crutches to prop us up.

Once you've embraced this bitter reality, you can start to move forward. Once you've admitted to yourself that you're a bad programmer, you can stop all the silly posturing and pretending that you're great, and you can look around and find the best possible tools to help you look smarter than you are.

If we know that any time we touch code we'll probably break something, then we should look for whatever catches functional breaks as quickly as possible. The answer: a good continuous integration system combined with a good automated test suite.

Creating a good test suite is a lot like working out though... we all know we should, but we rarely take the time to create a solid, workable test automation strategy. Something like Defect Driven Testing is a great place to start.
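Defect-driven testing is simple to put into practice: every time a bug surfaces, a regression test goes into the suite alongside the fix. A hypothetical Python sketch (the function and the bug are invented for illustration):

```python
# Hypothetical production bug: average() crashed on an empty list.
def average(values):
    if not values:  # the guard that closed the defect
        return 0.0
    return sum(values) / len(values)

# Defect-driven testing: when a bug is found, write the failing test first,
# then fix the code, and keep the test forever as a regression guard.
def test_defect_empty_input():
    assert average([]) == 0.0

def test_average_normal_case():
    assert average([2, 4]) == 3.0

test_defect_empty_input()
test_average_normal_case()
print("regression suite passed")
```

You don't need perfect coverage on day one; the suite grows exactly where your code has already proven itself fragile.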

If you recognize that we all write the same fundamentally bad code that everyone else does, then you can look around at static code analysis tools. If you're writing Java, I highly recommend FindBugs.

These tools flag common mistakes... really common mistakes. Mistakes you can't believe you actually make, but somehow we make them nearly every day. I've met many development teams who didn't think they needed static code analysis. But I've never run a tool like FindBugs without finding a real (read: not theoretical) problem in production code. They hadn't learned to embrace their badness yet.

If you don't know what code is being exercised by your manual or automated test cases, look at a code coverage tool. Cobertura is my favorite, but I'm a bit biased as I had a hand in getting it started. (Why isn't it listed on that Wikipedia page??)

If your team is constantly getting interrupted and pulled in different directions, try tackling smaller units of work. Time-boxed iterations, especially the one-week iteration, force you to complete small pieces before moving on. It's easier to push back on an "emergency" change if you can ask the requester to wait only two or three days. If you're on a two-month task, they won't wait.

Admit that you're a sorry sysadmin as well and script your deployments. Tools like rake and Capistrano can completely automate your development, testing, and even production deployments. Once a deployment is scripted, it becomes incredibly easy to duplicate. You'll be amazed at how much time this one will save you.

Of course, if you've automated your code deployments, it'd be silly to keep typing in SQL commands by hand. Ruby on Rails' database migrations may have been the early leader in this area, but there are plenty of other tools available today. One prominent tool is Liquibase.
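The core idea behind all of these tools fits in a few lines: number your schema changes and record which ones have already run. A hypothetical miniature version in Python with SQLite (real tools add transactions, rollbacks, and checksums on top of this):

```python
import sqlite3

# Each migration is a (version, SQL) pair; a schema_version table
# records what has already been applied to this database.
MIGRATIONS = [
    (1, "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)"),
    (2, "ALTER TABLE customers ADD COLUMN city TEXT"),
]

def migrate(conn):
    conn.execute("CREATE TABLE IF NOT EXISTS schema_version (version INTEGER PRIMARY KEY)")
    applied = {v for (v,) in conn.execute("SELECT version FROM schema_version")}
    for version, sql in MIGRATIONS:
        if version not in applied:
            conn.execute(sql)
            conn.execute("INSERT INTO schema_version (version) VALUES (?)", (version,))
    conn.commit()

conn = sqlite3.connect(":memory:")
migrate(conn)
migrate(conn)  # safe to run twice: already-applied migrations are skipped
print(conn.execute("SELECT MAX(version) FROM schema_version").fetchone())
```

Because the runner is idempotent, the same script brings a fresh development database, a test database, and production up to the same schema version.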

How about customer interaction? Yes, we're bad at that too. (Ever used the phrase "stupid users"?) Try moving your specifications into a more precise, and even executable format. DSLs (Domain Specific Languages) provide a wide variety of ways to engage a customer with a precise, but understandable, language.

There are a great many tools we can use to make ourselves look smarter than we are. Once we stop fighting against the need for the tools, and embrace them, we start looking like we're pretty good developers. I won't tell anyone you stink if you don't tell them about me. ;)