Friday, May 16, 2008

The Myth of the Activist Judge and the Strict Constructionist

Pretty much every time a major court makes a controversial ruling, whatever segment of the population doesn't like the outcome inevitably decries it as the result of "activist" judges. The activist judge is then generally contrasted with something like the "strict constructionist" judge who, in their opinion, interprets the law more strictly and doesn't read their own opinions into it. As expected, the old hue and cry went up again over yesterday's ruling on same-sex marriage by the California State Supreme Court.

The "activist judge" label is supposed to imply that a judge has injected their personal opinions into the matter and made their decision based on those opinions rather than on rigorous legal interpretation; in truth, though, I can't remember the last time I heard a legitimate criticism of a judicial ruling based on the legal facts. Instead, what I often hear are slogans like "judges shouldn't rewrite the law," as if the judge's job were simply to sign off on whatever the legislature passed, regardless of its constitutionality. In practice, the "activist" label is applied to any judge that ever overturns any law, making it a totally meaningless accusation. A judge's job is to interpret the law and decide how it applies or whether it's constitutional; saying that judges should never overturn laws is tantamount to ripping up the constitution, since at that point it's basically useless. But as far as I can tell, that's exactly what people who complain about activist judges seem to want: a quiet judiciary that allows the legislature (or executive) to do as they please with no checks on their power.

Similarly, the "strict constructionist" label is generally applied admiringly to judges that supposedly interpret laws "strictly." The problem is, it's impossible to define what that means. Certainly there are differences in how judges go about interpreting laws, and it's reasonable to prefer one type of interpretation over another, but what often seems to be attacked is the very act of interpretation itself. That, in turn, seems to stem from some misguided view that language carries its own inherent meaning that requires no further interpretation. The truth is that all communication requires interpretation on the part of the receiver, and that goes for reading legal documents as well as for normal conversations. A judge can't help but apply some sort of interpretation to the words in the legal text, so arguing against doing that is simply idiotic.

Sunday, April 13, 2008

Language Comparison Series

I needed something to do at the last SuperHappyDevHouse event, so I decided to start learning Python. I need something to use for some projects outside of work, and I've wanted to check it out for a while since so many people seem so committed to it, so this seemed like as good a time as any. I've read through the pickaxe book and done a few small projects in Ruby at work, but that was a while back and I've now forgotten most of it, so I figured it was worth trying something different. So I've been working through Dive Into Python, installed andLinux so I have a better dev environment locally (though we'll see how that goes; I may try VMware, create a real Linux partition, or just switch back to using the Windows toolchain, as painful as it is), and now I'm working through the Django tutorial.

Of course, part of my motivation is also that we have our own programming language, currently called GScript, and it's good to learn about other languages so you can see how yours compares and so you can borrow the best ideas that are out there. The same applies to frameworks too, which is one reason I'm going to try out Django; we have our own ORM layer and web layer, so it's good to see what cool directions other people have gone with those.

Honestly, it's not worth it to me to check out any Java frameworks in those regards: everyone uses Hibernate, so I really should learn more about it, but Java just doesn't lend itself to great frameworks due to its lack of metaprogramming. No matter how clever you are, that's a pretty fatal flaw in my opinion; the GScript interface to our ORM layer is all dynamic even though GScript is strongly-typed, thanks to the magic of our open type system, but on the Java side we have to do massive amounts of code generation that should be totally unnecessary. Thankfully, pretty much our entire interface to the web layer is through GScript, so we don't have to do any Java code generation there. There are a lot of reasons why I think our web framework is better than anything else out there for the sort of work that we do, but the open type system in GScript really makes it untouchable. It would take people some time to understand what we've done with our latest "smoke tests," but if they really understood what we were doing it would blow their minds a bit, since it really is a total game-changer as far as our ability to reliably test our entire application from the UI on down.

But I digress . . . my point here is that I've started putting up a series of articles on the Guidewire Development Blog comparing GScript with Java, Ruby, and Python. I chose those because they're the languages I know best (or at least are freshest in my head) and because those are, for a lot of people, the main choices for a modern web development platform. I didn't include PHP because . . . well, because PHP code is always eye-bleedingly ugly. I'd certainly consider it for web development due to its sheer practicality, scalability, and rapid development model, but to a language designer it should really serve as a cautionary tale of how not to do things.

So if you want to see how GScript stacks up, head on over to the dev blog and check it out. The first two are up, and I'll probably be adding one or two comparisons a week.

Sunday, April 06, 2008

Why Java Needs Closures

(Note: This has been cross-posted to the Guidewire Development Blog)

There was a post by Bruce Eckel on Artima this week asking whether closures would make Java less verbose, so it seemed like an apt time to put this up.

People tout closures as cure-alls for all sorts of things: because they like functional programming, because closures are better suited to certain tasks, or maybe because they want to use them for control structures (so they don’t forget to close a file after reading it, say). Honestly, I’m just sick of writing this same block of code over and over again:

Map<User, List<Claim>> claimsByUser = new HashMap<User, List<Claim>>();

for (Claim claim : someListOfClaims) {
    List<Claim> claims = claimsByUser.get(claim.getUser());
    if (claims == null) {
        claims = new ArrayList<Claim>();
        claimsByUser.put(claim.getUser(), claims);
    }
    claims.add(claim);
}

I’d much rather write something like:

var claimsByUser = someListOfClaims.partition(\claim-> claim.User)

Sometimes I feel like all my Java code devolves into a morass of for loops and angle brackets (I like generics, mostly, but without type inference they’re exceedingly painful). Even worse, it’s the same few for loops over and over again. You can write helper classes and methods to do things, and you can simulate closures using anonymous inner classes, but even then your code looks like:

Map<User, List<Claim>> claimsByUser = ListUtils.partition(someListOfClaims, new Partitioner<Claim, User>() {
    public User partition(Claim c) {
        return c.getUser();
    }
});

Hardly the most elegant code on the planet. If you had to read or modify the code, hopefully it’s obvious which one is easier to understand and easier to change.

There have been multiple closure proposals floated for inclusion in Java 7, and the one that looks like it’ll make it in is, like the generics implementation, a bit too complicated in the wrong ways. The arguments over exception and return handling within closures, along with debates about the scoping rules, have not (in my opinion) ended well, mainly (from what I can tell) due to a desire to be able to use closures to allow programmers to create new language-level control constructs, like the for loop added in Java 1.5. From my viewpoint, they’re silly questions to ask in the first place: a closure is a function, so a return statement within a closure just returns from that function. Returning from the containing scope just seems like madness and an invitation to serious confusion. Yes, it lets you define new control structures, and yes, new control structures can also simplify code, but I really think they should be two completely different features. Keep closures simple and understandable, with simple, consistent rules about scoping, return values, and exception handling. If people still demand better control structures, then make that a separate effort and implement those cleanly and simply.

While nothing’s been officially decided yet, I’m guessing that the two most likely outcomes are either 1) a confusing, over-engineered closures implementation or 2) no closures at all. And that’s just unfortunate; even some simple form of closures allows for a major simplification of routine data structure manipulations that both reduces code and makes code more clear. And of course, closures are far more concise and usable if you add type inference in as well, but that’s another post. Just yet another reason why the world needs a language like GScript.

The Challenge of Configurability

(Note: This has been cross-posted to the Guidewire Development Blog).

Every kind of software development has its own unique set of challenges: embedded software, desktop software, games, operating systems, and web applications all have their own difficulties and problems. Enterprise software is no different, and one of the main challenges there is making that software configurable.

Of course, not all enterprise software vendors try to do this; some of them go the route of essentially requiring you to fit your business to how their software works, while others basically bring in a small army of consultants to do a complete custom build based loosely on some pre-existing components that they have. One of the things that sets Guidewire apart from our competitors in the insurance software market, and (I’d argue) from most enterprise vendors in general, is the level of configurability that we provide with our software. And after having worked on configurable enterprise systems for close to six years now, I can tell you with full confidence that the reason no one else does it to the degree that we do is that it’s really, really hard.

Why is it so hard? For one, it turns everything into a meta-problem. Think about the problem of assigning claims to users. If you were just building out a system for in-house use, you’d probably just hardcode the logic in Java (or whatever language you’re using): assign glass claims to the group with this name, injury claims to that group, etc. In our system, that’s implemented via a ruleset, where we call out to business rules that are set up to do that assignment. That means we have to think about where those callouts should occur, what the API should be for them, and what sorts of methods the person writing the business rules will need available: maybe they need to do round-robin assignments, maybe they need to do location-based assignments, or maybe they want to balance assignments based on adjuster workloads. Instead of coming across those problems and solving them as we need to, we have to think about what the API should be up front and try to provide the right amount of flexibility for our customers.
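To make that concrete, here's a minimal, hypothetical sketch of what such a callout point might look like. The names here (AssignmentRule, AssignmentEngine, lossType, the group names) are illustrative inventions for this post, not Guidewire's actual API; the point is just that the framework has to define the extension point up front, while the actual assignment logic gets plugged in later:

```java
import java.util.Arrays;
import java.util.List;

public class AssignmentEngine {
    // The extension point the framework must design up front.
    public interface AssignmentRule {
        boolean matches(Claim claim);   // does this rule apply to the claim?
        String assignTo(Claim claim);   // which group should get it?
    }

    // A bare-bones stand-in for a real claim object.
    public static class Claim {
        final String lossType;
        public Claim(String lossType) { this.lossType = lossType; }
    }

    private final List<AssignmentRule> rules;

    public AssignmentEngine(List<AssignmentRule> rules) {
        this.rules = rules;
    }

    // Run the first matching rule; fall back to a default queue.
    public String assign(Claim claim) {
        for (AssignmentRule rule : rules) {
            if (rule.matches(claim)) {
                return rule.assignTo(claim);
            }
        }
        return "unassigned-queue";
    }

    public static void main(String[] args) {
        // One customer's rule: route glass claims to a dedicated team.
        AssignmentRule glassRule = new AssignmentRule() {
            public boolean matches(Claim c) { return "glass".equals(c.lossType); }
            public String assignTo(Claim c) { return "glass-team"; }
        };
        AssignmentEngine engine = new AssignmentEngine(Arrays.asList(glassRule));
        System.out.println(engine.assign(new Claim("glass")));
        System.out.println(engine.assign(new Claim("injury")));
    }
}
```

The hard design questions all live in the interface: what does a rule get to see, what can it return, and what helper methods (round-robin, location lookup, workload balancing) need to exist before any customer has asked for them.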

Programming that way is just harder than normal code because you’re working at another level of abstraction; instead of writing an expression like 1 + 1 in the language, you’re writing the language itself (numeric literals, additive expressions, etc.). The code for implementing a language is rarely straightforward when compared to the code written in that language.

There are other challenges too. We need good tools to ease the configuration work and help discover problems. Testing an entire framework is harder than testing code in that framework, both because of the combinatorial explosion of configuration options and because the surface area of our API is much larger than what our default application configuration actually uses. Upgrade is another challenge we always have to think about, as we don’t always know what sorts of configurations customers have performed and have to be careful to keep those configurations working as much as we can and/or to provide upgrade tools when we change the frameworks. Getting feedback from customers is more difficult than it otherwise would be, since customers have to configure the application a non-trivial amount before they can realistically use the features we want feedback on, meaning they’re less likely to give us that feedback and that it takes longer to get it. And lastly, there’s constant pressure to release early versions of the software so that customer projects can get underway with their configuration and integration work, but that means customers will start configuring on partially-complete systems, which can force us to be locked into APIs (or at least provide an upgrade path) for things we haven’t fully completed and thus need the freedom to change.

If it’s so hard, you might be thinking, then why do it? The most obvious reason is that it better serves our customers’ needs, and as a software vendor that’s obviously our top priority. Relative to minimally configurable systems, highly configurable software results in a product that (once configured) more closely matches how our customers do business. Relative to custom-built systems, configurable systems can be implemented faster and the end product can be owned and maintained by the customers rather than requiring vendor patches or consultants for any future changes. Even relative to in-house built systems, configurable software like ours will allow future changes to be made much faster and more safely.

The second reason is less obvious: forcing ourselves to think hard about configurability in many ways makes us write better, more flexible software. We’ve developed a huge array of tools (our scripting language, our metadata layer, our web framework, etc.) that would help us even if the end result was something that would never be configured by a customer. For example, our web framework, while built with customer configuration in mind, lets us build new pages, modify existing pages, and find and fix bugs far faster than we ever could with our old struts/jsp-based web framework. Just like doing test-driven development can often lead to better-decomposed code, making things configurable and flexible often results in better tools and better frameworks that benefit application development generally.

With great challenges come great rewards. What we’re doing might be incredibly difficult, but that’s why our systems are the best ones on the market.

Saturday, March 29, 2008

Avoiding Development Mercantilism

(Note: This has been cross-posted to the Guidewire Development Blog.)

Prior to the rise of the capitalist view of economics, the prevailing economic world view was known as mercantilism. Mercantilism was essentially based on the theory that trade was a zero-sum game; there was a limited amount of wealth (in the form of gold or silver or other bullion) in the world, and one nation having more of it inevitably meant that other nations had less of it. One of the revolutions of the capitalist view was in recognizing that wealth is not, in fact, a zero-sum game, and that the amount of wealth in the world can increase as a result of increases in productivity due to things like economies of scale and technological improvements. The way to become richer as a nation is not, in other words, merely to try to make sure that more wealth enters the nation than leaves it via trade imbalances; instead, you can and should look for ways to increase the total amount of wealth being produced.

That's perhaps an overly-simplistic summary of things, but the point here is not to debate economics. Rather, it's to point out a common mistake that many development organizations make: they view development output as a fixed commodity just as the mercantilists viewed the amount of wealth in the world as fixed. But as every developer knows, development output is in no way fixed, and is highly dependent on factors such as the toolset being used, the fit of the developer's skillset, experience, and temperament to a particular task, the developer's enthusiasm for the task, the current state of the code base (size, cleanliness, documentation), and the amount of organizational drag (in the form of meetings, reports, e-mails, etc.).

So what does it mean to be a development capitalist? The easier part, in my experience, is in matching people to the right tasks and in trying to reduce the amount of organizational overhead. Those, at least, are easy decisions to make, as they tend not to come with too much potential risk or up-front cost. The hard decisions, then, are around technical considerations: how much time do you spend building infrastructure and tools, and how much time do you spend trying to keep the code base clean and small? Both those efforts, and especially the effort to build infrastructure or tools, come with a potentially huge up-front cost and, at best, a speculative payout down the road. It's tempting, then, to simply see them as costs to be minimized. Doing so, however, can miss the huge potential productivity gains you can get down the line that will more than make up for any up-front investment.

As an example, consider what I spent half of my day doing on Friday: performance-tuning a couple of critical tests that I (and presumably everyone else on the team) run every time before checking in. The tests verify that all of our UI configuration files and GScript classes are error-free; as such they catch a ton of errors, and it's critical to keep them clean. I probably run the tests an average of about 10 times a day, and before I started I had to run two tests that, combined, took about 200 seconds to execute. The tests had some overlap, however, so I could tell there might be some low-hanging fruit for further optimization, and about four hours of work later I had managed to combine them into a single test that took 140 seconds to execute. Four hours of work to save 60 seconds of test execution time might seem like a bad return on investment, but it won't take long to pay off. At my rate of test execution, it'll take about 24 days of development time for me to make back that four hours; but there are also seven other people on my team who will probably save about 5 minutes a day each, and another 40 or so developers across the organization who will eventually benefit from the optimizations.

So was the time worth it? It's always a hard call, and it can be hard to know when to stop; we certainly can't spend all our time building infrastructure or we'll never get our products out the door. We've got release deadlines just like everyone else, so it's always easy to say "we'll do that later, we can't afford the cost right now." But the payoff for taking a few days (or weeks or months) up-front to do things with seemingly small payoffs can, over the long-haul with months (or years) of development time ahead and dozens of programmers working with that code base, lead to huge time-savings and huge productivity increases for the team. Eventually it can make the difference between development grinding to a halt under the weight of an ever-growing code base and maintaining the ability to make steady forward progress.

That will really only happen if you have an organizational commitment to taking the long view and recognizing the long-term benefits of making a constant investment in your infrastructure, tooling, and code base quality. It also requires giving developers the freedom to make those improvements when they see an opportunity as well as the freedom to take some risks; not all such bets are going to pay off, and I could very well have spent four hours on Friday without being able to improve the performance of those tests in the least. Being willing to take those risks and let people scratch their particular technological itches every now and then will almost always pay off in the long run, and in my experience such investments usually pay off much faster (within weeks or months) than you initially think they will. And the next time you find yourself dividing a project into developer-days or man-months (which we should all know are Mythical anyway), make sure to ask yourself if you're falling into a mercantilist mindset where all costs are fixed instead of looking for ways to make the whole organization more effective.

Favoring Composition Over Inheritance

(Note: I'm now posting on the Guidewire Development Blog, so I'll be cross-posting to this blog with everything I deem interesting enough.)


It's something of a widely-stated belief in Object Oriented Programming circles that composition is often a better idea than inheritance, but my personal experience has been that, like so many other software best practices, it's more often said than done, and that in practice most developers (myself included) often revert to simple inheritance when they want to share behavior between two classes. There are reasons for that, of course, especially in a language like Java that offers no built-in support for composition: inheritance is generally easy to code, maintain, and understand and generally leads to less code than composition requires.

The path of least resistance, of course, is not always the best long-term strategy, and there are compelling reasons to prefer a compositional model instead of an inheritance model, especially with larger and more complex code bases. As a result, much of my time over the last several months has been devoted to untangling some of the more overgrown inheritance hierarchies that we've developed over the years and replacing them with cleaner compositional models.

Before going any further, it might help to clarify some terminology. Inheritance is the standard OO practice of subclassing something, inheriting all of its fields and methods, and then overriding and/or adding to them. Composition can mean a few different things, but generally it refers to the idea of having behaviors defined in a separate class that your class then calls through to, commonly known as delegation (the helper class is commonly known as a delegate). In practice, I almost always combine a delegation approach with an interface that my class implements by delegating each method to the delegate class. For example, suppose your ORM framework maps database rows to objects and you want the generic ability to ask whether an object can be viewed by a given user. With inheritance, you'd create a superclass for all your objects and add a canView(User user) method to it (or simply add to the existing superclass). With composition, you'd create a Viewable interface with a canView method on it and a ViewableDelegate class that provides a standard implementation of that method; each object would then independently implement the Viewable interface by calling through to a ViewableDelegate object (generally held as a private instance variable on the class).
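Here's a minimal Java sketch of that pattern, using the Viewable, ViewableDelegate, and canView names from the example above (the User class and its isActive() check are placeholder details invented for illustration, not real framework code):

```java
// The interface that any viewable domain object implements.
interface Viewable {
    boolean canView(User user);
}

// Placeholder user type for the example; isActive() stands in for
// whatever real permission check the application would do.
class User {
    private final boolean active;
    User(boolean active) { this.active = active; }
    boolean isActive() { return active; }
}

// The shared implementation lives here, written exactly once.
class ViewableDelegate implements Viewable {
    public boolean canView(User user) {
        return user != null && user.isActive();
    }
}

// The domain object implements the interface by forwarding each
// method to its private delegate instead of inheriting behavior.
public class Claim implements Viewable {
    private final Viewable viewableDelegate = new ViewableDelegate();

    public boolean canView(User user) {
        return viewableDelegate.canView(user);
    }

    public static void main(String[] args) {
        System.out.println(new Claim().canView(new User(true)));
    }
}
```

The forwarding method is pure boilerplate, which is exactly the cost discussed next, but Claim is now free to inherit from whatever superclass actually makes sense for it.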

Clearly the delegation approach is, at least in Java, much more work to code and maintain than the inheritance approach. If you have 20 Viewable objects in your system and you decide to add a canCurrentUserView() method, if you're using inheritance you only have to add the method in one place, whereas with composition you'll need to modify the interface, the delegate, and then all 20 classes that make use of the delegate.

In spite of those obvious drawbacks, though, I've come around to the conclusion that I should be using composition much more than I have in the past, for the following reasons:

  • Cleaner abstractions and better encapsulation. In my opinion this one is the most important reason to avoid deep inheritance hierarchies. With inheritance, there's a temptation to use it even when there isn't really an is-a relationship between two classes simply because it allows you to easily reuse code. Unfortunately, that tends to lead to fuzzy abstractions at the top of the hierarchy: your base classes begin to acquire a lot of methods that only apply to some subtypes simply because it's a convenient place to put things, and pretty soon your base class starts looking like it should be named "Thing" because the methods on it don't support any one unifying concept. Splitting the base class out into several smaller interfaces, then implementing the interfaces on an as-needed basis on the subtypes, makes the abstractions much clearer and better encapsulated, which makes the whole system easier to understand.
  • More testable code. Along with cleaning up the abstractions and better encapsulating code comes improved testability. Classes with deep inheritance hierarchies are generally very difficult to test; it generally becomes impossible to test the subtype without also testing its supertype, and it's difficult to test the supertype in isolation from its subtypes. A delegation model can clean that up, as you can often more easily test the delegate in isolation and then have the choice of testing the classes that use the delegate either via an interaction model (to just test that they do indeed call through to the delegate) or via a state model (i.e. actually testing that they implement the interface and treating the delegate as an implementation detail). Either way it becomes more obvious what exactly to test and easier to do it.
  • Decoupling. Another important factor on large systems (like, say, a Policy Administration or Claims system) is to keep coupling to a minimum. The only realistic way to deal with the complexity of such a large application is to keep it as compartmentalized as possible so that you only have to worry about one part at a time, and a high degree of coupling means that you can't make changes to one part of the application without worrying about how they might affect other seemingly-unrelated parts, which naturally leads to a higher probability of introducing bugs with any given change. By letting you define tighter abstractions and discouraging unnecessary code sharing, compositional models tend to reduce coupling to the absolute minimum (i.e. the interface boundaries). In addition, if you find that you need to further decouple things by splitting out sub-interfaces or having different delegates that implement the interface differently it's much easier to do than it is if everything inherits from the same superclass. And as painful as it can be to modify lots of code when you introduce a new method to an interface, you're at least forced to think whether it really makes sense in all those places: with an inheritance model it's far too easy to just add something to the superclass and never think about whether or not it makes sense for each of its subclasses to have that method on them, which quickly makes it difficult to figure out which of those methods is actually safe to call from a particular subclass.

While Java makes composition fairly difficult, languages like C++ that feature multiple inheritance give you a way out of this by simply inheriting from multiple parents, though that can lead to dangerous ambiguities and even more confusion. Dynamically-typed languages make composition much easier, as the desired methods can simply be added onto the class at runtime; Ruby does this via its mixin facilities while you could do it in JavaScript simply by programmatically attaching the methods you want directly to the prototype objects (or directly onto the instances, for that matter).

Making it work well in Java is, unfortunately, more difficult. If you don't like doing the delegation by hand, Java does give you a few other options. The simplest one is brute-force code generation; for example, we already generate code for our domain objects, so we've simply added the ability to specify interfaces and delegate classes for the code generator to add in when it generates the domain objects. If you're feeling more ambitious, you can also do the composition dynamically at runtime, using either the Java Proxy class or (taking it one step further) dynamically generating a Java class with a library like Javassist. It's not always easy, but the conceptual clarity that comes with using composition instead of inheritance is often worth the tradeoff.
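As a rough illustration of the Proxy option, here's a sketch that builds the delegation wiring at runtime with java.lang.reflect.Proxy instead of hand-written forwarding methods. The Viewable and ViewableDelegate names echo the earlier example and are purely illustrative:

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

public class RuntimeComposition {
    public interface Viewable {
        boolean canView(String userName);
    }

    // Placeholder implementation: "anyone with a non-empty name can view."
    public static class ViewableDelegate implements Viewable {
        public boolean canView(String userName) {
            return userName != null && userName.length() > 0;
        }
    }

    // Returns a Viewable whose every method call is reflectively
    // routed to the given delegate; no forwarding methods needed.
    public static Viewable compose(final Viewable delegate) {
        return (Viewable) Proxy.newProxyInstance(
            Viewable.class.getClassLoader(),
            new Class<?>[] { Viewable.class },
            new InvocationHandler() {
                public Object invoke(Object proxy, Method method, Object[] args)
                        throws Throwable {
                    return method.invoke(delegate, args);
                }
            });
    }

    public static void main(String[] args) {
        Viewable v = compose(new ViewableDelegate());
        System.out.println(v.canView("alice"));
    }
}
```

The tradeoff is the usual one with reflection: you lose compile-time checking and pay a small per-call cost, which is why generating real classes (by build-time codegen or a bytecode library) can be the better option for hot paths.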

Saturday, October 07, 2006

Why Intel Can't Save Me

Right now I'm working on performance-tuning the next release of one of our products. I've been through this two or three times before (I've either lost count or repressed memories, so I'm really not sure), and it's pretty much always a beating: lots of waiting for tests to run, limited visibility into what's actually going on, arbitrary targets that may or may not be meaningful, a huge amount of work to define data sets and test scripts, and worst of all no real visibility into how far away from your targets you are.

As with any beating, I'd really like to avoid it if at all possible. Every so often, I'll be talking with someone about what I'm doing, and they'll respond with some quip like, "Why are you guys doing that? Just tell our customers to buy faster boxes." If only it were that simple. Intel and the other processor manufacturers have certainly made life easier for us developers, which means that we can use higher-level, more productive but potentially slower tools like Java (and, hopefully, Ruby, at least some day). Unfortunately, that doesn't really get me off the hook: application performance, at least in a web application, is dictated by the weakest critical link. If that weak link isn't CPU-bound, then 2x or 4x more horsepower just isn't going to help, and even if it is, it's rarely the case that all you're looking for is a 2x to 4x speedup. It's not uncommon to find that a certain bit of code, be it a computation or a database query or something else, really needs to be a hundred times faster or more.

Unfortunately for us developers and for our hapless victims/users, there are a seemingly infinite number of ways to screw up performance. Rather than really being about writing "fast" code, performance tuning an application like ours is all about finding and fixing all the "slow" code that makes up the weak links that are holding things back. How many ways are there to screw up performance in a server app like ours, you ask? Let me count the ways . . .

Algorithmic Problems

One thing that I've always found interesting about computer science is the sheer magnitude of the variance between an optimal algorithm and other functional but sub-optimal algorithms. Ask two engineers to build engines and you probably won't end up with one that's 1,000,000 times more powerful than the other under normal conditions. Mechanical engineering just doesn't work that way. Ask two engineers to write sorting algorithms, though, and then sort a large list (say a million names) with each of them, and there's a pretty good chance one might be a million times faster than the other.

Why is that? As every undergrad CS major knows, you measure algorithmic complexity as a relationship between running time (or some other resource) and the input size (generally represented as n). The differences in algorithms are sometimes just constant factors, like 10n versus 27n, but they're often in the order of the equation: 10n log n versus n^2, or 10n^2 versus n^4. In pathological cases, you get relationships like 2^n, which essentially means you might as well not bother unless n is a single digit.

Faster hardware can help with the constant factors, but there's really nothing a non-quantum computer is going to do about the order differences. The problem is compounded by the facts that it's non-trivial to determine an algorithm's running time, that it's easy to completely mis-estimate what sort of n to expect (and to test with far-too-low values of n), and that there are inevitably thousands of "algorithms" in a piece of software, only a handful of which actually warrant real attention. The only solution is to test a lot, find the ones that matter, and make them fast. That new quad-core CPU won't do anything to speed up a single algorithm unless it's parallelizable, and even if Intel magically gives us a 4x raw speed gain, you'll still find that 1/4th of Way Too Long is usually still Way Too Long.

Synchronization Problems

It's great that Intel is coming out with those quad-core CPUs, but how do you take advantage of that if your code only runs on one of them? To truly use all that horsepower, you need to have a multi-threaded application, which means you need to synchronize access to shared resources. If you're lucky, you can avoid the problem altogether by not having any shared resources, but that's often not a realistic option. Unfortunately for programmers (and their hapless end-users), writing correct, bug-free, performant synchronized code is really, really hard. There are a ton of interesting ways to write buggy synchronized code, but there are two pretty foolproof ways to destroy performance with synchronization. First of all, you can end up locking so much that your application essentially becomes single-threaded, as every thread ends up waiting for access to synchronized resources. Now your quad-core is magically a single processor again. Secondly, you can run into a far nastier form of that problem not by holding locks for too long but by synchronizing so much and so often that you incur a massive amount of system overhead from the acquisition and release of locks and from the attendant context switches in the OS. The first thing I've had to do on this release, in fact, is to rewrite a lot of synchronized code to be faster and to reduce the amount and duration of locking, as that's been the main bottleneck so far.
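Here's a minimal sketch of that first failure mode, with made-up names (this isn't our actual code): the two methods below do the same work, but the "coarse" version holds the lock across the expensive computation, serializing every thread behind it, while the "fine" version holds the lock only for the shared update:

```java
public class LockScopeDemo {
    private final Object lock = new Object();
    private long total = 0;

    // Simulates expensive work that does NOT touch shared state.
    private static long expensiveWork() {
        long sum = 0;
        for (int i = 0; i < 100_000; i++) sum += i;
        return sum;
    }

    // Bad: the expensive computation runs while holding the lock,
    // so concurrent callers serialize — effectively single-threaded.
    public void addCoarse() {
        synchronized (lock) {
            total += expensiveWork();
        }
    }

    // Better: compute outside the lock; hold it only for the update.
    public void addFine() {
        long result = expensiveWork();
        synchronized (lock) {
            total += result;
        }
    }

    public long total() {
        synchronized (lock) { return total; }
    }

    public static void main(String[] args) throws InterruptedException {
        LockScopeDemo demo = new LockScopeDemo();
        Thread[] threads = new Thread[4];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < 1000; j++) demo.addFine();
            });
            threads[i].start();
        }
        for (Thread t : threads) t.join();
        System.out.println("total = " + demo.total());
    }
}
```

On a quad-core box, four threads hammering addCoarse() run no faster than one thread, since only one can be inside expensiveWork() at a time; with addFine() the expensive part runs in parallel. The second failure mode (lock churn) is the opposite trap: shrinking the synchronized blocks too far and acquiring/releasing locks so often that the overhead itself dominates.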

Memory Consumption, Object Allocation and Garbage Collection


Garbage-collected languages are great, but the garbage collection itself isn't free. GC algorithms take time and, as anyone who's tried to use a desktop Java application knows, garbage collection can grind an application to a halt. Using too much memory can cause garbage collection to happen more often, since there's less free space to play with, and depending on the hardware you're on it can lead to excessive swapping. Allocating a lot of temporary objects is a double performance hit: you incur the GC penalty more often, and you generally incur some synchronization or system overhead to allocate the memory in the first place (at least in a language that doesn't allow for stack-allocated objects). Future versions of the Java VM should really help out here, since they ought to be able to perform what's called escape analysis and determine when an object can be allocated on the stack for efficiency, but for now one of my main jobs as Performance Guy is to reduce the number of object allocations that happen on each request.
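As an illustrative example of the kind of change this involves (again, not our actual code): building a string with repeated concatenation allocates a fresh String (plus a hidden temporary) on every iteration, while a StringBuilder reuses one internal buffer:

```java
public class AllocationDemo {
    // Allocation-heavy: each += creates a brand-new String object,
    // copying everything so far — garbage proportional to n^2
    // characters over the whole loop.
    static String buildSlow(int n) {
        String s = "";
        for (int i = 0; i < n; i++) {
            s += i + ",";
        }
        return s;
    }

    // Allocation-light: one StringBuilder grows its internal buffer
    // a handful of times, leaving far fewer corpses for the GC.
    static String buildFast(int n) {
        StringBuilder sb = new StringBuilder(n * 4);
        for (int i = 0; i < n; i++) {
            sb.append(i).append(',');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String a = buildSlow(1000);
        String b = buildFast(1000);
        System.out.println("equal: " + a.equals(b)
                + ", length: " + b.length());
    }
}
```

Both produce identical output; the difference is invisible in the result and only shows up as GC pressure under load, which is exactly why this class of problem is so easy to ship.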

Bad Database Queries

There are few more reliable ways to grind an entire web application to a halt than with a bad database query. It really only takes one to completely lock up the database and make the application unusable for everyone else. Slow queries can be caused by a lack of proper indexing, poor database statistics that cause the db to choose a bad query strategy, db optimizer bugs, or just an overly-complicated query that can't be made to perform without nasty denormalizations (if even then). Faster database servers can help, but bad query plans result in performance that's so much worse than good query plans that the hardware really becomes irrelevant at that point. And unless you have your entire database in memory, a bad query that leads to a full table scan will hammer your disks. To make things even more fun for a developer, queries will have very different performance characteristics on different size data sets and with different data distributions, which means you need to test a lot to make sure you're covering everything.
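The effect of a missing index can be sketched in plain Java (an analogy, not actual database code): a full table scan is a linear search over every row, while an index is essentially a hash or tree lookup over the keyed column:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class IndexAnalogy {
    // "Full table scan": check every row until a match — O(n) per lookup.
    static String scanLookup(List<String[]> rows, String key) {
        for (String[] row : rows) {
            if (row[0].equals(key)) return row[1];
        }
        return null;
    }

    // "Index": a hash map over the key column — O(1) per lookup,
    // paid for once at build time (like maintaining a db index).
    static Map<String, String> buildIndex(List<String[]> rows) {
        Map<String, String> index = new HashMap<>();
        for (String[] row : rows) index.put(row[0], row[1]);
        return index;
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            rows.add(new String[] { "user" + i, "email" + i + "@example.com" });
        }
        Map<String, String> index = buildIndex(rows);
        // Same answer either way; the scan touches ~100,000 rows,
        // the index touches one bucket.
        System.out.println(scanLookup(rows, "user99999"));
        System.out.println(index.get("user99999"));
    }
}
```

The analogy also shows why the problem scales so badly: the scan gets linearly slower as the table grows (and, on a real database, drags the disks with it), while the indexed lookup stays flat.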

Excessive IO

If your application spends all its time talking over the network or reading from the hard drive, a faster machine isn't really going to help much. A faster hard drive or more RAM might, but hard drive improvements are incremental at best and advance nowhere near as quickly as processor speed improvements. The only real solution to excessive IO is to find it and eliminate it.
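A small illustrative Java example of one common culprit: reading a file one unbuffered byte at a time issues a system call per byte, while wrapping the same stream in a buffer turns that into roughly one call per 8 KB (the BufferedInputStream default):

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class BufferingDemo {
    // Unbuffered: every read() goes to the OS — one system call per byte.
    static int countBytesUnbuffered(Path path) throws IOException {
        int count = 0;
        try (InputStream in = Files.newInputStream(path)) {
            while (in.read() != -1) count++;
        }
        return count;
    }

    // Buffered: the same loop, but reads are served from an in-memory
    // buffer; the OS is only asked for data once per buffer-full.
    static int countBytesBuffered(Path path) throws IOException {
        int count = 0;
        try (InputStream in = new BufferedInputStream(Files.newInputStream(path))) {
            while (in.read() != -1) count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("io-demo", ".dat");
        Files.write(tmp, new byte[64 * 1024]);
        System.out.println("unbuffered: " + countBytesUnbuffered(tmp)
                + " bytes, buffered: " + countBytesBuffered(tmp) + " bytes");
        Files.delete(tmp);
    }
}
```

Both methods return the same count; the buffered version just does a few thousand times fewer trips to the kernel. Of course, buffering only hides per-call overhead; IO that shouldn't be happening at all still has to be found and removed.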

That's just a brief taxonomy of the sorts of problems you run into during performance tuning. All of these mistakes are pretty easy to make, and any one of them by itself can bring your application to its knees and render your hardware useless. Even if hardware were free, which it most certainly is not, you'd still have to do performance tuning in order to get rid of the pathologically bad algorithms, queries, memory usage, etc. Of course, the first order of business with performance tuning is to write the tests and create your sample data set (which is a pain), and the second order of business is to actually figure out what's going on (which is often the hardest part of the job), so only after a lot of back-breaking, mind-numbing labor do you actually get to the point where you're fixing things.

So what's the easiest way to avoid all that work? Simple: just don't write server applications that people will actually use. Performance tuning is, unfortunately, one of the prices of success.

Saturday, September 30, 2006

A Sad Time To Be An American

I can accept that in a democracy, things will be done by my government that I don't necessarily agree with. Wars will be fought, people will be imprisoned, money will be spent, and laws will be made. That's part of playing the game: once you agree to the rules of the game, you have to accept that any outcome produced by those rules is fair, even if you don't like it. The same thing could be said of the legal system as well; at the end of the day all you really have is the integrity of the process, and if the process is followed you have to accept that the outcome is just. The rule of law and due process have to be protected at all costs, as they're really all that separates a just legal system from a tyrannical, arbitrary one.

I've always believed that the United States, for all its other flaws, did well on those two accounts. The elections are generally fair and without problems, happen on a regular schedule, and power transfers peacefully to the winners of the election. Our justice system certainly isn't perfect, but a person is at least always entitled to legal representation, can't be held indefinitely without charges, has a right to examine evidence against them, and evidence obtained by unlawful means (be that an illegal search, hearsay, coercion, or some other failure of due process) can't be used to convict someone. Those are two of the primary virtues that set the US apart from, say, Russia or China or most of South America.

My faith in those two pillars of the American way of life was pretty severely shaken this past week; I'm more shocked and saddened than outraged at this point, simply because there's no obvious outlet at which to direct any outrage.

Most damaging was the passage by Congress this week of the detainee treatment bill, a vile, despicable bit of legislation that strips the most hallowed legal protections in the British/US legal system from anyone the president deems an "unlawful enemy combatant." Such people have no ability to challenge their legal standing in court, no right to a speedy trial, no right to self-representation, no irrevocable right to examine the evidence against them, and can effectively be tortured into confessions that are then used against them at trial. All of those by themselves are morally reprehensible, but the lack of any real appeals process is truly evil. The "innocent until proven guilty" principle has been completely removed, and the president now has the authority to do essentially whatever he wants to non-US citizens with absolutely no recourse on their behalf: all he has to do is declare them an enemy combatant, and away they go. In addition to being in clear violation of US treaty obligations (the bill's blunt assertions that it complies with them do nothing to alter that reality), the bill is clearly ethically vacant and demonstrates that the Bush administration, along with everyone who supported the bill, doesn't understand the true importance of due process and basic human rights. There is no room for compromise here: either you're for inalienable rights and due process, or you're against them. The line has been drawn, and this country's elected representatives have largely come down on the same side of the line as all of the most corrupt and tyrannical governments this world has ever seen. History will be unkind to these people (and to us, for electing them and allowing this to happen), and some day this bill will be as reviled as the Alien and Sedition Acts or the Japanese internment during World War II.
I never thought I'd hate any piece of legislation more than the original Patriot Act, but this takes things to a far more disturbing level. I never thought I'd live to see the US Congress attempt to legalize torture, but clearly I was wrong on that count as well.

At the same time, more and more disturbing facts about the Ohio election in 2004 are coming to light. Thankfully, a legal case was won to keep the man in charge of those elections from ordering all the ballots to be shredded, so that voting irregularities have more of a chance of being uncovered. I'm naturally wary of any conspiracy theories, but I'm now convinced that the Ohio election was clearly compromised. It's taken a lot to convince me, but Robert F. Kennedy's articles for Rolling Stone raise too many questions to be ignored. The first article raised serious doubts, but the latest article is even more disturbing, as it details how Diebold was essentially allowed to run the election, and how there's pretty clear evidence that the machines were tampered with and an unauthorized patch was installed. Who knows what it did? As a software engineer, though, I can tell you that if I were able to upload arbitrary software onto a voting machine, it would be very easy to rig it to do just about whatever you wanted. Combine that with the other evidence: voter rolls being purged, improbably long consecutive runs of ballots all marked for Bush (as discovered by researchers accessing the now-legally-protected ballots), mysterious replacement ballots that no one can exactly explain, official accounts of how the ballots were handled that can't possibly be true (due to election returns being posted for the precinct far faster than would have been possible), inexplicable voting patterns (voters in some precincts who voted for a black court justice or against a gay marriage ban were found to have voted for Bush by a wide margin), exit polls that didn't come close to matching the reported vote count, and most blatantly of all, white stickers mysteriously placed on some ballots to cover up clear votes for Kerry . . . all irregularities which just so happened to favor Bush.
A few irregularities could be explained away as statistical anomalies, but this many, all in the same direction, in a hotly contested state where a corporation run by a pro-Bush camp is running the election and where the Secretary of State (who oversees the elections) is running Bush's campaign? Election rigging is notoriously hard to prove, as the evidence is usually gone after the election, and doubly so in the case of electronic voting machines that don't leave any paper trail. But in this case, clearly the most rational thing, as sad and disturbing as it is, is to accept that the election was compromised. There simply isn't any way to explain all of those irregularities away.

It's a sad, sad time to be an American.