Monday, September 26, 2011

A few answers to CQRS questions

I have tried to pick a few common CQRS questions and supply my own answers:

1) I'm really missing some definition of WHEN to use CQRS. Collaboration, I know, but 90% is collaboration in my book. A Customer table for example. And Orders, Invoices and shipping tables. Can those be accessed and edited by more than one user? Yes - well there's collaboration!

The point here is that, yes, they can be edited by more than one user, but are they really? How high is the probability of having more than one user updating the same Customer simultaneously? I would say close to zero. The same goes for the other entities.

Sure, all of the users are working together to keep the customer table in sync with reality, that is collaboration. But it is not collaboration in the way of working on the same shared resource!

But we still need to handle that 0.001% chance of simultaneous updates! Sure - so add a simple version check (first update wins, second update gets a notification). That is not CQRS with one-way commands - that's plain old synchronous database updates with locking.
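To make the version check concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `customer` table, its `version` column, and the `update_customer_name` helper are assumptions made up for illustration; any database with an equivalent conditional UPDATE would do.

```python
import sqlite3

def update_customer_name(conn, customer_id, new_name, expected_version):
    """Hypothetical helper: returns True if this update won, False if someone else got there first."""
    cur = conn.execute(
        "UPDATE customer SET name = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_name, customer_id, expected_version),
    )
    conn.commit()
    # No row matched means the version changed since we read it.
    return cur.rowcount == 1

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, version INTEGER)")
conn.execute("INSERT INTO customer VALUES (1, 'Acme', 1)")

# Two users both read version 1 and then try to save their changes.
first = update_customer_name(conn, 1, "Acme Inc.", expected_version=1)
second = update_customer_name(conn, 1, "ACME Ltd.", expected_version=1)
```

The first update succeeds and bumps the version; the second finds no matching row, so the application can notify that user and let them reload.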

So when are we collaborating on shared resources and expecting lots of simultaneous work? Good question (and I do not have a good answer). The fact that so few examples exist indicates that CQRS is something which is not normally required.

2) What about the fact that many implementors feel that things get much simpler just by having read and write separation? Isn't that a reason in itself to use CQRS?

You can separate reading and writing without CQRS and eventual consistency. All you need is two different code bases working on the same database table. Very low tech. Maybe throw in multiple database views to support more or less complex queries.
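As a rough sketch of this low-tech separation, the example below keeps a write-side class and a read-side class apart while both use the same SQLite database, with a view serving the query side. The `orders` table, the `customer_totals` view, and the class names are all invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL);
-- A view gives the read side a query-friendly shape without a second database.
CREATE VIEW customer_totals AS
    SELECT customer, COUNT(*) AS order_count, SUM(amount) AS total
    FROM orders GROUP BY customer;
""")

class OrderWriter:
    """Write-side code: only modifies the orders table."""
    def __init__(self, conn):
        self.conn = conn

    def place_order(self, customer, amount):
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(
                "INSERT INTO orders (customer, amount) VALUES (?, ?)",
                (customer, amount),
            )

class OrderQueries:
    """Read-side code: only selects, and only through the view."""
    def __init__(self, conn):
        self.conn = conn

    def totals_for(self, customer):
        return self.conn.execute(
            "SELECT order_count, total FROM customer_totals WHERE customer = ?",
            (customer,),
        ).fetchone()

writer = OrderWriter(conn)
queries = OrderQueries(conn)
writer.place_order("alice", 10.0)
writer.place_order("alice", 5.5)
totals = queries.totals_for("alice")
```

Reads are immediately consistent with writes here, since there is only one database and no event propagation in between.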

3) But having different read and write databases helps scaling out.

Yes. So does plain single-write-master/multiple-read-slave database replication. If scaling out the query part is your problem then use master/slave on the database level. If scaling out writing is your problem then look into CQRS for that particular use case which requires it!

Scaling out writing is required if you have lots of issues with database deadlocks and timeouts due to intensive locking of the database. This won't happen if all your users do is work on different entities all the time.

4) Adding some repository and an event store might be hard the first time, but after that it's really simple. Right?

In my limited experience - no. But I may certainly be wrong and may have used the wrong tools. In theory it is simple, yes; in practice, no. It adds complexity - not that much, but enough to add friction to your project. If you are reading this for some advice on CQRS - well, you might just have experienced that friction and started wondering why.

CQRS adds extra time spent on infrastructure for simple problems, when you should be spending time on complex problems, while keeping simple stuff, well, simple. See also my previous post:

Please, go ahead and use CQRS! I am not saying "no, do not" (who am I to do that?) - I am just sharing my experience.


Why CQRS may not be the answer you are looking for

By CQRS I mean Command Query Responsibility Segregation. If you have never heard of this before then you are probably better off reading about CQRS elsewhere on the web.

If you know CQRS already then, please, read on.

Have you ever tried to implement some kind of CQRS and had that feeling of friction - not really getting it to work and then wondering why? I mean - everybody else is using it, right? As this quote from the NServiceBus forum puts it:

> I think CQRS has been getting a lot of hype recently and one frustration for a beginner (like me) is that if all these people are using these concepts/technologies, rather than just telling conventional crud/n-tier folks that CQRS is the "future", let's have some good examples and documentation that helps us learn why.

I have been there myself - trying to do CQRS exercises for training and ultimately giving up. Lately I have been digging more into it and now I would like to share some thoughts based on my own experiences and learnings from Udi Dahan's Advanced Distributed Systems Design Course. To be honest, lots of it is just a rephrasing of stuff from the course.

First of all, CQRS is one of those annoying technologies that looks very simple in theory, but turns out to cost a bit to build in practice.

Let's try with an example from the NCQRS project (I do not hold anything against that project, they just have a good online example) - creating a Tweet in a Twitter-like system:

Now compare the CQRS solution with a Ruby on Rails solution: which one is simpler? I hope you agree with me that the Ruby on Rails solution is simpler - it is after all only a matter of writing a tweet to a database. Right?

And that is the most important point: it is after all only a matter of writing a tweet to a database! Right? Neither of the two examples addresses the more difficult problems of making sure the right people get the right tweets, re-tweeting and so on.

So why use CQRS when it can be done in a much simpler way? Here are some arguments:
  1. It's never going to scale!
  2. Of course RoR is easier, CQRS is for complex/collaboration scenarios, you won't see any gain in such a simple example!
  3. Okay, the RoR solution is simpler, but(!) when we add complex business rules then it will be a mess!

Regarding (1): the code we have seen so far will scale - it is simple CRUD operations and it can easily be scaled out on a database level with a one-master/multiple-slaves setup.

Regarding (2): that is the whole point - CQRS is not for simple scenarios. You can do it, but it may hurt and work against you all the time.

Regarding (3): you may be right! So let's take out the complex business rules and handle them elsewhere - asynchronously using messaging.

Our message handlers can parse the tweet, looking for #tag and @name, and distribute tweets to other users. All of it happens in the same (eventually consistent) query database!
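A minimal sketch of what such a handler could look like follows. The message shape, the `handle_tweet_posted` name, and the in-memory dictionaries standing in for the query database are all assumptions for illustration; a real system would receive the message from a queue and write to real tables.

```python
import re
from collections import defaultdict

# Deliberately naive patterns: a word after '#' is a tag, a word after '@' is a mention.
TAG = re.compile(r"#(\w+)")
MENTION = re.compile(r"@(\w+)")

def handle_tweet_posted(message, query_db):
    """Hypothetical background handler: indexes a tweet directly in the query model."""
    text = message["text"]
    for tag in TAG.findall(text):
        query_db["by_tag"][tag.lower()].append(message["id"])
    for name in MENTION.findall(text):
        # Distribute the tweet to the timeline of each mentioned user.
        query_db["timelines"][name.lower()].append(message["id"])

# Stand-in for the shared, eventually consistent query database.
query_db = {"by_tag": defaultdict(list), "timelines": defaultdict(list)}

handle_tweet_posted({"id": 1, "text": "Reading about #CQRS with @udi"}, query_db)
```

Note that the handler updates the query model directly; there is no command state and no event store involved.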

Duh! You probably say. Only Command handlers modify command state and send out events that the query context listens for. But is any command state involved here? No - all we have done is the simple RoR style direct update of the query model, and some background processing also working directly on the query model.

How can we do this? Where is the command handling? Well, it is not here ... as I started out saying, CQRS may not be the answer you are looking for. CQRS is not to be applied to any and all problems - it carries a big overhead (even if it is simple in theory).

What is left is then to explain where CQRS really applies - but that is beyond the scope of this blog post (and also beyond my current knowledge of CQRS).

So who am I to judge what CQRS is and is not? Well, no better than so many other people out there. I am no expert. I build my opinion on my own experiences and some valuable input from Udi. Read this as one piece of advice among many and use it to make your own decision.


Wednesday, September 21, 2011

SOA - The Short Story (yes, the short)

Decomposing the business into autonomous IT services truly makes IT align with the business (which is what we all want, right?). It is the understanding of the business domain that guides this process, and it will ultimately lead to self-contained, independent, autonomous, authoritative services that represent real business areas and can be maintained without causing any pain for other parts of the system ... if applied right, that is.

Unfortunately this is not as easy as it sounds. Suddenly designing IT systems requires real business understanding - something which does not come naturally to most developers: we decompose into technical units (database, business layer, network, GUI) - that's easy, and it even holds for all scenarios (that's why devs like it so much)! Whereas the business decomposes into shopping, sales, booking, finance, fulfillment, reservation, shipping, and, hmmm, ... all sorts of other wonderful "stuff" that makes no sense to us and therefore gives us a hard time creating IT for it.

To make it even harder, we, the devs, need to begin with the understanding of what "self contained", "independent", "autonomous", "authoritative" services really mean and be able to write code that conforms to these predicates. You can start here: "A Webservice is NOT a Service". Write this on the chalkboard a hundred times and then take the red pill.

Should our brains survive this transition into a completely different universe, where devs understand the business and back it up with software, we will probably just crash and burn the minute we hit reality again in the form of (stand back and watch out) IT Operations:

- "No, we cannot enable the Distributed Transaction Coordinator".

- "No, you cannot use the database without having referential integrity".

- "Yes, you must comply with our organizational standards and use IBM MQ series, BizTalk, and WebSphere Message Broker (remember, we paid a gazillion for those products!)".

- "Yes, you must use stored procedures for everything".

- "And, remember that any change request to the database schema must be presented one month before you need it, and, yes, of course it must be approved by our DBAs (why do you ask?)."

So is it any wonder that we keep seeing one IT-project after another fail again and again?