Archive for the ‘Engineering’ Category

The Enterprise Service Bus

Monday, May 31st, 2010

It seems like everyone’s architecture diagram has a big “Enterprise Service Bus” drawn down the middle of it these days. It’s as if you get a prize for just including the notion of an ESB and indicating that everything is loosely coupled. The problem is that an unmanaged, misused or overused service bus can become one big Enterprise Cesspool.

Don’t get me wrong – ESBs are a great addition for many companies if used properly. If you are publishing a message for which there are many subscribers, and if you can lose some messages without causing problems, then by all means implement an ESB. Centralized logging of messages, trickle loading of data warehouses and even updates to prices or inventory in search nodes are all examples of appropriate ESB activities. Just ensure that your application is either resilient to or can recover from intermittent message loss.

Don’t fall into the trap of thinking that message buses are the only solution for communications. Asynchronous communication is preferable where possible, but there are many options beyond message buses. You can implement asynchronous point-to-point communication methods (we often differentiate these from bus architectures and call them queue architectures). You can also implement async-sync point-to-point methods that allow a busy application to “fire and forget” into a queue while another application works through the queue and synchronously communicates with another consumer process or with multiple data stores where necessary. And sometimes, when it absolutely positively has to be there overnight, it is best just to communicate synchronously. Be a wise craftsman/engineer and choose the right tool for the job. After all, it’s rare that a carpenter chooses to cut a board with a hammer.
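
As a rough illustration of the async-sync queue approach described above, here is a minimal in-process sketch in Python. It assumes a standard-library queue stands in for a real queue product and that plain lists stand in for the data stores; the names `start_worker`, `primary_store` and `audit_store` are made up for the example.

```python
import queue
import threading


def start_worker(work_queue, stores):
    """Start a thread that drains the queue and writes each message to every store."""
    def run():
        while True:
            message = work_queue.get()
            if message is None:          # sentinel value: stop the worker
                work_queue.task_done()
                break
            for store in stores:
                store.append(message)    # stand-in for a blocking, synchronous write
            work_queue.task_done()

    worker = threading.Thread(target=run, daemon=True)
    worker.start()
    return worker


work_queue = queue.Queue()
primary_store, audit_store = [], []      # hypothetical data stores
worker = start_worker(work_queue, [primary_store, audit_store])

# The busy producer "fires and forgets": it enqueues and moves on without waiting.
for order_id in range(3):
    work_queue.put({"order_id": order_id})

work_queue.put(None)                     # ask the worker to stop
work_queue.join()                        # wait until every queued item is processed
worker.join()
```

The producer never blocks on the stores; only the worker pays the cost of the synchronous writes. A real deployment would need a durable queue if losing queued messages is not acceptable.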

Service buses can also easily become cesspools. Someone needs to ensure that the bus is built with an appropriate number of independent channels so that communications paths don’t become congested. Treat buses as you would any other piece of infrastructure and scale them horizontally with multiple channels, routing services, etc. Monitor them aggressively, including message latency, age of the oldest message, number of undelivered messages, etc.
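
The monitoring metrics listed above can be sketched in a few lines of Python. This is only an illustration under assumptions: the `Message` record, its timestamp fields and the `channel_health` function are hypothetical, and a real bus would expose this data through its own management interface.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Message:
    enqueued_at: float                    # time the message entered the channel (seconds)
    delivered_at: Optional[float] = None  # None while the message is still undelivered


def channel_health(messages, now):
    """Compute undelivered count, age of the oldest undelivered message and mean latency."""
    pending = [m for m in messages if m.delivered_at is None]
    latencies = [m.delivered_at - m.enqueued_at
                 for m in messages if m.delivered_at is not None]
    return {
        "undelivered_count": len(pending),
        "oldest_undelivered_age": max((now - m.enqueued_at for m in pending), default=0.0),
        "mean_delivery_latency": sum(latencies) / len(latencies) if latencies else 0.0,
    }


# Two delivered messages and two still waiting on the channel.
sample = [Message(0.0, 2.0), Message(1.0, 5.0), Message(4.0), Message(9.0)]
health = channel_health(sample, now=10.0)
```

Computing these per channel rather than for the bus as a whole makes it easier to spot the one congested path.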

Why We Hate Stored Procedures

Wednesday, May 19th, 2010

Okay, we really don’t hate stored procedures, and we have actually used them extensively on past projects, but we so often tell people to remove them that we’ve been accused of hating them. Here’s the deal: as we mentioned in a past post, two scalability best practices are 1) put as little business logic in the database layer as possible and 2) use as few features of the RDBMS as possible. Stored procedures violate both of these if they contain any business logic. If the stored procedures are simply instantiations of SQL data statements then you’re probably okay.

Expanding on the two reasons for avoiding stored procedures mentioned above, we’ll first discuss why business logic doesn’t belong in the database layer. The biggest reason for this is that the database server is likely the most expensive server in your system and the most difficult to scale. Adding computational work to it means performing that work on your most expensive layer, and it forces you to upgrade the hardware or split the database sooner than necessary. From a cost per computation perspective, it is much cheaper to have this business logic processed on an application server.

Avoiding the use of RDBMS features is important when it comes to remaining vendor agnostic and getting the best possible price during negotiations. Using features specific to certain database vendors locks you in to that vendor. While you may never want to switch database vendors, being locked in does reduce your negotiating strength. When you are locked in to a vendor, your best alternative to a negotiated agreement (BATNA) is not good, and their sales staff knows this.

While we don’t hate stored procedures, we do see them as an impediment to scaling. If you already have them in place you may be hesitant to spend the money moving them into the application logic. One method of doing so at minimal incremental cost is to start today with the policy that if any engineer touches a stored procedure they must move its logic into the application. This way the cost is small, because you’re already touching the code, and spread out, because it’s not done all at once.
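
To make the idea concrete, here is a minimal Python sketch of logic that has been moved out of the database and into the application tier. Everything in it is an assumption for illustration: the `order_lines` table, the `order_total` function and the 10% discount rule are invented, and SQLite is used only because it ships with Python.

```python
import sqlite3


def order_total(conn, order_id):
    """Return the order total in cents, applying the discount rule in the application."""
    # The database does nothing but plain, portable data retrieval.
    rows = conn.execute(
        "SELECT unit_price_cents, quantity FROM order_lines WHERE order_id = ?",
        (order_id,),
    ).fetchall()
    subtotal = sum(price * quantity for price, quantity in rows)
    # Hypothetical business rule: 10% off orders of 10,000 cents or more.
    # It runs on the application server instead of inside a stored procedure.
    discount = subtotal // 10 if subtotal >= 10_000 else 0
    return subtotal - discount


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE order_lines (order_id INTEGER, unit_price_cents INTEGER, quantity INTEGER)"
)
conn.executemany(
    "INSERT INTO order_lines VALUES (?, ?, ?)",
    [(1, 6000, 2), (1, 500, 1), (2, 2000, 1)],
)
```

Because the only SQL left is a simple SELECT, the same function should work against another RDBMS with little more than a change of driver.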

Matching Data Value and Data Storage Costs

Wednesday, May 12th, 2010

We often consult with clients who have storage cost overruns. These aren’t just the actual cost of storage, which as we all know has been dropping fairly drastically. Rather, as we outline in our book, it’s all the other costs of storage: space for the storage, power for the storage, the cost to traverse huge indices and the resulting response time, etc. Most often these clients have a single solution for their storage needs. Regardless of the actual value to the business of the data being stored, it goes into a cookie cutter high speed SAN: easy to access, but costly to maintain on a relative basis.

Our solution to this is simple:  Not all data is created equal.  No, we aren’t going to tell you to “enslave it” but we are going to violate certain constitutional rights and tell you that you should absolutely “profile” it.  TSA be damned.  We are going to borrow a technique from our marketing friends and apply something known as an RFM cube to the data.  RFM stands for recency, frequency and monetization.  Marketing gurus use this technique to make recommendations to people or send special offers to keep high value customers happy or to “re-activate” those who haven’t been active recently.

“Recency” accounts for how recently the data item in question has been accessed. This might be a file in your storage system or rows within a database. Frequency speaks to how frequently the data is accessed. Monetization is the value that specific piece of data has to your business in general. Together, the three help us determine the overall business value of the data and the access speed it warrants. By matching the type of storage to the value of the data in an RFM-like approach, we might have a cube that has very high cost storage mapped to high value data in the upper right and an approach to delete and/or archive data in the bottom left:

The product of our RFM analysis might yield a score for the value of the data overall. Maybe it’s as simple as a product or maybe you’ll add some magic of your own. If we plot this on a cost and value curve, we can start matching storage needs to it. Low value data goes away or goes on low cost systems with slow access times. If you need to access it, you can always do it offline and email the report or whatever to the requester. High value data might go on very fast but relatively costly SSDs or some storage area network equivalent. We’ve made some mappings below, though the escalation isn’t meant to represent today’s market prices:
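
Separately from our own mappings, the scoring idea can be sketched in Python. This assumes each dimension is rated from 1 (low) to 5 (high) and that the score is the simple product mentioned above; the rating scale, the thresholds and the tier names are all invented for the example, so tune them to your own data and costs.

```python
def rfm_score(recency, frequency, monetization):
    """Multiply three 1-to-5 ratings into a single value score (1 to 125)."""
    for rating in (recency, frequency, monetization):
        if not 1 <= rating <= 5:
            raise ValueError("each rating must be between 1 and 5")
    return recency * frequency * monetization


def storage_tier(score):
    """Map a score to a storage tier using made-up thresholds."""
    if score >= 64:
        return "ssd"
    if score >= 27:
        return "san"
    if score >= 8:
        return "low-cost disk"
    return "archive or delete"


# Data accessed recently and often, with high business value.
hot = storage_tier(rfm_score(5, 5, 5))
# Data that is rarely touched and worth little to the business.
cold = storage_tier(rfm_score(1, 2, 3))
```

Running the profile periodically lets data migrate down the tiers as its recency and frequency ratings decay.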

Now go save your company some cash!

5 Things Agile is NOT

Monday, May 3rd, 2010

It seems that everyone is moving to an Agile approach in their product (or software) development lifecycle. Some move with great success, some with great fanfare, and for some it’s one of the last moves their company and engineering organizations make before failing miserably and shutting their doors permanently. As often as not, companies just move because they believe it will cure all of their problems. As we’ve written before, no new PDLC will cure all of your problems. Agile may in fact be best for you, but there are always tradeoffs.

We’ve compiled the top 5 misconceptions about Agile from our experience working with companies to solve problems. Be careful of these pitfalls as they can cause you to fail miserably in your efforts.

1) Agile is NOT a reason to NOT manage engineers or programmers

Engineering organizations measure themselves. In fact, many Agile methods include a number of metrics, such as burndown charts to measure progress and velocity to measure the rate at which teams deliver business value. As with any other engineering organization, you should seek to find meaningful metrics to understand and improve developer and organizational quality and productivity. Don’t let your team or your managers tell you not to do your job!

2) Agile is NOT a reason to have engineering alone make product decisions

You still have to run your business, which means nesting the product vision within the vision of the company. More than likely you are paying business or product people to define what it is you need to do to fight and win in your market. Someone still has to set the broad strategic vision to which products will be developed. Engineers can and should contribute to this vision, and that’s where Agile methods can help.

3) Agile alone is NOT a cure for all of your product delivery problems

As we’ve blogged before, there simply are no silver bullets.  With the right management and oversight, you can make any development lifecycle work.  There are just cases where Agile methods work better.   But don’t look to have a PDLC fix your business, people or management issues.

4) Agile is NOT an excuse to NOT put in the appropriate processes

There is nothing in the Agile Manifesto that states that process is evil. In fact, it recognizes that processes are important by stating that there is value within them. Agile simply holds that individuals and interactions are more important, a concept with which we completely agree. Don’t argue that all process is evil and cite Agile as the proof, as so many organizations seem to do. Certain processes (e.g. code reviews) are necessary for hyper growth environments to be successful.

5) Agile is NOT an excuse not to create SOME documentation

Another often misinterpreted point is that Agile eliminates documentation. Not only is this not true, it is ridiculous. The Agile Manifesto again recognizes the need for documentation and simply prefers working software over COMPREHENSIVE documentation. Go ahead and try to figure out how to use or troubleshoot something for the very first time without documentation and see how easy it is… Programming before designing, coupled with not creating useful documentation, makes one a hack instead of an engineer. Document, but don’t go overboard in doing it.

Data Access Layers

Wednesday, April 21st, 2010

Ways of abstracting the storage of data have been around for a long time. In data warehouses, engineers abstract data into business or domain objects that can be manipulated in reports. For object oriented programming, engineers can use the active record pattern to create a wrapper class representing a table, with methods for inserting, updating, or deleting. Thus the manipulation of the database rows is abstracted into the object oriented parlance of classes and methods. This layer of computing is known as a Data Access Layer (DAL) and hides the complexity of the underlying data store from engineers who do not need to be bothered with those details.
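
A bare-bones version of the active record pattern described above might look like the following Python sketch. The `User` class and `users` table are hypothetical, and an in-memory SQLite database stands in for the real data store; actual ORM frameworks do far more than this.

```python
import sqlite3


class User:
    """Wrapper class for a hypothetical `users` table; one instance per row."""

    # A shared in-memory database keeps the example self-contained.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

    def __init__(self, name, id=None):
        self.id = id
        self.name = name

    def save(self):
        """Insert the row if it is new, otherwise update the existing row."""
        if self.id is None:
            cursor = self.conn.execute(
                "INSERT INTO users (name) VALUES (?)", (self.name,)
            )
            self.id = cursor.lastrowid
        else:
            self.conn.execute(
                "UPDATE users SET name = ? WHERE id = ?", (self.name, self.id)
            )
        return self

    def delete(self):
        """Delete the row backing this object."""
        self.conn.execute("DELETE FROM users WHERE id = ?", (self.id,))
        self.id = None

    @classmethod
    def find(cls, id):
        """Load one row by primary key, or return None if it does not exist."""
        row = cls.conn.execute(
            "SELECT id, name FROM users WHERE id = ?", (id,)
        ).fetchone()
        return cls(row[1], id=row[0]) if row else None


user = User("Ada").save()        # INSERT
user.name = "Ada Lovelace"
user.save()                      # UPDATE
found = User.find(user.id)       # SELECT
```

Callers work only with `User` objects and never see the SQL, which is exactly the hiding of detail that a DAL is meant to provide.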

There are many frameworks or object-relational mapping (ORM) tools for creating DALs for different programming languages, such as Active Record for Ruby and Hibernate for Java. We often hear from development teams that have adopted an ORM that the pros include shorter development time, better designed code and a reduction in the amount of code.

However, a quick search will show you that not everyone is sold on the benefits of a DAL. The reduction in code is debatable when constructing complex queries or considering that Hibernate is over 800 KLOC of Java and XML. There are also concerns about the ability to scale effectively when using DALs. While it is possible with an ORM to scale on the X-axis, such as with MySQL master-slave replication, Y- and Z-axis splits can become much more complicated.
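
To show where the complication comes from, here is a small Python sketch of routing for a Z-axis split. It assumes the split is by customer and uses a simple modulus on the customer ID; the `ShardRouter` class and the shard names are invented for the example, and other schemes (lookup tables, hashing) are just as plausible.

```python
class ShardRouter:
    """Pick the shard that holds a given customer's data."""

    def __init__(self, shard_names):
        if not shard_names:
            raise ValueError("at least one shard is required")
        self.shard_names = list(shard_names)

    def shard_for(self, customer_id):
        # Modulus routing: the same customer always lands on the same shard.
        return self.shard_names[customer_id % len(self.shard_names)]


# Hypothetical shard names; in practice these would map to database connections.
router = ShardRouter(["customers_0", "customers_1", "customers_2"])
first = router.shard_for(7)
second = router.shard_for(9)
```

Every object load and save now has to go through a step like `shard_for` before a connection is chosen, which is the part you need to confirm your ORM can accommodate.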

I am a fan of DALs for their centralization of data objects and their abstraction of relational data into objects. To me these advantages shorten development and testing time and improve quality. Additionally, given the sophistication and open source availability of ORMs today, I think it makes sense to consider using one as a framework. However, if you choose to do so, you need to consider ahead of time how you would shard your data along other axes. Using an ORM moves the time for those considerations up. Think of the D-I-D approach, where the cost to make a change during the Design phase is negligible compared to changes made after Deployment.