Archive for the ‘Newsletters’ Category

Newsletter – Trends

Tuesday, October 27th, 2009

Below is our most recent newsletter. If you would like to subscribe and have it delivered to your inbox, you can do so here.

It has been several months since our last newsletter and we’ve been very busy working with a lot of new clients as well as getting the opportunity to continue working with some of our existing friends. We’ve also been busy working on our upcoming book, The Art of Scalability. The book will be released January 8th and is available now for preorder at Amazon, Barnes and Noble, Borders, and InformIT.

In our profession, we have a unique opportunity to study a lot of new technology and to visit with hundreds of technology teams each year. This gives us a vantage point from which to spot trends occurring across the technology spectrum. In this newsletter we cover three of these trends that we think you should at least be aware of. Each organization and product offering is different, so the applicability of each trend needs to be determined on an individual basis.

Continuous Deployment
Continuous deployment is the natural extension of continuous integration, which is the practice of checking code into the source code repository early and often, compiling the code, and running the integration tests on the new code. The goal is to ease the often very painful process of integrating multiple developers’ code after weeks of independent work. To make this process most effective, automating the builds and smoke tests is highly recommended. For more information on continuous integration there are a lot of resources such as these books and articles.

Continuous deployment means that all code written for an application is immediately deployed into production. While this is still a very new concept, a growing number of companies are beginning to adopt it. Flickr and IMVU are two of the earliest. Eric Ries, CTO of IMVU, believes that this approach can improve software quality because of the discipline, automation, and rigorous standards that it requires. To be successful, Eric suggests a five-step approach that includes continuous integration, source code commit checks, deployment scripts, alerting, and root cause analysis. You can read more about this process in a recent post.
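
As a rough illustration of how commit checks, a deployment script, and alerting fit together, here is a minimal Python sketch. It is not IMVU's actual tooling: the function name and all of its parameters (checks, deploy, is_healthy, rollback) are hypothetical callables that a team would supply for its own environment.

```python
def deploy_if_healthy(revision, checks, deploy, is_healthy, rollback):
    """Ship `revision` only when every pre-deploy check passes.

    All arguments except `revision` are hypothetical callables supplied
    by the caller; nothing here talks to a real build or deploy system.
    Returns "rejected", "rolled_back", or "deployed".
    """
    # Commit checks and integration tests gate the deployment.
    for check in checks:
        if not check(revision):
            return "rejected"

    # The deployment script pushes the revision to production.
    deploy(revision)

    # Alerting: if production looks unhealthy, revert immediately and
    # leave the root cause analysis to the team.
    if not is_healthy():
        rollback(revision)
        return "rolled_back"
    return "deployed"
```

The point of the sketch is that every automated step has a clear failure path, which is where the discipline described above comes from.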

Before dismissing this idea as not right for your organization, consider what one of our clients does to achieve some of the benefits of the approach without so much of the risk. This particular company uses its own software, which many SaaS companies do at least from an administrative perspective, and it deploys each night’s build onto its internal system. It does not deploy each build onto its customers’ production environments but instead waits until the iteration is complete. The concern over disrupting the company’s internal operations is enough to enforce the rigor and achieve higher quality without taking on the risk of disrupting its customers.

Key-Value Stores and Task Specific Databases
In most Software as a Service or eCommerce applications the database is the central part of the entire system. While relational database management systems (RDBMS) are scalable, as we have described in many of our posts, we also know that scaling them can be intimidating for some technology organizations.

What we have seen and expect to see more of is the use of in-memory key-value stores such as memcached and Redis as a shared-memory object cache that is the primary data source for the service or application. Writes work their way back to the relational database asynchronously, and reads into the object cache work their way forward either on a schedule or on demand. An even more cutting-edge trend, although the term has been around since at least 1998, is memory-based architectures, where the memory storage is the system of record and there is no relational database or persistent storage device.
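
The asynchronous write path and on-demand read path can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than how memcached or Redis are implemented: the class name WriteBehindCache is made up for this example, and a plain dict stands in for the relational database.

```python
import queue
import threading


class WriteBehindCache:
    """In-memory key-value store acting as the primary data source.

    `backing_store` is any dict-like object standing in for the
    relational database (a placeholder, not a real database driver).
    """

    def __init__(self, backing_store):
        self._cache = {}
        self._backing_store = backing_store
        self._pending = queue.Queue()
        self._lock = threading.Lock()
        worker = threading.Thread(target=self._flush_loop, daemon=True)
        worker.start()

    def put(self, key, value):
        # The write is visible as soon as memory is updated...
        with self._lock:
            self._cache[key] = value
        # ...and is queued to reach the database asynchronously.
        self._pending.put((key, value))

    def get(self, key):
        with self._lock:
            if key in self._cache:
                return self._cache[key]
        # On-demand fill: a miss pulls the value forward from the database.
        value = self._backing_store.get(key)
        if value is not None:
            with self._lock:
                # Keep a newer value if a concurrent put() got there first.
                self._cache.setdefault(key, value)
        return value

    def _flush_loop(self):
        # Background worker that drains queued writes into the database.
        while True:
            key, value = self._pending.get()
            self._backing_store[key] = value
            self._pending.task_done()

    def flush(self):
        """Block until every queued write has reached the backing store."""
        self._pending.join()
```

A production system would also need eviction, batching, and a plan for writes that are still queued when a node fails, which is the main trade-off of letting the cache lead the database.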

A similar trend is the use of task-specific databases such as Apache CouchDB, which is a document-oriented database. CouchDB allows objects that consist of named fields to be queried and indexed in a MapReduce fashion and also offers incremental replication with bi-directional conflict resolution. It is neither a relational database nor an object-oriented database; rather, it is a queryable and indexable store with a table-oriented reporting engine. The advantages are simpler implementation and administration as well as improved performance for a very specific task.

As services become more specialized and service level agreements demand faster response times, in-memory data stores and task-specific databases are very likely to be involved in more architectures, either as a layer between the application servers and the persistent storage or in place of it.
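
To make the idea of querying documents in a MapReduce fashion concrete, here is a small Python sketch. It is not the CouchDB API (CouchDB views are written in JavaScript); the sample documents, field names, and helper functions are all invented for illustration.

```python
from collections import defaultdict

# Hypothetical documents: free-form objects made of named fields.
documents = [
    {"_id": "1", "type": "order", "customer": "acme", "total": 120},
    {"_id": "2", "type": "order", "customer": "acme", "total": 80},
    {"_id": "3", "type": "order", "customer": "globex", "total": 50},
    {"_id": "4", "type": "customer", "name": "acme"},
]


def map_order_totals(doc):
    """Map step: emit (key, value) pairs only for order documents."""
    if doc.get("type") == "order":
        yield doc["customer"], doc["total"]


def reduce_sum(values):
    """Reduce step: collapse all values emitted for one key."""
    return sum(values)


def build_view(docs, map_fn, reduce_fn):
    """Run the map step over every document, then reduce per key."""
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            grouped[key].append(value)
    # Keys are kept in sorted order, as an index would keep them.
    return {key: reduce_fn(values) for key, values in sorted(grouped.items())}


order_totals = build_view(documents, map_order_totals, reduce_sum)
```

Documents that lack the fields a view cares about are simply skipped by the map step, which is why no fixed schema is required.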

Cloud Computing
While not really a hot new trend, cloud computing is still at such a fledgling stage and evolving so rapidly that it will pay to keep a close eye on it. Some of the more recent introductions by Amazon in its cloud portfolio include auto scaling, load balancing, monitoring, and VPN. Auto Scaling allows you to automatically scale Amazon EC2 instances up or down according to predefined conditions. Instead of running your own software load balancer such as HAProxy, you can let Elastic Load Balancing distribute incoming traffic across multiple instances. Monitoring is now available through CloudWatch for EC2 instances, but it still offers little beyond load and traffic metrics. For companies interested in extending their existing management capabilities, such as security services, firewalls, and intrusion detection systems, to include their AWS resources, Amazon now offers Virtual Private Cloud. This enables companies to connect their existing infrastructure to a set of isolated AWS compute resources via a Virtual Private Network (VPN).
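
Scaling "according to predefined conditions" can be pictured as a simple rule evaluation. The Python sketch below is not the AWS Auto Scaling API; the ScalingPolicy fields, the thresholds, and the one-instance step size are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    """Hypothetical policy; field names are not taken from any AWS API."""
    min_instances: int
    max_instances: int
    scale_up_cpu: float    # average CPU % above which one instance is added
    scale_down_cpu: float  # average CPU % below which one instance is removed


def desired_instances(current, avg_cpu, policy):
    """Return the instance count the policy asks for, within its bounds."""
    if avg_cpu > policy.scale_up_cpu:
        target = current + 1
    elif avg_cpu < policy.scale_down_cpu:
        target = current - 1
    else:
        target = current
    # Never go below the floor or above the ceiling.
    return max(policy.min_instances, min(policy.max_instances, target))
```

Keeping a gap between the two thresholds prevents the group from oscillating when load hovers around a single cutoff.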

These features are all Amazon-specific, but they are either already offered by other cloud providers or can be expected shortly. We are still wary of recommending a full production deployment in a single provider’s cloud, but we often see the cloud as a viable solution for many customers for non-production environments, disaster recovery, and surge or flex capacity.

Newsletter – The Future of Relational Databases

Wednesday, June 17th, 2009

For those readers who haven’t subscribed to our newsletter here is a copy of what was sent out earlier this week.  If you would like to receive future emails, about once every other month, sign up here.

We were intrigued by a question asked by Tony Bain of ReadWriteWeb in his recent article “Is the Relational Database Dead?” Since relational database management systems (RDBMS) are a significant part of most SaaS or Web 2.0 architectures, we spend a lot of time helping people scale their databases. We also often find ourselves part of discussions about the future of relational databases.

Do Relational Databases Scale?
We believe that RDBMS scale very well especially when not on a single server. This is the major principle behind the three axes of the AKF Scale Cube.

The AKF Database Scale Cube consists of X, Y, and Z axes, each addressing a different approach to scaling the transactions applied to a database. The lower-left point of the cube (coordinates X=0, Y=0, and Z=0) represents the worst case, a monolithic database, where all data is located in a single location and all accesses go to this single database.

The X Axis of the cube represents spreading read requests across multiple instances of the same data set. All writes are executed on a single database, which asynchronously propagates them to the read replicas. Caching can be implemented to further reduce database load. The Y Axis of the cube represents a split by function or service. The Z Axis represents ways to split transactions by performing a lookup, a modulus, or another indiscriminate function (a hash, for instance).
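
The three axes can be read as three different routing decisions. The Python sketch below illustrates each one under stated assumptions: the host names, the service names, and the four-shard layout are invented, and a real deployment would route through its own connection layer.

```python
import zlib

# Hypothetical topology for illustration only.
READ_REPLICAS = ["replica-1", "replica-2", "replica-3"]               # X axis
SERVICE_DATABASES = {"catalog": "catalog-db", "orders": "orders-db"}  # Y axis
ORDER_SHARDS = ["orders-shard-0", "orders-shard-1",
                "orders-shard-2", "orders-shard-3"]                   # Z axis


def route_read(request_number):
    """X axis: rotate reads across identical copies of the data set."""
    return READ_REPLICAS[request_number % len(READ_REPLICAS)]


def route_by_service(service):
    """Y axis: each function or service owns its own database."""
    return SERVICE_DATABASES[service]


def route_by_customer(customer_id):
    """Z axis: a modulus on a numeric ID picks the shard."""
    return ORDER_SHARDS[customer_id % len(ORDER_SHARDS)]


def route_by_key(key):
    """Z axis with a hash, for keys that are not numeric.

    zlib.crc32 is used because it gives the same result on every run,
    unlike Python's built-in hash() for strings.
    """
    return ORDER_SHARDS[zlib.crc32(key.encode("utf-8")) % len(ORDER_SHARDS)]
```

The splits compose: an orders service (Y) can be sharded by customer (Z), and each shard can have its own read replicas (X).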

Key / Value Stores
We don’t believe in the imminent demise of the RDBMS, but we do expect more applications and services to use a relational database as a persistent data store and to use key/value stores as part of the transactional system. We have seen and expect to see more of the use of key/value stores such as memcached and Redis as a shared-memory object cache that is the primary data source for the service or application. Writes work their way back to the relational database asynchronously, and reads into the object cache work their way forward either on a schedule or on demand.

This trend is made possible by the software being developed to support it and also by cheaper hardware, including virtualization in clouds. As services become more specialized and service level agreements demand faster response times, in-memory data stores are very likely to be involved in more architectures as a layer between the application servers and the relational databases.

Specialized Databases
Another trend is the creation of specialized databases such as Hive, an open source, petabyte-scale data warehousing framework based on Hadoop that was developed by Facebook. Other databases include Amazon’s distributed web service SimpleDB, Google’s column-oriented BigTable, and the Apache JSON-based CouchDB. Each of these has advantages in particular scenarios. It has been shown in research papers like “One Size Fits All? – Part 2: Benchmarking Results” by Michael Stonebraker and Ugur Cetintemel that major RDBMS vendors can be outperformed by one to two orders of magnitude by specialized database engines.

In “The End of an Architectural Era” they continue their argument that current legacy RDBMS code, in attempting to be “one size fits all,” excels at nothing. In fact, they offer data showing that H-Store, a database written completely from scratch in collaboration with MIT, Brown, and Yale Universities, can outperform these legacy RDBMS by two orders of magnitude on standard transactional benchmarks.

Conclusion
The demise of the RDBMS has been prophesied almost since its inception in E.F. Codd’s 1970 paper “A Relational Model of Data for Large Shared Data Banks”. One of the most promising replacements, given the advent of object-oriented programming languages, was the object-oriented database, which never became mainstream but has resulted in the inclusion of object-relational features in the more popular RDBMS.

Even with the improved performance from specialized databases the most popular relational databases including Oracle, MySQL, and PostgreSQL will continue to be an integral part of SaaS and Web 2.0 architectures. Learning how to scale them will continue to be important for many years to come.