AKF Partners

Abbott, Keeven & Fisher Partners: Partners In Hyper Growth

Category » Engineering

How to Choose a Development Methodology

One of our most viewed blog entries is “PDLC or SDLC.” While we don’t know definitively why, we suspect that technology leaders are looking for ways to improve their organization’s performance, e.g. better quality or faster development. Additionally, we often get asked “what is the best software development methodology?” or “should we change PDLCs?” The simple answer is that there is no “best,” and changing methodologies is not likely to fix organizational or process problems. What I’d like to do in this post is 1) give you a very brief overview of the different methodologies (consider this a primer or a refresher, or feel free to skip it) and 2) provide a framework for considering which methodology is right for your organization.

The waterfall model is an often-used software development process that proceeds through a sequential set of steps. Progress is seen as flowing steadily downwards (like a waterfall) through phases such as Idea, Analysis, Design, Development, Testing, Implementation and Maintenance. The waterfall development model originated in the hardware manufacturing arena, where after-the-fact changes are prohibitively costly. Since no formal software development methodologies existed at the time, this hardware-oriented model was adapted for software development. There are many variations on the waterfall methodology that change the phases, but all share the quality of a sequential set of steps. Some waterfall variations include those from Six Sigma (DMAIC, DMADV).

Agile software development is a group of software development methodologies based on iterative and incremental development. The requirements and ultimate products or services that get delivered evolve through collaboration between self-organizing, cross-functional teams. This methodology promotes adaptive planning, evolutionary development and delivery. The Agile Manifesto, published in 2001, introduced a conceptual framework that promotes foreseen interactions throughout the development cycle. The time-boxed, iterative approach encourages rapid and flexible response to change. There are many variations of Agile including XP, Scrum, FDD, DSDM, and RUP.

With so many different methodologies available, how do you decide which is right for your team? Here are a few questions that will help guide you through the decision.

1) Is the business willing to be involved in the entire product development cycle? This involvement takes the form of dedicated resources (no split roles such as running a P&L by day and being a product manager by night), training for everyone, and joint ownership of the product / service (no blaming technology for slow delivery or quality problems).
YES – Consider any of the agile methodologies.
NO – Bypass all forms of agile. All of these require commitment and involvement by the business in order to be successful.

2) What is the size of your development teams? Is the entire development team fewer than 10 people, or have you divided the code into small components / services that teams of fewer than 10 people own and support?
YES – Consider XP or Scrum flavors of the agile methodology.
NO – Consider FDD and DSDM, which are capable of scaling up to 100 developers. If your team is even larger, consider RUP. Note that with agile, when the development team gets larger so does the amount of documentation and communication, and this tends to make the project less agile.

3) Are your teams located in the same location?
YES – Consider any flavor of agile.
NO – While remote teams can and do follow agile methodologies, it is much more difficult. If the business owners are not collocated with the developers, I would highly recommend sticking with a waterfall methodology.

4) Are you hiring a lot of developers?
YES – Consider the more popular forms of agile or waterfall to minimize the ramp-up time of new developers coming on board. If you really want an agile methodology, consider XP, which includes pair programming as a concept and is a good way to bring new developers up to speed quickly.
NO – Any methodology is fine.
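The four questions above can be read as a small decision procedure. The following is only a sketch of that reading, not a definitive rule set: the `TeamProfile` fields, the function name, and the choice to list XP first when hiring heavily are illustrative assumptions, while the thresholds (10 and 100 developers) come from the answers above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TeamProfile:
    """Answers to the four questions (field names are hypothetical)."""
    business_involved: bool   # Q1: business commits to the whole cycle
    max_team_size: int        # Q2: largest team owning a code base or service
    colocated: bool           # Q3: developers and business owners in one place
    hiring_heavily: bool      # Q4: many new developers ramping up


def suggest_methodologies(team: TeamProfile) -> List[str]:
    """Return candidate methodologies following the YES/NO answers above."""
    # Q1 and Q3: without business involvement or colocation, stay with waterfall.
    if not team.business_involved or not team.colocated:
        return ["Waterfall"]

    # Q2: team size narrows the agile flavors.
    if team.max_team_size < 10:
        candidates = ["Scrum", "XP"]
    elif team.max_team_size <= 100:
        candidates = ["FDD", "DSDM"]
    else:
        candidates = ["RUP"]

    # Q4: when hiring heavily, list XP first because pair programming
    # helps bring new developers up to speed.
    if team.hiring_heavily and "XP" in candidates:
        candidates = ["XP"] + [c for c in candidates if c != "XP"]
    return candidates


small_growing_team = TeamProfile(True, 8, True, True)
suggestion = suggest_methodologies(small_growing_team)
```

Treat the output as a starting list to discuss, in the same spirit as the questions themselves.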

A last important idea is that it isn’t important to follow a pure flavor of any methodology. Purists or zealots of process or technology are dangerous because a single tool in your tool box doesn’t provide the flexibility needed in the real world. Feel free to mix or alter concepts of any methodology to make it fit better in your organization or the service being provided.

There are of course counterexamples to every one of these questions; in fact, I can probably give examples from our client list. These questions and answers are not definitive, but they should provide a starting point or framework for how you can determine your team’s development methodology.



The Agile Executive

In this third installment of our “Agile Organization” series we discuss the qualities and attributes necessary for someone to lead a group of cross functional Agile teams in the development of a web-centric product.  For the purposes of this discussion, the Agile Executive is the person responsible for leading a group of agile teams in the development of a product.

In a world with less focus on functional organizations such as the one we’ve described in our Agile Organization articles, it is imperative that the leadership have a good understanding of all domains from the overall business through product management and finally each of the technical domains. Clearly such an individual can’t be an expert in every one of these areas, but they should be deep in at least one and broad through all of them. Ideally this would have been the case in the functional world as well, but alas functional organizations exist to support deep rather than broad domain knowledge. In the Agile world we need deep in at least one area and broad in all areas.

Such a deep yet broad individual could come from any of the functional domains. The former head of product management may be one such candidate, assuming that he or she has good engineering and operations understanding. The heads of engineering and operations may be heads of other agile teams, assuming that they have high business acumen and good product understanding. In fact, it should not matter whence the individual comes, but rather whether he or she has the business acumen, product savvy and technical chops to lead teams.

In our view of the world, such an individual will likely have a strong education consisting of an undergraduate STEM (science, technology, engineering or math) degree.  This helps give them the fundamentals necessary to effectively interact with engineers and add value to the engineering process.  They will also have likely attended graduate school in a business focused program such as an MBA with a curriculum that covers finance, accounting, product and strategy.  This background helps them understand the language of business.  The person will hopefully have served for at least a short time in one of the engineering disciplines as an individual contributor to help bridge the communication chasm that can sometimes exist between those who “do” and those who “dream”.  As they progress in their career, they will have taken on roles within product or worked closely with product in not only identification of near term product elements, but the strategic evaluation of product needs longer term as well.

From the perspective of philosophy, the ideal candidates are those who understand that innovation is more closely correlated with collaboration through wide networks than with the intelligence of one individual or a small group of people. This understanding helps drive beneficial cognitive conflict and increased contribution to the process of innovation rather than the closed-minded approach of affective conflict associated with small groups of contributors.

In summary, it’s not about whence the person comes but rather “who the person is”.  Leading cross disciplinary teams requires cross disciplinary knowledge.  As we can’t possibly experience enough personally to be effective in all areas, we must broaden ourselves through education and exposure and deepen ourselves through specific past experiences.  Most importantly, for a leader to succeed in such an environment he or she must understand that “it’s not about them” – that success is most highly correlated with teams that contribute and not with just being “wickedly smart”.



Technical Debt

Given the debt crises in Italy, Greece, and the US, just to name a few, the idea of debt is often part of our conversations. As little as five years ago, who would have believed that the US could double its debt, which took 200 years to accumulate? While we all probably think this is outrageous, let’s consider the debt that we accumulate within our systems. We’ve all heard and probably used the term technical debt, but do we really understand how badly it can affect us and how quickly?

We often tell clients that they need to spend 12 – 25% of their engineering time on maintenance issues to pay down this debt. Failing to do so can result in some horrendous work stoppages. As an example, we had a client that had been around for a little more than a decade and had a great little profitable business. They were bringing in financing to grow their operations outside of the NY-metro area and needed help scaling. Unfortunately they had ignored refactoring, scaling, and maintenance in general for most of the time they had been in business. The result was that when a potential client asked to add a second email address on accounts, the estimate came back at 1,500 hours. Yes, that’s right: about 190 eight-hour days, or roughly 3/4 of a year of a developer’s time, simply to add an email field. Given that engineers are typically optimistic about how long it will take to accomplish something, imagine how long this really took.

The takeaway here for both our countries and our organizations is to not let debt pile up. Face the pain and balance the budget, because if you don’t it only gets worse.



Lazy Summertime

Now that summer has officially arrived, it’s time to talk about how we can justify being lazy. One of my favorite chapters in Scalability Rules is “Use Caching Aggressively.” The reason I like it so much is that it reminds me to be lazy. Yes, for all you slackers, here is your excuse to do as little work as possible.

Our first justification for being lazy comes under the category of “how to avoid work.” The best way to scale is to avoid the traffic in the first place. One way to avoid traffic is for users to never come to your site, but this isn’t very desirable. The preferred solution to avoiding traffic is to utilize the many layers of caching between your persistent storage (usually a relational database) and the users’ browsers. A few of the possible caches that you can leverage are: O/S DNS cache, browser cache, CDNs, reverse proxy, and object cache.
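To make the last of those layers concrete, here is a minimal sketch of an object cache sitting in front of persistent storage. It is an in-process toy, not a substitute for a real cache server; `ObjectCache`, `get_or_load`, and `load_user_from_db` are hypothetical names, and the "database" is just a function that records how often it is called.

```python
import time
from typing import Any, Callable, Dict, Tuple


class ObjectCache:
    """Tiny in-process object cache with a time-to-live (illustrative only)."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}
        self.hits = 0
        self.misses = 0

    def get_or_load(self, key: str, loader: Callable[[str], Any]) -> Any:
        """Return the cached value, calling loader(key) only on a miss or expiry."""
        entry = self._entries.get(key)
        now = self._clock()
        if entry is not None and now < entry[0]:
            self.hits += 1
            return entry[1]
        self.misses += 1
        value = loader(key)  # the expensive trip to persistent storage
        self._entries[key] = (now + self._ttl, value)
        return value


# Hypothetical stand-in for a database query; records each call it receives.
db_calls = []


def load_user_from_db(user_id: str) -> dict:
    db_calls.append(user_id)
    return {"id": user_id, "name": "user-" + user_id}


cache = ObjectCache(ttl_seconds=60)
first = cache.get_or_load("42", load_user_from_db)
second = cache.get_or_load("42", load_user_from_db)  # served from the cache
```

The second lookup never reaches the loader, which is the traffic the database gets to avoid.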

If one reason wasn’t enough, here is another excuse to be lazy. The best way to avoid errors is to do as little work as possible. The less you do, the less you can screw up. In order to do as little as possible you need to automate or script simple tasks. If you find yourself doing something repetitively, such as installing packages, resetting data, or making copies, consider these tasks for automation. Consider these few commands:

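# Copy the source partition block-for-block onto the new volume
dd bs=65536 if=/dev/sda1 of=/dev/sdd
# Check the copied filesystem for errors
fsck /dev/sdd
# Create a mount point and mount the copy
mkdir /root/ebs-vol
mount /dev/sdd /root/ebs-vol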

It’s much easier and less prone to errors to kick off a shell script than to type all these commands over and over, day after day.

So, there you go, two reasons for you to remain lazy all summer and hopefully enjoy the warm weather.



Multi-paradigm Data Storage Architectures

We often have clients ask about one or more of the NoSQL technologies as potential replacements for their RDBMS (typically MySQL) to simplify scaling. What I think makes much more sense with regard to these NoSQL and SQL storage systems is an AND instead of an OR discussion. Consider implementing a multi-paradigm data storage layer that provides the appropriate persistent storage system for the different types of data in your application. This approach is similar to our RFM approach to data storage. Consider questions such as how often do you need the data, how quickly do you need it, how relational is the data, etc. There are at least four benefits of this multi-paradigm approach: simpler scaling, improved application performance, easier application development, and reduced cost.

Scaling
The AKF Scale Cube provides a straightforward way to scale any relational database through the three axes, but we know that splitting data relationships once they’ve been established isn’t easy. It requires work and lots of coordination between teams. Limiting what gets stored relationally to the minimum that is required means fewer splits along any axis. Many of the NoSQL technologies provide auto sharding and asynchronous replication. Re-indexing keys across another node is much simpler than migrating tables into another database.

Performance
While relational databases can have great performance, unless the table is pinned in memory or the query results are cached in memory, an in-memory data store will generally outperform a SQL query that has to go to disk. In many applications we could make use of in-memory solutions like Memcached, or stores such as MongoDB that keep frequently accessed data in memory, to improve the performance of retrieving high-value data.

Application Development
As Debasish Ghosh states in his article Multiparadigm Data Storage for Enterprise Applications, storing data the way it is used in an application simplifies programming and makes it easier to decentralize data processing. If the application treats the data as a document, why break it apart to store it relationally when we have viable document storage engines? Storing the data in a more native format allows for quicker development.

Cost
For data that’s not needed often, cache it in other places (such as a CDN) or lazy load it from a low cost storage tier such as Amazon’s S3. This might work well for applications hosted in the cloud. The benefit of this is a lower cost per byte stored, especially when considering all costs, including administrators for the more complex data storage systems such as relational databases.

A final step in implementing a multi-paradigm data storage layer is an asynchronous message queue for data that needs to move up and down the stack. Implementing ActiveMQ or RabbitMQ to asynchronously move data from one layer to another as needed relieves the application of this burden. As an example, consider an application that routes picking baskets for inventory in a warehouse. This is typically thought of as a graph, with bins of inventory as nodes and the aisles as edges. You could store this in a graph database such as Neo4j for ease of development and faster retrieval. You could then asynchronously persist these maps in a MySQL database for reporting, and move older versions into an S3 bucket for historic archiving. This combination provides faster performance, easier development, simpler scaling, and reduced cost.
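The warehouse example can be sketched with nothing but the standard library. This is a toy illustration under loud assumptions: a `queue.Queue` stands in for ActiveMQ or RabbitMQ, a dict and two lists stand in for the graph database, the MySQL reporting tables, and the S3 bucket, and every name below is hypothetical. The point it shows is that the application writes to the fast store and returns, while a worker moves data down the stack.

```python
import queue
import threading

# In-memory stand-ins for the real stores (all hypothetical):
graph_store = {}      # plays the role of the graph database
reporting_rows = []   # plays the role of the MySQL reporting tables
archive = []          # plays the role of the S3 archive bucket

persist_queue = queue.Queue()  # stands in for the message queue


def save_route_map(version, edges):
    """Write to the fast store, enqueue the slower writes, and return."""
    previous = graph_store.get("current")
    graph_store["current"] = {"version": version, "edges": edges}
    persist_queue.put(("report", version, edges))
    if previous is not None:
        # The superseded map becomes an "older version" to archive.
        persist_queue.put(("archive", previous["version"], previous["edges"]))


def persistence_worker():
    """Drain the queue, moving data down the stack outside the request path."""
    while True:
        message = persist_queue.get()
        if message is None:  # sentinel: stop the worker
            persist_queue.task_done()
            break
        kind, version, edges = message
        if kind == "report":
            reporting_rows.append((version, len(edges)))
        else:
            archive.append((version, edges))
        persist_queue.task_done()


worker = threading.Thread(target=persistence_worker, daemon=True)
worker.start()

# Bins are nodes, aisles are edges.
save_route_map(1, [("bin-A", "bin-B"), ("bin-B", "bin-C")])
save_route_map(2, [("bin-A", "bin-C")])

persist_queue.join()     # wait for background writes (for this demo only)
persist_queue.put(None)  # ask the worker to stop
worker.join()
```

In a real deployment the queue would be durable and the worker a separate process, so a slow reporting database never delays the picking application.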



Rules for Surviving an Amazon Outage

Because of recent issues with Amazon’s services there is a lot of interest in why some companies are able to keep their site up despite their IaaS or PaaS providers experiencing issues. Here is an InformIT article we wrote, outlining a few rules for surviving an Amazon or other cloud provider outage.



Don’t Interrupt the Doers

We get called in occasionally because a company’s leaders don’t feel that their product development is happening rapidly enough. They recall how fast the product evolved when the company was first started and they want that pace again. There are many reasons for the pace of development to have slowed. Certainly one of the more popular catch phrases that people use is technical debt, which is a metaphor to explain the eventual consequences of fast paced development. As you incur technical debt, your pace of development slows.

I think there is another factor that is equally or possibly more responsible for slowing the pace of development: interruptions. Engineers need large blocks of uninterrupted time to think, design, plan, code, and test. Disrupting an engineer during these tasks often requires a wholesale reset of their thought process. Lots of studies support this. One such study found that when tasks were interrupted, people required upwards of 27% more time to complete them, committed twice the number of errors, and experienced twice the increase in anxiety as compared to uninterrupted tasks. And, as a recent CNN article explained, this problem of disruptions affecting our productivity gets worse as we get older.

So what is interrupting engineers? I’d wager it’s predominantly meetings. While communicating, coordinating, interviewing, etc. are all very important for engineers to participate in, doing so in a haphazard manner can be devastating to productivity. In this competitive hiring environment, interruptions might just be driving your engineers out the door. Try a few of these suggestions to reduce interruptions for engineers:

  • Have at least one day per week where meetings are not allowed
  • Only allow meetings at the beginning or end of the day
  • Require all meetings to have agendas and goals
  • Question standing meetings to ensure all participants are necessary

While measuring productivity is incredibly difficult, most organizations can feel when the pace of development has slowed. Reduce the interruptions of your engineers and see if this doesn’t help increase the pace again.



Availability as a Feature

It doesn’t matter if you run a commerce site, a services product (such as a SaaS offering) or simply use your homepage to distribute information: the table stakes for playing online is high availability. So many companies just take for granted that they will be highly available because they have multiple instances of systems and multiple copies of their data. This assumption of availability will likely, at the very least, cost you significant pain, and in the extreme cost you significant market share or even force you to close your doors as a business. Customers expect the unachievable – 100% availability. At the very least you need to give them something close to that. What will happen to you if you have a data center failure? How about if a DBA accidentally drops a critical table in your production database? What will you do when that marketing campaign triggers a near overnight doubling of traffic? What happens when that new feature has a significant performance bug and gets adopted so quickly that it brings your entire site to its knees?

We often tell our clients that they should treat high availability as a feature. Unfortunately, it is a somewhat expensive feature that requires constant investment over time to achieve and maintain. It is a must-have feature that will only differentiate your firm if you have competitors who do not make similar investments; when competition exists, customers are more likely to leave a site for a competitor due to availability and performance issues than for nearly any other reason. If you don’t believe us on this topic, just go ask the folks at Friendster.

Treating availability as a feature means measuring availability from a customer perspective rather than a systems or device perspective. How many times did customer requests not complete? In this regard, availability is now calculated from the percentage of failed transactions against an expected number of transactions. We define an approach to accomplish this in our first book “The Art of Scalability”. Every executive in the company should “own” the availability metric and understand its implication to the business. You should track how much you invest in availability over time, and significant decreases in engineering or capital investment should be questioned, as they may be an early indicator that you are under investing and a harbinger of hard times to come.
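One simple way to express a customer-perspective metric is the share of expected transactions that actually completed. The sketch below is an assumption about how that could be computed, not the method from the book; the function name and the sample numbers are hypothetical, and how you arrive at the expected count (for example, from historical traffic for the same hour) is left open.

```python
def customer_availability(expected_transactions: int,
                          completed_transactions: int) -> float:
    """Percent of expected transactions that completed successfully."""
    if expected_transactions <= 0:
        raise ValueError("expected_transactions must be positive")
    # Completing more than expected does not push availability above 100%.
    failed = max(expected_transactions - completed_transactions, 0)
    return 100.0 * (1 - failed / expected_transactions)


# Hypothetical hour: 120,000 requests expected, 118,800 completed.
hourly = customer_availability(120_000, 118_800)  # roughly 99%
```

A metric like this drops when customers cannot complete requests, even if every server reports itself as up.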

One of the most common failures we see is to assume that disaster recovery is something that only big companies need. Make no mistake about it, disasters do happen and given enough time they will happen to you. Data centers catch on fire, have water (sprinkler) discharges that ruin equipment, have complete power equipment failures that take hours to fix, and are prone to damage from vehicles, earthquakes, employees and tornados. In our past lives as executives and current roles as advisors we’ve seen no fewer than 4 data center fires, 2 data centers incapacitated by earthquakes and tornados, and one data center leveled by a truck running into it. You are never too young to invest in DR.

And DR need not break your bank.  The dials of RTO (recovery time objective) and RPO (recovery point objective) allow you to determine how much you will invest.  Perhaps you simply replicate your databases to a smaller set of databases at a remote datacenter and have a copy of each of your systems there with an additional copy ready “in the cloud”.  While you won’t be able to run production from that data center, you may be able to leverage the cloud to add capacity for relatively low cost by cloning the cloud based systems.  Such a solution has a fast recovery point objective (you lose very little data) and a moderate recovery time objective (several hours) for very low comparative cost.  Of course, you would need to test the solution from time to time to show that it is viable, but it’s a cheap and effective insurance policy for the business.

So remember – availability is your most important feature.  Customers expect it always and will run away from you to competitors if you do not have it.  Create an availability metric and ensure that everyone understands it as a critical KPI.  Evaluate the company spend against availability quarterly or annually as an additional indicator of potential problems.   Assume that disasters happen and have a DR plan regardless of your company size.

 



What Is That Delay Costing?

As a side practice in our scalability and availability engagements we often work with companies on the performance of their SaaS offerings by attempting to speed up their web page load times. Citing a Google white paper, “Speed Matters for Google Web Search” by Jake Brutlag, we point to the fact that even tenths or hundredths of a second matter. Brutlag states that through experiments they have shown that increasing web search latency from 100 to 400 ms reduces the daily number of searches per user by upwards of 0.6%. Given that we are attempting to become practitioner-scholars, in order to bridge the gap between academia and practice, we decided to dive into this subject area a little deeper. Our goal was to understand what other research had been done and if there was anything more practitioners could learn besides “speed up your pages!”

Research in computer system delay has been taking place for decades and has shown that excessive computer system delay results in negative responses such as anxiety (Guynes, 1988) and dissatisfaction with the system itself (Rushinek & Rushinek, 1986). However, the research does not support a relationship between increases in delay and the attitude toward the company (Rose & Straub, 2001). It was shown that increases in delay treatments from near 0 to 15 seconds did not correlate with a reduction in satisfaction measures such as ease of use or content appeal (Otto et al., 2000). It was also found that increases in delay treatments did not consistently predict likelihood of future patronage (Rajala & Hantula, 2000).

So from all this research we have the notion that delays cause frustration, even anxiety, but yet they don’t appear to cause a decrease in satisfaction or even predict continued usage. Why is this?

Attribution theory, which deals with how individuals infer causality between events (Kelley & Michela, 1980), would explain this phenomenon as the customers assigning blame for the delay to something or someone other than the SaaS provider. This theory has also been used to show the presence of a self-serving attribution bias and an actor–observer attribution bias in entrepreneurs’ representations of events (Rogoff et al., 2004), but we’ll save that for another post. It turns out that perceived wait time is much more critical than the actual wait time (Baker & Cameron, 1996).

Rose et al. (2005) contend that “it may be less important to reduce objective delay than it is to create a system where users will be less likely to attribute the delay to the retailer.” An example would be to give the user the option of selecting a low or high graphic site in order to provide the users with the control. Users will likely perceive this as an active effort on the part of the SaaS provider to minimize download time and thus attribute delays to themselves, their computer, their ISP, etc., but not the site.


References:

Baker, J., & Cameron, M. (1996). “The effects of the service environment on affect and consumer perception of waiting time: An integrative review and research propositions.” Journal of the Academy of Marketing Science, 24, 338–349.

Guynes, J. L. (1988). “Impact of system response time on state anxiety.” Communications of the ACM, 31, 342–347.

Kelley, H. H., & Michela, J. L. (1980). “Attribution theory and research.” Annual Review of Psychology, 31, 457–501.

Otto, J. R., Najdawi, M. K., & Caron, K. M. (2000). “Web-user satisfaction: An exploratory study.” Journal of End User Computing, 12, 3–10.

Rajala, A. K., & Hantula, D. A. (2000). “Toward a behavioral ecology of consumption: Delay-reduction effects on foraging in a simulated internet mall.” Managerial and Decision Economics, 21, 145–158.

Rogoff, E., Lee, M., & Suh, D. (2004). “‘Who done it?’ Attributions by entrepreneurs and experts of the factors that cause and impede small business success.” Journal of Small Business Management, 42, 364–376.

Rose, G. M., & Straub, D. (2001). “The effect of download time on consumer attitude toward the e-service retailer.” e-Service Journal, 1, 55–76.

Rose, G. M., Meuter, M. L., et al. (2005). “On-line waiting: The role of download time and other important predictors on attitude toward e-retailers.” Psychology and Marketing, 22, 127–151.

Rushinek, A., & Rushinek, S. (1986). “What makes users happy?” Communications of the ACM, 29, 594–598.



Scalability at the Cost of Availability

Do you associate scalability with availability? Sometimes these go hand-in-hand but sometimes these are at odds with each other. We’re obviously big proponents of architecting your systems so that you have the necessary scalability when you need it but we’re also realistic. We often help young companies make tradeoffs between capital expenditure and scalability. It’s not uncommon for us to spend a good deal of time explaining the concepts of Design-Implement-Deploy and Recency-Frequency-Monetization to help with this discussion.

One subtle concept that is sometimes misunderstood is that, if you are not careful, an increase in scalability can actually decrease your availability. In order to understand how this can happen we need to talk about the multiplicative effect of failure with items in series. Take for example a system with a single web server with 99.9% availability (forget about network gear for now, but it has the same effect). The availability of the system is 99.9%. Now add a database, also with 99.9% availability, and assume that the DB is required for the web server to respond, i.e. pages are built by querying the DB. This causes the availability of the system to go down to 99.9% x 99.9% = 99.8%. The reason is that with 99.9% availability each component is going to be down for ~43 min per month. The chance that the database experiences its 43 min of downtime at the same time as the web server is down is very small. Much more likely is that you experience 86 min of downtime each month, half caused by the DB and half by the web server.
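The arithmetic in that paragraph can be checked with a few lines. This is a sketch under stated assumptions: component failures are independent, a month is taken as 30 days, and the function names are illustrative.

```python
MINUTES_PER_MONTH = 30 * 24 * 60  # assumes a 30-day month


def series_availability(*availabilities: float) -> float:
    """Availability of components in series: every one of them must be up."""
    result = 1.0
    for availability in availabilities:
        result *= availability
    return result


def monthly_downtime_minutes(availability: float) -> float:
    """Expected minutes of downtime per month for a given availability."""
    return (1 - availability) * MINUTES_PER_MONTH


# Web server alone: about 43 minutes of downtime per month.
web_only = monthly_downtime_minutes(0.999)

# Web server plus a required database: about 86 minutes per month.
web_and_db = monthly_downtime_minutes(series_availability(0.999, 0.999))
```

The combined figure is slightly under twice the single-component figure because of the small chance that both are down at once.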

Back to scaling causing problems with availability. Let’s take the same example, a single web server and a single DB server, both with 99.9% availability. If our database is starting to get busy and we decide to split it, most likely we’d start by adding a read slave (X-Axis split), where the write queries (insert, update, delete) go to the master and the reads (select) go to the slave. To accomplish this we need to introduce another piece of hardware and replicate the database. If the web pages in our system require both read and write queries to the DB, then we’ve just decreased the overall system availability by increasing its scalability. This is a very simplistic example and makes a lot of assumptions but hopefully it gets the point across that you can actually decrease your availability by increasing your scalability.
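To put rough numbers on this, the sketch below treats the web server, the master, and the new read slave as three components in series. The assumptions are the simplistic ones from the example: every page needs both a read and a write, each component is 99.9% available, failures are independent, and a month has 30 days. The variable names are illustrative.

```python
import math

MINUTES_PER_MONTH = 30 * 24 * 60  # assumes a 30-day month

# Before the split: web server and a single database in series.
before_split = math.prod([0.999, 0.999])

# After the split: web server, master (writes), and read slave (reads),
# all required to build a page.
after_split = math.prod([0.999, 0.999, 0.999])

# Extra expected downtime per month introduced by the third component.
extra_downtime_minutes = (before_split - after_split) * MINUTES_PER_MONTH
```

Under these assumptions availability slips from roughly 99.8% to roughly 99.7%, about 43 more minutes of downtime a month. If reads could fall back to the master when the slave is down, the slave would no longer be in series and this penalty would not apply.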

So why make this tradeoff? In most cases the availability of our hardware is much higher than three-nines so the addition of a small amount of downtime is worth the gain in scalability. Also, by using swim lanes we can mitigate this by splitting our downtime across parts of our users, effectively cutting downtime in half with our first swim lane split.

All of this reminds us that scalability is much more of an art than a science, hence the name of our first book, The Art of Scalability. But don’t despair: there are definite rules that govern how to scale effectively, such as the X, Y, or Z Axis splits, which is why we’re calling our second book Scalability Rules. You just need to use art in applying them. As an analogy, think about an artist painting. Mixing red with blue will always result in purple, a rule, but how the artist applies that color to the canvas is pure art.

