AKF Partners

Abbott, Keeven & Fisher Partners: Partners In Hyper Growth

The Future of IaaS and PaaS

Even though I’m a fan of technology futurists, I’m not much of a prognosticator myself. However, some recent announcements from Amazon and recent work with some clients got me thinking about the future of Infrastructure as a Service (IaaS), such as Amazon’s AWS, and Platform as a Service (PaaS), such as Google’s App Engine or Microsoft’s Azure.

Amazon’s most recent announcement was about Elastic Beanstalk. In case you missed it, this new service is a combination of existing services. According to the announcement, “AWS Elastic Beanstalk is an even easier way for you to quickly deploy and manage applications in the AWS cloud. You simply upload your application, and Elastic Beanstalk automatically handles the deployment details of capacity provisioning, load balancing, auto-scaling, and application health monitoring.” This sounds like a move toward PaaS to me, but the announcement made a point of noting that users retain total control if desired. It states “…you retain full control over the AWS resources powering your application and can access the underlying resources at any time.”

Werner Vogels, Amazon’s CTO, stated on his blog that the need for Beanstalk came from the complexity involved in managing the entire software stack, which to me is the reason the concept of PaaS was developed. He cites examples already in use: Heroku and Engine Yard for Ruby on Rails, CloudFoundry for SpringSource, Acquia for Drupal, and phpfog for PHP. He states “These platforms take away much of the ‘muck’ of software development to the extent that most RoR developers these days will choose to run on a platform instead of managing the whole stack themselves.” This to me sounds like a blurring of the lines between IaaS and PaaS.

Another item, which we wrote about at the end of last year, is the concept of DevOps. This idea, which has gained popularity recently, acknowledges the interdependence of development and operations in producing timely software products and services. Software developers in many organizations need simpler, consolidated platform services in order to procure, deploy, and support virtual instances themselves. This is another push for PaaS platforms, but with the flexibility for control when necessary.

Market predictions for cloud services in 2014 range from $55B according to IDC up to $148B according to Gartner. Regardless of the exact number, the trend is double-digit growth for many years to come. While the market will push toward commoditization of these services, providers will resist this through service differentiation. This differentiation will come in the form of add-on features and simplification across the entire product development life cycle (PDLC).

The future of IaaS and PaaS is a blurring of the lines between the two. IaaS providers will offer simpler alternatives while still offering full control, and PaaS providers will likely start allowing greater control to attract larger markets. Let us know if you have any thoughts on the future of IaaS or PaaS or both.



How to Set Up a Failover Server on EC2

We started working on Amazon’s EC2 instances several years ago. Eventually we moved several of our hosted environments to the cloud and used scripts to back up the MySQL databases and file systems to S3. While the EC2 instances are pretty stable, like everything else they do occasionally fail. Since Amazon offers an elastic load balancer solution, I started there. The setup is incredibly simple through the AWS UI and the cost is pretty reasonable at $0.025 per hour and $0.008 per GB. The problem with Amazon’s elastic load balancer is that you can’t associate an IP address with it and can only address it by the domain name that Amazon has assigned. Because the root of a domain cannot be a CNAME, this prevents Amazon’s elastic LB from being used for a primary domain; you can only use it for sub-domains. This wasn’t acceptable, so I started looking at alternatives.

HAProxy was at the top of my list for an open source LB because of its ease of configuration, performance, and wide adoption. What I didn’t like about this solution is that, because it sits in the path of traffic, it requires two servers set up in HA mode, lest I cause more issues than I solve. This unfortunately doubles the cost of server instances. Additionally, several environments that I was considering load balancing were running CMS systems not designed for active-active, so without some hacking they would be running in active-passive mode. I started thinking about an alternative solution.

What I came up with was setting up a failover server with a script to monitor and control the failover execution. I believe this solution balances cost, complexity, and availability for small sites that are not critical, such as a company’s blog. If your site IS your business then you need to move forward with a properly load balanced, active-active solution.

The first thing you’ll need to do is set up two additional servers. One is your replica or failover server, from which you’ll host your site and database when the primary fails. The second server is for monitoring and controlling the failover. For my failover server I used MySQL master-slave replication, which is pretty straightforward to set up and is not covered here. On the monitoring server my plan was to rely on Amazon’s AWS API tools to disassociate my IP and re-associate it with my failover server. In order to use these tools you need a JRE on your monitoring server. For setting this up I followed the instructions on this site.

Once you’ve set up the replica and monitoring servers, you need a script to monitor and control the failover. I used a bash shell script that curls the desired test page and greps for something that I know loads at the bottom of the page, such as a Google Analytics ID. If the load fails, the script appends the current timestamp to a file. If the page loads successfully, it empties the file. The reason for this is that I didn’t want to alert or fail over just because of one missed page load or because of missed page loads that were not sequential.

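#!/bin/bash
# File that accumulates one timestamp per failed check
FILE=akf_blog_err_cnt.txt
# Fetch the page and look for a marker that loads at the bottom of it
if curl -s http://mysite.com/ | grep -q UA-12345
then echo > "$FILE"    # success: reset the file (echo leaves a single blank line)
else date >> "$FILE"   # failure: append the current timestamp
fi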

The next step is to add the logic for counting the number of timestamps in the file.

# Count the lines in the file (one per recorded failure, plus the blank line left by a reset)
ERR_CNT=$(wc -l < "$FILE")

Now compare that count to a maximum allowable number of failures. In my case, if I don’t get a successful page response in 5 attempts I want to initiate the alert and failover. Since this script is designed to run periodically via cron and not as a persistent service, I’ve added a semaphore file to record that the site has failed over. This prevents the script from continuously trying to fail over.

The actual failover control has a few steps. The first is to send out an email alert so that I know something has gone awry. The next is to stop the MySQL slave on the failover server. Since this is going to start taking traffic I don’t want it applying any more logs from the master. I’m using SSH with a key to execute a remote command. The last two steps are to disassociate the IP from the failed server and re-associate it to the failover server. These commands are part of the AWS API tool.

MAX_ERR=5
FAILED_FLAG=akf_blog_fail.txt
# Fail over only if the count exceeds the limit and we have not already failed over
if [ $ERR_CNT -gt $MAX_ERR ] && [ ! -f "$FAILED_FLAG" ]
then
# Send email about failure
echo "The page failed to load at least $MAX_ERR times in a row. Shifting to backup server." | /bin/mail -s "Site NOT Loading" michael@akfpartners.com
# Stop slave
ssh -i /key.pem user@ec2-IP-address.amazonaws.com 'mysql -Bse "stop slave"'
# Shift IP to secondary server
ec2-disassociate-address 50.72.23.173
ec2-associate-address 50.72.23.173 -i i-3950994
# Mark as failed over
touch "$FAILED_FLAG"
else echo "The error count has not exceeded $MAX_ERR, or the site has already failed over"
fi

Now, place this script in your cron jobs to run every minute. That’s it for setting up the failover monitor and control script. Because this monitoring server is not in the direct route of traffic, I don’t need to set it up as HA. A total failure of the system would require both the monitoring server and the primary site server to fail simultaneously. But because I’m pretty paranoid, I do have an external monitoring service watching over the site and the monitoring server.
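For reference, here is a minimal sketch of what the cron entry could look like. The script and log paths are hypothetical, and cron_entry is just an illustrative helper that prints the line; it does not install anything.

```shell
#!/bin/bash
# Sketch: build a crontab line that runs the monitor script every minute.
# The script path and log path passed in below are hypothetical examples.
cron_entry() {
  local script=$1 log=$2
  # Five asterisks = every minute of every hour, day, month, and weekday
  printf '* * * * * %s >> %s 2>&1\n' "$script" "$log"
}

# Print the line; paste it into the file opened by "crontab -e" to install it
cron_entry /home/monitor/akf_failover.sh /home/monitor/akf_failover.log
```

Cron normally starts jobs in the user’s home directory, so the relative file names used for the error count and the failover flag will resolve there unless you switch them to absolute paths.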



Why A Technology Leader Should Code

After I left the military, I started in corporate America as a software developer. I spent several years programming on various projects in a variety of languages. Perhaps more quickly than I wanted, I entered the management ranks. Starting as an engineering manager, I progressed into a number of executive roles including VP of Engineering, CIO, and CTO. It has now been well over a decade for me as a manager and executive, but through these years I have continued to program. Among the technology executives I’ve met, this is fairly unusual. Most tech execs gladly give up programming upon entering management and never look back.

I’ve never considered myself a great programmer, and comparing what I do today with the work of a professional developer is like comparing a weekend gardener with an industrial farmer. Recently I’ve been considering whether continuing to program is clinging to my technical youth or is actually beneficial as a technology leader. We’ve written about How Technical a CTO Should Be but here are a few more specific thoughts on programming.

Technical and Tactical Proficiency
As a junior officer I was taught that in order to lead one had to be “technically and tactically” proficient. I owed it to the soldiers in my unit to understand the equipment our unit employed and the basic combat tactics that we would be following. This concept has stuck with me, and I believe that technology leaders need to understand the tools that their team is working with and the processes that they are following. The exact level of understanding is a personal choice and highly debatable. I like, if at all possible, to have hands-on experience. Periodically having to code a feature and deploy it will give the engineering manager a better understanding of and appreciation for what her engineers go through on a daily basis.

Tangible Results
Leading people can be one of the most challenging and yet rewarding jobs. Getting a team to buy into a single vision and motivating them to deliver that vision is a day-to-day challenge that can wear the best of us down. When that team finally delivers, or when the junior employee that you’ve been coaching starts performing like the star that you knew they could be, it all seems worth it. Unfortunately, those rewarding days are months or years apart. During the interim days and weeks it can be difficult to go without tangible results.

This is where programming fits. Coding provides immediate feedback and accomplishment of short-term goals. When your function works perfectly the first time you test it or when the solution to that very difficult problem becomes clear, you receive instant gratification and tangible results.

Some leaders use other hobbies like woodworking or gardening to provide this short-term gratification. Start working on a garden and within a couple of hours or days you can see the impact of your work. The ground is turned over, weeds are removed, seeds are planted. After a couple of weeks or months the project is completed with the results on your dinner table, proof of your achievement.

While these physical activities are enjoyable and rewarding they don’t expand your knowledge of developing systems. Consider deliberate practice by picking up a programming project to receive tangible rewards and improve your technical and tactical proficiency.



Setting Up CloudFront with an Origin Server

We have a couple of sites hosted on Amazon’s EC2 and I wanted to implement the CDN product from Amazon called CloudFront to see what performance improvements we could achieve. Having set up other CDNs for sites, I figured this would be a pretty straightforward setup, not worthy of a post. Unfortunately, this turned out not to be the case, and thus I thought I should write something up for anyone else interested in a similar setup.

As background, a CDN (Content Delivery Network) is used to host mostly static content (files that don’t change often) on what are called “edge servers” instead of just your servers, called origin servers. Typically there are many hundreds or thousands of edge servers that are geographically distributed across multiple backbone providers. This makes them much closer to your customers resulting in faster download of your files to their browsers and thus better page performance while on your site.

CloudFront is designed to use Amazon’s S3 storage as its source for objects (static files like images or videos). I didn’t want to pay for the additional storage, although it is very cheap, but most importantly I did not want another failure point in the architecture. This setup might also be useful for sites not hosted on EC2 but wanting to use CloudFront. Wanting CloudFront to pull objects directly from my server, I went looking for how others had solved this problem. It turns out it is possible to set up a CloudFront “distribution” (a term Amazon uses to refer to an implementation) using an origin server instead of S3, but only through Amazon’s CloudFront API, documentation here. Once the distribution is set up you can administer it from the AWS web interface.

I started playing with the API using CURL but realized after a few attempts that the process was a little more complicated, and that in order to have something repeatable I’d need to write a little code. Since I had already borrowed the HMAC-SHA1 function, required for API authorization, from here, which was in PHP, I continued with PHP. Here is the complete program if you’re interested, but below are the major steps.

Major Steps
Here are the major steps in the program.

1) Define XML Payload: Using the “DistributionConfig” element, you set the “CustomOrigin” instead of “S3” and define the following variables:

  • DNSName – this is the domain you are setting up the CDN for.
  • HTTPPort – what ports are your secure and unsecure traffic on?
  • CNAME – what subdomain will you use in DNS to refer to the CDN? I used “cdn1.akfpartners.com” because I planned on changing all my references to static items (images, js, css, etc) to call this subdomain.
  • Enabled – do you want this enabled right away?
  • CallerReference – this is an ID to keep your requests unique.
  • DefaultRootObject – this is the default file that will be requested if no file is explicitly called.

2) Encode Authorization String: The CloudFront API requires that you encode the date formatted as such “Thu, 30 Dec 2010 16:05:21 EST” using HMAC-SHA1 with your secret access key.

3) Set Headers: The most important header is the “Authorization” header that requires the following format “Authorization: AWS public_access_key:encoded_date”.

4) Set CURL Options: There are a few CURL options that are required:

  • URL – the URL to be called is “https://cloudfront.amazonaws.com/2010-11-01/distribution/”
  • POST – the API is RESTful and creating a distribution is a POST request, so you need to set CURL to POST
  • TIMEOUT – how long before the request times out

5) Execute API Request: I wrapped the request in microtime calls to see how long the transaction took and captured the results of the request.

6) Parse Results: If successful, the result will be a 201 status code, meaning “created”. Otherwise there are a number of errors that can be sent back.

Once your program is ready just execute it and hopefully you get back a 201. Once you’re successful jump into the AWS console and you should see your distribution being created. It usually takes about 5 minutes until the distribution is completely ready.
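Steps 2 and 3 can be sketched in a few lines of shell. This is not the PHP program described above; it is a minimal illustration that assumes the openssl command-line tool is installed and that the signature is the Base64 encoding of the raw HMAC-SHA1 digest. The function names sign_date and auth_header are hypothetical, the credentials are made-up placeholders, and the curl call for step 4 is left commented out with a hypothetical payload file name.

```shell
#!/bin/bash
# Sketch only: sign the request date for the CloudFront API (steps 2 and 3).
# The access key and secret key used below are placeholders, not real credentials.

# Step 2: Base64-encoded HMAC-SHA1 of the date string, keyed with the secret access key
sign_date() {
  local date_str=$1 secret=$2
  printf '%s' "$date_str" | openssl dgst -sha1 -hmac "$secret" -binary | openssl base64
}

# Step 3: build the header in the "Authorization: AWS public_access_key:encoded_date" format
auth_header() {
  local access_key=$1 date_str=$2 secret=$3
  printf 'Authorization: AWS %s:%s\n' "$access_key" "$(sign_date "$date_str" "$secret")"
}

REQUEST_DATE="Thu, 30 Dec 2010 16:05:21 EST"
auth_header "EXAMPLEACCESSKEY" "$REQUEST_DATE" "example-secret-key"

# Step 4 (not executed here): POST the XML payload; config.xml is a hypothetical file name
# curl -s -X POST --max-time 30 \
#   -H "Date: $REQUEST_DATE" \
#   -H "$(auth_header "$ACCESS_KEY" "$REQUEST_DATE" "$SECRET_KEY")" \
#   -H "Content-Type: text/xml" \
#   --data-binary @config.xml \
#   "https://cloudfront.amazonaws.com/2010-11-01/distribution/"
```

The date that is signed has to be the same string that is sent with the request, which is why it is kept in a single variable here.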

DNS & Application Changes
The next step is to set up your DNS to use this CloudFront distribution. In the AWS console you will see the URL that Amazon has assigned to your CDN distribution, something like “d75x0jxgmx7op.cloudfront.net”. Simply take that URL and create a CNAME through your DNS provider to point your subdomain to the Amazon URL. My entry looked like this:

cdn1.akfpartners.com Alias (CNAME) d75x0jxgmx7op.cloudfront.net

Once you have DNS set up and propagated (depending on your DNS provider’s TTL this might take 24 hours or more), you can change your application’s references to static files. The sites that I was implementing this for used MediaWiki, Expression Engine (EE), and WordPress. The wiki just required a change to one PHP file, LocalSettings.php. For EE it took a change to a CSS file and, in several templates, replacing the {site_url} with a reference to the CNAME. For WordPress there is a plugin that helps with this reference replacement if you don’t want to hack the files by hand.
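If you do end up changing references by hand, the idea can be sketched with sed. This is only an illustration and not what any of the tools above do internally: rewrite_static_urls is a hypothetical helper, the list of file extensions is an assumption, and the dots in the domain name are not escaped, so treat it as a starting point.

```shell
#!/bin/bash
# Sketch: point absolute URLs for static files at the CDN subdomain.
# Reads HTML or CSS on stdin and writes the rewritten text to stdout.
rewrite_static_urls() {
  local site=$1 cdn=$2
  # Only URLs ending in a static file extension are rewritten; pages are left alone
  sed -E "s#https?://${site}/([^\"' )]+\.(png|jpg|gif|css|js))#http://${cdn}/\1#g"
}

printf '%s\n' '<img src="http://akfpartners.com/images/logo.png">' \
  | rewrite_static_urls akfpartners.com cdn1.akfpartners.com
```

Links to regular pages keep pointing at the origin site, since only static files should be served from the CDN subdomain.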

That’s it! Your site should now be up and running with Amazon’s CloudFront CDN.

Was There a Performance Improvement?

This is really the big question: was this exercise, which took slightly more than the point and click that I thought it would, worth it? Well, the wiki that I set this up for was ridiculously fast already and had almost no images, so the results weren’t that impressive. Our site, akfpartners.com, was already pretty fast as well, but it does contain numerous images, JS, and CSS files. Using webpagetest.org, I ran the test several times and averaged the results. The table below shows the results.

Here is a screenshot of the output of WebPageTest.org for a run with CloudFront enabled. Notice that it assigned us an “A” for use of a CDN whereas before we received an “F”.


A 6.1% improvement doesn’t seem like that much until you consider Google’s statement that decreasing web search latency from 400 ms to 100 ms increases the daily number of searches per user by up to 0.6%. Increasing your site’s speed by just a small amount can have significant increases in repeat visitors and time on site.

Good luck with your CloudFront implementation.



Scalability Rules TOC

We’ve completed the first draft of our new book “Scalability Rules – 50 Principles For Scaling Web Sites” and wanted to share the table of contents with everyone. We have a terrific team of technical editors who are reviewing every rule in detail, but we would also like to offer this opportunity to anyone else so inclined. Our publisher, Addison-Wesley Professional, has posted the draft versions of Chapters 1-5 (Rules 1-19) online at Safari Rough Cuts and should have a couple more chapters available soon. If you’re interested in a sneak preview or would like to provide feedback, sign up and take a look. Below is the book’s table of contents.

Chapter 1 – Reduce the Equation

  • Rule 1 Don’t Over Engineer The Solution
  • Rule 2 Design Scale Into the Solution (D-I-D Process)
  • Rule 3 Simplify the Solution 3 Times Over
  • Rule 4 Reduce DNS Lookups
  • Rule 5 Reduce Objects Where Possible
  • Rule 6 Use Homogenous Networks

Chapter 2 – Distribute Your Work

  • Rule 7 Design to Split Reads and Writes (X axis)
  • Rule 8 Design to Split Different Things (Y axis)
  • Rule 9 Design to Split Similar Things (Z axis)

Chapter 3 – Design to Scale Out Horizontally

  • Rule 10 Design Your Solution to Scale Out – Not Just Up
  • Rule 11 Use Commodity Systems (Goldfish not Thoroughbreds)
  • Rule 12 Scale Out Your Data Centers
  • Rule 13 Design to Leverage the Cloud

Chapter 4 – Use The Right Tools

  • Rule 14 Use Databases Appropriately
  • Rule 15 Firewalls, Firewalls, Everywhere!
  • Rule 16 Actively Use Log Files

Chapter 5 – Don’t Duplicate Your Work (Nov 30th)

  • Rule 17 Don’t Check Your Work
  • Rule 18 Stop Redirecting Traffic
  • Rule 19 Relax Temporal Constraints

Chapter 6 – Use Caching Aggressively

  • Rule 20 Leverage CDNs
  • Rule 21 Use Expires Headers
  • Rule 22 Cache Ajax Calls
  • Rule 23 Leverage Page Caches
  • Rule 24 Utilize Application Caches
  • Rule 25 Make Use of Object Caches
  • Rule 26 Put Object Caches on Their Own “Tier”

Chapter 7 – Learn From Your Mistakes

  • Rule 27 Learn Aggressively
  • Rule 28 Don’t Rely on QA to Find Mistakes
  • Rule 29 Failing to Design for Rollback Is Designing to Fail
  • Rule 30 Discuss and Learn from Failures

Chapter 8 – Database Rules

  • Rule 31 Be Aware of Costly Relationships
  • Rule 32 Use the Right Type of Database Locks
  • Rule 33 Pass on Using Multi-phase Commits
  • Rule 34 Try Not to Use “Select For Update”
  • Rule 35 Don’t Select Everything

Chapter 9 – Design for Fault Tolerance and Graceful Failure

  • Rule 36 Design Using Fault Isolative “Swim Lanes”
  • Rule 37 Never Trust Single Points of Failure
  • Rule 38 Avoid Putting Systems in Series
  • Rule 39 Ensure You Can Wire On and Off Functions

Chapter 10 – Avoid or Distribute State

  • Rule 40 Strive For Statelessness
  • Rule 41 Maintain Sessions in the Browser When Possible
  • Rule 42 Make Use of a Distributed Cache For States

Chapter 11 – Asynchronous Communication and Message Buses

  • Rule 43 Communicate Asynchronously As Much As Possible
  • Rule 44 Ensure Your Message Bus Can Scale
  • Rule 45 Avoid Overcrowding Your Message Bus

Chapter 12 – Miscellaneous Rules

  • Rule 46 Be Wary of Scaling Through 3rd Parties
  • Rule 47 Purge, Archive, and Cost-justify Storage
  • Rule 48 Remove Business Intelligence from Transaction Processing
  • Rule 49 Design Your Application to Be Monitored

Chapter 13 – Rule Review and Prioritization



Outbrain’s CTO on Scalability

I had the opportunity to speak with Ori Lahav, Outbrain’s co-founder and CTO, about his experience scaling Outbrain to handle 1.5 billion page views in just three years.  Outbrain is a content recommendation service that provides for blogs and articles what Netflix does for videos or Amazon does for products.

Ori oversees the R&D center for the company, located in the heart of Israel’s technology center. Prior to founding Outbrain, Ori led the R&D groups at Shopping.com (acquired by eBay) in search and classification. Before joining the internet revolution, Ori led the video streaming Server Group at Vsoft.

Below are my notes from the interview.  Here is the link to the audio version but the quality is fairly poor.

What is your background and how did you come to start Outbrain?
Outbrain was founded by Yaron Galai and myself (Navy friendship).  Before that I spent 3.5 years at shopping.com, which was acquired by eBay, leading software groups.  I liked the challenges of a start-up and joining forces with Yaron was a great opportunity.

How has Outbrain grown over the past couple of years?
We did some false starts with several directions but when we started with recommendations on content sites it started catching fire.  We started with self serve blogs, then professional blogs, small publishers, national publishers, and now we are international.  In 3 years we grew from 0 to over 1.5B page views, 3 production servers to over 150, 0 system administrators to 4, and from 3 developers to 15 today.

How has your architecture changed over this period?
Surprisingly… not much. At first it was 2 application servers and a MySQL database. Then we started adding more application servers and replicating the MySQL database. As the product grew, we added more and more components including memcached, Solr, TokyoTyrant, and Cassandra. We recently added a layer that warms the caches and is notified by ActiveMQ.

On the backend, we have machines that are fetching content from the sites we are on.  We gather article text and images and save them to MogileFS where we have indexers that index them in Solr.
We started investing in reporting infrastructure and started with MySQL, but with the amount of data we have we soon understood that Hadoop is the way to go – our Hadoop cluster now has more than 10 nodes.

What is Outbrain’s view of scalability?
First – scalability is culture – if you think big – you will be big.  The regular rules of scalability apply here too:

  • No singles
  • Scale out instead of up
  • Replicate data to ease the load
  • Shard data to scale
  • Utilize commodity hardware

Specifically for Outbrain I would add:

  • open-source is crucial for scalability
  • be cost conscious
  • build many simple/small environments and not a single big one

How have you managed data centers to be so cost effective?
We simplify it!  We create many small datacenters and not a big one.  We have no upfront costs of network gear (stacking) and real-estate (Cages).  We use open-source for our network (load balancer and firewalls) and infrastructure (OS).

In order to ease the step function of cost, grow as you go. Our data center provider, Atlantic Metro, has helped with metered power instead of paying by the circuit.  This way we’re motivated to power down servers not needed.

What has been the most difficult scalability challenge?
Scaling the team… we have more of a patrol boat DNA than an aircraft carrier DNA. Technology challenges are fun – never too hard – but the team and process are much more challenging.

We have also started using the Continuous Deployment process and this has been a great help. By empowering the team members to act and change the system as they need, you can grow to be an aircraft carrier and still keep the maneuverability of a patrol boat.

What do you think of the NoSQL movement?
We are big fans.  We use Hadoop, Hive, Cassandra, TokyoTyrant, MySql, and others.  This helps us maintain our low serving cost and attract the type of talent we need on the team.
Outbrain is proud to be 100% open-source.  We use open-source from the office telephony system to all platforms and network infrastructure.

And… we are hiring top technology talent in Israel, so contact Outbrain if you are interested.



Book Review: The Lords of Strategy

OK, this isn’t a book review as much as it is a comparison of how the iterative and rapid “productization” of strategy closely parallels Agile methods of software development. But first, here’s an overview of a particularly good book – Walter Kiechel’s The Lords of Strategy. I flew through the book – not because I wanted it to end but because I couldn’t put it down. It’s an incredible history of the people, organizations and ideas that developed the concept of corporate strategy, and it’s full of incredible facts and observations. Take for instance that the notion of strategy consulting as purveyed by the likes of BCG, Bain and McKinsey is only roughly 30 to 35 years old and, perhaps even more interestingly, that the notion that a company exists to create shareholder wealth is only about 30 years old. The book does a great job of explaining not only the history of the ideas behind strategy consulting, sometimes told alongside the biographies of their inventors, but also how those ideas affected the industry for better and for worse. Ultimately it describes how these ideas “quickened the pace of capitalism”, though the reader is left to figure out whether we are better or worse off for the change.

What struck me as particularly interesting in this book are the parallels one can draw between how corporate strategy (including the products and services surrounding it) developed and how Agile methods of solution development should work. The germinating idea behind strategy was the identification of the “experience curve” by the Boston Consulting Group. This curve identified through trend analysis that the more experienced a company became, the lower its cost of producing a certain good or service. This notion, though flawed (experience alone isn’t what drives cost), came quickly and was brought to market quickly by BCG. In rapid fashion, the company built upon this to develop the growth-share matrix as its second “product” offering. Both of these ideas together led to a grouping of offerings that suggested companies take on debt, reduce costs, differentiate themselves on price and expand shareholder value. The success of BCG led to McKinsey joining the ranks of strategy consultants, to Harvard Business School changing its curriculum (via Michael Porter, who had now built his own strategy framework – the famous 5 Forces analysis) and to the creation of Bain and Company.

Key here is the evolutionary nature of strategy as a product.  In the very early phases, the offerings were quite frankly wrong.  We know now that the notion that companies differentiate themselves on price alone in every industry is flawed.  But the firms and institutions that supported strategy as a product and intellectual endeavor did not try to offer the absolute best solution – they attempted to bring an appropriate solution to the market and then modify it from there.  In effect, for the time, their solution was the minimum viable product.  Did their approach work?  Billions of dollars of consulting revenue and profits and billions in market value would argue it was an effective approach.

While these companies didn’t realize it at the time, they were in fact practicing agile development.  They didn’t know end user requirements – how could they?  The market wasn’t created yet.  They created a quick offering and iterated upon it, simultaneously changing the market demand and adapting to both the shifting demand and their growing understanding of what strategy needed to become.

Where else might agile methods apply?



Newsletter – Firesheep

Below is part of our Fall 2010 Newsletter.  If you haven’t subscribed yet, click here to do so.

In this newsletter:

Scalability Rules

In between working with some terrific new clients this year we’ve been busy writing our second book, Scalability Rules. With the help of some terrific technical reviewers we feel the book is taking shape very nicely, and the first five chapters are now available on Safari Rough Cuts for those interested in helping review. Scalability Rules brings together 50 rules that we have gathered from our experiences working with over a hundred hyper-growth companies. This format of practical rules of scalability should make it ideal for use as a reference manual in formal meetings and informal discussions.


If you haven’t picked up your copy of The Art of Scalability or have a technologist on your holiday gift list, here are a couple of links for you:

Putting Out Firesheep (Protecting Your Users’ Cookies)

One of our recent blog posts that we found most interesting was submitted by a guest blogger. Randy Wigginton is a seasoned technologist who, after a discussion with us about the security risks brought to light recently by the Firefox plugin called Firesheep, came up with a solution that we thought should be shared. Randy’s solution is ideal for companies who want to protect their user session data (login, browsing history, etc) but don’t want to be encumbered by the overhead of running their entire site behind SSL.

We also have a simple demo set up for those interested in testing this solution. The way it works is that when a user logs in, TWO cookies are dropped. In the demo, one is called “session”, the other is called “authenticate”. These two cookies are identical except for a single attribute: “authenticate” is a secure cookie. We authenticate users on non-secure pages by including a reference to a secure javascript at the top of each page. At the top of pages requiring authentication is this simple line of code:

<script type="text/javascript" src="https://verify.akfdemo.com/authenticate.php">
</script>
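The login response that drops the two cookies might look roughly like the sketch below. This is an assumption about the shape of the headers, not the demo’s actual code: login_cookies is a hypothetical helper, the token value is a placeholder, and the Domain attribute is assumed so that the cookie is also sent to the verify subdomain.

```shell
#!/bin/bash
# Sketch: emit the two login cookies as HTTP response headers.
# Both carry the same value; only "authenticate" has the Secure attribute,
# so browsers send it over HTTPS only and it never travels in the clear.
login_cookies() {
  local token=$1
  printf 'Set-Cookie: session=%s; Domain=.akfdemo.com; Path=/\n' "$token"
  printf 'Set-Cookie: authenticate=%s; Domain=.akfdemo.com; Path=/; Secure\n' "$token"
}

login_cookies "placeholder-session-token"
```

Someone sniffing plain HTTP traffic can still capture “session”, but the secure script only sees a matching “authenticate” cookie from the browser that actually logged in.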


Lot18’s Series A

Lot18, a membership-by-invitation marketplace for wine from renowned producers at fantastic values, announced that it has completed a $3 million Series A round of funding led by FirstMark Capital, a New York City-based venture capital firm. Lot18 was founded by Kevin Fortuna and Philip James. Philip was the founder of Snooth.com, the world’s largest wine website, and Kevin was most recently a partner at AKF Partners.




DevOps

What do you call a set of processes or systems for coordination between development and operations teams? Give up? Try “DevOps”. This is not a new concept: we’ve been living and suggesting ARB and JAD as cornerstones of this coordination for years. Recently, however, it has grown into a discipline of its own. Wikipedia states that DevOps “relates to the emerging understanding of the interdependence of development and operations in meeting a business’ goal to producing timely software products and services.” Tracking down the history of the DevOps Wikipedia page shows that this topic is a recent entry.

There are a lot of other resources on the web that may not have been using this exact term but have certainly been dealing with the development and operations coordination challenge for years. Dev2Ops.org is one such group; earlier this year it posted its definition of DevOps: “an umbrella concept that refers to anything that smoothes out the interaction between development and operations.” The post goes on to highlight that the concept of DevOps is a response to the growing awareness of a disconnect between development and operations. While I think that is correct, I think it’s only part of the reason for the recent interest in defining DevOps.

With ideas such as continuous deployment and Amazon’s two-pizza rule for highly autonomous dev/ops teams, there is a blurring of roles between development and operations. Another driver of this movement is cloud computing. With the advent of GUI- or API-based cloud control interfaces, developers can procure, deploy, and support virtual instances much more easily than ever before. What used to be clearly defined career paths and sets of responsibilities are now being blended to create a new, more efficient, and highly sought-after technologist. A developer who understands operations support, or a system administrator who understands programming, is a very valuable utility player.

While DevOps is perhaps a new term for an old problem, it is promising that organizations are taking an interest in the challenges of coordination between development and operations. Given the blurring of roles, it is even more important that organizations pay attention to this topic.



How To Restore Service in Less Than 5 Minutes

What’s the first thing you do when your site is down? Most people pull up Nagios, or the like, and check all the servers, databases, and storage systems. Someone else might start tail’ing or grep’ing the log files. Tech executives by now are answering phone calls or sending email updates about the outage and expected downtime. Software developers are called in to go over the log files in more detail, and network engineers are asked to jump on devices to make sure they are responding properly.

What’s missing from the above scenario? Nobody looked up the last change that went into production. In our experience, 90+% of the problems in production are caused by the latest change, be it a code release, a firewall change, or DDL or DML applied to the database. And it’s a sure bet that the latest change is the problem if the person who made it says “That couldn’t have caused the outage.” In fact, there is probably a high degree of correlation between how emphatically they make that statement and the probability that it is the cause of the incident.

Just the other day, one of our friends had an outage call where the network security team was arguing that their latest change could not possibly have caused the outage. Guess what caused the outage… that’s right, the firewall change.

So, how do you solve 90+% of your problems in less than 5 minutes? You immediately roll back the last change you made to your production environment. You might be saying to yourself “But how can I do that when I don’t know all the changes that are happening in my production environment?” And that (as Paul Harvey used to say) is the rest of the story.

You have to keep track of every single change that takes place in your production environment. This is called “change tracking” and is different from “change management”. Change tracking is simply keeping track, in any format, of all the changes that happen in production. These changes can be kept in a Word document, spreadsheet, database, IRC channel, or even an unmonitored email account: anything that 1) allows fast entry, so people have no excuse not to use it, and 2) can be retrieved immediately when needed during an outage.
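
As an illustration of how little is needed, here is a minimal sketch of a change log in Python that meets both criteria: one call to record a change and one call to pull up the most recent ones. The file format (one JSON object per line), the field names, and the function names are all our assumptions for the example, not a prescribed tool.

```python
import json
import time
from pathlib import Path
from typing import Optional


def record_change(log_path: str, who: str, what: str,
                  when: Optional[float] = None) -> dict:
    """Append one change entry to the log (fast entry)."""
    entry = {
        "when": time.time() if when is None else when,  # Unix timestamp
        "who": who,
        "what": what,
    }
    with Path(log_path).open("a", encoding="utf-8") as log:
        log.write(json.dumps(entry) + "\n")
    return entry


def latest_changes(log_path: str, count: int = 5) -> list:
    """Return the most recent changes, newest first (immediate retrieval)."""
    path = Path(log_path)
    if not path.exists():
        return []
    lines = path.read_text(encoding="utf-8").splitlines()
    entries = [json.loads(line) for line in lines if line.strip()]
    entries.sort(key=lambda entry: entry["when"], reverse=True)
    return entries[:count]


if __name__ == "__main__":
    # Hypothetical usage during a normal day and then during an outage.
    record_change("changes.jsonl", "netsec", "firewall rule change")
    for change in latest_changes("changes.jsonl", count=3):
        print(change["when"], change["who"], change["what"])
```

During an outage, the first entry returned by latest_changes is the first candidate to roll back.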

