One of the first topics we teach in our CTO Accelerator workshops and webinars is speaking the language of the business. This article provides an example of how to run your engineering team like a business.

This analogy was first introduced in an ITSM context by a leading ITSM vendor. We've adapted it for SaaS and eCommerce engineering teams. Running your organization like a business makes it easier to communicate with the CEO, the Board, and other key stakeholders.

We'll compare owning a set of bakeries with running a medium to large engineering team, walking through common processes from incident and problem management to asset management. These examples can help you guide your engineering teams and communicate with the CEO and CFO.

Incident and Problem Management

In the bakery example, we can see the difference between an incident and a problem:

  • Incident - a cookie is burned
  • Problem - many cookies from Bakery #2 are burned
  • Root cause analysis (why) - a faulty thermostat on the oven

In the technology example, multiple incidents can relate to a single root-cause problem. Because components are interconnected, we can also have multiple related problems.

A mature engineering organization integrates monitoring into its incident and problem management processes. A mature team tracks incidents separately from problems but links the two.
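To make "separate but linked" concrete, here is a minimal sketch in Python, assuming a simple data model in which each incident can optionally point at a problem record. The `Incident` and `Problem` classes, their fields, and the IDs are hypothetical illustrations, not the schema of any particular ITSM tool.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Problem:
    id: str
    root_cause: Optional[str] = None  # filled in after root cause analysis


@dataclass
class Incident:
    id: str
    summary: str
    problem_id: Optional[str] = None  # set once triage links it to a problem


def incidents_by_problem(incidents: list[Incident]) -> dict[str, list[str]]:
    """Group incident IDs under the problem each one is linked to."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for incident in incidents:
        if incident.problem_id is not None:
            grouped[incident.problem_id].append(incident.id)
    return dict(grouped)


def unlinked(incidents: list[Incident]) -> list[str]:
    """Incidents not yet related to any problem: candidates for problem triage."""
    return [i.id for i in incidents if i.problem_id is None]


# Hypothetical example: two burned-cookie incidents share one root cause.
problems = {"PRB-1": Problem("PRB-1", root_cause="faulty oven thermostat")}
incidents = [
    Incident("INC-1", "burned cookies, order 17", problem_id="PRB-1"),
    Incident("INC-2", "burned cookies, order 18", problem_id="PRB-1"),
    Incident("INC-3", "display light out"),
]

for problem_id, incident_ids in incidents_by_problem(incidents).items():
    print(problem_id, problems[problem_id].root_cause, incident_ids)
    # PRB-1 faulty oven thermostat ['INC-1', 'INC-2']
print(unlinked(incidents))  # ['INC-3']
```

Keeping the link on the incident side means a new incident can be logged immediately and related to a problem later, once someone has done the triage.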

Change and Release Management

In the burnt cookies bakery example, the analogy is:

  • Change (request for) - When can we change the faulty thermostat? Who will make the change?
  • Release (actual change to the bakery) - We will change it tonight when we normally clean the ovens. Our contractor will replace the thermostat and test it.

In the technology example, mature SaaS teams have scheduled or planned releases. Requests for change can be reviewed in a weekly Change Advisory Board meeting. With automated CI/CD, the request for change happens during backlog grooming and sprint planning, and the release happens automatically whenever the feature or fix passes all automated tests. Larger, more complex changes have their dependencies identified in architecture review meetings. Use two key sets of metrics to optimize:

  1. time to market - how fast and frequently you can release
  2. quality - how frequently mistakes occur and how fast you can recover from them
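One common way to put numbers on these two goals is the set of four DORA-style delivery metrics: deployment frequency and lead time for time to market, change failure rate and time to restore for quality. The Python sketch below assumes a hypothetical `Deployment` record exported from your CI/CD tooling; the field names and the sample data are illustrative only.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean
from typing import Optional


@dataclass
class Deployment:
    committed: datetime                  # when the change was committed
    deployed: datetime                   # when it reached production
    failed: bool = False                 # did it cause a production failure?
    restored: Optional[datetime] = None  # when service was restored, if it failed


def release_metrics(deploys: list[Deployment], window_days: int) -> dict[str, float]:
    lead_times = [(d.deployed - d.committed).total_seconds() / 3600 for d in deploys]
    failures = [d for d in deploys if d.failed]
    restore_hours = [
        (d.restored - d.deployed).total_seconds() / 3600
        for d in failures
        if d.restored is not None
    ]
    return {
        # Time to market: how fast and how often you release
        "deploys_per_week": len(deploys) / (window_days / 7),
        "mean_lead_time_hours": mean(lead_times) if lead_times else 0.0,
        # Quality: how often releases fail and how fast you recover
        "change_failure_rate": len(failures) / len(deploys) if deploys else 0.0,
        "mean_hours_to_restore": mean(restore_hours) if restore_hours else 0.0,
    }


t0 = datetime(2024, 1, 1, 9, 0)
deploys = [
    Deployment(t0, t0 + timedelta(hours=4)),
    Deployment(t0, t0 + timedelta(hours=8), failed=True, restored=t0 + timedelta(hours=9)),
    Deployment(t0 + timedelta(days=1), t0 + timedelta(days=1, hours=6)),
    Deployment(t0 + timedelta(days=2), t0 + timedelta(days=2, hours=2)),
]
metrics = release_metrics(deploys, window_days=14)
print(metrics)
# {'deploys_per_week': 2.0, 'mean_lead_time_hours': 5.0, 'change_failure_rate': 0.25, 'mean_hours_to_restore': 1.0}
```

Tracking both pairs together matters: pushing deployment frequency up while the change failure rate also climbs is a sign that speed is coming at the expense of quality.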

    In a mature organization, automated or scripted releases must include rollback steps, especially for database schema changes.
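As a minimal sketch of what "must have rollback steps" can mean in practice, the Python example below pairs every schema change with an explicit rollback and undoes the already-applied steps when a later one fails. It uses the standard-library `sqlite3` module only so that it runs anywhere; the migration names and SQL are hypothetical, and in production you would more likely rely on a dedicated migration tool against your own database.

```python
import sqlite3

# Each hypothetical migration pairs a forward ("up") step with the
# rollback ("down") step that undoes it.
MIGRATIONS = [
    ("001_create_orders",
     "CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT NOT NULL)",
     "DROP TABLE orders"),
    ("002_index_orders_item",
     "CREATE INDEX idx_orders_item ON orders (item)",
     "DROP INDEX idx_orders_item"),
]


def migrate_down(conn, migrations, applied):
    """Run the rollback step of each applied migration, newest first."""
    downs = {name: down for name, _up, down in migrations}
    for name in reversed(applied):
        conn.execute(downs[name])


def migrate_up(conn, migrations):
    """Apply migrations in order; if one fails, roll back the ones already applied."""
    applied = []
    try:
        for name, up, _down in migrations:
            conn.execute(up)
            applied.append(name)
    except sqlite3.Error:
        migrate_down(conn, migrations, applied)
        raise
    return applied


def table_names(conn):
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    return {name for (name,) in rows}


conn = sqlite3.connect(":memory:")
applied = migrate_up(conn, MIGRATIONS)
print(applied, table_names(conn))  # ['001_create_orders', '002_index_orders_item'] {'orders'}
migrate_down(conn, MIGRATIONS, applied)
print(table_names(conn))  # set()

# A release whose second step fails is rolled back automatically.
broken = MIGRATIONS[:1] + [
    ("002_bad", "CREATE INDEX idx_bad ON no_such_table (x)", "DROP INDEX idx_bad"),
]
try:
    migrate_up(conn, broken)
except sqlite3.OperationalError:
    print("tables after rollback:", table_names(conn))  # tables after rollback: set()
```

Note that a rollback such as `DROP TABLE` discards any data written since the release, so destructive changes usually also need a backup or a staged rollout.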

    Asset Management

    In the bakery example, asset management could occur at several levels:

    • Facility - the bakeries, offsite warehouse, and third-party inventory holders
    • Physical assets - ovens, coffee machines, and displays. These may be tracked at the serial-number level. Tracking them matters because they are likely depreciated and/or under warranty.
    • Subcomponents - bigger items have parts like thermostats, light bulbs, and knives. Unless they are expensive, they usually have SKUs rather than serial numbers and are tracked at the inventory level. They likely don't have warranties or depreciation.

    In the technology example, the bare minimum for a CTO should be a table (e.g. in Confluence) or a spreadsheet (e.g. Excel) of key systems. This list should capture key attributes such as primary uses, key risks, overall health, and subject matter experts (SMEs). A large platform may also have an internal or external feature matrix. A more robust solution automatically tracks assets in your cloud provider, such as AWS. You may also track assets such as laptops. More comprehensive solutions help track overall spend, the risks of older systems, warranty references, and more.
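A minimal sketch of that bare-minimum list in Python, assuming the attributes named above plus a simple green/yellow/red health scale. The `SystemRecord` class, the sample systems and people, and the rule that every system should have at least two SMEs are hypothetical choices for illustration; the CSV export shows how the same records can feed a spreadsheet.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class SystemRecord:
    name: str
    primary_uses: str
    key_risks: str
    health: str  # assumed scale: "green", "yellow", "red"
    smes: list[str] = field(default_factory=list)


def needs_attention(systems: list[SystemRecord], min_smes: int = 2) -> list[str]:
    """Systems that are not healthy or depend on too few subject matter experts."""
    return [s.name for s in systems if s.health != "green" or len(s.smes) < min_smes]


def to_csv(systems: list[SystemRecord]) -> str:
    """Export the inventory in a spreadsheet-friendly format."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["System", "Primary uses", "Key risks", "Health", "SMEs"])
    for s in systems:
        writer.writerow([s.name, s.primary_uses, s.key_risks, s.health, "; ".join(s.smes)])
    return buffer.getvalue()


# Hypothetical inventory entries.
inventory = [
    SystemRecord("Checkout API", "Order placement", "Payment provider outage", "green", ["Ana", "Raj"]),
    SystemRecord("Legacy reporting DB", "Finance reports", "Unsupported DB version", "yellow", ["Raj"]),
    SystemRecord("Search", "Product discovery", "Index rebuild time", "green", ["Mei"]),
]
print(needs_attention(inventory))  # ['Legacy reporting DB', 'Search']
print(to_csv(inventory))
```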

    SLA Management

    In the bakery example, SLAs may be formal or informal; either way, they help guide decision making. To determine when to replace the faulty thermostat, the key SLAs are:

    • Time to fix - do we have any commitments to get the oven fixed in 2 vs 24 vs 48 hours? How much lead time do we need to get our contractor scheduled to replace the thermostat?
    • If we don't fix it - do we have any contractual recourse if the contractor no-shows to fix the thermostat? Do we need to provide a perk for the customers who received the burned cookies?

    A mature engineering organization has uptime commitments, at least internally. Not all systems require the same uptime, but core systems should be designed for four nines (99.99%). A mature organization also identifies the uptime commitments of all its third-party providers. A tightly coupled system can be no more available than the weakest component in its call chain, and is usually less available. That weakest component is often a third-party service.
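The weakest-link point follows from simple arithmetic. If every component in a call chain must be up for a request to succeed, and failures are independent, the chain's availability is the product of the components' availabilities (A_chain = A1 × A2 × ... × An), which is always at or below the lowest single value. A small Python sketch, assuming a hypothetical chain of two in-house tiers at four nines and one third-party service at three nines:

```python
from math import prod

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 (ignores leap years)


def downtime_budget_minutes(availability: float) -> float:
    """Allowed downtime per year for an availability target such as 0.9999."""
    return (1 - availability) * MINUTES_PER_YEAR


def serial_availability(chain: dict[str, float]) -> float:
    """Availability of a call chain where every component must be up.

    Assumes component failures are independent of each other.
    """
    return prod(chain.values())


# Hypothetical call chain and availability figures.
chain = {"web tier": 0.9999, "core API": 0.9999, "payment provider": 0.999}

composite = serial_availability(chain)
weakest = min(chain, key=chain.get)

print(f"four nines allows {downtime_budget_minutes(0.9999):.2f} min/year")
# four nines allows 52.56 min/year
print(f"chain availability: {composite:.4%} (weakest link: {weakest})")
# chain availability: 99.8800% (weakest link: payment provider)
print(f"chain downtime: {downtime_budget_minutes(composite) / 60:.1f} hours/year")
# chain downtime: 10.5 hours/year
```

Even with both in-house tiers at four nines, the chain allows roughly 10.5 hours of downtime a year, more than the 8.76 hours the three-nines provider alone would allow.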

    Monitoring

    In the bakery example, monitoring can come from multiple components and humans. Extending the burnt cookie analogy:

    • Smoke alarm – did an alarm trigger? Why not?
    • Oven alerting – did the control panel show a warning that the thermostat wasn’t working?
    • Visual inspection – did the baker fail to notice that the cookies were burnt?
    • Olfactory inspection – did the baker fail to smell the burning cookies?

    In the engineering world, we look for several levels of alerting to detect problems and keep them from having a wider impact:

    • Business metric monitoring – are user patterns deviating from normal behavior?
    • Synthetic monitoring – to confirm connected components are working before users try them.
    • Unified event management – to centralize and aggregate monitoring and alerting.
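For business metric monitoring, the simplest version of "deviating from normal behavior" is a z-score check: compare the latest value of a metric with the mean and standard deviation of its recent history. The Python sketch below assumes a hypothetical checkouts-per-five-minutes metric and a threshold of three standard deviations. Production alerting usually also accounts for daily and weekly seasonality, which this sketch ignores.

```python
from statistics import mean, stdev


def is_anomalous(history: list[float], current: float, threshold: float = 3.0) -> bool:
    """Flag `current` if it is more than `threshold` standard deviations
    away from the mean of recent history (a simple z-score check)."""
    if len(history) < 2:
        return False  # not enough data to estimate normal behavior
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return current != mu  # perfectly flat history: any change is unusual
    return abs(current - mu) / sigma > threshold


# Hypothetical checkouts per 5-minute window over the last hour.
history = [120, 118, 125, 130, 122, 119, 127, 121, 124, 126, 123, 120]
print(is_anomalous(history, 124))  # False: within normal variation
print(is_anomalous(history, 40))   # True: checkouts have collapsed
```

A sudden drop like this often shows up before any infrastructure alert fires, for example when a third-party payment page fails while every server still reports healthy.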

    Continuity Management

    In the bakery example, business continuity management is the preparation by leadership and employees for various scenarios that are documented and periodically practiced:

    • Single bakery can’t open
    • Key supplier can’t deliver
    • Unplanned local disaster such as a flood
    • Macro-level changes, such as new federal or state legislation or executive action that quickly changes policies

    In the engineering world, a mature organization should plan for failure scenarios such as:

    • A key third-party service provider is down – how to gracefully notify the user and/or fail over to an alternate service provider
    • The primary cloud Availability Zone is unavailable – most cloud providers offer managed services that can fail over automatically to another zone within the same region
    • The primary cloud Region is unavailable – cloud providers keep improving their capabilities, but the customer is responsible for these more complex failover scenarios
    • Losing a key team member – who in turn may take more co-workers with them
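The first scenario, failing over to an alternate provider or else notifying the user gracefully, can be sketched in a few lines of Python. This assumes hypothetical provider clients that raise a `ProviderError` when their service is down; the shipping-rate example, function names, and messages are illustrative. A real implementation would also need timeouts, retries with backoff, and health checks, which are omitted here.

```python
from typing import Callable, Sequence


class ProviderError(Exception):
    """Raised by a provider client when its service is unavailable."""


Provider = tuple[str, Callable[[], str]]


def call_with_failover(providers: Sequence[Provider]) -> tuple[str, str]:
    """Try each provider in order and return (name, result) from the first success."""
    errors = []
    for name, call in providers:
        try:
            return name, call()
        except ProviderError as exc:
            errors.append(f"{name}: {exc}")
    raise ProviderError("all providers failed: " + "; ".join(errors))


def shipping_quote(providers: Sequence[Provider]) -> str:
    """User-facing wrapper: fail over if possible, otherwise degrade gracefully."""
    try:
        _name, quote = call_with_failover(providers)
        return quote
    except ProviderError:
        return "Shipping rates are temporarily unavailable. Please try again shortly."


def primary_rates() -> str:
    raise ProviderError("timeout")  # simulate an outage at the primary provider


def backup_rates() -> str:
    return "Standard shipping: $5.99"


print(shipping_quote([("primary", primary_rates), ("backup", backup_rates)]))
# Standard shipping: $5.99
print(shipping_quote([("primary", primary_rates)]))
# Shipping rates are temporarily unavailable. Please try again shortly.
```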

    Knowledge & Content Management

    In the bakery example, this includes the standard operating procedures to onboard and train employees quickly:

    • A single cookie recipe
    • All of the recipes for all items sold
    • Variations of recipes due to different components – e.g. a different mixer or ingredient provider yields different outcomes

    In the engineering example, knowledge and content management is often spread across systems such as:

    • A wiki (e.g. Confluence)
    • A ticketing system (e.g. Salesforce, ServiceNow)
    • A document repository for BCP/DR plans (e.g. SharePoint)
    • Noggins

    An engineering leader will need to strike the right balance of capturing information (the author) and sharing information (the reader attempting to search for it).

    Financial and HR Management

    In the bakery example, financial and human resource management may be allocated to different groups:

    • A local manager may be responsible for all aspects of procuring goods and third-party services and hiring the workforce.
    • A larger company may have a Procurement function and/or an HR function that manages the financial aspects of goods, services, and the workforce.

    An engineering leader should have the mindset and skillset of the local manager of the bakery and the corporate manager of all bakeries. Each report to the CTO should lead:

    • Procurement standards for engineering services. Many large procurement teams use the same negotiating tactics across all functions, which often puts engineering leaders in a bind when procurement focuses on price over quality or outcomes.
    • Hiring standards and processes for engineers. HR leaders need input from engineering leaders on job families and responsibilities. HR can help guide salary bands, but engineering leaders need to identify when those bands are out of alignment for their organization.

    Time Tracking

    In the bakery example, time tracking is important for several scenarios to validate SLAs:

    • How long to repair an oven
    • How long to order a cookie from the counter versus a mobile app
    • How long to fulfill a catering order
    • How long to expand a bakery

    In the engineering world, larger teams eventually track time for quality and efficiency reasons. This often includes a mix of:

    • Timesheets – particularly for project-based work or hourly billing
    • Timers – to capture smaller increments of work
    • Automation – to avoid human error in capturing time for each task, whether performed by a human or a machine

    A good engineering organization has a Lean focus to eliminate waste and improve throughput. 
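One Lean measure that automated time capture makes cheap to compute is flow efficiency: the share of a work item's total lead time spent actively working on it rather than waiting. The Python sketch below assumes a hypothetical ticket history of (timestamp, state entered) pairs and treats only `In Progress` as active time; which states count as active is a choice each team has to make.

```python
from datetime import datetime

# Assumption: only this state counts as value-adding work; everything else is waiting.
ACTIVE_STATES = {"In Progress"}


def flow_metrics(transitions: list[tuple[datetime, str]]) -> tuple[float, float]:
    """Return (lead_time_hours, flow_efficiency) for one work item.

    `transitions` lists (timestamp, state entered) in chronological order;
    the last entry is the terminal state, such as "Done".
    """
    active_seconds = 0.0
    # Pair each state with the next transition to get how long it lasted.
    for (start, state), (end, _next_state) in zip(transitions, transitions[1:]):
        if state in ACTIVE_STATES:
            active_seconds += (end - start).total_seconds()
    total_seconds = (transitions[-1][0] - transitions[0][0]).total_seconds()
    efficiency = active_seconds / total_seconds if total_seconds else 0.0
    return total_seconds / 3600, efficiency


# Hypothetical ticket history.
ticket = [
    (datetime(2024, 3, 4, 9), "To Do"),
    (datetime(2024, 3, 4, 13), "In Progress"),
    (datetime(2024, 3, 4, 17), "Waiting for Review"),
    (datetime(2024, 3, 5, 9), "In Progress"),
    (datetime(2024, 3, 5, 11), "Done"),
]
lead_time, efficiency = flow_metrics(ticket)
print(f"lead time {lead_time:.0f} h, flow efficiency {efficiency:.0%}")
# lead time 26 h, flow efficiency 23%
```

In this example most of the 26 hours is waiting for review, which points at the review queue rather than the developer as the place to remove waste.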

    Asset Discovery

    Asset discovery augments asset management. It has traditionally not been available for a bakery, but with more smart components and asset scanning, it is easier to have a near-real-time inventory of:

    • Components in each oven
    • All components and machines in the bakery
    • All components and machines in offsite storage
    • Most key ingredients for the baked goods via smart sensors

    For an engineering organization, asset discovery is often tied to monitoring and security solutions to detect:

    • All versions of software on all servers and desktops
    • All cloud providers used by core services and key employees (e.g. how many of my employees are using their own licensed LLM for work purposes)

    Automated discovery helps manage costs and potential security risks. It also helps identify opportunities to expand usage of some platforms and deprecate others.
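The core of automated discovery is reconciliation: comparing what a scan actually finds with what the inventory says should exist. Here is a minimal Python sketch, assuming both sources can be reduced to a mapping from a hypothetical "host/software" key to a version string. The asset names and versions are made up; real discovery data would come from your monitoring, security, or cloud tooling.

```python
def reconcile(discovered: dict[str, str], inventory: dict[str, str]) -> dict[str, list[str]]:
    """Compare a discovery scan with the managed inventory.

    Both dicts map a hypothetical asset key ("host/software") to a version string.
    """
    found, known = set(discovered), set(inventory)
    return {
        # Seen by the scan but never registered: possible shadow IT.
        "unmanaged": sorted(found - known),
        # Registered but not seen: retired, offline, or outside scan coverage.
        "not_seen": sorted(known - found),
        # Present in both, but the running version differs from the record.
        "version_drift": sorted(a for a in found & known if discovered[a] != inventory[a]),
    }


discovered = {"web-01/nginx": "1.24", "web-02/nginx": "1.22", "laptop-17/llm-client": "2.1"}
inventory = {"web-01/nginx": "1.24", "web-02/nginx": "1.24", "db-01/postgres": "15"}

for category, assets in reconcile(discovered, inventory).items():
    print(category, assets)
# unmanaged ['laptop-17/llm-client']
# not_seen ['db-01/postgres']
# version_drift ['web-02/nginx']
```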

    Project, Program, and Portfolio Management

    In the bakery analogy, a growing organization will encounter these different types of project work:

    • Expanding a bakery to support more customers
    • Testing catering from a single bakery to increase revenue and leverage underutilized resources
    • Upgrading all ovens to add smart features and more sensors
    • Acquiring or selling a single bakery

    Engineering organizations have similar types of work:

    • Simple to medium-sized projects to pay down technical debt, such as eliminating stored procedures
    • Launching a new feature
    • Regularly practicing failover scenarios
    • Quickly performing due diligence on an acquisition target

    These are different types of work. A good engineering leader recognizes which of them should follow waterfall, Agile, and/or Lean process design.

    Service Ordering and Provisioning

    In the bakery analogy, this is:

    • The menu for off-the-shelf and custom orders
    • The price list
    • Any variations by bakery

    In the engineering world, this is the list of business capabilities your platforms and teams provide.   

    You may have a service catalog for internal-facing services. You may have a developer portal for exposing and managing your APIs. The actual service or product may be embedded in your core platform, and/or your marketing and sales activities may bring prospects along the customer journey. You likely also have a customer-facing ticketing system to submit and track issues and bugs.

    Configuration Management

    Configuration management (CM) is the linkage of all the disciplines discussed in this article.

    In the bakery example, CM is the ability to link/relate these documents:

    • Building blueprints with layouts of machines, storage locations, and customer seating
    • The recipes
    • The procedure manuals
    • The time schedules and tracking for all employees

    In the engineering world, a Configuration Management Database (CMDB) describes the relationships between:

    • The services provided to customers
    • The services each of those services consumes
    • The people responsible for each service
    • The machines (servers, containers, cloud resources) each service runs on

    A CMDB improves cycle times across all of the disciplines above: linking every capability to your services reduces both cycle times and errors, whether you are launching new products, maintaining legacy platforms, or acquiring competitors.
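Much of that benefit comes from impact analysis: when one configuration item (CI) fails, the CMDB's relationships tell you which services are affected and whom to notify. The Python sketch below models a tiny, hypothetical CMDB as two dictionaries (dependency edges and owners) and walks the dependency graph breadth-first. Real CMDBs store far richer relationship types; the CI names and teams here are illustrative only.

```python
from collections import deque

# Hypothetical CMDB fragment. Each edge points from a configuration item (CI)
# to the CIs that depend on it.
DEPENDENTS = {
    "db-primary": ["orders-service"],
    "payments-provider": ["checkout-service"],
    "orders-service": ["checkout-service", "order-history-page"],
    "checkout-service": ["web-storefront", "mobile-app"],
}

OWNERS = {
    "db-primary": "data platform team",
    "orders-service": "orders team",
    "checkout-service": "checkout team",
    "order-history-page": "web team",
    "web-storefront": "web team",
    "mobile-app": "mobile team",
}


def impacted(ci: str) -> list[str]:
    """Every CI that depends on `ci`, directly or transitively (breadth-first)."""
    seen: set[str] = set()
    queue = deque([ci])
    while queue:
        current = queue.popleft()
        for dependent in DEPENDENTS.get(current, []):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return sorted(seen)


def teams_to_notify(ci: str) -> list[str]:
    """Owners of the failed CI and of everything it impacts."""
    return sorted({OWNERS[c] for c in [ci, *impacted(ci)] if c in OWNERS})


print(impacted("payments-provider"))
# ['checkout-service', 'mobile-app', 'web-storefront']
print(teams_to_notify("db-primary"))
# ['checkout team', 'data platform team', 'mobile team', 'orders team', 'web team']
```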

    Learn More

    Our CTO Accelerator webinar recently covered these concepts. A mature engineering organization understands the relationships of:

    • People
    • Process
    • Technology

    as the company grows.  


    Contact us to help assess and improve your product and engineering organization.