
In 2018, AKF Partners shared a list of questions we used to conduct technical due diligence engagements for our clients. It proved to be popular and continues to be one of the most viewed blog posts on our site.
We continually review and improve our models and tools, and we have made some changes to our technical due diligence question list. Some of the terminology refers to AKF models, and a quick visit to the AKF Scale Cube will help orient you for the scalability questions. Without further ado, here it is.
SCALABILITY X AXIS
- How do you manage the routing of requests to various endpoints?
- Is session state stored in the browser or in a separate tier?
- Have you employed any strategies to increase database / storage performance and scalability?
- Have you implemented any caching strategies? (See the sketch after this list.)
- Are authentication and authorization services synchronized and available to all environments / regions?
- Does the client/target leverage edge caching (CDN/browser caching)?
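The caching question above is often best answered with specifics. As one minimal illustration (not the only valid pattern), a read-through cache with a TTL might look like the sketch below; it assumes the redis-py client, a Redis instance on localhost, and a hypothetical `load_product_from_db` stand-in for the real datastore query.

```python
import json
import redis  # assumes the redis-py client is installed

cache = redis.Redis(host="localhost", port=6379)

def load_product_from_db(product_id: str) -> dict:
    # Hypothetical stand-in for the real datastore query.
    return {"id": product_id, "name": "example"}

def get_product(product_id: str, ttl_seconds: int = 300) -> dict:
    """Read-through cache: serve from Redis if present, else load and cache."""
    key = f"product:{product_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)
    product = load_product_from_db(product_id)
    cache.setex(key, ttl_seconds, json.dumps(product))
    return product
```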
SCALABILITY Y AXIS
- Are services (e.g., login, signup, checkout) separated across different servers? (See the sketch after this list.)
- Does each service have its own separate datastore?
- Are services sized appropriately with consideration of needs (e.g. availability, frequency of change, dependencies, skill sets)?
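To make the service-separation question concrete: a Y-axis split typically shows up in the routing layer, where each function has its own pool of servers (and, ideally, its own datastore). A minimal sketch, with hypothetical path prefixes and hostnames:

```python
# Hypothetical Y-axis routing table: each service (login, signup, checkout)
# runs on its own pool of servers, each with its own datastore.
SERVICE_BACKENDS = {
    "/login": ["login-1.internal", "login-2.internal"],
    "/signup": ["signup-1.internal"],
    "/checkout": ["checkout-1.internal", "checkout-2.internal"],
}

def route(path: str) -> str:
    """Pick a backend by longest matching path prefix."""
    for prefix in sorted(SERVICE_BACKENDS, key=len, reverse=True):
        if path.startswith(prefix):
            # A real router would load-balance across the pool; we take
            # the first host to keep the sketch short.
            return SERVICE_BACKENDS[prefix][0]
    raise LookupError(f"no service owns path {path!r}")
```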
SCALABILITY Z AXIS
- Are endpoints (web, app servers) dedicated to a subset of similar data (e.g., users, SKUs, content)? (See the sketch after this list.)
- Are databases dedicated to only a subset of similar data?
- How do you handle multi-tenant or all-tenant implementations (services and data)?
- How do you handle data residency requirements?
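For the Z-axis questions, the telltale implementation detail is a deterministic mapping from a customer, SKU, or content key to a dedicated pod of servers and data. A hedged sketch, assuming four shards and hypothetical per-shard connection strings:

```python
import hashlib

NUM_SHARDS = 4  # assumed shard count, for illustration only

def shard_for(user_id: str) -> int:
    """Deterministically map a user to one of NUM_SHARDS pods.

    A stable hash (not Python's randomized hash()) keeps the mapping
    consistent across processes and restarts.
    """
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

def dsn_for(user_id: str) -> str:
    # Hypothetical per-shard database DSNs; real systems often use a
    # lookup service so shards can be rebalanced without code changes.
    return f"postgresql://users-shard-{shard_for(user_id)}.internal/users"
```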
FAULT ISOLATION (SWIM LANES)
- Are only asynchronous calls being made across services? (See the sketch after this list.)
- Do you isolate data with respect to domains / services?
- Does logical architecture enforce separation of function / areas of concern?
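The asynchronous-calls question is really asking whether a slow or failed downstream service can stall its callers. A minimal sketch of the pattern; a plain in-process queue stands in here for a real broker (SQS, Kafka, RabbitMQ):

```python
import queue

# Stand-in for a real broker: the caller enqueues an event and returns
# immediately instead of waiting on the consumer.
order_events: queue.Queue = queue.Queue()

def place_order(order: dict) -> None:
    """Synchronous work stays in-lane; cross-service effects are events."""
    # ... write the order to this service's own datastore ...
    order_events.put({"type": "order_placed", "order_id": order["id"]})
    # Returns without calling the fulfillment service directly, so a
    # fulfillment outage cannot propagate back into checkout.
```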
DISASTER RECOVERY
- Is there any specific area or areas where a failure will cause an outage?
- Is your data tier resilient to logical corruption (durable snapshots, intra-database rollback) or to being unavailable?
- How do you use multiple data centers (regions) to implement disaster recovery strategies? (See the sketch after this list.)
- Are data centers (if colo or owned DCs) located in geographically low-risk areas?
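For the multi-region question, one common (though by no means the only) pattern is active/passive failover driven by health checks. A standard-library-only sketch; the region endpoints are hypothetical:

```python
import urllib.request

# Hypothetical region endpoints, in priority order (primary first).
REGIONS = ["https://us-east.example.com/health",
           "https://us-west.example.com/health"]

def healthy(url: str, timeout: float = 2.0) -> bool:
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError and timeouts both derive from OSError
        return False

def active_region() -> str:
    """Return the first healthy region; raise if none respond."""
    for url in REGIONS:
        if healthy(url):
            return url
    raise RuntimeError("no healthy region; invoke the DR runbook")
```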
COST EFFECTIVENESS
- Does the application use stored procedures?
- Are the web, application, and database servers on physically separate tiers or separate virtual instances?
- Are auto-scaling techniques used? (See the sketch after this list.)
- What third-party purchased technologies are part of the system?
- If self-hosted, describe the hardware sizes across the major components/tiers of the application. If cloud-hosted, describe the instance sizes used for workloads, including the database(s).
- Do you use virtualization, and if so, what are your objectives in using it?
- How expensive would it be for you to use a different public cloud provider? How do you determine which PaaS services you incorporate into your product?
- Are data centers located in low-cost areas (if colo or owned DCs)?
- What traffic is routed through a Firewall/WAF?
- Are there any customer-specific code/data structures in your codebase?
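On the auto-scaling question, the heart of the matter is whether capacity tracks measured load rather than manual intervention. Cloud providers implement this natively; the sketch below illustrates only the underlying target-tracking arithmetic and is not any provider's API:

```python
import math

def desired_instances(current: int, cpu_utilization: float,
                      target: float = 0.55, lo: int = 2, hi: int = 20) -> int:
    """Target-tracking scale calculation (illustrative; real autoscalers
    add smoothing and cooldowns).

    Example: 4 instances at 80% CPU against a 55% target suggests
    ceil(4 * 0.80 / 0.55) = 6 instances.
    """
    if cpu_utilization <= 0:
        return lo
    return max(lo, min(hi, math.ceil(current * cpu_utilization / target)))
```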
PRODUCT MANAGEMENT PROCESS
- Is there a product management team or product owner that can make decisions to add, delay, or deprecate features?
- Does the product management team have ownership of business goals?
- Does the team develop success criteria, OKRs, KPIs or customer feedback loops that help to inform feature decisions?
- Does the team use an iterative discovery process to validate market value and achieve goals?
- Are leading and lagging indicators defined for each major project, and what is the process to track, validate, and rebase or revisit the metrics?
- Is everyone in the Product and Engineering organization skilled in using the product in the role of a customer?
PDLC PROCESS
- Does the team use any relative sizing method to estimate effort for features/story cards?
- Does the team utilize a velocity measure to improve team time to market?
- Does the team use burn-down charts, metrics, and retrospectives to measure the progress and performance of iterations?
- How does the team measure engineering efficiency and who owns and drives these improvements?
- Does the team perform estimates and measure actual against predicted?
- Is there a Definition of Done in place?
DEVELOPMENT PROCESS
- Does the team use a branching approach that makes a production bug-fix branch easy to identify and deploy rapidly?
- Does the team use a feature-branch approach? Can a single feature/engineer block a release?
- In your public cloud architecture, how do you build and provision a new server / environment / region?
- Does the team have documented coding standards that are applied?
- Are engineers conducting code reviews against defined standards, or is there automation in the dev pipeline that validates against those standards?
- Is open-source licensing actively managed and tracked?
- Are engineers writing unit tests (code coverage)?
- Is the automated testing coverage greater than 75%?
- Does the team utilize continuous integration?
- Is load and performance testing conducted before releasing to a significant portion of users or is the testing built into the development pipeline?
- Does the team deploy small payloads frequently rather than large payloads infrequently?
- Does the team utilize continuous deployment?
- Is any containerization (e.g., Docker) and orchestration (e.g., Kubernetes) in place?
- Are feature flags, where a feature is enabled or disabled outside of a code release, in use? (See the sketch after this list.)
- Does the team have a mechanism that can be used to roll back (wire on/off, DDL/DML scripted and tested, additive persistence tier, no "select *")?
- How does the team define technical debt?
- Is Tech Debt tracked and prioritized on an ongoing basis?
- How are issues found in production, backlogged bugs, as well as technical debt addressed in the product lifecycle?
- Does the team utilize a Joint Application Design process for large features that brings engineering and operations together on a solution, or does it have experientially diverse teams?
- Does the team have documented architectural principles that are followed?
- Does the team utilize an Architecture Review Board where large features are reviewed to uphold architectural principles?
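On the feature-flag question above, the key property is that a toggle is read at run time from a store that changes outside the deploy pipeline. A minimal sketch, assuming a hypothetical flag file and flag name; real systems often use a database table or a commercial flag service:

```python
import json
from pathlib import Path

# Hypothetical flag store: a file updated by config tooling, not by deploys.
FLAG_FILE = Path("/etc/myapp/flags.json")

def flag_enabled(name: str, default: bool = False) -> bool:
    """Read the flag at call time so a toggle requires no code release."""
    try:
        flags = json.loads(FLAG_FILE.read_text())
    except (OSError, json.JSONDecodeError):
        return default  # fail to the known-good default behavior
    return bool(flags.get(name, default))

def legacy_checkout() -> str:
    return "legacy checkout"

def new_checkout() -> str:
    return "new checkout"

def checkout() -> str:
    # The new path ships dark and is enabled (or rolled back) by editing
    # the flag store, not by redeploying code.
    if flag_enabled("new_checkout_flow"):  # hypothetical flag name
        return new_checkout()
    return legacy_checkout()
```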
OPERATIONS PROCESS
- Describe application logging in place. Where are the logs stored? Are they centralized and searchable? Who has access to the logs?
- Are customer experience (business) metrics used for monitoring to determine if a problem exists?
- Are system level monitors and metrics used to determine where and what the problem may be?
- Are synthetic monitors in use against your key transaction flows? (See the sketch after this list.)
- Are incidents centrally logged with appropriate details?
- Are problems separated from incidents and centrally logged?
- Is there any differentiation on the severity of issues? How do you escalate appropriate severity incidents to teams/leaders/the business?
- Are alerts sent in real time to the appropriate owners and subject matter experts for diagnosis and resolution?
- Is there a single location where all production changes (code and infrastructure) are logged and available when diagnosing a problem?
- Are postmortems conducted on significant problems and are actions identified and assigned and driven to completion?
- Do you measure time to fully close problems?
- Is availability measured in terms of true customer impact?
- Are Quality of Service meetings held where customer complaints, incidents, SLA reports, postmortem scheduling, and other necessary information are reviewed and updated regularly?
- Do operational look-back meetings occur monthly or quarterly where themes are identified for architectural improvements?
- Does the team know how much headroom is left in the infrastructure or how much time until capacity is exceeded?
- How do you simulate faults in the system?
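A synthetic monitor, per the question above, replays a key transaction on a schedule and alerts on failure or slowness. A standard-library-only sketch; the probe URL, latency budget, and alert hook are all assumptions:

```python
import time
import urllib.request

# Hypothetical probe endpoint exercising a key transaction flow.
CHECKOUT_PROBE_URL = "https://www.example.com/checkout/health"
LATENCY_BUDGET_S = 1.5

def alert(message: str) -> None:
    print(message)  # stand-in for a real paging/alerting integration

def probe() -> None:
    start = time.monotonic()
    try:
        with urllib.request.urlopen(CHECKOUT_PROBE_URL, timeout=5) as resp:
            ok = resp.status == 200
    except OSError:
        ok = False
    elapsed = time.monotonic() - start
    if not ok or elapsed > LATENCY_BUDGET_S:
        alert(f"checkout probe failed: ok={ok}, latency={elapsed:.2f}s")

if __name__ == "__main__":
    while True:  # in practice this runs under a scheduler, not a bare loop
        probe()
        time.sleep(60)
```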
ORGANIZATION KNOWLEDGE AND ALIGNMENT
- Do architects have both engineering/development and infrastructure experience?
- Are teams perpetually seeded, fed, and weeded?
- Are teams aligned with services or features that are in pods or swim lanes?
- Are Agile teams able to act autonomously with a satisfactory TTM?
- Are measurable business goals, OKRs and KPIs visible and commercialized with teams?
- Are teams comprised of members with all of the skills necessary to achieve their goals?
- Have architects designed for graceful failures by thinking about scale cube concepts?
- Does leadership think about availability as a feature by setting aside investment for debt and scaling?
- Does the client have a satisfactory engineer-to-QA-tester ratio?
SECURITY
- Is there a set of approved and published information security policies used by the organization?
- Has an individual who has final responsibility for information security been designated?
- Are security responsibilities clearly defined across teams (i.e., distributed vs completely centralized)?
- Are the organization's security objectives and goals shared and aligned across the organization?
- Has an ongoing security awareness and training program for all employees been implemented?
- Is a complete inventory of all data assets maintained with owners designated?
- Has a data categorization system been established that classifies data in terms of legal/regulatory requirements (PCI, HIPAA, SOX, etc.), value, and sensitivity?
- Has an access control policy been established which allows users access only to network and network services required to perform their job duties?
- Are the access rights of all employees and external party users to information and information processing facilities removed upon termination of their employment, contract or agreement?
- Is multi-factor authentication used for access to systems where the confidentiality, integrity or availability of data stored has been deemed critical or essential?
- Is access to source code restricted to only those who require access to perform their job duties?
- Are the development and testing environments separate from the production/operational environment (i.e., they don't share servers, are on separate network segments, etc.)?
- Are network vulnerability scans run frequently (at least quarterly) and vulnerabilities assessed and addressed based on risk to the business?
- Are application vulnerability scans (penetration tests) run frequently (at least annually or after significant code changes) and vulnerabilities assessed and addressed based on risk to the business?
- Are all data classified as sensitive, confidential or required by law/regulation (i.e., PCI, PHI, PII, etc.) encrypted in transit?
- Is testing of security functionality carried out during development?
- Are rules regarding information security included and documented in code development standards?
- Has an incident response plan been documented and tested at least annually?
- Are encryption controls being used in compliance with all relevant agreements, legislation, and regulations (i.e., data in use, in transit, and at rest)? (See the sketch after this list.)
- Do you have a process for ranking and prioritizing security risks?
- Do you have an IDS (Intrusion Detection System) solution implemented? How about an IPS (Intrusion Prevention System)?
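On the encryption-controls question, reviewers generally look for vetted primitives rather than home-grown crypto. As one illustrative example, authenticated encryption with the cryptography package (an assumption; key management via a KMS or secrets manager is deliberately out of scope here):

```python
from cryptography.fernet import Fernet  # assumes the 'cryptography' package

# In production the key comes from a KMS/secrets manager, never source code.
key = Fernet.generate_key()
f = Fernet(key)

token = f.encrypt(b"cardholder data")   # authenticated encryption at rest
plaintext = f.decrypt(token)            # raises InvalidToken if tampered with
```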
Why the Changes?
Technology evolves, engineers innovate, and entrepreneurs create. A static checklist will not improve with age the way wine does. Keep an eye out for future blog posts discussing the changes we made in more detail.
Want to learn more?
Contact us; we would be happy to discuss how we have helped hundreds of clients over the years.