A lot has happened since 2018 when AKF Partners first shared a list of questions we have used hundreds of times to conduct technical due diligence engagements for our clients. It proved to be popular and continues to be one of the most viewed blog posts on our site.

Some of the terminology below refers to AKF models and a quick visit to the AKF Scale Cube will help orient you for the scalability questions. This most recent update reflects what we are learning as we iterate. Additionally, we've included more links to articles for further exploration on topics.

AI/ML: Artificial Intelligence/Machine Learning Capabilities

One of the most emerging requests that has accelerated since around October of 2020 is for us to assess artificial intelligence and machine learning (AI/ML) as part of our overall architecture, process, and people review. Investors want to know how credible the AI/ML claims are, if they are scalable, how differentiating they are from their competitors, and what is easy to reproduce vs. true intellectual property that will be difficult for others to copy among other things. When AI/ML review is part of a statement of work, we dedicate additional time to speak with the data scientists in open questioning rather than a checklist as provided below for other sections. We have another blog that dives much deeper, AI/ML for Product Managers.

SCALABILITY: AKF Scale Cube X-AXIS

  1. Are load balancers or ELBs used to distribute requests across multiple end points?
  2. Is session state stored in the browser or a separate tier?
  3. Read/Write Databases: Is there an appropriate separation of reads and writes using master/slave databases (or other AKF Scale Cube X-Axis capabilities - e.g. NoSQL Quoroms)?
  4. Are object caches utilized - if so, how and why?
  5. Are authentication and authorization services synchronized and available to all environments/regions?
  6. How is edge caching (CDN/browser caching) being leveraged?

SCALABILITY: AKF Scale Cube Y-AXIS

  1. Are services (i.e. login, signup, checkout) separated across different servers?
  2. Is data sharded across different datbases - Does each service have its own separate data store?
  3. Are services sized appropriately with consideration of needs (e.g. availability, frequency of change, dependencies, skillsets)?

SCALABILITY: AKF Scale Cube Z-AXIS

  1. Are endpoints (web, app servers) dedicated to a subset of similar data (i.e. users, SKU, content)?
  2. Are databases dedicated to only a subset of similar data?
  3. How do you handle multi or all-tenant in your implementation (services & data)?
  4. How do you handle data residency requirements - and are you required to do so for GDPR or other reasons?

FAULT ISOLATION (SWIM LANES)

  1. Are only asynchronous calls being made across services?
  2. Do you isolate data with respect to domains/services?
  3. Does logical architecture enforce the separation of function/areas of concern?

DISASTER RECOVERY

  1. Is there any specific area or areas where a failure will cause an outage (single-points of failure –SPOFs)?
  2. Is your data tier resilient to logical corruption (durable snapshots, intra-database rollback) or being unavailable?
  3. How do you use multiple data centers (cloud regions and availabilty zones) to implement disaster recovery strategies?
  4. Are data centers (if colo or owned DCs) located in geographically low-risk areas?

COST-EFFECTIVENESS

  1. Does the application use stored procedures (SPROCs, PROCs)? If so, is there business logic contained within the SPROCs?
  2. Are the web, application, and database servers on physically separate tiers or separate virtual instances?
  3. How do you scale for demand? Are auto-scaling techniques used?
  4. What third-party technologies are part of the system? How do they impact time-to-market, cost, and latency?
  5. If self-hosted - describe the hardware sizes across the major components/tiers of the application (goldfish vs. thouroughbreds/cattle, not pets). If cloud-hosted - describe the instance sizes used for workloads including the DB(s).
  6. Do you use virtualization, and if so, what are your objectives in using it?
  7. Are data centers located in low-cost areas (if colo or owned DCs)?
  8. What traffic is routed through a Firewall/WAF?
  9. How time-intensive/expensive would it be for you to use a different public cloud provider? 
  10. Do you have code specific to a customer? If so, what customer-specific code/data structures exist? How many versions of your software do you deploy and support?
  11. Current platform as a service (PaaS) - how did you make the decisions on what PaaS service to leverage?
  12. Future PaaS usage - How are you thinking about evolving your use of PaaS based on the future? 

PRODUCT MANAGEMENT PROCESS

  1. Is there a product management team or owner that can decide to add, delay, or deprecate features?
  2. Does the product management team have ownership of business goals?
  3. Does the team develop success criteria, OKRs, KPIs, or customer feedback loops that help to inform feature decisions?
  4. Does the team use an iterative discovery process to validate market value and achieve goals?
  5. Are leading and lagging indicators defined for each major project, and what is the process to track, validate, and rebase or revisit the metrics?

PDLC PROCESS

  1. Are leading and lagging indicators defined for each major project, and what is the process to track, validate, and rebase or revisit the metrics?
  2. Does the team use any relative sizing method to estimate effort for features/story cards?
  3. Does the team utilize a velocity measure to improve team time-to-market?
  4. Does the team use Agile burndown, metrics, and retrospectives to measure the progress and performance of iterations?
  5. How does the team measure engineering efficiency and who owns and drives these improvements?
  6. Does the team perform estimates and measure actual against predicted?
  7. Is there an Agile Definition of Done in place? Does it measure customer adoption of features, or just releasing of code into production? 

DEVELOPMENT PROCESS

  1. Does the team use an approach that allows for easy identification and use of a prod bug fix branch for rapid deployment?
  2. Does the team use a feature-branch approach? Can a single feature/engineer block a release?
  3. In your public cloud architecture, how do you build and provision a new server/environment/region?
  4. Does the team have documented coding standards and if so - how are they taught and applied?
  5. Are engineers conducting code reviews with defined standards or have automation in the development pipeline that validates against the standards?
  6. Is open-source licensing actively managed and tracked? If so, how? Is it automated? 
  7. Are engineers writing unit tests (code coverage)?
  8. Is the automated testing coverage greater than 75%?
  9. Does the team utilize continuous integration?
  10. Is load and performance testing conducted before releasing to a significant portion of users or is the testing built into the development pipeline?
  11. Does the team deploy small payloads frequently versus larger payloads less-frequently?
  12. Does the team utilize continuous deployment?
  13. Is any containerization (e.g. Docker) and orchestration (Kubernetes) in place?
  14. Are feature flags, where a feature is enabled or disabled outside of a code release, in use?
  15. Does the team have a mechanism that can be used to roll back (wire on/off .DDL/DML scripted and tested, additive persistence tier, no "select *")?
  16. How does the team define technical debt?
  17. Is tech debt tracked and prioritized on an ongoing basis? (Is there a defined percentage of each Sprint dedicated to reducing tech debt?)
  18. How are issues found in production, backlogged bugs, as well as technical debt addressed in the product lifecycle?
  19. Does the team utilize a Joint Application Design process for large features that brings together engineering and ops for a solution or do they have experientially diverse teams?
  20. Does the team have documented architectural principles that are followed?
  21. Does the team utilize an Architecture Review Board where large features are reviewed to uphold architectural principles?

OPERATIONS PROCESS

  1. Describe application logging in place. Where are the logs stored? Are they centralized and searchable? Who has access to the logs?
  2. Are customer experience (business) metrics used for monitoring to determine if a problem exists?
  3. Are system-level monitors and metrics used to determine where and what the problem may be?
  4. Are synthetic monitors in use against your key transaction flows? (How do you find out what is wrong before your customers do?)
  5. Are incidents centrally logged with appropriate details?
  6. Are problems separated from incidents and centrally logged?
  7. Is there any differentiation in the severity of issues? How do you escalate appropriate severity incidents to teams/leaders/the business?
  8. Are alerts sent in real-time to the appropriate owners and subject matter experts for diagnosis and resolution?
  9. Is there a single location where all production changes (code and infrastructure) are logged and available when diagnosing a problem?
  10. Are postmortems conducted on significant problems and are actions identified assigned and driven to completion?
  11. Do you measure time to close problems fully?
  12. Is availability measured in true customer impact?
  13. Are quality of service (QoS) meetings held where customer complaints, incidents, SLA reports, postmortem scheduling and other necessary information reviewed/updated regularly?
  14. Do operational look-back meetings occur either monthly or quarterly where themes are identified for architectural improvements?
  15. Does the team know how much headroom is left in the infrastructure or how much time until capacity is exceeded?
  16. How do customer-reported problems flow from support to product engineering teams?
  17. How do you simulate faults in the system? (i.e. Choas Monkey)

ORGANIZATION KNOWLEDGE AND ALIGNMENT

  1. Do architects have both engineering/development and infrastructure experience?
  2. How do you onboard and develop team members?
  3. Are teams perpetually being evaluated and upgraded (i.e. seed, feed, weed)? How do you handle poor performers?
  4. Are teams aligned with services or features that are in pods or swim lanes?
  5. Are Agile teams able to act autonomously with a satisfactory TTM?
  6. Are teams comprised of members with all of the skills necessary to achieve their goals?
  7. Does leadership think about availability as a feature by setting aside investment for debt and scaling?
  8. Have architects designed for graceful failures by thinking about AKF scale cube concepts?
  9. Does the client have a satisfactory engineer to QA tester ratio?
  10. Does the team think strategically about talent sourcing - both level and location?
  11. What percentage of your QA team performs automation?

SECURITY

  1. Is there a set of approved and published information security policies used by the organization?
  2. Has an individual who has final responsibility for information security been designated?
  3. Are security responsibilities clearly defined across teams (i.e., distributed vs completely centralized)?
  4. Are the organization's security objectives and goals shared and aligned across the organization? (Or does everyone put to the CISO or InfoSec team?)
  5. Has an ongoing security awareness and training program for all employees been implemented?
  6. Is a complete inventory of all data assets maintained with owners designated?
  7. Has a data categorization system been established and classified in terms of legal/regulatory requirements (PCI, HIPAA, SOX, etc.), value, sensitivity, etc.?
  8. Has an access control policy been established which allows users access only to network and network services required to perform their job duties?
  9. Are the access rights of all employees and external party users to information and information processing facilities removed upon termination of their employment, contract or agreement?
  10. Is multi-factor authentication used for access to systems where the confidentiality, integrity or availability of data stored has been deemed critical or essential?
  11. Is access to source code restricted to only those who require access to perform their job duties?
  12. Are the development and testing environments separate from the production/operational environment (i.e., they don't share servers, are on separate network segments, etc.)?
  13. Are network vulnerability scans run frequently (at least quarterly) and vulnerabilities assessed and addressed based on risk to the business?
  14. Are application vulnerability scans (penetration tests) run frequently (at least annually or after significant code changes) and vulnerabilities assessed and addressed based on risk to the business?
  15. Are all data classified as sensitive, confidential or required by law/regulation (i.e., PCI, PHI, PII, etc.) encrypted in transit?
  16. Is testing of security functionality carried out during development?
  17. Are rules regarding information security included and documented in code development standards?
  18. Has an incident response plan been documented and tested at least annually?
  19. Are encryption controls being used in compliance with all relevant agreements, legislation and regulations? (i.e., data in use, in transit and at rest)
  20. Do you have a process for ranking and prioritizing security risks?
  21. Do you have an IDS (Intrusion Detection System) solution implemented? How about an IPS (Intrusion Protection System)

Why the Changes?

Technology evolves, engineers innovate, and entrepreneurs create. A static checklist will not improve with age like wine. Keep your eyes out for future blog posts discussing more details about our changes.

Want to learn more?

Contact us, we would be happy to discuss how we have helped hundreds of clients over the years.