Big Data Anti-Patterns
Recent research by Gartner indicates that while Big Data investment continues to grow, the intent to invest in Big Data projects is finally showing signs of tapering. While this is a natural part of the hype-cycle, poor ROI on Big Data projects continues to impact the industry. Gartner sites the effective “lack of business leadership” around Big Data initiatives as a primary cause. We’ve kept a watchful eye on these Big Data trends to better serve our clients.
In a recent presentation, Marty Abbott addresses the struggles of getting Big Data Analytics right. There he identifed seven anti-patterns that hamper the proper implementation of Big Data initiatives and provided suggested remedies for each.
Anti-Pattern #1 — Assuming you know the answers.
By setting out to confirm a particular hypothesis, you end up building an analytics system hard-wired to answer a narrow set of questions. This lack of extensibility and blinds you from achieving deeper insights.
Gain greater insights by focusing on correlations without presumption and adopt an analytics process that follows continuous cycle induction, hypothesis, deduction, and data creation. (see hermeneutic cycle.)
Anti-Pattern #2 — No QA in analytics.
Big Data analytics systems are just as prone to bugs and defects as production systems, if not more so. All too often Analytics systems are designed for perfect execution.
Improve QA by building a pre-processing, cleaning stage in your data flow while retaining raw immutable data elsewhere. Maintain prior results and implement the ability to run analytics over your entire dataset to reverify results.
Anti-Pattern #3 — Using production Engineers to run your analytics.
Big Data involves different skills sets, different technologies, and, perhaps most importantly, a different mindset from production operations. Data scientists and analytics experts are explorers who theorize correlations, while production developers are operationalists, making pragmatic decisions to keep systems running.
Improve effectiveness by creating a separate team focused on implementing analytics architecture.
Anti-Pattern #4 — Using OLTP Datastore for Analytics
Using your transactional datastore for analytics will impact the performance of your operational systems. Furthermore, transactional and analytics data have different requirements for normalization and response time latency.
Separate analytics from transactional systems by using a separate Operational Datastore and ingesting data from your application in an asynchronous fashion.
Anti-Pattern #5 — Ignoring Performance Requirements
Performance matters. The longer you take to produce results the more out of date they become. Furthermore, no one can leverage results if they can’t quickly and easily access them.
Improve analytics performance using cluster computing technologies such as Apache’s Spark and Hadoop. Store frequently queried results in a Datamart for quick access.
Anti-Pattern #6 — OLTP is the only input to your Big Data system
If transactional data is the only input to your analytics you’re missing out on the much larger picture.
Customer interactions, social media scrapes, stock performance, Google analytics data, and transaction logs should be brought together to form a comprehensive picture.
Anti-Pattern #7 — Anyone can run their own Analytics
A well-built analytics system can become the victim of it’s own success. As it becomes more popular among stakeholders, overall performance may tank.
To preserve performance, move aggregated data out of analytics processing into OLAP/Datamarts and create a chargeback system allocating resources to users/departments.