Do you trust your data?
Is your answer “Yes” or “Mostly Yes”? One way to tell: do you feel compelled to re-validate all of your information before taking a leap, or do you fail to act on the data presented? At best, this slows down decision making; at worst, it completely negates the value of collecting, storing, and organizing data in the first place.
The importance of trusting your data and why “mostly yes” is dangerous
In the Navy, I was trained to trust key indications to ensure safe operation of a nuclear reactor. Specific indicators triggered alarms and those alarms required immediate actions. Other decisions afforded more opportunity to respond, but still relied upon trusting and validating the indications that were reported. The same principles apply to the data stored in your core datasets that is used to drive business decisions.
For core datasets, “mostly yes” falls far short of what is necessary to run a data-driven business. “Trust me” is a phrase you hear when you’re being asked to go against your intuition or gut. It often involves taking a risk and exposing yourself to potential harm. Sometimes emerging trends (and threats) will present themselves in data before they’re generally well known or accepted. An industry maxim may start to bend, and your attuned competitors will gain an advantage if you fail to detect a leading signal.
Why should you trust your data?
I trusted my reactor plant indicators because we performed regular checks on our sensors to ensure they were working properly. Without those checks, we would have been operating blind. The same applies to your data systems. Consistently ensuring that data is created, stored, and maintained properly falls under the realm of data stewardship. Checking that information matches what is expected should be part of your routine business practices.
Who is responsible for data stewardship in your organization?
The right answer is that everyone in the organization should share some part of the ownership. Specific roles will play a more active part depending upon your organization’s needs, structure, and tools. However, one common answer is almost always wrong: “that’s simply an IT problem.”
How do you build the ecosystem?
It is a full team sport that requires: 1) Designing Systems; 2) Providing Data Stewardship; 3) Making Decisions.
Designing systems starts with understanding the needs of key stakeholders. Often these stakeholders will be key leaders of front-line businesses or functional roles such as finance. Determining their information requirements helps define the core datasets you’ll need to govern. From there, those key stakeholders will be critical in driving alignment and compliance across the business.
Data Stewardship is responsible for governance and influencing those who deal with data to ensure that it is handled in a consistent manner. Good data stewards will need deep domain knowledge, a background in data, and the ability to influence both leadership and those across the organization to ensure program compliance.
Leaders need to demonstrate that they are making decisions based on available data. Yes, those data-informed decisions will likely lead to strategic or operational choices that further the business. From an ecosystem perspective, they also highlight the purpose and importance of all of the data stewardship efforts. This should encourage and motivate those in the organization to keep enhancing and protecting a valuable corporate resource: the firm's data.
Talent strategy for data scientists
It’s not uncommon for firms to face several core challenges recruiting, developing, and retaining data scientists and technical leaders. Below are several reasons why:
How do you mitigate these challenges and compete effectively against the digitally native tech firms? (note: each topic below is worthy of a deeper exploration and could be a subject of future blog posts :) )
Don’t they know how hard it is? Maybe, but no one said big data was clean and easy. In fact, most things described as ‘big’ are hard and daunting…except perhaps for Clifford, my daughter’s favorite fictional Big Red Dog.
Some describe big data as simply a data set too large to be analyzed (or even opened) in a standard spreadsheet. It requires programs such as R, SQL, or Python (that name alone seems scary), as well as the somewhat rare (though increasingly common) skill set of a ‘data scientist’. Compounding the problem, once data is aggregated and analyzed, it will almost always be unstructured, which translates to messy and incomplete.
Your data will always be imperfect, falling somewhere along the spectrum between insightful, perfect data and unstructured, random white noise. Where your data falls depends on: 1) a clear articulation of the specific measurement you wish to capture, 2) how carefully you design your data collection systems, 3) how rigorously you enforce data validation rules, and 4) the availability of data to begin with. Each of these topics is worthy of a separate discussion that I’ll address in future postings.
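To make the idea of enforcing data validation rules a little more concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the record fields (customer_id, order_date, amount), the three rules, and the validate helper are hypothetical, not drawn from any particular system. The point is only that rules can be written down explicitly and checked on every load, much like a routine sensor check.

```python
from datetime import date

# Hypothetical records; the field names are illustrative only.
records = [
    {"customer_id": "C001", "order_date": date(2023, 5, 1), "amount": 120.0},
    {"customer_id": "", "order_date": date(2023, 5, 2), "amount": 80.0},
    {"customer_id": "C003", "order_date": date(2023, 5, 3), "amount": -15.0},
]

# Each rule pairs a readable name with a function that returns True
# when a record passes. These three rules are assumptions for the example.
RULES = {
    "customer_id is present": lambda r: bool(r["customer_id"]),
    "amount is non-negative": lambda r: r["amount"] >= 0,
    "order_date is not in the future": lambda r: r["order_date"] <= date.today(),
}


def validate(rows):
    """Return a list of (row_index, rule_name) for every failed rule."""
    found = []
    for i, row in enumerate(rows):
        for name, check in RULES.items():
            if not check(row):
                found.append((i, name))
    return found


failures = validate(records)
for index, rule in failures:
    print(f"row {index} failed: {rule}")
```

Keeping the rule names human-readable means the failure report doubles as a statement of the data set's known limitations, which can be shared with the people making decisions from it.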
Often, simple is best. For example, joining a new data set might add a marginally useful insight, but could risk generating a significant number of duplicates. Evaluate whether this makes sense and proceed with caution…and save a backup for reversion just in case.
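The duplicate risk from a join can be checked before you commit to it. The sketch below is one possible approach in plain Python, under stated assumptions: the orders and contacts data sets, their fields, and the duplicate_keys and left_join helpers are all hypothetical and exist only for this example. It saves a backup, looks for repeated keys in the new data set, and compares row counts before and after the join.

```python
import copy
from collections import Counter

# Hypothetical core data set: one row per order.
orders = [
    {"order_id": 1, "customer_id": "C1", "amount": 50.0},
    {"order_id": 2, "customer_id": "C2", "amount": 75.0},
]

# Hypothetical new data set: note that C1 appears twice.
contacts = [
    {"customer_id": "C1", "email": "a@example.com"},
    {"customer_id": "C1", "email": "a.alt@example.com"},
    {"customer_id": "C2", "email": "b@example.com"},
]


def duplicate_keys(rows, key):
    """Return the sorted key values that appear more than once in rows."""
    counts = Counter(row[key] for row in rows)
    return sorted(k for k, n in counts.items() if n > 1)


def left_join(left, right, key):
    """Join right onto left by key, keeping every left row."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    joined_rows = []
    for row in left:
        # A left row with no match is kept once, with no extra fields.
        for match in index.get(row[key], [{}]):
            joined_rows.append({**row, **match})
    return joined_rows


# Save a backup for reversion before changing anything.
backup = copy.deepcopy(orders)

dupes = duplicate_keys(contacts, "customer_id")
joined = left_join(orders, contacts, "customer_id")
row_growth = len(joined) - len(orders)

if row_growth > 0:
    print(f"Join added {row_growth} extra row(s); repeated keys: {dupes}")
```

If the row count grows, the extra rows are duplicates of existing orders, and any totals computed from the joined data (such as summed amounts) would be overstated. At that point you can decide whether the added insight is worth it or simply revert to the backup.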
Lastly, big data analysis is only as good as the trust and decisions that flow from it. Imperfections need to be known and identified so that the data set can be completely trusted within its stated limitations. Failing to disclose these limitations will make any big data project a potentially interesting but useless endeavor.
Management consultant and advocate for leveraging data and analytics to improve healthcare.