The data you don’t know you don’t have is more important than you might think.
A bit of history
Before the steel Brodie helmet was introduced, soldiers went into battle wearing cloth, felt, or leather headgear that offered little to no protection from modern weapons. That meant shots to the head had a high likelihood of being fatal, and those victims were never brought into the hospital. Once the helmet was in use, soldiers who survived a head wound did reach the hospital and were counted there. Comparing only the hospital data from before and after the introduction of the new helmet therefore does not give the full picture.
Let’s travel just over 100 years forward to today. The year is 2021, and you are analyzing sales opportunities in your state-of-the-art CRM system. When you compare the closing ratio of two sales managers, that is, the number of deals won divided by the total number of deals in the system, the data can show big differences where none exist in real life. It all depends on what the sales managers put into the system. If one sales manager logs every meeting she has with a potential client while the other only logs a deal once an offer has been sent out, the two become very difficult to compare. And when sales managers know that the closing ratio is one of their KPIs, they become reluctant to enter potential leads into the system until they are fully convinced they can close them.
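To make the effect concrete, here is a minimal sketch of the closing ratio described above. The numbers are made up for illustration: both managers are assumed to have had the same meetings, sent the same offers, and won the same deals, and they differ only in what they log.

```python
def closing_ratio(deals_won: int, deals_logged: int) -> float:
    """Closing ratio = deals won / total deals recorded in the CRM."""
    if deals_logged == 0:
        raise ValueError("no deals logged")
    return deals_won / deals_logged


# Hypothetical figures, identical for both managers in real life.
meetings, offers, won = 40, 20, 10

ratio_a = closing_ratio(won, meetings)  # manager A logs every meeting
ratio_b = closing_ratio(won, offers)    # manager B logs only once an offer is sent

print(f"Manager A: {ratio_a:.0%}")  # 25%
print(f"Manager B: {ratio_b:.0%}")  # 50%
```

Same real-world performance, yet manager B looks twice as effective, purely because of the deals that never made it into the system.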
Aside from information ambiguity, not capturing all the useful data in your CRM system can also lead to missed revenue. Recently, a client of ours wanted to investigate his company’s cross-selling potential, as they felt they were underachieving. However, he only had data on past sales, not on the deals he had lost. This makes it almost impossible for, say, a machine learning model to recommend cross-selling opportunities, because the training data contains only successful cross-sells and no failed ones to learn from.
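A small sketch shows why won-only data is so limiting. It uses a simple win rate per product as a stand-in for a real recommendation model, and the product names and records are hypothetical, not taken from the client’s data.

```python
from collections import defaultdict


def win_rate_per_product(records):
    """records: iterable of (product, won) pairs. Returns {product: win rate}."""
    offered = defaultdict(int)
    won = defaultdict(int)
    for product, was_won in records:
        offered[product] += 1
        won[product] += int(was_won)
    return {product: won[product] / offered[product] for product in offered}


# Hypothetical history that includes both won and lost cross-sell offers.
full_history = [
    ("support plan", True), ("support plan", False), ("support plan", False),
    ("training", True), ("training", True), ("training", False),
]

# What remains when only closed sales are stored: the losses are gone.
won_only = [record for record in full_history if record[1]]

full_rates = win_rate_per_product(full_history)
won_only_rates = win_rate_per_product(won_only)

print(full_rates)      # training converts better than the support plan
print(won_only_rates)  # every product scores 1.0, so nothing can be ranked
```

With the lost deals included, the two products can be told apart. With only the wins, every offer appears to succeed every time, and there is no signal left to base a recommendation on.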
Survivorship bias is, most of the time, a story of the data you don’t know you don’t have, which is why this pitfall is so hard to avoid. Understanding your data and knowing how it was collected helps you avoid it. Involvement from the business is therefore crucial in data projects: these are the people who understand the underlying business process that generates the data.
If you want more information on how we at DataMotive can help you to spot and avoid this type of bias, feel free to reach out to firstname.lastname@example.org!