A small but important difference that you should know
In many data science tasks, we want to know how certain we are about a result. Knowing how much we can trust a result helps us make better decisions.
Once we have quantified the level of uncertainty that comes with a result, we can use it for:
- scenario planning to evaluate a best-case and worst-case scenario
- risk assessment to evaluate the impact on decisions
- model evaluation to compare different models and model performance
- communication with decision-makers about how much they should trust the results
Where does the uncertainty come from?
Let’s look at a simple example. We want to estimate the mean price of a 300-square-meter house in Germany. Collecting the data for all 300-square-meter houses is not viable. Instead, we will calculate the mean price based on a representative subset.
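To make this concrete, here is a minimal sketch of what estimating from a subset looks like. Everything in it is an assumption for illustration: the "population" is synthetic (normally distributed prices with a made-up mean of 800,000 and standard deviation of 150,000), and the names `population` and `sample_mean` are hypothetical. In reality we never get to see the full population.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Synthetic stand-in for the prices of all 300-square-meter houses.
# The numbers are invented for illustration, not real market data.
population = rng.normal(loc=800_000, scale=150_000, size=100_000)


def sample_mean(prices, n, rng):
    """Draw n prices without replacement and return their mean."""
    sample = rng.choice(prices, size=n, replace=False)
    return sample.mean()


# Repeat the "collect a subset, compute the mean" step a few times.
estimates = [sample_mean(population, n=200, rng=rng) for _ in range(5)]
population_mean = population.mean()

print(f"Population mean: {population_mean:,.0f}")
for i, estimate in enumerate(estimates, start=1):
    print(f"Estimate from subset {i}: {estimate:,.0f}")
```

Each subset gives a slightly different estimate, and none of them matches the population mean exactly. That spread between subsets is the uncertainty we want to quantify.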