Band B is about the validity of the data: checking its faithfulness and its representation. It can include components of exploratory data analysis.

What does it include?

  • visualisations for exploratory analysis, e.g. PCA and hierarchical clustering
  • noise characterisation
  • missing values
  • entity disambiguation, record linkage, duplicate detection
  • anomaly detection
  • sanity checks on the use of physical units (if used)
  • data representation (vectorising, word embeddings, etc.); open question: does this belong in Band A instead?

Examples
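
As an illustration of a few of the checks listed above (missing values, duplicate detection, anomaly detection), here is a minimal sketch in Python using only the standard library. The records, the field names (`id`, `name`, `height_cm`), the helper functions, and the outlier threshold of 3.5 are all hypothetical and chosen for the example; a real Band B assessment would adapt them to the data set at hand.

```python
from statistics import median

# Hypothetical toy records; field names and values are invented for illustration.
records = [
    {"id": 1, "name": "Alice Smith", "height_cm": 170.0},
    {"id": 2, "name": "alice  smith", "height_cm": 171.0},  # likely the same person as id 1
    {"id": 3, "name": "Bob Jones", "height_cm": None},      # missing value
    {"id": 4, "name": "Carol White", "height_cm": 1.65},    # probably entered in metres
    {"id": 5, "name": "Dan Brown", "height_cm": 175.0},
    {"id": 6, "name": "Eve Black", "height_cm": 168.0},
]


def missing_counts(rows):
    """Count None values per field."""
    counts = {}
    for row in rows:
        for key, value in row.items():
            counts[key] = counts.get(key, 0) + (value is None)
    return counts


def normalise(name):
    """Lower-case and collapse whitespace so near-identical names compare equal."""
    return " ".join(name.lower().split())


def duplicate_groups(rows, key="name"):
    """Group record ids by normalised key; return only groups with more than one id."""
    groups = {}
    for row in rows:
        groups.setdefault(normalise(row[key]), []).append(row["id"])
    return [ids for ids in groups.values() if len(ids) > 1]


def mad_outliers(rows, field, threshold=3.5):
    """Flag ids whose modified z-score (median absolute deviation based) exceeds threshold."""
    values = [(r["id"], r[field]) for r in rows if r[field] is not None]
    med = median(v for _, v in values)
    mad = median(abs(v - med) for _, v in values)
    if mad == 0:
        return []  # no spread to measure against; nothing can be flagged this way
    return [i for i, v in values if 0.6745 * abs(v - med) / mad > threshold]


print("missing per field:", missing_counts(records))
print("possible duplicates:", duplicate_groups(records))
print("possible anomalies:", mad_outliers(records, "height_cm"))
```

On this toy data the sketch reports one missing `height_cm`, groups ids 1 and 2 as a possible duplicate, and flags id 4 as a possible anomaly. The last case also shows how an anomaly check can surface a physical-units problem (a height apparently recorded in metres in a centimetre field). Flagged records are candidates for inspection, not automatic deletions.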

At the end of Band B, we are ready to define a candidate question and its context.