A data product company is a firm that serves clients by easing the path from data to knowledge that informs action. Data is the first element in a five-step sequence to value known as the DIKAR model:
Data \(\rightarrow\) Information \(\rightarrow\) Knowledge \(\rightarrow\) Action \(\rightarrow\) Results
A data product company takes data and transforms it into information. The information should be presented in such a way that it easily produces knowledge for the user. An AI product company, by comparison, takes data and produces actions. An AI system creates a closed loop in which results are measured and fed back into the system as data. Where your product stops along the DIKAR chain determines the type of firm.
A data product company is one that derives its value proposition as a function of data. A statistic, formally defined, is a function of a sample. Thus, an analytics company is a data product company and vice versa. The goal of a data product company is to move efficiently from data to the conditions for knowledge acquisition in the user base. For a data product with a subscription fee, users are synonymous with clients. (N.b. For a firm like Facebook, the user base and the paying customer base are not the same. Facebook users are advertisers, and the people who use the Facebook application are human chattel.)
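The statement that a statistic is a function of a sample can be made concrete with a minimal sketch. The sample values and the `sample_range` helper below are illustrative assumptions, not part of any real product; the point is only that each summary is computed from the sample and nothing else.

```python
from statistics import mean


def sample_range(sample):
    """A statistic: a value computed purely from the observed sample."""
    return max(sample) - min(sample)


# Illustrative measurements only (hypothetical data).
sample = [12.0, 15.0, 11.0, 18.0]

# Each entry is a function of the sample alone, which is what makes it a statistic.
summary = {"mean": mean(sample), "range": sample_range(sample)}
print(summary)
```

In this framing, the "information" a data product delivers is a set of such functions applied to curated data.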
A data product is a combination of three elements:
Data curation carries its own unique status in this case, because data has an abstract value above and beyond the technical details of the systems that store it and the measurement schemes used to create records. (Absent curation, a company is a data storage company and not a data product company.) The Extract-Transform-Load (ETL) nomenclature is to be avoided if the data product uses text documents, because the word “extract” then carries two senses. In an ETL process, you capture the data by “extracting” it from the data store where you initially find it. With NLP work, you also need to “extract” the text from a document for processing; this second extraction occurs during pre-processing.
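The two senses of “extract” can be kept apart by naming the steps differently. The following is a minimal sketch, assuming documents arrive as HTML strings in a simple in-memory store; the function names `capture_records` and `extract_text` and the store layout are hypothetical, chosen only to illustrate the distinction.

```python
from html.parser import HTMLParser


def capture_records(store):
    """ETL sense of 'extract': pull raw records out of the source data store."""
    return list(store.items())


class _TextCollector(HTMLParser):
    """Collects the text nodes of an HTML document, ignoring the markup."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def extract_text(raw_html):
    """NLP sense of 'extract': recover plain text from a document (pre-processing)."""
    collector = _TextCollector()
    collector.feed(raw_html)
    collector.close()
    # Join the text nodes and collapse runs of whitespace.
    return " ".join(" ".join(collector.chunks).split())


def run_pipeline(store):
    """Capture records first, then extract text from each captured document."""
    return {doc_id: extract_text(raw) for doc_id, raw in capture_records(store)}


# Hypothetical source store: document id -> raw HTML payload.
source_store = {"doc-1": "<p>Curation <b>matters</b>.</p>"}
print(run_pipeline(source_store))
```

Calling the first step "capture" rather than "extract" leaves the word free for the text-processing step, which is the naming choice the paragraph above argues for.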
The most critical terms of data product companies are:
In a pure data management scenario, data curation is the lifeblood of the work to be done. Fundamentally, it is about agreeing with the data owner on which records are to be curated, and then appropriately documenting the source(s). This is, at its core, not a technical enterprise. However, one will often find oneself engaging in deeply technical work to support a data set. This is due either to a lack of earlier investment of energy or to unforeseen challenges with a particular data set. As much as possible, the technical elements of processing and publishing data should be established in advance of a data curation project.