Reducing Friction in Selling Data Products: Protecting IP and Data

15 Sep

Traditionally software has been distributed as a binary. The customer “grants” the binary a broad set of rights on the machine and expects the application to behave, e.g., not snoop on personal data, not add the computer to a botnet, etc. Most SaaS can be delivered with minor alterations to the above—finer access control and usage logging. Such systems work on trust—the customer trusts that the vendor will do the right thing. It is a fine model but does not work for the long tail. For the long tail, you need a system that grants limited rights to the application and restricts what data can be sent back. This kind of model is increasingly common on mobile OS but absent on many other “platforms.”

The other big change over time in software has been how much data is sent back to the application maker. In a typical case, the SaaS application is delivered via a REST API, and nearly all the data is posted to the application’s servers. This brings up issues about privacy and security, especially for businesses. Let me give an example. Say there is an app that can summarize documents. And say that a business has a few million documents in a Dropbox folder on which it would like to run this application. Let’s assume that the app is delivered via a REST API, as many SaaS apps are. And let’s assume that the business doesn’t want the application maker to ‘keep’ the data. What’s the recourse? Here are a few options:

  • Trust me. Large vendors like Google can credibly commit to models where they don’t store customer data. To the extent that storing customer data is valuable to the application developer, the application developer can also use price discrimination, providing separate pricing tiers for cases where the data is logged and where it isn’t. For example, see the Google speech-to-text API.
  • Trust but verify. The application developer claims to follow certain policies, but the customer is able to verify, for e.g., audit access policies and logs. (A weaker version of this model is relying on industry associations that ‘certify’ certain data handling standards, e.g., SOC2.)
  • Trusted third-party. The customer and application developer give some rights to a third party that implements a solution that ensures privacy and protects the application developer’s IP. For instance, AWS provides a model where the customer data and algorithm are copied over to an air-gapped server and the outputs written back to the customer’s disk. 

Of the three options, the last option likely reduces friction the most for long tail applications. But there are two issues. First, such models are unavailable on a wide variety of “platforms,” e.g., Dropbox, etc. (or easy integrations with the AWS offering are uncommon). The second is that air-gapped copying is but one model. A neutral third party can provide interesting architectures, including strong port observability and customer-in-the-loop “data emission” auditing, etc.