Making useful products is hard. Making useful ML products is harder still, in part because an ML system has more moving parts. To understand the issues at stake, let’s go over the **basics** of developing an ML product.
Often, product development starts with a business problem. And your first job is to understand the business problem as well as you can, familiarizing yourself with as much detail as possible.
Let’s say the problem is as follows: A company gets a lot of customer emails. All the emails go to a common inbox from which specialist customer agents fish out emails that are relevant to them. For instance, finance specialists fish out billing emails. And technical specialists fish out emails about technical errors. Fishing is time-consuming and chaotic.
Once you understand the precise problem (the time taken to discover and assign emails), work on developing solutions for it. When developing solutions, the bias should be toward solving the problem the best way possible rather than toward injecting custom ML into whatever solution you propose. For instance, you could propose a solution that makes it easier to search (using no ML or off-the-shelf ML) and bulk assign emails to a new queue. But let’s say that after careful consideration of costs and benefits, a particularly appealing solution is a system that uses machine learning to automatically direct relevant emails to specialist inboxes, obviating the need to fish. That’s the start of a solution, not the end. You need to spend enough time on the solution to have thought through how to handle edge cases, e.g., a technical issue about billing, a misclassified email, etc., and any spillover issues, like the latency of such a system, how implementing it may break existing data pipelines that measure the total number of emails, etc.
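One way to handle the edge cases above is to route an email automatically only when the classifier is both confident and unambiguous, and to leave it in the common inbox otherwise. The sketch below is a minimal illustration under assumptions: the queue name, the threshold, the margin, and the shape of `label_scores` (one score per label, scored independently) are all hypothetical, not part of any particular system.

```python
# Hypothetical fallback queue: the shared inbox agents already use today.
FALLBACK_QUEUE = "common_inbox"


def route_email(label_scores, threshold=0.8, margin=0.2):
    """Pick a queue from per-label scores, falling back when unsure.

    label_scores: dict mapping a label (e.g. "billing") to a score in [0, 1].
    threshold and margin are illustrative values, not tuned ones.
    """
    if not label_scores:
        return FALLBACK_QUEUE

    ranked = sorted(label_scores.items(), key=lambda kv: kv[1], reverse=True)
    best_label, best_score = ranked[0]
    runner_up_score = ranked[1][1] if len(ranked) > 1 else 0.0

    # Low confidence, or two labels that both fit (say, a technical issue
    # about billing): do not guess, let a person triage the email.
    if best_score < threshold or best_score - runner_up_score < margin:
        return FALLBACK_QUEUE
    return best_label
```

The design choice here is that a wrong automatic assignment is assumed to cost more than leaving an email to be fished out as before; if that assumption does not hold, the fallback rule should change.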
Next, you need to define the KPIs. How much time will be saved? What is the total cost of the saved time? How many mistakes is the system making? What is the cost of handling mistakes?
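To make these questions concrete, here is a rough sketch of how time saved and the cost of mistakes could be folded into one number. The function and its parameters are hypothetical, and the model is deliberately simple: it assumes every correctly routed email saves a fixed number of minutes and every misrouted email costs a fixed number of minutes to fix.

```python
def net_monthly_savings(emails_per_month, minutes_saved_per_email,
                        minutes_to_fix_error, error_rate, agent_cost_per_hour):
    """Estimate the net monthly value of automatic routing.

    All inputs are assumptions to be measured or estimated, not known facts.
    error_rate is the fraction of emails sent to the wrong queue.
    """
    correct = emails_per_month * (1 - error_rate)
    wrong = emails_per_month * error_rate

    hours_saved = correct * minutes_saved_per_email / 60
    hours_lost = wrong * minutes_to_fix_error / 60

    return (hours_saved - hours_lost) * agent_cost_per_hour
```

Even a crude formula like this shows the break-even point: if fixing a mistake takes much longer than fishing out an email, a modest error rate can wipe out the savings.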
Next, you need to turn the business problem into a precise machine learning problem. What labels would you predict? How would you collect the initial labels?
Once the outline of the solution has been agreed upon, you need to don your architect’s hat and outline a system diagram. Wearing the data engineer’s hat, figure out where the data needed for training and for live classification is stored, and how you would build a pipeline for training and serving the model. This is also the time to understand what guarantees, if any, exist on the data, and how you can test those guarantees.
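Testing guarantees on the data can start small, with explicit checks that run on every record before it is used for training or classification. The following is a minimal sketch; the field names and label set are hypothetical stand-ins for whatever the real email store provides.

```python
# Hypothetical schema for an email record; adjust to the real data source.
REQUIRED_FIELDS = ("message_id", "body", "received_at")
ALLOWED_LABELS = {"billing", "technical", "other"}


def check_email_record(record):
    """Return a list of problems found in one record (empty if none)."""
    problems = []

    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"missing or empty field: {field}")

    # Labels exist only on training data, so a missing label is allowed,
    # but a label outside the agreed set is not.
    label = record.get("label")
    if label is not None and label not in ALLOWED_LABELS:
        problems.append(f"unknown label: {label}")

    return problems
```

Returning a list of problems rather than raising on the first one makes it easier to count how often each guarantee is violated.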
Right next to the data engineer’s hat is the modeler’s hat. Wearing that, you must decide what algorithms you want to run, how to version control your models, etc. The ML modeler’s hat also directs your attention to your plan for improving the model. Machine learning is an elaborate system for learning from your losses, and you must design a system that continuously learns from its errors. More precisely, you must answer the question: what system do you have in place to improve your model? There is a pipeline for that: (1) learn about your losses from feedback, errors, etc.; (2) understand your losses through error analysis, etc.; (3) reduce your losses through new data collection, fixing old data, different models, or different objective functions; and (4) test the changes with A/B testing, offline testing, etc.
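As a small illustration of the "understand your losses" step, the sketch below counts which label confusions happen most often, given corrected labels (for example, from agents reassigning emails) and the model's original predictions. The function name and the input format are assumptions made for the example.

```python
from collections import Counter


def top_confusions(true_labels, predicted_labels, n=3):
    """Return the n most frequent (true, predicted) pairs among the errors."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")

    errors = Counter(
        (true, predicted)
        for true, predicted in zip(true_labels, predicted_labels)
        if true != predicted
    )
    return errors.most_common(n)
```

A table like this tells you where to look first: a dominant confusion between two labels may point to ambiguous label definitions or missing training data rather than to a weak model.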
Last, you must wear an operator’s hat. Wearing that, you work out the operational nitty-gritty of how to introduce a new product. This is the time when you work with stakeholders to stand up dashboards to monitor the system, develop a rollout strategy and a rollback strategy, build a dashboard for monitoring A/B tests, etc.
The key to wearing an architect’s hat is not only to design a system but also to make sure that enough logging is in place in different parts of the system for you to triage failures. So part of the dashboard would display logs from different parts of the system.
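A minimal sketch of what such logging could look like, using Python's standard `logging` and `json` modules: each stage of the pipeline writes one structured line tagged with the stage name and the email's ID, so a failure can be traced across stages. The logger name, stage names, and fields are hypothetical.

```python
import json
import logging

# Hypothetical logger name for the email routing system.
logger = logging.getLogger("email_router")


def log_stage(stage, message_id, **details):
    """Log one structured line for a pipeline stage and return it."""
    payload = {"stage": stage, "message_id": message_id, **details}
    line = json.dumps(payload, sort_keys=True)
    logger.info(line)
    return line
```

For example, calling `log_stage("classify", "abc123", label="billing", score=0.93)` and later `log_stage("route", "abc123", queue="billing")` would let a dashboard group both lines by `message_id` and show where an email went wrong.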