OCR Stack to Produce Searchable Index (07/2020)
Dropbox has a simple stack to OCR documents and then make them searchable. Opportunity to expose this and sell this as a service.
Exploiting Sophisticated Browsers to Detect Fake News (07/2019)
Companies like FB use features of the news story, e.g., WHOIS data, metadata about the IP address hosting the domain, language used in the article, etc., to predict whether or not a news article is fake. Another good signal, however, potentially remains unexploited.
Generally, companies like FB also know which people don’t click on fake news when it is offered to them. Companies can use the propensity to click on labeled low-trust websites to build browsing sophistication scores, and then use the early behavior of sophisticated browsers to predict whether a new article is fake.
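A minimal sketch of the two steps, under stated assumptions: the function names, the add-one smoothing, and the weighting scheme are all hypothetical illustrations, not anything FB is known to use. A user's score is one minus their smoothed click rate on labeled low-trust links, and a new article's suspicion is the sophistication-weighted share of early viewers who skipped it.

```python
from collections import defaultdict


def sophistication_scores(history):
    """history: iterable of (user, clicked) pairs, one per exposure to a
    link already labeled low-trust. Returns user -> score in (0, 1)."""
    shown = defaultdict(int)
    clicked = defaultdict(int)
    for user, did_click in history:
        shown[user] += 1
        clicked[user] += int(did_click)
    # Add-one smoothing (an assumption) so users with few exposures
    # are not scored as perfectly sophisticated or unsophisticated.
    return {u: 1 - (clicked[u] + 1) / (shown[u] + 2) for u in shown}


def article_suspicion(early_exposures, scores, default=0.5):
    """early_exposures: list of (user, clicked) pairs for a new article.
    Returns the sophistication-weighted share of viewers who skipped it,
    or None if there is nothing to weigh."""
    total = sum(scores.get(u, default) for u, _ in early_exposures)
    if total == 0:
        return None
    skipped = sum(scores.get(u, default) for u, c in early_exposures if not c)
    return skipped / total
```

A high value means that the people who usually avoid low-trust sites are also avoiding this article, which is the early signal the idea relies on.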
Password Predictor (2017)
Create character-level, low-dimensional embeddings of passwords using any of the many publicly leaked password lists and combine them with username data to predict the password. When users are setting their passwords, tell them how many tries would be needed to guess their password given their username.
Recommend Reviewers (5/2019)
Many journal publishing systems give authors the opportunity to suggest reviewers. Some journals even require it. Create a reviewer recommendation system that initially simply samples from the citation list and eventually builds more sophistication, exploiting data on reviewers who have reviewed similar manuscripts, with some constraints for reviewer load, etc.
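The first, simple version could look something like the sketch below. The function name, the `max_load` cap, and the rule that excludes the manuscript's own authors are assumptions made for illustration.

```python
import random


def suggest_reviewers(cited_authors, manuscript_authors, current_load,
                      k=3, max_load=2, seed=None):
    """Sample up to k reviewers from the authors cited by a manuscript.

    cited_authors: author names from the citation list (may repeat).
    manuscript_authors: names to exclude (the submitting authors).
    current_load: dict of name -> number of reviews already assigned.
    """
    excluded = set(manuscript_authors)
    # Deduplicate, drop the authors themselves, and drop anyone at the cap.
    pool = sorted({a for a in cited_authors
                   if a not in excluded and current_load.get(a, 0) < max_load})
    rng = random.Random(seed)
    return rng.sample(pool, min(k, len(pool)))
```

Later versions could replace the uniform sampling with weights based on who has reviewed similar manuscripts.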
Detect Duplicate Images (People) in Customer Testimonials (5/2019)
Given the efficacy of testimonials, many companies contend that they have glowing reviews from actual customers. But now we have some tools to detect fake customer testimonials—look for how often the same model is used across products, etc. and dock from ratings or shame appropriately.
Auto-generate Sports Commentary (12/2017)
Let’s use cricket as an example. All official international cricket matches and many heavily viewed domestic matches like the IPL use a lot of technology to capture the action on the field. The action is covered by video cameras from multiple vantage points, and there are ball tracking and speed guns. As a result of the heavy investment in technology, we can observe the action very well. The other great thing about cricket is that the vast majority of the action falls into a few well-known categories. For instance, a batsman generally pulls, cuts, sweeps, or drives a ball. With ball tracking and speed guns, we know how quickly the ball is bowled, where it pitches, how much it deviates off the pitch, etc. And if that was not enough, a ton of ball-by-ball commentary on lots of cricket matches is available online from espncricinfo.com. In all, we have all the ingredients to build an effective ML system to auto-generate commentary. As goes for cricket, so goes for many other sports. Sell the auto-generator to people who write commentaries so that they can more easily write better commentary, freeing them to discuss the more strategic, harder-to-capture aspects of the sport.
AI to Detect Emergency Vehicles (10/16/2017)
An app that detects distant emergency vehicles through flashing lights and sound and provides alerts on the screen. Since good microphones are much more sensitive than human ears, sirens could eventually be made quieter: the longer-term win is lower noise pollution.
Terms and Conditions (10/16/2017)
Terms and conditions should list the date they were last updated up front and provide a diff. I am OK with reading the terms and conditions once before buying X on Y, but would like to know if they have changed and, if so, what sort of changes were made.
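Producing the diff is the easy part. A minimal sketch using Python's standard `difflib` module follows; the function name and the way the dates are shown in the header are assumptions for illustration.

```python
import difflib


def terms_diff(old_text, new_text, old_date, new_date):
    """Return a unified diff between two versions of the terms,
    labeled with the date each version took effect."""
    lines = difflib.unified_diff(
        old_text.splitlines(),
        new_text.splitlines(),
        fromfile=f"terms ({old_date})",
        tofile=f"terms ({new_date})",
        lineterm="",
    )
    return "\n".join(lines)


if __name__ == "__main__":
    old = "Refunds within 30 days.\nNo resale."
    new = "Refunds within 14 days.\nNo resale."
    print(terms_diff(old, new, "2017-01-01", "2017-10-16"))
```

An empty result means nothing has changed since the version the buyer last read.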
Bad Weather Discounts (5/25/2017)
When it rains, the restaurants are generally a lot emptier. There is no reason they should be as empty as they often are. An app that automatically offers discounts on rainy days (or based on weather forecasts), and that calibrates the discount based on returns is easy to build, and likely to be useful. Integrate it with UberEats or Google Local Ads and you have an ecosystem. The general idea is about a platform for offering dynamic pricing for physical retailers. The platform will do two things: a) make it a lot easier to calibrate pricing, and b) make it a lot easier to publish discounts (where it publishes and at what price can also be optimized).
Behavioral Insurance/Mortgage Company
Lower interest rate/premiums in return for CBT + Numeracy courses.
Rate the Peer Review
One problem with peer reviews is the lack of external incentives, except perhaps pleasing the editor. One way to provide incentives is to make reviews public. But that means losing anonymity and ability to speak ‘truth to power.’ A second, smarter way, is to make only ratings of the review public. That allows reviewers to get credit and recognition, likely increasing the quality of the review.
(Stop) Waiting for the Doctor (1/3/2017)
Doctors’ offices all across the US are well-known for one thing: waiting times. Nobody expects to go to a doctor’s office and not wait. With simple technology, we can certainly reduce waiting times, if not eliminate them. Start by collecting data on schedules, arrival times of patients, wait times, and non-arrivals. Based on current scheduling, predict wait times for people scheduled for X pm on day Y at a particular office. Adjust the schedule to reduce the estimated wait time. Add frills like text updates if some patient interaction runs too long, mobile phone buzzers that ring with a 10-minute warning, etc., and pay out every time a patient waits more than X minutes.
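The prediction step can be sketched with a deliberately simple model. It assumes one doctor, patients who arrive exactly on time, and known expected visit lengths. All three are simplifying assumptions, and the function name is hypothetical; a real system would estimate these quantities from the collected data.

```python
def predicted_waits(slots, durations):
    """Predict each patient's wait, in minutes.

    slots: scheduled start times in minutes from opening, in order.
    durations: expected visit length in minutes for each slot.
    """
    waits = []
    free_at = 0  # time at which the doctor next becomes free
    for start, length in zip(slots, durations):
        begin = max(start, free_at)  # the visit cannot begin before the slot
        waits.append(begin - start)
        free_at = begin + length
    return waits


if __name__ == "__main__":
    # 15-minute slots with 20-minute visits: the delays pile up.
    print(predicted_waits([0, 15, 30], [20, 20, 10]))
    # Spacing the slots to match visit lengths removes the waits.
    print(predicted_waits([0, 20, 40], [20, 20, 10]))
```

Adjusting the schedule then amounts to searching over slot spacings for the one that minimizes the predicted waits.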
Alternatively, one could create a market for trading waiting times. Most will go back richer, even though even the rich generally undervalue their time.
Obamacare leverages a ‘smoker’s surcharge.’ In the same spirit, an anti-vaxxer surcharge can be easily instituted. The latter still doesn’t account for the externalities, but still better than nothing.
Chatbot for Data Science
A well-implemented AI chatbot that provides an interface to Azure + data munging and has strong visualization support would reduce the cost of data science dramatically. (12/5/2016)
Measures of Impatience
Tenable indicators of impatience can be built using the frequency and extent of GitHub commits. These indicators are likely to prove useful for companies exploiting data on GitHub to hire.
Ethical water and coffee are passé. Want to choose professionals or businesses based on which party (or policies) they support more generally? Use campaign finance and voter registration data (some places allow you to register as a partisan), to build a database about the partisanship of various businesses and professionals. (Eitan Hersh reveals conservative doctors, for instance, are less inclined to prescribe certain treatments.)
Surveys in DMV
State DMVs are places where a broad cross-section of the population wastes time. Thus, they are excellent places for surveying representative samples.
Airports, DMVs, restaurants: queues are everywhere. But why do we still have them? A system that logs requests and sends a text shortly before the turn arrives is pretty simple and cheap to deploy. #RescueTime.
Female Drivers for Uber
Concern about sex crime in India once led the Delhi government to shut down Uber. To address concerns, Uber in India may want to default to disclosing the gender of the driver, allowing female riders to pick female drivers. This change may also have the virtue of increasing the number of female drivers (and employment).
Dropbox: git with it (7/2015)
Cloud storage platforms were invented to take headache and anxiety out of storage. And default storage of all saves with minimal information to distinguish versions is a reasonable implementation of the service for lay users. But Dropbox can do more for expert users. And over the longer term it may want to help lay users come up with better—still easy—workflows. One of the innovations that expert users want is better version control. Currently, it appears that time stamp and author information are the only pieces of metadata stored with each version. Allowing users to store more information with each save, and giving them the power to delete intervening saves is liable to prove useful.
FB Consumer Surveys
Now that Facebook is in the paid content business (inline presentation of news articles), it should implement a version of Google Consumer Surveys: surveys for paid content. Providing alternate ways to access content is liable to be a win-win-win: users, news organizations, and FB all win.
Survey Companies: Exploiting Collected Data II
YouGov, like many other survey companies, conducts lots of surveys. But it fails to exploit the databank adequately. Assuming that the survey data are the intellectual property of the person (organization) who sponsors the survey, it could allow survey sponsors to enroll in a loyalty program where they get paid every time another user wants to use the data they have collected. To that end, YouGov would need to create a searchable database of questions. Such a database is liable to prove lucrative for both YG and researchers. It is also liable to have positive externalities for research.
Survey Companies: Exploiting Collected Data I
An average panelist fills out a lot of surveys. This means that survey companies have extraordinarily rich data on each person. They can use that data to do better matching and to estimate measurement error. For instance, if a respondent averages, say, 1 second per question across 1,000 other surveys, then an average response time of 0.5 seconds on a given survey may be flagged as a potential outlier or used to build an estimate of potential measurement error.
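One way to operationalize the flag is to compare a respondent's average response time on the current survey against their own history. The sketch below uses a z-score; the function name and the cutoff of -2 are assumptions, not an established rule.

```python
from statistics import mean, stdev


def flag_fast_response(history_seconds, current_seconds, z_cutoff=-2.0):
    """Flag a survey as suspiciously fast for this respondent.

    history_seconds: the respondent's average per-question response
        time on each earlier survey.
    current_seconds: their average per-question time on this survey.
    """
    if len(history_seconds) < 2:
        return False  # not enough history to judge
    mu = mean(history_seconds)
    sigma = stdev(history_seconds)
    if sigma == 0:
        return current_seconds < mu
    return (current_seconds - mu) / sigma < z_cutoff
```

With a history centered on 1 second and a spread of roughly 0.14 seconds, a 0.5-second average falls well below the cutoff and is flagged, while 0.95 seconds is not.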
Proposal for a Hedge Fund
“An algorithm based on successful start-up founders finds 20% would be women — double the % that actually get funded” Link
Cut from successful ventures
NIH doesn’t take a cut from successful investments. It is not clear why not. So NIH grants ~ scam. (NIH can still fund losing ventures as important areas may not always be lucrative. Lots of times basic scientific research is not, at least not in the short term.)