The Quild Future Unicorn is a weekly product-focused note highlighting early-stage startups with statistically significant signals of becoming unicorns.
Claypot unifies streaming and batch systems to make it easier and cheaper for companies to do online prediction and continuous evaluation. They provide the infrastructure for other companies to build, train, and serve their models.
Founders: Chip Huyen, Zhenzhong Xu
Signals:
Venture-backed startup experience
Chip was a machine learning engineer at Snorkel (1+ yrs)
Top company alumni
Chip was a deep learning engineer at NVIDIA (1+ yrs)
Zhenzhong was an engineer at Microsoft (7+ yrs) and Netflix (6+ yrs)
Top university alumni
Chip graduated from Stanford University (BS, MS)
Top investors
Lightspeed invested
Quiet Capital invested
They’re looking for founding machine learning and infrastructure engineers to join their team!
The Future Unicorn series is powered by Specter, a data intelligence provider for the world's leading investors like Accel and Bessemer. I have been working with data-driven tools for venture investing for a long time, and Specter's is the best one I've seen.
Product Notes
Claypot is still in stealth so there’s no product to talk about. But we know the problem they’re trying to solve from their white paper.
Problem and persona
Claypot is tackling two problems that data scientists and machine learning engineers face: batch prediction and manual, stateless model retraining.
Batch prediction refers to machine learning predictions precomputed at regular intervals. If you're browsing an e-commerce store, batch prediction means the item recommendations you see are somewhat relevant in general, but not to what you're looking at right now. If you were browsing for books yesterday and today you're shopping for same-day-delivery toilet paper, the store will be recommending the wrong type of paper for your restroom relief. This isn't the optimal user experience because you'll have to make a few more clicks (aren't we spoiled). In some contexts, though, batch prediction won't work at all: collision detection for cars needs real-time (sometimes called online) predictions.
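To make the contrast concrete, here's a minimal sketch of the two serving patterns. Everything in it is hypothetical (a toy model and made-up feature names), not Claypot's actual API: batch serving looks up a precomputed result, while online serving folds in what the user is doing right now.

```python
# Toy "model": recommend the category the user has engaged with most.
def predict(features):
    return max(features, key=features.get)

# --- Batch prediction: precompute for every user on a schedule ---
# Predictions go stale between runs (e.g. nightly).
def run_batch_job(user_histories):
    return {uid: predict(hist) for uid, hist in user_histories.items()}

# --- Online prediction: compute at request time ---
# Features include the live session, so current intent dominates.
def serve_online(history, session_events):
    features = dict(history)
    for category in session_events:
        features[category] = features.get(category, 0) + 10
    return predict(features)

# Yesterday's history says "books"; today's session says "household".
history = {"books": 5, "household": 1}
batch_store = run_batch_job({"u1": history})
print(batch_store["u1"])                      # books (stale)
print(serve_online(history, ["household"]))   # household (current intent)
```

The trade-off: the batch lookup is a cheap key fetch, but it can only ever be as fresh as the last job run; online prediction is fresher but needs low-latency feature pipelines, which is exactly the streaming infrastructure problem.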
Manual, stateless model retraining refers to retraining a machine learning model by hand, where the trigger to retrain is something other than data distribution shifts or model degradation. This was how I practiced data science years ago: I built models for banks that were retrained every year only because the regulator required it. Stateless retraining means training the model from scratch on the whole dataset. The problem with this setup is that retraining is time-consuming (manual) and costly (it runs through the entire dataset). Another problem is that the model might already be useless before the next retraining. A simple example is a real estate price estimation model with sqft as a key variable. In San Francisco, the average $/sqft was over $1,200 in 2021; in 2022, it rapidly dropped to $1,000. If the model only retrains once a year, its price estimates would have been off by about 20% in 2022.
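The alternative to a fixed calendar schedule is retraining triggered by the data itself. Here's a minimal sketch, using a deliberately toy drift metric (relative shift in the mean) and a made-up 10% threshold; real systems use proper statistical tests, but the control flow is the same: monitor, detect, retrain.

```python
# Trigger retraining on data distribution shift, not on a calendar.
# The drift metric and threshold here are toy choices for illustration.

def mean(xs):
    return sum(xs) / len(xs)

def drift_detected(train_values, recent_values, threshold=0.1):
    """Flag drift when the mean moves more than `threshold` (10%)
    relative to the training baseline."""
    baseline = mean(train_values)
    return abs(mean(recent_values) - baseline) / baseline > threshold

# SF $/sqft, roughly matching the numbers in the text:
prices_2021 = [1150, 1200, 1250]   # mean ~1,200 (training data)
prices_2022 = [950, 1000, 1050]    # mean ~1,000 (live data)

print(drift_detected(prices_2021, prices_2022))  # True  -> retrain now
print(drift_detected(prices_2021, prices_2021))  # False -> keep serving
```

A drift trigger would have kicked off retraining as soon as 2022 prices diverged, instead of serving 20%-off estimates until the annual refresh. Pairing this with stateful retraining (fine-tuning on just the new data rather than the full history) also addresses the cost of running through the entire dataset each time.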