Challenges in enterprise generative AI
3 themes from conversations & conferences + Enterprise GenAI Forum event
This post is different from past essays, which were more future-oriented. This one is grounded in today’s challenges in scaling AI. Read on to learn about the recurring challenges I’ve been hearing. I am also organizing the Enterprise Generative AI Forum, which is designed for both technical and non-technical folks and will cover topics like product, pricing, positioning, and more.
Experimentation to Production
It has been almost one year since ChatGPT was released. In the initial six months, the community grappled with understanding the capabilities of foundation or generative AI models. The number of in-person AI events happening in the Bay Area grew from just 10 in January to 100 in September. That is roughly three in-person events a day, filled with folks talking about AI.
Over the past few months, I’ve noticed that the narrative has shifted. It is less about demonstrating what models can do. While hackathons still happen, the conversations have transitioned from “look at how cool this is” to “how can I make this damn thing work at scale?”.
This isn’t a surprise considering that all of the major public software companies have now released a generative AI product. And that’s not counting the long list of private companies excluded from the list below.
Collaborative Applications: Microsoft, Google, Slack, Atlassian, Smartsheet, Asana, Zoom
Content Management: Box, Dropbox, DocuSign
Sales & Marketing: Salesforce, HubSpot, Sprinklr
IT & Customer Support: ServiceNow, Freshworks, Five9, Shopify
Finance & ERP: Workday, BlackLine, SAP
Creative: Adobe, Canva, Autodesk
App Development: GitHub, GitLab, AWS, PagerDuty
BI & Data: Oracle, Tableau, ThoughtSpot, Snowflake
Three challenges in making AI enterprise-ready
Here are three recurring challenges distilled from attending and hosting several events.
RAG is the rage (both loved and loathed): Productionizing retrieval-augmented generation (RAG) seems to be the rallying cry of AI vendors today. I’ve seen it become the tagline of company booths at a few events. Off-the-shelf models are sufficient for general assistants, but incorporating proprietary data for context, which can be as simple as your last email, makes these models much more valuable. RAG is the best way to do so. Finetuning is great for specific tasks, but it is not great for incorporating new knowledge, and it makes it harder for engineers to iterate quickly compared to engineering a RAG pipeline. As much as there is excitement to productize this pipeline, though, there remain thorny problems of how best to chunk, index, rerank, and synthesize the retrieved results. A minimal sketch of the basic pipeline follows below.
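To make those moving parts concrete, here is a minimal, self-contained sketch of the chunk-index-retrieve-synthesize loop. It is illustrative only: toy bag-of-words vectors stand in for a real embedding model, reranking is folded into a single scoring pass, and the final LLM call is stubbed out. None of the function names come from any particular framework.

```python
# Minimal RAG sketch: chunk -> embed -> index -> retrieve -> assemble prompt.
# Toy term-frequency vectors stand in for a real embedding model; production
# pipelines typically add a second-stage reranker (e.g. a cross-encoder).
import math
from collections import Counter

def chunk(text: str, size: int = 40) -> list[str]:
    """Naive chunking: split a document into fixed-size word windows."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text: str) -> Counter:
    """Toy embedding: a bag-of-words term-frequency vector."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, index: list[tuple[str, Counter]], k: int = 3) -> list[str]:
    """Score every chunk against the query and keep the top-k."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

# Index each chunk once, up front.
docs = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Quarterly revenue grew because of strong enterprise demand.",
]
index = [(c, embed(c)) for doc in docs for c in chunk(doc)]

query = "What is the refund window?"
context = "\n".join(retrieve(query, index))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
# llm(prompt)  # hand the assembled prompt to whichever model you deploy
```

Every one of these steps (chunk size, embedding model, ranking function, prompt template) is a knob, which is exactly why productionizing the pipeline is thornier than the demo suggests.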
Contextualized retrieval: While RAG is great, deploying it at the enterprise level often means thousands of users accessing documents from different sources. A consumer-facing chatbot might pull from a general knowledge base that can serve most customers, but a knowledge management assistant requires more context to be helpful: it has to know what documents the user is working on, what they have access to, whom they are collaborating with, and so on. That is far more complex than indexing an entire Google Drive and making it available for retrieval to anyone. At the enterprise level, context- and permission-aware indexing and retrieval is key. Not all employees should have access to internal financial forecasts, nor should private Slack messages be searchable. A sketch of what permission-aware filtering might look like is below.
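As one hypothetical illustration (the Chunk fields and visible_to helper are my own inventions, not any product’s API), access-control metadata can be attached to every chunk and filtered before relevance scoring, so a user can never retrieve content they could not open directly:

```python
# Hypothetical sketch of permission-aware retrieval: every chunk carries an
# access-control list, and ACL filtering runs *before* relevance scoring.
from dataclasses import dataclass, field

@dataclass
class Chunk:
    text: str
    source: str                                      # e.g. "gdrive", "slack", "erp"
    allowed: set[str] = field(default_factory=set)   # user or group IDs with access

def visible_to(user: str, groups: set[str], corpus: list[Chunk]) -> list[Chunk]:
    """Keep only the chunks this user may see, by user ID or group membership."""
    return [c for c in corpus if user in c.allowed or groups & c.allowed]

corpus = [
    Chunk("Q4 revenue forecast ...", "erp", {"finance-team"}),
    Chunk("Onboarding guide for new hires ...", "gdrive", {"all-employees"}),
    Chunk("DM: compensation discussion ...", "slack", {"alice", "bob"}),
]

# An engineer outside finance only ever sees the onboarding guide.
candidates = visible_to("carol", {"all-employees", "eng"}, corpus)
# ...then run the usual embed/retrieve/rerank steps over `candidates` only.
```

Filtering before scoring matters: if permissions are applied only after ranking, restricted documents can still leak signal through scores, citations, or snippets.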
Monitoring challenges: Transitioning experiments into production requires monitoring. While monitoring machine learning models is not new, monitoring generative AI models is, and it is more complicated. First, there is the sheer amount of additional metadata generated by LLM applications: each prompt chain, retrieval, vector index, and so on has to be tracked. The power of LLMs is that they can handle any open-ended conversation, but that also expands the scope of monitoring (a sketch of per-request tracing is below). Second, even when it is all logged, evaluating the conversational responses of models is subjective; today, they are manually rated by humans who each have different preferences. Beyond that, there is a long tail of additional challenges: mitigating bias (difficult to manage at scale), enhancing explainability (LLMs can explain themselves but can be eloquently incorrect), and enforcing copyright (detectors have proven ineffective).
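To show what “tracking each prompt chain and retrieval” can mean in practice, here is a sketch of structured per-request tracing. The record shape and helper names are illustrative, not from any real observability SDK; in production, the JSON lines would go to a log store rather than stdout.

```python
# Sketch of structured tracing for an LLM app: every step in a request emits
# a JSON record tied to one trace_id, so prompts, retrieved chunks, latency,
# and outputs can be audited (and human-rated) later.
import json
import time
import uuid

def log_step(trace_id: str, step: str, **fields) -> None:
    record = {"trace_id": trace_id, "step": step, "ts": time.time(), **fields}
    print(json.dumps(record))  # in production: ship to your logging pipeline

def answer(query: str) -> str:
    trace_id = str(uuid.uuid4())  # one trace per user request
    log_step(trace_id, "query", text=query)

    chunk_ids = ["doc42#3", "doc17#0"]  # stand-in for a real retrieval step
    log_step(trace_id, "retrieval", chunks=chunk_ids, index="kb-v2")

    prompt = f"Context: {chunk_ids}\nQuestion: {query}"
    t0 = time.time()
    output = "stubbed model response"  # stand-in for the actual LLM call
    log_step(trace_id, "generation", model="my-llm",
             latency_s=round(time.time() - t0, 4), prompt_chars=len(prompt))

    log_step(trace_id, "response", text=output)  # the record humans later rate
    return output

answer("What is our refund window?")
```

Even with traces like these in place, the harder half of the problem remains: deciding which of those logged responses were actually any good.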
Enterprise GenAI Forum on Nov 14
While most of the events have centered on product and engineering topics, building enterprise AI is more than that. Competitive positioning, pricing & packaging, practical legal issues, and talent management are also relevant to building AI companies. So I am organizing the Enterprise GenAI Forum on November 14 in San Francisco to explore these topics. This event is for AI insiders who are building, selling, and investing in enterprise AI products. Details are still being finalized, but if you’d like to be the first to get access, sign up here. Priority will be given to current subscribers; leave a note saying you found out about the event through Generational.
Sign up here: Enterprise GenAI Forum
PS - I am looking for a speaker / panelist who has worked on pricing & packaging of GenAI products. If you know someone who is a rockstar in this area, I would so appreciate leads.