Llamas are the answer to AI taking over our jobs

So the government should buy us all one

Apr 08, 2023

Last Friday, I attended Hugging Face's conference at San Francisco’s Exploratorium. Initially it was supposed to be a meetup of a few dozen people or maybe a hundred at most. But registrations quickly ballooned to 5,000 and people started referring to it as the Woodstock of AI, referencing the Woodstock music festival in 1969. The festival symbolized the counterculture movement of peace and idealism. While the Hugging Face logo 🤗 could have been the icon of the original Woodstock, the company’s mission is to democratize AI by publicly hosting thousands of AI models and datasets for anyone to access.

Alongside the buzz and excitement of AI is a growing chorus of concerns about the consequences of foundation models. The Future of Life Institute, an organization whose mission is to steer humanity away from extreme risks, published an open letter calling for a six-month pause on training AI systems more powerful than GPT-4. GPT-4 already shows sparks of AGI, scoring at the 90th percentile among test takers of several academic and professional exams. Over 50,000 signatories are worried that we are not prepared to deal with the risks of AI systems becoming competitive with humans. This essay explores one of the questions posed in the letter: should we automate away all the jobs, including the fulfilling ones?

Will AI actually automate all the jobs?

Shortly after GPT-4's release, OpenAI and University of Pennsylvania researchers studied the job market impact of large language models (LLMs) by breaking down jobs into tasks, such as system engineers monitoring computers or gambling cage workers processing payments. They noted that some jobs share similar tasks, like grading exams for both elementary and high school teachers. For each task, human experts and GPT-4 assessed whether LLMs could reduce the time it takes to complete a task. They found that LLMs allowed 15% of tasks to be completed significantly faster without quality loss, which increased to 50% with LLM-powered software. Intriguingly, GPT-4's estimates were consistent with human experts', suggesting that the researchers could have relied on GPT-4 to reach the same conclusions.

Sample of occupations and tasks from the Bureau of Labor Statistics

The study is insightful because it examines specific tasks rather than making broad estimates based on industries or job titles alone. However, it's not data-driven: automatability is determined by subjective judgments. So, let’s look at an actual study measuring ChatGPT’s productivity boost. But first, let me share a personal anecdote.

I've been using ChatGPT as a writing partner for several weeks now, and although I haven't measured my daily word output, I feel about 20-30% more productive. An essay that would typically take me ten days to complete can now be done in just seven. ChatGPT assists with common LLM tasks like generating essay feedback, refining sentences, and helping me find the perfect word – essentially, a conversational thesaurus. But what's astonishing is how it also boosts creativity. While writing my previous essay "GPTs are GPTs," I wanted to include some witty remarks about big tech companies' dominance.

On Google’s ability to dominate the digital ad market:

By indexing and organizing web content, Google created new opportunities for different forms of online advertising from replacing top search results with paid ads to infuriating commercials in between videos.

On Apple’s ability to create complementary innovations:

These devices created an ecosystem of mobile applications and services that have shaped the way we communicate and live - ride hailing with Uber, food delivery with DoorDash, and digital nicotine with TikTok.

But I was struggling with Microsoft. There weren’t any holes I could poke in their products. It's ad-free, unaddictive software that helps you get your work done. After gnawing at it for 15 minutes, I finally relented and asked ChatGPT for help:

Me: Help me add a snarky, witty sentence in the Microsoft example similar to what I did for Google and Apple.

What came out became my favorite line in that essay. How could I have forgotten the frustrations of sleuthing through random forums to figure out why there was an extra line break in my Word doc that I could not get rid of?

ChatGPT: The standardization of Windows OS enabled generations of developers to build higher-level applications and turned countless users into masters of Word's arcane formatting mysteries, Excel's labyrinthine formulas, and PowerPoint's hypnotic slide transitions, ultimately revolutionizing how we work, procrastinate, and communicate in colorful pie charts.

Many people have shared similar anecdotes about their experiences working with ChatGPT, but it's hard to generalize from these since everyone's experience is unique. Thankfully, pioneering MIT PhD students Shakked Noy and Whitney Zhang conducted a controlled experiment to evaluate ChatGPT's impact. They divided about 450 experienced white-collar professionals, including marketers, managers, and grant writers, into two groups: those with access to ChatGPT and those without. Participants from both groups were asked to complete writing tasks resembling real job tasks, such as crafting press releases, short reports, analysis plans, and sensitive emails. Their performance was assessed based on the speed and quality of their work. We now have statistically significant data on ChatGPT’s effects on work.

ChatGPT treatment effect on (a) time to complete a task and (b) quality of work

Not surprisingly, ChatGPT users completed tasks ~40% faster while also producing higher-quality work than those without access. The more intriguing result of their study came from further probing how participants were using ChatGPT: were they simply copy-pasting ChatGPT’s output? Did they just use the first output or did they iteratively refine their work? They found that ~70% copy-pasted the first output without editing it at all. Those who spent more time refining their work with ChatGPT did not produce better work. This means that ChatGPT can immediately substitute for some workers. It’s a story of replacement rather than augmentation.

Mark Zuckerberg declared that 2023 would be Meta’s year of efficiency. This eventually became the rallying cry of CEOs. If employees can complete the same work ~40% faster without a loss in quality, it's unfortunately logical for companies to let employees go or freeze hiring for a while. Capitalists will accrue the benefits of ChatGPT at the expense of workers.

Sprinting towards further disruption

If current systems, which are mostly pure LLMs, can already impact us this much, imagine what more advanced LLM-powered systems could do. Industry researchers already have prototypes of foundation models that directly instruct other systems. If GPT-4 was a game-changer, interacting with external systems would be a revolution. We've already seen the beginnings of this with OpenAI's ChatGPT plugins. With the plugins, users can order groceries via Instacart and plan trips via Expedia just by chatting. Taking this idea a step further, Microsoft researchers published a paper a week ago detailing TaskMatrix.AI, a system using a conversational foundation model to interact with “millions of APIs” to complete tasks. Below are some excerpts from the paper showing an overview of the system and my favorite example of what TaskMatrix can do: create a formatted PowerPoint presentation about big tech companies just by chatting.

Overview of TaskMatrix.AI. Given user instruction and the conversational context, the multimodal conversational foundation model (MCFM) first generates a solution outline (step 1), which is a textual description of the steps needed to solve the task. Then, the API selector chooses the most relevant APIs from the API platform according to the solution outline (step 2). Next, MCFM generates action codes using the recommended APIs, which will be further executed by calling APIs. Last, the user feedback on task completion is returned to MCFM and API developers.

Multiple rounds of dialogue between user and TaskMatrix.AI. TaskMatrix.AI can understand user instructions and operate PowerPoint on behalf of users. TaskMatrix.AI is capable of breaking down the user’s complex instructions into multiple PowerPoint operations, assisting users in finding and using infrequent features, and generalizing the same patterns across multiple pages. While we display the API calls in a gray text box, this information is not necessary for the user.

More rounds of dialogue between user and TaskMatrix.AI. TaskMatrix.AI can accomplish the insert logo instruction by the insert internet feature of PowerPoint with API insert_internet_image(”Microsoft logo”). This feature will provide multiple images for users. TaskMatrix.AI can take the user’s instructions to select one of them. In the example, we omitted the selection steps for brevity.
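For readers who think in code, the four steps in the overview caption can be sketched as a toy Python loop. This is only an illustration under loud assumptions, not the paper's implementation: the foundation model is replaced by naive string handling, the "API platform" holds two fake functions, and every name except insert_internet_image (which appears in the caption above) is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Api:
    name: str
    description: str
    run: Callable[[str], str]


# Hypothetical two-entry "API platform"; the real system targets far more APIs.
API_PLATFORM = [
    Api("create_slide", "create a new slide with a title",
        lambda arg: f"slide created: {arg}"),
    Api("insert_internet_image", "insert an image found on the internet",
        lambda arg: f"image inserted: {arg}"),
]


def generate_outline(instruction: str) -> list[str]:
    """Step 1 (toy stand-in for the model): split the instruction into steps."""
    return [step.strip() for step in instruction.split(";") if step.strip()]


def select_api(step: str, platform: list[Api]) -> Api:
    """Step 2 (toy): pick the API whose description shares the most words with the step."""
    words = set(step.lower().split())
    return max(platform, key=lambda api: len(words & set(api.description.split())))


def run_task(instruction: str) -> list[str]:
    """Steps 3 and 4 (toy): build an 'action code' string, execute it, collect results."""
    results = []
    for step in generate_outline(instruction):
        api = select_api(step, API_PLATFORM)
        action = f'{api.name}("{step}")'  # assumption: the whole step is the argument
        results.append(f"{action} -> {api.run(step)}")
    # In the paper, user feedback is returned to the model and API developers;
    # this sketch simply returns the results to the caller.
    return results


if __name__ == "__main__":
    for line in run_task("create a slide titled Big Tech; "
                         "insert an image of the Microsoft logo"):
        print(line)
```

The point of the sketch is the shape of the loop: the user only ever writes the instruction, and everything between outline and execution happens without them.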

Creating the presentation above might've taken me 30-45 minutes to complete without any assistance. If I had TaskMatrix to help me, it would have taken me 5-10 minutes maximum, including the idle time of watching the computer fetch and resize images from the internet. Sorry to belabor the point, but the impact on the job market will be massive. And it’s coming faster than we’re prepared for.

OpenAI wrapped up GPT-4 training in August 2022 and refined it until its March 2023 launch. They've likely finished training GPT-5, with rumors pointing to a December 2023 release. OpenAI isn't racing alone towards AGI. The whole industry is. Past general-purpose technologies took years, even decades, to achieve widespread adoption. Power plants and electrical grids are massive, multi-year infrastructure projects, while the internet needed computing devices and undersea cables. But AGI-like systems are software accessible to anyone with an internet connection, so adoption will be swift. That's why ChatGPT hit 100 million monthly active users in just two months.

No clear answers but we’ll adapt

Writing this essay has left me feeling torn. On one hand, I'm thrilled about the technology and still amazed at how helpful a collaborator it is to me. On the other hand, I'm filled with anxiety and uncertainty about what the world will look like a year from now. Nevertheless, I choose to stay hopeful. As Stephen Wolfram beautifully puts it, as a society we’ll adapt:

Technology in some way or another enables some new occupation. And eventually that occupation becomes widespread, and lots of people do it. But then there’s a technological advance, and the occupation gets automated—and people aren’t needed to do it anymore. But now there’s a new level of technology, that enables new occupations. And the cycle continues.
A century ago the increasingly widespread use of telephones meant that more and more people worked as switchboard operators. But then telephone switching was automated—and those switchboard operators weren’t needed anymore. But with automated switching there could be huge development of telecommunications infrastructure, opening up all sorts of new types of jobs, that in aggregate employ vastly more people than were ever switchboard operators.
Something somewhat similar happened with accounting clerks. Before there were computers, one needed to have people laboriously tallying up numbers. But with computers, that was all automated away. But with that automation came the ability to do more complex financial computations—which allowed for more complex financial transactions, more complex regulations, etc., which in turn led to all sorts of new types of jobs.
And across a whole range of industries, it’s been the same kind of story. Automation obsoletes some jobs, but enables others. There’s quite often a gap in time, and a change in the skills that are needed. But at least so far there always seems to have been a broad frontier of jobs that have been made possible—but haven’t yet been automated.

But what can we do as individuals to prepare now? We can't just wait for the day we get laid off, watch how society adjusts, and then adapt. We have to be proactive to stay relevant. I don't have answers, but here's what I'm doing:

Leveraging conversational foundation models like ChatGPT to automate routine parts of my work, so I can focus on the most creative aspects. AI is excellent at repeating existing knowledge, but individuals who can generate new knowledge will be more valuable.

Building and deepening personal relationships. Strong connections help us grow, and they're also good for the soul.

Pursuing analog interests like improving my cooking and pottery skills. Maybe I'll open the cafe I've been planning for my retirement decades sooner.

Everyone should get a Llama

One of the crowded demo tables at the Hugging Face event was helmed by the Stanford team that built Alpaca, an open-source alternative to ChatGPT. Stanford released the model along with a live demo, which was taken down within a few days due to costs. It's expensive to run these models, especially polished, responsive products like ChatGPT. OpenAI can offer ChatGPT for free, thanks to Microsoft covering the expenses. Hugging Face already provides a public service by hosting free ChatGPT-like models, although they might be slower and less user-friendly. If usage soars, Hugging Face will need to throttle access to manage costs.

Foundation models are changing the way we work. They can boost productivity significantly, letting people focus on creative and strategic tasks. Given their importance, everyone should have access to these tools. That's why I'm proposing Project Llama: a public or non-profit sponsored program designed to give everyone access to a powerful conversational foundation model like ChatGPT. To handle the risk of misuse, strict automated moderation and periodic human reviews of selected conversations would be implemented. By offering universal access to AI systems that mimic human interactions, Project Llama aims to level the playing field, helping people adapt to new ways of working and preparing for a changing job market.

Curated reads:

Commercial: ChatGPT plugins

Societal: Will AIs Take All Our Jobs and End Human History—or Not? Well, It’s Complicated

Technical: Sparks of Artificial General Intelligence: Early experiments with GPT-4

3 Comments

Moritz Wallawitsch

Scaling Knowledge

May 31, 2023 (liked by Kenn So)

The conclusion here seems wrong:

> They found that ~70% copy-pasted the first output without editing it at all. .... This means that ChatGPT can immediately substitute for some workers. It’s a story of replacement rather than augmentation.

1. If they only accepted 70% and not 100% isn't that an argument for augmentation not automation?

2. Who wrote the prompt? We always need someone to write the prompt, as these models are not generally intelligent agents.

I wrote a more nuanced take on this topic here: https://scalingknowledge.substack.com/p/why-job-displacement-predictions
