There are many accessible introductions that explain what Large Language Models [1] (LLMs) are, how they are built, and what they can and cannot accomplish. Check out this example [2] for a reasonable overview aimed at a non-technical but interested audience. As with any significant innovation, much has been said about the implications of LLM-based technologies. Popular perspectives fall somewhere between an impending AI doomsday and AI-eats-all-software. The only thing anyone seems to agree on is that LLMs are an important and transformative technology. The current moment is being called the next “platform shift”, analogous to the development of the microprocessor, GUI, PC, browser, smartphone, and cloud. But amid all of the media and tech industry fervour, our daily workflows have only seen incremental transformations, the software products we use remain relatively unchanged, our inboxes continue to overflow, and our desktop work has yet to be automated away. The novelty of the chat interface is fading, and it’s leaving people confused as to what the practical applications of LLMs are. In an attempt to give a clear-headed view of what is happening now, we are sharing a brief overview of our current perspectives as active investors in this new wave of technology. The predictions will be out of date from the moment we write them, but that is what makes predictions fun.
We also want to recognize that this article represents the synthesis of many sources, ideas and team member opinions while also being heavily biased by our personal views and experiences. Credit the team for the work and the authors for misunderstandings, misrepresentations or missed information.
Does Generative AI Represent an Existential Risk?
Despite a frequent, breathless media swirl that often promotes self-serving “concerns” about the emergence of Artificial General Intelligence (AGI), it is important to clarify that these models are devoid of human cognitive abilities, of world models, of understanding, and of self-awareness. More specifically, these models are predictive and are fundamentally limited by the huge volumes of content they have been trained on (see, for example, [3]). LLMs do not “reason” or “infer” or offer reliable “insights” outside of their training ranges. As always, pay attention to the financial incentives and pre-existing philosophical biases behind any unprovable claims, including ours.
What these systems do well is generate human-readable summarizations, make useful suggestions based on enormous amounts of ingested information, and form the basis for synthesising expertise and observations to greatly simplify repetitive work. Our favourite observation is that these systems are potentially “a revolution in usefulness”, particularly when combined with expert human users [4].
Like other statistical machine learning approaches, LLMs are devoid of innate ethical boundaries. They are tools to be used by humans in specific contexts, for specific purposes and with specific limitations. As an ethical investor, we at Inovia are committed to investing intentionally and with a lens to supporting founders who build products and services that benefit society while keeping privacy, safety and security in mind [5]. As with any new technology, we must constantly review potential investments through these and other lenses. The more novel the technical approaches to problem solving, the more intentional we intend to be in our investing commitments.
How Business and Technology Leaders are Reacting
Leading technology providers are scrambling to keep up with the latest advancements and what that means for their respective products, companies, employees, and customers.
Board members, CEOs, and investors are looking to their technology leaders to “AI-ify” their companies for different reasons. The most prevalent, short-term, obvious, and accessible application in existing businesses is the opportunity for human cost-reduction in workflows. As Nicholas Frosst at Cohere puts it, “the most exciting problems being solved right now are very frequently the most boring”. This includes a range of solutions, from document categorization, summarization and re-formatting to automation of human information processing tasks.
However boring these tasks may sound, these classes of problems can be very expensive at scale, and reducing the cost while maintaining a high quality of output can be very impactful to businesses in a relatively short time frame. Realising these types of outcomes in a large enterprise requires a process of thoughtful experimentation and a detailed understanding of your customers’ problems, workflows and data. LLMs are a new category of tool to be added to the existing information processing toolset, not a panacea that instantly replaces human intensive labour.
Perhaps most importantly for companies in a rapidly evolving competitive landscape, the opportunity to deliver new and expanded value to customers must not be ignored. It is critical that everyone focused on cost-reduction applications of the new AI remember that big tech incumbents and smart agile competitors are pursuing a meaningful product expansion and innovation pace in parallel with well-capitalised entrepreneurs. All are leveraging speed, agility, and experimentation to deliver new value in niche but important workflows.
Insights & Observations
Over the last 24 months we have assembled a team of experts both internally and externally at Inovia to help advise the firm and guide our thinking regarding the implications of LLMs in building, operating, and scaling companies. Combining this information with what we observe and learn from our efforts investing in and guiding portfolio companies, we have come up with a few observations that we believe are useful when thinking about what is actually happening right now.
- With only a few simple specialisations and strategies (e.g. prompt engineering [6]), large transformer models generally outperform smaller domain-specific models. This outperformance is the result of very large investments in talent, data, and compute from a small set of companies. At this moment in time, the cost of creating a state-of-the-art base layer/foundational LLM from scratch needs to be measured in billions of dollars and years of experience and training.
- Orchestrating small models on top of LLMs has value. Approaches to building tiered or segmented solutions with multiple models where each is optimised to narrower categories of content or use cases are competing with solutions that just use a broad model. This hinges on the thoughtful decomposition of big problems into smaller ones and the management of workflows across multiple models in various iterations. We are seeing this lead to very powerful, fast, and cost-effective solutions.
- Early wins have gone to incumbents. Although we have seen a large influx of VC dollars (and correspondingly large valuations) into the startup ecosystem, there are fewer examples of success for net-new startups. Only a handful have found true product-market fit, most of which are consumer facing (such as ChatGPT), while many are early in their monetization journey. Early examples of success with generative AI tools are more prevalent in sophisticated organisations with strong engineering, data hygiene, distribution, and customer relationships (e.g. Adobe Firefly, GitHub Copilot, Zendesk, Intercom).
- Adoption of LLM-based technologies is happening and accelerating. Some 92% of US-based developers are already using AI-powered coding tools at work with increasing dexterity [7]. Conversations with tier 1 software professionals lead us to believe that productivity improvements from tools such as Copilot, and from using GPT-4 directly, are currently biased towards the highest performing developers. We believe the same will be true for analysts, creators, researchers and many other knowledge workers, especially for complex problem solving tasks. What’s not so obvious is that while these tools empower experts to be even better, lower performers can quickly produce complex things that are wrong, without a good basis for understanding what is being suggested to them, creating a lot of potential risk.
- While the jury is out on the measurable ROI of development tools, early signs are promising. Measuring software productivity is always fraught with debate. More code is often bad. Quick code is often missing key attributes. Empowering the novice with quickly operable code that they do not understand is potentially dangerous for systems integrity. Counter-arguments are emerging [8] regarding how much value is generated by Copilot-like development tools, and who benefits from that value. The reality remains that these tools are useful for highly repetitive tasks such as writing unit tests, documentation and autocomplete-type suggestions, but as yet there is no replacement for human expertise in complex analytic and problem-solving tasks. This matters because the largest proportion of time invested in software development is spent not on coding but on design, reading, thought, intuition and interpersonal collaboration. One area where we have seen interesting results is the integration of LLM capabilities into a process of analyzing systems and proactively suggesting code for review and potential debugging. Systems are emerging that combine errors with program state and logs to create a very useful set of pointers for developers to investigate, and they are even starting to suggest possible remediations to consider.
- AI tourism is real. Startups and larger companies with quick user growth but with limited monetization are common. Build vs. buy procurement is happening in parallel in every enterprise, often without resources for proper learning, experimentation and appropriate prioritisation. It is unclear in many cases when activity will turn into meaningful and sustainable revenue.
- There is an asymmetric risk/reward for existing companies with a proactive approach to innovation. Entrepreneurs who understand their domain, customers, workflows, and data can leverage these technological advancements to fuel the next phase of their growth. One example we have seen is a company that took almost a decade to reach $10M of revenue, then integrated LLMs into their workflows in ways appropriate to the domain and more than doubled their revenue in less than 3 months. This pattern of understanding a domain and workflow needs intimately, understanding the potential of new technology, and experimenting aggressively is a massive requirement for competing in many areas today. As new capabilities emerge, the prepared, experimental team will be well positioned to take full advantage and gain momentum through improved value to customers.
- Data and information hygiene is a competitive advantage and a necessity. Companies with great documentation and well organised information about products, customers, use cases, analysis and general corporate knowledge are very well positioned to get a 10x advantage in information synthesis, summarization and generation. LLMs are being used not just for content generation but also for processing and structuring large amounts of data to drive valuable business insights, and companies at all scales are finding them helpful for this.
- LLMs and Chatbots are hardly a panacea to all information heavy work. Product design and user interface design is as important as ever. Empowering data workers with these tools and guiding them through the process of working in new ways is going to be a very important component in delivering value while hiding complexity.
- Model building is more of an art than a science, composed of expensive technical trial and error. This is a lot like a chef creating a recipe, not a traditional engineering exercise. Put a dataset through an algorithm with a number of key settings, pay for a series of expensive training runs, and evaluate the outputs. Sometimes it “works”, sometimes it does not. Tweak a few settings, train it again and so on.
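The tiered approach described in the observations above (narrow, cheap models in front of a broad one) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Tier` class, the two stand-in classifiers, the cost units and the 0.8 confidence threshold are all hypothetical, and no real model or vendor API is called.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Tier:
    name: str
    cost_per_call: float  # illustrative units, not real pricing
    classify: Callable[[str], Tuple[str, float]]  # returns (label, confidence)


def small_model(text: str) -> Tuple[str, float]:
    """Stand-in for a small, narrow model: confident only on obvious invoices."""
    lowered = text.lower()
    if "invoice" in lowered or "amount due" in lowered:
        return "invoice", 0.95
    return "unknown", 0.30


def large_model(text: str) -> Tuple[str, float]:
    """Stand-in for a broad general model (hypothetical, no real API is used)."""
    lowered = text.lower()
    if "agreement" in lowered or "party" in lowered:
        return "contract", 0.90
    return "correspondence", 0.80


def route(text: str, tiers: List[Tier], threshold: float = 0.8):
    """Try tiers cheapest-first and stop at the first confident answer."""
    spent = 0.0
    label, tier_name = "unknown", "none"
    for tier in tiers:
        spent += tier.cost_per_call
        label, confidence = tier.classify(text)
        tier_name = tier.name
        if confidence >= threshold:
            break
    return label, tier_name, spent


TIERS = [Tier("small", 1.0, small_model), Tier("large", 20.0, large_model)]

print(route("Invoice #12: amount due 400", TIERS))       # ('invoice', 'small', 1.0)
print(route("This agreement binds each party.", TIERS))  # ('contract', 'large', 21.0)
```

The design choice worth noticing is that the expensive model is only paid for when the cheap one is unsure, which is where the speed and cost benefits of decomposition would come from in practice.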
Near Term Predictions
We’ve outlined what we have seen up to today. Now on to a few key predictions about what could happen in the near, mid and far flung future. It’s worth repeating the sentiment from earlier that these will be out of date the moment they are written and very likely wrong the further out we make them, but let’s make them anyways.
- Foundational models will continue to get better. Bigger, more expensive, more capable foundational models will continue to show step changes in capabilities. Billions of dollars will continue to be invested in model experimentation in foundational models until we see an obvious declining return in larger models. The coming iterations will likely not just be bigger LLMs but will be augmented with additional tech such as Retrieval Augmented Generation [9] (RAG) to improve explainability and patch some of the LLM weaknesses. As these iterations advance, tools that enable LLMs to be grounded in fact and internal data (e.g. knowledge graphs and vector databases) will be embedded as a core part of the Generative AI tech stack, with knowledge graphs growing in importance as users look to LLM-enabled solutions to answer more complex queries. These improvements will also include techniques to build cheaper and faster serving strategies, both in hardware focused on serving models and in strategies to optimise or compile models.
- Trust in outcomes will get better over time and will start in vertical specific use cases. The overall explanation of very large models is a deep research issue with no clear breakthroughs on the horizon. That said, much can be done in specific application areas to move from directionally right to trusted outputs. Today we have seen this accomplished by using methods like RAG and other grounding techniques sometimes coupled with a click through attribution that allows the reader to see exactly where the information they are reading came from. Even then, the bar for “trust” is often a hard one to define. On many objective mathematical measures the models already outperform humans when you compare error rates on specific tasks like document categorization. The subtlety here is that we often ascribe some other attribute to decision making that is special to human consciousness that can’t be captured with statistical comparisons of outputs. One leading researcher recently suggested to us that you might imagine a future where you are given two alternate recommendations for a medical plan of action. One of these is completely understood and explainable through common medical practice, is offered to you by a physician, and works 50% of the time. The other is less explainable and generated by a model but works 90% of the time. You choose which advice and plan of action you take. As in the case of self driving cars vs human drivers, we have a tendency to be less forgiving of machine generated errors even when they are made less often.
- Comparative metrics will continue to be fraught. Broad often-cited comparative metrics among LLMs are mostly inadequate in pre-determining specific model applicability to a particular problem. Better category-specific metrics proven on potential customer data will emerge as companies realise that broad generic metrics are useless guideposts to creating specific LLM-strategies for their own unique problems.
- New techniques will emerge to train models. Today, the second bottleneck in improving model performance (behind compute) is access to high-quality annotated training data to “steer” the models. AI models need increasingly unique data sets to improve their performance. Web data is “no longer good enough” and getting “extremely expensive” [10]. Companies will continue to use more synthetic data (data generated by another model) to generate high-quality labelled training data. New providers of synthetic data will become significant businesses in these ecosystems. Model companies today are hiring their own in-house data annotators. This cottage industry surrounding LLMs will continue to grow, making labelling training data a lucrative full time career.
- Industry shifts are imminent. When step changes happen in technology the solutions stack gets reshuffled in many different ways as companies joust for the chance to add more value in their own solutions. You can see this already as Cloud providers are hosting their own models and services right beside third-party models and services in co-opetition. Model providers are releasing developer environments and making them accessible to non-developers, and database companies are releasing LLM-powered features alongside their existing services making it possible to get more value from their customers’ own data at little incremental cost. All players are building down, up and sideways to try to capture more of this emerging market and we will continue to see big bets in the space before the dust settles.
- Everything is going to get hyper personal. We are already seeing relatively cheap hyper personalization of messaging, marketing, descriptions, and reports. This will continue as the content you consume gets more personal and recommendation engines improve dramatically as the models we enable are fed more information about our activities. The synthesis of large bodies of content into tight actionable advice is already here and will continue to get better. It is worth highlighting that [11], as in all technologies, applications are net-positive only in the subjective sense to users. While it is possible today to tailor articles, ads, pages, audiobooks and books to individuals or categories of individuals, this is now going to be much, much cheaper, with all kinds of cascading impacts over time. Not all of those implications will be universally positive. It will continue to be important for those incorporating capabilities into workflows to have expert-level awareness in testing and evaluating the differences between convincing text and task-appropriate text.
- Every tool that we use to process information will improve. The tools (Google Workspace, Microsoft Office, Slack) that help you do your primary tasks every day will get better quickly. Elasticsearch has been a mainstay in products we all know and love for over a decade and is quickly being replaced by vector search. This approach uses the context encoded in the language to support more powerful questions, answers and iteration. We will see more of this.
- There is a race to the bottom on pricing as “good enough” competes with “great”. Just about any use case that is a single feature or point solution where vast structured data is widely available allowing for high performing models to be built to deliver that service will be part of the race to the bottom. Services like content creation, translation, photo/video editing, and customer service will likely be included in this list. This will be accelerated as companies give services away for free or at a massive discount as part of bundled services to win market share.
- There is no near term relief on compute constraints. Compute capacity will remain a barrier to entry, and a competitive advantage. Companies without access to large state-of-the-art clusters will fall back on older, non-state-of-the-art hardware, increasing the global computational capacity available for training and serving large scale models. This likely happens more often in the open source fine-tuned model world, while private LLM providers continue to sign large compute deals to make step changes for a while. Data hygiene and model optimization services will move to capitalize on this opportunity.
- We will see information produced in a way that is compatible with how LLMs are trained/optimised. LLMs are amazing at human language as they have ingested so much of it. This is true also for computer languages. We are already seeing new approaches to encoding information such as APIs so as to allow LLMs to reflect that content in their generation and summarizations. Imagine giving an LLM access to old programming languages (COBOL) to then be used to make better new ones. This trend will continue and increase in pace.
- Larger context windows will enable the next killer use case. Cheaper serving paired with a longer context window in foundational models will enable new applications that aren’t yet possible or practical.
- Creating reliable large scale systems will get a lot more complex. Imagine a world with millions of models optimised from dozens of foundational models. Each of these models is optimised on an ever-changing subset of data, with recipes and approaches that evolve over time. Reproduction of identical results is likely impossible, and reversion similarly challenging. On top of this is a world of generative outputs, from code to advice, based on constantly changing prompts and, in many cases, overlaid sets of prompts with different grounding advice. Compared to software (which is already a major challenge to operate at scale with constant change) the new world is chaotic. Very different and new approaches to managing large scale systems will need to emerge, covering pieces like testing, tools, organisation and controls.
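As a rough illustration of the grounding and click-through attribution ideas in the predictions above, here is a minimal retrieval sketch in Python. It is a toy under loud assumptions: `embed` is a bag-of-words counter standing in for a learned embedding model, `DOCS` and its file names are invented, and `grounded_prompt` only builds the text that would be sent to a model (no LLM is called).

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy 'embedding': word counts. Real systems use learned dense vectors."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, docs, k=1):
    """Return the k (doc_id, text) pairs most similar to the query."""
    q = embed(query)
    ranked = sorted(
        docs.items(), key=lambda item: cosine(q, embed(item[1])), reverse=True
    )
    return ranked[:k]


def grounded_prompt(query, docs, k=1):
    """Build a prompt that carries source ids so an answer can cite them."""
    passages = retrieve(query, docs, k)
    context = "\n".join(f"[{doc_id}] {text}" for doc_id, text in passages)
    return (
        "Answer using only the sources below and cite their ids.\n"
        f"{context}\nQuestion: {query}"
    )


# Hypothetical internal documents, invented for this example.
DOCS = {
    "policy.md": "Refunds are issued within 30 days of purchase.",
    "shipping.md": "Orders ship in two business days.",
}

print(grounded_prompt("How many days do refunds take?", DOCS))
```

Because each retrieved passage keeps its source id, a reader of the eventual answer can be pointed back to exactly where the information came from, which is the attribution pattern described above.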
Medium-Long Term Predictions
If we weren’t wrong before, we almost surely will be now…
- Transformer-based algorithms are likely to be supplanted with newer approaches that produce more capable models. These are likely to be proprietary for some time based on current trends away from publication. Liquid Neural Nets [12] are a recent example of what is possibly coming.
- Coherence will increase significantly. Mathematics, reasoning and semantics will be successfully used alongside LLMs to create more complex systems in which a wider range of questions can be answered coherently in an increasingly wider range of domains. This will fix many of the obvious blind spots prevalent in LLMs alone today.
- Proving you’re not a robot online will get a lot harder. Voice models that are nearly indistinguishable from a human voice are already here, passing the Turing Test when combined with LLMs. New tests and ways to verify if someone is a real person will be needed to separate human from non-human utterances, documents, images and videos.
- Fully generated video will first take hold in animation. Nobody has solved generative video. It’s not clear when or how that happens. Watch for animation first.
- Gaming LLMs will greatly expand deterministic gaming worlds. Developers and users will generate gaming experiences with just a few prompts in some of your favourite games. LLM-powered non-player characters are appearing providing richer, more personalised experiences as a first step. The concept of “beating” a deterministic game as the end goal will be put to the test. Game studios will have to balance how much repeatability is important in a game.
- AI created language will become a thing. LLMs are trained on huge amounts of human text examples of language and are optimised to produce human-like text. Similarly, when trained on other human languages or on translations between languages, they are able to generate text that fits those use cases. Training on computer language examples creates models that can do predictive code creation. An interesting evolution is that languages can now be created and examples found or generated to help train the models in that new language. For example, try asking a model to create a mini BASIC. It will define the programming language, generate an example, run it and create an interpreter for it. This will continue to get weirder.
- AI will interact with your surroundings through AR. The intersection of AR and AI is very rudimentary right now (take a photo, upload to GPT-4). With sufficient computation and local processing, the ability to query images could be made possible in an AR context or, say, via cameras situated in the real world.
- A new interface will emerge. Interface evolution is going to be interesting and the focus of parallel innovations as experts are offered more capability and flexibility beyond textual prompting. It’s likely the most popular new interface of the next 5 years is not going to be web pages and search. That said, pure textual or voice interfacing is not going to be universally adopted overnight. Hybrid interfaces are historically really hard. Voice interfaces are often creepy or annoying [13].
What does this mean for venture investing in Generative AI startups?
So what, you might ask, does an investor do with all of these thoughts about today and the future? We find companies that are making our predictions happen or proving them wrong, and we support their journey. The advent of widespread access to LLMs has spawned entirely new categories of startups, many of which have already attracted unicorn+ valuations. This is not surprising as profound technological advancement broadens the opportunity set in front of entrepreneurs, not only to revolutionise existing industries, but to define new product categories that were previously thought to be impossible. Here are a few different things we look for in investments. No single company is all of these things but it gives you a sense of how we think about it.
- A deep understanding of text and data-intensive workflows where LLMs can be leveraged to dramatically change the cost-value perspective on the services and products the company builds. Here we have seen companies that replace 90% of the cost of human document classification, as an example.
- A capacity to build, iterate and ship products quickly with a specific edge on experimentation, measurement and adaptability. A simple example might be using LLMs to drastically increase your test coverage and documentation to radically improve your speed and quality of deployments.
- Product market fit but with an opportunity to be a fast mover to expand their product into adjacent areas and deliver more value to existing customers with existing data and workflows with which they are very familiar. We are seeing this in database companies that are embracing Vector forms alongside all other data types supported in a single database.
- Early stage teams at the start of their company building journey with tier 1 talent, ML understanding, product-engineering capability for rapid iteration, and a commitment to disruption. In some cases we have seen teams generating their product prototype in under an hour.
- Emerging new infrastructure for fast iteration and value delivery and reasonable costs at scale. Opportunities abound in making things cheaper, faster, and optimised.
- Data is the driving force for innovation. Teams with access to unique datasets of huge potential value and application. New ways of exploiting existing public datasets are interesting too but potentially less sustainable.
Obvious Risks and Evolving Circumstances
Companies and groups of all kinds are struggling to understand what can be done, and also what should be done or not done with new technologies in their own particular contexts. Many areas of concern exist in terms of the legal use of content in producing models, in the way in which models encode and reflect biases of many kinds, and in the synthesis of information that is human-like and potentially incorrect or, perhaps worse, designed to be misleading. The gap between well-created marketing content and intentionally misleading political content has long been a point of contention in democracies, and this potential is clearly higher with the advanced content creation tools that LLMs make possible.
- Existing Law. Legal battles are being fought on many fronts with regard to the use of private data (books), publicly available data (websites, news articles, Medium, etc.), and social media content (Reddit) in the creation of LLMs. As countries decide what fair use means to them, we can expect change in how models get built and trained. Models are already being built with different assumptions in mind.
- New Regulation. Governments are in various stages of over-reaction to the use of LLMs. In many cases existing law already covers data use in fairly well explored contexts. With national variations inevitable this will affect where and how companies build and deploy models – and in many cases regulations suggest that derivative works or use of APIs creates legal liabilities as well.
- International Affairs. The vast majority of hardware required to build and serve LLMs is built in Taiwan and other resources are currently being limited in national contexts especially between China and the USA. These risks are not insignificant if current demand trends continue. Efforts in Canada to create a national computing infrastructure to advantage Canadian companies could be a huge medium-term trend globally – particularly if the overall cost is delivered to advantage nationally-based companies. Note there are likely new challenges ahead in this type of situation on the inter-country trade agreement front.
Conclusion
So there you have it. A wide ranging set of thoughts, observations and predictions on what we at Inovia are seeing as active investors in the world of Generative AI. We spent a lot of time to get to where we are today and we have a very long way to go with much to learn. We are looking to connect with Founders, Funders and other experts who want to engage, discuss and debate on the topics you read above, so please don’t hesitate to reach out to the authors at [email protected] or [email protected].
1. https://www.nvidia.com/en-us/glossary/data-science/large-language-models/
2. https://mark-riedl.medium.com/a-very-gentle-introduction-to-large-language-models-without-the-hype-5f67941fa59e
3. https://arxiv.org/abs/2311.00871
4. https://www.zdnet.com/article/chatgpts-intelligence-is-zero/
5. https://www.inovia.vc/esg/
6. https://www.datacamp.com/blog/what-is-prompt-engineering-the-future-of-ai-communication
7. https://github.blog/2023-06-27-the-economic-impact-of-the-ai-powered-developer-lifecycle-and-lessons-from-github-copilot/
8. https://nationalpost.com/news/canada/ai-is-coming-after-the-tech-bros-and-their-easy-money
9. https://research.ibm.com/blog/retrieval-augmented-generation-RAG
10. https://www.ft.com/content/053ee253-820e-453a-a1d5-0f24985258de
11. ht/ Alex G.
12. https://techcrunch.com/2023/12/06/liquid-ai-a-new-mit-spinoff-wants-to-build-an-entirely-new-type-of-ai/
13. ht/ Jules.