Generative AI tools like ChatGPT have captured the popular imagination, igniting widespread interest in potential business applications. At this early stage, the biggest pitfall surrounding generative AI is uncertainty about the source material it draws on. When a large language model (LLM) like ChatGPT writes an article, is it plagiarizing from unidentified sources? Is it drawing on inaccurate data or “hallucinating” false conclusions? Companies are rightly hesitant about the risks of using AI-generated content of questionable provenance or integrity.
One way to address this problem is to take control of the input data used to “train” an LLM. Imagine if your company had its own customized ChatGPT-like system powered by internal data and documented sources, rather than whatever has been pulled from the internet. What if you could generate AI content and insights with confidence in their legitimacy, leveraging your own knowledge and keeping external references duly documented and footnoted?
This is not a hypothetical scenario for the future. Right now there are ways to train an LLM with your own set of proprietary data. It’s off to a slow start, but it’s happening. Having your own customized in-house LLM may soon be as commonplace as having your own website.
Pathways to Proprietary AI
An overview in the Harvard Business Review lists three primary ways to go about training an LLM using your company’s data:
- Training an LLM from scratch
- Fine-tuning an existing LLM
- Prompt-engineering an existing LLM
The first option would require a massive amount of data and development expense, making it infeasible for most companies. Besides, it generally makes sense to build on the foundation of an established LLM and focus on making it “smarter” with your inputs, instead of wholly reinventing the wheel.
The third option enters the realm of the “prompt engineer,” the emerging skill of crafting detailed prompts in order to elicit desired outputs from an LLM. In the short term, this will be the most commonly used of the three methods. Strictly speaking, prompting does not retrain the model; it supplies your company’s unique data alongside each request. There is virtually unlimited potential to guide an LLM this way on a point-by-point basis, with the option to create templates for frequently used prompt sets. But the process can be tedious, requiring extensive reviews and trial and error. There is also the risk of entering proprietary data into an external LLM system, where it may become fodder for generative AI use at large.
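As a rough illustration of what a reusable prompt template might look like, here is a minimal Python sketch. The template wording, the field names and the `build_prompt` helper are all hypothetical, chosen only to show the idea; it builds the prompt text and does not call any particular LLM service.

```python
from string import Template

# Hypothetical reusable template; the wording and field names are
# illustrative assumptions, not a recommended standard.
PRODUCT_SUMMARY_PROMPT = Template(
    "You are a writer for $company.\n"
    "Using only the facts listed below, write a $length summary of $product.\n"
    "Cite each fact you use by its bracketed number.\n\n"
    "Facts:\n$facts"
)


def build_prompt(company, product, facts, length="100-word"):
    """Fill the template, numbering each fact so the output can be footnoted."""
    numbered = "\n".join(f"[{i}] {fact}" for i, fact in enumerate(facts, start=1))
    return PRODUCT_SUMMARY_PROMPT.substitute(
        company=company, product=product, length=length, facts=numbered
    )


# Example inputs are placeholders, not real company data.
prompt = build_prompt(
    "Example Corp",
    "the Example Widget",
    ["Launched in 2023.", "Ships in three sizes."],
)
print(prompt)
```

Numbering the facts is one simple way to keep the generated text traceable to its inputs. The same template can be reused for any product by swapping in a different fact list.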
Which brings us to the middle option, fine-tuning an existing LLM. This may prove to be the Goldilocks choice that’s just right: easier and more affordable than starting from scratch, but more powerful, user-friendly and secure than prompt engineering.
A number of open-source LLMs allow fine-tuning for commercial use, including Databricks Dolly, Mosaic MPT and TII Falcon. These platforms offer methods to simplify and integrate proprietary data ingestion from hundreds of source types, although some level of data science expertise is required. An open-source LLM is generally a better option for fine-tuning than proprietary models like those behind OpenAI’s ChatGPT. Proprietary models may allow for some degree of fine-tuning, but they carry restrictive licenses for modification.
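Fine-tuning usually starts with converting internal material into training records. The Python sketch below shows one way this preparation step might look, writing question-and-answer pairs as JSON Lines with a source field so provenance stays documented. The record layout (`prompt`, `response`, `source`) and the sample documents are assumptions for illustration; each fine-tuning toolkit defines its own expected format, so check its documentation before adopting any schema.

```python
import json


def to_training_records(documents):
    """Turn internal documents into prompt/response pairs, keeping the source."""
    records = []
    for doc in documents:
        records.append({
            "prompt": doc["question"],
            "response": doc["answer"],
            "source": doc["source"],  # kept so every example stays traceable
        })
    return records


def to_jsonl(records):
    """Serialize records as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)


# Placeholder documents; real data would come from your own systems.
documents = [
    {
        "question": "What sizes does the Example Widget ship in?",
        "answer": "It ships in three sizes.",
        "source": "product-sheet-2023.pdf",
    },
    {
        "question": "When did the Example Widget launch?",
        "answer": "It launched in 2023.",
        "source": "press-release-2023.txt",
    },
]

jsonl = to_jsonl(to_training_records(documents))
print(jsonl)
```

Keeping the source alongside each example makes it easier to audit, update or remove training material later.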
Applications for Proprietary AI
Once you’ve developed a unique LLM trained on your company’s data, the potential uses are wide-ranging. You will gain access to generative AI output that’s more relevant and reliable than that of a general-purpose LLM, aligned with your specific knowledge and research, product details, industry conditions and brand positioning. This would be a tremendous advantage for internal reports and analysis, as well as client presentations, hiring and training, sales tools and marketing communications.
You can grant creative and PR partners access to your LLM system so they can collaborate to craft better marketing materials more efficiently than ever before. While it’s true you could let the system write your white papers, press releases and social media content, we believe there will always be a role for the professional agency touch to elevate your communications. And of course, a technical partner like Signal also has the expertise to help you build and maintain your proprietary LLM in the first place.
In the long run, it’s clear that the generative AI tools making a big splash today will be seen as rudimentary first steps. Proprietary LLM development marks a promising next step toward a more powerful and responsible application of AI that both furthers and protects your company’s best interests. Contact us today to discuss the possibilities.