Large language models in the form of chatbots are essentially a new computer interface. PrivateGPT shows how this can be applied to your private data.
Large language models from companies like Microsoft or OpenAI can capture content in documents and make it "chattable". That is, they can have a conversation about the content, explain details or interpret statements, generate summaries, infer new content, and so on. This can help with research and understanding, and when it works reliably, it is a revolutionary new way to interact with computers and content.
But there's a catch: the chatbots from the big tech companies need to read your documents. For privacy reasons, you may want to avoid that. One possible alternative comes from the open-source movement: PrivateGPT, a local document chatbot.
PrivateGPT makes local files chattable
The open-source project enables chatbot conversations about your local files. You can add files to the system and have conversations about their contents without an internet connection. You analyze the content in a chatbot dialog, and all data is processed locally. The software currently supports twelve file formats via LangChain, including PowerPoint, Word, PDF and HTML.
PrivateGPT uses GPT4All, a local chatbot trained on the Alpaca recipe: a LLaMA variant fine-tuned on 430,000 GPT-3.5 Turbo outputs. Other locally executable open-source language models, such as Camel, can be integrated instead.
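To make the idea of "adding files and then chatting about them" more concrete, here is a minimal sketch of the general pattern such a tool might follow: split documents into chunks, find the chunks most relevant to a question, and hand them to a local language model as context. This is not PrivateGPT's actual code. The names `LocalIndex`, `chunk_text` and `build_prompt` are hypothetical, the sample file names and texts are made up, and simple word-count similarity stands in for the learned embeddings and vector store a real system would likely use. The final step, sending the prompt to a local model, is left out.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def chunk_text(text, max_words=40):
    """Split a document into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class LocalIndex:
    """Hypothetical in-memory index; a stand-in for a real vector store."""

    def __init__(self):
        self.entries = []  # list of (source name, chunk text, word counts)

    def ingest(self, name, text):
        """Chunk a document and store each chunk with its word counts."""
        for chunk in chunk_text(text):
            self.entries.append((name, chunk, Counter(tokenize(chunk))))

    def retrieve(self, question, k=2):
        """Return the k chunks whose word counts best match the question."""
        query = Counter(tokenize(question))
        ranked = sorted(
            self.entries,
            key=lambda entry: cosine_similarity(query, entry[2]),
            reverse=True,
        )
        return [(name, chunk) for name, chunk, _ in ranked[:k]]


def build_prompt(question, passages):
    """Assemble the text that would be passed to a locally running model."""
    context = "\n".join(f"[{name}] {chunk}" for name, chunk in passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Made-up sample documents; everything stays in local memory.
index = LocalIndex()
index.ingest("handbook.txt", "Employees may work remotely up to three days per week.")
index.ingest("pricing.txt", "The premium plan costs 40 euros per month and includes support.")

question = "How much does the premium plan cost?"
top = index.retrieve(question, k=1)
prompt = build_prompt(question, top)
print(prompt)
```

Because both the index and the prompt are built on the local machine, no document text has to leave it. Only the quality of the retrieval step and of the local model limits the answers.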
Companies could use an application like PrivateGPT for internal knowledge management, customer service, or even to create communication templates from their data without giving third parties access to it.
All necessary files and installation instructions for PrivateGPT are available on GitHub. Matthew Berman has published a video tutorial on the installation. In addition, PrivateGPT-App is a web application that serves as a visual interface to PrivateGPT.