Connect Knowledge Base
Connect your knowledge base to TypingMind
ChatGPT and Large Language Models (LLMs) like Anthropic Claude and Gemini are powerful tools for brainstorming ideas, creating content, generating images, and enhancing daily workflows.
However, they have a limitation: LLMs can only draw on their original training data.
They can't provide specific insights into your unique business needs - like detailed sales reports or tailored marketing strategies - without access to your domain-specific knowledge base.
TypingMind can help you fill in that gap by allowing you to connect your own knowledge base to ChatGPT and LLMs easily!
Why Connect Your Knowledge Base to TypingMind?
TypingMind provides you with:
- Connect knowledge from various sources: PDF, TXT, XLSX, Notion, Intercom, web scraping, etc.
- Keep your data fresh and updated with a single click.
- Train multiple AI models, such as GPT-4o, Claude 3.5 Sonnet, and Gemini 1.5 Pro, with your custom data.
- Ensure data security and privacy.
- Start effortlessly with no coding required.
How to Set Up Your Knowledge Base
- Go to the Admin Panel → Knowledge Base → Enable knowledge base
- Click on Add Data Source to connect the chat instance with your knowledge base
- Select the sources you want to connect with:
Available Sources for Knowledge Base
1. Directly upload your files
Easily upload files up to 20MB each, supporting a wide range of file types, including PDF, DOCX, TXT, CSV, XLSX, and more. You can upload multiple files at once to centralize your data and make it accessible for training AI models.
2. Connect with your existing internal systems
Seamlessly pull data from various services you already use, such as Notion, Intercom, Google Drive, Confluence, SharePoint, OneDrive, and more.
This allows you to integrate the data from these systems to train your AI assistant, thus ensuring your knowledge base is rich, comprehensive, and up-to-date.
How knowledge base works on TypingMind
The AI assistant gets its data from your uploaded files via a vector database, a technique known as Retrieval-Augmented Generation (RAG). Here is how the files are processed:
1. Files are uploaded or data sources are connected.
2. We extract the raw text from the files or connected data, doing our best to preserve the meaningful context.
3. We split the text into chunks of roughly 3,000 words each, with some overlap. The chunks are split in a way that preserves the meaningful context of the provided data. (Note that the chunk size may change in the future; for now, you can't change this number.)
4. These chunks are stored in a database.
5. When your users send a chat message, the system retrieves up to 5 relevant chunks from the database (based on the content of the chat so far) and provides them as context to the AI assistant via the system message. This means the AI assistant has access to the 5 most relevant chunks of your knowledge base at all times during a chat.
6. The relevance of the chunks is determined by our system, and we improve it with every update.
7. The AI assistant relies on the text chunks provided in the system message to give the best answer to the user.
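The chunking and retrieval steps above can be sketched in Python. This is a simplified illustration, not TypingMind's actual implementation: the similarity function here counts shared words, whereas a real vector database compares embeddings, and all function names are hypothetical.

```python
import math
from collections import Counter


def split_into_chunks(text, chunk_size=3000, overlap=200):
    """Split text into chunks of roughly `chunk_size` words,
    with `overlap` words shared between consecutive chunks."""
    words = text.split()
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks


def similarity(a, b):
    """Toy relevance score: cosine similarity over word counts.
    (A production RAG system would compare vector embeddings.)"""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = (math.sqrt(sum(v * v for v in ca.values()))
            * math.sqrt(sum(v * v for v in cb.values())))
    return dot / norm if norm else 0.0


def retrieve(query, chunks, top_k=5):
    """Return the top_k chunks most relevant to the chat so far."""
    ranked = sorted(chunks, key=lambda c: similarity(query, c), reverse=True)
    return ranked[:top_k]


def build_system_message(query, chunks):
    """Inject the retrieved chunks into the system message as context."""
    context = "\n---\n".join(retrieve(query, chunks))
    return f"Answer using this knowledge base context:\n{context}"
```

The key design point is that only the few most relevant chunks are sent to the model on each message, so even a very large knowledge base fits within the model's context window.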
All of your connected files are stored securely on our system. We never share your data with anyone else without informing you beforehand.
With connected data sources, you can quickly resync whenever there are updates, keeping your knowledge base current.
Other methods to connect your knowledge base to TypingMind
Besides directly uploading files or connecting knowledge base sources via TypingMind, there are other options for connecting your data to your chat instance:
- Set up System Prompt: a predefined input that guides and sets the context for how the AI, such as GPT-4, should respond.
- Implement RAG via a plugin: connect your database via a plugin (function calling) that allows the AI model to query and retrieve data in real time.
- Use Dynamic context via API for AI Agent: retrieve content from an API and inject it into the system prompt
- Use a custom model trained with the RLHF method
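The dynamic-context approach above can be sketched as follows. The endpoint URL and response shape are hypothetical, and a production agent would add error handling, authentication, and caching.

```python
import json
import urllib.request


def fetch_dynamic_context(url):
    """Fetch fresh data from your own API (hypothetical endpoint)."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def inject_into_system_prompt(base_prompt, context):
    """Append the freshly fetched data to the system prompt before each chat."""
    return (
        f"{base_prompt}\n\n"
        "Current data (fetched from your API):\n"
        f"{json.dumps(context, indent=2)}"
    )


# Example, using an inline dict in place of a live API response:
prompt = inject_into_system_prompt(
    "You are a sales assistant for Acme Corp.",
    {"open_tickets": 3, "monthly_revenue": "$42,000"},
)
```

Because the data is fetched at chat time rather than indexed in advance, this suits fast-changing values (ticket counts, inventory, metrics) that would go stale in a synced knowledge base.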
More details on different levels of data integration on TypingMind: 4 Levels of Data Integration on TypingMind.
Best practices for providing a knowledge base
- Use raw text in Markdown format if you can. LLMs understand Markdown very well and can make sense of the content much more efficiently compared to PDFs, DOCX, etc.
- Use both Upload Files and System Instruction. A combination of a well-prompted system instruction and a clean knowledge base delivers the best results for end users.
- Stay tuned for quality updates from us. We improve knowledge base processing and handling all the time, and we're working on approaches that will deliver much better overall quality when the AI assistant looks up your knowledge base. Be sure to check our updates on our Blog and our Discord.
Be aware of Prompt Injection attacks
By default, your knowledge base is not visible to end users. However, all LLM models are subject to prompt injection attacks, which means a user may be able to extract parts of your knowledge base through carefully crafted prompts.