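Run retrieval test
Conduct a retrieval test on your knowledge base to check whether the intended chunks can be retrieved.
After your files are uploaded and parsed, it is recommended that you run a retrieval test before proceeding with the chat assistant configuration. Running a retrieval test is not an unnecessary or superfluous step. Just like fine-tuning a precision instrument, RAGFlow requires careful tuning to deliver optimal question-answering performance. Your knowledge base settings, chat assistant configurations, and the specified large and small models can all significantly affect the final results. Running a retrieval test verifies whether the intended chunks can be retrieved, letting you quickly identify areas for improvement or pinpoint any issues that need to be addressed. For instance, when debugging your question-answering system, if you know that the correct chunks can be retrieved, you can focus your efforts elsewhere. For example, in issue #5627, the problem was found to be caused by the LLM's limitations.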
During a retrieval test, chunks created from your specified chunking method are retrieved using a hybrid search. This search combines weighted keyword similarity with either weighted vector cosine similarity or a weighted reranking score, depending on your settings:
- If no rerank model is selected, weighted keyword similarity will be combined with weighted vector cosine similarity.
- If a rerank model is selected, weighted keyword similarity will be combined with a weighted reranking score.
In contrast, chunks created from knowledge graph construction are retrieved solely using vector cosine similarity.
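The combination described above can be sketched as a weighted sum. This is a minimal illustration, not RAGFlow's actual implementation: the function name `hybrid_score` and its parameters are hypothetical, and all scores are assumed to lie between 0 and 1. The default weight of 0.7 is the default keyword similarity weight documented under Configurations.

```python
from typing import Optional


def hybrid_score(
    keyword_sim: float,
    vector_sim: float,
    rerank_score: Optional[float] = None,
    keyword_weight: float = 0.7,
) -> float:
    """Combine keyword similarity with exactly one other component.

    Hypothetical helper for illustration only. When a rerank score is
    supplied (a rerank model is selected), it takes the place of vector
    cosine similarity; otherwise vector cosine similarity is used.
    """
    if not 0.0 <= keyword_weight <= 1.0:
        raise ValueError("keyword_weight must be between 0 and 1")
    # Pick the second component according to whether a rerank score exists.
    other = vector_sim if rerank_score is None else rerank_score
    # The two weights always sum to 1.
    return keyword_weight * keyword_sim + (1.0 - keyword_weight) * other
```

With a keyword similarity of 0.5 and a vector cosine similarity of 0.9, the sketch yields 0.7 × 0.5 + 0.3 × 0.9 = 0.62. Supplying a rerank score of 0.1 instead yields 0.7 × 0.5 + 0.3 × 0.1 = 0.38.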
Prerequisites
- Your files must be uploaded and successfully parsed before you run a retrieval test.
- A knowledge graph must be successfully built before enabling Use knowledge graph.
Configurations
Similarity threshold
This sets the bar for retrieving chunks: chunks whose hybrid similarity score falls below the threshold are filtered out. By default, the threshold is set to 0.2, which means that only chunks with a hybrid similarity score of 0.2 (that is, 20 on a 100-point scale) or higher are retrieved.
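The effect of the threshold can be illustrated with a short filter. This is a sketch under stated assumptions rather than RAGFlow's code: `filter_by_threshold` is a hypothetical name, each chunk is represented as a `(chunk_id, score)` pair, and scores are on a 0 to 1 scale.

```python
def filter_by_threshold(scored_chunks, threshold=0.2):
    """Keep chunks scoring at or above the threshold, best first.

    Hypothetical helper: `scored_chunks` is an iterable of
    (chunk_id, hybrid_score) pairs with scores between 0 and 1.
    """
    # Chunks strictly below the threshold are dropped.
    kept = [(chunk, score) for chunk, score in scored_chunks if score >= threshold]
    # Sort the survivors by score, highest first.
    return sorted(kept, key=lambda pair: pair[1], reverse=True)
```

Raising the threshold trades recall for precision: fewer chunks come back, but those that do score higher.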
Keyword similarity weight
This sets the weight of keyword similarity in the combined similarity score, whether used with vector cosine similarity or a reranking score. By default, it is set to 0.7, making the weight of the other component 0.3 (1 - 0.7).
Rerank model
- If left empty, RAGFlow will use a combination of weighted keyword similarity and weighted vector cosine similarity.
- If a rerank model is selected, weighted keyword similarity will be combined with a weighted reranking score.
Using a rerank model will significantly increase the time to receive a response.
Use knowledge graph
In a knowledge graph, an entity description, a relationship description, or a community report each exists as an independent chunk. This switch indicates whether to add these chunks to the retrieval.
The switch is disabled by default. When enabled, RAGFlow performs the following during a retrieval test:
- Extract entities and entity types from your query using the LLM.
- Retrieve top N entities from the graph based on their PageRank values, using the extracted entity types.
- Find similar entities and their N-hop relationships from the graph using the embeddings of the extracted query entities.
- Retrieve similar relationships from the graph using the query embedding.
- Rank these retrieved entities and relationships by multiplying each one's PageRank value with its similarity score to the query, returning the top n as the final retrieval.
- Retrieve the report for the community involving the most entities in the final retrieval.
The retrieved entity descriptions, relationship descriptions, and the top 1 community report are sent to the LLM for content generation.
Using a knowledge graph in a retrieval test will significantly increase the time to receive a response.
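The ranking and community-selection steps listed above can be sketched as follows. This is an illustrative approximation only: the `GraphItem` class, its fields, and both function names are hypothetical, and the PageRank values and similarity scores are assumed to have been computed already.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class GraphItem:
    """Hypothetical record for a retrieved entity or relationship."""
    name: str
    pagerank: float    # PageRank value taken from the graph
    similarity: float  # similarity score to the query
    community: str     # identifier of the community the item belongs to


def rank_graph_items(items, top_n):
    """Rank by PageRank multiplied by similarity and keep the top_n items."""
    ranked = sorted(items, key=lambda it: it.pagerank * it.similarity, reverse=True)
    return ranked[:top_n]


def top_community(final_items):
    """Return the community that involves the most items, or None if empty."""
    counts = Counter(it.community for it in final_items)
    if not counts:
        return None
    return counts.most_common(1)[0][0]
```

A short usage example with made-up values:

```python
items = [
    GraphItem("A", pagerank=0.5, similarity=0.4, community="c1"),
    GraphItem("B", pagerank=0.2, similarity=0.9, community="c2"),
    GraphItem("C", pagerank=0.9, similarity=0.1, community="c1"),
    GraphItem("D", pagerank=0.4, similarity=0.4, community="c2"),
]
final = rank_graph_items(items, top_n=3)
community = top_community(final)
```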
Cross-language search
To perform a cross-language search, select one or more target languages from the dropdown menu. The system’s default chat model will then translate your query entered in the Test text field into the selected target language(s). This translation ensures accurate semantic matching across languages, allowing you to retrieve relevant results regardless of language differences.
- When selecting target languages, please ensure that these languages are present in the knowledge base to guarantee an effective search.
- If no target language is selected, the system will search only in the language of your query, which may cause relevant information in other languages to be missed.
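The behavior described above can be pictured as expanding one query into several. The sketch below is an assumption-heavy illustration, not RAGFlow's implementation: `expand_query` is a hypothetical name, `translate` stands in for a call to the default chat model, and keeping the original query alongside its translations is an assumption made for this example.

```python
from typing import Callable, List


def expand_query(
    query: str,
    target_languages: List[str],
    translate: Callable[[str, str], str],
) -> List[str]:
    """Return the original query plus one translation per target language.

    `translate(text, language)` is a hypothetical stand-in for the chat
    model call that performs the translation.
    """
    queries = [query]  # assumption: the original query is always kept
    for language in target_languages:
        translated = translate(query, language)
        # Skip duplicates, e.g. when the query is already in that language.
        if translated not in queries:
            queries.append(translated)
    return queries
```

With no target languages selected, the function returns only the original query, which mirrors the single-language search described in the last bullet.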
Test text
Enter the query you want to test in this field.