Construction of LLM-RAG Pipeline Based on Spatial Narrative Characteristics of Yanxinglu

基于《燕行录》空间叙事特征的LLM-RAG 平台搭建

In the integrated world of ancient East Asia, written records are an explicit expression of that unity. In contemporary East Asia, however, differences in historical outlook and perspective have produced sharp contrasts and discord over historical issues. Building an intelligent research platform for the Yanxinglu with retrieval-augmented generation (RAG), and using AI question answering to carry out research support tasks such as named entity recognition, relation extraction, and knowledge graph construction, is a useful attempt to apply artificial intelligence to the study of the history of East Asian exchange. This paper focuses on building the LLM-RAG model and the rules for constructing the Yanxinglu knowledge base, and on that basis discusses the process and precautions for the fine-grained processing of Yanxinglu text data.

在一體化的古代東亞，文獻是一體性的顯性表達。然而，由於史觀與視角的差異，當代東亞區域圍繞歷史問題存在巨大反差與不和諧。運用檢索增強生成式人工智能技術構建《燕行錄》智能化研究平臺，通過人工智能問答的方式完成命名實體識別、實體關係抽取、知識圖譜構建等研究輔助工作，是人工智能賦能東亞交流史研究的有益嘗試。本文重點解決LLM-RAG模型的搭建，構建《燕行錄》知識庫，在此基礎上探討《燕行錄》文本數據精細化加工的時間過程與注意事項。
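As a rough illustration of the retrieval step in such a pipeline, the sketch below normalizes variant characters before matching a query against passages, then assembles a prompt for a language model. It is a minimal sketch under stated assumptions, not the platform's actual implementation: the variant table, the bigram-overlap scoring (a simple stand-in for an embedding model), the function names, and the two toy passages are all hypothetical and are not taken from the Yanxinglu corpus.

```python
from collections import Counter

# Illustrative variant-to-canonical table (an assumption; a real table
# for historical texts would be far larger).
VARIANTS = {"録": "錄", "峯": "峰", "羣": "群", "爲": "為"}
_TABLE = str.maketrans(VARIANTS)


def normalize(text: str) -> str:
    """Map each variant character to its canonical form."""
    return text.translate(_TABLE)


def bigrams(text: str) -> Counter:
    """Character bigram counts of the normalized text."""
    t = normalize(text)
    return Counter(t[i:i + 2] for i in range(len(t) - 1))


def score(query: str, passage: str) -> int:
    """Number of bigrams shared by query and passage (multiset overlap)."""
    return sum((bigrams(query) & bigrams(passage)).values())


def retrieve(query: str, passages: list[str], k: int = 1) -> list[str]:
    """Return the k passages with the highest overlap score."""
    ranked = sorted(passages, key=lambda p: score(query, p), reverse=True)
    return ranked[:k]


def build_prompt(query: str, passages: list[str]) -> str:
    """Place the retrieved context ahead of the question for the LLM."""
    context = "\n".join(retrieve(query, passages))
    return f"Context:\n{context}\n\nQuestion: {query}"


# Invented toy passages, not quotations from the corpus.
PASSAGES = ["登山峯而望羣山", "渡江而行"]
```

In this toy setup the query 山峰 retrieves the passage written with the variant 峯 only because both sides are normalized first. Without normalization the two strings share no bigram.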

1. The Historical and Cultural Value of Yanxinglu
2. The Linguistic Features of Time and Space Narration in Yanxinglu
    2.1. The Issue of Variant Character Search
    2.2. The Issue of Identifying Place Names with the Same Name
3. Model Fine-tuning Based on Text Features and Research Needs
4. Retrieval-Augmented Generation Model and Knowledge Base Construction
5. Conclusion
  • Jianhong CHEN (Digital Humanities Laboratory, Shandong University, Weihai, China) | 陳建紅 (山東大學)
  • Hua SHI (School of Political Science and Public Administration, Shandong University, Qingdao, China) | 史話 (山東大學), corresponding author