The Improvements of the Search-ability for Shōsho Kunten Database (2023)

Tajima Kōji
Associate Professor, National Institute of Technology, Gifu College

Improving Search-ability in a Kunten Database of the Old Movable-Type Edition of Shōsho

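This paper describes improvements to a Kunten database covering the Shōsho (third variant of the old movable-type edition) held by the National Institute for Japanese Language and Linguistics. Kunten are annotations written in kana and marks that were added to Classical Chinese materials so that they could be interpreted and read in Japanese. Kunten indicate the meanings and readings of words, as well as inflected forms expressed through particles and auxiliary verbs, and thereby clarify the structure and meaning of sentences. Compiling this Kunten information into a searchable database makes it possible to trace the historical development of vocabulary and differences between periods in how Kunten were applied. The first version of the database was released in 2019 and has been revised continually since. Initially, searches required specifying the shape and color of a Kunten mark, a format that was difficult for anyone but Kunten specialists to use; in this revision, we improved both search and display to make the system more convenient. For search, we implemented queries that use Kanji and kana, for example “明ニシ” (akiraka ni shi), so that co-occurrences of Kanji with particles and okurigana can be searched intuitively. We also created indexes of major words and Kunten. In addition, search results now give not only page and line numbers but also, using IIIF, cropped images of the characters that carry Kunten, so that they can be checked directly. Character cropping was automated by detecting character positions in the page images with the help of the transcribed text. With this database, a framework for detailed search and verification of the Kunten in Kunten materials is now in place, and we are considering applying it to other Kunten materials.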
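The cropping step can be pictured with a minimal sketch. It assumes, hypothetically, that the bounding box of each vertical text column is already known and simply divides it evenly among the characters of the matching transcription line; this is a naive baseline, not the detection method used for the database. Each resulting box is then turned into a request following the IIIF Image API URL pattern `{base}/{region}/{size}/{rotation}/{quality}.{format}`. The helper names, base URL, and coordinates are illustrative assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Box:
    """Rectangle in full-image pixel coordinates (top-left corner, width, height)."""
    x: int
    y: int
    w: int
    h: int


def split_column(column: Box, line_text: str) -> list[tuple[str, Box]]:
    """Hypothetical naive baseline: divide a vertical text column evenly,
    top to bottom, among the characters of its transcription line."""
    n = len(line_text)
    result: list[tuple[str, Box]] = []
    for i, ch in enumerate(line_text):
        # Integer boundaries so the slices tile the column with no gaps.
        top = column.y + column.h * i // n
        bottom = column.y + column.h * (i + 1) // n
        result.append((ch, Box(column.x, top, column.w, bottom - top)))
    return result


def iiif_crop_url(base: str, box: Box, width: int = 150) -> str:
    """Build an IIIF Image API request for one region.

    region "x,y,w,h" selects pixels of the full image; size "w," scales the
    crop to `width` pixels wide while keeping the aspect ratio; rotation 0;
    quality "default"; format jpg."""
    region = f"{box.x},{box.y},{box.w},{box.h}"
    return f"{base}/{region}/{width},/0/default.jpg"


if __name__ == "__main__":
    # Hypothetical IIIF image base URL and column coordinates.
    base = "https://example.org/iiif/shosho-page-001"
    column = Box(x=1200, y=300, w=90, h=1800)
    for ch, box in split_column(column, "曰若稽古帝堯"):
        print(ch, iiif_crop_url(base, box))
```

Equal division is only reasonable when characters are evenly spaced; a practical detector would refine these boxes against the image itself, which is where the transcribed text helps by fixing how many characters each column must contain.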

The Improvements of the Search-ability for Shōsho Kunten Database

This paper describes improvements to the Kunten database for the Shōsho (third variant of the early movable-type edition) held by the National Institute for Japanese Language and Linguistics. Kunten are annotations, such as kana and marks, added to Classical Chinese texts so that they can be read in Japanese. Texts that carry Kunten are called Kunten materials. In these materials, small dots and marks are written around the Kanji characters; they indicate meanings, readings, and inflected forms (grammatical information), and so help readers understand the text. The Kunten database supports the analysis of language change and of historical differences in how Kunten were used. The first version of the database was released in 2019. It was designed for Kunten specialists, so its search and display methods required specialized knowledge and skills, such as specifying the shape and color of a Kunten mark. The improved version adds a search method that combines Kanji and kana, for example “明ニシ” (akiraka + ni + shi). Users can thus search for Kanji and Kunten together as they would appear in ordinary written Japanese, which reveals co-occurrences of nouns and particles as well as the inflected forms of verbs. We also created indexes of major words and Kunten. For display, the previous version only linked to the image of an entire page, whereas the improved version shows cropped character images, obtained via IIIF, in the search results. The search results are presented as a table listing the location in the text, the meaning, and the Kunten, so the character images and the Kunten can be compared directly. To crop the character images, we developed a method that automatically detects character positions in the page images using the transcribed text. In the future, we plan to apply this approach to other editions of the book and to other Classical Chinese texts with Kunten.
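To illustrate what a Kanji + kana query such as “明ニシ” involves, here is a minimal sketch. It assumes a hypothetical record layout in which each annotated character is stored together with the kana written beside it, in reading order. The query is split into its leading Kanji and trailing kana, and a record matches when its character equals that Kanji and its kana begin with that kana sequence (prefix matching is itself an assumption). The `Token` fields and the sample records are invented for illustration and do not describe the database's actual schema.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Token:
    """One annotated character (hypothetical schema): its location, the
    Kanji itself, and the kana Kunten written beside it, in reading order."""
    page: int
    line: int
    kanji: str
    kana: str


def is_kana(ch: str) -> bool:
    # Hiragana and Katakana Unicode blocks (U+3040 to U+30FF).
    return "\u3040" <= ch <= "\u30ff"


def parse_query(query: str) -> tuple[str, str]:
    """Split a query such as "明ニシ" into its leading Kanji and trailing kana."""
    i = 0
    while i < len(query) and not is_kana(query[i]):
        i += 1
    return query[:i], query[i:]


def search(tokens: list[Token], query: str) -> list[Token]:
    """Return tokens whose Kanji equals the query Kanji and whose kana
    begin with the query kana (prefix match is an assumption)."""
    kanji, kana = parse_query(query)
    return [t for t in tokens if t.kanji == kanji and t.kana.startswith(kana)]


# Invented sample tokens, for illustration only.
SAMPLE = [
    Token(1, 2, "明", "ニシ"),
    Token(1, 5, "明", "ナリ"),
    Token(3, 1, "明", "ニス"),
    Token(4, 7, "克", "ク"),
]

if __name__ == "__main__":
    for t in search(SAMPLE, "明ニ"):
        print(t.page, t.line, t.kanji + t.kana)
```

With these sample tokens, the query “明ニ” returns the records at page 1, line 2 and page 3, line 1 (明ニシ and 明ニス), while “明ニシ” narrows the result to the first. Queries whose kana are spread over several characters would need a richer record layout than this sketch assumes.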
