Applicability of OCR full-text data of digitized materials held by the National Diet Library: From the search results for Okinawa-related keywords in the NDL Lab's experimental services.
In FY2021, the National Diet Library, Japan (NDL) outsourced two OCR-related projects. One was the OCR text conversion of approximately 2.47 million digitized materials (223 million images) provided by the NDL, and the other was the research and development of an OCR processing program (NDLOCR) that can be released as open source. Of these deliverables, as of November 2022, the text of 280,000 books whose copyright protection period has expired has been made available through the NDL Lab's two experimental services, "Next Digital Library" and "NDL Ngram Viewer".
In addition, in 2022, the NDL Rarebooks OCR was developed in-house to convert classical materials to OCR text, drawing on the knowledge gained from developing NDLOCR. As of November 2022, 60,000 classical materials held by the NDL are searchable in the Next Digital Library.
In this presentation, we examine the possibility of using this text data for local historical research, based on the results of searching for Okinawa-related keywords in the two experimental services.