news
2025.04.04
[Research News] Launch of “Humanitext OCR”: An OCR System Utilizing Multimodal Large Language Models

The “Humanitext” research project, led by Associate Professor Naoya Iwata of our Center in collaboration with Associate Professor Ikko Tanaka (J. F. Oberlin University) and Assistant Professor Jun Ogawa (The University of Tokyo), has launched “Humanitext OCR,” a user-friendly system that leverages the image and text recognition capabilities of state-of-the-art multimodal large language models (LLMs). The system is designed to perform accurate and flexible optical character recognition (OCR) on complex documents without requiring advanced programming knowledge.
Conventional OCR software often struggles to reliably extract the desired information from documents with complicated layouts or extensive footnotes and annotations. Humanitext OCR addresses these challenges by drawing on the advanced image-comprehension abilities of LLMs. Users simply give instructions in natural language (“prompts”), for example asking the system to remove unnecessary sections or to merge text spread across multiple pages, so that a wide range of document-processing tasks can be carried out easily. The system also includes an automated “calibration” step, in which the same LLM detects and corrects typographical errors and removes unwanted elements from the OCR results.
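To make the two-stage workflow described above more concrete (a prompt-guided OCR pass followed by a “calibration” pass using the same model), here is a minimal sketch. It assumes a generic LLM client exposed as a plain function that takes a text prompt plus an optional page image and returns text. All names here (`LLMCall`, `ocr_pages`, `calibrate`, `run_pipeline`, the prompt templates, and the stub model `fake_llm`) are hypothetical illustrations and do not reflect Humanitext OCR's actual implementation, prompts, or API.

```python
from typing import Callable, List, Optional

# Hypothetical interface: (prompt, optional page image bytes) -> model output text.
# A real system would wrap a multimodal LLM API here; this is an assumption.
LLMCall = Callable[[str, Optional[bytes]], str]

# Illustrative prompt templates (assumed, not the project's real prompts).
OCR_PROMPT = "Transcribe the text in this page image. {instruction}"
CALIBRATION_PROMPT = (
    "Correct typographical errors and remove unwanted elements "
    "(e.g. running headers, page numbers) from the following OCR output. "
    "Return only the corrected text.\n\n{text}"
)


def ocr_pages(pages: List[bytes], instruction: str, llm: LLMCall) -> List[str]:
    """First pass: transcribe each page image, guided by the user's instruction."""
    prompt = OCR_PROMPT.format(instruction=instruction)
    return [llm(prompt, page) for page in pages]


def calibrate(text: str, llm: LLMCall) -> str:
    """Second pass: the same model cleans up raw OCR text (text-only call)."""
    return llm(CALIBRATION_PROMPT.format(text=text), None)


def run_pipeline(
    pages: List[bytes], instruction: str, llm: LLMCall, merge: bool = True
) -> List[str]:
    """OCR every page, optionally merge the pages, then calibrate the result."""
    raw = ocr_pages(pages, instruction, llm)
    if merge:
        # Join all pages into one text before the calibration pass.
        return [calibrate("\n".join(raw), llm)]
    return [calibrate(text, llm) for text in raw]


def fake_llm(prompt: str, image: Optional[bytes]) -> str:
    """Stub model for demonstration only (no real LLM is called)."""
    if image is not None:
        # Pretend OCR: return the "image" contents with a spurious running header.
        return "RUNNING HEADER\n" + image.decode("utf-8")
    # Pretend calibration: drop the running-header lines from the OCR text.
    text = prompt.split("\n\n", 1)[1]
    return "\n".join(line for line in text.splitlines() if line != "RUNNING HEADER")


if __name__ == "__main__":
    pages = [b"First page text.", b"Second page text."]
    print(run_pipeline(pages, "Ignore footnotes.", fake_llm))
```

Keeping the model behind a single callable makes the two passes easy to test with a stub, as above, and lets the same model serve both steps, mirroring the article's description of calibration using the same LLM.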
Humanitext OCR was developed with particular emphasis on the flexibility that scholarly research requires. It is expected to achieve high accuracy on multilingual texts, scholarly articles, and classical literature containing specialized vocabulary. Planned improvements include agent-based enhancements and advanced prompting techniques aimed at further increasing OCR accuracy. We also plan to strengthen integration with research databases, promoting the system's wider use as part of the research infrastructure for digital humanities.
▼ Access the Humanitext OCR system here:
For inquiries regarding this initiative, please contact:
iwata.naoya.y7[at]f.mail.nagoya-u.ac.jp
(Replace “[at]” with “@”)