“Where AI meets historical documents”: Automatically transcribing historical prints with OCR and HTR

By Janna Katharina Müller

In my last blog post, I wrote about the source corpus for my master’s thesis – the journal “Monatliche Correspondenz zur Beförderung der Erd- und Himmelskunde” (MC) – and my plan to subject it to digital analysis. The main thing I needed for my analysis was a digital text. Thanks to the Thuringian University and State Library in Jena, the scanned originals of the MC are available online, but only as non-machine-readable PDF files. The first step towards usable data was thus to generate a text from image files.

However, it was essential to consider the typeface used in the MC. As the example page below shows, the MC was printed in a typeface that uses, among other things, the long s (“ſ”), an archaic form of the lower-case letter s. Unlike most German publications of the early 19th century, though, the MC is not set in a blackletter typeface such as Fraktur but in an Antiqua typeface with serifs and rounded arcs. Antiqua was used primarily for Latin, Italian, and French texts and was rather uncommon in German prints.

Figure 1: Example page from the MC (Monatliche Correspondenz zur Beförderung der Erd- und Himmelskunde, Juni-Heft (1801): 556)

OCR with Tesseract

One of the best-known ways to recognize text is Optical Character Recognition (OCR), the electronic or mechanical conversion of images into machine-encoded text based on the recognition of individual characters.
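
To give an idea of what such a conversion can look like in practice, here is a minimal sketch that calls the Tesseract command-line program from Python. It is not the workflow used for the MC, only an illustration under a few assumptions: Tesseract is installed and available on the PATH, the German language data (“deu”) is installed, and the page has already been exported from the PDF as an image file. The file name and the two function names are hypothetical. The sketch makes no claim about how well a standard model copes with the long s.

```python
import subprocess
from pathlib import Path


def build_tesseract_command(image_path, language="deu", page_segmentation_mode=3):
    """Assemble the argument list for one Tesseract CLI call.

    Passing "stdout" as the output base makes Tesseract print the
    recognized text instead of writing it to a file.
    """
    return [
        "tesseract",
        str(image_path),
        "stdout",
        "-l", language,                       # assumed: German language data is installed
        "--psm", str(page_segmentation_mode),  # 3 = fully automatic page segmentation
    ]


def transcribe_page(image_path, language="deu"):
    """Run Tesseract on one page image and return the recognized text."""
    command = build_tesseract_command(image_path, language)
    result = subprocess.run(
        command,
        capture_output=True,
        text=True,
        encoding="utf-8",
        check=True,  # raise an error if Tesseract exits with a failure code
    )
    return result.stdout


if __name__ == "__main__":
    # Hypothetical file name; replace it with a real page image.
    page = Path("mc_1801_page_556.png")
    if page.exists():
        print(transcribe_page(page))
```

Keeping the command assembly separate from the actual call makes it easy to try other language models or segmentation modes on the same page and compare the results.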

[...]

Source: https://href.hypotheses.org/2105


The very first monthly astronomical journal in Germany: The Celestial Police and their structures of communication

By Janna Katharina Müller

Editorial note: Janna Katharina Müller studies the history and theory of science and technology [“Theorie und Geschichte der Wissenschaft und Technik”] at the Technische Universität Berlin. She is currently working on her Master’s thesis, which focuses on how a concept of the newly discovered asteroids between Mars and Jupiter emerged and took shape in the first years after their discovery, 1801–1813. The title of her thesis is: “Von Planeto-Cometen und planetarischen Fragmenten. Die Himmels-Polizey und Asteroidenforschung im frühen 19. Jahrhundert.” In spring/summer 2021, she completed a remote internship at the GHI Washington, DC.

A while ago, I was preparing for an oral exam in one of my classes about the history of science during the Enlightenment and the early 19th century. We had to focus on one specific discipline or time period and give a short presentation about it.

[...]

Source: https://href.hypotheses.org/1999
