Research projects on early Europe can benefit from exploring a well-informed research question computationally in an appropriate and representative collection of digital books. For the findings of a computational analysis to be representative, the collection of books analyzed must itself be representative in scope. Whether researchers assemble a collection themselves or use a pre-existing one, they must be able to describe its scope in relation to the total output of printed books for the relevant time period, region, language, or subject.
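One way to describe the scope of a collection is to compare it, decade by decade, with the editions recorded in a bibliography for the same period. The following is a minimal sketch of that comparison, not a prescribed method. The two lists of years are hypothetical placeholder data, and the function names `decade` and `coverage_by_decade` are introduced here for illustration only.

```python
from collections import Counter


def decade(year: int) -> int:
    """Return the first year of the decade a year falls in (1478 -> 1470)."""
    return (year // 10) * 10


def coverage_by_decade(collection_years, bibliography_years):
    """Share of recorded editions per decade that are present in the collection."""
    have = Counter(decade(y) for y in collection_years)
    total = Counter(decade(y) for y in bibliography_years)
    return {d: have.get(d, 0) / n for d, n in sorted(total.items())}


# Hypothetical data: publication years of editions recorded in a bibliography,
# and of the subset available in a digital collection.
bibliography_years = [1472, 1475, 1478, 1481, 1483, 1489, 1494, 1499]
collection_years = [1472, 1481, 1483, 1499]

coverage = coverage_by_decade(collection_years, bibliography_years)
for d, share in coverage.items():
    print(f"{d}s: {share:.0%} of recorded editions are in the collection")
```

The same comparison could be made by region, language, or subject, provided both the collection and the bibliography record that information consistently.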
Several concepts in bibliography are important for this work, including incunabula and the text corpus. They are defined below.
Incunabula: From the Latin word cunae, meaning "cradle." Books, pamphlets, calendars, and indulgences printed from movable type in Europe prior to 1501, during the earliest years (infancy) of printing. The earliest example is the Gutenberg Bible, believed to have been printed before 1456 in Mainz, Germany, by Johann Gutenberg, who is credited with the invention of modern printing. Other examples include a work of 1458 by Johann Fust and Peter Schöffer (Columbia University Libraries) and the Hypnerotomachia Poliphili by Francesco Colonna, printed by Aldus Manutius of Venice in 1499 (Royal Library of Denmark). See also the online exhibition from Glasgow University Library. Like medieval manuscripts, incunabula may contain hand-decorated initial letters and borders (see, for example, the copy of the first edition printed in France made available by the Bibliothèque Nationale de France). For more information about incunabula, see the resources from the University of Wisconsin-Milwaukee Libraries, the National Diet Library, Japan, and UC Berkeley. The Incunabula Short Title Catalogue (ISTC), developed by the British Library, is now available as a searchable online database. Singular: incunabulum. Synonymous with cradle book and incunable. Compare with: xylograph.
Incunabula, printed from movable type prior to 1501, often do not follow later conventions for title pages, author statements, publication dates, and similar information. For this reason they are cataloged in separate bibliographies, where they are described on the basis of watermarks, quality of paper, likely printer, likely city, and so on. As a result, incunabula are often not included in retrospective bibliographies.
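Because incunabula are usually described in separate bibliographies, it can help to flag them in a collection before comparing that collection with a retrospective bibliography. The sketch below applies the pre-1501 cutoff from the definition above. The records, the constant `INCUNABULA_CUTOFF`, and the function `classify` are hypothetical and introduced only for illustration. Undated items are kept in their own group, since early printed books frequently lack a stated publication date.

```python
from typing import Optional

# Incunabula are defined above as printed prior to 1501.
INCUNABULA_CUTOFF = 1501


def classify(year: Optional[int]) -> str:
    """Label a record by its (possibly missing) year of publication."""
    if year is None:
        # No stated date: needs checking against a specialist bibliography.
        return "undated"
    return "incunabulum" if year < INCUNABULA_CUTOFF else "later imprint"


# Hypothetical catalog records, not taken from any real collection.
records = [
    {"title": "Hypothetical psalter", "year": 1458},
    {"title": "Hypothetical sermon collection", "year": None},
    {"title": "Hypothetical herbal", "year": 1501},
]

labels = {record["title"]: classify(record["year"]) for record in records}
for title, label in labels.items():
    print(f"{title}: {label}")
```

Items labeled "undated" are the ones most likely to require description from physical evidence such as watermarks or typefaces rather than from the catalog record alone.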
A succinct definition from Niladri S. Dash and S. Arulmozi, "Definition of Corpus," in History, Features, and Typology of Language Corpora (Singapore: Springer Singapore, 2018), 1-15, demonstrates that a text corpus is not simply the plain text behind a scan. Building a corpus implies a series of decisions that can change research outcomes. According to this definition, a corpus is:
Compatible to Computer
Operational in research and application
Representative of the source language
Processed by both man and machine
Unlimited in amount of language data
Systematic in the formation and text representation
The definition from The Oxford Dictionary of English Grammar (2nd ed., 2014) also indicates the methodological approaches to working with a corpus:
corpus (Plural corpuses, corpora.)
A collection of authentic spoken and/or written texts.
The study of the English language has been transformed in recent decades by the collection of large quantities of authentic texts in corpora on which grammatical, pragmatic, lexicographic, historical, etc. analyses can be based.
•• corpus-based: Research that is corpus-based is deductive in outlook in that it uses (annotated) corpora to test hypotheses about language.
•• corpus-driven: Research that is corpus-driven is inductive in outlook and takes unannotated corpus data as the starting point for investigation.
•• corpus linguistics: a methodological approach to the study of language by means of corpora, now usually in computerized form.
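The difference between the corpus-driven and corpus-based approaches can be illustrated with a toy example. This is only a sketch under stated assumptions: the two sentences, the part-of-speech tags, and the hypothesis being tested are all invented for illustration and do not come from a real corpus or tagger.

```python
from collections import Counter

# Hypothetical toy corpus of two unannotated sentences.
raw_texts = [
    "the printer set the type",
    "the type was cast in metal",
]

# Corpus-driven (inductive): start from unannotated data and see what emerges,
# here a simple word frequency list.
tokens = [word for text in raw_texts for word in text.split()]
frequencies = Counter(tokens)
most_common = frequencies.most_common(1)

# Corpus-based (deductive): use an annotated corpus to test a hypothesis.
# Hypothesis (invented): "type" occurs more often as a noun than as a verb.
# The (word, tag) pairs below are hand-assigned for illustration.
annotated = [
    ("the", "DET"), ("printer", "NOUN"), ("set", "VERB"),
    ("the", "DET"), ("type", "NOUN"),
    ("the", "DET"), ("type", "NOUN"), ("was", "AUX"),
    ("cast", "VERB"), ("in", "ADP"), ("metal", "NOUN"),
]
tag_counts = Counter(tag for word, tag in annotated if word == "type")
hypothesis_supported = tag_counts["NOUN"] > tag_counts["VERB"]

print("Most frequent word:", most_common)
print("Hypothesis supported:", hypothesis_supported)
```

In the first half the data suggest what to look at next. In the second half the question is fixed in advance and the annotation makes it answerable.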