Thought Clusters in Early Greek Oral Poetry

by Dr. Cora Angier Sowa and Dr. John F. Sowa

Demeter, Persephone, and Triptolemus

"Quickly, Demeter let the corn grow up from the fertile fields,
and the broad earth was weighed down with leaves and flowers.
But she, going to the law-giving kings,
showed to them--to Triptolemus and to Diocles, driver of horses,
to strong Eumolpus and to Keleus, leader of his people--
the rituals of her worship, and instituted secret rites for all of them."
(Homeric Hymn to Demeter, tr. C. A. Sowa)

Computer-aided work on themes in the Homeric Hymn to Demeter uncovered a "supertheme" related to the founding of the Eleusinian Mysteries. (Illustration: relief depicting Demeter, Persephone, and Triptolemus, found at Eleusis, illustrated in Seyffert's A Dictionary of Classical Antiquities, revised by Nettleship and Sandys, 1899.)


A program based on the CLUMP program described in this essay is being added to the MINERVA suite of programs included with the Loom of Minerva self-study CD described on the home page of this Web site. It will thus be made available to scholars and students who want to try it out on their own data.

Summary of the essay:

Ancient Greek epic poems such as Homer's Iliad and Odyssey, Hesiod's Theogony and Works and Days, and the so-called Homeric Hymns were composed not in writing but "orally," by a bard who composed before an audience, much in the manner of a jazz musician, using traditional vocabulary and plot elements as an aid to composition. As described by Milman Parry (who studied oral poets in modern Yugoslavia) and James Notopoulos (who recorded surviving Greek epic singers), the singer recomposed the song at each performance; the song could be longer or shorter, ornamented or plain, according to the skill of the singer and the interest of the audience. The vocabulary consisted largely of "formulas," like "fleet-footed Achilles" or "then, when rosy-fingered Dawn appeared." Plots were built from traditional themes, like The Journey (or Wandering), The Marriage of the Fertility Goddess, The Young God Consolidates His Power, Invention by the Trickster, and the Epiphany of a God. These themes consist of modular plot elements (for example, when the Hero goes on a Journey, he meets the Goddess Across the Water, he meets the Two Helpers, he discovers the Secret of Life but cannot bring it back with him, and a Substitute dies). The traditional themes occur across cultures, and are even found in our own movies (see my companion piece "Ancient Myths in Modern Movies"). Yet in the hands of a skillful singer, the story always seems fresh and original, and one is often unaware of the reuse of traditional forms. Part of the reason is the use of vocabulary (verbal or visual) pertaining only to that poem (or movie) and to no other.

This article describes a project to use a computer to study the associations of words within the Homeric Hymns, using mathematical formulas derived from cluster analysis, a technique used in information retrieval and propaganda analysis; the aim was to examine the process of composition in oral traditional poetry, exemplified by the Hymns. Results were especially striking for the Hymn to Demeter, which tells of the kidnap of Persephone, daughter of Demeter, by the King of the Underworld. In a thematically complex poem, whose major themes include the Journey, the Marriage of the Fertility Goddess, the Resurrection of the Dying God, and the Epiphany of a God to a Mortal, we found, with the help of our Clump-Finder program, a "supertheme" of words referring to the sacred rites of the Mysteries of Eleusis, whose founding by Demeter forms the climax of the poem.

This paper was originally published in Computers and the Humanities, Vol. 8, Pergamon Press, 1974, pp. 131-146. Material from this article was later used in Traditional Themes and the Homeric Hymns by Cora Angier Sowa (Chicago: Bolchazy-Carducci, 1984); the book can be ordered from Bolchazy-Carducci Publishers, Wauconda, IL. Other parts of this research are being used in The Loom of Minerva: An Introduction to Computer Projects for the Literary Scholar by Cora Angier Sowa.

The article is in three sections: the first (comprising "Themes and the Homeric Hymns," "Content Analysis and Cluster Analysis," and "The Machine-readable Dictionary of Greek") and the last ("The Results") were written by Cora Sowa; the middle section ("The Clump Program"), describing the mathematical formula used to establish connections between clumps, is by John Sowa, who also developed the program for applying the mathematical formula. Greek words and text quotations appeared in the Greek alphabet in the original publication; they are represented by Roman letters in the Web version. The pictorialization of the clumps in Figure 1 was in the original article; illustrations in Examples 1 and 2, taken from Chapter 6 of The Loom of Minerva, have been added to the reissue to show actual data and results. Some typographical and other errors have also been corrected.

Thought Clusters in Early Greek Oral Poetry


This paper is a small part of a large project.1 The major undertaking is the ongoing research by one of the authors, C. A. Sowa, into the nature of the traditional themes of oral poetry as exemplified in the ancient Greek poems known as the Homeric Hymns. [This research was published as Traditional Themes and the Homeric Hymns, Bolchazy-Carducci, 1984.] This research has had many facets: identification of the themes themselves and their constituent elements; the social and psychological interpretation of the themes as myth; the study of parallels in other ancient Greek oral poetry, especially that of Homer and Hesiod, and in other literatures of the ancient Eastern Mediterranean; and finally an examination of how the themes were manifested in the Hymns, a particular group of poems from a particular era of Greek literature. In some of this work, a computer was used. Any given manifestation of a given theme is always realized in the vocabulary of its place and time and of the particular poet, a relationship that is particularly amenable to computer-aided study. The present paper is a report of the way a series of programs developed since 1969 has been used. Since the method and the computer program have many potential applications in literature beyond the specific use to which we have put them, they are described in detail. The results given in this paper are only those from the computerized part of the study, but they tended at all times to corroborate the insights gained by other methods.

The programs represent an application to literary material of computer methods first used for information retrieval and propaganda analysis. The purpose of information retrieval is to find documents on a particular topic by matching keywords assumed to belong to that topic against keywords in various documents. The methods of propaganda analysis were designed to study the changing bias exhibited by the writers of the propaganda by finding what words are commonly conjoined with what others. Our interest was to see how the traditional themes used by ancient Greek oral poets changed in appearance according to the way they were used by each poet. The method in all of these procedures is to determine a particular author's emphasis by a quantitative study of his vocabulary. It differs, however, from a mere frequency count in that one is not only looking at the frequency of individual words, but at the groups or clumps of words that frequently occur together. In the case of an oral poet, the clumps of words have an additional psychological interest. Since he composes the poetry "live," before an audience, the clumps provide insight into the processes of word association going on in his mind at the moment of composition.

Homer's Iliad and Odyssey represent one branch of the oral epic tradition of ancient Greece. The poems of Hesiod belong to another. The Hymns seem to represent a third stream of this tradition; they share the same formulaic diction and the same mythological themes as the works of the two great poets, and it is most likely that they, too, were composed orally.2 Although this corpus of poems in dactylic hexameter has been associated with the name of Homer, it is doubtful that any of them were actually composed by Homer himself. In their present form, they probably date from some time between 700 and 500 B.C. There are 33 poems in all, varying in length from 3 to 580 lines (or "verses," in Classical terminology), each dedicated to one god or to a group of gods, such as the Muses. The longest and most important are the second, to Demeter; the third, to Apollo; the fourth, to Hermes, and the fifth, to Aphrodite.

Attempts have been made to prove that Hymn III to Apollo is really two poems, a "Delian" part and a "Pythian" part, combined by some later poet. The shortest Hymns are in the nature of brief ejaculations or addresses to the gods, but the longer ones tell of various incidents in the lives of the gods.

The Hymns recommend themselves to study for several reasons. They have been rather neglected in comparison to the poems of Homer and Hesiod, and they thus contain much unexplored territory. They form a small corpus in themselves, sharing common features of style and diction, yet they can be separated into individual poems of various lengths, all of which are short but complete in themselves. This advantage shows up in the work done with computers, where it is advisable to test the programs on small amounts of material at first and work up to larger units. One avoids some of the problems inherent in trying out a theory on, say, Book One of the Iliad, which is of convenient length, but is not a whole poem in itself. Yet since the Hymns show the same basic features as the larger works, a program that gives significant results for the smaller poems may eventually be applied to the longer ones.

The chief structural components of oral poetry are the formula, the type-scene, and the theme. Of these, the formula is the smallest unit, a ready-made combination of words of fixed metrical shape and fixed meaning (or, less rigidly defined, of fixed syntax or sound pattern). Such is the formula dios Achilleus "godlike Achilles," or kat' ommata kala balousa "casting down her beautiful eyes." The type-scenes are short, standardized descriptions of various actions which are themselves ritualized and formulaic, such as Sacrifice and Banquet, Chariot Journey, or Assembly.3 The largest element is the theme, which supplies the entire plot for a story or for a portion of it: examples are the Succession of the Gods, the Withdrawal and Return of the Hero, and the Marriage of the Fertility Goddess. These themes are not identifiable by exact verbal identity but by a recurring sequence of elements that is always found in them. They differ from both the formulae and the type-scenes in that they are not culture-bound, being found in the literature of many places and times, although the exact form they take will be determined by the particular culture. The distinctions between the various types of element are not, however, as fixed as they might seem; for example, the type-scene of Maidens Dancing and Picking Flowers may become the theme of Maiden Abducted While Dancing and Picking Flowers when the entrance of a threatening male figure moves the action forward.

It is in the themes that our chief interest lies. The identical theme, containing the same set of basic elements, can appear completely different from one poem to another, and can seem so completely appropriate to its context that it is hard to imagine it anywhere else. The Iliad and the Hymn to Demeter are both examples of the Hero's Withdrawal and Return; the Hymn to Demeter and the Hymn to Aphrodite are both examples of the Marriage of the Fertility Goddess; Hesiod's Theogony, the Hymn to Apollo, and the Hymn to Hermes all exemplify the Young Hero's Consolidation of His Power; the Odyssey, the Hymn to Demeter, and the Hymn to Apollo incorporate the theme of Wandering; and all the major epics contain examples of the theme of Epiphany. Yet they seem so different that the reader or listener does not readily apprehend that they are the same. This phenomenon is, of course, partly due to the fact that the dramatis personae are different, and that these themes are combined with or modulated into different themes in the course of their development.4 But another important factor that makes the same theme so integral a part of many different poems seems to be the presence, within a particular example of the theme, of certain keywords which belong to the individual poem or poet rather than to the theme. It is this factor which we proposed to study.


The methods we used fall under the general head "content analysis," a procedure not entirely new to Classics. Thomas F. Carney, a pioneer in this area, studied the keywords used by Cicero, Valerius Maximus, Plutarch, and John the Lydian in their portrayal of the statesman Marius, and showed how writers of differing periods and backgrounds emphasized different aspects of the character and career of the same historical figure.5 Our use of content analysis is marked by two features. First, we used a computer to aid us in gathering our information; second, we used a subtype of content analysis called "clump analysis" or "cluster analysis," a method which we think avoids some of the pitfalls of other computer programs designed to do content analysis, such as the General Inquirer.

In the usual content analysis program, the researcher himself makes semantic categories and assigns words to them. He might, for example, have one category for "home," one for "power," one for "affective states," and so on. He then pre-edits the text by flagging each significant word with the mark of the relevant category. This system requires the researcher to spend much time editing the text; what is worse, it introduces unnecessary arbitrary elements, since it builds the researcher's preconceptions of what is important into the presentation of the data. Our method was to have the machine itself make the categories. Our program does not even create semantic categories in the sense of grouping together synonyms or near-synonyms. Rather, it puts in groups words that occur together in a contextual or conceptual relationship. In fact, it often makes distinctions between words which seem to be synonymous, but which are actually used in different ways.

Greek is a highly inflected language: the forms of nouns, adjectives, and verbs change according to their functions in the sentence. Not only are there endings to indicate case, tense, mood, voice, person, and number; Greek also possesses prefixes and infixes and forms that change through ablaut. In poetry, we also find vowels lengthened and consonants doubled for the sake of the meter. If a computer is to be used to gather instances of a word, it must be able to connect forms of the same word; for example, it must be able to tell that lambanô "I take" belongs to the same verb as ellabon "I took." Greek word formation might be called "transparent": vocabulary developed by the combination of simple stems into compounds whose derivations were obvious to a native speaker. English, by contrast, is an "opaque" language where borrowing from foreign languages is more common than compounding from native roots; connections like "shirt" and "skirt," "chamber" and "camera," or "triple" and "three-ply" are apparent only to linguists. The Greeks themselves set great store by etymologies (the premier example is Plato's Cratylus) and felt that if one knew the meaning of a thing's name, one could understand the meaning of the thing itself. This was also true of the names of persons. Where the historical derivation of a name had been lost, especially in the case of non-Indo-European names, the Greeks tended to supply a derivation from some Greek word that sounded like it.

In view of these matters, it seemed better to have our program determine not what words occurred most frequently together but what stems occurred together. Thus, for instance, the machine would group together not only forms of the verb gignomai "to be born," and its compounds, such as ekgignomai and progignomai but also genos "race," goneus "begetter," neognos "newborn," since even a visual inspection reveals that patterns of association are based on these stems. It should also mark connections between words that the poet thought were connected, such as the derivation of the place-name Puthô in the Hymn to Apollo from the rotting (puthô, puse) of the dragon which Apollo slew on that spot. Our goal was to have the machine find groups of stems that were habitually used together by the poet. But between us and that goal lay several major steps. After producing the machine-readable text itself, we had to make a concordance, prepare a machine-readable dictionary of Homeric Greek, count frequencies of stems, find the clumps themselves, and, finally, analyze the results. The concordance, dictionary, and frequency list were themselves useful byproducts of the project, which can be used in later projects.6


For the machine to sort on stems rather than on words, it must have some way of determining the stem for every word-form in the text. One way to make the machine recognize stems would be to use a morphological analysis program that cuts off endings, prefixes, and infixes. No such program has, however, as of the time of our research, been perfected to the point where it could be used to provide analysis in conjunction with our program. Furthermore, such programs as have been devised lack sufficient power to connect the many linguistically related words that play a part in the poet's verbal associations. The alternative, then, was to build a dictionary giving the stem for each word-form occurring in the text, in which the machine would look up every word as it came to it, to ascertain its stem form. As we constructed it, the dictionary contains not only the stem or stems for each word, but the "canonical" form, which is the form under which one would look up the word in an ordinary dictionary (sometimes called the "lemma"). It also includes room for the addition of grammatical or other information if we should ever want to fill it in. This additional information was not needed for the clump program, but it extends the range of potential uses for the dictionary independent of the clump program. In this form it is available for future research by ourselves or by other scholars.
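
The lookup step described above can be pictured with a short Python sketch. It is only an illustration under stated assumptions, not the original program: the names DICTIONARY and stems_for_line are hypothetical, the three entries are borrowed from the dictionary excerpt shown in Example 1, and the decision to skip word forms that are missing from the dictionary is a simplification made for the sketch.

```python
# Hypothetical in-memory form of the dictionary:
# word form as it occurs in the text -> (canonical form, stems).
# The three entries are copied from the excerpt in Example 1.
DICTIONARY = {
    "EURUOPA": ("EURUOPA", ("EURU", "OP", "EIPON")),
    "EURWN": ("EURISKW", ("EUR",)),
    "EUSKOPOS": ("EUSKOPOS", ("EU", "SKEP")),
}


def stems_for_line(word_forms):
    """Return the stems of every word form in one line of text, in order.

    Word forms not found in the dictionary are skipped (an assumption
    of this sketch; the real dictionary covered every form in the text).
    """
    stems = []
    for form in word_forms:
        entry = DICTIONARY.get(form)
        if entry is not None:
            _lemma, form_stems = entry
            stems.extend(form_stems)
    return stems


print(stems_for_line(["EURUOPA", "EURWN"]))
```

A table of this kind replaces morphological analysis with a direct lookup, so that forms as different as EURWN and EURISKW can be tied to the same stem by human judgment.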

The first step in the preparation of the dictionary was to produce a machine-readable list of all the words in the text, such that other information could be added to it in machine-readable form. For this we used a normal concordance program, but with the added instruction that the machine was to punch one IBM card for each separate word form. [Note: for this project, both text and dictionary were on punched cards. Only capital letters were used, with no accents or breathing marks. Equivalent Roman letters were used to represent the Greek, where they are identical; for special Greek characters, we used H for eta, J for iota subscript, Q for theta, C for chi, Y for psi, and W for omega. Diaeresis was represented by *. The dictionary has since been transferred to disk.] Up to sixteen columns on the right side of each card were allowed for each form (the rare form that exceeded sixteen characters was truncated). [Note: on the IBM punched card, each letter or other character occupied one column.] Other information was then manually keypunched onto the card as follows: The first sixteen columns on the left were allocated to the canonical form or lemma. Assignment of the canonical form was independent of whether word-form and canonical form were from the same linguistic stem. Oisô "I shall carry" is, for example, referred to pherô "I carry." Three fields of eight columns each were allowed for up to three stems per word form. The remaining twenty-four columns were left blank and can at some future time be filled in with grammatical or other information.7 The most important work of building the dictionary was the filling in of the stems. In making the primary decision as to what a stem was to be, we considered two essential questions: 1) What are the common elements that cause associations in the poet's mind? 2) Which of these common elements shall we choose to sort on, i.e., which are most important for our purposes? 
Then, in creating stem forms for the dictionary, we tried to meet three criteria: linguistic accuracy, consistency, and comprehensibility.8 The "stems" given in our dictionary are not, however, either stems in the strict sense of basic word-parts to which endings can be added or Indo-European roots. Such consistency, while possibly to be desired, did not turn out to be feasible. Where possible, we used the Indo-European root; where the root is impossible to ascertain or does not occur in its unchanged form in Greek, we used the most easily recognizable form of the Greek stem or simply the most basic Greek form from among a group of related words.
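
The card layout described above lends itself to a fixed-width record. The following Python sketch shows one way such an 80-column card image might be read. The exact column order is an assumption: the text states only that the canonical form occupied the first sixteen columns, that the word form occupied sixteen columns on the right, that three eight-column fields held the stems, and that twenty-four columns were left blank. The sketch places the stem fields in columns 17-40 and the blank area in columns 41-64; the names Entry and parse_card are hypothetical.

```python
from typing import NamedTuple, Tuple


class Entry(NamedTuple):
    lemma: str               # canonical (dictionary) form
    stems: Tuple[str, ...]   # up to three stems
    form: str                # word form as it occurs in the text


def parse_card(card: str) -> Entry:
    """Split one 80-column card image into its fields.

    Assumed layout: columns 1-16 lemma, 17-40 three stem fields of
    eight columns each, 41-64 blank, 65-80 the word form.
    """
    card = card.ljust(80)[:80]          # pad or trim to exactly 80 columns
    lemma = card[0:16].strip()
    stem_fields = (card[16:24], card[24:32], card[32:40])
    stems = tuple(s.strip() for s in stem_fields if s.strip())
    form = card[64:80].strip()
    return Entry(lemma, stems, form)


# A card image built from one row of the dictionary excerpt in Example 1.
card = (
    "EURUOPA".ljust(16)
    + "EURU".ljust(8) + "OP".ljust(8) + "EIPON".ljust(8)
    + " " * 24
    + "EURUOPA".ljust(16)
)
print(parse_card(card))
```

Because every field has a fixed position, unused stem fields are simply blank, and the reserved blank columns can later carry grammatical information without disturbing existing records.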

Since we were interested only in content words, we did not assign stems to the function words, although they were included in the dictionary and given canonical forms. Thus there were no stems for the verb "to be" nor for particles, pronouns, or conjunctions. Prepositions were generally not given stems, but some, having originally been adverbs, occasionally keep their adverbial force, especially in composition, and these were sometimes assigned stems. For example, peri as a preposition meaning "around" was not given a stem, but it was given a stem when used as an intensifier meaning "very." The inclusion or exclusion of other adverbial prefixes was judged on an individual basis. A-privative, for instance, meaning "not," was not counted as a separate stem, but aga- or êga-, meaning "much" or "very," was listed separately as a stem.

Since we could already see that the poet uses the separate stems of compounds in his pattern of association, multiple stems were given to true compounds, as opposed to simple words with adverbial prefixes. In Hymn XIX to Pan, for example, the poet calls the nymphs ligumolpoi ("piercingly singing") in verse 19, then says in verse 21 that they "sing" (melpontai) and in verse 24 that Pan delights in their piercing melodies (ligurêisin... molpais). Provision for three stems was sufficient, since there were never more than three stems in one compound.

For many categories of word we evolved a fairly straightforward set of rules for the assignment of stems:

If both a verb and a noun come from a stem, we preferred the vocalization of the verb, which is usually the e-grade. Thus we used MELP as in melpomai "I sing" instead of MOLP as in molpê "song." If a root or stem proper was not available, the first principal part of the verb, rather than the noun, was used. If, however, the verb is a secondary formation from a noun, the canonical form of the noun was used as the "stem."

Disyllabic Indo-European roots were given a form indicating their disyllabic nature, but only if they actually occur in that form in Greek. Thus we used TLA for "endure," but GEN for "birth," since GNA does not occur in Greek.

The entire first principal part was used if a short stem would be ambiguous or confusing. Thus we used EDW and PATEOMAI, both meaning "to eat," A&ISSW "to leap up," and ORAW "to see." We also used the canonical form in the case of certain verbs, such as BAINW, where part of the root has combined with the suffix, and it is impossible to give the root as it appears in Greek without giving the suffix too.

When a noun was used as the stem, consonant declension nouns were generally given true stem form, as GUNAIK for "woman," except where there was ambiguity. For all others, the nominative singular form was used. Thematic adjectives in o/a were generally given the o-grade stem form, as AGAQO "good," and u-declension adjectives were given the stem in -u, as TRHXU "rugged." For adjectives of other declensions, we used the nominative singular masculine or animate.

Despite the simplicity of the rules for most stems, problems arose in the assignment of stems to certain others, the solution of which uncovered a variety of linguistic curiosities that can only be hinted at here. For example, in words compounded of an adverbial prefix and certain common verb-forming stems such as the stems of echô "to have or hold" and of ballô "to throw," the simple stem seems to act merely as a function word or filler, to indicate that some sort of action is going on or that a state of being exists, and the content of the word depends on the whole compound. It is doubtful how much semantic connection such compounds retain with each other or with the related verb. Sometimes we gave the compound the same stem as the simple verb, sometimes not. Thus the adjective exochos "surpassing" has the stem EX, as does the verb echô, but sumbolon "tally," which is compounded of ballô and sun- "with," was given BAL as a stem but also its own "stem" SUMBOL. Even some words formed of identical root and prefix seem to have grown apart so much as to no longer be connected, and were therefore given separate stems. Thus epistamai "to know" was given a separate stem EPISTA, different from STA "to stand," since already in Homer it is differentiated in meaning from the later compound ephistamai "to stand upon, stand near." Other words developed historically from different roots, but they grew together and came to resemble each other more and more. Thus tinô "to pay a price," cognate with poinê "retribution," was confused by the Greeks with the unrelated word tiô "to honor," from which come the noun timê "honor" and its derivatives. Accordingly, we assigned forms of tiô to the historically unrelated TINW and gave POINH a stem of its own.

Cases in which words seem to be related linguistically, but on some principle other than a simple shared root, had to be decided individually. The historical connections between dnophos (gnophos) "darkness," knephas "darkness, twilight," zophos "nether darkness," and Zephuros "the West Wind," for example, are murky, but rhyme seems to have played some part in the formation of these words and brought them together in a group, whatever the relationship between their roots. In the otherwise unrelated group of words nephelê "cloud," thuella "hurricane," and hêlios "sun," the /l/ phoneme seems to connote the idea of natural phenomena. Popular etymologies were included in assigning the stems, as when the name of Delphi was given DELF as a stem, connecting it with delphis "dolphin"; and Pytho, the other name of Delphi, was given the stem PUQW, the same stem as that given to the verb meaning "to rot." Where more than one derivation was believed by the Greeks, or where the poet seems to be making use of more than one derivation, both were given.

A final problem concerned homographs, forms that are identical in spelling but which come from different and sometimes linguistically unrelated words. We used a text that had neither accents nor breathing marks, nor did it indicate capital letters. Some ambiguous forms, such as posín, dative plural of pous "foot," and pósin, accusative singular of posis "husband," could have been distinguished if we had included such additional information in our text. Posis and posin, the nominative and accusative singular of "husband," could not, however, have been distinguished from posis and posin, the nominative and accusative singular of the noun meaning "drink." If we wanted to make a distinction, we would either have to give the machine enough information to allow it to judge from the context, as we do, or else edit the text by the use of subscripts on ambiguous words, as POSIS1 and POSIS2. Actually, these confusions, already very few, were made fewer by the small chance that within any one poem both ambiguous forms will occur. For that matter, we sometimes want to retain these ambiguities, as the poet may, consciously or unconsciously, have played on them.

The completed dictionary for the Hymns included 6053 different word forms and 1333 different stems. The dictionary, since enlarged to include the major works of Hesiod as well, now contains 10,502 separate forms and 1708 stems, with 40 potentially ambiguous forms. It will be obvious to the reader that a great many human judgments have had to be made in assigning stems to the words in this dictionary, decisions that have an important effect on the final results of the program. The results could be quite different if only words containing the same permutation of a stem were considered related than if words exhibiting radical transformations of the stem were also classified in the same grouping. Continued experiments with widening and narrowing the focus in this way will show which way of assigning stems will produce the most interesting and useful clumps. The importance of stem assignment in influencing the final results is matched by that of only two other factors: the choice of the contextual unit--how close two words have to be in the text in order to be considered "in the same context"--and the mathematical formula for finding the clumps. [Example 1 is an excerpt from the machine-readable dictionary.]

Canonical     Stem   Stem   Stem   Au-    Fre-   Form as
(dictionary)   # 1    # 2    # 3   thor*  quen-  it occurs
form                                      cy     in text

EURONOMH      EURU   NEM             H     2    EURUNOMH
EURUODHS      EURU   ODOS           UH     8    EURUODEIHS
EURUOPA       EURU   OP     EIPON   UH    14    EURUOPA
EURUS         EURU                  UH     8    EURUS
EURUSQEUS     EURU                   H     1    EURUSQHA
EURUSQEUS     EURU                  U      1    EURUSQHOS
EURUTIWN      EURU                   H     1    EURUTIWNA
EURWEIS       EURWS                  H     3    EURWENTA
EURWEIS       EURWS                 UH     2    EURWENTI
EURISKW       EUR                   U      1    EURWN
EURWPH        EURU   OP              H     1    EURWPH
EURWPH        EURU   OP             U      2    EURWPHN
EUSKOPOS      EU     SKEP           U      1    EUSKOPOS
EUSTRWTOS     EU     STOR           U      1    EUSTRWTON
EUSTRWTOS     EU     STOR           U      1    EUSTRWTWN

             * U = Hymns, H = Hesiod, UH = Hymns and Hesiod

Example 1:  An excerpt from the machine-readable dictionary
of roots in Hesiod and the Homeric Hymns used by the "Clump
Finder" program.


A literary critic may have an intuitive feeling for associations between various words and ideas in a poem; a computer, having no intuition, blindly follows whatever instructions we give it. A computer program to find associations requires a precise definition of an association and a detailed algorithm for searching for associations that meet the definition. Whenever we translate an intuitive notion like association into a mathematical formula, some of its old meaning is inevitably lost. If the mathematical formula incorporates real insight into the structure of the problem, however, it may turn out to be more useful than the original intuitive notion. The clump program is an experiment with several possible formulas in an attempt to find one that illuminates the network of associations in a poem. To evaluate the validity of the technique, we compared the results with the insights obtained through close reading of the text.

The technique we used is based on an approach developed by R. M. Needham for information retrieval, where the subject of a document is described by index terms consisting of keywords selected from the document. Needham introduced the idea of a clump of keywords: two words that occurred in the index list for the same document were said to be connected. The total connection between two words was defined as the total number of documents whose index lists contained both words; a clump was then a set of keywords whose connections to each other tended to be greater than their connections to words not in the clump. Each clump thus represented a set of words that were associated by virtue of their frequent cooccurrence on the same documents.9

In the application of the clump technique to Greek oral poetry, a block of text with the list of stems occurring in it was considered the analog of a document with the list of index terms describing it. The connection between two stems was defined as the total number of contexts in which those two stems occurred together. The context for a stem was considered as the block of three lines consisting of the line on which the stem occurred, the line immediately preceding, and the line immediately following. To determine the optimal context size, we tried running the program with one-line and five-line contexts. Although many of the same clumps appeared under all the variations of context size, the most consistently interesting clumps were generated with three-line contexts. We also tried defining a context as a certain number of words on either side of a stem; but in oral poetry, the line appears to be a more fundamental unit of composition than a fixed-length string of words.10
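The context rule above can be sketched in a few lines of Python. This is only one reading of the definition, not the original PL/I code: the function name `connections`, the `radius` parameter, and the toy input lines are assumptions made for illustration. For every occurrence of a stem x, it counts each occurrence of a different stem y on the same line or within `radius` lines of it.

```python
from collections import Counter

def connections(lines, radius=1):
    """Count co-occurrences of distinct stems within a window of lines.

    `lines` holds one list of stems per verse line. The context of an
    occurrence on line i is lines i-radius .. i+radius, so radius=1 gives
    the three-line context and radius=0 the one-line context.
    """
    c = Counter()
    for i, line in enumerate(lines):
        lo, hi = max(0, i - radius), min(len(lines), i + radius + 1)
        window = [s for j in range(lo, hi) for s in lines[j]]
        for x in line:
            for y in window:
                if x != y:          # c(x,x) is defined to be zero
                    c[(x, y)] += 1
    return c

# Hypothetical three-line "text" already reduced to stems (not real verse).
toy_lines = [["QHBH", "ULH"], ["BROT"], ["QHBH", "ULH"]]
three_line = connections(toy_lines, radius=1)
one_line = connections(toy_lines, radius=0)
```

Counted this way the table is symmetric, so c(x,y) = c(y,x). Changing `radius` to 0 or 2 reproduces the one-line and five-line experiments mentioned above.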

The formal definition of a clump takes into account not only connections between pairs of stems, but also total connections between all the stems in one set and all the stems in another set. If x and y are two stems, let c(x,y) represent the connection between x and y; the connection of a stem to itself is not relevant to the search for clumps and is arbitrarily defined to be zero: c(x,x) = 0 for all x. If A and B are two sets of stems, we define the cohesion between A and B, written A*B, as the sum of all connections between a stem in A and a stem in B:

$$A*B = \sum_{x \in A} \sum_{y \in B} c(x,y)$$

Mathematically, a clump is a set that minimizes a certain function F: If A is any set of stems, then all the stems not in A form another set -A, called the complement of A. The clump function is defined by the following formula:

$$F(A) = \frac{(A * -A)^2}{(A*A)\,(-A * -A)}$$

This function determines a number that measures how tightly the stems in A are connected to each other compared to their connections to stems not in A. In the definition of F(A), the numerator represents the cohesion between stems in A and stems not in A; the smaller this number, the more isolated is the set A. The denominator represents the cohesion of A to itself and -A to itself; making A and -A more strongly connected to themselves, but not to each other, increases the denominator and thereby decreases F(A). A clump is then defined as a set of stems A that makes the value of F(A) a local minimum: adding one more stem to A or removing any stem from A would increase the value of F(A).
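A minimal Python sketch of the cohesion and the clump function follows. The form of F used here (cross-cohesion squared, divided by the product of the two self-cohesions) is an assumption chosen because it reproduces the values printed in Figure 2. Returning infinity when either self-cohesion is zero is likewise an assumption. The names `cohesion`, `clump_function`, and `symmetric`, and the six-stem toy network, are illustrative only.

```python
def symmetric(edges):
    """Build a symmetric connection table with c(x,y) = c(y,x) = 1 per edge."""
    c = {}
    for x, y in edges:
        c[(x, y)] = c[(y, x)] = 1
    return c

def cohesion(c, A, B):
    """A*B: sum of c(x,y) over x in A and y in B, with c(x,x) taken as 0."""
    return sum(c.get((x, y), 0) for x in A for y in B if x != y)

def clump_function(c, stems, A):
    """F(A) under the assumed form (A*-A)^2 / ((A*A) (-A*-A))."""
    A = set(A)
    comp = set(stems) - A                 # the complement -A
    inner = cohesion(c, A, A)
    outer = cohesion(c, comp, comp)
    cross = cohesion(c, A, comp)
    if inner == 0 or outer == 0:          # assumption: treat as "worst" value
        return float("inf")
    return cross ** 2 / (inner * outer)

# Toy network: two triangles (1,2,3) and (4,5,6) joined by the edge 3-4.
toy = symmetric([(1, 2), (1, 3), (2, 3), (4, 5), (4, 6), (5, 6), (3, 4)])
f_triangle = clump_function(toy, range(1, 7), {1, 2, 3})   # 1 / (6 * 6)
f_lopsided = clump_function(toy, range(1, 7), {1, 2, 3, 4})
```

The triangle {1, 2, 3} scores lower than the lopsided set {1, 2, 3, 4}, which is the behavior the paragraph above describes: fewer cords to the outside and more cords inside give a smaller F.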

The following analogy provides a more concrete representation than the mathematical definition.

Page 137

Suppose we imagined the set of all stems in the text as a complex web of interconnections. Each stem would represent one node in the network. If two stems never occurred in the same context, there would be no direct connection between them. Two stems that occurred together frequently would be connected by a strong cord, and two stems that occurred in the same context only once or twice would be connected by a weak thread. The cohesion between two sets of stems would measure the strength of the cords connecting the sets. If we tried to tear the network apart by pulling vigorously on its various parts, it would tend to break into smaller networks, each of which represented a single clump. The clump program is the mathematical equivalent of building a web of string and then tearing it apart.

Suppose we took a small text with sixteen different stems and computed connections between the stems. We would end up with a list of connections such as c(1,2) = 1, c(3,6) = 1 ... To visualize the connections more readily, we could draw a network as in Figure 1, which represents each nonzero connection by a line between the two stems. Suppose each line represents a connection of value 1; then we could apply the above formulas to determine possible clumps. We would find that stems 3, 6, and 7 form clump A with only one connection to a stem not in A. Stems 1, 2, 4, 5, and 9 form clump B with five connections to stems outside of B. Stems 8, 9, 10, 11, and 12 form clump C with six connections to stems outside of C. The four isolated stems 13, 14, 15, and 16 form clump D by themselves; they have no connections to anything else. Note that stem 9 appears in two clumps, B and C.

Figure 1: Clumps in a network of stems.

Figure 2 lists the values obtained from the clump formulas using the connections in Fig. 1 as the input data. Note that the cohesion of a set to itself is always twice the number of lines joining stems in the set: there are three lines joining stems in A, but A*A = 6 because it is the sum of c(3,6), c(6,3), c(3,7), c(7,3), c(6,7), and c(7,6).

A*A   = 6        B*B   = 18       C*C   = 16       D*D   = 12
A*-A  = 1        B*-B  = 5        C*-C  = 6        D*-D  = 0
-A*-A = 48       -B*-B = 28       -C*-C = 28       -D*-D = 44

F(A)  = 0.00347  F(B)  = 0.0496   F(C)  = 0.0804   F(D)  = 0

Figure 2: Values Computed with the Clump Formulas.
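
The values of F in Figure 2 can be checked by hand. The working below assumes that the clump function is the square of the cohesion between a set and its complement, divided by the product of the two self-cohesions; this is the form that reproduces all four columns of the figure.

```latex
\[
F(A) = \frac{(A * -A)^2}{(A*A)\,(-A * -A)}
\]
\begin{align*}
F(A) &= \frac{1^2}{6 \cdot 48}  = \frac{1}{288}  \approx 0.00347 \\
F(B) &= \frac{5^2}{18 \cdot 28} = \frac{25}{504} \approx 0.0496  \\
F(C) &= \frac{6^2}{16 \cdot 28} = \frac{36}{448} \approx 0.0804  \\
F(D) &= \frac{0^2}{12 \cdot 44} = 0
\end{align*}
```

As a further consistency check, the quantity A*A + 2(A*-A) + -A*-A is the total of all connections in the network and should not depend on the set chosen. It comes to 56 in every column (6 + 2 + 48, 18 + 10 + 28, 16 + 12 + 28, 12 + 0 + 44), which corresponds to 28 lines in the network, each counted in both directions.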

Page 138

The small, compact clumps are usually the most interesting ones, but strict application of the formulas allows some large ones that contain smaller clumps within them. For example, stems 3, 6, 7, 8, 9, 10, 11, and 12 form a clump that is the union of clumps A and C. The union of clumps B and C also forms a clump. The union of A and D forms a clump where the two subclumps are not even connected with each other. From the definition of the clump function, it follows that the complement of any clump is itself a clump: the union of B, C, and D forms a clump since it is the complement of A. The search methods incorporated in the clump program tend, however, to favor the small, compact clumps instead of the larger, diffuse or disconnected clumps.

Although the computations for this short example can be done with paper and pencil in a few minutes, a practical application cannot be performed without a large computer. The Hymn to Hermes, with 436 stems and 22,238 nonzero connections, used over 330,000 bytes of storage and took over ten minutes on an IBM 360/75. PL/I was used for the clump programs because it is equally good for the character manipulations in processing the text and for the numeric computations in evaluating the clump formulas. The optimizing compiler that is now available for PL/I produces code that is as efficient as the code from the best FORTRAN compilers. [Obviously, more modern machines and compilers could be more efficient.]

The clump program has four major sections: It first reduces the entire text to a list of stems. Then it computes a table with all the nonzero connections between stems. Next it searches for clumps by starting with suitable trial clumps and progressively refining them. Finally, it prints a miniconcordance of the original text with the lines arranged not by the words they contain, but by the clumps whose stems they include.

The first step in the clump program consists of reading the text and making a list of every word in the text with the number of the line on which it occurred. Then the words are sorted in alphabetical order so that the dictionary look-up requires only one pass. Each word is replaced by the stem given in the dictionary entry; if the word has no stem, it is blanked out, and if it has more than one stem, all the stems are added to the list. After the translation of words to stems, the list is no longer in alphabetical order. Therefore, it is sorted once more. Then the program sweeps through the list of stems, making a fresh list without duplicates and eliminating those stems that occur only once, since they are not relevant to determining cross connections between contexts. From this point on, the character form of the stems is used only for output; all further computations inside the machine use the number representing each stem's position in the list.
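The stem-reduction step can be sketched as follows. This is an illustration under stated assumptions rather than the original program: the function name `reduce_to_stems` and the toy text are hypothetical, and a hash-map lookup replaces the sort-then-single-pass dictionary look-up described above, which is unnecessary when the whole dictionary fits in memory. The three dictionary entries are taken from Example 1.

```python
from collections import Counter

def reduce_to_stems(text_lines, dictionary):
    """Replace word forms by stems and number the stems that recur.

    text_lines: one list of word forms per line of text.
    dictionary: maps a word form to its list of stems (possibly several).
    Returns (stems, occurrences): the sorted list of stems occurring more
    than once, and (stem_number, line_number) pairs for those stems.
    """
    raw = []
    for line_no, words in enumerate(text_lines, start=1):
        for word in words:
            # A form with no dictionary entry contributes nothing
            # ("blanked out"); a form with several stems contributes all.
            for stem in dictionary.get(word, []):
                raw.append((stem, line_no))
    counts = Counter(stem for stem, _ in raw)
    stems = sorted(s for s, n in counts.items() if n > 1)   # drop singletons
    number = {s: i for i, s in enumerate(stems)}
    occurrences = [(number[s], n) for s, n in raw if s in number]
    return stems, occurrences

# Dictionary entries as in Example 1; the text lines are invented.
toy_dictionary = {
    "EURUOPA": ["EURU", "OP", "EIPON"],
    "EURWPH": ["EURU", "OP"],
    "EUSKOPOS": ["EU", "SKEP"],
}
toy_text = [["EURUOPA", "UNKNOWN"], ["EURWPH"], ["EUSKOPOS"]]
stems, occurrences = reduce_to_stems(toy_text, toy_dictionary)
```

After this step only the integer stem numbers are needed; the list `stems` is kept to turn numbers back into readable stems for output.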

The next part of the program generates connections between pairs of stems by counting the number of times they occur together. The connections could be represented as a large matrix, where the i,j-th entry would represent c(i,j), the connection between stem i and stem j. However, the matrix form would waste storage because most of the entries would be zero: for the Hymn to Hermes, the number of entries would be 436 squared or 190,096; but only 22,238 or less than 12 percent were nonzero. Furthermore, the matrix form would not be the most convenient representation, since the clump-finding process requires rapid access to all the stems connected to a given stem without the overhead of searching for nonzero entries in a matrix. The connections are stored, therefore, in a long list with each stem having an index to its block of nonzero connections. To compute the list of connections, the program generates a list representing every line of the original text, but with the words of the text replaced by the numbers of the stems that occur on that line. Then for each stem, the program takes the lines on which that stem occurred and counts the number of occurrences of other stems on those lines and on the lines immediately preceding and immediately following.
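The "long list with an index per stem" can be pictured with the sketch below, which packs the nonzero connections of each stem into one contiguous block. The layout and the names `build_connection_list` and `connected_to` are assumptions for illustration; the original storage format is not described in more detail than the paragraph above gives.

```python
def build_connection_list(c, n_stems):
    """Pack nonzero connections into flat lists with a per-stem index.

    c maps (i, j) stem-number pairs to connection counts. The connections
    of stem i occupy positions start[i] .. start[i + 1] - 1 of the flat
    lists `neighbors` and `weights`.
    """
    rows = [[] for _ in range(n_stems)]
    for (i, j), w in c.items():
        if w and i != j:
            rows[i].append((j, w))
    start, neighbors, weights = [0], [], []
    for row in rows:
        for j, w in sorted(row):
            neighbors.append(j)
            weights.append(w)
        start.append(len(neighbors))
    return start, neighbors, weights

def connected_to(i, start, neighbors, weights):
    """All (stem, connection) pairs for stem i, without scanning zeros."""
    lo, hi = start[i], start[i + 1]
    return list(zip(neighbors[lo:hi], weights[lo:hi]))

# Four stems; stem 3 has no connections at all.
toy_c = {(0, 1): 2, (1, 0): 2, (1, 2): 1, (2, 1): 1}
start, neighbors, weights = build_connection_list(toy_c, 4)
```

Storage grows with the number of nonzero connections rather than with the square of the number of stems, which is the saving the Hymn to Hermes figures illustrate (22,238 entries instead of 190,096).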

Although the computations described so far are complex, they consume less than 20 percent of the total running time for the program; the time-consuming part is the iterative process of searching for clumps by minimizing the function F. The usual way of minimizing a function by trial and error is to start with a good guess and then to refine the guess by making small changes until no change reduces the function further. For the clump program, the initial guess would be a set of stems representing a trial clump; the small changes would consist of adding or removing one stem at a time from the trial clump and keeping the modified set if it reduced the value of F. When no further change would reduce the value of F, then the resulting set would be a clump.

Good guesses for trial clumps were obtained by taking one stem together with all other stems that had nonzero connections to the original one. These sets were reasonably small, often required little refinement to reach clumps, and rarely led to disconnected clumps since the original sets were

Page 139

connected and the refinements tended to preserve connectivity. Each trial clump was represented internally by a vector of bits of length equal to the total number of stems. If a stem was in the trial clump, its corresponding bit was 1; if not, its bit was 0. The process of refinement consisted of systematically going through the list of stems, adding a stem if it was not already in or removing it if it was and recomputing the value of F each time. Although computing F from scratch each time would be a lengthy process, the addition or removal of only one stem for each change allowed shortcuts in evaluating the changes to F. The process usually converged to a clump after three or four passes through the entire list of stems; if it didn't converge after ten passes (which rarely happened), that trial clump was abandoned. Most of the good, compact clumps tended to turn up several times from iterations starting with different trial clumps.

A relevance factor was computed for each clump that represented the ratio of the expected value of F if the stems in the clump had been distributed randomly through the text to the actual value computed by the minimization process. The larger this factor, the more compact was the set of stems and the greater the likelihood that it represented a set of ideas that were closely associated in the poet's mind.

The final stage of the program consisted of sorting all the clumps by descending relevance factor, eliminating duplicate clumps that were generated from different trial clumps, and then printing a miniconcordance of the text according to clumps. The miniconcordance consisted of all the stems in each clump followed by the original lines of text on which those stems occurred. This final step shows an interesting parallel to the original information retrieval problem: clumps can be used to extract related lines from a poem in the same way that they are used to extract related documents for information retrieval.
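The refinement loop (trial clump, one-stem changes, a limit of ten passes) can be sketched as below. This is a simplified illustration, not the original PL/I: it assumes F has the form cross-cohesion squared over the product of the self-cohesions, uses a dictionary of booleans in place of the bit vector, and the names `trial_clump` and `refine` are hypothetical. The shortcut mentioned above appears as the three running totals, which are adjusted from a single stem's connections instead of being recomputed.

```python
def trial_clump(adj, x):
    """A stem together with every stem that has a nonzero connection to it."""
    return {x} | set(adj[x])

def refine(adj, seed, max_passes=10):
    """Greedy one-stem-at-a-time descent on F; returns (clump, F) or (None, F)."""
    stems = sorted(adj)
    in_a = {s: s in seed for s in stems}          # stands in for the bit vector
    total = sum(sum(nb.values()) for nb in adj.values())
    inner = sum(w for x in stems if in_a[x]
                for y, w in adj[x].items() if in_a[y])
    cross = sum(w for x in stems if in_a[x]
                for y, w in adj[x].items() if not in_a[y])
    outer = total - inner - 2 * cross

    def f(inner, cross, outer):
        if inner == 0 or outer == 0:              # assumption: never a minimum
            return float("inf")
        return cross ** 2 / (inner * outer)

    best = f(inner, cross, outer)
    for _ in range(max_passes):
        changed = False
        for x in stems:
            s_in = sum(w for y, w in adj[x].items() if in_a[y])
            s_out = sum(w for y, w in adj[x].items() if not in_a[y])
            if in_a[x]:    # effect of removing x from the trial clump
                new = (inner - 2 * s_in, cross + s_in - s_out, outer + 2 * s_out)
            else:          # effect of adding x to the trial clump
                new = (inner + 2 * s_in, cross - s_in + s_out, outer - 2 * s_out)
            value = f(*new)
            if value < best:                      # keep only improving changes
                inner, cross, outer = new
                best = value
                in_a[x] = not in_a[x]
                changed = True
        if not changed:
            return {s for s in stems if in_a[s]}, best
    return None, best                             # abandoned: no convergence

# Toy network: two triangles (1,2,3) and (4,5,6) joined by the edge 3-4.
adj = {s: {} for s in range(1, 7)}
for x, y in [(1, 2), (1, 3), (2, 3), (4, 5), (4, 6), (5, 6), (3, 4)]:
    adj[x][y] = adj[y][x] = 1
clump, value = refine(adj, trial_clump(adj, 3))
```

Starting from the trial clump around stem 3, which also contains stem 4, the loop drops stem 4 on the first pass and stops on the second, leaving the triangle {1, 2, 3}.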

The computations described so far represent the standard version of the clump program, which produced the results reported at the end of this article. During the development of the program, we tried a number of variations of the basic formulas: varying the context size (already discussed), varying the formulas for defining the connection between two stems, and varying the formulas for defining clumps. These variations of the clump program should be considered because a version that works best on early Greek oral poetry may not be suited to poetry of other languages and genres.

In its simplest definition, the connection c(x,y) between stems x and y is the number of times x and y occur in the same contexts. But some stems occur so many times in so many different contexts that they are connected to almost all the stems in the poem, and the poem does not break down into clearly separated clumps. One approach to this difficulty is to reduce the strength of connections to frequent stems by dividing c(x,y) by the sum of the number of times x occurs plus the number of times y occurs in the poem. This new definition of c(x,y) will enhance the importance of rare stems and lead to clumps that contain only the least common stems in the poem. The opposite approach is to assume that the most frequent stems are the most important and that their connections should be enhanced by multiplying c(x,y) by the sum of the frequencies of x and y.

A better approach, which we have just begun to implement, is to consider the relative frequency of each stem--its frequency in the current poem divided by its frequency in the entire corpus. The stem APOLLWN, for example, occurs much more frequently in the Hymn to Apollo than in the rest of the corpus; therefore, its connections should be weighted more heavily in that poem than in the others. Since we already had a dictionary of all word forms in Hesiod and the Hymns, we added another field to each entry that would give the total frequency of the form in the entire corpus, as counted by the concordance program. We then sorted the dictionary by stems instead of by word forms and summed the frequencies for each word form containing each stem. The result of this program was a new dictionary containing each stem and the number of times it occurred in the entire corpus. We then went back to the clump program to have it look up each stem in the stem dictionary and compute relative frequencies for each stem in the poem.

Another variation for computing c(x,y) is to use the square of the number of occurrences of x and y in the same contexts. This variation enhances the importance of connections between stems that occur together frequently: two stems might occur together once purely by chance; but if they occur together twice, the significance of their connection is probably much more than twice the significance of stems that cooccur only once. This variation can be modified to account for frequencies: let c(x,y) be the square of the number of times x and y occur together in the current poem divided by the product of their frequencies in the entire corpus. So far, most of the computations have been performed with the simple definition of connection. Since

Page 140

there are so many variations, we have not been able to test all of them thoroughly to determine the best one for all the poems. We are encouraged to find the most interesting clumps, however, turning up under all the variations because we seem to be discovering associations inherent in the poems themselves and not introduced by the computational methods.
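The variant definitions of c(x,y) discussed above can be set side by side in a short sketch. The function and variant names are hypothetical labels, not terms from the original program. How the relative frequency should enter the weighting is not spelled out in the text, so only the ratio itself is shown.

```python
def weighted_connection(n_xy, fx, fy, variant="simple",
                        corpus_fx=None, corpus_fy=None):
    """Connection between two stems under the variants described above.

    n_xy: times x and y occur in the same contexts in the current poem.
    fx, fy: frequencies of x and y in the current poem.
    corpus_fx, corpus_fy: frequencies in the entire corpus (used only by
    the "squared_corpus" variant).
    """
    if variant == "simple":
        return n_xy
    if variant == "damp_frequent":       # favors rare stems
        return n_xy / (fx + fy)
    if variant == "boost_frequent":      # favors frequent stems
        return n_xy * (fx + fy)
    if variant == "squared":             # favors repeated co-occurrence
        return n_xy ** 2
    if variant == "squared_corpus":      # squared, scaled by corpus frequency
        return n_xy ** 2 / (corpus_fx * corpus_fy)
    raise ValueError(f"unknown variant: {variant}")

def relative_frequency(freq_in_poem, freq_in_corpus):
    """Frequency in the current poem divided by frequency in the corpus."""
    return freq_in_poem / freq_in_corpus
```

For example, two stems that co-occur twice, with poem frequencies 3 and 5, receive a connection of 2 under the simple definition, 0.25 when frequent stems are damped, 16 when they are boosted, and 4 under the squared definition.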

The formulas defining clumps may also be modified. One alternative is stated in terms of the function b(x,A), called the bias of a stem x to a set of stems A, which is the total of all connections of x to stems in A minus the total of all connections of x to stems not in A. Under this definition, a set of stems A is a clump if every stem in A has a positive or zero bias to A, and all other stems have a negative bias to A. The procedure for finding clumps with this definition is similar to that described above: start with a trial clump and then try adding or removing stems one at a time until a set is obtained that satisfies the definition of a clump. This definition, the first one we programmed, was abandoned because it very often degenerated to either the empty set containing no stems or the universal set containing all stems. The definition in terms of minimizing the function F has the advantage that it can never degenerate to the two extremes of the empty set or the universal set: the denominator of the function becomes zero at the extremes, and the function therefore becomes a maximum, not a minimum.
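The bias-based definition is easy to state in code, and a small example shows why it degenerates. The sketch below is illustrative; the names `bias` and `is_bias_clump` and the toy network are assumptions.

```python
def bias(adj, x, A):
    """b(x,A): connections of x into A minus connections of x outside A."""
    inside = sum(w for y, w in adj[x].items() if y in A)
    outside = sum(w for y, w in adj[x].items() if y not in A)
    return inside - outside

def is_bias_clump(adj, A):
    """True if every stem in A has bias >= 0 and every other stem bias < 0."""
    return all((bias(adj, x, A) >= 0) == (x in A) for x in adj)

# Toy network: two triangles (1,2,3) and (4,5,6) joined by the edge 3-4.
net = {s: {} for s in range(1, 7)}
for x, y in [(1, 2), (1, 3), (2, 3), (4, 5), (4, 6), (5, 6), (3, 4)]:
    net[x][y] = net[y][x] = 1

triangle_ok = is_bias_clump(net, {1, 2, 3})
universal_ok = is_bias_clump(net, set(net))
empty_ok = is_bias_clump(net, set())
```

In this network the triangle {1, 2, 3} satisfies the definition, but so do the universal set (every stem has all of its connections inside) and the empty set (every stem has all of its connections outside). A search that adds and removes stems one at a time can therefore drift to either extreme and still stop at a set that counts as a clump.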


Among the several kinds of results produced, the first were the lists of stem frequencies, which indicated the poet's chief concerns and approach to his subject. The clump-finding program itself turned up at least two different kinds of stem clusters, one which could be said to point up connections on the formulaic level, the other on the thematic. On the lowest level were clusters of stems that were associated by virtue of plays on words or on groups of words occurring within passages of relatively small compass. These were most striking in the "Pythian" half of the Hymn to Apollo, especially when they involved the folk etymology of a name. On the highest level were clusters of stems that resulted from configurations pervading all parts of the poem, even those coming from different themes, and intimately connected with its total meaning. These were the most exciting results and were most like what we had anticipated when we designed the program. They were especially prominent in the Hymn to Demeter. In all cases, a visual inspection and interpretation of the computer output was necessary. The computer furnished us with a great many groups of words, some of which were interesting or relevant, and some of which were not. The computer output was not a finished interpretation, but supplementary data that could contribute to a more thorough analysis of the original texts. In many cases, a clump that the program produced drew our attention to other repetitions of words within the same passage that could be cautiously allied with the clump.

There were several hundred stems for each Hymn. In the Hymn to Demeter, a poem of 495 lines, there were 381 stems; the Hymn to Apollo, with 546 lines, had 352; the Hymn to Hermes, with 580 lines, had 436; the Hymn to Aphrodite, with 293 lines, had 239. QEOS "god" and QNA "death" (a root which forms words for "mortal" and "immortal" as well as the word for "death") were high in frequency in all four major Hymns, as might be expected in poems one of whose major concerns is the intersection of the life of the gods with mortality.

In the Hymn to Demeter and the Hymn to Aphrodite, QEOS and QNA were the most frequent stems, with QEOS leading in the Hymn to Demeter and QNA leading in the Hymn to Aphrodite. This, too, confirms previous observation, since both poems are examples of the Marriage of the Fertility Goddess, a theme that tells how the goddess appeared to a mortal and wanted to make him immortal, but was balked in the attempt.

In the Hymn to Apollo, the most frequent stems were ENQA "there" and PANT "all." PANT led in the "Delian" part, ENQA in the "Pythian." The frequency of PANT is due to the presence of many generalizing expressions such as pantas...anthrôpous "all men," pantôs euumnon "all singable," i.e., a fit topic for song. The frequency of ENQA (which is also assigned as a "stem" to the forms enthade "in that very place" and enthen "thence") is accounted for by the fact that, although the poem as a whole represents the theme of the Hero's Consolidation of His Power, both parts of the poem are cast in the form of the theme of Wandering: in the "Delian" part, the goddess Leto searches for a place where she can give birth to her son, the god Apollo; in the "Pythian," Apollo as a young hero god searches for a place to build his temple. It is common for stories of Wandering to tell what the wanderer found at each point in his journey ("and there he found...") and

Page 141

at the end of the episode to relate his departure from the place ("and thence he sailed..."). Not unexpectedly, BAINW "to walk" and IKW "to come" were also frequent in the Hymn to Apollo. QEOS and QNA were third and ninth in frequency.

In the Hymn to Hermes, the most frequent stems are BOUS "cow" and DIOS, which is the root both for the adjective dios "divine" and for the name of Zeus. This reflects the fact that two of the central episodes in the Hymn are Hermes' birth as the illegitimate son of Zeus and his comic theft of the cattle of Apollo, who is also a son of Zeus. QEOS and QNA were third and sixth.

The simplest kind of clusters resulted from associations enduring for only a small span of verses, say, five to twenty lines. Often, a poet gets one word-stem or perhaps a group of word-stems in his mind and repeats them in varying orders and relationships in the course of the passage, playing with them, using them together as a formula, turning the formula inside out, using a stem now as verb, now as noun, now replacing it by a synonym. These repetitions float, isolated, in the rest of the poem; they occur sporadically and disappear, with no connection to the overall structure of the poem. Such runs of verses are particularly characteristic of Hesiod's Theogony and of the "Pythian" part of the Hymn to Apollo, in both of which such passages frequently illustrate and "explain" folk etymologies.11 These passages often combine several types of repetition, such as anaphora, rhyme, and repeated syntactic patterns along with lexical and semantic repetition. The computer found many of these in varying degrees of complexity. The following, all from the "Pythian" Apollo, are three of the most interesting.

The first example is a set of variations on "Thebes, covered with woods" in lines 225-228 of the Hymn to Apollo. It shows purely formulaic variation with the same words juggled in an ever-changing pattern. [Relevant phrases are underlined.]

Thêbês d'eisaphikanes hedos kataeimenon hulêi:
ou gar pô tis enaie brotôn hierêi eni Thêbêi,
oud' ara pô tote g' êisan atarpitoi oude keleutha
Thêbês am pedion purêphoron, all' echen hulê

Of Thebes you [Apollo] arrived at the seat, which was clothed in woods;
For not yet did any mortal live in holy Thebes;
Not yet indeed, at that time, were there paths or ways
along Thebes' wheat-bearing plain, but it was held by woods.

Anaphora and parallel syntax are exhibited by ou gar pô "for not yet" and oud' ara pô "nor indeed yet." The actual clump which pointed to these variations included more than the stems QHBH and ULH. It also included ANQOS "flower," ENNUMI "to clothe," KRADIH "heart," and RION "headland," but the presence of these other stems is caused by their associations in another part of the poem. ENNUMI gets in because it is a fairly rare stem given a particularly strong connection to the clump because of its presence within three lines of three instances of QHBH and ULH. RION and ANQOS are associated with ULH in verse 139, and KRADIH is associated with these three because of its occurrence in verse 138.

A cluster including ALFI "barley," AMA "at the same time," DMA "conquer," IPPOS "horse," KAIW "burn," LEIP "leave," LEUK "shining, white," PUR "fire," and TRIOPS, the name of a man, pointed to an interesting group of lines, verses 210-213:

Ischu' ham' antitheôi Elationidêi euippôi;
ê hama Phorbanti Triopeôi genos, ê ham' Ereuthei;
ê hama Leukippôi kai Leukippoio damarti
pezos, ho d' hippoisin; ou mên Triopos g' eneleipen

[Shall I sing of how you wooed the maiden Azantis]
in competition with godlike Iskhys, son of Elation, the well-horsed?

Page 142

or in competition with Phorbas, son of Triops by race, or in competition with Ereutheus?
Or in competition with Leukippos [lit. WhiteHorse] and the wife of Leukippos,
you being on foot, the other with horses [i.e., in a chariot]? Indeed, he was not inferior to Triops.

This is another example of variation on the formulaic level. Here the play is on the word hippos "horse" and the names of two antagonists of Apollo, [Phorbas, son of] Triops and Leukippos, whose name includes the word hippos. Another kind of repetition is provided by the repeated ê hama "or in competition with." The stems ALFI, KAIW, and PUR find their way into the clump because of their association with LEUK in the verse

pur epikaiontes epi t' alphita leuka thuontes

lighting a fire on it and on it sacrificing white barley

which is repeated in verses 491 and 509. DMA, again a rare stem, is found in the word damarti "wife" in verse 212 and in two other places.

Another interesting clump is the one that points to the etymology of Pytho. The clump included the stems KALEW "I call," MAW "to be furiously eager" (an archaic verb which we used as a "stem" for several related verbs and for the related noun menos "might, force"), ONOMA "name," and PUQW, representing both the place and the verb "to rot." The interesting lines are verses 362-374, beginning with the words of Apollo over the murdered dragon:

entauthoi nun putheu epi chthoni bôtianeirêi...
oude ti toi thanaton ge dusêlege' oute Tuphôeus
arkesei oute Chimaira dusônumos, alla se g' autou
pusei gaia melaina kai êlektôr Huperiôn.
Hôs phat' epeuchomenos, tên de skotos osse kalupse.
tên d' autou katepus' ieron menos Êelioio
ex hou nun Puthô kiklêsketai, hoi de anakta
Putheion kaleousin epônumon houneka keithi
autou puse pelôr menos oxeos Êelioio.

"In this place now rot upon the man-feeding earth...
nor will Typhoeus now avail you any against grievous death
nor will the Chimera, of evil name, but on this spot
the black earth and shining Hyperion will rot you."
Thus he spoke, boasting; but the darkness hid her eyes.
On this spot the holy might of the Sun rotted her.
From that time the place has been called Pytho, and
they call the god Pythian as a surname, because there
on that spot the might of the keen Sun rotted the monster.

Although the computer, considering stem-connections for the entire poem, did not include them in the clump, visual inspection also reveals other repetitions within this passage: dus- the prefix meaning "ill, evil" in dusêlege' "grievous" and dusônumos "of evil name," autou "on this spot," and Êelios "the Sun."

Many other such repetitions were discovered in both parts of the Hymn to Apollo and in the other Hymns, but they tended to be less tightly knit and less interesting. In general, these other clumps seemed to reflect either catalogs using parallel expressions in various items or whole lines or groups of lines repeated from one part of the poem to another, an often-found feature of oral poetry. The frequency of this particular type of verbal clustering in the second half of the Hymn to Apollo supports the view that the second half is by a different poet

Page 143

from the first, perhaps by a poet close in tradition to Hesiod. These patterns are embedded in a poem whose thematic unity testifies, on the other hand, to the vitality of the themes themselves to shape the composition of the poet, no matter who he was.

The most spectacular results to come from the computer program were connections where the clumps themselves were on the thematic level. These were most impressive in the Hymn to Demeter.

The plot of the Hymn to Demeter is as follows: Persephone, the daughter of Demeter, goddess of the grain, is abducted by Hades, the King of the Underworld. Distraught, Demeter searches for her daughter. When she is told the truth by Helios, she withdraws from the company of the gods and goes among men. Pretending to be an old woman who has just escaped from pirates, she hires herself out to the family of King Keleus of Eleusis, as a nursemaid to the infant prince Demophoon. She puts the child in the fire every night to make him immortal, and he grows miraculously until one night his mother Metaneira, who was spying on them, sees Demeter putting him in the fire, and rails at the old woman for harming her child. Demeter, angry, reveals herself in her full beauty and height to the frightened mortals and demands that they propitiate her by building a temple to her. After the temple is built, she withdraws again, this time from both gods and mortals. She "hides the seed in the ground" and refuses to let the farmers' crops grow until she regains her daughter. Eventually Hades is persuaded by the other gods to return Persephone to her mother, but before he sends her back he plays a trick on her. Because he secretly makes her eat a pomegranate seed, she can remain above ground only two-thirds of each year and must live underground with him for one third. Demeter, rejoicing at her daughter's return, makes the crops grow again and imparts the ritual of her Mysteries to the people of Eleusis.

The Hymn to Demeter is thematically complex. The major themes involved are the Death and Resurrection of the Dying God, the Marriage of the Fertility Goddess, the Withdrawal and Return of the Hero, Wandering, the Epiphany of a God to a Mortal, and the Wrath of God, not to mention many other thematic elements that contribute in a minor way.12 The entire poem also functions as an aetiological myth of the foundation of the Eleusinian Mysteries, the most famous mystery-cult of all antiquity, the precise nature of whose rites is still a secret today.

The clump-finder discovered two important clusters of stems which many rereadings of the text had failed to find, and which together make up a powerful configuration of associations having to do with the Mysteries. The first has to do with eating and secrecy, the second with seeing and transgression. These stems describe the heart of the Eleusinian Mysteries, which were concerned with grain, were secret, and were forbidden to profane eyes. The first clump included AD "to be pleasant," BAL "to cast," BROT "mortal," XRWT "skin, body," EDW "to eat," KOKKOS "seed," LAQ "secret," LOUW "to wash," MELI "honey," OINOS "wine," PATEOMAI "to eat," PINW "to drink," and ROIH "pomegranate." These associations are not confined to one particular passage but are found in many parts of the poem, appearing associated with elements from a number of different themes. Demeter secretly escapes from the pirates while they are eating (in the element of the Goddess' Lie in the theme of the Marriage of the Fertility Goddess and the theme of Epiphany); she secretly feeds Demophoon on ambrosia (in the Goddess' Attempt to Give Immortality). Hades secretly feeds Persephone the pomegranate seed (the Return of the Dying God and the Death of the Substitute--who is in this case Persephone herself, who must "die" again each year--from the Death and Resurrection cycle).

Another stem which one might naturally want to group with LAQ is KRUP "to hide." As a matter of fact, they are used in different contexts, and the computer distinguished between them. Whereas LAQ occurs with words for eating, KRUP does not. KRUP is used in two contexts: Demeter hiding Demophoon in the fire (The Goddess's Attempt to Give Immortality) and Demeter hiding the seed in the ground to keep it from growing (Demeter's Withdrawal, the Wrath of God). It can be said, however, that the association is similar to the associations with LAQ. Additional words for food are also found in many places in the poem, including the names of three of Persephone's nymph companions--Melitê "Honey," Mêlobosis "Sheep-feeder," and even Galaxaurê "Milky Breeze." Demeter also refuses to eat or wash until she finds her daughter.

The other group of stems had to do with seeing and transgression. The relevant clump included the stems AAW "to mislead, cause to act foolishly," AMFW "both," LOUW "to wash," MEILIC "soothe," PLAG "strike," QALAM "chamber," SKEP "look," THREW "to watch" (note that LOUW appears in both clumps). The interesting stems are AAW, SKEP, and THREW. The connection between them is exhibited particularly where the mother, Metaneira, sees Demeter putting Demophoon in the fire. She saw something connected with Demeter which she shouldn't have seen. This is exactly what the uninitiate in the Mysteries was enjoined from doing. On rereading the Hymn, we realized that there is a great deal of emphasis throughout the poem on the idea of "seeing," usually connected somehow with misfortune. The words for "seeing" are from various stems, and more study is needed to determine exactly how the poet distinguishes between them. The miraculous flower with which Persephone was lured to her capture was a wonder to see (idesthai). When she was captured, she still hoped, so long as she could see (leusse) the earth, sky, sea, and the sun's rays, that she would be able to see (opsesthai) her mother. No one saw her kidnapped, not even Helios, the Sun, the "watcher (skopon) of gods and men," although both Helios and the goddess Hekate heard her cries. Hekate twice says ouk idon ophthalmoisin "I did not see with my eyes." Later, Demeter is to threaten not to make the crops grow "until she sees her daughter with her own eyes"--prin idoi ophthalmoisin, where the poet uses the same words as those he made Hekate use. At the end of the poem, naturally, there is a change of emphasis, to showing what it is lawful to see, when she demonstrates the correct ritual for the Mysteries to the kings of Eleusis.

One could also widen this interpretation by referring to the emphasis in the poem on communication and perception generally, including hearing (no one heard the cries of Persephone) and speaking (Demeter's requests for information during her search for Persephone, her eager questioning of Persephone herself, after her return, to find out what happened to her). There are many words for speech and speaking in the poem, many of them stock expressions for introducing or ending speeches in Homeric poetry. The number of these is, of course, dependent on the amount of conversation in the poem, but the very presence of so much conversation in the poem mirrors the preoccupation with the giving and withholding of information. The link between this and the esoteric information of the Mysteries is pointed up by the words spoken to Demeter in verse 323 by the goddess Iris, who is trying to persuade Demeter to return:

all'ithi, mêd' ateleston emon epos ek Dios estô

But come, lest my message (epos) from Zeus remain unfulfilled.

Atelestos "incomplete, unfulfilled," also means "uninitiated," and is related to telestêrion, the building in which one was initiated into the Mysteries and teletê, the initiation ceremony. Together, these groups of words suggest the legomena "words spoken," deiknumena "things shown" and drômena "actions performed" which our scanty sources tell us were at the heart of the Eleusinian ritual. Thus we see that on top of the regular traditional themes a sort of "super-theme" concerning the Mysteries has been superimposed, giving the familiar motifs a new look that fits this context and this context alone. [Example 2 is an excerpt from the output of the clump (numbered 128) containing words for eating and secrecy found in the Hymn to Demeter.]

128.  NUMBER OF STEMS = 13        RELEVANCE =      29.82


AD        SWEET        0.40        BAL      THROW      0.30
BRWT      MORTAL       0.35        CRWT     SKIN       0.43
EDW       EAT          0.40        KOKKOS   SEED       0.92
LAQ       HIDE         0.31        LOUW     WASH       0.43
MELI      HONEY        0.35        OINOS    WINE       0.31
PATEOMAI  EAT          0.44        PINW     DRINK      0.32

                                                   STEMS IN

    DHMHTHR                                               1


Example 2:  Excerpt from results of the "Clump Finder":
Thematic Cluster #128 from Hymn to Demeter.
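
For readers who wish to work with such a printout, the rows of Example 2 can be read back into a small table. The following Python sketch is not part of the original program: the function name parse_clump_rows is hypothetical, and the figure printed beside each stem is treated here simply as a score, without assuming anything further about its meaning. Only the twelve stem entries shown in the excerpt are included.

```python
# Rows copied from Example 2 (clump 128): stem, gloss, figure, twice per row.
PRINTOUT = """\
AD        SWEET        0.40        BAL      THROW      0.30
BRWT      MORTAL       0.35        CRWT     SKIN       0.43
EDW       EAT          0.40        KOKKOS   SEED       0.92
LAQ       HIDE         0.31        LOUW     WASH       0.43
MELI      HONEY        0.35        OINOS    WINE       0.31
PATEOMAI  EAT          0.44        PINW     DRINK      0.32
"""


def parse_clump_rows(text):
    """Split each printed row into (stem, gloss, score) triples."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        # Fields come in groups of three: stem, gloss, figure.
        for i in range(0, len(fields) - 2, 3):
            stem, gloss, score = fields[i:i + 3]
            entries.append((stem, gloss, float(score)))
    return entries


entries = parse_clump_rows(PRINTOUT)
# Highest figure first; ties keep their printed order.
ranked = sorted(entries, key=lambda entry: entry[2], reverse=True)
for stem, gloss, score in ranked[:3]:
    print(f"{stem:10}{gloss:8}{score:.2f}")
```

Sorting by the printed figure brings KOKKOS "seed" to the head of the list, which agrees with the prominence of the pomegranate seed in the discussion above.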

Finally, two other interesting clumps of stems which the computer revealed in the Hymn to Demeter illustrate further how a chain of associations may influence the poet's composition. The first pertains to "the third share" and contains the stems DUO "two," ETOS "year," MEIR "apportion," and TREIS "three." These occur connected in three very similar passages (verses 399-400, 445-447, 463-465) which explain how Persephone must spend one-third of the year with Hades and two-thirds with her mother. Other connections of MEIR and TREIS emphasize this group's association with the notion of the Underworld. If we look back at the only other occurrence of the stem TREIS in this poem, in verse 86, we find it connected with two stems that also mean "apportion," LAC and DA*IZW. This time, the concepts cooccur in the justification of Hades' rape of Persephone, that he will be a "not unworthy son-in-law" for Demeter since he rules one-third of the universe, namely the Underworld; the other two shares, the Sky and the Sea, belong to his brothers Zeus and Poseidon. The stem MEIR reappears in verse 481 in the description of the unhappy fate awaiting one who is "without share" (ammoros) in the Mysteries when he goes to the Underworld.

The second is a strange connection between ULH, the stem for "wood," and TEM, the stem meaning "to cut." The clump that links them also included the stems ERUW "to protect," ESQLO "good, noble," and FERTER "better." In verses 228-230, Demeter, applying for the job of nursemaid, tells Metaneira why she would be good for the job:

out' ar' epêlusiê dêlêsetai oud' hupotamnon:
oida gar antitomon mega pherteron hulotomoio,
oida d' epêlusiês polupêmonos esthlon erusmon

Nor will witchcraft destroy him, nor will an herb cut off at the bottom [for use as a charm];
for I know an antidote [lit., an anti-cutting] far stronger than the herb cut in the wood,
and I know a fine safeguard against baneful witchcraft. (Note 13)

The formation of the clump of course took into consideration the entire poem, but a look at the specific passage reveals it to be one of those we mentioned as more typical of the Hymn to Apollo, where the poet plays with a group of stems within a passage of only a few lines. Other words repeated in it are epêlusiê "witchcraft" and oida "I know." The repetition of oida gar "for I know" and oida d' "and I know" introduces anaphora, and sense repetition is exhibited by epêlusiê dêlêsetai "witchcraft will destroy" and epêlusiês polupêmonos "of witchcraft baneful."

What is interesting is that the connection reemerges in a surprising way in the later part of the poem. When Persephone is being returned to the upper world, the horses of Hades, bringing her back, "cut the air" (êera temnon, verse 383). Three lines later Demeter is said to be like a maenad, leaping for joy through the woods (hulês). This connection, pointed out by the computer results, may be a coincidence; but it is also possible that the poet, having worked out an elaborate play on these stems earlier in the poem, retained in his mind some sense of the connection when he came to compose the later verses.
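
The sort of proximity check that brings such a link to light can be sketched in a few lines of Python. This is an illustration only, not the clump formula our program used (for which see note 9): it simply lists pairs of stems that occur within a fixed number of verses of each other. The function name near_pairs and the window of three verses are assumptions, and the table holds only the occurrences mentioned in this essay, not a full index of the Hymn.

```python
from collections import defaultdict
from itertools import combinations

# Stem -> verse numbers, limited to the occurrences named in the essay
# (verses 228-230 and 383-386); not a complete index of the poem.
OCCURRENCES = {
    "TEM": [228, 229, 383],   # hupotamnon; antitomon, hulotomoio; temnon
    "ULH": [229, 386],        # hulotomoio; hulês
    "FERTER": [229],          # pherteron
    "ESQLO": [230],           # esthlon
    "ERUW": [230],            # erusmon
}


def near_pairs(occurrences, window):
    """Map each stem pair to the verse pairs lying at most `window` apart."""
    hits = defaultdict(list)
    for a, b in combinations(sorted(occurrences), 2):
        for verse_a in occurrences[a]:
            for verse_b in occurrences[b]:
                if abs(verse_a - verse_b) <= window:
                    hits[(a, b)].append((verse_a, verse_b))
    return dict(hits)


pairs = near_pairs(OCCURRENCES, window=3)
print(pairs[("TEM", "ULH")])
```

With this table, TEM and ULH are paired both in the nursemaid passage and again at verses 383 and 386, which is the reappearance described above. A human reader must still judge whether the second pairing is significant.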

This is a sampling of the insights we have gained from our computer study of the Homeric Hymns, and of the way they can be used to support conclusions drawn concerning the use of oral themes. More importantly, for our use of computers, we are developing a tool that can be used to study literature of any language, oral or literate. Some might ask why, since so much human attention is necessary even after the computer printout has been obtained, it seemed worthwhile using a computer at all. Could we not do the same thing by careful observation and rereading of the poem? The first argument in favor of the computer is its speed. Our program, which identified scores of clumps in a poem of several hundred lines, took only ten minutes of computer time on a high-speed machine. But the most important reason is this: These poems are tied together by webs of associations comprising thousands of word-forms and thousands of stems. It would be impossible for the human mind to remember all the words in the poem and where they occur and with what other words, even if we had every poem by heart. There are too many words and too many relationships to grasp and remember all at once. The computer serves as a counting device to help us keep these relationships in mind. We do not claim that our program will find all of the interesting connections or that all the connections it finds will be of equal interest. What it does is point out some interesting patterns that we might otherwise have missed.


1. This article is a revised version of a paper presented by C. A. Sowa at the meetings of the American Philological Association in San Francisco, December 1969. Work on this project was first begun by the authors at the American Philological Association's Summer Institute in Computer Applications to Classical Studies at the University of Illinois in Champaign-Urbana in 1969. The project was also facilitated by a grant to C. A. Sowa from the American Council of Learned Societies in 1970-1971. The authors would also like to thank Professor Winifred Asprey and the computer center of Vassar College for the use of their computing facilities.

2. The theories of oral composition of Parry and Lord are now generally accepted for the Iliad and Odyssey; see Albert B. Lord, The Singer of Tales (Cambridge: Harvard University Press, 1960). For Hesiod and the Hymns, see James A. Notopoulos, "The Homeric Hymns as Oral Poetry," American Journal of Philology 83, 4 (October 1962), 337-368; Patricia G. Preziosi, "The Homeric Hymn to Aphrodite, an Oral Analysis," Harvard Studies in Classical Philology 71 (1966), 171-204; H. Berkley Peabody, Hesiod's Works and Days: An Example of the Ancient Greek Oral Style, diss. Harvard, 1961 (the contents of this dissertation are soon to appear in much expanded form as The Winged Word, to be published by the State University of New York Press); and the forthcoming book by C.A. Sowa, Traditional Themes and the Homeric Hymns. [Note: both books have since been published.]

3. W. Arend, "Die Typischen Szenen Bei Homer," Problemata 7 (Berlin, 1933), and C. A. Sowa, forthcoming. [See previous note.]

4. "Modulation" is the term used by Lord in The Singer of Tales to describe the process by which one theme passes over into another, related theme. In chapters 8 and 9, he shows how this principle applies to the Iliad and Odyssey.

5. "Cicero's Picture of Marius," Wiener Studien 73 (1960), 83-122; "The Picture of Marius in Valerius Maximus," Rheinisches Museum 105 (1962), 289-337; "Content Analysis: Construing Literature as History," Mosaic 1, 1 (University of Manitoba Press, October 1967), 22-38; and "The Changing Picture of Marius in Ancient Literature," Proceedings of the African Classical Association 10 (Salisbury, 1967), 5-22. Since then, he has also published a book Content Analysis (Winnipeg: University of Manitoba Press, 1972).

6. The text used was Thomas W. Allen's edition of 1912 (corrected 1946) in the Oxford Classical Text series. The machine-readable text and dictionary are available to interested persons from the American Philological Association's Data Bank, care of Dr. Stephen V. F. Waite, Kiewit Computational Center at Dartmouth College, Hanover, New Hampshire. [For present availability in disk form, contact C.A. Sowa.] For purposes of keypunching, the following conversion rules were used for representing Greek letters by means of Roman letters: Wherever an exact equivalent exists, it is used; for the others, eta is represented by H, theta by Q, iota subscript by J, phi by F, khi by C, psi by Y, and omega by W; a diaeresis is represented by *. The text as we used it did not indicate accents, breathings, or capital letters. When mentioning stems in this paper, we shall give them in this code.

7. Since the dictionary has been transferred to tape, we can, of course, space out the information to allow as many columns as we want.

8. Etymological information used in creating the dictionary was taken from P. Chantraine, Dictionnaire Etymologique de la Langue Grecque (Paris: Klincksieck, 1968); H. Frisk, Griechisches Etymologisches Wörterbuch (Heidelberg: Carl Winter Universitätsverlag, 1960-1970); A. Meillet and J. Vendryes, Traité de Grammaire Comparée des Langues Classiques (Paris: Librairie Ancienne Honoré Champion, 3rd ed., 1960); H. G. Liddell and Robert Scott, A Greek-English Lexicon (Oxford: Clarendon, 9th ed., 1940).

9. R. M. Needham and K. Sparck-Jones, "Keywords and Clumps," Journal of Documentation 20, 4 (March 1964). The clump formula we used is discussed in K. Sparck-Jones and D. Jackson, "Current Approaches to Classification and Clump-finding at the Cambridge Language Research Unit," The Computer Journal 10 (1967), 29-37. Variations on the formulas for computing connections are discussed in P. E. Jones and R. M. Curtice, "A Framework for Comparing Term Association Measures," American Documentation 18 (1967), 153-161. The formula for computing clumps using the bias function is discussed in A. G. Dale and N. Dale, "Some Clumping Experiments for Associative Information Retrieval," American Documentation 16 (1965), 5-9.

10. Slightly different results might be obtained by using other units as contexts: the oral formula (the exact nature and scope of which is still in debate), the syllable (or the mora, equivalent in length to one short syllable), the colon (a group of syllables bounded by caesura), the half-line (apparently a more ancient unit in the hexameter than the full line), the sentence or clause (where enjambement makes it different from the line or half-line). On the relationship between these units see H. B. Peabody, op. cit. (note 2), and the forthcoming book by Michael N. Nagler, Tradition and Spontaneity: A Study in the Oral Art of Homer. [Note: this book has now been published.]

11. This phenomenon is discussed at greater length in C. A. Sowa, Traditional Themes, and in C. A. Sowa, "Verbal Patterns in Hesiod's Theogony," Harvard Studies in Classical Philology 68 (1964), 331-332.

12. For outlines of elements of these themes see Appendix I of Sowa, Traditional Themes.

13. The meaning of this passage is actually not completely certain. The words hupotamnon, formed from the prefix hupo "under," plus TEM, and hulotomon, formed of ULH and TEM, may mean "an herb cut off at the bottom" and "an herb cut in the woods," as we have translated them, or they may refer to a type of worm, "a borer," supposed to cause toothache or teething pains.
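
The conversion rules given in note 6 can be sketched as a lookup table in Python for anyone who wishes to code a Greek text in the same way. This is a reconstruction, not the original keypunching procedure. The letters named in note 6 follow the note; the assignments X for xi and U for upsilon are assumptions (U agrees with stems such as ULH and KRUP). Placing * before the marked vowel is inferred from the stem DA*IZW, and placing J after its vowel is an assumption. The function name keypunch is hypothetical.

```python
import unicodedata

# Greek letter -> Roman code. H, Q, F, C, Y, W follow note 6;
# X for xi and U for upsilon are assumed "exact equivalents".
BASE = {
    "α": "A", "β": "B", "γ": "G", "δ": "D", "ε": "E", "ζ": "Z",
    "η": "H", "θ": "Q", "ι": "I", "κ": "K", "λ": "L", "μ": "M",
    "ν": "N", "ξ": "X", "ο": "O", "π": "P", "ρ": "R", "σ": "S",
    "ς": "S", "τ": "T", "υ": "U", "φ": "F", "χ": "C", "ψ": "Y",
    "ω": "W",
}
DIAERESIS = "\u0308"        # combining diaeresis
IOTA_SUBSCRIPT = "\u0345"   # combining iota subscript


def keypunch(greek):
    """Code a Greek word in Roman capitals, dropping accents and breathings."""
    out = []
    # NFD splits each letter from its accents, breathings and other marks.
    for ch in unicodedata.normalize("NFD", greek.lower()):
        if ch == DIAERESIS:
            out.insert(len(out) - 1, "*")   # assumed: * goes before the vowel
        elif ch == IOTA_SUBSCRIPT:
            out.append("J")                 # assumed: J goes after the vowel
        elif unicodedata.combining(ch):
            continue                        # accents and breathings are dropped
        else:
            out.append(BASE.get(ch, ch))
    return "".join(out)


for word in ["κόκκος", "ὕλη", "δαΐζω"]:
    print(word, keypunch(word))
```

Because accents, breathings, and capitals are discarded, the coding cannot be reversed exactly; it serves only to identify stems.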

Copyright © 2002, Cora Angier Sowa. All rights reserved.
