Indo-European Lexicon

User Guide

The following provides a brief explanation of the format of the Linguistics Research Center’s (LRC’s) online Indo-European Lexicon and of how it might be used for self-study.

Motivation

The Indo-European Lexicon (IELEX) presents a freely available etymological database in which users can explore the relations within and among languages, ancient and modern, in the Indo-European family of languages.

Idea

What does it mean to have an Indo-European Lexicon?

To understand the idea, it helps to take a step back. As many readers might already be aware, even a passing familiarity with the structures and vocabulary of languages like French, Spanish, Italian, and Portuguese is enough to suggest that these languages bear a striking resemblance to one another. Consider some basic vocabulary.

Language similarities
Meaning          French      Spanish     Italian     Portuguese
one              un          uno         uno         um
book             livre       libro       libro       livro
school           école       escuela     scuola      escola
(we) translate   traduisons  traducimos  traduciamo  traduzimos

It is one thing for a few words to be similar between two languages: speakers of one language borrow words from other languages all the time, frequently to represent new concepts or technology (imagine how many languages must now have the word iPhone). When a large number of words are similar, however, simple borrowing through social contact becomes less likely (except in special cases, such as when one nationality invades another and becomes an occupying force). And when not only the vocabulary but the grammatical structures themselves are shared among languages, the situation is special: note in the table above that not only the verb “translate” but also its form for “we translate”, ending included, is almost identical across the four languages. The likelihood that speakers of different languages have simply borrowed from one another becomes vanishingly small, and we suspect that the languages themselves are related.

What does it mean for languages to be related? Roughly the same thing it means for two people to be related: they share a common ancestor. For the languages in the chart above, we still have documents recording their common parent: Latin. Not only the vocabulary but the grammatical structures themselves descend from corresponding vocabulary and structures in Latin. The same chart, with the Latin predecessors added, looks as follows.

Romance languages
Meaning          French      Spanish     Italian     Portuguese  Latin
one              un          uno         uno         um          unus
book             livre       libro       libro       livro       liber
school           école       escuela     scuola      escola      schola
(we) translate   traduisons  traducimos  traduciamo  traduzimos  traducimus

We term the languages that share Latin as a common ancestor the Romance languages; this language (sub)family also includes others, such as Romanian.

Of course the Romance languages are not the only languages to show such a thoroughgoing resemblance to one another. The same could be said for the Germanic languages, for example, as we can see from the following chart.

Germanic languages
Meaning     German  Dutch   Icelandic  ?
one         eins    een     einn
book        Buch    boek    bók
school      Schule  school  skóli
(we) know   wissen  weten   vitum

The question in this case becomes: what do we put in the last column? What language is the common ancestor of the Germanic languages? Unlike Latin, which is richly documented, this parent language has left only the sparsest textual traces: some very old runic inscriptions show that such a language existed, but it was not written down often enough for many documents to survive. So linguists give this elusive language a name: Proto-Germanic. The prefix proto- indicates that this is the oldest identifiable form of language that could serve as the common parent of the Germanic languages that later descended from it.

In the same fashion we can identify numerous language groupings, such as Slavic (containing, e.g., Russian, Bulgarian, Czech, Polish), Celtic (Irish, Welsh, Scottish Gaelic), Indic (Hindi, Marathi, Gujarati), Hellenic (Greek, Mycenaean), and many others. What’s more, if we look at the oldest members of many of these groups, we find that they are related to one another in the same way.

Indo-European languages
Meaning     Old Norse  Old Church Slavonic  Latin    Greek     Sanskrit  ??
three       þrír       tri                  trēs     treîs     trayas
sister      systir     svestĭ               soror    éor       svásar-
mother’s    móður      matere               mātris   mētéros   mātas
(we) bear   berum      beremŭ               ferimus  phéromen  bhárāmas

We call this collection of languages the Indo-European (IE) language family, based on the geographic distribution of the languages it contains: from Iceland in Europe all the way to north and central India. The various sub-families form part of this larger family, all descending from some common parent. This parent language would provide the original words which should fill the column labelled ?? above, playing for the Indo-European family as a whole the role analogous to that of Latin for the Romance languages. Since that language left no documents, we do not know what the speakers called it. So we term it Proto-Indo-European in the same way that we term Proto-Germanic the (little documented) parent of the Germanic languages.

The discipline of historical linguistics has created a methodology of extrapolation from the data provided by the oldest Indo-European languages in order to reconstruct features of Proto-Indo-European (PIE), including many of the basic meaning-carrying elements of its vocabulary, or lexicon. The table above, with the last column filled in according to historical linguistic reconstruction, would look as follows. (The preceding asterisk denotes that a form has been reconstructed.)

IE languages & PIE antecedent forms
Meaning     Old Norse  Old Church Slavonic  Latin    Greek     Sanskrit  PIE
three       þrír       tri                  trēs     treîs     trayas    *tréyes
sister      systir     svestĭ               soror    éor       svásar-   *swésōr
mother’s    móður      matere               mātris   mētéros   mātas     *meh₂trós
(we) bear   berum      beremŭ               ferimus  phéromen  bhárāmas  *bhéromes

The Indo-European Lexicon, conceptually, is a collection of the vocabulary items reconstructed for Proto-Indo-European. We call each element of this vocabulary an etymological root, or simply root (though sometimes the term etymon, pl. etyma, is also used). All words in the languages descended from PIE that share a common root are etymologically related. We call these descendant languages daughter languages, and the words descended from a common root the reflexes of that root. For example, in the chart above we can see that systir, svestĭ, soror, éor, and svásar- are the reflexes of PIE *swésōr in Old Norse, Old Church Slavonic, Latin, Greek, and Sanskrit, respectively.

The IELEX thus collects all the reconstructed roots of PIE and lists under each all the reflexes of that particular root, grouped according to the languages in which they appear.
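The organization just described (one root, with its reflexes grouped by language) can be pictured as a simple data structure. The following Python sketch is purely illustrative: it does not describe how the LRC actually stores the IELEX, and the class name RootEntry and method add_reflex are hypothetical. The sample data is the “sister” row from the chart above.

```python
from dataclasses import dataclass, field


@dataclass
class RootEntry:
    """Hypothetical model of one IELEX root: the reconstructed PIE form,
    a brief gloss, and its reflexes grouped by daughter language."""
    root: str
    gloss: str
    reflexes: dict[str, list[str]] = field(default_factory=dict)

    def add_reflex(self, language: str, word: str) -> None:
        # File each reflex under the language in which it appears.
        self.reflexes.setdefault(language, []).append(word)


# Reflexes of *swésōr, taken from the "sister" row of the chart.
sister = RootEntry(root="*swésōr", gloss="sister")
for language, word in [
    ("Old Norse", "systir"),
    ("Old Church Slavonic", "svestĭ"),
    ("Latin", "soror"),
    ("Greek", "éor"),
    ("Sanskrit", "svásar-"),
]:
    sister.add_reflex(language, word)

print(sister.reflexes["Latin"])  # ['soror']
```

Each language maps to a list rather than a single word because a root may have several reflexes in the same language.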

Suggested Use

The IELEX consists of a database of PIE roots, together with a list of their reflexes across the IE languages. The LRC has created a collection of associated web pages where users can browse the contents of this database starting from different points of entry depending on the particular user’s interests. The following sections outline the main interfaces to the data.

Master Index

The simplest interface to the IELEX is the Master Index (often listed as Pokorny in navigation bars to highlight its relation to Julius Pokorny’s original dictionary). Users may think of it as paralleling the word list of an ordinary dictionary. It consists of a table of PIE roots in alphabetical order (in the Pokorny entry column), each with a brief gloss. In the See also column each root has a link labeled IE: this takes the user to a page dedicated specifically to this root, where the user can find a list of words descended from this particular root across the IE languages. (Note that the LRC continues to add words to these lists.) In this same column some roots also show a link to a related root which the user might wish to consult for comparison.

Frequently users, including scholars, do not start with a particular PIE root in mind, but rather with a specific word in one of the IE languages, and they want to find out what other words it’s related to. For that reason we provide another interface to the IELEX, discussed below.

Language Index

Often etymological investigations begin in one of the following ways:

  • one has in hand a particular word in a particular language and wants to know what words in other languages it might be related to;
  • one has an interest in a particular language and wants to find words in that language whose etymologies are known.

For these investigations the LRC has created a separate point of entry to the IELEX: the Language Index. This page provides a list of Indo-European languages which users can peruse to find a particular language of interest. If the given language is highlighted as a link, the user may follow the link to a separate page that lists all words in the IELEX database that pertain to that language: a Reflex Index. See, for example, the Old Norse Reflex Index, to which the user can navigate by scrolling down the list in the Language Index and clicking on the link labeled Old Norse.

The Reflex Index page for a given language presents the user with a table displaying all words in the database for that language with known etymologies. The table has two columns: Reflex and Etyma. The Reflex column displays the words in the specified language. The Etyma column displays the underlying etymological root or roots from which the corresponding word derives. Clicking on the specific root will take the user to the root’s associated webpage (where all the root’s reflexes are listed), automatically scrolling down the page to the point where the user can find the original reflex that led to this root.
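A Reflex Index can be thought of as the lexicon turned inside out: instead of listing reflexes under each root, it lists, for one language, the root or roots behind each reflex. The Python sketch below shows that inversion under stated assumptions: LEXICON is a hypothetical in-memory stand-in for the IELEX database, reflex_index is a hypothetical helper, and the sample words are taken from the tables earlier in this guide. Collecting roots in a list allows for words that derive from more than one root.

```python
from collections import defaultdict

# Hypothetical stand-in for the IELEX database: each root maps to its
# reflexes, grouped by language (sample words from the tables above).
LEXICON = {
    "*tréyes": {"Latin": ["trēs"], "Old Norse": ["þrír"], "Sanskrit": ["trayas"]},
    "*swésōr": {"Latin": ["soror"], "Old Norse": ["systir"], "Sanskrit": ["svásar-"]},
}


def reflex_index(lexicon, language):
    """Invert root -> reflexes into a two-column table for one language:
    each (Reflex, Etyma) row pairs a word with the root(s) it derives from."""
    etyma = defaultdict(list)
    for root, by_language in lexicon.items():
        for word in by_language.get(language, []):
            etyma[word].append(root)
    # Sort rows by reflex (plain Unicode code-point order).
    return sorted(etyma.items())


for reflex, roots in reflex_index(LEXICON, "Latin"):
    print(reflex, roots)
```

Python sorts strings by Unicode code point, so in this sketch a word beginning with þ sorts after one beginning with z; the ordering used on the LRC’s pages may differ.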

Back on the Reflex Index page, the user may select the number of entries shown in the table display by adjusting the number in the upper left-hand corner. Perhaps more importantly, the user may type text in the Search bar at the table’s top right: this will filter the results and show only elements that begin with the same sequence of characters. In this way the user can search for a specific word, or for all words beginning with a specific sequence of letters, in a given language.
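The Search bar and the entries setting, as described above, amount to a prefix filter followed by a cap on the number of rows shown. The following Python sketch illustrates that behavior; the function name search and the sample list are hypothetical (the words come from the Latin column of the tables above), and it assumes an exact, case-sensitive match, which this guide does not specify.

```python
# Hypothetical sample of a Latin Reflex Index (words from the tables above).
LATIN_REFLEXES = ["ferimus", "liber", "mātris", "schola", "soror",
                  "traducimus", "trēs", "unus"]


def search(reflexes, typed, entries_shown=10):
    """Keep only words that begin with the typed characters (exact,
    case-sensitive match), then show at most entries_shown of them."""
    matches = [word for word in reflexes if word.startswith(typed)]
    return matches[:entries_shown]


print(search(LATIN_REFLEXES, "tr"))    # ['traducimus', 'trēs']
print(search(LATIN_REFLEXES, "s", 1))  # ['schola']
```

Under this assumption, typing tre would not find trēs, because ē and e are different characters; whether the site’s search ignores diacritics or case is not stated here.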

Semantic Index

Some questions about the history of words may be even more open-ended, such as the following:

  • what are the words for ‘mother’ and ‘father’ across all Indo-European languages? or
  • where do the words for ‘earth’ and ‘sky’ come from?

Of course, users may simply select English in the Language Index and then look for specific words in the word list on the associated Reflex Index page. But such investigations “narrow down” from a known word, and the LRC has created another point of entry that it hopes will complement them with an interface that “opens up” from a general concept.

Specifically, users may enter the IELEX via the Semantic Index. This page lists a number of broad categories of reference that group together words related by sense, rather than by etymology. For example, to discover the origins of the words for ‘mother’ and ‘father’, the user might choose the Mankind category and discover, in addition to those roots, the origins of related terms like ‘brother’, ‘sister’, ‘son’, ‘daughter’, ‘husband’, ‘wife’, and numerous others denoting similar concepts. Or to find the origins of ‘earth’ and ‘sky’, the user might search under the Physical World category and at the same time learn about the Indo-European words for ‘sea’, ‘water’, ‘forest’, and ‘oak’.

Following the link for any specific category in the Semantic Index will lead to a new page where that category is broken down into a number of subcategories distinguishing related concepts. Many of these subcategories link to a further page that lists all roots falling within that subcategory. Since these pages continue to evolve, however, some subcategories have yet to be filled in and therefore currently have no associated links.

The basic categories of the Semantic Index are the following:

  1. Physical World
  2. Mankind
  3. Animals
  4. Body Parts & Functions
  5. Food & Drink
  6. Clothing & Adornment
  7. Dwellings & Furniture
  8. Agriculture & Vegetation
  9. Physical Acts & Materials
  10. Motion & Transportation
  11. Possession & Trade
  12. Spatial Relations
  13. Quantity & Number
  14. Time
  15. Sense Perception
  16. Emotion
  17. Mind & Thought
  18. Language & Music
  19. Social Relations
  20. Warfare & Hunting
  21. Law & Judgment
  22. Religion & Beliefs

Stay in Touch

Hopefully this outline of lexicon use will prove helpful. Please do not hesitate to contact the LRC with questions and comments based on your experience. Good luck with your studies!


  • Linguistics Research Center

    University of Texas at Austin
    PCL 5.556
    Mailcode S5490
    Austin, Texas 78712
    512-471-4566

  • For comments and inquiries, or to report issues, please contact the Web Master at UTLRC@utexas.edu