Series Introduction

Brian Joseph, Angelo Costanzo, and Jonathan Slocum

Albanian is an Indo-European language spoken mainly in the Balkan Peninsula by approximately five million people. It is the principal and official language of Albania, the principal and a co-official language of Kosovo (with Serbian), and the principal and co-official language of many western municipalities of the Republic of Macedonia (with Macedonian). Albanian is also spoken widely in some areas in Greece, southern Montenegro, southern Serbia, and in some towns in southern Italy and Sicily.

The terms Albania and Albanian are exonyms. The Albanians call themselves Shqiptar, their language shqip, and their country Shqipëria. These words are likely derived from the adverb shqip 'clearly' based on Latin excipere (whence shqipoj 'speak clearly'), though there are alternative explanations. In other languages, a form from earlier *alban- or *arban- is used (the difference being most likely from a rhotacism process in Greek). Most of them use a form with the same origin as Eng. Albanian (e.g., It. Albanese, Serb. Albanac, Germ. Albaner, etc.). In Turkish, the Albanians are called Arnavut, derived in some way from arvan-. The terms Albania and Albanian are not to be confused with the area in the Caucasus referred to in ancient texts as Albania or the language spoken there referred to as Albanian (an ancestor of the modern Udi language spoken in Azerbaijan and a member of a language family with no confirmed connections to the Indo-European language family).

When compared with most of the other Indo-European languages, Albanian's first attestations are rather recent, with the first surviving fragment from the mid-15th century and the first major text from the mid-16th century. For this reason, these lessons cover Albanian from the modern standard language back to earlier attestations, starting with the modern variety to get a grounding in the language and working back to older material.

Albanian and Indo-European

Albanian forms a separate branch of Indo-European and cannot conclusively be closely connected with any other Indo-European language. There have been attempts to connect Albanian with some of the sparsely attested ancient languages of the Balkans, particularly Illyrian but also Dacian and Thracian. While this is plausible geographically, given that we know the Illyrians lived in an area that includes the modern Albanian-speaking area, there is no concrete linguistic evidence for any of these proposals. Some have proposed a connection between the ancestor of Albanian (without assigning a specific identity to this ancestor) and a Latinized variety of that ancestor that may have ultimately yielded Romanian, as there are several shared words not of Latin origin in both languages.

Albanians and Albanian in the Historical Record

Mention of the Albanian people and the Albanian language appears rather late in the historical record. The earliest uncontroversial mention of the Albanian people is in Michael Attaleiates's late 11th century history of the Byzantine Empire, where he refers to the Albanoi taking part in a revolt against Constantinople and the Arvanitai as subjects of the duke of Dyrrachium (modern Durrës, Albania's main port on the Adriatic).

The first mentions of the Albanian language predate its first attestation by several centuries. Elsie (1991) describes a 1285 text in which the investigation of a robbery in Ragusa (modern Dubrovnik, Croatia) refers to a witness who said Audivi unam vocem clamantem in monte in lingua albanesca 'I heard a voice crying in the mountains in the Albanian language'. In the 1308 Anonymi Descriptio Europae Orientalis 'Anonymous description of Eastern Europe', the author writes Habent enim Albani prefati linguam distinctam a Latinis, Grecis et Sclavis ita quod in nullo se inteligunt cum aliis nationibus 'The aforementioned Albanians have a language which is entirely distinct from that of the Latins, Greeks and Slavs such that in no way can they communicate with other peoples'.

Earliest Attestations of the Albanian Language

While the earliest attested Albanian texts are from over a century later, the existence of Albanian texts is mentioned in 1332 in Directorium ad passagium faciendum (by a French monk whose identity is uncertain): licet Albanenses aliam omnino linguam a latina habeant et diversam, tamen litteram latinam habent in uso et in omnibus suis libris 'The Albanians have a language different from Latin, although they use Latin letters in their books' (note that this could potentially be saying that Albanians just wrote in Latin).

The oldest unambiguously attested Albanian is a single line embedded in a Latin document from 1462. It is in a letter from Pal Engëlli, a bishop and associate of Skënderbeu, and is a translation of a baptismal formula (formula e pagëzimit) into Geg Albanian:

Vnte' paghesont premenit Atit et birit et spertit senit
'I baptize you in the name of the father, the son, and the holy spirit'
cf. Std. Alb. Unë të pagëzoj në emër të Atit, të Birit, e të Shpirtit të Shenjtë

Over the following century the attested Albanian "texts" are of similar size, including a single line in a Latin play from 1483 and a short list of Albanian words from 1496.

The first larger text is Meshari i Gjon Buzukut 'The Missal of Gjon Buzuku', written in 1555 (see Lesson 5). Like the earlier attestations of Albanian, Buzuku's 'Missal' is written in Geg. Most of the early documentation of Albanian is in Geg, as the Geg-speaking area was more difficult for the Ottomans to subdue (and, consequently, it was harder for them to discourage the use of Albanian there). The earliest attestation of Tosk Albanian is the E mbsuame e krështerë 'Christian doctrine' of Lekë Matrënga from 1592, written in Hora e Arbëreshëvet, an Arbëresh settlement in northwestern Sicily.

Structure of Albanian

Some general characteristics of the Albanian language:
  • Albanian shows a fairly complex nominal inflection system. Albanian has a three-gender system (masculine, feminine, neuter), though the exact status of the neuter gender is disputed. Five cases remain from Proto-Indo-European: nominative, accusative, dative, genitive, and ablative, though the dative and genitive are morphologically identical. In addition to inflecting for case and number, Albanian nouns also inflect for definiteness. As is also seen in several other languages of the Balkans (e.g., Romanian, Macedonian, Bulgarian), Albanian has a postposed definite article, e.g., zog 'bird', zog-u 'the bird'.
  • The verb system is highly populated with analytic forms. This includes several compound past tenses (e.g., kam lexuar 'I have read'), the future tense (e.g., do të lexoj 'I will read'), the present progressive (e.g., po lexoj 'I am reading'), the past passive (e.g., u lexua 'it was read'), among others. In addition, Albanian also has a substantial inventory of synthetic verb forms, some familiar (e.g., present, imperfect, past definite, optative, etc.), and some that are less familiar to learners of Indo-European (e.g., the admirative mood, see Lesson 5).
  • One of the most noticeable features of Albanian is the vast number of "small words" that exist. It is not that there is a huge inventory of different "small words" in Albanian; rather there are many instances in which words having the same form are found in different functions. Some of these small words include: the attributive article (or as we call it, nyje), that can take four different forms depending on a variety of factors and is required with most adjectives, some nouns, and all instances of nouns in the genitive case (it is what distinguishes the genitive from the dative); subordinators; weak pronouns; etc. This often gets a bit tricky as, e.g., të can be an attributive article, a pronominal clitic, and a subordinator.

Variation in Albanian

Albanian dialects are traditionally divided into two groups: Geg dialects in the north, and Tosk dialects in the south. The dividing line is traditionally considered to be the Shkumbin river, which runs east-west through central Albania (at approximately the 41st parallel north). Dialects spoken in Kosovo and Macedonia are Geg dialects, while those spoken in northwestern Greece are Tosk dialects. While they are technically Tosk dialects, Arvanitika (spoken in Greece, historically in Attica and Boeotia) and Arbëresh (spoken in southern Italy and Sicily) are also often considered major Albanian dialects; these dialects were brought to these areas after the Ottoman conquest of the western Balkans in the late 15th century, and they are maintained to this day.

Major differences between Geg and Tosk

Phonological variation:

  • Geg has nasal vowels while Tosk does not, e.g., Geg âsht vs. Tosk është 'is'
  • Geg has phonemic vowel length, e.g., dhē 'earth' vs. dhe 'and'. Nearly all Tosk dialects lack vowel length distinctions, e.g., dhe 'earth', 'and'.
  • Tosk dialects have undergone a change by which intervocalic n became r. No such change has occurred in Geg, e.g., Geg Shqipnia vs. Tosk Shqipëria 'Albania', Geg gjarpën vs. Tosk gjarpër 'snake'.

Morphosyntactic variation:

  • The Tosk future tense is formed with the marker do followed by a conjugated present subjunctive form of the verb (e.g., do të shkoj 'I will go'), while the Geg future tense is formed by a conjugated form of the verb 'have' followed by an infinitive (e.g., kam me shkue 'I will go').
  • Tosk lacks infinitives altogether (similar to several other languages of the Balkans), while Geg maintains the infinitive (composed of me plus the past participle).
  • In Tosk, most verbs have a past participle in -r (e.g., fjetur 'slept', qeshur 'laughed', kërkuar 'requested'). In Geg, no verbs have this ending (e.g., fjetë 'slept', qeshë 'laughed', kërkuë 'requested').
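
The two future-tense patterns described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the caller is assumed to supply already-conjugated forms (the sketch does no conjugation itself), and të is treated as the subjunctive marker, as in the example do të shkoj.

```python
def tosk_future(present_form: str) -> str:
    """Tosk pattern: invariant 'do' + present subjunctive ('të' + verb form).

    Assumption: the caller passes an already-conjugated present form.
    """
    return f"do të {present_form}"


def geg_future(have_form: str, participle: str) -> str:
    """Geg pattern: conjugated 'have' + infinitive ('me' + past participle).

    Assumption: the caller passes a conjugated form of 'have' and a Geg
    participle; nothing is looked up or inflected here.
    """
    return f"{have_form} me {participle}"


if __name__ == "__main__":
    print(tosk_future("shkoj"))        # Tosk 'I will go'
    print(geg_future("kam", "shkue"))  # Geg 'I will go'
```

Only the word order and the fixed particles (do, të, me) are modeled; the choice of verb forms is left entirely to the caller.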

Standard Albanian

Nearly all of the historical centers of Albanian culture (Durrës, Tiranë, Shkodër, Prishtinë, Tetovë, etc.) are located squarely in Geg-speaking territory. However, Standard Albanian is predominantly based on Tosk. The promotion of a Tosk-based variety as a standard is actually quite recent, and likely has much to do with the fact that Enver Hoxha, Albania's dictator from the 1940s until the 1980s, was from Gjirokastër (in southern Albania), and thus was a native speaker of a Tosk variety. Although Kosovo and the Albanian-speaking parts of Macedonia lie predominantly in Geg-speaking territory, the standard variety used there is the same one used in Albania (i.e., it is based on Tosk).

Standard Albanian, while predominantly based on Tosk, does also have some Geg features. For example, the Standard Albanian 1st person singular present verb ending -j is a Geg feature; most Tosk dialects, on the other hand, have the ending -nj.

Language Contact

As with the other languages of the Balkans, the development of Albanian has been drastically affected by contact with speakers of other languages.

Lexical Borrowing

While reports of over 90 percent of Albanian's lexicon being composed of foreign words are definitely overstated, lexical borrowing has had an enormous effect on Albanian. There are several strata of lexical borrowings.

  • Early Greek influence: Limited to a small group of borrowings, e.g., Ancient Greek makhana > mokërë 'millstone', lakhana > lakër 'cabbage'.
  • Latin influence: The influence of Latin on the Albanian lexicon is vast, e.g., Latin lex > ligj 'law', amicus > mik 'friend', aurum > ar 'gold'. Albanian also shows a number of calques from Latin, e.g., decem-brius > dhjet-or 'December', manu-scriptus > dorë-shkrim 'manuscript'.
  • South Slavic influence: There are also a substantial number of words borrowed from South Slavic, e.g., Slavic nevolja > Alb. nevojë 'need'; gotov > Alb. gati 'ready'.
  • Modern Greek influence: While the Ancient Greek influence on Albanian is minimal, the influence from Modern Greek has been much larger, e.g., Greek kyverno > qeveris 'to govern', krevati > krevet 'bed', staphida > stafidhe 'raisin', as well as the pan-Balkan 'unceremonious mode of address' bre, more (along with several alternate forms, originally from Greek more).
  • Turkish influence: As Albania was under Ottoman rule for over 400 years, there is a strong Turkish element in the Albanian lexicon, e.g., Turkish haydi > hajde 'c'mon!; let's go!', pencere > penxhere 'window'; along with a wide range of culinary vocabulary (e.g., patëllxhan 'eggplant'; çorbë 'soup'; byrek 'delicious pastry with a variety of fillings').
  • Italian and English influence: Over the past century, the two major influences on the Albanian lexicon have been Italian and English, e.g., Italian bagno > Alb. banjë 'bathroom', tavolino > Alb. tavolinë 'table'; English jogging > Alb. xhoging, to charge > Alb. çarxhoj.

The Balkan Sprachbund

As part of the Balkan Sprachbund, Albanian shares a number of features with the other languages of the Balkans (e.g., Greek, Bulgarian, Macedonian, Romanian, Turkish, Romani, etc.). The following are some of Albanian's more notable Balkan features:

  • Albanian has a postposed definite article, e.g., qen 'dog', qen-i 'the dog'. This is also seen in Balkan Romance and Balkan Slavic (e.g., Mac. kuche 'dog', kuche-to 'the dog'). While many of the features of the Balkan Sprachbund are considered to have ultimately originated in Greek, it has been proposed that Albanian is the source of this particular feature (though it is difficult to tell, as the earliest attestations of Albanian only date back 500 years).
  • While it does have a more recent formation that fulfills some of the roles of the infinitive in other languages, Tosk (like Greek, Macedonian, etc.) has lost the infinitive from earlier stages of the language. It is maintained in Geg (see Lesson 4 for a discussion of the Geg infinitive).
  • The Tosk future tense is an analytic formation composed of an invariant particle from the verb for 'want' followed by a present subjunctive form of the verb (e.g., do të pi 'I will drink', where do is from the verb dua 'want'). Most of the other Balkan languages have the same pattern (e.g., Grk. tha pino, Mac. k'e pijam, where tha and k'e are invariant particles from the Greek and Macedonian verbs meaning 'want', respectively).
  • Albanian has the admirative mood, which is used, among other things, to express shock or surprise (see Lesson 5). This is also seen in Turkish, Bulgarian and Macedonian.

The Albanian Alphabet & Pronunciation

The Albanian Alphabet

The earliest texts were written in various forms of the Latin alphabet, with additional characters borrowed from the Greek alphabet (as well as some additional characters of other origins). Up until the late 19th century, the script used to write Albanian appears to have been dependent on the religion of the scribe: Latin for Catholics, Greek for Orthodox Christians, and Perso-Arabic script for Muslims. In the late 19th century there were various attempts to create a standardized alphabet for Albanian; in 1908, the modern Albanian alphabet was codified at the Congress of Manastir.

The modern Albanian alphabet consists of 36 letters, nine of which are digraphs.

    A,a   B,b   C,c   Ç,ç   D,d   Dh,dh   E,e   Ë,ë   F,f   G,g   Gj,gj   H,h
    I,i   J,j   K,k   L,l   Ll,ll   M,m   N,n   Nj,nj   O,o   P,p   Q,q   R,r
    Rr,rr   S,s   Sh,sh   T,t   Th,th   U,u   V,v   X,x   Xh,xh   Y,y   Z,z   Zh,zh

As briefly discussed above, Geg has nasalized vowels. The normal convention is to write these vowels with a circumflex accent. All other issues with the alphabet are discussed in the relevant lessons.
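
Because the digraphs count as single letters, splitting a word into letters or sorting words alphabetically cannot be done character by character. The following Python sketch illustrates one way to handle this. It assumes a simple greedy rule (any two characters that spell a digraph are read as that digraph), which may misread words where the two characters belong to separate letters; the names ALPHABET, DIGRAPHS, split_letters, and sort_key are hypothetical and chosen only for this example.

```python
# The 36 letters of the modern Albanian alphabet, in alphabetical order.
ALPHABET = [
    "a", "b", "c", "ç", "d", "dh", "e", "ë", "f", "g", "gj", "h",
    "i", "j", "k", "l", "ll", "m", "n", "nj", "o", "p", "q", "r",
    "rr", "s", "sh", "t", "th", "u", "v", "x", "xh", "y", "z", "zh",
]

# The two-character letters, derived from the list above.
DIGRAPHS = {letter for letter in ALPHABET if len(letter) == 2}


def split_letters(word: str) -> list[str]:
    """Split a word into Albanian letters, reading digraphs greedily.

    Assumption: every character pair that spells a digraph is one letter.
    """
    letters = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair.lower() in DIGRAPHS:
            letters.append(pair)
            i += 2
        else:
            letters.append(word[i])
            i += 1
    return letters


def sort_key(word: str) -> list[int]:
    """Key for alphabetical sorting by Albanian letter order.

    Raises ValueError for characters outside the alphabet
    (e.g., Geg vowels written with a circumflex).
    """
    return [ALPHABET.index(letter.lower()) for letter in split_letters(word)]


if __name__ == "__main__":
    print(split_letters("Shqipëria"))
    print(sorted(["dhe", "dorë"], key=sort_key))
```

With this key, dorë sorts before dhe, because d precedes dh in the alphabet; a plain character-by-character sort would give the opposite order.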

Vowel Pronunciation

Standard Albanian, as well as most Tosk dialects, has a seven-vowel system:

    i   similar to the vowel in Eng. meat
    e   similar to the vowel in Eng. met
    a   similar to the vowel in Eng. hot
    o   similar to the vowel in Eng. boat, but not diphthongal. More akin to the vowel in Spanish no.
    u   similar to the vowel in Eng. boot
    y   a high, front, rounded vowel; absent in English; similar to the vowel in French tu
    ë   similar to the final vowel in Eng. sofa

In Standard Albanian (as well as in most Geg dialects), the vowel ë is typically not pronounced in final position (e.g., nëntë 'nine' is pronounced nënt), except in monosyllabic words (e.g., një 'one', që 'that', etc.). This sound is also commonly elided in other unstressed syllables. In some (mainly Tosk) dialects, this vowel is fully pronounced.
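
The final-vowel rule just described (word-final ë is dropped in pronunciation, except in monosyllables) can be sketched as a small Python function. This is a rough illustration, not a pronunciation model: it assumes that counting vowel letters is an adequate stand-in for counting syllables, and it ignores the elision of ë in other unstressed syllables. The names VOWELS, count_vowels, and drop_final_schwa are hypothetical.

```python
# The seven vowel letters of Standard Albanian.
VOWELS = set("aeëiouy")


def count_vowels(word: str) -> int:
    """Count vowel letters; used here as a rough proxy for syllable count."""
    return sum(ch in VOWELS for ch in word.lower())


def drop_final_schwa(word: str) -> str:
    """Approximate the spoken form by dropping a final ë.

    Assumption: a word with only one vowel letter is monosyllabic
    and keeps its ë.
    """
    if word.lower().endswith("ë") and count_vowels(word) > 1:
        return word[:-1]
    return word


if __name__ == "__main__":
    for w in ["nëntë", "një", "që"]:
        print(w, "->", drop_final_schwa(w))
```

For the examples in the text, nëntë becomes nënt, while një and që are left unchanged because each contains only one vowel letter.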

While Standard Albanian has a relatively simple seven-vowel system, most Geg varieties have a much more complex set of vowels. Any of the vowels above, with the exception of ë, can be nasalized. In addition, Geg has distinctive vowel length, so any of the vowels (except, again, ë) can be long or short. Camaj (1984) also claims that some Geg varieties have a distinction between short nasal vowels and long nasal vowels.

Consonant Pronunciation

As for consonants, though most of the letter-sound correspondences will be familiar, there are some exceptions:

        description   sounds like...
    c   voiceless dental affricate   ts in English cats, z in Italian zio, c in Russian cvet
    ç   voiceless postalveolar affricate   ch in English choose, c in Italian cento
    dh   voiced dental fricative   th in English the
    gj   voiced palatal stop   similar to g in English gear
    ll   voiced velarized lateral   similar to ll in English ball; in Albanian, unlike in English, this sound can occur in any position in the word.
    nj   palatal nasal   gn in French agneau, similar to ni in Eng. onion
    q   voiceless palatal stop   similar to k in Eng. key
    rr   alveolar trill   rr in Spanish sierra
    th   voiceless dental fricative   th in English thing
    x   voiced dental affricate   ds in English needs, z in Italian zero
    xh   voiced postalveolar affricate   j in English judge, g in Italian giro
    zh   voiced postalveolar fricative   s in English pleasure, j in French jour