Albanian Linguistics

Computational methods in the humanities are attracting growing interest, particularly for the computer processing of natural languages and of different types of texts. The use of computers in the humanities has both an academic dimension and numerous applications in industry. Theoretical research in computational linguistics, together with the practical capabilities of today's computers, should also facilitate communication across languages. Less-studied languages need to be prepared for the new era of information systems and electronic communication.

Training and scientific research in Computational Linguistics (CL) and Natural Language Processing (NLP) have already been introduced into the curricula of universities around the world. In this context there is a clear need to establish programs in natural language processing, spoken language and electronic document management focused on Albanian.

There have been early efforts in this direction at the University of Tirana, the Academy of Sciences, the University of Calabria, the Scuola Normale Superiore of Pisa, and elsewhere. LISSUS llc is working to build on this research and advance it through software development, corpus collection and theoretical research.

For language documentation and field research, the Swadesh list remains a consistently useful tool. A version adapted for Albanian, with the corresponding English and Italian translations, is available here (download).

An initial step in the computational study of a language is the creation of linguistic resources such as corpora, dictionaries and rule bases. In 1990, the first inverse dictionary of Albanian was published at the Scuola Normale Superiore of Pisa (download). An improved and expanded version of this dictionary is now available for download.
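The text does not describe how the Pisa inverse dictionary was compiled. As an illustration only, here is a minimal Python sketch assuming the usual definition of an inverse dictionary (entries sorted alphabetically by their spelling read from right to left) and assuming that Albanian digraphs such as dh, sh and xh count as single letters for collation. The function names `letters`, `inverse_key` and `inverse_dictionary` are hypothetical.

```python
import unicodedata

# Assumption: the 36-letter Albanian alphabet in standard order,
# with digraphs treated as single letters.
ALPHABET = ["a", "b", "c", "ç", "d", "dh", "e", "ë", "f", "g", "gj", "h",
            "i", "j", "k", "l", "ll", "m", "n", "nj", "o", "p", "q", "r",
            "rr", "s", "sh", "t", "th", "u", "v", "x", "xh", "y", "z", "zh"]
RANK = {letter: i for i, letter in enumerate(ALPHABET)}
DIGRAPHS = {letter for letter in ALPHABET if len(letter) == 2}


def letters(word: str) -> list[str]:
    """Split a word into Albanian letters, greedily matching digraphs."""
    # NFC normalization makes a decomposed "e" + diaeresis compare equal to "ë".
    word = unicodedata.normalize("NFC", word).lower()
    out, i = [], 0
    while i < len(word):
        if word[i:i + 2] in DIGRAPHS:
            out.append(word[i:i + 2])
            i += 2
        else:
            out.append(word[i])
            i += 1
    return out


def inverse_key(word: str) -> list[int]:
    """Alphabet ranks of the word's letters, read from the last letter backwards."""
    # Characters outside the alphabet sort after every Albanian letter.
    return [RANK.get(letter, len(ALPHABET)) for letter in reversed(letters(word))]


def inverse_dictionary(words):
    """Deduplicate and sort words by their reversed spelling (ties broken by the word)."""
    return sorted(set(words), key=lambda w: (inverse_key(w), w))
```

Sorting on the reversed letter sequence groups words that share an ending, so in a word list such as `["shtëpi", "mali", "zemër", "motër", "libër", "dhe", "bukë"]` the three words ending in -ër end up next to one another, which is what makes an inverse dictionary useful for studying suffixes and inflection.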

A large corpus of Albanian texts is used for text mining experiments such as collocation analysis and other linguistic investigations.
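The text does not say which association measure is used in the collocation experiments. Purely as an illustration, the sketch below ranks adjacent word pairs by pointwise mutual information (PMI), one common choice, using PMI(x, y) = log2(P(x, y) / (P(x) P(y))). The tokenizer and the `min_count` threshold are assumptions, and both function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # \w matches Unicode letters, so ë and ç stay inside words.
    return re.findall(r"\w+", text.lower())


def pmi_collocations(tokens, min_count=2):
    """Rank adjacent word pairs by pointwise mutual information.

    Probabilities are estimated from unigram counts (over all tokens)
    and bigram counts (over all adjacent pairs).
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)
    scores = {}
    for (x, y), count in bigrams.items():
        if count < min_count:  # rare pairs give unreliable, inflated PMI
            continue
        p_xy = count / n_bi
        p_x = unigrams[x] / n_uni
        p_y = unigrams[y] / n_uni
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    # Highest PMI first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

On a real corpus the `min_count` cut-off matters: PMI favors pairs of rare words, so very infrequent bigrams are usually filtered out or another measure (such as a log-likelihood ratio) is used instead.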