\documentstyle[nemlap3]{article}
\title{Cross-Entropy and Linguistic Typology}
\author{Patrick Juola \\ Department of Experimental Psychology \\ University of Oxford \\ Oxford, UK OX1 3UD \\ {\tt patrick.juola@psy.ox.ac.uk}}
\begin{document}
\maketitle
\begin{abstract}
The idea of ``familial relationships'' among languages is well established and widely accepted, although controversies persist in a few specific cases. By painstakingly identifying and recording regularities and similarities among languages and comparing these to the historical record, linguists have produced a general ``family tree'' that incorporates most natural languages. Recent work by Wyner (1997) and Juola (1997) suggests that Kullback-Leibler divergence (closely related to cross-entropy) can be meaningfully estimated from extremely small samples, in some cases of only 20 or so words. Using these techniques, we define and measure a distance function between translations of the Bodleian Declaration, a sample set covering most of the (accepted) Indo-European family, and reconstruct a relationship tree by hierarchical cluster analysis. The resulting tree closely resembles the accepted Indo-European family tree. We take this as evidence both for the power of this measurement technique and for the validity of this kind of mechanical similarity judgement in identifying typological relationships.
\end{abstract}
\end{document}