Ad-hoc Authorship Attribution Competition



    Thank you very much for your interest in the Ad-hoc Authorship Attribution Competition. This page presents the materials to be used for this competition as a set of problems. If you have any difficulty obtaining, using, or understanding the materials, please contact me and we will try to resolve things.  

    To recap, your task as competitors is to determine, for each file of testing data, to which of the training author(s) it should be attributed. Each test document has (to the best of my knowledge) a single author, but this author is not necessarily among the set of training authors (i.e. "none of the above" is a legitimate answer, but "all of the above" is not). Each problem set will be scored individually as the percentage of its test documents correctly attributed. Failure to produce a unique or intelligible answer will (of course) be scored as an incorrect attribution for that document. The overall contest score will be the average of these per-problem percentages, with each problem set weighted equally. Of course, from a scientific perspective, overall success is less interesting than a detailed analysis of the sorts of documents on which each method succeeds or fails.  
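For competitors who want to check their own results before submitting, the scoring rule described above can be sketched roughly as follows. This is only an illustration of the arithmetic, not the official scoring script: the function names, the "none" label for "none of the above", the dictionary layout, and all file and author names are assumptions made up for the example.

```python
from statistics import mean

# Hypothetical label for "author is not among the training authors".
NONE_OF_THE_ABOVE = "none"


def problem_score(answers, truth):
    """Percentage of one problem set's test documents attributed correctly.

    answers: dict mapping test file name -> a single author label (str).
             Anything else (missing entry, None, a list of several names)
             is treated as a non-unique or unintelligible answer.
    truth:   dict mapping test file name -> the true author label.
    """
    correct = 0
    for doc, true_author in truth.items():
        answer = answers.get(doc)
        # Only a single string label can be correct; everything else is wrong.
        if isinstance(answer, str) and answer == true_author:
            correct += 1
    return 100.0 * correct / len(truth)


def contest_score(per_problem_scores):
    """Unweighted mean of the per-problem percentages."""
    return mean(per_problem_scores)


# Made-up data for two small problem sets.
truth_a = {"t1": "author1", "t2": "author2",
           "t3": NONE_OF_THE_ABOVE, "t4": "author1"}
answers_a = {"t1": "author1",               # correct
             "t2": ["author1", "author2"],  # not unique, scored as incorrect
             "t3": NONE_OF_THE_ABOVE}       # correct; t4 is missing, incorrect

truth_b = {"t1": "author3", "t2": "author4"}
answers_b = {"t1": "author3", "t2": "author4"}

score_a = problem_score(answers_a, truth_a)
score_b = problem_score(answers_b, truth_b)
overall = contest_score([score_a, score_b])
print(score_a, score_b, overall)
```

Note that because each problem set is weighted equally, a problem with two test documents counts as much toward the overall score as a problem with twenty.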

    There are twelve problems in total.

The entire problem set is also available as a single file (tar, gzip). Upon request, I will also send copies out on CD to individual researchers. (The new problem, problem M, is also available as a gzipped tarball for those who already downloaded the original.)  

Formal registration is now available on-line. If you are interested in participating, please fill out this simple form. If you have any difficulties, please contact me directly.  

    Please pass this invitation on to any other people, groups, or mailing lists who might find it of interest.

Patrick (Juola)