Ad-hoc Authorship Attribution Competition

SAMPLE MATERIALS

    Thank you very much for your interest in the Ad-hoc Authorship Attribution Competition. This page presents some sample files (training and testing) to help establish acceptable and compatible formats for the materials to be used in this competition. Although these sample files will NOT form part of the competition itself, I hope to mimic the format as closely as possible. Thus, if your program(s) can handle the files presented here, there should be no surprises at the competition itself.

    Attached here are two sample problem sets. For both sets, you are presented with a set of training data (with authorship information obscured), and then a set of testing data. Each individual writing sample is presented in its own file. For the two problem sets presented here, all files are flat (7-bit) ASCII, without extraneous material or markup. To the extent possible (i.e. things may change for any samples written in Japanese), this will continue to be the case for other samples.
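
    As a quick sanity check on the format described above, a competitor might verify that every downloaded file really is flat 7-bit ASCII before feeding it to an attribution program. The following is only a minimal sketch in Python; the function names and the idea of keeping one problem set per directory are assumptions made for illustration, not part of the competition materials.

```python
from pathlib import Path
from typing import List


def is_flat_ascii(data: bytes) -> bool:
    """Return True if every byte fits in 7 bits (values 0-127)."""
    return all(byte < 0x80 for byte in data)


def non_ascii_files(directory: str) -> List[str]:
    """List the names of files in `directory` that contain non-ASCII bytes.

    `directory` is a hypothetical local folder holding one problem set.
    """
    offenders = []
    for path in sorted(Path(directory).iterdir()):
        if path.is_file() and not is_flat_ascii(path.read_bytes()):
            offenders.append(path.name)
    return offenders
```

    An empty list from non_ascii_files means every file in that folder passed the check; any name it returns deserves a closer look before processing.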

    Your task as competitors is to determine, for each file of testing data, to which of the training author(s) it should be attributed. Each test document has (to the best of my knowledge) a single author, but this author is not necessarily among the set of training authors (i.e. "none of the above" is a legitimate answer, but "all of the above" is not).

    Each problem set will be scored individually as a percentage of test documents correctly attributed. Failure to produce a unique or intelligible answer will (of course) be scored as an incorrect attribution for that document. Overall contest score will be determined as an average percentage of correct identification, with each problem set weighted equally. Of course, from a scientific perspective, overall success is less interesting than a detailed analysis of the sort of documents that each method succeeds or fails upon. Since there's no money (and little honor) at stake in this friendly competition, I hope that this simple method of scoring meets with everyone's approval.
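
    The scoring rule above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the official scoring script: the dictionary layout, the "NONE" label for "none of the above", and the use of None for a missing or unintelligible answer are all invented here for the example.

```python
from typing import Dict, List, Optional

# Hypothetical label for "none of the above"; the real manifest may differ.
NONE_OF_THE_ABOVE = "NONE"


def score_problem_set(answers: Dict[str, str],
                      guesses: Dict[str, Optional[str]]) -> float:
    """Percentage of test documents correctly attributed.

    `answers` maps each test document to its true author label.
    `guesses` maps test documents to a competitor's single answer;
    a missing entry or None counts as an incorrect attribution.
    """
    if not answers:
        raise ValueError("a problem set needs at least one test document")
    correct = 0
    for document, truth in answers.items():
        guess = guesses.get(document)
        if guess is not None and guess == truth:
            correct += 1
    return 100.0 * correct / len(answers)


def overall_score(per_set_scores: List[float]) -> float:
    """Average of the per-set percentages, each set weighted equally."""
    if not per_set_scores:
        raise ValueError("no problem sets were scored")
    return sum(per_set_scores) / len(per_set_scores)
```

    Because each problem set is weighted equally, a set with two test documents counts as much toward the overall score as a set with twenty; averaging the percentages rather than pooling all documents is what produces that behavior.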

    Example problem set 1:

Problem 1, Training Sample 1 (Author A)
Problem 1, Training Sample 2 (Author B)
Problem 1, Training Sample 3 (Author C)
Problem 1, Training Sample 4 (Author D)
Problem 1, Testing Sample 1
Problem 1, Testing Sample 2
Problem 1, Testing Sample 3
Problem 1, Testing Sample 4
Problem 1, Testing Sample 5
Problem 1, Testing Sample 6
Problem 1, Testing Sample 7

    The manifest, including correct answers for testing samples 1-7, is available. A plain text version of the manifest is also available.

    Example problem set 2:

Problem 2, Training Sample 1 (Author A)
Problem 2, Training Sample 2 (Author B)
Problem 2, Testing Sample 1
Problem 2, Testing Sample 2
Problem 2, Testing Sample 3
Problem 2, Testing Sample 4
Problem 2, Testing Sample 5
Problem 2, Testing Sample 6
Problem 2, Testing Sample 7

    The manifest, including correct answers for testing samples 1-7, is available. A plain text version of the manifest is also available.

    Preliminary registration is now available on-line. If you are interested in participating, please fill out this simple form. If you have any difficulties, please contact me directly.

    For further details, please contact Patrick Juola. More information, including status reports and progressive developments, can also be found at the competition home page.

    Please pass this invitation on to any other people, groups, or mailing lists who might find it of interest.

Patrick (Juola)
Eml: juola@mathcs.duq.edu
http://www.mathcs.duq.edu/~juola