Building Coreference chains since 2007
BART, the Beautiful Anaphora Resolution Toolkit, is a product of the project Exploiting Lexical and Encyclopedic Resources For Entity Disambiguation at the Johns Hopkins Summer Workshop 2007.
BART performs automatic coreference resolution, including all necessary preprocessing steps.
BART incorporates a variety of machine learning approaches and can use several machine learning toolkits, including WEKA and an included MaxEnt implementation.
BART internally works with a standoff-based representation based on the format of MMAX2 (an annotation tool for coreference and other discourse annotation). The easiest way of getting text into BART and coreference chains out is the REST-based web service that is part of BART and allows you to easily import raw text, process it, and export the result as inline XML.
- grab the BART-snapshot.tgz tarball and untar it somewhere.
This is pretty big, but it contains all the software you need (including the Berkeley parser and the Stanford NER), as well as a model trained on MUC6 (which works ok-ish and does not require you to set up external databases and the like).
- change to that directory and do:
    source setup.sh
    java -Xmx1024m elkfed.webdemo.BARTServer
  (the first command sets up the classpath, whereas the second starts BART's web service).
- point your browser at http://localhost:8125/index.jsp and then enter some text into the form and verify that it does something (clicking on the "coref" tab should run the coreference resolver and display markables that are part of a coreference chain in a greenish tinge).
- To use BART on larger quantities of text, you would want to use the REST web service, e.g. with libwww-perl's POST program:
    cat text1.txt | POST http://localhost:8125/BARTDemo/ShowText/process/
  This should give you the text with POS tags and coreference chains in inline XML format. (If you also want the parses, you could add "parse" to the array wanted_levels in ShowText.exportCoref in the elkfed.webdemo package.)
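If you would rather not depend on libwww-perl, the same request can be sent from a short script. The following is only a minimal sketch in Python, assuming the web service is running locally on the URL shown above and that it accepts the raw text as the POST body, as the POST command does. The names BART_URL, build_request and resolve are made up for this example and are not part of BART, and the assumption that the response is UTF-8 encoded is a guess.

```python
import urllib.request

# Assumption: BART's web service was started as described above and listens here.
BART_URL = "http://localhost:8125/BARTDemo/ShowText/process/"


def build_request(text, url=BART_URL):
    """Build a POST request whose body is the raw input text (UTF-8 encoded)."""
    return urllib.request.Request(url, data=text.encode("utf-8"), method="POST")


def resolve(text, url=BART_URL, timeout=300):
    """Send text to the web service and return the response body as a string.

    Assumption: the inline XML that comes back is UTF-8 encoded.
    """
    with urllib.request.urlopen(build_request(text, url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

With the server running, something like print(resolve(open("text1.txt", encoding="utf-8").read())) should print the same inline XML that the POST command produces. To process many files, call resolve once per file and write each result to its own output file.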
Coreference Systems based on Kernel Methods. Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008).
BART: A Modular Toolkit for Coreference Resolution. Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC 2008).
BART: A Modular Toolkit for Coreference Resolution. Companion Volume of the Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics (ACL 2008).
ELERFED: Final Report from the JHU 2007 Summer Workshop. Technical Report.
The following people have been involved in the creation and evolution of BART:
- Massimo Poesio (fearless leader)
- Simone Ponzetto (main contributor)
- Yannick Versley (main contributor)
- Vladimir Eidelman (Undergraduate student at the JHU workshop)
- Alan Jern (Undergraduate student at the JHU workshop)
- Alessandro Moschitti (Researcher at the JHU workshop)
- Xiaofeng Yang (Researcher at the JHU workshop)
- Kepa Rodriguez (PhD student, CiMeC)
- Olga Uryupina (Postdoc)
BART is open source. What license is it under?
The core of BART - i.e., everything that's in the src/ folder when you unpack it - is licensed under the Apache license (v2.0), except for a small number of classes (elkfed.coref.discourse_entities.DiscourseEntity, which is inherited from GuiTaR code, and the LBFGS code in riso.numerical, which comes from a project called RISO; both are GPL-licensed). Some of the libraries that BART uses also fall under the Apache license (e.g., XMLBeans and various helper libraries), but some important components (all the parsers that can be used, the Stanford NER, WEKA) are GPL-licensed.
As a result, you can (in my understanding):
* use BART internally with no restrictions - both the Apache license and the GPL allow this. ("Internally" excludes letting others use the code by means of a web service or a closed appliance).
* build derived works with BART and give these to others under a license that is compatible with both the GPL and the Apache license (which includes both the GPL and the Apache license).
* replace the parts that have a license that is too restrictive for your taste and/or get a commercial license from the respective owners (which may be a nontrivial undertaking, as all parsers and all machine learning libraries that BART currently interfaces to are GPL-licensed).