GATE
From TechWiki
Installation
For installation instructions and basic tagging concepts, see Installing GATE.
GATE Intro
- GATE: http://gate.ac.uk
Guides / Tutorials
- User Guide: http://gate.ac.uk/sale/tao/split.html
- Anton Andreev's blog: http://debian.fmi.uni-sofia.bg/~toncho/myblog/plugin/tag/gate
- The Semantic Annotation Workflow - KIM part 10
- KIM Multi-threaded Clustered Client Application - KIM part 9
- Gazetteers - KIM/GATE part 7
- Strict Rules vs Machine Learning - KIM part 6
- Tips and Tricks - KIM part 5
- Using a Gate application - KIM part 4
- Gate tutorial - KIM part 3
- Using KIM from .NET - KIM part 2
- Getting Started - KIM part 1
- Installation - KIM part 0
- Graham Wilcock book: http://books.google.com/books?id=TDQJb1UgVywC&pg=PA95&lpg=PA95&dq=gate+wrapper+java&source=bl&ots=bAFabZXX_E&sig=4tqEGrsdsvmbOM2UuiK9VWtmZpM&hl=en&ei=dp6iS57pLZOGNpGn8NEI&sa=X&oi=book_result&ct=result&resnum=89&ved=0CKcDEOgBMFg#v=onepage&q=gate%20wrapper%20java&f=false
Plug-ins
- Onto Root Gazetteer from GATE: http://wyner.info/LanguageLogicLawSoftware/index.php/2009/11/24/notes-on-onto-root-gazetteer/
- Text2Onto: http://www.neon-toolkit.org/wiki/1.x/Text2Onto
- http://code.google.com/p/neon-plugins/
- A Word Sense Disambiguation (WSD) toolkit using GATE and WEKA: http://sourceforge.net/projects/wsdgate/
- An Ngram Statistics Package (NSP) wrapper developed as a GATE Processing Resource: http://sourceforge.net/projects/nspgate/
- For TF/IDF, see Key Term Extractor PR
- TF/IDF and flat extractor plugins: http://lincs.etsmtl.ca/res_logiciels.html
Web Services
- GATE Web services: http://www.neon-toolkit.org/wiki/Gate_Webservice
- ANNIE demo: http://services.gate.ac.uk/annie/
- http://gate.ac.uk/projects/neon/webservices-plugin.html
- GATEService: http://gate.ac.uk/projects/neon/gas.html
- GATE Teamware: http://gate.ac.uk/teamware/teamware-detail.html
Projects Using GATE
- SEMPRE uses GATE as its framework for processing corpora. Instead of using the GATE GUI or pure Java code, the project employs Jython scripts to automate the whole process: pre-processing corpora (e.g. splitting them), composing and running GATE pipelines (including the GATE learning plugin, which produces feature files), converting the feature files to ARFF, and running WEKA to train and evaluate classifiers. Jython is an implementation of Python that runs on the JVM and can use Java classes directly, so scripts can drive GATE resources and operate on GATE documents. See http://www.ofai.at/~bernhard.jung/projects/sempre/tr-sentiment-classification.pdf
- KP-Lab - Text Mining Services : Classification http://cit.fei.tuke.sk:8080/TMSClassify/index.html
- Semantic Assistant (based on GATE): http://www.semanticsoftware.info/semantic-assistants-architecture
- http://sourceforge.net/projects/scan-ca-manager
- http://www.ir-facility.org/research/projects/data-representation/gate-teamware (MatrixWare creation)
- MUSING: http://143.167.12.20/projects/musing/
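The SEMPRE entry above mentions converting feature files to ARFF before handing them to WEKA. The following is only a minimal sketch of that one step, not SEMPRE's actual script: the function name `to_arff`, the attribute names, and the in-memory row format are all hypothetical, and quoting of values that contain spaces or commas is deliberately left out. It avoids version-specific syntax so that it should run under Jython as well as CPython.

```python
def to_arff(relation, attributes, rows):
    """Render rows as ARFF text (hypothetical helper, not part of GATE or WEKA).

    attributes: list of (name, kind) pairs, where kind is either the string
                "numeric" or a list of allowed nominal values.
    rows:       list of dicts mapping attribute name to value.
    """
    lines = ["@relation " + relation, ""]
    for name, kind in attributes:
        if isinstance(kind, (list, tuple)):
            # Nominal attribute: list the allowed values in braces.
            spec = "{" + ",".join(kind) + "}"
        else:
            spec = kind
        lines.append("@attribute %s %s" % (name, spec))
    lines += ["", "@data"]
    for row in rows:
        # One comma-separated line per instance, in attribute order.
        lines.append(",".join(str(row[name]) for name, _ in attributes))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Assumed example features; a real pipeline would read them from
    # the feature files produced by the GATE learning plugin.
    attrs = [("token_count", "numeric"), ("label", ["pos", "neg"])]
    data = [{"token_count": 12, "label": "pos"},
            {"token_count": 7, "label": "neg"}]
    print(to_arff("sentiment", attrs, data))
```

The resulting text can be saved with an `.arff` extension and loaded into WEKA.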
Digital Pebble
- Digital Pebble: http://www.digitalpebble.com/solutions.html
- DigitalPebble has a TextClassification API for GATE
- http://code.google.com/p/behemoth-pebble/
- http://code.google.com/p/behemoth-pebble/source/browse/#svn/trunk/src/main/java/com/digitalpebble/behemoth/io/warc%3Fstate%3Dclosed
- http://www.digitalpebble.com/resources.html
- http://www.slideshare.net/steve_l/digital-pebble-behemoth
- Mahout's goal is to build scalable machine learning libraries: http://lucene.apache.org/mahout/
- Avro: http://www.slideshare.net/cloudera/apachecon09-doug-cutting-on-avro
Open Sahara
- http://opensahara.com
- Open Sahara is a framework and infrastructure for grabbing information from the web, classifying its content, searching it semantically and distributing the results. Its components are driven by the best technology available and are completely open source. The open data structure enables users to modify Open Sahara to their own needs and standards.
- Open Sahara delivers several interfaces for application builders to use the annotated content or to add new content streams or annotation sets to the backbone. As a result of the unique approach of Open Sahara, all relevant information is linked on the fly and presented as a fully standardized information stream, ready to use as a feed for totally new information products.
- Open Sahara started by harvesting all relevant content about Amsterdam, the capital of the Netherlands. All content available from the city council, citizen service desk, public transportation, the police, news and user-generated content has been gathered, indexed, annotated and related with Open Street View, DBPedia and other relevant sources in the linked open data cloud.
- In March we will present the first results of our work. The first release of our 'fun app' is scheduled for the first week of April.
- http://opensahara.com/node/31
American National Corpus
- http://www.americannationalcorpus.org/
- Xoro: http://www.americannationalcorpus.org/xoro.html
- GATE tools: http://americannationalcorpus.org/tools/index.html#gate-tools
Java Access
- jython
- palava: http://palava.cosmocode.de/wiki/overview
- PHP/Java Bridge: http://php-java-bridge.sourceforge.net/pjb/how_it_works.php
- http://javaboutique.internet.com/tutorials/thebridge/
- http://www.flyninja.net/?p=69
- http://webdesignersgoa.blogspot.com/2007/08/phpjava-bridge.html
- http://www.idimmu.net/2008/01/13/PHP-Java-Bridge-in-Ubuntu-Gutsy-with-Lucene
- http://blogs.vinuthomas.com/2007/11/22/installing-the-php-java-bridge-in-ubuntu-gutsy-gibbon/
- http://www.devshed.com/c/a/PHP/Using-PHP-with-Java/ (2002; likely severely out of date)
- http://www.projectzero.org/sMash/1.1.x/docs/zero.devguide.doc/zero.php/ZeroAdvancedPHPJavaBridge.html
- http://www.theserverside.com/news/1363642/Intergrating-BIRT-with-PHP (Mar 2009; seems pretty useful)
- http://www.javafaq.nu/java-article990.html
Other Related
- Tika: Apache Tika is a toolkit for detecting and extracting metadata and structured text content from various documents using existing parser libraries. Supported formats include:
  - HyperText Markup Language
  - XML and derived formats
  - Microsoft Office document formats
  - OpenDocument Format
  - Portable Document Format
  - Electronic Publication Format
  - Rich Text Format
  - Compression and packaging formats
  - Text formats
  - Audio formats
  - Image formats
  - Video formats
  - Java class files and archives
  - The mbox format