I am a research fellow at the University of Sheffield and a member of the Natural Language Processing Group of the Computer Science Department. I work on large-scale multilingual entity recognition and disambiguation. I am part of the team that develops the GATE NLP framework, and I have developed or contributed to a number of GATE plugins and other GATE-related software. I am generally interested in machine learning, natural language processing and knowledge representation / ontologies.
Prior to joining the University of Sheffield, I was a researcher at the Austrian Research Institute for Artificial Intelligence (OFAI). For a list of publications, see Google Scholar or my Sheffield University page.
- gatelib-interaction A library that simplifies the interaction between GATE processing resources and external software. The interaction can be done either by starting a separate process and communicating with it through pipes, or by communicating with a separate server. So far this is mainly used to enable the GATE machine learning plugin, gateplugin-LearningFramework, to use Weka, Scikit-Learn, Keras and other external tools.
- gateplugin-AnnotationGraphs A GATE plugin that makes it easier to handle graphs of annotations, i.e. annotations representing trees, coreference chains, candidate lists or anything where one annotation needs to refer to one or more other annotations in some way.
- gateplugin-CorpusStats A GATE plugin to create term frequency, document frequency and tf*idf stats for a corpus. This plugin can be run multi-threaded using GCP.
- gateplugin-Evaluation A GATE plugin which provides the ability to carry out evaluations from within a pipeline.
- gateplugin-Java A GATE plugin which makes it easy to write Java code that gets executed in a pipeline. The Java code gets compiled on the fly and there is no need to restart GATE or reload the pipeline when the Java program is modified.
- gateplugin-JdbcLookup A GATE plugin which makes it easy to add or update annotations based on looking up information in a JDBC table.
- gateplugin-Format_Misc A GATE plugin for loading and saving documents in a number of additional formats: GZIP compressed GATE XML, GATE XML Snappy compressed, Java Object serialized, Java Object serialized with Snappy compression, Java Object serialized with GZIP compression.
- gateplugin-LearningFramework A GATE plugin for using various machine learning algorithms from within GATE. It supports classification, regression and tagging tasks and allows the use of algorithms from LibSVM, Mallet, Weka (as external program), Scikit-Learn (external, Python), CostCLA (external, Python), and Keras (external, Python).
- gateplugin-ModularPipelines A GATE plugin that brings two important properties to GATE pipelines: modularity and parametrizability. The plugin provides a new processing resource which makes it easy to include pipelines within pipelines while keeping each of the contained pipeline files separate. It also provides a new kind of controller which makes it possible to override or set any runtime parameter or init parameter for any of the processing resources in the pipeline, to set document features, or to enable or disable a PR within the pipeline.
- gateplugin-Scala A GATE plugin which makes it possible to write Scala code that gets executed in a pipeline from within GATE.
- gateplugin-StringAnnotation A GATE plugin which provides processing resources for very flexible matching of text using nestable Java regular expressions, and for very fast and compact use of gazetteer lists for matching either document text or text extracted from annotation features (similar to what the FlexibleGazetteer does).
- gateplugin-Tagger_CoreNLP A GATE plugin which can connect to the Stanford CoreNLP server to annotate documents.
- gateplugin-Tagger_GoogleNLP A GATE plugin which can connect to the Google NLP Service to annotate documents.
- gateplugin-TaggerSyntaxNet A GATE plugin which can connect to a local instance of SyntaxNet running under TensorFlow Serving to annotate documents.
- gateplugin-Tagger_TagMe A GATE plugin which can connect to the TagMe web API to annotate documents.
- gateplugin-VirtualCorpus A GATE plugin which provides new kinds of corpora which directly reflect the files within a directory on disk.
- keras-wrapper A lightweight Python wrapper for making Keras usable from the gateplugin-LearningFramework.
- simple-issues-tracker A very simple script for tracking issues. This is meant to be used from within a git repository and will simply manage issues by creating a new file for each issue in a subdirectory of the repository.
- sklearn-wrapper A lightweight Python wrapper for making Scikit-Learn usable from the gateplugin-LearningFramework.
- weka-wrapper A lightweight wrapper for making Weka usable from the gateplugin-LearningFramework.
- miscfastcompact [under construction] A library for various fast and compact data structures used in plugins and other software.
- license-headers A simple Python script to add or replace license headers in all files in a directory tree of source files.
- gatetools-runpipeline A useful and flexible command line script to run a GATE pipeline on documents in a directory.
- gateplugin-dict-lemmatizer A GATE plugin which adds lemmata to tokens based on their Universal Dependencies POS tags. This currently works for English, German, French, Italian, Dutch and Spanish, though for some languages only Wiktionary-based lookups are used while others use a morphological transducer. This is based on the code by Ahmet Aker: http://staffwww.dcs.shef.ac.uk/people/A.Aker/activityNLPProjects.html
Projects where I have collaborated with others: