Professor Rob Gaizauskas
BA, MA, DPhil
School of Computer Science
Professor of Natural Language Processing
Co-Director of the CDT in Speech and Language Technologies
Member of the Natural Language Processing (NLP) research group

Full contact details
School of Computer Science
Regent Court (DCS)
211 Portobello
S1 4DP
- Profile
Rob Gaizauskas studied Mathematics and Physics at the University of Toronto from 1972-74, then moved to Carleton University in Ottawa, where he received an Honours BA in Philosophy in 1975 and an MA in Philosophy (with distinction) in 1978. Following two years teaching Logic as a temporary lecturer at Carleton, he obtained a Diploma in Information Processing from Algonquin College, Ottawa, in 1981.
He then worked for several software companies in Ottawa, including Domus Software, Nabu Technologies, and Fulcrum Technologies (now part of Hummingbird), before moving to the U.K. in 1985 to study for a DPhil in the School of Cognitive and Computing Sciences (now the Department of Informatics) at the University of Sussex, supported by a Canadian SSHRC Doctoral Fellowship and a British Council ORS award.
He received an MA in Cognitive Studies in 1986 and was awarded his DPhil in 1992. In 1989 he lectured in Artificial Intelligence at Sussex, and from 1990 to 1993 he worked there as a Research Associate.
In 1993 he became a Lecturer in the Natural Language Processing Group of the Department of Computer Science at the University of Sheffield. He was promoted to Reader in Computer Science in the same group in 1999 and to Professor in 2002.
- Research interests
Rob's research interests are in natural language processing, specifically in information extraction from natural language texts, software architectures for natural language processing and evaluation of language processing systems.
- Publications
- The Language Of Time. Oxford University Press, Oxford.
Journal articles
- Obituary: Yorick Wilks. Computational Linguistics, 49(3), 767-772.
- Visual and Semantic Knowledge Transfer for Large Scale Semi-supervised Object Detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(12), 3045-3058.
- Extracting bilingual terms from the Web. Terminology, 21(2), 205-236.
- Exploring relation types for literature-based discovery. Journal of the American Medical Informatics Association, 22(5), 987-992.
- Generating descriptive multi‐document summaries of geo‐located entities using entity type models. Journal of the Association for Information Science and Technology, 66(4), 721-738.
- Named entity disambiguation using HMMs. Proceedings - 2013 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology - Workshops, WI-IATW 2013, 3, 159-162.
- Information retrieval for temporal bounding. ACM International Conference Proceeding Series, 129-130.
- Investigating summarization techniques for geo-tagged image indexing. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 7224 LNCS, 472-475.
- STARLET: Multi-document summarization of service and product reviews with balanced rating distributions. Proceedings - IEEE International Conference on Data Mining, ICDM, 67-74.
- Understanding the types of information humans associate with geographic objects. International Conference on Information and Knowledge Management, Proceedings, 1929-1932.
- A collection of comparable corpora for under-resourced languages. Frontiers in Artificial Intelligence and Applications, 219, 161-168.
- Disambiguation of biomedical text using diverse sources of information. BMC Bioinformatics, 9 Suppl 11, S7.
- Mining clinical relationships from patient narratives. BMC Bioinformatics, 9 Suppl 11, S3.
- A web service for biomedical term look-up. Comp Funct Genomics, 6(1-2), 86-93.
- Integrating text mining into distributed bioinformatics workflows: A Web services implementation. Proceedings - 2004 IEEE International Conference on Services Computing, SCC 2004, 145-152.
- Two applications of information extraction to biological science journal articles: enzyme interactions and protein structures. Pac Symp Biocomput, 505-516.
- Bioinformatics applications of information extraction from scientific journal articles. Journal of Information Science, 26(2), 75-85.
- Using a semantic network for information extraction. Natural Language Engineering, 3(2), 147-169.
- POETIC: A system for gathering and disseminating traffic information. Natural Language Engineering, 1(4), 363-388.
- Collecting Comparable Corpora. In Skadiņa I, Gaizauskas R, Babych B, Ljubešić N, Tufiş D & Vasiļjevs A (Eds.), Using Comparable Corpora for Under-Resourced Areas of Machine Translation (pp. 55-87). Springer
- Introduction. In Skadiņa I, Gaizauskas R, Babych B, Ljubešić N, Tufiş D & Vasiļjevs A (Eds.), Using Comparable Corpora for Under-Resourced Areas of Machine Translation (pp. 1-11). Springer
- Cross-Language Comparability and Its Applications for MT. In Skadiņa I, Gaizauskas R, Babych B, Ljubešić N, Tufiş D & Vasiļjevs A (Eds.), Using Comparable Corpora for Under-Resourced Areas of Machine Translation (pp. 13-53). Springer
- Appendices, Using Comparable Corpora for Under-Resourced Areas of Machine Translation (pp. 291-323). Springer International Publishing
- Mapping and Aligning Units from Comparable Corpora, Using Comparable Corpora for Under-Resourced Areas of Machine Translation (pp. 141-188). Springer International Publishing
- Building and Using Comparable Corpora. Springer Berlin Heidelberg
- Summarizing Opinion-Related Information for Mobile Devices, Mobile Speech and Advanced Natural Language Solutions (pp. 289-317). Springer New York
- Mobile Speech and Advanced Natural Language Solutions. Springer New York
- Methods for Collection and Evaluation of Comparable Documents, Building and Using Comparable Corpora (pp. 93-112). Springer Berlin Heidelberg
- Multi-Document Summarization Techniques for Generating Image Descriptions: A Comparative Analysis. In Poibeau T, Saggion H, Piskorski J & Yangarber R (Eds.), Theory and Applications of Natural Language Processing (pp. 299-320). Springer
- 59. Corpora and text re-use, Corpus Linguistics. Mouton de Gruyter
- The Specification Language TimeML, The Language Of Time (pp. 545-558). Oxford University Press, Oxford
- Quantitative evaluation of coreference algorithms in an information extraction system, Corpus-based and Computational Approaches to Discourse Anaphora (pp. 145-145). John Benjamins Publishing Company
- LaSIE Jumps the GATE, Text, Speech and Language Technology (pp. 197-214). Springer Netherlands
- Information Access and Natural Language Processing: A Stimulating Dialogue, Text, Speech and Language Technology (pp. 85-105). Springer Netherlands
- Mice from a Mountain: Reflections on Current Issues in Evaluation of Written Language Technology, Charting a New Course: Natural Language Processing and Information Retrieval (pp. 195-238). Springer-Verlag
Conference proceedings papers
- Parsing Graphical Summaries from Argumentative Dialogues. Frontiers in Artificial Intelligence and Applications, Vol. 388 (pp 37-48)
- A Pilot Study on the Collection and Computational Analysis of Linguistic Differences Amongst Men and Women in a Kuwaiti Arabic WhatsApp Dataset. Proceedings of the Seventh Arabic Natural Language Processing Workshop (WANLP) (pp 372-380), December 2022.
- The SENSEI Overview of Newspaper Readers’ Comments (pp 758-761)
- Using Section Headings to Compute Cross-Lingual Similarity of Wikipedia Articles (pp 633-639)
- Large Scale Semi-supervised Object Detection using Visual and Semantic Knowledge Transfer. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp 2119-2128). Las Vegas, Nevada, 26 June 2016 - 1 July 2016.
- Automatic Label Generation for News Comment Clusters. Proceedings of the 9th International Natural Language Generation Conference (pp 61-69), 5 September 2016 - 8 September 2016.
- The SENSEI Annotated Corpus: Human Summaries of Reader Comment Conversations in On-line News. Proceedings of the 17th Annual Meeting of the Special Interest Group on Discourse and Dialogue (pp 42-52), 13 September 2016 - 15 September 2016.
- General Overview of ImageCLEF at the CLEF 2016 Labs. Experimental IR Meets Multilinguality, Multimodality, and Interaction, Vol. 9822. Évora, Portugal.
- The SENSEI Project: Making Sense of Human Conversations (pp 10-33)
- A Graph-Based Approach to Topic Clustering for Online Comments to News. Advances in Information Retrieval (pp 15-29), 20 March 2016 - 23 March 2016.
- Summarizing Multi-Party Argumentative Conversations in Reader Comment on News. Proceedings of the Third Workshop on Argument Mining (ArgMining2016) (pp 12-20), August 2016.
- Combining Geometric, Textual and Visual Features for Predicting Prepositions in Image Descriptions. Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (pp 214-220). Lisbon, Portugal, 17 September 2015 - 21 September 2015.
- Defining Visually Descriptive Language. Proceedings of the Fourth Workshop on Vision and Language, September 2015.
- Generating Image Descriptions with Gold Standard Visual Inputs: Motivation, Evaluation and Baselines. Proceedings of the 15th European Workshop on Natural Language Generation (ENLG), September 2015.
- Comment-to-Article Linking in the Online News Domain. Proceedings of the 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue, September 2015.
- A Hybrid Approach to Multi-document Summarization of Opinions in Reviews. Proceedings of the 8th International Natural Language Generation Conference (INLG), June 2014.
- Assigning Terms to Domains by Document Classification. Proceedings of the 4th International Workshop on Computational Terminology (Computerm), August 2014.
- Graph Ranking for Collective Named Entity Disambiguation. Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), June 2014.
- Disambiguation of biomedical abbreviations. Proceedings of the Workshop on BioNLP - BioNLP '09, 4 June 2009 - 5 June 2009.
- Knowledge sources for word sense disambiguation of biomedical text. Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing - BioNLP '08, 19 June 2008.
- Evaluation of automatically reformulated questions in question series. Coling 2008: Proceedings of the 2nd workshop on Information Retrieval for Question Answering - IRQA '08, 24 August 2008.
- Generating image captions using topic focused multi-document summarization. Proceedings of the Workshop on Multi-source Multilingual Information Extraction and Summarization - MMIES '08, 23 August 2008.
- Evaluating automatically generated user-focused multi-document summaries for geo-referenced images. Proceedings of the Workshop on Multi-source Multilingual Information Extraction and Summarization - MMIES '08, 23 August 2008.
- Extracting clinical relationships from patient narratives. Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing - BioNLP '08, 19 June 2008.
- SemEval-2007 task 15. Proceedings of the 4th International Workshop on Semantic Evaluations - SemEval '07, 23 June 2007 - 24 June 2007.
- USFD. Proceedings of the 4th International Workshop on Semantic Evaluations - SemEval '07, 23 June 2007 - 24 June 2007.
- Task-Oriented Extraction of Temporal Information: The Case of Clinical Narratives. TIME (pp 188-195)
- SUPPLE. Proceedings of the Ninth International Workshop on Parsing Technology - Parsing '05, 9 October 2005 - 10 October 2005.
- Aligning words in English-Hindi parallel corpora. Proceedings of the ACL Workshop on Building and Using Parallel Texts - ParaText '05, 29 June 2005 - 30 June 2005.
- A hybrid approach to align sentences and words in English-Hindi parallel corpora. Proceedings of the ACL Workshop on Building and Using Parallel Texts - ParaText '05, 29 June 2005 - 30 June 2005.
- Integrating Text Mining into Distributed Bioinformatics Workflows: A Web Services Implementation. IEEE SCC (pp 145-152)
- On the Use of Agents in BioInformatics Grid. CCGRID (pp 653-660)
- A pilot study on annotating temporal relations in text. Proceedings of the workshop on Temporal and spatial information processing, 7 July 2001.
- Intelligent access to text. Proceedings of the first international conference on Human language technology research - HLT '01, 18 March 2001 - 21 March 2001.
- Using HLT for acquiring, retrieving and publishing knowledge in AKT. Proceedings of the workshop on Human Language Technology and Knowledge Management, 6 July 2001 - 7 July 2001.
- CM-Builder: An Automated NL-Based CASE Tool. ASE (pp 45-54)
- Using coreference chains for text summarization. Proceedings of the Workshop on Coreference and its Applications - CorefApp '99, 22 June 1999.
- Event coreference for information extraction. Proceedings of a Workshop on Operational Factors in Practical, Robust Anaphora Resolution for Unrestricted Texts - ANARESOLUTION '97, 11 July 1997.
- Visual Execution and Data Visualization in Natural Language Processing. VL (pp 342-347)
- NEC Corporation and University of Sheffield. Proceedings of a workshop held at Vienna, Virginia, May 6-8, 1996.
- TIPSTER-compatible projects at Sheffield. Proceedings of a workshop held at Vienna, Virginia, May 6-8, 1996.
- Grants
Current grants
- UKRI Centre for Doctoral Training in Speech and Language Technologies and their Applications, EPSRC, 04/2019 - 09/2027, £5,508,850, as Co-PI
Previous grants
- A Multimodal Speech and Graphical Interface for Hands-free Data Capture and Querying in MRO: Connecting Workers to Enterprise Information Systems, EPSRC & Research England, 07/2019 - 03/2021, £85,009, as PI
- Investigating Spoken Dialogue to Support Manufacturing Processes, EPSRC, 03/2017 - 06/2018, £63,502, as PI
- SENSEI: Building the business case, The University of Sheffield, 12/2016 - 03/2017, £8,541, as PI
- Healtex: UK Healthcare Text Analytics Research Network, EPSRC, 05/2016 - 02/2020, £340,240, as Co-PI
- SENSEI: Making Sense of Human-Human Conversation, EC FP7, 11/2013 - 10/2016, £459,034, as PI
- VisualSense: Tagging visual data with semantic descriptions, EPSRC, 01/2013 - 06/2016, £310,677, as PI
- Language Processing for Literature Based Discovery in Medicine, EPSRC, 06/2012 - 05/2015, £293,127, as Co-PI
- TAAS: Terminology As A Service, EC FP7, 06/2012 - 05/2014, £268,032, as PI
- ACCURAT: Analysis and evaluation of Comparable Corpora for Under Resourced Areas of machine Translation, EC FP7, 01/2010 - 06/2012, £353,265, as PI
- Lexical Disambiguation for the Biomedical Domain, EPSRC, 02/2007 - 01/2010, £239,920, as Co-PI
- Cronopath: Timeline and named entity extraction for hyperlink corpora, EPSRC, 07/2005 - 12/2007, £294,632, as Co-PI
- Real-time Text Mining for the Biomedical Literature: a collaboration between DiscoveryNet & myGrid, EPSRC, 03/2005 - 02/2006, £56,588, as PI
- CLEF-Services, MRC, 01/2005 - 06/2008, £430,221, as PI
- VIKEF: Virtual Information and Knowledge Environment Framework, EC FP6, 04/2004 - 03/2007, £200,020, as PI
- Electronic cub-reporter: automatically gathering and collating background information from digital text, EPSRC, 01/2003 - 06/2006, £307,973, as PI
- MYGRID: Directly Supporting the E-Scientist, MRC, 10/2001 - 06/2005, £320,206, as PI
- CLEF: Clinical E-Science Framework, MRC, 10/2002 - 01/2006, £280,725, as PI
- CLARITY: Cross language information retrieval and organisation of text and audio documents, EC FP6, 02/2001 - 01/2004, £469,576, as PI
- Emille: Enabling minority language engineering, EPSRC, 06/2000 - 09/2003, £35,859, as PI
- Professional activities and memberships
Head of the Natural Language Processing (NLP) research group