Dr Mark Stevenson
School of Computer Science
Senior Lecturer
Member of the Natural Language Processing research group
+44 114 222 1921
Full contact details
School of Computer Science
Regent Court (DCS)
211 Portobello
Sheffield
S1 4DP
- Profile
Mark Stevenson is a Senior Lecturer in Computer Science. He is a member of the Natural Language Processing group, which he joined in 1995. His PhD, on Word Sense Disambiguation, was published as a monograph.
He has been Principal Investigator of projects funded by a range of sources including the EU, EPSRC and Google. He was an EPSRC Advanced Research Fellow (2006-2011) and co-ordinator of the EU-funded project PATHS.
He has also worked in a range of commercial and academic organisations including Reuters Ltd (where he was involved in the production and dissemination of the widely used Reuters Corpus), Adastral Park (British Telecom’s research lab) and the Center for the Study of Language and Information, Stanford University.
- Research interests
Mark Stevenson’s research focusses on Natural Language Processing and Information Retrieval. Topics he has worked on include word sense disambiguation, Information Extraction, plagiarism/reuse detection, lexicon adaptation, cross-lingual information retrieval and exploratory search.
His research includes applications of these technologies to a range of areas including biomedical journal articles (interpretation of documents, extraction of information from them and data mining information from corpora), cultural heritage (automatic organisation of corpora, exploratory search interfaces) and software testing (generation of realistic test suites).
- Publications
Books
- Words and Intelligence I: Selected Papers by Yorick Wilks. Springer.
- Words and Intelligence II: Essays in Honour of Yorick Wilks. Springer.
- Word Sense Disambiguation: The Case for Combinations of Knowledge Sources. Stanford, CA.: CSLI Publications.
Journal articles
- Understanding Linearity of Cross-Lingual Word Embedding Mappings. Transactions on Machine Learning Research.
- Investigating the Feasibility of Deep Learning Methods for Urdu Word Sense Disambiguation. ACM Transactions on Asian and Low-Resource Language Information Processing, 21(2), 1-16.
- A machine-learning assisted review of the use of habit formation in medication adherence interventions for long-term conditions. Health Psychology Review.
- Paraphrase type identification for plagiarism detection using contexts and word embeddings. International Journal of Educational Technology in Higher Education, 18.
- SciBabel: a system for crowd-sourced validation of automatic translations of scientific texts. Genomics and Informatics, 18(2).
- A Sense Annotated Corpus for All-Words Urdu Word Sense Disambiguation. Asian and Low-Resource Language Information Processing, 18(4).
- A Word Sense Disambiguation Corpus for Urdu. Language Resources and Evaluation.
- Co-occurrence graphs for word sense disambiguation in the biomedical domain. Artificial Intelligence in Medicine, 87, 9-19.
- Quantifying and filtering knowledge generated by literature based discovery. BMC Bioinformatics, (Suppl 7):249, 59-67.
- Evaluating topic representations for exploring document collections. Journal of the Association for Information Science and Technology, 68(1), 154-167.
- Why are these similar? Investigating item similarity types in a large digital library. Journal of the Association for Information Science and Technology, 67(7), 1624-1638.
- A corpus of potentially contradictory research claims from cardiovascular research abstracts. Journal of Biomedical Semantics, 7.
- An IR-Based Approach Utilizing Query Expansion for Plagiarism Detection in MEDLINE. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 14(4), 796-804.
- Exploring relation types for literature-based discovery. Journal of the American Medical Informatics Association, 22(5), 987-992.
- Automatic generation of valid and invalid test data for string validation routines using web searches and regular expressions. Science of Computer Programming, 97(4), 405-425.
- Cognitive styles within an exploratory search system for digital libraries. Journal of Documentation, 70, 970-996.
- Evaluating hierarchical organisation structures for exploring digital libraries. Information Retrieval, 17(4), 351-379.
- Determining the Difficulty of Word Sense Disambiguation. Journal of Biomedical Informatics.
- Comparing Medline citations using modified N-grams. Journal of the American Medical Informatics Association.
- Computing similarity between items in a digital library of cultural heritage. Journal of Computing and Cultural Heritage, 5(4).
- Towards semantic literature based discovery. AAAI Fall Symposium - Technical Report, FS-12-05, 86-87.
- Retrieving candidate plagiarised documents using query expansion. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 7224 LNCS, 207-218.
- Exploiting domain information for Word Sense Disambiguation of medical documents. Journal of the American Medical Informatics Association, 19(2), 235-240.
- Developing a corpus of plagiarised short answers. Language Resources and Evaluation, 45(1), 5-24.
- Extracting relations within and across sentences. International Conference Recent Advances in Natural Language Processing, RANLP, 25-32.
- Disambiguation of medline abstracts using topic models. International Conference on Information and Knowledge Management, Proceedings, 59-62.
- Resolving ambiguity in biomedical text to improve summarization. Information Processing and Management.
- Assessing the contribution of shallow and deep knowledge sources for word sense disambiguation. Language Resources and Evaluation, 44(4), 295-313.
- Graph-based word sense disambiguation of biomedical documents. Bioinformatics, 26(22), 2889-2896.
- Disambiguation in the biomedical domain: the role of ambiguity type. Journal of Biomedical Informatics, 43(6), 972-981.
- Disambiguation of ambiguous biomedical terms using examples generated from the UMLS Metathesaurus. Journal of Biomedical Informatics, 43(5), 762-773.
- The effect of ambiguity on the automated acquisition of WSD examples. NAACL HLT 2010 - Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Proceedings of the Main Conference, 353-356.
- The Role of Natural Language Processing in Information Retrieval: Searching for Meaning and Structure, 215-231.
- Dependency pattern models for information extraction. Research on Language and Computation, 7(1), 13-39.
- Assessing the contribution of shallow and deep knowledge sources for word sense disambiguation. Language Resources and Evaluation, 1-19.
- Disambiguation of biomedical text using diverse sources of information. BMC Bioinformatics, 9 Suppl 11, S7.
- Fact distribution in information extraction. Language Resources and Evaluation, 40(2), 183-201.
- Handbook for Language Engineers by Farghali, A. (Ed.), (2003) University of Chicago Press, 320pp. Journal of Natural Language Engineering, 11(1), 125-128.
- Introduction to the special issue on word sense disambiguation. Computer Speech & Language, 18(3), 201-207.
- Unsupervised induction of IE domain knowledge using an ontology. AAAI Workshop - Technical Report, WS-04-01, 80-85.
- Parallel Text Processing: Alignment and Use of Translation Corpora by Véronis, J. (Ed.), (2000), 393pp. Machine Translation Review, 12, 75-76.
- The interaction of knowledge sources in word sense disambiguation. Computational Linguistics, 27(3), 321-349.
- The Grammar of Sense: Using part-of-speech tags as a first step in semantic disambiguation. Natural Language Engineering, 4(3), 135-144.
- The Grammar of Sense: Is word-sense tagging much more than part-of-speech tagging? CoRR, cmp-lg/9607028.
- Computer-assisted screening in systematic evidence synthesis requires robust and well-evaluated stopping criteria. Systematic Reviews.
- Stopping Methods for Technology Assisted Reviews based on Point Processes. ACM Transactions on Information Systems.
- C3-IoC: A career guidance system for assessing student skills using machine learning and network visualisation. International Journal of Artificial Intelligence in Education.
- Refining Boolean Queries to Identify Relevant Studies for Systematic Review Updates. Journal of the American Medical Informatics Association.
- Cross-Lingual Word Embedding Refinement by $\ell_{1}$ Norm Optimisation.
Chapters
- Supporting Exploration and Use of Digital Cultural Heritage Materials: the PATHS Perspective In Ruthven I & Chowdhury GG (Ed.), Cultural Heritage Information Access and Management (pp. 197-220). Facet Publishing
- Word Sense Disambiguation In Mitkov R (Ed.), Oxford Handbook of Computational Linguistics Oxford University Press
- Natural Language Processing and Information Retrieval In Davies J, Göker A & Graham M (Ed.), Information Retrieval: Searching in the 21st Century (pp. 215-232). Wiley
- Sense Tagging In Ludeling A, Kyto M & McEnery T (Ed.), Handbook of Corpus Linguistics Mouton de Gruyter
- Knowledge Sources for WSD, Text, Speech and Language Technology (pp. 217-251). Springer Netherlands
- Words and Intelligence II Essays in Honor of Yorick Wilks Introduction, Words and Intelligence II: Essays in Honor of Yorick Wilks (pp. XI-XIV).
- Knowledge Sources for WSD, Word Sense Disambiguation: Algorithms and Applications (pp. 217-251).
- Knowledge Sources for Word Sense Disambiguation In Agirre E & Edmonds P (Ed.), Word Sense Disambiguation: Algorithms, Applications and Trends Kluwer
- Word Sense Disambiguation In Mitkov R (Ed.), Oxford Handbook of Computational Linguistics (pp. 249-265). Oxford University Press
- Combining Independent Knowledge Sources for Word Sense Disambiguation In Nicolov N & Mitkov R (Ed.), Recent Advances in Natural Language Processing (pp. 74-86). John Benjamins Publishers
- Large Vocabulary Word Sense Disambiguation In Ravin Y & Leacock C (Ed.), Polysemy: Theoretical and Computational Contributions (pp. 161-177). Oxford: Oxford University Press.
Conference proceedings papers
- RLStop: A Reinforcement Learning Stopping Method for TAR. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval
- Cross-Lingual Word Embedding Refinement by ℓ1 Norm Optimisation. Proceedings of the 2021 Annual Conference of the North American Chapter of the Association for Computational Linguistics
- Identifying Automatically Generated Headlines using Transformers. Proceedings of the Fourth Workshop on NLP for Internet Freedom: Censorship, Disinformation, and Propaganda
- UserReg : a simple but strong model for rating prediction. ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing. Toronto, ON, Canada, 6 June 2021 - 11 June 2021.
- Highly Efficient Knowledge Graph Embedding Learning with Orthogonal Procrustes Analysis. NAACL-HLT 2021 - 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference (pp 2364-2375)
- Robustness and Reliability of Gender Bias Assessment in Word Embeddings: The Role of Base Pairs. Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing
- DTMBIO 2020. Proceedings of the 29th ACM International Conference on Information & Knowledge Management
- ParaPat: The multi-million sentences parallel corpus of patents abstracts. LREC 2020 - 12th International Conference on Language Resources and Evaluation, Conference Proceedings (pp 3769-3774)
- Automatic Generation of Topic Labels. SIGIR (pp 1965-1968)
- The University of Sheffield at CheckThat! 2020: Claim Identification and Verification on Twitter. CEUR Workshop Proceedings, Vol. 2696
- Automatic Generation of Topic Labels. Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval
- Modelling stopping criteria for search results using poisson processes. EMNLP-IJCNLP 2019 - 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing, Proceedings of the Conference (pp 3484-3489)
- Improving ranking for systematic reviews using query adaptation. CLEF 2019 Proceedings: Experimental IR Meets Multilinguality, Multimodality, and Interaction (pp 141-148). Lugano, Switzerland, 9 September 2019 - 12 September 2019.
- Ranking studies for systematic reviews using query adaptation: University of Sheffield's approach to CLEF eHealth 2019 task 2 working notes for CLEF 2019. Working Notes of CLEF 2019 - Conference and Labs of the Evaluation Forum, Vol. 2380. Lugano, Switzerland, 9 September 2019 - 12 September 2019.
- A Dataset of Systematic Reviews Updates. The 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval (pp 1257-1260), 21 July 2019 - 25 July 2019.
- Graph-KD: Exploring relational information for knowledge discovery. CEUR Workshop Proceedings, Vol. 2456 (pp 257-260)
- Retrieving and ranking studies for systematic reviews: University of Sheffield's approach to CLEF eHealth 2018 Task 2. CEUR Workshop Proceedings, Vol. 2125
- Topic or Style? Exploring the Most Useful Features for Authorship Attribution. COLING (pp 343-353)
- Ranking abstracts to identify relevant evidence for systematic reviews: The University of Sheffield's approach to CLEF eHealth 2017 Task 2: Working notes for CLEF 2017. CEUR Workshop Proceedings, Vol. 1866
- Using TF-IDF n-gram and word embedding cluster ensembles for author profiling: Notebook for PAN at CLEF 2017. CEUR Workshop Proceedings, Vol. 1866
- Hyperlocal home location identification of Twitter profiles. HT 2017 - Proceedings of the 28th ACM Conference on Hypertext and Social Media (pp 45-54)
- Continuous N-gram Representations for Authorship Attribution. Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers (pp 267-273). Valencia, Spain, 3 April 2017 - 7 April 2017.
- Plagiarism Detection in Texts Obfuscated with Homoglyphs (pp 669-675)
- Identifying Potential Early Biomarkers Of Acute Myocardial Infarction In The Biomedical Literature: A Comparison Of Text Mining And Manual Sifting Techniques. Value in Health, Vol. 19(7) (pp A367-A367)
- User profiling with geo-located posts and demographic data. Proceedings of the First Workshop on NLP and Computational Social Science, November 2016 - November 2016.
- Exploring Word embeddings and character N-Grams for author clustering. CEUR Workshop Proceedings, Vol. 1609 (pp 984-991)
- TM 2015 -- Topic Models. Proceedings of the 24th ACM International on Conference on Information and Knowledge Management - CIKM '15, 18 October 2015 - 23 October 2015.
- Improving distant supervision using inference learning. Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), July 2015 - July 2015.
- A Hybrid Distributional and Knowledge-based Model of Lexical Semantics. Proceedings of the Fourth Joint Conference on Lexical and Computational Semantics, June 2015 - June 2015.
- A machine learning-based intrinsic method for cross-topic and cross-genre authorship verification notebook for PAN at CLEF 2015. CEUR Workshop Proceedings, Vol. 1391
- Topic models and n-gram language models for author profiling. CEUR Workshop Proceedings, Vol. 1391
- The short stories corpus. CEUR Workshop Proceedings, Vol. 1391
- Automatic identification of potentially contradictory claims to support systematic reviews. 2015 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 9 November 2015 - 12 November 2015.
- The Short Stories Corpus: Notebook for PAN at CLEF 2015. CLEF (Working Notes), Vol. 1391
- Topic Models and n-gram Language Models for Author Profiling - Notebook for PAN at CLEF 2015. CLEF (Working Notes), Vol. 1391
- Making the most of limited training data using distant supervision. ACL-IJCNLP 2015 - BioNLP 2015: Workshop on Biomedical Natural Language Processing, Proceedings of the Workshop (pp 12-20)
- Held-out versus Gold Standard: Comparison of Evaluation Strategies for Distantly Supervised Relation Extraction from Medline abstracts. EMNLP 2015 - 6th International Workshop on Health Text Mining and Information Analysis, LOUHI 2015 - Proceedings of the Workshop (pp 97-102)
- Automatic Detection of Answers to Research Questions from Medline Abstracts. ACL-IJCNLP 2015 - BioNLP 2015: Workshop on Biomedical Natural Language Processing, Proceedings of the Workshop (pp 141-146)
- Labelling Topics using Unsupervised Graph-based Methods. Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (ACL 2014), Vol. 2 (pp 631-636)
- Measuring the Similarity between Automatically Generated Topics. Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics (pp 22-27)
- PATHS in context: User characteristics and the construction of cultural heritage narratives. iConference Proceedings 2014
- Hashing and merging heuristics for text reuse detection: Notebook for PAN at CLEF-2014. CEUR Workshop Proceedings, Vol. 1180 (pp 939-946)
- Hashing and Merging Heuristics for Text Reuse Detection. CLEF (Working Notes), Vol. 1180 (pp 939-946)
- Representing topics labels for exploring digital libraries. IEEE/ACM Joint Conference on Digital Libraries, 8 September 2014 - 12 September 2014.
- Supporting Information Access and Sensemaking in Digital Cultural Heritage Environments (pp 143-154)
- Self-supervised Relation Extraction Using UMLS (pp 116-127)
- Applying UMLS for Distantly Supervised Relation Detection. Proceedings of the Fifth International Workshop on Health Text Mining and Information Analysis (pp 80-84)
- PATHS: A System for Accessing Cultural Heritage Collections. ACL (Conference System Demonstrations) (pp 151-156)
- Evaluating topic coherence using distributional semantics. Proceedings of the 10th International Conference on Computational Semantics, IWCS 2013 - Long Papers
- Distinguishing Common and Proper Nouns. *SEM 2013 - 2nd Joint Conference on Lexical and Computational Semantics, Vol. 1 (pp 80-84)
- UBC UOS-TYPED: Regression for Typed-similarity. *SEM 2013 - 2nd Joint Conference on Lexical and Computational Semantics, Vol. 1 (pp 132-137)
- Unsupervised domain tuning to improve word sense disambiguation. NAACL HLT 2013 - 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Main Conference (pp 680-684)
- Representing topics using images. NAACL HLT 2013 - 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Main Conference (pp 158-167)
- Information seeking in digital cultural heritage with PATHS. SIGIR 2013 - Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval (pp 1105-1106)
- Evolving readable string test inputs using a natural language model to reduce human oracle cost. Proceedings - IEEE 6th International Conference on Software Testing, Verification and Validation, ICST 2013 (pp 352-361)
- Unsupervised Domain Tuning to Improve Word Sense Disambiguation. Proceedings of the 2nd Workshop on Computational Linguistics for Literature, CLfL 2013 at the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2013 (pp 680-684)
- UBC UOS-TYPED: Regression for Typed-similarity. *SEM 2013 - 2nd Joint Conference on Lexical and Computational Semantics, Proceedings of the Main Conference and the Shared Task: Semantic Textual Similarity (pp 132-137)
- Representing Topics Using Images. Proceedings of the 2nd Workshop on Computational Linguistics for Literature, CLfL 2013 at the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2013 (pp 158-167)
- Distinguishing Common and Proper Nouns. *SEM 2013 - 2nd Joint Conference on Lexical and Computational Semantics, Proceedings of the Main Conference and the Shared Task: Semantic Textual Similarity (pp 80-84)
- DALE: A Word Sense Disambiguation System for Biomedical Documents Trained using Automatically Labeled Examples. 2013 Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2013 - Demonstration Session (pp 1-4)
- Identification of Genia Events using Multiple Classifiers. Proceedings of the Annual Meeting of the Association for Computational Linguistics, Vol. 2013-October (pp 125-129)
- Generating Paths through Cultural Heritage Collections. Proceedings of the Annual Meeting of the Association for Computational Linguistics (pp 1-10)
- PAN@FIRE: Overview of the Cross-Language !ndian Text Re-Use Detection Competition. Forum for Information Retrieval Evaluation (FIRE) Working Notes. Bombay, India
- PATHS - Exploring Digital Cultural Heritage Spaces. Theory and Practice of Digital Libraries 2012. Cyprus
- Evaluating the use of clustering for automatically organising digital library collections. Theory and Practice of Digital Libraries 2012. Cyprus
- Automated Discovery of Valid Test Strings using Dynamic Regular Expressions Collation and Tailored Web Searches. Proceedings of the 12th International Conference on Quality Software (QSIC 2012). Xi’an, China
- Scaling up WSD with Automatically Generated Examples. BioNLP: Proceedings of the 2012 Workshop on Biomedical Natural Language Processing (pp 231-239). Montréal, Canada
- Adapting Wikification to Cultural Heritage. Proceedings of the 6th Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities (pp 101-106). Avignon, France
- Computing Similarity between Cultural Heritage Items using Multimodal Features. Proceedings of the 6th Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities (pp 85-93). Avignon, France
- The Sheffield and Basque Country universities entry to CHiC: Using random walks and similarity to access cultural heritage. CEUR Workshop Proceedings, Vol. 1178
- User-centred design to support exploration and path creation in cultural heritage collections. CEUR Workshop Proceedings, Vol. 909 (pp 75-78)
- PATHS: Personalising access to cultural heritage spaces. Proceedings of the 2012 18th International Conference on Virtual Systems and Multimedia, VSMM 2012: Virtual Systems in the Information Society (pp 469-474)
- Automated discovery of valid test strings from the web using dynamic regular expressions collation and natural language processing. Proceedings - International Conference on Quality Software (pp 79-88)
- Search-based test input generation for string data types using the results of web queries. Proceedings - IEEE 5th International Conference on Software Testing, Verification and Validation, ICST 2012 (pp 141-150)
- Personalising access to cultural heritage collections using pathways. PATCH 2011 : 3rd International Workshop on Personalized Access To Cultural Heritage (pp 12-19)
- External Plagiarism Detection using Information Retrieval and Sequence Alignment - Notebook for PAN at CLEF 2011. CLEF (Notebook Papers/Labs/Workshop), Vol. 1177
- The Effect of Ambiguity on the Automated Acquisition of WSD Examples. Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics (pp 353-356). Los Angeles, California
- Inter-sentential Relations in Information Extraction Corpora. Proceedings of the 7th International Conference on Language Resources and Evaluation (LREC-2010). Valletta, Malta
- Improving Summarization of Biomedical Documents using Word Sense Disambiguation. Proceedings of the 2010 Workshop on Biomedical Natural Language Processing (pp 55-63). Uppsala, Sweden
- Aligning WordNet Synsets and Wikipedia Articles. Proceedings of the AAAI-2010 Workshop on Collaboratively-built Knowledge Sources and Artificial Intelligence. Atlanta, Georgia
- IIITH: Domain Specific Word Sense Disambiguation. Proceedings of the 5th International Workshop on Semantic Evaluation. Uppsala, Sweden
- University of Sheffield: Lab Report for PAN at CLEF 2010. Proceedings of the 4th International Workshop on Uncovering Plagiarism, Authorship, and Social Software Misuse
- Reducing qualitative human oracle costs associated with automatically generated test data. 1st International Workshop on Software Test Output Validation, STOV 2010, in Conjunction with the 2010 International Conference on Software Testing and Analysis, ISSTA 2010 (pp 1-4)
- Disambiguation of Biomedical Abbreviations. Proceedings of the BioNLP 2009 Workshop (pp 71-79). Boulder, Colorado
- A Corpus of Biomedical Abbreviations. Proceedings of Corpus Linguistics 2009. Liverpool, UK
- Designing a Corpus of Plagiarised Academic Texts. Proceedings of Corpus Linguistics 2009. Liverpool, UK
- Knowledge sources for word sense disambiguation of biomedical text. Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing - BioNLP '08, 19 June 2008 - 19 June 2008.
- Knowledge Sources for Word Sense Disambiguation of Biomedical Text. Proceedings of the workshop “BioNLP 2008” held in conjunction with the 46th Annual Meeting of the Association for Computational Linguistics (pp 80-87). Columbus, OH.
- Acquiring Sense Tagged Examples using Relevance Feedback. Proceedings of the 22nd International Conference on Computational Linguistics (COLING-08). Manchester, UK
- A Semantic Approach to Paraphrase Identification. Proceedings of the 11th Annual Research Colloquium of the UK Special-interest group for Computational Linguistics. Oxford, England
- PAN@FIRE. Proceedings of the 5th 2013 Forum on Information Retrieval Evaluation - FIRE '13, 4 December 2013 - 6 December 2013.
- A Semi-supervised Approach to Learning Relevant Protein-Protein Interaction Articles. Proceedings of BioCreative II workshop (pp 175-177). Madrid, Spain
- A Task-based Comparison of Information Extraction Pattern Models. Proceedings of the Workshop “Deep Linguistic Processing” held in conjunction with the 45th Annual Meeting of the Association for Computational Linguistics (pp 81-88)
- Learning Expressive Models for Word Sense Disambiguation. 45th Annual Meeting of the Association of Computational Linguistics (pp 41-48)
- Improving Semi-supervised Acquisition of Relation Extraction Patterns. Proceedings of the Workshop “Information Extraction Beyond The Document” held in conjunction with 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics (pp 29-35). Sydney, Australia
- Comparing Information Extraction Pattern Models. Proceedings of the Workshop “Information Extraction Beyond The Document” held in conjunction with 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics (pp 12-19). Sydney, Australia
- Multilingual versus Monolingual WSD. Proceedings of the workshop "Making Sense of Sense" held in conjunction with the Eleventh Conference of the European Chapter of the Association for Computational Linguistics (pp 33-40). Trento, Italy
- Translation Context Sensitive WSD. Proceedings of the European Association for Machine Translation 11th Annual Conference (EAMT-2006) (pp 227-232). Oslo, Norway
- The need for application-dependent WSD strategies: A case study in MT. Computational Processing of the Portuguese Language, Proceedings, Vol. 3960 (pp 233-237)
- The University of Sheffield's TREC 2006 Q&A Experiments. TREC, Vol. 500-272
- Mining Rules for Word Sense Disambiguation. III TIL - Workshop em Tecnologia da Informacao e da Linguagem Humana, XXV Congresso da SBC. Sao Leopoldo, Brasil
- An Automatic Approach to Creating a Sense Tagged Corpus for Word Sense Disambiguation in Machine Translation. Second Workshop Organised by the MEANING project (MEANING-2005) (pp 31-36). Trento, Italy
- Automatically Acquiring a Linguistically Motivated Genic Interaction Extraction System. Proceedings of the workshop “Learning Language in Logic (LLL 05)” held in conjunction with the 22nd International Conference on Machine Learning (ICML 05). Bonn, Germany
- Exploiting Parallel Texts to Produce a Multilingual Sense Tagged Corpus for Word Sense Disambiguation. Recent Advances in Natural Language Processing (pp 525-531)
- A Semantic Approach to IE Pattern Induction. Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (pp 379-386). Ann Arbor, MI
- Learning Information Extraction Patterns Using WordNet. GWC 2006: Third International WordNet Conference, Proceedings (pp 95-102)
- An Unsupervised WordNet-based Algorithm for Relation Extraction. Proceedings of the “Beyond Named Entity” workshop at the Fourth International Conference on Language Resources and Evaluation (LREC-04) (pp 37-42). Lisbon, Portugal
- EuroWordNet as a Resource for Cross-language Information Retrieval. Proceedings of the Fourth International Conference on Language Resources and Evaluation (pp 777-780). Lisbon, Portugal
- Information Extraction from Single and Multiple Sentences. Proceedings of the Twentieth International Conference on Computational Linguistics (COLING-04) (pp 875-881). Geneva, Switzerland
- Cross-language information retrieval using EuroWordNet and word sense disambiguation. Advances in Information Retrieval, Proceedings, Vol. 2997 (pp 327-337)
- Requirements for Information Extraction for Knowledge Management. Knowledge Management and Semantic Annotation Workshop at Second International Semantic Web Conference (ISWC-2003) (pp 89-94). Sanibel, FL.
- Information Extraction as a Semantic Web Technology: Requirements and Promises. Proceedings of the 14th European Conference on Machine Learning (ECML 2003) workshop “Adaptive Text Extraction and Mining”. Cavtat-Dubrovnik, Croatia
- Evaluating the Contribution of EuroWordNet and Word Sense Disambiguation to Cross-language Information Retrieval. GWC 2004: SECOND INTERNATIONAL WORDNET CONFERENCE, PROCEEDINGS (pp 97-105)
- Combining Disambiguation Techniques to Enrich an Ontology. Proceedings of the 15th European Conference on Artificial Intelligence (ECAI-02) workshop “Machine Learning and Natural Language Processing for Ontology Engineering” (pp 43-50). Lyon, France
- The Reuters Corpus – from Yesterday’s News to Tomorrow’s Language Resources. Proceedings of the Third International Conference on Language Resources and Evaluation (LREC-02) (pp 827-832). Las Palmas, Canary Islands
- Augmenting Noun Taxonomies by Combining Lexical Similarity Metrics. Proceedings of the 19th International Conference on Computational Linguistics (COLING-02) (pp 953-959). Taipei, Taiwan
- Adding Thesaural Information to Noun Taxonomies (poster). Proceedings of the Second International Conference on Recent Advances in Natural Language Processing (RANLP-01) (pp 297-299). Tzigov Chark, Bulgaria
- Improving Named Entity Recognition using Annotated Corpora. Proceedings of the Second International Conference on Language Resources and Evaluation (LREC-2000) workshop “Information Extraction meets Corpus Linguistics” (pp 26-32). Athens, Greece
- Using Corpus-derived Name Lists for Named Entity Recognition. ANLP (pp 290-295)
- Experiments on Sentence Boundary Detection. ANLP (pp 84-89)
- Baseline IE-NE Experiments using the SPRACH/LASIE System. Proceedings of the DARPA HUB-4 Workshop (pp 47-50). Herndon, Virginia
- Combining weak knowledge sources for sense disambiguation. IJCAI-99: PROCEEDINGS OF THE SIXTEENTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOLS 1 & 2 (pp 884-889)
- A corpus-based approach to deriving lexical mappings. NINTH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, PROCEEDINGS (pp 285-286)
- An Empirical Approach to Lexical Tuning. First International Conference on Language Resources and Evaluation (LREC-98) Workshop on Adapting Lexical and Corpus Resources to Sublanguages and Applications (pp 27-33). Granada, Spain
- Implementing a Sense Tagger within a General Architecture for Text Engineering. Proceedings of the New Methods in Language Processing Conference (NeMLaP-3) (pp 59-72). Sydney, Australia
- Word Sense Disambiguation using Optimised Combinations of Knowledge Sources. Proceedings of the 17th International Conference on Computational Linguistics and the 36th Annual Meeting of the Association for Computational Linguistics (COLING-ACL-98) (pp 1398-1402). Montreal, Canada
- Extracting Syntactic Relations using Heuristics. Proceedings of the European Summer School in Logic, Language and Information (ESSLLI-98) (pp 248-256). Saarbrücken, Germany
- Sense tagging and language engineering. ECAI 1998: 13TH EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE, PROCEEDINGS (pp 185-189)
- Sense Tagging: Semantic Tagging with a Lexicon. Fifth Conference on Applied Natural Language Processing (ANLP-1997) Workshop “Tagging Text with Lexical Semantics: Why, What and How?” (pp 47-51). Washington, D.C.
- Combining Independent Knowledge Sources for Word Sense Disambiguation. Proceedings of Recent Advances in Natural Language Processing (RANLP-97) (pp 1-7). Tzigov Chark, Bulgaria
- Document Set Expansion with Positive-Unlabelled Learning Using Intractable Density Estimation. Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation
- Combining counting processes and classification improves a stopping rule for technology assisted review. Findings of the Association for Computational Linguistics: EMNLP 2023. Singapore, 6 December 2023.
- On the Vulnerabilities of Text-to-SQL Models. Proceedings of the 34th IEEE International Symposium on Software Reliability Engineering
- Introduction
- HiDE: A Tool for Unrestricted Literature Based Discovery. Proceedings of the 27th International Conference on Computational Linguistics (COLING 2018)
- Matching Cultural Heritage items to Wikipedia. Proceedings of the 8th International Conference on Language Resources and Evaluation. Istanbul, Turkey
- Mapping WordNet synsets to Wikipedia articles. Proceedings of the 8th International Conference on Language Resources and Evaluation. Istanbul, Turkey
- Detecting Text Reuse with Modified and Weighted N-grams. *SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012) (pp 54-58). Montréal, Canada
- University_Of_Sheffield: Two Approaches to Semantic Text Similarity. *SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012) (pp 655-661). Montréal, Canada
- Highly Efficient Knowledge Graph Embedding Learning with Orthogonal Procrustes Analysis
- Stopping Criteria for Technology Assisted Reviews based on Counting Processes. Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’21), 11 July 2021 - 15 July 2021.
Reports
- On the Expressiveness of Information Extraction Patterns
- Evaluating the Single Sentence Assumption in Information Extraction
- Shallow Parsing using Heuristics
- Sense Tagging: Semantic Tagging with a Lexicon
Preprints
- Bio-SIEVE: Exploring Instruction Tuning Large Language Models for Systematic Review Automation, arXiv.
- Automatic Generation of Topic Labels, arXiv.
- Re-Ranking Words to Improve Interpretability of Automatically Generated Topics, arXiv.
- The Grammar of Sense: Is word-sense tagging much more than part-of-speech tagging?, arXiv.
- On the Security Vulnerabilities of Text-to-SQL Models, arXiv.
- A novel specific artificial intelligence-based method to identify COVID-19 cases using simple blood exams.
- Revisiting the linearity in cross-lingual embedding mappings: from a perspective of word analogies.
- Grants
-
Current Grants
- Distinguishing Common and Proper Nouns, Industrial, 03/2011 - 12/2022, £31,847 as PI
Previous Grants
- Automatically mapping and assessing inequalities in public health research, NIHR, 04/2021 - 12/2021, £48,764 as PI
- Institute of Coding, HEFCE, 11/2017 - 03/2021, £957,000 as Co-PI
- Digital Sensitivity Review, Industrial, 11/2018 - 03/2019, £39,880, as PI
- Data Analytics, Royal Academy of Engineering, 09/2017 - 09/2020, £30,000 as PI
- Recommendation Algorithm, Industrial, 04/2017 - 10/2017, £60,600 as PI
- HiDE: A Tool for Unrestricted Literature Based Discovery, Government, 01/2016 - 06/2016, £66,584 as PI
- InPuT: Individual Profiling using Text Analysis, Government, 09/2014 - 09/2015, £10,746 as PI
- Information Processing and Sensemaking: An Exploratory Search System for Document Collections, Government, 09/2014 - 08/2015, £77,840 as PI
- Connected Marketplace, Industrial, 01/2014 - 08/2014, £5,000 as PI
- PUMP: Developing a Data Set of Textual and Visual Topic Labels, EPSRC, 09/2013 - 10/2013, £1,540 as PI
- Language Processing for Literature Based Discovery in Medicine, EPSRC, 06/2012 - 05/2015, £293,127 as PI
- PATHS: Personalised Access to Cultural Heritage Spaces, EC FP7, 01/2011 - 12/2013, £709,407 as PI
- Professional activities and memberships
-
- Area chair for EACL 2017 track “Document analysis including text categorisation, topic models, and retrieval”
- Winner of best paper award at CLEF 2004 (with Roland Roller)
- Keynote speaker at RANLP 2013
- Area chair for EMNLP 2013 track “semantics”
- Assistant Director of Advanced Computing Research Centre
- Co-ordinator of EU-funded project (PATHS)
- Member of ACL SIGLEX board (2010-2013 and 2013-2016)
- EPSRC Advanced Research Fellow (2006-2011)
- Member of editorial board of Computational Linguistics (2008-2010)
- Member of the Natural Language Processing research group