Dr Mark Hepple
MSc, PhD
School of Computer Science
Reader
Member of the Natural Language Processing (NLP) research group
+44 114 222 1829
School of Computer Science
Regent Court (DCS)
211 Portobello
Sheffield
S1 4DP
Profile
Mark Hepple is a Reader in Computer Science. He studied Psychology at Sheffield University (BSc, 1986), and Cognitive Science at Edinburgh University (MSc, 1987; PhD, 1990). Thereafter, he was a Research Associate at Cambridge University (1990-92), and a Postdoctoral Research Fellow at the University of Pennsylvania (1992-93).
He joined the Department of Computer Science at Sheffield University in 1993 as a Lecturer and a member of the Natural Language Processing group.
Research interests
Dr Hepple has wide-ranging interests across Computational Linguistics and Natural Language Processing, and has published on many topics, including formal grammar and parsing, information extraction, clinical text mining, temporal information processing, robust dialogue processing, and efficient storage of large-scale linguistic data.
Publications
Journal articles
- Toward an effective Igbo part-of-speech tagger. ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP), 18(4). View this article in WRRO
- A Basic Language Resource Kit Implementation for the IgboNLP Project. ACM Transactions on Asian and Low-Resource Language Information Processing, 17(2), 1-23. View this article in WRRO
- Sub-story detection in Twitter with hierarchical Dirichlet processes. Information Processing & Management, 53(4), 989-1003. View this article in WRRO
- The TempEval challenge: identifying temporal relations in text. Lang. Resour. Evaluation, 43, 161-179.
- Mining clinical relationships from patient narratives. BMC Bioinformatics, 9 Suppl 11, S3. View this article in WRRO
- The CLEF corpus: semantic annotation of clinical text. AMIA Annu Symp Proc, 625-629.
- A web service for biomedical term look-up. Comp Funct Genomics, 6(1-2), 86-93. View this article in WRRO
- Evaluating two methods for Treebank grammar compaction. Nat. Lang. Eng., 5, 377-394. View this article in WRRO
- Feature-based formalism for two-level phonology: A description and implementation. Computer Speech and Language, 7(4), 333-358.
Chapters
- Using Semantic Inferences for Temporal Annotation Comparison, The Language of Time (pp. 575-584). Oxford University Press, Oxford
- Machine Learning Approaches to Human Dialogue Modelling, Advances in Natural Multimodal Dialogue Systems (pp. 355-370). Springer Netherlands
- Two Functional Approaches For Interpreting D-Tree Grammar Derivations, Studies in Linguistics and Philosophy (pp. 185-204). Springer Netherlands
- Grammatical relations and the Lambek calculus, Discontinuous Constituency. De Gruyter
Conference proceedings papers
- Multi-task projected embedding for Igbo. Text, Speech, and Dialogue: 21st International Conference, Proceedings (pp 285-294). Brno, Czech Republic, 11 September 2018 - 14 September 2018. View this article in WRRO
- Igbo Diacritic Restoration using Embedding Models. Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop, June 2018.
- Transferred Embeddings for Igbo Similarity, Analogy and Diacritic Restoration Tasks. COLING 2018 - 3rd Workshop on Semantic Deep Learning, SemDeep 2018 - Proceedings (pp 30-38)
- The SENSEI Overview of Newspaper Readers’ Comments. Advances in Information Retrieval. ECIR 2017. Lecture Notes in Computer Science, vol 10193. Springer (pp 758-761). View this article in WRRO
- Lexical Disambiguation of Igbo using Diacritic Restoration. Proceedings of the 1st Workshop on Sense, Concept and Entity Representations and their Applications, April 2017. View this article in WRRO
- Automatic Label Generation for News Comment Clusters. Proceedings of the 9th International Natural Language Generation Conference (pp 61-69), 5 September 2016 - 8 September 2016. View this article in WRRO
- Automatic Restoration of Diacritics for Igbo Language. Text, Speech, and Dialogue, Vol. 9924 (pp 198-205), 12 September 2016 - 16 September 2016. View this article in WRRO
- Predicting Morphologically-Complex Unknown Words in Igbo. Text, Speech, and Dialogue, Vol. 9924 (pp 206-214), 12 September 2016 - 16 September 2016. View this article in WRRO
- The SENSEI Annotated Corpus: Human Summaries of Reader Comment Conversations in On-line News. Proceedings of the 17th Annual Meeting of the Special Interest Group on Discourse and Dialogue (pp 42-52), 13 September 2016 - 15 September 2016. View this article in WRRO
- What's the issue here?: Task-based evaluation of reader comment summarization systems. Proceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016 (pp 3094-3101). View this article in WRRO
- Studying the temporal dynamics of word co-occurrences: An application to event detection. Proceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016 (pp 4380-4387)
- Improving Accuracy of Igbo Corpus Annotation Using Morphological Reconstruction and Transformation-Based Learning. Proceedings of TALAf 2016 - Traitement automatique des langues africaines (pp 1-10). View this article in WRRO
- A Graph-Based Approach to Topic Clustering for Online Comments to News. Advances in Information Retrieval (pp 15-29), 20 March 2016 - 23 March 2016. View this article in WRRO
- Sheffield-Trento System for Sentiment and Argument Structure Enhanced Comment-to-Article Linking in the Online News Domain (Ahmet Aker, Fabio Celli, Adam Funk, Emina Kurtic, Mark Hepple and Rob Gaizauskas). MultiLing 2015 in SIGDIAL. Prague, 2 September 2015 - 4 September 2015.
- Use of Transformation-Based Learning in Annotation Pipeline of Igbo, an African Language. Proceedings of the Joint Workshop on Language Technology for Closely Related Languages, Varieties and Dialects (pp 24-33). View this article in WRRO
- Comment-to-Article Linking in the Online News Domain. Proceedings of the 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue, September 2015. View this article in WRRO
- Part-of-speech Tagset and Corpus Development for Igbo, an African Language. Proceedings of LAW VIII - The 8th Linguistic Annotation Workshop, August 2014. View this article in WRRO
- Reliably evaluating summaries of Twitter timelines. AAAI 2013 Spring Symposium on Analyzing Microtext, Stanford.
- Evaluating Lexical Substitution: Analysis and New Measures. LREC 2010 - Seventh International Conference on Language Resources and Evaluation (pp 3250-3254)
- Evaluation Metrics for the Lexical Substitution Task. HLT-NAACL (pp 289-292)
- Storing the Web in Memory: Space Efficient Language Models with Constant Time Retrieval. EMNLP (pp 262-272)
- Efficient Minimal Perfect Hash Language Models. LREC
- Building a semantically annotated corpus of clinical texts. J. Biomed. Informatics, Vol. 42 (pp 950-966). View this article in WRRO
- Cross-Domain Dialogue Act Tagging. Proceedings of the 6th International Conference on Language Resources and Evaluation, LREC 2008 (pp 1969-1976)
- Combining Terminology Resources and Statistical Methods for Entity Recognition: an Evaluation. LREC
- SemEval-2007 task 15: TempEval temporal relation identification. ACL 2007 - SemEval 2007 - Proceedings of the 4th International Workshop on Semantic Evaluations (pp 75-80), 23 June 2007 - 24 June 2007.
- USFD: Preliminary exploration of features and classifiers for the TempEval-2007 tasks. ACL 2007 - SemEval 2007 - Proceedings of the 4th International Workshop on Semantic Evaluations (pp 438-441)
- Task-Oriented Extraction of Temporal Information: The Case of Clinical Narratives. TIME (pp 188-195)
- SUPPLE: A Practical Parser for Natural Language Engineering Applications. Proceedings of the Ninth International Workshop on Parsing Technology, IWPT '05 (pp 200-201), 9 October 2005 - 10 October 2005.
- Error analysis of Dialogue Act classification. Text, Speech and Dialogue, Proceedings, Vol. 3658 (pp 451-458)
- The Role of Inference in the Temporal Annotation and Analysis of Text. Lang. Resour. Evaluation, Vol. 39 (pp 243-265)
- The University of Sheffield's TREC 2005 Q&A Experiments. TREC, Vol. 500-266
- Human Dialogue Modelling Using Annotated Corpora. LREC
- A Large-Scale Resource for Storing and Recognizing Technical Terminology. LREC
- NLP-enhanced Content Filtering within the POESIA Project. Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC2004)
- Information retrieval for question answering: a SIGIR 2004 workshop. SIGIR Forum, Vol. 38 (pp 41-44)
- The University of Sheffield's TREC 2004 QA Experiments. TREC, Vol. 500-261
- Human dialogue modelling using machine learning. Recent Advances in Natural Language Processing III, Vol. 260 (pp 17-28)
- The University of Sheffield's TREC 2003 Q&A Experiments. TREC, Vol. 500-255 (pp 782-790)
- Independence and commitment: Assumptions for rapid training and execution of rule-based POS taggers. Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics, ACL '00 (pp 278-285), 3 October 2000 - 6 October 2000.
- An Earley-style Predictive Chart Parsing Method for Lambek Grammars. ACL (pp 465-472)
- University of Sheffield TREC-8 Q&A System. TREC, Vol. 500-246. View this article in WRRO
- Compacting the Penn Treebank Grammar. CoRR, Vol. cs.CL/9902001
- Memoisation for Glue Language Deduction and Categorial Parsing. Proceedings of the 17th International Conference on Computational Linguistics, COLING-ACL '98 (pp 538-544), 10 August 1998 - 14 August 1998.
- Linear Categorial Deduction via First-order Compilation. TAPD (pp 108-117)
- Compacting the Penn Treebank Grammar. COLING-ACL (pp 699-703)
- Maximal incrementality in linear categorial deduction. 35th Annual Meeting of the Association for Computational Linguistics and the 8th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings of the Conference (pp 344-351), 7 July 1997 - 12 July 1997.
- A Compilation-Chart Method for Linear Categorial Deduction. COLING (pp 537-542)
- Hybrid Categorial Logics. Log. J. IGPL, Vol. 3 (pp 343-355)
- Mixing modes of linguistic description in categorial grammar. Seventh Conference of the European Chapter of the Association for Computational Linguistics (pp 127-132)
- Discontinuity and the Lambek Calculus. COLING (pp 1235-1239)
- Chart Parsing Lambek Grammars: Modal Extensions and Incrementality. COLING (pp 134-140)
- Efficient Incremental Processing with Categorial Grammar. 29th Annual Meeting of the Association for Computational Linguistics: Proceedings of the Conference (pp 79-86)
- Proof Figures and Structural Operators for Categorial Grammar. Fifth Conference of the European Chapter of the Association for Computational Linguistics (pp 198-203)
- Normal Form Theorem Proving for the Lambek Calculus. COLING (pp 173-178)
- Parsing and Derivational Equivalence. Fourth Conference of the European Chapter of the Association for Computational Linguistics (pp 10-18)
Grants
- POESIA: Public Open-source Environment for a Safer Internet, European Commission - FP6/FP7, 02/2002 to 02/2004, £89,129, as PI
- CLEF: Clinical E-Science Framework, University of Manchester, 10/2002 to 01/2006, £280,725, as Co-PI
- CLEF-Services, University of Manchester, 01/2005 to 06/2008, £401,021, as Co-PI
- CA4NLP: Engineering Natural Language Interfaces: can CA help?, EPSRC, 04/2008 to 03/2009, £49,480, as PI
- Reveal II, Government Communications Headquarters, 10/2008 to 03/2010, £141,763, as PI
- uComp: Embedded Human Computation for Knowledge Extraction and Evaluation, EPSRC, 11/2012 to 05/2016, £375,621, as Co-PI
- SENSEI: Making Sense of Human-Human Conversation, European Commission - FP6/FP7, 11/2013 to 10/2016, £459,034, as Co-PI
Professional activities and memberships
Member of the Natural Language Processing research group