Professor Nikos Aletras
School of Computer Science
Professor of Natural Language Processing
Head of the Natural Language Processing (NLP) research group
+44 114 222 1911
Full contact details
School of Computer Science
Regent Court (DCS)
211 Portobello
Sheffield
S1 4DP
- Profile
Nikos Aletras is a Professor in Natural Language Processing (NLP) in the Computer Science Department at the University of Sheffield, co-affiliated with the Machine Learning (ML) group. Previously, he was a research scientist at Amazon (Core ML and Alexa) and a research associate at UCL, Department of Computer Science, Media Futures Group. He completed a PhD in NLP at the University of Sheffield. His research interests are in NLP, Machine Learning and Data Science. He develops text analysis methods to solve problems in other scientific areas such as (computational) social and legal science.
- Research interests
- NLP
- Computational Social Science
- Legal NLP
- Data Science
- Machine Learning
- Publications
Journal articles
- Investigating Hallucinations in Pruned Large Language Models for Abstractive Summarization. Transactions of the Association for Computational Linguistics, 12, 1163-1181.
- We Need to Talk About Classification Evaluation Metrics in NLP. CoRR, abs/2401.03831.
- An Empirical Study on Cross-lingual Vocabulary Adaptation for Efficient Generative LLM Inference. CoRR, abs/2402.10712.
- Who is bragging more online? A large scale analysis of bragging in social media. 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 - Main Conference Proceedings, 17575-17587.
- Vocabulary Expansion for Low-resource Cross-lingual Transfer. CoRR, abs/2406.11477.
- Introduction. EACL 2024 - 18th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings of System Demonstrations, IV.
- Comparing Explanation Faithfulness between Multilingual and Monolingual Fine-tuned Language Models. CoRR, abs/2403.12809.
- Investigating Hallucinations in Pruned Large Language Models for Abstractive Summarization. Trans. Assoc. Comput. Linguistics, 12, 1163-1181.
- Predicting and analyzing the popularity of false rumors in Weibo. Expert Systems with Applications, 122791-122791.
- CIKM'23 Program Chairs' Welcome. International Conference on Information and Knowledge Management, Proceedings, v.
- Frustratingly Simple Memory Efficiency for Pre-trained Language Models via Dynamic Embedding Pruning. CoRR, abs/2309.08708.
- Pit One Against Many: Leveraging Attention-head Embeddings for Parameter-efficient Multi-head Attention. Findings of the Association for Computational Linguistics: EMNLP 2023.
- A Multimodal Analysis of Influencer Content on Twitter. CoRR, abs/2309.03064.
- Improving Multimodal Classification of Social Media Posts by Leveraging Image-Text Auxiliary tasks. CoRR, abs/2309.07794.
- Examining the Limitations of Computational Rumor Detection Models Trained on Static Datasets. CoRR, abs/2309.11576.
- Understanding the Role of Input Token Characters in Language Models: How Does Information Loss Affect Performance?. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing.
- Regulation and NLP (RegNLP): Taming Large Language Models. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing.
- Active Learning Principles for In-Context Learning with Large Language Models. Findings of the Association for Computational Linguistics: EMNLP 2023.
- Navigating Prompt Complexity for Zero-Shot Classification: A Study of Large Language Models in Computational Social Science. CoRR, abs/2305.14310.
- Lighter, yet More Faithful: Investigating Hallucinations in Pruned Large Language Models for Abstractive Summarization. CoRR, abs/2311.09335.
- How Does Calibration Data Affect the Post-training Pruning and Quantization of Large Language Models?. CoRR, abs/2311.09755.
- Node-Feature Convolution for Graph Convolutional Networks. Pattern Recognition, 108661-108661.
- Improving Graph-Based Text Representations with Character and Word Level N-grams. CoRR, abs/2210.05999.
- Identifying and Characterizing Active Citizens who Refute Misinformation in Social Media. CoRR, abs/2204.10080.
- Identifying Twitter users who repost unreliable news sources with linguistic information. PeerJ Computer Science, 6.
- Unsupervised quality estimation for neural machine translation. Transactions of the Association for Computational Linguistics, 8, 539-555.
- Analyzing Political Parody in Social Media. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics.
- Evaluating topic representations for exploring document collections. Journal of the Association for Information Science and Technology, 68(1), 154-167.
- Predicting judicial decisions of the European Court of Human Rights: a Natural Language Processing perspective. PeerJ Computer Science, 2.
- Why are these similar? Investigating item similarity types in a large digital library. Journal of the Association for Information Science and Technology, 67(7), 1624-1638.
- Computing similarity between items in a digital library of cultural heritage. Journal of Computing and Cultural Heritage, 5(4).
- Is There a Permanent Campaign for Online Political Advertising? Investigating Partisan and Non-Party Campaign Activity in the UK between 2018–2021. Journal of Political Marketing, 1-19.
- Flexible Instance-Specific Rationalization of NLP Models. Proceedings of the AAAI Conference on Artificial Intelligence, 36(10), 10545-10553.
- On the Ethical Limits of Natural Language Processing on Legal Text.
- Improving the Faithfulness of Attention-based Explanations with Task-specific Information for Text Classification.
- Analyzing Online Political Advertisements.
- An Empirical Study on Leveraging Position Embeddings for Target-oriented Opinion Words Extraction.
- Studying User Income through Language, Behaviour and Affect in Social Media. PLoS ONE, 10(9).
- Complaint Identification in Social Media with Transformer Networks.
- Frustratingly Simple Pretraining Alternatives to Masked Language Modeling.
- Enjoy the Salience: Towards Better Transformer-based Faithful Explanations with Word Salience.
- Point-of-Interest Type Prediction using Text and Images.
- Translation Error Detection as Rationale Extraction.
- Point-of-Interest Type Inference from Social Media Text.
- An Empirical Study on Large-Scale Multi-Label Text Classification Including Few and Zero-Shot Labels.
- LEGAL-BERT: The Muppets straight out of Law School.
- Paragraph-level Rationale Extraction through Regularization: A case study on European Court of Human Rights Cases.
- Modeling the Severity of Complaints in Social Media.
- Variable Instance-Level Explainability for Text Classification.
- Bayesian Active Learning with Pretrained Language Models.
- Active Learning by Acquiring Contrastive Examples.
- LexGLUE: A Benchmark Dataset for Legal Language Understanding in English.
Conference proceedings papers
- Who Is Bragging More Online? A Large Scale Analysis of Bragging in Social Media. LREC/COLING (pp 17575-17587)
- RISE: Robust Early-exiting Internal Classifiers for Suicide Risk Evaluation. 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 - Main Conference Proceedings (pp 14134-14145)
- Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2024 - System Demonstrations, St. Julians, Malta, March 17-22, 2024. EACL (Demonstrations)
- Navigating Prompt Complexity for Zero-Shot Classification: A Study of Large Language Models in Computational Social Science. LREC/COLING (pp 12074-12086)
- Improving Multimodal Classification of Social Media Posts by Leveraging Image-Text Auxiliary Tasks. EACL (Findings) (pp 1126-1137)
- Examining the Limitations of Computational Rumor Detection Models Trained on Static Datasets. LREC/COLING (pp 6739-6751)
- On the Impact of Calibration Data in Post-training Quantization and Pruning. ACL (1) (pp 10100-10118)
- Comparing Explanation Faithfulness between Multilingual and Monolingual Fine-tuned Language Models. NAACL-HLT (pp 3226-3244)
- Bayesian Prompt Ensembles: Model Uncertainty Estimation for Black-Box Large Language Models. ACL (Findings) (pp 12229-12272)
- Self-training through Classifier Disagreement for Cross-Domain Opinion Target Extraction. Proceedings of the ACM Web Conference 2023
- It’s about Time: Rethinking Evaluation on Rumor Detection Benchmarks using Chronological Splits. EACL 2023 - 17th Conference of the European Chapter of the Association for Computational Linguistics, Findings of EACL 2023 (pp 724-731)
- Rethinking Semi-supervised Learning with Language Models. Findings of the Association for Computational Linguistics: ACL 2023, July 2023.
- On the Limitations of Simulating Active Learning. Findings of the Association for Computational Linguistics: ACL 2023, July 2023.
- Trading Syntax Trees for Wordpieces: Target-oriented Opinion Words Extraction with Wordpieces and Aspect Enhancement. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), July 2023.
- Incorporating attribution importance for improving faithfulness metrics. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Vol. 1 (pp 4732-4745). Toronto, Canada, 9 July 2023.
- Schema-Guided User Satisfaction Modeling for Task-Oriented Dialogues. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), July 2023.
- It’s about Time: Rethinking Evaluation on Rumor Detection Benchmarks using Chronological Splits. Findings of the Association for Computational Linguistics: EACL 2023, May 2023.
- Robust Weak Supervision with Variational Auto-Encoders. Proceedings of Machine Learning Research, Vol. 202 (pp 34394-34408)
- Pit One Against Many: Leveraging Attention-head Embeddings for Parameter-efficient Multi-head Attention. EMNLP (Findings) (pp 10355-10373)
- Understanding the Role of Input Token Characters in Language Models: How Does Information Loss Affect Performance?. EMNLP (pp 9085-9108)
- Active Learning Principles for In-Context Learning with Large Language Models. EMNLP (Findings) (pp 5011-5034)
- Regulation and NLP (RegNLP): Taming Large Language Models. EMNLP (pp 8712-8724)
- Incorporating Attribution Importance for Improving Faithfulness Metrics. ACL (1) (pp 4732-4745)
- A Multimodal Analysis of Influencer Content on Twitter. IJCNLP (1) (pp 225-240)
- We Need to Talk About Classification Evaluation Metrics in NLP. IJCNLP (1) (pp 498-510)
- Towards Suicide Ideation Detection Through Online Conversational Context. Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval
- Identifying and Characterizing Active Citizens who Refute Misinformation in Social Media. 14th ACM Web Science Conference 2022
- Translation Error Detection as Rationale Extraction. Findings of the Association for Computational Linguistics: ACL 2022, May 2022.
- On the Importance of Effectively Adapting Pretrained Language Models for Active Learning. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), May 2022.
- LexGLUE: A Benchmark Dataset for Legal Language Understanding in English. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), May 2022.
- How does the pre-training objective affect what large language models learn about linguistic properties?. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), May 2022.
- Dynamically Refined Regularization for Improving Cross-corpora Hate Speech Detection. Findings of the Association for Computational Linguistics: ACL 2022, May 2022.
- Automatic Identification and Classification of Bragging in Social Media. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), May 2022.
- An Empirical Study on Explanations in Out-of-Domain Settings. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), May 2022.
- Combining Humor and Sarcasm for Improving Political Parody Detection. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, July 2022.
- HashFormers: Towards Vocabulary-independent Pre-trained Transformers. Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022 (pp 7862-7874)
- Translation Error Detection as Rationale Extraction. ACL (Findings) (pp 4148-4159)
- On the Impact of Temporal Concept Drift on Model Explanations. Findings of the Association for Computational Linguistics: EMNLP 2022 (pp 4068-4083)
- On the Impact of Temporal Concept Drift on Model Explanations. Findings of the Association for Computational Linguistics: EMNLP 2022, December 2022.
- Proceedings of the Natural Legal Language Processing Workshop, NLLP@EMNLP 2022, Abu Dhabi, United Arab Emirates (Hybrid), December 8, 2022. NLLP@EMNLP
- LexGLUE: A Benchmark Dataset for Legal Language Understanding in English. ACL (1) (pp 4310-4330)
- Domain Classification-based Source-specific Term Penalization for Domain Adaptation in Hate-speech Detection. Proceedings - International Conference on Computational Linguistics, COLING, Vol. 29(1) (pp 6656-6666)
- How does the pre-training objective affect what large language models learn about linguistic properties?. ACL (2) (pp 131-147)
- Dynamically Refined Regularization for Improving Cross-corpora Hate Speech Detection. ACL (Findings) (pp 372-382)
- Domain Classification-based Source-specific Term Penalization for Domain Adaptation in Hate-speech Detection. COLING (pp 6656-6666)
- Improving Graph-Based Text Representations with Character and Word Level N-grams. AACL/IJCNLP (2) (pp 228-233)
- A Hierarchical N-Gram Framework for Zero-Shot Link Prediction. Findings of the Association for Computational Linguistics: EMNLP 2022, December 2022.
- An Empirical Study on Explanations in Out-of-Domain Settings. ACL (1) (pp 6920-6938)
- Combining Humor and Sarcasm for Improving Political Parody Detection. NAACL-HLT (pp 1800-1807)
- Automatic Identification and Classification of Bragging in Social Media. ACL (1) (pp 3945-3959)
- Knowledge distillation for quality estimation. Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 (pp 5091-5099). Bangkok, Thailand (virtual conference), 1 August 2021 - 6 August 2021.
- Point-of-Interest Type Prediction using Text and Images. Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, November 2021.
- Frustratingly Simple Pretraining Alternatives to Masked Language Modeling. Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, November 2021.
- Enjoy the Salience: Towards Better Transformer-based Faithful Explanations with Word Salience. Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, November 2021.
- Active Learning by Acquiring Contrastive Examples. Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, November 2021.
- Introduction. Natural Legal Language Processing, NLLP 2021 - Proceedings of the 2021 Workshop (pp III)
- Paragraph-level Rationale Extraction through Regularization: A case study on European Court of Human Rights Cases. NAACL-HLT (pp 226-241)
- Modeling the Severity of Complaints in Social Media. NAACL-HLT (pp 2264-2274)
- On the Ethical Limits of Natural Language Processing on Legal Text. ACL/IJCNLP (Findings) (pp 3590-3599)
- Knowledge Distillation for Quality Estimation. ACL/IJCNLP (Findings) (pp 5091-5099)
- In Factuality: Efficient Integration of Relevant Facts for Visual Question Answering. ACL/IJCNLP (2) (pp 468-475)
- Improving the Faithfulness of Attention-based Explanations with Task-specific Information for Text Classification. ACL/IJCNLP (1) (pp 477-488)
- Analyzing Online Political Advertisements. ACL/IJCNLP (Findings) (pp 3669-3680)
- An Empirical Study on Leveraging Position Embeddings for Target-oriented Opinion Words Extraction. EMNLP 2021 - 2021 Conference on Empirical Methods in Natural Language Processing, Proceedings (pp 9174-9179)
- Proceedings of the Natural Legal Language Processing Workshop 2021, NLLP@EMNLP 2021, Punta Cana, Dominican Republic, November 10, 2021. NLLP@EMNLP
- Enjoy the Salience: Towards Better Transformer-based Faithful Explanations with Word Salience. EMNLP (1) (pp 8189-8200)
- Active Learning by Acquiring Contrastive Examples. EMNLP (1) (pp 650-663)
- Frustratingly Simple Pretraining Alternatives to Masked Language Modeling. EMNLP (1) (pp 3116-3125)
- Point-of-Interest Type Prediction using Text and Images. EMNLP (1) (pp 7785-7797)
- An Empirical Study on Leveraging Position Embeddings for Target-oriented Opinion Words Extraction. EMNLP (1) (pp 9174-9179)
- LegalOps: A Summarization Corpus of Legal Opinions. 2020 IEEE International Conference on Big Data (Big Data), 10 December 2020 - 13 December 2020.
- Quality In, Quality Out: Learning from Actual Mistakes. EAMT (pp 145-153)
- Analyzing Political Parody in Social Media. ACL (pp 4373-4384)
- Introduction to the NLLP 2020 Workshop. CEUR Workshop Proceedings, Vol. 2645
- Automatic Generation of Topic Labels. SIGIR (pp 1965-1968)
- LEGAL-BERT: "Preparing the Muppets for Court". EMNLP (Findings) (pp 2898-2904)
- An Empirical Study on Large-Scale Multi-Label Text Classification Including Few and Zero-Shot Labels. EMNLP (1) (pp 7503-7515)
- Complaint Identification in Social Media with Transformer Networks. COLING (pp 1765-1771)
- Point-of-Interest Type Inference from Social Media Text. AACL/IJCNLP (pp 804-810)
- LEGAL-BERT: The Muppets straight out of Law School. Findings of the Association for Computational Linguistics: EMNLP 2020
- Automatic Generation of Topic Labels. Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval
- Extreme Multi-Label Legal Text Classification: A Case Study in. Proceedings of the Natural Legal Language Processing Workshop 2019, June 2019.
- Neural Legal Judgment Prediction in English. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (pp 4317-4323). Florence, Italy, 28 July 2019 - 2 August 2019.
- Automatically identifying complaints in social media. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (pp 5008-5019). Florence, Italy, 28 July 2019 - 2 August 2019.
- Journalist-in-the-Loop: Continuous Learning as a Service for Rumour Analysis. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP): System Demonstrations, November 2019.
- Automatically Identifying Complaints in Social Media. ACL (1) (pp 5008-5019)
- Neural Legal Judgment Prediction in English. 57th Annual Meeting of the Association for Computational Linguistics (ACL 2019) (pp 4317-4323)
- Nowcasting the Stance of Social Media Users in a Sudden Vote: The Case of the Greek Referendum. CIKM '18 Proceedings of the 27th ACM International Conference on Information and Knowledge Management (pp 367-376), 22 October 2018 - 26 October 2018.
- Predicting Twitter User Socioeconomic Attributes with Network and Language Information. Proceedings of the 29th ACM Conference on Hypertext and Social Media (pp 20-24), 9 July 2018 - 12 July 2018.
- Nowcasting the Stance of Social Media Users in a Sudden Vote: The Case of the Greek Referendum. CIKM (pp 367-376)
- Multimodal Topic Labelling. Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, April 2017.
- Labeling topics with images using a neural network. Advances in Information Retrieval : 39th European Conference on IR Research, ECIR 2017, Aberdeen, UK, April 8-13, 2017, Proceedings (pp 500-505). Aberdeen, UK, 8 April 2017 - 13 April 2017.
- Inferring the Socioeconomic Status of Social Media Users Based on Behaviour and Language (pp 689-695)
- Session details: Short Papers. Proceedings of the 2015 Workshop on Topic Models: Post-Processing and Applications
- TM 2015 -- Topic Models. Proceedings of the 24th ACM International on Conference on Information and Knowledge Management - CIKM '15, 18 October 2015 - 23 October 2015.
- A Hybrid Distributional and Knowledge-based Model of Lexical Semantics. Proceedings of the Fourth Joint Conference on Lexical and Computational Semantics, June 2015.
- An analysis of the user occupational class through Twitter content. Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (pp 1754-1764)
- Proceedings of the 2015 Workshop on Topic Models: Post-Processing and Applications, TM 2015, Melbourne, Australia, October 19, 2015. TM@CIKM
- Labelling Topics using Unsupervised Graph-based Methods. Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (ACL 2014), Vol. 2 (pp 631-636)
- Measuring the Similarity between Automatically Generated Topics. Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, volume 2: Short Papers, April 2014.
- Representing topics labels for exploring digital libraries. IEEE/ACM Joint Conference on Digital Libraries, 8 September 2014 - 12 September 2014.
- Predicting and Characterising User Impact on Twitter. Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics
- PATHS: A System for Accessing Cultural Heritage Collections. ACL (Conference System Demonstrations) (pp 151-156)
- UBC UOS-TYPED: Regression for Typed-similarity. *SEM 2013 - 2nd Joint Conference on Lexical and Computational Semantics, Vol. 1 (pp 132-137)
- Evaluating topic coherence using distributional semantics. Proceedings of the 10th International Conference on Computational Semantics, IWCS 2013 - Long Papers
- Representing topics using images. NAACL HLT 2013 - 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Main Conference (pp 158-167)
- UBC UOS-TYPED: Regression for Typed-similarity. *SEM 2013 - 2nd Joint Conference on Lexical and Computational Semantics, Proceedings of the Main Conference and the Shared Task: Semantic Textual Similarity (pp 132-137)
- PATHS - Exploring Digital Cultural Heritage Spaces. Theory and Practice of Digital Libraries 2012. Cyprus
- Computing Similarity between Cultural Heritage Items using Multimodal Features. Proceedings of the 6th Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities (pp 85-93). Avignon, France
- User-centred design to support exploration and path creation in cultural heritage collections. CEUR Workshop Proceedings, Vol. 909 (pp 75-78)
- An empirical study on cross-lingual vocabulary adaptation for efficient language model inference. Findings of the Association for Computational Linguistics: EMNLP 2024. Miami, Florida, 12 November 2024.
- Introduction
Preprints
- Self-calibration for Language Model Quantization and Pruning.
- Enhancing Data Quality through Simple De-duplication: Navigating Responsible Computational Social Science Research, arXiv.
- Vocabulary Expansion for Low-resource Cross-lingual Transfer, arXiv.
- Who is bragging more online? A large scale analysis of bragging in social media, arXiv.
- Comparing Explanation Faithfulness between Multilingual and Monolingual Fine-tuned Language Models, arXiv.
- An Empirical Study on Cross-lingual Vocabulary Adaptation for Efficient Generative LLM Inference, arXiv.
- We Need to Talk About Classification Evaluation Metrics in NLP, arXiv.
- How Does Calibration Data Affect the Post-training Pruning and Quantization of Large Language Models?, arXiv.
- Investigating Hallucinations in Pruned Large Language Models for Abstractive Summarization, arXiv.
- Understanding the Role of Input Token Characters in Language Models: How Does Information Loss Affect Performance?, arXiv.
- Pit One Against Many: Leveraging Attention-head Embeddings for Parameter-efficient Multi-head Attention, arXiv.
- Regulation and NLP (RegNLP): Taming Large Language Models, arXiv.
- Examining the Limitations of Computational Rumor Detection Models Trained on Static Datasets, arXiv.
- Frustratingly Simple Memory Efficiency for Pre-trained Language Models via Dynamic Embedding Pruning, arXiv.
- Improving Multimodal Classification of Social Media Posts by Leveraging Image-Text Auxiliary Tasks, arXiv.
- A Multimodal Analysis of Influencer Content on Twitter, arXiv.
- Schema-Guided User Satisfaction Modeling for Task-Oriented Dialogues, arXiv.
- Active Learning Principles for In-Context Learning with Large Language Models, arXiv.
- Navigating Prompt Complexity for Zero-Shot Classification: A Study of Large Language Models in Computational Social Science, arXiv.
- Rethinking Semi-supervised Learning with Language Models, arXiv.
- On the Limitations of Simulating Active Learning, arXiv.
- Trading Syntax Trees for Wordpieces: Target-oriented Opinion Words Extraction with Wordpieces and Aspect Enhancement, arXiv.
- Incorporating Attribution Importance for Improving Faithfulness Metrics, arXiv.
- Self-training through Classifier Disagreement for Cross-Domain Opinion Target Extraction, arXiv.
- It's about Time: Rethinking Evaluation on Rumor Detection Benchmarks using Chronological Splits, arXiv.
- On the Impact of Temporal Concept Drift on Model Explanations.
- HashFormers: Towards Vocabulary-independent Pre-trained Transformers.
- Improving Graph-Based Text Representations with Character and Word Level N-grams.
- Domain Classification-based Source-specific Term Penalization for Domain Adaptation in Hate-speech Detection.
- Combining Humor and Sarcasm for Improving Political Parody Detection.
- Identifying and Characterizing Active Citizens who Refute Misinformation in Social Media, arXiv.
- A Hierarchical N-Gram Framework for Zero-Shot Link Prediction, arXiv.
- Dynamically Refined Regularization for Improving Cross-corpora Hate Speech Detection, arXiv.
- How does the pre-training objective affect what large language models learn about linguistic properties?.
- Automatic Identification and Classification of Bragging in Social Media, arXiv.
- An Empirical Study on Explanations in Out-of-Domain Settings, arXiv.
- LexGLUE: A Benchmark Dataset for Legal Language Understanding in English, arXiv.
- An Empirical Study on Leveraging Position Embeddings for Target-oriented Opinion Words Extraction, arXiv.
- Knowledge Distillation for Quality Estimation, arXiv.
- Analyzing Online Political Advertisements, arXiv.
- On the Ethical Limits of Natural Language Processing on Legal Text, arXiv.
- Improving the Faithfulness of Attention-based Explanations with Task-specific Information for Text Classification, arXiv.
- On the Importance of Effectively Adapting Pretrained Language Models for Active Learning, arXiv.
- Flexible Instance-Specific Rationalization of NLP Models, arXiv.
- Paragraph-level Rationale Extraction through Regularization: A case study on European Court of Human Rights Cases, arXiv.
- Modeling the Severity of Complaints in Social Media, arXiv.
- Complaint Identification in Social Media with Transformer Networks, arXiv.
- LEGAL-BERT: The Muppets straight out of Law School, arXiv.
- An Empirical Study on Large-Scale Multi-Label Text Classification Including Few and Zero-Shot Labels, arXiv.
- Point-of-Interest Type Inference from Social Media Text, arXiv.
- Automatic Generation of Topic Labels, arXiv.
- Unsupervised Quality Estimation for Neural Machine Translation, arXiv.
- Analyzing Political Parody in Social Media, arXiv.
- Automatically Identifying Complaints in Social Media, arXiv.
- Neural Legal Judgment Prediction in English, arXiv.
- Re-Ranking Words to Improve Interpretability of Automatically Generated Topics, arXiv.
- Graph Node-Feature Convolution for Representation Learning.
- Nowcasting the Stance of Social Media Users in a Sudden Vote: The Case of the Greek Referendum, arXiv.
- Predicting Twitter User Socioeconomic Attributes with Network and Language Information, arXiv.
- Labeling Topics with Images using Neural Networks, arXiv.
- Extreme Multi-Label Legal Text Classification: A case study in EU Legislation.
- Grants
Current Grants
- Efficient Deployment of Large Language Models for Industrial Applications, Industrial, 07/2024 - 07/2025, £30,360, as PI
- AdSoLve: Addressing socio-technical limitations of Large Language Models (LLMs) for medical and social computing, RAI (EPSRC), 05/2024 - 03/2028, £3,498,789, as Co-PI
- ESPERANTO: Exchanges for SPEech ReseArch aNd TechnOlogies, Horizon 2020, 01/2021 - 12/2025, £38,070, as Co-PI
- UKRI Centre for Doctoral Training in Speech and Language Technologies and their Applications, EPSRC, 04/2019 - 09/2027, £5,508,850, as Co-PI
Previous Grants
- SAI: Social Explainable Artificial Intelligence, EPSRC, 02/2021 - 01/2024, £366,348, as PI
- Understanding online political advertising: perceptions, uses and regulation, Leverhulme, 01/2021 - 07/2024, £395,011, as Co-PI
- Responsible AI for Inclusive, Democratic Societies: A cross-disciplinary approach to detecting and countering abusive language online, ESRC, 02/2020 - 01/2024, £508,135, as Co-PI
- Bergamot: Browser-based Multilingual Translation, EC H2020, 01/2019 - 12/2021, £473,113, as Co-PI
- Innovation Next Generation Services Through Collaborative Design, ESRC, 12/2018 - 11/2020, £284,926, as Co-PI
- Journalist-in-the-Loop Machine Learning as a Service for Rumour Analysis, Google, 11/2018 - 12/2019, £44,642, as Co-PI
- Alexa Fellowship, Amazon, 08/2018 - 08/2021, £73,000, as PI