Professor Nikos Aletras
School of Computer Science
Professor of Natural Language Processing
Head of the Natural Language Processing (NLP) research group


Full contact details
School of Computer Science
Regent Court (DCS)
211 Portobello
Sheffield
S1 4DP
- Profile
Nikos Aletras is a Professor in Natural Language Processing (NLP) in the Computer Science Department at the University of Sheffield, co-affiliated with the Machine Learning (ML) group. Previously, he was a research scientist at Amazon (Core ML and Alexa) and a research associate at UCL, Department of Computer Science, Media Futures Group. He completed a PhD in NLP at the University of Sheffield. His research interests are in NLP, Machine Learning and Data Science. He develops text analysis methods to solve problems in other scientific areas such as (computational) social and legal science.
- Research interests
- NLP
- Computational Social Science
- Legal NLP
- Data Science
- Machine Learning
- Publications
Journal articles
- Investigating Hallucinations in Pruned Large Language Models for Abstractive Summarization. Transactions of the Association for Computational Linguistics, 12, 1163-1181.
- Predicting and analyzing the popularity of false rumors in Weibo. Expert Systems with Applications, 122791.
- CIKM'23 Program Chairs' Welcome. International Conference on Information and Knowledge Management, Proceedings, v.
- Pit One Against Many: Leveraging Attention-head Embeddings for Parameter-efficient Multi-head Attention. Findings of the Association for Computational Linguistics: EMNLP 2023.
- Understanding the Role of Input Token Characters in Language Models: How Does Information Loss Affect Performance? Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing.
- Regulation and NLP (RegNLP): Taming Large Language Models. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing.
- Active Learning Principles for In-Context Learning with Large Language Models. Findings of the Association for Computational Linguistics: EMNLP 2023.
- Node-Feature Convolution for Graph Convolutional Networks. Pattern Recognition, 108661.
- Identifying Twitter users who repost unreliable news sources with linguistic information. PeerJ Computer Science, 6.
- Unsupervised quality estimation for neural machine translation. Transactions of the Association for Computational Linguistics, 8, 539-555.
- Analyzing Political Parody in Social Media. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics.
- Evaluating topic representations for exploring document collections. Journal of the Association for Information Science and Technology, 68(1), 154-167.
- Predicting judicial decisions of the European Court of Human Rights: a Natural Language Processing perspective. PeerJ Computer Science, 2.
- Why are these similar? Investigating item similarity types in a large digital library. Journal of the Association for Information Science and Technology, 67(7), 1624-1638.
- Computing similarity between items in a digital library of cultural heritage. Journal of Computing and Cultural Heritage, 5(4).
- Is There a Permanent Campaign for Online Political Advertising? Investigating Partisan and Non-Party Campaign Activity in the UK between 2018–2021. Journal of Political Marketing, 1-19.
- Flexible Instance-Specific Rationalization of NLP Models. Proceedings of the AAAI Conference on Artificial Intelligence, 36(10), 10545-10553.
- Studying User Income through Language, Behaviour and Affect in Social Media. PLoS ONE, 10(9).
Conference proceedings papers
- Self-training through Classifier Disagreement for Cross-Domain Opinion Target Extraction. Proceedings of the ACM Web Conference 2023
- Rethinking Semi-supervised Learning with Language Models. Findings of the Association for Computational Linguistics: ACL 2023, July 2023.
- On the Limitations of Simulating Active Learning. Findings of the Association for Computational Linguistics: ACL 2023, July 2023.
- Trading Syntax Trees for Wordpieces: Target-oriented Opinion Words Extraction with Wordpieces and Aspect Enhancement. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), July 2023.
- Incorporating attribution importance for improving faithfulness metrics. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Vol. 1 (pp 4732-4745). Toronto, Canada, 9 July 2023.
- Schema-Guided User Satisfaction Modeling for Task-Oriented Dialogues. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), July 2023.
- It’s about Time: Rethinking Evaluation on Rumor Detection Benchmarks using Chronological Splits. Findings of the Association for Computational Linguistics: EACL 2023, May 2023.
- Towards Suicide Ideation Detection Through Online Conversational Context. Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval
- Identifying and Characterizing Active Citizens who Refute Misinformation in Social Media. 14th ACM Web Science Conference 2022
- Translation Error Detection as Rationale Extraction. Findings of the Association for Computational Linguistics: ACL 2022, May 2022.
- On the Importance of Effectively Adapting Pretrained Language Models for Active Learning. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), May 2022.
- LexGLUE: A Benchmark Dataset for Legal Language Understanding in English. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), May 2022.
- How does the pre-training objective affect what large language models learn about linguistic properties? Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), May 2022.
- Dynamically Refined Regularization for Improving Cross-corpora Hate Speech Detection. Findings of the Association for Computational Linguistics: ACL 2022, May 2022.
- Automatic Identification and Classification of Bragging in Social Media. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), May 2022.
- An Empirical Study on Explanations in Out-of-Domain Settings. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), May 2022.
- Combining Humor and Sarcasm for Improving Political Parody Detection. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, July 2022.
- On the Impact of Temporal Concept Drift on Model Explanations. Findings of the Association for Computational Linguistics: EMNLP 2022, December 2022.
- A Hierarchical N-Gram Framework for Zero-Shot Link Prediction. Findings of the Association for Computational Linguistics: EMNLP 2022, December 2022.
- Knowledge distillation for quality estimation. Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 (pp 5091-5099). Bangkok, Thailand (virtual conference), 1 August 2021 - 6 August 2021.
- Point-of-Interest Type Prediction using Text and Images. Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, November 2021.
- Frustratingly Simple Pretraining Alternatives to Masked Language Modeling. Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, November 2021.
- Enjoy the Salience: Towards Better Transformer-based Faithful Explanations with Word Salience. Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, November 2021.
- Active Learning by Acquiring Contrastive Examples. Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, November 2021.
- LegalOps: A Summarization Corpus of Legal Opinions. 2020 IEEE International Conference on Big Data (Big Data), 10 December 2020 - 13 December 2020.
- Automatic Generation of Topic Labels. Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval.
- Extreme Multi-Label Legal Text Classification: A Case Study in. Proceedings of the Natural Legal Language Processing Workshop 2019, June 2019.
- Neural Legal Judgment Prediction in English. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (pp 4317-4323). Florence, Italy, 28 July 2019 - 2 August 2019.
- Automatically identifying complaints in social media. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (pp 5008-5019). Florence, Italy, 28 July 2019 - 2 August 2019.
- Journalist-in-the-Loop: Continuous Learning as a Service for Rumour Analysis. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP): System Demonstrations, November 2019.
- Nowcasting the Stance of Social Media Users in a Sudden Vote: The Case of the Greek Referendum. CIKM '18 Proceedings of the 27th ACM International Conference on Information and Knowledge Management (pp 367-376), 22 October 2018 - 26 October 2018.
- Predicting Twitter User Socioeconomic Attributes with Network and Language Information. Proceedings of the 29th ACM Conference on Hypertext and Social Media (pp 20-24), 9 July 2018 - 12 July 2018.
- Multimodal Topic Labelling. Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, April 2017.
- Labeling topics with images using a neural network. Advances in Information Retrieval : 39th European Conference on IR Research, ECIR 2017, Aberdeen, UK, April 8-13, 2017, Proceedings (pp 500-505). Aberdeen, UK, 8 April 2017 - 13 April 2017.
- Inferring the Socioeconomic Status of Social Media Users Based on Behaviour and Language (pp 689-695)
- Session details: Short Papers. Proceedings of the 2015 Workshop on Topic Models: Post-Processing and Applications
- TM 2015 -- Topic Models. Proceedings of the 24th ACM International on Conference on Information and Knowledge Management - CIKM '15, 18 October 2015 - 23 October 2015.
- A Hybrid Distributional and Knowledge-based Model of Lexical Semantics. Proceedings of the Fourth Joint Conference on Lexical and Computational Semantics, June 2015.
- An analysis of the user occupational class through Twitter content. Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (pp 1754-1764)
- Labelling Topics using Unsupervised Graph-based Methods. Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (ACL 2014), Vol. 2 (pp 631-636)
- Measuring the Similarity between Automatically Generated Topics. Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, volume 2: Short Papers, April 2014.
- Representing topics labels for exploring digital libraries. IEEE/ACM Joint Conference on Digital Libraries, 8 September 2014 - 12 September 2014.
- Introduction
Preprints
- Vocabulary Expansion for Low-resource Cross-lingual Transfer, arXiv.
- Who is bragging more online? A large scale analysis of bragging in social media, arXiv.
- Comparing Explanation Faithfulness between Multilingual and Monolingual Fine-tuned Language Models, arXiv.
- An Empirical Study on Cross-lingual Vocabulary Adaptation for Efficient Generative LLM Inference, arXiv.
- We Need to Talk About Classification Evaluation Metrics in NLP, arXiv.
- How Does Calibration Data Affect the Post-training Pruning and Quantization of Large Language Models?, arXiv.
- Investigating Hallucinations in Pruned Large Language Models for Abstractive Summarization, arXiv.
- Understanding the Role of Input Token Characters in Language Models: How Does Information Loss Affect Performance?, arXiv.
- Pit One Against Many: Leveraging Attention-head Embeddings for Parameter-efficient Multi-head Attention, arXiv.
- Regulation and NLP (RegNLP): Taming Large Language Models, arXiv.
- Examining the Limitations of Computational Rumor Detection Models Trained on Static Datasets, arXiv.
- Frustratingly Simple Memory Efficiency for Pre-trained Language Models via Dynamic Embedding Pruning, arXiv.
- Improving Multimodal Classification of Social Media Posts by Leveraging Image-Text Auxiliary Tasks, arXiv.
- A Multimodal Analysis of Influencer Content on Twitter, arXiv.
- Schema-Guided User Satisfaction Modeling for Task-Oriented Dialogues, arXiv.
- Active Learning Principles for In-Context Learning with Large Language Models, arXiv.
- Navigating Prompt Complexity for Zero-Shot Classification: A Study of Large Language Models in Computational Social Science, arXiv.
- Rethinking Semi-supervised Learning with Language Models, arXiv.
- On the Limitations of Simulating Active Learning, arXiv.
- Incorporating Attribution Importance for Improving Faithfulness Metrics, arXiv.
- Self-training through Classifier Disagreement for Cross-Domain Opinion Target Extraction, arXiv.
- Grants
Current Grants
- Efficient Deployment of Large Language Models for Industrial Applications, Industrial, 07/2024 - 07/2025, £30,360, as PI
- AdSoLve: Addressing socio-technical limitations of Large Language Models (LLMs) for medical and social computing, RAI (EPSRC), 05/2024 - 03/2028, £3,498,789, as Co-PI
- ESPERANTO: Exchanges for SPEech ReseArch aNd TechnOlogies, Horizon 2020, 01/2021 - 12/2025, £38,070, as Co-PI
- UKRI Centre for Doctoral Training in Speech and Language Technologies and their Applications, EPSRC, 04/2019 - 09/2027, £5,508,850, as Co-PI
Previous Grants
- SAI: Social Explainable Artificial Intelligence, EPSRC, 02/2021 - 01/2024, £366,348, as PI
- Understanding online political advertising: perceptions, uses and regulation, Leverhulme, 01/2021 - 07/2024, £395,011, as Co-PI
- Responsible AI for Inclusive, Democratic Societies: A cross-disciplinary approach to detecting and countering abusive language online, ESRC, 02/2020 - 01/2024, £508,135, as Co-PI
- Bergamot: Browser-based Multilingual Translation, EC H2020, 01/2019 - 12/2021, £473,113, as Co-PI
- Innovation Next Generation Services Through Collaborative Design, ESRC, 12/2018 - 11/2020, £284,926, as Co-PI
- Journalist-in-the-Loop Machine Learning as a Service for Rumour Analysis, Industrial, 11/2018 - 12/2019, £44,642, as Co-PI
- Alexa Fellowship, Amazon, 08/2018 - 08/2021, £73,000, as PI