FAIR Data and Software Awards
Celebrating 10 years of the FAIR Principles for research data
To celebrate 10 years of the FAIR Principles, the University Library launched the FAIR Data and Software Awards in March 2026 to recognise staff and PGRs who have demonstrated best practice in making their digital research outputs FAIR: Findable (F), Accessible (A), Interoperable (I), and Reusable (R).
About the prize
The FAIR Principles are key to openness and transparency, good research data management, and responsible research practices, yet a recent survey by the UK Reproducibility Network (UKRN) found that awareness of the FAIR Principles is still relatively low among researchers across all disciplines. The FAIR Data and Software Awards therefore aim to showcase and celebrate examples where individuals and teams have taken practical approaches towards making their digital research outputs (which may include quantitative or qualitative data, creative practice, software, digital media, etc.) FAIR, helping to raise awareness about what FAIR looks like in practice across different disciplines and research contexts.
We encouraged academic, technical, and professional staff and PGRs from across the University to take part in the FAIR Data and Software Awards competition, celebrating concrete examples of how digital research objects are being made Findable, Accessible, Interoperable, and Reusable (FAIR).
Winners
- Dr Romain Thomas and Dr Evgenia Dammer, 'STON: SofTware for petrOgraphic visualisatioN' (School of Computer Science, Faculty of Engineering)
Project Description
Thin-section petrography is a widely used technique in archaeology for analyzing the composition of ceramic and stone objects, as well as investigating their production technology and provenance. This method involves studying these materials in thin sections mounted on glass slides under a polarizing microscope to examine their microscopic features. A critical aspect of archaeological study is the comparison and identification of patterns within these features across multiple samples. However, it is typically only possible to view one sample at a time under the microscope. As a result, this method relies heavily on visual memory and repeated observations, making the process inefficient and time-consuming, particularly when dealing with hundreds of samples.
This tool is designed to address these challenges by enabling users to observe multiple photomicrographs simultaneously within a single, convenient interface. It facilitates detailed comparisons, clustering, and data recording, which is especially important in ceramic paste analysis. By allowing users to view multiple samples side by side, the software supports efficient sample grouping and evaluation of compositional characteristics.
Alignment with FAIR Principles
- Findability: STON is permanently identifiable via a unique Digital Object Identifier (DOI) generated through its publication in the Journal of Open Source Software (JOSS) and its archived record on Zenodo. It is permanently indexed in the Software Heritage archive. High machine discoverability is ensured through a CITATION.cff metadata file stored in its public GitHub repository.
- Accessibility: The source code and its comprehensive development history are fully accessible on GitHub without authentication walls. Long-term public accessibility is guaranteed via Zenodo, protecting the code from future repository changes, while a dedicated public website serves as a clear entry point for users.
- Interoperability: STON operates natively on non-proprietary, machine-readable formats like .TIF, .PNG, and .TXT. It is constructed on a standard, open-source Python library stack, guaranteeing frictionless integration with existing archaeological pipelines and broader scientific computing environments.
- Reusability: The software is released under the GNU GPL, a copyleft license that legally protects modification and redistribution rights. Reusability is enhanced by an active test suite packaged with sample data, public contributor guidelines, and explicit developer notes for future community maintenance.
References
- Repository: GitHub Repository
- Documentation: STON Documentation Site
- JOSS Publication: JOSS Article (DOI: 10.21105/joss.08144)
Biographies
Dr Romain Thomas is the Head of Research Software Engineering at the University of Sheffield. He leads a group of research software engineers (RSEs) who collaborate with research groups at the University to deliver high-quality research software. Before coming to Sheffield, Romain was a Staff Astronomer at the European Southern Observatory (ESO) in Chile, where he led software projects for data quality control at the Very Large Telescope.
Dr Evgenia Dammer is an archaeological scientist at the Rathgen Research Laboratory at the Prussian Cultural Heritage Foundation in Berlin, Germany. She leads and participates in projects about ancient ceramic and stone technologies around the world as well as about the impact of climate change on cultural heritage institutions.
- Dr Joseph Nockels and Jamie McLaughlin, 'Recognising Hands, Recognising Processes - eXplainable Automated Text Recognition for Scottish Spiritualist Newspapers' (School of History, Philosophy and Digital Humanities, Faculty of Arts & Humanities)
Project Description
With library curation increasingly involving AI-enabled historical collection transcription, practitioners require greater support in evaluating system reliability. Supported by the National Library of Scotland (NLS) and Digital Humanities Institute (DHI), this project investigated how Handwritten Text Recognition (HTR) and Optical Character Recognition (OCR) tools perform when aligned with "eXplainable" AI (XAI) principles, the effort to explain computational processes to non-technical users.
Using the NLS's serialisation of The Spiritualist Newspaper (1869–1882), the project systematically benchmarked ten major community and commercial AI transcription tools against a verified baseline. The project explored whether implementing transparent, explainable workflows compromised transcription speed and precision, translating these technical findings into clear procurement and training frameworks for cultural heritage staff.
Alignment with FAIR Principles
- Findability: The baseline "Ghostwriter" Transkribus model is assigned a unique, discoverable URL. The underlying ground truth transcription dataset is held on Hugging Face under a persistent identifier, and all supporting research presentations are discoverable on Zenodo using distinct DOIs.
- Accessibility: Files are openly retrievable via Hugging Face (.txt, image, and ALTO formats) and Zenodo (.ppt). Furthermore, an interactive public dashboard hosted by the DHI will provide a 10-year access pathway for curatorially comparing model errors alongside intelligible, downloadable Jupyter Notebooks (.ipynb).
- Interoperability: To ensure that these tests were workable by library curators with minimal hardware, the project prioritised open and locally hosted OCR models, run via Jupyter Notebooks. The project then systematically cataloged and ranked each tool’s interoperability constraints (such as third-party dependencies and/or heavy remote processing needs).
- Reusability: The open-access dataset has already achieved over 100 organic downloads and has enabled further collaborative document-understanding research with University College London academics. By factoring in local applications, the project also demonstrated how legacy tools like Tesseract can still outperform commercial LLMs on specific historical scripts, informing NLS recommendations and establishing a reliable, reproducible text-correction pipeline for 19th-century newspaper transcription.
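Benchmarking transcription tools against a verified baseline, as this project did, usually means scoring each tool's output against the ground truth text. The source does not say which metric was used; the sketch below assumes character error rate (CER), a common choice for HTR and OCR evaluation. The function names, tool names and sample strings are illustrative only and do not come from the project.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits that turn a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (char_a != char_b)
            insertion = current[j - 1] + 1
            deletion = previous[j] + 1
            current.append(min(substitution, insertion, deletion))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance divided by reference length (0.0 is a perfect match)."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


def rank_tools(reference: str, outputs: dict[str, str]) -> list[tuple[str, float]]:
    """Return (tool, CER) pairs sorted from most to least accurate."""
    scores = {
        tool: character_error_rate(reference, text)
        for tool, text in outputs.items()
    }
    return sorted(scores.items(), key=lambda item: item[1])


if __name__ == "__main__":
    ground_truth = "the spirit spoke"
    tool_outputs = {  # hypothetical tool names and outputs
        "tool_a": "the spirit spoke",
        "tool_b": "tbe spirit spake",
    }
    for tool, cer in rank_tools(ground_truth, tool_outputs):
        print(f"{tool}: CER = {cer:.3f}")
```

A lower CER means fewer character-level corrections are needed, which is one way a curator could compare a locally hosted tool with a commercial one on the same pages.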
References
- Dataset & Readme: Hugging Face Repository
- Transcription Model: Transkribus Ghostwriter Model
- Zenodo Presentation: Zenodo Record
- External Reuse Paper: arXiv Pre-print
Biographies
Joe Nockels (Lead Researcher) is a Research Associate at the Digital Humanities Institute (DHI), responsible for developing and supporting the strategic research theme, Digital Representation of Cultural Artefacts, which sets out to advance the state-of-the-art in the digital capture, interpretation and representation of physical culture. His research primarily focuses on AI-enabled transcription of historical manuscripts, critical digitisation and digital archives more broadly. Through a socio-technical lens, Joe explores how such methods are changing access to our collective past, libraries, and library users’ relationship with collections.
Jamie McLaughlin (Technical Specialist) is a Senior Research Software Engineer at the DHI. Since joining the DHI, Jamie has worked on over thirty Digital Humanities projects including the Hartlib Papers, Beyond The Multiplex, Digital Panopticon and The Old Bailey Proceedings. His present areas of interest include machine learning, 3D visualisation, WebXR and digital product design.
- Dr Dani Madrid-Morales, 'Beyond "Emergencies?" Reporting on Humanitarian Issues Around the World' (School of Information, Journalism and Communication, Faculty of Social Sciences)
Project Description
Conducted alongside collaborators at the University of Edinburgh and New York University, this project examined global media landscapes to determine if the restrictive "emergency imaginary" frame continues to dominate international humanitarian journalism. The research team constructed and analyzed a massive corpus comprising more than one million texts extracted from 582 distinct media sources across 92 nations, spanning a ten-year window (2010–2020).
Because the raw articles were harvested from commercial databases protected by strict third-party copyright laws, public redistribution of the text corpus was legally prohibited. Instead of abandoning open-science frameworks, the team strategically structured their open deposits around legally shareable computational derivatives and replication scripts. This design won "Best Dataset" at the University of Edinburgh's 2024 Digital Research Prizes.
Alignment with FAIR Principles
- Findability: The project published an Edinburgh DataShare collection featuring seven components, each bearing an individual, citable DOI. The overall collection is discoverable via its landing DOI. The repository metadata fields are explicitly cross-linked bi-directionally to the final Digital Journalism journal paper.
- Accessibility: All non-copyright-restricted files, including a comprehensive document-feature matrix (DFM), R replication code, an expert keyword panel log, and media codebooks, are freely downloadable via standard HTTP. To handle the restricted source texts, metadata records detail the complete file architecture (FILE_STRUCTURE.md). Legitimate researchers can access the full text via a secure, legal Virtual Machine (VM) collaboration pipeline managed under standard CRediT taxonomy terms.
- Interoperability: Data tables are distributed in open formats (.csv, .xlsx). Advanced text processing outputs use R’s native .rds serialization format, explicitly tailored for import into quanteda (a widely used R text analysis package), allowing external labs to integrate the data without conversion.
- Reusability: Every component in the data collection carries distinct, machine-readable Creative Commons licensing. Because the collection pipeline, API retrieval scripts, and variable codebooks are exhaustively documented, external researchers at LMU Munich successfully reused the framework to publish two additional peer-reviewed papers.
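A document-feature matrix is a shareable derivative because it records how often each term occurs in each document but not the running text itself. The project built its matrix in R for use with quanteda; the Python sketch below only illustrates the idea, and the tokenisation rule, function names and sample sentences are assumptions made for the example.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep alphabetic tokens (a deliberate simplification)."""
    return re.findall(r"[a-z]+", text.lower())


def build_dfm(documents: dict[str, str]) -> tuple[list[str], dict[str, list[int]]]:
    """Return a sorted vocabulary and one row of term counts per document."""
    counts = {doc_id: Counter(tokenize(text)) for doc_id, text in documents.items()}
    vocabulary = sorted(set().union(*counts.values()))
    matrix = {
        doc_id: [doc_counts[term] for term in vocabulary]
        for doc_id, doc_counts in counts.items()
    }
    return vocabulary, matrix


if __name__ == "__main__":
    corpus = {  # hypothetical documents, not taken from the project corpus
        "doc1": "Aid reaches the flood zone",
        "doc2": "The flood recedes; aid continues",
    }
    vocabulary, matrix = build_dfm(corpus)
    print(vocabulary)
    for doc_id, row in matrix.items():
        print(doc_id, row)
```

Word order is discarded when the counts are taken, so the matrix supports frequency-based analysis without redistributing the original articles.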
References
- Journal Article: Digital Journalism Publication
- Code Repository: GitHub Repository
- Data Collection: Edinburgh DataShare
- Dr Benjamin Davison, 'The SHeffield Ice Velocity ExploreR (SHIVER)' (School of Geography and Planning, Faculty of Social Sciences)
Project Description
Glacier flow transports ice from the interior of the Greenland and Antarctic ice sheets into the oceans, contributing to global sea-level rise. While the global remote sensing community leverages satellite imagery to monitor ice velocities across Greenland and Antarctica, these datasets are typically scattered across isolated archives, presenting access barriers for climate scientists, educators, and policy stakeholders.
The SHeffield Ice Velocity ExploreR (SHIVER) addresses this fragmentation by unifying global ice velocity measurements into an interactive, low-latency web platform. Users can point and click anywhere on a map of Greenland or Antarctica to instantly visualize and plot historical ice flow trends.
Alignment with FAIR Principles
- Findability: To ensure long-term findability, the creator is linking the SHIVER application repository and the underlying Data Cube generation code directly to Zenodo to issue independent DOIs. A descriptive peer-reviewed manuscript is also being prepared for the open-access journal Earth System Science Data (ESSD).
- Accessibility: SHIVER is hosted publicly and free of charge via standard GitHub Pages. For direct computational workflows, the underlying multi-terabyte SHIVER Data Cube is being hosted on a Google Cloud Bucket to enable machine-to-machine extraction. The application interface will be designed to auto-generate API requests in multiple programming languages on the fly.
- Interoperability: Data extractions can be exported directly into open, interoperable formats, specifically .csv and .nc (NetCDF) files. This allows the outputs to be fed directly into external spatial processing software.
- Reusability: The front-end code and back-end data cube generation software are released under the open-source GNU General Public License. Exported time-series files are packaged with extensive processing metadata (detailing quality metrics, gap-filling parameters, and coordinates). Furthermore, NetCDF exports automatically comply with rigorous international CF-1.8 and ACDD-1.3 metadata conventions alongside auto-generated attribution tables.
References
- Web Interface: SHIVER Web Application
- Source Code: GitHub Repository
Biography
Dr Ben Davison received a BSc in Physical Geography and an MSc in Polar and Alpine Change from the University of Sheffield before completing his PhD in Glaciology at the University of St Andrews. Ben’s research centres on understanding how changes in atmospheric and oceanic conditions affect the flow and mass balance of the Greenland and Antarctic Ice Sheets, using a combination of satellite remote sensing, fieldwork and numerical modelling of ocean circulation. Ben’s current research at the University of Sheffield focuses on investigating how atmospheric warming and increased ice surface melting have affected ice flow variations on the Antarctic Peninsula.
- Dr Lindsay Lee and Dr Tim Rooker, 'Data-Centric Manufacturing' (Advanced Manufacturing Research Centre)
Project Description
Advanced manufacturing operational and R&D teams often collect massive volumes of complex data, but lack standardised, cross-organisational data curation frameworks to manage their datasets and maximise the return on investment from them. The Data-Centric Manufacturing (DCM) toolkit was built to establish a trusted, scalable body of data-driven research templates, best practices and development tools explicitly tailored for industrial engineers. Using an agile data science workflow, the toolkit embeds FAIR research principles directly into factory-level workflows, ensuring that industrial data asset strategies comply with emerging international Trustworthy AI standards.
Alignment with FAIR Principles
- Findability: In combination with the AMRC’s Harbour platform for project management, the DCM toolkit enhances the findability of data generated within projects by enabling common approaches for data dictionary compilation, granular dataset identification and comprehensive reporting. As more teams adopt and record their projects with the toolkit, searchable project repositories (such as Harbour) will be extendable with automated indexing of datasets-in-context.
- Accessibility: Industrial manufacturing involves highly sensitive commercial and national intellectual property. The DCM toolkit supports engineers to balance non-disclosure agreements with research opportunities from the outset, whilst maximising quality of their outputs and ensuring that data remains accessible, within agreed bounds, after the project ends.
- Interoperability: Data generated with the toolkit is recorded in a structured, unified data dictionary with full visibility on the critical metadata. The workflow was designed to accommodate future ontology research and standardised reference language across manufacturing organisations and supply chains.
- Reusability: The toolkit addresses the loss of knowledge caused by staff turnover through comprehensive documentation and a common structure for data-centric projects that is familiar to both past and current engineers. By introducing data pipeline best practices alongside standardised lifecycle documentation, the toolkit ensures that internal team members can inspect, replicate, and reuse old manufacturing datasets safely.
References
- Public Portal: DCM Google Site (In Development)
- Internal Portal: AMRC SharePoint (Restricted)
- Initial Dashboard Architecture: DCM Shared Architecture Spreadsheet (University Log-in Required)
- Project Report: Signed Technical Report PDF (University Log-in Required)
Biographies
Dr Tim Rooker (Technical Fellow, IMG) is an experienced Data Scientist and Engineer, and one of the architects of the Data-Centric Manufacturing toolkit. He first joined the AMRC as a doctoral student in 2016, where his research explored predictive modelling approaches to enhance inspection methods in multi-axis machining. He joined IMG in 2021, where he has gained exposure to a wide range of data-related projects including sensor system design, process monitoring, digital threads, and cloud analytics. His current research interests focus on data engineering for industrial applications, data Lakehouse design and implementation, and supporting colleagues with robust statistical workflows.
Dr Lindsay Lee (Technical Fellow, IMG) is a data scientist and co-creator of the Data-Centric Manufacturing toolkit. Lindsay has a PhD in Probability and Statistics and a passion for creating collaborative teams of data specialists and domain experts (and others) to tackle real-world problems using data-driven methods for actionable outcomes. After 10 years in Environmental Science using advanced statistical methods to better understand complex systems, Lindsay joined IMG in 2021. Lindsay believes that the key to creating impactful data-driven projects is facilitating the necessary knowledge exchange and promoting respect for individual expertise; the DCM toolkit therefore aims to provide the guidance and resources to fulfil these goals.
- Faizhal Arif Santosa, 'Coconut Libtool' (Postgraduate Student, School of Information, Journalism and Communication, Faculty of Social Sciences)
Project Description
Text mining and data visualization are powerful methods for extracting insights from large collections of textual data, yet many library professionals are unable to use these techniques because they lack software development experience or have limited local computing budgets. Coconut Libtool is a free, open-source, web-based text analysis application developed specifically to eliminate these technical barriers. Because the platform runs directly inside standard web browsers, it requires no software installation, letting non-technical library staff experiment with natural language processing seamlessly.
Alignment with FAIR Principles
- Findability: The underlying source code repository is fully open on GitHub and directly integrated into Zenodo, which issues persistent DOIs. The datasets are deposited in Zenodo and are automatically harvested by OpenAIRE, while the source code is archived by Software Heritage. Discoverability is enhanced through rich metadata, including researcher identifiers (ORCID iDs) and institutional identifiers provided by the Research Organization Registry (ROR).
- Accessibility: Long-term digital preservation is maintained across both GitHub and Zenodo. DataCite relationType properties are used to connect related research outputs, including historic project websites, earlier prototypes, and scientific conference posters.
- Interoperability: To maximize cross-disciplinary metadata harvesting, Coconut Libtool maps its internal vocabulary keywords directly to the established Library of Congress Subject Headings (LCSH) (e.g., "Text data mining", "Open source software--Library applications"). External dependencies and source code are fully documented to promote transparency, reproducibility, and interoperability across repository and discovery platforms.
- Reusability: The application is released under the MIT License, which is explicitly declared within the root directories of its distribution platforms. The software includes detailed project development logs and explicit contributor histories, ensuring that future open-source developers understand the software's functional provenance for safe codebase adaptation.
References
- Source code and documentation: Zenodo repository
Biography
Faizhal Arif Santosa is currently studying for an M.Sc. in Data Science at the University of Sheffield. He is a Librarian at the National Research and Innovation Agency, Indonesia.
Runners up
- Dr Sanjeetha Pennada, 'DT-DRIVE: A Tool for Deterministic Replay-Based Testing of Autonomous Driving Systems' (School of Computer Science, Faculty of Engineering)
Project Description
Evaluating the safety and reliability of Autonomous Driving Systems (ADS) relies heavily on simulation-based testing. However, simulation environments often exhibit non-deterministic behaviour, meaning that identical scenarios can produce different outcomes across repeated executions. This makes failures difficult to reproduce, complicates debugging, and hinders fair comparisons between different ADSs.
To address these challenges, we developed DT-Drive, an open-source record-modify-replay framework for deterministic testing and debugging of ADS. DT-Drive records a driving scenario once, including environmental conditions, traffic participants, and vehicle trajectories, and then deterministically replays that scenario under identical or modified conditions.
A key feature of DT-Drive is its ability to decouple the ego vehicle from the recorded scenario. This enables ego-injection replay, where different autonomous driving agents can be evaluated fairly under exactly the same conditions. DT-Drive also supports counterfactual replay, allowing researchers to modify environmental factors such as weather, traffic, or buildings to investigate “what-if” scenarios and analyse the root causes of failures.
DT-Drive was evaluated on 128 CARLA benchmark scenarios and achieved 100% deterministic evaluation results, eliminating simulator-induced flakiness and supporting reproducible experimentation.
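The record-modify-replay idea can be illustrated with a toy example. The sketch below is not DT-Drive's code and does not use CARLA; every class, function and field name is hypothetical, and random draws stand in for simulator non-determinism. It records a scenario once, replays the same log to two different ego agents, and replays a counterfactual version with changed weather.

```python
import random
from dataclasses import dataclass, replace
from typing import Callable


@dataclass(frozen=True)
class Frame:
    """One recorded simulation step (fields are illustrative only)."""
    lead_vehicle_gap_m: float  # distance to the vehicle ahead, in metres
    weather: str


def record_scenario(steps: int) -> list[Frame]:
    """Run a 'live' scenario once; the random draws mimic non-deterministic traffic."""
    return [
        Frame(lead_vehicle_gap_m=random.uniform(5.0, 50.0), weather="clear")
        for _ in range(steps)
    ]


def modify_weather(log: list[Frame], weather: str) -> list[Frame]:
    """Counterfactual edit: change the weather, keep the recorded traffic unchanged."""
    return [replace(frame, weather=weather) for frame in log]


def replay(log: list[Frame], ego_agent: Callable[[Frame], str]) -> list[str]:
    """Feed recorded frames to an ego agent; no new randomness is introduced."""
    return [ego_agent(frame) for frame in log]


def cautious_agent(frame: Frame) -> str:
    """Brakes earlier in rain than in clear weather."""
    threshold_m = 30.0 if frame.weather == "rain" else 15.0
    return "brake" if frame.lead_vehicle_gap_m < threshold_m else "cruise"


def assertive_agent(frame: Frame) -> str:
    """Brakes only when the gap is very small, whatever the weather."""
    return "brake" if frame.lead_vehicle_gap_m < 8.0 else "cruise"


if __name__ == "__main__":
    log = record_scenario(steps=10)
    # Ego-injection: two agents face exactly the same recorded traffic.
    print(replay(log, cautious_agent))
    print(replay(log, assertive_agent))
    # Counterfactual replay: same traffic, different weather.
    print(replay(modify_weather(log, "rain"), cautious_agent))
```

Because the agents read from the recorded log rather than from a live simulation, repeated replays give identical results, which is the property that makes comparisons between agents fair.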
Alignment with FAIR Principles
- Findability: The project maintains a publicly discoverable GitHub repository structured with a detailed README, project descriptions, and block diagrams mapping out the operational workflow. A demonstration video is also available on YouTube, providing an overview of the tool’s capabilities and key features. While the minting of a persistent Digital Object Identifier (DOI) is currently paused to protect anonymity during a conference peer-review cycle, a stable, citable DOI will be minted immediately following publication.
- Accessibility: The DT-Drive source code, documentation, demo video and demonstration materials are openly available without authentication barriers. The repository can be accessed and cloned using standard HTTPS protocols. To support new users, the project provides step-by-step installation instructions, usage tutorials, example artefacts, and a video demonstration illustrating the complete workflow.
- Interoperability: DT-Drive uses widely adopted machine-readable formats, including JSON scenario definitions and CARLA log files. The framework is designed to support multiple CARLA-compatible autonomous driving agents through its ego-injection mechanism. Software dependencies, execution requirements, and configuration settings are explicitly documented to facilitate integration into existing testing and evaluation pipelines.
- Reusability: DT-Drive is released under the permissive MIT License, enabling researchers and practitioners to use, modify, extend, and redistribute the software. The repository includes example scenarios, sample outputs, evaluation artefacts, and detailed documentation to support reuse. The framework promotes reproducible experimentation by producing deterministic outputs from recorded scenarios, enabling consistent evaluation and fair comparison of autonomous driving systems. The reusability of the pipeline has been independently validated by a PhD researcher and a Research Software Engineer, both of whom successfully reproduced execution benchmarks using only the provided documentation.
References
- Source Code & Documentation: GitHub Repository (Public Access)
- Tool Demonstration Video: DT Drive Tool Demo
Biography
Dr Sanjeetha Pennada is a Postdoctoral Research Associate in the School of Computer Science at the University of Sheffield. Her research focuses on improving the testing, evaluation, and reliability of ML-enabled autonomous systems, with interests in software engineering, artificial intelligence, robotics, and safety-critical systems. She completed her PhD at the University of Strathclyde, where she developed ALICS, an AI-driven robotic imaging system for infrastructure inspection, resulting in patents. Her work has led to publications in areas including AI, computer vision, autonomous systems, and software engineering. Dr Pennada contributes to the wider research community through peer-review activities for leading journals, including Automation in Construction, and serves as a Programme Committee Member for the International Conference on Software Engineering (ICSE). She is also a University Gold Medal recipient and is passionate about developing trustworthy, reproducible, and impactful AI and software solutions for real-world engineering problems.
- Mustafa Onur Onen, 'Strategic deep-water observations enhance probabilistic parameter estimation of lake hydrodynamic models' (Postgraduate Researcher, School of Mechanical, Aerospace and Civil Engineering, Faculty of Engineering)
Project Description
Physics-based hydrodynamic computer models are essential for predicting lake water temperatures, yet they are notoriously difficult to calibrate. Because most lakes lack multi-depth sensor networks, many different configuration settings can yield identical temperature predictions. This phenomenon creates immense uncertainty regarding which parameters accurately represent the actual water column.
This project conducted one of the first rigorous uncertainty estimations in lake modelling, using seven years of historical environmental data from Lake Mendota to analyze how sensor placement impacts model accuracy. Surprisingly, the study revealed that calibrating a model using a single sensor placed in a deep-water zone yielded far more accurate predictions across the entire water column than relying on surface sensors or, in some scenarios, data from all depths combined. This discovery provides an efficient framework for water resource managers, proving that deep-water sensors can dramatically optimize drinking water reservoir models without the need for expensive multi-depth monitoring equipment.
Alignment with FAIR Principles
- Findability: The complete underlying dataset is uniquely and permanently identifiable via a persistent DOI generated through the University of Sheffield’s Online Research Data AnaLytics (ORDA) repository. Discoverability is maximized via rich, machine-readable metadata records containing tailored keywords (e.g., "lake hydrodynamic modelling", "scoring rules", "data sparsity"). The project's preprint paper links directly back to this specific identifier within a dedicated Data Availability statement.
- Accessibility: All research data and processing workflows are fully open-access and retrievable via standard HTTP protocols without registration or authentication barriers. Anyone can download the full data collection and associated code execution scripts directly using the ORDA DOI landing page.
- Interoperability: The analysis workflows are written in Python, the dominant open-source scientific computing language. To eliminate environment mismatches, the repository includes explicit configuration files for both Windows and Linux. The scripts are built to be highly scalable, packaging High-Performance Computing (HPC) batch job submission files alongside local deployment steps. Data is ingested and processed using open, highly compatible formats, including CSV, NetCDF, and MAT files.
- Reusability: The repository is cleanly divided into three functional sub-folders: Sensitivity Analysis, Running Simulations, and Uncertainty Estimation. Each folder features an independent README file and pre-calculated intermediate results, allowing users to skip computationally heavy processing steps. The workflows carry an MIT License to safeguard unrestricted academic and commercial reuse.
References
- Research Preprint Paper: Earth and Space Science Open Archive
- Data & Workflow Repository: University of Sheffield ORDA
Biography
Mustafa Onur Onen is a PhD Candidate and civil engineer specializing in lake and reservoir water quality modelling under uncertainty. His research focuses on predicting nitrate concentrations in UK reservoirs to support clean water planning, reservoir operations, and long-term water supply decision-making.
Mustafa’s work examines how future climate conditions, nutrient loads, model structure, measurement uncertainty, and real-world data limitations influence water quality outcomes and infrastructure planning. By explicitly acknowledging the constraints and imperfections of available data, his research aims to develop more robust and decision-relevant modelling approaches. He combines advanced environmental modelling with 10 years of engineering experience in water infrastructure planning and design, bringing both research depth and practical industry insight to challenges in sustainable water management.
- Sylvia Whittle, 'AFMReader' (Postgraduate Researcher, School of Chemical, Biological and Materials Engineering, Faculty of Engineering)
Project Description
Atomic Force Microscopy (AFM) is a powerful imaging technique used to scan surfaces at the nanoscale. However, the field suffers from severe data fragmentation. Different microscope manufacturers deploy conflicting data storage standards, ranging from altered implementations of common .tiff images to entirely closed, proprietary, and undocumented binary formats. Because of these barriers, data sharing between collaborating laboratories is highly inefficient, and custom analysis software written for one instrument is often completely incompatible with another.
To overcome this duplication of effort and remove the barrier to reproducibility, AFMReader was developed. It is a unified, open-source Python package that consolidates dozens of disparate AFM file formats into a single interface. By combining community-built loaders (such as .spm parsers) with custom schemas reverse-engineered from raw binary bytes, AFMReader provides structural data extraction via a single command. The tool has achieved substantial community adoption, surpassing 44,000 downloads on the Python Package Index (PyPI).
Alignment with FAIR Principles
- Findability: The application uses a distinct, searchable name across its public GitHub and PyPI repositories. Releases are managed using strict semantic version numbers to ensure traceable software iterations. To establish permanent, scholarly findability, a dedicated peer-reviewed journal submission is underway, which will mint a structural paper DOI alongside a matching entry in the University of Sheffield’s Online Research Data AnaLytics (ORDA) repository.
- Accessibility: Built entirely on Python—the dominant open-source language of the physical sciences—AFMReader removes all financial and software barriers to entry. The stable release can be deployed instantly using standard package protocols (pip install AFMReader), while active development versions are accessible via a public GitHub repository. Long-term digital preservation is safeguarded by mirroring documentation and citation instructions across PyPI, GitHub, and an upcoming institutional ORDA deposit tied to the TopoStats 2.0 framework under a GPL 3.0 license.
- Interoperability: AFMReader is explicitly designed to be imported as a dependency in external image-processing pipelines. Instead of forcing users to sort through dense, poorly documented hardware metadata heaps, the package parses files into an immediately usable Python tuple consisting of the processed matrix, pixel-to-nanometer scaling factors, and simplified metadata attributes. Future roadmap updates include introducing an optional .HDF5 export format to allow structured, cross-platform file saving.
- Reusability: The repository features an interactive, copy-and-pasteable Jupyter Notebook tutorial demonstrating configuration settings for every supported file type. Code quality, type safety, and standardized formatting are automatically enforced through a rigid continuous integration (CI) pipeline using pre-commit hooks (including black, ruff, pylint, and mypy). Comprehensive numpydoc-validation guarantees that every function includes strict docstring type parameters, and automatic GitHub Actions test suites prevent broken code transformations from merging into the stable main branch.
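The single-interface design described above, where every file format is parsed into the same tuple of image matrix, pixel-to-nanometre scaling factor and simplified metadata, can be sketched as a loader registry keyed by file extension. This is not AFMReader's actual API: the function names, the two toy file formats and their extensions are invented for illustration.

```python
import json
from pathlib import Path
from typing import Callable

Image = list[list[float]]
LoadResult = tuple[Image, float, dict[str, str]]


def _load_toy_text(raw: bytes) -> LoadResult:
    """Toy format: first line 'px2nm=<float>', then comma-separated heights per row."""
    header, *rows = raw.decode("utf-8").strip().splitlines()
    px_to_nm = float(header.split("=", 1)[1])
    image = [[float(value) for value in row.split(",")] for row in rows]
    return image, px_to_nm, {"format": "toy-text"}


def _load_toy_json(raw: bytes) -> LoadResult:
    """Toy format: JSON object with 'heights' (rows of numbers) and 'px2nm'."""
    payload = json.loads(raw)
    image = [[float(value) for value in row] for row in payload["heights"]]
    return image, float(payload["px2nm"]), {"format": "toy-json"}


# Hypothetical registry: one loader per file extension, all returning the same tuple.
LOADERS: dict[str, Callable[[bytes], LoadResult]] = {
    ".toytxt": _load_toy_text,
    ".toyjson": _load_toy_json,
}


def load_scan(filename: str, raw: bytes) -> LoadResult:
    """Pick a loader from the file extension and return (image, px_to_nm, metadata)."""
    suffix = Path(filename).suffix.lower()
    try:
        loader = LOADERS[suffix]
    except KeyError:
        raise ValueError(f"unsupported file type: {suffix!r}") from None
    return loader(raw)


if __name__ == "__main__":
    image, px_to_nm, metadata = load_scan("scan.toytxt", b"px2nm=0.5\n1,2\n3,4\n")
    print(image, px_to_nm, metadata)
```

Downstream analysis code only needs to handle the common tuple, so supporting another instrument means adding one loader to the registry rather than changing the analysis pipeline.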
References
- Source Code Repository: GitHub Repository
- Package Registry: PyPI Project Page
- Documentation Website: AFMReader Documentation
Biography
Sylvia Whittle is a third-year PhD student in the Pyne Lab research group at the University of Sheffield who is looking at computational methods for analysing DNA conformation in Atomic Force Microscopy images. Sylvia is funded by Discovery Medicine North (DiMeN). Sylvia greatly enjoys working on open source software and attempting to make quantitative analysis more widespread and accessible to fellow researchers. Before beginning this doctoral research, Sylvia studied physics at Newcastle University.
Case studies prepared and edited by Dr Qwin Saikia (Research Data Steward, University of Sheffield Library), in collaboration with the winners.