PhD Project Proposals
The PhD project proposals listed below will be considered for 2022/23 studentships in the Department
of Informatics to start 1 October 2022 or later during the 2022/23 academic year.
This list is not exhaustive: potential applicants can alternatively identify and contact appropriate supervisors, outlining their background and research interests or proposing their own project ideas. The PhD projects are listed in two groups. The first group contains the projects with allocated studentships: each project in this group has one allocated studentship. The remaining studentships will be considered for the projects listed in the second group; the number of remaining studentships is smaller than the number of projects in that group. The allocation of studentships will be based on the merits of individual applications.
Contents
PhD Project Proposals
Projects with allocated PhD studentships
   Algorithms and Data Analysis
   Data architectures with humans-in-the-loop
   Data-driven modelling of value chains for efficient decision support using digital twins
   Efficient Mechanism Design for Markets and Reallocation of Goods
   Imperfect Rationality and Computation
   Towards Fair/Explainable AI for Robotics
   Understanding the Complexity of Negotiations
   Visual recognition with minimal supervision in deep learning context
   Wearable, Discreet Augmentative and Alternative Communication
   Supporting more Accessible Remote Communication
   Object-Based Access: Enhancing Accessibility with Data-Driven Media
Projects for the remaining studentships
   Administrative Access Control Policies
   “Alexa, cover your ears!”: Privacy-Aware AI Personal Assistants
   Automated Signature Generation for Network Intrusion Detection Systems (NIDS)
   Backbone Guided Local Search Methods for MAX-SAT
   Behavioral Modeling of Process Memory for Real-Time Detection of Attacks
   Better Error Help Using Large Scale Programmer Data
   Big Data in Programming Education
   Characterization of Immunoglobulins
   Computational Analysis of Ageing Brains
   Contextualising Big Programming Data
   Data Science Strategies for Cancer Immunotherapy Application
   Development of Scalable General Artificial Intelligence (AI) Problem Solving Systems
   Distributed Computing by Population Protocols
   Fairness in Automatic Assessment
   Formal verification of smart contracts
   From Requirements to Models Using Natural Language Processing
   Human data interaction
   Human and social factors in information systems
   Learning Representations for Reinforcement Learning in Human-Robot Interaction
   Learning of Software Design Patterns
   Machine Learning Augmented Algorithms
   Model Driven Engineering in Finance
   Modelling Predictive Space-Time Cube for Urban Informatics
   Modular and Hierarchical Learning and Representation of Large Software
   Monitoring Compliance with Dynamic Security Policies under Uncertainty
   Multiple Robots Performing Random Walks
   Natural language explanations for artificial intelligence
   Network Optimisation Algorithms
   The Nexus between Crime, Mental Wellbeing and the Built Environment in Urban Areas
   A Novel Model-driven AI Paradigm for Intrusion Detection
   Participatory Agent-Based Modelling of Emergency Department Patient Flow
   Personalised Medicine
   Predictive Visual Analytics for Urban Contingency Planning
   Privacy in the Internet of Things
   Programming as an HCI Challenge - IDE Interaction Design
   Program editor design for accessibility
   Programming history for learning and reflection
   Safe Reinforcement Learning from Human Feedback
   Security and Safety of Cyber-Physical Systems
   Smart Metering Voice Controlled Devices
   Software Verification and Nominal Dependent Type Theory
   String Sanitisation with Applications to Internet of Things Data
   Temporal and Resource Controllability of Workflows of Autonomous Systems
   Towards Protection of Users in Online Social Networks
   The Underground Economy: Understanding and Modelling Misuse in the Darkest Corners of the Web
   Understanding Cyber-Dependent Crimes that are enabled by Malware from a Software Development Perspective
   Unifying Principles in Safe and Trusted Assistive AI
   Unstructured Big Data
Projects with allocated PhD studentships
Algorithms and Data Analysis
Supervisor: Dr Dimitrios Letsios
One PhD studentship will be allocated to the following project which lies at the intersection of
algorithms, computational optimization, and data science.
Models and Algorithms for Resource Allocation Problems with Machine Learning Predictions
This project aims to design and analyze optimization models and algorithms for temporal resource
allocation problems, e.g. electric power distribution, logistics, and production scheduling problems,
arising in different application domains, including the energy sector, manufacturing and process
engineering [Letsios et al. 2020]. The goal is to effectively assign resources, e.g. machine time and
energy, to activities, so as to optimize performance. Solving instances of such problems may result in
substantial economic benefits. Typically, future resource requirements and customer demand are not
precisely known in advance, but can be predicted using data science and machine learning capabilities
[Bertsimas et al. 2018]. However, these predictions are subject to errors. In this context, determining
efficient algorithms for supporting and automating the resource allocation process is a challenge. To
this end, prior work develops efficient algorithms and optimization models accounting for the time-
varying nature and uncertainty of temporal resource allocation problems [Antoniadis et al. 2020,
Letsios et al. 2021, Manish et al. 2018].
This project aims to (i) develop novel discrete optimization methods for temporal resource allocation problems and analyze their performance theoretically, (ii) suggest ways to mitigate the effect of prediction errors on the quality of the obtained solutions, and (iii) evaluate the performance of the proposed approaches numerically using real data. Prior experience in discrete optimization, approximation/online algorithms and/or integer programming will be useful.
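To illustrate the flavour of algorithms that use machine-learned predictions, the following minimal Python sketch implements the ski-rental strategy with an untrusted prediction, in the spirit of [Manish et al. 2018]; the parameter values are illustrative only and the code is not part of the project description.

    import math

    def ski_rental_with_prediction(x, y, b, lam=0.5):
        # x: actual number of skiing days; y: predicted number of days;
        # b: cost of buying skis (renting costs 1 per day);
        # lam in (0, 1] trades off trust in the prediction against
        # worst-case robustness (smaller lam = more trust).
        if y >= b:
            buy_day = math.ceil(lam * b)      # prediction says "buy": buy early
        else:
            buy_day = math.ceil(b / lam)      # prediction says "rent": buy late
        if x < buy_day:
            return x                          # rented every day
        return (buy_day - 1) + b              # rented, then bought on buy_day

    # The strategy stays competitive even when the prediction is wrong:
    for x, y in [(20, 25), (20, 3), (2, 25)]:
        print(x, y, ski_rental_with_prediction(x, y, b=10))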
[Antoniadis et al. 2020] Antonios Antoniadis, Christian Coester, Marek Eliás, Adam Polak
and Bertrand Simon. Online Metric Algorithms with Untrusted Predictions. International
Conference on Machine Learning (ICML), 2020.
[Bertsimas et al. 2018] Dimitris Bertsimas, Vishal Gupta, Nathan Kallus. Data-Driven Robust
Optimization. Mathematical Programming, p. 235-292, 2018.
[Letsios et al. 2020] Dimitrios Letsios, Radu Baltean-Lugojan, Francesco Ceccon, Miten
Mistry, Johannes Wiebe, Ruth Misener. Approximation Algorithms for Process Systems
Engineering. Computers and Chemical Engineering 132, 2020.
[Letsios et al. 2021] Dimitrios Letsios, Miten Mistry, Ruth Misener. Exact Lexicographic
Scheduling and Approximate Rescheduling. European Journal of Operational Research, 2021.
[Manish et al. 2018] Manish Purohit and Zoya Svitkina and Ravi Kumar. Improving Online
Algorithms via ML Predictions. Advances in Neural Information Processing Systems
(NeurIPS), p. 9661--9670, 2018.
Data architectures with humans-in-the-loop
Supervisor: Professor Elena Simperl
Fundamental data-centric tasks such as conceptual modelling, content labelling, entity extraction and
query processing are routinely realised as hybrid processes, which consist of human and algorithmic
elements. Examples include any AI system that depends on large amounts of labelled data, interactive
machine learning systems, but also knowledge graphs such as Yago, Wikidata, or DBpedia, which are
created by people alongside a range of more or less sophisticated bots.
The projects in this category explore methodologies, computational methods and tools that go beyond
the capabilities of existing AI and machine learning stacks in terms of tasks, performance and user
experience. For example, topics will include:
Novel methodologies and tools to create knowledge graphs, offering advanced user
experiences, accessible to non-experts and using the latest tech (audio and video processing,
intelligent assistants, AR and VR etc.)
Methodologies and techniques to acquire and encode common sense knowledge at scale
Quality of knowledge graphs, including frameworks to define it, methods to assess and repair
it, and the link between process, provenance and outcomes
New interfaces and experiences e.g. conversational agents to collect and curate knowledge
and improve algorithmic performance.
Managing discussions, collaborative decision making and conflicts.
Data-driven modelling of value chains for efficient decision support using digital
twins
Supervisor: Dr Partha Dutta
Keywords:
Data-Driven Modelling, Deep Learning, Digital Twins, Resilient Cyber Physical Systems,
Automated Decision Making, Industry Value Chain.
Research background, scope and outcome:
Modern value chains are the critical backbone of the world economy. Value chains are complex
networks of activities and interactions both within and across organizations of various types. Such
activities are essential for creating the various goods, services and products necessary for the
sustenance of our daily lives. Some examples are producing raw materials required for industrial
manufacturing, designing of engineering products, operations and maintenance of high-value
equipment, providing various end-user or customer services, among others. Although these are
distinctly different activities, delivering them reliably requires effective collaboration between
organizations. One of the main challenges in such collaborative work is the impact of unforeseen
events that can disrupt any activity within a value chain. Against this background, digital twins (DT)
offer a solution to help organizations make decisions under uncertainties to better manage value
chains. DTs are models of real-world systems that can be used to simulate their behaviour for
generating real-time or right-time insights about their operations. However, building effective DTs of real-world entities requires replicating their behaviour reliably by capturing their parameters and constraints in detail. Doing so is highly challenging because value-chain entities are characterised by complex and unique properties that can vary subtly even across related entities within the same family (for example, power plants can use different technologies, such as steam, natural gas, or waste, for producing electricity, and the DTs for the various types of power plants will be very different). Hence, DTs built by enumerating physical system properties can be difficult to replicate (e.g., across different but related entities, as exemplified above) and to scale (e.g., from simpler to larger or more complex entities).
Against this background, this PhD project aims to develop an alternative method to building digital
twins to address the limitations of current DT methods. In this context, rapid instrumentation of
industrial systems through IoT sensors has made operational data more readily available. Furthermore,
recent progress in advanced artificial intelligence algorithms such as deep learning has created
promising opportunities to build complex models of real-world systems using a data-driven approach.
More specifically, the PhD project will research current methods and limitations of developing
empirical models of value-chain systems using sensor data. It will then design methods for developing
behaviour models of multi-level value-chain systems by leveraging multi-class deep learning
algorithms or other related frameworks, applied to sensor data. The methodology will be
demonstrated by developing representative models of real-world value-chains (for example, those
taken from the energy or manufacturing domains, for which adequate open-source data sets can be
obtained). It is expected that the research output will contribute towards developing more resilient
cyber-physical systems by enabling the design of more robust digital twins which in turn can improve
value-chain decision making under uncertainty.
Efficient Mechanism Design for Markets and Reallocation of Goods
Supervisor: Dr Bart de Keijzer
This project focuses on designing computationally and economically efficient mechanisms for
market and exchange platforms.
On such platforms, a number of agents are present who have the intention of selling or buying items
from other agents. A mechanism interacts with these agents and determines, based on this interaction,
how the agents should trade and against which payments. The agents are assumed to act rationally, in
the sense that they have a certain utility function which they want to optimise: For example, a natural
setting would be one where agents want to maximise the total value of the items received, plus the
payment that they potentially receive in compensation for losing some of their goods. The agents will
interact with the mechanism in such a way that their utility is optimised, and mechanisms for such
scenarios need to be designed in such a way that trade happens in an optimal way, while agents are
not able to "cheat" the mechanism for their own benefit. Moreover, these mechanisms should perform
their computations reasonably (and provably) fast. How can the trading mechanism be designed so that these requirements are satisfied?
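As a toy illustration of a mechanism that agents cannot profitably "cheat", the following Python sketch (illustrative only, not part of the project description) implements a single-item second-price (Vickrey) auction, in which reporting one's true value is a dominant strategy.

    def second_price_auction(bids):
        # bids: dict mapping agent -> reported value for a single item.
        # The highest bidder wins and pays the second-highest bid, which
        # makes truthful bidding a dominant strategy.
        ranked = sorted(bids, key=bids.get, reverse=True)
        winner = ranked[0]
        price = bids[ranked[1]] if len(ranked) > 1 else 0
        return winner, price

    # An agent with true value 10 cannot gain by misreporting:
    print(second_price_auction({"a": 10, "b": 7, "c": 4}))   # ('a', 7): truthful
    print(second_price_auction({"a": 12, "b": 7, "c": 4}))   # overbidding: same outcome
    print(second_price_auction({"a": 6,  "b": 7, "c": 4}))   # underbidding: 'a' loses the item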
As there are many details that need to be specified in the above sketch to yield a very concrete model,
this gives rise to a wide range of interesting mechanism design challenges. Different properties of the
market require different mechanisms, where one can think of e.g. a static "one-shot" trading
scenario versus a scenario where agents can dynamically enter and exit the market, or indivisible
versus divisible goods, shareable vs unshareable goods, etc. In this project we will work on trying to
solve various challenging variants of this design problem.
This is a project in algorithmic game theory, which means that it lies in the intersection of theoretical
computer science and economics. This project relates strongly to computational complexity theory,
approximation algorithms, matching theory, auction theory, and (clearly) mechanism design. This is a
theoretical research field which is in the lucky position of also being relevant in practice: As
examples of where this field is applied, one may think of ad-auctions in search engines, and various
automated market platforms where goods are exchanged, or where clients are assigned to service
providers (think of various popular platforms for taxi drivers, finding holiday accommodation, food
delivery, and transportation services for goods).
Imperfect Rationality and Computation
Supervisor: Professor Carmine Ventre
Algorithmic Game Theory is a research field that provides a set of tools to account for strategic
reasoning in computer science. One assumption underlying much of the work in the area is, however,
pretty limiting: agents need to be fully rational. This is unrealistic in many real-life scenarios; we, in
fact, have empirical evidence that people often misunderstand the incentives and try to game the
system even when misbehaving is against their own interest.
This project will look at novel approaches to deal with imperfect rationality, including the analysis of
known systems and the design of novel ones. This will involve theoretical work (such as, mechanism
design) as well as more applied approaches (such as, agent-based modelling) to get a better
understanding of the strategic interactions within a population of agents with imperfect rationality.
Towards Fair/Explainable AI for Robotics
Supervisor: Dr Martim Brandao
This project will contribute to the area of “Responsible Robotics” – which focuses on a critical
analysis of existing systems and practices in AI for robotics. The goal is to uncover social, ethical, and
interaction issues of robot systems, and develop new methods that alleviate them. More concretely,
the project will focus on one of the following topics:
1. Fairness in robot motion planning and robot vision
identifying hidden values in existing motion planners, robot vision algorithms, and robotics
datasets,
identifying fairness concerns in robotics through user studies and critical literature/media
analysis,
questioning current practices in fair AI,
proposing new system configurations or technical methods to alleviate issues of fairness.
2. Explainable and human-in-the-loop robot motion planning
modeling human expectations and human understanding of robot motion,
developing new algorithms and user interfaces for explainable and human-in-the-loop
planning,
conducting user studies to evaluate the effectiveness of explainable/human-in-the-loop
algorithms, and to characterize issues such as automation bias.
Understanding the Complexity of Negotiations
Supervisors: Dr Alfie Abdul-Rahman & Dr Rita Borgo
A negotiated text is the product of a formal decision-making process where a text has been negotiated
and drafted over a period of time. Many of the foundational texts of the modern world have not been
written by individuals, but negotiated by groups of people in formal settings. For example, treaties
between states such as the Universal Declaration of Human Rights or the Treaty of Versailles; or
constitutions, such as the one negotiated by the American states in the Constitutional Convention of
1787.
During such negotiations, it is important for us to keep track of the delegations and their involvements
in order to grasp their influence on the negotiation process, using techniques such as close reading, distant reading, or machine learning. Even relatively short historical documents written
collectively in this way have been the product of thousands of specific proposals and decisions.
This project will apply a visual analytics approach towards the understanding of the complexity of a
negotiation and the influence of the delegations during a negotiation process.
Possible research questions:
Developing new static and interactive visualizations to assist with data discovery and insight
generation in large datasets of events within interacting timelines.
Developing new approaches to show the evolution of complicated, technical documents over
the period of months or years.
Developing new approaches for indexing the datasets related to the negotiation of documents,
and more intuitive displays of the results.
Developing natural-language-based approaches to relate information captured in ‘informal’
archives (such as private diaries, letters, social media feeds etc.) to the formal records of a
negotiation.
This project will work closely with the Quill Project, based at Oxford University:
https://www.quillproject.net/
Visual recognition with minimal supervision in deep learning context
Supervisors: Dr Miaojing Shi & Dr Michael Spratling
The goal of this PhD is to study object detection/segmentation in images or video with minimal
supervision. This task will be placed into a setting where only image-level annotation is provided. To
begin, additional supervision such as clicks, strokes, or bounding boxes may also be assumed. Towards
the end of the PhD, the student is expected to work with datasets of mixed levels of supervision,
including a harder, semi-supervised setting where there are only a few image-level labels as well as a
large number of unlabeled images.
Several ideas can be investigated in the context of deep learning. For instance, generative adversarial
learning can be employed to either augment the dataset or bridge the predicted detections with their
ground truth. Recurrent neural networks can be applied to video segmentation in particular to localize
and segment semantic parts across nearby frames. On unstructured image datasets, ideas like deep
metric learning and random-walk label propagation can be extended across pairs or groups of images.
Cross-category transfer learning can be a further extension.
Few-shot learning is another challenging direction to explore. After learning on a set of base classes
with abundant examples, new tasks are given with only few examples of novel (unseen) classes. For
such cases, the learning strategy of multi-million-parameter architectures in deep learning needs to be
rethought in order to allow the networks to squeeze out the maximum amount of information from the
few available samples.
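As a small illustration of the few-shot setting described above, the following sketch (illustrative only, using synthetic feature vectors in place of deep image embeddings) classifies queries from novel classes by their nearest class prototype, i.e. the mean embedding of the few labelled support examples per class, in the spirit of prototypical networks.

    import numpy as np

    rng = np.random.default_rng(0)

    def prototypes(support_x, support_y):
        # Mean embedding per class from a handful of labelled support examples.
        return {c: support_x[support_y == c].mean(axis=0) for c in np.unique(support_y)}

    def classify(query_x, protos):
        # Assign each query embedding to the class of its nearest prototype.
        classes = list(protos)
        dists = np.stack([np.linalg.norm(query_x - protos[c], axis=1) for c in classes])
        return np.array(classes)[dists.argmin(axis=0)]

    # Synthetic 5-way, 3-shot episode: the vectors stand in for a network's features.
    centres = rng.normal(size=(5, 64))
    support_x = np.vstack([c + 0.1 * rng.normal(size=(3, 64)) for c in centres])
    support_y = np.repeat(np.arange(5), 3)
    query_x = np.vstack([c + 0.1 * rng.normal(size=(10, 64)) for c in centres])
    query_y = np.repeat(np.arange(5), 10)
    print("accuracy:", (classify(query_x, prototypes(support_x, support_y)) == query_y).mean())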
Wearable, Discreet Augmentative and Alternative Communication
Supervisor: Dr Timothy Neate
Please note: applicants to this proposal are welcome to self-fund, or apply for the studentship or K-CSC
Scholarship.
Approximately 2.2 million people in the UK experience a form of communication impairment [1],
including a third of stroke survivors and two to three children in each classroom.
Communication impairments might mean that people find it hard to convey or understand
information when they need it most. This is different for everyone, but communication impairments
can affect one’s reading, writing, speaking and/or listening. People with communication impairments
often use AAC (Augmentative and Alternative Communication) to support them in communication;
generally via a laptop, tablet or smartphone. These assistive devices are not always quick to access
and often carry with them a stigma [2].
Wearables, such as smartwatches and smart glasses, have the potential to provide a range of sensors
and modes of input/output within an unobtrusive, commonplace form factor. Wearables are discreet.
Their always-available nature, coupled with instant access to the internet and processing (e.g.,
recognition models) on a companion device (e.g., a smartphone), have the potential to support people
with communication impairments in accessing and expressing information in a subtle and less
obtrusive manner.
Building upon work supporting access with wearables [3], this PhD project will conduct co-design of
wearable applications and models which can support people with communication impairments in
everyday life. Using established co-design approaches with users with communication impairments (e.g. [4]), this work will develop a range of input and output approaches with consumer and, potentially, custom form-factor wearables, working closely with end-users, and will evaluate them in real-world contexts.
REFERENCES
[1] UKGov, “Disability prevalence estimates,” 2012.
[2] Phil Parette and Marcia Scherer. Assistive Technology Use and Stigma. Education and Training in Developmental Disabilities, Vol. 39, No. 3.
[3] Dhruv Jain, Hung Ngo, Pratyush Patel, Steven Goodman, Leah Findlater, and Jon Froehlich.
2020. SoundWatch: Exploring Smartwatch-based Deep Learning Approaches to Support Sound
Awareness for Deaf and Hard of Hearing Users. In The 22nd International ACM SIGACCESS
Conference on Computers and Accessibility (ASSETS '20).
[4] Timothy Neate, Aikaterini Bourazeri, Abi Roper, Simone Stumpf, and Stephanie Wilson. 2019.
Co-Created Personas: Engaging and Empowering Users with Diverse Needs Within the Design
Process. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (CHI
'19).
Supporting more Accessible Remote Communication
Supervisor: Dr Timothy Neate
Please note: applicants to this proposal are welcome to self-fund, or apply for the studentship or K-CSC
Scholarship.
Remote communication platforms such as Zoom and Microsoft Teams make use of video and audio but
lack the physical affordances we take for granted when talking in-person. In ‘real-world’ discussions we
frequently use gestures, sketches on pieces of paper and make use of props within the physical
environment. While we might make do without these capabilities, non-language communication
modalities are particularly important for those with communication impairments and mean that there is
an access barrier in remote compared to co-located communication.
Access to remote communication platforms has been spotlighted by the Covid-19 pandemic. Without
equal access, people with communication impairments face the risk of further isolation, wherein access
to social interaction, civic engagement and even vital speech and language therapy might be
challenging.
Building upon prior work in gestural communication [1], this project will design novel interaction
techniques which leverage commercially available and bespoke technologies to support communication
in remote settings. Platforms will explore how the capabilities of commercial devices beyond video and
audio (such as Lidar sensing, augmented reality and external devices such as wearables) might support
‘real-world’ affordances e.g. through custom recognisers and sensor fusion.
This project will involve a PhD student working with people with diverse language impairments and
speech and language therapists to co-develop a fully functional platform aimed specifically at supporting
remote communication, to complement existing video conferencing platforms.
REFERENCES
[1] Roper, A., Marshall, J. and Wilson, S. (2016). Benefits and Limitations of Computer Gesture
Therapy for the Rehabilitation of Severe Aphasia. Frontiers in Human Neuroscience, 10.
Object-Based Access: Enhancing Accessibility with Data-Driven Media
Supervisor: Dr Timothy Neate
Please note: applicants to this proposal are welcome to self-fund, or apply for the studentship or K-CSC
Scholarship.
The tools we use to create and consume media-rich digital content, such as video streaming,
podcasts, TV and radio, are undergoing substantial change. The core idea behind the future media
ecosystem is Object-Based Media (OBM). OBM is the practice of linking media assets, such as
individually recorded audio and video, to metadata. These metadata might include anything from time-
stamped information about what happened in a video, to who shot the video, to details about how the
media should be played on different devices [1]. Digital media are sent as a collection of objects,
arranged on a user’s device according to their exact needs. Vitally, this means each user’s experience of
creation and consumption can be different.
While OBM principles might be used to automatically change a programme's duration to meet our time
requirements [2] or make a story more relatable to us by including local information based on our
location [3], there are also substantial implications for accessibility. Because the content creation and consumption process can differ for each user, these experiences have the potential to be accessible to everyone.
This PhD project will work with a range of stakeholders to develop accessible alternative media formats,
workflows and interaction techniques for the creation of novel interactive media experiences. Building
upon prior work on accessible content workflows (e.g. [4, 5]), this project will consider how the future of content creation might be made more accessible through extreme customisation of content that is different for everyone and bespoke to their needs.
REFERENCES
[1] D. Varghese, P. Olivier, and T. Bartindale, “Towards participatory video 2.0,” in proc ACM CHI,
2020.
[2] M. Armstrong, M. Brooks, A. Churnside, M. Evans, F. Melchior, and M. Shotton, “Object-based
broadcasting-curation, responsiveness and user experience,” 2014.
[3] S. Concannon, N. Rajan, P. Shah, D. Smith, M. Ursu, and J. D. Hook, “Brooke leave home:
Designing a personalized film to support public engagement with open data,” in proc. CHI, 2020.
[4] T. Neate, A. Roper, S. Wilson, and J. Marshall, “Empowering expression for users with aphasia
through constrained creativity,” in proc. ACM CHI, pp. 1–12, 2019.
[5] T. Neate, A. Roper, S. Wilson, J. Marshall, and M. Cruice, “Creatable content and tangible interaction in aphasia,” in proc. ACM CHI, 2020.
Projects for the remaining studentships
Administrative Access Control Policies
Supervisors: Professor Maribel Fernandez & Dr Jose Such
Administrative access control policies specify the rights that security administrators have in the
system (e.g., to add or remove users, or change users' rights). These policies are critical to ensure the
overall security of the system, but not much work has been done on the development of general
models for administrative access control. In this project we aim to define formal generic models of
administrative access control, based on the Category-Based Meta Model of access control (CBAC),
which can be used to analyse access control systems and help identify the impact of changes made by administrators (change impact) on the overall security of the system.
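A minimal sketch (illustrative only; all names are invented) of the category-based view of access control that CBAC generalises: principals are assigned to categories, permissions are granted to categories, and the impact of an administrative change can be analysed by comparing which requests are authorised before and after the change.

    # Hypothetical, simplified CBAC-style policy; all identifiers are illustrative.
    principal_category = {"alice": {"admin"}, "bob": {"staff"}}
    category_permissions = {"admin": {("users", "modify"), ("logs", "read")},
                            "staff": {("logs", "read")}}

    def authorised(principal, resource, action):
        # A request is granted iff some category of the principal holds the permission.
        return any((resource, action) in category_permissions.get(c, set())
                   for c in principal_category.get(principal, set()))

    before = authorised("bob", "users", "modify")
    category_permissions["staff"].add(("users", "modify"))   # administrative change
    after = authorised("bob", "users", "modify")
    print("change impact on bob:", before, "->", after)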
“Alexa, cover your ears!”: Privacy-Aware AI Personal Assistants
Supervisor: Dr Jose Such
AI personal assistants are becoming mainstream in practice, with the widespread introduction of
desktop, phone and home assistants. For instance, over 70 million users utilise smartphone assistants
like Siri, Google Assistant, and Cortana every day; and smarthome assistants have been sold in
massive numbers, like the five million units of Amazon Echo with the Alexa personal assistant sold in
less than two years. However, recent incidents involving AI personal assistants like Alexa recording a
private conversation and sending it to a random contact, have increased users' privacy concerns, with
some users trashing their assistants altogether and companies like Mattel cancelling assistant projects.
It is therefore paramount to consider and respect users' privacy to realise the benefits of AI personal
assistants and foster trust from users. In this project, you will formalise the social norms that govern
information sharing, management, collection, processing and learning in AI personal assistants. Based
on this, you will design novel methods to personalise privacy in AI assistants based on the social norms
but also on the users' contextual, group, and individual preferences with an optimal accuracy-
intervention trade-off.
Automated Signature Generation for Network Intrusion Detection Systems
(NIDS)
Supervisor: Dr Fabio Pierazzi
A Network Intrusion Detection System (NIDS) is a probe that passively monitors network traffic
and triggers a “security alert” whenever traffic matching a particular signature pattern is found. However,
signatures are still mostly generated manually, a process which is error-prone and time consuming.
This project will explore how AI and ML can support the automated generation of signatures which are effective, efficient and interpretable. Evaluations will include adaptivity to different network environments, and improvements in detection rate (DR) and false positive rate (FPR) with respect to manually defined signatures and existing statistical approaches.
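As a toy illustration of how candidate signatures could be scored, the following sketch (the payloads and the signature are entirely made up) computes the detection rate and false positive rate of a regular-expression signature over labelled traffic samples.

    import re

    def evaluate_signature(pattern, samples):
        # samples: list of (payload, is_malicious). Returns (DR, FPR) of the signature.
        sig = re.compile(pattern)
        tp = sum(1 for p, mal in samples if mal and sig.search(p))
        fp = sum(1 for p, mal in samples if not mal and sig.search(p))
        positives = sum(1 for _, mal in samples if mal)
        negatives = len(samples) - positives
        return tp / positives, fp / negatives

    samples = [("GET /index.html", False),
               ("GET /../../etc/passwd", True),
               ("POST /login user=admin", False),
               ("GET /cgi-bin/../../etc/passwd", True)]
    print(evaluate_signature(r"\.\./.*etc/passwd", samples))   # (1.0, 0.0) on this toy data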
Backbone Guided Local Search Methods for MAX-SAT
Supervisors: Dr Kathleen Steinhofel & Dr Dimitrios Letsios
Satisfiability (SAT) is a key problem in combinatorial optimisation and has a huge range of real-life
applications. Given a Boolean formula in conjunctive normal form, it seeks an assignment of variables such that the formula evaluates to True. If no such assignment exists, we seek an assignment that satisfies a maximum number of clauses (MAX-SAT). The backbone structure is the set of variables that have the same assignment in all optimal solutions.
Knowledge about the backbone structure can be used to guide heuristic methods which aim to find
near-optimal solutions. For instance, the size of the backbone can give an indication of how many optimal solutions exist and, consequently, how hard it is for the search method to converge to an optimal solution. The guidance can be provided in two different ways:
1. Deriving instance dependent methods by using pre-processing to approximate the backbone
structure and to derive parameter settings for local search.
2. Estimating the backbone structure based on configurations visited by the local search method.
The findings will lead to faster convergence to optimum solutions and more importantly can produce
methods which adapt to instance dependent properties. At the same time, methods to derive and
analyse the backbone structure can be used to classify candidate solutions and to model additional,
sought after properties such as robustness of candidate solutions.
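As a small illustration of the backbone concept (brute-force code over a toy formula, for exposition only), the following sketch enumerates all assignments of a tiny CNF instance, keeps the assignments satisfying the maximum number of clauses, and reports the variables that take the same value in every optimal assignment.

    from itertools import product

    # Toy MAX-SAT instance: clauses are sets of literals, +v means v, -v means "not v".
    clauses = [{1, 2}, {-1, 2}, {2, 3}, {-2, -3}, {1, -3}]
    variables = sorted({abs(l) for c in clauses for l in c})

    def satisfied(assignment, clause):
        return any((l > 0) == assignment[abs(l)] for l in clause)

    best, optima = -1, []
    for values in product([False, True], repeat=len(variables)):
        a = dict(zip(variables, values))
        score = sum(satisfied(a, c) for c in clauses)
        if score > best:
            best, optima = score, [a]
        elif score == best:
            optima.append(a)

    # Backbone: variables fixed to the same value in every optimal assignment.
    backbone = {v: optima[0][v] for v in variables if len({a[v] for a in optima}) == 1}
    print("optimum:", best, "backbone:", backbone)   # here x2=True, x3=False; x1 is free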
Behavioral Modeling of Process Memory for Real-Time Detection of Attacks
Supervisor: Dr Fabio Pierazzi
Memory vulnerabilities and exploitation techniques, such as buffer overflows and heap spraying, are still among the major threats to all modern systems. Most modern approaches to detecting memory attacks are based on heavyweight monitoring and analysis, which causes significant overhead and prevents real-time application. This project will explore how AI and ML can be used to create a behavioral model of process memory for real-time anomaly detection of attacks occurring in memory.
Better Error Help Using Large Scale Programmer Data
Supervisors: Professor Michael Kolling & Dr Neil Brown
Could large-scale data from beginning programmers be used to give useful hints and help to beginners stuck on an error? For example, if a novice has problems with a task, could useful hints be automatically generated by analysing previous users who had similar problems, what they did, and whether their actions led to solving the problem?
Making use of the Blackbox data set [1] is one option to automatically generate helpful hints and tips
for novice programmers.
[1] https://bluej.org/blackbox/
Big Data in Programming Education
Supervisors: Professor Michael Kolling & Dr Neil Brown
The Blackbox project has collected a large amount of data about the behaviour of novice
programmers. We have data about hundreds of millions of programming sessions. So far, this data has
been analysed only very superficially. An interesting project would be to use a big data approach for
deeper analysis of this data set, and to work out what we could learn from this.
Characterization of Immunoglobulins
Supervisors: Professor Costas Iliopoulos, Dr Sophia Karagiannis & Dr Grigorios Loukides
Antibodies, or immunoglobulins, belong to the ‘gamma globulin' protein group and can be found
mainly in the blood of vertebrates [1]. Antibodies constitute the major serological line of defense of
the vertebrates with jaws (gnathostomata) by which the immune system identifies and neutralizes
threatening invaders, such as viruses, fungi, parasites and bacteria. The mechanism underlying the
reaction efficiency of our immune system to specifically recognize and fight invading organisms or to
trigger an autoimmune response and disease still remains to be elucidated. The efficient reaction of our
immune system against all kinds of intruders is highly dependent on the number, condition and
availability of antibodies, as reaction times are ‘key' to the successful elimination of the foreign
pathogen.
The importance of antibodies in health care and the biotechnology industry demands knowledge of
their structures at high resolution. This information can be used for antibody engineering,
modification of the antigen-binding affinity and epitope identification of a given antibody.
Computational approaches provide a cheaper and faster alternative to the commonly used, albeit
laborious and time consuming, X-ray crystallography. Available immunogenetics data can be utilized
for computational modelling of antibody variable domains. Standardized amino acid positions and
properties can assist in optimizing the relative orientation of light and heavy chains as well as in
designing homology models that predict successful docking of antibodies with their unique antigen.
As a result, it comes down to identifying conserved motifs or patterns that are implicated and mediate
antibody-antigen interactions.
Detection of such motifs by simple sequence comparison is impossible. Consequently, our research focuses on the investigation of alternative approaches to efficiently study antibodies, mainly by the
multimodal fusion of information from genetic, structural and physicochemical analysis.
All in all, herein we propose a holistic approach in the realm of immunoinformatics that will focus on
elucidating the mechanism of antibody-antigen recognition. The results and the final tool (in the form
of either an online service or a downloadable tool) will be made freely available to the scientific
community. We are confident that many fellow researchers from all walks of immunology,
bioinformatics and antibody-related sciences will benefit from such a tool, both in terms of applied
research and basic understanding of the function of CDRs.
Nowadays, it is certain that such specialized and specific recognition properties cannot be based on
random and hypervariable sequences. It is just that using the 20 amino acid code is not a suitable
approach to explain the phenomenon. Therefore, herein, we will calculate more than 430 different
physicochemical properties to represent each residue of all antibodies, in an effort to identify what is
the right dictionary (or indeed the right combination of dictionaries) required to decrypt the
antibody-antigen interaction puzzle. The calculation of the physicochemical properties will be done
using the QSAR module as it is implemented into the Molecular Operating Suite (CCG).
A brief overview of our main objectives includes the following:
Collecting and building the working dataset
• Collect and curate antibody structural data from numerous databases
Deep learning for feature extraction and prediction
• Predict reliable classification markers through the use of convolutional neural networks (CNNs)
Computational Analysis of Ageing Brains
Supervisors: Dr Kathleen Steinhofel & Professor Zoran Cvetkovic
The ability to acquire and store information is a key function of the brain. This ability is affected by
ageing and by various age-related diseases, including dementia. In old age the acquisition of new
information is more difficult than in young age. Moreover, updating of acquired information is also
affected by ageing. The mechanistic basis of the age-related decline is not well understood. It is
known that changes at synapses, the connections between nerve cells, are the basis of information
storage. But it remains unknown how the synaptic basis of information storage changes with age.
Recently, ultrastructural changes at synapses were discovered and analysed after training in a memory
task in young and aged mice. In the research programme, we want to investigate the impact of these
changes by using computational approaches based on models of these biological observations. In
addition to the modelling of ultrastructural changes, the regulatory function and expression levels of microRNAs in neurons will be analysed with respect to their impact on the ability to store information.
The findings will not only advance insights into mechanisms of information storage, but also support
the analysis of age-related diseases, such as dementia, that affect cognition.
The supervisor team will include Prof Peter Giese (IoPPN) and Anna Zampetaki (Cardiovascular
Division).
Contextualising Big Programming Data
Supervisors: Professor Michael Kolling & Dr Neil Brown
The Blackbox project has been running for over five years. It collects data from novice programmers:
source code that is written, compilation errors displayed, and various other data about a programmer's
interaction with the BlueJ IDE. The data is collected without any further context: we do not know
the age or experience of the programmer, whether they are on a course or not, whether they are doing
well on their assignments, and so on. This allows for a large data set, but one that is stripped of useful
context. This project would investigate collecting useful data (e.g. experience, course grades) for a
subset of Blackbox participants, to provide a richer subset for other researchers, and to be used in the
project to investigate associations between programming activity and success on a course.
Data Science Strategies for Cancer Immunotherapy Application
Supervisors: Dr Sophia Tsoka & Dr Grigorios Loukides
Computational analysis of biomedical datasets can lead to understanding of disease systems and
therapeutic interventions. We propose a project that will target the computational analysis of
experimental data on immune activation against cancer using antibodies. Integration of experiments
with publicly available data on known cellular interactions will establish a resource for data mining.
Such a resource will be used to implement machine learning algorithms to link gene features to cancer
response, network analyses to represent molecular interactions and logical modelling to explore
regulatory effects from proteomic experiments. The combination of these Data Science frameworks
will elucidate signalling networks related to the control of tumor growth by antibody-enhanced
human immune cells and identify key altered pathways and their regulation state. The long-term
prospect is to improve understanding of disease mechanisms and cell signalling, so as to improve the
design of novel drugs and therapies.
Development of Scalable General Artificial Intelligence (AI) Problem Solving
Systems
Supervisor: Dr Amanda Coles
This project aims to develop scalable general Artificial Intelligence (AI) problem solving systems,
capable of reasoning with the large combinatorial problems that arise in effectively managing the
oversubscribed infrastructure of densely populated cities. This project builds on a study, supervised by
Dr Amanda Coles (KCL Informatics) an expert in AI Planning and Professor Christopher Beck
(University of Toronto) an expert in Constraint Programming (CP), exploring the application of CP
and AI planning to disruption recovery in the UK rail network. The PhD project aims to
significantly increase the solution quality and scalability of AI problem solving technologies, based
on our new understanding of the strengths of these approaches, by automatically decomposing
problems so CP solvers and AI Planners solve the parts best suited to their strengths. The successful
candidate will extend the state-of-the-art in AI research and have the opportunity to apply this to
real-world UK rail network problems.
Distributed Computing by Population Protocols
Supervisor: Professor Tomasz Radzik
Population protocols are a simple model of distributed computing, with applications
extending to other areas, including processes in chemical networks and online social networks.
This model assumes that the computing system consists of a large number of identical devices,
called agents or nodes, which communicate with each other in pairwise interactions. The
pattern of interactions depends on external factors and interacting nodes follow a simple
protocol, which should ensure that all nodes gradually learn some global property of the
system. This project is a study of the computational potential and limitations of this model
and an investigation of applications.
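As a concrete illustration of the model (a hedged sketch, not part of the project description), the following code simulates the classical three-state approximate-majority protocol under a uniformly random scheduler; with high probability all agents converge to the opinion that initially held the majority.

    import random

    def approximate_majority(n_a, n_b, steps=200000, seed=1):
        # Three-state approximate-majority protocol with states 'A', 'B', 'blank'.
        # At each step a random ordered pair interacts: an opinionated initiator
        # blanks a responder with the opposite opinion, and converts a blank one.
        rng = random.Random(seed)
        agents = ["A"] * n_a + ["B"] * n_b
        for _ in range(steps):
            i, j = rng.sample(range(len(agents)), 2)   # initiator i, responder j
            x, y = agents[i], agents[j]
            if (x == "A" and y == "B") or (x == "B" and y == "A"):
                agents[j] = "blank"
            elif x in ("A", "B") and y == "blank":
                agents[j] = x
        return {s: agents.count(s) for s in ("A", "B", "blank")}

    print(approximate_majority(60, 40))   # typically converges to all 'A'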
Fairness in Automatic Assessment
Supervisor: Dr Zheng Yuan
Research areas: fairness in artificial intelligence, bias in artificial intelligence, artificial
intelligence in education, machine learning, natural language processing, automatic
assessment
Automated assessment (AA), the task of employing machine learning models to automatically
score written/spoken text, is one of the most important educational applications of natural
language processing. Having emerged as a means of overcoming issues arising with standardised
assessment, AA supports a faster assessment and provides instant feedback, not only
facilitating self-assessment and self-tutoring, but also addressing educational shortfalls
promptly. Moreover, the potential of a reduced workload is becoming more attractive,
especially in large-scale assessments. As a lot of teaching has moved online and the number of
students keeps rising, AA is crucial to the scalability of teaching and marking. Further
advantages become more pronounced when it comes to marking constructed responses, a task
prone to an element of subjectivity. AA systems guarantee the application of constant
marking criteria, thus reducing inconsistency, which may arise in particular when more than
one human examiner is employed.
Over the last few years, there has been a significant amount of work done on ensuring
fairness, accountability and transparency for machine learning models. With the deployment
of AA in both summative and formative scenarios (e.g. high-stakes testing and classroom
instruction, respectively), it is important to ensure fairness in these AA systems and that all test-takers are treated fairly, especially when making high-stakes decisions like college admissions,
employment, or visa applications. Recently, there has been increasing interest in AA
fairness/bias, and research in this area has mainly focused on detecting bias in a post-hoc
setting. For example, studies have documented differing performance of existing AA systems
for test-takers with different gender, race, native language, socioeconomic status, or
disabilities.
This project will study fairness and ethics in artificial intelligence (AI), with a special focus on
AA. Studies in machine learning have highlighted that algorithms often introduce their own
biases either due to an existing bias in the data or due to a minority group being inadequately
represented. The aim of this project is to develop machine learning models with fairness and
ethical considerations. As a result, the decisions made by the new systems will be unbiased
and the decision-making processes will be transparent, which will eventually build up trust in
AI and benefit all.
This project is expected to detect, understand and mitigate both algorithmic bias and data
bias in machine learning models, as well as to define and measure fairness in AI systems. In
particular, the project will focus on developing accountable and responsible machine learning
models for AA, so as to ensure fairness in AA. However, the models and techniques produced
as well as lessons learnt will be sufficiently generic such that they can be applied to other AI
applications and the diverse range of contexts for AI.
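As a minimal illustration of post-hoc bias detection in automated assessment (the data below are synthetic and the metric is only one of many possible fairness measures), the following sketch compares the mean difference between machine and human scores across demographic groups.

    import numpy as np

    def score_gap_by_group(machine, human, groups):
        # Mean (machine - human) score difference per demographic group;
        # a large gap between groups suggests the scorer treats them differently.
        machine, human, groups = map(np.asarray, (machine, human, groups))
        return {g: float((machine[groups == g] - human[groups == g]).mean())
                for g in np.unique(groups)}

    # Synthetic example: the hypothetical scorer under-marks group "B".
    human   = [3, 4, 5, 2, 4, 5, 3, 4]
    machine = [3, 4, 5, 2, 3, 4, 2, 3]
    groups  = ["A", "A", "A", "A", "B", "B", "B", "B"]
    print(score_gap_by_group(machine, human, groups))   # {'A': 0.0, 'B': -1.0}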
References:
Andersen et al. Benefits of alternative evaluation methods for Automated Essay
Scoring. EDM 2021.
Litman et al. A Fairness Evaluation of Automated Methods for Scoring Text
Evidence Usage in Writing. AIED 2021.
Ke and Ng. Automated Essay Scoring: A Survey of the State of the Art. IJCAI 2019.
Madnani et al. Building Better Open-Source Tools to Support Fairness in Automated
Scoring. EthNLP@EACL 2017.
Romei and Ruggieri. A multidisciplinary survey on discrimination analysis. The
Knowledge Engineering Review 2014.
Formal verification of smart contracts
Supervisor: Dr Hana Chockler
Formal verification of software is gaining popularity for verifying increasingly complex and
safety-critical software. While the full verification task is unsolvable (the halting problem, which is undecidable, easily reduces to it), numerous existing solutions to
subproblems are general enough to provide thorough verification and correctness assurance
for real-life systems. A number of teams are currently working on tools for software
verification, with the Formal Verification Team at the University of Lugano (USI), led by
Prof. Sharygina, being one of the most established ones. Prof. Sharygina recently received
Swiss government funding for a large project titled “Beyond Symbolic Model Checking
through Deep Modelling”, in which Dr. Chockler (the first supervisor) is a named
collaborator. The proposed Ph.D. project will be done in collaboration with the team at USI.
The student will be able to travel to work face-to-face with the team in Lugano, and close
collaboration via Skype and email is expected when the student is in London.
Current model-checkers (automated formal verification tools) are mostly suitable to verify
programs in C and C++. In this project, we will research the direction of formally verifying
smart contracts. The student will research different options of extending the verification
platform to smart contracts written in Solidity (or other languages) and will analyse whether
the verification should be done on the source code level or on the bytecode level.
Smart contracts are typically small. However, they interact with other contracts and may be called in loops or recursively, which can lead to a number of subtle bugs (see, for example, the exploit of the DAO bug, which led to a loss of about $50 million). It is then reasonable to expect
that the best way to formally verify smart contracts is by using modular reasoning: for each
smart contract, the other contracts with which it interacts can be considered an environment.
This environment can be overapproximated using learning techniques in combination with
sampling and traditional model checking approaches. After verification of a single contract
passes successfully, some symbolic representations of the contracts with respect to the
correctness properties will be combined to prove correctness of the overall system.
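As a hedged, Solidity-free illustration of the kind of subtle bug mentioned above, the following Python model (all names are invented for the example) mimics a DAO-style withdrawal in which the external call to the caller happens before the bookkeeping, so a re-entrant environment can drain more than its balance; the final comment states the invariant a model checker would aim to prove.

    class Bank:
        # Toy model of a contract holding balances for external callers.
        def __init__(self, balances):
            self.balances = dict(balances)
            self.vault = sum(balances.values())       # total funds actually held

        def withdraw(self, caller, amount):
            if self.balances[caller] >= amount:
                caller.receive(self, amount)           # external call *before* bookkeeping
                self.vault -= amount
                self.balances[caller] -= amount

    class Attacker:
        # Environment that re-enters withdraw() from inside receive().
        def __init__(self):
            self.gained, self.depth = 0, 0

        def receive(self, bank, amount):
            self.gained += amount
            if self.depth < 2:                         # re-enter a bounded number of times
                self.depth += 1
                bank.withdraw(self, amount)

    attacker = Attacker()
    bank = Bank({attacker: 10})
    bank.withdraw(attacker, 10)
    # Invariant a model checker would aim to prove: no caller gains more than its balance.
    print("gained:", attacker.gained, "vault:", bank.vault)   # gained 30, vault -20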
The project will include a significant implementation component. The implementation is
done using the software verification platform developed at USI. The main development task
is the new front-end, so that the verification platform is able to analyse programs in Solidity
(or EVM bytecode). As model checkers are large and complex pieces of software, the advantage of having such software available already, and being in contact with the team that develops and maintains it, is hard to overestimate. In addition, being a part of a very active
and experienced research team guarantees discussions and collaborations that further aid the
research, especially in the initial stages.
From Requirements to Models Using Natural Language Processing
Supervisor: Dr Kevin Lano
The construction of UML models such as class diagrams can be a complex and time-consuming
activity, even with tool support, and the modification and evolution of these diagrams is also
challenging.
This PhD project will investigate the automated production of models from natural language
requirements statements, using rule-based or neural net approaches to identify model elements such as
classes and operations from the statements.
The project should involve a comparison of the relative effectiveness of rule-based versus neural net
approaches, and investigate how these could be combined.
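As a tiny, rule-based illustration of the idea (a sketch only; a realistic system would use a proper NLP pipeline such as part-of-speech tagging or a neural model), the following code applies naive heuristics to a requirements sentence to propose candidate classes and operations.

    import re

    def candidate_model_elements(requirement):
        # Very naive heuristics: capitalised nouns become candidate classes,
        # verbs following 'can'/'must'/'should' become candidate operations.
        classes = set(re.findall(r"\b([A-Z][a-z]+)\b", requirement))
        operations = set(re.findall(r"\b(?:can|must|should)\s+(\w+)", requirement, re.I))
        return {"classes": sorted(classes), "operations": sorted(operations)}

    req = "A Customer can place an Order, and a Librarian must approve each Loan."
    print(candidate_model_elements(req))
    # {'classes': ['Customer', 'Librarian', 'Loan', 'Order'], 'operations': ['approve', 'place']}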
The project is part of a large research programme for “User-centered model-driven engineering”
carried out within the Software Systems research group, which aims to make MDE techniques usable
by mainstream software practitioners.
Human data interaction
Supervisor: Professor Elena Simperl
Technology can play an important role in improving people’s experiences with data, whether in a
professional context, or in everyday life. Projects in this space look at human factors that affect our
ability to find, make sense of, and communicate with data, including topics such as:
Dataset search and discovery, including Google’s dataset search engine
Data portals: how are they used and how can they be improved
Communicating and presenting data, metadata and data-related activities
User experience in data science and data engagement
Tools and experiences to increase accessibility of data and data science work
Collaboration in data science
Data storytelling tools with narrative support
Data science communities: where are they, how do they work, how can we make them
better?
Example project: New interfaces and experiences for data engagement
The project will explore novel ways to present a dataset, for example a CSV file, using speech, audio
or video technologies. The aim is to propose an algorithm that, given a dataset, produces a media summary of the content and context of the data, and to evaluate the results in a user study. The algorithm
could use a range of techniques, including machine learning, computer vision and speech generation.
This will also require capabilities to generate text from data, as text is more accessible than metadata to
convey what a dataset is about and how it should be used. In previous studies we used a manual
approach to create summaries, which does not scale. The aim here would be to use natural language
generation to automatically create short text summaries for a given tabular dataset, formatted, for
example, in CSV. The project could use machine learning or rule-based techniques.
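A minimal, rule-based sketch of the kind of textual summary such an algorithm could start from (the file name and columns are hypothetical; a full solution would add learned natural language generation):

    import csv

    def summarise_csv(path):
        # Produce a short, template-based textual summary of a tabular dataset.
        with open(path, newline="") as f:
            rows = list(csv.DictReader(f))
        columns = list(rows[0].keys()) if rows else []
        numeric = [c for c in columns if all(r[c].replace(".", "", 1).isdigit() for r in rows)]
        parts = [f"The dataset has {len(rows)} rows and {len(columns)} columns "
                 f"({', '.join(columns)})."]
        for c in numeric:
            values = [float(r[c]) for r in rows]
            parts.append(f"Column '{c}' ranges from {min(values)} to {max(values)}.")
        return " ".join(parts)

    # Example call on a hypothetical file: print(summarise_csv("air_quality.csv"))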
Example project: Personalising dataset search
In previous studies we explored different ways to present datasets in the context of search, including
structured metadata, text descriptions, data previews and visualisations. In this project, the aim is to
develop an information retrieval algorithm that tests the impact of these different result presentations
and personalizes them based on user preferences and feedback.
This could include, among other things, personalised analogies for numerical data. Research has
shown that using familiar concepts to describe numbers and numerical datasets can improve
engagement. The aim of this project is to explore the same approach for a wider range of datasets
(beyond spatial data such as distances and areas) and to develop an approach that for a given
numerical dataset learns to recommend relevant analogies.
Example project: communicating data quality
Most work in data visualisation has focused on choosing and customising charts and stories to
communicate data. The aim of this project is to look into contextual aspects of data use, including
sources, uncertainty, missing or incorrect values, timeliness and the way this additional information
could be embedded into visual design. The project will first undertake a survey of existing approaches
for numerical datasets and then propose and test ways to communicate less explored quality aspects.
Human and social factors in information systems
Supervisor: Professor Elena Simperl
Some of the most remarkable online platforms and tools we use today bring together human and
social intelligence with data and algorithms in ingenious ways. Underlying them, there is a huge,
interdisciplinary research space concerned with the design principles, methods and tools that allow us
to build such systems and understand and predict their evolution. The most successful of them seem
to share a core set of principles:
They are decentralised and self-organizing, and can mobilize a critical mass of resources
effectively, whether that’s people, data or computational devices.
They make extensive use of mobile, sensor and web platforms, alongside openly available data
and software to enable communication, knowledge exchange and coordinated action.
They know how to bring crowd and machine capabilities together to achieve their aims in a
sustainable way.
They empower individuals to self-organise and commit to being fair, transparent and accountable
about the data and resources they contribute.
Relevant topics include applications of crowdsourcing and social computing to AI systems, as well as
fundamental crowdsourcing research around task and workflow design, crowd learning, quality
assurance, and ethical crowdsourcing. The research would potentially focus on a class of social
machines, including peer-production systems, human computation platforms and participatory sensing
networks.
Example project: Improving task design
There is a large body of literature exploring how to achieve a particular goal via crowdsourcing and
proposing workflows and improvements. The aim of this project is to derive such task design
guidelines from a new source: discussion forums used by the crowd, for example on Mechanical Turk
or in citizen science projects on the Zooniverse platform. The project will collect a sample of relevant
discussions and extract comments pertinent to design guidelines, using, for instance, quantitative
(NLP) or qualitative techniques.
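As a purely illustrative sketch of the quantitative route, the snippet below filters a handful of invented forum comments for ones that mention task-design issues, using a hand-picked keyword list; an actual study would replace this with proper NLP models and qualitative coding.

# Keyword list and comments are invented for illustration only.
DESIGN_KEYWORDS = {"instructions", "interface", "tutorial", "confusing",
                   "unclear", "workflow", "payment", "time limit"}

def design_relevant(comments):
    """Return comments mentioning at least one task-design keyword (substring match)."""
    return [text for text in comments
            if any(k in text.lower() for k in DESIGN_KEYWORDS)]

sample = [
    "The instructions for step 2 were confusing.",
    "Anyone else classifying galaxies tonight?",
    "The time limit is too short for long documents.",
]
print(design_relevant(sample))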
Example project: Crowd self-assessment
In crowdsourcing, asking participants to self-assess their skills and performance helps designers
understand the feasibility of the task and identify areas of improvement. Previous research has looked
at the ability of crowd participants to self-assess. The aim here would be to carry out a follow-up
study to understand whether the initial conclusions apply to other tasks, domains and workflows.
Learning Representations for Reinforcement Learning in Human-Robot
Interaction
Supervisor: Dr Matteo Leonetti
When humans and robots interact, the robot needs a representation to plan and learn, which takes the
humans, their objectives, and their actions into account. Representation learning has been a long-
standing research goal, revitalised by progress in deep learning. In this project we will focus on
representation learning for human-robot interaction, where we want the robot to plan for human-
robot collaboration, and adapt to the human. The robot must acquire a representation that lets it
predict and interpret the human's actions while planning its own. We will focus on collaborative
tasks, where robot and human share a common goal.
The complexity of learning such a model and representation will be mitigated through the use of
curriculum learning. Curriculum learning consists in creating a sequence of increasingly complex tasks
to let a learning agent progress faster or to a better behaviour than when learning the final task from
scratch. While predominantly used for model-free learning so far, curriculum learning has the
potential to be a prominent tool for model learning in general, and for human-robot interaction in
particular.
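The sketch below illustrates the curriculum idea in its simplest form, with placeholder Task and Agent classes invented for illustration rather than taken from any particular RL library: the same agent is trained on tasks of increasing difficulty, carrying what it has learned from one task to the next.

class Task:
    def __init__(self, difficulty):
        self.difficulty = difficulty

class Agent:
    def __init__(self):
        self.knowledge = 0.0          # stands in for a learned policy/representation

    def train(self, task, episodes):
        # Pretend training: knowledge grows more slowly on harder tasks.
        for _ in range(episodes):
            self.knowledge += 1.0 / (1.0 + task.difficulty)

def curriculum(agent, difficulties, episodes_per_task=100):
    """Train the same agent on a sequence of increasingly hard tasks."""
    for d in sorted(difficulties):
        agent.train(Task(d), episodes_per_task)
    return agent

agent = curriculum(Agent(), difficulties=[0.5, 1.0, 2.0, 4.0])
print(f"final knowledge: {agent.knowledge:.1f}")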
Learning of Software Design Patterns
Supervisor: Dr Hana Chockler
A software engineer joining a development team typically does not start writing software
immediately; first, she needs to understand the large existing body of code and recognise the key
components and how they interact with each other. Documentation is typically sparse and not
updated regularly. The software, on the other hand, is large and difficult to understand. “Software is
like entropy: It is difficult to grasp, weighs nothing, and always increases.” (Norman Augustine).
Project development tools aid understanding of the software by identifying the participating classes
and the static dependencies between them.
The next step is to identify a set of design patterns common to this project, e.g., when are resources
allocated and freed, and in what manner are certain components of a class visited? In this area, static
analysis tools are insufficient. The proposed project is to learn the design patterns in the given code
automatically by applying grammatical inference (learning) algorithms.
The benefits of automatically learning design patterns go beyond helping the software engineer
obtain a clear representation of the design patterns. An automatic analysis can discover areas where the
same goal is achieved by utilising different patterns: one of the patterns can be erroneous or
obsolete, or the multitude of patterns can point to the lack of precise development guidelines for a
certain task, indicating a need for a guiding design pattern. These challenges require developing
learning algorithms that can learn several automata simultaneously, such that the resulting automata
correctly capture the main abstractions in the given corpus of code.
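To make the grammatical-inference starting point concrete, the sketch below builds a prefix tree acceptor from a few invented traces of method calls; state-merging algorithms (for instance in the RPNI family) would then generalise such a tree into a compact automaton, and the project's challenge is to do this for several interacting patterns at once.

def build_pta(traces):
    """Build a prefix tree acceptor: (transitions, accepting) with transitions[(state, symbol)] -> state."""
    transitions, accepting = {}, set()
    next_state = 1                      # state 0 is the root
    for trace in traces:
        state = 0
        for symbol in trace:
            if (state, symbol) not in transitions:
                transitions[(state, symbol)] = next_state
                next_state += 1
            state = transitions[(state, symbol)]
        accepting.add(state)
    return transitions, accepting

# Invented traces of resource-handling calls, for illustration only.
traces = [
    ["open", "read", "close"],
    ["open", "write", "close"],
    ["open", "read", "read", "close"],
]
transitions, accepting = build_pta(traces)
print(len(accepting), "accepting states;", len(transitions), "transitions")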
The project will build upon previous research results of Dr Hana Chockler in collaboration with the
University of Oxford.
Machine Learning Augmented Algorithms
Supervisor: Dr Frederik Mallmann-Trenn
Machine learning, and in particular deep learning, has gained much attention over the past several
years, yet its theoretical understanding is still very limited. As a remedy, a recent line of research has
emerged in which neural networks are used as a black box in many online problems, where the data
arrives over time. The idea is to use a neural network to give predictions of the data that will arrive in
the stream. The goal is to design algorithms that perform much better than previously possible if the
prediction is good and, on the other hand, to show that even if the prediction is bad, the solution found
by the algorithm is still reasonably good.
A toy example is the ski rental problem, where each day a skier on vacation has to make a decision:
either rent skis for $10 per day or buy skis for $100. We assume that the ski trip can end abruptly (chosen
adversarially). See https://en.wikipedia.org/wiki/Ski_rental_problem for classical algorithms. Now
if we assume that a neural network makes a prediction on when the ski trip will end, how much
better can we do? The field is very young and great problems of practical and theoretical importance
await!
Related literature: http://www14.in.tum.de/personen/albers/papers/inter.pdf for a technical survey
on online algorithms.
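The sketch below illustrates the flavour of such prediction-augmented algorithms on the toy prices from the text; it follows the general recipe rather than any specific published algorithm, and in particular omits the robustness mechanisms that the project would study.

RENT, BUY = 10, 100

def classical_cost(actual_days):
    """Rent while total rent stays below the purchase price, then buy."""
    threshold = BUY // RENT                      # break-even day
    if actual_days < threshold:
        return actual_days * RENT
    return (threshold - 1) * RENT + BUY

def predicted_cost(actual_days, predicted_days):
    """Buy on day 1 if the prediction says the trip outlasts the break-even point."""
    if predicted_days * RENT >= BUY:
        return BUY
    return actual_days * RENT                    # prediction said "short trip": keep renting

# (actual length, predicted length): the last pair shows the cost of a bad prediction.
for actual, predicted in [(3, 3), (30, 30), (30, 3)]:
    print(actual, predicted, classical_cost(actual), predicted_cost(actual, predicted))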
Model Driven Engineering in Finance
Supervisor: Dr Kevin Lano
In the finance industry there is a strong emphasis on the rapid time-to-market of new financial
software products and financial models, which can conflict with the achievement of software quality
and correctness. The proposed research will investigate how these conflicting aspects can be managed
and partly resolved through, for example, the reuse of trusted components, and the use of model-based
rapid application development and iterative (agile) development.
Modelling Predictive Space-Time Cube for Urban Informatics
Supervisors: Dr Yijing Li, Prof. Nicolas Holliman
This project will build space-time cube predictive models for urban information along multiple
dimensions, e.g., greenspace accessibility and values, land-use simulated mobility, residents' happiness
and geodemographic profiles, and the development of local crime, with the aim of informing
policy makers with data-driven evidence. The predictive space-time cube model will be trained and
tested on multi-sourced open trajectory data (for example, remote sensing images, census data, Google
mobility data, detailed crime incident data, socio-economic statistics, etc.) in selected metropolitan
cities such as London, New York, Sydney, and Hong Kong (https://comparecitycrime.com/,
preliminary exploration). Besides widely applied spatial data analysis skills and machine
learning techniques, the student will develop a 3D understanding of urban crime in a dynamic,
forecasting way, and contribute to the traditional literature on spatial analysis from an innovative angle
by adding dynamic temporal and layer dimensions.
Modular and Hierarchical Learning and Representation of Large Software
Supervisor: Dr Hana Chockler
A software engineer joining a development team typically does not start writing software
immediately; first, she needs to understand the large existing body of code and recognise the key
components and how they interact with each other. Documentation is typically sparse and not
updated regularly. The software, on the other hand, is large and difficult to understand. “Software is
like entropy: It is difficult to grasp, weighs nothing, and always increases.” (Norman Augustine).
Being able to represent large software in a graphical way with the ability to zoom in and out of
components would help tremendously with understanding the software structure and its
functionality. The proposed project is to learn a hierarchical compositional structure that will be
used for such a graphical representation. The most likely candidate for such a hierarchical structure
is state charts, which have been used for software design for many years.
There are no existing learning algorithms for learning state charts. There are, however, a number of
algorithms for learning similar structures, such as finite automata, transducers, etc. The first part of
the proposed project consists of constructing a new learning algorithm for learning state charts. The
second part of the proposed project is using this algorithm to learn complex software in a hierarchical
way, allowing the user to zoom in and out of components (composite states).
The project will build upon previous research results of Dr Hana Chockler in collaboration with the
University of Oxford.
Monitoring Compliance with Dynamic Security Policies under Uncertainty
Supervisors: Dr Natalia Criado & Dr Jose Such
This project will develop the first monitor capable of checking compliance with dynamic and
adaptable security policies on the basis of incomplete and uncertain observations. Most existing
proposals on security policy compliance monitoring assume that monitors have perfect information
and observation capabilities, or that security policies are fixed and known at design time. However,
these assumptions are too strong for modern hyper-connected, socio-technical, and cyber-physical
systems due to their inherent uncertainty, incompleteness and dynamism.
For example, in a business environment it is: infeasible to observe all files uploaded/downloaded
to/from public cloud services, since employees can perform these actions using non-corporate
networks/devices; impossible to detect all sensitive information contained in files with complete
certainty; and possible for security policies controlling access to public cloud services to change as a
result of new legislation (e.g. GDPR) or threats. This project will propose a novel security policy
monitor for hyper-connected, socio-technical, and cyber-physical systems.
Multiple Robots Performing Random Walks
Supervisor: Dr Frederik Mallmann-Trenn
The goal of the project is to study distributed algorithms for dynamic and noisy settings, robot
swarms, and biological systems. We will seek new algorithms to solve fundamental problems of
communication, construction, reaching agreement, estimation, data processing, searching, shape
formation, task allocation, and more.
One example is the setting of [1], where robots have to estimate the fraction of black tiles in a grid.
Each of the robots is very simple and performs a random walk. Whenever two or more robots are
close to each other, they can communicate. In the end, the robots have to agree on a
joint estimate of the fraction of black tiles. The arising questions here are:
How much can multiple random walks speed up the process?
How many samples have to be taken?
What happens if the communication is noisy?
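The toy simulation below (simplified, and not taken from [1]) gives a feel for the setting: robots random-walk on a grid, keep running estimates of the fraction of black tiles they have seen, and average their estimates whenever they meet.

import random

N, STEPS, ROBOTS = 20, 500, 5
grid = [[random.random() < 0.3 for _ in range(N)] for _ in range(N)]   # ~30% black tiles
pos = [(random.randrange(N), random.randrange(N)) for _ in range(ROBOTS)]
seen_black, seen_total = [0] * ROBOTS, [0] * ROBOTS
estimate = [0.0] * ROBOTS

for _ in range(STEPS):
    for r in range(ROBOTS):
        x, y = pos[r]
        dx, dy = random.choice([(1, 0), (-1, 0), (0, 1), (0, -1)])
        pos[r] = ((x + dx) % N, (y + dy) % N)                          # random-walk step
        seen_black[r] += grid[pos[r][0]][pos[r][1]]
        seen_total[r] += 1
        estimate[r] = seen_black[r] / seen_total[r]
    # robots sharing a cell agree on the average of their estimates
    for r in range(ROBOTS):
        near = [s for s in range(ROBOTS) if pos[s] == pos[r]]
        avg = sum(estimate[s] for s in near) / len(near)
        for s in near:
            estimate[s] = avg

truth = sum(map(sum, grid)) / (N * N)
print("true fraction:", round(truth, 3), "estimates:", [round(e, 3) for e in estimate])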
The goal is also to collaborate with researchers in the robotics community by modelling and
analyzing systems theoretically. In addition to a solid understanding of Markov chains, students
should be interested in collaborating with researchers across different disciplines.
https://dl.acm.org/citation.cfm?id=3237953
Natural language explanations for artificial intelligence
Supervisor: Dr Zheng Yuan
Research areas: artificial intelligence, deep learning, explainable artificial intelligence, natural language
processing
In recent years, artificial intelligence (AI) has been successfully applied to various applications with
the breakthrough of deep learning (DL). Despite the impressive performance, the decision-making
processes of DL models are still generally not transparent or interpretable to humans due to their
‘black-box’ nature.
Explainability is becoming an inevitable part of machine learning systems. This is especially important
in domains like healthcare, education, finance and law where it is crucial to understand the decisions
made by AI systems and build up trust in AI. Several directions for explainable artificial intelligence
(XAI) have been explored, and the majority of explainability methods focus on providing explanations
at the input feature level, which consists of assessing the importance or contribution of each input
feature after the models have been trained and fixed. However, these methods may 1) fail to provide
human-readable explanations, as the underlying features used by AI models can be hard to
comprehend even for expert users (e.g. tokens for text and pixels for images); 2) only detect
incorrectly learned behaviour, without providing any general solution for improvement.
As an appealing new research direction, this project will focus on generating human-friendly and
comprehensive natural language explanations (NLEs) for AI, where NLEs normally consist of natural
language sentences that provide human-like arguments supporting a decision or prediction. In
particular, the aim of this project is to develop AI models that can make use of NLEs to provide better
performance, counteract existing biases in the data, and provide human-readable explanations for the
decisions made by the models. The AI models produced will have the advantage of making use of
explanations and providing human-level explanations, just like how humans both learn from
explanations and explain their decisions.
The project will focus on natural language processing as the primary application area and start from
publicly available NLEs datasets. However, the AI models and techniques produced will be
sufficiently generic such that they can be applied to other areas, such as computer vision, speech
processing, policy learning, and planning.
References:
Arrieta et al. Explainable artificial intelligence (XAI): Concepts, taxonomies, opportunities
and challenges toward responsible AI. Information Fusion 2020.
Camburu et al. e-SNLI: Natural Language Inference with Natural Language Explanations.
NeurIPS 2018.
Liu et al. Towards Explainable NLP: A Generative Explanation Framework for Text
Classification. ACL 2019.
Network Optimisation Algorithms
Supervisors: Professor Tomasz Radzik & Dr Kathleen Steinhofel
Network Optimisation problems are computational problems with input data referring to a network
structure. Such problems occur in computer science, operations research, engineering, and applied
mathematics. From the computer science point of view, the general objective of studying network
optimisation problems is to develop efficient algorithms, which provide strict performance guarantees.
This project will focus on algorithms for network optimisation problems with a dynamic network
structure that changes over time. One of the applications is to provide efficient routing in networks
where individual node-to-node links are not always available.
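As a small illustration of the kind of problem involved, the sketch below computes earliest arrival times in a toy network whose links are only available during given time windows; the network and availability intervals are invented.

import heapq

# (source, target, available_from, available_until, traversal_time)
edges = [
    ("A", "B", 0, 5, 2),
    ("B", "C", 3, 10, 2),
    ("A", "C", 6, 8, 1),
]

def earliest_arrival(start, edges):
    """Dijkstra-style label setting on a network with time-windowed links (waiting allowed)."""
    arrival = {start: 0}
    queue = [(0, start)]
    while queue:
        t, node = heapq.heappop(queue)
        if t > arrival.get(node, float("inf")):
            continue                            # stale queue entry
        for u, v, lo, hi, dt in edges:
            if u != node:
                continue
            depart = max(t, lo)                 # wait until the link opens
            if depart > hi:                     # link already gone
                continue
            if depart + dt < arrival.get(v, float("inf")):
                arrival[v] = depart + dt
                heapq.heappush(queue, (arrival[v], v))
    return arrival

print(earliest_arrival("A", edges))             # {'A': 0, 'B': 2, 'C': 5}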
The Nexus between Crime, Mental Wellbeing and the Built Environment in Urban
Areas
Supervisors: Dr Nishanth Sastry, Dr Rita Borgo & Dr Andrea Mechelli
This project will explore the nexus between crime, mental wellbeing and the built environment in
urban areas, using London as a case study. Using a data-driven approach, the student will develop a
holistic understanding of the spatial and temporal dynamics of crime. For instance, objective notions
of crime such as real-time crime reports from Metropolitan Police can be compared with more
subjective notions of how safe a place "feels", measured using crowdsourcing using the UrbanMind
App. Spatial variation in crime levels and temporal dynamics (night vs. day or weekday vs.
weekend) will be mapped. Machine learning on images from Google Street View, Flickr, etc. can shed
light on how the built environment affects perceived notions of safety and whether it has an effect on
actual crime, as hypothesised by the "Broken Windows" theory. The results will be used to inform
future work on urban wellbeing, as well as urban planning.
A Novel Model-driven AI Paradigm for Intrusion Detection
Supervisor: Dr Fabio Pierazzi
Intrusion Detection Systems (IDSs) are commonly deployed in networks and hosts to identify
malicious activities representing misuse of computer systems. The numbers and types of attacks have
been constantly increasing, and detection based on manually defined signatures is no longer a viable
option. Hence, AI-powered IDS solutions have been explored to keep up with the arms race and scale to
new threats, but they are not yet deployed at scale in companies; this is mostly because such AI-
powered systems cannot be trusted and are not interpretable [1], and they suffer from many false
positives, preventing their applicability in real-world scenarios. In particular, a major limitation is that
most existing solutions for AI-powered IDSs are data-driven, where the relationships learned from the
data are often artifacts or domain-agnostic, and thus harder to trust and interpret even for network
administrators.
This project aims to explore the design of a novel model-driven AI paradigm for intrusion detection,
where expert knowledge is embedded in a model to characterize user behaviors (e.g., through formal
logic [2]), with the purpose of identifying malicious activities with trust, interpretability and
verifiability of the IDS decisions, in particular when deployed to real-world contexts. In other words,
this project aims to advance the state-of-the-art in AI-powered IDSs by integrating expert knowledge
in the models to achieve trust, interpretability and verifiability of decisions. This will increase the
overall safety of protected users by making IDS systems more effective and reliable, and
progress towards industry-wide deployment of AI-based solutions for intrusion detection.
[1] R. Sommer and V. Paxson, "Outside the closed world: On using machine learning for network
intrusion detection." IEEE Symp. Security and Privacy, 2010.
[2] S. Jajodia, N. Park, F. Pierazzi, A. Pugliese, E. Serra, G.I. Simari, V.S. Subrahmanian. A
probabilistic logic of cyber deception. IEEE Transactions on Information Forensics and Security, 2017.
Participatory Agent-Based Modelling of Emergency Department Patient Flow
Supervisors: Dr Steffen Zschaler & Dr Simon Miles
Emergency health care is in crisis; the core "4-hour" KPI has not been met since 2015. Emergency
Departments (EDs) are socio-technical systems with complex interactions between a wide range of
actors and with their urban environment. To help predict how changes in practice will affect the 4-
hour KPI while ensuring patient safety and quality of care, we have been developing agent-based
models (ABMs) of EDs, which can provide explainable analyses of behaviour of complex systems
emerging from the lower-level interaction of large numbers of agents. However, ABMs currently are
implemented in Java or C++, making them too technical to be understood and manipulated by clinical
decision makers. Hence, findings from ABM-based analyses are often not translated into
interventions. In this PhD project, you will explore how using domain-specific languages (DSLs)
closely aligned with clinical staff's conceptualisation of the ED environment will affect acceptance of
ABM. We collaborate with King's College Hospital ED and Westminster City Council.
Personalised Medicine
Supervisors: Professor Costas Iliopoulos & Dr Sophia Tsoka
The explosion of human genomic data is a key driver of the current transition in healthcare to an era
of personalised medicine. The correct assembly and subsequent analysis of this data is, therefore,
crucial.
Algorithms can be designed to provide answers to the vast and varied number of specific questions
that collectively elucidate the (dys)function of a biological entity at the most fundamental level.
These algorithms can be categorised based on their purposes. For example, pattern
matching/discovery algorithms can find biologically significant motifs in sequences; alignment
algorithms can identify the similarity between sequences; and compression algorithms can allow the
latter two problems to be solved in a more time- and space-efficient manner.
Predictive Visual Analytics for Urban Contingency Planning
Supervisors: Dr Rita Borgo & Dr Grigorios Loukides
The aim of this project is to investigate the power of integrating predictive analytics and data
visualization to address the challenge of generation, validation and deployment of contingency plans
in the context of urban-related scenarios.
Based on initial work already conducted in this area by both supervisors, the project will investigate
the development and evaluation of:
algorithms for mining rich city data, both static (stored by the London Councils) and real-time
(retrieved from mobile platforms, remote sensors, and social media), in order to predict
emergencies efficiently and accurately;
novel visual encodings to enhance the diagnostic and predictive capabilities of mining
algorithms, through their integration within a flexible visual analytics system capable of
supporting and leveraging domain expert knowledge.
Application domain: contingency plans are employed across different organizations, from
government to businesses, to minimise risk of catastrophic impacts of unexpected events. Research
will have impact beyond the city remit.
Privacy in the Internet of Things
Supervisor: Professor Maribel Fernandez
Data Collection policies are used to restrict the kind of data transmitted by devices in the Internet of
Things (e.g., health trackers, smart electricity meters, etc.) according to the privacy preferences of the
user. The goal of this project is to develop cloud/IoT architectures with integrated data
collection and data sharing models, to allow users to specify their own policies and trade data for
services. For this, new data collection and data sharing models will have to be developed, with
appropriate user interfaces, policy languages, and policy enforcement mechanisms. An important
aspect of the project is the development of policy recommendation systems that can suggest/create
policies based on user profiles, making privacy an integral part of the system (according to the
“privacy-by-design” IoT paradigm).
Programming as an HCI Challenge - IDE Interaction Design
Supervisors: Professor Michael Kolling & Dr Neil Brown
Frame-based editing with Stride [1] was a first attempt to revisit the design of program editing from
an HCI perspective, in the context of novice programmers. What would it look like to approach
professional IDEs from this perspective?
This project would take a Stride-like approach to professional tools and design, build and evaluate a
new, better editor.
[1] https://www.greenfoot.org/frames/
Program editor design for accessibility
Supervisors: Professor Michael Kolling & Dr Neil Brown
Most program editors use text for editing. Screen readers can be used with text-based editing by
visually-impaired programmers, but the syntax can often be confusing. Whitespace and punctuation
are highly significant in program text but are often omitted by, or interact poorly with, screen readers.
Block-based editors rely less on syntax, so are potentially more suitable for accessible programming -
but blocks are often manipulated only through drag-and-drop interactions which are ill-suited to
visually-impaired users. Our existing Stride editor combines keyboard interactions with structural
programming, but does not yet have support for accessibility tools. This project would look at
improving the Stride editor to work well with accessibility tools, especially for vision-impaired users,
including the design, implementation and evaluation of the editor with actual users.
Programming history for learning and reflection
Supervisors: Professor Michael Kolling & Dr Neil Brown
Version control provides a way to store and view the history of program code. This is generally
considered an advanced tool, used for collaborating or once a programmer is working on a large code
base. This project would investigate the implications of using built-in automatic version control. Can
this help during novice program development, can it help students in reflecting on their learning
progress, and could it be used to provide more accurate programming assessment? This would involve
the design, development and multiple evaluations of automatic version-control in a beginner's IDE.
Safe Reinforcement Learning from Human Feedback
Supervisor: Dr Yali Du
Reinforcement learning (RL) has become a new paradigm for solving complex decision making
problems. However, it presents numerous safety concerns in real-world decision making, such as
unsafe exploration, unrealistic reward functions, etc. [1]. Since humans understand the dangers, human
involvement in the agent's learning process is a promising way to boost AI safety [2,3].
Early research [2] adopted human preferences as a replacement for reward signals, without considering
the safety and trustworthiness of agents. [3] uses human guidance in a supervised learning manner;
agents ask for guidance randomly, without adapting to their knowledge of the environment or task.
This project considers leveraging human feedback to build safe RL agents based on symbolised
preference or abstracted states. On the one hand, symbolic feedback can be easily generated by
humans and effectively applied to the agent learning phase, such as a human's binary preference on an
agent's actions or policies. On the other hand, different approaches for state abstractions will be
considered to build up the knowledge base of safe or dangerous behaviours, such as spatial-temporal
abstractions [4].
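As a minimal illustration of the symbolic-feedback idea, the sketch below fits a linear reward model from pairwise "preferred vs. rejected" examples using a Bradley-Terry style logistic loss, in the spirit of preference-based RL [1, 2] but not reproducing any specific algorithm; the trajectory features and preferences are invented.

import math
import random

def fit_reward(preferences, dim, lr=0.1, epochs=200):
    """preferences: list of (features_preferred, features_rejected) pairs; returns linear reward weights."""
    w = [0.0] * dim
    for _ in range(epochs):
        fa, fb = random.choice(preferences)
        ra = sum(wi * xi for wi, xi in zip(w, fa))
        rb = sum(wi * xi for wi, xi in zip(w, fb))
        p = 1.0 / (1.0 + math.exp(rb - ra))            # P(prefer a over b) under Bradley-Terry
        for i in range(dim):
            w[i] += lr * (1.0 - p) * (fa[i] - fb[i])   # gradient ascent on the log-likelihood
    return w

# Toy features: (progress towards goal, entered unsafe region)
prefs = [([1.0, 0.0], [1.0, 1.0]),   # safe progress preferred over unsafe progress
         ([0.5, 0.0], [0.0, 0.0])]   # some progress preferred over none
print(fit_reward(prefs, dim=2))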
Based on the symbolised preferences and behaviour abstractions, there are several potential scenarios
to be explored:
1. Active parenting. Firstly, human guidance is like when parents say “no” to, or redirect, a
toddler learning to walk who attempts something dangerous. A parent agent that is
knowledgeable of the dangerous states can provide guidance to an AI agent: when the AI
agent attempts to enter a dangerous state, the parent agent, with its knowledge of the
dangerous set, will forbid the AI agent from doing so.
2. Active learning. Secondly, the parent agent does not proactively provide guidance to the AI
agent but only helps when the AI agent asks for it. The AI agent will have two policies: one
for decision making, and the other for deciding whether it should ask the parent for guidance.
3. Sharing autonomy. Explainable models can be employed to predict situations where the AI
agent is not performing well. On such occasions we can take control from the agent and ask
for expert/human advice. The key challenge is to achieve a balance between exhausting
experts and reducing the false negative rate of prediction of unsafe situations.
References:
[1] Wirth C, Akrour R, Neumann G, Fürnkranz J. A survey of preference-based reinforcement
learning methods. Journal of Machine Learning Research. 2017;18(136):1-46.
[2] Christiano PF, Leike J, Brown TB, Martic M, Legg S, Amodei D. Deep reinforcement learning
from human preferences. In Proceedings of the 31st International Conference on Neural Information
Processing Systems (NeurIPS), 2017, pp. 4302-4310.
[3] Frye C, Feige I. Parenting: Safe reinforcement learning from human input. arXiv preprint
arXiv:1902.06766.
[4] Zahavy, T., Zrihem, N. Ben, & Mannor, S. (2016). Graying the black box: Understanding
DQNs. 33rd International Conference on Machine Learning (ICML) 2016, 4, 2809-2822.
Security and Safety of Cyber-Physical Systems
Supervisor: Professor Luca Vigano
Cyber-Physical Systems (CPSs) are integrations of networking and distributed computing systems
with physical processes and associated instrumentation that monitor and control entities in the
physical environment, with feedback loops where physical processes affect computations and vice
versa. Emerging applications of CPS include all the essential pieces of our social infrastructure:
telecoms, banking, manufacturing, health, energy, transportation, government, and smart cities. CPSs have
effectively become one of the driving factors of the so-called fourth industrial revolution (Industry
4.0), but all the new opportunities opened by CPSs will only materialize if we can ensure their
security and safety.
However, this need is often not addressed in current practice because of the major challenges that are
posed by the heterogeneous and distributed nature of the systems and their interaction with the
physical world and with the human users. As a consequence, there has been a dramatic increase in the
number of attacks, e.g., influencing physical processes to bring the system into an undesired state.
System failure can be extremely costly and threaten not only the system's environment but also
human life.
The main aim of this PhD project will be to develop model-based AI techniques for representing,
analysing and reasoning about the security and safety of both the technical components of a CPS
(control, computation, communication) and its social components (e.g., user interaction processes and
user behavior) together and at the same time. The goal will be to overcome the limits of the state-of-
the-art to devise methodologies and technologies for the formal validation of properties of CPSs to
include the human element together with the technical in a holistic, socio-technical approach for
security and safety, and to feed the findings back to the users through behavior change techniques.
This will greatly simplify the design, development, deployment, and management of socio-technically
secure CPSs, and thus have a disruptive and lasting impact.
Smart Metering Voice Controlled Devices
Supervisors: Dr Rita Borgo and Dr Alfie Abdul-Rahman
Communication is an integral part of our daily lives, and no means of communication is more
significant than the human voice.
The advent of the Internet of Things (IoT) and advances in computing technologies and natural
language processing have made it possible to exploit voice recognition in the context of voice-controlled
devices. Such devices capture the user's spoken words and employ sophisticated AI frameworks to
analyse, interpret and act on their underlying intention. They share the same challenges as any IoT
device, that is, privacy of the information flow and security of the system.
Oftentimes the vulnerability of voice-controlled devices resides not so much in potential cyber-attacks
as in the technology itself and its maker's interpretation of user privacy, a key element of trust in
autonomous systems and, as a consequence, of the safety of the user's data and persona.
In this project we are interested in moving the attention from the system itself to the user side. We will
focus our attention on devices such as Alexa and Google Nest, where the human-AI interaction
pervades multiple levels of a user's life context.
The volume of information exchanged is large and varied, and may touch critical aspects of one's life
which may, or may not, be explicitly interconnected. In this scenario the nature and type of information
fed to the AI through the user-AI communication channel matters greatly, yet there is no means of
returning control of such flows to the user.
In this project we will leverage Visualization, Natural Language Processing and Human-Computer
Interaction as means to model the human-AI dialogue, its domain and parameter space. Core to this
will be the ability to:
characterise the nature of information flows (human-AI and vice versa);
develop metrics to estimate the level of privacy of information exchanged, volunteered by the
user and pried by the system, in each communication flow;
develop metrics to estimate the level of privacy when cross referencing information exchanged
within more than a single flow.
Here a flow can be seen as either a temporal instance or a thematic instance.
Starting from these core elements we will aim to define a novel framework capable of supporting
human understanding and agency within the human-AI dialogue dynamic and its characteristics, with
special focus on the specific context of home voice-activated devices. We aim to explore users' abilities
to gauge the level of threat versus gain incurred through the use of voice-controlled devices, and to
investigate how to empower the user to make informed decisions with respect to which elements of a
dialogue could potentially be released to the AI system and for which deletion should be required, and
to appraise the level of threats and risks associated with each choice.
Outcomes of the proposed research will be grounded in theoretical foundations, validated and verified
through empirical evaluation. Methods to achieve the project goals will include, but are not limited to:
symbolic modelling to map user vs system knowledge, sentiment analysis and topic modelling,
information visualization, grounded theory.
Software Verification and Nominal Dependent Type Theory
Supervisor: Professor Maribel Fernandez
Dependent Type Theory is a mathematical tool to write formal specifications and prove the
correctness of software implementations. The proof assistants used to certify the correctness of
programs (such as Coq) are based on dependently-typed higher-order abstract syntax. The goal of
this project is to explore alternative foundations for proof assistants using nominal techniques. The
nominal approach has roots in set theory and has been successfully used to specify programming
languages. This project will focus on the combination of dependent types and nominal syntax, and
explore the connections between the nominal approach and the higher-order syntax approach used in
current proof assistants.
String Sanitisation with Applications to Internet of Things Data
Supervisors: Dr Grigorios Loukides, Professor Costas Iliopoulos, Professor Luca Vigano
The overall aim of the project is to develop and evaluate a robust and efficient approach that allows
organisations and businesses to protect the privacy of data represented as strings. The project will
consider the protection of aggregated data (event sequences), as well as string databases, and it will
also address the interrelated issues of usefulness, security, and scalability. It aims to develop a
methodology (model, algorithms, protocols) for sanitising (i.e., transforming) data that is: (I) privacy-
preserving, by designing and applying a privacy model along with algorithms for sanitising string data.
(II) Utility-preserving, by designing measures and tools for quantifying the level of usefulness of data
that must be traded-off for achieving privacy. (III) Secure and scalable, by designing efficient
protocols that allow multiple parties to securely and jointly protect their data. The methodology will
be evaluated on data from the Internet of Things (IoT) domain.
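The sketch below illustrates only the most basic sanitisation operation, masking every occurrence of a set of invented sensitive patterns in an event string and reporting how many positions were lost; the project's models would instead reason about which occurrences to hide and at what utility cost.

def sanitise(text, sensitive_patterns, mask="#"):
    """Mask every character covered by any occurrence of a sensitive pattern."""
    masked = set()
    for pattern in sensitive_patterns:
        start = text.find(pattern)
        while start != -1:
            masked.update(range(start, start + len(pattern)))
            start = text.find(pattern, start + 1)     # also catches overlapping occurrences
    sanitised = "".join(mask if i in masked else c for i, c in enumerate(text))
    return sanitised, len(masked)                      # masked count: crude proxy for utility loss

events = "abacabadabacaba"                             # invented event string
print(sanitise(events, ["aba", "dab"]))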
Temporal and Resource Controllability of Workflows of Autonomous Systems
Supervisor: Professor Luca Vigano
Workflow technology has long been employed for the modeling, validation and execution of business
processes, and will play a crucial role in the design, development and maintenance of
future autonomous systems. A workflow is a formal description of a business process in which
single atomic work units (tasks), organized in a partial order, are assigned to processing entities
(agents) in order to achieve some business goal(s). Workflows can also employ workflow paths in
order (not) to execute a subset of tasks. A workflow management system coordinates the execution
of tasks that are part of workflow instances such that all relevant constraints are eventually
satisfied.
Temporal workflows specify business processes subject to temporal constraints such as controllable or
uncontrollable durations, delays and deadlines. The choice of a workflow path may be controllable or
not, considered either in isolation or in combination with uncontrollable durations. Access controlled
workflows specify workflows in which users are authorized for task executions and authorization
constraints say which users remain authorized to execute which tasks depending on who did what.
Access controlled workflows may also consider workflow paths, in addition to the uncertain availability
of resources. When either a task duration or the choice of the workflow path to take or the availability of
a user is out of control, we need to verify that the workflow can be executed by verifying all
constraints for any possible combination of behaviors arising from the uncontrollable parts. Indeed,
users might be absent before starting the execution (static resiliency), they can also become so during
execution (decremental resiliency) or they can come and go throughout the execution (dynamic
resiliency).
Temporal access controlled workflows merge the two previous formalisms by considering several
kinds of uncontrollable parts simultaneously. Authorization constraints may be extended to
support conditional and temporal features.
This PhD project will aim to ensure the safety and trust of autonomous systems by reasoning about
the temporal and resource controllability under uncertainty of the workflows that govern them.
Towards Protection of Users in Online Social Networks
Supervisor: Dr Guillermo Suarez de Tangil
Methods currently used to detect unwanted content in Online Social Networks (OSN) suffer from a
number of limitations. First, they are prone to produce unfair decisions dominated by the skewed
populations they are modelled on. Second, algorithms are unable to explain why certain content has
been flagged as unwanted.
Active adversaries are currently exploiting these flaws to evade the detection mechanisms placed in
current OSN. This applies to several application domains such as: i) the use of spear phishing or
malware attacks to enable cyber-dependent crime, ii) the use of coordinated harassment campaigns to
deliver harmful or deceiving content (e.g., politically-biased memes), or iii) the use of commercially-
driven sensational content to deliver fake news.
The purpose of this PhD is to devise new disciplines aiming at protecting users, and especially minors,
from malicious actions in OSN and understanding novel threats. The scope of the project will focus on
studying the problem of online aggression from a broad perspective.
Tracing Trust - Visual Frameworks for Explainable AI
Supervisors: Dr Rita Borgo, Dr Daniele Magazzeni, Dr Alfie Abdul-Rahman
Explainable Artificial Intelligence (XAI) is a topic receiving close review and increasing interest
across different fields. Crucial to explainability is understanding of cause-effect relationships which in
complex intelligent systems are anything but clear. The inability to present the rationale behind a
decision-making process inevitably undermines trust and introduces uncertainty with respect to
accountability for consequences.
The proposed research program will focus on the creation of a theoretical and applied framework to
support the creation of systems to help people interpret the reasoning behind decisions made by AI
systems. The project will entail design, implementation, and testing of visualization interfaces
connecting to and integrating with explainable intelligent systems designed by partners.
This project places itself across three different fields: visual analytics, human-computer interaction,
and artificial intelligence.
Hub relevance: Autonomous Systems
The Underground Economy: Understanding and Modelling Misuse in the Darkest
Corners of the Web
Supervisor: Dr Guillermo Suarez de Tangil
Underground markets play a key role in the proliferation of cybercrime. Users with few technical
skills can easily acquire services and tools in the darknet to set up their own criminal operations. An
example is illicit crypto-mining campaigns, which use botnets rented in such markets to leverage stolen
resources and covertly mine 4.4% of all Monero in circulation (circa 58 million USD) [1].
These campaigns generate massive income for cyber-criminals, which fuels the underground economy
and drives other cyber-criminal activities. More importantly, these threats generally cause significant
economic losses to victims.
The purpose of this PhD is to develop data-driven approaches to better understand how these
communities are structured and the type of crimes they support.
[1] A First Look at the Crypto-Mining Malware Ecosystem: A Decade of Unrestricted Wealth.
Sergio Pastrana and Guillermo Suarez-Tangil. ACM Internet Measurement Conference (IMC). Oct
21-23, 2019. Amsterdam, Netherlands
Understanding Cyber-Dependent Crimes that are enabled by Malware from a
Software Development Perspective
Supervisors: Dr Guillermo Suarez de Tangil & Professor Luca Vigano
The goal of this thesis is to better understand cyber-dependent crimes that are enabled by malware
from a software development perspective. The purpose is threefold: a) to profile malware developers,
b) to understand their business model, and c) to measure the impact of malware trading in
underground markets and surface forums. Throughout this thesis, the PhD candidate will learn how
to reverse engineer malicious code and feed this information to different machine learning algorithms.
The candidate will also be conducting malware-related measurements in underground markets and
darknet forums. The qualifications obtained by the candidate will be relevant to different stakeholders
such as: i) law-enforcement and anti-crime agencies when designing strategies to prosecute these
actors, ii) incident response teams and forensic analysts to make informed decisions when malware is
discovered, and iii) national advisory centres to understand novel infection vectors or CERTs to
design both mitigation and early detection strategies.
Unifying Principals in Safe and Trusted Assistive AI
Supervisor: Dr Yali Du
AI agents are often required to assist humans in many day-to-day tasks, such as in recommendation
systems, restaurant reservation and self-driving cars [1]. As AI agents are frequently evaluated in
terms of performance measures, such as human-stated rewards, many challenges are posed. Firstly,
due to the involvement of multiple users, agents have to learn to strike a balance between the widely
different human preferences [3]. Secondly, while it is usually assumed that humans are acting honestly
in specifying their preference, such as by rewards or demonstrations, the consequence of humans mis-
stating their objectives is commonly underestimated. Humans may maliciously or unintentionally mis-
state their preferences, leading the assistive AI agent to behave in unexpected ways. An
example is the Tay chatbot from Microsoft: prankster users falsified their demonstrations and trained
Tay to mix racist comments into its dialogue.
This project aims to unify many principals to achieve fairness and social welfare, towards building safe
and trustworthy assistive AI agents that avoid bias and manipulation like the Tay chatbot. The human
preference can be explicitly stated as ‘like’ or ‘dislike’ of the agent's performance, or implicitly stated
through demonstrations. Two popular learning paradigms can be considered: Reinforcement
Learning (RL) from specified preferences [1] and Apprenticeship Learning (AL) [2], with human
values implicitly expressed by demonstrations. In reinforcement learning, agents learn to
perform given tasks based on preferences. In apprenticeship learning, agents observe human
demonstrations (historical trajectories) that reveal human interests, and learn to perform tasks that align
with human values.
Example questions that can be explored:
1. Multi-objective learning: given the objectives specified either by reward or demonstrations,
how can we balance the different and possibly conflicting objectives from users?
2. Manipulating the assistive learning: a famous result from social choice theory is that any non-
trivial collective decision is subject to manipulation [4]. How easy is it for one or several users to
change the behaviour of an assistive agent, or how can a human bias the system towards their
own interest? By studying how to manipulate assistive learning, the ultimate goal is still to
develop robots that can represent multiple humans' interests fairly and correctly.
References:
[1] Chen, X., Du, Y., Xia, L., & Wang, J. (2021). Reinforcement recommendation with user multi-
aspect preference. The Web Conference 2021 - Proceedings of the World Wide Web Conference
(WWW) 2021, 425-435. https://doi.org/10.1145/3442381.3449846
[2] Fickinger, A., Zhuang, S., Critch, A., Hadfield-Menell, D., & Russell, S. (2020). Multi-Principal
Assistance Games: Definition and Collegial Mechanisms. NeurIPS, 2020, 1-10.
[3] McAleer S, Lanier J, Dennis M, Baldi P, Fox R. Improving Social Welfare While Preserving
Autonomy via a Pareto Mediator. arXiv preprint arXiv:2106.03927. 2021.
[4] Allan Gibbard. Straightforwardness of game forms with lotteries as outcomes. Econometrica:
Journal of the Econometric Society, pages 595-614, 1978.
Unstructured Big Data
Supervisors: Professor Costas Iliopoulos & Dr Grigorios Loukides
A major challenge in today's society is the explosive growth of unstructured data such as text,
images, videos and speech data. These forms of data exhibit the three characteristics of velocity,
volume and variety that make processing and comprehending them a challenging task.
The initial processing of this data is invariably done using automated methods, as manual processing
would be prohibitively expensive. The output of this automated processing is uncertain, either due to
inaccuracies or inconsistencies in the raw data, or due to the automated processing. The database
community has recognised this phenomenon in recent years, and several probabilistic formulations of
uncertain data have been proposed, with a focus on processing SQL-like or ranking queries on such
data. However, the science of mining, pattern analysis and pattern discovery on uncertain data
expressed in probabilistic terms is very much in its infancy.
Mining probabilistic uncertain data to obtain reliable and actionable information is a critical
challenge. With the proliferation of "data science pipelines", uncertainties in one stage can propagate
and be magnified in later stages. It is essential both that uncertainty is processed appropriately by the
system and that the data is not artificially made certain by, for example, choosing the most probable
outcome at each stage.
The central hypothesis of this proposal is that the new field of algorithms on uncertain sequences
that we propose is an important and broad foundation for representing and mining uncertain data
arising in a wide variety of contexts. In addition, novel algorithmic techniques and ideas will be
needed and could be useful for other high-throughput data processing. The breadth of the proposed
area of investigation can be illustrated by the three abstract models given below:
A) Probabilistic sequences: these model a number of real-world data sources (a small illustrative
sketch follows this list), such as:
- DNA sequences, either to represent single nucleotide polymorphisms, or errors introduced by
wet-lab sequencing platforms during the process of DNA sequencing.
- Conversion of sensor readings into meaningful human actions (e.g., accelerometer readings
into kinds of human activity, using blood pressure/voice pitch to infer emotions), since the
conversion process is intrinsically uncertain.
- Software behaviour, which is often characterised in terms of sequences of events, such as the
order of user interactions with a GUI or a web page, the order of function invocations within a
program, or the order in which network packets are sent to a server. A common assumption is
that system behaviour is deterministic; it is, however, easy to envisage situations in which this
assumption is violated: network packets might arrive in a different order depending on their
route through a network, a unit of code might include stochastic behaviour arising from random
number generators, or concurrent processes might interleave differently.
B) Uncertain Event Sequences: these arise from a number of sources, including measurement error,
randomness in the underlying phenomenon, and distributed and asynchronous data gathering. They
are used in a number of real-world scenarios to model and analyse spatial or temporal data, which is
of interest in disciplines as diverse as computational neuroscience, earth science and
telecommunications. Marked event sequences are even more general and can be applied to computer
and economic systems, for example.
C) Uncertain Time Series: these are most naturally associated with measurement errors, but can
directly represent a range of variation (e.g. high/low stock prices in a day's trading, confidence
intervals for predictions) or deliberate obfuscation for reasons of privacy preservation. They can be
seen as special cases of event sequences, but while in uncertain time series the uncertainty lies in
the value, in uncertain event sequences the uncertainty is in the time at which the event occurred.
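To make model A) concrete, the following sketch computes, under an independence assumption between positions, the probability that a fixed pattern occurs at each alignment of a small invented probabilistic sequence; occurrences above a chosen probability threshold would then count as valid matches.

def occurrence_probabilities(prob_seq, pattern):
    """For every alignment, return the probability that `pattern` occurs there,
    assuming positions are independent."""
    probs = []
    for i in range(len(prob_seq) - len(pattern) + 1):
        p = 1.0
        for j, symbol in enumerate(pattern):
            p *= prob_seq[i + j].get(symbol, 0.0)
        probs.append(p)
    return probs

# Each position holds a distribution over symbols (invented DNA-like example).
prob_seq = [
    {"A": 0.9, "C": 0.1},
    {"A": 0.5, "G": 0.5},
    {"C": 1.0},
    {"A": 0.8, "T": 0.2},
]
print(occurrence_probabilities(prob_seq, "AC"))   # [0.0, 0.5, 0.0]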
Research Programme and Methodology:
We will focus on highly scalable methods for discovering repetitive structures in uncertain sequences.
Given the uncertainty in the underlying data, these repetitions will of necessity be approximate,
rather than exact.
There are two major technical obstacles to overcome: firstly, classical measures of approximation
(edit or Hamming distance) are inadequate to measure similarity between uncertain sequences. One
objective of this project is to define new, alternative, well-founded and powerful approaches for
measuring similarity between sequences. Secondly, we need to develop novel algorithmic
techniques for solving problems in the context of uncertain sequences.
These problems are directly motivated by bioinformatics applications, such as studying genetic
mutations; DNA sequence analysis of antibodies and identification of "hairpins" that occur in DNA
sequences in Tuberculosis and HIV virus strains, respectively. However, they are also closely
related to pattern discovery tasks that arise in other problem domains. Furthermore, they are also the
most intensively studied problems in mining time series data and, to the best of our knowledge, these
problems have not been considered in the uncertain time series framework, and it is not at all clear
how to extend the known methods to this case. A solution is to build upon the experience we have
in musical and biological computational pattern analysis, which shares some characteristics with
uncertain sequence processing, to suggest lines of attack.
Objectives:
1. Devise appropriate uncertain / probabilistic sequence formulations for modelling large-
scale complex heterogeneous data.
2. Develop highly-scalable algorithms for pattern / motif discovery and sequential pattern
mining in uncertain sequence data.
3. Build a theoretical framework for pattern discovery in dynamic, streaming and high-
throughput uncertain sequence data.
4. Develop robust and well-founded methods for inferring actionable models of uncertain
sequence data.
5. Devise appropriate and tractable formal frameworks for modelling stochastic
dependencies in uncertain sequence data.