Shot charts of Allen Iverson 2005-06 vs. James Harden 2018-19. Taken from Kirk Goldsberry's Twitter: https://twitter.com/kirkgoldsberry/status/1108030357570371584

Ryan Chan

I'm currently a Research Software Engineer working in the Research Engineering Team at The Alan Turing Institute.

Prior to joining the Research Engineering Team at Turing, I completed my PhD in Statistics at the Institute and the University of Warwick. During my PhD, I worked on developing scalable Monte Carlo methods under the supervision of Professor Gareth Roberts (Warwick) and Dr Murray Pollock (Newcastle).

I completed my undergraduate degree at the University of Leeds, where I was awarded a first class honours MMath Mathematics degree and received the Royal Statistical Society Prize for outstanding performance in my studies.

Research & Projects

During my PhD, my main research focus was developing Monte Carlo methods for combining distributed statistical analyses (Fusion). These days, I work less on Monte Carlo methods, and my role as a research software engineer at the Turing has allowed me to explore a number of interesting research areas. Some projects that I have been involved with at the Turing are:

  • Rough Paths
    • Developing software and applications of Rough Path theory and path signatures - a mathematical tool for describing complex streams of data in a structured and standardised way.
    • Working on applying path signatures in natural language processing problems with the DataSig research team.
      • Check out nlpsig, a Python library for constructing paths of embeddings obtained from transformers.
      • Check out sig-networks, a PyTorch library for training and evaluating signature-window neural networks for longitudinal NLP tasks - see our systems demonstration paper on this work here.
      • Check out SigMahaKNN, a Python implementation of an algorithm for anomaly detection on multivariate streams using path signatures and the variance norm - for details, see our paper on this work here.
  • Reginald: A friendly Turing Slack bot
    • A side project born out of the Research Engineering Team's Hackweek 2023 at the Turing.
    • Reginald is a simple Slack bot written in Python that listens for direct messages and @mentions in any channel it is in and responds with a message and an emoji. It generates responses with a Retrieval Augmented Generation (RAG) model built using llama-index.
    • Check out the Reginald project on GitHub.
  • IceNet
    • IceNet is a probabilistic, deep learning sea ice forecasting system developed by an international team and led by British Antarctic Survey and The Alan Turing Institute.
    • Contributed to the development of a machine learning pipeline for operational use of the IceNet model.
    • IceNet was nominated for the Arctic Circle's Frederik Paulsen Arctic Academic Action Award.
  • AIrsenal & AIrgentina
    • AIrsenal is a Python package for using machine learning to pick a Fantasy Premier League team. You can read more about the model here.
    • AIrgentina is a model for predicting the results of the 2022 World Cup using a framework based on the team-level model used in AIrsenal. We modified AIrsenal to make it better suited to predicting international results, and our resulting AIrgentina model came 6th in @Futbolmetrix1's WC2022 Sophisticated Prediction Contest, outperforming models developed by FiveThirtyEight, Opta and Betfair! You can read more about the model here.
  • Foundation Models Reading Group
    • I run a reading group on Foundation Models at the Turing; find out more about it here.

Previous projects

  • Recommendation Systems for Podcast Discovery (ATI Data Study Group, April 2021)
    • Participated in a project with Entale to develop podcast recommendation systems, where I primarily worked on developing a topic model to recommend new podcasts based on their similarity to topics that have previously interested a listener.
    • Allowed me to gain some general experience working with natural language processing methods, collaborative filtering, text mining, clustering algorithms, dimension reduction techniques and recommender systems using Python. You can read more about the project here.
  • Bayesian Sports Modelling
    • In my masters project, I investigated the applicability of Bayesian hierarchical models for predicting the outcome of football matches. We were able to develop models that achieved a greater prediction accuracy than existing models in the literature.
    • R and Stan were used to implement various models; the code and report can be found here.

Research Interests

  • Machine Learning
  • Natural Language Processing
  • Evaluating LLMs
  • Deep Learning and AI
  • Bayesian statistics
  • Computational Statistics & Monte Carlo methods
  • Markov Chain Monte Carlo & Sequential Monte Carlo

Publications

  • Tseriotou, T., Chan, R.S.Y., Tsakalidis, A., Bilal, I.M., Kochkina, E., Lyons, T., Liakata, M. Sig-Networks Toolkit: Signature Networks for Longitudinal Language Modelling. Submitted. [arXiv].
  • Shao, Z., Chan, R.S.Y., Cochrane, T., Foster, P., Lyons, T. 2023. Dimensionless Anomaly Detection on Multivariate Streams with Variance Norm and Path Signature. Submitted. [arXiv].
  • Mougan, C., Plant, R., Teng, C., Bazzi, M., Ejea, A.C., Chan, R.S.Y., Jasin, D.S., Stoffel, M., Whitaker, K.J. and Manser, J. 2023. How to Data in Datathons. 37th Conference on Neural Information Processing Systems (NeurIPS 2023) Track on Datasets and Benchmarks. [Paper].
  • Chan, R.S.Y., Johansen, A.M., Pollock, M., and Roberts, G.O. 2023. Divide-and-Conquer Fusion. The Journal of Machine Learning Research, 24(193):1−82. [Paper].
  • Data Study Group team. 2022. Data Study Group Final Report: Entale. Zenodo. [Report].
  • Chan, R.S.Y., Johansen, A.M., Pollock, M., and Roberts, G.O. 2021. Divide-and-Conquer Monte Carlo Fusion. [arXiv].
  • Chan, R. and Dai, H. 2020. Discussion of "Quasi-Stationary Monte Carlo and the ScaLE Algorithm" by Pollock, Fearnhead, Johansen and Roberts. JRSS B. [Article]. [Video].

Talks, Conferences and Awards


Presentations

Greek Stochastics Nu [Naxos]. 07/07/23.
AIUK Demo (IceNet) [London]. 21/03/23.
Lunchtime Tech Talk [The Alan Turing Institute]. 25/10/22.
2022 World Meeting of the International Society for Bayesian Analysis [Montreal]. 28/06/22.
Data Study Group (April 2021) Entale [The Alan Turing Institute]. 29/04/21.
Young Researchers' Meetings (YRM) [University of Warwick]. 09/02/21.
Statistics Group Seminar [Newcastle University]. 10/07/20.
RSS Discussion Meeting for "Quasi-stationary Monte Carlo methods and the ScaLE Algorithm" [The Royal Statistical Society]. 24/06/20. [Video].
The Student Seminar Series [The Alan Turing Institute]. 21/04/20.
University of Warwick Departmental Conference [Gregynog]. 24/03/20.
Algorithms & Computationally Intensive Inference Seminar [University of Warwick]. 08/03/19.
Student Seminar [University of Leeds]. 22/05/18. [Handout].

Posters

2022 World Meeting of the International Society for Bayesian Analysis [Montreal]. 28/06/22.
Bayes at CIRM 2021 [Marseille]. 25/10/21.
2021 World Meeting of the International Society for Bayesian Analysis [Online]. 28/06/21.
The Turing Research Showcase [Online]. 24/09/20. [Video].
Greek Stochastics Lambda [Corfu]. 28/08/19.
O'Bayes 2019 [University of Warwick]. 29/06/19.

Awards

The Alan Turing Institute Doctoral Studentship (2018-2022)
The Royal Statistical Society Prize (2018)
The Top 10 Scholarship, awarded to the top 10 undergraduate students at the School of Mathematics, University of Leeds (2015, 2016, 2017)
Summer Vacation Bursary Scheme, awarded to undertake a research project with the School of Mathematics, University of Leeds (2016, 2017)

Miscellaneous

Thesis: Monte Carlo methods for combining sample approximations of distributions; Examined by Professor Nicolas Chopin and Dr. Krzysztof Łatuszyński.
Advent of Code 2022 with Julia
R package to implement Fusion methodologies (for unifying distributed analyses): DCFusion
R package to simulate layered Brownian bridges: layeredBB
Advent of Code 2021 with R
Masters Thesis: Bayesian Sports Modelling
Undergraduate project: Automatic Puzzle Solving
Fundraising for FareShare: Cycling challenge