Shot charts of Allen Iverson 2005-06 vs. James Harden 2018-19. Taken from Kirk Goldsberry's Twitter: https://twitter.com/kirkgoldsberry/status/1108030357570371584

Ryan Chan

I'm currently a Research Software Engineer working in the Research Engineering Team and in the Fundamental AI research group at The Alan Turing Institute.

At the Turing, I run the Robots in Disguise reading group on fundamental AI and deep learning research. You can find out more about it here. I have presented on a number of topics in this reading group and links to the slides can be found under the Presentations section below.

Prior to joining the Research Engineering Team at Turing, I completed my PhD in Statistics at the Institute and the University of Warwick. During my PhD, I worked on developing scalable Monte Carlo methods under the supervision of Professor Gareth Roberts (Warwick) and Dr Murray Pollock (Newcastle).

I completed my undergraduate degree at the University of Leeds, where I was awarded a first-class honours MMath Mathematics degree and received the Royal Statistical Society Prize for outstanding performance in my studies.

Research & Projects

My main research focus during my PhD was developing Monte Carlo methods for combining distributed statistical analyses (Fusion). These days, I work less on Monte Carlo methods, and my role as a research software engineer at the Turing has allowed me to explore a number of interesting research areas. Some projects that I have been involved with recently are:

  • Maya: Multimodal Aya (Cohere For AI's Expedition Aya 2024 project)
  • prompto: supporting reproducible experiments on LLMs at scale
    • prompto is an open source software package developed to support reproducible experiments on Large Language Models at scale. The tool interacts with self-hosted open source models as well as proprietary API-based services, and makes use of their individual rate and token limits to optimise large-scale evaluations.
    • A pre-print of this work can be found here.
  • Reginald: A friendly Turing Slack bot
    • A side project born out of the Research Engineering Team's Hackweek 2023 at the Turing.
    • Reginald is a simple Slack bot written in Python that listens for direct messages and @mentions in any channel it is in and responds with a message and an emoji. It generates its responses with a Retrieval Augmented Generation (RAG) model built using llama-index. We have also written a command-line interface to interact with the model directly in the terminal.
  • Rough Paths
    • Developing software and applications of Rough Path theory and path signatures - a mathematical tool for describing complex streams of data in a structured and standardised way.
    • Working on applying path signatures in natural language processing problems with the DataSig research team.
      • Check out nlpsig, a Python library for constructing paths of embeddings obtained from transformers.
      • Check out sig-networks, a PyTorch library for training and evaluating signature-window neural networks for longitudinal NLP tasks - see our EACL2024 systems demonstration paper on this work here.
      • Check out SigMahaKNN, a Python implementation of an algorithm for anomaly detection on multivariate streams using path signatures and the variance norm - for details, see our paper on this work here.
  • IceNet
    • IceNet is a probabilistic, deep learning sea ice forecasting system developed by an international team and led by British Antarctic Survey and The Alan Turing Institute.
    • Contributed to the development of a machine learning pipeline for operational use of the IceNet model.
    • IceNet was nominated for the Arctic Circle's Frederik Paulsen Arctic Academic Action Award.
  • AIrsenal & AIrgentina
    • AIrsenal is a Python package for using machine learning to pick a Fantasy Premier League team. You can read more about the model here.
    • AIrgentina is a model for predicting the results of the 2022 World Cup using a framework based on the team-level model used in AIrsenal. We modified AIrsenal to make it more suited to predicting international results, and our resulting AIrgentina model came 6th in @Futbolmetrix1's WC2022 Sophisticated Prediction Contest, outperforming models developed by FiveThirtyEight, Opta and Betfair! You can read more about the model here.

Previous projects

  • Recommendation Systems for Podcast Discovery (ATI Data Study Group, April 2021)
    • Participated in a project with Entale to develop podcast recommendation systems, where I primarily worked on developing a topic model to recommend new podcasts based on the similarity to the topics that have been of interest to a listener previously.
    • Allowed me to gain some general experience working with natural language processing methods, collaborative filtering, text mining, clustering algorithms, dimension reduction techniques and recommender systems using Python. You can read more about the project here.
  • Bayesian Sports Modelling
    • In my master's project, I investigated the applicability of Bayesian hierarchical models for predicting the outcome of football matches. We were able to develop models that achieved a greater prediction accuracy than existing models in the literature.
    • R and Stan were used to implement various models; the code and report can be found here.

Publications (Google Scholar)

  • Chan, R.S.Y., Nanni, F., Brown, E., Chapman, E., Williams, A.R., Bright, J., Gabasova, E. 2024. Prompto: An open source library for asynchronous querying of LLM endpoints. Submitted. [Pre-print].
  • Williams, A.R., Burke-Moore, L., Chan, R.S.Y., Enock, F.E., Nanni, F., Sippy, T., Chung, Y.L., Gabasova, E., Hackenburg, K., Bright, J. 2024. Submitted. [Pre-print].
  • Tseriotou, T., Chan, R.S.Y., Tsakalidis, A., Bilal, I.M., Kochkina, E., Lyons, T., Liakata, M. 2024. Sig-Networks Toolkit: Signature Networks for Longitudinal Language Modelling. In Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations. [Paper].
  • Shao, Z., Chan, R.S.Y., Cochrane, T., Foster, P., Lyons, T. 2023. Dimensionless Anomaly Detection on Multivariate Streams with Variance Norm and Path Signature. Submitted. [Pre-print].
  • Mougan, C., Plant, R., Teng, C., Bazzi, M., Ejea, A.C., Chan, R.S.Y., Jasin, D.S., Stoffel, M., Whitaker, K.J. and Manser, J. 2023. How to Data in Datathons. 37th Conference on Neural Information Processing Systems (NeurIPS 2023) Track on Datasets and Benchmarks. [Paper].
  • Chan, R.S.Y., Johansen, A.M., Pollock, M., and Roberts, G.O. 2023. Divide-and-Conquer Fusion. The Journal of Machine Learning Research, 24(193):1−82. [Paper].
  • Data Study Group team. 2022. Data Study Group Final Report: Entale - Recommendation systems for podcast discovery. [Report].
  • Chan, R. and Dai, H. 2020. Discussion of "Quasi-Stationary Monte Carlo and the ScaLE Algorithm" by Pollock, Fearnhead, Johansen and Roberts. JRSS B. [Article]. [Video].

Talks, Conferences and Awards


Awards

C4AI Expedition Aya 2024: Most Promising Award for Maya: Multimodal Aya
The Alan Turing Institute Doctoral Studentship (2018-2022)
The Royal Statistical Society Prize (2018)
The Top 10 Scholarship, awarded to the top 10 undergraduate students at the School of Mathematics, University of Leeds (2015, 2016, 2017)
Summer Vacation Bursary Scheme, awarded to undertake a research project with the School of Mathematics, University of Leeds (2016, 2017)


Presentations

Robots in Disguise: Mechanistic Interpretability I. 23/09/24.
Robots in Disguise: An overview of Llama 3.1. 12/08/24.
Robots in Disguise: Vision Transformers. 07/08/23.
Robots in Disguise: GPT. 24/07/23.
Robots in Disguise: BERT. 07/07/23.
Greek Stochastics Nu [Naxos]. 07/07/23.
Robots in Disguise: Transformer Encoder and Decoder models. 26/06/23.
Robots in Disguise: Sequence-to-sequence models: Part II - Encoder-Decoder models. 03/05/23.
Robots in Disguise: Sequence-to-sequence models: Part I - RNNs/LSTMs. 17/04/23.
AIUK 2023 Demo (IceNet) [London]. 21/03/23.
Lunchtime Tech Talk [The Alan Turing Institute]. 25/10/22.
2022 World Meeting of the International Society for Bayesian Analysis [Montreal]. 28/06/22.
Data Study Group (April 2021) Entale [The Alan Turing Institute]. 29/04/21.
Young Researchers' Meetings (YRM) [University of Warwick]. 09/02/21.
Statistics Group Seminar [Newcastle University]. 10/07/20.
RSS Discussion Meeting for "Quasi-Stationary Monte Carlo and the ScaLE Algorithm" [The Royal Statistical Society]. 24/06/20. [Video].
The Student Seminar Series [The Alan Turing Institute]. 21/04/20.
University of Warwick Departmental Conference [Gregynog]. 24/03/20.
Algorithms & Computationally Intensive Inference Seminar [University of Warwick]. 08/03/19.
Student Seminar [University of Leeds]. 22/05/18. [Handout].

Posters

2022 World Meeting of the International Society for Bayesian Analysis [Montreal]. 28/06/22.
Bayes at CIRM 2021 [Marseille]. 25/10/21.
2021 World Meeting of the International Society for Bayesian Analysis [Online]. 28/06/21.
The Turing Research Showcase [Online]. 24/09/20. [Video].
Greek Stochastics Lambda [Corfu]. 28/08/19.
O'Bayes 2019 [University of Warwick]. 29/06/19.

Miscellaneous

Cycling from Manchester to London for Ambitious for Autism
Thesis: Monte Carlo methods for combining sample approximations of distributions; examined by Professor Nicolas Chopin and Dr Krzysztof Łatuszyński.
Advent of Code 2022 with Julia
R package to implement Fusion methodologies (for unifying distributed analyses): DCFusion
R package to simulate layered Brownian bridges: layeredBB
Advent of Code 2021 with R
Master's Thesis: Bayesian Sports Modelling
Undergraduate project: Automatic Puzzle Solving
Fundraising for FareShare: Cycling challenge