Katherine's Blog

Hope this is interesting to someone, somewhere.

Hello beautiful people! Thanks for being interested in finding out about me :).

I consider myself a future poet, a next-generation Oscar Wilde (or maybe Noël Coward), a comedian with healing powers, and so on (well, we all have dreams, right?). But somehow, I am currently doing Ph.D. research in statistical/machine learning and econometrics.


This website is mainly a tech blog, so it’s going to be all about my research (with some keywords I did not expect to use when I was a kid :P).

I gained my master’s degree in the Econ/Accounting/Finance area several years ago. I developed a strong interest in quantitative research and felt that the Business School courses were not enough. So I spent two years in the School of Mathematics and Statistics, finishing 95% of the undergraduate and master’s modules in advanced statistics. That’s when I felt properly ready for the journey into data science and programming research.

These two subjects overlap a lot when it comes to learning complex systems, such as financial markets, option pricing, or the macroeconomy. The key challenge in learning any complex system is problem identification, which comes down to various kinds of model building and causal inference. I am particularly experienced in building theoretical stochastic models and evaluating econometric causality. Years of school have equipped me to design natural experiments and observational studies using time series, stochastic, and probabilistic models.

Interests in Statistics

As a statistical researcher, I’m into Bayesian learning and variable selection in classification problems. I used to explore re-parameterized Bayesian models, such as those with a spike-and-slab prior. These unusual models come with good mathematical properties but are difficult to evaluate analytically. To overcome this computational limitation, I used MCMC with the Metropolis-Hastings algorithm to recover the asymptotic probability model. Along the way, I developed an interest in high-performance computing.
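For readers who have not met Metropolis-Hastings before, here is a minimal random-walk sketch. It is not the sampler from my research: the target `log_target` is a standard normal used purely as a stand-in for a posterior that cannot be evaluated analytically, and the function names, step size, and burn-in length are all illustrative assumptions.

```python
import math
import random


def log_target(theta):
    """Unnormalized log density of a standard normal (illustrative stand-in)."""
    return -0.5 * theta * theta


def metropolis_hastings(log_density, n_samples, step=1.0, start=0.0, seed=0):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal."""
    rng = random.Random(seed)
    current = start
    current_lp = log_density(current)
    samples = []
    for _ in range(n_samples):
        proposal = current + rng.gauss(0.0, step)
        proposal_lp = log_density(proposal)
        # The proposal is symmetric, so the acceptance ratio reduces
        # to the ratio of target densities.
        log_accept = proposal_lp - current_lp
        if log_accept >= 0 or rng.random() < math.exp(log_accept):
            current, current_lp = proposal, proposal_lp
        # A rejected proposal repeats the current state in the chain.
        samples.append(current)
    return samples


samples = metropolis_hastings(log_target, n_samples=20000)
kept = samples[2000:]  # discard an assumed burn-in period
mean = sum(kept) / len(kept)
var = sum((x - mean) ** 2 for x in kept) / len(kept)
print(f"mean={mean:.3f}, var={var:.3f}")
```

Only the unnormalized density is needed, which is why this kind of sampler suits models whose normalizing constant is intractable.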

Interests in Econometrics

As an economic researcher, my specialty is quantitative financial economics. This branch studies macroeconomic questions using microeconomic data points. I am currently evaluating macroeconomic policies and their impact on the corporate finance market using medium-to-large datasets. Although this branch of economics grew out of a strong emphasis on theory, my research goal is to help build a better world by understanding real-world impacts.

Interests in Bottom-up OS Configuration

My daily research workflow places high demands on my working environment. Academic work involves several things at once: LaTeX writing, running programs in multiple languages, syntax highlighting, and making embedded slides. So my workflow is built on Emacs, which requires some knowledge of Emacs Lisp hacking and functional programming. A highly customized operating system with some specialized hardware has been my friend all this time.


Publications as a Data Scientist

I participate in the Alan Turing Institute Data Study Group to gain experience in different sub-areas of machine learning.

(1) Linear classification in social science. Collaborating with the UK Cabinet Office, we used data to provide a more detailed picture of an individual’s engagement with extremism, in order to identify both the risk factors of radicalization and potential points for intervention. The paper was published in a private channel.

(2) Counterfactual modeling in AI ethics. Collaborating with Accenture AI specialists, we proposed quantifying AI fairness with 5 different proxies and implemented a set of algorithms to detect potential biases in a dataset and eliminate the disparate impact of unfair data. The framework performed well and can help AI-related companies maintain fairness in data-based decision-making under the upcoming GDPR. The research paper was published publicly.

(3) Data fusion in cybersecurity. Collaborating with Imperial College, Los Alamos National Laboratory, and the Heilbronn Institute for Mathematical Research, we developed data science tools for improving enterprise cybersecurity. My contribution is applying unsupervised learning to unusual change detection for the ‘blend-in attack’ in information fusion research.

(4) Cybernetics in trajectory prediction. Collaborating with National Air Traffic Services, we developed a series of machine learning/deep learning methods to predict air traffic trajectories (high-frequency time series data). My role is to help understand linear dynamic control theory and the Kalman filter.
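As a rough illustration of the Kalman filter mentioned in item (4), here is a scalar sketch. It is not the project's model: it assumes a one-dimensional random-walk state, and the function name `kalman_1d`, the noise variances, and the synthetic readings are hypothetical choices made only for this example.

```python
import random


def kalman_1d(observations, process_var, obs_var, x0=0.0, p0=1.0):
    """Scalar Kalman filter for a random-walk state observed with noise."""
    x, p = x0, p0  # state estimate and its variance
    estimates = []
    for z in observations:
        # Predict: the state is assumed to drift as a random walk,
        # so the mean is unchanged and the uncertainty grows.
        p = p + process_var
        # Update: blend prediction and observation using the Kalman gain.
        gain = p / (p + obs_var)
        x = x + gain * (z - x)
        p = (1.0 - gain) * p
        estimates.append(x)
    return estimates


# Hypothetical data: noisy readings around a constant level of 5.0.
rng = random.Random(1)
readings = [5.0 + rng.gauss(0.0, 1.0) for _ in range(200)]
filtered = kalman_1d(readings, process_var=0.01, obs_var=1.0)
print(f"last raw reading={readings[-1]:.2f}, last estimate={filtered[-1]:.2f}")
```

A real trajectory model would track a multi-dimensional state (position and velocity, for instance) using matrix versions of the same predict and update steps.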
