Katherine's Blog

Hope this is interesting to someone, somewhere.


Hello beautiful people! Thanks for being interested in finding out about me :).

I am a future poet, a next-generation Oscar Wilde (or Noël Coward, maybe), a comedian with healing powers, and so on (well, we all have dreams, right?). But somehow, I am currently doing Ph.D. research in statistical/machine learning and econometrics.

This is mainly a tech blog, so it is going to be all about my research (involving some keywords I did not expect when I was a kid).

I gained my master’s degree in the Econ/Accounting/Finance area several years ago. I developed a strong interest in quantitative research and felt that the courses in the Business School were not enough. So I spent two years in the School of Mathematics and Statistics, finishing 95% of all the undergraduate and master’s modules, and graduated with an A4 grade! :) That’s when I felt truly ready for the journey into data science and programming research.

Statistical learning and econometrics overlap a lot in the study of complex systems, such as financial markets, option pricing, or macroeconomics. The key challenge in learning any complex system is problem identification, which comes down to various types of causal inference and optimization work. I am particularly experienced in building theoretical statistical models and evaluating empirical econometric studies. Years of school have equipped me to design natural experiments and observational studies using time series, stochastic, and probabilistic models, among others.

Interests in Statistics

As a statistical researcher, I’m into Bayesian learning and variable selection in classification problems. I used to explore re-parameterized Bayesian models, such as those with a spike & slab prior. These unusual models come with good mathematical properties but are difficult to evaluate analytically. To overcome this computational limitation, I used MCMC with the Metropolis-Hastings algorithm to approximate the posterior distribution, which the chain converges to asymptotically. In the process, I began to develop an interest in high-performance computation.
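
A minimal illustrative sketch of this idea (not the actual research code): a random-walk Metropolis-Hastings sampler for a single regression coefficient under a continuous spike & slab prior. Everything here is an assumption made for illustration, including the two-normal mixture form of the prior, the function names, the hyperparameters, and the simulated data.

```python
import numpy as np

def log_spike_slab_prior(beta, w=0.5, spike_sd=0.05, slab_sd=3.0):
    """Log density of a two-normal mixture: narrow spike at 0 plus wide slab.
    The mixture weight and scales are illustrative assumptions."""
    def log_normal(x, sd):
        return -0.5 * (x / sd) ** 2 - np.log(sd * np.sqrt(2 * np.pi))
    return np.logaddexp(np.log(1 - w) + log_normal(beta, spike_sd),
                        np.log(w) + log_normal(beta, slab_sd))

def log_posterior(beta, x, y, noise_sd=1.0):
    """Unnormalized log posterior: Gaussian likelihood plus spike & slab prior."""
    resid = y - beta * x
    log_lik = -0.5 * np.sum((resid / noise_sd) ** 2)
    return log_lik + log_spike_slab_prior(beta)

def metropolis_hastings(x, y, n_steps=5000, step_sd=0.2, seed=0):
    """Random-walk Metropolis-Hastings over a scalar coefficient."""
    rng = np.random.default_rng(seed)
    beta = 0.0
    current_lp = log_posterior(beta, x, y)
    samples = np.empty(n_steps)
    for i in range(n_steps):
        proposal = beta + rng.normal(0.0, step_sd)
        proposal_lp = log_posterior(proposal, x, y)
        # The proposal is symmetric, so the acceptance ratio
        # reduces to the ratio of posterior densities.
        if np.log(rng.uniform()) < proposal_lp - current_lp:
            beta, current_lp = proposal, proposal_lp
        samples[i] = beta
    return samples

# Simulated data with a true coefficient of 1.5 (an assumption for the demo).
data_rng = np.random.default_rng(1)
x = data_rng.normal(size=200)
y = 1.5 * x + data_rng.normal(size=200)

samples = metropolis_hastings(x, y)
posterior_mean = samples[1000:].mean()  # discard the first 1000 draws as burn-in
```

The draws after burn-in approximate the posterior of the coefficient, so their mean and spread can stand in for quantities that are hard to obtain analytically.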

Interests in Econometrics

As an economic researcher, my speciality lies in quantitative corporate finance (a sub-area of financial economics). Based on this interest, I am currently studying evaluation algorithms that help make risk-controlled and efficient investment decisions on medium to large datasets. Following this, I hope to establish financial indicators/systems to detect and, most importantly, predict investment/risk alert signals in daily information processing activities, which could be beneficial to government sectors, the central bank, corporate governors, researchers, and other stakeholders.

Interests in Bottom-up OS Configuration

My daily research workflow places high demands on my working environment. Academic work involves several things at the same time: LaTeX writing, running programs in multiple languages, syntax highlighting, and making embedded slides. So my workflow is built on Emacs, which requires some knowledge of Emacs Lisp hacking and functional programming. A highly customized operating system with some specialized hardware has been my friend all this time.

Publications as a Data Scientist

I participate in the Alan Turing Institute Data Study Group to gain experience in different sub-areas in machine learning.

(1) Linear classification in social science: Collaborating with the UK Cabinet Office, we used data to provide a more detailed picture of an individual’s engagement with extremism, in order to identify both the risk factors of radicalization and potential points for intervention. The paper was published through a private channel.

(2) Counterfactual modeling in AI ethics: Collaborating with Accenture AI specialists, we proposed quantifying AI fairness with 5 different proxies and implemented a set of algorithms that detect potential biases in a dataset, with the aim of eliminating the disparate impact of unfair data. The framework was shown to perform well and can help AI-related companies maintain fairness in data-based decision-making under the upcoming GDPR. The research paper is published publicly.

(3) Data fusion in cybersecurity: Collaborating with Imperial College, Los Alamos National Laboratory, and The Heilbronn Institute for Mathematical Research, we developed data science tools for improving enterprise cyber-security. My contribution was to apply unsupervised learning to unusual change detection for the ‘blend-in attack’ in information fusion research.

(4) Cybernetics in trajectory prediction: Collaborating with National Air Traffic Services, we developed a series of machine learning/deep learning methods to predict air traffic trajectories (high-frequency time series data). My role was to help the team understand linear dynamic control theory and the Kalman filter.
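
For readers unfamiliar with the Kalman filter mentioned in (4), here is a minimal sketch under stated assumptions, not the project's code: a one-dimensional constant-velocity model where only noisy positions are observed. The function name, noise variances, and simulated track are all hypothetical choices for illustration.

```python
import numpy as np

def kalman_filter(measurements, dt=1.0, process_var=1e-3, meas_var=1.0):
    """Track [position, velocity] from noisy position measurements."""
    F = np.array([[1.0, dt], [0.0, 1.0]])  # constant-velocity state transition
    H = np.array([[1.0, 0.0]])             # only position is observed
    Q = process_var * np.eye(2)            # process noise covariance (assumed)
    R = np.array([[meas_var]])             # measurement noise covariance (assumed)
    x = np.array([measurements[0], 0.0])   # initial state guess
    P = np.eye(2) * 10.0                   # initial state uncertainty
    estimates = []
    for z in measurements:
        # Predict: propagate the state and its covariance one step forward.
        x = F @ x
        P = F @ P @ F.T + Q
        # Update: correct the prediction using the new measurement.
        innovation = z - H @ x
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)
        x = x + K @ innovation
        P = (np.eye(2) - K @ H) @ P
        estimates.append(x.copy())
    return np.array(estimates)

# Simulated track moving at 2 units per step, observed with unit-variance noise.
rng = np.random.default_rng(0)
t = np.arange(100)
truth = 2.0 * t
measurements = truth + rng.normal(0.0, 1.0, size=t.size)

estimates = kalman_filter(measurements)
final_velocity = estimates[-1, 1]
```

Each row of `estimates` holds the filtered position and velocity at one time step; the velocity is never measured directly and is inferred from the sequence of positions.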
