Interpretation of DNA evidence as a paradigm for speaker recognition
David Balding, UCL Genetics Institute, London
The speaker is an expert in the interpretation of forensic DNA profile evidence, with little experience of speaker recognition problems. In this talk, he will review many of the issues that have confronted DNA profile experts when presenting evidence in court: population structure and (possibly remote) relatedness of suspects, sources of uncertainty in the use of databases for allele frequency estimation, and the stochastic nature of DNA profiles when only low amounts of DNA are present. Although calculating likelihood ratios in many of these settings is problematic, the use of likelihood ratios to evaluate evidence has substantial support from academic researchers. It still meets resistance in practice, despite the serious flaws of the alternative approaches. Even its academic supporters agree that it is difficult to explain to judges and jurors how to use likelihood ratios correctly as a guide to rational thought, but on the basis of substantial courtroom experience the speaker will argue that this is possible. Looking across to the speaker recognition problem, he will explore similarities and differences and propose directions for research.
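To make the likelihood-ratio reasoning concrete, here is a minimal sketch of a single-source DNA match likelihood ratio with a correction for population structure, combined with prior odds through Bayes' theorem in odds form (posterior odds = LR × prior odds). It assumes the Balding-Nichols conditional match probabilities (the NRC II eq. 4.10 form) with a coancestry coefficient theta, and multiplies across loci as is common practice. The allele frequencies, theta values and function names are illustrative assumptions, not figures or methods from the talk.

```python
# Sketch: likelihood ratio for a matching single-source DNA profile.
# Assumptions (illustrative only): Balding-Nichols / NRC II eq. 4.10 match
# probabilities with coancestry coefficient theta; loci treated as independent.


def match_probability(p, q=None, theta=0.0):
    """Probability that another person from the same subpopulation has the
    genotype, given that the suspect has it.

    p, q: population allele frequencies; q=None means a homozygote (p, p).
    theta=0 reduces to the familiar p**2 or 2*p*q.
    """
    denom = (1 + theta) * (1 + 2 * theta)
    if q is None:
        return (2 * theta + (1 - theta) * p) * (3 * theta + (1 - theta) * p) / denom
    return 2 * (theta + (1 - theta) * p) * (theta + (1 - theta) * q) / denom


def profile_likelihood_ratio(genotypes, theta=0.0):
    """LR = 1 / match probability, multiplied across loci.

    genotypes: list of tuples, (p,) for a homozygote or (p, q) for a heterozygote.
    """
    lr = 1.0
    for alleles in genotypes:
        lr /= match_probability(*alleles, theta=theta)
    return lr


def posterior_probability(lr, prior_odds):
    """Bayes' theorem in odds form: posterior odds = LR * prior odds."""
    posterior_odds = lr * prior_odds
    return posterior_odds / (1 + posterior_odds)


if __name__ == "__main__":
    profile = [(0.1,), (0.1, 0.2), (0.05, 0.15)]  # hypothetical frequencies
    for theta in (0.0, 0.03):
        lr = profile_likelihood_ratio(profile, theta)
        print(f"theta={theta}: LR = {lr:,.0f}, "
              f"P(source | prior odds 1:10000) = {posterior_probability(lr, 1e-4):.3f}")
```

For rare alleles, a positive theta raises the match probability and so lowers the LR, which is one way the population-structure concern enters the calculation. The LR itself says nothing about the prior odds; that separation is what the odds form of Bayes' theorem makes explicit.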
David Balding was educated in his native Australia before coming to the UK to study for a PhD in mathematics at the University of Oxford. He then held a junior academic post at Oxford for a year before moving successively to Queen Mary London, the University of Reading, and Imperial College London, where he was Professor of Statistical Genetics from 2001 to 2009. Since October 2009, he has been Professor of Statistical Genetics at UCL, the first new senior appointment to the UCL Genetics Institute (UGI). David researches a wide range of mathematical and statistical problems in genetics – evolutionary, population and medical. He has also developed widely adopted methods for the interpretation of forensic DNA profiles, summarized in his monograph Weight-of-Evidence for Forensic DNA Profiles (Wiley, 2005). On occasion, he acts as an expert witness in cases involving complex DNA profile evidence, and he is a member of the Independent Advisory Group of the UK Forensic Science Service. In population and evolutionary genetics, David has developed statistical methods for inferring the demographic history of populations from DNA data, and for identifying loci that appear to have been affected by recent, strong selection. Recently he has participated in the design and analysis of several genome-wide association studies of disease and drug-response phenotypes, and has developed novel statistical methods in this area, particularly for dealing with confounding by population structure. Much of his statistical genetics work involves computer-intensive stochastic algorithms and is usually carried out within the Bayesian paradigm of statistical inference. David has been lead editor of Wiley’s successful Handbook of Statistical Genetics, which reached its 3rd edition in 2007, and is working on a new Handbook of Statistical Systems Biology. He has for many years been a Fellow of the Royal Statistical Society and was recently elected a Fellow of the Society of Biology.
He is a member of the UK Medical Research Council’s Molecular and Cellular Medicine Board and the MRC/NIHR Methodology Research Panel.
Current developments in forensic speaker identification
Michael Jessen, Department of Speaker Identification and Audio Analysis (KT54), Bundeskriminalamt, Wiesbaden
The presentation will provide an overview of current tasks, methods and problems in forensic speaker identification. After brief introductions to speaker profiling and voice line-ups, the talk will focus on the forensic task of voice comparison. Most forensic labs and practitioners have approached this task using methods derived from phonetics and linguistics. Some of the speaker parameters commonly used in forensic speaker comparison will be discussed, together with the relevant research. More recently, some labs have added automatic speaker recognition to their set of methods. Experience with this method will be illustrated, as will the challenges arising from the limited duration and technical quality of recordings and from stylistic and technical mismatch between them.
Michael Jessen is a senior scientist at the Department of Speaker Identification and Audio Analysis (KT54) at the National Forensic Science Institute of the Bundeskriminalamt, Germany, and has regular experience as an expert witness in forensic casework. He received an MA degree in linguistics from Universität Bielefeld and a PhD in linguistics from Cornell University, specialising in phonetics and phonology. In his research he is interested in the multitude of linguistic and acoustic factors that carry speaker-specific information. Michael Jessen is one of the editors of the International Journal of Speech, Language and the Law.
Bayesian Speaker Verification with Heavy-Tailed Priors
Patrick Kenny, Centre de recherche informatique de Montreal
We describe a new approach to speaker verification which, like Joint Factor Analysis, is based on a generative model of speaker and channel effects but differs from Joint Factor Analysis in several respects. Firstly, each utterance is represented by a low-dimensional feature vector rather than by a high-dimensional set of Baum-Welch statistics. Secondly, heavy-tailed distributions are used in place of Gaussian distributions in formulating the model, so that the effect of outlying data is diminished, both in training the model and at recognition time. Thirdly, the likelihood ratio used for making verification decisions is calculated (using variational Bayes) in a way that is fully consistent with the modeling assumptions and the rules of probability. Finally, experimental results show that, in the case of telephone speech, these likelihood ratios do not need to be normalized in order to set a trial-independent threshold for verification decisions.
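Why heavy tails reduce the influence of outliers can be seen in one dimension. The sketch below fits the location and scale of a Student's t distribution (a common heavy-tailed alternative to the Gaussian) with the standard EM reweighting, and compares it with the Gaussian maximum-likelihood mean, which is just the sample average. It is a toy illustration under our own assumptions (fixed degrees of freedom `nu`, a hypothetical helper `fit_student_t`), not the variational Bayes model described in the abstract.

```python
import numpy as np


def fit_student_t(x, nu=3.0, n_iter=500, tol=1e-10):
    """EM estimate of location mu and squared scale sigma2 for a 1-D
    Student's t with fixed degrees of freedom nu.

    Returns (mu, sigma2, w), where w are the weights from the final E-step.
    """
    x = np.asarray(x, dtype=float)
    mu = float(np.median(x))                 # robust starting point
    sigma2 = max(float(np.var(x)), 1e-12)
    for _ in range(n_iter):
        # E-step: expected latent precision of each point; points far from
        # mu relative to the current scale receive small weights.
        w = (nu + 1.0) / (nu + (x - mu) ** 2 / sigma2)
        # M-step: weighted mean, then weighted scale using the new mean.
        new_mu = float(np.sum(w * x) / np.sum(w))
        sigma2 = max(float(np.sum(w * (x - new_mu) ** 2) / x.size), 1e-12)
        converged = abs(new_mu - mu) < tol
        mu = new_mu
        if converged:
            break
    return mu, sigma2, w


if __name__ == "__main__":
    data = np.array([0.0, 0.1, -0.1, 0.2, -0.2, 100.0])  # one gross outlier
    print("Gaussian ML mean:", data.mean())
    mu, sigma2, w = fit_student_t(data)
    print("Student's t location:", round(mu, 4))
    print("weights:", np.round(w, 3))
```

The Gaussian mean is dragged far from the cluster by the single outlier, while the t fit assigns that point a small weight and stays near the bulk of the data. The same mechanism, applied to latent speaker and channel variables, is the intuition behind using heavy-tailed priors.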
We report results on female speakers for several conditions in the NIST 2008 speaker recognition evaluation data, including microphone as well as telephone speech. As measured both by equal error rates and by the minimum values of the NIST detection cost function, the results on telephone speech are about 30% better than those we have achieved using Joint Factor Analysis.
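Both figures of merit can be computed from lists of target and non-target trial scores. The sketch below sweeps a threshold over the observed scores; the equal error rate is approximated at the threshold where the miss and false-alarm rates are closest, and the minimum detection cost uses the parameters commonly cited for the SRE 2008 core condition (C_Miss = 10, C_FA = 1, P_Target = 0.01), normalized by the cost of the better trivial system. Those parameter values and the function names are assumptions to check against the NIST evaluation plan, not part of the abstract.

```python
import numpy as np


def miss_fa_rates(target_scores, nontarget_scores):
    """Miss and false-alarm rates for every candidate threshold.

    Decision rule: accept a trial as 'same speaker' when score >= threshold.
    The threshold +inf (reject everything) is included so P_miss reaches 1.
    """
    tar = np.asarray(target_scores, dtype=float)
    non = np.asarray(nontarget_scores, dtype=float)
    thresholds = np.append(np.unique(np.concatenate([tar, non])), np.inf)
    p_miss = np.array([np.mean(tar < t) for t in thresholds])
    p_fa = np.array([np.mean(non >= t) for t in thresholds])
    return thresholds, p_miss, p_fa


def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER: mean of P_miss and P_fa where they are closest."""
    _, p_miss, p_fa = miss_fa_rates(target_scores, nontarget_scores)
    i = np.argmin(np.abs(p_miss - p_fa))
    return (p_miss[i] + p_fa[i]) / 2


def min_dcf(target_scores, nontarget_scores, c_miss=10.0, c_fa=1.0, p_target=0.01):
    """Minimum normalized detection cost over all thresholds."""
    _, p_miss, p_fa = miss_fa_rates(target_scores, nontarget_scores)
    cost = c_miss * p_miss * p_target + c_fa * p_fa * (1 - p_target)
    # Normalize by the cheaper of 'always reject' and 'always accept'.
    return cost.min() / min(c_miss * p_target, c_fa * (1 - p_target))
```

Because the minimum DCF picks the best threshold after the fact, it measures discrimination only; a system whose likelihood ratios are usable without normalization, as claimed in the abstract, should also achieve a cost close to this minimum at a fixed, trial-independent threshold.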
Patrick Kenny received the BA degree in Mathematics from Trinity College, Dublin, and the MSc and PhD degrees, also in Mathematics, from McGill University. He was a professor of Electrical Engineering at INRS-Telecommunications in Montreal and is currently a principal research scientist in the speech group at the Centre de recherche informatique de Montreal (CRIM).
My interests are focused on the use of modern machine learning methods in speech processing. I believe that there is a promising future in our field for large scale Bayesian methods based on informative priors thanks to a happy confluence of circumstances. The signals that we have to deal with are of relatively low complexity (compared with, say, moving images) so that we can hope to construct reasonably realistic probabilistic models of speech. We are fortunate also that very large quantities of speech data have been collected, annotated, and made available to our community. This puts us in a position where we can use data-driven methods to elicit informative priors for Bayesian inference with such models. And the fast approximate inference methods that have been developed in machine learning in recent years (Variational Bayes and Expectation Propagation) should enable us to embody these priors in real applications.
My work on Joint Factor Analysis is concerned with the problem of learning an informative prior on speaker- and channel-dependent Gaussian mixture models (the models used in the standard GMM/UBM approach to speaker recognition) from the Switchboard and Mixer speech corpora. I have published ten journal articles and numerous conference papers on this subject and its application to speaker recognition and speaker diarization. The underlying ideas appear to be applicable to a wide range of problems where good variability (such as speaker effects) needs to be distinguished from bad (such as channel effects), and they have stimulated research in related areas such as language recognition and speech recognition (subspace GMMs). A parallel method has emerged in face recognition (Probabilistic Principal Components Analysis), and other biometric authentication tasks involving noisy data seem to be amenable to similar approaches. My talk will be about a recent extension of this work, including a fully Bayesian treatment.
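For orientation, the split between good (speaker) and bad (channel) variability in Joint Factor Analysis is usually written as a decomposition of an utterance's GMM mean supervector. The notation below follows the form commonly used in the JFA literature and is a sketch for readers new to the model, not a statement of the heavy-tailed model discussed in the talk.

```latex
\mathbf{M} \;=\; \underbrace{\mathbf{m} + \mathbf{V}\mathbf{y} + \mathbf{D}\mathbf{z}}_{\text{speaker-dependent}}
\;+\; \underbrace{\mathbf{U}\mathbf{x}}_{\text{channel-dependent}},
\qquad
\mathbf{y} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}),\quad
\mathbf{x} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}),\quad
\mathbf{z} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})
```

Here m is the UBM mean supervector, V and U are low-rank matrices spanning the speaker (eigenvoice) and channel (eigenchannel) subspaces, and D is a diagonal residual term. Learning m, V, U and D from large corpora such as Switchboard and Mixer is what makes the prior on the speaker- and channel-dependent GMMs informative.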