Forensic Acoustics Subcommittee
Special Session on Forensic Acoustics
- ICA 2013: 21st International Congress on Acoustics
- 165th Meeting of the Acoustical Society of America
- 52nd Meeting of the Canadian Acoustical Association
Montréal, Québec, 2–7 June 2013
last update: 26 July 2013
Distinguishing between science and pseudoscience in forensic acoustics
- This special session will accept submissions for presentations on all aspects of forensic acoustics.
- It will also include presentations on distinguishing between science and pseudoscience in forensic acoustics.
- Abstracts should be submitted via the ASA website.
- Select SC - Speech Communication as Technical Committee
- Select Distinguishing Between Science and Pseudoscience in Forensic Acoustics as Special Session
- Select 43.72.Uv as PACS Code
- Deadline for submission of abstract: 15 November 2012
- Acceptance notices sent out: 1 December 2012
also see Itinerary Planner
Monday 3 June 2013, room 515abc
- Chair's introduction
- Distinguishing Between Science and Pseudoscience in Forensic Acoustics
1aSCa1, 9:00–9:20 am
- Geoffrey Stewart Morrison
Forensic Voice Comparison Laboratory, School of Electrical Engineering & Telecommunications, University of New South Wales, Sydney, NSW, Australia
Refereed-journal version to be published as:
Distinguishing between forensic science and forensic pseudoscience: Testing of validity and reliability, and approaches to forensic voice comparison. Science & Justice. doi:10.1016/j.scijus.2013.07.004
In this presentation I argue that one should not attempt to directly assess whether a forensic analysis technique is scientifically acceptable. Rather one should first specify what one considers to be appropriate scientific principles governing acceptable practice, then consider any particular approach in light of those principles. I focus on one principle: The validity and reliability of an approach should be empirically tested under conditions reflecting those of the case under investigation using test data taken from the relevant population. Versions of this principle have been key elements in several reports on forensic science, including forensic voice comparison, published over the last four-and-a-half decades. I consider the aural-spectrographic approach to forensic voice comparison (also known as “voiceprint” or “voicegram” examination) in light of this principle, and also the currently widely practised auditory-acoustic-phonetic approach (these two approaches do not appear to be mutually exclusive). Finally, I challenge the audience members to consider what each of them thinks constitutes the relevant scientific principles regarding acceptable practice, and then consider their own approach to forensic-acoustic analysis in light of those principles.
Dr Morrison is Director of the Forensic Voice Comparison Laboratory, School of Electrical Engineering & Telecommunications, University of New South Wales; Invited Lecturer, Judicial Phonetics Specialization, Master of Phonetics and Phonology Program, Spanish National Research Council / Menéndez Pelayo International University; and Adjunct Associate Professor, Department of Linguistics, University of Alberta. He is a member of the Canadian Acoustical Association and of the Canadian Society of Forensic Science, and is Chair of the Acoustical Society of America's Forensic Acoustics Subcommittee. He is a Subject Editor for the journal Speech Communication, with responsibility for papers on forensic speech science. His publications include “Measuring the validity and reliability of forensic likelihood-ratio systems” Science & Justice (2011) and “The likelihood-ratio framework and forensic evidence in court: A response to R v T” International Journal of Evidence and Proof (2012). He was also lead author on a 22-signatory response to the 2012 Draft Australian Standard on Forensic Analysis – Interpretation.
- A Canadian Perspective on Forensic Science versus Pseudoscience
1aSCa2, 9:20–9:40 am
- Brent Ostrum
Senior Scientific Advisor, Science & Engineering Directorate, Canada Border Services Agency, Ottawa, ON, Canada
This presentation will provide my personal observations regarding Forensic Science versus Pseudoscience in the Canadian legal system. I am neither a lawyer nor a judge; rather, I am a forensic scientist with over 25 years of experience in the Canadian system. My presentation focuses on relevant criteria for expert evidence considered in Canadian courts. The key ruling in R v Mohan (1994) provides the start of the discussion with subsequent court rulings adding various elements. In Canada, we have had several judicial inquiries, such as the Kaufman Commission, that can serve to guide experts. Select aspects of the 2009 NAS report “Strengthening Forensic Science in the United States: A Path Forward” will also be referenced. There are some common ‘criteria’ often used by courts in different jurisdictions to assess expert evidence, including Forensic Acoustics. In other words, some basic expectations for all forms of expert evidence can be identified. I will attempt to show how select ‘sciences’ have tried to fulfill those expectations. This will involve some commentary on issues of individual examiner competency, oversight at a system level (e.g. accreditation), and the need for proper and adequate method validation.
Mr Ostrum is a Senior Scientific Advisor in the Canada Border Services Agency’s Science and Engineering Directorate. He has been employed as a forensic scientist for over twenty-five years, and has worked for both the Royal Canadian Mounted Police and the Canada Border Services Agency. Mr Ostrum has given testimony as an expert witness in a number of Canadian jurisdictions including British Columbia, Alberta, Saskatchewan, Manitoba and Ontario. He is a student of logic, statistical inference and evidence evaluation as well as expert testimony in the Canadian legal system (particularly from a practitioner’s point-of-view but applying equally to all forms of expert evidence). Mr Ostrum is the sitting chairman of the Document Section in the Canadian Society of Forensic Science. He serves as a member of the Executive Council for St2ar (Skill-Task Training Assessment & Research, Inc), an international non-profit organization that provides training and skill-task testing, as well as support for research in the forensic sciences. Mr Ostrum is also very active in standards and methods development for both forensic document examination and facial identification. Mr Ostrum has lectured on many topics including evidence evaluation, admissibility, and proficiency testing and competency.
- Voice Stress Analyses: Science and Pseudoscience
1aSCa3, 10:00–10:20 am
- Francisco Lacerda
Professor in Phonetics, Department of Linguistics, Stockholm University, Sweden
Voice stress analyses could be relevant tools to detect deception in many forensic and security contexts. However, today’s commercial voice-based lie-detectors are not supported by convincing scientific evidence. In addition to the scientific implausibility of their working principles, the experimental evidence invoked by the sellers is either anecdotal or drawn from methodologically flawed experiments. Nevertheless, criminal investigators, authorities and even some academics appear to be persuaded by the ungrounded claims of the aggressive propaganda from sellers of voice stress analysis gadgets, perhaps further enhanced by the portrayals of “cutting-edge voice-analysis technology” in the entertainment industry. Clearly, because there is a serious threat to public justice and security if authorities adopt a naïve “open-minded” attitude towards sham lie-detection devices, this presentation will attempt to draw attention to plausibility and validity issues in connection with the claimed working principles of two commercial voice stress analysers. The working principles will be discussed from a phonetics and speech analysis perspective and the processes that may lead naïve observers into interpreting as meaningful the spurious results generated by such commercial devices will be examined. Finally, the scope and limitations of using scientific phonetic analyses of voice to detect deception for forensic purposes will be discussed.
Prof Lacerda is Head of the Department of Linguistics, Stockholm University. He is professor in Phonetics and has a degree in Electrical Engineering, Telecommunications, and Electronics. He was co-author with Anders Eriksson of “Charlatanry in forensic speech science: A problem to be taken seriously” International Journal of Speech, Language and the Law (2007), a paper which the publisher, Equinox, withdrew from its website when a manufacturer of a “voice stress analyzer”, Nemesysco, threatened to sue them for libel in the courts of England & Wales (legal action was not actually instigated). Prof Lacerda later gave evidence before a committee of the Parliament of the United Kingdom which was considering reform of the libel laws in England & Wales. Prof Lacerda has written numerous articles and given numerous presentations, including a 2012 open lecture at the Royal Swedish Academy of Sciences entitled “Les Liaisons Dangereuses: Is finance research flirting with pseudoscience?”
- Assessing Acoustic Features in the Speech of Asylum Seekers
1aSCa4, 10:20–10:40 am
- Judith Rosenhouse
Linguistics Unit, Swantech Ltd, Haifa, Israel
One of the areas of forensic linguistics concerns asylum seekers who speak languages which are foreign to the official language of the country where they apply for asylum. Identifying and verifying their real national background may be difficult if their speech manner reveals non-typical properties of their (real or alleged) native languages. Governments submit such asylum seekers’ speech samples for linguistic analysis on various levels, including phonetic acoustics. This aspect of forensic linguistics raises questions about the scientific merit of such an analysis. Our aim is to examine some of the questions which relate to segmental and supra-segmental features that are analyzed acoustically based on recorded samples of asylum seekers’ (alleged) native language and compared with the same features as known from the literature. We demonstrate such issues by examples from the speech of Arabic-speaking asylum seekers whose native tongue is (supposed to be) some local dialect but the recording includes various foreign features reflecting different dialects or languages. These questions involve sociolinguistic factors that affect individual speakers’ speech production due to a complex and unstable life-history. We suggest that the acoustic methods currently used in speech analysis in this context could be considered pseudo-science in many cases.
Prof Rosenhouse’s research interests include phonetics, Arabic dialectology, sociolinguistics, bilingualism, multilingualism, and forensic linguistics. She has authored more than 130 papers and 10 books. She was winner of a New Israel Foundation Prize (1989), an International Society of Phonetic Sciences Prize (2004), and has been a recipient of several stipends from the Alexander von Humboldt Foundation. Prof Rosenhouse retired from the Technion – Israel Institute of Technology in 2005, and now works for Swantech Ltd.
- Analysis Criteria for Forensic Musicology
1aSCa5, 10:40–11:00 am
- Durand R Begault, H D Heise, Christopher A Peltier
Audio Forensic Center, Charles M Salter Associates Inc, San Francisco, CA, USA
Expert testimony for forensic musicology addresses a broad spectrum of legal issues, including the authentication and differentiation of published compositions and musical recordings, performance rights, and legal determinations regarding copyright infringement. While legal cases involving music and performance infringement date back as far as the 19th century, the field of forensic musicology has no stated methodology by which an objective forensic determination can be made. Expert opinions based merely on subjective impression or resulting from the “golden ear” syndrome are pseudo-scientific and not objectively based. This paper proposes scientific methods and recommendations for analysis based on stated criteria, with the goal of controlling examiner bias. Considerations include analyses of composition, performance, and acoustical features, and factors such as melody, harmony, rhythm, and orchestration; pitch, tone, vibrato, and embellishment; metadata analysis; recording technologies; and digital signal processing, including “effects.” By engaging in a series of structured categorizations, the forensic expert can establish a consistent, replicable, and objectively verifiable means of determining whether or not a recorded piece of music has been misappropriated.
Dr Begault has a PhD (1987) from University of California San Diego, where he worked on music theory, psychoacoustics, digital signal processing, computer audio, acoustic engineering, and speech sciences. His forensic work at the Audio Forensic Center, Charles M Salter Associates Inc, San Francisco includes: authentication of audio recordings; analysis of the audibility of speech, alarms, and other sounds; enhancement of speech from noisy recordings; voice identification and elimination; analysis of ear witness testimony; analysis of acoustic recordings of gunshots; patent/intellectual property analyses for audio technology; and music copyright infringement. He is also a Principal Investigator at the NASA Ames Research Center’s Spatial Auditory Display Laboratory, and is an Adjunct Professor in the Sound Recording Area, Department of Music Research, McGill University, Montréal.
- Panel Discussion
- Lunch Break
11:30 am–1:30 pm
- Mismatched Distances from Speakers to Telephone in a Forensic-Voice-Comparison Case
1pSCc1, 1:00–1:20 pm
- Ewald Enzinger
Forensic Voice Comparison Laboratory, School of Electrical Engineering & Telecommunications, University of New South Wales, Sydney, NSW, Australia
In a forensic-voice-comparison (FVC) case, one speaker (A) was talking on a mobile telephone, and another (B) was standing a short distance away. Later, B moved closer to the telephone. Shortly thereafter, there was a section of speech where the identity of the speaker was disputed. All material for training an FVC system could be extracted from this single recording, but there was a near-far mismatch: Training data for A were near, training data for B were far, and the disputed speech was near. We describe a procedure for assessing the degree of validity and reliability of an FVC system under such conditions, prior to it being applied to the casework recording: Sections of recordings of pairs of speakers of known identity are used to train an A and a B model; multiple other sections from each of the A and B recordings are used as test data; a likelihood ratio is calculated for each test section; and system validity and reliability are assessed. Prior to training and testing, the A and B recordings were played through loudspeakers and rerecorded via a mobile-telephone network; B was rerecorded twice, once with the loudspeaker near and once with it far from the telephone.
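As a rough illustration of the final assessment step, the sketch below computes the log-likelihood-ratio cost (Cllr), a validity metric commonly used for likelihood-ratio systems. The abstract does not say which metric was used in this case, so the choice of Cllr, the function name, and the example values are assumptions made for illustration only.

```python
import numpy as np

def cllr(lr_same, lr_diff):
    """Log-likelihood-ratio cost (illustrative sketch, not the authors' code).

    lr_same: likelihood ratios from test sections where the A hypothesis is true.
    lr_diff: likelihood ratios from test sections where the B hypothesis is true.
    Lower is better; a system that always outputs LR = 1 scores exactly 1.0.
    """
    lr_same = np.asarray(lr_same, dtype=float)
    lr_diff = np.asarray(lr_diff, dtype=float)
    # Penalise small LRs when the numerator hypothesis is true ...
    penalty_same = np.mean(np.log2(1.0 + 1.0 / lr_same))
    # ... and large LRs when the denominator hypothesis is true.
    penalty_diff = np.mean(np.log2(1.0 + lr_diff))
    return 0.5 * (penalty_same + penalty_diff)

# Hypothetical likelihood ratios from test sections of known origin.
print(cllr([100.0, 50.0], [0.01, 0.02]))
```

A value well below 1 suggests that the likelihood ratios are, on average, pointing in the correct direction with appropriate strength.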
- Colleen Kavanagh
Audio & Video Analysis Unit, Royal Canadian Mounted Police, Ottawa, ON, Canada
The speaker-specificity of five acoustic features of British English /m/ was explored from a forensic speaker comparison perspective. Normalised duration, centre of gravity (COG), standard deviation (SD), and frequencies at peak and minimum amplitudes were measured for 30 adult male Standard Southern British English and Leeds English speakers. Spectral measurements were made in each of five frequency bands (0–0.5 kHz, 0.5–1 kHz, 1–2 kHz, 2–3 kHz, and 3–4 kHz) and calculated from a 40-ms window at the midpoint of each token. ANOVAs showed Speaker to be a highly significant factor for all variables. Discriminant analysis (DA) and likelihood ratio (LR) estimation assessed speaker discrimination with individual predictors and combinations thereof. Sample sizes limited the number of predictors in DA to eight; F-ratios were used to select the best predictors for analysis. The COG+SD (bands 1, 3, 4, 5) and Best 8 F-ratios (COG bands 1, 4, 5 + SD 1, 3, 4 + Peak 1, 4) tests achieved 53% and 49% correct classification respectively. The Best 8 F-ratios and COG+SD tests also produced the best LR results, while COG+Peak performed similarly. DA and LR results for all predictor combinations will be presented and the most promising speaker comparison parameters highlighted.
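The band-limited spectral measurements can be sketched as follows. This is a minimal illustration assuming a power-weighted definition of centre of gravity and standard deviation and a Hann window; the abstract does not state the weighting or window used, and the function name and test signal are hypothetical.

```python
import numpy as np

def band_moments(signal, sample_rate, band_hz):
    """Centre of gravity and standard deviation (Hz) of the spectrum in one band.

    Assumption: moments are weighted by spectral power within the band.
    """
    signal = np.asarray(signal, dtype=float)
    windowed = signal * np.hanning(len(signal))
    power = np.abs(np.fft.rfft(windowed)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    low, high = band_hz
    in_band = (freqs >= low) & (freqs < high)
    weights = power[in_band] / power[in_band].sum()
    cog = np.sum(freqs[in_band] * weights)
    sd = np.sqrt(np.sum((freqs[in_band] - cog) ** 2 * weights))
    return cog, sd

# Hypothetical 40-ms window: a 250 Hz tone sampled at 8 kHz, measured in band 1.
rate = 8000
t = np.arange(int(0.040 * rate)) / rate
cog, sd = band_moments(np.sin(2 * np.pi * 250 * t), rate, (0, 500))
print(cog, sd)
```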
- Examining Long-term Formant Distributions as a Discriminant in Forensic Speaker Comparisons under a Likelihood Ratio Framework
1pSCc3, 1:40–2:00 pm
- Erica Gold,1 Philip Harrison,2 Peter French2
1Department of Language and Linguistic Science, The University of York, York, United Kingdom
2J P French Associates, York, United Kingdom
This study investigates the use of long-term formant distributions (LTFD) as a discriminant in forensic speaker comparisons. LTFD are the distributions calculated for all values of each formant for a speaker in a single recording. Spontaneous speech recordings from 100 male speakers of Southern Standard British English, aged 18–25, were analyzed from the DyViS Database (Nolan 2009). The recordings were auto-segmented to obtain a minimum of 50 seconds of vowels per speaker. The iCAbS (iterative cepstral analysis by synthesis) formant tracker was used to automatically extract and measure F1–F4 every 5 ms. To assess the evidential value of the LTFDs, likelihood ratios (LRs) were computed using a Matlab implementation of Aitken and Lucy’s (2004) Multivariate Kernel-Density formula (Morrison 2007). It was found that LTFD performs well overall, but much better with different-speaker comparisons than same-speaker comparisons (97.76% compared to 78% of comparisons providing correct support; Cllr = 0.9072 and EER = 5.47%). LTFD appears to be a good discriminant to include in forensic speaker comparison analyses and offers the added attraction of avoiding potential correlation problems between vowel phonemes.
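For readers unfamiliar with the equal error rate (EER), the following sketch shows one simple way to estimate it from log likelihood ratios. It is not the authors' procedure: the threshold sweep, the function name, and the example scores are assumptions chosen for illustration.

```python
import numpy as np

def equal_error_rate(llr_same, llr_diff):
    """Approximate EER from same-speaker and different-speaker log LRs.

    Sweeps every observed score as a decision threshold and returns the
    error rate at the threshold where misses and false alarms are closest.
    """
    llr_same = np.asarray(llr_same, dtype=float)
    llr_diff = np.asarray(llr_diff, dtype=float)
    thresholds = np.sort(np.concatenate([llr_same, llr_diff]))
    # Miss: a same-speaker comparison scoring below the threshold.
    miss = np.array([(llr_same < t).mean() for t in thresholds])
    # False alarm: a different-speaker comparison scoring at or above it.
    false_alarm = np.array([(llr_diff >= t).mean() for t in thresholds])
    closest = np.argmin(np.abs(miss - false_alarm))
    return 0.5 * (miss[closest] + false_alarm[closest])

# Hypothetical scores: one same-speaker and one different-speaker comparison
# fall on the wrong side of the crossover threshold.
print(equal_error_rate([1.0, 2.0, -1.0, 3.0], [-2.0, -3.0, 1.5, -1.5]))
```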
- Establishing Typicality: A Closer Look at Individual Formants
1pSCc4, 2:00–2:20 pm
- Vincent Hughes
Department of Language and Linguistic Science, The University of York, York, United Kingdom
Research into the forensic performance of individual formants has offered considerable evidence to support the traditional acoustic-phonetic view that whilst F1 and F2 encode broad phonetic contrast, higher formants may offer greater speaker-discriminatory potential (Peterson 1959, Ladefoged 2006, Clermont and Mokhtari 1998, Rose 2002). However, the comparative performance of formants has largely been assessed using posterior assessments of speaker-specificity (McDougall 2004, 2006; Clermont et al 2008). Using quadratic polynomial fittings of F1 to F3 from spontaneous tokens of /ai/ extracted from all 100 speakers in the DyViS database (Nolan et al 2009), this paper discusses issues relating to p(H|E)-based voice comparison analysis (particularly the use of discriminant analysis, DA). Further, DA performance is compared with an analysis based on likelihood ratios (LRs). LRs based on F3 are found to only marginally outperform F1 and F2 with regard to the magnitude of same-speaker and different-speaker strength of evidence, as well as system performance metrics (EER and Cllr). The poorer than expected F3 LRs are assessed with regard to the distributions of values within- and between-speakers for the best performing F3 coefficient: the constant. The data go some way to establishing F3 population statistics which may potentially be applied to voice comparison casework.
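A quadratic polynomial fitting of a single formant trajectory might look like the sketch below. It assumes equally spaced measurement points and time normalised to the interval 0 to 1; neither detail is given in the abstract, and the function name and example values are hypothetical.

```python
import numpy as np

def quadratic_coefficients(formant_hz):
    """Fit f(t) = c0 + c1*t + c2*t**2 to one formant trajectory.

    Assumption: measurements are equally spaced and time is normalised to
    [0, 1], so c0 (the constant) is the fitted value at the token onset.
    """
    formant_hz = np.asarray(formant_hz, dtype=float)
    t = np.linspace(0.0, 1.0, len(formant_hz))
    c2, c1, c0 = np.polyfit(t, formant_hz, deg=2)  # highest power first
    return c0, c1, c2

# Hypothetical F3 trajectory of an /ai/ token, nine measurement points.
t = np.linspace(0.0, 1.0, 9)
print(quadratic_coefficients(2500.0 + 100.0 * t - 200.0 * t ** 2))
```

Each token is thereby reduced to three coefficients per formant, which can then serve as input to discriminant analysis or likelihood-ratio calculation.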
- A Likelihood-ratio-based Forensic Voice Comparison Using Formant Trajectories of Thai Diphthongs
1pSCc5, 2:40–3:00 pm
- Supawan Pingjai, Shunichi Ishihara, Paul J Sidwell
School of Culture, History & Language, Australian National University, Canberra, ACT, Australia
This study investigates the phonetic-acoustic properties of the three Thai diphthongs /i:a, ɨ:a, u:a/ within the context of forensic voice comparison. The likelihood-ratio approach is applied to the parameterized formant trajectories of each diphthong in order to evaluate their respective discriminatory power. The aim of this study is to assess to what extent such properties can be used to distinguish, in a probabilistic sense, two or more speech samples. Formant trajectories were fitted using both polynomial interpolation and the discrete cosine transform. Likelihood ratio values were derived by the multivariate kernel density (MVKD) estimation approach proposed by Aitken and Lucy (2004) and then calibrated by using the log-likelihood-ratio cost function (Cllr; Brümmer 2005) and the 95% credible interval (Morrison et al 2010). We have finished gathering all speech data for this study, and are currently processing the data using various computational tools.
References:
Aitken, C. G. G. and Lucy, D. (2004) “Evaluation of Trace Evidence in the Form of Multivariate Data,” Appl. Stat., Vol. 54, pp. 109–122.
Brümmer, N. (2005) FoCal Toolkit [software]. Available: http://www.dsp.sun.ac.za/~nbrummer/focal/.
Morrison, G. S., Thiruvaran, T., and Epps, J. (2010) “Estimating the likelihood-ratio output of a forensic-voice-comparison system,” The Speaker and Language Recognition Workshop, Brno.
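The discrete cosine transform parameterization mentioned in the abstract can be illustrated with the sketch below, which assumes an orthonormal DCT-II and keeps only the first few coefficients. The abstract does not specify the DCT variant or the number of coefficients, so both, along with the function name and example values, are assumptions.

```python
import numpy as np

def dct_coefficients(trajectory, n_coeffs):
    """First n_coeffs orthonormal DCT-II coefficients of a formant trajectory.

    Assumption: DCT-II with orthonormal scaling. Coefficient 0 reflects the
    overall level of the trajectory; higher coefficients reflect its shape.
    """
    x = np.asarray(trajectory, dtype=float)
    n = len(x)
    k = np.arange(n_coeffs)[:, None]   # coefficient index
    i = np.arange(n)[None, :]          # sample index
    basis = np.cos(np.pi * (i + 0.5) * k / n)
    scale = np.full(n_coeffs, np.sqrt(2.0 / n))
    scale[0] = np.sqrt(1.0 / n)
    return scale * (basis @ x)

# Hypothetical F2 trajectory (Hz) of one diphthong token, ten points.
f2 = np.array([2100, 2050, 1980, 1900, 1810, 1720, 1640, 1580, 1540, 1520])
print(dct_coefficients(f2, 3))
```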
- Fusion of Multiple Formant-Trajectory- and Fundamental-Frequency-Based Forensic-Voice-Comparison Systems: Chinese /ei1/, /ai2/, and /iau1/
1pSCc6, 3:00–3:20 pm
- Cuiling Zhang,1,2 Ewald Enzinger2
1Department of Forensic Science & Technology, China Criminal Police University, Shenyang, Liaoning, China
2Forensic Voice Comparison Laboratory, School of Electrical Engineering & Telecommunications, University of New South Wales, Sydney, NSW, Australia
This study investigates the fusion of multiple formant-trajectory- and fundamental-frequency-trajectory-based (f0-trajectory-based) forensic-voice-comparison systems. Each system was based on tokens of a single phoneme: tokens of Chinese /ei1/, /ai2/, and /iau1/ (numbers indicate tones). Human-supervised formant-trajectory and f0-trajectory measurements were made on tokens from a database of recordings of 60 female speakers of Chinese. Discrete cosine transforms (DCT) were fitted to the trajectories and the DCT coefficients used to calculate likelihood ratios via the multivariate kernel density (MVKD) formula. The individual-phoneme systems were fused with each other and with a baseline mel-frequency cepstral-coefficient (MFCC) Gaussian-mixture-model universal-background-model (GMM-UBM) system. The latter made use of the entire speech-active portion of the recordings. Tests were conducted using high-quality recordings as nominal suspect samples and mobile-to-landline transmitted recordings as nominal offender samples. Fusion of the phoneme-systems with the baseline system via logistic regression did not lead to any substantial improvement in validity, and reliability deteriorated.
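Logistic-regression fusion of several systems can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes a training set with equal numbers of same-speaker and different-speaker trials (so that the fitted log odds can be read as a natural-log likelihood ratio), uses plain gradient descent, and all names and data are hypothetical.

```python
import numpy as np

def train_fusion(scores, is_same, n_iter=2000, step=0.1):
    """Fit fusion weights by logistic regression (plain gradient descent).

    scores:  (n_trials, n_systems) array of log LRs, one column per system.
    is_same: 1 for same-speaker trials, 0 for different-speaker trials.
    Returns [offset, weight_1, ..., weight_n].
    """
    scores = np.asarray(scores, dtype=float)
    x = np.column_stack([np.ones(len(scores)), scores])
    y = np.asarray(is_same, dtype=float)
    w = np.zeros(x.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(x @ w)))    # predicted P(same speaker)
        w -= step * (x.T @ (p - y)) / len(y)  # gradient of mean log loss
    return w

def fuse(scores, w):
    """Fused log LR: offset plus weighted sum of the systems' log LRs."""
    return w[0] + np.asarray(scores, dtype=float) @ w[1:]

# Hypothetical training scores from two systems, 50 trials of each kind.
rng = np.random.default_rng(0)
scores = np.vstack([rng.normal(2.0, 1.0, (50, 2)), rng.normal(-2.0, 1.0, (50, 2))])
labels = np.concatenate([np.ones(50), np.zeros(50)])
weights = train_fusion(scores, labels)
print(weights, fuse([[1.5, 0.5]], weights))
```

Because the weights are learned from data, the validity and reliability of a fused system should be tested on material not used to train the fusion.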
- Panel Discussion
- Organizational Meeting of ASA Forensic Acoustics Subcommittee
- The meeting agenda and draft reports will be distributed to members via the members-only forum.
- This meeting is only open to members (non-members may attend if specifically invited).