Veera Vignesh Kandasamy and Anup Bera, Accenture Solutions India Pvt Ltd, India
With the increased use of human-machine interaction via voice-enabled smart devices over the years, there is a growing demand for more accurate speech analytics systems. Several studies show that speech analytics systems exhibit bias towards speaker demographics such as age, gender, race, and accent. To mitigate such bias, speaker demographic information can be used to prepare the training dataset for a speech analytics model. Speaker demographic information can also be used for targeted advertising, recommendation, and forensic science. In this research, we demonstrate several algorithms for age and gender prediction from speech data, using our custom dataset that covers speakers from around the world with varying accents. To extract speaker age and gender from speech data, we also include a method for determining the appropriate length of the audio input to be ingested by the system, which reduces computational time. This study also identifies the most effective padding and cropping mechanism for obtaining the best results from the input audio file. We investigated the impact of various parameters on performance and on the end-to-end implementation of a real-time speaker age and gender extraction system. Our best model achieves an RMSE of 4.1 for age prediction and an accuracy of 99.5% for gender prediction on the custom test dataset.
Keywords: Age and gender prediction, Data bias, Speech analytics, CNN, LSTM, Wav2Vec.
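The abstract refers to fixing the length of the audio input through padding and cropping, and it reports RMSE for age and accuracy for gender. It does not say which padding or cropping variant won, so the following is only a minimal, hypothetical sketch of what such mechanisms and metrics commonly look like. The function names, the padding modes (zero padding and repeat padding), the cropping modes (start, center, random), and the 4-second, 16 kHz target length in the usage note are illustrative assumptions and do not describe the authors' implementation.

```python
import math
import random
from typing import List, Optional, Sequence


def pad(signal: Sequence[float], target_len: int, mode: str = "zero") -> List[float]:
    """Hypothetical helper: extend a 1-D waveform to target_len samples.

    mode="zero":   append silence (0.0) at the end of the clip.
    mode="repeat": tile the clip until it is long enough, then truncate.
    """
    x = list(signal)
    if len(x) >= target_len:
        return x
    if mode == "zero":
        return x + [0.0] * (target_len - len(x))
    if mode == "repeat":
        if not x:  # an empty clip cannot be tiled, so fall back to silence
            return [0.0] * target_len
        reps = math.ceil(target_len / len(x))
        return (x * reps)[:target_len]
    raise ValueError(f"unknown pad mode: {mode}")


def crop(signal: Sequence[float], target_len: int, mode: str = "start",
         rng: Optional[random.Random] = None) -> List[float]:
    """Hypothetical helper: cut a 1-D waveform down to target_len samples.

    mode="start":  keep the first target_len samples.
    mode="center": keep the middle window.
    mode="random": keep a randomly positioned window (e.g. for augmentation).
    """
    x = list(signal)
    if len(x) <= target_len:
        return x
    if mode == "start":
        start = 0
    elif mode == "center":
        start = (len(x) - target_len) // 2
    elif mode == "random":
        rng = rng or random.Random()
        start = rng.randint(0, len(x) - target_len)  # inclusive upper bound
    else:
        raise ValueError(f"unknown crop mode: {mode}")
    return x[start:start + target_len]


def fix_length(signal: Sequence[float], target_len: int, pad_mode: str = "zero",
               crop_mode: str = "center",
               rng: Optional[random.Random] = None) -> List[float]:
    """Crop clips that are too long, pad clips that are too short."""
    return pad(crop(signal, target_len, crop_mode, rng), target_len, pad_mode)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root-mean-square error, the kind of metric reported for age prediction."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of exact label matches, as reported for gender prediction."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

For example, with an assumed 16 kHz sampling rate and a 4-second window, every clip would be normalized with `fix_length(waveform, 16000 * 4, pad_mode="repeat", crop_mode="center")` before feature extraction. Repeat padding avoids feeding the model long stretches of artificial silence, while center cropping discards leading and trailing segments that often contain no speech. Which combination works best is an empirical question, which is what the study sets out to answer.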