How to Build a Call Center Speech Analytics Workflow in Python

Have you ever wondered how call centers analyze their recorded conversations? Managing and extracting meaningful insights from vast amounts of audio data can be a daunting task: conversations must be transcribed accurately, speakers identified, sentiments analyzed, and the results visualized effectively.
Fortunately, advancements in artificial intelligence provide robust solutions to streamline this process. By leveraging tools like AssemblyAI, you can build a comprehensive speech analytics workflow tailored to your call center needs.
In this blog, I will walk you through the process of creating a call center speech analytics tool. You will learn how to:
- Transcribe audio recordings
- Identify and map different speakers
- Perform sentiment analysis
- Visualize the analyzed data
Getting Started
Before diving into building the workflow, make sure you have an AssemblyAI API token; you can sign up with this link to get $50 in free credits.
After signing in, you’ll see the dashboard, which provides a copy-pastable code snippet (see below) to get you started in under a minute. Briefly, you can paste the code into a Colab notebook and run it.

Next, fire up the Colab notebook, which you can retrieve from my GitHub repo at https://github.com/dataprofessor/assemblyai by downloading 4_call_center_analytics.ipynb
From here on out, we’ll go over the code contained within the notebook.
Setting Up the Environment
Let's start by installing the Python library for AssemblyAI.
pip install assemblyai
Import libraries:
import assemblyai as aai
import json
import time
import pandas as pd
import matplotlib.pyplot as plt
from IPython.display import display, Audio, Markdown, HTML
import os
Load AssemblyAI API Token
First, let's load in the API token.
from google.colab import userdata
aai_key = userdata.get('AAI_KEY')
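Here we're pulling the key from Colab's Secrets manager. If you're running the notebook outside Colab, a common alternative (my own suggestion, not part of the original notebook) is to read the key from an environment variable instead:
# Fallback for non-Colab environments (assumes you've exported the
# variable yourself, e.g. export AAI_KEY="your-key-here")
import os

aai_key = os.environ.get('AAI_KEY')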
Assign the API token to the AssemblyAI SDK.
aai.settings.api_key = aai_key
Instantiate the Transcriber
Let's instantiate a Transcriber object, which we'll use to transcribe text from audio.
transcriber = aai.Transcriber()
Audio Selection
You can use a sample call recording from a URL (Option 1) or your own local audio file (just uncomment Option 2 and update the path).
# Option 1: Use a sample call center audio from a URL
audio_input = "https://github.com/dataprofessor/assemblyai/raw/refs/heads/master/call-center-recording.wav"
# Option 2: Use a local file (uncomment and update path)
# audio_input = "./call-center-recording.wav"
# Hear the audio
display(Audio(audio_input))
This should return an audio widget that will allow you to listen to the audio.

Process the Call Recording
Let's transcribe the audio and specify the transcription configuration through the AssemblyAI SDK.
config = aai.TranscriptionConfig(
    speaker_labels=True,
    sentiment_analysis=True
)
Now, let's apply the transcriber we instantiated earlier to run the transcription with this configuration:
transcript = transcriber.transcribe(audio_input, config)
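Transcription runs on AssemblyAI's servers and can occasionally fail (for example, on an unreachable audio URL), so it's worth a defensive check before moving on. A minimal sketch using the SDK's status and error fields:
# Stop early if the transcription did not succeed
if transcript.status == aai.TranscriptStatus.error:
    raise RuntimeError(f"Transcription failed: {transcript.error}")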
Let’s take a look at the audio duration of the transcript:
transcript.audio_duration
This returns 88, i.e. the recording is 88 seconds long.
Now, let’s see how many words there are in our transcript:
len(transcript.words)
For this example, it returns 253, corresponding to 253 words.
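As a quick sanity check (my own addition, not in the original notebook), the duration and word count together give the pace of the conversation:
# Speaking rate: 253 words over 88 seconds is roughly 170 words per minute
wpm = len(transcript.words) / (transcript.audio_duration / 60)
print(f"{wpm:.0f} words per minute")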
Speaker Identification
Process the transcript with speaker labels:
transcript.utterances
This returns the following output:
[Utterance(text='Thank you for calling Electrigo Motors. This is Sarah speaking. How may I assist you today?',
start=160,
end=4900,
confidence=0.95094377,
speaker='B',
channel=None,
words=[UtteranceWord(text='Thank', start=160, end=288, confidence=0.99897, speaker='B', channel=None), UtteranceWord(text='you', start=288, end=368, confidence=0.99981, speaker='B', channel=None), UtteranceWord(text='for', start=368, end=488, confidence=0.99988, speaker='B', channel=None), UtteranceWord(text='calling', start=496, end=696, confidence=0.9999, speaker='B', channel=None), UtteranceWord(text='Electrigo', start=728, end=1384, confidence=0.58494, speaker='B', channel=None), UtteranceWord(text='Motors.', start=1432, end=1832, confidence=0.9692, speaker='B', channel=None), UtteranceWord(text='This', start=1896, end=2072, confidence=0.99933, speaker='B', channel=None), UtteranceWord(text='is', start=2096, end=2232, confidence=0.99988, speaker='B', channel=None),
...
...
...
UtteranceWord(text='now.', start=87118, end=87190, confidence=0.99989, speaker='A', channel=None)])]
You’ll notice that the output starts with the first speaker, Sarah, and her full utterance (text='Thank you for calling Electrigo Motors. This is Sarah speaking. How may I assist you today?'), followed by a breakdown of each word along with its associated start and end timestamps.
Let’s now iterate through transcript.utterances and extract the speaker label (utt.speaker) and the associated text (utt.text):
text_with_speaker_labels = ""
for utt in transcript.utterances:
    text_with_speaker_labels += f"Speaker {utt.speaker}: {utt.text}\n"
Finally, let's display the result of the iteration by printing the text_with_speaker_labels variable.
print(text_with_speaker_labels)
This yields the following output:
Speaker B: Thank you for calling Electrigo Motors. This is Sarah speaking. How may I assist you today?
Speaker A: Hi, Sarah, this is Michael Johnson. I just wanted to call and say how much I'm loving my new Electre Go Pulse. The driving experience is incredible.
Speaker B: That's wonderful to hear, Mr. Johnson. I'm thrilled you're enjoying your pulse. Is there anything specific about the vehicle that's really impressed you?
Speaker A: The acceleration is amazing, and I'm getting even better range than advertised. Plus, the app makes charging so convenient, I can schedule everything from my phone.
Speaker B: That's fantastic feedback. We're really proud of both the performance and the app integration. Have you had a chance to try the new remote climate control feature?
Speaker A: Yes. Being able to warm up the car while it's still plugged in on cold mornings has been a game changer. Oh, and the customer service has been excellent, too. The technician who helped set up my home charger was so helpful.
Speaker B: I'm delighted to hear that, Mr. Johnson. We really strive to provide exceptional service throughout the ownership experience. Is there anything else I can help you with today?
Speaker A: No, that's it. Just wanted to share some positive feedback. You guys have really converted me to electric vehicles for life.
Speaker B: Thank you so much for taking the time to share that with us. It truly means a lot. Please don't hesitate to reach out if you need anything in the future. Have a wonderful day. Enjoying your Electrigo Pulse.
Speaker A: You too. Bye now.
Note that the entire dialogue is now printed along with the generic speaker labels Speaker A and Speaker B.
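Since each utterance carries start and end timestamps (in milliseconds), we can also estimate how long each speaker talks, a common call center metric. A small sketch of my own:
# Total talk time per speaker, derived from the utterance timestamps (ms)
talk_time_ms = {}
for utt in transcript.utterances:
    talk_time_ms[utt.speaker] = talk_time_ms.get(utt.speaker, 0) + (utt.end - utt.start)

for speaker, ms in talk_time_ms.items():
    print(f"Speaker {speaker}: {ms / 1000:.1f} s")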
Infer and Count the Number of Unique Speakers
Count the unique speakers, then create a LemurQuestion for each speaker. Lastly, ask LeMUR the questions, specifying text_with_speaker_labels as the input_text.
unique_speakers = set(utterance.speaker for utterance in transcript.utterances)

questions = []
for speaker in unique_speakers:
    questions.append(
        aai.LemurQuestion(
            question=f"Who is speaker {speaker}?",
            answer_format="<First Name> <Last Name (if applicable)>"
        )
    )

result = aai.Lemur().question(
    questions,
    input_text=text_with_speaker_labels,
    final_model=aai.LemurModel.claude3_5_sonnet,
    context="Your task is to infer the speaker's name from the speaker-labelled transcript"
)
Let’s now see the generated response stored in result.response:
result.response
You should see the following:
[LemurQuestionAnswer(question='Who is speaker B?', answer='Sarah'),
LemurQuestionAnswer(question='Who is speaker A?', answer='Michael Johnson')]
Map Speaker Labels in Transcript
Here, we're doing two things:
- Identifying speakers: by default, different speakers are detected and assigned the generic labels A and B. We ask the LeMUR LLM to identify who each speaker is; simply put, the LLM infers the speaker names from how they are mentioned in the transcript.
- Mapping speaker labels in the transcript: the A and B labels are then replaced with the identified speakers, i.e. A = Michael Johnson and B = Sarah.
import re

speaker_mapping = {}
for qa_response in result.response:
    pattern = r"Who is speaker (\w)\?"
    match = re.search(pattern, qa_response.question)
    if match and match.group(1) not in speaker_mapping.keys():
        speaker_mapping.update({match.group(1): qa_response.answer})
Let’s now see the mapped speaker labels:
speaker_mapping
which yields the following:
{'B': 'Sarah', 'A': 'Michael Johnson'}
Let’s now print the transcript with speaker names.
for utterance in transcript.utterances:
    speaker_name = speaker_mapping[utterance.speaker]
    print(f"{speaker_name}: {utterance.text}...")
which generates the following:
Sarah: Thank you for calling Electrigo Motors. This is Sarah speaking. How may I assist you today?...
Michael Johnson: Hi, Sarah, this is Michael Johnson. I just wanted to call and say how much I'm loving my new Electre Go Pulse. The driving experience is incredible....
Sarah: That's wonderful to hear, Mr. Johnson. I'm thrilled you're enjoying your pulse. Is there anything specific about the vehicle that's really impressed you?...
Michael Johnson: The acceleration is amazing, and I'm getting even better range than advertised. Plus, the app makes charging so convenient, I can schedule everything from my phone....
Sarah: That's fantastic feedback. We're really proud of both the performance and the app integration. Have you had a chance to try the new remote climate control feature?...
Michael Johnson: Yes. Being able to warm up the car while it's still plugged in on cold mornings has been a game changer. Oh, and the customer service has been excellent, too. The technician who helped set up my home charger was so helpful....
Sarah: I'm delighted to hear that, Mr. Johnson. We really strive to provide exceptional service throughout the ownership experience. Is there anything else I can help you with today?...
Michael Johnson: No, that's it. Just wanted to share some positive feedback. You guys have really converted me to electric vehicles for life....
Sarah: Thank you so much for taking the time to share that with us. It truly means a lot. Please don't hesitate to reach out if you need anything in the future. Have a wonderful day. Enjoying your Electrigo Pulse....
Michael Johnson: You too. Bye now....
So far, we've only printed the transcript with mapped speakers. Next, let's aggregate the transcript into a list so that we can save the speaker-mapped dialogue to a variable (dialogue_list):
dialogue_list = []
for utterance in transcript.utterances:
    speaker_name = speaker_mapping[utterance.speaker]
    dialogue_list.append(f"{speaker_name}: {utterance.text}")

dialogue_list
Entities Visualization
Named entities in the text can be visualized using the displacy() function from the spacy library.
First, we'll prepare the text by joining the dialogue into a single string (the original data type is a list, which is not compatible with the displacy function).
text = '\n'.join(dialogue_list)
Let’s have a look at the returned value of the text variable:
text
which yields the following:
Sarah: Thank you for calling Electrigo Motors. This is Sarah speaking. How may I assist you today?\nMichael Johnson: Hi, Sarah, this is Michael Johnson. I just wanted to call and say how much I'm loving my new Electre Go Pulse. The driving experience is incredible.\nSarah: That's wonderful to hear, Mr. Johnson. I'm thrilled you're enjoying your pulse. Is there anything specific about the vehicle that's really impressed you?\nMichael Johnson: The acceleration is amazing, and I'm getting even better range than advertised. Plus, the app makes charging so convenient, I can schedule everything from my phone.\nSarah: That's fantastic feedback. We're really proud of both the performance and the app integration. Have you had a chance to try the new remote climate control feature?\nMichael Johnson: Yes. Being able to warm up the car while it's still plugged in on cold mornings has been a game changer. Oh, and the customer service has been excellent, too. The technician who helped set up my home
Next, let's visualize the identified entities in the text.
# Visualizing the entities
import spacy
from spacy import displacy
nlp = spacy.load("en_core_web_sm")
doc = nlp(text)
displacy.render(doc, style="ent", jupyter=True)
This should give us the following rendered output, where entities are highlighted in colored boxes along with a label of the entity type (e.g. PERSON, ORG, DATE, etc.):

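If you'd rather have a plain-text summary than the rendered view, the same doc object can be tallied directly (an optional sketch of my own, not in the original notebook):
# Count the detected entities by label instead of rendering them
from collections import Counter

print(Counter(ent.label_ for ent in doc.ents))

# Or list each entity alongside its label
for ent in doc.ents:
    print(f"{ent.text:<25} {ent.label_}")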
Sentiment Analysis
Now, let's analyze the sentiment of the transcript. Because we enabled sentiment_analysis in the transcription config, the results are already available on the transcript variable as transcript.sentiment_analysis:
transcript.sentiment_analysis
which gives us:
[Sentiment(text='Thank you for calling Electrigo Motors.', start=160, end=1832, confidence=0.9181462, speaker='B', channel=None, sentiment=<SentimentType.positive: 'POSITIVE'>),
Sentiment(text='This is Sarah speaking.', start=1896, end=3176, confidence=0.7743245, speaker='B', channel=None, sentiment=<SentimentType.neutral: 'NEUTRAL'>),
...
...
...
Sentiment(text='Bye now.', start=86790, end=87190, confidence=0.5288157, speaker='A', channel=None, sentiment=<SentimentType.neutral: 'NEUTRAL'>)]
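Before reshaping anything, a one-liner (my own addition) gives a quick tally of how the sentiment labels are distributed:
# Quick distribution of sentiment labels across all sentences
from collections import Counter

print(Counter(s.sentiment.value for s in transcript.sentiment_analysis))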
Let's structure the data into a DataFrame so that we can subsequently use it for data visualization.
# Create a DataFrame of Speaker and Sentiment
data = []
index_value = 0  # Initialize an index counter

for sentiment in transcript.sentiment_analysis:
    speaker = speaker_mapping[sentiment.speaker]  # Apply our speaker mapping
    sentiment_value = sentiment.sentiment.value
    text = sentiment.text
    data.append({'speaker': speaker, 'sentiment': sentiment_value, 'text': text, 'index': index_value})
    index_value += 1  # Increment the index

df = pd.DataFrame(data)
df
which gives us the following DataFrame output:

Heatmap of Sentiment Analysis 1
Here, we'll count the occurrences of each speaker-sentiment combination and visualize them as a heatmap.
# Count the occurrences of each speaker-sentiment combination
import altair as alt

heatmap_data = df.groupby(['speaker', 'sentiment']).size().reset_index(name='count')

font_size = 14

# Create the base chart
base = alt.Chart(heatmap_data).encode(
    x=alt.X('speaker', axis=alt.Axis(title='Speaker', titleFontSize=font_size, labelFontSize=font_size)),
    y=alt.Y('sentiment', axis=alt.Axis(title='Sentiment', titleFontSize=font_size, labelFontSize=font_size))
)

# Create the heatmap rectangles
heatmap = base.mark_rect().encode(
    color=alt.Color('count', title='Count', scale=alt.Scale(range='heatmap')),
    tooltip=['speaker', 'sentiment', 'count']
)

# Add the text labels
text = base.mark_text(fontSize=font_size, fontWeight='bold').encode(
    text=alt.Text('count'),
    color=alt.condition(
        alt.datum.count > heatmap_data['count'].max() / 2,  # Adjust the threshold as needed
        alt.value('white'),
        alt.value('black')
    )
)

# Combine the heatmap and text
chart = (heatmap + text).properties(
    width=300,
    height=300
).interactive()
With the chart components assembled, let's display the heatmap, which shows sentiment occurrence as a function of speaker.
# Display the chart
chart

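As a side note (not in the original notebook), Altair can save the interactive chart as a standalone HTML file for sharing:
# Save the interactive chart as a self-contained HTML file
chart.save('sentiment_by_speaker.html')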
Heatmap of Sentiment Analysis 2
Let's now zoom in on the individual sentences and see the sentiment of each utterance in the order it was spoken.
font_size = 12

# Define the color scale for sentiment
sentiment_colors = {
    'POSITIVE': '#4CAF50',  # Green
    'NEUTRAL': '#9E9E9E',   # Gray
    'NEGATIVE': '#F44336'   # Red
}

# Create the base chart
base = alt.Chart(df).encode(
    x=alt.X('speaker:N', axis=alt.Axis(title='Speaker', titleFontSize=font_size, labelFontSize=font_size)),
    y=alt.Y('index:O', axis=alt.Axis(title=None, labels=False))  # Use 'index' for the y-axis, hide labels
)

# Create the heatmap rectangles with black stroke (border)
heatmap = base.mark_rect(stroke='black').encode(
    color=alt.Color('sentiment:N',
                    scale=alt.Scale(domain=list(sentiment_colors.keys()), range=list(sentiment_colors.values())),
                    legend=alt.Legend(orient='bottom')),  # Move legend to the bottom
    tooltip=['speaker:N', 'sentiment:N', 'text:N']
).properties(
    width=200,               # Reduced width for the heatmap
    height=df.shape[0] * 20  # Adjust height based on the number of rows
)

# Add the text column to the right of the chart and hide its y-axis
text_right = alt.Chart(df).mark_text(align='left', baseline='middle', dx=5).encode(
    y=alt.Y('index:O', axis=None),  # Remove y-axis from text
    text=alt.Text('text:N'),
    color=alt.value('black')
).properties(
    width=10,                # Adjust width for the text column
    height=df.shape[0] * 20  # Ensure consistent height
)

# Combine the heatmap and the text
chart = alt.concat(
    heatmap,
    text_right
).configure_axis(
    labelFontSize=font_size,
    titleFontSize=font_size
).configure_view(
    strokeOpacity=0  # Hide the outer border of the view
).interactive()

chart

In a nutshell, the plot shows a simple two-column heatmap for the two speakers, Michael Johnson and the call center agent Sarah. Each row shows the corresponding dialogue, and the sentiment is conveyed by the color highlight: positive in green, neutral in gray, and negative in red.
Note that this particular call contains only positive and neutral sentiments.
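To put numbers behind what the heatmaps show, you could also compute each speaker's sentiment mix as percentages, a short follow-up sketch using pandas:
# Per-speaker sentiment mix in percent (each row sums to 100)
sentiment_share = pd.crosstab(df['speaker'], df['sentiment'], normalize='index') * 100
print(sentiment_share.round(1))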
If you’d like to follow along, there’s a video companion to this article.
Conclusion
In this blog, we covered how to build a call center speech analytics workflow using Python and AssemblyAI. You learned how to:
- Transcribe audio recordings with speaker identification
- Map generic speaker labels to actual names
- Perform sentiment analysis on the transcribed text
- Visualize the sentiment data using heatmaps
By following this workflow, you can efficiently analyze call center conversations, gain insights into customer sentiments, and improve overall service quality. This setup can be further adapted and customized to fit various use cases, such as analyzing meeting recordings or customer service interactions in different industries.
References
Here are additional resources to dive deeper into topics that are mentioned in this blog.
Videos:
- AssemblyAI tutorial video playlist
Articles:
- 🔑 Sign up to get free AssemblyAI API token
- 🙂 Sentiment analysis
- 👥 Speaker identification
Documentation:
- 📚 AssemblyAI Documentation
- 📊 Altair User Guide