How to Build a Call Center Speech Analytics Workflow in Python

Have you ever wondered how call centers analyze their recorded conversations? Managing and extracting meaningful insights from vast amounts of audio data can be a daunting task. The challenge lies in accurately transcribing conversations, identifying speakers, analyzing sentiment, and visualizing the data effectively.

Fortunately, advancements in artificial intelligence provide robust solutions to streamline this process. By leveraging tools like AssemblyAI, you can build a comprehensive speech analytics workflow tailored to your call center needs.

In this blog, I will walk you through the process of creating a call center speech analytics tool. You will learn how to:
- Transcribe audio recordings
- Identify and map different speakers
- Perform sentiment analysis
- Visualize the analyzed data

Getting Started

Before diving into building the workflow, make sure you have an AssemblyAI API token. You can sign up with this link to get $50 in free credits.

After signing in, you'll see a dashboard that provides a copy-pastable code snippet to get you started in under a minute. Briefly, you can paste the snippet into a Colab notebook and run it.

Next, fire up the Colab notebook: retrieve it from my GitHub repo at https://github.com/dataprofessor/assemblyai by downloading 4_call_center_analytics.ipynb.

From here on out, we’ll go over the code contained within the notebook.

Setting Up the Environment

Let's start by installing the Python library for AssemblyAI.

pip install assemblyai

Import libraries:

import assemblyai as aai
import json
import time
import pandas as pd
import matplotlib.pyplot as plt
from IPython.display import display, Audio, Markdown, HTML
import os

Load AssemblyAI API Token

First, let's load in the API token.

from google.colab import userdata
aai_key = userdata.get('AAI_KEY')

Assign the API token to the AssemblyAI SDK.

aai.settings.api_key = aai_key
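
If you're running the notebook outside of Colab, google.colab.userdata won't be available. A minimal alternative sketch reads the key from an environment variable instead (AAI_KEY is an assumed variable name; the os module is already imported above):

# Fallback for running outside Colab: read the key from an environment
# variable (AAI_KEY is an assumed variable name)
aai_key = os.environ.get("AAI_KEY")
if aai_key is None:
    raise RuntimeError("AAI_KEY environment variable is not set")

aai.settings.api_key = aai_key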

Instantiate the Transcriber

Let's instantiate the transcriber so that we can transcribe text from the audio.

transcriber = aai.Transcriber()

Audio Selection

You can use a sample call center recording from a URL or your own audio file (just uncomment Option 2 and update the path). Both options are shown below.

# Option 1: Use a sample call center audio from a URL
audio_input = "https://github.com/dataprofessor/assemblyai/raw/refs/heads/master/call-center-recording.wav"

# Option 2: Use a local file (uncomment and update path)
# audio_input = "./call-center-recording.wav"

# Hear the audio
display(Audio(audio_input))

This should return an audio widget that will allow you to listen to the audio.

Process the Call Recording

Let's specify the transcription configuration through the AssemblyAI SDK, enabling speaker labels and sentiment analysis.

config = aai.TranscriptionConfig(speaker_labels=True,
                                 sentiment_analysis=True
                                 )

Next, apply the transcriber (instantiated earlier) with this configuration to transcribe the audio.

transcript = transcriber.transcribe(audio_input, config)
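
Transcription runs as a remote job and can occasionally fail (for example, if the audio URL is unreachable), so it's worth checking the status before moving on. A small optional check:

# Optional: confirm the transcription succeeded before moving on
if transcript.status == aai.TranscriptStatus.error:
    raise RuntimeError(f"Transcription failed: {transcript.error}")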

Let’s take a look at the audio duration of the transcript:

transcript.audio_duration

This returns 88, meaning the recording is 88 seconds long.

Now, let’s see how many words there are in our transcript:

len(transcript.words)

For this example, this returns 253, meaning the transcript contains 253 words.
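
As a quick sanity check, these two values can be combined into a rough speaking rate (a small sketch, not part of the original notebook):

# Rough speaking rate from the duration (seconds) and word count
words_per_minute = len(transcript.words) / (transcript.audio_duration / 60)
print(f"Approximate speaking rate: {words_per_minute:.0f} words per minute")

For this recording, that works out to roughly 170 words per minute.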

Speaker Identification

Since we enabled speaker_labels in the configuration, the transcript contains speaker-labelled utterances. Let's take a look:

transcript.utterances

This returns the following output:

[Utterance(text='Thank you for calling Electrigo Motors. This is Sarah speaking. How may I assist you today?', 
start=160, 
end=4900, 
confidence=0.95094377, 
speaker='B', 
channel=None, 
words=[UtteranceWord(text='Thank', start=160, end=288, confidence=0.99897, speaker='B', channel=None), UtteranceWord(text='you', start=288, end=368, confidence=0.99981, speaker='B', channel=None), UtteranceWord(text='for', start=368, end=488, confidence=0.99988, speaker='B', channel=None), UtteranceWord(text='calling', start=496, end=696, confidence=0.9999, speaker='B', channel=None), UtteranceWord(text='Electrigo', start=728, end=1384, confidence=0.58494, speaker='B', channel=None), UtteranceWord(text='Motors.', start=1432, end=1832, confidence=0.9692, speaker='B', channel=None), UtteranceWord(text='This', start=1896, end=2072, confidence=0.99933, speaker='B', channel=None), UtteranceWord(text='is', start=2096, end=2232, confidence=0.99988, speaker='B', channel=None),
...
...
...
UtteranceWord(text='now.', start=87118, end=87190, confidence=0.99989, speaker='A', channel=None)])]

You'll notice that the output starts with the first speaker, Sarah, and her full utterance (text='Thank you for calling Electrigo Motors. This is Sarah speaking. How may I assist you today?'), followed by a breakdown of each word along with its start and end timestamps.

Let's now iterate through transcript.utterances and extract the speaker label (utt.speaker) and the associated text (utt.text).

text_with_speaker_labels = ""

for utt in transcript.utterances:
    text_with_speaker_labels += f"Speaker {utt.speaker}: {utt.text}\n"

Finally, we display the result of the iteration by printing the text_with_speaker_labels variable.

print(text_with_speaker_labels)

This yields the following output:

Speaker B: Thank you for calling Electrigo Motors. This is Sarah speaking. How may I assist you today?
Speaker A: Hi, Sarah, this is Michael Johnson. I just wanted to call and say how much I'm loving my new Electre Go Pulse. The driving experience is incredible.
Speaker B: That's wonderful to hear, Mr. Johnson. I'm thrilled you're enjoying your pulse. Is there anything specific about the vehicle that's really impressed you?
Speaker A: The acceleration is amazing, and I'm getting even better range than advertised. Plus, the app makes charging so convenient, I can schedule everything from my phone.
Speaker B: That's fantastic feedback. We're really proud of both the performance and the app integration. Have you had a chance to try the new remote climate control feature?
Speaker A: Yes. Being able to warm up the car while it's still plugged in on cold mornings has been a game changer. Oh, and the customer service has been excellent, too. The technician who helped set up my home charger was so helpful.
Speaker B: I'm delighted to hear that, Mr. Johnson. We really strive to provide exceptional service throughout the ownership experience. Is there anything else I can help you with today?
Speaker A: No, that's it. Just wanted to share some positive feedback. You guys have really converted me to electric vehicles for life.
Speaker B: Thank you so much for taking the time to share that with us. It truly means a lot. Please don't hesitate to reach out if you need anything in the future. Have a wonderful day. Enjoying your Electrigo Pulse.
Speaker A: You too. Bye now.

Note that the entire dialogue is now printed along with the generic speaker labels, Speaker A and Speaker B.

Infer and Count the Number of Unique Speakers

Count the unique speakers, then create a LemurQuestion for each speaker. Lastly, ask LeMUR the questions, specifying text_with_speaker_labels as the input_text.

unique_speakers = set(utterance.speaker for utterance in transcript.utterances)

questions = []
for speaker in unique_speakers:
    questions.append(
        aai.LemurQuestion(
            question=f"Who is speaker {speaker}?",
            answer_format="<First Name> <Last Name (if applicable)>"
        )
    )

result = aai.Lemur().question(
    questions,
    input_text=text_with_speaker_labels,
    final_model=aai.LemurModel.claude3_5_sonnet,
    context="Your task is to infer the speaker's name from the speaker-labelled transcript"
)

Let’s now see the generated response stored in result.response:

result.response

You should see the following:

[LemurQuestionAnswer(question='Who is speaker B?', answer='Sarah'),
 LemurQuestionAnswer(question='Who is speaker A?', answer='Michael Johnson')]

Map Speaker Labels in Transcript

Here, we're doing two things:

  1. Identifying speakers

    • As you've seen, different speakers are detected automatically and assigned the generic labels A and B.

    • Here, we ask LeMUR to identify who each speaker is.

    • Simply put, the LLM infers the speaker names from how they are mentioned in the transcript.

  2. Mapping speaker labels in the transcript

    • The generic A and B labels are replaced with the identified speaker names through a mapping.

    • A = Michael Johnson and B = Sarah

import re

speaker_mapping = {}

for qa_response in result.response:
    pattern = r"Who is speaker (\w)\?"
    match = re.search(pattern, qa_response.question)
    if match and match.group(1) not in speaker_mapping.keys():
        speaker_mapping.update({match.group(1): qa_response.answer})

Let’s now see the mapped speaker labels:

speaker_mapping

which yields the following:

{'B': 'Sarah', 'A': 'Michael Johnson'}
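
Note that if LeMUR cannot confidently infer a name for one of the labels, that label may be missing from the dictionary. A defensive lookup (an optional sketch, not part of the original notebook) falls back to the generic label:

# Optional: fall back to the generic label if no name was inferred
def resolve_speaker(label):
    return speaker_mapping.get(label, f"Speaker {label}")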

Let’s now print the transcript with speaker names.

for utterance in transcript.utterances:
    speaker_name = speaker_mapping[utterance.speaker]
    print(f"{speaker_name}: {utterance.text}...")

which generates the following:

Sarah: Thank you for calling Electrigo Motors. This is Sarah speaking. How may I assist you today?...
Michael Johnson: Hi, Sarah, this is Michael Johnson. I just wanted to call and say how much I'm loving my new Electre Go Pulse. The driving experience is incredible....
Sarah: That's wonderful to hear, Mr. Johnson. I'm thrilled you're enjoying your pulse. Is there anything specific about the vehicle that's really impressed you?...
Michael Johnson: The acceleration is amazing, and I'm getting even better range than advertised. Plus, the app makes charging so convenient, I can schedule everything from my phone....
Sarah: That's fantastic feedback. We're really proud of both the performance and the app integration. Have you had a chance to try the new remote climate control feature?...
Michael Johnson: Yes. Being able to warm up the car while it's still plugged in on cold mornings has been a game changer. Oh, and the customer service has been excellent, too. The technician who helped set up my home charger was so helpful....
Sarah: I'm delighted to hear that, Mr. Johnson. We really strive to provide exceptional service throughout the ownership experience. Is there anything else I can help you with today?...
Michael Johnson: No, that's it. Just wanted to share some positive feedback. You guys have really converted me to electric vehicles for life....
Sarah: Thank you so much for taking the time to share that with us. It truly means a lot. Please don't hesitate to reach out if you need anything in the future. Have a wonderful day. Enjoying your Electrigo Pulse....
Michael Johnson: You too. Bye now....

So far, we've only printed the transcript with the mapped speaker names.

Next, we'll aggregate the speaker-mapped dialogue into a list and store it in a variable (dialogue_list).

dialogue_list = []

for utterance in transcript.utterances:
    speaker_name = speaker_mapping[utterance.speaker]
    dialogue_list.append(f"{speaker_name}: {utterance.text}")

dialogue_list
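
If you'd like to keep the speaker-mapped dialogue around outside the notebook, a minimal sketch writes it to a plain-text file (call_transcript.txt is an assumed filename):

# Save the speaker-mapped dialogue to a plain-text file (assumed filename)
with open("call_transcript.txt", "w") as f:
    f.write("\n".join(dialogue_list))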

Entities Visualization

Named entities in the text can be visualized using the displacy visualizer from the spaCy library.

First, we'll prepare the text by joining the dialogue into a single string (the original data type is a list, which is not compatible with displacy).

text = '\n'.join(dialogue_list)

Let’s have a look at the returned value of the text variable.

text

which yields the following:

Sarah: Thank you for calling Electrigo Motors. This is Sarah speaking. How may I assist you today?\nMichael Johnson: Hi, Sarah, this is Michael Johnson. I just wanted to call and say how much I'm loving my new Electre Go Pulse. The driving experience is incredible.\nSarah: That's wonderful to hear, Mr. Johnson. I'm thrilled you're enjoying your pulse. Is there anything specific about the vehicle that's really impressed you?\nMichael Johnson: The acceleration is amazing, and I'm getting even better range than advertised. Plus, the app makes charging so convenient, I can schedule everything from my phone.\nSarah: That's fantastic feedback. We're really proud of both the performance and the app integration. Have you had a chance to try the new remote climate control feature?\nMichael Johnson: Yes. Being able to warm up the car while it's still plugged in on cold mornings has been a game changer. Oh, and the customer service has been excellent, too. The technician who helped set up my home

Next, we visualize the identified entities in the text.

# Visualizing the entities
import spacy
from spacy import displacy

nlp = spacy.load("en_core_web_sm")
doc = nlp(text)
displacy.render(doc, style="ent", jupyter=True)

This should give us the following rendered output, where entities are highlighted in colored boxes along with their entity type (e.g. PERSON, ORG, DATE):
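
If you'd rather work with the entities as structured data instead of an inline visualization, a small sketch collects them from the same doc object into a DataFrame:

# Collect the detected entities into a DataFrame (reuses the doc from above)
entities_df = pd.DataFrame(
    [{"text": ent.text, "label": ent.label_} for ent in doc.ents]
)
entities_df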

Sentiment Analysis

Now, let's analyze the sentiment of the transcript. Because we enabled sentiment_analysis=True in the configuration, the results are already available on the transcript object via transcript.sentiment_analysis:

transcript.sentiment_analysis

which gives us:

[Sentiment(text='Thank you for calling Electrigo Motors.', start=160, end=1832, confidence=0.9181462, speaker='B', channel=None, sentiment=<SentimentType.positive: 'POSITIVE'>),
 Sentiment(text='This is Sarah speaking.', start=1896, end=3176, confidence=0.7743245, speaker='B', channel=None, sentiment=<SentimentType.neutral: 'NEUTRAL'>),
...
...
...
Sentiment(text='Bye now.', start=86790, end=87190, confidence=0.5288157, speaker='A', channel=None, sentiment=<SentimentType.neutral: 'NEUTRAL'>)]

Let's structure the data into a DataFrame so that we can subsequently use it for data visualization.

# Create a DataFrame of Speaker and Sentiment
data = []
index_value = 0  # Initialize an index counter

for sentiment in transcript.sentiment_analysis:
    # speaker = sentiment.speaker
    speaker = speaker_mapping[sentiment.speaker]  # Applies our speaker mapping
    sentiment_value = sentiment.sentiment.value
    text = sentiment.text
    data.append({'speaker': speaker, 'sentiment': sentiment_value, 'text': text, 'index': index_value})
    index_value += 1  # Increment the index

df = pd.DataFrame(data)
df

which gives us the following DataFrame output:
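
Before plotting, a quick cross-tabulation gives a numeric preview of what the heatmaps will show (a small sketch using pandas, not part of the original notebook):

# Quick numeric preview: sentiment counts per speaker
pd.crosstab(df['speaker'], df['sentiment'])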

Heatmap of Sentiment Analysis 1

Here, we'll count the occurrences of each speaker-sentiment combination and build a heatmap with Altair.

# Count the occurrences of each speaker-sentiment combination
import altair as alt

heatmap_data = df.groupby(['speaker', 'sentiment']).size().reset_index(name='count')

font_size = 14

# Create the base chart
base = alt.Chart(heatmap_data).encode(
    x=alt.X('speaker', axis=alt.Axis(title='Speaker', titleFontSize=font_size, labelFontSize=font_size)),
    y=alt.Y('sentiment', axis=alt.Axis(title='Sentiment', titleFontSize=font_size, labelFontSize=font_size))
)

# Create the heatmap rectangles
heatmap = base.mark_rect().encode(
    color=alt.Color('count', title='Count', scale=alt.Scale(range='heatmap')),
    tooltip=['speaker', 'sentiment', 'count']
)

# Add the text labels
text = base.mark_text(fontSize=font_size, fontWeight='bold').encode(
    text=alt.Text('count'),
    color=alt.condition(
        alt.datum.count > heatmap_data['count'].max() / 2,  # Adjust the threshold as needed
        alt.value('white'),
        alt.value('black')
    )
)

# Combine the heatmap and text
chart = (heatmap + text).properties(
    # title='Sentiment by Speaker',
    width=300,
    height=300
).interactive()

With the chart assembled, we can display the heatmap showing sentiment occurrence as a function of speaker.

# Display the chart
chart
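
The chart renders interactively in the notebook. If you'd like to share it outside the notebook, Altair can also save it as a standalone HTML file, for example:

# Optionally save the interactive chart as a standalone HTML file
chart.save('sentiment_heatmap.html')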

Heatmap of Sentiment Analysis 2

Let's now zoom in on the individual sentences and see the sentiment of each utterance as it occurs in the transcript.

font_size = 12

# Define the color scale for sentiment
sentiment_colors = {
    'POSITIVE': '#4CAF50',  # Green
    'NEUTRAL': '#9E9E9E',   # Gray
    'NEGATIVE': '#F44336'    # Red
}

# Create the base chart
base = alt.Chart(df).encode(
    x=alt.X('speaker:N', axis=alt.Axis(title='Speaker', titleFontSize=font_size, labelFontSize=font_size)),
    y=alt.Y('index:O', axis=alt.Axis(title=None, labels=False))  # Use 'index' for Y-axis, hide labels
)

# Create the heatmap rectangles with black stroke (border)
heatmap = base.mark_rect(stroke='black').encode(
    color=alt.Color('sentiment:N', scale=alt.Scale(domain=list(sentiment_colors.keys()), range=list(sentiment_colors.values())),
                    legend=alt.Legend(orient='bottom')),  # Move legend to the bottom
    tooltip=['speaker:N', 'sentiment:N', 'text:N']
).properties(
    width=200,  # Reduced width for the heatmap
    height=df.shape[0] * 20  # Adjust height based on the number of rows
)

# Add the text column to the right of the heatmap and hide its y-axis
text_right = alt.Chart(df).mark_text(align='left', baseline='middle', dx=5).encode(
    y=alt.Y('index:O', axis=None),  # Remove y-axis from text
    text=alt.Text('text:N'),
    color=alt.value('black')
).properties(
    width=10,  # Adjust width for the text column
    height=df.shape[0] * 20  # Ensure consistent height
)

# Combine the heatmap and the text
chart = alt.concat(
    heatmap,
    text_right
).properties(
    # title='Call Center Data Visualization',
).configure_axis(
    labelFontSize=font_size,
    titleFontSize=font_size
).configure_view(
    strokeOpacity=0
    #strokeWidth=1,  # Add a border to the entire view
    #stroke='black'  # Make the border black
).interactive()

chart

In a nutshell, the plot shows a simple two-column heatmap corresponding to the speakers: Michael Johnson and Sarah, the call center agent. The corresponding dialogue is shown on the same row, and the sentiment is indicated by the color highlighting, with positive sentiment in green, neutral in gray, and negative in red.

Note that this particular recording contains only positive and neutral sentiments.
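
To hand these per-sentence results to someone outside the notebook, a minimal sketch exports the sentiment DataFrame to CSV (call_sentiment.csv is an assumed filename):

# Export the per-sentence sentiment results (assumed filename)
df.to_csv('call_sentiment.csv', index=False)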

If you’d like to follow along, there’s a video companion to this article:

Conclusion

In this blog, we covered how to build a call center speech analytics workflow using Python and AssemblyAI. You learned how to:
- Transcribe audio recordings with speaker identification
- Map generic speaker labels to actual names
- Perform sentiment analysis on the transcribed text
- Visualize the sentiment data using heatmaps

By following this workflow, you can efficiently analyze call center conversations, gain insights into customer sentiments, and improve overall service quality. This setup can be further adapted and customized to fit various use cases, such as analyzing meeting recordings or customer service interactions in different industries.

References

Here are additional resources to dive deeper into topics that are mentioned in this blog.