How to Paraphrase Text using Python
A step-by-step tutorial on the use of AI for Content Creation
As writers, we often seek out tools to help us become more efficient or productive. Tools such as Grammarly can help with language editing. Text generation tools can help to rapidly generate original content from just a few keyword ideas given to the AI.
Perhaps this could help end writer’s block? This is a debatable question that is best saved for a later time.
Paraphrasing is another great way to take existing content (either your own or someone else's) and add your own spin to it. Wouldn’t it be great if we could paraphrase text automatically?
In this article, you will learn how to paraphrase text for FREE in Python using the PARROT library. Under the hood, PARROT’s paraphrasing technology is based on T5 (short for Text-To-Text Transfer Transformer), a model originally developed by Google (for more information, refer to the T5 resource at Papers with Code). At a high level, text generation is a niche within the exciting field of natural language processing (NLP), which is generally referred to as artificial intelligence or AI when explained to a general audience.
An accompanying YouTube video, How to paraphrase text in Python using the PARROT library (Ft. Ken Jee), is available for this article.
1. Launching a Google Colab Notebook
We’re going to perform the text paraphrasing on the cloud using Google Colab, which is an online version of the Jupyter notebook that allows you to run Python code on the cloud. If you’re new to Google Colab, you will want to brush up on the basics in the Introductory notebook.
Log into your Gmail account, then go to Google Colab.
Launch the tutorial notebook by first heading over to File > Open Notebook and then click on the Upload tab (far right).
Type dataprofessor/parrot into the search box.
Click on the parrot.ipynb file.
Screenshot of loading the PARROT tutorial notebook.
2. Installing the PARROT Library
The PARROT library can be installed via pip by typing the following into the code cell:
! pip install git+https://github.com/PrithivirajDamodaran/Parrot.git
Library installation should take a short moment.
Screenshot showing the installation of the PARROT Python library.
3. Importing the Libraries
Here, we’re going to import 3 Python libraries consisting of parrot, torch and warnings. You can go ahead and type the following (or copy and paste) into a code cell, then run it by pressing the CTRL + Enter keys (Windows and Linux) or the CMD + Enter keys (Mac OSX). Alternatively, the code cell can also be run by clicking on the play button found to the left of the code cell.
from parrot import Parrot
import torch
import warnings
warnings.filterwarnings("ignore")
Screenshot of the play button that allows the code cell to be run.
The parrot library contains the pre-trained text paraphrasing model that we will use to perform the paraphrasing task.
Under the hood, the pre-trained text paraphrasing model was created using PyTorch (torch) and thus we’re importing it here in order to run the model. This model is called parrot_paraphraser_on_T5 and is listed on the Hugging Face website. It should be noted that Hugging Face is the company that develops the transformers library and hosts the parrot_paraphraser_on_T5 model.
As the code implies, any warnings that appear will be ignored via the warnings library.
4. Reproducibility of the Text Paraphrasing
In order to allow reproducibility of the text paraphrasing, the random seed number will be set. What this does is produce the same results for the same seed number (even if it is re-run multiple times).
To set the random seed number for reproducibility, enter the following code block into the code cell:
def random_state(seed):
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)

random_state(1234)
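To see what seeding buys you, here is a minimal sketch that uses Python's built-in random module as a stand-in for PyTorch's generator. That substitution is an assumption made so the snippet runs without torch, and the helper name draw_numbers is hypothetical. The same idea applies to torch.manual_seed: resetting the seed replays the same sequence of pseudo-random draws.

```python
import random


def draw_numbers(seed, count=3):
    """Seed a private generator and return `count` pseudo-random integers."""
    # A separate Random instance keeps the global random state untouched.
    rng = random.Random(seed)
    return [rng.randint(0, 99) for _ in range(count)]


first_run = draw_numbers(1234)
second_run = draw_numbers(1234)  # same seed, so the same sequence is replayed

print(first_run == second_run)  # True
```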
5. Load the Text Paraphrasing Model
We will now load and initialize the PARROT model by entering the following into a code cell and running the cell.
parrot = Parrot(model_tag="prithivida/parrot_paraphraser_on_T5", use_gpu=False)
The models will be loaded as shown below:
Screenshot of initialized model.
6. Input Text
The input text for this example, which is What’s the most delicious papayas?, will be assigned to the phrases variable, which we will be using in just a moment.
To find out the answer to that make sure to watch the accompanying YouTube video (How to paraphrase text in Python using the PARROT library (Ft. Ken Jee)).
phrases = ["What's the most delicious papayas?"]
7. Generating the Paraphrased Text
Now, to the fun part of generating the paraphrased text using the PARROT T5 model.
7.1. The Code
Enter the following code block into the code cell and run the cell.
for phrase in phrases:
    print("-"*100)
    print("Input_phrase: ", phrase)
    print("-"*100)
    para_phrases = parrot.augment(input_phrase=phrase)
    for para_phrase in para_phrases:
        print(para_phrase)
7.2. Line-by-line Explanation
Here, we’ll be using a for loop to iterate through all the sentences in the phrases variable (in the example above we assigned only a single sentence or a single phrase to this variable).
For each phrase in the phrases variable:
Print out the - character 100 times.
Print "Input_phrase: " followed by the phrase that is being iterated.
Print out the - character 100 times.
Perform the paraphrasing using the parrot.augment() function, which takes as its input argument the phrase being iterated. The generated paraphrases are assigned to the para_phrases variable.
Perform a nested for loop on the para_phrases variable: print each of the generated paraphrases in para_phrases, one per iteration (the 4 paraphrased texts that we will soon see in the next section).
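The loop described above can be sketched as a small reusable function. This is a hedged variation rather than the notebook's exact code: the names print_paraphrases and fake_augment are hypothetical, and the guard assumes the paraphrasing call may return None when it finds no acceptable candidate, which is worth checking against the version of Parrot you install.

```python
def print_paraphrases(phrases, augment):
    """Print each input phrase followed by its paraphrases.

    `augment` is any callable that accepts `input_phrase=` and returns a list
    of paraphrases (or None); in the notebook this would be `parrot.augment`.
    Returns a dict mapping each phrase to its list of paraphrases.
    """
    results = {}
    for phrase in phrases:
        print("-" * 100)
        print("Input_phrase: ", phrase)
        print("-" * 100)
        # Assumption: treat a None result as "no paraphrases found".
        para_phrases = augment(input_phrase=phrase) or []
        for para_phrase in para_phrases:
            print(para_phrase)
        results[phrase] = list(para_phrases)
    return results


def fake_augment(input_phrase):
    """Hypothetical stand-in for parrot.augment so the sketch runs offline."""
    if not input_phrase.strip():
        return None
    return [input_phrase.lower(), input_phrase.upper()]


results = print_paraphrases(["What's the most delicious papayas?", " "], fake_augment)
```

In the notebook, the equivalent call would be print_paraphrases(phrases, parrot.augment).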
7.3. Code Output
This code block generates the following output:
Here, we can see that PARROT produces 4 paraphrased texts, and you can choose any of these for further use.
8. What’s Next?
Congratulations, you have successfully produced paraphrased text using AI!
In case you’re interested in taking this a step or two further, here are some project ideas that you can try out and build to expand your own portfolio of projects. Speaking of portfolios, you can learn how to build a portfolio website for free from a recent article that I wrote.
Project Idea 1
Create a Colab/Jupyter notebook that expands on this example (which generates paraphrased text for a single input phrase) by making a version that can take in multiple phrases as input. For example, we can assign a paragraph consisting of a couple of phrases to an input variable, which is then used by the code to generate paraphrased text. Then, from the returned outputs for each phrase, randomly select a single output to represent that phrase (i.e. each input phrase will have exactly 1 corresponding paraphrased text). Combine the paraphrased phrases into a new paragraph. Compare the original paragraph and the new paraphrased paragraph.
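As a starting point, the workflow for this idea might look like the sketch below. Everything here is illustrative: paraphrase_paragraph and toy_paraphraser are hypothetical names, the sentence splitter is deliberately naive, and the toy paraphraser only tags each sentence so that the sketch runs without loading a model.

```python
import random
import re


def paraphrase_paragraph(paragraph, paraphrase_fn, seed=1234):
    """Paraphrase a paragraph sentence by sentence.

    `paraphrase_fn` takes one sentence and returns a list of candidate
    paraphrases (possibly empty). One candidate is picked at random per
    sentence; the original sentence is kept when there are no candidates.
    """
    rng = random.Random(seed)  # seeded so the random picks are repeatable
    # Naive split: break after ., ! or ? when followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]
    chosen = []
    for sentence in sentences:
        candidates = paraphrase_fn(sentence)
        chosen.append(rng.choice(candidates) if candidates else sentence)
    return " ".join(chosen)


def toy_paraphraser(sentence):
    """Hypothetical stand-in that returns two fake 'paraphrases'."""
    return ["[A] " + sentence, "[B] " + sentence]


original = "Papayas are sweet. Which one is the best? Try them all!"
rewritten = paraphrase_paragraph(original, toy_paraphraser)
print(original)
print(rewritten)
```

To use this with PARROT, paraphrase_fn could be a small wrapper around parrot.augment. Depending on the installed version, each returned item may be a (text, score) pair rather than a plain string, so the wrapper may need to unpack it.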
Project Idea 2
Expand on Project Idea 1 by making it into a web app using Streamlit (also check out the Streamlit Tutorial Playlist) or PyWebIO. In particular, the web app would take a paragraph of phrases as input, apply the code to generate paraphrased text, and return the result as output in the main panel of the web app.
I’d love to see some of your creations and so please feel free to post them in the comment section. Happy creation!
Credit
Code used in this article was adapted from the example provided by Parrot creator Prithiviraj Damodaran.
Created (with license) using the image by alexdndz from envato elements.