How to Convert a PDF File to an Audio MP3 File Using Python

Published at - Nov 25, 2021

Reading a long PDF takes time, so in this tutorial we will convert a PDF file to audio. That way you can enjoy your favorite ebook without reading it, just by putting on your headphones.

Before writing any code, let's go over the process we will use to convert the PDF file to audio.

Applications

This script has many applications. Some of them are listed below.

  1. Create Audiobooks

  2. Storyteller

Process

Converting a PDF to audio is a two-step process. In the first step, we read the PDF file and extract its text using PyPDF2. Once we have the text, we convert it to speech using the pyttsx3 Python package and save it to an audio file.

Package installation

We are going to use two Python packages in this tutorial:

PyPDF2: reads the PDF file and extracts its text

PyPDF2 is a pure-Python PDF toolkit originating from the pyPdf project and long maintained by Phaseit, Inc. It can extract data from PDF files, or manipulate existing PDFs to produce a new file. The older 1.x releases were compatible with Python 2.6, 2.7, and 3.2–3.5, while recent releases target Python 3 only.

pyttsx3: converts text to speech and saves it to an audio file

pyttsx3 is a text-to-speech conversion library in Python. Unlike alternative libraries, it works offline and is compatible with both Python 2 and 3.

Install using pip

pip install PyPDF2
pip install pyttsx3

Import packages

To use the packages, we have to import them at the very start of our script.

import pyttsx3
import PyPDF2

Initialize PDF reader

To read the PDF file, we first create a reader object and point it at the source file using the following code.

# PdfFileReader was removed in PyPDF2 3.0; PdfReader is its replacement
pdfreader = PyPDF2.PdfReader(open('story.pdf', 'rb'))
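If you want the file handle to be closed automatically, the reader can also be created inside a with block. The snippet below is only a sketch of that idea: the count_pages helper is a name made up for this example, it uses the newer PdfReader class name, and story.pdf is assumed to sit next to the script.

```python
from pathlib import Path


def count_pages(pdf_path):
    """Return the number of pages in the PDF at pdf_path (hypothetical helper)."""
    if not Path(pdf_path).is_file():
        raise FileNotFoundError(f"No such PDF file: {pdf_path}")

    # Imported here so the missing-file check above works even
    # before PyPDF2 has been installed.
    import PyPDF2

    # The with block closes the file once we are done reading it.
    with open(pdf_path, "rb") as pdf_file:
        pdfreader = PyPDF2.PdfReader(pdf_file)
        return len(pdfreader.pages)


if __name__ == "__main__":
    source = Path("story.pdf")  # assumed file name, same as in the tutorial
    if source.is_file():
        print(f"{source} has {count_pages(source)} page(s)")
    else:
        print(f"{source} not found")
```

Checking the page count first is a quick way to confirm that the PDF opened correctly before spending time on the audio conversion.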

Initialize Speaker

To convert text to speech, we have to create a speaker from pyttsx3 using the code below.

speaker = pyttsx3.init()
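The default voice can sound a little fast for a long ebook. As a rough sketch, the speaker can be tuned with setProperty before any text is queued. The make_speaker and clamp_volume helpers and the default values here are assumptions chosen for illustration, not part of the original script.

```python
def clamp_volume(volume):
    """Keep the volume inside the 0.0 to 1.0 range that pyttsx3 uses."""
    return max(0.0, min(1.0, float(volume)))


def make_speaker(rate=150, volume=1.0):
    """Create a pyttsx3 engine with a custom speaking rate and volume."""
    # Imported here so clamp_volume can be used without pyttsx3 installed.
    import pyttsx3

    speaker = pyttsx3.init()
    speaker.setProperty("rate", rate)                     # words per minute
    speaker.setProperty("volume", clamp_volume(volume))   # 0.0 (silent) to 1.0 (full)
    return speaker
```

Calling make_speaker(rate=130) would give a slower, more relaxed reading pace, which may suit a storyteller-style audiobook better.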

Read PDF file pages and convert to audio

Now we will loop through all the pages of the PDF file, extract their text, and then convert that text to speech.

Embedded code: https://medium.com/media/a05ed68fdf4b1117ac66f56a353f5e39
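Since the embedded snippet may not be visible everywhere, here is a minimal sketch of what such a loop could look like. It is not necessarily the exact code from the embed. The function names pages_to_text and save_speech are made up for this example, and the code assumes a reader object that exposes pages and extract_text (as PyPDF2's PdfReader does) and a speaker created with pyttsx3.init().

```python
def pages_to_text(pdfreader):
    """Join the text of every page, collapsing line breaks and extra spaces."""
    chunks = []
    for page in pdfreader.pages:
        raw = page.extract_text() or ""   # image-only pages may yield no text
        cleaned = " ".join(raw.split())   # line breaks would cause odd pauses
        if cleaned:
            chunks.append(cleaned)
    return " ".join(chunks)


def save_speech(speaker, text, audio_path):
    """Queue the text for synthesis and wait until the file is written."""
    speaker.save_to_file(text, audio_path)
    speaker.runAndWait()


# Usage, assuming pdfreader and speaker were created as shown earlier:
#   text = pages_to_text(pdfreader)
#   save_speech(speaker, text, "story.mp3")
```

Collecting the text of all pages first and saving it in one go keeps everything in a single audio file instead of producing one file per page.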

Done! We have successfully converted our PDF file to an MP3 audio file. Now, look at the full code.

Embedded code: https://medium.com/media/3364311298c9c16346cbc7ccd23a7ec5
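For readers who cannot load the embed, the steps from this tutorial can be put together roughly as follows. Treat it as a sketch under a few assumptions rather than the exact original script: it uses the newer PdfReader and extract_text names, the audio_name_for and convert helpers are invented for this example, and story.pdf is assumed to be in the current folder. One caveat: as far as I know, the actual audio encoding written by save_to_file depends on the speech driver of your operating system, so a file named .mp3 may need a separate conversion step to be a true MP3.

```python
from pathlib import Path


def audio_name_for(pdf_path):
    """Derive the audio file name from the PDF name: story.pdf -> story.mp3."""
    return str(Path(pdf_path).with_suffix(".mp3"))


def convert(pdf_path):
    """Read every page of the PDF and save the spoken text to an audio file."""
    # Third-party imports live here so audio_name_for works on its own.
    import PyPDF2
    import pyttsx3

    # Step 1: read the PDF and extract the text of all pages.
    with open(pdf_path, "rb") as pdf_file:
        pdfreader = PyPDF2.PdfReader(pdf_file)
        text = " ".join(
            " ".join((page.extract_text() or "").split())
            for page in pdfreader.pages
        )

    # Step 2: convert the text to speech and save it.
    audio_path = audio_name_for(pdf_path)
    speaker = pyttsx3.init()
    speaker.save_to_file(text, audio_path)
    speaker.runAndWait()
    speaker.stop()
    return audio_path


if __name__ == "__main__":
    pdf = Path("story.pdf")  # assumed file name, same as in the tutorial
    if pdf.is_file():
        print("Saved", convert(pdf))
    else:
        print(f"Put a PDF named {pdf} next to this script first.")
```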

Summary

In this tutorial, we have learned how to:

  • Read a PDF file and extract its text

  • Convert text to speech and save it to an audio file

I hope you enjoyed this tutorial. You can follow me to read more tutorials in the future. Thank you for reading.

About author

Harendra Kanojiya

Hello, I am Harendra Kumar Kanojiya, owner of this website and a full-stack web developer. I have expertise in full-stack web development using Angular, PHP, Node JS, Python, Laravel, CodeIgniter, and other web technologies. I also love to write blogs on the latest web technology to keep myself and others updated. Thank you for reading the articles.


