Multipage PDF to JPEG Image Conversion in Python
There is a wonderful, easy-to-use library called pdf2image that converts PDF files into JPEG or PNG images. It works well, but it generates one image per page for multi-page PDF files, and I needed a single image containing all pages of a PDF file, merged vertically, so pdf2image alone does not satisfy all my needs.
Then I found pillow, a library for manipulating and processing image files; with it I can merge multiple images together. Here it goes:
Requirements
Installation requires two Python modules and an extra library to process PDF files.
To install pillow:
pip install pillow
To install pdf2image:
pip install pdf2image
pdf2image requires poppler; the pdf2image repo itself has the installation instructions. Since I’m a macOS user, I simply ran:
brew install poppler
Everything is ready.
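If you want to confirm that poppler is actually reachable before running anything, a quick check like the one below can help. This is only a sketch: poppler_available is a hypothetical helper name, and it assumes pdf2image finds poppler's pdftoppm and pdfinfo command-line tools through your PATH (the default when no explicit poppler path is given).

```python
import shutil


def poppler_available():
    """Return True if poppler's pdftoppm and pdfinfo tools are on PATH.

    Hypothetical helper, not part of pdf2image.
    """
    return all(shutil.which(tool) is not None for tool in ("pdftoppm", "pdfinfo"))


if __name__ == "__main__":
    print("poppler found" if poppler_available() else "poppler not found on PATH")
```

If it reports that poppler is missing, re-check the install step above before moving on.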
The code
The code below is written for Python 3.7. It first converts the PDF to multiple images, one per page, and then merges them together.
Run it with the source PDF file path and the output file path where you want the JPEG to be saved.
import os
import tempfile

from pdf2image import convert_from_path
from PIL import Image


def convert_pdf(file_path, output_path):
    # save temp image files in temp dir, delete them after we are finished
    with tempfile.TemporaryDirectory() as temp_dir:
        # convert pdf to multiple images, one per page
        images = convert_from_path(file_path, output_folder=temp_dir)

        # save images to temporary directory
        temp_images = []
        for i, image in enumerate(images):
            image_path = os.path.join(temp_dir, f'{i}.jpg')
            image.save(image_path, 'JPEG')
            temp_images.append(image_path)

        # read images into pillow.Image
        imgs = list(map(Image.open, temp_images))

        # find minimum width of images; wider pages are cropped on the right
        min_img_width = min(img.width for img in imgs)

        # find total height of all images
        total_height = sum(img.height for img in imgs)

        # create new image object with width and total height
        merged_image = Image.new(imgs[0].mode, (min_img_width, total_height))

        # paste images together one by one
        y = 0
        for img in imgs:
            merged_image.paste(img, (0, y))
            y += img.height

        # save merged image while the temp files still exist
        merged_image.save(output_path)

    return output_path
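The merge step on its own can also be sketched without any temporary files. The variant below is not the code above but an alternative under stated assumptions: merge_vertically is a hypothetical helper name, it uses the widest page as the canvas width (so nothing is cropped) and fills the leftover space with white. Synthetic images stand in for real PDF pages so it runs without poppler.

```python
from PIL import Image


def merge_vertically(pages, background="white"):
    """Stack PIL images top to bottom on a canvas as wide as the widest page.

    Hypothetical helper: unlike convert_pdf, it pads narrower pages
    with the background color instead of cropping wider ones.
    """
    if not pages:
        raise ValueError("no pages to merge")
    width = max(page.width for page in pages)
    height = sum(page.height for page in pages)
    merged = Image.new("RGB", (width, height), background)
    y = 0
    for page in pages:
        merged.paste(page.convert("RGB"), (0, y))
        y += page.height
    return merged


# Synthetic pages stand in for the list returned by convert_from_path()
pages = [Image.new("RGB", (200, 100), "red"), Image.new("RGB", (150, 80), "blue")]
merged = merge_vertically(pages)
print(merged.size)  # (200, 180)
```

With a real PDF, the same helper could be fed the in-memory list that convert_from_path returns when no output_folder is given, for example merge_vertically(convert_from_path("input.pdf")).save("output.jpg"), where both file names are placeholders.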