Multipage PDF to JPEG Image Conversion in Python

There is a wonderful, easy-to-use library called pdf2image that converts PDF files into JPEG or PNG images. It works well, but it produces one image per page, and I needed a single image containing all pages of a PDF file, merged vertically. So pdf2image alone does not cover my needs.

Then I found pillow, a library for manipulating and processing image files, which I can use to merge multiple images together. Here is how it goes:

Requirements

Installation requires two Python modules and one extra library to process PDF files.

To install pillow:

pip install pillow

To install pdf2image:

pip install pdf2image

pdf2image requires poppler; the pdf2image repo has the installation instructions. Since I'm a macOS user, I simply ran:

brew install poppler

Everything is ready.
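
Before moving on, it can help to confirm that the poppler command-line tools are actually visible on the PATH, since pdf2image calls pdftoppm and pdfinfo under the hood. The snippet below is only a small sketch using the standard library; the helper name find_poppler_tools is made up for this post and is not part of pdf2image.

```python
import shutil


def find_poppler_tools(tools=("pdftoppm", "pdfinfo")):
    """Map each tool name to its full path, or None if it is not on PATH.

    Hypothetical helper for illustration; not part of pdf2image.
    """
    return {tool: shutil.which(tool) for tool in tools}


for name, path in find_poppler_tools().items():
    # A missing tool usually means poppler is not installed or not on PATH
    print(f"{name}: {path if path else 'NOT FOUND'}")
```

If either tool shows up as NOT FOUND, the poppler installation step probably needs another look.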

The code

The code below is written for Python 3.7. It first converts the PDF into one image per page and then merges them together. Call it with the path of the source PDF file and the path of the JPEG file you want to write; the output is a file path, not a folder, because the merged image is saved directly to it.

import os
import tempfile
from pdf2image import convert_from_path
from PIL import Image


def convert_pdf(file_path, output_path):
    # save temp image files in temp dir, delete them after we are finished
    with tempfile.TemporaryDirectory() as temp_dir:

        # convert pdf to multiple image
        images = convert_from_path(file_path, output_folder=temp_dir)

        # save each page as a JPEG file in the temporary directory
        temp_images = []
        for i, image in enumerate(images):
            image_path = os.path.join(temp_dir, f'{i}.jpg')
            image.save(image_path, 'JPEG')
            temp_images.append(image_path)

        # read images into pillow.Image
        imgs = list(map(Image.open, temp_images))

        # Image.open reads lazily, so merge while the temporary files still exist

        # find minimum width of images
        # (pages wider than this are cropped on the right when pasted)
        min_img_width = min(img.width for img in imgs)

        # find total height of all images
        total_height = sum(img.height for img in imgs)

        # create new image object with width and total height
        merged_image = Image.new(imgs[0].mode, (min_img_width, total_height))

        # paste images together one by one, top to bottom
        y = 0
        for img in imgs:
            merged_image.paste(img, (0, y))
            y += img.height

        # save merged image (format is inferred from the file extension)
        merged_image.save(output_path)

    return output_path
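
The merge step comes down to a bit of arithmetic: pick a canvas width, add up the page heights, and remember the y offset of each page. The sketch below isolates that calculation so the choice of width is easier to see. The function name plan_vertical_layout, the use_widest flag and the sample page sizes are made up for illustration and are not part of pdf2image or pillow. With use_widest=True the canvas takes the widest page, so nothing gets cropped; with False it reproduces the minimum-width behaviour of the function above, where wider pages lose their right edge.

```python
def plan_vertical_layout(page_sizes, use_widest=True):
    """Return ((canvas_width, canvas_height), y_offsets) for stacked pages.

    page_sizes is a list of (width, height) tuples, one per page.
    Hypothetical helper for illustration only.
    """
    if not page_sizes:
        raise ValueError("at least one page is required")

    widths = [width for width, _ in page_sizes]
    canvas_width = max(widths) if use_widest else min(widths)

    # Each page starts where the previous one ended
    y_offsets = []
    y = 0
    for _, height in page_sizes:
        y_offsets.append(y)
        y += height

    # After the loop, y equals the total height of all pages
    return (canvas_width, y), y_offsets


# Hypothetical page sizes in pixels
sizes = [(1654, 2339), (1700, 2200)]
canvas_size, offsets = plan_vertical_layout(sizes)
print(canvas_size, offsets)  # (1700, 4539) [0, 2339]
```

With pillow, the canvas could then be created with Image.new('RGB', canvas_size, 'white') and each page pasted at (0, y) using its offset, so narrower pages sit on a white background instead of forcing wider ones to be cropped.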