Written by: Michael Washington
3/4/2017 2:09 PM

Notice: A few times during testing, I encountered intermittent errors. This Stack Overflow post details the issues I found.

You can use Azure Functions to convert PDF files to PNG files.

You can put a .pdf file in Microsoft Azure Storage (this can be done programmatically or using the Microsoft Azure Storage Explorer)…

… then enter a message in Azure Queue Storage with the name of the file to process.

The PDF file will be split into individual .png files, one for each page.
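
The upload and the queue message can also be done from code rather than by hand. The following is only a minimal sketch of that idea, not part of the original walkthrough. It assumes the same WindowsAzure.Storage client library that the function itself references, the container and queue names used later in this post (pdf-conversion-input and myqueue-items), and a hypothetical helper class called PdfSubmitter that is given a storage connection string and a local file path:

```csharp
using System.IO;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;
using Microsoft.WindowsAzure.Storage.Queue;

// Hypothetical helper for illustration only; the class and method names are assumptions.
public static class PdfSubmitter
{
    public static void Submit(string connectionString, string pdfPath)
    {
        CloudStorageAccount account = CloudStorageAccount.Parse(connectionString);

        // Upload the PDF to the input container, using the local file name as the blob name
        CloudBlobContainer container =
            account.CreateCloudBlobClient().GetContainerReference("pdf-conversion-input");
        container.CreateIfNotExists();
        string blobName = Path.GetFileName(pdfPath);
        CloudBlockBlob blob = container.GetBlockBlobReference(blobName);
        using (FileStream stream = File.OpenRead(pdfPath))
        {
            blob.UploadFromStream(stream);
        }

        // Queue a message whose text is the blob name, which is what the function reads
        CloudQueue queue =
            account.CreateCloudQueueClient().GetQueueReference("myqueue-items");
        queue.CreateIfNotExists();
        queue.AddMessage(new CloudQueueMessage(blobName));
    }
}
```

Because the queue message is just the blob name, the same text is used for both the upload and the message, which avoids casing mismatches between the two.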

Create The Azure Function

Go to:

https://azure.microsoft.com/en-us/services/functions/

Log into your Azure account (or create a new one).

Select a Subscription, enter a Name for your function, select a Region close to you, and click the Create + get started button.

Click the New Function button.

Select the C# Language, the Data Processing Scenario, then the QueueTrigger-CSharp template.

Enter a Name for the function.

Leave the queue name set to the default (myqueue-items).

Select an existing Storage account connection or create a new one.

Click the Create button.

The function edit screen will show.

Click the expand buttons on the Logs and File View sections to show them.

Click the Add button to add a new file.

Name the file project.json and press the Enter key.

Enter the following code and press the Save button:

 

{
  "frameworks": {
    "net46": {
      "dependencies": {
        "Ghostscript.NET": "1.2.1"
      }
    }
  }
}

 

This instructs Azure Functions to load the Ghostscript.NET NuGet package.

The native Ghostscript library is still required; we will add it in a later step.

Click on the run.csx file, enter the following code, and click the Save button:

 

#r "Microsoft.WindowsAzure.Storage"
#r "System.Drawing"
#r "System.Web"
#r "System.Configuration"
using System.Drawing;
using System.Drawing.Imaging;
using System;
using System.Configuration;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Auth;
using Microsoft.WindowsAzure.Storage.Blob;
using System.Net;
using Ghostscript.NET;
using Ghostscript.NET.Rasterizer;
public static void Run(string myQueueItem, TraceWriter log)
{
    log.Info($"C# Queue trigger function processed. Value passed: {myQueueItem}");
    string uploadFileName = myQueueItem;
    log.Info($"Get AzureWebJobsStorage and cloudUploadedFilesContainerName Settings");
    string _storageConnectionString = ConfigurationManager.AppSettings["AzureWebJobsStorage"];
    string _cloudUploadedFilesContainerName = "pdf-conversion-input";
    log.Info($"Get CloudStorageAccount");
    var _cloudStorageAccount = CloudStorageAccount.Parse(_storageConnectionString);
    log.Info($"Get CloudBlobClient");
    var _cloudBlobClient = _cloudStorageAccount.CreateCloudBlobClient();
    log.Info($"Get ContainerReference");
    var _cloudUploadedFilesBlobContainer =
        _cloudBlobClient.GetContainerReference(_cloudUploadedFilesContainerName);
    log.Info($"Get sourceBlockBlob");
    CloudBlockBlob sourceBlockBlob =
        _cloudUploadedFilesBlobContainer.GetBlockBlobReference(uploadFileName);
    if (!sourceBlockBlob.Exists())
    {
        log.Info($"sourceBlockBlob does not exist.");
        return;
    }
    // _pdfConversionFilePath is the working folder used for temporary files
    string _pdfConversionFilePath = @"D:\home\data\Functions\sampledata";
    log.Info($"_pdfConversionFilePath: {_pdfConversionFilePath}");
    string filePath = string.Format(@"{0}\{1}", _pdfConversionFilePath, uploadFileName);
    log.Info($"filePath: {filePath}");
    sourceBlockBlob.DownloadToFile(filePath, FileMode.Create);
    // The png file will have the same name as the PDF file but with a png extension
    string fileName = Path.GetFileNameWithoutExtension(filePath);
    log.Info($"fileName: {fileName}");
    string pngFilePath = string.Format(@"{0}\{1}.png", _pdfConversionFilePath, fileName);
    log.Info($"pngFilePath: {pngFilePath}");
    // Delete png file if already exists
    log.Info($"File.Exists(pngFilePath): {File.Exists(pngFilePath)}");
    if (File.Exists(pngFilePath))
    {
        File.Delete(pngFilePath);
    }
    // Get the output container that will receive the converted PNG files
    log.Info($"Get cloudConvertedFilesContainerName Settings");
    string _cloudConvertedFilesContainerName = "pdf-conversion-output";
    log.Info($"GetContainerReference");
    var _cloudConvertedFilesContainer =
        _cloudBlobClient.GetContainerReference(_cloudConvertedFilesContainerName);
    int desired_x_dpi = 96;
    int desired_y_dpi = 96;
    string inputPdfPath = filePath;
    string outputPath = Path.GetDirectoryName(pngFilePath);
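    // Point Ghostscript.NET at the native 32-bit Ghostscript DLL (uploaded to this folder in a later step)
    GhostscriptVersionInfo gvi =
        new GhostscriptVersionInfo(@"D:\home\data\Functions\packages\nuget\ghostscript.net\1.2.1\lib\net40\gsdll32.dll");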
    log.Info($"using (GhostscriptRasterizer _rasterizer = new GhostscriptRasterizer())");
    using (GhostscriptRasterizer _rasterizer = new GhostscriptRasterizer())
    {
        log.Info($"_rasterizer.Open: {inputPdfPath}");
        _rasterizer.Open(inputPdfPath, gvi, true);
        for (int pageNumber = 1; pageNumber <= _rasterizer.PageCount; pageNumber++)
        {
            string pageFilePath = Path.Combine(outputPath, fileName + "-Page-" + pageNumber.ToString() + ".png");
            log.Info($"pageFilePath: {pageFilePath}");
            log.Info($"_rasterizer.GetPage: {pageNumber}");
            // Dispose the page image as soon as it has been written to disk
            using (Image img = _rasterizer.GetPage(desired_x_dpi, desired_y_dpi, pageNumber))
            {
                log.Info($"img.Save: {pageFilePath}");
                img.Save(pageFilePath, ImageFormat.Png);
            }
            string pngfileName = Path.GetFileName(pageFilePath);
            log.Info($"pngfileName: {pngfileName}");
            log.Info($"GetBlockBlobReference");
            CloudBlockBlob targetBlockBlob = _cloudConvertedFilesContainer.GetBlockBlobReference(pngfileName);
            log.Info($"UploadFromStream: {pageFilePath}");
            using (FileStream fileStream = File.OpenRead(pageFilePath))
            {
                targetBlockBlob.UploadFromStream(fileStream);
            }
            log.Info($"File.Delete png file: {pageFilePath}");
            File.Delete(pageFilePath);
        }
    }
    log.Info($"File.Delete PDF file: {filePath}");
    File.Delete(filePath);
    log.Info($"Completed processing {uploadFileName}");
}

 

Adding Ghostscript

 

Download the 32-bit Ghostscript installer from this link.

After you run the installer, the DLL that you need will be at the following location:

  • C:\Program Files (x86)\gs\gs9.20\bin\gsdll32.dll

 

Return to the Azure portal, select the Function app (AzureFunction-App), and click Function app settings.

Click the Go to Kudu button.

In Kudu, select Debug console then CMD.

Click on the data folder to navigate to it.

Then click through the following folders, in order:

  • Functions
  • packages
  • nuget
  • ghostscript.net
  • 1.2.1
  • lib
  • net40

Now drag and drop the gsdll32.dll file onto the web page in your web browser.

The file will upload to the directory.

Connect To Azure Storage

Download and install the Microsoft Azure Storage Explorer.

After you install it, open the Microsoft Azure Storage Explorer.

Log in using your Azure account.

Navigate to the Azure storage account you specified when you created the function (if you forgot which one it is, you can see the AzureWebJobsStorage connection string in Configure app settings under Function app settings).

Expand the tree to show all the nodes.

Right-click on the Blob Containers node and select Create Blob Container.

Create:

  • pdf-conversion-input
  • pdf-conversion-output

Select the pdf-conversion-input folder and use the Upload button to upload a PDF file.

Right-click on the Queues node and select Create Queue.

Create a queue called myqueue-items.

Select the myqueue-items queue, click the Add button, and add a message to the queue containing the name of the PDF file that was uploaded to the pdf-conversion-input folder.

Note: the file name is case sensitive, so the message must match the uploaded file name exactly.

The PDF will be read, and a PNG file will be created for each page of the PDF file.

The files will show up in the pdf-conversion-output folder.

You can monitor the function and diagnose any errors using the Logs in the Azure portal.

You can watch the files being processed in Kudu by navigating to:

D:\home\data\Functions\sampledata

 

Ghostscript Errors

If you get the following error:

Exception while executing function: Functions.PDFtoPNG. mscorlib: Exception has been thrown by the target of an invocation. Ghostscript.NET: Ghostscript native library could not be found.

Make sure that you have added gsdll32.dll to the correct directory, and that you added the 32-bit version of the file, not the 64-bit version.
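
Both conditions can be checked from code. The following is only a sketch of one way to do that, not something from the original function: GhostscriptDllCheck is a hypothetical helper that reports whether the file exists and reads the Machine field of the DLL's PE header to tell a 32-bit (x86) build from a 64-bit (x64) one. It assumes the file is a well-formed Windows DLL and does not validate the header signatures.

```csharp
using System;
using System.IO;

// Hypothetical diagnostic helper for illustration only.
public static class GhostscriptDllCheck
{
    // Returns "x86", "x64", or "unknown" based on the PE header's Machine field.
    public static string GetDllArchitecture(string dllPath)
    {
        using (FileStream stream = File.OpenRead(dllPath))
        using (BinaryReader reader = new BinaryReader(stream))
        {
            // Offset 0x3C of the DOS header holds the file offset of the PE header
            stream.Seek(0x3C, SeekOrigin.Begin);
            int peOffset = reader.ReadInt32();

            // Skip the 4-byte "PE\0\0" signature; the next 2 bytes are the Machine field
            stream.Seek(peOffset + 4, SeekOrigin.Begin);
            ushort machine = reader.ReadUInt16();

            if (machine == 0x014C) return "x86";
            if (machine == 0x8664) return "x64";
            return "unknown";
        }
    }

    // Builds a one-line summary suitable for writing to the function log.
    public static string Describe(string dllPath)
    {
        if (!File.Exists(dllPath))
        {
            return "missing: " + dllPath;
        }
        string process = Environment.Is64BitProcess ? "64-bit" : "32-bit";
        return "found " + GetDllArchitecture(dllPath) + " DLL; function process is " + process;
    }
}
```

In run.csx, the summary could be logged before the rasterizer is opened, for example with log.Info(GhostscriptDllCheck.Describe(path)), where path is the same gsdll32.dll path that is passed to GhostscriptVersionInfo.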

If you get other errors, see this Stack Overflow post.

 

Links

Converting PDFs to Multipage Tiff files Using Azure WebJobs

Azure Functions

Microsoft Azure Storage Explorer
