Feb 26

Written by: Michael Washington
2/26/2017 5:17 PM

You can use Azure WebJobs to convert PDF files to multi-page TIFF files.

This sample project allows you to put a .pdf file in Microsoft Azure Storage (this can be done programmatically or using the Microsoft Azure Storage Explorer).

Enter a message in Azure Queue Storage with the name of the file to process.
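
Both steps can also be done from code. The following is a minimal sketch, assuming the same Microsoft.WindowsAzure.Storage package that the WebJob itself uses, and assuming the container and queue names created later in this article (pdf-conversion-input and pdf-conversion). The PdfSubmitter class and its method names are hypothetical and are not part of the sample project.

```csharp
using System.IO;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;
using Microsoft.WindowsAzure.Storage.Queue;

// Hypothetical helper, not part of the sample project
public static class PdfSubmitter
{
    // The blob name (and the queue message) is the file name without any folder part
    public static string GetBlobName(string localPdfPath)
    {
        return Path.GetFileName(localPdfPath);
    }

    public static void Submit(string connectionString, string localPdfPath)
    {
        CloudStorageAccount account = CloudStorageAccount.Parse(connectionString);
        string blobName = GetBlobName(localPdfPath);

        // Upload the PDF to the input container (name assumed from this article)
        CloudBlobContainer container =
            account.CreateCloudBlobClient().GetContainerReference("pdf-conversion-input");
        CloudBlockBlob blob = container.GetBlockBlobReference(blobName);
        using (FileStream stream = File.OpenRead(localPdfPath))
        {
            blob.UploadFromStream(stream);
        }

        // Queue a message containing only the blob name; this is what triggers the WebJob
        CloudQueue queue =
            account.CreateCloudQueueClient().GetQueueReference("pdf-conversion");
        queue.AddMessage(new CloudQueueMessage(blobName));
    }
}
```

The blob is uploaded before the message is queued so that the file is already in place when the WebJob is triggered.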

You can then go into the Azure Portal, select the Azure Web App, select WebJobs, select the WebJob, then select Logs to see that the WebJob was invoked and completed successfully.

We then click on the log entry to see the details.

Clicking Toggle Output on the detail page reveals the log entries created by the code.

When we look in the output container in Azure Storage, we see the converted multi-page TIFF file, which we can download and open.

Using The Sample Code

You can download the project from the downloads page on this site.

When you open it up in Visual Studio 2015 (or higher) you will see that the Ghostscript assemblies are missing (the license does not allow them to be distributed).

Download the 32-bit and 64-bit Ghostscript installers from the Ghostscript downloads page (see Links below). Both are needed because the version loaded at runtime depends on the server the code is running on.

After you run the installer, the assemblies that you need will be at the following locations:

  • C:\Program Files (x86)\gs\gs9.20\bin\gsdll32.dll
  • C:\Program Files\gs\gs9.20\bin\gsdll64.dll

Copy and paste the files into the root directory of the Visual Studio project.

Note that both files have their Copy to Output Directory property set to “Copy always”.

This is what allows them to be pushed to the Azure server, and to end up in the correct place, when you publish the project (shown in a later step).

Azure Storage Account

You will need an Azure Storage Account.

Follow the directions at this link to create one (if you don’t already have one).

After creating the account, you will need the Azure Storage Account Name and the AccountKey.

See this link for directions on how to retrieve them.

You will also need to use the Microsoft Azure Storage Explorer to create the following Blob Containers and Queues:

  • Blobs
    • pdf-conversion-input
    • pdf-conversion-output
  • Queues
    • pdf-conversion
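
If you would rather create these from code than from the Storage Explorer, something along the following lines should work. It is only a sketch, assuming the Microsoft.WindowsAzure.Storage package that the sample project references; the StorageSetup class is hypothetical and is not part of the download.

```csharp
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;
using Microsoft.WindowsAzure.Storage.Queue;

// Hypothetical helper, not part of the sample project
public static class StorageSetup
{
    // Names taken from the list above
    public static readonly string[] ContainerNames =
        { "pdf-conversion-input", "pdf-conversion-output" };
    public const string QueueName = "pdf-conversion";

    public static void EnsureCreated(string connectionString)
    {
        CloudStorageAccount account = CloudStorageAccount.Parse(connectionString);

        // Create each blob container only if it is not already there
        CloudBlobClient blobClient = account.CreateCloudBlobClient();
        foreach (string name in ContainerNames)
        {
            blobClient.GetContainerReference(name).CreateIfNotExists();
        }

        // Create the queue that triggers the WebJob
        CloudQueueClient queueClient = account.CreateCloudQueueClient();
        queueClient.GetQueueReference(QueueName).CreateIfNotExists();
    }
}
```

Because CreateIfNotExists does nothing when the container or queue is already present, the method can be run more than once.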

Update The Settings

Now return to the Solution in Visual Studio, open the App.config, and update the Azure Storage Account Name and the AccountKey.

(These values appear in three places. Use the same values in all three.)
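
The account name and key are combined into a standard Azure Storage connection string. As a rough illustration of that format, the hypothetical helper below (not part of the sample project) builds the string from the two values. The account name and key shown in the usage line are placeholders, not real credentials.

```csharp
using System;

// Hypothetical helper, shown only to illustrate the connection string format
public static class StorageConnection
{
    public static string Build(string accountName, string accountKey)
    {
        if (string.IsNullOrWhiteSpace(accountName))
            throw new ArgumentException("Account name is required", nameof(accountName));
        if (string.IsNullOrWhiteSpace(accountKey))
            throw new ArgumentException("Account key is required", nameof(accountKey));

        // Standard key=value pairs separated by semicolons
        return $"DefaultEndpointsProtocol=https;AccountName={accountName};AccountKey={accountKey}";
    }
}
```

For example, StorageConnection.Build("myaccount", "abc123==") produces a string in the same format as the values in the App.config.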

You can now open the Functions.cs code to see the code that does all the work:

 

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using System.Configuration;
using Microsoft.WindowsAzure.Storage.Blob;
using Microsoft.WindowsAzure.Storage;
using iTextSharp.text.pdf;
using GhostscriptSharp.Settings;
using GhostscriptSharp;
namespace PDFConversionWebJob
{
    public class Functions
    {
        // This function is triggered when a new message is written
        // to the Azure Queue named pdf-conversion.
        public static void ProcessQueueMessage([QueueTrigger("pdf-conversion")] string message, TextWriter log)
        {
            log.WriteLine($"Queue message: {message}");
            string uploadFileName = message;
            log.WriteLine($"Looking for file: {uploadFileName}");
            log.WriteLine($"Get AzureWebJobsStorage and cloudUploadedFilesContainerName Settings");
            string _storageConnectionString = ConfigurationManager.AppSettings["AzureWebJobsStorage"];
            string _cloudUploadedFilesContainerName = ConfigurationManager.AppSettings["cloudUploadedFilesContainerName"];            
            log.WriteLine($"Get CloudStorageAccount");
            var _cloudStorageAccount = CloudStorageAccount.Parse(_storageConnectionString);
            log.WriteLine($"Get CloudBlobClient");
            var _cloudBlobClient = _cloudStorageAccount.CreateCloudBlobClient();
            log.WriteLine($"Get ContainerReference");
            var _cloudUploadedFilesBlobContainer = _cloudBlobClient.GetContainerReference(_cloudUploadedFilesContainerName);            
            log.WriteLine($"Get sourceBlockBlob");
            CloudBlockBlob sourceBlockBlob = _cloudUploadedFilesBlobContainer.GetBlockBlobReference(uploadFileName);
            if (!sourceBlockBlob.Exists())
            {
                log.WriteLine($"sourceBlockBlob does not exist.");
                return;
            }
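            string extension = Path.GetExtension(uploadFileName).ToLower();
            if (extension != ".pdf")
            {
                // Only PDF files can be converted, so skip anything else
                log.WriteLine($"{uploadFileName} is not a .pdf file. Skipping.");
                return;
            }
            // Count the pages for the log; a failure here does not stop the conversion
            using (Stream stream = sourceBlockBlob.OpenRead())
            {
                try
                {
                    PdfReader pdfReader = new PdfReader(stream);
                    int pageCount = pdfReader.NumberOfPages;
                    pdfReader.Close();
                    log.WriteLine($"Number of PDF Pages in {uploadFileName} is {pageCount}");
                }
                catch (Exception ex)
                {
                    log.WriteLine($"Error trying to count PDF Pages {ex.Message}");
                }
            }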
            // _pdfConversionFilePath is the folder the WebJob is running from; the temporary files are written there
            string _pdfConversionFilePath = Environment.GetEnvironmentVariable("WEBJOBS_PATH");
            log.WriteLine($"_pdfConversionFilePath: {_pdfConversionFilePath}");
            string filePath = string.Format(@"{0}\{1}", _pdfConversionFilePath, uploadFileName);
            log.WriteLine($"filePath: {filePath}");
            sourceBlockBlob.DownloadToFile(filePath, FileMode.Create);
            // The TIFF file will have the same name as the PDF file but with a .tif extension
            string fileName = Path.GetFileNameWithoutExtension(filePath);
            log.WriteLine($"fileName: {fileName}");
            string tiffFilePath = string.Format(@"{0}\{1}.tif", _pdfConversionFilePath, fileName);
            log.WriteLine($"tiffFilePath: {tiffFilePath}");
            // Delete tiff file if already exists
            log.WriteLine($"File.Exists(tiffFilePath): {File.Exists(tiffFilePath)}");
            if (File.Exists(tiffFilePath))
            {
                File.Delete(tiffFilePath);
            }
            // Use Ghostscript to convert from pdf to TIFF
            GhostscriptPages gsPages = new GhostscriptPages { AllPages = true };
            System.Drawing.Size gsResolution = new System.Drawing.Size { Height = 600, Width = 600 };
            GhostscriptPageSize gsPageSize = new GhostscriptPageSize { Native = GhostscriptPageSizes.letter };
            GhostscriptSettings gsSettings = new GhostscriptSettings
            {
                Device = GhostscriptDevices.tifflzw,
                Page = gsPages,
                Resolution = gsResolution,
                Size = gsPageSize
            };
            log.WriteLine($"gsSettings: {gsSettings.ToString()}");
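            try
            {
                GhostscriptWrapper.GenerateOutput(filePath, tiffFilePath, gsSettings);
            }
            catch (Exception ex)
            {
                log.WriteLine($"Ghostscript error: {ex.Message}");
                // No TIFF was produced, so clean up the downloaded PDF and
                // rethrow so the invocation is recorded as failed
                File.Delete(filePath);
                throw;
            }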
            string TifffileName = Path.GetFileName(tiffFilePath);
            log.WriteLine($"TifffileName: {TifffileName}");
            log.WriteLine($"Get cloudConvertedFilesContainerName Settings");
            string _cloudConvertedFilesContainerName = ConfigurationManager.AppSettings["cloudConvertedFilesContainerName"];
            log.WriteLine($"GetContainerReference");
            var _cloudConvertedFilesContainer = _cloudBlobClient.GetContainerReference(_cloudConvertedFilesContainerName);
            log.WriteLine($"GetBlockBlobReference");
            CloudBlockBlob targetBlockBlob = _cloudConvertedFilesContainer.GetBlockBlobReference(TifffileName);
            log.WriteLine($"UploadFromStream: {tiffFilePath}");
            using (FileStream fileStream = File.OpenRead(tiffFilePath))
            {
                targetBlockBlob.UploadFromStream(fileStream);
            }
            log.WriteLine($"File.Delete Tiff file: {tiffFilePath}");
            File.Delete(tiffFilePath);
            log.WriteLine($"File.Delete PDF file: {filePath}");
            File.Delete(filePath);
            log.WriteLine($"Completed processing {uploadFileName}");
        }
    }
}

 

Note that we use an assembly called GhostscriptSharp. It is a .NET “wrapper” that allows us to invoke Ghostscript from .NET code.
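
The code above builds its temporary file paths with string.Format and uses the queue message directly as the file name. As an alternative, the paths could be derived with the System.IO.Path helpers, which also drops any folder part from the incoming name. This is only a sketch; the ConversionPaths class is hypothetical and is not part of the sample project.

```csharp
using System.IO;

// Hypothetical helper, not part of the sample project
public static class ConversionPaths
{
    // Local path for the downloaded PDF inside the working folder
    public static string GetLocalPdfPath(string workFolder, string blobName)
    {
        // Keep only the file name so a blob name such as "sub/report.pdf"
        // cannot point outside the working folder
        string safeName = Path.GetFileName(blobName);
        return Path.Combine(workFolder, safeName);
    }

    // Same folder and file name as the PDF, but with a .tif extension
    public static string GetTiffPath(string localPdfPath)
    {
        return Path.ChangeExtension(localPdfPath, ".tif");
    }
}
```

In Functions.cs these two calls would take the place of the two string.Format lines, with _pdfConversionFilePath passed as the working folder.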

 

Publish the Project

In Visual Studio, right-click on the Project node (not the Solution node), and select Publish as Azure WebJob (not Publish).

Select Microsoft Azure App Service.

Select an existing Web Application (App Service), or click the New button to create a new one.

Complete the Wizard and click the Publish button.

Normally the AzureWebJobsDashboard setting is set from entries in the Web.config. The sample project does not have a web application, just a WebJob, so there is no Web.config file and the settings are not properly set in Azure.

In order for the dashboard on the logging page to work correctly, you will need to log into the Azure portal, select the App Service that the WebJob was published to, select Application settings, and add the following keys:

  • AzureWebJobsDashboard
  • AzureWebJobsStorage

For the value of each key, enter the exact same value that you used for the Azure storage connection in the App.config file.
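
Azure App Service makes application settings available to the running process as environment variables, so a WebJob could check for these two keys when it starts. The following is a minimal sketch of such a check; the SettingsCheck class is hypothetical and is not part of the sample project.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the sample project
public static class SettingsCheck
{
    // Returns the keys for which the lookup gives no usable value
    public static List<string> FindMissing(Func<string, string> lookup, params string[] keys)
    {
        var missing = new List<string>();
        foreach (string key in keys)
        {
            if (string.IsNullOrWhiteSpace(lookup(key)))
            {
                missing.Add(key);
            }
        }
        return missing;
    }
}
```

At the start of the WebJob it could be called as SettingsCheck.FindMissing(Environment.GetEnvironmentVariable, "AzureWebJobsDashboard", "AzureWebJobsStorage"), writing any missing keys to the console so they show up in the WebJob log.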

Finally, select WebJobs, select the WebJob, then click Stop, wait a bit, click Refresh, then click Start so that the new settings take effect.

Links

GhostScript Downloads

GhostScriptSharp

Microsoft Azure Storage Explorer

Special Thanks

This would not be possible without code and assistance from Richard Waddell.

Download

The project is available at http://lightswitchhelpwebsite.com/Downloads.aspx

Microsoft Visual Studio is a registered trademark of Microsoft Corporation / LightSwitch is a registered trademark of Microsoft Corporation