Feb 26

Written by: Michael Washington
2/26/2017 5:17 PM

You can use Azure WebJobs to convert PDF files to multipage TIFF files.

This sample project allows you to put a .pdf file in Microsoft Azure Storage (this can be done programmatically or using the Microsoft Azure Storage Explorer).

Enter a message in Azure Queue Storage with the name of the file to process.
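
Both of the steps above can also be done from code. The following is a minimal sketch and is not part of the sample project: the PdfConversionClient class and its methods are hypothetical names, and it assumes the input container is pdf-conversion-input and the queue is pdf-conversion (the names created later in this article), using the same WindowsAzure.Storage library the WebJob itself references.

```csharp
using System.IO;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;
using Microsoft.WindowsAzure.Storage.Queue;

// Hypothetical helper (not in the sample project).
public static class PdfConversionClient
{
    // The WebJob looks up the blob by the text of the queue message,
    // so the message must be exactly the blob name.
    public static string BlobNameFor(string localPdfPath)
    {
        return Path.GetFileName(localPdfPath);
    }

    // Uploads the PDF to the input container, then queues its name.
    public static void Submit(string storageConnectionString, string localPdfPath)
    {
        CloudStorageAccount account = CloudStorageAccount.Parse(storageConnectionString);
        string blobName = BlobNameFor(localPdfPath);

        // Assumed container name: pdf-conversion-input.
        CloudBlobContainer container = account.CreateCloudBlobClient()
            .GetContainerReference("pdf-conversion-input");
        CloudBlockBlob blob = container.GetBlockBlobReference(blobName);
        using (FileStream stream = File.OpenRead(localPdfPath))
        {
            blob.UploadFromStream(stream);
        }

        // Assumed queue name: pdf-conversion (matches the QueueTrigger attribute).
        CloudQueue queue = account.CreateCloudQueueClient()
            .GetQueueReference("pdf-conversion");
        queue.AddMessage(new CloudQueueMessage(blobName));
    }
}
```

The blob is uploaded before the message is queued so that the file is already in place when the WebJob is triggered.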

You can then go into the Azure Portal, select the Azure Web App, select WebJobs, select the WebJob, then select Logs.

See that the WebJob was invoked and was successful.

We then click on the log entry to see the details.

Clicking Toggle Output on the detail page reveals the log entries created by the code.

When we look in the output folder in Azure Storage, we see the converted multipage TIFF file, which we can download and open.

Using The Sample Code

You can download the project from the downloads page on this site.

When you open it up in Visual Studio 2015 (or higher) you will see that the Ghostscript assemblies are missing (the license does not allow them to be distributed).

Download the 32-bit and 64-bit Ghostscript assemblies from this link (the version needed at runtime depends on the server the code is running on).

After you run the installer, the assemblies that you need will be at the following locations:

  • C:\Program Files (x86)\gs\gs9.20\bin\gsdll32.dll
  • C:\Program Files\gs\gs9.20\bin\gsdll64.dll

Copy and paste the files into the root directory of the Visual Studio project.

Note that they are both set to “Copy always”.

This is what allows them to be pushed to the Azure server, and to end up in the correct place, when you publish the project (shown in a later step).

Azure Storage Account

You will need an Azure Storage Account.

Follow the directions at this link to create one (if you don’t already have one).

After creating the account, you will need the Azure Storage Account Name and the AccountKey.

See this link for directions on how to retrieve them.

You will also need to use the Microsoft Azure Storage Explorer to create the following Blob Containers and Queues:

  • Blobs
    • pdf-conversion-input
    • pdf-conversion-output
  • Queues
    • pdf-conversion
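
If you prefer to create these from code instead of the Storage Explorer, the following is a minimal sketch. The PdfConversionStorageSetup class is a hypothetical name that is not in the sample project; the container and queue names are the ones listed above.

```csharp
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;
using Microsoft.WindowsAzure.Storage.Queue;

// Hypothetical helper (not in the sample project).
public static class PdfConversionStorageSetup
{
    public static readonly string[] ContainerNames =
    {
        "pdf-conversion-input",
        "pdf-conversion-output"
    };

    public const string QueueName = "pdf-conversion";

    // Creates the two blob containers and the queue if they are missing.
    // Calling this more than once is harmless.
    public static void EnsureCreated(string storageConnectionString)
    {
        CloudStorageAccount account = CloudStorageAccount.Parse(storageConnectionString);

        CloudBlobClient blobClient = account.CreateCloudBlobClient();
        foreach (string name in ContainerNames)
        {
            blobClient.GetContainerReference(name).CreateIfNotExists();
        }

        CloudQueueClient queueClient = account.CreateCloudQueueClient();
        queueClient.GetQueueReference(QueueName).CreateIfNotExists();
    }
}
```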

Update The Settings

Now return to the Solution in Visual Studio, open the App.config, and update the Azure Storage Account Name and the AccountKey.

(These values appear in three places. Use the same values in all three.)
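
As a rough guide, the relevant part of App.config might look like the fragment below. This layout is an assumption, so check it against the actual file in the download. The key names come from the Functions.cs code, which reads AzureWebJobsStorage, cloudUploadedFilesContainerName, and cloudConvertedFilesContainerName from appSettings. The two connectionStrings entries are the names the WebJobs SDK conventionally looks for. NAME and KEY are placeholders for your Storage Account Name and AccountKey.

```xml
<configuration>
  <connectionStrings>
    <add name="AzureWebJobsDashboard"
         connectionString="DefaultEndpointsProtocol=https;AccountName=NAME;AccountKey=KEY" />
    <add name="AzureWebJobsStorage"
         connectionString="DefaultEndpointsProtocol=https;AccountName=NAME;AccountKey=KEY" />
  </connectionStrings>
  <appSettings>
    <add key="AzureWebJobsStorage"
         value="DefaultEndpointsProtocol=https;AccountName=NAME;AccountKey=KEY" />
    <!-- Assumed values: the containers created in the previous section. -->
    <add key="cloudUploadedFilesContainerName" value="pdf-conversion-input" />
    <add key="cloudConvertedFilesContainerName" value="pdf-conversion-output" />
  </appSettings>
</configuration>
```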

You can now open the Functions.cs code to see the code that does all the work:

 

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using System.Configuration;
using Microsoft.WindowsAzure.Storage.Blob;
using Microsoft.WindowsAzure.Storage;
using iTextSharp.text.pdf;
using GhostscriptSharp.Settings;
using GhostscriptSharp;
namespace PDFConversionWebJob
{
    public class Functions
    {
        // This function is triggered when a new message is written
        // to the Azure Queue named pdf-conversion.
        public static void ProcessQueueMessage([QueueTrigger("pdf-conversion")] string message, TextWriter log)
        {
            log.WriteLine($"Queue message: {message}");
            string uploadFileName = message;
            log.WriteLine($"Looking for file: {uploadFileName}");
            log.WriteLine($"Get AzureWebJobsStorage and cloudUploadedFilesContainerName Settings");
            string _storageConnectionString = ConfigurationManager.AppSettings["AzureWebJobsStorage"];
            string _cloudUploadedFilesContainerName = ConfigurationManager.AppSettings["cloudUploadedFilesContainerName"];            
            log.WriteLine($"Get CloudStorageAccount");
            var _cloudStorageAccount = CloudStorageAccount.Parse(_storageConnectionString);
            log.WriteLine($"Get CloudBlobClient");
            var _cloudBlobClient = _cloudStorageAccount.CreateCloudBlobClient();
            log.WriteLine($"Get ContainerReference");
            var _cloudUploadedFilesBlobContainer = _cloudBlobClient.GetContainerReference(_cloudUploadedFilesContainerName);            
            log.WriteLine($"Get sourceBlockBlob");
            CloudBlockBlob sourceBlockBlob = _cloudUploadedFilesBlobContainer.GetBlockBlobReference(uploadFileName);
            if (!sourceBlockBlob.Exists())
            {
                log.WriteLine($"sourceBlockBlob does not exist.");
                return;
            }
            string extension = Path.GetExtension(uploadFileName).ToLower();
            if (extension == ".pdf")
            {
                int pageCount = 0;
                using (Stream stream = sourceBlockBlob.OpenRead())
                {
                    try
                    {
                        PdfReader pdfReader = new PdfReader(stream);
                        pageCount = pdfReader.NumberOfPages;
                        log.WriteLine($"Number of PDF Pages in {uploadFileName} is  {pageCount}");
                    }
                    catch (Exception ex)
                    {
                        log.WriteLine($"Error trying to count PDF Pages {ex.Message}");
                    }
                }
            }
            // _pdfConversionFilePath is the path the temp folder is in
            string _pdfConversionFilePath = Environment.GetEnvironmentVariable("WEBJOBS_PATH");
            log.WriteLine($"_pdfConversionFilePath: {_pdfConversionFilePath}");
            string filePath = string.Format(@"{0}\{1}", _pdfConversionFilePath, uploadFileName);
            log.WriteLine($"filePath: {filePath}");
            sourceBlockBlob.DownloadToFile(filePath, FileMode.Create);
            // The TIFF file will have the same name as the PDF file but with a tiff extension
            string fileName = Path.GetFileNameWithoutExtension(filePath);
            log.WriteLine($"fileName: {fileName}");
            string tiffFilePath = string.Format(@"{0}\{1}.tif", _pdfConversionFilePath, fileName);
            log.WriteLine($"tiffFilePath: {tiffFilePath}");
            // Delete tiff file if already exists
            log.WriteLine($"File.Exists(tiffFilePath): {File.Exists(tiffFilePath)}");
            if (File.Exists(tiffFilePath))
            {
                File.Delete(tiffFilePath);
            }
            // Use Ghostscript to convert from pdf to TIFF
            GhostscriptPages gsPages = new GhostscriptPages { AllPages = true };
            System.Drawing.Size gsResolution = new System.Drawing.Size { Height = 600, Width = 600 };
            GhostscriptPageSize gsPageSize = new GhostscriptPageSize { Native = GhostscriptPageSizes.letter };
            GhostscriptSettings gsSettings = new GhostscriptSettings
            {
                Device = GhostscriptDevices.tifflzw,
                Page = gsPages,
                Resolution = gsResolution,
                Size = gsPageSize
            };
            log.WriteLine($"gsSettings: {gsSettings.ToString()}");
            try
            {
                GhostscriptWrapper.GenerateOutput(filePath, tiffFilePath, gsSettings);
            }
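            catch (Exception ex)
            {
                log.WriteLine($"Ghostscript error: {ex.Message}");
                // Rethrow so the invocation is marked as failed instead of
                // going on to upload a TIFF file that was never created.
                throw;
            }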
            string TifffileName = Path.GetFileName(tiffFilePath);
            log.WriteLine($"TifffileName: {TifffileName}");
            log.WriteLine($"Get cloudConvertedFilesContainerName Settings");
            string _cloudConvertedFilesContainerName = ConfigurationManager.AppSettings["cloudConvertedFilesContainerName"];
            log.WriteLine($"GetContainerReference");
            var _cloudConvertedFilesContainer = _cloudBlobClient.GetContainerReference(_cloudConvertedFilesContainerName);
            log.WriteLine($"GetBlockBlobReference");
            CloudBlockBlob targetBlockBlob = _cloudConvertedFilesContainer.GetBlockBlobReference(TifffileName);
            log.WriteLine($"UploadFromStream: {tiffFilePath}");
            using (FileStream fileStream = File.OpenRead(tiffFilePath))
            {
                targetBlockBlob.UploadFromStream(fileStream);
            }
            log.WriteLine($"File.Delete Tiff file: {tiffFilePath}");
            File.Delete(tiffFilePath);
            log.WriteLine($"File.Delete PDF file: {filePath}");
            File.Delete(filePath);
            log.WriteLine($"Completed processing {uploadFileName}");
        }
    }
}

 

Note that we use an assembly called GhostscriptSharp. It is a .NET “wrapper” that allows us to invoke Ghostscript from .NET code.

 

Publish the Project

In Visual Studio, right-click on the Project node (not the Solution node), and select Publish as Azure WebJob (not Publish).

Select Microsoft Azure App Service.

Select an existing Web Application (App Service), or click the New button to create a new one.

Complete the Wizard and click the Publish button.

Normally the AzureWebJobsDashboard setting is set from entries in the Web.config. However, the sample project contains only a WebJob and no web application, so there is no Web.config file and the settings are not properly set in Azure.

In order for the dashboard on the logging page to work correctly, you will need to log into the Azure portal, select the App Service that the WebJob was published to, select Application settings, and add the following keys:

  • AzureWebJobsDashboard
  • AzureWebJobsStorage

For the value of each key, enter the exact same value that you used for the Azure storage connection in the App.config file.

Finally, for the new settings to take effect, select WebJobs, select the WebJob, click Stop, wait a bit, click Refresh, then click Start.

Links

GhostScript Downloads

GhostScriptSharp

Microsoft Azure Storage Explorer

Special Thanks

This would not be possible without code and assistance from Richard Waddell.

Download

The project is available at http://lightswitchhelpwebsite.com/Downloads.aspx

5 comment(s) so far...


Re: Converting PDFs to Multipage Tiff files Using Azure WebJobs

When I attempt to publish this to Azure using VS 2015, I get the following error which appears to be common and I've double checked all the steps to ensure I'm following the guide. I've been at this for hours, and attempted publishing well over 10 times and rebuilt everything every time to be safe.

Error message (I chopped off the first part of the path for privacy):

"....PDFConversionWebJob.csproj(0,0): Error MSB4057: The target "MSDeployPublish" does not exist in the project."

I've seen several fixes, none of which work for me. Any insight?

(Here's the most popular fix, which doesn't work for me either: https://webcache.googleusercontent.com/search?q=cache:m0zoPHl4OZIJ:https://stackoverflow.com/questions/23634039/the-target-msdeploypublish-does-not-exist-in-the-project+&cd=1&hl=en&ct=clnk&gl=us)

By Mike Rogers Compton on   6/24/2017 9:39 PM

Re: Converting PDFs to Multipage Tiff files Using Azure WebJobs

@Mike Rogers Compton - I'm sorry but I have never had that error.

By Michael Washington on   6/24/2017 9:40 PM

Re: Converting PDFs to Multipage Tiff files Using Azure WebJobs

Understood, thank you anyways. If I find the solution I'll let you know so you can share. Have a great week!

By Mike Rogers Compton on   6/26/2017 5:10 AM

Re: Converting PDFs to Multipage Tiff files Using Azure WebJobs

I was able to manually upload the webjob, however it appears the syntax in the config file connection string is upsetting Azure :) .

Have you had a similar error, or do you know if I leave or remove the brackets or asterisks in the syntax?

This is what was in the log:
*************************************************
Continuous WebJob Details TEST8


Make sure that you are setting a connection string named AzureWebJobsDashboard in your Microsoft Azure Website configuration by using the following format DefaultEndpointsProtocol=https;AccountName=NAME;AccountKey=KEY pointing to the Microsoft Azure Storage account where the Microsoft Azure WebJobs Runtime logs are stored.

Please visit the article about configuring connection strings for more information on how you can configure connection strings in your Microsoft Azure Website.

The configuration is not properly set for the Microsoft Azure WebJobs Dashboard.
In your Microsoft Azure Website configuration you must set a connection string named AzureWebJobsDashboard by using the following format DefaultEndpointsProtocol=https;AccountName=NAME;AccountKey=KEY pointing to the Microsoft Azure Storage account where the Microsoft Azure WebJobs Runtime logs are stored.

Please visit the article about configuring connection strings for more information on how you can configure connection strings in your Microsoft Azure Website.

By Mike Rogers Compton on   6/27/2017 7:36 AM

Re: Converting PDFs to Multipage Tiff files Using Azure WebJobs

@Mike Rogers Compton - You will want to post to the StackOverflow group (at: https://stackoverflow.com/questions/tagged/azure-functions) with your error because it is monitored by the Azure Functions team.

By Michael Washington on   6/27/2017 7:38 AM
