Written by:
Michael Washington
2/26/2017 5:17 PM
You can use Azure WebJobs to convert PDF files to multi-page TIFF files.
With this sample project, you put a .pdf file in Microsoft Azure Storage (this can be done programmatically or using the Microsoft Azure Storage Explorer).
You then add a message to Azure Queue Storage containing the name of the file to process.
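For the programmatic route, the two steps above could be sketched roughly as follows. This is a minimal sketch rather than part of the sample project. It assumes the same WindowsAzure.Storage package that Functions.cs uses, that the input container is named pdf-conversion-input, and that the connection string is held in an environment variable called STORAGE_CONNECTION_STRING (a hypothetical name chosen for this example).

```csharp
using System;
using System.IO;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;
using Microsoft.WindowsAzure.Storage.Queue;

public static class SubmitPdf
{
    // The queue message is just the blob name, so drop any local directory part.
    public static string ToBlobName(string localPath)
    {
        return Path.GetFileName(localPath);
    }

    public static void Main(string[] args)
    {
        if (args.Length != 1)
        {
            Console.Error.WriteLine("Usage: SubmitPdf <path-to-pdf>");
            return;
        }

        string localPath = args[0];
        // Assumption: the connection string lives in this (hypothetical) environment variable.
        string connectionString = Environment.GetEnvironmentVariable("STORAGE_CONNECTION_STRING");
        CloudStorageAccount account = CloudStorageAccount.Parse(connectionString);

        // Step 1: upload the PDF to the input container (assumed name).
        string blobName = ToBlobName(localPath);
        CloudBlobContainer container =
            account.CreateCloudBlobClient().GetContainerReference("pdf-conversion-input");
        CloudBlockBlob blob = container.GetBlockBlobReference(blobName);
        using (FileStream stream = File.OpenRead(localPath))
        {
            blob.UploadFromStream(stream);
        }

        // Step 2: enqueue the blob name on the queue the WebJob listens to.
        CloudQueue queue = account.CreateCloudQueueClient().GetQueueReference("pdf-conversion");
        queue.AddMessage(new CloudQueueMessage(blobName));

        Console.WriteLine("Uploaded and queued {0}", blobName);
    }
}
```

The blob is uploaded before the message is queued so that the WebJob finds the file when the message arrives.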
You can then go into the Azure Portal, select the Azure Web App, select WebJobs, select the WebJob, then select Logs.
There you can see that the WebJob was invoked and that it succeeded.
Click the log entry to see the details.
Clicking Toggle Output on the detail page reveals the log entries created by the code.
The output container in Azure Storage now holds the converted multi-page TIFF file, which you can download and open.
Using The Sample Code
You can download the project from the downloads page on this site.
When you open it up in Visual Studio 2015 (or higher) you will see that the Ghostscript assemblies are missing (the license does not allow them to be distributed).
Download the Ghostscript 32-bit and 64-bit assemblies from the Ghostscript downloads page (see Links below). Both are needed because the version used at runtime depends on the server the code is running on.
After you run the installer, the assemblies that you need will be at the following locations:
- C:\Program Files (x86)\gs\gs9.20\bin\gsdll32.dll
- C:\Program Files\gs\gs9.20\bin\gsdll64.dll
Copy and paste both files into the root directory of the Visual Studio project.
Note that the Copy to Output Directory property of each file is set to “Copy always”.
This is what allows them to be pushed to the Azure server, and to end up in the correct place, when you publish the project (shown in a later step).
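If you want to confirm which of the two DLLs a given server will need, a small diagnostic like the one below can help. It is only a sketch: the article does not show how GhostscriptSharp itself chooses between the DLLs, so the helper name NativeDllName and the check against the application base directory are assumptions made for illustration.

```csharp
using System;
using System.IO;

public static class GhostscriptProbe
{
    // Returns the Ghostscript DLL name that matches the bitness of the process.
    public static string NativeDllName(bool is64BitProcess)
    {
        return is64BitProcess ? "gsdll64.dll" : "gsdll32.dll";
    }

    public static void Main()
    {
        string dllName = NativeDllName(Environment.Is64BitProcess);

        // With "Copy always", both DLLs should sit next to the WebJob executable.
        string dllPath = Path.Combine(AppDomain.CurrentDomain.BaseDirectory, dllName);

        Console.WriteLine("Process is {0}-bit; expecting {1}",
            Environment.Is64BitProcess ? 64 : 32, dllPath);
        Console.WriteLine("Found: {0}", File.Exists(dllPath));
    }
}
```

Writing the same two lines to the WebJob log at startup would show at a glance whether the publish step copied the right file.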
Azure Storage Account
You will need an Azure Storage Account.
Follow the directions at this link to create one (if you don’t already have one).
After creating the account, you will need the Azure Storage Account Name and the AccountKey.
See this link for directions on how to retrieve them.
You will also need to use the Microsoft Azure Storage Explorer to create the following Blob Containers and Queues:
- Blobs
  - pdf-conversion-input
  - pdf-conversion-output
- Queues
  - pdf-conversion (the queue name used by the QueueTrigger attribute in Functions.cs)
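As an alternative to the Storage Explorer, the containers and the queue could also be created from code. The following is a minimal sketch, assuming the WindowsAzure.Storage package used by the sample and a connection string supplied through a hypothetical STORAGE_CONNECTION_STRING environment variable. The queue name pdf-conversion is taken from the QueueTrigger attribute in Functions.cs.

```csharp
using System;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;
using Microsoft.WindowsAzure.Storage.Queue;

public static class StorageSetup
{
    // Blob containers named in the article.
    public static readonly string[] ContainerNames =
    {
        "pdf-conversion-input",
        "pdf-conversion-output"
    };

    // Queue name taken from the QueueTrigger attribute in Functions.cs.
    public const string QueueName = "pdf-conversion";

    public static void Main()
    {
        // Assumption: the connection string lives in this (hypothetical) environment variable.
        string connectionString = Environment.GetEnvironmentVariable("STORAGE_CONNECTION_STRING");
        CloudStorageAccount account = CloudStorageAccount.Parse(connectionString);

        CloudBlobClient blobClient = account.CreateCloudBlobClient();
        foreach (string name in ContainerNames)
        {
            // CreateIfNotExists returns false when the container was already there.
            bool created = blobClient.GetContainerReference(name).CreateIfNotExists();
            Console.WriteLine("Container {0}: {1}", name, created ? "created" : "already exists");
        }

        CloudQueue queue = account.CreateCloudQueueClient().GetQueueReference(QueueName);
        bool queueCreated = queue.CreateIfNotExists();
        Console.WriteLine("Queue {0}: {1}", QueueName, queueCreated ? "created" : "already exists");
    }
}
```

Because CreateIfNotExists does nothing when the resource is already present, the program can be run repeatedly without harm.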
Update The Settings
Now return to the Solution in Visual Studio, open the App.config, and update the Azure Storage Account Name and the AccountKey.
(These values appear in three places; use the same values in all three.)
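CloudStorageAccount.Parse, which Functions.cs calls, expects a storage connection string, and the usual form is DefaultEndpointsProtocol=https;AccountName=NAME;AccountKey=KEY. The helper below is a small sketch (BuildConnectionString is a hypothetical name, not part of the sample) that assembles such a string from the two values, which can help avoid typos when the same string has to be pasted into several settings.

```csharp
using System;

public static class StorageSettings
{
    // Builds a storage connection string from the account name and key.
    public static string BuildConnectionString(string accountName, string accountKey)
    {
        if (string.IsNullOrWhiteSpace(accountName))
            throw new ArgumentException("Account name is required.", "accountName");
        if (string.IsNullOrWhiteSpace(accountKey))
            throw new ArgumentException("Account key is required.", "accountKey");

        // Trim stray whitespace that often sneaks in when copying from the portal.
        return string.Format(
            "DefaultEndpointsProtocol=https;AccountName={0};AccountKey={1}",
            accountName.Trim(),
            accountKey.Trim());
    }

    public static void Main()
    {
        // NAME and KEY are placeholders; substitute your own values.
        Console.WriteLine(BuildConnectionString("NAME", "KEY"));
        // prints "DefaultEndpointsProtocol=https;AccountName=NAME;AccountKey=KEY"
    }
}
```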
You can now open the Functions.cs code to see the code that does all the work:
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using System.Configuration;
using Microsoft.WindowsAzure.Storage.Blob;
using Microsoft.WindowsAzure.Storage;
using iTextSharp.text.pdf;
using GhostscriptSharp.Settings;
using GhostscriptSharp;

namespace PDFConversionWebJob
{
    public class Functions
    {
        // This function is triggered when a new message is written
        // to the Azure Queue named pdf-conversion.
        public static void ProcessQueueMessage([QueueTrigger("pdf-conversion")] string message, TextWriter log)
        {
            log.WriteLine($"Queue message: {message}");
            // The message body is the name of the blob to convert
            string uploadFileName = message;
            log.WriteLine($"Looking for file: {uploadFileName}");

            log.WriteLine($"Get AzureWebJobsStorage and cloudUploadedFilesContainerName Settings");
            string _storageConnectionString = ConfigurationManager.AppSettings["AzureWebJobsStorage"];
            string _cloudUploadedFilesContainerName = ConfigurationManager.AppSettings["cloudUploadedFilesContainerName"];

            log.WriteLine($"Get CloudStorageAccount");
            var _cloudStorageAccount = CloudStorageAccount.Parse(_storageConnectionString);
            log.WriteLine($"Get CloudBlobClient");
            var _cloudBlobClient = _cloudStorageAccount.CreateCloudBlobClient();
            log.WriteLine($"Get ContainerReference");
            var _cloudUploadedFilesBlobContainer = _cloudBlobClient.GetContainerReference(_cloudUploadedFilesContainerName);
            log.WriteLine($"Get sourceBlockBlob");
            CloudBlockBlob sourceBlockBlob = _cloudUploadedFilesBlobContainer.GetBlockBlobReference(uploadFileName);

            if (!sourceBlockBlob.Exists())
            {
                log.WriteLine($"sourceBlockBlob does not exist.");
                return;
            }

            // For PDF files, log the page count (informational only)
            string extension = Path.GetExtension(uploadFileName).ToLower();
            if (extension == ".pdf")
            {
                int pageCount = 0;
                using (Stream stream = sourceBlockBlob.OpenRead())
                {
                    try
                    {
                        PdfReader pdfReader = new PdfReader(stream);
                        pageCount = pdfReader.NumberOfPages;
                        pdfReader.Close();
                        log.WriteLine($"Number of PDF Pages in {uploadFileName} is {pageCount}");
                    }
                    catch (Exception ex)
                    {
                        log.WriteLine($"Error trying to count PDF Pages {ex.Message}");
                    }
                }
            }

            // _pdfConversionFilePath is the folder the WebJob runs from; it is used for the temp files
            string _pdfConversionFilePath = Environment.GetEnvironmentVariable("WEBJOBS_PATH");
            log.WriteLine($"_pdfConversionFilePath: {_pdfConversionFilePath}");
            string filePath = string.Format(@"{0}\{1}", _pdfConversionFilePath, uploadFileName);
            log.WriteLine($"filePath: {filePath}");
            sourceBlockBlob.DownloadToFile(filePath, FileMode.Create);

            // The TIFF file will have the same name as the PDF file but with a .tif extension
            string fileName = Path.GetFileNameWithoutExtension(filePath);
            log.WriteLine($"fileName: {fileName}");
            string tiffFilePath = string.Format(@"{0}\{1}.tif", _pdfConversionFilePath, fileName);
            log.WriteLine($"tiffFilePath: {tiffFilePath}");

            // Delete the TIFF file if it already exists
            log.WriteLine($"File.Exists(tiffFilePath): {File.Exists(tiffFilePath)}");
            if (File.Exists(tiffFilePath))
            {
                File.Delete(tiffFilePath);
            }

            // Use Ghostscript to convert from PDF to TIFF
            GhostscriptPages gsPages = new GhostscriptPages { AllPages = true };
            System.Drawing.Size gsResolution = new System.Drawing.Size { Height = 600, Width = 600 };
            GhostscriptPageSize gsPageSize = new GhostscriptPageSize { Native = GhostscriptPageSizes.letter };
            GhostscriptSettings gsSettings = new GhostscriptSettings
            {
                Device = GhostscriptDevices.tifflzw,
                Page = gsPages,
                Resolution = gsResolution,
                Size = gsPageSize
            };
            log.WriteLine($"gsSettings: {gsSettings.ToString()}");

            try
            {
                GhostscriptWrapper.GenerateOutput(filePath, tiffFilePath, gsSettings);
            }
            catch (Exception ex)
            {
                log.WriteLine($"Ghostscript error: {ex.Message}");
            }

            // If the conversion failed there is no TIFF to upload, so clean up and stop
            if (!File.Exists(tiffFilePath))
            {
                log.WriteLine($"No TIFF file was produced for {uploadFileName}.");
                File.Delete(filePath);
                return;
            }

            string tiffFileName = Path.GetFileName(tiffFilePath);
            log.WriteLine($"tiffFileName: {tiffFileName}");

            // Upload the TIFF file to the output container
            log.WriteLine($"Get cloudConvertedFilesContainerName Settings");
            string _cloudConvertedFilesContainerName = ConfigurationManager.AppSettings["cloudConvertedFilesContainerName"];
            log.WriteLine($"GetContainerReference");
            var _cloudConvertedFilesContainer = _cloudBlobClient.GetContainerReference(_cloudConvertedFilesContainerName);
            log.WriteLine($"GetBlockBlobReference");
            CloudBlockBlob targetBlockBlob = _cloudConvertedFilesContainer.GetBlockBlobReference(tiffFileName);
            log.WriteLine($"UploadFromStream: {tiffFilePath}");
            using (FileStream fileStream = File.OpenRead(tiffFilePath))
            {
                targetBlockBlob.UploadFromStream(fileStream);
            }

            // Remove the temp files
            log.WriteLine($"File.Delete Tiff file: {tiffFilePath}");
            File.Delete(tiffFilePath);
            log.WriteLine($"File.Delete PDF file: {filePath}");
            File.Delete(filePath);

            log.WriteLine($"Completed processing {uploadFileName}");
        }
    }
}
Note that we use an assembly called GhostscriptSharp. It is a .NET “wrapper” that allows us to invoke Ghostscript from .NET code.
Publish the Project
In Visual Studio, right-click on the Project node (not the Solution node), and select Publish as Azure WebJob (not Publish).
Select Microsoft Azure App Service.
Select an existing Web Application (App Service), or click the New button to create a new one.
Complete the Wizard and click the Publish button.
Normally the AzureWebJobsDashboard setting is set from entries in the Web.config. The sample project has no web application, just a WebJob, so there is no Web.config file and the settings are not properly set in Azure.
For the dashboard on the logging page to work correctly, you will need to log into the Azure portal, select the App Service that the WebJob was published to, select Application settings, and add the following keys:
- AzureWebJobsDashboard
- AzureWebJobsStorage
For each value, enter the exact same value that you used for the Azure storage connection in the App.config file.
Finally, for the new settings to take effect, select WebJobs, select the WebJob, click Stop, wait a bit, click Refresh, then click Start.
Links
GhostScript Downloads
GhostScriptSharp
Microsoft Azure Storage Explorer
Special Thanks
This would not be possible without code and assistance from Richard Waddell.
Download
The project is available at http://lightswitchhelpwebsite.com/Downloads.aspx
5 comment(s) so far...
When I attempt to publish this to Azure using VS 2015, I get the following error which appears to be common and I've double checked all the steps to ensure I'm following the guide. I've been at this for hours, and attempted publishing well over 10 times and rebuilt everything every time to be safe.
Error message (I chopped off the first part of the path for privacy):
"....PDFConversionWebJob.csproj(0,0): Error MSB4057: The target "MSDeployPublish" does not exist in the project."
I've seen several fixes, none of which work for me. Any insight?
(Here's the most popular fix, which doesn't work for me either: https://webcache.googleusercontent.com/search?q=cache:m0zoPHl4OZIJ:https://stackoverflow.com/questions/23634039/the-target-msdeploypublish-does-not-exist-in-the-project+&cd=1&hl=en&ct=clnk&gl=us)
By Mike Rogers Compton on 6/24/2017 9:39 PM
@Mike Rogers Compton - I'm sorry but I have never had that error.
By Michael Washington on 6/24/2017 9:40 PM
Understood, thank you anyway. If I find the solution I'll let you know so you can share. Have a great week!
By Mike Rogers Compton on 6/26/2017 5:10 AM
I was able to manually upload the webjob, however it appears the syntax in the config file connection string is upsetting Azure :) .
Have you had a similar error, or do you know if I leave or remove the brackets or asterisks in the syntax? This is what was in the log: ************************************************* Continuous WebJob Details TEST8
Make sure that you are setting a connection string named AzureWebJobsDashboard in your Microsoft Azure Website configuration by using the following format DefaultEndpointsProtocol=https;AccountName=NAME;AccountKey=KEY pointing to the Microsoft Azure Storage account where the Microsoft Azure WebJobs Runtime logs are stored.
Please visit the article about configuring connection strings for more information on how you can configure connection strings in your Microsoft Azure Website. The configuration is not properly set for the Microsoft Azure WebJobs Dashboard. In your Microsoft Azure Website configuration you must set a connection string named AzureWebJobsDashboard by using the following format DefaultEndpointsProtocol=https;AccountName=NAME;AccountKey=KEY pointing to the Microsoft Azure Storage account where the Microsoft Azure WebJobs Runtime logs are stored.
Please visit the article about configuring connection strings for more information on how you can configure connection strings in your Microsoft Azure Website.
By Mike Rogers Compton on 6/27/2017 7:36 AM
@Mike Rogers Compton - You will want to post your error to the StackOverflow group (at: https://stackoverflow.com/questions/tagged/azure-functions) because it is monitored by the Azure Functions team.
By Michael Washington on 6/27/2017 7:38 AM