Written by:
Michael Washington
3/4/2017 2:09 PM
Notice: A few times in testing, I have encountered intermittent errors. This Stack Overflow post details the issues I found.
You can use Azure Functions to convert PDF files to PNG files.
You can put a .pdf file in Microsoft Azure Storage (this can be done programmatically or using the Microsoft Azure Storage Explorer), then enter a message in Azure Queue Storage with the name of the file to process.
The PDF file will be split into individual .png files.
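For the programmatic route, a minimal sketch follows. It assumes the same classic WindowsAzure.Storage client library that the function itself references, running on the full .NET Framework (where the synchronous methods are available), and it uses the container and queue names created later in this walkthrough (pdf-conversion-input and myqueue-items). The PdfSubmitter class and its method names are hypothetical and are not part of the original article.

```csharp
using System.IO;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;
using Microsoft.WindowsAzure.Storage.Queue;

// Hypothetical helper: uploads a PDF and queues its name for the function.
public static class PdfSubmitter
{
    // The blob name is the file name only; the queue message must match it exactly.
    public static string BlobNameFor(string pdfPath)
    {
        return Path.GetFileName(pdfPath);
    }

    public static void Submit(string connectionString, string pdfPath)
    {
        CloudStorageAccount account = CloudStorageAccount.Parse(connectionString);

        // Upload the PDF to the input container that the function reads from.
        CloudBlobContainer container = account.CreateCloudBlobClient()
            .GetContainerReference("pdf-conversion-input");
        container.CreateIfNotExists();
        string blobName = BlobNameFor(pdfPath);
        CloudBlockBlob blob = container.GetBlockBlobReference(blobName);
        using (FileStream stream = File.OpenRead(pdfPath))
        {
            blob.UploadFromStream(stream);
        }

        // Queue a message whose text is the blob name; this triggers the function.
        CloudQueue queue = account.CreateCloudQueueClient()
            .GetQueueReference("myqueue-items");
        queue.CreateIfNotExists();
        queue.AddMessage(new CloudQueueMessage(blobName));
    }
}
```

Calling PdfSubmitter.Submit with your storage connection string and a local PDF path should have the same effect as the manual Storage Explorer steps described later.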
Create The Azure Function
Go to:
https://azure.microsoft.com/en-us/services/functions/
Log into your Azure account (or create a new one).
Select a Subscription, enter a Name for your function, select a Region close to you, and click the Create + get started button.
Click the New Function button.
Select the C# Language, the Data Processing Scenario, then the QueueTrigger-CSharp template.
Enter a Name for the function.
Leave the queue name at its default (myqueue-items).
Select an existing Storage account connection or create a new one.
Click the Create button.
The function edit screen will show.
Click the expand buttons on the Logs and File View sections to show them.
Click the Add button to add a new file.
Name the file project.json and press the Enter key.
Enter the following code and press the Save button:
{
  "frameworks": {
    "net46": {
      "dependencies": {
        "Ghostscript.NET": "1.2.1"
      }
    }
  }
}
This instructs Azure Functions to load the Ghostscript.NET NuGet package.
The Ghostscript native library (gsdll32.dll) is still required; we will add it in a later step.
Click on the run.csx file, enter the following code, and click the Save button:
#r "Microsoft.WindowsAzure.Storage"
#r "System.Drawing"
#r "System.Web"
#r "System.Configuration"

using System;
using System.Configuration;
using System.Drawing;
using System.Drawing.Imaging;
using System.IO; // Path, File, FileMode, FileStream
using System.Net;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Auth;
using Microsoft.WindowsAzure.Storage.Blob;
using Ghostscript.NET;
using Ghostscript.NET.Rasterizer;

public static void Run(string myQueueItem, TraceWriter log)
{
    log.Info($"C# Queue trigger function processed. Value passed: {myQueueItem}");
    string uploadFileName = myQueueItem;
    log.Info($"Get AzureWebJobsStorage and cloudUploadedFilesContainerName Settings");
    string _storageConnectionString = ConfigurationManager.AppSettings["AzureWebJobsStorage"];
    string _cloudUploadedFilesContainerName = "pdf-conversion-input";
    log.Info($"Get CloudStorageAccount");
    var _cloudStorageAccount = CloudStorageAccount.Parse(_storageConnectionString);
    log.Info($"Get CloudBlobClient");
    var _cloudBlobClient = _cloudStorageAccount.CreateCloudBlobClient();
    log.Info($"Get ContainerReference");
    var _cloudUploadedFilesBlobContainer =
        _cloudBlobClient.GetContainerReference(_cloudUploadedFilesContainerName);
    log.Info($"Get sourceBlockBlob");
    CloudBlockBlob sourceBlockBlob =
        _cloudUploadedFilesBlobContainer.GetBlockBlobReference(uploadFileName);
    if (!sourceBlockBlob.Exists())
    {
        log.Info($"sourceBlockBlob does not exist.");
        return;
    }
    // _pdfConversionFilePath is the path the temp folder is in
    string _pdfConversionFilePath = @"D:\home\data\Functions\sampledata";
    log.Info($"_pdfConversionFilePath: {_pdfConversionFilePath}");
    string filePath = string.Format(@"{0}\{1}", _pdfConversionFilePath, uploadFileName);
    log.Info($"filePath: {filePath}");
    sourceBlockBlob.DownloadToFile(filePath, FileMode.Create);
    // The png file will have the same name as the PDF file but with a png extension
    string fileName = Path.GetFileNameWithoutExtension(filePath);
    log.Info($"fileName: {fileName}");
    string pngFilePath = string.Format(@"{0}\{1}.png", _pdfConversionFilePath, fileName);
    log.Info($"pngFilePath: {pngFilePath}");
    // Delete png file if already exists
    log.Info($"File.Exists(pngFilePath): {File.Exists(pngFilePath)}");
    if (File.Exists(pngFilePath))
    {
        File.Delete(pngFilePath);
    }
    // Use Ghostscript to convert from pdf to png
    log.Info($"Get cloudConvertedFilesContainerName Settings");
    string _cloudConvertedFilesContainerName = "pdf-conversion-output";
    log.Info($"GetContainerReference");
    var _cloudConvertedFilesContainer =
        _cloudBlobClient.GetContainerReference(_cloudConvertedFilesContainerName);
    int desired_x_dpi = 96;
    int desired_y_dpi = 96;
    string inputPdfPath = filePath;
    string outputPath = Path.GetDirectoryName(pngFilePath);
    GhostscriptVersionInfo gvi =
        new GhostscriptVersionInfo(@"D:\home\data\Functions\packages\nuget\ghostscript.net\1.2.1\lib\net40\gsdll32.dll");
    log.Info($"using (GhostscriptRasterizer _rasterizer = new GhostscriptRasterizer())");
    using (GhostscriptRasterizer _rasterizer = new GhostscriptRasterizer())
    {
        log.Info($"_rasterizer.Open: {inputPdfPath}");
        _rasterizer.Open(inputPdfPath, gvi, true);
        // Ghostscript page numbers start at 1
        for (int pageNumber = 1; pageNumber <= _rasterizer.PageCount; pageNumber++)
        {
            string pageFilePath = Path.Combine(outputPath, fileName + "-Page-" + pageNumber.ToString() + ".png");
            log.Info($"pageFilePath: {pageFilePath}");
            log.Info($"_rasterizer.GetPage: {pageNumber}");
            // Dispose each page image after saving so its GDI+ resources are released
            using (Image img = _rasterizer.GetPage(desired_x_dpi, desired_y_dpi, pageNumber))
            {
                log.Info($"img.Save: {pageFilePath}");
                img.Save(pageFilePath, ImageFormat.Png);
            }
            string pngfileName = Path.GetFileName(pageFilePath);
            log.Info($"pngfileName: {pngfileName}");
            log.Info($"GetBlockBlobReference");
            CloudBlockBlob targetBlockBlob = _cloudConvertedFilesContainer.GetBlockBlobReference(pngfileName);
            log.Info($"UploadFromStream: {pageFilePath}");
            using (FileStream fileStream = File.OpenRead(pageFilePath))
            {
                targetBlockBlob.UploadFromStream(fileStream);
            }
            // Remove the local png once it has been uploaded
            log.Info($"File.Delete png file: {pageFilePath}");
            File.Delete(pageFilePath);
        }
    }
    log.Info($"File.Delete PDF file: {filePath}");
    File.Delete(filePath);
    log.Info($"Completed processing {uploadFileName}");
}
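The desired_x_dpi and desired_y_dpi values in the code above control the pixel size of each PNG. As a rough guide, PDF page sizes are expressed in points (1/72 inch), so each side should come out at about points / 72 * dpi pixels. The helper below is a hypothetical illustration of that arithmetic; the function does not call it, and the exact dimensions Ghostscript produces may differ slightly because of rounding.

```csharp
using System;

// Hypothetical helper illustrating the dpi arithmetic; not part of run.csx.
public static class RasterSize
{
    // Converts a length in PDF points (72 points = 1 inch) to pixels at the given dpi.
    public static int Pixels(double points, int dpi)
    {
        return (int)Math.Round(points / 72.0 * dpi);
    }
}

// A US Letter page is 612 x 792 points (8.5 x 11 inches):
//   RasterSize.Pixels(612, 96)  -> 816
//   RasterSize.Pixels(792, 96)  -> 1056
//   RasterSize.Pixels(612, 300) -> 2550
```

Raising the two dpi values gives sharper images at the cost of larger files and more processing time per page.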
Adding Ghostscript
Download the 32-bit Ghostscript installer from this link.
After you run the installer, the assembly that you need will be at the following location:
- C:\Program Files (x86)\gs\gs9.20\bin\gsdll32.dll
Return to the Azure portal, select the Function app (AzureFunction-App), and click Function app settings.
Click the Go to Kudu button.
In Kudu, select Debug console then CMD.
Click on the data folder to navigate to it, then click through Functions, packages, nuget, ghostscript.net, 1.2.1, lib, and finally net40.
You should end up in D:\home\data\Functions\packages\nuget\ghostscript.net\1.2.1\lib\net40.
Now drag and drop the gsdll32.dll file onto the web page in your web browser.
The file will upload to the directory.
Connect To Azure Storage
Download and install the Microsoft Azure Storage Explorer.
After you install it, open the Microsoft Azure Storage Explorer.
Log in using your Azure account.
Navigate to the Azure storage account you specified when you created the function (if you forgot which one it is, you can see the AzureWebJobsStorage connection string in Configure app settings under Function app settings).
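An Azure Storage connection string is a semicolon-separated list of key=value pairs, so the account name can be read straight out of the AzureWebJobsStorage value. A small sketch follows; the class name is hypothetical, and the sample string in the usage note is a placeholder rather than a real account.

```csharp
using System;

// Hypothetical helper: extracts the storage account name from a connection string.
public static class ConnectionStringInfo
{
    // Returns the value of the AccountName pair, or null if it is absent.
    public static string AccountName(string connectionString)
    {
        string[] parts = connectionString.Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (string part in parts)
        {
            // Split on the first '=' only; account keys can end in '=' padding.
            int eq = part.IndexOf('=');
            if (eq > 0 && part.Substring(0, eq).Trim()
                    .Equals("AccountName", StringComparison.OrdinalIgnoreCase))
            {
                return part.Substring(eq + 1).Trim();
            }
        }
        return null;
    }
}
```

For example, passing "DefaultEndpointsProtocol=https;AccountName=mystorage;AccountKey=abc==" returns mystorage, which is the account to look for in the Storage Explorer tree.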
Expand the tree to show all the nodes.
Right-click on the Blob Containers node and select Create Blob Container.
Create:
- pdf-conversion-input
- pdf-conversion-output
Select the pdf-conversion-input container and use the Upload button to upload a PDF file.
Right-click on the Queues node and select Create Queue.
Create a queue called myqueue-items.
Select the myqueue-items queue, click the Add button, and add a message to the queue with the name of the PDF file that was uploaded to the pdf-conversion-input container.
Note: the message must match the file name exactly, including casing.
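Blob names are case-sensitive, so a message that differs from the blob name only by casing will not find the blob, and the function logs "sourceBlockBlob does not exist." The sketch below, using a hypothetical helper name, shows one way to spot a casing-only mismatch given the names actually present in the container.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for diagnosing "sourceBlockBlob does not exist."
public static class BlobNameCheck
{
    // Returns the stored name that differs from 'requested' only by casing,
    // or null when there is an exact match or no similar name at all.
    public static string CasingMismatch(string requested, IEnumerable<string> existing)
    {
        List<string> names = existing.ToList();
        if (names.Contains(requested, StringComparer.Ordinal))
        {
            return null; // exact match, nothing to fix
        }
        return names.FirstOrDefault(
            n => string.Equals(n, requested, StringComparison.OrdinalIgnoreCase));
    }
}
```

Given a container holding UsConstitution.pdf, a request for USCONSTITUTION.pdf returns UsConstitution.pdf as the name that should have been queued.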
The PDF will be read, and a PNG file will be created for each page of the PDF file.
The files will show up in the pdf-conversion-output container.
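Going by the naming used in run.csx (the file name without its extension, then -Page-, the page number, and .png), the expected output names can be listed ahead of time. The helper below is a hypothetical sketch that mirrors that naming; it is not part of the function.

```csharp
using System.Collections.Generic;
using System.IO;

// Hypothetical helper that mirrors the page naming used in run.csx.
public static class OutputNames
{
    public static List<string> For(string pdfName, int pageCount)
    {
        string baseName = Path.GetFileNameWithoutExtension(pdfName);
        var names = new List<string>();
        // Pages are numbered from 1, matching the loop in the function.
        for (int page = 1; page <= pageCount; page++)
        {
            names.Add(baseName + "-Page-" + page + ".png");
        }
        return names;
    }
}
```

For a three-page UsConstitution.pdf this yields UsConstitution-Page-1.png, UsConstitution-Page-2.png, and UsConstitution-Page-3.png.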
You can monitor the function and diagnose any errors using the Logs in the Azure portal.
You can watch the files being processed in Kudu by navigating to:
D:\home\data\Functions\sampledata
Ghostscript Errors
If you get the following error:
Exception while executing function: Functions.PDFtoPNG. mscorlib: Exception has been thrown by the target of an invocation. Ghostscript.NET: Ghostscript native library could not be found.
Make sure you have added gsdll32.dll to the correct directory, and that you added the 32-bit version of the file, not the 64-bit version.
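If you are unsure which version of the DLL you uploaded, its bitness is recorded in the Machine field of the file's PE header (0x014C for 32-bit x86, 0x8664 for 64-bit x64). The sketch below reads that field; the DllBitness class is hypothetical and not part of Ghostscript.NET.

```csharp
using System;
using System.IO;

// Hypothetical helper: reports whether a Windows DLL is a 32-bit (x86) image.
public static class DllBitness
{
    // Reads the Machine field from a PE file's COFF header.
    public static ushort ReadMachine(Stream stream)
    {
        using (var reader = new BinaryReader(stream))
        {
            stream.Seek(0x3C, SeekOrigin.Begin);      // e_lfanew: offset of the PE header
            int peOffset = reader.ReadInt32();
            stream.Seek(peOffset, SeekOrigin.Begin);
            if (reader.ReadUInt32() != 0x00004550)    // "PE\0\0" signature, little-endian
            {
                throw new InvalidDataException("Not a PE file.");
            }
            return reader.ReadUInt16();               // Machine
        }
    }

    public static bool Is32Bit(string path)
    {
        using (FileStream fs = File.OpenRead(path))
        {
            return ReadMachine(fs) == 0x014C;
        }
    }
}
```

Inside the function you could, for instance, log the result of DllBitness.Is32Bit for the gsdll32.dll path passed to GhostscriptVersionInfo before opening the rasterizer; a missing file will surface as a FileNotFoundException from File.OpenRead.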
If you get other errors, see this Stack Overflow post.
Links
Converting PDFs to Multipage Tiff files Using Azure WebJobs
Azure Functions
Microsoft Azure Storage Explorer
Comments
Very nice work, thank you. Quick question, i'm getting this error in the log: "sourceBlockBlob does not exist", but after reading through the code i can't see why as i used the same names. ... but perhaps i misspelled something.
Any thoughts as to what I should look at? No problem if not, thanks again
By Raj Finn on 6/30/2017 7:04 AM
@Raj Finn - Just a guess: perhaps your storage container name, subfolder, and file name do not match? Upper and lower case do matter ("UsConstitution" is not the same as "USCONSTITUTION"). Also ensure that this line is working: string _storageConnectionString = ConfigurationManager.AppSettings["AzureWebJobsStorage"]; by following the steps outlined in the article.
By Michael Washington on 6/30/2017 7:09 AM