This serverless application deploys API Gateway and Lambda functions to your AWS account. It reads a PDF file from your S3 bucket and splits it into several page groups to invoke converting functions in parallel. Converting function processes images and uploads them to your S3 bucket. The data of uploaded images are served through API Gateway.
Because of the way ImageMagick works, If your command asks for the entire PDF document, it will all be converted in memory before any images are written. Thus, depends on the size of your PDF: the number of pages, and the pixels per page, It needs more memory and time to complete processing than you have available.
This application is built with serverless framework.
- Install serverless framework open-source CLI. More details read this.
$ npm i -g serverless
- Create AWS IAM user and access key on your AWS account. More details read this.
- recommend choosing AdministratorAccess in existing policies
- Setup AWS credential with serverless command. More details read this
$ serverless config credentials --provider aws --key xxx --secret xxx
Since ImageMagick lambda layer does not include Ghostscript, you need to deploy both layers.
/* serverless.ts */
provider: {
/* Required. Your AWS region */
region: 'xxx',
environment: {
/* Required. Your public s3 bucket name */
BUCKET: 'xxx',
/* Optional. You can change default query options */
DEFAULT_OPTIONS: JSON.stringify({
format: 'png',
size: null,
quality: null,
density: null,
division: 3,
pathname: 'images',
}),
},
},
functions: {
convert: {
/* Optional. You can change memory size of converting function */
memorySize: 1024,
/* Required. Your lambda layer ARN */
layers: [
'arn:aws:lambda:ap-northeast-2:xxx:layer:ghostscript:1',
'arn:aws:lambda:ap-northeast-2:xxx:layer:image-magick:1',
],
},
},
Note: Check AWS Lambda pricing when you set the memory size.
yarn install
yarn start
You can query some tests on http://localhost:3000
via serverless-offline. Be sure ImageMagick is installed on your computer.
yarn deploy
You can get the endpoint url like https://xxxxx.execute-api.ap-northeast-2.amazonaws.com/production/v1/convert
.
Name | Type | Required | Description |
---|---|---|---|
key | String | true | The encoded object key of PDF file in your S3 bucket. |
format | String | false | Image format. (Available format) Default is 'png'. |
pathname | String | false | The S3 path where images are saved. If you put nothing, images are saved at the same level as your PDF resource. Default is 'images'. |
division | Number | false | The number of pages to process by one converting function. If you put 0, there will be no division. Default is 3. |
size | Number | false | The width of image. (resize) Default is original size. |
quality | Number | false | The quality of image. (quality) Default is 85. |
density | Number | false | The densitiy of image. (density) Default is 72dpi. |
export type SuccessResponseBody = {
status: 'succeded';
data: {
/* Page number of uploaded image */
page: number;
/* Object key of uploaded image */
url: string;
}[];
};
export type ErrorResponseBody = {
status: 'error';
error: {
code: string;
message?: Error | string;
};
};
- UNDEFINED_QUERY_PARAMS : missing 'key' query string parameter.
- UNDEFINED_FILE_TYPE : missing content type of S3 Object
- UNSUPPORTED_FILE_TYPE : request with non-pdf file
- FAILED_S3_GET_OBJECT : fail to get the object from S3 bucket
- FAILED_S3_PUT_OBJECTL : fail to put the object from S3 bucket
- FAILED_S3_DELETE_OBJECT : fail to delete the object from S3 bucket
- UNDEFINED_CONVERT_PAYLOAD : missing data to process image
- FAILED_PARSE_PDF : fail to parse PDF with pdf-parse
- FAILED_CONVERT_PAGE : fail to process image with ImageMagick
- Basically, S3 path where images are saved will be deleted when an error is occurred during converting process.
- API Gateway times out after 30 seconds, and as of now, it is not configurable. Depending on the requested file size, you might get a timeout message. In this case, however, the lambda function continues converting process.