
Document Summarization with AI on Amazon Bedrock


Objective

Enable automated document summarization: users upload TXT, PDF, or DOCX files, the pipeline extracts the content, summarizes it using Amazon Bedrock, and delivers the summary either via API response or by storing it for future retrieval.

Why This Is Needed

  • Organizations face information overload from the sheer volume of documents they handle.
  • Manual summarization is time-consuming and inconsistent.
  • AI enables faster, more accurate, and scalable content summarization.
  • Amazon Bedrock provides easy access to powerful foundation models without managing infrastructure.
  • Helps improve decision-making by delivering quick, reliable insights.

Architecture Overview


  1. A user uploads a document (TXT, PDF, or DOCX) to an S3 bucket.
  2. S3 triggers a Lambda function.
  3. The Lambda function extracts the content and passes it to Amazon Bedrock for summarization (e.g., with Claude 3 Sonnet).
  4. The summary is stored in Amazon S3.
  5. Lambda returns a response confirming successful summarization and storage.

AWS Services We Used

  • Amazon S3: Used to upload and store original documents like TXT, PDF, or DOCX files.
  • AWS Lambda: Handles the automation logic; triggered by an S3 upload, it parses the content and invokes Bedrock.
  • Amazon Bedrock: Provides powerful foundation models (Claude, Titan, or Llama 3) for generating document summaries.
  • IAM Roles: Securely manage permissions across services to ensure least-privilege access control.

Step-by-Step Guide

Step 1: Create an S3 Bucket

  1. Navigate to AWS Console → S3 → Create Bucket
  • Example bucket name: kunaldoc-bucket

Note: Use the US East (N. Virginia) region (us-east-1), since Amazon Bedrock is not available in most regions (it is not available in Ohio, for example).

  2. Inside the bucket, create two folders:
  • uploads/ – to store original documents (TXT, PDF, DOCX)
  • summaries/ – to save the AI-generated summaries


Step 2: Enable Amazon Bedrock Access

  1. Go to the Amazon Bedrock console.
  2. Navigate to Model access from the left menu.


  3. Select and enable access to the foundation models you plan to use, such as:
    • Claude 3.5 Sonnet (the one I used)
    • Meta Llama 3
    • Anthropic Claude
  4. Wait for the status to show as Access granted (this may take a few minutes).

Note: Make sure you’re in the same region as your Lambda function (e.g., us-east-1 / N. Virginia).


Step 3: Set Up IAM Role for Lambda

  1. Go to IAM > Roles > Create Role
  2. Choose Lambda as the trusted entity type
  3. Attach these AWS managed policies:
  • AmazonS3FullAccess
  • AmazonBedrockFullAccess
  • AWSLambdaBasicExecutionRole
  4. Name the role something like: LambdaBedrockExecutionRole

This role allows Lambda functions to securely access S3, invoke Amazon Bedrock, and write logs to CloudWatch.


Step 4: Create the Lambda Function

  1. Go to AWS Lambda > Create Function
  2. Set the function name: docSummarizerLambda (the name I used)
  3. Select Runtime: Python 3.9
  4. Choose the execution role you created earlier (LambdaBedrockExecutionRole).
  5. Upload your code:
  • I added the lambda_function.py code to the GitHub repo.
  • Dependencies (such as lxml and pdfminer.six) are included in the same repo.
  • Download the dependencies ZIP file to your local machine and attach it as a Lambda Layer during configuration.


This Lambda function handles document parsing, model invocation, and storing the generated summary.

Step 5: Set S3 as the Trigger for Lambda

  1. Go to your Lambda function → Configuration → Triggers → click “Add trigger”
  2. Select S3 as the source
  3. Choose the S3 bucket you created earlier (the one containing the uploads/ and summaries/ folders)
  4. Set the event type to PUT
  5. Under Prefix, enter: uploads/
  6. Leave Suffix empty (optional)
  7. Click “Add” to finalize the trigger.


This ensures your Lambda function is automatically invoked whenever a new file is uploaded to the uploads/ folder in your bucket.

Step 6: Add Lambda Layer for Dependencies

To include external Python libraries (like lxml, pdfminer.six, or python-docx), create a Lambda Layer:

  1. Download the dependencies ZIP

  • Clone or download the dependencies folder from the GitHub repo.

  2. Create the layer

  • Go to AWS Lambda > Layers > Create layer
  • Name it (e.g., kc-lambda-layer)
  • Upload the ZIP file you downloaded
  • Set the compatible runtime to Python 3.9
  • Click Create


  3. Attach the layer to the Lambda function

  • Open your Lambda function
  • Go to Configuration > Layers
  • Click Add a layer > Custom layers
  • Select the layer you just created
  • Click Add

The final version of the Lambda function is available in the GitHub repo mentioned above.

Step 7: Upload a Document

  1. Navigate to the uploads/ folder in your S3 bucket.
  2. Upload your document.


Once uploaded, the Lambda function is automatically triggered and performs the following actions:

  • Extracts the text from the uploaded document.
  • Sends the content to Bedrock for AI-based summarization.
  • Saves the summary in the summaries/ folder of the same S3 bucket.


Step 8: Monitor Lambda Logs in CloudWatch

To debug or verify your Lambda execution:

  1. Go to your Lambda Function in the AWS Console.
  2. Click on the Monitor tab → then View CloudWatch Logs.
  3. Open the Log stream to inspect detailed logs and execution steps.

This helps track any errors or view how the document was processed and summarized.


Step 9: View Output Summary

  1. Navigate to your S3 bucket → open the summaries/ folder.
  2. Download the generated file (e.g., script_summary.txt).


Results

We can see that the summary for the uploaded document summarization with AI.txt file is successfully generated and saved as document summarization with_summary.txt inside the summaries/ folder.


Conclusion

With this serverless workflow, you’ve built an automated document summarization pipeline using Amazon S3, Lambda, and Bedrock. It lets you upload documents in various formats (TXT, PDF, DOCX) and receive concise summaries stored securely in S3 without manual intervention. It’s scalable, cost-effective, and well suited to document-heavy workflows such as legal, academic, or business reporting.

You can further enhance the pipeline by adding API Gateway to fetch summaries on demand, or by integrating DynamoDB for indexing and search.


Kunal Choudharkar

Kunal Choudharkar is a Senior Technical Consultant at Perficient with 3+ years of experience in cloud and DevOps services. He is a certified AWS Solutions Architect Associate, Google Associate Cloud Engineer, and AWS AI Practitioner. Kunal plans to continue sharing his knowledge through blogging.
