
Amazon Bedrock – Deep Dive

Amazon Bedrock is a fully managed service for building GenAI apps with foundation models (FMs) from Amazon and leading model providers. It provides:

  • Unified APIs (InvokeModel, Converse, streaming)
  • Orchestration with Agents for Bedrock
  • Retrieval with Knowledge Bases
  • Safety with Guardrails
  • Fine-tuning/model evaluation
  • Enterprise security (IAM, VPC, KMS, PrivateLink)

This guide focuses on practical building blocks, IAM/security, and end-to-end snippets in Node.js and Python.

When to use Bedrock

  • You need managed access to multiple FMs (Anthropic Claude, Llama, Mistral, Cohere, Amazon Titan).
  • You want production features: RAG, tool use, safety filters, evaluation, observability.
  • You need enterprise controls: IAM, VPC endpoints, encryption, data privacy (no training on your data by default).

Core building blocks

Foundation Models (FMs)

  • Text/chat: Claude, Llama, Mistral, Titan
  • Image: Stability, Titan Image
  • Multimodal: Claude 3.x
  • Embeddings: Titan, Cohere, others

Choose the modelId per use case (reasoning, latency, token limits, cost).
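
To see which modelIds are actually available in your account and region, you can query the control-plane API. A minimal sketch in TypeScript, assuming the AWS SDK v3 Bedrock client and a configured region:

import {
  BedrockClient,
  ListFoundationModelsCommand,
} from "@aws-sdk/client-bedrock";

const bedrock = new BedrockClient({ region: "us-east-1" });

export async function listTextModels() {
  // byOutputModality filters to models that emit text; byProvider and
  // byInferenceType are other useful filters.
  const res = await bedrock.send(
    new ListFoundationModelsCommand({ byOutputModality: "TEXT" })
  );
  return (res.modelSummaries ?? []).map((m) => ({
    modelId: m.modelId,
    provider: m.providerName,
    inputModalities: m.inputModalities,
  }));
}

Listing a model does not mean you can invoke it yet; model access still has to be enabled per model in the console (see Troubleshooting below).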

Runtime APIs

  • InvokeModel / InvokeModelWithResponseStream
  • Converse / ConverseStream (structured multi-turn, tool use, guardrails, input/output modalities)

Notes:

  • Prefer Converse for chat, structured responses, tool use, and guardrails.
  • Use streaming for low-latency UX.

Agents for Bedrock

  • Define goals and tools (AWS Lambda, API schemas) for orchestration.
  • Built-in planning, memory, tool invocation with explainability/traces.
  • Invoke via Agent Runtime (InvokeAgent).

Knowledge Bases for Bedrock (RAG)

  • Managed retrieval: ingest from S3; Bedrock handles chunking, embedding, and indexing in a vector store (serverless). A minimal ingestion sketch follows this list.
  • Query at runtime with Retrieve or RetrieveAndGenerate via the Agent Runtime (bedrock-agent-runtime).
  • Works standalone or with Agents.
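
A minimal ingestion sketch, assuming a knowledge base and an S3 data source already exist (both IDs are placeholders): it starts a sync job and polls until it finishes.

import {
  BedrockAgentClient,
  StartIngestionJobCommand,
  GetIngestionJobCommand,
} from "@aws-sdk/client-bedrock-agent";

const agent = new BedrockAgentClient({ region: "us-east-1" });

export async function syncKnowledgeBase(
  knowledgeBaseId: string,
  dataSourceId: string
) {
  // Kick off ingestion: Bedrock re-reads the S3 source, then chunks,
  // embeds, and indexes the content.
  const start = await agent.send(
    new StartIngestionJobCommand({ knowledgeBaseId, dataSourceId })
  );
  const ingestionJobId = start.ingestionJob?.ingestionJobId;
  if (!ingestionJobId) throw new Error("Ingestion job did not start");

  // Poll until the job completes or fails.
  for (;;) {
    const { ingestionJob } = await agent.send(
      new GetIngestionJobCommand({ knowledgeBaseId, dataSourceId, ingestionJobId })
    );
    if (ingestionJob?.status === "COMPLETE" || ingestionJob?.status === "FAILED") {
      return ingestionJob;
    }
    await new Promise((r) => setTimeout(r, 15_000));
  }
}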

Guardrails

  • Configure safety policies: content filters, denied topics, word blocklists, PII redaction, and contextual grounding checks.
  • Apply in Converse by passing a guardrailConfig (guardrail ID and version).
  • Supports auditing interventions and tuning filter thresholds.

Model customization

  • Fine-tune supported models with CreateModelCustomizationJob.
  • Manage custom model artifacts, KMS encryption, and S3 data.

Evaluation

  • Built-in model evaluation and custom metrics to compare models/prompts.

Common architectures

  • RAG chatbot
    • User → API → Converse (with guardrails) → Knowledge Base RetrieveAndGenerate → Answer with citations.
  • Agentic workflow with tools
    • User → InvokeAgent → Agent plans → Calls Lambda/APIs → Optional KB retrieval → Final response + trace.
  • Batch inference
    • EventBridge/Step Functions → InvokeModel in parallel for documents or tasks.
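
The batch pattern can be as simple as fanning out Converse calls from a worker (a Lambda or a Step Functions Map state). A minimal sketch, assuming batch sizes small enough to stay within your request-rate quota:

import {
  BedrockRuntimeClient,
  ConverseCommand,
} from "@aws-sdk/client-bedrock-runtime";

const client = new BedrockRuntimeClient({ region: "us-east-1" });

// Summarize a batch of documents concurrently.
export async function summarizeBatch(docs: string[]) {
  return Promise.all(
    docs.map(async (doc) => {
      const res = await client.send(
        new ConverseCommand({
          modelId: "anthropic.claude-3-sonnet-20240229-v1:0",
          messages: [
            { role: "user", content: [{ text: `Summarize in 2 sentences:\n${doc}` }] },
          ],
          inferenceConfig: { maxTokens: 200, temperature: 0.2 },
        })
      );
      return res.output?.message?.content?.map((p) => p.text).join("");
    })
  );
}

For large volumes, cap concurrency with a queue or a Step Functions Map state instead of a single Promise.all, and add retries for throttling errors.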

IAM and security

  • Minimize permissions. Separate control plane (bedrock) from runtime (bedrock-runtime, bedrock-agent-runtime).
  • Data privacy: Your content isn’t used to train FMs by default.
  • Encryption: KMS for S3 and job outputs.
  • Networking: Use VPC endpoints (AWS PrivateLink) for Bedrock, S3, and supporting services.
  • Logging: CloudWatch (traces for Agents), CloudTrail for API auditing.

Example user policy for runtime + KB:

{ "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": ["bedrock:ListFoundationModels"], "Resource": "*" }, { "Effect": "Allow", "Action": [ "bedrock:InvokeModel", "bedrock:InvokeModelWithResponseStream", "bedrock:Converse", "bedrock:ConverseStream" ], "Resource": "*" }, { "Effect": "Allow", "Action": [ "bedrock:ApplyGuardrail", "bedrock:InvokeAgent", "bedrock:Retrieve", "bedrock:Query", "bedrock:RetrieveAndGenerate" ], "Resource": "*" } ] }

Agent execution role trust policy:

{ "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Principal": { "Service": "bedrock.amazonaws.com" }, "Action": "sts:AssumeRole" } ] }

Grant tools (e.g., Lambda), S3 read for KB sources, and KMS permissions as needed.


Using the Converse API (Node.js)

import {
  BedrockRuntimeClient,
  ConverseCommand,
  ConverseStreamCommand,
} from "@aws-sdk/client-bedrock-runtime";

const client = new BedrockRuntimeClient({ region: "us-east-1" });

export async function chatOnce() {
  const res = await client.send(
    new ConverseCommand({
      modelId: "anthropic.claude-3-sonnet-20240229-v1:0",
      messages: [
        {
          role: "user",
          content: [
            { text: "Summarize the benefits of Amazon Bedrock in 3 bullets." },
          ],
        },
      ],
      inferenceConfig: { maxTokens: 400, temperature: 0.2 },
      // Optional guardrails
      guardrailConfig: {
        guardrailIdentifier: "gr-xxxxxxxx",
        guardrailVersion: "1",
      },
    })
  );
  return res.output?.message?.content?.map((p) => p.text).join("\n");
}

export async function chatStream() {
  const res = await client.send(
    new ConverseStreamCommand({
      modelId: "anthropic.claude-3-sonnet-20240229-v1:0",
      messages: [
        {
          role: "user",
          content: [{ text: "Stream a 2-sentence answer: What is RAG?" }],
        },
      ],
      inferenceConfig: { maxTokens: 300, temperature: 0.3 },
    })
  );
  // The response exposes an event stream; print text deltas as they arrive.
  for await (const event of res.stream ?? []) {
    if (event?.contentBlockDelta?.delta?.text)
      process.stdout.write(event.contentBlockDelta.delta.text);
  }
}

Using InvokeModel (Python)

import boto3, json

brt = boto3.client("bedrock-runtime", region_name="us-east-1")

payload = {
    # Required field for Claude (Messages API) on Bedrock
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 400,
    "temperature": 0.2,
    "messages": [
        {"role": "user", "content": "Give 3 risks of RAG and how to mitigate them."}
    ]
}

res = brt.invoke_model(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    body=json.dumps(payload).encode("utf-8"),
    contentType="application/json",
    accept="application/json"
)

out = json.loads(res["body"].read())
print(out)

Knowledge Bases (RAG) via RetrieveAndGenerate

Node.js:

import {
  BedrockAgentRuntimeClient,
  RetrieveAndGenerateCommand,
} from "@aws-sdk/client-bedrock-agent-runtime";

const kb = new BedrockAgentRuntimeClient({ region: "us-east-1" });

export async function askKB(knowledgeBaseId: string, query: string) {
  const res = await kb.send(
    new RetrieveAndGenerateCommand({
      input: { text: query },
      retrieveAndGenerateConfiguration: {
        type: "KNOWLEDGE_BASE",
        knowledgeBaseConfiguration: {
          knowledgeBaseId,
          modelArn:
            "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0",
        },
      },
    })
  );
  return res.output?.text;
}

Python:

import boto3

kb = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

def ask_kb(knowledge_base_id: str, query: str):
    res = kb.retrieve_and_generate(
        input={"text": query},
        retrieveAndGenerateConfiguration={
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": knowledge_base_id,
                "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0"
            }
        }
    )
    return res.get("output", {}).get("text")

Notes:

  • Source data typically in S3; ingestion handles chunking/embeddings/vector index.
  • You can bring your own embeddings model if supported.
  • Cite sources using returned references where available.
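
If you need citations or full control over the prompt, call Retrieve directly and assemble the answer yourself. A sketch, assuming the knowledge base ID is a placeholder and the S3 location field is where your source URIs live:

import {
  BedrockAgentRuntimeClient,
  RetrieveCommand,
} from "@aws-sdk/client-bedrock-agent-runtime";

const kb = new BedrockAgentRuntimeClient({ region: "us-east-1" });

// Retrieval-only call: returns chunks plus their source locations,
// so you can build your own prompt and show citations.
export async function retrieveChunks(knowledgeBaseId: string, query: string) {
  const res = await kb.send(
    new RetrieveCommand({
      knowledgeBaseId,
      retrievalQuery: { text: query },
      retrievalConfiguration: {
        vectorSearchConfiguration: { numberOfResults: 5 },
      },
    })
  );
  return (res.retrievalResults ?? []).map((r) => ({
    text: r.content?.text,
    source: r.location?.s3Location?.uri,
    score: r.score,
  }));
}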

Agents for Bedrock (tool use + planning)

Node.js:

import { randomUUID } from "node:crypto";
import {
  BedrockAgentRuntimeClient,
  InvokeAgentCommand,
} from "@aws-sdk/client-bedrock-agent-runtime";

const agentClient = new BedrockAgentRuntimeClient({ region: "us-east-1" });

export async function runAgent(
  agentId: string,
  agentAliasId: string,
  input: string
) {
  const res = await agentClient.send(
    new InvokeAgentCommand({
      agentId,
      agentAliasId,
      sessionId: randomUUID(),
      inputText: input,
      enableTrace: true, // view reasoning/tool calls in CloudWatch
    })
  );

  // res.completion is an event stream; collect the text chunks as they arrive.
  let final = "";
  for await (const event of res.completion ?? []) {
    if (event.chunk?.bytes) {
      final += new TextDecoder().decode(event.chunk.bytes);
    }
  }
  return final;
}

Tips:

  • Define tools via Lambda functions or OpenAPI/JSON schema.
  • Provide clear instructions, API schemas, and guardrail configs for safe tool use.
  • Use session memory if you need context persistence.
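
For context persistence, reuse the same sessionId across turns. A minimal sketch (the agent and alias IDs are placeholders):

import { randomUUID } from "node:crypto";
import {
  BedrockAgentRuntimeClient,
  InvokeAgentCommand,
} from "@aws-sdk/client-bedrock-agent-runtime";

const client = new BedrockAgentRuntimeClient({ region: "us-east-1" });
const sessionId = randomUUID(); // keep this constant across turns

async function ask(agentId: string, agentAliasId: string, inputText: string) {
  const res = await client.send(
    new InvokeAgentCommand({ agentId, agentAliasId, sessionId, inputText })
  );
  let text = "";
  for await (const event of res.completion ?? []) {
    if (event.chunk?.bytes) text += new TextDecoder().decode(event.chunk.bytes);
  }
  return text;
}

// Both turns share sessionId, so the agent remembers the first message.
// await ask("AGENT_ID", "ALIAS_ID", "My order ID is 12345.");
// await ask("AGENT_ID", "ALIAS_ID", "What's the status of that order?");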

Guardrails in Converse

import {
  BedrockRuntimeClient,
  ConverseCommand,
} from "@aws-sdk/client-bedrock-runtime";

const client = new BedrockRuntimeClient({ region: "us-east-1" });

await client.send(
  new ConverseCommand({
    modelId: "anthropic.claude-3-sonnet-20240229-v1:0",
    guardrailConfig: {
      guardrailIdentifier: "gr-xxxxxxxx",
      guardrailVersion: "1",
    },
    messages: [
      { role: "user", content: [{ text: "Explain how to perform X." }] },
    ],
    inferenceConfig: { maxTokens: 300, temperature: 0.3 },
  })
);

Configure guardrails (PII redaction, topic filters, blocklists, contextual grounding) in the console or API, then reference the guardrail ID and version in your requests as shown above.
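
You can also run a guardrail on its own, without a model call, via ApplyGuardrail. A sketch, assuming a guardrail already exists (the ID and version below are placeholders):

import {
  BedrockRuntimeClient,
  ApplyGuardrailCommand,
} from "@aws-sdk/client-bedrock-runtime";

const client = new BedrockRuntimeClient({ region: "us-east-1" });

// Assess arbitrary text (e.g., output from another system) against a guardrail.
export async function checkText(text: string) {
  const res = await client.send(
    new ApplyGuardrailCommand({
      guardrailIdentifier: "gr-xxxxxxxx",
      guardrailVersion: "1",
      source: "INPUT", // or "OUTPUT" to assess model responses
      content: [{ text: { text } }],
    })
  );
  // action is "GUARDRAIL_INTERVENED" when a policy matched;
  // assessments detail which filters fired.
  return { action: res.action, assessments: res.assessments };
}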


Fine-tuning (model customization)

import {
  BedrockClient,
  CreateModelCustomizationJobCommand,
} from "@aws-sdk/client-bedrock";

const bedrock = new BedrockClient({ region: "us-east-1" });

await bedrock.send(
  new CreateModelCustomizationJobCommand({
    jobName: "my-titan-text-ft",
    customModelName: "my-titan-text-ft-01",
    baseModelIdentifier: "amazon.titan-text-lite-v1",
    roleArn: "arn:aws:iam::123456789012:role/bedrock-customization-role",
    trainingDataConfig: { s3Uri: "s3://my-bucket/finetune/train/" },
    outputDataConfig: { s3Uri: "s3://my-bucket/finetune/output/" },
    hyperParameters: { epochCount: "3", learningRate: "2e-5" },
    vpcConfig: { securityGroupIds: ["sg-..."], subnetIds: ["subnet-..."] },
    // KMS key used to encrypt the resulting custom model
    customModelKmsKeyId: "arn:aws:kms:us-east-1:123456789012:key/...",
  })
);
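
Customization jobs run asynchronously. A minimal polling sketch using GetModelCustomizationJob (the job identifier matches the jobName above):

import {
  BedrockClient,
  GetModelCustomizationJobCommand,
} from "@aws-sdk/client-bedrock";

const bedrock = new BedrockClient({ region: "us-east-1" });

// Poll until the job leaves the in-progress states, then return its details.
export async function waitForCustomization(jobIdentifier: string) {
  for (;;) {
    const res = await bedrock.send(
      new GetModelCustomizationJobCommand({ jobIdentifier })
    );
    if (res.status !== "InProgress" && res.status !== "Stopping") {
      return res; // Completed, Failed, or Stopped
    }
    await new Promise((r) => setTimeout(r, 60_000));
  }
}

Note that invoking the resulting custom model typically requires purchasing Provisioned Throughput rather than on-demand invocation.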

Networking, privacy, and regions

  • Use VPC endpoints (AWS PrivateLink) for bedrock, bedrock-runtime, and bedrock-agent-runtime to keep traffic on the AWS backbone (a setup sketch follows this list).
  • Encrypt all buckets and outputs (SSE-KMS).
  • Bedrock does not retain or use your data to train FMs by default.
  • Check model/regional availability before deployment.
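
A sketch of creating an interface endpoint for the runtime API with the EC2 SDK; the VPC, subnet, and security-group IDs are placeholders, and the PrivateLink service names should be verified for your region:

import { EC2Client, CreateVpcEndpointCommand } from "@aws-sdk/client-ec2";

const ec2 = new EC2Client({ region: "us-east-1" });

// Interface endpoint for bedrock-runtime; repeat for
// com.amazonaws.us-east-1.bedrock and ...bedrock-agent-runtime as needed.
await ec2.send(
  new CreateVpcEndpointCommand({
    VpcId: "vpc-0123456789abcdef0",
    ServiceName: "com.amazonaws.us-east-1.bedrock-runtime",
    VpcEndpointType: "Interface",
    SubnetIds: ["subnet-..."],
    SecurityGroupIds: ["sg-..."],
    PrivateDnsEnabled: true,
  })
);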

Pricing and quotas

  • Costs vary by model and feature (tokens, input/output, RAG, agents, fine-tuning). See Amazon Bedrock pricing.
  • Watch service quotas: token limits, request rate, job concurrency.

Troubleshooting

  • AccessDenied: verify model access, region, and IAM actions (InvokeModel/Converse/InvokeAgent/RetrieveAndGenerate).
  • Model not found: wrong modelId/region or not enabled in console.
  • KB empty answers: check ingestion status, chunking, embeddings, and query length.
  • Agent tool failures: inspect CloudWatch traces/logs; validate Lambda permissions and payload schema.