Amazon Bedrock – Deep Dive
Amazon Bedrock is a fully managed service for building GenAI apps with foundation models (FMs) from Amazon and leading model providers. It provides:
- Unified APIs (InvokeModel, Converse, streaming)
- Orchestration with Agents for Bedrock
- Retrieval with Knowledge Bases
- Safety with Guardrails
- Fine-tuning/model evaluation
- Enterprise security (IAM, VPC, KMS, PrivateLink)
This guide focuses on practical building blocks, IAM/security, and end-to-end snippets in Node.js and Python.
When to use Bedrock
- You need managed access to multiple FMs (Anthropic Claude, Llama, Mistral, Cohere, Amazon Titan).
- You want production features: RAG, tool use, safety filters, evaluation, observability.
- You need enterprise controls: IAM, VPC endpoints, encryption, data privacy (no training on your data by default).
Core building blocks
Foundation Models (FMs)
- Text/chat: Claude, Llama, Mistral, Titan
- Image: Stability, Titan Image
- Multimodal: Claude 3.x
- Embeddings: Titan, Cohere, others
Choose the modelId per use case (reasoning quality, latency, token limits, cost); you can enumerate the models available in your account and Region programmatically, as sketched below.
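A minimal sketch using the control-plane ListFoundationModels API (the byOutputModality filter is optional):
import {
  BedrockClient,
  ListFoundationModelsCommand,
} from "@aws-sdk/client-bedrock";

const bedrock = new BedrockClient({ region: "us-east-1" });

// List text-output foundation models and print their providers and IDs.
const { modelSummaries } = await bedrock.send(
  new ListFoundationModelsCommand({ byOutputModality: "TEXT" })
);
for (const m of modelSummaries ?? []) {
  console.log(`${m.providerName}: ${m.modelId}`);
}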
Runtime APIs
- InvokeModel / InvokeModelWithResponseStream
- Converse / ConverseStream (structured multi-turn, tool use, guardrails, input/output modalities)
Notes:
- Prefer Converse for chat, structured responses, tool use, and guardrails.
- Use streaming for low-latency UX.
Agents for Bedrock
- Define goals and tools (AWS Lambda, API schemas) for orchestration.
- Built-in planning, memory, tool invocation with explainability/traces.
- Invoke via Agent Runtime (InvokeAgent).
Knowledge Bases for Bedrock (RAG)
- Managed retrieval: ingest documents from S3; Bedrock handles chunking, embedding, and indexing into a vector store (serverless options available).
- Query with Retrieve or RetrieveAndGenerate via the Agent Runtime (bedrock-agent-runtime).
- Works standalone or with Agents.
Guardrails
- Configure content filters, denied topics, word blocklists, PII redaction, and contextual grounding checks.
- Apply in Converse by passing a guardrailConfig (guardrailIdentifier and guardrailVersion).
- Thresholds are tunable and interventions are auditable; you can also screen text directly with ApplyGuardrail (sketch below).
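For content that doesn't flow through Converse (for example, validating user input before a pipeline runs), the standalone ApplyGuardrail API evaluates text against a guardrail without invoking a model. A minimal sketch with placeholder identifiers:
import {
  BedrockRuntimeClient,
  ApplyGuardrailCommand,
} from "@aws-sdk/client-bedrock-runtime";

const rt = new BedrockRuntimeClient({ region: "us-east-1" });

// Evaluate a piece of text against a guardrail directly.
const res = await rt.send(
  new ApplyGuardrailCommand({
    guardrailIdentifier: "gr-xxxxxxxx",
    guardrailVersion: "1",
    source: "INPUT", // or "OUTPUT" to screen model responses
    content: [{ text: { text: "Text to screen for policy violations." } }],
  })
);
console.log(res.action); // "GUARDRAIL_INTERVENED" or "NONE"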
Model customization
- Fine-tune supported models with CreateModelCustomizationJob.
- Manage custom model artifacts, KMS encryption, and S3 data.
Evaluation
- Built-in model evaluation and custom metrics to compare models/prompts.
Common architectures
- RAG chatbot
- User → API → Converse (with guardrails) → Knowledge Base RetrieveAndGenerate → Answer with citations.
- Agentic workflow with tools
- User → InvokeAgent → Agent plans → Calls Lambda/APIs → Optional KB retrieval → Final response + trace.
- Batch inference
- EventBridge/Step Functions → InvokeModel/Converse in parallel for documents or tasks (fan-out sketch after this list).
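As a sketch of the batch pattern, a worker (for example, a Step Functions task or Lambda) can fan out Converse calls over a set of documents. summarizeAll and its prompt are illustrative, not a prescribed design:
import {
  BedrockRuntimeClient,
  ConverseCommand,
} from "@aws-sdk/client-bedrock-runtime";

const client = new BedrockRuntimeClient({ region: "us-east-1" });

// Summarize many documents concurrently; real workloads should bound
// concurrency and retry throttled requests.
async function summarizeAll(docs: string[]) {
  return Promise.all(
    docs.map(async (doc) => {
      const res = await client.send(
        new ConverseCommand({
          modelId: "anthropic.claude-3-sonnet-20240229-v1:0",
          messages: [{ role: "user", content: [{ text: `Summarize:\n${doc}` }] }],
          inferenceConfig: { maxTokens: 200 },
        })
      );
      return res.output?.message?.content?.map((p) => p.text ?? "").join("");
    })
  );
}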
IAM and security
- Minimize permissions. Separate control plane (bedrock) from runtime (bedrock-runtime, bedrock-agent-runtime).
- Data privacy: Your content isn’t used to train FMs by default.
- Encryption: KMS for S3 and job outputs.
- Networking: Use VPC endpoints (AWS PrivateLink) for Bedrock, S3, and supporting services.
- Logging: CloudWatch (traces for Agents), CloudTrail for API auditing.
Example user policy for runtime + KB:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["bedrock:ListFoundationModels"],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:ApplyGuardrail",
        "bedrock:InvokeAgent",
        "bedrock:Retrieve",
        "bedrock:RetrieveAndGenerate"
      ],
      "Resource": "*"
    }
  ]
}
Note: the Converse and ConverseStream APIs are authorized through the bedrock:InvokeModel and bedrock:InvokeModelWithResponseStream actions; in production, scope Resource to specific model, agent, and knowledge base ARNs rather than "*".
Agent execution role trust policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "bedrock.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
Grant tools (e.g., Lambda), S3 read for KB sources, and KMS permissions as needed.
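A hedged example of the permissions half of such a role; every name and ARN below is a placeholder to scope to your actual functions, buckets, and keys:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "lambda:InvokeFunction",
      "Resource": "arn:aws:lambda:us-east-1:123456789012:function:my-agent-tool"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::my-kb-bucket",
        "arn:aws:s3:::my-kb-bucket/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": ["kms:Decrypt"],
      "Resource": "arn:aws:kms:us-east-1:123456789012:key/..."
    }
  ]
}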
Using the Converse API (Node.js)
import {
BedrockRuntimeClient,
ConverseCommand,
ConverseStreamCommand,
} from "@aws-sdk/client-bedrock-runtime";
const client = new BedrockRuntimeClient({ region: "us-east-1" });
export async function chatOnce() {
  const res = await client.send(
    new ConverseCommand({
      modelId: "anthropic.claude-3-sonnet-20240229-v1:0",
      messages: [
        {
          role: "user",
          content: [
            { text: "Summarize the benefits of Amazon Bedrock in 3 bullets." },
          ],
        },
      ],
      inferenceConfig: { maxTokens: 400, temperature: 0.2 },
      // Optional guardrails: passed via guardrailConfig, not top-level params
      guardrailConfig: {
        guardrailIdentifier: "gr-xxxxxxxx",
        guardrailVersion: "1",
      },
    })
  );
  return res.output?.message?.content?.map((p) => p.text ?? "").join("\n");
}
export async function chatStream() {
  const res = await client.send(
    new ConverseStreamCommand({
      modelId: "anthropic.claude-3-sonnet-20240229-v1:0",
      messages: [
        {
          role: "user",
          content: [{ text: "Stream a 2-sentence answer: What is RAG?" }],
        },
      ],
      inferenceConfig: { maxTokens: 300, temperature: 0.3 },
    })
  );
  // The response exposes an async iterable of events; print text deltas as they arrive.
  for await (const event of res.stream ?? []) {
    if (event.contentBlockDelta?.delta?.text) {
      process.stdout.write(event.contentBlockDelta.delta.text);
    }
  }
}
Using InvokeModel (Python)
import boto3, json
brt = boto3.client("bedrock-runtime", region_name="us-east-1")
payload = {
    # anthropic_version is required for the Claude Messages API on Bedrock
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 400,
    "temperature": 0.2,
    "messages": [
        {"role": "user", "content": "Give 3 risks of RAG and how to mitigate them."}
    ]
}
res = brt.invoke_model(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    body=json.dumps(payload),
    contentType="application/json",
    accept="application/json"
)
out = json.loads(res["body"].read())
print(out["content"][0]["text"])
Knowledge Bases (RAG) via RetrieveAndGenerate
Node.js:
import {
BedrockAgentRuntimeClient,
RetrieveAndGenerateCommand,
} from "@aws-sdk/client-bedrock-agent-runtime";
const kb = new BedrockAgentRuntimeClient({ region: "us-east-1" });
export async function askKB(knowledgeBaseId: string, query: string) {
  const res = await kb.send(
    new RetrieveAndGenerateCommand({
      input: { text: query },
      retrieveAndGenerateConfiguration: {
        type: "KNOWLEDGE_BASE",
        knowledgeBaseConfiguration: {
          knowledgeBaseId,
          modelArn:
            "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0",
        },
      },
    })
  );
  return res.output?.text;
}
Python:
import boto3
kb = boto3.client("bedrock-agent-runtime", region_name="us-east-1")
def ask_kb(knowledge_base_id: str, query: str):
    res = kb.retrieve_and_generate(
        input={"text": query},
        retrieveAndGenerateConfiguration={
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": knowledge_base_id,
                "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0"
            }
        }
    )
    return res.get("output", {}).get("text")
Notes:
- Source data typically in S3; ingestion handles chunking/embeddings/vector index.
- You can bring your own embeddings model if supported.
- Cite sources using the returned references where available; see the citation sketch below.
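RetrieveAndGenerate returns a citations array alongside the generated text. A sketch of surfacing the S3 sources backing each citation, assuming a response like the askKB example above:
import type { RetrieveAndGenerateCommandOutput } from "@aws-sdk/client-bedrock-agent-runtime";

// Extract cited S3 source URIs from a RetrieveAndGenerate response.
function listCitations(res: RetrieveAndGenerateCommandOutput) {
  for (const citation of res.citations ?? []) {
    for (const ref of citation.retrievedReferences ?? []) {
      console.log(ref.location?.s3Location?.uri);
    }
  }
}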
Agents for Bedrock (tool use + planning)
Node.js:
import {
  BedrockAgentRuntimeClient,
  InvokeAgentCommand,
} from "@aws-sdk/client-bedrock-agent-runtime";
import { randomUUID } from "node:crypto";

const agentClient = new BedrockAgentRuntimeClient({ region: "us-east-1" });

export async function runAgent(
  agentId: string,
  agentAliasId: string,
  input: string
) {
  const res = await agentClient.send(
    new InvokeAgentCommand({
      agentId,
      agentAliasId,
      sessionId: randomUUID(),
      inputText: input,
      enableTrace: true, // include plan/tool-call trace events in the stream
    })
  );
  // res.completion is an async iterable of streamed events, not an array;
  // collect the text chunks as they arrive:
  let final = "";
  for await (const event of res.completion ?? []) {
    if (event.chunk?.bytes) {
      final += new TextDecoder().decode(event.chunk.bytes);
    }
  }
  return final;
}
Tips:
- Define tools via Lambda functions or OpenAPI/JSON schemas (a handler sketch follows these tips).
- Provide clear instructions, API schemas, and guardrail configs for safe tool use.
- Use session memory if you need context persistence.
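As a sketch of the Lambda side of an action group: the agent invokes the function with the API path and parameters it chose, and expects a structured JSON response. The event/response contract below is abridged, and getOrderStatus-style routing with the /orders path is hypothetical:
// Hypothetical action-group handler for an agent tool.
export const handler = async (event: any) => {
  const { actionGroup, apiPath, httpMethod } = event;
  let body: unknown = { error: "unknown operation" };
  if (apiPath === "/orders/{orderId}/status" && httpMethod === "GET") {
    body = { status: "shipped" }; // stand-in for a real lookup
  }
  // Response shape the agent runtime expects for OpenAPI-schema action groups.
  return {
    messageVersion: "1.0",
    response: {
      actionGroup,
      apiPath,
      httpMethod,
      httpStatusCode: 200,
      responseBody: {
        "application/json": { body: JSON.stringify(body) },
      },
    },
  };
};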
Guardrails in Converse
import {
  BedrockRuntimeClient,
  ConverseCommand,
} from "@aws-sdk/client-bedrock-runtime";

const client = new BedrockRuntimeClient({ region: "us-east-1" });

const res = await client.send(
  new ConverseCommand({
    modelId: "anthropic.claude-3-sonnet-20240229-v1:0",
    guardrailConfig: {
      guardrailIdentifier: "gr-xxxxxxxx",
      guardrailVersion: "1",
    },
    messages: [
      { role: "user", content: [{ text: "Explain how to perform X." }] },
    ],
    inferenceConfig: { maxTokens: 300, temperature: 0.3 },
  })
);
// res.stopReason === "guardrail_intervened" indicates the guardrail acted.
Configure guardrails (PII redaction, topic filters, blocklists, contextual grounding) in the console or API, then reference them by identifier and version as shown above.
Fine-tuning (model customization)
import {
  BedrockClient,
  CreateModelCustomizationJobCommand,
} from "@aws-sdk/client-bedrock";

const bedrock = new BedrockClient({ region: "us-east-1" });

await bedrock.send(
  new CreateModelCustomizationJobCommand({
    jobName: "my-titan-text-ft",
    customModelName: "my-titan-text-ft-01",
    baseModelIdentifier: "amazon.titan-text-lite-v1",
    roleArn: "arn:aws:iam::123456789012:role/bedrock-customization-role",
    trainingDataConfig: { s3Uri: "s3://my-bucket/finetune/train/" },
    outputDataConfig: { s3Uri: "s3://my-bucket/finetune/output/" },
    hyperParameters: { epochCount: "3", learningRate: "2e-5" },
    vpcConfig: { securityGroupIds: ["sg-..."], subnetIds: ["subnet-..."] },
    // The custom model artifacts are encrypted with this KMS key.
    customModelKmsKeyId: "arn:aws:kms:us-east-1:123456789012:key/...",
  })
);
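Customization jobs run asynchronously; a sketch of checking status with GetModelCustomizationJob, assuming the job name above:
import {
  BedrockClient,
  GetModelCustomizationJobCommand,
} from "@aws-sdk/client-bedrock";

const bedrockClient = new BedrockClient({ region: "us-east-1" });

// Check job status; poll until it leaves "InProgress".
const { status, failureMessage } = await bedrockClient.send(
  new GetModelCustomizationJobCommand({ jobIdentifier: "my-titan-text-ft" })
);
console.log(status, failureMessage ?? "");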
Networking, privacy, and regions
- Use VPC endpoints (AWS PrivateLink) for bedrock, bedrock-runtime, and bedrock-agent-runtime to keep traffic on the AWS backbone (endpoint sketch after this list).
- Encrypt all buckets and outputs (SSE-KMS).
- Bedrock does not retain or use your data to train FMs by default.
- Check model/regional availability before deployment.
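A sketch of creating an interface endpoint for the Bedrock runtime with the EC2 API; the VPC, subnet, and security group IDs are placeholders:
import { EC2Client, CreateVpcEndpointCommand } from "@aws-sdk/client-ec2";

const ec2 = new EC2Client({ region: "us-east-1" });

// Interface endpoint keeps bedrock-runtime traffic on PrivateLink.
await ec2.send(
  new CreateVpcEndpointCommand({
    VpcEndpointType: "Interface",
    VpcId: "vpc-0123456789abcdef0",
    ServiceName: "com.amazonaws.us-east-1.bedrock-runtime",
    SubnetIds: ["subnet-..."],
    SecurityGroupIds: ["sg-..."],
    PrivateDnsEnabled: true,
  })
);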
Pricing and quotas
- Costs vary by model and feature (tokens, input/output, RAG, agents, fine-tuning). See Amazon Bedrock pricing.
- Watch service quotas: token limits, request rate, job concurrency.
Troubleshooting
- AccessDenied: verify model access, Region, and IAM actions (InvokeModel, InvokeAgent, Retrieve, RetrieveAndGenerate).
- Model not found: wrong modelId or Region, or model access not enabled in the console.
- KB empty answers: check ingestion status, chunking, embeddings, and query length.
- Agent tool failures: inspect CloudWatch traces/logs; validate Lambda permissions and payload schema.