Amazon Bedrock – Deep Dive
Amazon Bedrock is a fully managed service for building GenAI apps with foundation models (FMs) from Amazon and leading model providers. It provides:
- Unified APIs (InvokeModel, Converse, streaming)
- Orchestration with Agents for Bedrock
- Retrieval with Knowledge Bases
- Safety with Guardrails
- Fine-tuning/model evaluation
- Enterprise security (IAM, VPC, KMS, PrivateLink)
This guide focuses on practical building blocks, IAM/security, and end-to-end snippets in Node.js and Python.
When to use Bedrock
- You need managed access to multiple FMs (Anthropic Claude, Llama, Mistral, Cohere, Amazon Titan).
- You want production features: RAG, tool use, safety filters, evaluation, observability.
- You need enterprise controls: IAM, VPC endpoints, encryption, data privacy (no training on your data by default).
Core building blocks
Foundation Models (FMs)
- Text/chat: Claude, Llama, Mistral, Titan
- Image: Stability, Titan Image
- Multimodal: Claude 3.x
- Embeddings: Titan, Cohere, others
Choose the modelId per use case (reasoning quality, latency, token limits, cost); you can enumerate the models available in your account and Region programmatically, as sketched below.
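A minimal sketch using the control-plane ListFoundationModels API (the byOutputModality filter is optional):
import {
  BedrockClient,
  ListFoundationModelsCommand,
} from "@aws-sdk/client-bedrock";

const bedrock = new BedrockClient({ region: "us-east-1" });

// List text-output foundation models and print their providers and IDs.
const { modelSummaries } = await bedrock.send(
  new ListFoundationModelsCommand({ byOutputModality: "TEXT" })
);
for (const m of modelSummaries ?? []) {
  console.log(`${m.providerName}: ${m.modelId}`);
}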
Runtime APIs
- InvokeModel / InvokeModelWithResponseStream
- Converse / ConverseStream (structured multi-turn, tool use, guardrails, input/output modalities)
Notes:
- Prefer Converse for chat, structured responses, tool use, and guardrails.
- Use streaming for low-latency UX.
Agents for Bedrock
- Define goals and tools (AWS Lambda, API schemas) for orchestration.
- Built-in planning, memory, tool invocation with explainability/traces.
- Invoke via Agent Runtime (InvokeAgent).
Knowledge Bases for Bedrock (RAG)
- Managed retrieval: ingest documents from S3; Bedrock handles chunking, embedding, and indexing into a vector store (serverless options available).
- Query with Retrieve or RetrieveAndGenerate via the Agent Runtime (bedrock-agent-runtime).
- Works standalone or with Agents.
Guardrails
- Configure content filters, denied topics, word blocklists, PII redaction, and contextual grounding checks.
- Apply in Converse by passing a guardrailConfig (guardrailIdentifier and guardrailVersion).
- Thresholds are tunable and interventions are auditable; you can also screen text directly with ApplyGuardrail (sketch below).
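For content that doesn't flow through Converse (for example, validating user input before a pipeline runs), the standalone ApplyGuardrail API evaluates text against a guardrail without invoking a model. A minimal sketch with placeholder identifiers:
import {
  BedrockRuntimeClient,
  ApplyGuardrailCommand,
} from "@aws-sdk/client-bedrock-runtime";

const rt = new BedrockRuntimeClient({ region: "us-east-1" });

// Evaluate a piece of text against a guardrail directly.
const res = await rt.send(
  new ApplyGuardrailCommand({
    guardrailIdentifier: "gr-xxxxxxxx",
    guardrailVersion: "1",
    source: "INPUT", // or "OUTPUT" to screen model responses
    content: [{ text: { text: "Text to screen for policy violations." } }],
  })
);
console.log(res.action); // "GUARDRAIL_INTERVENED" or "NONE"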
Model customization
- Fine-tune supported models with CreateModelCustomizationJob.
- Manage custom model artifacts, KMS encryption, and S3 data.
Evaluation
- Built-in model evaluation and custom metrics to compare models/prompts.
Common architectures
- RAG chatbot
- User → API → Converse (with guardrails) → Knowledge Base RetrieveAndGenerate → Answer with citations.
- Agentic workflow with tools
- User → InvokeAgent → Agent plans → Calls Lambda/APIs → Optional KB retrieval → Final response + trace.
- Batch inference
- EventBridge/Step Functions → InvokeModel/Converse in parallel for documents or tasks (fan-out sketch after this list).
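As a sketch of the batch pattern, a worker (for example, a Step Functions task or Lambda) can fan out Converse calls over a set of documents. summarizeAll and its prompt are illustrative, not a prescribed design:
import {
  BedrockRuntimeClient,
  ConverseCommand,
} from "@aws-sdk/client-bedrock-runtime";

const client = new BedrockRuntimeClient({ region: "us-east-1" });

// Summarize many documents concurrently; real workloads should bound
// concurrency and retry throttled requests.
async function summarizeAll(docs: string[]) {
  return Promise.all(
    docs.map(async (doc) => {
      const res = await client.send(
        new ConverseCommand({
          modelId: "anthropic.claude-3-sonnet-20240229-v1:0",
          messages: [{ role: "user", content: [{ text: `Summarize:\n${doc}` }] }],
          inferenceConfig: { maxTokens: 200 },
        })
      );
      return res.output?.message?.content?.map((p) => p.text ?? "").join("");
    })
  );
}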
IAM and security
- Minimize permissions. Separate control plane (bedrock) from runtime (bedrock-runtime, bedrock-agent-runtime).
- Data privacy: Your content isn’t used to train FMs by default.
- Encryption: KMS for S3 and job outputs.
- Networking: Use VPC endpoints (AWS PrivateLink) for Bedrock, S3, and supporting services.
- Logging: CloudWatch (traces for Agents), CloudTrail for API auditing.
Example user policy for runtime + KB:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["bedrock:ListFoundationModels"],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:ApplyGuardrail",
        "bedrock:InvokeAgent",
        "bedrock:Retrieve",
        "bedrock:RetrieveAndGenerate"
      ],
      "Resource": "*"
    }
  ]
}
Note: the Converse and ConverseStream APIs are authorized through the bedrock:InvokeModel and bedrock:InvokeModelWithResponseStream actions; in production, scope Resource to specific model, agent, and knowledge base ARNs rather than "*".
Agent execution role trust policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "bedrock.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
Grant tools (e.g., Lambda), S3 read for KB sources, and KMS permissions as needed.
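A hedged example of the permissions half of such a role; every name and ARN below is a placeholder to scope to your actual functions, buckets, and keys:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "lambda:InvokeFunction",
      "Resource": "arn:aws:lambda:us-east-1:123456789012:function:my-agent-tool"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::my-kb-bucket",
        "arn:aws:s3:::my-kb-bucket/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": ["kms:Decrypt"],
      "Resource": "arn:aws:kms:us-east-1:123456789012:key/..."
    }
  ]
}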
Using the Converse API (Node.js)
import {
BedrockRuntimeClient,
ConverseCommand,
ConverseStreamCommand,
} from "@aws-sdk/client-bedrock-runtime";
const client = new BedrockRuntimeClient({ region: "us-east-1" });
export async function chatOnce() {
  const res = await client.send(
    new ConverseCommand({
      modelId: "anthropic.claude-3-sonnet-20240229-v1:0",
      messages: [
        {
          role: "user",
          content: [
            { text: "Summarize the benefits of Amazon Bedrock in 3 bullets." },
          ],
        },
      ],
      inferenceConfig: { maxTokens: 400, temperature: 0.2 },
      // Optional guardrails: passed via guardrailConfig, not top-level params
      guardrailConfig: {
        guardrailIdentifier: "gr-xxxxxxxx",
        guardrailVersion: "1",
      },
    })
  );
  return res.output?.message?.content?.map((p) => p.text ?? "").join("\n");
}
export async function chatStream() {
  const res = await client.send(
    new ConverseStreamCommand({
      modelId: "anthropic.claude-3-sonnet-20240229-v1:0",
      messages: [
        {
          role: "user",
          content: [{ text: "Stream a 2-sentence answer: What is RAG?" }],
        },
      ],
      inferenceConfig: { maxTokens: 300, temperature: 0.3 },
    })
  );
  // The response exposes an async iterable of events; print text deltas as they arrive.
  for await (const event of res.stream ?? []) {
    if (event.contentBlockDelta?.delta?.text) {
      process.stdout.write(event.contentBlockDelta.delta.text);
    }
  }
}
Using InvokeModel (Python)
import boto3, json
brt = boto3.client("bedrock-runtime", region_name="us-east-1")
payload = {
    # anthropic_version is required for the Claude Messages API on Bedrock
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 400,
    "temperature": 0.2,
    "messages": [
        {"role": "user", "content": "Give 3 risks of RAG and how to mitigate them."}
    ]
}
res = brt.invoke_model(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    body=json.dumps(payload),
    contentType="application/json",
    accept="application/json"
)
out = json.loads(res["body"].read())
print(out["content"][0]["text"])
Knowledge Bases (RAG) via RetrieveAndGenerate
Node.js:
import {
BedrockAgentRuntimeClient,
RetrieveAndGenerateCommand,
} from "@aws-sdk/client-bedrock-agent-runtime";
const kb = new BedrockAgentRuntimeClient({ region: "us-east-1" });
export async function askKB(knowledgeBaseId: string, query: string) {
  const res = await kb.send(
    new RetrieveAndGenerateCommand({
      input: { text: query },
      retrieveAndGenerateConfiguration: {
        type: "KNOWLEDGE_BASE",
        knowledgeBaseConfiguration: {
          knowledgeBaseId,
          modelArn:
            "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0",
        },
      },
    })
  );
  return res.output?.text;
}
Python:
import boto3
kb = boto3.client("bedrock-agent-runtime", region_name="us-east-1")
def ask_kb(knowledge_base_id: str, query: str):
    res = kb.retrieve_and_generate(
        input={"text": query},
        retrieveAndGenerateConfiguration={
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": knowledge_base_id,
                "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0"
            }
        }
    )
    return res.get("output", {}).get("text")
Notes:
- Source data typically in S3; ingestion handles chunking/embeddings/vector index.
- You can bring your own embeddings model if supported.
- Cite sources using the returned references where available; see the citation sketch below.
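RetrieveAndGenerate returns a citations array alongside the generated text. A sketch of surfacing the S3 sources backing each citation, assuming a response like the askKB example above:
import type { RetrieveAndGenerateCommandOutput } from "@aws-sdk/client-bedrock-agent-runtime";

// Extract cited S3 source URIs from a RetrieveAndGenerate response.
function listCitations(res: RetrieveAndGenerateCommandOutput) {
  for (const citation of res.citations ?? []) {
    for (const ref of citation.retrievedReferences ?? []) {
      console.log(ref.location?.s3Location?.uri);
    }
  }
}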
Agents for Bedrock (tool use + planning)
Node.js:
import {
  BedrockAgentRuntimeClient,
  InvokeAgentCommand,
} from "@aws-sdk/client-bedrock-agent-runtime";
import { randomUUID } from "node:crypto";

const agentClient = new BedrockAgentRuntimeClient({ region: "us-east-1" });

export async function runAgent(
  agentId: string,
  agentAliasId: string,
  input: string
) {
  const res = await agentClient.send(
    new InvokeAgentCommand({
      agentId,
      agentAliasId,
      sessionId: randomUUID(),
      inputText: input,
      enableTrace: true, // include plan/tool-call trace events in the stream
    })
  );
  // res.completion is an async iterable of streamed events, not an array;
  // collect the text chunks as they arrive:
  let final = "";
  for await (const event of res.completion ?? []) {
    if (event.chunk?.bytes) {
      final += new TextDecoder().decode(event.chunk.bytes);
    }
  }
  return final;
}
Tips:
- Define tools via Lambda functions or OpenAPI/JSON schemas (a handler sketch follows these tips).
- Provide clear instructions, API schemas, and guardrail configs for safe tool use.
- Use session memory if you need context persistence.
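As a sketch of the Lambda side of an action group: the agent invokes the function with the API path and parameters it chose, and expects a structured JSON response. The event/response contract below is abridged, and getOrderStatus-style routing with the /orders path is hypothetical:
// Hypothetical action-group handler for an agent tool.
export const handler = async (event: any) => {
  const { actionGroup, apiPath, httpMethod } = event;
  let body: unknown = { error: "unknown operation" };
  if (apiPath === "/orders/{orderId}/status" && httpMethod === "GET") {
    body = { status: "shipped" }; // stand-in for a real lookup
  }
  // Response shape the agent runtime expects for OpenAPI-schema action groups.
  return {
    messageVersion: "1.0",
    response: {
      actionGroup,
      apiPath,
      httpMethod,
      httpStatusCode: 200,
      responseBody: {
        "application/json": { body: JSON.stringify(body) },
      },
    },
  };
};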
Guardrails in Converse
import {
  BedrockRuntimeClient,
  ConverseCommand,
} from "@aws-sdk/client-bedrock-runtime";

const client = new BedrockRuntimeClient({ region: "us-east-1" });

const res = await client.send(
  new ConverseCommand({
    modelId: "anthropic.claude-3-sonnet-20240229-v1:0",
    guardrailConfig: {
      guardrailIdentifier: "gr-xxxxxxxx",
      guardrailVersion: "1",
    },
    messages: [
      { role: "user", content: [{ text: "Explain how to perform X." }] },
    ],
    inferenceConfig: { maxTokens: 300, temperature: 0.3 },
  })
);
// res.stopReason === "guardrail_intervened" indicates the guardrail acted.
Configure guardrails (PII redaction, topic filters, blocklists, contextual grounding) in the console or API, then reference them by identifier and version as shown above.
Fine-tuning (model customization)
import {
  BedrockClient,
  CreateModelCustomizationJobCommand,
} from "@aws-sdk/client-bedrock";

const bedrock = new BedrockClient({ region: "us-east-1" });

await bedrock.send(
  new CreateModelCustomizationJobCommand({
    jobName: "my-titan-text-ft",
    customModelName: "my-titan-text-ft-01",
    baseModelIdentifier: "amazon.titan-text-lite-v1",
    roleArn: "arn:aws:iam::123456789012:role/bedrock-customization-role",
    trainingDataConfig: { s3Uri: "s3://my-bucket/finetune/train/" },
    outputDataConfig: { s3Uri: "s3://my-bucket/finetune/output/" },
    hyperParameters: { epochCount: "3", learningRate: "2e-5" },
    vpcConfig: { securityGroupIds: ["sg-..."], subnetIds: ["subnet-..."] },
    // The custom model artifacts are encrypted with this KMS key.
    customModelKmsKeyId: "arn:aws:kms:us-east-1:123456789012:key/...",
  })
);
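Customization jobs run asynchronously; a sketch of checking status with GetModelCustomizationJob, assuming the job name above:
import {
  BedrockClient,
  GetModelCustomizationJobCommand,
} from "@aws-sdk/client-bedrock";

const bedrockClient = new BedrockClient({ region: "us-east-1" });

// Check job status; poll until it leaves "InProgress".
const { status, failureMessage } = await bedrockClient.send(
  new GetModelCustomizationJobCommand({ jobIdentifier: "my-titan-text-ft" })
);
console.log(status, failureMessage ?? "");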
Networking, privacy, and regions
- Use VPC endpoints (AWS PrivateLink) for bedrock, bedrock-runtime, and bedrock-agent-runtime to keep traffic on the AWS backbone (endpoint sketch after this list).
- Encrypt all buckets and outputs (SSE-KMS).
- Bedrock does not retain or use your data to train FMs by default.
- Check model/regional availability before deployment.
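A sketch of creating an interface endpoint for the Bedrock runtime with the EC2 API; the VPC, subnet, and security group IDs are placeholders:
import { EC2Client, CreateVpcEndpointCommand } from "@aws-sdk/client-ec2";

const ec2 = new EC2Client({ region: "us-east-1" });

// Interface endpoint keeps bedrock-runtime traffic on PrivateLink.
await ec2.send(
  new CreateVpcEndpointCommand({
    VpcEndpointType: "Interface",
    VpcId: "vpc-0123456789abcdef0",
    ServiceName: "com.amazonaws.us-east-1.bedrock-runtime",
    SubnetIds: ["subnet-..."],
    SecurityGroupIds: ["sg-..."],
    PrivateDnsEnabled: true,
  })
);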
Pricing and quotas
- Costs vary by model and feature (tokens, input/output, RAG, agents, fine-tuning). See Amazon Bedrock pricing.
- Watch service quotas: token limits, request rate, job concurrency.
Troubleshooting
- AccessDenied: verify model access, Region, and IAM actions (InvokeModel, InvokeAgent, Retrieve, RetrieveAndGenerate).
- Model not found: wrong modelId or Region, or model access not enabled in the console.
- KB empty answers: check ingestion status, chunking, embeddings, and query length.
- Agent tool failures: inspect CloudWatch traces/logs; validate Lambda permissions and payload schema.