TL;DR — Quick Summary
Master Temporal for durable workflow orchestration in microservices. Covers architecture, installation, SDKs, saga pattern, and order processing example.
Distributed systems fail in partial ways — a payment service times out, a shipment record writes but confirmation never arrives, and now your data is inconsistent across three databases with no way back. Temporal solves this class of problem by making workflows durable by default, using event sourcing to survive crashes, replays to recover state, and structured retry policies to handle transient failures. This guide covers the full Temporal stack: server architecture, installation options, workflow and activity primitives across Go, TypeScript, and Python, advanced patterns including saga and human-in-the-loop, and a complete order processing example.
Prerequisites
- Docker and Docker Compose installed (for local Temporal Server)
- Go 1.22+, Node.js 22+, or Python 3.11+ depending on your chosen SDK
- Basic familiarity with microservices and distributed systems concepts
- Understanding of async programming patterns (promises, goroutines, or asyncio)
Temporal Architecture
Temporal Server is composed of four internal services that can run as a single binary or independently for scale:
Frontend — the externally-facing gRPC and HTTP gateway. Clients and Workers connect here to start workflows, send signals, run queries, and poll task queues.
History — the core of Temporal. Persists every workflow event to the database and drives workflow execution by replaying event history. Each workflow execution is managed by a single History shard, providing strong ordering guarantees.
Matching — manages task queues. When the History service needs a workflow or activity task executed, it pushes the task to Matching, which holds it until a Worker polls. This pull model means Workers are never overwhelmed.
Internal Worker — runs Temporal’s own system workflows for namespace management, archival, and replication. Not user-facing.
Workers are your application processes — they contain your workflow and activity code. Workers poll named task queues from the Temporal Server, execute the work locally, and return results. Workers are stateless and horizontally scalable. Your workflow code runs inside the Worker, not inside Temporal Server.
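This pull model is easy to picture with a toy sketch (plain Python, not the actual Matching service): tasks sit in named queues until a Worker polls, so Workers set their own pace and are never flooded.

```python
# Toy model of the Matching service's pull model (plain Python, not Temporal).
# Tasks wait in a named queue until some Worker polls; nothing is pushed,
# so a busy or slow Worker is never overwhelmed with work.
from collections import deque

class Matching:
    def __init__(self):
        self.queues = {}

    def add_task(self, queue_name, task):
        """History hands a task to Matching for a specific task queue."""
        self.queues.setdefault(queue_name, deque()).append(task)

    def poll(self, queue_name):
        """A Worker asks for work; returns None when the queue is empty."""
        q = self.queues.get(queue_name)
        return q.popleft() if q else None

m = Matching()
m.add_task("order-processing", "workflow-task-1")
m.add_task("order-processing", "activity-task-1")

assert m.poll("order-processing") == "workflow-task-1"  # FIFO per queue
assert m.poll("reporting") is None  # a mistyped queue name yields no work
```

Real Matching also supports long-polling, rate limiting, and sticky queues; the point here is only the direction of data flow.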
Event Sourcing and Replay: Every workflow maintains a complete ordered event history in the database. If a Worker crashes mid-workflow, a new Worker picks up the task, replays the event history to reconstruct the exact in-memory state (all variables, timers, awaited values), and continues execution from the last durable checkpoint. This is what makes Temporal durable without requiring workflow state to live in a database you manage.
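The replay mechanism can also be illustrated with a toy model (plain Python, not the Temporal SDK): because state transitions are deterministic, any Worker that re-applies the same event history arrives at the same in-memory state.

```python
# Toy model of event-sourced replay (plain Python, not the Temporal SDK).
# A workflow's in-memory state is a pure function of its event history,
# so replaying the same events always reconstructs the same state.

def apply_event(state: dict, event: dict) -> dict:
    """Deterministic transition: one history event updates the state."""
    if event["type"] == "ActivityCompleted":
        return {**state, event["name"]: event["result"]}
    if event["type"] == "TimerFired":
        return {**state, "timers_fired": state.get("timers_fired", 0) + 1}
    return state

def replay(history: list) -> dict:
    """Rebuild state from scratch, as a new Worker does after a crash."""
    state: dict = {}
    for event in history:
        state = apply_event(state, event)
    return state

history = [
    {"type": "ActivityCompleted", "name": "ValidatePayment", "result": "ok"},
    {"type": "TimerFired"},
    {"type": "ActivityCompleted", "name": "ReserveInventory", "result": "reserved"},
]

# A crashed Worker's replacement reaches the identical state.
assert replay(history) == replay(history)
```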
Installation
Docker Compose (local development)
git clone https://github.com/temporalio/docker-compose.git
cd docker-compose
docker compose up
This starts:
- temporal — Temporal Server on port 7233 (gRPC)
- temporal-ui — Web UI on port 8080
- temporal-admin-tools — container with the tctl CLI pre-installed
- postgresql — persistence backend
# Access tctl via the admin-tools container
docker exec -it temporal-admin-tools tctl namespace register --retention 7 default
# List running workflows
docker exec -it temporal-admin-tools tctl workflow list
Kubernetes with Helm
helm repo add temporal https://go.temporal.io/helm-charts
helm repo update
helm install temporal temporal/temporal \
--set server.replicaCount=3 \
--set cassandra.config.cluster_size=3 \
--set elasticsearch.enabled=true \
--namespace temporal \
--create-namespace
Temporal Cloud (managed)
Temporal Cloud eliminates operational overhead. You get a namespace endpoint, mTLS certificates, and usage-based billing. Connect via the SDK with your endpoint and certificates:
# Set in your Worker configuration
TEMPORAL_ADDRESS=<namespace>.tmprl.cloud:7233
TEMPORAL_TLS_CERT=path/to/client.pem
TEMPORAL_TLS_KEY=path/to/client.key
Core Concepts
Workflow — A deterministic function that orchestrates activities, timers, signals, and child workflows. Must be deterministic: no random numbers, no direct system calls, no accessing mutable global state. Temporal guarantees workflows survive any failure.
Activity — A function that performs non-deterministic side effects: HTTP calls, database writes, file I/O, sending emails. Activities run in Workers, have configurable retry policies, and report heartbeats for long-running operations.
Signal — An external event sent to a running workflow. Signals allow external systems to push data into a running workflow (e.g., “payment approved”, “user cancelled order”).
Query — A synchronous read of a workflow’s current state without affecting execution. Queries are answered by the Worker executing the workflow.
Task Queue — A named channel through which Temporal Server dispatches work to Workers. Workers register to poll specific task queues. This decouples which Workers handle which work.
Namespace — An isolation boundary for workflows. Each namespace has independent retention settings, security policies, and search attribute schemas.
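Before diving into SDK code, it helps to see how the retry policy fields used in every example below (initial interval, backoff coefficient, maximum interval, maximum attempts) combine into a delay schedule. A minimal sketch of the arithmetic, not Temporal's actual scheduler:

```python
# Sketch of how Temporal-style retry policy fields turn into wait times:
# the delay before attempt n+1 is initial * coefficient**(n-1), capped at
# the maximum interval, for at most maximum_attempts total attempts.
# (Illustrative arithmetic only, not the server's scheduling code.)

def retry_schedule(initial, coefficient, max_interval, max_attempts):
    """Delays (seconds) inserted between the attempts."""
    delays = []
    for attempt in range(1, max_attempts):  # no delay before the first attempt
        delays.append(min(initial * coefficient ** (attempt - 1), max_interval))
    return delays

# The policy used in the SDK examples below: 1s initial, 2.0 coefficient,
# 30s cap, 5 attempts.
print(retry_schedule(1, 2.0, 30, 5))  # [1.0, 2.0, 4.0, 8.0]
```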
Writing Workflows in Go
package order

import (
	"context"
	"time"

	"go.temporal.io/sdk/activity"
	"go.temporal.io/sdk/temporal"
	"go.temporal.io/sdk/workflow"
)

// Activity retry policy — applied per activity call
var defaultRetryPolicy = &temporal.RetryPolicy{
	InitialInterval:        time.Second,
	BackoffCoefficient:     2.0,
	MaximumInterval:        30 * time.Second,
	MaximumAttempts:        5,
	NonRetryableErrorTypes: []string{"InvalidOrderError"},
}

// Workflow function — must be deterministic
func OrderWorkflow(ctx workflow.Context, order OrderInput) (OrderResult, error) {
	ao := workflow.ActivityOptions{
		StartToCloseTimeout: 30 * time.Second,
		RetryPolicy:         defaultRetryPolicy,
	}
	ctx = workflow.WithActivityOptions(ctx, ao)

	// Execute activities sequentially
	var paymentResult PaymentResult
	err := workflow.ExecuteActivity(ctx, ValidatePayment, order).Get(ctx, &paymentResult)
	if err != nil {
		return OrderResult{}, err
	}

	var inventoryResult InventoryResult
	err = workflow.ExecuteActivity(ctx, ReserveInventory, order).Get(ctx, &inventoryResult)
	if err != nil {
		// Compensate: refund payment (best-effort, error intentionally ignored)
		_ = workflow.ExecuteActivity(ctx, RefundPayment, paymentResult).Get(ctx, nil)
		return OrderResult{}, err
	}

	// Wait for shipping signal with timeout
	signalChan := workflow.GetSignalChannel(ctx, "shipping-update")
	var shippingInfo ShippingInfo
	selector := workflow.NewSelector(ctx)
	selector.AddReceive(signalChan, func(c workflow.ReceiveChannel, _ bool) {
		c.Receive(ctx, &shippingInfo)
	})
	// Timer: wait up to 24 hours for shipping confirmation
	timerFuture := workflow.NewTimer(ctx, 24*time.Hour)
	selector.AddFuture(timerFuture, func(f workflow.Future) {
		shippingInfo.Status = "timeout"
	})
	selector.Select(ctx)

	// Sleep is durable — survives Worker restarts
	if err := workflow.Sleep(ctx, 7*24*time.Hour); err != nil { // wait 7 days
		return OrderResult{}, err
	}

	if err := workflow.ExecuteActivity(ctx, SendDeliveryConfirmation, order, shippingInfo).Get(ctx, nil); err != nil {
		return OrderResult{}, err
	}
	return OrderResult{OrderID: order.ID, Status: "completed"}, nil
}

// Activity function — performs the actual work
func ValidatePayment(ctx context.Context, order OrderInput) (PaymentResult, error) {
	// Heartbeat for long-running activities
	activity.RecordHeartbeat(ctx, "validating payment")
	// Call payment provider API — non-deterministic, OK in activity
	return callPaymentAPI(order)
}
Workflow Versioning with GetVersion
When deploying changes to a live workflow, use workflow.GetVersion to safely branch behavior:
func OrderWorkflow(ctx workflow.Context, order OrderInput) (OrderResult, error) {
	// Returns DefaultVersion (−1) for existing executions, 1 for new ones
	v := workflow.GetVersion(ctx, "add-fraud-check", workflow.DefaultVersion, 1)
	if v >= 1 {
		// New code path for new executions
		workflow.ExecuteActivity(ctx, FraudCheck, order).Get(ctx, nil)
	}
	// ... rest of workflow
}
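The contract of GetVersion can be mimicked with a small stand-in (plain Python, not the Go SDK): the first non-replay execution records the maximum supported version as a history marker; replays either honor a recorded marker or, for pre-change histories, fall back to the old branch.

```python
# Toy model of workflow.GetVersion semantics (not the Go SDK's code).
# A version marker is written to history on first execution; replays honor
# whatever the history already says, keeping in-flight workflows stable.

DEFAULT_VERSION = -1

class WorkflowRun:
    def __init__(self, replaying, recorded=None):
        self.replaying = replaying            # are we re-executing old history?
        self.recorded = dict(recorded or {})  # markers already in history

    def get_version(self, change_id, min_supported, max_supported):
        if change_id in self.recorded:        # marker present: honor history
            return self.recorded[change_id]
        if self.replaying:                    # old history, pre-change code path
            return min_supported
        self.recorded[change_id] = max_supported  # new execution: record marker
        return max_supported

# A new execution takes the new branch and records the decision...
fresh = WorkflowRun(replaying=False)
assert fresh.get_version("add-fraud-check", DEFAULT_VERSION, 1) == 1

# ...while a replay of a pre-change history keeps the old branch.
old = WorkflowRun(replaying=True)
assert old.get_version("add-fraud-check", DEFAULT_VERSION, 1) == DEFAULT_VERSION
```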
Writing Workflows in TypeScript
import { proxyActivities, sleep, setHandler, defineSignal, defineQuery,
  condition } from '@temporalio/workflow';
import type { Activities } from './activities';

const { validatePayment, reserveInventory, refundPayment,
  sendDeliveryConfirmation } = proxyActivities<Activities>({
  startToCloseTimeout: '30 seconds',
  retry: {
    initialInterval: '1s',
    backoffCoefficient: 2,
    maximumInterval: '30s',
    maximumAttempts: 5,
    nonRetryableErrorTypes: ['InvalidOrderError'],
  },
});

const shippingSignal = defineSignal<[ShippingInfo]>('shipping-update');
const orderStatusQuery = defineQuery<string>('order-status');

export async function orderWorkflow(order: OrderInput): Promise<OrderResult> {
  let currentStatus = 'processing';

  // Register signal handler
  setHandler(shippingSignal, (info: ShippingInfo) => {
    currentStatus = `shipped:${info.trackingId}`;
  });
  // Register query handler — reads state synchronously
  setHandler(orderStatusQuery, () => currentStatus);

  const payment = await validatePayment(order);

  let inventory;
  try {
    inventory = await reserveInventory(order);
  } catch (err) {
    await refundPayment(payment);
    throw err;
  }

  // Wait for signal OR timeout using condition
  const signalReceived = await condition(
    () => currentStatus.startsWith('shipped:'),
    '24 hours'
  );
  if (!signalReceived) {
    currentStatus = 'shipping-timeout';
  }

  await sleep('7 days'); // Durable sleep — survives crashes
  await sendDeliveryConfirmation(order, currentStatus);
  return { orderId: order.id, status: 'completed' };
}
Activity with Cancellation Scope
import { CancellationScope, isCancellation } from '@temporalio/workflow';

export async function processWithTimeout(input: Input): Promise<void> {
  try {
    await CancellationScope.withTimeout('5 minutes', async () => {
      await longRunningActivity(input);
    });
  } catch (err) {
    if (isCancellation(err)) {
      await compensateActivity(input);
    }
    throw err;
  }
}
Writing Workflows in Python
import asyncio
from datetime import timedelta

from temporalio import activity, workflow
from temporalio.common import RetryPolicy


@workflow.defn
class OrderWorkflow:
    def __init__(self) -> None:
        self._status = "processing"
        self._shipping_info: ShippingInfo | None = None

    @workflow.run
    async def run(self, order: OrderInput) -> OrderResult:
        retry_policy = RetryPolicy(
            initial_interval=timedelta(seconds=1),
            backoff_coefficient=2.0,
            maximum_interval=timedelta(seconds=30),
            maximum_attempts=5,
            non_retryable_error_types=["InvalidOrderError"],
        )
        payment = await workflow.execute_activity(
            validate_payment,
            order,
            start_to_close_timeout=timedelta(seconds=30),
            retry_policy=retry_policy,
        )
        try:
            inventory = await workflow.execute_activity(
                reserve_inventory,
                order,
                start_to_close_timeout=timedelta(seconds=30),
                retry_policy=retry_policy,
            )
        except Exception:
            # Compensate: refund the payment before re-raising
            await workflow.execute_activity(
                refund_payment,
                payment,
                start_to_close_timeout=timedelta(seconds=30),
            )
            raise
        # Wait for the shipping signal; fall through after a 24-hour timeout
        try:
            await workflow.wait_condition(
                lambda: self._shipping_info is not None,
                timeout=timedelta(hours=24),
            )
        except asyncio.TimeoutError:
            self._status = "shipping-timeout"
        # asyncio.sleep is a durable timer inside a workflow
        await asyncio.sleep(timedelta(days=7).total_seconds())
        await workflow.execute_activity(
            send_delivery_confirmation,
            order,
            start_to_close_timeout=timedelta(seconds=30),
        )
        return OrderResult(order_id=order.id, status="completed")

    @workflow.signal
    def shipping_update(self, info: ShippingInfo) -> None:
        self._shipping_info = info

    @workflow.query
    def order_status(self) -> str:
        return self._status


@activity.defn
async def validate_payment(order: OrderInput) -> PaymentResult:
    activity.heartbeat("validating payment with provider")
    return await call_payment_api(order)
Activity Patterns
Heartbeating for Long Activities
Long-running activities should record heartbeats to tell Temporal they are still alive. When a heartbeat timeout is configured and a Worker crashes (or stops heartbeating), the timeout fires and the activity is re-scheduled on another Worker:
func ProcessLargeFile(ctx context.Context, fileURL string) error {
	chunks := downloadChunks(fileURL) // hypothetical helper: splits the file
	for i, chunk := range chunks {
		// Heartbeat with progress — also provides cancellation detection
		activity.RecordHeartbeat(ctx, fmt.Sprintf("chunk %d/%d", i+1, len(chunks)))
		// Check if the workflow was cancelled
		if ctx.Err() != nil {
			return ctx.Err()
		}
		processChunk(chunk)
	}
	return nil
}
Local Activities
Local Activities run in the same Worker process as the Workflow, without round-tripping to Temporal Server. Use them for fast, low-latency operations (under a second) that still need retries:
lao := workflow.LocalActivityOptions{
	StartToCloseTimeout: 5 * time.Second,
}
ctx = workflow.WithLocalActivityOptions(ctx, lao)
workflow.ExecuteLocalActivity(ctx, FormatOrderID, order).Get(ctx, &formattedID)
Workflow Patterns
Saga Pattern for Distributed Transactions
The Saga pattern models distributed transactions as a sequence of activities with compensating actions:
func OrderSagaWorkflow(ctx workflow.Context, order OrderInput) error {
	var compensations []func(workflow.Context) error
	ao := workflow.ActivityOptions{StartToCloseTimeout: 30 * time.Second}
	ctx = workflow.WithActivityOptions(ctx, ao)

	// Step 1: charge payment
	var payment PaymentResult
	if err := workflow.ExecuteActivity(ctx, ChargePayment, order).Get(ctx, &payment); err != nil {
		return err
	}
	compensations = append(compensations, func(ctx workflow.Context) error {
		return workflow.ExecuteActivity(ctx, RefundPayment, payment).Get(ctx, nil)
	})

	// Step 2: reserve inventory
	var reservation InventoryReservation
	if err := workflow.ExecuteActivity(ctx, ReserveInventory, order).Get(ctx, &reservation); err != nil {
		// Run compensations in reverse
		for i := len(compensations) - 1; i >= 0; i-- {
			_ = compensations[i](ctx) // Best-effort compensation
		}
		return err
	}
	compensations = append(compensations, func(ctx workflow.Context) error {
		return workflow.ExecuteActivity(ctx, ReleaseInventory, reservation).Get(ctx, nil)
	})

	// Step 3: create shipment
	if err := workflow.ExecuteActivity(ctx, CreateShipment, order, reservation).Get(ctx, nil); err != nil {
		for i := len(compensations) - 1; i >= 0; i-- {
			_ = compensations[i](ctx)
		}
		return err
	}
	return nil
}
Scheduled Workflows with CronSchedule
// Start a workflow on a cron schedule
c.ExecuteWorkflow(ctx,
	client.StartWorkflowOptions{
		ID:           "daily-report",
		TaskQueue:    "reporting",
		CronSchedule: "0 9 * * MON-FRI", // Weekdays at 9am UTC
	},
	DailyReportWorkflow,
	ReportInput{ReportType: "sales"},
)
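To make the schedule concrete, here is a stdlib-only sketch of when "0 9 * * MON-FRI" fires next (illustrative Python; Temporal Server does its own cron parsing):

```python
# Sketch of what the cron expression "0 9 * * MON-FRI" resolves to:
# the next 09:00 UTC that falls on a weekday. Stdlib only; this is an
# illustration, not the server's cron implementation.
from datetime import datetime, timedelta, timezone

def next_run(after: datetime) -> datetime:
    candidate = after.replace(hour=9, minute=0, second=0, microsecond=0)
    if candidate <= after:            # today's 09:00 already passed
        candidate += timedelta(days=1)
    while candidate.weekday() >= 5:   # 5 = Saturday, 6 = Sunday
        candidate += timedelta(days=1)
    return candidate

# From a Friday at 10:00 UTC, the next run is Monday at 09:00 UTC.
friday = datetime(2024, 3, 8, 10, 0, tzinfo=timezone.utc)  # a Friday
print(next_run(friday))  # 2024-03-11 09:00:00+00:00 (a Monday)
```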
Child Workflows
func ParentWorkflow(ctx workflow.Context, orders []OrderInput) error {
	childCtx := workflow.WithChildOptions(ctx, workflow.ChildWorkflowOptions{
		TaskQueue: "order-processing",
	})

	// Launch child workflows in parallel
	futures := make([]workflow.Future, len(orders))
	for i, order := range orders {
		futures[i] = workflow.ExecuteChildWorkflow(childCtx, OrderWorkflow, order)
	}

	// Wait for all
	for _, f := range futures {
		if err := f.Get(ctx, nil); err != nil {
			return err
		}
	}
	return nil
}
Namespaces and Visibility
# Create a namespace with 30-day retention
tctl namespace register \
--retention 30 \
--description "Production order processing" \
production-orders
# Add custom search attributes for filtering
tctl admin cluster add-search-attributes \
--name OrderStatus --type Text \
--name CustomerTier --type Keyword \
--name OrderAmount --type Double
# List workflows with advanced filter
tctl workflow list \
--query 'OrderStatus="pending" AND CustomerTier="premium" ORDER BY StartTime DESC'
Custom search attributes let you filter and sort workflow executions by your domain-specific fields in the Temporal UI and via tctl. For advanced visibility (full text search across millions of executions), enable Elasticsearch in your Temporal deployment.
Temporal UI
The Temporal UI at localhost:8080 provides:
- Workflow List — searchable table of all executions with status, start time, and task queue
- Execution Detail — full event history showing every state transition with timestamps and payloads
- Stack Trace — shows what code the workflow is currently blocked on (which Activity, which sleep, which signal wait)
- Pending Activities — lists activities scheduled but not yet started, useful for debugging worker connectivity
Tool Comparison
| Feature | Temporal | Apache Airflow | AWS Step Functions | Prefect | Inngest | Conductor |
|---|---|---|---|---|---|---|
| Primary use | Durable microservice workflows | Data pipeline DAGs | Serverless state machines | Data workflow orchestration | Event-driven functions | Microservice orchestration |
| Execution model | Long-running durable | DAG batch runs | Managed serverless | Flow runs | Serverless steps | Workflow engine |
| Code language | Go, Java, TS, Python, .NET | Python DAGs | JSON/YAML DSL | Python | TypeScript | JSON/Java |
| Replay/Durability | Full event sourcing replay | None | Managed by AWS | Checkpoint-based | Limited | Limited |
| Signals/Queries | Yes — native | No | Callbacks only | No | Events only | Signals |
| Local dev | Docker Compose | Docker Compose | Requires AWS | Local server | Dev server | Docker |
| Managed cloud | Temporal Cloud | MWAA | Native | Prefect Cloud | Yes | Conductor Cloud |
| Best for | Complex, long-lived workflows | ETL pipelines | Simple AWS workflows | ML/data pipelines | Serverless event chains | Microservice choreography |
Practical Example: Order Processing Saga
This complete example shows a production-ready order workflow with payment, inventory, shipping, and compensation:
// workflows/order-saga.ts
import { proxyActivities, setHandler, defineSignal,
  defineQuery, condition } from '@temporalio/workflow';

const { chargePayment, refundPayment, reserveInventory, releaseInventory,
  createShipment, sendConfirmationEmail } = proxyActivities<Activities>({
  startToCloseTimeout: '60 seconds',
  retry: { maximumAttempts: 3, initialInterval: '2s', backoffCoefficient: 2 },
});

const cancelSignal = defineSignal('cancel-order');
const deliveredSignal = defineSignal('delivered'); // sent via `tctl workflow signal --name delivered`
const statusQuery = defineQuery<string>('status');

export async function orderSagaWorkflow(order: OrderInput): Promise<OrderResult> {
  let status = 'received';
  let cancelled = false;
  setHandler(cancelSignal, () => { cancelled = true; });
  setHandler(deliveredSignal, () => { status = 'delivered'; });
  setHandler(statusQuery, () => status);

  // Charge payment
  status = 'charging';
  const payment = await chargePayment(order);
  if (cancelled) {
    await refundPayment(payment);
    return { orderId: order.id, status: 'cancelled' };
  }

  // Reserve inventory
  status = 'reserving';
  let inventory;
  try {
    inventory = await reserveInventory(order);
  } catch (err) {
    await refundPayment(payment);
    throw err;
  }

  // Create shipment
  status = 'shipping';
  try {
    await createShipment(order, inventory);
  } catch (err) {
    await releaseInventory(inventory);
    await refundPayment(payment);
    throw err;
  }

  // Wait up to 30 days for the delivery confirmation signal
  status = 'awaiting-delivery';
  const delivered = await condition(() => status === 'delivered', '30 days');
  if (!delivered) {
    status = 'delivery-timeout';
  }

  await sendConfirmationEmail(order, status);
  return { orderId: order.id, status };
}
# Start the workflow
tctl workflow start \
--taskqueue order-processing \
--workflow_type orderSagaWorkflow \
--workflow_id "order-12345" \
--input '{"id":"12345","items":[{"sku":"PROD-001","qty":2}]}'
# Query current status
tctl workflow query --workflow_id order-12345 --query_type status
# Send delivery confirmation signal
tctl workflow signal \
--workflow_id order-12345 \
--name delivered \
--input '{"deliveredAt":"2026-03-23T14:00:00Z"}'
Gotchas and Common Mistakes
Non-determinism bugs are the most common issue in Temporal. Any code that produces different results on replay will corrupt workflow state. Never use time.Now(), rand, UUID generation, or direct API calls inside workflow functions — use workflow.Now() for the current time, workflow.SideEffect() (or an Activity) for random values and UUIDs, and workflow.GetVersion() when changing workflow code.
Missing heartbeats on long activities (when a heartbeat timeout is configured) cause the activity to be re-scheduled even though the original is still running, creating duplicate executions. Heartbeat inside loops and check ctx.Err() after each heartbeat.
Unbounded event history accumulates when a workflow runs indefinitely without checkpointing. Use Continue-As-New for polling loops and long-running processes that accumulate many events.
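The effect of Continue-As-New can be modeled in a few lines (plain Python, not the SDK): cap the events one execution may accumulate, then restart with carried-over state and a fresh history.

```python
# Toy model of Continue-As-New (plain Python, not the Temporal SDK).
# Instead of one execution accumulating an unbounded history, each "run"
# processes a bounded batch and then restarts with carried-over state.

MAX_EVENTS_PER_RUN = 100  # stand-in for a self-imposed history budget

def polling_workflow(cursor, items):
    """Process items from cursor onward; return (final cursor, runs used)."""
    runs = 0
    while cursor < len(items):
        runs += 1  # a fresh workflow execution with an empty history
        events_this_run = 0
        while cursor < len(items) and events_this_run < MAX_EVENTS_PER_RUN:
            events_this_run += 1  # each processed item adds history events
            cursor += 1
        # budget exhausted: "continue as new" carries `cursor` into the
        # next iteration, which models a brand-new execution
    return cursor, runs

cursor, runs = polling_workflow(0, ["order"] * 250)
print(runs)  # 3 executions instead of one 250-event history
```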
Task queue mismatch — Workers and workflow starts must use the same task queue name. A typo means the workflow task sits in the queue forever with no worker to pick it up.
Summary
- Temporal’s event sourcing model makes workflows durable by default — crashes, deploys, and network partitions do not lose workflow state
- Workers poll task queues — the pull model means Workers are never overwhelmed and scale independently from the server
- Activities handle all non-deterministic side effects with configurable retry policies, heartbeats, and timeout controls
- Signals and Queries let external systems interact with running workflows without polling your database
- The Saga pattern with compensating Activities is the Temporal-native approach to distributed transactions
- GetVersion enables safe rolling deploys without breaking in-flight workflow executions
- Use Temporal Cloud for production to eliminate server operations overhead; use Docker Compose for local development