MongoDB is a document-oriented NoSQL database that stores data as flexible JSON-like documents (serialized as BSON) instead of rows and columns. Where relational databases require you to define a schema upfront and alter tables as requirements change, MongoDB lets each document in a collection have different fields, which makes it a good fit for applications whose data structures evolve rapidly. This guide covers MongoDB installation, core CRUD operations, indexing strategies, the aggregation pipeline, backup and restore, and common troubleshooting steps.

Prerequisites

  • A Linux server (Ubuntu 22.04 or RHEL 8+ recommended) with at least 2 GB RAM
  • Root or sudo access for installation
  • Basic understanding of JSON data format
  • Familiarity with command-line tools

Installing MongoDB

Ubuntu/Debian

# Import the MongoDB GPG key
curl -fsSL https://www.mongodb.org/static/pgp/server-7.0.asc | \
  sudo gpg -o /usr/share/keyrings/mongodb-server-7.0.gpg --dearmor

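# Add the repository (this line is for Ubuntu 22.04 "jammy"; Debian and other Ubuntu releases need a different repository line)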
echo "deb [ arch=amd64,arm64 signed-by=/usr/share/keyrings/mongodb-server-7.0.gpg ] \
  https://repo.mongodb.org/apt/ubuntu jammy/mongodb-org/7.0 multiverse" | \
  sudo tee /etc/apt/sources.list.d/mongodb-org-7.0.list

# Install MongoDB
sudo apt update
sudo apt install -y mongodb-org

# Start and enable the service
sudo systemctl start mongod
sudo systemctl enable mongod

# Verify it is running
sudo systemctl status mongod
mongosh --eval "db.version()"

RHEL/CentOS

# Create the repository file
cat <<'EOF' | sudo tee /etc/yum.repos.d/mongodb-org-7.0.repo
[mongodb-org-7.0]
name=MongoDB Repository
baseurl=https://repo.mongodb.org/yum/redhat/$releasever/mongodb-org/7.0/x86_64/
gpgcheck=1
enabled=1
gpgkey=https://www.mongodb.org/static/pgp/server-7.0.asc
EOF

sudo dnf install -y mongodb-org
sudo systemctl start mongod
sudo systemctl enable mongod

Docker

docker run -d \
  --name mongodb \
  -p 27017:27017 \
  -v mongodb-data:/data/db \
  -e MONGO_INITDB_ROOT_USERNAME=admin \
  -e MONGO_INITDB_ROOT_PASSWORD=secretpassword \
  mongo:7

Core CRUD Operations

Connect to MongoDB using the shell:

# Connect to local MongoDB
mongosh

# Connect to a remote server with authentication
mongosh "mongodb://admin:password@192.168.1.100:27017/admin"

Create — Inserting Documents

// Switch to (or create) a database
use myapp

// Insert a single document
db.users.insertOne({
  name: "Alice Johnson",
  email: "alice@example.com",
  role: "admin",
  skills: ["python", "docker", "kubernetes"],
  created: new Date()
})

// Insert multiple documents
db.users.insertMany([
  { name: "Bob Smith", email: "bob@example.com", role: "developer", skills: ["javascript", "react"] },
  { name: "Carol Lee", email: "carol@example.com", role: "devops", skills: ["terraform", "aws", "docker"] }
])

MongoDB creates the database and collection automatically on first insert — no CREATE TABLE or CREATE DATABASE needed.

Read — Querying Documents

// Find all documents in a collection
db.users.find()

// Find with a filter
db.users.find({ role: "admin" })

// Query array fields (by element or by position)
db.users.find({ skills: "docker" })           // Array contains "docker"
db.users.find({ "skills.0": "python" })        // First skill is "python"

// Comparison operators
db.users.find({ age: { $gt: 25 } })            // Greater than
db.users.find({ role: { $in: ["admin", "devops"] } })  // In array

// Projection — select specific fields
db.users.find({ role: "admin" }, { name: 1, email: 1, _id: 0 })

// Sort and limit
db.users.find().sort({ created: -1 }).limit(10)

// Count documents
db.users.countDocuments({ role: "developer" })
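The { skills: "docker" } filter matches because an equality condition on an array field succeeds when any element matches. The plain-JavaScript sketch below mimics that rule for a small subset of MQL (top-level fields, scalar equality, $gt, and $in) so you can see why each query above returns what it does. The helpers matches, matchesCondition, and names are hypothetical; this is not MongoDB's query engine, and it ignores dotted paths such as "skills.0", whole-array equality, and BSON type ordering.

```javascript
// Hypothetical, simplified model of how a find() filter matches documents.
// Supports only top-level fields, scalar equality, $gt, and $in; real MQL
// also handles dotted paths, whole-array equality, and BSON type ordering.

// An array field is tested element by element; a scalar acts like a one-element array.
function candidateValues(fieldValue) {
  return Array.isArray(fieldValue) ? fieldValue : [fieldValue];
}

// { $gt: 25 } is an operator object; "docker" or 42 is a plain value.
function isOperatorObject(condition) {
  return condition !== null && typeof condition === "object" &&
    !Array.isArray(condition) && Object.keys(condition).length > 0 &&
    Object.keys(condition).every((key) => key.startsWith("$"));
}

function matchesCondition(fieldValue, condition) {
  const values = candidateValues(fieldValue);
  if (!isOperatorObject(condition)) {
    // Equality: true if any candidate equals the query value.
    return values.some((value) => value === condition);
  }
  return Object.entries(condition).every(([operator, operand]) => {
    switch (operator) {
      case "$gt":
        // A missing field (undefined) never satisfies $gt.
        return values.some((value) => value !== undefined && value > operand);
      case "$in":
        return values.some((value) => operand.includes(value));
      default:
        throw new Error(`operator ${operator} is not supported in this sketch`);
    }
  });
}

// Every field in the filter must match (implicit AND).
function matches(doc, filter) {
  return Object.entries(filter).every(([field, condition]) =>
    matchesCondition(doc[field], condition));
}

const users = [
  { name: "Alice Johnson", role: "admin", skills: ["python", "docker", "kubernetes"] },
  { name: "Bob Smith", role: "developer", skills: ["javascript", "react"] },
  { name: "Carol Lee", role: "devops", skills: ["terraform", "aws", "docker"] },
];

const names = (filter) => users.filter((user) => matches(user, filter)).map((user) => user.name);

console.log(names({ skills: "docker" }));                   // [ 'Alice Johnson', 'Carol Lee' ]
console.log(names({ role: { $in: ["admin", "devops"] } })); // [ 'Alice Johnson', 'Carol Lee' ]
console.log(names({ age: { $gt: 25 } }));                   // [] (no sample user has an age field)
```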

Update — Modifying Documents

// Update one document
db.users.updateOne(
  { email: "alice@example.com" },
  { $set: { role: "superadmin", lastLogin: new Date() } }
)

// Add to an array
db.users.updateOne(
  { email: "bob@example.com" },
  { $push: { skills: "typescript" } }
)

// Update multiple documents
db.users.updateMany(
  { role: "developer" },
  { $set: { department: "engineering" } }
)

// Upsert — update if exists, insert if not
db.users.updateOne(
  { email: "dave@example.com" },
  { $set: { name: "Dave Wilson", role: "intern" } },
  { upsert: true }
)
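When the upsert above finds no match, MongoDB builds the new document from the equality fields in the filter and then applies the update operators, so Dave's document ends up with email, name, and role (plus a generated _id). The sketch below models that behavior, along with $set and $push, over a plain JavaScript array. Its updateOne and applyUpdate functions are hypothetical stand-ins, not the shell or driver methods; they handle only plain-equality filters and do not generate _id.

```javascript
// Hypothetical in-memory model of updateOne with $set, $push, and upsert.
// This is NOT the MongoDB shell or driver API; it only mirrors the semantics
// described above, for plain-equality filters, and it does not generate _id.

function applyUpdate(doc, update) {
  const result = { ...doc }; // shallow copy; arrays are replaced, never mutated
  for (const [operator, fields] of Object.entries(update)) {
    for (const [field, value] of Object.entries(fields)) {
      if (operator === "$set") {
        result[field] = value;
      } else if (operator === "$push") {
        const current = result[field] ?? [];
        if (!Array.isArray(current)) {
          throw new Error(`$push target "${field}" is not an array`);
        }
        result[field] = [...current, value];
      } else {
        throw new Error(`operator ${operator} is not supported in this sketch`);
      }
    }
  }
  return result;
}

function updateOne(collection, filter, update, { upsert = false } = {}) {
  const position = collection.findIndex((doc) =>
    Object.entries(filter).every(([field, value]) => doc[field] === value));
  if (position !== -1) {
    collection[position] = applyUpdate(collection[position], update);
    return { matchedCount: 1, upsertedCount: 0 };
  }
  if (!upsert) return { matchedCount: 0, upsertedCount: 0 };
  // No match: seed the new document from the filter's equality fields,
  // then apply the update operators to it.
  collection.push(applyUpdate({ ...filter }, update));
  return { matchedCount: 0, upsertedCount: 1 };
}

const users = [
  { email: "bob@example.com", name: "Bob Smith", skills: ["javascript", "react"] },
];

updateOne(users, { email: "bob@example.com" }, { $push: { skills: "typescript" } });
updateOne(
  users,
  { email: "dave@example.com" },
  { $set: { name: "Dave Wilson", role: "intern" } },
  { upsert: true },
);

// Dave's new document holds email (copied from the filter) plus the $set fields.
console.log(users[1]);
```

In real MongoDB, two concurrent upserts with the same filter can both insert a document unless the filter field has a unique index, which is one more reason to index email uniquely.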

Delete — Removing Documents

// Delete one document
db.users.deleteOne({ email: "dave@example.com" })

// Delete multiple documents
db.users.deleteMany({ role: "intern" })

// Drop an entire collection
db.users.drop()

Indexing for Performance

Without indexes, MongoDB performs a collection scan (COLLSCAN), reading every document to find matches. On a collection with millions of documents, that can take seconds where an indexed lookup takes milliseconds.

// Create a single-field index
db.users.createIndex({ name: 1 })            // Ascending

// Create a compound index (multiple fields)
db.users.createIndex({ role: 1, created: -1 })

// Create a unique index (enforces uniqueness)
db.users.createIndex({ email: 1 }, { unique: true })

// Create a text index for full-text search
db.articles.createIndex({ title: "text", body: "text" })
db.articles.find({ $text: { $search: "mongodb tutorial" } })

// List all indexes on a collection
db.users.getIndexes()

// Explain a query to see if it uses an index
db.users.find({ email: "alice@example.com" }).explain("executionStats")

Indexing rules of thumb:

  • Index fields used in find() filters, sort(), and $match in aggregations
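  • Order compound index fields by the ESR guideline: fields matched by Equality first, then fields used to Sort, then fields filtered by Range (e.g., $gt, $lt)
  • Avoid standalone indexes on low-cardinality fields (e.g., boolean fields with only true/false); they rarely narrow a search much
  • Each index consumes memory and disk space and adds work to every write; check index sizes with db.users.totalIndexSize() or db.collection.stats()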
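To make the cost of the collection scan described above concrete, the sketch below compares a linear scan with a lookup through a sorted key list and counts how many entries each one examines, analogous to totalDocsExamined and totalKeysExamined in explain("executionStats") output. It is a hypothetical in-memory model: WiredTiger stores indexes as B-trees rather than sorted arrays, but the logarithmic-versus-linear difference is the same idea.

```javascript
// Hypothetical in-memory model of a collection scan vs. an index lookup.
// Real MongoDB indexes are B-trees; a sorted array with binary search
// shows the same O(log n) vs. O(n) difference.

function collectionScan(docs, field, value) {
  let docsExamined = 0;
  const matches = [];
  for (const doc of docs) {
    docsExamined++; // every document is read, matching or not
    if (doc[field] === value) matches.push(doc);
  }
  return { matches, docsExamined };
}

// Sorted [key, position] pairs, like an ascending single-field index.
function buildIndex(docs, field) {
  return docs
    .map((doc, position) => [doc[field], position])
    .sort((a, b) => (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0));
}

function indexLookup(docs, index, value) {
  let keysExamined = 0;
  // Binary search for the first key >= value.
  let lo = 0;
  let hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    keysExamined++;
    if (index[mid][0] < value) lo = mid + 1;
    else hi = mid;
  }
  // Walk forward over all entries equal to value.
  const matches = [];
  for (let i = lo; i < index.length && index[i][0] === value; i++) {
    keysExamined++;
    matches.push(docs[index[i][1]]);
  }
  return { matches, keysExamined };
}

const docs = Array.from({ length: 100000 }, (_, i) => ({ _id: i, email: `user${i}@example.com` }));
const emailIndex = buildIndex(docs, "email");

const scan = collectionScan(docs, "email", "user4242@example.com");
const lookup = indexLookup(docs, emailIndex, "user4242@example.com");
console.log(scan.docsExamined);   // 100000
console.log(lookup.keysExamined); // at most 18 (17 binary-search steps + 1 match)
```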

Comparing MongoDB with Other Databases

| Feature | MongoDB | PostgreSQL | MySQL | Redis |
|---|---|---|---|---|
| Data model | Documents (JSON-like BSON) | Relational (tables) | Relational (tables) | Key-value / data structures |
| Schema | Flexible (optional validation rules) | Strict (SQL DDL) | Strict (SQL DDL) | Schema-less |
| Query language | MQL + aggregation pipeline | SQL | SQL | Commands |
| Joins | $lookup (limited) | Full SQL joins | Full SQL joins | None |
| Transactions | Multi-document ACID | Full ACID | Full ACID (InnoDB) | Atomic single commands (MULTI/EXEC for batches) |
| Scaling | Horizontal (sharding) | Vertical (+ read replicas) | Vertical (+ read replicas) | In-memory; horizontal via Redis Cluster |
| Best for | Flexible schemas, rapid development | Complex queries, data integrity | Web applications, WordPress | Caching, sessions, real-time |

Aggregation Pipeline

The aggregation pipeline is MongoDB’s answer to complex SQL queries with GROUP BY, JOIN, and subqueries:

// Count users per role
db.users.aggregate([
  { $group: { _id: "$role", count: { $sum: 1 } } },
  { $sort: { count: -1 } }
])

// Find the most common skills across all users
db.users.aggregate([
  { $unwind: "$skills" },
  { $group: { _id: "$skills", count: { $sum: 1 } } },
  { $sort: { count: -1 } },
  { $limit: 10 }
])

// Join each order with its user; with $unwind this behaves like an SQL INNER JOIN (orders with no matching user are dropped)
db.orders.aggregate([
  { $lookup: {
      from: "users",
      localField: "userId",
      foreignField: "_id",
      as: "user"
  }},
  { $unwind: "$user" },
  { $project: { orderTotal: 1, "user.name": 1, "user.email": 1 } }
])
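Each stage consumes the documents produced by the previous one. The plain-JavaScript sketch below reproduces the "most common skills" pipeline on the three sample users, with one hypothetical helper per stage (unwind, groupCount, sortByCountDesc, limit), to show what $unwind, $group with $sum: 1, $sort, and $limit each do to the stream. It is illustrative only; on a real deployment the server executes the pipeline.

```javascript
// Hypothetical stage-by-stage model of the "most common skills" pipeline:
//   { $unwind: "$skills" } -> { $group: { _id: "$skills", count: { $sum: 1 } } }
//   -> { $sort: { count: -1 } } -> { $limit: 10 }

// $unwind: one output document per array element. Missing, null, or empty
// arrays produce no output; a non-array value is treated as a one-element array.
function unwind(docs, field) {
  return docs.flatMap((doc) => {
    const value = doc[field];
    const elements = value == null ? [] : Array.isArray(value) ? value : [value];
    return elements.map((element) => ({ ...doc, [field]: element }));
  });
}

// $group with count: { $sum: 1 }: count documents per distinct value of field.
function groupCount(docs, field) {
  const counts = new Map();
  for (const doc of docs) {
    counts.set(doc[field], (counts.get(doc[field]) ?? 0) + 1);
  }
  return [...counts].map(([key, count]) => ({ _id: key, count }));
}

// $sort: { count: -1 }. JavaScript's sort is stable, but MongoDB does not
// guarantee any order among ties, so do not rely on tie order.
const sortByCountDesc = (docs) => [...docs].sort((a, b) => b.count - a.count);

// $limit: n
const limit = (docs, n) => docs.slice(0, n);

const users = [
  { name: "Alice Johnson", skills: ["python", "docker", "kubernetes"] },
  { name: "Bob Smith", skills: ["javascript", "react"] },
  { name: "Carol Lee", skills: ["terraform", "aws", "docker"] },
];

const topSkills = limit(sortByCountDesc(groupCount(unwind(users, "skills"), "skills")), 10);
console.log(topSkills[0]); // { _id: 'docker', count: 2 }
```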

Backup and Restore

# Backup a specific database
mongodump --db myapp --out /backup/$(date +%Y-%m-%d)

# Backup with authentication
mongodump --uri="mongodb://admin:password@localhost:27017" --db myapp --out /backup/

# Backup all databases
mongodump --out /backup/full-$(date +%Y-%m-%d)

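# Restore a database from a dump directory (--nsInclude selects which namespaces to restore)
mongorestore --nsInclude='myapp.*' /backup/2025-12-13/

# Restore, dropping each collection in the backup from the target before restoring it
mongorestore --drop --nsInclude='myapp.*' /backup/2025-12-13/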

Gotchas and Edge Cases

Document size limit: A single MongoDB document cannot exceed 16 MB. If you need to store large files, use GridFS — MongoDB’s specification for storing files larger than 16 MB across multiple documents.
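GridFS stores each file as one metadata document in the fs.files collection plus a series of chunk documents in fs.chunks, with a default chunk size of 255 KiB (261,120 bytes). The helper below (gridfsChunkCount is a hypothetical name, not part of any driver) works out how many chunk documents a file needs, which helps when estimating document counts for a bucket.

```javascript
// Hypothetical helper: number of GridFS chunk documents for a file.
// Assumes the default GridFS chunk size of 255 KiB (261,120 bytes).
const DEFAULT_CHUNK_SIZE_BYTES = 255 * 1024;

function gridfsChunkCount(fileSizeBytes, chunkSizeBytes = DEFAULT_CHUNK_SIZE_BYTES) {
  if (!Number.isInteger(fileSizeBytes) || fileSizeBytes < 0) {
    throw new RangeError("file size must be a non-negative integer");
  }
  // Every chunk is full except possibly the last one, so round up.
  return Math.ceil(fileSizeBytes / chunkSizeBytes);
}

const fortyMiB = 40 * 1024 * 1024;
console.log(gridfsChunkCount(fortyMiB)); // 161 (160 full chunks + 1 partial chunk)
```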

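Limited joins: $lookup provides join functionality, but it looks up matching documents in the other collection for each input document and is usually slower than a join in a relational database, especially without an index on the foreignField. Where related data is read together, embed it within a single document (denormalization) instead of splitting it across collections.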

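Write concern and data loss: A write concern of w: 1 acknowledges a write as soon as the primary has applied it. If the primary crashes before the write replicates, the write can be rolled back when the old primary rejoins the set. Since MongoDB 5.0, the implicit default for most replica sets is writeConcern: { w: "majority" }, which waits until a majority of voting members have the write; older versions and some deployments with arbiters still default to w: 1. For critical data, set w: "majority" explicitly rather than relying on the default.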
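A "majority" write must be acknowledged by a majority of the replica set's voting members. The helper below (majorityAcks is hypothetical, and it assumes every member votes and holds data, i.e. no arbiters) computes that number, which shows why a 3-member set can still satisfy majority writes with one member down and a 5-member set with two down.

```javascript
// Hypothetical helper: acknowledgments needed for writeConcern { w: "majority" }.
// Assumes all members are voting, data-bearing members (no arbiters).
function majorityAcks(votingMembers) {
  if (!Number.isInteger(votingMembers) || votingMembers < 1) {
    throw new RangeError("a replica set needs at least one voting member");
  }
  return Math.floor(votingMembers / 2) + 1;
}

// e.g. 3 members: majority = 2; 5 members: majority = 3
for (const members of [1, 3, 4, 5, 7]) {
  const needed = majorityAcks(members);
  console.log(`${members} members: majority = ${needed}, tolerates ${members - needed} unavailable`);
}
```

In mongosh, the write concern can be set per operation, for example db.orders.insertOne({ total: 42 }, { writeConcern: { w: "majority", wtimeout: 5000 } }).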

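ObjectId is not sequential: MongoDB’s default _id field uses ObjectId, whose first four bytes are a creation timestamp in seconds. ObjectIds generated within the same second, or by different clients, are not guaranteed to sort in creation order. Do not use _id ordering as a substitute for a created timestamp field.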
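An ObjectId is 12 bytes: a 4-byte Unix timestamp in seconds, a 5-byte random value, and a 3-byte counter. Only the first four bytes carry time, at one-second resolution, so the creation time can be recovered but finer ordering cannot. The function below (objectIdTimestamp is hypothetical; in mongosh you would call ObjectId(...).getTimestamp() instead) decodes that timestamp from a 24-character hex string.

```javascript
// Hypothetical helper: decode the creation time embedded in an ObjectId hex string.
// Layout: 4-byte big-endian seconds since the Unix epoch | 5-byte random | 3-byte counter.
function objectIdTimestamp(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new TypeError("expected a 24-character hex ObjectId");
  }
  const seconds = parseInt(hex.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

// Within the same second, the order of two ObjectIds depends on the random and
// counter bytes, so it is not a reliable creation order across clients.
const id = "507f1f77bcf86cd799439011";
console.log(objectIdTimestamp(id).toISOString()); // 2012-10-17T21:13:27.000Z
```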

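Cache sizing: MongoDB’s WiredTiger storage engine keeps recently used data and indexes in an internal cache, by default the larger of 50% of (RAM - 1 GB) or 256 MB, and also benefits from the operating system's filesystem cache. A MongoDB server with 8 GB RAM and a 100 GB database can therefore keep only its hot working set in memory. Monitor cache usage and eviction with db.serverStatus().wiredTiger.cache.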
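WiredTiger's default internal cache is the larger of 50% of (RAM - 1 GB) or 256 MB, with the remaining memory left to the filesystem cache and other processes. The helper below (defaultWiredTigerCacheGB is hypothetical) applies that rule, so you can check that the 8 GB server above gets a 3.5 GB WiredTiger cache by default. The storage.wiredTiger.engineConfig.cacheSizeGB setting in mongod.conf overrides the default.

```javascript
// Hypothetical helper: default WiredTiger internal cache size in GB.
// Rule: the larger of 50% of (RAM - 1 GB) or 256 MB (0.25 GB).
function defaultWiredTigerCacheGB(ramGB) {
  if (!(ramGB > 0)) throw new RangeError("RAM must be a positive number of GB");
  return Math.max(0.5 * (ramGB - 1), 0.25);
}

for (const ramGB of [1, 2, 8, 64]) {
  console.log(`${ramGB} GB RAM -> ${defaultWiredTigerCacheGB(ramGB)} GB WiredTiger cache`);
}
// 1 -> 0.25, 2 -> 0.5, 8 -> 3.5, 64 -> 31.5
```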

Troubleshooting

MongoDB fails to start

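# Check systemd's view of the failure, then mongod's own log
sudo journalctl -u mongod --no-pager -n 50
sudo tail -n 50 /var/log/mongodb/mongod.log

# Common causes: insufficient disk space or wrong ownership of the data directory
df -h /var/lib/mongodb
ls -la /var/lib/mongodb/
# Ubuntu/Debian packages use /var/lib/mongodb and the mongodb user;
# RHEL packages use /var/lib/mongo and the mongod user
sudo chown -R mongodb:mongodb /var/lib/mongodb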

Queries are slow

// Use explain to check if queries use indexes
db.users.find({ email: "alice@example.com" }).explain("executionStats")

// Look for "COLLSCAN" in queryPlanner.winningPlan: it means no index is used
// ("IXSCAN" means an index scan). If so, create an index on the queried field:
db.users.createIndex({ email: 1 })
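Reading explain output by eye gets tedious once plans nest several stages. The function below (usesCollectionScan and collectStages are hypothetical helpers) walks the queryPlanner.winningPlan tree and collects every stage name, so you can check for COLLSCAN programmatically. It is plain JavaScript and should also work when pasted into mongosh. The sample objects are trimmed, hand-written plans, not real server output; plan shapes vary between server versions, which is why the walk is recursive rather than following a fixed path.

```javascript
// Hypothetical helper: collect all stage names from an explain() plan tree.
// Recurses through every nested object/array, so it does not depend on
// whether the plan uses inputStage, inputStages, or another wrapper field.
function collectStages(node, stages = []) {
  if (Array.isArray(node)) {
    for (const child of node) collectStages(child, stages);
  } else if (node !== null && typeof node === "object") {
    if (typeof node.stage === "string") stages.push(node.stage);
    for (const child of Object.values(node)) collectStages(child, stages);
  }
  return stages;
}

function usesCollectionScan(explainResult) {
  return collectStages(explainResult.queryPlanner.winningPlan).includes("COLLSCAN");
}

// Trimmed, hand-written examples shaped like explain() output (not real server output).
const withoutIndex = { queryPlanner: { winningPlan: { stage: "COLLSCAN" } } };
const withIndex = {
  queryPlanner: {
    winningPlan: { stage: "FETCH", inputStage: { stage: "IXSCAN", indexName: "email_1" } },
  },
};

console.log(usesCollectionScan(withoutIndex)); // true
console.log(usesCollectionScan(withIndex));    // false

// In mongosh, against a real server:
//   usesCollectionScan(db.users.find({ email: "alice@example.com" }).explain("executionStats"))
```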

Connection refused from remote clients

# MongoDB binds to localhost (127.0.0.1) by default.
# IMPORTANT: enable authentication before exposing MongoDB to the network!
# Edit /etc/mongod.conf, preferring specific interfaces over 0.0.0.0 (all interfaces):
# net:
#   bindIp: 127.0.0.1,192.168.1.100
# security:
#   authorization: enabled
sudo systemctl restart mongod

Summary

  • MongoDB stores JSON-like documents in collections without requiring a predefined schema — each document can have different fields, making it ideal for evolving data structures
  • CRUD operations use intuitive methods like insertOne, find, updateOne, and deleteOne with JSON query filters instead of SQL
  • Indexes are critical for performance — create indexes on fields used in queries and sort operations, and use explain() to verify index usage
  • The aggregation pipeline handles complex data transformations, grouping, and joins that would require GROUP BY and JOIN in SQL
  • Back up regularly with mongodump and test restores with mongorestore — a backup you have not tested is not a backup
  • Design documents to embed related data rather than normalizing across collections — this reduces the need for expensive $lookup joins