MongoDB is a document-oriented NoSQL database that stores data as flexible JSON-like documents instead of rows and columns. Where relational databases require you to define a schema upfront and restructure tables as requirements change, MongoDB lets each document in a collection have different fields — making it ideal for applications where data structures evolve rapidly. This guide covers MongoDB installation, core CRUD operations, indexing strategies, the aggregation pipeline, and backup and restore for production deployments.
Prerequisites
- A Linux server (Ubuntu 22.04 or RHEL 8+ recommended) with at least 2 GB RAM
- Root or sudo access for installation
- Basic understanding of JSON data format
- Familiarity with command-line tools
Installing MongoDB
Ubuntu/Debian
# Import the MongoDB GPG key
curl -fsSL https://www.mongodb.org/static/pgp/server-7.0.asc | \
sudo gpg -o /usr/share/keyrings/mongodb-server-7.0.gpg --dearmor
# Add the repository
echo "deb [ arch=amd64,arm64 signed-by=/usr/share/keyrings/mongodb-server-7.0.gpg ] \
https://repo.mongodb.org/apt/ubuntu jammy/mongodb-org/7.0 multiverse" | \
sudo tee /etc/apt/sources.list.d/mongodb-org-7.0.list
# Install MongoDB
sudo apt update
sudo apt install -y mongodb-org
# Start and enable the service
sudo systemctl start mongod
sudo systemctl enable mongod
# Verify it is running
sudo systemctl status mongod
mongosh --eval "db.version()"
RHEL/CentOS
# Create the repository file
cat <<'EOF' | sudo tee /etc/yum.repos.d/mongodb-org-7.0.repo
[mongodb-org-7.0]
name=MongoDB Repository
baseurl=https://repo.mongodb.org/yum/redhat/$releasever/mongodb-org/7.0/x86_64/
gpgcheck=1
enabled=1
gpgkey=https://www.mongodb.org/static/pgp/server-7.0.asc
EOF
sudo dnf install -y mongodb-org
sudo systemctl start mongod
sudo systemctl enable mongod
Docker
docker run -d \
--name mongodb \
-p 27017:27017 \
-v mongodb-data:/data/db \
-e MONGO_INITDB_ROOT_USERNAME=admin \
-e MONGO_INITDB_ROOT_PASSWORD=secretpassword \
mongo:7
Core CRUD Operations
Connect to MongoDB using the shell:
# Connect to local MongoDB
mongosh
# Connect to a remote server with authentication
mongosh "mongodb://admin:password@192.168.1.100:27017/admin"
Create — Inserting Documents
// Switch to (or create) a database
use myapp
// Insert a single document
db.users.insertOne({
name: "Alice Johnson",
email: "alice@example.com",
role: "admin",
skills: ["python", "docker", "kubernetes"],
created: new Date()
})
// Insert multiple documents
db.users.insertMany([
{ name: "Bob Smith", email: "bob@example.com", role: "developer", skills: ["javascript", "react"] },
{ name: "Carol Lee", email: "carol@example.com", role: "devops", skills: ["terraform", "aws", "docker"] }
])
MongoDB creates the database and collection automatically on first insert — no CREATE TABLE or CREATE DATABASE needed.
Read — Querying Documents
// Find all documents in a collection
db.users.find()
// Find with a filter
db.users.find({ role: "admin" })
// Query nested fields and arrays
db.users.find({ skills: "docker" }) // Array contains "docker"
db.users.find({ "skills.0": "python" }) // First skill is "python"
// Comparison operators
db.users.find({ age: { $gt: 25 } }) // Greater than
db.users.find({ role: { $in: ["admin", "devops"] } }) // In array
// Projection — select specific fields
db.users.find({ role: "admin" }, { name: 1, email: 1, _id: 0 })
// Sort and limit
db.users.find().sort({ created: -1 }).limit(10)
// Count documents
db.users.countDocuments({ role: "developer" })
Update — Modifying Documents
// Update one document
db.users.updateOne(
{ email: "alice@example.com" },
{ $set: { role: "superadmin", lastLogin: new Date() } }
)
// Add to an array
db.users.updateOne(
{ email: "bob@example.com" },
{ $push: { skills: "typescript" } }
)
// Update multiple documents
db.users.updateMany(
{ role: "developer" },
{ $set: { department: "engineering" } }
)
// Upsert — update if exists, insert if not
db.users.updateOne(
{ email: "dave@example.com" },
{ $set: { name: "Dave Wilson", role: "intern" } },
{ upsert: true }
)
Delete — Removing Documents
// Delete one document
db.users.deleteOne({ email: "dave@example.com" })
// Delete multiple documents
db.users.deleteMany({ role: "intern" })
// Drop an entire collection
db.users.drop()
Indexing for Performance
Without indexes, MongoDB performs a collection scan — reading every document to find matches. On a collection with millions of documents, this takes seconds instead of milliseconds.
// Create a single-field index
db.users.createIndex({ email: 1 }) // Ascending
// Create a compound index (multiple fields)
db.users.createIndex({ role: 1, created: -1 })
// Create a unique index (enforces uniqueness)
db.users.createIndex({ email: 1 }, { unique: true })
// Create a text index for full-text search
db.articles.createIndex({ title: "text", body: "text" })
db.articles.find({ $text: { $search: "mongodb tutorial" } })
// List all indexes on a collection
db.users.getIndexes()
// Explain a query to see if it uses an index
db.users.find({ email: "alice@example.com" }).explain("executionStats")
Indexing rules of thumb:
- Index fields used in find() filters, sort(), and $match in aggregations
- Compound indexes should order fields from most selective to least selective
- Avoid indexing fields with low cardinality (e.g., boolean fields with only true/false)
- Each index consumes RAM — monitor with db.stats() and db.collection.stats()
Comparing MongoDB with Other Databases
| Feature | MongoDB | PostgreSQL | MySQL | Redis |
|---|---|---|---|---|
| Data model | Documents (JSON) | Relational (tables) | Relational (tables) | Key-value / data structures |
| Schema | Flexible (schema-less) | Strict (SQL DDL) | Strict (SQL DDL) | Schema-less |
| Query language | MQL + Aggregation | SQL | SQL | Commands |
| Joins | $lookup (limited) | Full SQL joins | Full SQL joins | None |
| Transactions | Multi-document ACID | Full ACID | Full ACID | Single-op atomic |
| Scaling | Horizontal (sharding) | Vertical (+ read replicas) | Vertical (+ read replicas) | In-memory, horizontal |
| Best for | Flexible schemas, rapid development | Complex queries, data integrity | Web applications, WordPress | Caching, sessions, real-time |
Aggregation Pipeline
The aggregation pipeline is MongoDB’s answer to complex SQL queries with GROUP BY, JOIN, and subqueries:
// Count users per role
db.users.aggregate([
{ $group: { _id: "$role", count: { $sum: 1 } } },
{ $sort: { count: -1 } }
])
// Find the most common skills across all users
db.users.aggregate([
{ $unwind: "$skills" },
{ $group: { _id: "$skills", count: { $sum: 1 } } },
{ $sort: { count: -1 } },
{ $limit: 10 }
])
// Join users with their orders (like SQL JOIN)
db.orders.aggregate([
{ $lookup: {
from: "users",
localField: "userId",
foreignField: "_id",
as: "user"
}},
{ $unwind: "$user" },
{ $project: { orderTotal: 1, "user.name": 1, "user.email": 1 } }
])
Backup and Restore
# Backup a specific database
mongodump --db myapp --out /backup/$(date +%Y-%m-%d)
# Backup with authentication
mongodump --uri="mongodb://admin:password@localhost:27017" --db myapp --out /backup/
# Backup all databases
mongodump --out /backup/full-$(date +%Y-%m-%d)
# Restore a database
mongorestore --db myapp /backup/2025-12-13/myapp
# Restore dropping existing data first
mongorestore --drop --db myapp /backup/2025-12-13/myapp
Gotchas and Edge Cases
Document size limit: A single MongoDB document cannot exceed 16 MB. If you need to store large files, use GridFS — MongoDB’s specification for storing files larger than 16 MB across multiple documents.
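To make the chunking idea concrete, here is a minimal plain-Node sketch of how GridFS splits a file into fixed-size pieces (255 KiB is the driver default chunk size). This is illustration only — real applications use the driver's GridFSBucket, which persists these chunks to the fs.chunks collection:

```javascript
// Sketch: how GridFS splits a file into fixed-size chunks.
// 255 KiB is the default chunk size used by official drivers.
const CHUNK_SIZE = 255 * 1024;

function splitIntoChunks(buffer, chunkSize = CHUNK_SIZE) {
  const chunks = [];
  for (let offset = 0; offset < buffer.length; offset += chunkSize) {
    chunks.push({
      n: chunks.length, // chunk sequence number, like GridFS's "n" field
      data: buffer.subarray(offset, offset + chunkSize),
    });
  }
  return chunks;
}

// A 1 MiB "file" becomes 5 chunks: four full 255 KiB chunks plus a remainder.
const chunks = splitIntoChunks(Buffer.alloc(1024 * 1024));
console.log(chunks.length); // 5
```

Because each chunk is itself a small document, no single document ever approaches the 16 MB cap, regardless of total file size.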
No joins by default: While $lookup provides basic join functionality, it is slower than relational joins. Design your document schema to embed related data within a single document when possible (denormalization).
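As a sketch of the trade-off (illustrative document shapes, not taken from a live database):

```javascript
// Embedded design: the user's addresses live inside the user document,
// so a single find() returns everything — no $lookup needed.
const userEmbedded = {
  name: "Alice Johnson",
  email: "alice@example.com",
  addresses: [
    { label: "home", city: "Oslo" },
    { label: "work", city: "Bergen" },
  ],
};

// Normalized design: addresses in their own collection, joined at read
// time with $lookup — an extra lookup per query that MongoDB is not
// optimized for the way a relational database is.
const userNormalized = { _id: 1, name: "Alice Johnson" };
const addresses = [
  { userId: 1, label: "home", city: "Oslo" },
  { userId: 1, label: "work", city: "Bergen" },
];

// With embedding, the related data is already in hand:
console.log(userEmbedded.addresses.length); // 2
```

Embedding fits data that is read together and bounded in size; prefer references when the related data grows without limit (it would otherwise push the document toward the 16 MB cap) or is shared across many parents.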
Write concern and data loss: With write concern w: 1, MongoDB acknowledges a write as soon as it reaches the primary; if the primary crashes before the write replicates, the write can be rolled back and lost. Since MongoDB 5.0 the implicit default for replica sets is w: "majority", but for critical data specify writeConcern: { w: "majority" } explicitly so the write is only acknowledged once a majority of nodes have it.
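In mongosh, the write concern is passed as an option on the write itself (the wtimeout value here is an illustrative choice, not a required setting):

```javascript
// Wait for a majority of replica set members to acknowledge the write,
// failing with an error if that takes longer than 5 seconds
db.users.insertOne(
  { name: "Eve Adams", email: "eve@example.com" },
  { writeConcern: { w: "majority", wtimeout: 5000 } }
)
```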
ObjectId is not sequential: MongoDB’s default _id field uses ObjectId, which encodes a timestamp but is not strictly sequential. Do not use _id ordering as a substitute for a created timestamp field.
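The timestamp inside an ObjectId is still recoverable when you need it. An ObjectId is 12 bytes, and the first 4 bytes are a big-endian Unix timestamp in seconds — this plain-Node sketch decodes it from the 24-character hex string, the same information mongosh exposes via ObjectId().getTimestamp():

```javascript
// Decode the creation time embedded in an ObjectId's first 4 bytes
function objectIdTimestamp(hexId) {
  const seconds = parseInt(hexId.slice(0, 8), 16); // big-endian uint32
  return new Date(seconds * 1000);
}

// Example ObjectId: first 8 hex chars 0x507f1f77 = 1350508407 seconds
const ts = objectIdTimestamp("507f1f77bcf86cd799439011");
console.log(ts.toISOString()); // 2012-10-17T21:13:27.000Z
```

This resolution is one second, and the remaining bytes are a random value plus a counter — another reason not to treat _id order as insertion order.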
WiredTiger cache sizing: MongoDB's WiredTiger storage engine keeps an in-RAM cache, by default the larger of 50% of (RAM minus 1 GB) or 256 MB. A MongoDB server with 8 GB RAM and a 100 GB database therefore only caches the hot working set. Monitor cache usage with db.serverStatus().wiredTiger.cache.
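As a quick check of how full the cache is, you can compare two of the WiredTiger counters in mongosh (field names as reported by db.serverStatus(); treat this as a starting point, not a full monitoring setup):

```javascript
// Rough cache fill percentage from serverStatus counters
const cache = db.serverStatus().wiredTiger.cache;
const used = cache["bytes currently in the cache"];
const max = cache["maximum bytes configured"];
print(`cache fill: ${(100 * used / max).toFixed(1)}%`);
```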
Troubleshooting
MongoDB fails to start
# Check the log for errors
sudo journalctl -u mongod --no-pager -n 50
# Common cause: insufficient disk space or wrong permissions
ls -la /var/lib/mongodb/
sudo chown -R mongodb:mongodb /var/lib/mongodb
Queries are slow
// Use explain to check if queries use indexes
db.users.find({ email: "alice@example.com" }).explain("executionStats")
// Look for "COLLSCAN" in the output — it means no index is used
// Create an index on the queried field
db.users.createIndex({ email: 1 })
Connection refused from remote clients
# MongoDB binds to localhost by default
# Edit /etc/mongod.conf to allow remote connections
# net:
# bindIp: 0.0.0.0 # Listen on all interfaces
# IMPORTANT: Enable authentication first!
sudo systemctl restart mongod
Summary
- MongoDB stores JSON-like documents in collections without requiring a predefined schema — each document can have different fields, making it ideal for evolving data structures
- CRUD operations use intuitive methods like insertOne, find, updateOne, and deleteOne with JSON query filters instead of SQL
- Indexes are critical for performance — create indexes on fields used in queries and sort operations, and use explain() to verify index usage
- The aggregation pipeline handles complex data transformations, grouping, and joins that would require GROUP BY and JOIN in SQL
- Back up regularly with mongodump and test restores with mongorestore — a backup you have not tested is not a backup
- Design documents to embed related data rather than normalizing across collections — this reduces the need for expensive $lookup joins