Chapter 5: WriteBatch - Grouping Changes Together
Welcome back! In Chapter 4: DBImpl, we saw how DBImpl
acts as the general manager, coordinating writes, reads, and background tasks. We learned that when you call Put
or Delete
, DBImpl
handles writing to the Write-Ahead Log (WAL) and then updating the MemTable.
But what if you need to make multiple changes that should happen together?
What’s the Problem? Making Multiple Changes Atomically
Imagine you’re managing game scores. When Player A beats Player B, you need to do two things: increase Player A’s score and decrease Player B’s score.
// Goal: Increase playerA score, decrease playerB score
db->Put(options, "score_playerA", "101");
db->Put(options, "score_playerB", "49");
What happens if the system crashes right after the first Put
but before the second Put
? Player A gets a point, but Player B doesn’t lose one. The scores are now inconsistent! This isn’t good.
We need a way to tell LevelDB: “Please perform these multiple operations (like updating both scores) as a single, indivisible unit. Either all of them should succeed, or none of them should.” This property is called atomicity.
WriteBatch: The Atomic To-Do List
LevelDB provides the WriteBatch
class to solve this exact problem.
Think of a WriteBatch
like making a shopping list before you go to the store, or giving a librarian a list of multiple transactions to perform all at once (check out book A, return book B).
- Collect Changes: You create an empty
WriteBatch
object. Then, instead of callingdb->Put
ordb->Delete
directly, you callbatch.Put
andbatch.Delete
to add your desired changes to the batch object. This just adds items to your “to-do list” in memory; it doesn’t modify the database yet. - Apply Atomically: Once your list is complete, you hand the entire
WriteBatch
to the database using a singledb->Write(options, &batch)
call. - All or Nothing: LevelDB guarantees that all the operations (
Put
s andDelete
s) listed in theWriteBatch
will be applied atomically. They will either all succeed and become durable together, or if something goes wrong (like a crash during the process), none of them will appear to have happened after recovery.
Using WriteBatch
for our score update:
#include "leveldb/write_batch.h"
#include "leveldb/db.h"
// ... assume db is an open LevelDB database ...
leveldb::WriteOptions write_options;
write_options.sync = true; // Ensure durability
// 1. Create an empty WriteBatch
leveldb::WriteBatch batch;
// 2. Add changes to the batch (in memory)
batch.Put("score_playerA", "101"); // Add 'Put playerA' to the list
batch.Delete("old_temp_key"); // Add 'Delete old_temp_key' to the list
batch.Put("score_playerB", "49"); // Add 'Put playerB' to the list
// 3. Apply the entire batch atomically
leveldb::Status status = db->Write(write_options, &batch);
if (status.ok()) {
// Success! Both score_playerA and score_playerB are updated,
// and old_temp_key is deleted.
} else {
// Failure! The database state is unchanged. Neither score was updated,
// and old_temp_key was not deleted.
}
Explanation:
- We create a
WriteBatch
calledbatch
. - We call
batch.Put
andbatch.Delete
. These methods modify thebatch
object itself, not the database. They are very fast as they just record the desired operations internally. - We call
db->Write
with the completedbatch
. LevelDB now takes this list and applies it atomically. Thanks to the WAL, even if the system crashes during thedb->Write
call, recovery will ensure either all changes from the batch are applied or none are.
Performance Benefit Too!
Besides atomicity, WriteBatch
also often improves performance when making multiple changes:
- Single Log Write: LevelDB can write the entire batch as a single record to the WAL file on disk. This is usually much faster than writing separate log records for each individual
Put
orDelete
, reducing disk I/O. - Single Lock Acquisition: The
DBImpl
only needs to acquire its internal lock once for the entireWrite
call, rather than once per operation.
So, even if you don’t strictly need atomicity, using WriteBatch
for bulk updates can be faster.
Under the Hood: How WriteBatch Works
What happens inside LevelDB when you call db->Write(options, &batch)
?
- Serialization: The
WriteBatch
object holds a simple, serialized representation of all thePut
andDelete
operations you added. It’s basically a byte string (rep_
internally) containing the sequence of operations and their arguments. - DBImpl Coordination: The
DBImpl::Write
method receives theWriteBatch
. - WAL Write:
DBImpl
takes the entire serialized content of theWriteBatch
(fromWriteBatchInternal::Contents
) and writes it as one single record to the Write-Ahead Log (WAL) usinglog_->AddRecord()
. - MemTable Update: If the WAL write is successful (and synced to disk if
options.sync
is true),DBImpl
then iterates through the operations within theWriteBatch
. For each operation, it applies the change to the in-memory MemTable (WriteBatchInternal::InsertInto(batch, mem_)
).
This two-step process (WAL first, then MemTable) ensures both durability and atomicity. If a crash occurs after the WAL write but before the MemTable update finishes, the recovery process will read the entire batch from the WAL and re-apply it to the MemTable, ensuring all changes are present.
sequenceDiagram
participant App as Application
participant DBImpl as DBImpl::Write
participant WriteBatch as WriteBatch Object
participant WAL as WAL File (Disk)
participant MemTable as MemTable (RAM)
App->>WriteBatch: batch.Put("k1", "v1")
App->>WriteBatch: batch.Delete("k2")
App->>WriteBatch: batch.Put("k3", "v3")
App->>DBImpl: db->Write(options, &batch)
DBImpl->>WriteBatch: Get serialized contents (rep_)
WriteBatch-->>DBImpl: Return byte string representing all ops
DBImpl->>WAL: AddRecord(entire batch content)
Note right of WAL: Single disk write (if sync)
WAL-->>DBImpl: WAL Write OK
DBImpl->>WriteBatch: Iterate through operations
loop Apply each operation from Batch
WriteBatch-->>DBImpl: Next Op: Put("k1", "v1")
DBImpl->>MemTable: Add("k1", "v1")
WriteBatch-->>DBImpl: Next Op: Delete("k2")
DBImpl->>MemTable: Add("k2", deletion_marker)
WriteBatch-->>DBImpl: Next Op: Put("k3", "v3")
DBImpl->>MemTable: Add("k3", "v3")
end
MemTable-->>DBImpl: MemTable Updates Done
DBImpl-->>App: Write Successful
WriteBatch Internals (Code View)
Let’s peek at the code.
Adding to the Batch:
When you call batch.Put("key", "val")
or batch.Delete("key")
, the WriteBatch
simply appends a representation of that operation to its internal string buffer (rep_
).
// --- File: leveldb/write_batch.cc ---
// Simplified serialization format:
// rep_ :=
// sequence: fixed64 (8 bytes, initially 0)
// count: fixed32 (4 bytes, number of records)
// data: record[count]
// record :=
// kTypeValue varstring varstring |
// kTypeDeletion varstring
// varstring :=
// len: varint32
// data: uint8[len]
void WriteBatch::Put(const Slice& key, const Slice& value) {
// Increment the record count stored in the header
WriteBatchInternal::SetCount(this, WriteBatchInternal::Count(this) + 1);
// Append the type marker (kTypeValue)
rep_.push_back(static_cast<char>(kTypeValue));
// Append the key (length-prefixed)
PutLengthPrefixedSlice(&rep_, key);
// Append the value (length-prefixed)
PutLengthPrefixedSlice(&rep_, value);
}
void WriteBatch::Delete(const Slice& key) {
// Increment the record count stored in the header
WriteBatchInternal::SetCount(this, WriteBatchInternal::Count(this) + 1);
// Append the type marker (kTypeDeletion)
rep_.push_back(static_cast<char>(kTypeDeletion));
// Append the key (length-prefixed)
PutLengthPrefixedSlice(&rep_, key);
}
// Helper to get/set the 4-byte count from the header (bytes 8-11)
int WriteBatchInternal::Count(const WriteBatch* b) {
return DecodeFixed32(b->rep_.data() + 8); // Read count from header
}
void WriteBatchInternal::SetCount(WriteBatch* b, int n) {
EncodeFixed32(&b->rep_[8], n); // Write count to header
}
// Helper to get the full serialized content
Slice WriteBatchInternal::Contents(const WriteBatch* batch) {
return Slice(batch->rep_);
}
Explanation:
- Each
Put
orDelete
increments a counter stored in the first 12 bytes (kHeader
) of the internal stringrep_
. - It then appends a 1-byte type marker (
kTypeValue
orkTypeDeletion
). - Finally, it appends the key (and value for
Put
) usingPutLengthPrefixedSlice
, which writes the length of the slice followed by its data. This makes it easy to parse the operations back later.
Applying the Batch to MemTable:
When DBImpl::Write
calls WriteBatchInternal::InsertInto(batch, mem_)
, this helper function iterates through the serialized rep_
string and applies each operation to the MemTable.
// --- File: leveldb/write_batch.cc ---
// Helper class used by InsertInto
namespace {
class MemTableInserter : public WriteBatch::Handler {
public:
SequenceNumber sequence_; // Starting sequence number for the batch
MemTable* mem_; // MemTable to insert into
void Put(const Slice& key, const Slice& value) override {
// Add the Put operation to the MemTable
mem_->Add(sequence_, kTypeValue, key, value);
sequence_++; // Increment sequence number for the next operation
}
void Delete(const Slice& key) override {
// Add the Delete operation (as a deletion marker) to the MemTable
mem_->Add(sequence_, kTypeDeletion, key, Slice()); // Value is ignored
sequence_++; // Increment sequence number for the next operation
}
};
} // namespace
// Applies the batch operations to the MemTable
Status WriteBatchInternal::InsertInto(const WriteBatch* b, MemTable* memtable) {
MemTableInserter inserter;
// Get the starting sequence number assigned by DBImpl::Write
inserter.sequence_ = WriteBatchInternal::Sequence(b);
inserter.mem_ = memtable;
// Iterate() parses rep_ and calls handler.Put/handler.Delete
return b->Iterate(&inserter);
}
// Helper to get/set the 8-byte sequence number from header (bytes 0-7)
SequenceNumber WriteBatchInternal::Sequence(const WriteBatch* b) {
return SequenceNumber(DecodeFixed64(b->rep_.data()));
}
void WriteBatchInternal::SetSequence(WriteBatch* b, SequenceNumber seq) {
EncodeFixed64(&b->rep_[0], seq);
}
Explanation:
InsertInto
creates a helper objectMemTableInserter
.- It gets the starting
SequenceNumber
for this batch (which was assigned byDBImpl::Write
and stored in the batch’s header). - It calls
b->Iterate(&inserter)
. TheIterate
method (code not shown, but it reverses the serialization process) parses therep_
string. For each operation it finds, it calls the appropriate method on theinserter
object (Put
orDelete
). - The
inserter.Put
andinserter.Delete
methods simply callmem_->Add
, passing along the correct sequence number (which increments for each operation within the batch) and the type (kTypeValue
orkTypeDeletion
).
Conclusion
The WriteBatch
is a simple yet powerful tool in LevelDB. It allows you to:
- Group Multiple Changes: Collect several
Put
andDelete
operations together. - Ensure Atomicity: Apply these changes as a single, all-or-nothing unit using
db->Write
. This prevents inconsistent states if errors or crashes occur mid-operation. - Improve Performance: Often makes bulk updates faster by reducing the number of WAL writes and lock acquisitions.
It works by serializing the list of operations into a byte string, which LevelDB writes to the WAL as a single record and then replays into the MemTable.
Now that we understand how individual changes and batches of changes are safely written and stored temporarily in the MemTable and WAL, how does LevelDB manage the overall state of the database, including all the SSTable files on disk? How does it know which files contain the data for a particular key?
Next up: Chapter 6: Version & VersionSet
Generated by AI Codebase Knowledge Builder