Chapter 7: Iterator - Your Guide Through the Database
Welcome back! In Chapter 6: Version & VersionSet, we learned how LevelDB keeps track of all the live SSTable files using Version
objects and the VersionSet
. This catalog helps LevelDB efficiently find a single key by looking first in the MemTable and then pinpointing the right SSTables to check.
But what if you don’t want just one key? What if you want to see all the key-value pairs in the database, or all the keys within a specific range?
What’s the Problem? Scanning Multiple Keys
Imagine you have a database storing user scores, with keys like score:userA
, score:userB
, score:userC
, etc. How would you find all the users whose usernames start with ‘user’? Or how would you list all scores from highest to lowest?
Calling db->Get()
repeatedly for every possible key isn’t practical or efficient. We need a way to easily scan or traverse through the key-value pairs stored in the database, in sorted order.
Furthermore, this scan needs to be smart. It has to combine the data from the current MemTable (the fast notepad), potentially an older immutable MemTable, and all the different SSTable files on disk. It also needs to correctly handle situations where a key was updated or deleted – showing you only the latest live version of the data, just like Get
does.
Iterator: Your Database Research Assistant
LevelDB provides the Iterator
concept to solve this. Think of an Iterator
as a super-smart research assistant.
You tell the assistant what you’re looking for (e.g., “start from the beginning” or “find keys starting with ‘user’”). The assistant then efficiently looks through the current notepad (MemTable
), the previous notepad (imm_
), and all the relevant books on the shelves (SSTables
), using the latest catalog (Version
).
As the assistant finds relevant entries, it presents them to you one by one, in perfect sorted order by key. Crucially, the assistant knows how to:
- Merge Sources: Combine results from memory (MemTable) and disk (SSTables) seamlessly.
- Handle Versions: If the same key exists in multiple places (e.g., an old value in an SSTable and a newer value in the MemTable), the assistant only shows you the most recent one based on the database’s internal sequence numbers.
- Handle Deletions: If a key has been deleted, the assistant knows to skip it entirely, even if older versions of the key exist in SSTables.
- Provide a Snapshot: An iterator typically operates on a consistent snapshot of the database. Data added after the iterator was created won’t suddenly appear during your scan.
The main iterator you interact with, obtained via db->NewIterator()
, is often implemented internally by a class called DBIter
. DBIter
coordinates the work of lower-level iterators.
How to Use an Iterator
Using an iterator is quite straightforward. Here’s a typical pattern:
#include "leveldb/db.h"
#include "leveldb/iterator.h"
#include <iostream>
// ... assume db is an open LevelDB database ...
// 1. Create an iterator
leveldb::ReadOptions options;
// options.snapshot = db->GetSnapshot(); // Optional: Use a specific snapshot
leveldb::Iterator* it = db->NewIterator(options);
// 2. Position the iterator (e.g., seek to the first key >= "start_key")
std::string start_key = "user:";
it->Seek(start_key);
// 3. Loop through the keys
std::cout << "Keys starting with '" << start_key << "':" << std::endl;
for (; it->Valid(); it->Next()) {
leveldb::Slice key = it->key();
leveldb::Slice value = it->value();
// Optional: Stop if we go past the desired range
if (!key.starts_with(start_key)) {
break;
}
std::cout << key.ToString() << " => " << value.ToString() << std::endl;
}
// 4. Check for errors (optional but recommended)
if (!it->status().ok()) {
std::cerr << "Iterator error: " << it->status().ToString() << std::endl;
}
// 5. Clean up the iterator and snapshot (if used)
delete it;
// if (options.snapshot != nullptr) {
// db->ReleaseSnapshot(options.snapshot);
// }
Explanation:
db->NewIterator(options)
: You ask the database for a new iterator. You can passReadOptions
, optionally including a specific snapshot you obtained earlier usingdb->GetSnapshot()
. If you don’t provide a snapshot, the iterator uses an implicit snapshot of the database state at the time of creation.- Positioning:
it->Seek(slice)
: Moves the iterator to the first key-value pair whose key is greater than or equal to theslice
.it->SeekToFirst()
: Moves to the very first key-value pair in the database.it->SeekToLast()
: Moves to the very last key-value pair.
- Looping:
it->Valid()
: Returnstrue
if the iterator is currently pointing to a valid key-value pair,false
otherwise (e.g., if you’ve reached the end).it->Next()
: Moves the iterator to the next key-value pair in sorted order.it->Prev()
: Moves to the previous key-value pair (less common, but supported).it->key()
: Returns aSlice
representing the current key.it->value()
: Returns aSlice
representing the current value. Important: TheSlice
s returned bykey()
andvalue()
are only valid until the next call that modifies the iterator (Next
,Prev
,Seek
, etc.). If you need to keep the data longer, make a copy (e.g.,key.ToString()
).
it->status()
: After the loop, check this to see if any errors occurred during iteration (e.g., disk corruption).delete it;
: Crucially, you must delete the iterator when you’re done with it to free up resources. If you used an explicit snapshot, release it too.
This simple interface lets you scan through potentially vast amounts of data spread across memory and disk files without needing to know the complex details of where each piece resides.
Under the Hood: Merging and Filtering
How does the iterator provide this unified, sorted view? It doesn’t load everything into memory! Instead, it uses a clever strategy involving merging and filtering.
- Gather Internal Iterators: When you call
db->NewIterator()
, theDBImpl
asks for iterators from all the relevant sources, based on the current Version:- An iterator for the active
MemTable
. - An iterator for the immutable
imm_
(if it exists). - Iterators for all the files in Level-0.
- A special “concatenating” iterator for Level-1 (which opens SSTable files lazily as needed).
- Similar concatenating iterators for Level-2, Level-3, etc.
- An iterator for the active
-
Create MergingIterator: These individual iterators are then passed to a
MergingIterator
. TheMergingIterator
acts like a zipper, taking multiple sorted streams and producing a single output stream that is also sorted. It keeps track of the current position in each input iterator and always yields the smallest key currently available across all inputs. - Wrap with DBIter: The
MergingIterator
produces internal keys (with sequence numbers and types). This merged stream is then wrapped by theDBIter
.DBIter
is the “research assistant” we talked about. It reads the stream from theMergingIterator
and performs the final filtering:- It compares the sequence number of each internal key with the iterator’s snapshot sequence number. Keys newer than the snapshot are ignored.
- It keeps track of the current user key. If it sees multiple versions of the same user key, it only considers the one with the highest sequence number (that’s still <= the snapshot sequence).
- If the most recent entry for a user key is a deletion marker (
kTypeDeletion
), it skips that key entirely. - Only when it finds a valid, non-deleted key (
kTypeValue
) with the highest sequence number for that user key (within the snapshot) does it make that key/value available viait->key()
andit->value()
.
Sequence Diagram:
sequenceDiagram
participant App as Application
participant DBImpl
participant MemTable as Active MemTable
participant ImmMemTable as Immutable MemTable
participant Version as Current Version
participant MergingIter as MergingIterator
participant DBIter
App->>DBImpl: NewIterator(options)
DBImpl->>MemTable: NewIterator()
MemTable-->>DBImpl: Return mem_iter
DBImpl->>ImmMemTable: NewIterator()
ImmMemTable-->>DBImpl: Return imm_iter
DBImpl->>Version: AddIterators(options) # Gets SSTable iterators
Version-->>DBImpl: Return sstable_iters_list
DBImpl->>MergingIter: Create(mem_iter, imm_iter, sstable_iters...)
MergingIter-->>DBImpl: Return merged_iter
DBImpl->>DBIter: Create(merged_iter, snapshot_seq)
DBIter-->>DBImpl: Return db_iter
DBImpl-->>App: Return db_iter (as Iterator*)
App->>DBIter: Seek("some_key")
DBIter->>MergingIter: Seek to internal key for "some_key"
Note right of DBIter: DBIter finds the first valid user entry >= "some_key"
DBIter-->>App: Iterator positioned
App->>DBIter: Valid()?
DBIter-->>App: true
App->>DBIter: key()
DBIter-->>App: Return "user_key_A"
App->>DBIter: Next()
DBIter->>MergingIter: Next() until user key changes
Note right of DBIter: DBIter skips older versions or deleted keys
DBIter->>MergingIter: Next() to find next user key's latest version
DBIter-->>App: Iterator positioned at next valid entry
Code Dive: DBImpl::NewIterator
and DBIter
Let’s look at how this is initiated in the code.
1. Creating the Iterator (db_impl.cc
)
When you call db->NewIterator(options)
, it eventually calls DBImpl::NewIterator
:
// --- File: db/db_impl.cc ---
Iterator* DBImpl::NewIterator(const ReadOptions& options) {
SequenceNumber latest_snapshot;
uint32_t seed; // Used for read sampling randomization
// (1) Create the internal merging iterator
Iterator* internal_iter = NewInternalIterator(options, &latest_snapshot, &seed);
// (2) Determine the sequence number for the snapshot
SequenceNumber snapshot_seq =
(options.snapshot != nullptr
? static_cast<const SnapshotImpl*>(options.snapshot)
->sequence_number()
: latest_snapshot);
// (3) Wrap the internal iterator with DBIter
return NewDBIterator(this, // Pass DBImpl pointer for read sampling
user_comparator(),
internal_iter,
snapshot_seq,
seed);
}
Explanation:
NewInternalIterator
: This helper function (we’ll glance at it next) creates theMergingIterator
that combines MemTables and SSTables.snapshot_seq
: It figures out which sequence number to use. If the user provided an explicitoptions.snapshot
, it uses that snapshot’s sequence number. Otherwise, it uses the latest sequence number in the database when the iterator was created (latest_snapshot
).NewDBIterator
: This function (defined indb_iter.cc
) creates theDBIter
object, passing it the underlyinginternal_iter
and thesnapshot_seq
to use for filtering.
2. Creating the Internal Iterator (db_impl.cc
)
The NewInternalIterator
gathers all the source iterators:
// --- File: db/db_impl.cc ---
Iterator* DBImpl::NewInternalIterator(const ReadOptions& options,
SequenceNumber* latest_snapshot,
uint32_t* seed) {
mutex_.Lock(); // Need lock to access shared state (mem_, imm_, versions_)
*latest_snapshot = versions_->LastSequence();
*seed = ++seed_; // For random sampling
// Collect together all needed child iterators
std::vector<Iterator*> list;
// Add iterator for active MemTable
list.push_back(mem_->NewIterator());
mem_->Ref(); // Manage lifetime with ref counting
// Add iterator for immutable MemTable (if it exists)
if (imm_ != nullptr) {
list.push_back(imm_->NewIterator());
imm_->Ref();
}
// Add iterators for all SSTable files in the current Version
versions_->current()->AddIterators(options, &list);
versions_->current()->Ref();
// Create the MergingIterator
Iterator* internal_iter =
NewMergingIterator(&internal_comparator_, &list[0], list.size());
// Register cleanup function to Unref MemTables/Version when iterator is deleted
IterState* cleanup = new IterState(&mutex_, mem_, imm_, versions_->current());
internal_iter->RegisterCleanup(CleanupIteratorState, cleanup, nullptr);
mutex_.Unlock();
return internal_iter;
}
Explanation:
- It locks the database mutex to safely access the current MemTables (
mem_
,imm_
) and the currentVersion
. - It creates iterators for
mem_
andimm_
using theirNewIterator()
methods (MemTable uses a SkipList iterator). - It calls
versions_->current()->AddIterators(...)
. This method (inversion_set.cc
) adds iterators for Level-0 files and the special concatenating iterators for Levels 1+ to thelist
. See Version & VersionSet. NewMergingIterator
creates the iterator that merges all sources inlist
.RegisterCleanup
ensures that the MemTables and Version are properlyUnref
‘d when the iterator is eventually deleted.- It returns the
MergingIterator
.
3. DBIter
Filtering Logic (db_iter.cc
)
The DBIter
class takes the MergingIterator
and applies the filtering logic. Let’s look at a simplified Next()
method:
// --- File: db/db_iter.cc ---
void DBIter::Next() {
assert(valid_);
if (direction_ == kReverse) {
// ... code to switch from moving backward to forward ...
// Position iter_ at the first entry >= saved_key_
// Fall through to FindNextUserEntry...
direction_ = kForward;
} else {
// We are moving forward. Save the current user key so we can skip
// all other entries for it.
SaveKey(ExtractUserKey(iter_->key()), &saved_key_);
// Advance the internal iterator.
iter_->Next();
}
// Find the next user key entry that is visible at our sequence number.
FindNextUserEntry(true, &saved_key_);
}
// Find the next entry for a different user key, skipping deleted
// or older versions of the key in 'skip'.
void DBIter::FindNextUserEntry(bool skipping, std::string* skip) {
// Loop until we hit an acceptable entry
assert(iter_->Valid() || !valid_); // iter_ might be invalid if Next() moved past end
assert(direction_ == kForward);
do {
if (!iter_->Valid()) { // Reached end of internal iterator
valid_ = false;
return;
}
ParsedInternalKey ikey;
// Parse the internal key (key, sequence, type)
if (ParseKey(&ikey)) {
// Check if the sequence number is visible in our snapshot
if (ikey.sequence <= sequence_) {
// Check the type (Put or Deletion)
switch (ikey.type) {
case kTypeDeletion:
// This key is deleted. Save the user key so we skip
// any older versions of it we might encounter later.
SaveKey(ikey.user_key, skip);
skipping = true; // Ensure we skip older versions
break;
case kTypeValue:
// This is a potential result (a Put operation).
// Is it for the user key we are trying to skip?
if (skipping &&
user_comparator_->Compare(ikey.user_key, *skip) <= 0) {
// Yes, it's hidden by a newer deletion or is an older version
// of the key we just yielded. Skip it.
} else {
// Found a valid entry!
valid_ = true;
// Clear skip key since we found a new valid key
// saved_key_.clear(); // Done in Next() or Seek()
return; // Exit the loop, iterator is now positioned correctly.
}
break;
}
}
} else {
// Corrupted key, mark iterator as invalid
valid_ = false;
status_ = Status::Corruption("corrupted internal key in DBIter");
return;
}
// Current internal key was skipped (too new, deleted, hidden), move to next.
iter_->Next();
} while (true); // Loop until we return or reach the end
}
Explanation:
- The
Next()
method first handles switching direction if needed. If moving forward, it saves the current user key (saved_key_
) so it can skip other entries for the same key. It then advances the underlyingiter_
(theMergingIterator
). FindNextUserEntry
is the core loop. It repeatedly gets the next entry fromiter_
.ParseKey(&ikey)
decodes the internal key, sequence number, and type.- It checks if
ikey.sequence <= sequence_
(the iterator’s snapshot sequence number). If the entry is too new, it’s skipped. - If it’s a
kTypeDeletion
, the user key is saved inskip
, and theskipping
flag is set to true. Any older entries for thisuser_key
will be ignored. - If it’s a
kTypeValue
:- It checks if
skipping
is true and if the currentikey.user_key
is less than or equal to the key inskip
. If so, it means this entry is hidden by a newer deletion or is an older version of a key we just processed, so it’s skipped. - Otherwise, this is the newest, visible version of this user key! The loop terminates,
valid_
is set to true, and theDBIter
is now positioned at this entry.
- It checks if
- If the current entry from
iter_
was skipped for any reason, the loop continues by callingiter_->Next()
.
This careful dance ensures that DBIter
only exposes the correct, latest, non-deleted user key/value pairs according to the snapshot sequence number, while efficiently merging data from all underlying sources.
Conclusion
LevelDB’s Iterator
provides a powerful and convenient way to scan through key-value pairs. It acts like a smart assistant, giving you a unified, sorted view across data stored in the MemTable
and numerous SSTable
files.
Under the hood, it uses a MergingIterator
to combine multiple sorted sources and the DBIter
wrapper to filter out deleted entries and older versions based on sequence numbers and the requested snapshot.
This ability to efficiently scan sorted data is not just useful for application queries, but it’s also fundamental to how LevelDB maintains itself. How does LevelDB merge old SSTables and incorporate data flushed from the MemTable to keep the database structure efficient? It uses these very same iterator concepts!
Next up: Chapter 8: Compaction
Generated by AI Codebase Knowledge Builder