Chapter 3: BrowserContext - The Agent’s Isolated Workspace
In the previous chapter, we learned how the System Prompt acts as the rulebook for the AI assistant (LLM) that guides our Agent. We know the Agent uses the LLM to decide what to do next based on the current situation in the browser.
But where does the Agent actually “see” the webpage and perform its actions? How does it keep track of the current website address (URL), the page content, and things like cookies, all while staying focused on its specific task without getting mixed up with your other browsing?
This is where the BrowserContext comes in.
What Problem Does BrowserContext Solve?
Imagine you ask your Agent to log into a specific online shopping website and check your order status. You might already be logged into that same website in your regular browser window with your personal account.
If the Agent just used your main browser window, it might:
- Get confused by your existing login.
- Accidentally use your personal cookies or saved passwords.
- Interfere with other tabs you have open.
We need a way to give the Agent its own, clean, separate browsing environment for each task. It needs an isolated “workspace” where it can open websites, log in, click buttons, and manage its own cookies without affecting anything else.
The BrowserContext solves this by representing a single, isolated browser session.
Meet the BrowserContext: Your Agent’s Private Browser Window
Think of a BrowserContext like opening a brand new Incognito Window or creating a separate User Profile in your web browser (like Chrome or Firefox).
- It’s Isolated: What happens in one BrowserContext doesn’t affect others or your main browser session. It has its own cookies, its own history (for that session), and its own set of tabs.
- It Manages State: It keeps track of everything important about the current web session the Agent is working on:
  - The current URL.
  - Which tabs are open within its “window”.
  - Cookies specific to that session.
  - The structure and content of the current webpage (the DOM - Document Object Model, which we’ll explore in the next chapter).
- It’s the Agent’s Viewport: The Agent looks through the BrowserContext to “see” the current state of the webpage. When the Agent decides to perform an action (like clicking a button), it tells the Action Controller to perform it within that specific BrowserContext.
Essentially, the BrowserContext is like a dedicated, clean desk or workspace given to the Agent for its specific job.
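To make the isolation idea concrete, here is a minimal sketch using only the Python standard library. `ToyContext` is a hypothetical stand-in invented for illustration, not a class from the library; it only shows that two sessions can hold state under the same cookie name without ever seeing each other's values.

```python
from dataclasses import dataclass, field


@dataclass
class ToyContext:
    """Hypothetical stand-in for an isolated session (not the library's class)."""
    name: str
    cookies: dict[str, str] = field(default_factory=dict)  # per-session cookie jar
    tabs: list[str] = field(default_factory=list)          # URLs open in this session

    def open_tab(self, url: str) -> None:
        self.tabs.append(url)

    def set_cookie(self, key: str, value: str) -> None:
        self.cookies[key] = value


# One context plays the role of your everyday browsing, the other the Agent's workspace.
personal = ToyContext("personal")
agent_session = ToyContext("agent")

personal.set_cookie("session_id", "my-personal-login")
agent_session.open_tab("https://example.com")
agent_session.set_cookie("session_id", "agent-login")

# Each context only sees its own state, even though the cookie name is the same.
print(personal.cookies)
print(agent_session.cookies)
print(personal.tabs)
```

The `default_factory` arguments matter here: they give every context its own fresh dictionary and list, which is the toy equivalent of each session starting with a clean cookie jar.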
Using the BrowserContext
Before we can have an isolated session (BrowserContext), we first need the main browser application itself. This is handled by the Browser class. Think of Browser as the entire Chrome or Firefox application installed on your computer, while BrowserContext is just one window or profile within that application.
Here’s a simplified example of how you might set up a Browser and then create a BrowserContext to navigate to a page:
import asyncio

# Import necessary classes
from browser_use import Browser, BrowserConfig, BrowserContext, BrowserContextConfig


async def main():
    # 1. Configure the main browser application (optional, defaults are usually fine)
    browser_config = BrowserConfig(headless=False)  # Show the browser window

    # 2. Create the main Browser instance
    # This might launch a browser application in the background (or connect to one)
    browser = Browser(config=browser_config)
    print("Browser application instance created.")

    # 3. Configure the specific session/window (optional)
    context_config = BrowserContextConfig(
        user_agent="MyCoolAgent/1.0",  # Example: Set a custom user agent
        cookies_file="my_session_cookies.json",  # Example: Save/load cookies
    )

    # 4. Create the isolated BrowserContext (like opening an incognito window)
    # We use 'async with' to ensure it cleans up automatically afterwards
    async with browser.new_context(config=context_config) as browser_context:
        print(f"BrowserContext created (ID: {browser_context.context_id}).")

        # 5. Use the context to interact with the browser session
        start_url = "https://example.com"
        print(f"Navigating to: {start_url}")
        await browser_context.navigate_to(start_url)

        # 6. Get information *from* the context
        current_state = await browser_context.get_state()  # Get current page info
        print(f"Current page title: {current_state.title}")
        print(f"Current page URL: {current_state.url}")

        # The Agent would use this 'browser_context' object to see the page
        # and tell the Controller to perform actions within it.

    print("BrowserContext closed automatically.")

    # 7. Close the main browser application when done
    await browser.close()
    print("Browser application closed.")


# Run the asynchronous code
asyncio.run(main())
What happens here?
- We set up a BrowserConfig (telling it not to run headless so we can see the window).
- We create a Browser instance, which represents the overall browser program.
- We create a BrowserContextConfig to specify settings for our isolated session (like a custom user agent or where to save cookies).
- Crucially, browser.new_context(...) creates our isolated session. The async with block ensures this session is properly closed later.
- We use methods on the browser_context object, like navigate_to(), to control this specific session.
- We use browser_context.get_state() to get information about the current page within this session. The Agent relies heavily on this method.
- After the async with block finishes, the browser_context is closed (like closing the incognito window), and finally, we close the main browser application.
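The cleanup guarantee of async with can be seen in isolation with a small standard-library sketch. `ToySession` is a hypothetical class written for this illustration and does not come from the library; it only records when it is opened and closed, so you can check that closing happens even when the task fails partway through.

```python
import asyncio


class ToySession:
    """Hypothetical session object; it illustrates the lifecycle only."""

    def __init__(self) -> None:
        self.events: list[str] = []
        self.is_open = False  # nothing is started until __aenter__ runs

    async def __aenter__(self) -> "ToySession":
        self.is_open = True
        self.events.append("opened")
        return self

    async def __aexit__(self, exc_type, exc, tb) -> bool:
        # Runs on normal exit and on exceptions alike.
        self.is_open = False
        self.events.append("closed")
        return False  # do not swallow the exception

    async def navigate_to(self, url: str) -> None:
        if not self.is_open:
            raise RuntimeError("session is not open")
        self.events.append(f"navigated:{url}")


async def demo() -> list[str]:
    session = ToySession()
    try:
        async with session:
            await session.navigate_to("https://example.com")
            raise ValueError("simulated failure mid-task")
    except ValueError:
        pass  # the session has already been closed by the time we get here
    return session.events


events = asyncio.run(demo())
print(events)
```

Because `__aexit__` returns False, the simulated error still propagates to the caller; the context manager only guarantees cleanup, it does not hide failures.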
How it Works Under the Hood
When the Agent needs to understand the current situation to decide the next step, it asks the BrowserContext for the latest state using the get_state() method. What happens then?
- Wait for Stability: The BrowserContext first waits for the webpage to finish loading and for network activity to settle down (_wait_for_page_and_frames_load). This prevents the Agent from acting on an incomplete page.
- Analyze the Page: It then uses the DOM Representation service (DomService) to analyze the current HTML structure of the page. This service figures out which elements are visible, interactive (buttons, links, input fields), and where they are.
- Capture Visuals: It often takes a screenshot of the current view (take_screenshot). This can be helpful for advanced agents or debugging.
- Gather Metadata: It gets the current URL, page title, and information about any other tabs open within this context.
- Package the State: All this information (DOM structure, URL, title, screenshot, etc.) is bundled into a BrowserState object.
- Return to Agent: The BrowserContext returns this BrowserState object to the Agent. The Agent then uses this information (often sending it to the LLM) to plan its next action.
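The steps above can be sketched as a small pipeline that runs without a real browser. Everything in this sketch is an assumption made for illustration: `FakePage`, `ToyState`, and `get_toy_state` are hypothetical names, and the fake page returns canned values where a real page would do actual work.

```python
import asyncio
import base64
from dataclasses import dataclass


@dataclass
class ToyState:
    """Hypothetical snapshot; a simplified stand-in for the real state object."""
    url: str
    title: str
    interactive_elements: list[str]
    screenshot_b64: str


class FakePage:
    """Stand-in for a real browser page so the sketch runs without a browser."""
    url = "https://example.com"

    async def wait_until_stable(self) -> None:
        await asyncio.sleep(0)  # a real page would wait for load and network idle

    async def title(self) -> str:
        return "Example Domain"

    async def interactive_elements(self) -> list[str]:
        return ["a: More information..."]

    async def screenshot(self) -> bytes:
        return b"fake-png-bytes"


async def get_toy_state(page: FakePage) -> ToyState:
    await page.wait_until_stable()                # wait for stability
    elements = await page.interactive_elements()  # analyze the page
    raw_shot = await page.screenshot()            # capture visuals
    shot_b64 = base64.b64encode(raw_shot).decode("utf-8")
    title = await page.title()                    # gather metadata
    # Package everything into one object and return it to the caller.
    return ToyState(page.url, title, elements, shot_b64)


state = asyncio.run(get_toy_state(FakePage()))
print(state.url, state.title, len(state.interactive_elements))
```

The ordering is the point of the sketch: the wait comes first so that the later steps all describe the same, settled page.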
Here’s a simplified diagram of the get_state() process:
sequenceDiagram
    participant Agent
    participant BC as BrowserContext
    participant PlaywrightPage as Underlying Browser Page
    participant DomService as DOM Service

    Agent->>BC: get_state()
    Note over BC: Wait for page to be ready...
    BC->>PlaywrightPage: Ensure page/network is stable
    PlaywrightPage-->>BC: Page is ready
    Note over BC: Analyze the page content...
    BC->>DomService: Get simplified DOM structure + interactive elements
    DomService-->>BC: DOMState (element tree, etc.)
    Note over BC: Get visuals and metadata...
    BC->>PlaywrightPage: Take screenshot()
    PlaywrightPage-->>BC: Screenshot data
    BC->>PlaywrightPage: Get URL, Title
    PlaywrightPage-->>BC: URL, Title data
    Note over BC: Combine everything...
    BC->>BC: Create BrowserState object
    BC-->>Agent: Return BrowserState
Let’s look at some simplified code snippets from the library.
The BrowserContext is initialized (__init__ in browser/context.py) with its configuration and a reference to the main Browser instance that created it.
# --- File: browser/context.py (Simplified __init__) ---
import uuid
# ... other imports ...

if TYPE_CHECKING:
    from browser_use.browser.browser import Browser  # Link to the Browser class


@dataclass
class BrowserContextConfig:  # Configuration settings
    # ... various settings like user_agent, cookies_file, window_size ...
    pass


@dataclass
class BrowserSession:  # Holds the actual Playwright context
    context: PlaywrightBrowserContext  # The underlying Playwright object
    cached_state: Optional[BrowserState] = None  # Stores the last known state


class BrowserContext:
    def __init__(
        self,
        browser: 'Browser',  # Reference to the main Browser instance
        config: BrowserContextConfig = BrowserContextConfig(),
        # ... other optional state ...
    ):
        self.context_id = str(uuid.uuid4())  # Unique ID for this session
        self.config = config  # Store the configuration
        self.browser = browser  # Store the reference to the parent Browser

        # The actual Playwright session is created later, when needed
        self.session: BrowserSession | None = None
        logger.debug(f"BrowserContext object created (ID: {self.context_id}). Session not yet initialized.")

    # The 'async with' statement calls __aenter__ which initializes the session
    async def __aenter__(self):
        await self._initialize_session()  # Creates the actual browser window/tab
        return self

    async def _initialize_session(self):
        # ... (complex setup code happens here) ...
        # Gets the main Playwright browser from self.browser
        playwright_browser = await self.browser.get_playwright_browser()
        # Creates the isolated Playwright context (like the incognito window)
        context = await self._create_context(playwright_browser)
        # Creates the BrowserSession to hold the context and state
        self.session = BrowserSession(context=context, cached_state=None)
        logger.debug(f"BrowserContext session initialized (ID: {self.context_id}).")
        # ... (sets up the initial page) ...
        return self.session

    # ... other methods like navigate_to, close, etc. ...
The get_state method orchestrates fetching the current information from the browser session.
# --- File: browser/context.py (Simplified get_state and helpers) ---
# ... other imports ...
from browser_use.dom.service import DomService  # Imports the DOM analyzer
from browser_use.browser.views import BrowserState  # Imports the state structure


class BrowserContext:
    # ... (init, aenter, etc.) ...

    async def get_state(self) -> BrowserState:
        """Get the current state of the browser session."""
        logger.debug(f"Getting state for context {self.context_id}...")
        # 1. Make sure the page is loaded and stable
        await self._wait_for_page_and_frames_load()

        # 2. Get the actual Playwright session object
        session = await self.get_session()

        # 3. Update the state (this does the heavy lifting)
        session.cached_state = await self._update_state()
        logger.debug(f"State update complete for {self.context_id}.")

        # 4. Optionally save cookies if configured
        if self.config.cookies_file:
            asyncio.create_task(self.save_cookies())

        return session.cached_state

    async def _wait_for_page_and_frames_load(self, timeout_overwrite: float | None = None):
        """Ensures page is fully loaded before continuing."""
        # ... (complex logic to wait for network idle, minimum times) ...
        page = await self.get_current_page()
        await page.wait_for_load_state('load', timeout=5000)  # Simplified wait
        logger.debug("Page load/network stability checks passed.")
        await asyncio.sleep(self.config.minimum_wait_page_load_time)  # Ensure minimum wait

    async def _update_state(self) -> BrowserState:
        """Fetches all info and builds the BrowserState."""
        session = await self.get_session()
        page = await self.get_current_page()  # Get the active Playwright page object

        try:
            # Use DomService to analyze the page content
            dom_service = DomService(page)
            # Get the simplified DOM tree and interactive elements map
            content_info = await dom_service.get_clickable_elements(
                highlight_elements=self.config.highlight_elements,
                # ... other DOM options ...
            )

            # Take a screenshot
            screenshot_b64 = await self.take_screenshot()

            # Get URL, Title, Tabs, Scroll info etc.
            url = page.url
            title = await page.title()
            tabs = await self.get_tabs_info()
            pixels_above, pixels_below = await self.get_scroll_info(page)

            # Create the BrowserState object
            browser_state = BrowserState(
                element_tree=content_info.element_tree,
                selector_map=content_info.selector_map,
                url=url,
                title=title,
                tabs=tabs,
                screenshot=screenshot_b64,
                pixels_above=pixels_above,
                pixels_below=pixels_below,
            )
            return browser_state

        except Exception as e:
            logger.error(f'Failed to update state: {str(e)}')
            # Maybe return old state or raise error
            raise BrowserError("Failed to get browser state") from e

    async def take_screenshot(self, full_page: bool = False) -> str:
        """Takes a screenshot and returns base64 encoded string."""
        page = await self.get_current_page()
        screenshot_bytes = await page.screenshot(full_page=full_page, animations='disabled')
        return base64.b64encode(screenshot_bytes).decode('utf-8')

    # ... many other helper methods (_get_current_page, get_tabs_info, etc.) ...
This shows how BrowserContext acts as a manager for a specific browser session, using underlying tools (like Playwright and DomService) to gather the necessary information (BrowserState) that the Agent needs to operate.
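As a rough illustration of what a consumer might do with such a snapshot, the sketch below turns a few of the fields named above (url, title, selector_map, pixels_above, pixels_below) into plain text suitable for a prompt. `ToyBrowserState` and `summarize` are hypothetical names, and mapping each index to a short string is an assumption made to keep the example small; the real selector_map is covered in the next chapter.

```python
from dataclasses import dataclass, field


@dataclass
class ToyBrowserState:
    """Hypothetical mirror of a few BrowserState fields (illustration only)."""
    url: str
    title: str
    pixels_above: int = 0
    pixels_below: int = 0
    # index -> short description of an interactive element (an assumption
    # made for this sketch, not the library's actual value type)
    selector_map: dict[int, str] = field(default_factory=dict)


def summarize(state: ToyBrowserState) -> str:
    """Render the snapshot as plain text, for example to include in an LLM prompt."""
    lines = [f"URL: {state.url}", f"Title: {state.title}"]
    if state.pixels_above > 0:
        lines.append(f"... {state.pixels_above} pixels above, scroll up to see more ...")
    for index, description in sorted(state.selector_map.items()):
        lines.append(f"[{index}] {description}")
    if state.pixels_below > 0:
        lines.append(f"... {state.pixels_below} pixels below, scroll down to see more ...")
    return "\n".join(lines)


example_state = ToyBrowserState(
    url="https://example.com",
    title="Example Domain",
    pixels_below=600,
    selector_map={1: "<input> Search", 0: "<a> More information..."},
)
summary = summarize(example_state)
print(summary)
```

Numbering the elements gives the LLM a compact way to refer to them, and the scroll hints tell it that the visible page is only part of the whole.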
Conclusion
The BrowserContext is a fundamental concept in Browser Use. It provides the necessary isolated environment for the Agent to perform its tasks, much like an incognito window or a separate browser profile. It manages the session’s state (URL, cookies, tabs, page content) and provides the Agent with a snapshot of the current situation via the get_state() method.
Understanding the BrowserContext helps clarify where the Agent works. Now, how does the Agent actually understand the content of the webpage within that context? How is the complex structure of a webpage represented in a way the Agent (and the LLM) can understand?
In the next chapter, we’ll dive into exactly that: the DOM Representation.
Next Chapter: DOM Representation
Generated by AI Codebase Knowledge Builder