Chapter 3: Remembering Things - The Session Object

In Chapter 1, we learned the easiest way to make web requests using functions like requests.get(). In Chapter 2, we looked at the Request and Response objects that structure our communication with web servers.

We also saw that the simple functional API methods like requests.get() are great for single, one-off requests. But what if you need to talk to the same website multiple times? For example, maybe you need to:

  1. Log in to a website (which gives you a “session cookie” to prove you’re logged in).
  2. Make several requests to access different pages that require you to be logged in (using that cookie).

If you use requests.get() for each step, you’ll have a problem. Remember how requests.get() creates a temporary setup for each call and then throws it away? This means it forgets the login cookie immediately after the login request! Your next request will be like visiting the site as a brand new, logged-out user.

How can we make Requests remember things between requests, just like your web browser does when you navigate around a logged-in site?

Meet the Session Object: Your Persistent Browser Tab

This is where the requests.Session object comes in!

Think of a Session object as a dedicated browser tab you’ve opened just for interacting with a specific website or web service. What does a browser tab do?

  • Remembers Cookies: If you log in on a website in one tab, that tab remembers your login cookie. When you click a link within that same tab, the browser automatically sends the cookie back, keeping you logged in.
  • Keeps Connections Warm: Your browser often keeps the underlying network connection (TCP connection) to the website open for a little while. This makes clicking links and loading subsequent pages much faster because it doesn’t have to establish a new connection every single time. This is called connection pooling.
  • Applies Consistent Settings: You might have browser extensions that add specific headers to your requests, or your browser sends a consistent “User-Agent” string identifying itself.

A requests.Session object does all of these things for your Python script:

  1. Cookie Persistence: It automatically stores cookies sent by the server and sends them back on subsequent requests to the same domain.
  2. Connection Pooling: It reuses the underlying TCP connections for requests to the same host, significantly speeding up multiple requests. This is managed by components called Transport Adapters.
  3. Default Data: You can set default headers, authentication details, query parameters, or proxy settings directly on the Session object, and they will be applied to all requests made through that session.

Using a Session

Using a Session is almost as easy as using the functional API. Instead of calling requests.get(), you first create a Session object, and then call methods like get() or post() on that object.

import requests

# 1. Create a Session object
s = requests.Session()

# Let's try accessing a page that requires a login (we're not logged in yet)
login_required_url = 'https://httpbin.org/cookies' # This page shows cookies sent to it
print("Trying to access protected page without login...")
response1 = s.get(login_required_url)
print("Cookies sent (should be none):", response1.json()) # httpbin returns JSON

# Now, let's simulate 'logging in' by visiting a page that sets a cookie
cookie_setter_url = 'https://httpbin.org/cookies/set/sessioncookie/123456789'
print("\nSimulating login by getting a cookie...")
response2 = s.get(cookie_setter_url)
# The session automatically stored the cookie! Check the session's cookie jar:
print("Session cookies after setting:", s.cookies.get_dict())

# Now, try accessing the 'protected' page again using the SAME session
print("\nTrying to access protected page AGAIN with the session...")
response3 = s.get(login_required_url)
print("Cookies sent (should have sessioncookie):", response3.json())

# Compare with using the functional API (which forgets cookies)
print("\nTrying the same with functional API (the cookie will be forgotten)...")
response4 = requests.get(cookie_setter_url) # Gets cookie, but immediately forgets
response5 = requests.get(login_required_url)
print("Cookies sent via functional API (should be none):", response5.json())

What happened here?

  1. s = requests.Session(): We created our “persistent browser tab”.
  2. response1 = s.get(login_required_url): Our first request sent no cookies, as expected.
  3. response2 = s.get(cookie_setter_url): We visited a URL designed to send back a Set-Cookie header. The Session object automatically noticed this and stored the sessioncookie in its internal Cookie Jar.
  4. s.cookies.get_dict(): We peeked inside the session’s cookie storage and saw the cookie was indeed saved.
  5. response3 = s.get(login_required_url): We made another request using the same session s. This time, the session automatically included the sessioncookie in the request headers. The server received it!
  6. The last part shows that if we used requests.get() instead, the cookie from response4 would be lost, and response5 would fail to send it. The Session was crucial for remembering the cookie.
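
If you want to check this behavior without touching the network, the following minimal sketch may help. It places a cookie in the session's jar by hand (the domain and path values are assumptions chosen to match httpbin.org, standing in for what a real Set-Cookie header would provide) and then uses Session.prepare_request to see which Cookie header the session would attach. Nothing is actually sent.

```python
import requests

s = requests.Session()

# Put a cookie in the jar by hand, as if a server had sent Set-Cookie.
# The domain and path here are assumptions chosen for this illustration.
s.cookies.set("sessioncookie", "123456789", domain="httpbin.org", path="/")

# prepare_request() builds the PreparedRequest without sending anything.
same_site = s.prepare_request(requests.Request("GET", "https://httpbin.org/cookies"))
other_site = s.prepare_request(requests.Request("GET", "https://example.org/"))

print("Cookie header for httpbin.org:", same_site.headers.get("Cookie"))
print("Cookie header for example.org:", other_site.headers.get("Cookie"))
```

The request to httpbin.org carries the cookie, while the request to example.org carries no Cookie header at all, which is the "same domain" rule in action.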

Persistent Settings: Headers, Auth, etc.

Besides cookies, you can set other things on the Session that will apply to all its requests.

import requests
import os # To get environment variables for auth example

s = requests.Session()

# Set a default header for all requests made by this session
s.headers.update({'X-My-Custom-Header': 'HelloSession'})

# Set default authentication (basic auth, with credentials read from environment variables)
# NOTE: httpbin's /basic-auth/{user}/{passwd} endpoint accepts whatever credentials appear in its URL
# s.auth = ('user', 'passwd') # Or set the credentials directly
httpbin_user = os.environ.get("HTTPBIN_USER", "testuser") # Fake user if not set
httpbin_pass = os.environ.get("HTTPBIN_PASS", "testpass") # Fake pass if not set
s.auth = (httpbin_user, httpbin_pass)

# Set default query parameters
s.params.update({'session_param': 'persistent'})

# Now make a request
url = 'https://httpbin.org/get' # Echoes back the headers and query parameters it received
print(f"Making request with persistent session settings to: {url}")
response = s.get(url)

print(f"\nStatus Code: {response.status_code}")
# Check the response (httpbin.org/get echoes back request details)
response_data = response.json()
print("\nHeaders sent (look for X-My-Custom-Header):")
print(response_data['headers'])
# s.auth is sent as a Basic 'Authorization' header, so look for it in the headers printed above.
# (/get does not verify credentials; httpbin's /basic-auth endpoint does.)
print("\nQuery parameters sent (look for session_param):")
print(response_data['args'])

# Make another request to a different endpoint using the same session
headers_url = 'https://httpbin.org/headers'
print(f"\nMaking request to {headers_url}...")
response_headers = s.get(headers_url)
print("Headers received by second request (still has custom header):")
print(response_headers.json()['headers'])

What we see:

  • The X-My-Custom-Header we set on s.headers was automatically added to both requests.
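  • The session_param we added to s.params was included in the query string of the first request. It is sent with the second request too, but /headers does not echo query parameters back.
  • The s.auth credentials were sent as a Basic Authorization header, which should appear among the echoed headers. An endpoint that actually checks credentials would receive them the same way.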
  • We didn’t have to specify these details on each s.get() call! The Session handled it.
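
Session defaults are not set in stone: values passed on an individual request are merged on top of them. The sketch below shows this offline with Session.prepare_request. The X-Trace header, the page parameter, and the credentials are placeholders invented for this illustration.

```python
import requests

s = requests.Session()
s.headers.update({"X-My-Custom-Header": "HelloSession", "X-Trace": "on"})
s.params.update({"session_param": "persistent"})
s.auth = ("testuser", "testpass")  # placeholder credentials

# Per-request values are merged on top of the session defaults.
# Setting a key to None drops that session default for this one request.
req = requests.Request(
    "GET",
    "https://httpbin.org/get",
    headers={"X-My-Custom-Header": "Override", "X-Trace": None},
    params={"page": "2"},
)
prep = s.prepare_request(req)  # nothing is sent over the network

print("URL:", prep.url)
print("Custom header:", prep.headers["X-My-Custom-Header"])
print("X-Trace present?", "X-Trace" in prep.headers)
print("Auth scheme:", prep.headers["Authorization"].split()[0])
print("Session default untouched:", s.headers["X-Trace"])
```

The override applies only to that one request; the session's own headers stay as they were.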

Using Sessions with with (Context Manager)

Sessions manage resources like network connections. It’s good practice to explicitly close them when you’re done. The easiest way to ensure this happens is to use the Session as a context manager with the with statement.

import requests

url = 'https://httpbin.org/cookies'

# Use the Session as a context manager
with requests.Session() as s:
    s.get('https://httpbin.org/cookies/set/contextcookie/abc')
    response = s.get(url)
    print("Cookies sent within 'with' block:", response.json())

# After the 'with' block, the session 's' is closed and its pooled connections are released.
# Requests does not forbid reusing 's' afterwards (it would simply open new connections), but this is not recommended.
# print("\nTrying to use session after 'with' block (might not work as expected)...")
# try:
#    response_after = s.get(url)
#    print(response_after.text)
# except Exception as e:
#    print(f"Error using session after close: {e}")

print("\nSession automatically closed after 'with' block.")

The with statement ensures that s.close() is called automatically at the end of the block, even if errors occur. This cleans up the underlying connections managed by the Transport Adapters.
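
To convince yourself that close() really runs even when the block raises, you could try a small experiment like the one below. TrackingSession is a hypothetical subclass written only for this demonstration; it is not part of Requests.

```python
import requests


class TrackingSession(requests.Session):
    """Hypothetical subclass that records whether close() was called."""

    def __init__(self):
        super().__init__()
        self.was_closed = False

    def close(self):
        self.was_closed = True
        super().close()


try:
    with TrackingSession() as s:
        raise RuntimeError("simulated failure inside the block")
except RuntimeError:
    pass

print("close() called despite the error:", s.was_closed)
```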

How It Works Internally

So, how does the Session actually achieve this persistence and efficiency?

  1. State Storage: The Session object itself holds onto configuration like headers, cookies (in a Cookie Jar), auth, params, etc.
  2. Request Preparation: When you call a method like s.get(url, headers=...), the Session takes your request details and its own stored settings and merges them together. It uses these merged settings to create the PreparedRequest object we saw in Chapter 2. Session cookies and headers get added automatically during this step (Session.prepare_request).
  3. Transport Adapters & Pooling: The Session doesn’t directly handle network sockets. It delegates the sending of the PreparedRequest to a suitable Transport Adapter (usually HTTPAdapter for HTTP/HTTPS). Each Session typically keeps instances of these adapters. The adapter is responsible for managing the pool of underlying network connections (urllib3’s connection pool). When you make a request to https://example.com, the adapter checks if it already has an open, reusable connection to that host in its pool. If yes, it uses it (much faster!). If not, it creates a new one and potentially adds it to the pool for future reuse.
  4. Response Processing: When the adapter receives the response, it builds the Response object. The Session then gets the Response back from the adapter. Crucially, it inspects the response headers (like Set-Cookie) and updates its own state (e.g., adds new cookies to its Cookie Jar).
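
The adapter lookup in step 3 can be explored without sending anything. The sketch below mounts a second HTTPAdapter for one host and asks the session which adapter it would pick for two URLs. The host name api.example.com and the pool sizes are assumptions made up for this example.

```python
import requests
from requests.adapters import HTTPAdapter

s = requests.Session()

# A dedicated adapter for one API host, with a larger connection pool.
# The host name and pool sizes are assumptions for illustration.
api_adapter = HTTPAdapter(pool_connections=4, pool_maxsize=20)
s.mount("https://api.example.com/", api_adapter)

# The session picks the adapter whose mounted prefix matches the URL.
chosen_for_api = s.get_adapter("https://api.example.com/v1/items")
chosen_for_other = s.get_adapter("https://httpbin.org/get")

print("API URL uses the custom adapter:", chosen_for_api is api_adapter)
print("Other URL uses the custom adapter:", chosen_for_other is api_adapter)

s.close()
```

Because each adapter owns its own connection pool, mounting a separate adapter is one way to tune pooling for a single busy host while leaving the defaults alone for everything else.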

Here’s a simplified diagram showing two requests using a Session:

sequenceDiagram
    participant User as Your Code
    participant Sess as Session Object
    participant PrepReq as PreparedRequest
    participant Adapter as Transport Adapter (holds connection pool)
    participant Server as Web Server

    User->>Sess: Create Session()
    User->>Sess: s.get(url1, headers={'User-Header': 'A'})
    Sess->>Sess: Merge s.headers, s.cookies, s.auth... with User's headers/data
    Sess->>PrepReq: prepare_request(merged_settings)
    Sess->>Adapter: send(prepared_request)
    Adapter->>Adapter: Get connection from pool (or create new)
    Adapter->>Server: Send HTTP Request 1 (with session+user headers, session cookies)
    Server-->>Adapter: Send HTTP Response 1 (sets cookie 'C')
    Adapter->>Sess: Return Response 1
    Sess->>Sess: Extract cookie 'C' into s.cookies
    Sess-->>User: Return Response 1

    User->>Sess: s.get(url2)
    Sess->>Sess: Merge s.headers, s.cookies ('C'), s.auth...
    Sess->>PrepReq: prepare_request(merged_settings)
    Sess->>Adapter: send(prepared_request)
    Adapter->>Adapter: Get REUSED connection from pool
    Adapter->>Server: Send HTTP Request 2 (with session headers, cookie 'C')
    Server-->>Adapter: Send HTTP Response 2
    Adapter->>Sess: Return Response 2
    Sess-->>User: Return Response 2

You can see the core logic in requests/sessions.py. The Session.request method orchestrates the process:

# File: requests/sessions.py (Simplified View)

# [...] imports and helper functions

class Session(SessionRedirectMixin):
    def __init__(self):
        # Stores persistent headers, cookies, auth, etc.
        self.headers = default_headers()
        self.cookies = cookiejar_from_dict({})
        self.auth = None
        self.params = {}
        # [...] other defaults like verify, proxies, max_redirects
        self.adapters = OrderedDict() # Holds Transport Adapters
        self.mount('https://', HTTPAdapter()) # Default adapter for HTTPS
        self.mount('http://', HTTPAdapter())  # Default adapter for HTTP

    def prepare_request(self, request):
        """Prepares a Request object with Session settings."""
        p = PreparedRequest()

        # MERGE session settings with request settings
        merged_cookies = merge_cookies(RequestsCookieJar(), self.cookies)
        if request.cookies:
            merged_cookies = merge_cookies(merged_cookies, cookiejar_from_dict(request.cookies))

        merged_headers = merge_setting(request.headers, self.headers, dict_class=CaseInsensitiveDict)
        merged_params = merge_setting(request.params, self.params)
        merged_auth = merge_setting(request.auth, self.auth)
        # [...] merge other settings like hooks

        p.prepare(
            method=request.method.upper(),
            url=request.url,
            headers=merged_headers,
            files=request.files,
            data=request.data,
            json=request.json,
            params=merged_params,
            auth=merged_auth,
            cookies=merged_cookies, # Pass merged cookies to PreparedRequest
            hooks=merge_hooks(request.hooks, self.hooks),
        )
        return p

    def request(self, method, url, **kwargs):
        """Constructs a Request, prepares it, sends it."""
        # Create the initial Request object from user args
        req = Request(method=method.upper(), url=url, **kwargs) # Simplified

        # Prepare the request, merging session state
        prep = self.prepare_request(req)

        # Get environment settings (proxies, verify, cert) merged with session settings
        proxies = kwargs.get('proxies') or {}
        settings = self.merge_environment_settings(prep.url, proxies,
                                                  kwargs.get('stream'),
                                                  kwargs.get('verify'),
                                                  kwargs.get('cert'))
        send_kwargs = {'timeout': kwargs.get('timeout'),
                       'allow_redirects': kwargs.get('allow_redirects', True)}
        send_kwargs.update(settings)

        # Send the prepared request using the appropriate adapter
        resp = self.send(prep, **send_kwargs)

        return resp

    def send(self, request, **kwargs):
        """Sends a PreparedRequest object."""
        # [...] set default kwargs if needed

        # Get the right adapter (e.g., HTTPAdapter) based on URL
        adapter = self.get_adapter(url=request.url)

        # The adapter sends the request (using connection pooling)
        r = adapter.send(request, **kwargs)

        # [...] response hook processing

        # IMPORTANT: Extract cookies from the response and store them in the session's cookie jar
        extract_cookies_to_jar(self.cookies, request, r.raw)

        # [...] redirect handling (which also extracts cookies)

        return r

    def get_adapter(self, url):
        """Finds the Transport Adapter for the URL (e.g., HTTPAdapter)."""
        # ... loops through self.adapters ...
        # Simplified: return self.adapters['http://'] or self.adapters['https://']
        for prefix, adapter in self.adapters.items():
            if url.lower().startswith(prefix.lower()):
                return adapter
        raise InvalidSchema(f"No connection adapters were found for {url!r}")

    def mount(self, prefix, adapter):
        """Attaches a Transport Adapter to handle URLs starting with 'prefix'."""
        self.adapters[prefix] = adapter
        # [...] sort adapters by prefix length

    def close(self):
        """Closes the session and all its adapters (and connections)."""
        for adapter in self.adapters.values():
            adapter.close()

    # [...] other methods like get(), post(), put(), delete() which call self.request()
    # [...] redirect handling logic in SessionRedirectMixin

The key takeaways are:

  • The Session object holds the state (headers, cookies, auth).
  • prepare_request merges this state with the details of the specific request you’re making.
  • send uses a Transport Adapter (like HTTPAdapter) which handles the actual network communication and connection pooling.
  • After a response is received, send (and the redirection logic) updates the Session’s cookies.
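
One way to watch this division of labor is to swap in an adapter that never touches the network. The sketch below is only an illustration: CannedAdapter and the host offline.test are hypothetical names, and the adapter simply records the PreparedRequest it is handed and returns a fixed response. Because the fake response has no real HTTP headers behind it, no cookies are extracted here.

```python
import io

import requests
from requests.adapters import BaseAdapter


class CannedAdapter(BaseAdapter):
    """Hypothetical adapter that returns a fixed response and records requests."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def send(self, request, stream=False, timeout=None, verify=True,
             cert=None, proxies=None):
        self.seen.append(request)  # the PreparedRequest built by the Session
        resp = requests.Response()
        resp.status_code = 200
        resp.url = request.url
        resp.request = request
        resp.raw = io.BytesIO(b"canned body")  # stands in for the network stream
        return resp

    def close(self):
        pass


s = requests.Session()
s.trust_env = False  # skip environment proxy/netrc lookups for this sketch
s.headers.update({"X-My-Custom-Header": "HelloSession"})

adapter = CannedAdapter()
s.mount("https://offline.test/", adapter)

r = s.get("https://offline.test/page", params={"q": "1"})
sent = adapter.seen[0]

print("Response:", r.status_code, r.content)
print("Adapter received URL:", sent.url)
print("Adapter received header:", sent.headers["X-My-Custom-Header"])
```

By the time the adapter sees the request, the session has already merged in its headers and built the final URL; the adapter's only job is to deliver it and hand back a Response.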

Conclusion

You’ve learned about the requests.Session object, a powerful tool for making multiple requests to the same host efficiently. You saw how it automatically handles cookie persistence and provides significant performance benefits through connection pooling (via Transport Adapters). You also learned how to set persistent headers, auth, and other settings on a session. Using a Session is the recommended approach when your script needs to interact with a website more than once.

We mentioned that the Session stores cookies in a “Cookie Jar”. What exactly is that, and can we interact with it more directly? Let’s find out.

Next: Chapter 4: The Cookie Jar

