Chapter 3: Remembering Things - The Session Object
In Chapter 1, we learned the easiest way to make web requests using functions like `requests.get()`. In Chapter 2, we looked at the `Request` and `Response` objects that structure our communication with web servers.
We also saw that the simple functional API methods like `requests.get()` are great for single, one-off requests. But what if you need to talk to the same website multiple times? For example, maybe you need to:
- Log in to a website (which gives you a “session cookie” to prove you’re logged in).
- Make several requests to access different pages that require you to be logged in (using that cookie).
If you use `requests.get()` for each step, you’ll have a problem. Remember how `requests.get()` creates a temporary setup for each call and then throws it away? This means it forgets the login cookie immediately after the login request! Your next request will be like visiting the site as a brand-new, logged-out user.
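Here is what that “temporary setup” looks like. In `requests/api.py`, `requests.get()` ends up in a helper that opens a brand-new `requests.Session` (the object this chapter introduces), sends one request, and closes the session again. The sketch below mirrors that pattern with a hypothetical `one_off_get()` function. It then shows that two separate sessions never share a cookie jar. The sketch runs offline because nothing is actually sent.

```python
import requests


def one_off_get(url, **kwargs):
    """Hypothetical helper mirroring the pattern requests.get() uses internally."""
    # A fresh Session, with an empty cookie jar, exists only for this call;
    # leaving the 'with' block closes it, and its cookies are discarded.
    with requests.Session() as session:
        return session.request('GET', url, **kwargs)


# Separate sessions keep separate cookie jars, so a cookie stored
# during one call is invisible to the next one.
first, second = requests.Session(), requests.Session()
first.cookies.set('sessioncookie', '123456789')
print(first.cookies.get('sessioncookie'))   # 123456789
print(second.cookies.get('sessioncookie'))  # None
```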
How can we make Requests remember things between requests, just like your web browser does when you navigate around a logged-in site?
Meet the Session Object: Your Persistent Browser Tab
This is where the `requests.Session` object comes in!
Think of a `Session` object as a dedicated browser tab you’ve opened just for interacting with a specific website or web service. What does a browser tab do?
- Remembers Cookies: If you log in on a website in one tab, that tab remembers your login cookie. When you click a link within that same tab, the browser automatically sends the cookie back, keeping you logged in.
- Keeps Connections Warm: Your browser often keeps the underlying network connection (TCP connection) to the website open for a little while. This makes clicking links and loading subsequent pages much faster because it doesn’t have to establish a new connection every single time. This is called connection pooling.
- Applies Consistent Settings: You might have browser extensions that add specific headers to your requests, or your browser sends a consistent “User-Agent” string identifying itself.
A `requests.Session` object does all of these things for your Python script:
- Cookie Persistence: It automatically stores cookies sent by the server and sends them back on subsequent requests to the same domain.
- Connection Pooling: It reuses the underlying TCP connections for requests to the same host, significantly speeding up multiple requests. This is managed by components called Transport Adapters.
- Default Data: You can set default headers, authentication details, query parameters, or proxy settings directly on the `Session` object, and they will be applied to all requests made through that session.
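Connection pooling is configured on the Transport Adapter, not on the session itself. The following is a minimal sketch that assumes you want a larger pool for HTTPS traffic; the numbers 5 and 20 are arbitrary. On `requests.adapters.HTTPAdapter`, `pool_connections` sets how many per-host connection pools are cached, and `pool_maxsize` sets how many connections each pool keeps for reuse. No request is sent, so the sketch runs offline.

```python
import requests
from requests.adapters import HTTPAdapter

s = requests.Session()

# Replace the default HTTPS adapter with one that keeps a bigger pool.
#   pool_connections: number of per-host connection pools to cache
#   pool_maxsize:     maximum connections saved in each pool for reuse
pooled = HTTPAdapter(pool_connections=5, pool_maxsize=20)
s.mount('https://', pooled)

# Every https:// request made through s now goes through this adapter,
# so repeated requests to one host can reuse its pooled connections.
print(s.get_adapter('https://httpbin.org/get') is pooled)  # True
```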
Using a Session
Using a `Session` is almost as easy as using the functional API. Instead of calling `requests.get()`, you first create a `Session` object, and then call methods like `get()` or `post()` on that object.
```python
import requests

# 1. Create a Session object
s = requests.Session()

# Let's try accessing a page that requires a login (we're not logged in yet)
login_required_url = 'https://httpbin.org/cookies'  # This page shows cookies sent to it
print("Trying to access protected page without login...")
response1 = s.get(login_required_url)
print("Cookies sent (should be none):", response1.json())  # httpbin returns JSON

# Now, let's simulate 'logging in' by visiting a page that sets a cookie
cookie_setter_url = 'https://httpbin.org/cookies/set/sessioncookie/123456789'
print("\nSimulating login by getting a cookie...")
response2 = s.get(cookie_setter_url)

# The session automatically stored the cookie! Check the session's cookie jar:
print("Session cookies after setting:", s.cookies.get_dict())

# Now, try accessing the 'protected' page again using the SAME session
print("\nTrying to access protected page AGAIN with the session...")
response3 = s.get(login_required_url)
print("Cookies sent (should have sessioncookie):", response3.json())

# Compare with using the functional API (which forgets cookies)
print("\nTrying the same with functional API (will fail)...")
response4 = requests.get(cookie_setter_url)  # Gets cookie, but immediately forgets
response5 = requests.get(login_required_url)
print("Cookies sent via functional API (should be none):", response5.json())
```
What happened here?
- `s = requests.Session()`: We created our “persistent browser tab”.
- `response1 = s.get(login_required_url)`: Our first request sent no cookies, as expected.
- `response2 = s.get(cookie_setter_url)`: We visited a URL designed to send back a `Set-Cookie` header. The `Session` object automatically noticed this and stored the `sessioncookie` in its internal Cookie Jar.
- `s.cookies.get_dict()`: We peeked inside the session’s cookie storage and saw the cookie was indeed saved.
- `response3 = s.get(login_required_url)`: We made another request using the same session `s`. This time, the session automatically included the `sessioncookie` in the request headers. The server received it!
- The last part shows that if we used `requests.get()` instead, the cookie from `response4` would be lost, and `response5` would not send it. The `Session` was crucial for remembering the cookie.
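You can watch the cookie being attached without any network traffic. To do this, ask the session to prepare a request instead of sending it. This sketch assumes the cookie is already in the jar. Here it is put there by hand with `s.cookies.set()`, standing in for the server’s `Set-Cookie` header. The sketch then compares `Session.prepare_request()` with preparing the same `Request` on its own:

```python
import requests

s = requests.Session()
# Stand-in for a cookie the server would normally set via Set-Cookie.
# No domain is given, so the cookie matches any host.
s.cookies.set('sessioncookie', '123456789')

# Build the exact request the session would send, without sending it.
req = requests.Request('GET', 'https://httpbin.org/cookies')
prepped = s.prepare_request(req)
print(prepped.headers.get('Cookie'))  # sessioncookie=123456789

# Prepared on its own, the same Request knows nothing about the session's jar.
print(req.prepare().headers.get('Cookie'))  # None
```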
Persistent Settings: Headers, Auth, etc.
Besides cookies, you can set other things on the `Session` that will apply to all its requests.
```python
import requests
import os  # To read credentials from environment variables for the auth example

s = requests.Session()

# Set a default header for all requests made by this session
s.headers.update({'X-My-Custom-Header': 'HelloSession'})

# Set default Basic authentication for all requests (read from environment variables here).
# httpbin's /basic-auth/<user>/<passwd> endpoint accepts whatever
# username/password you put in its URL, so any values work for testing.
# s.auth = ('user', 'passwd')  # Or set the credentials directly
httpbin_user = os.environ.get("HTTPBIN_USER", "testuser")  # Fake user if not set
httpbin_pass = os.environ.get("HTTPBIN_PASS", "testpass")  # Fake pass if not set
s.auth = (httpbin_user, httpbin_pass)

# Set default query parameters
s.params.update({'session_param': 'persistent'})

# Now make a request
url = 'https://httpbin.org/get'  # Echoes back the headers and query args it receives
print(f"Making request with persistent session settings to: {url}")
response = s.get(url)
print(f"\nStatus Code: {response.status_code}")

# Check the response (httpbin.org/get echoes back request details)
response_data = response.json()
print("\nHeaders sent (look for X-My-Custom-Header and Authorization):")
print(response_data['headers'])  # s.auth shows up as an 'Authorization: Basic ...' header
print("\nQuery parameters sent (look for session_param):")
print(response_data['args'])

# Make another request to a different endpoint using the same session
headers_url = 'https://httpbin.org/headers'
print(f"\nMaking request to {headers_url}...")
response_headers = s.get(headers_url)
print("Headers received by second request (still has custom header):")
print(response_headers.json()['headers'])
```
What we see:
- The `X-My-Custom-Header` we set on `s.headers` was automatically added to both requests.
- The `session_param` we added to `s.params` was added to the query string of every request; the `/get` response echoes it back under `args`.
- The `s.auth` credentials were sent too, as an `Authorization: Basic ...` header. An authentication endpoint such as httpbin’s `/basic-auth/<user>/<passwd>` would check them; `/get` simply echoes the header back.
- We didn’t have to specify these details on each `s.get()` call! The `Session` handled it.
Using Sessions with `with` (Context Manager)
Sessions manage resources like network connections. It’s good practice to explicitly close them when you’re done. The easiest way to ensure this happens is to use the `Session` as a context manager with the `with` statement.
```python
import requests

url = 'https://httpbin.org/cookies'

# Use the Session as a context manager
with requests.Session() as s:
    s.get('https://httpbin.org/cookies/set/contextcookie/abc')
    response = s.get(url)
    print("Cookies sent within 'with' block:", response.json())

# After the 'with' block, the session 's' has been closed automatically.
# Reusing a closed session is not recommended; create a new Session instead.
# print("\nTrying to use session after 'with' block (might not work as expected)...")
# try:
#     response_after = s.get(url)
#     print(response_after.text)
# except Exception as e:
#     print(f"Error using session after close: {e}")

print("\nSession automatically closed after 'with' block.")
```
The `with` statement ensures that `s.close()` is called automatically at the end of the block, even if errors occur. This cleans up the underlying connections managed by the Transport Adapters.
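The “even if errors occur” guarantee can be checked offline. The sketch below uses a hypothetical `TrackingSession` subclass, which is not part of Requests. It calls the real `close()` and also records that the method ran. It then leaves one `with` block normally and another by raising an exception.

```python
import requests


class TrackingSession(requests.Session):
    """Hypothetical subclass that only records whether close() has run."""

    closed = False

    def close(self):
        super().close()  # closes every mounted Transport Adapter
        self.closed = True


# Normal exit from the block
with TrackingSession() as s:
    print(s.closed)  # False: still open inside the block
print(s.closed)      # True: leaving the block called close()

# Exit caused by an exception
try:
    with TrackingSession() as s2:
        raise RuntimeError('something went wrong')
except RuntimeError:
    pass
print(s2.closed)     # True: close() ran anyway
```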
How It Works Internally
So, how does the `Session` actually achieve this persistence and efficiency?
- State Storage: The `Session` object itself holds onto configuration like `headers`, `cookies` (in a Cookie Jar), `auth`, `params`, etc.
- Request Preparation: When you call a method like `s.get(url, headers=...)`, the `Session` merges your request details with its own stored settings. It uses these merged settings to create the `PreparedRequest` object we saw in Chapter 2. Session cookies and headers are added automatically during this step (`Session.prepare_request`).
- Transport Adapters & Pooling: The `Session` doesn’t directly handle network sockets. It delegates sending the `PreparedRequest` to a suitable Transport Adapter (usually `HTTPAdapter` for HTTP/HTTPS). Each `Session` keeps its own instances of these adapters. The adapter is responsible for managing the pool of underlying network connections (`urllib3`’s connection pool). When you make a request to `https://example.com`, the adapter checks whether it already has an open, reusable connection to that host in its pool. If so, it uses it (much faster!). If not, it creates a new one and may keep it in the pool for future reuse.
- Response Processing: When the adapter receives the response, it builds the `Response` object and returns it to the `Session`. Crucially, the `Session` then inspects the response headers (like `Set-Cookie`) and updates its own state (e.g., adds new cookies to its Cookie Jar).
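To see the hand-off between the `Session` and its Transport Adapter without any network traffic, you can mount your own adapter. The sketch below uses a hypothetical `EchoAdapter` built on Requests’ `BaseAdapter`. It is a testing trick, not how `HTTPAdapter` works. Instead of opening a socket, it answers with a JSON body that lists the headers it received. This shows that the session’s merged headers and cookies are already on the `PreparedRequest` when the adapter’s `send()` is called.

```python
import io
import json

import requests
from requests.adapters import BaseAdapter
from requests.models import Response


class EchoAdapter(BaseAdapter):
    """Hypothetical offline Transport Adapter: instead of opening a socket,
    it answers every request with a JSON body listing the headers it got."""

    def send(self, request, stream=False, timeout=None, verify=True,
             cert=None, proxies=None):
        body = json.dumps({'headers': dict(request.headers)}).encode('utf-8')
        resp = Response()
        resp.status_code = 200
        resp.url = request.url
        resp.request = request
        resp.headers['Content-Type'] = 'application/json'
        resp.raw = io.BytesIO(body)  # Response reads its body from .raw
        return resp

    def close(self):
        pass  # no pooled connections to release


s = requests.Session()
s.headers['X-My-Custom-Header'] = 'HelloSession'
s.cookies.set('sessioncookie', '123456789')
s.mount('https://', EchoAdapter())  # replaces the default HTTPAdapter

r = s.get('https://httpbin.org/headers')  # handled by EchoAdapter.send()
sent = r.json()['headers']
print(sent['X-My-Custom-Header'])  # HelloSession
print(sent['Cookie'])              # sessioncookie=123456789
```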
Here’s a simplified diagram showing two requests using a `Session`:
```mermaid
sequenceDiagram
    participant User as Your Code
    participant Sess as Session Object
    participant PrepReq as PreparedRequest
    participant Adapter as Transport Adapter (holds connection pool)
    participant Server as Web Server

    User->>Sess: Create Session()
    User->>Sess: s.get(url1, headers={'User-Header': 'A'})
    Sess->>Sess: Merge s.headers, s.cookies, s.auth... with User's headers/data
    Sess->>PrepReq: prepare_request(merged_settings)
    Sess->>Adapter: send(prepared_request)
    Adapter->>Adapter: Get connection from pool (or create new)
    Adapter->>Server: Send HTTP Request 1 (with session+user headers, session cookies)
    Server-->>Adapter: Send HTTP Response 1 (sets cookie 'C')
    Adapter->>Sess: Return Response 1
    Sess->>Sess: Extract cookie 'C' into s.cookies
    Sess-->>User: Return Response 1

    User->>Sess: s.get(url2)
    Sess->>Sess: Merge s.headers, s.cookies ('C'), s.auth...
    Sess->>PrepReq: prepare_request(merged_settings)
    Sess->>Adapter: send(prepared_request)
    Adapter->>Adapter: Get REUSED connection from pool
    Adapter->>Server: Send HTTP Request 2 (with session headers, cookie 'C')
    Server-->>Adapter: Send HTTP Response 2
    Adapter->>Sess: Return Response 2
    Sess-->>User: Return Response 2
```
You can see the core logic in `requests/sessions.py`. The `Session.request` method orchestrates the process:
```python
# File: requests/sessions.py (Simplified View)
# [...] imports and helper functions

class Session(SessionRedirectMixin):
    def __init__(self):
        # Stores persistent headers, cookies, auth, etc.
        self.headers = default_headers()
        self.cookies = cookiejar_from_dict({})
        self.auth = None
        self.params = {}
        self.hooks = default_hooks()
        # [...] other defaults like verify, proxies, max_redirects
        self.adapters = OrderedDict()  # Holds Transport Adapters
        self.mount('https://', HTTPAdapter())  # Default adapter for HTTPS
        self.mount('http://', HTTPAdapter())   # Default adapter for HTTP

    def prepare_request(self, request):
        """Prepares a Request object with Session settings."""
        p = PreparedRequest()

        # MERGE session settings with request settings
        merged_cookies = merge_cookies(RequestsCookieJar(), self.cookies)
        if request.cookies:
            merged_cookies = merge_cookies(merged_cookies, cookiejar_from_dict(request.cookies))
        merged_headers = merge_setting(request.headers, self.headers, dict_class=CaseInsensitiveDict)
        merged_params = merge_setting(request.params, self.params)
        merged_auth = merge_setting(request.auth, self.auth)

        p.prepare(
            method=request.method.upper(),
            url=request.url,
            headers=merged_headers,
            files=request.files,
            data=request.data,
            json=request.json,
            params=merged_params,
            auth=merged_auth,
            cookies=merged_cookies,  # Pass merged cookies to PreparedRequest
            hooks=merge_hooks(request.hooks, self.hooks),  # Hooks are merged too
        )
        return p

    def request(self, method, url, **kwargs):
        """Constructs a Request, prepares it, sends it."""
        # Create the initial Request object from user args
        # (simplified: the real code passes only the request-building arguments
        # such as headers, params, data, json, files, auth, cookies, hooks)
        req = Request(method=method.upper(), url=url, **kwargs)

        # Prepare the request, merging session state
        prep = self.prepare_request(req)

        # Get environment settings (proxies, verify, cert) merged with session settings
        proxies = kwargs.get('proxies') or {}
        settings = self.merge_environment_settings(prep.url, proxies,
                                                   kwargs.get('stream'),
                                                   kwargs.get('verify'),
                                                   kwargs.get('cert'))
        send_kwargs = {'timeout': kwargs.get('timeout'),
                       'allow_redirects': kwargs.get('allow_redirects', True)}
        send_kwargs.update(settings)

        # Send the prepared request using the appropriate adapter
        resp = self.send(prep, **send_kwargs)
        return resp

    def send(self, request, **kwargs):
        """Sends a PreparedRequest object."""
        # [...] set default kwargs if needed

        # Get the right adapter (e.g., HTTPAdapter) based on URL
        adapter = self.get_adapter(url=request.url)

        # The adapter sends the request (using connection pooling)
        r = adapter.send(request, **kwargs)

        # [...] response hook processing

        # IMPORTANT: Extract cookies from the response and store them in the session's cookie jar
        extract_cookies_to_jar(self.cookies, request, r.raw)

        # [...] redirect handling (which also extracts cookies)
        return r

    def get_adapter(self, url):
        """Finds the Transport Adapter for the URL (e.g., HTTPAdapter)."""
        # Adapters are kept longest-prefix first, so a more specific mounted
        # prefix (e.g. a host-specific one) wins over the generic 'https://'
        for prefix, adapter in self.adapters.items():
            if url.lower().startswith(prefix.lower()):
                return adapter
        raise InvalidSchema(f"No connection adapters were found for {url!r}")

    def mount(self, prefix, adapter):
        """Attaches a Transport Adapter to handle URLs starting with 'prefix'."""
        self.adapters[prefix] = adapter
        # [...] re-sort adapters so longer prefixes come first

    def close(self):
        """Closes the session and all its adapters (and connections)."""
        for adapter in self.adapters.values():
            adapter.close()

    # [...] other methods like get(), post(), put(), delete() which call self.request()

# [...] redirect handling logic in SessionRedirectMixin
```
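The `mount()`/`get_adapter()` pair above is how a session routes URLs to adapters. The sketch below mounts an extra `HTTPAdapter` for a hypothetical API host (`https://api.example.com/` is made up for illustration). That adapter retries failed connection attempts up to 3 times via `max_retries`. The sketch then checks which adapter each URL resolves to. Nothing is sent, so it runs offline.

```python
import requests
from requests.adapters import HTTPAdapter

s = requests.Session()

# Hypothetical API host whose failed connection attempts should be retried.
api_adapter = HTTPAdapter(max_retries=3)
s.mount('https://api.example.com/', api_adapter)

# mount() keeps the longest prefixes first, so get_adapter() finds the most
# specific match before falling back to the generic 'https://' adapter.
print(list(s.adapters))
# ['https://api.example.com/', 'https://', 'http://']
print(s.get_adapter('https://api.example.com/v1/items') is api_adapter)   # True
print(s.get_adapter('https://httpbin.org/get') is s.adapters['https://'])  # True
```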
The key takeaways are:
- The `Session` object holds the state (`headers`, `cookies`, `auth`).
- `prepare_request` merges this state with the details of the specific request you’re making.
- `send` uses a Transport Adapter (like `HTTPAdapter`), which handles the actual network communication and connection pooling.
- After a response is received, `send` (and the redirection logic) updates the `Session`’s cookies.
Conclusion
You’ve learned about the `requests.Session` object, a powerful tool for making multiple requests to the same host efficiently. You saw how it automatically handles cookie persistence and provides significant performance benefits through connection pooling (via Transport Adapters). You also learned how to set persistent `headers`, `auth`, and other settings on a session. Using a `Session` is the recommended approach when your script needs to interact with a website more than once.
We mentioned that the `Session` stores cookies in a “Cookie Jar”. What exactly is that, and can we interact with it more directly? Let’s find out.
Next: Chapter 4: The Cookie Jar
Generated by AI Codebase Knowledge Builder