Standardize cookie handling #933
Comments
@Mantisus thanks for bringing this up. Let's split it into 3 separate PRs please.
Regarding 1, this will probably involve changing the …
I would focus on 3 first, that feels like the biggest issue to me. Scraping multiple domains in a single crawler is not a very common use case.
Agreed, even though 2 is similar in terms of severity (but yeah, Playwright is a bit more popular).
2 of 3. But for multi-domain cookie support, we'd really have to go to something like …
…in `PlaywrightCrawler` (#941)
### Description
- Improve cookie handling for `PlaywrightCrawler`. Cookies are now stored in the `Session` and set in the Playwright context from the `Session`.
- Add `use_incognito_pages` option for `browser_launch_options`, allowing each new page to be launched in a separate context.
### Issues
- #722
- #933
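As a rough illustration of storing cookies in the session and applying them to a browser context, the sketch below converts a standard-library `CookieJar` into the list-of-dicts shape that Playwright's `BrowserContext.add_cookies` accepts. The helper names `make_cookie` and `to_playwright_cookies` are hypothetical and are not taken from the PR; the real `Session` implementation may store cookies differently.

```python
from http.cookiejar import Cookie, CookieJar


def make_cookie(name: str, value: str, domain: str) -> Cookie:
    """Hypothetical helper: build a minimal session cookie for the given domain."""
    return Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True, domain_initial_dot=False,
        path="/", path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


def to_playwright_cookies(jar: CookieJar) -> list[dict]:
    """Hypothetical helper: map jar entries to Playwright-style cookie dicts."""
    result = []
    for cookie in jar:
        item = {
            "name": cookie.name,
            "value": cookie.value or "",
            "domain": cookie.domain,
            "path": cookie.path,
            "secure": bool(cookie.secure),
        }
        # Only include an expiry when the cookie is not a session cookie.
        if cookie.expires is not None:
            item["expires"] = float(cookie.expires)
        result.append(item)
    return result


# Stand-in for the cookies held by a single session.
session_jar = CookieJar()
session_jar.set_cookie(make_cookie("session_id", "abc123", "example.com"))

playwright_cookies = to_playwright_cookies(session_jar)
```

With Playwright available, the resulting list could then be passed to `await context.add_cookies(playwright_cookies)` before navigating, so each context starts from the cookies of the session it serves.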
### Description
- Fix cookie handling. Behavior alignment with `HttpxHttpClient`.
### Issues
- #933
Currently we have 3 main cookie handling mechanisms depending on the HTTP client or browser, and none work correctly.
1. `HttpxHttpClient`. This solution is closest to expected. Cookies are stored in `Session`. However, we use `dict`, which loses the cookie-domain relationship. This can cause issues during cross-domain crawling.
2. `CurlImpersonateHttpClient`. `Session` knows nothing about cookies and all cookies are stored at the `AsyncSession` level. As a result, if we don't use proxies, all sessions have identical cookies. If we work with proxies, cookies become tied to the proxy.
3. Playwright. `Session` knows nothing about cookies and all cookies are stored at the `PlaywrightContext` level, meaning all sessions working from one context will operate with the same cookies.
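To make the first point concrete, here is a minimal sketch of how a flat `dict` loses the cookie-domain relationship while a domain-aware container keeps it. It uses only the standard library's `http.cookiejar`; the `make_cookie` helper and the example domains are illustrative assumptions, not the crawler's actual storage code.

```python
from http.cookiejar import Cookie, CookieJar


def make_cookie(name: str, value: str, domain: str) -> Cookie:
    """Hypothetical helper: build a minimal session cookie for the given domain."""
    return Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True, domain_initial_dot=False,
        path="/", path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


flat = {}          # name -> value, the dict-style storage described above
jar = CookieJar()  # keyed internally by domain, path, and name

# Two sites set a cookie with the same name during one crawl.
for domain, value in [("a.example", "token-a"), ("b.example", "token-b")]:
    flat["session_id"] = value  # the second assignment overwrites the first
    jar.set_cookie(make_cookie("session_id", value, domain))

# The jar still knows which value belongs to which domain.
by_domain = {cookie.domain: cookie.value for cookie in jar}
```

After this runs, `flat` holds a single `session_id` entry (the last one written), whereas `by_domain` has one entry per domain. The same idea applies to points 2 and 3: giving each `Session` its own jar, rather than sharing one at the client or context level, keeps sessions from seeing each other's cookies.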