
Problem running nodriver in headless mode #5

Open
KenyOnFire opened this issue Sep 3, 2024 · 5 comments

@KenyOnFire

While testing the nodriver module, I tried headless mode and found that enabling it changes the user agent, which makes the browser detectable as a bot. Below is the user agent returned when running headless. Thank you!

Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/128.0.0.0 Safari/537.36

TEMPORARY FIX:
Inside the nodriver module there is a class called Config. On line 185, after
if self.headless: args.append("--headless=new")
I added a call with the requests module that fetches the latest Chrome user agent, which does not contain the 'Headless' text. With this in place, 'Headless' is gone from the user agent before the browser starts. I leave the code here in case it helps someone:
    so_key = {"windows": "windows", "linux": "linux", "darwin": "mac"}[platform.system().lower()]
    ua = next(
        ua
        for ua in requests.get("https://jnrbsn.github.io/user-agents/user-agents.json").json()
        if so_key in ua.lower() and "chrome" in ua.lower() and "firefox" not in ua.lower()
    )
    args.append('--user-agent=' + ua)
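For readers who want to try the selection logic without a network call, here is a minimal sketch of the same filter as a standalone function. The function name `pick_chrome_user_agent` and the `system` parameter are hypothetical, introduced only for illustration. Unlike the `next(...)` expression above, it returns `None` instead of raising `StopIteration` when nothing matches.

```python
import platform


def pick_chrome_user_agent(user_agents, system=None):
    """Return the first Chrome user agent matching the OS, or None.

    `user_agents` is any iterable of user-agent strings, for example the
    parsed JSON list fetched in the snippet above. `system` defaults to
    platform.system() and exists mainly to make the function testable.
    """
    os_keys = {"windows": "windows", "linux": "linux", "darwin": "mac"}
    key = os_keys.get((system or platform.system()).lower())
    if key is None:
        return None  # unknown platform: let the caller decide what to do
    for ua in user_agents:
        lowered = ua.lower()
        # Same filter as the original one-liner: OS keyword present,
        # "chrome" present, "firefox" absent.
        if key in lowered and "chrome" in lowered and "firefox" not in lowered:
            return ua
    return None
```

Note that this keeps the original filter as-is, so other Chromium-based browsers whose user agent contains "Chrome" would also match.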

@ioio101

ioio101 commented Sep 3, 2024

The irony in a library designed to ensure Chrome's stealth as a web scraper, yet inadvertently revealing itself by failing to suppress the very "HeadlessChrome" signature it was supposed to conceal in headless mode.

@devblack

devblack commented Sep 4, 2024

requests.get("https://jnrbsn.github.io/user-agents/user-agents.json").json()

Hello. That is unnecessary. You can replace it manually with useragent_override and the replace() method.
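The comment above does not include code, so the following is only one possible reading of the `replace()` part of the suggestion: take the user agent the headless browser reports and rewrite the `HeadlessChrome` token. The helper name `strip_headless` is hypothetical, and how the result is then applied through `useragent_override` is not shown in this thread.

```python
def strip_headless(user_agent: str) -> str:
    """Replace the 'HeadlessChrome' product token with plain 'Chrome'.

    The rest of the string (platform, version numbers) is left untouched,
    so the version stays in sync with the browser that is actually running.
    """
    return user_agent.replace("HeadlessChrome", "Chrome")


headless_ua = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) HeadlessChrome/128.0.0.0 Safari/537.36"
)
clean_ua = strip_headless(headless_ua)
```

Compared with downloading a user-agent list, this avoids a network request at startup and cannot pick a version different from the installed Chrome.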

@KenyOnFire
Author

requests.get("https://jnrbsn.github.io/user-agents/user-agents.json").json()

Hello. That is unnecessary. You can replace it manually with useragent_override and the replace() method.

I know it is not necessary or practical in the long run, but I could not apply your logic. Could you be more specific about how to use useragent_override? I can't find any documentation for it. Besides, the idea is to set the user agent without the word 'Headless' before the browser is initialized, as undetected chromedriver does. If you could share example code that performs this fix, that would be great and I could close the thread.

P.S. I have also tried this code, but it only applies the CDP override to the current tab, not to the entire browser:
    async def change_useragent(self, useragent):
        self.page.feed_cdp(cdp.emulation.set_user_agent_override(useragent))
        return await self.page.reload()

@boludoz

boludoz commented Sep 4, 2024

The irony in a library designed to ensure Chrome's stealth as a web scraper, yet inadvertently revealing itself by failing to suppress the very "HeadlessChrome" signature it was supposed to conceal in headless mode.

Just run a JavaScript snippet that does it, or start Chrome with a custom user agent from the command line, and stop crying.
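As a rough sketch of the command-line option mentioned above: Chrome accepts a `--user-agent=` switch at launch, which applies to the whole browser rather than a single tab. The function name `build_chrome_command` and the `chrome_binary` parameter are hypothetical. Whether and how these arguments reach Chrome through nodriver (for instance via a `browser_args` option) is an assumption not confirmed in this thread.

```python
def build_chrome_command(chrome_binary, reported_user_agent):
    """Build an argv list that launches headless Chrome with a clean UA.

    `chrome_binary` is a placeholder for the path to the Chrome executable.
    Returning a list (rather than one shell string) means the spaces and
    parentheses inside the user agent need no shell quoting when the list
    is passed to something like subprocess.Popen.
    """
    clean_ua = reported_user_agent.replace("HeadlessChrome", "Chrome")
    return [
        chrome_binary,
        "--headless=new",            # same flag the nodriver Config appends
        f"--user-agent={clean_ua}",  # browser-wide override set at launch
    ]


command = build_chrome_command(
    "chrome",  # hypothetical binary name; adjust for the local install
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) HeadlessChrome/128.0.0.0 Safari/537.36",
)
```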

@Toxenskiy

Study the documentation on user agents

@github-staff github-staff deleted a comment from KenyOnFire Sep 4, 2024