TakeScreenshot cannot take a full-page screenshot (beyond viewport) #322
Comments
Thanks for reporting. Can you share any more code or logs for 1? Secret Agent tracks everything in session databases for each "agent session" (https://secretagent.dev/docs/advanced/session). For 2, can you include the screenshot that got generated? For 3, I think I broke that while trying to fix a different issue. Thanks for catching it.
On 1, I found that this happens when I create an http server that calls my test scraping function, and not when I call the function several times in a row in the same Node.js "thread". Here is my test function's typescript file:
|
and here is my http server written in javascript that compiles the typescript function above:
|
and here's how I invoked it from java (by running this standalone java file a second time while the node service is running):
|
I'll have to figure out how to take a session trace in a little while if you still need that.
NOTE: in my testUrl function above, for this fairly simple webpage, waitForPaintingStable() didn't seem to work as well as it should. On my machine, the scrollHeight obtained after waitForPaintingStable() was 1413, but when I asked for scrollHeight again after taking the screenshot and writing it to a file, it was 1971. That prompted me to "retake" the screenshot, to rule out the height change as a cause of the clipping.
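One way to guard against a height that is still growing (1413 first, 1971 later) is to re-read it until two consecutive readings agree. The following is only a minimal sketch: `waitForStableValue` and its options are hypothetical names, not part of SecretAgent, and the `measure` callback is assumed to be whatever code reads scrollHeight through the agent.

```typescript
// Hypothetical helper (not a SecretAgent API): poll a numeric measurement
// until two consecutive readings are equal, or give up after maxAttempts.
type Measure = () => Promise<number>;

async function waitForStableValue(
  measure: Measure,
  options: { intervalMs?: number; maxAttempts?: number } = {},
): Promise<number> {
  const { intervalMs = 250, maxAttempts = 20 } = options;
  let previous = await measure();
  for (let attempt = 1; attempt < maxAttempts; attempt += 1) {
    // Give the page some time to lay out more content before re-reading.
    await new Promise<void>(resolve => setTimeout(resolve, intervalMs));
    const current = await measure();
    if (current === previous) return current; // two equal readings in a row
    previous = current;
  }
  return previous; // never settled: return the last reading seen
}
```

The callback would wrap the same scrollHeight read already used in testUrl, so the screenshot height is only computed once the value has stopped changing.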
Here's the session db:
I think what's happening is that you are using the default "full-client" SecretAgent "connection", which is built for single-use scrapes, and you're triggering the auto-shutdown when you call close the first time (think about booting up a script and then wanting the whole thing to tear down when you close). I think you'll get more reliable behavior by spinning up a CoreServer and then pointing your agents at the persistent server (SecretAgent already comes with a client/server setup: https://secretagent.dev/docs/advanced/remote). You can run the server in the same process as your existing server if you want; it doesn't have to be a separate process.

Regarding paintingStable: that event is specifically geared around the page being visible above the fold, not "all content loaded". You can add a "domContentLoaded" trigger to wait for the page to be fully "loaded" as well.

With your screenshot, it seems like the viewport width and height are swapped in your screenshot rectangle. Could that be why it's showing up with a strange shape? I guess it doesn't explain the x/y, though.
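To make a width/height swap harder to reintroduce, the rectangle can be built from named fields instead of positional values. This is just a sketch: `fullPageRect` is a hypothetical helper, and the `Rect` shape (x, y, width, height, scale) is an assumption about what takeScreenshot accepts as its rectangle, so check it against the SecretAgent typings.

```typescript
// Assumed shape of the screenshot rectangle (verify against the library's types).
interface Rect {
  x: number;
  y: number;
  width: number;
  height: number;
  scale: number;
}

// Hypothetical helper: named inputs make it obvious which number is which.
function fullPageRect(input: { viewportWidth: number; scrollHeight: number; scale?: number }): Rect {
  const { viewportWidth, scrollHeight, scale = 1 } = input;
  if (!(viewportWidth > 0) || !(scrollHeight > 0)) {
    throw new RangeError('viewportWidth and scrollHeight must be positive numbers');
  }
  // Start at the top-left corner and span the full scrollable height.
  return { x: 0, y: 0, width: viewportWidth, height: scrollHeight, scale };
}
```

For the page in this thread, `fullPageRect({ viewportWidth: 1024, scrollHeight: 1971 })` would describe a 1024 by 1971 rectangle; the 1024 width is only an example value.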
Thanks for the help. You were right that I had switched the viewport width and height. However, after fixing my code with everything you mentioned above, the screenshot is still clipped, even though the height passed to takeScreenshot is always the full scrollHeight of 1971 for this url. I don't see anything else that could explain the clipping. In fact, the screenshot image height is now 1971, matching the scrollHeight, and it includes the proper background for the full 1971, but the text content inside the dom looks like it is being clipped to the viewport's height of 768. Is this possible? (Here's the fixed code)
|
Can you see if the latest version helps your screenshot issue if you provide no rectangle?
Scratch that. I see it happening. No need to try it.
NOTE for implementation: it looks like in Chromium you have to change the visualViewport to take a full-page screenshot and then restore it. We need to think through what this means from a detection perspective.
Hi, I was wondering if this has turned out to be difficult to fix from the standpoint of bot detection, since I noticed that the behavior is still the same as of the latest version. What exactly would be the detection exposure if a quick-and-dirty fix were done? Could you point us to an easy approach, so that we could take the risk of detection ourselves in some kind of plugin?
@andynuss - I just haven't gotten to this. There's a lot on the plate, and this one just hasn't made it to the top of the priorities yet. You could give a plugin or a PR a try. I think for a plugin, you'd just want to be able to resize the page to its full content length (here's how puppeteer does that: https://github.com/puppeteer/puppeteer/blob/327282e0475b1b680471cce6b9e74ecc14fd6536/src/common/Page.ts#L2664)
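For anyone attempting the plugin route, here is a rough sketch of the resize, capture, restore idea expressed as raw DevTools Protocol calls. It is not SecretAgent's or Puppeteer's actual implementation: `CdpSession` and `captureFullPage` are hypothetical names, and how a plugin gets hold of a protocol session is left open. The protocol methods themselves (`Page.getLayoutMetrics`, `Emulation.setDeviceMetricsOverride`, `Page.captureScreenshot`, `Emulation.clearDeviceMetricsOverride`) are standard Chromium ones.

```typescript
// Hypothetical minimal session shape: anything that can send a protocol command.
interface CdpSession {
  send(method: string, params?: Record<string, unknown>): Promise<any>;
}

// Sketch: grow the emulated viewport to the full content size, capture, restore.
async function captureFullPage(session: CdpSession): Promise<string> {
  const metrics = await session.send('Page.getLayoutMetrics');
  // Newer Chromium builds report cssContentSize; older ones only contentSize.
  const size = metrics.cssContentSize ?? metrics.contentSize;
  const width = Math.ceil(size.width);
  const height = Math.ceil(size.height);

  await session.send('Emulation.setDeviceMetricsOverride', {
    width,
    height,
    deviceScaleFactor: 0, // 0 leaves the device scale factor un-overridden
    mobile: false,
  });
  try {
    const result = await session.send('Page.captureScreenshot', {
      format: 'png',
      clip: { x: 0, y: 0, width, height, scale: 1 },
    });
    return result.data; // base64-encoded PNG
  } finally {
    // Assumption: no other override was active. A real plugin would need to
    // re-apply whatever viewport emulation was in place before the capture.
    await session.send('Emulation.clearDeviceMetricsOverride');
  }
}
```

The detection concern raised earlier applies here: while the override is active the page itself can observe the changed window size, so the window between override and restore should be as short as possible.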
The first issue is that after using this snippet to create a use-once agent:
and then, when done scraping, calling:
everything works the first time after starting my node service that runs this function, but on subsequent calls to the same function I get this error in the node console:
The second surprise is the screenshot itself, taken with:
I used this url: https://www.whatsmyua.info
The visible text in the screenshot is not centered as one would expect for the page I used, but is more or less left-justified, and a large portion of the page is clipped even though I used the scrollHeight, which I checked had not grown after taking the screenshot.
The third problem is that if I call takeScreenshot this way it fails with an error, even though typescript tells me rectangle is optional:
Hope I didn't do something stupid!