Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

javascript - Puppeteer never completely loads the page

I've been trying to use Puppeteer to scrape a website, but when I try to take a screenshot the page never finishes loading: it either throws a TimeoutError or hangs indefinitely.

(async () => {
    try {
        const navegador = await puppeteer.launch({headless: false},{defaultViewport: null});
        const pagina = await navegador.newPage();
        await pagina.setDefaultNavigationTimeout(3000);
        await pagina.goto(urlSitio, {waitUntil: 'load'});
        await pagina.setViewport({width: 1920, height: 1080});
        await pagina.waitForNavigation({waitUntil: 'load'});
        await pagina.screenshot({
            fullPage: true,
            path: `temporales/temporal.png`
        });
        await navegador.close();
    } catch (err) {
        console.log(err);
    }
})();

I've tried changing the value in await pagina.setDefaultNavigationTimeout(3000); to 0 and to several other numbers.

I've tried removing headless: false.

I've also tried all the different waitUntil options for

await pagina.waitForNavigation({waitUntil: 'load'});

The website example I'm using is https://www.xtract.io/

Error message:

(node:9644) UnhandledPromiseRejectionWarning: TimeoutError: Navigation timeout of 3000 ms exceeded
    at C:\Users\Samuel\Desktop\somnus-monitor\back\node_modules\puppeteer\lib\cjs\puppeteer\common\LifecycleWatcher.js:106:111
    at async FrameManager.navigateFrame (C:\Users\Samuel\Desktop\somnus-monitor\back\node_modules\puppeteer\lib\cjs\puppeteer\common\FrameManager.js:90:21)
    at async Frame.goto (C:\Users\Samuel\Desktop\somnus-monitor\back\node_modules\puppeteer\lib\cjs\puppeteer\common\FrameManager.js:416:16)
    at async Page.goto (C:\Users\Samuel\Desktop\somnus-monitor\back\node_modules\puppeteer\lib\cjs\puppeteer\common\Page.js:789:16)
    at async C:\Users\Samuel\Desktop\somnus-monitor\back\index.js:103:9
(Use `node --trace-warnings ...` to show where the warning was created)
(node:9644) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). To terminate the node process on unhandled promise rejection, use the CLI flag `--unhandled-rejections=strict` (see https://nodejs.org/api/cli.html#cli_unhandled_rejections_mode). (rejection id: 1)
(node:9644) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.

1 Answer

There appears to be an unnecessary waitForNavigation call here. page.goto with waitUntil: 'load' already resolves once the page has loaded, so the waitForNavigation that follows waits for a second navigation that never occurs. That wait eventually times out, or hangs forever when the timeout is set to 0. Re-add the commented-out line below to reproduce your problem.

const puppeteer = require("puppeteer");

(async () => {
  // All launch settings go in one options object.
  const browser = await puppeteer.launch({
    headless: false,
    defaultViewport: null,
  });

  try {
    // Reuse the tab that the browser opens at startup.
    const [page] = await browser.pages();
    await page.setViewport({width: 1920, height: 1080});
    // goto resolves once the "load" event has fired.
    await page.goto("https://www.xtract.io/", {waitUntil: "load"});
    //await page.waitForNavigation({waitUntil: "load"}); // this will time out
    await page.screenshot({
      fullPage: true,
      path: "temporal.png",
    });
  }
  catch (err) {
    console.error(err);
  }

  // Runs after both success and a caught error.
  await browser.close();
})();
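
For contrast, waitForNavigation is useful when some action on the page really does start a new page load, such as clicking a link. Here is a minimal sketch of that pattern, assuming page is a Puppeteer Page; the helper name clickAndWaitForLoad and the selector are made up for illustration and are not part of the original code.

```javascript
// Sketch: wait for a navigation only when an action actually triggers one.
// `page` is assumed to be a Puppeteer Page; `selector` is a hypothetical
// CSS selector for a link or button that causes a page load.
async function clickAndWaitForLoad(page, selector) {
  // Start waiting *before* clicking so the navigation cannot be missed.
  // Promise.all rejects if either the wait or the click fails.
  const [response] = await Promise.all([
    page.waitForNavigation({ waitUntil: "load" }),
    page.click(selector),
  ]);
  return response;
}
```

In the screenshot script above nothing triggers a second navigation after goto, which is why no such wait is needed there.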

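As an aside, I don't think you meant to pass multiple objects to puppeteer.launch. It takes a single options object as its only argument, so the second object ({defaultViewport: null}) was ignored. Put all of the settings in one object, as in the puppeteer.launch call above.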
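
If the launch settings come from more than one place, one way to keep them in a single object is to merge them before calling launch. This is only a sketch: DEFAULT_LAUNCH_OPTIONS and buildLaunchOptions are hypothetical names, and the defaults are simply the two settings from the question.

```javascript
// Sketch: build ONE options object for puppeteer.launch().
// DEFAULT_LAUNCH_OPTIONS and buildLaunchOptions are hypothetical names.
const DEFAULT_LAUNCH_OPTIONS = { headless: false, defaultViewport: null };

function buildLaunchOptions(overrides = {}) {
  // Later keys win, so overrides replace the defaults they name.
  return { ...DEFAULT_LAUNCH_OPTIONS, ...overrides };
}

// Usage (needs the puppeteer package installed):
//   const browser = await puppeteer.launch(buildLaunchOptions({ headless: true }));
```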

