Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[QUESTION] data loader batch load fails when non-data loader function is included among data-loader functions. #285

Open
Tracked by #297
hyunhoRIDI opened this issue Nov 7, 2021 · 8 comments

Comments

@hyunhoRIDI
Copy link

I am experiencing a weird edge case with data loader.

Suppose we have functions like below.

const firstQueryBatch = (ids: string[]) => do something batch with Ids...
const secondQueryNonBatch = (id: string) => ... do something with single id
const thirdQueryBatch = (ids: string[]) => ... do something batch with ids

const callThreeQueriesWithId = async (id: string) => {

  const firstResult = await (() => {
    new DataLoader(firstQueryBatch, {
        cache: false,
    });

    return (id: string) => loader.load(id)
  })();

  const secondResult = await secondQueryNonBatch(id); // non-data loader applied function

  const thirdResult = await (() => {
    new DataLoader(thirdQueryBatch, {
        cache: false,
    });

    return (id: string) => loader.load(id)
  })();

  return {
    firstResult,
    secondResult,
    thirdResult,
  };
};

const Ids = ['1', '2', '3'];
const finalResults = await Promise.all(Ids.map((id) => await callThreeQueriesWithId(id))));

Let's say there are total three async functions where the first and the last function use dataloader, but second function is just a plain non-dataloader function.

I simply expected that 'firstQueryBatch' and 'thirdQueryBatch' would be called once with batch Ids[](as dataloader should combine parameters) and only calling 'secondQueryNonBatch' three times as it is not a dataloader function.
However, when I debug the output, once it calls 'firstQueryBatch' once successfully, it ended up calling 'secondQueryNonBatch' AND 'thirdQueryBatch' three times. Dataloader fails to combine parameters for 'thirdQueryBatch' and calling it three times with parameters with single item in each array. ['1'], ['2'], then ['3'].

Could anyone explain why this is happening?

@hyunhoRIDI hyunhoRIDI changed the title [QUESTION] data loader batch load fails when non-data loader function is included among data-loader function. [QUESTION] data loader batch load fails when non-data loader function is included among data-loader functions. Nov 7, 2021
@saihaj saihaj mentioned this issue Mar 11, 2022
26 tasks
@lixurea
Copy link

lixurea commented Apr 22, 2022

Same question.
It fails to batch when there're other async functions around, more specifically, when using fetch from node-fetch in the async function.

For example:

const DataLoader = require('dataloader');
const fetch = require('node-fetch');

const myDataLoader = new DataLoader((keys) => {
  console.log('keys: ', keys);
  return Promise.resolve(keys);
});
const myAsyncFunc = () => fetch('https://www.google.com');

const myFunc = async (id) => {
  await myAsyncFunc(); // dataloader doesn't batch when this function is around
  await myDataLoader.load(id);
};

myFunc(1);
myFunc(2);
myFunc(3);

With await myAsyncFunc(), the result is:

keys:  [ 1 ]
keys:  [ 3 ]
keys:  [ 2 ]

Without await myAsyncFunc(), the result is:

keys:  [ 1, 2, 3 ]

@vivek-ng
Copy link

@lixurea @hyunhoRIDI This is expected because dataloader only batches requests made in the same event loop tick. #150

@hyunhoRIDI
Copy link
Author

@lixurea @hyunhoRIDI This is expected because dataloader only batches requests made in the same event loop tick. #150

werid thing is, if I put non-dataloader function behind, dataloader successfully collects the requests within a tick.
Using @lixurea's example,

const myFunc = async (id) => {
  await myAsyncFunc(); // dataloader doesn't batch when this function is around
  await myDataLoader.load(id);
};

this does not work. but

const myFunc = async (id) => {
  await myDataLoader.load(id);
  await myAsyncFunc(); // this way dataloader collects within a tick.
};

could you confirm this is right @lixurea ?

@lixurea
Copy link

lixurea commented Apr 25, 2022

@hyunhoRIDI yes, if the data loader function goes first, it successfully batches requests. Not sure why though.

@vivek-ng Thanks for the info. I've been reading about event loops, but still not sure what makes it a separate event loop tick?

@forrestwilkins
Copy link

@vivek-ng Is there any way to use permissions or authorization guards like directives or GraphQL Shield alongside dataloaders without .load getting hit multiple times? Was anyone else here able to solve this so far?

@forrestwilkins
Copy link

@anlauren
Copy link

@lixurea I'm currently having a look at how it works, and found your question.
One thing that I think might help to see why having another await before could cause a problem, is to re-write as we used to: using promises.

const myFunc = () => {
   return new Promise(resolve => {
      myDataLoader.load(id).then(result => {
         myAsyncFunc().then(resolve)
      });
   });
}

In this case, the code is first waiting at all dataloaders to run and then launches the async function
(all dataLoaders resolve at the same "time", when the batch function is ran, or rather they immediately resolve a value when the batch function returns).

however in the following case:

const myFunc = () => {
   return new Promise(resolve => {
      myAsyncFunc().then(result => {
         myDataLoader.load(id).then(resolve)
      });
   });
}

If ran in parallel, the calls to "myAsyncFunc" would not "wait" on each other before scheduling the next step which is to load data and then resolve it.

I find the "async await" writing does kind of hide the fact that we add to the promise queue some blocks of codes.

I'm basing that explanation on that interesting article and the node event loop

@EndyKaufman
Copy link

EndyKaufman commented Jan 25, 2025

Maybe it will help someone, I solved this problem like this

new DataLoader(
  async (ids) => {
    console.log(ids);
    //...some logic
  },
  {
    batchScheduleFn: (cb) => setTimeout(100).then(() => process.nextTick(cb)),
  }
);

the delay is needed to solve the problem with nextTick, if the number of parallel requests is very large, then you need to increase this delay

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants