Any interest / benefit in reducing number of GraphQL API calls required by combining similar queries? #64

Open
mosource21 opened this issue Jul 3, 2024 · 22 comments
Labels
enhancement New feature or request question Further information is requested

Comments

@mosource21

The library (title.php) currently uses many calls to the IMDB API.

All these separate GraphQL API calls are relatively slow and "costly" on resources.

Is there any interest / benefit in perhaps trying to reduce the number of API calls?

Proposal 1
Is it possible to combine the GraphQL for the "mainRating" and "Metacritic" into a single GraphQL query and get both values with a single API call?

Thanks

@duck7000
Owner

duck7000 commented Jul 3, 2024

Well yes that is a possibility.
The slow response is mostly the waiting time for the api call to be executed, in my understanding. The execution of the api call itself is quick (for the smaller methods)

I made all methods separate so that everybody can fetch exactly what they want without collecting lots of data that is not used.
On the other side, combining methods reduces API calls, but we/I have to take care that the API queries do not become too complicated to understand.

So for example combining all methods would be an impossible task as the query and the method would become so large.

Combining smaller methods with a small query would be an option as long as they somehow belong together

So @mosource21 would you make some more proposals?

Maybe mainRating, Metacritic and Votes could be combined? They all belong to "rating stuff"

@mosource21
Author

Well yes that is a possibility. The slow response is mostly the waiting time for the api call to be executed, in my understanding. The execution of the api call itself is quick (for the smaller methods)
we/I have to take care that the API queries do not become too complicated to understand.
So for example combining all methods would be an impossible task as the query and the method would become so large.

Yes, completely agree, it is a balancing act. I wouldn't say it is slow at the moment; it just could be much more efficient if we think there is going to be a benefit. Everyone uses the library differently, but I could be using 10 API calls per title to get back only a relatively small amount of data.

Combining smaller methods with a small query would be an option as long as they somehow belong together
Maybe mainRating, Metacritic and Votes could be combined? They all belong to "rating stuff"

Yes that sounds ideal. All assuming GraphQL can get all three items together?

So @mosource21 would you make some more proposals?

These are specific to my use, and my exposure to the library functions is limited, so I may not have a great overview. Not sure if they will meet any criteria you are setting for "belonging together". If a method returns "complex / multi-field" data I thought it best to leave it as it is, but anything which is just a single value or a list of single values I thought was fair game.

Proposal 2
Add "meta { canonicalId }" to titleYear(). Spending an API call just for the canonicalId via checkRedirect seems excessive, and the canonicalId is something you would probably want to know anyway, just like title, year etc. Have a $this->imdbCanonicalID property that the end developer can check if they want to (set via titleYear and checkRedirect)?

Proposal 3
keyword(), language() and genre()
These are all simple lists of basic general data. If you get one of them you may as well get the others, as you probably need them anyway?

Try It First Before Investing Too Much
Before going too far down the path I was thinking of trying it with the "rating stuff" first. Are you going to try to keep compatibility or just completely change how it works? Some of the methods currently return values directly from the queried data and not from $this->property, so care is needed.

As before more than happy to be involved in the process if you want input, code suggestions and help with testing etc.

@duck7000
Owner

duck7000 commented Jul 3, 2024

Thanks for your input, I'll take all this into consideration.

I'll check first if those three even can be combined at all

@GeorgeFive what is your view on this?

@GeorgeFive

I don't think it's a bad idea. I'd like to keep compatibility of course, and the one thing I do enjoy now is how easy it is to find something. Could that be kept when combining? For example, if I want to get languages... I really don't even need the wiki or docs or anything, I can just look at $langs. Plot? It's over there in... dun dun dun... $plot.

If we start combining stuff, ie, $grading contains rating, metacritic, votes, etc... that may make it more difficult to find stuff.

I also happen to enjoy how everything is spread out, because I can get exactly what I need and nothing else. For example, I actually do get keywords, but not language and genres.

I don't think this is a HUGE issue, but it's just something to keep in mind.

@duck7000
Owner

duck7000 commented Jul 3, 2024

Mm still thinking about this

Combining wouldn't help much I guess, let me give an example for my use case:
if rating, metacritic and votes are combined, the method will output an associative array with values. In my program I use those values separately, so I have to call the same function over and over again, thus still making the same api calls

@duck7000
Owner

duck7000 commented Jul 3, 2024

Thanks @GeorgeFive

I also enjoy that everything is split out, so everybody can use exactly what they need. And combining can make that method slower as there is more data to be processed.
And yes, compatibility is an issue after combining

So it may not be the best solution after all I guess.
So sorry @mosource21 but I will keep it like it is: it works great, is simple to understand, and imdb GraphQL does not care about a lot of api calls, they use it on the fly on their own website.

I do appreciate you all for thinking about how to make it better though, so thank you all!

@mosource21
Author

mosource21 commented Jul 4, 2024

No problem that is why I asked. I can't argue it does work great.

I may have a try myself though. Are there any resources on the basics of GraphQL and the specifics for IMDB (field name list) that you use?

Can I just change the query name to whatever I like and add more fields like below and it should just work, or do things like the query name (NewRating in my example below) need to be something specific to IMDB to work etc?

$query = <<<EOF
    query NewRating(\$id: ID!) {
      title(id: \$id) {
        ratingsSummary {
          aggregateRating
          voteCount
        }
      }
    }
EOF;

@duck7000
Owner

duck7000 commented Jul 4, 2024

Documentation is here
https://developer.imdb.com/documentation/api-documentation/sample-queries/title-name/?ref_=side_nav

QueryName is indeed anything you like as long as you use the same name in the api request ("TitleYear" must be the same as queryName)
$data = $this->graphql->query($query, "TitleYear", ["id" => "tt$this->imdbID"]);

You can use multiple queries like in your example.
Any spaces inside the query are stripped off so don't worry about that, but keep the structure humanly readable.
EDIT: I reverted my changes about query layout (GeorgeFive did have issues with it) so start your queries from the left without spaces and indent with 2 spaces

I really doubt combining would benefit speed but it could make some difference I suppose

If you really want to know everything about how GraphQL works you have to study a lot though, it is a lot!
So the easiest is to re-use code that is already there and working
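
As a rough illustration of the points above, the sketch below builds one query that asks for the rating, the vote count and the Metacritic score together, then reads the three values out of a decoded response. The operation name `RatingCombined`, the `metacritic { metascore { score } }` selection, the helper `extractRatings` and the sample JSON are assumptions made for illustration, not confirmed IMDb schema or library code. The network call is replaced by a hard-coded response so the snippet runs on its own.

```php
<?php
// Hypothetical combined query. The operation name is arbitrary, but it must
// match the name that is passed along with the query when it is sent.
$operationName = "RatingCombined";
$query = <<<EOF
query $operationName(\$id: ID!) {
  title(id: \$id) {
    ratingsSummary {
      aggregateRating
      voteCount
    }
    metacritic {
      metascore {
        score
      }
    }
  }
}
EOF;

// Stand-in for the decoded API response (made-up values, assumed shape).
$sampleJson = '{"title":{"ratingsSummary":{"aggregateRating":8.1,"voteCount":12345},'
    . '"metacritic":{"metascore":{"score":74}}}}';
$data = json_decode($sampleJson);

/**
 * Pull the three rating values out of one response, defaulting to 0
 * when a field is missing.
 */
function extractRatings(object $data): array
{
    $title = $data->title ?? null;
    return [
        'rating' => $title->ratingsSummary->aggregateRating ?? 0,
        'votes' => $title->ratingsSummary->voteCount ?? 0,
        'metacritic' => $title->metacritic->metascore->score ?? 0,
    ];
}

$ratings = extractRatings($data);
```

In the library itself the query would presumably be sent the same way as the existing ones, with the operation name as the second argument; only the parsing step is shown here.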

@mosource21 mosource21 closed this as not planned Jul 4, 2024
@duck7000
Owner

duck7000 commented Jul 5, 2024

@mosource21 @GeorgeFive
I'm still thinking about this combining thing

I can for example add titleYear() to __construct() so that the basic info is always available?
It will still be callable through separate methods but saves excessive calls to the same method over and over again

Example
If I want title, movieType and year, we (in the current state) call those separate methods, each calling titleYear(), so titleYear() runs 3 times for the same info. This is stupid.

We can add other basic info that anyone would want to titleYear(), accessible through separate methods

So combining stuff might not be a bad idea after all?

Basic info could/would be:
title
original title
year
endYear
movieType
language
country
genre
plotoutline?
runtime?
principalCredits
rating
metacritic?
votes?
Rank?
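
The duplicate-call problem described above could also be avoided without touching the constructor: fetch once on first use and let the separate getters read from the stored result. The sketch below is only a minimal illustration of that idea. The class `LazyTitle`, its injected `$fetcher` callable and the field names are hypothetical and not part of the library.

```php
<?php
class LazyTitle
{
    /** @var callable hypothetical stand-in for the GraphQL request */
    private $fetcher;

    /** @var array|null basic info, null until it has been fetched once */
    private $basic = null;

    public function __construct(callable $fetcher)
    {
        $this->fetcher = $fetcher;
    }

    /** Fetch the basic info on first use only, then reuse the stored copy. */
    private function basicInfo(): array
    {
        if ($this->basic === null) {
            $this->basic = ($this->fetcher)();
        }
        return $this->basic;
    }

    public function title(): string
    {
        return $this->basicInfo()['title'];
    }

    public function year(): int
    {
        return $this->basicInfo()['year'];
    }

    public function movieType(): string
    {
        return $this->basicInfo()['movieType'];
    }
}

// Count how often the (fake) request runs.
$calls = 0;
$lazy = new LazyTitle(function () use (&$calls) {
    $calls++;
    return ['title' => 'Example Title', 'year' => 2002, 'movieType' => 'Movie'];
});

// Three getters, one underlying fetch.
$summary = $lazy->title() . ' (' . $lazy->year() . ', ' . $lazy->movieType() . ')';
```

Nothing is requested until a getter is actually called, so users who never need the basic info pay nothing for it.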

@duck7000 duck7000 reopened this Jul 5, 2024
@GeorgeFive

Not a bad idea... it won't do much for me (this would save me two calls total), but if it helps others, I'm definitely for it.

@duck7000
Owner

duck7000 commented Jul 6, 2024

In my case this will save a total of 9 calls as i use all.
So yes it is worth the trouble i guess.

I'll leave this on the back burner for now

@mosource21
Author

My original aim with this question was to try to reduce the "load" on the IMDB servers and to some degree speed up things at my end as each request has an overhead for me.

If IMDB use the GraphQL interface themselves on the website I guess it may not be a huge priority, as this library's usage would just be a drop in the ocean, but I would be very surprised if there isn't a limit on the number of requests before IMDB notice and temporarily block access.

Example If i want title, movieType and year we (in the current state) call those separate methods each calling titleYear() 3 times for the same info, this is stupid.

Strictly speaking you should always have caching enabled. I do not see any reason why you need "live" information from IMDB. If you have the cache enabled it doesn't really matter calling titleYear() three times as the second and third requests should use the cache.

Even if you cache requests for a day (or even a few hours if you do have a need for more "live" information) it will have a decent impact on the "load" on the IMDB servers and therefore reduce the possibility of them noticing or blocking access. It also has the benefit of speeding up the library as any repeat requests will be served from the cache.
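
To show the effect of a time-limited cache on repeat requests, here is a small self-contained sketch. It does not use the library's own cache (which accepts any PSR-16 implementation). `TtlCache`, `remember()` and the injected `$now` timestamp are hypothetical names used only to keep the example runnable and deterministic.

```php
<?php
final class TtlCache
{
    /** @var array<string, array{value: mixed, expires: int}> */
    private $items = [];

    /**
     * Return the cached value for $key if it has not expired yet,
     * otherwise run $producer, store its result for $ttlSeconds and return it.
     * $now can be injected so the behaviour is testable without sleeping.
     */
    public function remember(string $key, int $ttlSeconds, callable $producer, ?int $now = null)
    {
        $now = $now ?? time();
        if (isset($this->items[$key]) && $this->items[$key]['expires'] > $now) {
            return $this->items[$key]['value'];
        }
        $value = $producer();
        $this->items[$key] = ['value' => $value, 'expires' => $now + $ttlSeconds];
        return $value;
    }
}

// Fake request that counts how often it is really executed.
$requests = 0;
$fetch = function () use (&$requests) {
    $requests++;
    return ['title' => 'Example Title'];
};

$cache = new TtlCache();
// Key derived from the operation name and variables, so different titles do not collide.
$key = md5('TitleYear' . json_encode(['id' => 'tt0285331']));

$cache->remember($key, 3600, $fetch, 1000); // miss: runs the request
$cache->remember($key, 3600, $fetch, 2000); // hit: served from the cache
$cache->remember($key, 3600, $fetch, 5000); // expired after one hour: runs the request again
```

With a one-hour lifetime only two of the three calls reach the fake request. The same reasoning applies to calling titleYear() several times for one title.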

We can add other basic info that anyone would want to titleYear() accessible through separate methods
So combining stuff might not be a bad idea after all?
Basic info could/would be: title original title year endYear movieType language country genre plotoutline? runtime? principalCredits rating metacritic? votes? Rank?

The main problem is everyone will have different needs.

I have combined some of the more basic requests and achieved a significant improvement for my specific usage - this is not guaranteed and will all depend on how you are using the library. It will make updating the library a bit more difficult but not much so as my changes are quite modular.

I think an argument can be made for combining similar things (see original proposals/suggestions) but take it too far and combine everything you are sending a massive GraphQL query which may not be the best thing to do.

I can for example add titleYear() to __construct() so that the basic info is always available? It still will be callable through separate methods but saves excessive calls to the same method over and over again

See above - for me I would just end up having to comment out this line in __construct as I have my own implementation for the data normally retrieved by titleYear().

@duck7000
Owner

Strictly speaking you should always have caching enabled. I do not see any reason why you need "live" information from IMDB. If you have the cache enabled it doesn't really matter calling titleYear() three times as the second and third requests should use the cache.

This is a fair point of course!
For my use case i need/want "live" info as i only need it to add a specific movie to my program so i want the latest info.
But caching is a good solution for calling titleYear() 3 times

And yes we all have different use cases and different needs.

So i think that our main conclusion is that this is, for the main use of this library, not the way to go.
Caching will be more beneficial so we might put more focus on that.
If we do combine stuff it will be like titleYear() with separate methods and use caching to avoid too many requests to imdb.

For now this will be on the backburner but i'll keep it in mind.

Thanks everyone for your input!

@duck7000 duck7000 added enhancement New feature or request question Further information is requested labels Jul 28, 2024
@duck7000
Owner

@GeorgeFive , @mosource21

I'm thinking again about this and came up with this idea

Would it be of any interest to make a second Title class (TitleCombined for example) and use that to make combined methods?
This class would contain only combined methods in the form of categorized items like rating, main, didYouKnow etc.

Still it would be not suitable for everyone but it might be a start?

@GeorgeFive

I don't think it's a bad idea, I don't see a problem there. Could be interesting!

@duck7000
Owner

Okay i'm going to create it.

I need some input as to what methods need/can be combined
@mosource21 already gave some examples

@duck7000
Owner

@mosource21
@GeorgeFive

I have made a new class, TitleCombined. It contains (for now) one main method that fetches the main info of a movie or series. It returns the data that is visible on IMDb movie pages inside the black background part at the top (minus the mpaa)
It is not completely finished yet but this will be the basic idea

<?php
#############################################################################
# imdbGraphQLPHP                                 ed (github user: duck7000) #
# written by Ed                                                             #
# ------------------------------------------------------------------------- #
# This program is free software; you can redistribute and/or modify it      #
# under the terms of the GNU General Public License (see doc/LICENSE)       #
#############################################################################

namespace Imdb;

use Psr\Log\LoggerInterface;
use Psr\SimpleCache\CacheInterface;

/**
 * A title on IMDb
 * @author Ed
 * @copyright (c) 2024 Ed
 */
class TitleCombined extends MdbBase
{

    protected $main = array();
    protected $mainCreditsPrincipal = array();
    protected $mainPoster = "";
    protected $mainPosterThumb = "";
    protected $mainPlotoutline = "";
    protected $mainMovietype = "";
    protected $mainTitle = "";
    protected $mainOriginalTitle = "";
    protected $mainYear = -1;
    protected $mainEndYear = -1;
    protected $mainRating = 0;
    protected $mainGenres = array();
    protected $mainRuntime = 0;

    /**
     * @param string $id IMDb ID. e.g. 285331 for https://www.imdb.com/title/tt0285331/
     * @param Config $config OPTIONAL override default config
     * @param LoggerInterface $logger OPTIONAL override default logger `\Imdb\Logger` with a custom one
     * @param CacheInterface $cache OPTIONAL override the default cache with any PSR-16 cache.
     */
    public function __construct($id, Config $config = null, LoggerInterface $logger = null, CacheInterface $cache = null)
    {
        parent::__construct($config, $logger, $cache);
        $this->setid($id);
    }

    /**
     * This method will only get main values of a imdb title
     */
    public function main()
    {
        $query = <<<EOF
query TitleYear(\$id: ID!) {
  title(id: \$id) {
    titleText {
      text
    }
    originalTitleText {
      text
    }
    titleType {
      text
    }
    releaseYear {
      year
      endYear
    }
    primaryImage {
      url
      width
      height
    }
    runtime {
      seconds
    }
    ratingsSummary {
      aggregateRating
    }
    titleGenres {
      genres {
        genre {
          text
        }
        subGenres {
          keyword {
            text {
              text
            }
          }
        }
      }
    }
    plot {
      plotText {
        plainText
      }
    }
    principalCredits {
      credits {
        name {
          nameText {
            text
          }
          id
        }
        category {
          text
        }
      }
    }
  }
}
EOF;
        $data = $this->graphql->query($query, "TitleYear", ["id" => "tt$this->imdbID"]);

        $this->mainTitle = trim(str_replace('"', ':', trim($data->title->titleText->text, '"')));
        $this->mainOriginalTitle  = trim(str_replace('"', ':', trim($data->title->originalTitleText->text, '"')));
        $this->mainMovietype = isset($data->title->titleType->text) ? $data->title->titleType->text : '';
        $this->mainYear = isset($data->title->releaseYear->year) ? $data->title->releaseYear->year : '';
        $this->mainEndYear = isset($data->title->releaseYear->endYear) ? $data->title->releaseYear->endYear : null;
        if ($this->mainYear == "????") {
            $this->mainYear = "";
        }
        $this->mainRuntime = isset($data->title->runtime->seconds) ? $data->title->runtime->seconds / 60 : 0;
        $this->mainRating = isset($data->title->ratingsSummary->aggregateRating) ? $data->title->ratingsSummary->aggregateRating : 0;
        $this->mainPlotoutline = isset($data->title->plot->plotText->plainText) ? $data->title->plot->plotText->plainText : "";
        
        // Image
        $this->populatePoster($data);

        // Genres
        $this->genre($data);

        // Credits
        $this->principalCredits($data);

        $this->main = array(
            'title' => $this->mainTitle,
            'originalTitle' => $this->mainOriginalTitle,
            'imdbid' => $this->imdbID,
            'movieType' => $this->mainMovietype,
            'year' => $this->mainYear,
            'endYear' => $this->mainEndYear,
            'imgThumb' => $this->mainPosterThumb,
            'imgFull' => $this->mainPoster,
            'runtime' => $this->mainRuntime,
            'rating' => $this->mainRating,
            'genre' => $this->mainGenres,
            'plotoutline' => $this->mainPlotoutline,
            'credits' => $this->mainCreditsPrincipal
        );
        return $this->main;
    }


    #========================================================[ Helper functions ]===
    #===============================================================================

    #========================================================[ photo/poster ]===
    /**
     * Setup cover photo (thumbnail and big variant)
     * @see IMDB page / (TitlePage)
     */
    private function populatePoster($data)
    {
        if (isset($data->title->primaryImage->url) && $data->title->primaryImage->url != null) {
            $fullImageWidth = $data->title->primaryImage->width;
            $fullImageHeight = $data->title->primaryImage->height;
            $newImageWidth = 190;
            $newImageHeight = 281;
            $img = str_replace('.jpg', '', $data->title->primaryImage->url);
            $parameter = $this->resultParameter($fullImageWidth, $fullImageHeight, $newImageWidth, $newImageHeight);
            
            // thumb image
            $this->mainPosterThumb = $img . $parameter;
            
            // full image
            $this->mainPoster = $img . 'QL100_SX1000_.jpg';
        }
    }

    /**
     * Calculate the total result parameter and determine if SX or SY is used
     * @param int $fullImageWidth the width in pixels of the large original image
     * @param int $fullImageHeight the height in pixels of the large original image
     * @param int $newImageWidth the width in pixels of the desired cropped/resized thumb image
     * @param int $newImageHeight the height in pixels of the desired cropped/resized thumb image
     * @return string example 'QL75_SX190_CR0,15,190,281_.jpg'
     * QL75 = Quality Level (100 is the highest, 0 the lowest quality)
     * SX190 = S (scale) X190 desired width
     * CR = Crop (crop left and right, crop top and bottom, New width, New Height)
     * @see IMDB page / (TitlePage)
     */
    private function resultParameter($fullImageWidth, $fullImageHeight, $newImageWidth, $newImageHeight)
    {
        // original source aspect ratio
        $ratio_orig = $fullImageWidth / $fullImageHeight;

        // new aspect ratio
        $ratio_new = $newImageWidth / $newImageHeight;

        // check if the image must be treated as SX or SY
        if ($ratio_new < $ratio_orig) {
            $cropParameter = $this->thumbUrlCropParameter($fullImageWidth, $fullImageHeight, $newImageWidth, $newImageHeight);
            return 'QL75_SY' . $newImageHeight . '_CR' . $cropParameter . ',0,' . $newImageWidth . ',' . $newImageHeight . '_.jpg';
        } else {
            $cropParameter = $this->thumbUrlCropParameterVertical($fullImageWidth, $fullImageHeight, $newImageWidth, $newImageHeight);
            return 'QL75_SX' . $newImageWidth . '_CR0,' . $cropParameter . ',' . $newImageWidth .',' . $newImageHeight . '_.jpg';
        }
    }

    /**
     * Calculate if cropValue has to be rounded to the previous or next even integer
     * @param float $totalPixelCropSize how many pixels in total need to be cropped
     * @return float even number of pixels to crop in total
     */
    private function roundInteger($totalPixelCropSize)
    {
        if ((($totalPixelCropSize - floor($totalPixelCropSize)) < 0.5)) {
            // Nearest even integer (2 * round() can land on either side)
            $num = 2 * round($totalPixelCropSize / 2.0);
        } else {
            // Next even integer
            $num = ceil($totalPixelCropSize);
            $num += $num % 2;
        }
        return $num;
    }

    /**
     * Calculate HORIZONTAL (left and right) crop value for primary, cast, episode, recommendations and mainphoto images
     * Output is for portrait images!
     * @param int $fullImageWidth the width in pixels of the large original image
     * @param int $fullImageHeight the height in pixels of the large original image
     * @param int $newImageWidth the width in pixels of the desired cropped/resized thumb image
     * @param int $newImageHeight the height in pixels of the desired cropped/resized thumb image
     * @return float|int number of pixels to crop from each side (never negative)
     * @see IMDB page / (TitlePage)
     */
    private function thumbUrlCropParameter($fullImageWidth, $fullImageHeight, $newImageWidth, $newImageHeight)
    {
        $newScalefactor = $fullImageHeight / $newImageHeight;
        $scaledWidth = $fullImageWidth / $newScalefactor;
        $totalPixelCropSize = $scaledWidth - $newImageWidth;
        $cropValue = max($this->roundInteger($totalPixelCropSize)/2, 0);
        return $cropValue;
    }

    /**
     * Calculate VERTICAL (top and bottom) crop value for primary, cast, episode and recommendations images
     * Output is for landscape images!
     * @param int $fullImageWidth the width in pixels of the large original image
     * @param int $fullImageHeight the height in pixels of the large original image
     * @param int $newImageWidth the width in pixels of the desired cropped/resized thumb image
     * @param int $newImageHeight the height in pixels of the desired cropped/resized thumb image
     * @return float|int number of pixels to crop from top and bottom (never negative)
     * @see IMDB page / (TitlePage)
     */
    private function thumbUrlCropParameterVertical($fullImageWidth, $fullImageHeight, $newImageWidth, $newImageHeight)
    {
        $newScalefactor = $fullImageWidth / $newImageWidth;
        $scaledHeight = $fullImageHeight / $newScalefactor;
        $totalPixelCropSize = $scaledHeight - $newImageHeight;
        $cropValue = max($this->roundInteger($totalPixelCropSize)/2, 0);
        return $cropValue;
    }
    
    #--------------------------------------------------------------[ Genre(s) ]---
    /** Get all genres the movie is registered for
     * @return array genres (array[0..n] of mainGenre| string, subGenre| array())
     * @see IMDB page / (TitlePage)
     */
    private function genre($data)
    {
        if (empty($this->mainGenres)) {
            if (isset($data->title->titleGenres->genres) && !empty($data->title->titleGenres->genres)) {
                foreach ($data->title->titleGenres->genres as $edge) {
                    $subGenres = array();
                    if (isset($edge->subGenres) && !empty($edge->subGenres)) {
                        foreach ($edge->subGenres as $subGenre) {
                            $subGenres[] = $subGenre->keyword->text->text;
                        }
                    }
                    $this->mainGenres[] = array(
                        'mainGenre' => $edge->genre->text,
                        'subGenre' => $subGenres
                    );
                }
            }
        }
    }
    
    #=====================================================[ /fullcredits page ]===
    #----------------------------------------------------------------[ PrincipalCredits ]---
    /**
     * Get the PrincipalCredits for this title
     * @return array creditsPrincipal[category][Director, Writer, Creator, Stars] (array[0..n] of array[name,imdbid])
     * Not all categories are always available, TV series has Creator instead of writer
     */
    private function principalCredits($data)
    {
        // Only populate once, and only when the response actually contains credits
        if (empty($this->mainCreditsPrincipal) && !empty($data->title->principalCredits)) {
            foreach ($data->title->principalCredits as $value) {
                // Skip categories that come back without any credits
                if (empty($value->credits)) {
                    continue;
                }
                $cat = $value->credits[0]->category->text;
                // Actors and actresses are grouped under one "Star" category
                if ($cat == "Actor" || $cat == "Actress") {
                    $category = "Star";
                } else {
                    $category = $cat;
                }
                $temp = array();
                foreach ($value->credits as $key => $credit) {
                    $temp[] = array(
                        'name' => isset($credit->name->nameText->text) ? $credit->name->nameText->text : '',
                        'imdbid' => isset($credit->name->id) ? str_replace('nm', '', $credit->name->id) : ''
                    );
                    // Keep at most three names per category
                    if ($key == 2) {
                        break;
                    }
                }
                $this->mainCreditsPrincipal[$category] = $temp;
            }
        }
    }

}

@mosource21
Author

I think this idea is near perfect.

The functionality is focused (i.e. you are aiming for black background part at the top) and as everyone has different needs it can then be used as an example / template for customizing.

The only slight disadvantage is if you need to fix / update / enhance Title.php you also need to update TitleCombined.php so there is an admin overhead.

A couple of minor suggestions (untested):

1. Update TitleYear in both these lines to TitleCombined

query TitleYear(\$id: ID!) {
$data = $this->graphql->query($query, "TitleYear", ["id" => "tt$this->imdbID"]);

2. Include the canonicalId so you can check that the imdbid hasn't been updated if you want

meta {
  canonicalId
}
$this->mainCanonicalId = isset($data->meta->canonicalId) ? $data->meta->canonicalId : "tt$this->imdbID";
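
To make the intended check concrete, here is a small sketch of how the canonical ID could be compared with the requested ID. It assumes that `meta` is selected inside `title`, so the value would sit at `$data->title->meta->canonicalId` rather than directly under `$data`. The helper name `resolveCanonicalId` and the sample responses are made up for illustration and are untested against the real API.

```php
<?php
/**
 * Return the canonical ID for a title and whether it differs from the one requested.
 * Falls back to the requested ID when the response carries no canonicalId.
 */
function resolveCanonicalId(?object $data, string $imdbID): array
{
    $requested = "tt$imdbID";
    $canonical = $data->title->meta->canonicalId ?? $requested;
    return [
        'canonicalId' => $canonical,
        'redirected' => $canonical !== $requested,
    ];
}

// Made-up responses: unchanged ID, redirected ID, and a response without meta.
$same = resolveCanonicalId(json_decode('{"title":{"meta":{"canonicalId":"tt0285331"}}}'), '0285331');
$moved = resolveCanonicalId(json_decode('{"title":{"meta":{"canonicalId":"tt9999999"}}}'), '0285331');
$missing = resolveCanonicalId(json_decode('{"title":{}}'), '0285331');
```

Returning the `redirected` flag next to the ID lets the caller decide whether to update a stored imdbid, without a separate checkRedirect call.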

@duck7000
Owner

duck7000 commented Sep 30, 2024

@mosource21

Thanks for your comments!

But I don't understand what you mean at point 1?
I do want to keep those classes separated so they don't depend on each other. But yes it is slightly more work to update both classes.

Point 2:
I will add it, as this is indeed something that everyone probably needs anyway

@mosource21
Author

But i don't understand what you mean at point 1?

Nothing of major importance purely a code readability suggestion.

Change the name of the GraphQL query from TitleYear to something more descriptive - TitleCombined (could equally be just Combined, BlackBox etc but TitleCombined seems the best fit in my mind 😃)

        $query = <<<EOF
query TitleCombined(\$id: ID!) {
  title(id: \$id) {
    titleText {
...
<snip>
...
}
EOF;
        $data = $this->graphql->query($query, "TitleCombined", ["id" => "tt$this->imdbID"]);

@duck7000
Owner

duck7000 commented Oct 1, 2024

Ah i understand what you mean now, i will change it to something more meaningful, thanks.

@duck7000
Owner

duck7000 commented Oct 1, 2024

@mosource21
@GeorgeFive
I added this new class so check it out if you like and let me know if there are issues or room for improvements


3 participants