Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better IDE support #2180

Closed
jayoung-lee opened this issue Aug 1, 2019 · 40 comments
Closed

Better IDE support #2180

jayoung-lee opened this issue Aug 1, 2019 · 40 comments
Labels
type-enhancement A request for a change that isn't a bug

Comments

@jayoung-lee
Copy link

Respondents of the Q2 2019 Flutter user survey asked for better IDE support for developing and using packages, such as “add package” UI, “install the latest package” UI, etc. (Related: #1447)

cc: @mit-mit

@jonasfj jonasfj added the type-enhancement A request for a change that isn't a bug label Aug 5, 2019
@jonasfj
Copy link
Member

jonasfj commented Aug 5, 2019

I understand that "add package UI" would require a pub add <package> command.

What exactly is "install the latest package"? Do you mean pub upgrade or pub upgrade <package> ie. upgrade a specific package. And if so, then what about dependencies?
Should other packages respect the pubspec?

Or do you mean upgrade the version in pubspec.yaml rather than bumping the lock in pubspec.lock?


Note. programmatically modifying YAML is non-trivial if we want to preserve whitespace and comments. We could write a special parser for this, or we could drop all comments and whitespace formatting from the YAML files.

@InMatrix
Copy link

We might need more feedback from users on this issue. One concrete enhancement the community has been working on is autocompleting package names in the IDE as seen here: https://plugins.jetbrains.com/plugin/12693-flutter-enhancement-suite

@jonasfj
Copy link
Member

jonasfj commented Aug 14, 2019

One concrete enhancement the community has been working on is autocompleting package names in the IDE...

I suppose some API on the client or server to search package names would be useful things like this.

@DanTup
Copy link

DanTup commented Oct 24, 2019

I can think of a few features that would make a big difference in VS Code:

Some of these could be done in VS Code by using pub and/or its API directly, though it'd be nice long-term if some of these were handled by the analysis server. It might be possible we could build some already - maybe someone can answer the following questions:

  • Is there an API/location where I could get a list of all pub packages (either a full list, or matching some prefix the user has typed)?
  • I see I can already get all the versions for a package (eg. https://pub.dev/packages/devtools.json). Is this JSON stable, and is there any problem with VS Code calling it (if it's already well-cached because it's used by the pub client, I'm guessing VS Code calling it for completion would be a drop in the ocean?)
  • Is there a way to get a list of all updates for a list of packages efficiently (eg. if a user has 20 packages in pubspec, can we avoid 20 HTTP requests?)

@jonasfj
Copy link
Member

jonasfj commented Oct 29, 2019

Is there an API/location where I could get a list of all pub packages (either a full list, or matching some prefix the user has typed)?

We have no such stable API. And the API we have have not best hardened to handle load from IDEs just yet.

Is there a way to get a list of all updates for a list of packages efficiently (eg. if a user has 20 packages in pubspec, can we avoid 20 HTTP requests?)

I think this is efficient, be sure to make requests concurrently.

I see I can already get all the versions for a package (eg. https://pub.dev/packages/devtools.json). Is this JSON stable...

The API end-points documented in dartlang/pub/doc/repository-spec-v2.md is stable. These are used by the pub client and are heavily cached.


side note: please set a custom user-agent when calling pub.dev/api, something like User-Agent: dartcode/3.6.0 (+https://dartcode.org/) -- that way we can notify you if we're changing the APIs you're using.

@jonasfj
Copy link
Member

jonasfj commented Oct 29, 2019

@DanTup, is it really only a list of all packages that we need to make a stable API for?
I assume we can return the entire list in a single JSON response, gzipped and support conditional requests like If-None-Match: <previous etag> to enable easy client-side caching.

This wouldn't be hard to add. Would we need anything else?

@DanTup
Copy link

DanTup commented Oct 29, 2019

is it really only a list of all packages that we need to make a stable API for?
I assume we can return the entire list in a single JSON response, gzipped and support conditional requests like If-None-Match: <previous etag> to enable easy client-side caching.

I don't know how big a list like this would be, but this seems like the most sensible solution to me (we wouldn't want to send HTTP requests during typing to have the filtering done on the server if we could just cache it locally and filter it there).

cc @pq in case he has any opinions about this for IntelliJ.

The API end-points documented in dartlang/pub/doc/repository-spec-v2.md is stable. These are used by the pub client and are heavily cached.

Perfect!

please set a custom user-agent when calling pub.dev/api, something like User-Agent: dartcode/3.6.0 (+https://dartcode.org/)

Will do, thanks!

@jonasfj
Copy link
Member

jonasfj commented Oct 29, 2019

I don't know how big a list like this would be, but this seems like the most sensible solution to me.

Some math on scalability:

  • We currently have around 9k packages (unique package names),
  • JSON list of all 9k package names is 150kb
  • Gzipped the JSON list of 9k package names is 55kb

Thus, 10x or even 100x would still be fine. At 1000x this would start to become problematic and the API would have to be redesigned. So an API that returns a list of all package names might be reasonable for now.

If the client downloads the list of package names once per day, this should be fine in terms of load.

@DanTup
Copy link

DanTup commented Oct 29, 2019

That all sounds reasonable to me :)

@pq
Copy link
Member

pq commented Oct 31, 2019

Thanks for looping me in! Pub package discovery is something we've ruminated about off and on for a long time.

. cc @pq in case he has any opinions about this for IntelliJ.

My gut says this is a service we'd want to the analysis server to provide. Or certainly not implemented (and re-implemented) in clients. (Having it in DAS would play most naturally with improved code completion.)

@DanTup
Copy link

DanTup commented Oct 31, 2019

My gut says this is a service we'd want to the analysis server to provide

Yeah, I think this makes sense too. I was considering building a PoC directly in VS Code as an easier way to try it out - however the YAML-parsing options don't look too hot, and since I think the server already has a good parser, it might end up being easier to do there anyway.

@DanTup
Copy link

DanTup commented Nov 19, 2020

@jonasfj I've been thinking about this again lately (the server now has some basic support for code-completion in YAML, so it would fit in nicely there).

Previously I mentioned getting just the package names, though some of the metadata (for example deprecated packages, descriptions) could also be quite useful to include in the completion data (a list of only package names alone would still be a good step though).

Is there any way for me to get that data today, even if only for building a proof-of-concept (and not shipping) for now?

@jonasfj
Copy link
Member

jonasfj commented Nov 20, 2020

@DanTup
Yes, there is the version listing API: https://github.com/dart-lang/pub/blob/master/doc/repository-spec-v2.md
This is fairly stable and unlikely to change.

For listing all package names there: https://pub.dev/api/packages?compact=1 -- this is NOT stable.
If this is useful for something I think we could build an API that is stable and supplies all package names.

Once you have package name, using the version listing API to get the pubspec with package description, list of dependencies, and what versions of the package exists is easy.

Side note: we recently added dart pub add <package> which will find the latest version of <package> that is compatible with your other dependencies, and add a dependency upon it, like foo: ^1.2.3. We can't really add that functionality in the analysis server because this does package resolutions. If we wanted that we would probably need to find a way for pub to collaborate with analysis server -- but just doing completion for package names and showing package descriptions is probably a good start :D

@DanTup
Copy link

DanTup commented Nov 23, 2020

@jonasfj great, I'll have a look - thanks!

For listing all package names there: https://pub.dev/api/packages?compact=1 -- this is NOT stable.

There are a lot of packages I think have been deleted that show up there (eg. search for "film") that maybe should be filtered out?

@DanTup
Copy link

DanTup commented Nov 23, 2020

@jonasfj I had a quick play with the (unstable) API mentioned above, and managed to get something working:

Screenshot 2020-11-23 at 16 32 26

It's currently just package names though, it'd be nice to have descriptions there. I think there are two options:

  1. We instead use an API (that doesn't exist yet) that allows us to fetch package names + descriptions
  2. We lazily fetch descriptions as the user moves through the list

The first would significantly increase the initial payload, but would avoid additional requests as the user moves the selection through the list. The second would add many more requests, but avoid transferring a huge amount of data for the descriptions that might never be seen. What do you think would be best?

Assuming the second would be best (and therefore the package-name-list API meets the needs), what would be involved in getting to a stable API to the point you'd be happy with the server using it?

FWIW, some other minor things I noticed about the list that may or may not be desired:

  • Deleted packages (mentioned above)
  • Casing is inconsistent (this may be by design, though I was surprised they weren't all lowercase)
  • Sorting is case-sensitive (so those starting with caps are at the top)
  • Some packages start with underscore - not sure if they're intended to be shown (I can find them on pub too, but they look weird when at the top of the list)

@jonasfj
Copy link
Member

jonasfj commented Nov 23, 2020

Deleted packages (mentioned above)

This is a relic from spam fighting, they're with-held pending eventual deletion.

Casing is inconsistent (this may be by design, though I was surprised they weren't all lowercase)

Legacy from when pub.dartlang.org allowed package names with mixed cases.

Some packages start with underscore - not sure if they're intended to be show

Should be shown, but we can sort them differently client-side or serverside.
We could also sort them by popularity * pub_points * likes or maybe just sort by popularity (as that's the best indicator of how likely you are to pick the package).

what would be involved in getting to a stable API to the point you'd be happy with the server using it?

My concerns mainly related to

  • Trivial scalability of the API:
    • Number of data-sources we need to combine,
    • Size of the response,
    • Acceptable server staleness, and,
    • Caching of the response on client (can we cache it locally for XX hours/days).
  • Defining a useful API:
    • We don't want to expose generic filtering (too many data-sources, hence, not scalable).
    • We don't want to show:
      • Dart 1.x packages,
      • Unlisted packages,
      • Discontinued packages,
    • We want to return:
      • Package names
      • Package description
      • Package tags (perhaps?)
    • We might not want to show:
      • Packages with incompatible SDK constraints
      • Packages with incompatible packages platforms (perhaps?)

I suppose that if were to return:

  • Package name,
  • Description,
  • Tags,

Sorted by popularity then client-side filtering could apply filtering as appropriate.

Depending on how cool much we can reasonably squeeze into the UI, I suppose we could later add things like:

  • Likes
  • Last Published
  • Latest version
  • Popularity
  • X / Y pub points

I'm thinking we should try to compile this information and see how big it would be assuming that the number of packages can grow 10x - 100x.

@isoos, any thoughts?

Last thing we need is an API end-point, maybe GET /api/package-name-completion.

@DanTup
Copy link

DanTup commented Nov 23, 2020

I suppose that if were to return:

  • Package name,
  • Description,
  • Tags,

Sorted by popularity then client-side filtering could apply filtering as appropriate.

That sounds perfect to me - though the latest version number would be great too - then we could complete on that too once the package name is selected.

Caching of the response on client (can we cache it locally for XX hours/days).

I think that should be reasonable. If there was a way to get packages since a data (eg. created in last x days) that could allow us to update the cache without having to re-fetch the whole lot?

We might not want to show:

  • Packages with incompatible SDK constraints
  • Packages with incompatible packages platforms (perhaps?)

If the returned packages from the server depend on the project (or values in the pubspec) that might affect how we can cache it. It may be better to do that client side - although it would require having the data to do this, and may make using stale data less ideal. If this is complicated, it may be better as a future enhancement - the benefit is likely much lower than basic package name/description/version completion.

@isoos
Copy link
Contributor

isoos commented Nov 23, 2020

Sorted by popularity

I think the "default" sort order of pub.dev (which is currently the combination of score, likes and popularity) would be a more consistent experience.

@jonasfj
Copy link
Member

jonasfj commented Nov 23, 2020

I did a little quick and dirty hack, fetching all pubspec and compiling data from public APIs (for ease of debugging).

If we have:

  {
    "package": "retry",
    "description": "Utility for wrapping an asynchronous function in automatic retry logic with\nexponential back-off, useful when making requests over network.\n",
    "tags": [
      "sdk:dart",
      "sdk:flutter",
      "platform:android",
      "platform:ios",
      "platform:windows",
      "platform:linux",
      "platform:macos",
      "platform:web",
      "runtime:native-aot",
      "runtime:native-jit",
      "runtime:web"
    ],
    "latest": "3.0.1",
    "updated": "2020-07-01T17:29:53.216548Z",
    "likes": 90
  },

Then size is:

  • Raw size: 4788126 bytes
  • Gzip size: 957759 bytes
    (assuming 18060 packages)

That's 1MB, I think we have to cut a few things away :D

@DanTup
Copy link

DanTup commented Nov 23, 2020

If this API is specifically for completion, there's probably not much value in likes/update. Tags could probably also be squashed into something simpler (I wouldn't expect to filter except maybe for Dart vs Flutter). I don't know if that makes a big enough difference though?

@jonasfj
Copy link
Member

jonasfj commented Nov 23, 2020

I cut the description to 160 characters, as this is recommended max, but it still gets somewhat large.

  • package (90k gzip / 335k raw)
  • package, description (548k gzip / 2M raw)
  • package, description, tags, updated, likes (897k gzip / 4.4M raw)
  • package, description, tags (612k gzip / 3.6M raw)
  • package, description, tags, likes (642k gzip / 3.7M raw)

If we use all the tricks in the book, single letter for property names in JSON, encode dates as unix-timestamp, and encode tags as a bitfield 🙈, then we get something like:

  • package, description, tags, updated, likes (693k gzip -9 / 2.2M raw)

(Just having a little fun, I don't think encoding tags a bitfield qualifies as a good idea, hehe)


Doing just package is definitely feasible, and this is something we should have as a stable API end-point anyways. We should clearly do:

GET /api/package-names
{
  "packages": [..., 'retry', ...]
}

Doing package + description is perhaps the most attractive option for completion. But if I am to imagine that we had 10x as many packages, I would really prefer it if we designed the API to be paging. What if we did something like:

GET /api/package-completion-data
{
  "packages": [
    {
      "package": "retry",
      "description": "..."
    }, ...
  ],
  "nextPage": "https://pub.dartlang.org/api/package-completion-data?token=<...>"
}

If we sort packages by a "combination of score, likes and popularity" (as @isoos suggested), then the first page is always going to be the most relevant. And whether it contains 1k packages or 10k packages doesn't matter so much.

@DanTup:

  • (A) How would an incomplete list work with fuzzy matching logic?
  • (B) Does the ordering matter much when doing fuzzy matching?
  • (C) Should we consider a /api/package-completion-data that contains:
    • A list of all package names, and,
    • Descriptions for the N most popular packages (N in the range 1k - 10k).
    • Arguing that we can fetch description on-demand for less popular packages (or just not surface them).
  • (D) Should consider a /api/package-completion-data that contains:
    • A list of name and description for N most popular packages (N in the 1k - 10k).
    • Arguing that completion is less important for less popular packages.
  • (E) We could consider a /api/package-completion-data that only contains package names, then fetch the package description on-demand -- indeed we already have: https://pub.dev/api/packages/retry/versions/3.0.1 for fetching a single pubspec. This is already very fast (especially once server cache is hot), on my internal curl says ~300ms.
  • (F) We could also consider expanding dart pub with a dart pub cache list which lists all packages installed in the local PUB_CACHE. That doesn't make discovering new packages easy, but it does avoid polluting auto-completion with package names the user haven't used before (or at-least used as transitive dependencies).

An alternative could be do an API that takes the first 2 characters of the package name, like:

GET /api/package-completion-data/re
{
  "packages": [
    {
      "package": "retry",
      "description": "..."
    }, ...
  ],
}

Then maybe the client would use /api/package-names first, to get all package names. And when the user hovers a package for more than 200ms we fetch the description for all packages starting with those two letters..

I'm not so sure this is smart, because lots of packages starts with fl (as in flutter_).


though the latest version number would be great too - then we could complete on that too once the package name is selected.

I'm thinking we should cut latestVersion, if we want to do completion the version number, it's probably best to fetch the list of package versions using the existing version listing API. It actually seems perfectly fine to use this when completing versions for a specific package.

Tags could probably also be squashed into something simpler (I wouldn't expect to filter except maybe for Dart vs Flutter)

Yeah, I don't know how rich the interface is... but using icons to represent tags like: platform:android, platform:ios, platform:windows, platform:linux, platform:macos, platform:web, might be very attractive -- ofcourse many packages will have all these tags.

@DanTup
Copy link

DanTup commented Nov 23, 2020

But if I am to imagine that we had 10x as many packages, I would really prefer it if we designed the API to be paging

Is the intention that we'd just immediately page through everything to cache them locally? If so, beside a little longer warm-up time due to multiple requests, I think that'd work just as well to me.

(A) How would an incomplete list work with fuzzy matching logic?

Right now I only provide the list of completions at the time the IDE requests is (so the list is then filtered in the IDE). If the list is incomplete at the time the completion is invoked, some packages would just be missing. I don't think that's a big problem though if we intend to populate the cache before the user is using the completion (eg. when the server starts up, or when a pubspec.yaml file is opened).

Does the ordering matter much when doing fuzzy matching?

For VS Code, the ranking we provide is used before the user has typed anything and then VS Code takes over ranking as they start typing. For the other IDEs, I'm not sure how they rank when there are fuzzy matches (@bwilkerson may know).

Arguing that we can fetch description on-demand for less popular packages (or just not surface them).

My current PoC is fetching the description (+ latest version number) on-demand. Pre-populating some of them from the seed data would be a trivial change. I think it'd may be weird to not have descriptions for the less-used packages though, so I think fetching those on demand would be better than not showing.

(E) We could consider a /api/package-completion-data that only contains package names, then fetch the package description on-demand -- indeed we already have: https://pub.dev/api/packages/retry/versions/3.0.1 for fetching a single pubspec. This is already very fast (especially once server cache is hot), on my internal curl says ~300ms.

This is essentially what I'm currently doing (using the unstable API above for the package names).. it was very fast in my testing, though I don't have the version number so have to get https://pub.dev/api/packages/retry and then read latest field. It may be nice to be able to do something like https://pub.dev/api/packages/retry/versions/latest to avoid fetching the older versions we probably don't care about?

That doesn't make discovering new packages easy, but it does avoid polluting auto-completion with package names the user haven't used before

I think it would be a little strange if the completion list only included packages you'd used before?

And when the user hovers a package for more than 200ms we fetch the description for all packages starting with those two letters

For VS Code, we don't really get to delay this for 200ms. If we want to show the description against the completion we need to fetch it as soon as it's highlighted. We could artificially delay responding for 200ms (and if we're asked to cancel in that time, skip) though it'd make the descriptions appear to populate quite slowly. If we can provide a batch of descriptions for some starting characters instead of only one specific package, that may speed up showing them if the user moves through a few surrounding packages, though whether that's worth adding a new API for, I'm not sure).

I'm thinking we should cut latestVersion, if we want to do completion the version number, it's probably best to fetch the list of package versions using the existing version listing API.

That sounds reasonable to me.

@jonasfj
Copy link
Member

jonasfj commented Nov 24, 2020

Talking with @sigurdm and @isoos today, we've agreed to do a package name listing API in
dart-lang/pub-dev#4282

I think we should do another API similar to that:

GET /api/dart2-package-name-completion-data
{
  "packages": [
    ..., "retry", ... // ordered by popularity * likes * pub points
  ],
}

Where we filter out packages that:

  • only supports Dart 1.x
  • is flagged discontinued,
  • is flagged unlisted.

And then we should introduce an API for getting the description for the latest version:
GET /api/packages/<package>/versions/latest

This is not idea, there is a few other options I outlined in the braindump below... hehe :D

But it's probably better that we ship something simple, that we can maintain over time.
Then if we have time we can file a bug for making a better solution in the future. I don't think we're oppose to an API that returns package names and descriptions for some very large subset of packages (but promising everything might be a bit much).

So let's start simple and over-engineer the solution later :D


Braindump of options I was contemplating

Option (1): A package completion data API

Pros:

  • We have little new APIs to add,
  • Existing version listing API has good (and hot) server-side caching, and is already pretty fast, adding a https://pub.dev/api/packages/<package>/versions/latest API is an easy optimization.
  • We we reasonably believe these APIs will remain stable and scalable in the future.
  • All package names are present, and behavior is easy to understand.

Cons:

  • We don't filter out dart 1.x packages and packages marked discontinued or unlisted.
  • We don't have a useful ordering on package names that is suitable for completion.
  • When hovering package names, fetching the description will lag and be somewhat slow.

Option (2): A package name completion data API

  • We introduce an API for listing package names for completion. This returns:
    • A subset of package names we deem relevant for completion (omitting Dart 1.x packges), and,
    • An ordering of packages names we deem useful for completion.
GET /api/dart2-package-name-completion-data
{
  "packages": [
    ..., "retry", ... // ordered by popularity * likes * pub points
  ],
}
  • Introduce a https://pub.dev/api/packages/<package>/versions/latest API for getting the description of the latest version.

Pros:

  • Existing version listing API has good (and hot) server-side caching, and is already pretty fast, adding a https://pub.dev/api/packages/<package>/versions/latest API is an easy optimization.
  • We we reasonably believe these APIs will remain stable and scalable in the future.
  • All relevant package names are present, and behavior is easy to understand.
  • We will filter out dart 1.x packages and packages marked discontinued or unlisted.
  • We have a useful ordering on packages

Cons:

  • When hovering package names, fetching the description will lag and be somewhat slow.

Option (3): A package completion data API

GET /api/dart2-package-completion-data
{
  "packages": [
    {
      "package": "retry",
      "description": "..."
    }, ...
  ],
}

With following caveats documented in documentation:

  • Clients are encouraged to cache the response for 24 hours,
    • When refreshing the response clients are encouraged to re-use cached response, until they've finished downloading the new response.
  • Clients should cache a 404 response as API end-point is no-longer being offered, Client should cache this response for 24 hours (when Dart 3.x ships, we'll eventually stop supporting this).
  • Server may return incomplete data, only returning the most popular packages, and/or truncating or entirely omitting descriptions from certain packages.
  • Ordering is based on what server thinks is most likely to be useful for clients.

Pros:

  • We get everything we want.

Cons:

  • Behavior of which packages will have a description featured and which will be omitted might feel arbitrary.

I think we can pick between:

  • (1) No useful ordering, quickly implemented, no sanity filtering, has delay fetching description,
  • (2) Useful ordering, rather quick implementation, sanity filtering, has delay fetching description.
  • (3) Everything we want, but old unpopular packages may be ignored.

The behavior might feel arbitrary. But I doubt anyone will notice if unpopular packages, without many likes, that haven't been published recently are omitted.
For example, search for prompter_ and go to page 10 there is nothing interesting here :D

@jonasfj
Copy link
Member

jonasfj commented Nov 24, 2020

@DanTup wdyt? would what I outlined above be a good starting point for vscode?
(when we can do better API end-points packing more data in the future)

@DanTup
Copy link

DanTup commented Nov 25, 2020

@jonasfj that solution sounds good to me. I also think it's only a small change from the PoC I've been working on (which currently uses the unstable API you gave for the list, and the main package API (without /latest) to get the description/version). Changing them should be trivial.

In my testing, the request to fetch the description has been incredibly fast. I'm also caching the version number with the description so we can provide completion on that.

Nov-25-2020 17-06-43

Assuming we'll go with this solution, I have some questions I'm interested in opinions on (+ @bwilkerson @devoncarew):

  1. When should we begin fetching the package names?
    • When initial analysis completes? (may be wasteful if never used)
    • When a pubspec.yaml file is first opened?
    • When completion is first invoked? (may result in a delay, or first completion not returning anything)
  2. Should we cache to disk? (I'm guessing probably). How long should we consider this data good for?
  3. If an IDE is left open for a long time, should we automatically re-fetch the package list eagerly, or wait for another trigger (eg. opening pubspec again)?
  4. For how long should we keep the description/latest-version-number cached?
  5. If we get an error from the API, when/how many times should we retry? If we give up after multiple failures, should we ever retry again if the IDE is left open for a long time?

@bwilkerson
Copy link
Member

  1. When should we begin fetching the package names?

I would start no later than when a pubspec.yaml file is opened. Waiting until completion is requested will almost certainly be too slow on some networks. However, depending on the answer to the next question, we might actually want to fetch the data on start-up.

  1. Should we cache to disk? (I'm guessing probably). How long should we consider this data good for?

Yes, we should cache to disk to improve the UX when there is no network connectivity. Large portions of the world still experience this regularly.

I don't know how frequently the data coming from pub.dev is likely to need to be refreshed, but my understanding is that it should almost exclusively be additive. That is, new data should always be a superset of older data. Given that (and maybe even if that isn't true), I think we should use the cached data until we are able to refresh it. What's there might be incomplete, but it won't be wrong.

  1. If an IDE is left open for a long time, should we automatically re-fetch the package list eagerly, or wait for another trigger (eg. opening pubspec again)?

We know that many of our users leave their IDEs running for multiple days / weeks. I know that I almost never restart my IDE except when there are updates to either the IDE or my OS that force it. All to say, it might be more common than not.

If we cache the data to disk, then we could potentially have a background thread to fetch the data from pub.dev and write it to disk, and then have the main thread watch the file and update its in-memory version when the file changes.

  1. For how long should we keep the description/latest-version-number cached?

I think we should cache it indefinitely, but there might be a negative consequence that I'm not aware of. Is there a cost to presenting outdated data to the user when we are unable to get more up-to-date data?

  1. If we get an error from the API, when/how many times should we retry? If we give up after multiple failures, should we ever retry again if the IDE is left open for a long time?

Exponential backoff is probably a good strategy for this.

@jonasfj
Copy link
Member

jonasfj commented Nov 26, 2020

In my testing, the request to fetch the description has been incredibly fast. I'm also caching the version number with the description so we can provide completion on that.

Wow, yes, this is very fast :D

If we get an error from the API, when/how many times should we retry?

  • If you get a 4xx error don't retry: it may be that we've stopped maintaining the API end-point (could happen in Dart 3, who knows).
  • Use exponential-backoff, I would suggest 5 retries, after that just give up for the remainder of the session (this should be rare, so little need to think much about it, just keep it simple).
  • 5xx -> retry
  • Ensure that your client will follow redirects.
  • Ensure you set: Accept-Encoding: gzip
  • Ensure you set: User-Agent: my-pub-bot/1.2.3 (+https://github.com/<organization>/<repository>) or something like it.
  • When you get a response, check that the content-length matches the payload you recieved (if a content-length is present).

How long should we consider this data good for?

I think we get 10-30 new packages per day. I see little value to refreshing if cached data is no older than 18 hours -- but that's just a number I'm pulling out of thin air :D

There is little reason to refresh frequently, but refreshing the next working day seems like a good idea.

For how long should we keep the description/latest-version-number cached?

Won't you keep the list of all version numbers? Or at-least the list of all major and minor versions (and maybe just the latest patch version), if we don't want to pollute the completion options.

Note: I would suggest our completion strongly favors caret-notation ^1.2.3, and strongly discourages pinning version numbers.

on topic: we get around 100-300 new versions per day, so keeping this for 18 hours is probably reasonable.


We could also control the caching server-side by specifying a Cache-Control header. I would be comfortable setting a high expiration on /api/dart2-package-name-completion-data, but perhaps not as high on /api/packages/<package>/versions/latest because it's not specifically targeting the completion use-case.

@DanTup
Copy link

DanTup commented Nov 26, 2020

If I've interpreted the comments above correctly:

  • Begin fetching once first pubspec.yaml is opened
  • Cache to disk
  • Consider data stale after 18 hours (discard cache on disk if it's older than this, and refresh from server at this frequency if server left running)
  • For failed requests:
    • If 4xx response, give up (will retry after 18 hours)
    • If 5xx response, retry up to 5 times with 1/2/4/8 second delays (after this, revert to the 18 hour refresh)
  • Only show ^x.y.z in completion, not x.y.z

When you get a response, check that the content-length matches the payload you recieved (if a content-length is present).

Is contentLength == body.length a valid comparison here? contentLength says it's in bytes, but String.length says it's the number of UTF-16 code units

Won't you keep the list of all version numbers? Or at-least the list of all major and minor versions (and maybe just the latest patch version), if we don't want to pollute the completion options.

My current plan was only to keep the latest one to avoid a big list (on the assumption that if you're going to add a version, it'll probably be the latest). It also avoids editor ranking messing with the ordering and potentually putting a non-latest version at the top when you start typing (though it's possible other editors don't re-order things like VS Code does here).

Right now the version numbers will only show in VS Code because we only fetch the description/package info in the resolve request. We might need to extend the server API or go async in the completion producer to support this for non-LSP (@bwilkerson - interested in your thoughts on that).

We could also control the caching server-side by specifying a Cache-Control header.

Just for setting max-age (eg. letting the server control the cache time instead of hard-coding 18 hours), or to also using other options (like max-stale/min-fresh)? I don't think any should be particularly complicated, though we would need to handle them specifically.

@jonasfj
Copy link
Member

jonasfj commented Nov 26, 2020

Is contentLength == body.length a valid comparison here? contentLength says it's in bytes, but String.length says it's the number of UTF-16 code units

Good point, it's best if this is handled in the HTTP client. Perhaps just handle FormatException from json.decode and assume that you got an invalid reply if the response is not valid JSON.

My current plan was only to keep the latest one to avoid a big list (on the assumption that if you're going to add a version, it'll probably be the latest).

That might also be a good option, I'm not a UX designer :D

It might be worth having the latest stable and latest prerelease, if the latest prerelease is newer than the latest stable.

On pub.dev we generally say that the latest is the latest stable (if there is a stable release), so if package:foo has both 1.2.3 and 1.3.0-dev, then pub.dev/package/foo will show information for 1.2.3, and a link to the prerelease 1.3.0-dev will be display on the page. So while 1.3.0-dev is semantically greater than 1.2.3 we still prefer to show people the latest stable version.

Example:

5SwywW9TPWSjk4C

@DanTup
Copy link

DanTup commented Nov 27, 2020

It might be worth having the latest stable and latest prerelease, if the latest prerelease is newer than the latest stable.

Makes sense - I think this will mean sticking with /api/packages/retry (and therefore not need /latest that was discussed above)? The data we want is currently just description + latest stable version number + latest pre-release version number.

@jonasfj
Copy link
Member

jonasfj commented Nov 27, 2020

I think this will mean sticking with /api/packages/retry (and therefore not need /latest that was discussed above)?

let's just make that a start, then we can always optimize later by making a /latest or something like it.

The GET /api/packages/<package> API is also used by the pub client so:

  • (A) we're not going to break it,
  • (B) it's cached serverside and the cache is fairly hot.

Only downside is that for packages with a lot of versions the API end-point can become somewhat large. But I think we limit packages to 1000 versions or something like that. So worst case it'll be a few MBs, but that's probably very very rare :D

@DanTup
Copy link

DanTup commented Nov 27, 2020

Sounds good to me. If I've understand correctly, the only thing required from Pub for us to be able to ship this completion in the server is a live/stable equiv of /api/packages?compact=1 (with the above mentioned sorting/filtering)? If the JSON format remains the same, it should just be a case of updating the URL endpoint in my current branch (though it's trivial to change if not).

@isoos
Copy link
Contributor

isoos commented Nov 27, 2020

This has already shipped: https://pub.dev/api/package-names
The format is the same, with the exception of the nextUrl.

@DanTup
Copy link

DanTup commented Nov 27, 2020

This has already shipped

Oh my, thank you! 😀

@jonasfj
Copy link
Member

jonasfj commented Nov 27, 2020

@DanTup, that's doesn't have the ordering, or the filtering we mentioned.

I filed dart-lang/pub-dev#4302 for creating an end-point specifically for code completion of package names.

@DanTup
Copy link

DanTup commented Nov 28, 2020

Got it - I'll hold off landing this work until that API is done then. Thanks!

@jonasfj
Copy link
Member

jonasfj commented Dec 11, 2020

@DanTup checkout: https://pub.dev/api/package-name-completion-data

Note: we created the end-point so that it requires accept-encoding: gzip, and always returns gzip encoded data.

@DanTup
Copy link

DanTup commented Dec 15, 2020

@jonasfj thanks! I've an open change here if you want to review the interaction with the API:

https://dart-review.googlesource.com/c/sdk/+/176120

The related bits are in pub_api.dart.

@sigurdm
Copy link
Contributor

sigurdm commented Mar 31, 2023

Stumbled upon this issue. It seems rather unfocused, and the part about naming completion seems finished.

Should we close it? Or split it into a few more concrete features that we could imagine improves the IDE support.

@jonasfj
Copy link
Member

jonasfj commented Mar 31, 2023

Yeah, it's probably done...

We even have suggestions for version numbers, though it's possible they aren't triggering automatically. But let's close it.

@jonasfj jonasfj closed this as completed Mar 31, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type-enhancement A request for a change that isn't a bug
Projects
None yet
Development

No branches or pull requests

8 participants