Caching for fetched/url datasets #7316
-
In a situation similar to #4146, I would like to be able to embed two or more charts that rely on the same fetched datasets/URLs while minimizing network data exchange. For example, consider these two charts that use the same datasets:

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "data": {"url": "data/unemployment.tsv"},
  "transform": [{"calculate": "slice(datum.id, 0, 2)", "as": "state"}],
  "mark": "bar",
  "encoding": {
    "x": {"field": "rate", "type": "quantitative", "aggregate": "mean"},
    "y": {"field": "state", "type": "ordinal"}
  }
}
```

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "data": {"url": "data/us-10m.json", "format": {"type": "topojson", "feature": "counties"}},
  "transform": [{"lookup": "id", "from": {
    "data": {"url": "data/unemployment.tsv"}, "key": "id", "fields": ["rate"]
  }}],
  "projection": {"type": "albersUsa"},
  "mark": "geoshape",
  "encoding": {"color": {"field": "rate", "type": "quantitative"}}
}
```
So each chart fetches its own data with a parallel request. This could easily scale to a series of many requests to the same resources if more similar charts were added to the page. To avoid the duplicate requests, I know I could fetch the data once and pass it to the charts using the View API, but I don't want to, because I want each chart to encapsulate its own transformation logic and be self-sufficient, for reusability and documentation purposes. So I ended up building the following caching mechanism (a bit crude, but it works). Given a spec, it finds each data object requesting a URL, fetches the raw data through a shared cache, and replaces the URL with inline values:

```js
import { loader as loaderFactory, read } from "vega-loader";
import { iterateDeep } from "./helpers";

const loader = loaderFactory();
const cache = {};

/**
 * Add the loader promise to the cache if needed and return the cached dataset promise.
 * @param {string} url The dataset url to fetch
 * @returns {Promise<string>} A promise of the unparsed dataset from cache
 */
// REF: https://github.com/vega/vega-lite/issues/4146
function syncCache(url) {
  if (cache[url] == null) cache[url] = loader.load(url); // Set cached data
  return cache[url];
}

/**
 * Convert a data object into an inline one holding cached data values.
 * @param {{url?: string, format?: object, values?: string}} dataObj The data object to convert (and cache)
 * @returns {Promise<void>} A promise of conversion success
 */
function convertDataObj(dataObj) {
  return syncCache(/** @type {string} */ (dataObj.url)).then(rawData => {
    // BUG: In some cases, e.g. a lookup transform on JSON data, Vega doesn't parse raw data using format
    const parsedData = read(rawData, dataObj.format); // TODO: Use Vega internals to parse raw data
    delete dataObj.url;
    delete dataObj.format; // TODO: Keep format object
    dataObj.values = parsedData;
  });
}

/**
 * Given a Vega or Vega-Lite spec, convert all datasets with urls into cached inline ones.
 * @param {vegaSpec | vegaLiteSpec} chartSpec The spec with urls to fetch
 * @returns {Promise<vegaSpec | vegaLiteSpec>} A promise of the converted spec with inline data
 */
export function cacheSpec(chartSpec) {
  const newSpec = JSON.parse(JSON.stringify(chartSpec)); // Clone spec
  const promises = []; // Collect all the promises to sync all data replacements
  iterateDeep(newSpec, (key, obj) => {
    if (key === "url") {
      promises.push(
        // Replace each data object with inline data values
        convertDataObj(obj)
      );
    }
  });
  // Sync all promises and return the spec with inline data values
  return Promise.all(promises).then(() => newSpec);
}
```

Now I can embed the charts, avoiding multiple calls to the same URLs:

```js
cacheSpec(barSpec).then(cachedSpec => embed("#bar", cachedSpec, chartConfig));
cacheSpec(mapSpec).then(cachedSpec => embed("#map", cachedSpec, chartConfig));
```

And immediately I thought that it would be possible to extend the data definition with a `cache` property, to enable caching with:

```json
{"data": {"url": "data/unemployment.tsv", "cache": true}}
```

or disable it with:

```json
{"data": {"url": "data/us-10m.json", "cache": false, "format": {"type": "topojson", "feature": "counties"}}}
```

If all network requests were handled within a single instance of the Vega loader, which I believe is already happening, I think adding a promise-based caching mechanism might not be too complicated.
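For completeness: the `iterateDeep` helper imported from `./helpers` above is not shown in the snippet. A minimal sketch of what it could look like (an assumption on my part, not the thread author's code — it only needs to visit every key together with its enclosing object, recursively):

```javascript
// Hypothetical sketch of the iterateDeep helper: recursively walk a nested
// object/array and call the visitor with each key and its enclosing object.
function iterateDeep(node, visit) {
  if (node === null || typeof node !== "object") return;
  for (const key of Object.keys(node)) {
    visit(key, node);              // Visit the key within its parent object
    iterateDeep(node[key], visit); // Recurse into the nested value
  }
}
```

With a helper like this, `cacheSpec` collects a `convertDataObj` promise for every data object declaring a `url`, wherever it appears in the spec (top-level data, layers, lookup transforms, and so on).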
Replies: 3 comments
-
Thank you for the detailed feature request. However, I will close this request for a few reasons that I hope make sense. Please ask about anything that's unclear.
Again, I appreciate the detailed feature request.
-
Thank you for the immediate and accurate reply.

The problem with the solution above is that the spec has to be cloned every time, and it gets polluted with inline values. Finally, I followed this comment vega/vega#2095 (comment) and refactored all the code to use a custom loader. The result is a much cleaner and lighter solution, without reinventing the wheel:

```js
import { loader } from "vega";

const cache = {};
const cacheLoader = loader();
const originalHttp = cacheLoader.http;

/**
 * Wrap the original http method to use the cache.
 * See {@link https://github.com/vega/vega/tree/master/packages/vega-loader#load_file}.
 * Add the loader promise to the cache if needed and return the cached dataset promise.
 */
cacheLoader.http = function cacheLoaderHttp(url, options) {
  if (cache[url] == null) {
    cache[url] = originalHttp.call(this, url, options); // Set cached data
  }
  return cache[url];
};
```

This allows us to enable the cache by doing:

```js
const chartConfig = {
  loader: cacheLoader
};

embed("#bar", barSpec, chartConfig);
embed("#map", mapSpec, chartConfig);
```

It would be great if you could add this example (or a similar one) to the documentation, to highlight this feature, which could also serve other use cases such as adding dynamic parameters or custom headers to http requests.
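As an illustration of the custom-headers use case mentioned above (a hypothetical sketch, not part of the solution in this thread): the same http-wrapping pattern can inject request options. Here `withAuthHeader` and the bearer token are made-up names for illustration, and `baseLoader` stands in for the object returned by Vega's `loader()`:

```javascript
// Hypothetical sketch: wrap a loader's http method to add an Authorization
// header to every request, merging with any headers the caller passed in.
function withAuthHeader(baseLoader, token) {
  const originalHttp = baseLoader.http;
  baseLoader.http = function (url, options = {}) {
    const merged = {
      ...options,
      headers: { ...(options.headers || {}), Authorization: `Bearer ${token}` }
    };
    return originalHttp.call(this, url, merged);
  };
  return baseLoader;
}
```

The wrapped loader would then be passed to embed the same way as `cacheLoader` above, e.g. `embed("#bar", barSpec, { loader: withAuthHeader(loader(), token) })`.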
-
Thank you for writing up your solution here. I think these issues are actually a good resource that many people use. I converted this issue into a discussion at https://github.com/vega/vega-lite/discussions. If you feel we need this in the docs, please send a pull request.