
possibly improve performance of json API #3

Open
daslu opened this issue Nov 2, 2014 · 2 comments

daslu commented Nov 2, 2014

(require '[opencpu-clj.core :refer :all]
         '[clojure.core.matrix.dataset :as ds :refer [dataset]]
         '[opencpu-clj.test-support :refer [j]])

(let [server-url "http://localhost:1723"
      rows (->> {:x (vec (range 9999))}
                ds/dataset
                ds/row-maps)]
  (time (call-function-json-RPC server-url "base" "dim" {:x (j rows)})))

(Replace server-url with your local OpenCPU server URL.)

Here is the output on my machine:

   11562.178474 msecs
   {:result (9999 1), :status 200}

This kind of performance warrants some investigation. One suspect for the wasted time is the need to pass the whole dataset as a JSON string.
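To separate the serialization cost from the network cost, one could time the JSON encoding alone. A minimal sketch, assuming Cheshire as the JSON library (which library opencpu-clj actually uses may differ):

```clojure
(require '[cheshire.core :as json])

;; Build the same shape of data as above: 9999 single-column row maps.
(let [rows (mapv (fn [x] {:x x}) (range 9999))]
  ;; Time only the dataset->JSON step, with no server involved.
  (time (count (json/generate-string rows))))
```

If this step alone takes a substantial fraction of the 11.5 seconds, the encoding itself is the bottleneck; otherwise the time is going into the transfer or the server-side work.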

behrica (Owner) commented Nov 2, 2014

Thanks for this measurement.

I did not look at all at performance so far.

The function you used is from the high-level API, which is currently no more than a proof of concept.

The timing you took includes at least these elements:

  • converting the matrix/dataset to JSON
  • sending the JSON over the wire
  • converting the JSON back to an R data.frame

But I agree that it is very slow.

In any case, the low-level function used to execute R functions ("object") will have the same performance when parameters are passed as so-called "inline JSON".
OpenCPU allows inline JSON as one of a few parameter formats; see https://www.opencpu.org/api.html#api-arguments for alternatives.
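For illustration, a raw inline-JSON call against the OpenCPU HTTP API could look like the sketch below. It assumes clj-http as the HTTP client and an OpenCPU server on localhost:1723; neither is part of this package, and the call shape mirrors the one in the original measurement:

```clojure
(require '[clj-http.client :as http])

;; POST to /ocpu/library/{pkg}/R/{function}/json with the arguments as a
;; JSON body; each top-level key names one argument of the R function.
(http/post "http://localhost:1723/ocpu/library/base/R/dim/json"
           {:content-type :json
            :body "{\"x\": [{\"x\": 1}, {\"x\": 2}]}"})
```

Every such call re-sends the full dataset as JSON, which is exactly where the overhead comes from.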

So the way to improve performance is to upload the data in a more efficient, binary format (protocol buffers?).
You would then call a remote R function from the RProtoBuf package (http://cran.r-project.org/web/packages/RProtoBuf/index.html) and send it protobuf-encoded data via file upload.
(The file upload possibility is not yet implemented ....)

This would then give you a session key for the data, stored as a data.frame on the server.
You could then use this session key as a parameter in subsequent calls.
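Sketched against the HTTP API directly (again assuming clj-http and a local server; the session key shown is hypothetical):

```clojure
(require '[clj-http.client :as http])

;; 1. Upload the data once; OpenCPU returns the session key in the
;;    Location header, e.g. /ocpu/tmp/x0482b1c02a/ (hypothetical key).
(def upload
  (http/post "http://localhost:1723/ocpu/library/base/R/data.frame"
             {:content-type :json
              :body "{\"x\": [1, 2, 3]}"}))

;; 2. Later calls refer to the stored data.frame by its key instead of
;;    re-sending the whole dataset as JSON each time.
(http/post "http://localhost:1723/ocpu/library/base/R/dim/json"
           {:form-params {:x "x0482b1c02a"}})
```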

"Automating" this would be one of the jobs of a high-level API.

As the low-level interface is meant to be a thin wrapper on top of the HTTP OpenCPU API, the performance problems cannot be addressed there: it takes JSON and sends it as-is over the network to the server.

In general, one remark on how IMHO the OpenCPU server is meant to be used:
it should be used for coarse-grained tasks, not for interactive data analysis.

So you would write R code (as one or more specific functions which internally do the work of the analysis).
Then you put those in an R package and install it on the OpenCPU server. The OpenCPU instructions explain how this can be done fully automatically from a GitHub repo on each commit.
Then from Clojure you call your own function, send it the data ONCE, and let it calculate its stuff (for minutes to hours), so that the time to marshal/unmarshal and send the data becomes irrelevant compared to the full execution time.


jeroen commented Nov 2, 2014

JSON parsing and generation on the OpenCPU side should be pretty fast.
