Implemented the Rendler C++ Framework using V1 API. #45

chhsia0 · 2017-03-22T22:48:21Z

Implemented the RendlerV1 scheduler and the CrawlV1Executor and RenderV1Executor
executors using the Mesos V1 API over HTTP.
A new RendlerV1Executor abstract class is introduced to wrap the protobuf
calls that subscribe and send updates/messages to the Mesos agent. The two
concrete executors inherit from RendlerV1Executor and override runTask().

Also fixed the offer allocation problem in the Rendler V0 scheduler.


Added a new target "v1" which builds rendler_v1, crawl_v1_executor, and render_v1_executor. Made "all" depend on "v1" so everything would be built.

asridharan · 2017-03-23T03:53:56Z

@hatred can you review?

asridharan · 2017-03-23T03:57:07Z

cpp/crawl_v1_executor.cpp

+using mesos::vectorToString;
+
+
+static int writer(char *data, size_t size, size_t nmemb, string *writerData)


Why do we need this helper? This helper seems pretty trivial?

This is passed as a function pointer to curl_easy_setopt in line 72. Can be removed if we use curl instead.
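For readers unfamiliar with the helper under discussion, here is a minimal sketch of a libcurl-style write callback. It is not the code from this PR: the name `appendToString` is hypothetical, and it uses the `size_t` return type and `void*` user pointer from libcurl's documented callback signature rather than the `int` / `string*` shown in the diff. The function itself has no libcurl dependency, so it can be exercised directly.

```cpp
#include <cstddef>
#include <string>

// Write callback in the shape libcurl expects for CURLOPT_WRITEFUNCTION.
// libcurl invokes it once per received chunk of the response body; `userdata`
// is the pointer registered via CURLOPT_WRITEDATA (assumed here to be a
// std::string* owned by the caller).
size_t appendToString(char* data, size_t size, size_t nmemb, void* userdata)
{
  if (userdata == nullptr) {
    // Reporting fewer bytes than were delivered makes libcurl abort the
    // transfer with a write error.
    return 0;
  }

  std::string* out = static_cast<std::string*>(userdata);
  out->append(data, size * nmemb);
  return size * nmemb;
}
```

With libcurl, such a callback would typically be registered with `curl_easy_setopt(conn, CURLOPT_WRITEFUNCTION, appendToString)` and `curl_easy_setopt(conn, CURLOPT_WRITEDATA, &buffer)`. After `curl_easy_perform`, `curl_easy_getinfo(conn, CURLINFO_EFFECTIVE_URL, &url)` reports the URL reached after redirects.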

asridharan · 2017-03-23T04:01:31Z

cpp/crawl_v1_executor.cpp

+    result.push_back(task.task_id().value());
+    result.push_back(url);
+
+    CURL *conn;


Any specific reason we are using libcurl here? We generally have been using curl directly because of the pointer semantics required for libcurl. @jieyu is building a wrapper for libcurl in stout that will help us avoid the pointer semantics. I would just use regular curl until the stout version of libcurl is available.

It's also from the original code. Sure I'll use curl instead.

Ah !! thought so. Thanks

I'm assuming you're talking about running curl through system(). But in addition to the web contents, we also need libcurl to obtain the redirected URL so we can recover the absolute URL of a link with a relative path.

asridharan · 2017-03-23T04:03:46Z

cpp/crawl_v1_executor.cpp

+    string baseUrl = redirectUrl.substr(0, sp); // No trailing slash.
+    string dirUrl = redirectUrl.substr(0, lsp); // No trailing slash.
+
+    cout << "redirectUrl " << redirectUrl << " baseURL: " << baseUrl << endl;


s/redirectUrl/redirectURL

I'm using the naming style used in Mesos' codebase. See https://github.com/apache/mesos/blob/81cd023eb9945a22c220edc966393dcfcdbce256/src/slave/containerizer/fetcher.hpp#L68 .

makes sense. I guess the confusion I had was with the baseURL

Let's do

s/redirectUrl/redirectUrl:
s/baseURL/baseUrl:
s/dirUrl/dirUrl:

Done. I missed the baseURL one.

asridharan · 2017-03-23T04:11:00Z

cpp/crawl_v1_executor.cpp

+    }
+    curl_easy_cleanup(conn);
+
+    size_t scheme = redirectUrl.find_first_of("://");


Why can't you use http::URL from libprocess/process/http.hpp?

I can use process::URL. But then I'll need to write some other code snippets to reconstruct the base URL (for checking the domains) and dir URL (for reconstructing URLs from relative paths). The input URL may have either an IP or a domain name, may or may not include a port number, and may have other variants. Not sure if it's worth the effort.
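To make the base URL / dir URL distinction concrete, here is a rough sketch of string-based splitting. It is an illustration only, not the PR's implementation: `UrlParts` and `splitUrl` are hypothetical names, and query strings or fragments containing '/' are not handled. One detail worth noting: `std::string::find("://")` searches for the whole substring, whereas `find_first_of("://")` returns the first position holding any of the characters ':' or '/'. The two agree only when the first such character happens to start the "://" separator.

```cpp
#include <string>

// Hypothetical helper type: the two prefixes derived from a URL.
struct UrlParts
{
  std::string baseUrl; // scheme://host[:port], no trailing slash.
  std::string dirUrl;  // URL up to its last '/', no trailing slash.
};

UrlParts splitUrl(const std::string& url)
{
  UrlParts parts;

  // Locate the end of the scheme; fall back to the start if none is present.
  const std::string::size_type scheme = url.find("://");
  const std::string::size_type hostStart =
    (scheme == std::string::npos) ? 0 : scheme + 3;

  // The first '/' after the host (and optional port) begins the path.
  const std::string::size_type pathStart = url.find('/', hostStart);
  if (pathStart == std::string::npos) {
    // No path at all, e.g. "http://example.com".
    parts.baseUrl = url;
    parts.dirUrl = url;
    return parts;
  }

  parts.baseUrl = url.substr(0, pathStart);

  // The last '/' separates the directory from the final path segment.
  parts.dirUrl = url.substr(0, url.rfind('/'));
  return parts;
}
```

A relative link such as `page2.html` would then be resolved against `dirUrl`, and a root-relative link such as `/about` against `baseUrl`. Host, IP, and port variants pass through unchanged because the sketch never inspects the authority part.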

asridharan · 2017-03-23T04:12:27Z

cpp/crawl_v1_executor.cpp

+    string::const_iterator f = buffer.begin();
+    string::const_iterator l = buffer.end();
+
+    while (f != buffer.end() &&


you can do

foreach ( const string& f, buffer) { if (boost...) { } }

The boost::regex_search call in the next line requires iterator 'f'.

I think it has an overload for a string as well?
http://www.boost.org/doc/libs/1_61_0/libs/regex/doc/html/boost_regex/ref/regex_search.html

You are right. But I missed another point: buffer itself is a single string containing the whole page contents. In each iteration of this loop, it looks for the next match of the RE of hyperlink anchors, and moves 'f' forward to the end of the matched anchor. So it's not just iterating through a collection of strings.
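The loop described here can be sketched as follows. This version swaps in `std::regex` from the standard library in place of `boost::regex` so that it has no external dependency, and the anchor pattern is a simplified one chosen for illustration; `extractLinks` is a hypothetical name, not a function from the PR.

```cpp
#include <regex>
#include <string>
#include <vector>

// Collects the href values of anchor tags found in `buffer`, which holds the
// whole page as a single string.
std::vector<std::string> extractLinks(const std::string& buffer)
{
  // Simplified pattern: an <a ...> tag with a double-quoted href attribute.
  static const std::regex anchor(
      "<a\\s+[^>]*href\\s*=\\s*\"([^\"]*)\"", std::regex::icase);

  std::vector<std::string> links;
  std::string::const_iterator f = buffer.begin();
  const std::string::const_iterator l = buffer.end();
  std::smatch match;

  // Each iteration finds the next match at or after `f`, records the captured
  // link, then moves `f` to the end of that match so the next search resumes
  // from there.
  while (f != l && std::regex_search(f, l, match, anchor)) {
    links.push_back(match[1].str());
    f = match[0].second;
  }

  return links;
}
```

Because the search position has to advance past each match, the iterator pair is what drives the loop; a range-based `foreach` over the string would visit individual characters instead.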

asridharan · 2017-03-23T06:32:16Z

cpp/crawl_v1_executor.cpp

+      // Remove the anchor
+      if (link.find_first_of('#') != string::npos) {
+        link.erase(link.find_first_of('#'));
+      }


Add a newline.

asridharan · 2017-03-23T06:32:29Z

cpp/crawl_v1_executor.cpp

+      }
+      if (link.empty()) {
+        continue;
+      }


Add a new line.

asridharan · 2017-03-23T06:48:19Z

cpp/rendler_v1_executor.cpp

+  call.mutable_executor_id()->CopyFrom(executorId);
+  call.set_type(Call::MESSAGE);
+
+  Call::Message* message = call.mutable_message();


don't need message. Just invoke call.mutable_message()->set_data(data)

asridharan · 2017-03-23T06:49:53Z

cpp/rendler_v1_executor.cpp

+
+  mesos->send(call);
+
+  // Re-registrate after 1 second if not subscribed then.


s/Re-registrate/Re-register
Also
"Re-register after one second if not subscribed by then."

Comments not addressed yet:
https://goo.gl/QvidN2
https://goo.gl/OUofUp
https://goo.gl/E3CJGb

Modified Makefile to match rendler_v1_executor.o specially.
Fixed the issue that phantomjs requires X11 to run.

Chun-Hung Hsiao added 2 commits March 22, 2017 14:56

Fixed build dependency for rendler_v1.

478c45c


hatred self-requested a review March 23, 2017 03:57

asridharan suggested changes Mar 23, 2017


chhsia0 self-assigned this Mar 23, 2017

Addressed some comments in PR d2iq-archive#45.

db72856


chhsia0 requested a review from asridharan March 23, 2017 19:23
