Skip to content
This repository has been archived by the owner on Jan 22, 2020. It is now read-only.

Implemented the Rendler C++ Framework using V1 API. #45

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

chhsia0
Copy link

@chhsia0 chhsia0 commented Mar 22, 2017

Implemented the RendlerV1 scheduler, the CrawlV1Executor and RenderV1Executor
executors using Mesos V1 API over http.
A new RendlerV1Executor abstract class is introduced to wrap up the protobuf
calls to subscribe and send updates/messages to the Mesos agent, then
the two concrete executors inherit RendelrV1Executor and override runTask().

Also fixed the offer allocation problem in the Rendler V0 scheduler.


This change is Reviewable

Chun-Hung Hsiao added 2 commits March 22, 2017 14:56
Implemented the RendlerV1 scheduler, the CrawlV1Executor and RenderV1Executor
executors using Mesos V1 API over http.
A new RendlerV1Executor abstract class is introduced to wrap up the protobuf
calls to subscribe and send updates/messages to the Mesos agent, then
the two concrete executors inherit RendelrV1Executor and override runTask().

Also fixed the offer allocation problem in the Rendler V0 scheduler.
Added a new target "v1" which builds rendler_v1, crawl_v1_executor, and
render_v1_executor. Made "all" depend on "v1" so everything would be built.
@asridharan
Copy link
Contributor

@hatred can you review?

@hatred hatred self-requested a review March 23, 2017 03:57
using mesos::vectorToString;


static int writer(char *data, size_t size, size_t nmemb, string *writerData)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this helper? This helper seems pretty trivial?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is passed as a function pointer to curl_easy_setopt in line 72. Can be removed if we use curl instead.

result.push_back(task.task_id().value());
result.push_back(url);

CURL *conn;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any specific reason we are using libcurl here? We generally have been using curl directly because of the pointer semantics required for libcurl. @jieyu is building a wrapper for libcurl in stout that will help us avoid the pointer semantics. I would just just use regular curl till the stout version of libcurl is available.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's also from the original code. Sure I'll use curl instead.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah !! thought so. Thanks

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm assuming you're talking about running curl through system(). But in addition to the web contents, we also need libcurl to obtain the redirected URL so we can recover the absolute URL of a link with relative path.

string baseUrl = redirectUrl.substr(0, sp); // No trailing slash.
string dirUrl = redirectUrl.substr(0, lsp); // No trailing slash.

cout << "redirectUrl " << redirectUrl << " baseURL: " << baseUrl << endl;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/redirectUrl/redirectURL

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense. I guess the confusion I had was with the baseURL

Lets do

s\redirectUrl\redirectUrl:
s\baseURL\baseUrl:
s\dirUrl\dirUrl:

Copy link
Author

@chhsia0 chhsia0 Mar 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. I missed the baseURL one.

}
curl_easy_cleanup(conn);

size_t scheme = redirectUrl.find_first_of("://");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can't you use http::URL from libprocess/process/http.hpp?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can use process::URL. But then I'll need to write some other code snippets to reconstruct the base URL (for checking the domains) and dir URL (for reconstructing URLs from relative paths). The input URL may has either IP or domain name, may or may not include a port number, and may have other variants. Not sure if it worths the effort.

string::const_iterator f = buffer.begin();
string::const_iterator l = buffer.end();

while (f != buffer.end() &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can do

foreach ( const string& f, buffer) {
      if (boost...) {
      }
}

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The boost::regex_search call in the next line requires iterator 'f'.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. But I missed another point: buffer itself is a single string containing the whole page contents. In each iteration of this loop, it looks for the next match of the RE of hyperlink anchors, and move 'f' forward to the end of the matched anchor. So it's not just iterating through a collection of strings.

// Remove the anchor
if (link.find_first_of('#') != string::npos) {
link.erase(link.find_first_of('#'));
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a newline.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

}
if (link.empty()) {
continue;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a new line.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

call.mutable_executor_id()->CopyFrom(executorId);
call.set_type(Call::MESSAGE);

Call::Message* message = call.mutable_message();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't need message. Just invoke call.mutable_message()->set_data(data)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


mesos->send(call);

// Re-registrate after 1 second if not subscribed then.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/Re-registrate/Re-register
Also
"Re-register after one second if not subscribed by then."

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@chhsia0 chhsia0 self-assigned this Mar 23, 2017
Comments not addressed yet:
  https://goo.gl/QvidN2
  https://goo.gl/OUofUp
  https://goo.gl/E3CJGb

Modified Makefile to match rendler_v1_executor.o specially.
Fixed the issue that phantomjs requires X11 to run.
@chhsia0 chhsia0 requested a review from asridharan March 23, 2017 19:23
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants