-
Notifications
You must be signed in to change notification settings - Fork 77
Implemented the Rendler C++ Framework using V1 API. #45
base: master
Are you sure you want to change the base?
Conversation
Implemented the RendlerV1 scheduler, the CrawlV1Executor and RenderV1Executor executors using Mesos V1 API over http. A new RendlerV1Executor abstract class is introduced to wrap up the protobuf calls to subscribe and send updates/messages to the Mesos agent, then the two concrete executors inherit RendelrV1Executor and override runTask(). Also fixed the offer allocation problem in the Rendler V0 scheduler.
Added a new target "v1" which builds rendler_v1, crawl_v1_executor, and render_v1_executor. Made "all" depend on "v1" so everything would be built.
@hatred can you review? |
using mesos::vectorToString; | ||
|
||
|
||
static int writer(char *data, size_t size, size_t nmemb, string *writerData) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need this helper? This helper seems pretty trivial?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is passed as a function pointer to curl_easy_setopt in line 72. Can be removed if we use curl instead.
result.push_back(task.task_id().value()); | ||
result.push_back(url); | ||
|
||
CURL *conn; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any specific reason we are using libcurl
here? We generally have been using curl
directly because of the pointer semantics required for libcurl. @jieyu is building a wrapper for libcurl in stout that will help us avoid the pointer semantics. I would just just use regular curl till the stout
version of libcurl
is available.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's also from the original code. Sure I'll use curl instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah !! thought so. Thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm assuming you're talking about running curl through system(). But in addition to the web contents, we also need libcurl to obtain the redirected URL so we can recover the absolute URL of a link with relative path.
cpp/crawl_v1_executor.cpp
Outdated
string baseUrl = redirectUrl.substr(0, sp); // No trailing slash. | ||
string dirUrl = redirectUrl.substr(0, lsp); // No trailing slash. | ||
|
||
cout << "redirectUrl " << redirectUrl << " baseURL: " << baseUrl << endl; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/redirectUrl/redirectURL
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm using the naming style used in Mesos' codebase. See https://github.com/apache/mesos/blob/81cd023eb9945a22c220edc966393dcfcdbce256/src/slave/containerizer/fetcher.hpp#L68 .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
makes sense. I guess the confusion I had was with the baseURL
Lets do
s\redirectUrl\redirectUrl:
s\baseURL\baseUrl:
s\dirUrl\dirUrl:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. I missed the baseURL one.
} | ||
curl_easy_cleanup(conn); | ||
|
||
size_t scheme = redirectUrl.find_first_of("://"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why can't you use http::URL
from libprocess/process/http.hpp
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can use process::URL. But then I'll need to write some other code snippets to reconstruct the base URL (for checking the domains) and dir URL (for reconstructing URLs from relative paths). The input URL may has either IP or domain name, may or may not include a port number, and may have other variants. Not sure if it worths the effort.
string::const_iterator f = buffer.begin(); | ||
string::const_iterator l = buffer.end(); | ||
|
||
while (f != buffer.end() && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can do
foreach ( const string& f, buffer) {
if (boost...) {
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The boost::regex_search call in the next line requires iterator 'f'.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it has an overload for a string as well?
http://www.boost.org/doc/libs/1_61_0/libs/regex/doc/html/boost_regex/ref/regex_search.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right. But I missed another point: buffer itself is a single string containing the whole page contents. In each iteration of this loop, it looks for the next match of the RE of hyperlink anchors, and move 'f' forward to the end of the matched anchor. So it's not just iterating through a collection of strings.
// Remove the anchor | ||
if (link.find_first_of('#') != string::npos) { | ||
link.erase(link.find_first_of('#')); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a newline.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
cpp/crawl_v1_executor.cpp
Outdated
} | ||
if (link.empty()) { | ||
continue; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a new line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
cpp/rendler_v1_executor.cpp
Outdated
call.mutable_executor_id()->CopyFrom(executorId); | ||
call.set_type(Call::MESSAGE); | ||
|
||
Call::Message* message = call.mutable_message(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't need message
. Just invoke call.mutable_message()->set_data(data)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
cpp/rendler_v1_executor.cpp
Outdated
|
||
mesos->send(call); | ||
|
||
// Re-registrate after 1 second if not subscribed then. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/Re-registrate/Re-register
Also
"Re-register after one second if not subscribed by then."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Comments not addressed yet: https://goo.gl/QvidN2 https://goo.gl/OUofUp https://goo.gl/E3CJGb Modified Makefile to match rendler_v1_executor.o specially. Fixed the issue that phantomjs requires X11 to run.
Implemented the RendlerV1 scheduler, the CrawlV1Executor and RenderV1Executor
executors using Mesos V1 API over http.
A new RendlerV1Executor abstract class is introduced to wrap up the protobuf
calls to subscribe and send updates/messages to the Mesos agent, then
the two concrete executors inherit RendelrV1Executor and override runTask().
Also fixed the offer allocation problem in the Rendler V0 scheduler.
This change is