-
Notifications
You must be signed in to change notification settings - Fork 327
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[rebased, test WIP] i559 TexyServiceEngine #485
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. This is a very quick review. First things first. Thank you :)
It would be nice to cover the service and the new engine with tests. I added an inline comment. We need some kind of error handling, if the service is not available or does not respond as expected.
That's all I have to say for now.
In the long term we want to add new markup and template engines as extensions to the core.
To maintain them seperately and keep the core stable and clean.
But for now I think we should add them and seperate them out as soon as some kind of extension mechanism evolves.
String documentBody = context.getBody(); | ||
|
||
try (InputStream stream = new ByteArrayInputStream(documentBody.getBytes(StandardCharsets.UTF_8))){ | ||
TexyRestService texyService = new TexyRestService(new URL("http://localhost:8022/TexyService.rest.php")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a candidate for another configuration option. Maybe a check if the service is available and handle the error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Config - sure, I am looking at how the config is propagated around.
In the end it will have around 8 config entries, which fine-tune Texy's behavior.
Check for availability - yes. I would basically just throw an exception and it should be up to JBake to tell the user that "couldn't convert document XY" + the reason.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can find the configuration in the ParserContext class.
Wow lots of activity right now, apologies for the radio silence from me, I've been preparing for a JBake talk I'm giving tonight. Once that's out of the way I can properly reply to this and all the other activity. |
@@ -32,6 +33,9 @@ | |||
public abstract class MarkupEngine implements ParserEngine { | |||
private static final Logger LOGGER = LoggerFactory.getLogger(MarkupEngine.class); | |||
|
|||
public static final int MAX_HEADER_LINES = 50; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a PoC so this would also be a config.
However, I think I will put the MarkupEngine changes aside from this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
jap. good idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
try to limit changes that are needed for the TexyService thing. Please have a look at the failing tests and fix them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's in my plan. Sorry for the noise, I was getting familiar with JBake. Now this is a bit mess but I will split soon, perhaps during the weekend, and update (reduce) this PR.
|
||
|
||
//for (String line : contents) { | ||
for (int i = 0; i < contents.size() && i < MAX_HEADER_LINES; i++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have some files which are thousands lines long. And JBake scans them 3 times. I think we can optimize a bit and apply some configurable limit.
In answer to your original questions, if you think you've found a bug please raise an issue. For enhancements I've tried to encourage users to suggest them on the mailing list first (via the contributors guide) but this doesn't tend to happen in practice and we seem to get more input on GH issues rather than anywhere else. |
I'm still catching up on activity right now.... but we do have some outstanding PR's that when merged will most likely break other PR's that have recently been raised so it might be an idea to throttle back on new PR's for the time being... |
I'll rebase this to the new master, soon. |
7b6ad40
to
7fe542f
Compare
Rebased, not tested. Will test soon. |
imports cleanup. Final touches. Extract the title from the Texy documents. Improve hasHeader(): only scan first N lines, skip #... lines, skip blank lines, test against a regex; do not require status and type if defaults are set. Add DebugUtil with map printing, since JBake code uses maps heavily Wrap Crawler iteration into a try/catch Start refactoring MarkupEngine so it supports files without a header. Make DebugUtil more generic Improve MarkupEngine: Support no header if all values are known; Improve validation; Refactor. Extract title from Texy documents. Implement RawMarkupEngine: Extract title from the HTML; Normalize HTML; Pretty print HTML; Export as XHTML; Change exported charset; Introduce `input.charset`. Fix MarkupEngine - don't return headers map if the header separator is not found. Allow .-_ in the header names Refactor HtmlUtil#fixImageSourceUrls(). Keeps the same behavior. Fix jbake-org#499 file names encoding. Implements jbake-org#500 Make URL fixing optional jbake-org#500 Refactor createUri() and createNoExtensionUri() into one. Make index creation bit more readable (just reorder) Make index creation bit more readable (reuse the attrib name) Refactor Crawler#crawlSourceFile() logic around updating cache flag. Implement ContentStore#mergeDocument(Map<String, Object>) to update docs. Implement Make "relative <img src> points to assets" optional jbake-org#502 Implement Make URL fixing optional jbake-org#500 These two are hitting the same code, so it's hard to split them. MarkupUtil and RawMarkupUtil cleanup DebugUtil call Force normalize HTML files if they contain <body>. Rename vars Allow deduplication of title autodetected from the document's header - mark that header with a CSS class. Fix: Storing the altered DOM wrapped in <div> resulted in this <div> being serialized too. This removes it. Make innerXml more robust.
fd6e6b9
to
b95f8d0
Compare
Squashed. I might need some help with getting the tests to pass. |
All right this review will take some time. Did you try to create a branch that first of all just integrates the new Engine you want to add? It looks like there are changes which are addressed in different other PR's you originaly split out but no changed test case which reflects the new behavior. I'm missing a test for the new Engine. Don't know when I find some time to see what change exactly breaks the tests at the moment. |
|
||
// If this is enabled, then this already happened in Crawler. | ||
// TODO: Remove the fixing from Crawler. | ||
// We should keep the pristine doc body as long as possible, or change it locally. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. That's a thing we should discuss. But I don't think it it's necessary to change this behavior to introduce the new Engine. It really should be integrated into the rendering phase to produce alternative paths for index files for types, that live in a totally different location than the listed document of the specific type.
* Handle invalid or unavailable charset. | ||
*/ | ||
private Charset getAsCharset(String key, Charset defaultCharset) | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please don't mix styles. The unwritten convention is placing the bracket on the same line as the method definition or controll statement.
Yes:
if ( foo ) {
return bar;
}
No:
if ( foo )
{
return bar;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok I will align the styles. I blame IDEA. Or my little finger twitching above enter :)
|
||
if (isRelative(source)) { | ||
source = uri + source.replaceFirst("\\./", ""); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The removal of this line causes a few tests to fail.
If it really is unnecessary to replace paths starting with "./" you need to change the tests accordingly.
https://github.com/jbake-org/jbake/blob/master/jbake-core/src/test/java/org/jbake/util/HtmlUtilTest.java#L53 for example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @ancho , that's something for discussion (WIP). I just rebased and din't have time to test. But IIRC, I needed some per-case way to turn on or off the site URL placing. The most natural way is to use such "neutral" prefix in src
and give it a meaning.
I will finish this and comment here.
@@ -14,6 +14,7 @@ | |||
<logger name="org.eclipse" level="WARN"/> | |||
<logger name="org.apache" level="WARN"/> | |||
<logger name="org.jbake" level="INFO"/> | |||
<logger name="org.jbake.parser.texy" level="DEBUG"/> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe INFO is better for production.
Regarding isolating a branch that only integrates Texy. That is possible. I had some branches which took some changes from this PR, and most were merged. After that, this become quite tedious to rebase. Some things are still necessary, like getExtractTitleFromDoc() - Texy documents typically rely on that. So I will take what's needed to make Texy markup work and do a smaller PR. ETA 2 to 4 weeks. |
Here is a working extension of an MarkupEngine that calls a remote service to convert Texy syntax.
I would be glad if it got to JBake, so please tell me what would you like me to do with the code to meet JBake project's criteria.
Thanks :)
Edit:
I am migrating to JBake from my own JTexy + Wicket + Wildfly + OpenShift based solution. So I have quite a few Texy files that I would prefer not to edit.
However, the header is not actually needed. Everything either has a default (type, status), or can be extracted from the document or filesystem metadata (title, date).
This PR contains changes that allow not to have a header at all, if the defaults are set.
I will separate it from the TexyService changes and create 2nd PR, and clean up this one.
I hope I don't annoy you with the multitude of issues and PRs. If so, let me know, I will throttle down :) On the other hand, I will do a batch of PRs and then phase out for a while unless you want some help with 3.0.
Related to that - is there any document with the processes? Like, if I have a suspicion for a bug, should it be discussed first in Google Groups or can I file a bug right away (when confident enough)? Or, if I see a refactoring oportunity, should that go as an issue or jbake-dev post?
Are the tests supposed to pass? It fails with