-
Notifications
You must be signed in to change notification settings - Fork 438
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add new ruby parser that uses Prism #1144
Conversation
lib/rdoc/parser/ruby.rb
Outdated
# Creates a module alias in +container+ at +rhs_name+ (or at the top-level | ||
# for "::") with the name from +constant+. | ||
def scan | ||
@tokens = RDoc::Parser::RipperStateLex.parse(@content).sort_by { |t| [t.line_no, t.char_no] } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you wanted, you could use Prism.parse_lex
, which will give you a parse result with both the AST and the tokens. Alternatively, you could use Prism.lex
or Prism.lex_compat
to get back a list of tokens in a separate pass.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for letting me know.
Ideally, we should use it, but RDoc::Parser::RipperStateLex
is doing some concatenation of tokens and RDoc's source code colorization seems to depends on it.
I want to make that part out of scope of this pull request.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense!
24df3db
to
1fef14c
Compare
lib/rdoc/parser/ruby.rb
Outdated
@@ -145,21 +136,8 @@ class RDoc::Parser::Ruby < RDoc::Parser | |||
|
|||
parse_files_matching(/\.rbw?$/) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To make it easier to rollback or compare behaviour between the 2 parsers, can we create a new parser/prism_ruby.rb
file to host the new parser, and do the same for the test file too?
And to switch we simply remove this line from the current parser and add it to the new one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated 👍
Added parser Parser::PrismRuby
can be enabled by setting ENV['RDOC_USE_PRISM_PARSER']
Added test test_rdoc_parser_prism_ruby.rb
tests both Parser::PrismRuby
and Parser::Ruby
(the above env is not needed for this test)
1fef14c
to
76d7786
Compare
In my understanding, https://github.com/ruby/rdoc/blob/master/rdoc.gemspec#L233 This PR drop to support Ruby 2.6 from |
2.6 has been EOL for 2 years now. That being said, if it's a blocker, I can add 2.6 support to Prism. |
Personally, I'm fine with dropping Ruby 2.6 support because |
|
||
blankline | ||
else | ||
result = yield directive, param if block_given? | ||
result = yield directive, param, line if block_given? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are these changes for? Looks like they're fixing issues that'd also affect the default Ruby parser?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the default ruby parser, the third parameter line
is just ignored.
The line where :method:
is written is the source line number of ghost method.
1| class A
2| ##
3| # this is a ghost method `foo(x, y). Source location is line 4
4| # :method: foo
5| # :args: x, y
6| end
The original code is calculating line number by this code.
Lines 1094 to 1095 in 1bb3bec
if (comment.text = comment.text.sub(/^# +:?method: *(\S*).*?\n/i, '')) && !!$~ then | |
line_no += $`.count("\n") |
pre_process.rb
parses comment directives, parse_comment
also parses some comment directives.
To avoid re-implementing these kind of thing, I changed PreProcess#handle
to also provide line_number for each directive.
This is not the best way. pre_process has problem and I think it should be fixed in the future. pre_process needs code_object already created, but we need to parse the comment first to identify code_object type which can be either attribute or method.
lib/rdoc/parser/prism_ruby.rb
Outdated
# def regular_method() end | ||
# | ||
# Note that by default, the :method: directive will be ignored if there is a | ||
# standard rdocable item following it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How much do these comments still hold?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of them, I think. And we should.
I removed these comments and add a short description instead because it is a duplication of lib/rdoc/parser/ruby.rb
's comment. We can restore it to this file when removing lib/rdoc/parser/ruby.rb
lib/rdoc/parser/prism_ruby.rb
Outdated
@yields = [] | ||
@calls_super = false | ||
@params = "(#{def_node.parameters&.slice})" | ||
def_node.body&.accept(self) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think for readability visitors should NOT invoke itself, so we should move this out of initialize
and do this in visit_def_node
instead
if node.body
signature = MethodSignature.new(node)
node.body.accept(signature)
end
lib/rdoc/parser/prism_ruby.rb
Outdated
end | ||
|
||
class RDocVisitor < Prism::Visitor # :nodoc: | ||
DSL = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make these visit_call
methods more easily distinguishable from the visitor's external interface? I think adding _
as prefix + moving them to the private section should be enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added _
👍
Visibility is still public because the code below is calling method through v.
so we can't change the visibility to private.
-> (v, call_node) { v._visit_call_attr_reader_writer_accessor(call_node, 'R') }
Of course we can use instance_eval or instance_exec to remove v.
part, but I think not using instance_eval/exec
is more straightforward.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think expanding these conditions in visit_call_node
doesn't take much space but improves the readability quite significantly:
diff --git a/lib/rdoc/parser/prism_ruby.rb b/lib/rdoc/parser/prism_ruby.rb
index 5ea596b4..c49cafbc 100644
--- a/lib/rdoc/parser/prism_ruby.rb
+++ b/lib/rdoc/parser/prism_ruby.rb
@@ -670,25 +670,6 @@ class RDoc::Parser::PrismRuby < RDoc::Parser
end
class RDocVisitor < Prism::Visitor # :nodoc:
- DSL = {
- attr: -> (v, call_node) { v._visit_call_attr_reader_writer_accessor(call_node, 'R') },
- attr_reader: -> (v, call_node) { v._visit_call_attr_reader_writer_accessor(call_node, 'R') },
- attr_writer: -> (v, call_node) { v._visit_call_attr_reader_writer_accessor(call_node, 'W') },
- attr_accessor: -> (v, call_node) { v._visit_call_attr_reader_writer_accessor(call_node, 'RW') },
- include: -> (v, call_node) { v._visit_call_include(call_node) },
- extend: -> (v, call_node) { v._visit_call_extend(call_node) },
- public: -> (v, call_node, &block) { v._visit_call_public_private_protected(call_node, :public, &block) },
- private: -> (v, call_node, &block) { v._visit_call_public_private_protected(call_node, :private, &block) },
- protected: -> (v, call_node, &block) { v._visit_call_public_private_protected(call_node, :protected, &block) },
- private_constant: -> (v, call_node) { v._visit_call_private_constant(call_node) },
- public_constant: -> (v, call_node) { v._visit_call_public_constant(call_node) },
- require: -> (v, call_node) { v._visit_call_require(call_node) },
- alias_method: -> (v, call_node) { v._visit_call_alias_method(call_node) },
- module_function: -> (v, call_node, &block) { v._visit_call_module_function(call_node, &block) },
- public_class_method: -> (v, call_node, &block) { v._visit_call_public_private_class_method(call_node, :public, &block) },
- private_class_method: -> (v, call_node, &block) { v._visit_call_public_private_class_method(call_node, :private, &block) },
- }
-
def initialize(scanner, top_level, store)
@scanner = scanner
@top_level = top_level
@@ -697,8 +678,43 @@ class RDoc::Parser::PrismRuby < RDoc::Parser
def visit_call_node(node)
@scanner.process_comments_until(node.location.start_line - 1)
- if node.receiver.nil? && (dsl_proc = DSL[node.name])
- dsl_proc.call(self, node) { super }
+ if node.receiver.nil?
+ case node.name
+ when :attr, :attr_reader
+ _visit_call_attr_reader_writer_accessor(node, 'R')
+ when :attr_writer
+ _visit_call_attr_reader_writer_accessor(node, 'W')
+ when :attr_accessor
+ _visit_call_attr_reader_writer_accessor(node, 'RW')
+ when :include
+ _visit_call_include(node)
+ when :extend
+ _visit_call_extend(node)
+ when :public, :private, :protected
+ _visit_call_public_private_protected(node, node.name) do
+ super
+ end
+ when :private_constant
+ _visit_call_private_constant(node)
+ when :public_constant
+ _visit_call_public_constant(node)
+ when :require
+ _visit_call_require(node)
+ when :alias_method
+ _visit_call_alias_method(node)
+ when :module_function
+ _visit_call_module_function(node) do
+ super
+ end
+ when :public_class_method
+ _visit_call_public_private_class_method(node, :public) do
+ super
+ end
+ when :private_class_method
+ _visit_call_public_private_class_method(node, :private) do
+ super
+ end
+ end
else
super
end
This way the reader can immediately know what special calls we handle without jumping to DSL
, and the relationships between them will also be clearer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems good and this way we can make it private!
done
lib/rdoc/parser/prism_ruby.rb
Outdated
# 4| | ||
# 5| def f; end # comment linked to this line | ||
# 6| end | ||
@unprocessed_comments = consecutive_comments.reject(&:empty?).map do |comments| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could avoid allocating multiple arrays by changing this to:
@unprocessed_comments = consecutive_comments.reject(&:empty?).map do |comments| | |
consecutive_comments.reject!(&:empty?) | |
@unprocessed_comments = consecutive_comments.map! do |comments| |
I normally consider this micro-optimization. But given projects could have long consecutive comments, I feel this may help in some cases.
…isit_[NODE_TYPE]_node
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we're very close to merging this 👍
lib/rdoc/parser/prism_ruby.rb
Outdated
end | ||
|
||
class RDocVisitor < Prism::Visitor # :nodoc: | ||
DSL = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think expanding these conditions in visit_call_node
doesn't take much space but improves the readability quite significantly:
diff --git a/lib/rdoc/parser/prism_ruby.rb b/lib/rdoc/parser/prism_ruby.rb
index 5ea596b4..c49cafbc 100644
--- a/lib/rdoc/parser/prism_ruby.rb
+++ b/lib/rdoc/parser/prism_ruby.rb
@@ -670,25 +670,6 @@ class RDoc::Parser::PrismRuby < RDoc::Parser
end
class RDocVisitor < Prism::Visitor # :nodoc:
- DSL = {
- attr: -> (v, call_node) { v._visit_call_attr_reader_writer_accessor(call_node, 'R') },
- attr_reader: -> (v, call_node) { v._visit_call_attr_reader_writer_accessor(call_node, 'R') },
- attr_writer: -> (v, call_node) { v._visit_call_attr_reader_writer_accessor(call_node, 'W') },
- attr_accessor: -> (v, call_node) { v._visit_call_attr_reader_writer_accessor(call_node, 'RW') },
- include: -> (v, call_node) { v._visit_call_include(call_node) },
- extend: -> (v, call_node) { v._visit_call_extend(call_node) },
- public: -> (v, call_node, &block) { v._visit_call_public_private_protected(call_node, :public, &block) },
- private: -> (v, call_node, &block) { v._visit_call_public_private_protected(call_node, :private, &block) },
- protected: -> (v, call_node, &block) { v._visit_call_public_private_protected(call_node, :protected, &block) },
- private_constant: -> (v, call_node) { v._visit_call_private_constant(call_node) },
- public_constant: -> (v, call_node) { v._visit_call_public_constant(call_node) },
- require: -> (v, call_node) { v._visit_call_require(call_node) },
- alias_method: -> (v, call_node) { v._visit_call_alias_method(call_node) },
- module_function: -> (v, call_node, &block) { v._visit_call_module_function(call_node, &block) },
- public_class_method: -> (v, call_node, &block) { v._visit_call_public_private_class_method(call_node, :public, &block) },
- private_class_method: -> (v, call_node, &block) { v._visit_call_public_private_class_method(call_node, :private, &block) },
- }
-
def initialize(scanner, top_level, store)
@scanner = scanner
@top_level = top_level
@@ -697,8 +678,43 @@ class RDoc::Parser::PrismRuby < RDoc::Parser
def visit_call_node(node)
@scanner.process_comments_until(node.location.start_line - 1)
- if node.receiver.nil? && (dsl_proc = DSL[node.name])
- dsl_proc.call(self, node) { super }
+ if node.receiver.nil?
+ case node.name
+ when :attr, :attr_reader
+ _visit_call_attr_reader_writer_accessor(node, 'R')
+ when :attr_writer
+ _visit_call_attr_reader_writer_accessor(node, 'W')
+ when :attr_accessor
+ _visit_call_attr_reader_writer_accessor(node, 'RW')
+ when :include
+ _visit_call_include(node)
+ when :extend
+ _visit_call_extend(node)
+ when :public, :private, :protected
+ _visit_call_public_private_protected(node, node.name) do
+ super
+ end
+ when :private_constant
+ _visit_call_private_constant(node)
+ when :public_constant
+ _visit_call_public_constant(node)
+ when :require
+ _visit_call_require(node)
+ when :alias_method
+ _visit_call_alias_method(node)
+ when :module_function
+ _visit_call_module_function(node) do
+ super
+ end
+ when :public_class_method
+ _visit_call_public_private_class_method(node, :public) do
+ super
+ end
+ when :private_class_method
+ _visit_call_public_private_class_method(node, :private) do
+ super
+ end
+ end
else
super
end
This way the reader can immediately know what special calls we handle without jumping to DSL
, and the relationships between them will also be clearer.
@@ -2,6 +2,8 @@ | |||
|
|||
require_relative 'helper' | |||
|
|||
return if ENV['RDOC_USE_PRISM_PARSER'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to see us running the test for prism and classic ruby parser without requiring these environment variables. This would keep our CI relatively simple and would ensure we're catching any bugs early.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was planning to update CI setups myself after the PR is merged as it's easier to test those with write access.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The added test test_rdoc_parser_prism_ruby.rb
already runs with both RDoc::Parser::Ruby and RDoc::Parser::PrismRuby on CI.
I don't have a good idea to run other test with RDoc::Parser::Ruby
without using this environment variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the impressive work!
I'll merge this PR now and add a temporary CI step for it.
Let's start refactoring parsing process to make this new parser less coupled with legacy components before replacing the old one completely.
(ruby/rdoc#1144) * Add a new ruby parser RDoc::Parser::PrismRuby * Add a new ruby parser testcase independent from parser's internal implementation * unknown meta method * Use MethodSignatureVisitor only to scan params, block_params and calls_super * Add calls_super test * Drop ruby 2.6. Prism requires ruby >= 2.7 * Remove duplicated documentation comment from prism_ruby.rb * Add test for wrong argument passed to metaprogramming method * Rename visit_call_[DSL_METHOD_NAME] to make it distinguishable from visit_[NODE_TYPE]_node * Method receiver switch of true/false/nil to a case statement * Extract common part of add_method(by def keyword) and add meta_comment method * Reuse consecutive comments array when collecting comments * Simplify DSL call_node handling * Refactor extracting method visibility arguments ruby/rdoc@fde99f1be6
@st0012 @tompng https://github.com/ruby/rdoc/blob/master/rdoc.gemspec#L233 and https://github.com/ruby/rdoc/blob/master/rdoc.gemspec#L236 are bit of aggressive changes. The release model of We should keep |
After discussing with other maintainers, we agreed to hold off making That being said, I think it's still great that we have this PR merged as we can use the Prism parser to evaluate all the upcoming refactorings 👍 |
Adds a new ruby parser
rdoc/parser/prism_ruby.rb
that uses Prism.The added new ruby parser is enabled by
ENV['RDOC_USE_PRISM_PARSER']
Background
RDoc should use Prism as a parser to fix parsing bugs someday. These are few examples.
Ruby parser should be refactored. Even small bugs are hard to fix because parser and comment handling is tightly coupled.
So I think
rdoc/parser/ruby.rb
needs to be completely rewritten.This pull request completely rewrites
lib/rdoc/parser/ruby.rb
using Prism.Note that this change will fix many bugs but might introduce unintentional changes or bugs.
Implementation details
Comments
There are two kind of comments. consecutive comments and modifier comments.
Consecutive comment
Consecutive comment is a set of continual comments written in blank line. They are linked to the next non-blank line.
Consecutive comment that starts with
##\n
is a metaprogramming method comment, but this rule is not strictly followed. RDoc sourcecode itself does not follow this rule.Modifier comment
Modifier comment is a comment written in non-blank line.
AST Traversing
Prism::Visitor is used to traverse AST node and to collect documentable comments and AST nodes.
When visitor visits a documentable node, we need to first process consecutive comments before start_line of the node which is not processed yet.
These unprocessed consecutive comment might be a metaprogramming comment that starts with
##\n
.After that, we can document the node itself with consecutive comment linked to
node.start_line
.In this phase, RDoc also looks modifier comment on
node.start_line
and onnode.end_line
for some node types.When visitor enters class/module node, we need to push that class/module to module_nesting stack.
Before leaving class/module node, we need to process unprocessed consecutive comments because these comments might be a metaprogramming comment that defines a method to the current class/module node. After that, we can pop class/module from module_nesting.
Test
A new test
test_rdoc_parser_prism_ruby.rb
is added. Some ruby code passed toutil_parser
are extracted fromtest_rdoc_parser_ruby.rb
.To ensure there is no bad difference between the original RDoc::Parser::Ruby and the new RDoc::Parser::PrismRuby, test is run with both parsers.
Unfortunately, there are many difference that I think it's a bug. Assertion for these cases are skipped by
unless accept_legacy_bug?
Output HTML file differences
These are the difference of the generated html files between master branch and this pull request.
Most file are identical except for newlines between html tags.
Removed files, these are not linked from any another html files.
Link in html file changed
Other changes