forked from rouge-ruby/rouge
Merge pull request rouge-ruby#489 from jneen/refactor.guessers
Refactor.guessers
Showing 11 changed files with 311 additions and 88 deletions.
@@ -0,0 +1,46 @@
module Rouge
  class Guesser
    def self.guess(guessers, lexers)
      original_size = lexers.size

      guessers.each do |g|
        new_lexers = case g
        when Guesser then g.filter(lexers)
        when proc { |x| x.respond_to? :call } then g.call(lexers)
        else raise "bad guesser: #{g}"
        end

        lexers = new_lexers && new_lexers.any? ? new_lexers : lexers
      end

      # if we haven't filtered the input at *all*,
      # then we have no idea what language it is,
      # so we bail and return [].
      lexers.size < original_size ? lexers : []
    end

    def collect_best(lexers, opts={}, &scorer)
      best = []
      best_score = opts[:threshold]

      lexers.each do |lexer|
        score = scorer.call(lexer)

        next if score.nil?

        if best_score.nil? || score > best_score
          best_score = score
          best = [lexer]
        elsif score == best_score
          best << lexer
        end
      end

      best
    end

    def filter(lexers)
      raise 'abstract'
    end
  end
end
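
The chaining in `Guesser.guess` and the tie-keeping in `collect_best` can be sketched without loading Rouge. The following is a minimal standalone sketch, not the library's code path: the candidate list `LEXERS`, the `SCORES` table, and the two lambdas are hypothetical stand-ins, and guessers are assumed to be plain callables.

```ruby
# Standalone sketch; does not load Rouge. Lexers are stand-in symbols.

# Keep the highest-scoring items, including ties. A nil score means
# "no opinion". The running best starts at opts[:threshold], so scores
# below the threshold never qualify.
def collect_best(items, opts = {})
  best = []
  best_score = opts[:threshold]

  items.each do |item|
    score = yield(item)
    next if score.nil?

    if best_score.nil? || score > best_score
      best_score = score
      best = [item]
    elsif score == best_score
      best << item
    end
  end

  best
end

# Apply each guesser in turn; a guesser that returns nil or [] is ignored.
def guess(guessers, lexers)
  original_size = lexers.size

  guessers.each do |g|
    narrowed = g.call(lexers)
    lexers = narrowed if narrowed && narrowed.any?
  end

  # Nothing was narrowed at all, so there is no basis for a guess.
  lexers.size < original_size ? lexers : []
end

LEXERS = [:ruby, :python, :conf, :nginx]                    # hypothetical candidates
SCORES = { ruby: 0.2, python: 0.7, conf: 0.7, nginx: nil }  # hypothetical scores

has_n    = ->(ls) { ls.select { |l| l.to_s.include?("n") } }
no_match = ->(_ls) { [] }

p guess([has_n, no_match], LEXERS)                        # => [:python, :conf, :nginx]
p guess([no_match], LEXERS)                               # => []
p collect_best(LEXERS) { |l| SCORES[l] }                  # => [:python, :conf]
p collect_best(LEXERS, threshold: 0.8) { |l| SCORES[l] }  # => []
```

Note that a guesser with no matches leaves the candidate list untouched rather than emptying it, so a later guesser still sees the earlier narrowing.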
@@ -0,0 +1,25 @@
module Rouge
  module Guessers
    class Filename < Guesser
      attr_reader :filename
      def initialize(filename)
        @filename = filename
      end

      # returns a list of lexers that match the given filename with
      # equal specificity (i.e. number of wildcards in the pattern).
      # This helps disambiguate between, e.g. the Nginx lexer, which
      # matches `nginx.conf`, and the Conf lexer, which matches `*.conf`.
      # In this case, nginx will win because the pattern has no wildcards,
      # while `*.conf` has one.
      def filter(lexers)
        mapping = {}
        lexers.each do |lexer|
          mapping[lexer.name] = lexer.filenames || []
        end

        GlobMapping.new(mapping, @filename).filter(lexers)
      end
    end
  end
end
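
The specificity rule described in the comment above (fewer wildcards wins) can be tried in isolation. This is a rough sketch under stated assumptions: the `GLOBS` table and the helper names `specificity` and `best_matches` are hypothetical, and only `File.fnmatch?` and `File.basename` come from Ruby itself.

```ruby
# Hypothetical glob lists, keyed by lexer name.
GLOBS = {
  "Nginx" => ["nginx.conf"],
  "Conf"  => ["*.conf"],
  "Ruby"  => ["*.rb", "Rakefile"],
}

# Score a filename against a list of globs: nil when nothing matches,
# otherwise minus the number of wildcard characters in a matching pattern.
def specificity(globs, filename)
  basename = File.basename(filename)
  globs.map { |pattern|
    next unless File.fnmatch?(pattern, basename, File::FNM_DOTMATCH | File::FNM_CASEFOLD)
    -pattern.scan(/[*?\[]/).size
  }.compact.min
end

# Return the names whose score ties for the highest value.
def best_matches(filename)
  scores = GLOBS.map { |name, globs| [name, specificity(globs, filename)] }
                .reject { |_, score| score.nil? }
  top = scores.map(&:last).max
  scores.select { |_, score| score == top }.map(&:first)
end

p best_matches("/etc/nginx/nginx.conf")  # => ["Nginx"]
p best_matches("app.conf")               # => ["Conf"]
p best_matches("README")                 # => []
```

Here `nginx.conf` scores 0 and `*.conf` scores -1 for the same file, so only the exact-name pattern survives.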
@@ -0,0 +1,46 @@
module Rouge
  module Guessers
    # This class allows for custom behavior
    # with glob -> lexer name mappings
    class GlobMapping < Guesser
      def self.by_pairs(mapping, filename)
        glob_map = {}
        mapping.each do |(glob, lexer_name)|
          lexer = Lexer.find(lexer_name)

          # ignore unknown lexers
          next unless lexer

          glob_map[lexer.name] ||= []
          glob_map[lexer.name] << glob
        end

        new(glob_map, filename)
      end

      attr_reader :glob_map, :filename
      def initialize(glob_map, filename)
        @glob_map = glob_map
        @filename = filename
      end

      def filter(lexers)
        basename = File.basename(filename)

        collect_best(lexers) do |lexer|
          score = (@glob_map[lexer.name] || []).map do |pattern|
            if test_pattern(pattern, basename)
              # specificity is better the fewer wildcards there are
              -pattern.scan(/[*?\[]/).size
            end
          end.compact.min
        end
      end

      private
      def test_pattern(pattern, path)
        File.fnmatch?(pattern, path, File::FNM_DOTMATCH | File::FNM_CASEFOLD)
      end
    end
  end
end
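
The grouping step in `by_pairs` turns `glob, lexer_name` pairs into a map from canonical lexer name to its globs, dropping pairs whose lexer is unknown. A small sketch of that shape follows; the `KNOWN` table is a hypothetical stand-in for `Lexer.find`, and `group_globs` is an illustrative name, not part of Rouge.

```ruby
# Hypothetical alias table standing in for Lexer.find: alias => canonical name.
KNOWN = { "rb" => "Ruby", "ruby" => "Ruby", "conf" => "Conf" }

# Group globs under the canonical lexer name, skipping unknown lexers.
def group_globs(pairs)
  glob_map = {}
  pairs.each do |glob, lexer_name|
    name = KNOWN[lexer_name]
    next unless name  # ignore unknown lexers

    (glob_map[name] ||= []) << glob
  end
  glob_map
end

pairs = [["*.rb", "rb"], ["Gemfile", "ruby"], ["*.xyz", "nope"], ["*.cfg", "conf"]]
p group_globs(pairs)
# => {"Ruby"=>["*.rb", "Gemfile"], "Conf"=>["*.cfg"]}
```

Because two aliases can resolve to the same lexer, their globs end up in a single list, which is what lets the filter score every pattern for a lexer together.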
@@ -0,0 +1,14 @@
module Rouge
  module Guessers
    class Mimetype < Guesser
      attr_reader :mimetype
      def initialize(mimetype)
        @mimetype = mimetype
      end

      def filter(lexers)
        lexers.select { |lexer| lexer.mimetypes.include? @mimetype }
      end
    end
  end
end
@@ -0,0 +1,42 @@
module Rouge
  module Guessers
    class Modeline < Guesser
      # [jneen] regexen stolen from linguist
      EMACS_MODELINE = /-\*-\s*(?:(?!mode)[\w-]+\s*:\s*(?:[\w+-]+)\s*;?\s*)*(?:mode\s*:)?\s*([\w+-]+)\s*(?:;\s*(?!mode)[\w-]+\s*:\s*[\w+-]+\s*)*;?\s*-\*-/i

      # First form vim modeline
      # [text]{white}{vi:|vim:|ex:}[white]{options}
      # ex: 'vim: syntax=ruby'
      VIM_MODELINE_1 = /(?:vim|vi|ex):\s*(?:ft|filetype|syntax)=(\w+)\s?/i

      # Second form vim modeline (compatible with some versions of Vi)
      # [text]{white}{vi:|vim:|Vim:|ex:}[white]se[t] {options}:[text]
      # ex: 'vim: set syntax=ruby:'
      VIM_MODELINE_2 = /(?:vim|vi|Vim|ex):\s*se(?:t)?.*\s(?:ft|filetype|syntax)=(\w+)\s?.*:/i

      MODELINES = [EMACS_MODELINE, VIM_MODELINE_1, VIM_MODELINE_2]

      def initialize(source, opts={})
        @source = source
        @lines = opts[:lines] || 5
      end

      def filter(lexers)
        # don't bother reading the stream if we've already decided
        return lexers if lexers.size == 1

        source_text = @source
        source_text = source_text.read if source_text.respond_to? :read

        lines = source_text.split(/\r?\n/)

        search_space = (lines.first(@lines) + lines.last(@lines)).join("\n")

        matches = MODELINES.map { |re| re.match(search_space) }.compact
        match_set = Set.new(matches.map { |m| m[1] })

        lexers.select { |l| (Set.new([l.tag] + l.aliases) & match_set).any? }
      end
    end
  end
end
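
To see what the three modeline patterns capture, the extraction step can be run on its own. The sketch below copies the regular expressions from the diff above; the helper `modeline_languages` and its `lines:` keyword are hypothetical names for illustration, and the lexer lookup by tag and alias is left out.

```ruby
require 'set'

# Patterns copied from the diff above.
EMACS_MODELINE = /-\*-\s*(?:(?!mode)[\w-]+\s*:\s*(?:[\w+-]+)\s*;?\s*)*(?:mode\s*:)?\s*([\w+-]+)\s*(?:;\s*(?!mode)[\w-]+\s*:\s*[\w+-]+\s*)*;?\s*-\*-/i
VIM_MODELINE_1 = /(?:vim|vi|ex):\s*(?:ft|filetype|syntax)=(\w+)\s?/i
VIM_MODELINE_2 = /(?:vim|vi|Vim|ex):\s*se(?:t)?.*\s(?:ft|filetype|syntax)=(\w+)\s?.*:/i
MODELINES = [EMACS_MODELINE, VIM_MODELINE_1, VIM_MODELINE_2]

# Collect the language names declared in the first and last `lines` lines.
def modeline_languages(text, lines: 5)
  all = text.split(/\r?\n/)
  search_space = (all.first(lines) + all.last(lines)).join("\n")
  Set.new(MODELINES.map { |re| re.match(search_space) }.compact.map { |m| m[1] })
end

p modeline_languages("#!/bin/sh\n# vim: ft=ruby\nputs 1\n").to_a         # => ["ruby"]
p modeline_languages("# -*- mode: python -*-\nprint(1)\n").to_a          # => ["python"]
p modeline_languages("/* vim: set ft=c: */\nint main(void) {}\n").to_a   # => ["c"]
```

Only the head and tail of the text are searched, so a modeline buried in the middle of a long file is not picked up.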
@@ -0,0 +1,39 @@
module Rouge
  module Guessers
    class Source < Guesser
      attr_reader :source
      def initialize(source)
        @source = source
      end

      def filter(lexers)
        # don't bother reading the input if
        # we've already filtered to 1
        return lexers if lexers.size == 1

        # If we're filtering against *all* lexers, we only use confident return
        # values from analyze_text. But if we've filtered down already, we can trust
        # the analysis more.
        threshold = lexers.size < 10 ? 0 : 0.5

        source_text = case @source
        when String
          @source
        when ->(s){ s.respond_to? :read }
          @source.read
        else
          raise 'invalid source'
        end

        Lexer.assert_utf8!(source_text)

        source_text = TextAnalyzer.new(source_text)

        collect_best(lexers, threshold: threshold) do |lexer|
          next unless lexer.methods(false).include? :analyze_text
          lexer.analyze_text(source_text)
        end
      end
    end
  end
end
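
The effect of the candidate-count threshold is easiest to see with a weak signal. The following is a sketch only: candidates are modeled as hypothetical `[name, analyzer]` pairs rather than lexer classes, `pick_by_analysis` is an illustrative name, and the analyzers are toy lambdas returning a confidence or nil.

```ruby
# Pick the best-scoring candidates, trusting weak scores only when the
# field has already been narrowed to fewer than ten.
def pick_by_analysis(candidates, text)
  return candidates.map(&:first) if candidates.size == 1

  threshold = candidates.size < 10 ? 0 : 0.5

  scored = candidates.map { |name, analyze| [name, analyze.call(text)] }
                     .reject { |_, score| score.nil? || score < threshold }
  top = scored.map(&:last).max
  scored.select { |_, score| score == top }.map(&:first)
end

# Hypothetical analyzers with deliberately weak confidence values.
ruby_ish   = ["Ruby",   ->(t) { t.include?("def ") ? 0.3 : 0 }]
python_ish = ["Python", ->(t) { t.include?("import ") ? 0.3 : 0 }]
fillers    = (1..8).map { |i| ["Other#{i}", ->(_t) { nil }] }

text = "def hello\nend\n"

p pick_by_analysis([ruby_ish, python_ish], text)            # => ["Ruby"]
p pick_by_analysis([ruby_ish, python_ish] + fillers, text)  # => []
```

With two candidates the 0.3 score is enough to choose Ruby; with ten candidates the same score falls under 0.5 and no guess is made.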