Training a model fails #552
Just to be sure, I tried with a different repo, this time with vue (https://github.com/vuejs/vue), and the same error occurred (just much faster).
After a while, it seems something happened, and maybe the analyzer that I started running initially actually trained something? I see these logs:

/home/francesc/.local/lib/python3.5/site-packages/sklearn/metrics/classification.py:1143: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples.
  'precision', 'predicted', average, warn_for)
INFO:9d89:FormatAnalyzer:trained {'__init__': True,
'created_at': datetime.datetime(2019, 1, 26, 0, 36, 12, 350828),
'dependencies': [],
'model': 'code-format',
'uuid': '5bc4738b-e235-4719-b9c7-1599b5346e6d',
'version': [1]}
style.format.analyzer.FormatAnalyzer/[1] https://github.com/campoy/node.git 4385240d999708ab6a3904095d9666c5aba5221c
# javascript
23 rules, avg.len. 6.4
DEBUG:code-format:/ruless/thresholds/ -> lz4 compression
DEBUG:code-format:/ruless/features/ -> lz4 compression
DEBUG:code-format:/ruless/cls/ -> lz4 compression
DEBUG:code-format:/ruless/support/ -> lz4 compression
DEBUG:code-format:/ruless/cmps/ -> lz4 compression
DEBUG:code-format:/ruless/conf/ -> lz4 compression
DEBUG:code-format:/ruless/lengths/ -> lz4 compression
DEBUG:code-format:/ruless/artificial/ -> lz4 compression
DEBUG:code-format:/origin_configs/feature_extractor/selected_features/ -> lz4 compression
INFO:9d89:EventListener:OK 464.649

The logs are full of messages of this style:

WARNING:9d89:FeaturesExtractor:could not parse file test/parallel/test-fs-read-stream-fd.js with error 'Couldn't find the token in the specified position:
Node role: Operator
Parsed form: “+=”
Raw form: “”
Start position: 0, 0, 0
End position: 0, 0, 0', skipping

And when I send a PR https://github.com/campoy/node/pull/4/files to the analyzer, the parser fails:

INFO:9d89:EventListener:new ReviewEvent
WARNING:9d89:FeaturesExtractor:could not parse file benchmark/cluster/echo.js with error 'Couldn't find the token in the specified position:
Node role: Operator
Parsed form: “=”
Raw form: “”
Start position: 0, 0, 0
End position: 0, 0, 0', skipping
INFO:9d89:AnalyzerManager:style.format.analyzer.FormatAnalyzer: 0 comments
INFO:9d89:EventListener:OK 0.551
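Since the logs are full of "could not parse file" warnings, a quick way to see how many files were skipped is to scan a saved copy of the log. The following is a minimal sketch, not part of style-analyzer: the function name is hypothetical, and the regular expressions are modeled only on the excerpts quoted above, so they may not match other versions of the log format.

```python
import re
from collections import Counter
from typing import List, Tuple

# Modeled on the FeaturesExtractor warnings quoted above; other log
# versions may use a different wording.
SKIP_RE = re.compile(r"FeaturesExtractor:could not parse file (?P<path>\S+) with error")
# Lines such as "Node role: Operator" inside the error message.
ROLE_RE = re.compile(r"^Node role: (?P<role>\S+)", re.MULTILINE)


def summarize_skipped(log_text: str) -> Tuple[List[str], Counter]:
    """Return the skipped file paths and a count of the node roles that failed.

    Every "Node role:" line in the text is counted; in the excerpts above,
    such lines only appear inside parse-failure messages.
    """
    paths = [m.group("path") for m in SKIP_RE.finditer(log_text)]
    roles = Counter(m.group("role") for m in ROLE_RE.finditer(log_text))
    return paths, roles
```

Reading the log file and passing its contents to summarize_skipped gives the list of skipped files and, for the excerpts above, a count dominated by the Operator role, which is where the token lookup fails.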
According to the logs, the Babelfish driver has the wrong version. We will add a version check, since this is critical. We also need to update the docs, because a lot has changed recently. There are two ways to run it; you tried the developer's way, which is trickier to set up.
The version check is blocked by bblfsh/python-client#141.
The Babelfish driver for JavaScript is docker://bblfsh/javascript-driver:v1.2.0 ... isn't that the correct one?
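The version check mentioned in this thread could start by comparing the tag of the installed driver image with the one the analyzer expects. Below is a minimal sketch under that assumption; the function names are hypothetical, the expected tag is supplied by the caller, and how the installed image reference is obtained from bblfshd is left out because the real check depends on bblfsh/python-client#141.

```python
import re
from typing import Tuple

# Matches references such as "docker://bblfsh/javascript-driver:v1.2.0".
IMAGE_RE = re.compile(r"^docker://(?P<repo>[^:]+):(?P<tag>[^:/]+)$")


def parse_driver_image(image: str) -> Tuple[str, str]:
    """Split "docker://<repo>:<tag>" into (repo, tag)."""
    match = IMAGE_RE.match(image)
    if match is None:
        raise ValueError(f"unrecognized driver image reference: {image!r}")
    return match.group("repo"), match.group("tag")


def check_driver_version(image: str, expected_tag: str) -> None:
    """Raise RuntimeError if the installed driver tag differs from the expected one."""
    _, tag = parse_driver_image(image)
    if tag != expected_tag:
        raise RuntimeError(
            f"Babelfish driver {image} has version {tag}, expected {expected_tag}"
        )
```

Failing loudly at startup, rather than logging one "could not parse file" warning per file, would surface a driver mismatch before hours of training are spent.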
In my experience, sometimes you are sure that the driver is correct, but it actually isn't. I had exactly the same problem before the Eng demo.
This is a serious problem, then. Does this mean there's an issue with Babelfish not pulling the right version? If so, @src-d/language-analysis should be aware of this.
It does pull it, but it is still tricky, because a tiny mistake ruins everything.
Sorry, I may be missing some context here: what kind of mistake can cause this to happen? If we can do something to make such errors less likely, I'm interested to know.
I was mainly talking about Docker: a container restart kills the driver if there is no volume.
In my experience, it happens when you first install the recommended driver and then install the correct version.
My experience is the same as Hugo's. There is also an issue for it: bblfsh/bblfshd#184.
OK, in that case I'll close this as a duplicate of bblfsh/bblfshd#184.
Hi there,
Today I tried to use style-analyzer on a new repository, starting from scratch, as part of a demo and to better understand the whole project. I failed, and this is a report on how.
Following the quickstart guide
I tried to follow the steps in this list.
Unfortunately, I don't understand how this could work, since there's no trained model. Is there?
I tried it anyway:
OK, so I do need a trained model first. I search for "train" in the README and it doesn't help, so I start searching for "train" in the filenames and find a train.py file under lookout/style/format/research. Giving up on the docs, let's train this thing!
OK, so the docs don't really tell me much ... I'll read the Python code.
It seems I need to create input and output directories:
And finally, once everything is set up, I start training the model!
It seems like it's working, but it's quite slow ... I start looking at the logs of the containers started by docker-compose up for lookout, and I see something interesting in the logs of bblfsh:

Wait ... why are we trying to parse Markdown? And ... why is it failing?
status Fatal is far from being meaningful. Anyway, it seems like we're spending a crazy amount of time parsing Markdown, Python, and other languages which don't seem relevant to the style-analyzer, since it only supports JavaScript. So I drop all the unnecessary drivers using bblfshctl; now there's only one:

After doing this, the training process accelerates and soon I get to 16525 iterations ... where I got this:
Oh ... that's bad. Maybe it's bad luck and I should run it again.
OK, so at least now it fails much faster (7 minutes vs. 38), but the error is still pretty cryptic.
Time to go to bed.
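Dropping the unused drivers with bblfshctl fixes the slowdown on the Babelfish side. Another option, since style-analyzer only supports JavaScript, is to filter files by extension before they are ever sent to Babelfish, so Markdown and Python files are never parsed. This is a minimal sketch assuming the training input is a list of repository-relative paths; the function name and the extension set are hypothetical and not taken from style-analyzer.

```python
from pathlib import PurePosixPath
from typing import Iterable, List

# Assumption: the extensions treated as JavaScript; adjust for the repository.
JS_EXTENSIONS = {".js", ".jsx", ".mjs"}


def javascript_files(paths: Iterable[str]) -> List[str]:
    """Keep only the paths a JavaScript-only analyzer needs to parse."""
    return [p for p in paths if PurePosixPath(p).suffix.lower() in JS_EXTENSIONS]
```

Filtering on the client side keeps the other drivers available for tools that need them, at the cost of maintaining the extension list by hand.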