b.collector down for 8.5 hours #157
Comments
Once again. Timeline UTC: … The cause is the same: malformed raw reports from TransportCanary/0.0.10-beta. The trigger is the same: letsencrypt cert update. Another batch is stored in …
FTR, once again. Timeline UTC: …
FTR, once again. Timeline UTC: … The stacktrace was …
That was probably caused by libnettest2 (testing? version 0.0.0 does not sound like a release version); xref: ooni/backend#115
FTR, once again. Timeline UTC: … The stacktrace was …; it was libnettest2 once again.
That was decided as WONTFIX for the moment: #158
Impact: TBD, it's the primary collector for the mobile app
Detection: email & IRC alert
Timeline UTC, Sep 10:
00:00:01: b systemd[1]: Starting Certbot...
00:00:03: certbot: Should renew, less than 30 days before certificate expiry 2017-10-09 23:01:00 UTC. Running pre-hook command: docker stop ooni-backend-b.collector.ooni.io
00:00:15: certbot: Running post-hook command: ... && docker start ooni-backend-b.collector.ooni.io
00:00:32: checkForStaleReports() ValueError: time data '2017-50-20 18:7:3' does not match format '%Y-%m-%d %H:%M:%S' (see the parsing sketch after this timeline)
00:05:55: AlertManager FIRING InstanceDown
07:32:41: @channel can somebody look at this?
07:41:41: good morning
08:30:55: AlertManager RESOLVED InstanceDown
10:14:95: incident published
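
For context, here is a minimal reproduction of the 00:00:32 failure. It assumes only what the error above shows: a timestamp string '2017-50-20 18:7:3' being parsed against '%Y-%m-%d %H:%M:%S', presumably via Python's datetime.strptime. The surrounding script is illustrative, not the actual checkForStaleReports() code.

```python
# Minimal reproduction sketch: month "50" cannot match %m, so strptime raises
# exactly the ValueError quoted in the timeline above.
from datetime import datetime

FORMAT = '%Y-%m-%d %H:%M:%S'

try:
    datetime.strptime('2017-50-20 18:7:3', FORMAT)
except ValueError as exc:
    print(exc)
    # time data '2017-50-20 18:7:3' does not match format '%Y-%m-%d %H:%M:%S'
```

If that exception is not caught inside the startup check, a single malformed raw report is enough to keep the collector from coming back after the certbot post-hook restarts it, which is consistent with the 8.5 hour outage.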
What went well:
- files from /data/b.collector.ooni.io/raw_reports were moved to /home/darkk/20170910/ to hotfix the issue

What went wrong:
What is still unclear:
- /home/darkk/20170910/archive.ls-ltr: these files were moved to main.archive_dir after the successful daemon restart. Is this some case to be monitored? Seems the spice was flowing from b.collector.ooni.io according to ooni-pipeline-cron.log at chameleon.infra.ooni.io; rsync was taking tens of minutes.
- /data/b.collector.ooni.io/var/log/ooni has lines like 404 POST /report/20170714T164513Z_AS47589_Lw5RfUUfj5kHbr1MGn7WmnmxKQX3WmqZM3gmrykqRuSTpZUt10, do these lines mean we're dropping data on the floor? Seems the client thinks so and retries.

What could be done to prevent relapse and decrease impact:
- TransportCanary/0.0.10-beta, what is it?
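
Purely as a sketch of one possible mitigation, in the spirit of the manual hotfix (moving the offending files out of /data/b.collector.ooni.io/raw_reports): quarantine reports whose timestamps fail to parse instead of letting the whole startup check crash. This is not the actual oonib code; the function signature, the quarantine directory and the get_timestamp helper below are hypothetical.

```python
# Hedged sketch, not the real checkForStaleReports(): set aside raw reports
# with unparseable timestamps so one bad file cannot keep the daemon down.
import shutil
from datetime import datetime
from pathlib import Path

FORMAT = '%Y-%m-%d %H:%M:%S'

def check_for_stale_reports(raw_reports_dir, quarantine_dir, get_timestamp):
    """get_timestamp(path) is a hypothetical helper returning the report's
    timestamp string (e.g. the one that was '2017-50-20 18:7:3')."""
    quarantine = Path(quarantine_dir)
    quarantine.mkdir(parents=True, exist_ok=True)
    for report in Path(raw_reports_dir).iterdir():
        try:
            datetime.strptime(get_timestamp(report), FORMAT)
        except ValueError:
            # Malformed report: move it aside for inspection instead of dying,
            # mirroring what was done by hand during the incident.
            shutil.move(str(report), str(quarantine / report.name))
```

Whether silently setting reports aside is acceptable is a separate policy question; the sketch only shows the shape of a guard around the parse that failed at 00:00:32.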