Ensure HTTP requests are explicitly closed (fixes #340 and fixes #342) #341

Merged
merged 9 commits into from
Jul 2, 2019

ChrisTimperley
Collaborator

No description provided.

@ChrisTimperley
Collaborator Author

@pdreiter Do you mind seeing whether this fixes the "Too many open files" bug for you?

@pdreiter
Contributor

pdreiter commented Jul 1, 2019

Sorry for the delay. I'm getting the following error:

  File "/home/bss-lab-1/Darjeeling/djling_venv/lib/python3.6/site-packages/bugzoo/client/__init__.py", line 37, in __init__
    self.__api = APIClient(base_url, timeout_connection=timeout_connection)
  File "/home/bss-lab-1/Darjeeling/djling_venv/lib/python3.6/site-packages/bugzoo/client/api.py", line 70, in __init__
    r.close()
UnboundLocalError: local variable 'r' referenced before assignment

Btw, I tested this by cloning the bugzoo repository, checking out the close-requests branch, then copying over the bugzoo directory into my darjeeling virtual environment, i.e. $VIRTUAL_ENV/lib/python3.6/site-packages/bugzoo
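The UnboundLocalError comes from a try/finally shape: when the call that would assign r raises, the finally block still runs r.close() on a name that was never bound, and the new error replaces the original one. A minimal stdlib reproduction, where the hypothetical fetch() stands in for requests.get() failing while the server is still starting:

```python
def fetch():
    """Hypothetical stand-in for requests.get() failing before it returns."""
    raise ConnectionError("server not accepting connections yet")


def poll_once():
    try:
        r = fetch()  # raises, so the local name `r` is never bound
        return r.status_code
    finally:
        r.close()  # still runs, and raises UnboundLocalError


try:
    poll_once()
except UnboundLocalError as err:
    # The ConnectionError is only reachable as the implicit __context__.
    print(type(err).__name__, "<-", type(err.__context__).__name__)
```

The failed request survives only as err.__context__, which is why the reported error is about r rather than about the connection attempt.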

@pdreiter
Contributor

pdreiter commented Jul 1, 2019

However, I added r = None before the try block and a check that r had been assigned before calling r.close() in the finally block:

            r = None
            try:
                r = requests.get(url, timeout=time_left)
                connected = r.status_code == 204
            except requests.exceptions.ConnectionError:
                time.sleep(1.0)
            except requests.exceptions.Timeout:
                logger.error("Failed to establish connection to server: %s",
                             base_url)
                raise ConnectionFailure
            finally:
                if r is not None:  # a Response with a 4xx/5xx status is falsy
                    r.close()

With this minor set of changes on top of the close-requests branch, I'm getting this issue with a single darjeeling invocation:
bugzoo.exceptions.ConnectionFailure: failed to connect to BugZoo server within timeout window.

I double-checked my ports/associations via lsof -i :6060 and netstat -ltnp, to make sure that there wasn't an existing process, but this was clean.

I reverted the bugzoo changes and I seem to be getting the same issue, so there's something else wrong on my end - will update when I figure this out.
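An alternative to initialising r = None is to let a context manager own each response, so close() is only ever called on an object that was actually returned and runs on every exit path. This is a hypothetical rewrite of the polling loop, not BugZoo's code: get is assumed to behave like requests.get, the built-in ConnectionError and TimeoutError stand in for requests.exceptions.ConnectionError and requests.exceptions.Timeout, and ConnectionFailure is a local stand-in for bugzoo.exceptions.ConnectionFailure.

```python
import contextlib
import time
from typing import Any, Callable


class ConnectionFailure(Exception):
    """Local stand-in for bugzoo.exceptions.ConnectionFailure."""


def wait_for_server(get: Callable[..., Any], url: str,
                    timeout: float, interval: float = 1.0) -> None:
    """Poll `url` until it answers 204, closing every response received.

    Assumes get(url, timeout=...) returns an object with `status_code` and
    `close()`, raises ConnectionError while nothing is listening yet, and
    raises TimeoutError when a single request times out.
    """
    deadline = time.monotonic() + timeout
    while True:
        time_left = deadline - time.monotonic()
        if time_left <= 0:
            raise ConnectionFailure(f"no 204 from {url} within {timeout}s")
        try:
            # closing() is entered only after get() has returned, so there is
            # never an unbound response, and close() runs on every exit from
            # the block, including the early return.
            with contextlib.closing(get(url, timeout=time_left)) as r:
                if r.status_code == 204:
                    return
        except ConnectionError:
            pass  # server not up yet; fall through to the sleep and retry
        except TimeoutError:
            raise ConnectionFailure(f"request to {url} timed out") from None
        time.sleep(interval)
```

Because contextlib.closing() only needs a close() method, the same shape works with a real requests.Response.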

@ChrisTimperley
Collaborator Author

It looks like the port checking code isn't working correctly.

2019-07-01 15:53:03:bugzoo.server:INFO: BugZoo version: 2.1.27
2019-07-01 15:53:03:bugzoo.server:INFO: DockerPy version: 3.5.1
2019-07-01 15:53:03:bugzoo.server:INFO: psutil version: 5.4.7
2019-07-01 15:53:03:bugzoo.server:INFO: Flask version: 1.0.2
2019-07-01 15:53:03:bugzoo.server:INFO: GitPython version: 2.1.11
2019-07-01 15:53:03:bugzoo.server:ERROR: Cannot launch server: port [6060] is in use
ERROR: Cannot launch server: port [6060] is in use
2019-07-01 15:53:03:bugzoo.manager:INFO: Shutting down daemon...
2019-07-01 15:53:03:bugzoo.manager:INFO: Shut down daemon

@ChrisTimperley
Collaborator Author

This is so weird:

 * Serving Flask app "bugzoo.server" (lazy loading)
 * Environment: production
   WARNING: Do not use the development server in a production environment.
   Use a production WSGI server instead.
 * Debug mode: on
2019-07-01 16:06:03:bugzoo.server:INFO: BugZoo version: 2.1.27
2019-07-01 16:06:03:bugzoo.server:INFO: DockerPy version: 3.5.1
2019-07-01 16:06:03:bugzoo.server:INFO: psutil version: 5.4.7
2019-07-01 16:06:03:bugzoo.server:INFO: Flask version: 1.0.2
2019-07-01 16:06:03:bugzoo.server:INFO: GitPython version: 2.1.11
Traceback (most recent call last):
  File "/home/chris/.local/share/virtualenvs/darjeeling-KYygC-ZZ/bin/bugzood", line 11, in <module>
    sys.exit(main())
  File "/home/chris/.local/share/virtualenvs/darjeeling-KYygC-ZZ/lib/python3.6/site-packages/bugzoo/server/__init__.py", line 731, in main
    docker_client_api_version=args.docker_client_api_version)
  File "/home/chris/.local/share/virtualenvs/darjeeling-KYygC-ZZ/lib/python3.6/site-packages/bugzoo/server/__init__.py", line 677, in run
    if is_port_in_use(port):
  File "/home/chris/.local/share/virtualenvs/darjeeling-KYygC-ZZ/lib/python3.6/site-packages/bugzoo/util.py", line 21, in is_port_in_use
    sock.bind(('127.0.0.1', port))
OSError: [Errno 98] Address already in use

It looks like the server is being launched before the call to app.run is reached.
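The traceback also shows the OSError from sock.bind() escaping is_port_in_use instead of being reported as "in use". BugZoo's util.py is not shown here, so the following is only a sketch of a version that maps EADDRINUSE (Errno 98 on Linux, as in the traceback) to True and always closes its probe socket:

```python
import errno
import socket


def is_port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if `port` on `host` is already bound by another socket."""
    # The with-statement closes the probe socket on every path, so the
    # check itself never leaks a file descriptor.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError as err:
            if err.errno == errno.EADDRINUSE:
                return True
            raise  # e.g. EACCES on a privileged port is a different problem
    return False
```

Re-raising anything other than EADDRINUSE keeps unrelated failures, such as permission errors, from being misreported as a busy port.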

@pdreiter
Contributor

pdreiter commented Jul 1, 2019

Yeah, that's what I'm thinking too, although I don't know where it's opened. I ran netstat and lsof and can't find anything on that port :(
I was trying out some other things and added sock.setblocking(False) to the is_port_in_use code block, and two simultaneous Darjeeling processes worked much better (one passed and the other got the 'Too many open files' error)... BUT I think where I placed it basically invalidates the premise of the method?
My bad earlier, too: I had only visually inspected the error message for the is_port_in_use fix :(

@ChrisTimperley
Collaborator Author

ChrisTimperley commented Jul 1, 2019

2019-07-01 16:15:16:bugzoo.server:INFO: BugZoo version: 2.1.27
2019-07-01 16:15:16:bugzoo.server:INFO: DockerPy version: 3.5.1
2019-07-01 16:15:16:bugzoo.server:INFO: psutil version: 5.4.7
2019-07-01 16:15:16:bugzoo.server:INFO: Flask version: 1.0.2
2019-07-01 16:15:16:bugzoo.server:INFO: GitPython version: 2.1.11
2019-07-01 16:15:16:bugzoo.server:INFO: launching BugZoo daemon
...
2019-07-01 16:15:21:bugzoo.server:INFO: launched BugZoo daemon
...
2019-07-01 16:15:22:bugzoo.server:INFO: BugZoo version: 2.1.27
2019-07-01 16:15:22:bugzoo.server:INFO: DockerPy version: 3.5.1
2019-07-01 16:15:22:bugzoo.server:INFO: psutil version: 5.4.7
2019-07-01 16:15:22:bugzoo.server:INFO: Flask version: 1.0.2
2019-07-01 16:15:22:bugzoo.server:INFO: GitPython version: 2.1.11
...

It looks like main is being called twice.

@ChrisTimperley
Collaborator Author

I found the issue: https://stackoverflow.com/questions/25504149/why-does-running-the-flask-dev-server-run-itself-twice
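Per the linked answer, Flask's app.run() enables the Werkzeug reloader whenever debug is on (use_reloader defaults to the debug setting). The reloader re-runs the script in a child process that actually serves, while the parent only watches for changes, so everything before app.run(), such as launching the BugZoo daemon, happens twice. The simplest fix is passing use_reloader=False. Where the reloader is wanted, one-time work can be guarded on the WERKZEUG_RUN_MAIN environment variable, which Werkzeug sets to "true" only in the child. The helper name below is hypothetical:

```python
import os


def is_serving_process(reloader_enabled: bool) -> bool:
    """Return True if this process should do one-time start-up work.

    With the Werkzeug reloader on, main() runs in a watcher (parent)
    process and again in the serving (child) process; Werkzeug sets
    WERKZEUG_RUN_MAIN="true" only in the child.
    """
    if not reloader_enabled:
        return True  # single process: it does the work itself
    return os.environ.get("WERKZEUG_RUN_MAIN") == "true"


# Hypothetical use in a server's main(); launch_daemon() is a placeholder:
#     if is_serving_process(reloader_enabled=debug):
#         launch_daemon()
#     app.run(port=port, debug=debug)
#
# Or skip the second process altogether:
#     app.run(port=port, debug=debug, use_reloader=False)
```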

@ChrisTimperley changed the title from "Ensure HTTP requests are explicitly closed (fixes #340)" to "Ensure HTTP requests are explicitly closed (fixes #340 and fixes #342)" on Jul 1, 2019
@ChrisTimperley
Collaborator Author

@pdreiter This should be good to test now.

@pdreiter
Contributor

pdreiter commented Jul 1, 2019

We have a single darjeeling repair <yml> running!

About to test my Darjeeling program that has been hitting the 'Too many open files' ConnectionError...

@pdreiter
Contributor

pdreiter commented Jul 1, 2019

Bad news: it looks like I'm still getting the "Too many open files" ConnectionError.
BUT good news: the number of ConnectionErrors has almost halved.

  • In the original anonymous.debug.log, there were 58 ConnectionErrors.
  • Now, with this fix, there are 30.

@ChrisTimperley
Collaborator Author

If you're running multiple Darjeeling instances, it may simply be that you need to increase your file limit:
https://easyengine.io/tutorials/linux/increase-open-files-limit/

What is your file limit right now?

@pdreiter
Contributor

pdreiter commented Jul 1, 2019

Well, right now I'm running just a single Darjeeling instance.
As for my file limits:

  • hard limit: 4096
  • soft limit: 1024
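The same limits can be read, and the soft limit raised up to the hard limit, from inside a Python process with the standard resource module (Unix only). The helper below is a hypothetical sketch, not something BugZoo or Darjeeling provides:

```python
import resource


def raise_nofile_soft_limit(target: int) -> int:
    """Raise this process's open-file soft limit toward `target`.

    Never lowers the limit and never goes above the hard limit, which an
    unprivileged process cannot exceed. Returns the soft limit in effect.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return soft  # already unlimited
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)
    if target <= soft:
        return soft
    resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]
```

With the limits above (soft 1024, hard 4096), raise_nofile_soft_limit(4096) would lift the soft limit to 4096 for the current process. Once descriptors can be numbered 1024 or higher, any code that still waits on them with select() can start failing.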

@ChrisTimperley
Collaborator Author

I would expect that to be good enough for a single instance, although increasing that wouldn't hurt. How many threads are you using?

Also, I'm not quite sure how you're getting multiple connection errors. Would you mind attaching your logs?

@pdreiter
Contributor

pdreiter commented Jul 1, 2019

I'm using 16 threads. I prepended my yml file to the attached gzipped log so you can check anything else that could be affecting it.
bugfix-341-test.anon.ga.log.gz

I just launched a version of the yml file with generations: 100, a configuration with which I had seen ConnectionErrors before.

Edit 1: this resulted in only a very small reduction in ConnectionErrors.

@pdreiter
Contributor

pdreiter commented Jul 1, 2019

Follow-up:
I increased the soft limit to 4096, but now I'm seeing
ValueError: filedescriptor out of range in select()
bugfix-341-ulimit-test.ga.anon.log.gz

I think the Docker API client might be missing the fix from this pull request:
Use poll() instead of select()
testing the fix out locally.
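This ValueError is what CPython's select.select() raises for a file descriptor at or above FD_SETSIZE (typically 1024 on Linux), a number a process can only reach once its soft limit exceeds 1024. poll() has no such ceiling. A minimal sketch of the general technique (not docker-py's actual patch), waiting for a socket to become readable; the helper name is hypothetical:

```python
import select
import socket


def wait_readable(sock: socket.socket, timeout_s: float) -> bool:
    """Return True if `sock` becomes readable within `timeout_s` seconds.

    Uses select.poll() (Unix only), which, unlike select.select(), works for
    file descriptors numbered 1024 and above.
    """
    poller = select.poll()
    poller.register(sock, select.POLLIN)
    # poll() takes its timeout in milliseconds.
    events = poller.poll(int(timeout_s * 1000))
    return any(mask & select.POLLIN for _, mask in events)
```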

@ChrisTimperley
Collaborator Author

So, I'm going to go ahead and merge this PR. From inspecting the log, it looks like docker-py is unfortunately producing the underlying OSError(24, 'Too many open files'). From a quick glance at the docker-py issue tracker, numerous file descriptor leaks have been reported, some with fixes that have been pending since April.

A potential solution, albeit one I'd rather not resort to, is for you or me to fork docker-py, merge the pending fixes into our fork, and see whether the issues are resolved. If they are, I'll set up BugZoo (via setup.py) to use the forked version of docker-py until the fixes are merged and pushed to a release.
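When chasing descriptor leaks like the docker-py ones mentioned above, it can help to watch the process's open-descriptor count over time; errno 24 (EMFILE) means the process has hit its soft limit. A stdlib-only sketch, assuming Linux's /proc/self/fd and falling back to /dev/fd (as on macOS); the helper is hypothetical:

```python
import os


def open_fd_count() -> int:
    """Count this process's open file descriptors (Linux or macOS).

    The listing itself may briefly hold one descriptor for the directory,
    so treat the absolute number as approximate; differences between two
    calls are what reveal a leak.
    """
    fd_dir = "/proc/self/fd" if os.path.isdir("/proc/self/fd") else "/dev/fd"
    return len(os.listdir(fd_dir))
```

Logging open_fd_count() every few hundred candidate evaluations would show whether the count climbs steadily toward the 1024 soft limit, which points to a leak rather than to genuine concurrency.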

@ChrisTimperley marked this pull request as ready for review on July 2, 2019 20:10
@ChrisTimperley merged commit 8e71bb9 into master on Jul 2, 2019
@ChrisTimperley deleted the close-requests branch on July 2, 2019 20:11
@pdreiter
Contributor

pdreiter commented Jul 2, 2019

Sounds good to me. I launched Darjeeling with the poll() fix from the link I sent, and there were no more ConnectionErrors.
However, I did get two of these errors:

 Traceback (most recent call last):
   File "/home/bss-lab-1/Darjeeling/djling_venv/lib/python3.6/site-packages/darjeeling/evaluator.py", line 285, in evaluate
     outcome = self._evaluate(candidate)
   File "/home/bss-lab-1/Darjeeling/djling_venv/lib/python3.6/site-packages/darjeeling/evaluator.py", line 192, in _evaluate
     if candidate in self.outcomes:
 RuntimeError: dictionary changed size during iteration

yikes!
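With 16 evaluator threads, a likely culprit is one thread reading self.outcomes while another adds to it. A plain dict's in check does not iterate, so the error suggests a dict is being iterated somewhere along that path (for example in a custom __contains__, or in the candidate's __hash__/__eq__). Darjeeling's outcome class is not shown here, so this is only a hypothetical sketch of serialising access with a lock:

```python
import threading
from typing import Dict, Hashable, Iterator, Tuple


class OutcomeCache:
    """Hypothetical thread-safe map from candidate patches to outcomes."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._outcomes: Dict[Hashable, object] = {}

    def __contains__(self, candidate: Hashable) -> bool:
        with self._lock:
            return candidate in self._outcomes

    def record(self, candidate: Hashable, outcome: object) -> None:
        with self._lock:
            self._outcomes[candidate] = outcome

    def items(self) -> Iterator[Tuple[Hashable, object]]:
        # Iterate over a snapshot taken under the lock, so concurrent
        # record() calls cannot change the dict mid-iteration.
        with self._lock:
            snapshot = list(self._outcomes.items())
        return iter(snapshot)
```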
