forked from bdheath/pytor

Python wrapper for scraping over the Tor network


jhoogeboom/pytor

 
 


Pytor

Pytor is a Python wrapper for scraping over the Tor network. This is useful if you've been blocked (either locally or remotely) from the server you're attempting to scrape, or if it's otherwise important to not reveal your identity.

Pytor lets you channel simple HTTP requests through a Tor proxy. If your script needs to emulate a browser, it can also route more complex Mechanize requests through the proxy. Pytor can also establish a new Tor identity periodically, either on demand or at a custom interval that limits how long a particular identity is used.

Requirements

Assumptions (because it's not done yet)

For now, Pytor assumes that your Tor control port is set to 9051. If your configuration is different, you can edit the global variables at the top to adjust. Future versions will accommodate different ports and authentication passwords.
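For context, the control port is where a client asks Tor for a new identity. The sketch below shows what that exchange could look like using only the Python 3 standard library, including the password authentication mentioned above. It is not Pytor's code: `build_newnym_commands` and `request_new_identity` are hypothetical names, and the sketch assumes the control port either accepts a password or has no authentication configured (in which case an empty password is sent).

```python
import socket


def build_newnym_commands(password=""):
    """Return the control-port lines that authenticate and request a new identity."""
    # Escape backslashes and double quotes so the password stays a valid quoted string.
    quoted = password.replace("\\", "\\\\").replace('"', '\\"')
    return [
        'AUTHENTICATE "{}"\r\n'.format(quoted),
        "SIGNAL NEWNYM\r\n",
    ]


def request_new_identity(host="127.0.0.1", port=9051, password="", timeout=10):
    """Send the commands to a running Tor control port.

    Returns True only if Tor answers every command with a 250 status.
    """
    with socket.create_connection((host, port), timeout=timeout) as conn:
        for command in build_newnym_commands(password):
            conn.sendall(command.encode("ascii"))
            reply = conn.recv(1024).decode("ascii", "replace")
            if not reply.startswith("250"):
                return False
    return True
```

Calling `request_new_identity()` only works against a running Tor instance with its control port enabled; `build_newnym_commands()` can be inspected without one.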

Next steps

Future iterations of Pytor will:

  • Support different Tor control ports and authentication passwords

Basic usage

Create a basic Pytor instance and send a simple HTTP request:

from pytor import pytor

tor = pytor()
html = tor.get('http://bradheath.org')

Or, if your Tor configuration requires a different host or port:

from pytor import pytor
tor = pytor(host='localhost', port=9055)
html = tor.get('http://bradheath.org')

Check the IP address that remote servers will see:

print(tor.ip())

Download a file:

tor.downloadFile(url, local_filename)

Request a new identity from Tor. (Note that the network won't always assign you one, and even when it does, you may end up with the same exit node and therefore the same IP address. Also, to avoid stressing the network, don't change your identity too often.)

tor.newIdentity()
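Because a new identity can land on the same exit node, a script may want to confirm that the visible IP address actually changed. The helper below is a minimal sketch that relies only on the `ip()` and `newIdentity()` calls shown above; the name `rotate_until_ip_changes` and the `attempts`, `pause`, and `sleep` parameters are hypothetical, and the default pause is kept long on purpose so the network is not stressed.

```python
import time


def rotate_until_ip_changes(tor, attempts=3, pause=30, sleep=time.sleep):
    """Ask for a new identity until tor.ip() reports a different address.

    Returns the new IP address, or None if it never changed.
    """
    old_ip = tor.ip()
    for _ in range(attempts):
        tor.newIdentity()
        sleep(pause)  # give Tor time to switch circuits before checking again
        new_ip = tor.ip()
        if new_ip != old_ip:
            return new_ip
    return None
```

With a Pytor instance this would be called as `rotate_until_ip_changes(tor)`; a `None` result means the script is still using the original address.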

Have Pytor periodically request a new identity. (Note that this currently applies to get() and downloadFile() requests; it does not force a new identity when you use a mechanize browser. More on that later.)

tor.identityTime(1200)  # Request a new identity every 1,200 seconds (20 minutes)
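One way to picture interval-based rotation is a timestamp check that runs before each request. The class below is a sketch of that idea under stated assumptions, not a description of Pytor's internals: `IdentityTimer`, `check`, and the injectable `clock` are hypothetical names introduced for illustration.

```python
import time


class IdentityTimer:
    """Call `rotate` once at least `interval` seconds have passed since the last rotation."""

    def __init__(self, interval, rotate, clock=time.time):
        self.interval = interval
        self.rotate = rotate
        self.clock = clock
        self.last_rotation = clock()  # the current identity starts now

    def check(self):
        """Rotate if the interval has elapsed; return True if a rotation happened."""
        now = self.clock()
        if now - self.last_rotation >= self.interval:
            self.rotate()
            self.last_rotation = now
            return True
        return False
```

A script that drives a mechanize browser could create `timer = IdentityTimer(1200, tor.newIdentity)` and call `timer.check()` before each `open()`, since the built-in interval does not cover that case.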

To create a new instance of a mechanize browser:

br = tor.mechanizeBrowser()
br.open('http://bradheath.org')
# continue using br just like any other mechanize object

Or use the browser stored on the Pytor instance:

tor.mechanizeBrowser()
tor.browser.open('http://bradheath.org')
# etc.
