Install requirements with pip3 install -r requirements.txt
python3 main.py {check,gather}
-
Gather uses the plugins in the
plugins/
directory to gather and save proxies to a file (by defaultoutput/scraped.json
). -
Check reads an input file (default
output/scraped.json
) and checks them on multiple threads, then writing all working proxies to an output file (defaultoutput/live.json
andoutput/raw/(http|socks4|socks5).txt
) once complete.
PLEASE RUN python3 main.py --help
FOR A FULL LIST AND EXPLAINATION OF CLI ARGS
- The checker uses asyncronous requests with a defined open connection limit, whether it has ssl supported, the fraud detection score and the whois data (whois and fraud detection score can be turned off with --basic/-b). Fraud score is checked not for nefarious activity, but as many services will not work on blacklisted ips.
-
You can add/remove plain text sources by adding them to the list at
data/sources.py
. This is handled by the fileplugins/txt.py
. In order to check if a source returns plain text, simply runcurl --user-agent $UA $URL
, note user agent may not always be needed but helps with evading robots.txt -
In order to add sources which require complex parsing, you can copy
plugins/template.py
to for exampleplugins/myplugin.py
. Then make the required edits to return a list of proxies with protocol as a list. The module will be automatically loaded and run when gathering if done correctly.
Q: GUI WHEN?
A: I am working on an IPython notebook which will be pushed to main in the near future. This should make it easier for non-gui users.
Q: What is the open connection limit (-L)? Which value for this is safe?
A: As this code is asynchronous, if no limit was set the code would keep opening connections for each proxy until the list is exhausted. This leads to the OS Error "Too many open files", as well as sending way too many packets. This setting introduces a wait period while the active connections if above the threshold. Through tests I've determined setting this too high can introduce false positives, so I suggest lowering this and testing if the results change.
TL;DR, It depends on OS limits for open files and internet connection, but 1024 is considered 'safe' by me and you can increase it to whatever without any lasting damage
- Rework cli arguments
- Rework gather code
- Add more user options
- Abitity to enable basic checks
- Logging
- Add more sources
- [_] Clean up plugins and create better template
- Objectify proxies from gather, allowing for plugin/source evaluation
- Move proxy functions outside class
- Create proxy object dump utility
- Create asynchronous branch
- Load arguments only on main.py
- Simple GUI
and more...