Huge update
monosans committed Jan 19, 2024
1 parent b0e690a commit 1f9236a
Showing 26 changed files with 2,231 additions and 657 deletions.
27 changes: 27 additions & 0 deletions .github/workflows/ci.yml
@@ -14,3 +14,30 @@ jobs:
     steps:
       - uses: actions/checkout@v4
       - run: pipx run pre-commit run --all-files --show-diff-on-failure
+  build:
+    strategy:
+      matrix:
+        os:
+          - ubuntu
+          - macos
+          - windows
+      fail-fast: false
+    runs-on: ${{ matrix.os }}-latest
+    steps:
+      - uses: actions/checkout@v4
+      - run: pipx install poetry
+      - uses: actions/setup-python@v5
+        with:
+          python-version: "3.11"
+          cache: poetry
+          check-latest: true
+      - run: poetry install --only main,nuitka --sync --no-root --no-interaction
+      - run: poetry run --no-interaction python -m nuitka --onefile --python-flag='-m' --prefer-source-code --assume-yes-for-downloads --lto=yes proxy_scraper_checker
+      - uses: actions/upload-artifact@v4
+        with:
+          name: artifact-${{ matrix.os }}
+          path: |
+            config.toml
+            proxy_scraper_checker.bin
+            proxy_scraper_checker.exe
+          if-no-files-found: error
5 changes: 1 addition & 4 deletions .gitignore
@@ -376,7 +376,4 @@ $RECYCLE.BIN/

 # End of https://www.toptal.com/developers/gitignore/api/jetbrains+all,linux,macos,python,vim,visualstudiocode,windows

-proxies/
-proxies_anonymous/
-proxies_geolocation/
-proxies_geolocation_anonymous/
+results/
7 changes: 7 additions & 0 deletions .pre-commit-config.yaml
@@ -35,6 +35,13 @@ repos:
- --scripts-are-modules
additional_dependencies:
- aiohttp<4
- attrs
- certifi
- charset-normalizer<4
- maxminddb<3
- platformdirs<5
- rich<14
- types-aiofiles
- typing-extensions<5
- tomli<3; python_version < "3.11"
- uvloop<0.20; implementation_name == "cpython" and (sys_platform == "darwin" or sys_platform == "linux")
36 changes: 17 additions & 19 deletions README.md
@@ -7,8 +7,9 @@
 HTTP, SOCKS4, SOCKS5 proxies scraper and checker.

 - Asynchronous.
-- Uses regex to search for proxies (ip:port format) on a web page, allowing proxies to be extracted even from json without making changes to the code.
+- Uses regex to find proxies of format `protocol://username:password@ip:port` on a web page or in a local file, allowing proxies to be extracted even from json without code changes.
 - It is possible to specify the URL to which to send a request to check the proxy.
+- Supports saving to plain text or json.
 - Can sort proxies by speed.
 - Supports determining the geolocation of the proxy exit node.
 - Can determine if the proxy is anonymous.
@@ -17,16 +18,24 @@ You can get proxies obtained using this script in [monosans/proxy-list](https://

 ## Installation and usage

-### Desktop
+### Pre-compiled binary
+
+This is the easiest way, but it is only available for x64 Windows, macOS and Linux. Just download the archive for your OS from <https://nightly.link/monosans/proxy-scraper-checker/workflows/ci/main?preview>, unzip it, edit `config.toml` and run the executable.
+
+If Windows Defender detects an executable file as a trojan, please read [this](https://github.com/Nuitka/Nuitka/issues/2495#issuecomment-1762836583).
+
+### Running from source code
+
+#### Desktop

+- Install [Python](https://python.org/downloads). The minimum version required is 3.8. The recommended version is 3.11, because 3.12 may not install some libraries in the absence of a C compiler.
 - Download and unpack [the archive with the program](https://github.com/monosans/proxy-scraper-checker/archive/refs/heads/main.zip).
-- Edit `config.ini` to your preference.
-- Install [Python](https://python.org/downloads) (minimum required version is 3.7).
+- Edit `config.toml` to your preference.
 - Run the script that installs dependencies and starts `proxy-scraper-checker`:
   - On Windows run `start.cmd`
   - On Unix-like operating systems run `start.sh`

-### Termux
+#### Termux

 To use `proxy-scraper-checker` in Termux, knowledge of the Unix command-line interface is required.

@@ -35,26 +44,15 @@ To use `proxy-scraper-checker` in Termux, knowledge of the Unix command-line int
   ```bash
   bash <(curl -fsSL 'https://raw.githubusercontent.com/monosans/proxy-scraper-checker/main/install-termux.sh')
   ```
-- Edit `~/proxy-scraper-checker/config.ini` to your preference using a text editor (vim/nano).
+- Edit `~/proxy-scraper-checker/config.toml` to your preference using a text editor (vim/nano).
 - To run `proxy-scraper-checker` use the following command:
   ```bash
   cd ~/proxy-scraper-checker && sh start-termux.sh
   ```

-## Checking local proxy lists
-
-To check the local proxy lists, start the Python HTTP server on your local machine by running the `python -m http.server --bind localhost` command in the folder with the proxy lists. After that, add links to the appropriate files in `config.ini`.
-
-## Folders description
-
-When the script finishes running, the following folders will be created (this behavior can be changed in the config):
-
-- `proxies` - proxies with any anonymity level.
-- `proxies_anonymous` - anonymous proxies.
-- `proxies_geolocation` - same as `proxies`, but includes exit-node's geolocation.
-- `proxies_geolocation_anonymous` - same as `proxies_anonymous`, but includes exit-node's geolocation.
+## Something else?

-Geolocation format is `ip:port|Country|Region|City`.
+All other info is available in `config.toml` file.

 ## License

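Note on the README change: the updated feature list says proxies of the form `protocol://username:password@ip:port` are found by regex, so they can be pulled out of JSON or any other text without code changes. A minimal sketch of that idea follows. The pattern, the `find_proxies` helper and the sample string are hypothetical illustrations, not the repository's actual regex, and the sketch assumes IPv4 hosts with an optional protocol and optional credentials.

```python
import re

# Hypothetical pattern (not taken from the repository): protocol and
# credentials are optional, the host is a dotted IPv4 address.
PROXY_RE = re.compile(
    r"(?:(?P<protocol>https?|socks[45])://)?"
    r"(?:(?P<username>[^\s:@/]+):(?P<password>[^\s:@/]+)@)?"
    r"(?P<host>(?:\d{1,3}\.){3}\d{1,3})"
    r":(?P<port>\d{1,5})",
    re.IGNORECASE,
)


def find_proxies(text):
    """Return one dict per proxy-like substring found in arbitrary text."""
    return [match.groupdict() for match in PROXY_RE.finditer(text)]


# The input is JSON, but it is scanned as plain text, so no JSON parsing
# or knowledge of the key names is needed.
sample = '{"data": [{"addr": "socks5://user:pass@10.0.0.1:1080"}, {"addr": "192.168.1.5:8080"}]}'

for proxy in find_proxies(sample):
    print(proxy)
```

Groups that are absent in a match (for example the protocol of a bare `ip:port` entry) come back as `None`, so a caller can fall back to a default protocol for that source.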
157 changes: 0 additions & 157 deletions config.ini

This file was deleted.
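Note on the config migration: `config.ini` is removed in favor of `config.toml`, and `.pre-commit-config.yaml` lists `tomli<3; python_version < "3.11"` as a dependency. A version-gated import along the following lines would match that marker. This is a sketch under assumptions: the section and key names in `EXAMPLE_CONFIG` and the `load_config` helper are hypothetical and do not reflect the real `config.toml`.

```python
import sys

# tomllib is in the standard library from Python 3.11; older versions
# need the third-party tomli backport, which exposes the same API.
if sys.version_info >= (3, 11):
    import tomllib
else:
    import tomli as tomllib

# Hypothetical config text; the real config.toml keys may differ.
EXAMPLE_CONFIG = """
[general]
timeout = 5.0
sources = ["https://example.com/proxies.txt", "./local_list.txt"]
"""


def load_config(text):
    """Parse TOML text into plain dicts, lists, floats and strings."""
    return tomllib.loads(text)


config = load_config(EXAMPLE_CONFIG)
print(config["general"]["timeout"], config["general"]["sources"])
```

Unlike `configparser`, which returns every value as a string, the TOML parser yields typed values, so numbers and lists need no manual conversion.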
