Huge update #125

Merged
merged 1 commit into from
Jan 21, 2024
11 changes: 11 additions & 0 deletions .github/workflows/auto-merge.yml
@@ -6,6 +6,17 @@ concurrency:
  group: ${{ github.workflow }}-${{ github.head_ref }}
  cancel-in-progress: true
jobs:
  auto-merge-dependabot:
    runs-on: ubuntu-latest
    if: ${{ github.actor == 'dependabot[bot]' }}
    steps:
      - id: dependabot-metadata
        uses: dependabot/fetch-metadata@v1
      - if: ${{ steps.dependabot-metadata.outputs.update-type != 'version-update:semver-major' }}
        run: gh pr merge --auto --delete-branch --squash "${PR_URL}"
        env:
          PR_URL: ${{ github.event.pull_request.html_url }}
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
  auto-merge-updates:
    runs-on: ubuntu-latest
    if: ${{ github.actor == 'monosans' && startsWith(github.head_ref, 'update/') }}
27 changes: 27 additions & 0 deletions .github/workflows/ci.yml
@@ -14,3 +14,30 @@ jobs:
    steps:
      - uses: actions/checkout@v4
      - run: pipx run pre-commit run --all-files --show-diff-on-failure
  build:
    strategy:
      matrix:
        os:
          - ubuntu
          - macos
          - windows
      fail-fast: false
    runs-on: ${{ matrix.os }}-latest
    steps:
      - uses: actions/checkout@v4
      - run: pipx install poetry
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
          cache: poetry
          check-latest: true
      - run: poetry install --only main,nuitka --sync --no-root --no-interaction
      - run: poetry run --no-interaction python -m nuitka --onefile --python-flag='-m' --prefer-source-code --assume-yes-for-downloads --lto=yes proxy_scraper_checker
      - uses: actions/upload-artifact@v4
        with:
          name: artifact-${{ matrix.os }}
          path: |
            config.toml
            proxy_scraper_checker.bin
            proxy_scraper_checker.exe
          if-no-files-found: error
3 changes: 3 additions & 0 deletions .github/workflows/update-dependencies.yml
@@ -12,6 +12,9 @@ jobs:
    strategy:
      matrix:
        include:
          - cmd: pipx run poetry lock --no-interaction
            commit-msg: Update poetry.lock
            branch: update/poetry-lock
          - cmd: pipx run pre-commit autoupdate
            commit-msg: Update .pre-commit-config.yaml
            branch: update/pre-commit-config
5 changes: 1 addition & 4 deletions .gitignore
@@ -376,7 +376,4 @@ $RECYCLE.BIN/

# End of https://www.toptal.com/developers/gitignore/api/jetbrains+all,linux,macos,python,vim,visualstudiocode,windows

proxies/
proxies_anonymous/
proxies_geolocation/
proxies_geolocation_anonymous/
out/
11 changes: 9 additions & 2 deletions .pre-commit-config.yaml
@@ -35,6 +35,13 @@ repos:
- --scripts-are-modules
additional_dependencies:
- aiohttp<4
- attrs
- certifi
- charset-normalizer<4
- maxminddb<3
- platformdirs<5
- rich<14
- typing-extensions<5
- uvloop<0.20; implementation_name == "cpython" and (sys_platform == "darwin" or sys_platform == "linux")
- types-aiofiles
- typing-extensions<5; python_version < "3.11"
- tomli<3; python_version < "3.11"
- uvloop<0.20; platform_python_implementation == "CPython" and (sys_platform == "darwin" or sys_platform == "linux")
49 changes: 25 additions & 24 deletions README.md
@@ -6,27 +6,37 @@

HTTP, SOCKS4, SOCKS5 proxies scraper and checker.

- Asynchronous.
- Uses regex to search for proxies (ip:port format) on a web page, allowing proxies to be extracted even from json without making changes to the code.
- It is possible to specify the URL to which to send a request to check the proxy.
- Can sort proxies by speed.
- Supports determining the geolocation of the proxy exit node.
- Can determine if the proxy is anonymous.
- Supports determining the geolocation of the proxy exit node.
- Can sort proxies by speed.
- Uses regex to find proxies of format `protocol://username:password@ip:port` on a web page or in a local file, allowing proxies to be extracted even from json without code changes.
- Supports proxies with authentication.
- It is possible to specify the URL to which to send a request to check the proxy.
- Supports saving to plain text and json.
- Asynchronous.
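
To illustrate the regex-based extraction mentioned in the feature list above, here is a minimal Python sketch. The pattern, the `find_proxies` helper, and the sample input are assumptions made for this example; they are not taken from the project's source, and the real implementation may differ.

```python
import re

# Hypothetical pattern (not the project's actual regex): an optional protocol,
# optional "username:password@" credentials, then an IPv4 address and a port.
PROXY_RE = re.compile(
    r"(?:(?P<protocol>https?|socks[45])://)?"
    r"(?:(?P<username>[^\s:@/\"']+):(?P<password>[^\s:@/\"']+)@)?"
    r"(?P<host>(?:\d{1,3}\.){3}\d{1,3})"
    r":(?P<port>\d{1,5})\b",
    re.IGNORECASE,
)


def find_proxies(text):
    """Return one dict per proxy-looking substring found in text."""
    return [match.groupdict() for match in PROXY_RE.finditer(text)]


# The input is never parsed as JSON; the regex simply scans the raw text,
# which is why the same code works on HTML, plain text, or JSON alike.
sample = '{"data": ["socks5://user:pass@10.0.0.1:1080", "8.8.8.8:3128"]}'
proxies = find_proxies(sample)
```

Each result is a dict with the keys `protocol`, `username`, `password`, `host` and `port`; parts that are absent from the source text come back as `None`, so a caller could fall back to a default protocol for bare `ip:port` entries.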

You can get proxies obtained using this script in [monosans/proxy-list](https://github.com/monosans/proxy-list).
You can get proxies obtained using this project in [monosans/proxy-list](https://github.com/monosans/proxy-list).

## Installation and usage

### Desktop
### Pre-compiled binary

This is the easiest way, but it is only available for x64 Windows, macOS and Linux. Just download the archive for your OS from <https://nightly.link/monosans/proxy-scraper-checker/workflows/ci/main?preview>, unzip it, edit `config.toml` and run the executable.

If Windows Defender detects an executable file as a virus, please read [this](https://github.com/Nuitka/Nuitka/issues/2495#issuecomment-1762836583).

### Running from source code

#### Desktop

- Install [Python](https://python.org/downloads). The minimum version required is 3.8. The recommended version is 3.11, because 3.12 may not install some libraries in the absence of a C compiler.
- Download and unpack [the archive with the program](https://github.com/monosans/proxy-scraper-checker/archive/refs/heads/main.zip).
- Edit `config.ini` to your preference.
- Install [Python](https://python.org/downloads) (minimum required version is 3.7).
- Edit `config.toml` to your preference.
- Run the script that installs dependencies and starts `proxy-scraper-checker`:
- On Windows run `start.cmd`
- On Unix-like operating systems run `start.sh`

### Termux
#### Termux

To use `proxy-scraper-checker` in Termux, knowledge of the Unix command-line interface is required.

@@ -35,27 +45,18 @@ To use `proxy-scraper-checker` in Termux, knowledge of the Unix command-line int
```bash
bash <(curl -fsSL 'https://raw.githubusercontent.com/monosans/proxy-scraper-checker/main/install-termux.sh')
```
- Edit `~/proxy-scraper-checker/config.ini` to your preference using a text editor (vim/nano).
- Edit `~/proxy-scraper-checker/config.toml` to your preference using a text editor (vim/nano).
- To run `proxy-scraper-checker` use the following command:
```bash
cd ~/proxy-scraper-checker && sh start-termux.sh
```

## Checking local proxy lists

To check the local proxy lists, start the Python HTTP server on your local machine by running the `python -m http.server --bind localhost` command in the folder with the proxy lists. After that, add links to the appropriate files in `config.ini`.
## Something else?

## Folders description

When the script finishes running, the following folders will be created (this behavior can be changed in the config):

- `proxies` - proxies with any anonymity level.
- `proxies_anonymous` - anonymous proxies.
- `proxies_geolocation` - same as `proxies`, but includes exit-node's geolocation.
- `proxies_geolocation_anonymous` - same as `proxies_anonymous`, but includes exit-node's geolocation.

Geolocation format is `ip:port|Country|Region|City`.
All other info is available in `config.toml` file.

## License

[MIT](LICENSE)

This product includes GeoLite2 Data created by MaxMind, available from <https://www.maxmind.com>.
157 changes: 0 additions & 157 deletions config.ini

This file was deleted.
