Huge update
monosans committed Jan 21, 2024
1 parent b0e690a commit 3e47e21
Showing 35 changed files with 2,542 additions and 834 deletions.
11 changes: 11 additions & 0 deletions .github/workflows/auto-merge.yml
@@ -6,6 +6,17 @@ concurrency:
  group: ${{ github.workflow }}-${{ github.head_ref }}
  cancel-in-progress: true
jobs:
  auto-merge-dependabot:
    runs-on: ubuntu-latest
    if: ${{ github.actor == 'dependabot[bot]' }}
    steps:
      - id: dependabot-metadata
        uses: dependabot/fetch-metadata@v1
      - if: ${{ steps.dependabot-metadata.outputs.update-type != 'version-update:semver-major' }}
        run: gh pr merge --auto --delete-branch --squash "${PR_URL}"
        env:
          PR_URL: ${{ github.event.pull_request.html_url }}
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
  auto-merge-updates:
    runs-on: ubuntu-latest
    if: ${{ github.actor == 'monosans' && startsWith(github.head_ref, 'update/') }}
27 changes: 27 additions & 0 deletions .github/workflows/ci.yml
@@ -14,3 +14,30 @@ jobs:
    steps:
      - uses: actions/checkout@v4
      - run: pipx run pre-commit run --all-files --show-diff-on-failure
  build:
    strategy:
      matrix:
        os:
          - ubuntu
          - macos
          - windows
      fail-fast: false
    runs-on: ${{ matrix.os }}-latest
    steps:
      - uses: actions/checkout@v4
      - run: pipx install poetry
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
          cache: poetry
          check-latest: true
      - run: poetry install --only main,nuitka --sync --no-root --no-interaction
      - run: poetry run --no-interaction python -m nuitka --onefile --python-flag='-m' --prefer-source-code --assume-yes-for-downloads --lto=yes proxy_scraper_checker
      - uses: actions/upload-artifact@v4
        with:
          name: artifact-${{ matrix.os }}
          path: |
            config.toml
            proxy_scraper_checker.bin
            proxy_scraper_checker.exe
          if-no-files-found: error
3 changes: 3 additions & 0 deletions .github/workflows/update-dependencies.yml
@@ -12,6 +12,9 @@ jobs:
    strategy:
      matrix:
        include:
          - cmd: pipx run poetry lock --no-interaction
            commit-msg: Update poetry.lock
            branch: update/poetry-lock
          - cmd: pipx run pre-commit autoupdate
            commit-msg: Update .pre-commit-config.yaml
            branch: update/pre-commit-config
5 changes: 1 addition & 4 deletions .gitignore
@@ -376,7 +376,4 @@ $RECYCLE.BIN/

# End of https://www.toptal.com/developers/gitignore/api/jetbrains+all,linux,macos,python,vim,visualstudiocode,windows

proxies/
proxies_anonymous/
proxies_geolocation/
proxies_geolocation_anonymous/
out/
11 changes: 9 additions & 2 deletions .pre-commit-config.yaml
@@ -35,6 +35,13 @@ repos:
          - --scripts-are-modules
        additional_dependencies:
          - aiohttp<4
          - attrs
          - certifi
          - charset-normalizer<4
          - maxminddb<3
          - platformdirs<5
          - rich<14
          - typing-extensions<5
          - uvloop<0.20; implementation_name == "cpython" and (sys_platform == "darwin" or sys_platform == "linux")
          - types-aiofiles
          - typing-extensions<5; python_version < "3.11"
          - tomli<3; python_version < "3.11"
          - uvloop<0.20; platform_python_implementation == "CPython" and (sys_platform == "darwin" or sys_platform == "linux")
49 changes: 25 additions & 24 deletions README.md
@@ -6,27 +6,37 @@

HTTP, SOCKS4, SOCKS5 proxies scraper and checker.

- Asynchronous.
- Uses regex to search for proxies (ip:port format) on a web page, allowing proxies to be extracted even from json without making changes to the code.
- It is possible to specify the URL to which to send a request to check the proxy.
- Can sort proxies by speed.
- Supports determining the geolocation of the proxy exit node.
- Can determine if the proxy is anonymous.
- Supports determining the geolocation of the proxy exit node.
- Can sort proxies by speed.
- Uses regex to find proxies of format `protocol://username:password@ip:port` on a web page or in a local file, allowing proxies to be extracted even from json without code changes.
- Supports proxies with authentication.
- It is possible to specify the URL to which to send a request to check the proxy.
- Supports saving to plain text and json.
- Asynchronous.

You can get proxies obtained using this script in [monosans/proxy-list](https://github.com/monosans/proxy-list).
You can get proxies obtained using this project in [monosans/proxy-list](https://github.com/monosans/proxy-list).

## Installation and usage

### Desktop
### Pre-compiled binary

This is the easiest way, but it is only available for x64 Windows, macOS and Linux. Just download the archive for your OS from <https://nightly.link/monosans/proxy-scraper-checker/workflows/ci/main?preview>, unzip it, edit `config.toml` and run the executable.

If Windows Defender detects an executable file as a virus, please read [this](https://github.com/Nuitka/Nuitka/issues/2495#issuecomment-1762836583).

### Running from source code

#### Desktop

- Install [Python](https://python.org/downloads). The minimum version required is 3.8. The recommended version is 3.11, because 3.12 may not install some libraries in the absence of a C compiler.
- Download and unpack [the archive with the program](https://github.com/monosans/proxy-scraper-checker/archive/refs/heads/main.zip).
- Edit `config.ini` to your preference.
- Install [Python](https://python.org/downloads) (minimum required version is 3.7).
- Edit `config.toml` to your preference.
- Run the script that installs dependencies and starts `proxy-scraper-checker`:
- On Windows run `start.cmd`
- On Unix-like operating systems run `start.sh`

### Termux
#### Termux

To use `proxy-scraper-checker` in Termux, knowledge of the Unix command-line interface is required.

@@ -35,27 +45,18 @@ To use `proxy-scraper-checker` in Termux, knowledge of the Unix command-line int
```bash
bash <(curl -fsSL 'https://raw.githubusercontent.com/monosans/proxy-scraper-checker/main/install-termux.sh')
```
- Edit `~/proxy-scraper-checker/config.ini` to your preference using a text editor (vim/nano).
- Edit `~/proxy-scraper-checker/config.toml` to your preference using a text editor (vim/nano).
- To run `proxy-scraper-checker` use the following command:
```bash
cd ~/proxy-scraper-checker && sh start-termux.sh
```

## Checking local proxy lists

To check the local proxy lists, start the Python HTTP server on your local machine by running the `python -m http.server --bind localhost` command in the folder with the proxy lists. After that, add links to the appropriate files in `config.ini`.
## Something else?

## Folders description

When the script finishes running, the following folders will be created (this behavior can be changed in the config):

- `proxies` - proxies with any anonymity level.
- `proxies_anonymous` - anonymous proxies.
- `proxies_geolocation` - same as `proxies`, but includes exit-node's geolocation.
- `proxies_geolocation_anonymous` - same as `proxies_anonymous`, but includes exit-node's geolocation.

Geolocation format is `ip:port|Country|Region|City`.
All other info is available in `config.toml` file.

## License

[MIT](LICENSE)

This product includes GeoLite2 Data created by MaxMind, available from <https://www.maxmind.com>.
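
The updated README says proxies are found with a regex matching `protocol://username:password@ip:port`, which is why they can be pulled out of JSON or any other text without code changes. A minimal sketch of that idea follows. It is not the project's actual pattern: `PROXY_RE`, `Proxy`, `find_proxies`, and the `default_protocol` fallback are hypothetical names and choices made for illustration, and the sketch assumes IPv4 hosts only with the scheme and credentials treated as optional.

```python
import re
from typing import Iterator, NamedTuple, Optional


class Proxy(NamedTuple):
    protocol: str
    username: Optional[str]
    password: Optional[str]
    host: str
    port: int


# Hypothetical pattern, written for this example only.
PROXY_RE = re.compile(
    r"""
    (?:(?P<protocol>https?|socks[45])://)?                 # optional scheme
    (?:(?P<username>[\w.~-]+):(?P<password>[\w.~-]+)@)?    # optional credentials
    (?P<host>(?:\d{1,3}\.){3}\d{1,3})                      # dotted IPv4 address
    :(?P<port>\d{1,5})(?!\d)                               # port, not followed by a digit
    """,
    re.VERBOSE | re.IGNORECASE,
)


def find_proxies(text: str, default_protocol: str = "http") -> Iterator[Proxy]:
    """Yield every syntactically valid proxy found anywhere in `text`."""
    for m in PROXY_RE.finditer(text):
        port = int(m["port"])
        if not 0 < port <= 65535:
            continue  # port out of range
        if any(int(octet) > 255 for octet in m["host"].split(".")):
            continue  # not a valid IPv4 address
        yield Proxy(
            protocol=(m["protocol"] or default_protocol).lower(),
            username=m["username"],
            password=m["password"],
            host=m["host"],
            port=port,
        )


if __name__ == "__main__":
    sample = '{"proxies": ["socks5://user:pass@10.0.0.1:1080", "8.8.8.8:3128"]}'
    for proxy in find_proxies(sample):
        print(proxy)
```

Because the pattern is applied with `finditer` over raw text, the surrounding format (JSON, HTML, plain lists) does not matter. The range checks after the match keep the regex itself short.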
157 changes: 0 additions & 157 deletions config.ini

This file was deleted.
