A web service for scanning media hosted on a Matrix media repository.
This project requires libmagic to be installed on the system. On Debian/Ubuntu:
sudo apt install libmagic1
Then, preferably in a virtual environment, install the Matrix Content Scanner:
pip install matrix-content-scanner
Copy and edit the sample configuration file. Each key is documented in this file.
Then run the content scanner (from within your virtual environment if one was created):
python -m matrix_content_scanner.mcs -c CONFIG_FILE
Where CONFIG_FILE is the path to your configuration file.
This project provides a Docker image to run it, published as vectorim/matrix-content-scanner.
To use it, copy the sample configuration file into a dedicated directory, edit it to match your requirements, and then mount this directory as /data in the container. Do not forget to also publish the port that the content scanner's web server is configured to listen on.
For example, assuming the port for the web server is 8080:
docker run -p 8080:8080 -v /path/to/your/config/directory:/data vectorim/matrix-content-scanner
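As a rough illustration of how the published port and the mounted directory fit together, the sketch below assembles the same docker run invocation from a configuration directory and a port. The helper name docker_run_args is hypothetical and not part of this project; it only builds the argument list and does not start a container. It assumes the host port and the container port are the same, as in the example above.

```python
from pathlib import Path

# Image name as published by this project.
IMAGE = "vectorim/matrix-content-scanner"


def docker_run_args(config_dir: str, port: int) -> list[str]:
    """Build the `docker run` argument list (hypothetical helper).

    `port` must match the port the web server listens on in the
    configuration file stored in `config_dir`.
    """
    # Docker expects an absolute host path for a bind mount, so resolve
    # relative paths before building the -v argument.
    host_dir = Path(config_dir).resolve()
    return [
        "docker", "run",
        "-p", f"{port}:{port}",      # publish the web server port
        "-v", f"{host_dir}:/data",   # mount the config directory as /data
        IMAGE,
    ]
```

The resulting list can be passed to subprocess.run, or joined with spaces and printed for inspection.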
See the API documentation for information about how clients are expected to interact with the Matrix Content Scanner.
Migrating from the legacy Matrix Content Scanner
Because it uses the same APIs and Olm pickle format as the legacy Matrix Content Scanner, this project can be used as a drop-in replacement. The only change (apart from the deployment instructions) is the configuration format:
- the server section is renamed web
- scan.tempDirectory is renamed scan.temp_directory
- scan.baseUrl is renamed download.base_homeserver_url (and becomes optional)
- scan.doNotCacheExitCodes is renamed result_cache.exit_codes_to_ignore
- scan.directDownload is removed. Direct download always happens when download.base_homeserver_url is absent from the configuration file, and setting a value for it will always cause files to be downloaded from the server configured.
- proxy is renamed download.proxy
- middleware.encryptedBody.pickleKey is renamed crypto.pickle_key
- middleware.encryptedBody.picklePath is renamed crypto.pickle_path
- acceptedMimeType is renamed scan.allowed_mimetypes
- requestHeader is renamed download.additional_headers and turned into a dictionary.
Note that the format of the cryptographic pickle file and key is compatible between this project and the legacy Matrix Content Scanner. If no file exists at the path set in crypto.pickle_path, one will be created automatically.
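The renames listed above can be sketched as a small transformation over an already-parsed legacy configuration (a nested dictionary). This is only an illustration under stated assumptions: migrate_config and its helpers are hypothetical names, not something this project ships, and keys that are not mentioned in the list are carried over unchanged. Because the legacy shape of requestHeader is not described here, the sketch does not convert it and instead reports it for manual rewriting.

```python
from copy import deepcopy
from typing import Any

# (legacy path, new path) pairs taken from the list of renames above.
RENAMES = [
    (("server",), ("web",)),
    (("scan", "tempDirectory"), ("scan", "temp_directory")),
    (("scan", "baseUrl"), ("download", "base_homeserver_url")),
    (("scan", "doNotCacheExitCodes"), ("result_cache", "exit_codes_to_ignore")),
    (("proxy",), ("download", "proxy")),
    (("middleware", "encryptedBody", "pickleKey"), ("crypto", "pickle_key")),
    (("middleware", "encryptedBody", "picklePath"), ("crypto", "pickle_path")),
    (("acceptedMimeType",), ("scan", "allowed_mimetypes")),
]

_MISSING = object()  # sentinel: distinguishes "absent" from a None value


def _pop_path(tree: dict, path: tuple) -> Any:
    """Remove and return the value at `path`, or _MISSING if absent."""
    node = tree
    for key in path[:-1]:
        node = node.get(key)
        if not isinstance(node, dict):
            return _MISSING
    return node.pop(path[-1], _MISSING)


def _set_path(tree: dict, path: tuple, value: Any) -> None:
    """Set `value` at `path`, creating intermediate sections as needed."""
    node = tree
    for key in path[:-1]:
        node = node.setdefault(key, {})
    node[path[-1]] = value


def _prune_empty(tree: dict) -> None:
    """Drop sections left empty by the renames (e.g. middleware)."""
    for key in list(tree):
        value = tree[key]
        if isinstance(value, dict):
            _prune_empty(value)
            if not value:
                del tree[key]


def migrate_config(legacy: dict) -> tuple[dict, list[str]]:
    """Return (new_config, notes); the input dictionary is not modified."""
    config = deepcopy(legacy)
    notes: list[str] = []

    for old_path, new_path in RENAMES:
        value = _pop_path(config, old_path)
        if value is not _MISSING:
            _set_path(config, new_path, value)

    # scan.directDownload no longer exists in the new format.
    if _pop_path(config, ("scan", "directDownload")) is not _MISSING:
        notes.append(
            "scan.directDownload dropped: direct download now happens "
            "whenever download.base_homeserver_url is absent"
        )

    # Legacy shape is not documented here, so leave conversion to the user.
    headers = config.pop("requestHeader", _MISSING)
    if headers is not _MISSING:
        notes.append(
            f"requestHeader {headers!r} must be rewritten by hand as a "
            "dictionary under download.additional_headers"
        )

    _prune_empty(config)
    return config, notes
```

The returned notes list is meant to be reviewed by hand before the new configuration is written out, since those settings change meaning rather than just name.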
In a virtual environment with poetry (>=1.8.3) installed, run
poetry install
To run the unit tests, you can use:
tox -e py
To run the linters and the mypy type checker, use ./scripts-dev/lint.sh.
The exact steps for releasing will vary, but this is an approach taken by the Synapse developers (assuming a Unix-like shell):
- Set a shell variable to the version you are releasing (this just makes subsequent steps easier):
  version=X.Y.Z
- Update setup.cfg so that the version is correct.
- Stage the changed files and commit.
  git add -u
  git commit -m v$version -n
- Push your changes.
  git push
- When ready, create a signed tag for the release:
  git tag -s v$version
  Base the tag message on the changelog.
- Push the tag.
  git push origin tag v$version
- Create a release, based on the tag you just pushed, on GitHub or GitLab.
- Create a source distribution and upload it to PyPI:
  python -m build
  twine upload dist/matrix_content_scanner-$version*
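For a quick dry run of the command sequence above, a sketch like the following prints each command for a given version without executing anything. The function name release_commands is hypothetical, and the strict X.Y.Z check is an assumption based on the placeholder used in the first step. The manual steps (editing setup.cfg, writing the tag message, creating the GitHub or GitLab release) are deliberately left out.

```python
import re


def release_commands(version: str) -> list[list[str]]:
    """Return the release commands for `version` (hypothetical helper)."""
    # Assumption: versions follow the X.Y.Z form shown above.
    if not re.fullmatch(r"\d+\.\d+\.\d+", version):
        raise ValueError(f"expected a version like X.Y.Z, got {version!r}")
    tag = f"v{version}"
    return [
        ["git", "add", "-u"],
        ["git", "commit", "-m", tag, "-n"],
        ["git", "push"],
        ["git", "tag", "-s", tag],
        ["git", "push", "origin", "tag", tag],
        ["python", "-m", "build"],
        # The trailing * is a shell glob: it is expanded only when the
        # printed line is run in a shell, not by subprocess without one.
        ["twine", "upload", f"dist/matrix_content_scanner-{version}*"],
    ]


if __name__ == "__main__":
    # Print only; nothing is executed.
    for command in release_commands("1.2.3"):
        print(" ".join(command))
```

Printing rather than running keeps the signed-tag and upload steps under manual control.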