Articles plagiarism check implementation #451

Merged 22 commits on Oct 11, 2024
Commits
d8353b6
finished hono service
kol3x Jun 6, 2024
b8fae0e
started coding a plagiarism-check github workflow
kol3x Jun 6, 2024
1a81119
finished workflow for plagiarism-check service
kol3x Jun 6, 2024
6c13d2e
refactored interface declarations for api results
kol3x Jun 6, 2024
436d0ad
added a comment to clarify code
kol3x Jun 6, 2024
58b39c1
rewrote github workflow, small changes to plagiarism check tool
kol3x Jun 11, 2024
5b9ebf9
don't send the sentence back if it had no matches
kol3x Jun 11, 2024
06b801b
properly handle case where there are no results
kol3x Jun 11, 2024
86a03e0
Fixed empty results appending to final json file
kol3x Jun 11, 2024
f00ed25
added formatting to action output
kol3x Jun 11, 2024
f705aad
Fixed formatting
kol3x Jun 11, 2024
9c15537
tidied up the workflow file a bit
kol3x Jun 11, 2024
c9a7c73
refactored promise all to avoid possible race conditions
kol3x Jun 12, 2024
906dfcd
added filtering out headers and complex sentence splitting, removed 1…
kol3x Jun 12, 2024
3cc03f5
plagiarism percent formula fix
kol3x Jun 12, 2024
bcf09db
added setup instructions
kol3x Jun 13, 2024
abdd556
returned permission check for github action
kol3x Jul 17, 2024
f7c866b
fixed logic to not include sentences with no matches in the response
kol3x Jul 29, 2024
9a7d1e4
rounded up results percentage to 2 digits after the decimal point
kol3x Jul 29, 2024
a98ffcc
moved formatting logic from githib actions to worker, fixed some form…
kol3x Jul 30, 2024
1f5de35
cleaned up github actions, removed formatting logic
kol3x Jul 30, 2024
8498300
fixed permission check and simplified worker
kol3x Aug 29, 2024
61 changes: 61 additions & 0 deletions .github/workflows/plagiarism-check.yml
@@ -0,0 +1,61 @@
on:
  issue_comment:
    types: [created]

permissions:
  contents: read
  issues: read
  pull-requests: write

jobs:
  permission-check-job:
    runs-on: ubuntu-latest
    if: |
      github.event.issue.pull_request &&
      contains(github.event.comment.body, '/plagiarismcheck')
    outputs:
      permission: ${{ steps.permissions-check.outputs.defined }}
    steps:
      - name: Check for Secret availability
        id: permissions-check
        shell: bash
        run: |
          echo "defined=${{ contains(fromJSON(secrets.WIKI_REVIEWERS), github.actor) }}" >> $GITHUB_OUTPUT;


  plagiarism-check:
    runs-on: ubuntu-latest
    name: "Checks a new article from a PR for plagiarism"
    needs: [ permission-check-job ]
    if: needs.permission-check-job.outputs.permission == 'true'
    env:
      GH_TOKEN: "${{ secrets.GITHUB_TOKEN }}"

    steps:
      - name: Check out repository
        uses: actions/checkout@v4

      - name: Go to PR files
        run: gh pr checkout "${{ github.event.issue.number }}"

      - name: Save article contents
        run: |
          pr_number="${{ github.event.issue.number }}"
          file_path="$(gh pr diff --name-only $pr_number | grep '\.md' | head -n 1)"
          if [ -n "$file_path" ]; then
            cat "$file_path" > article.txt
          else
            gh pr comment "${{ github.event.issue.number }}" --body "No .md file found in the PR."
            exit 1
          fi
      - name: Check for plagiarism
        run: |
          content="$(cat article.txt)"
          escaped_content=$(jq -Rs . <<<"$content")
          result="$(curl -X POST "${{ secrets.WORKER_URL }}" -H "Content-Type: application/json" -d "{\"text\": $escaped_content}")"
          echo "$result" > results.txt
      - name: Format and post response
        run: |
          response=$(cat results.txt)
          results=$(echo "$response" | jq -r '.results')
          gh pr comment "${{ github.event.issue.number }}" --body "$results"
33 changes: 33 additions & 0 deletions tools/plagiarism-checker/.gitignore
@@ -0,0 +1,33 @@
# prod
dist/

# dev
.yarn/
!.yarn/releases
.vscode/*
!.vscode/launch.json
!.vscode/*.code-snippets
.idea/workspace.xml
.idea/usage.statistics.xml
.idea/shelf

# deps
node_modules/
.wrangler

# env
.env
.env.production
.dev.vars

# logs
logs/
*.log
npm-debug.log*
yarn-debug.log*
yarn-error.log*
pnpm-debug.log*
lerna-debug.log*

# misc
.DS_Store
28 changes: 28 additions & 0 deletions tools/plagiarism-checker/README.md
@@ -0,0 +1,28 @@
# Plagiarism Checker

This service performs plagiarism checks through a Cloudflare Worker.

### Setup

- Get a Google API key and a search engine ID from the [Custom Search JSON API overview](https://developers.google.com/custom-search/v1/overview#api_key)

- Set up `wrangler.toml` for your Cloudflare account and add the following two environment variables:

- **GOOGLE_SEARCH_ENGINE_CX**

- **GOOGLE_API_KEY**

- Install dependencies and deploy the Worker:

```bash
npm i
npm run deploy
```

- Save the deployed Worker URL.

- Add the URL to your repository secrets as **WORKER_URL** so that GitHub Actions can reach the service.
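
The workflow in this PR reads two repository secrets: `WORKER_URL` and `WIKI_REVIEWERS`. The latter is parsed with `fromJSON` and matched against `github.actor`, which suggests it should hold a JSON array of GitHub usernames. A minimal sketch of setting both with the GitHub CLI might look like the following. It assumes `gh` is authenticated for the repository and that `jq` 1.6 or later is installed; the helper names `reviewers_json` and `set_secrets` are hypothetical and not part of this PR.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Turn a list of usernames into a compact JSON array (hypothetical helper).
reviewers_json() {
  jq -cn '$ARGS.positional' --args "$@"
}

# Store both secrets in the current repository via the GitHub CLI.
# Usage: set_secrets <worker-url> <reviewer>...
set_secrets() {
  local worker_url="$1"
  shift
  gh secret set WORKER_URL --body "$worker_url"
  gh secret set WIKI_REVIEWERS --body "$(reviewers_json "$@")"
}

# Act only when a URL and at least one reviewer are passed on the command line.
if [[ $# -ge 2 ]]; then
  set_secrets "$@"
fi
```

Saved as, say, `set-secrets.sh`, it could be run as `./set-secrets.sh <worker-url> <reviewer> [<reviewer>...]`.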

### Usage

Comment *"/plagiarismcheck"* on a pull request that adds a new article to trigger the bot.
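
To try the Worker without going through a pull request, the request made by the workflow's "Check for plagiarism" step can be reproduced by hand. The sketch below assumes `jq` and `curl` are installed and that `WORKER_URL` is exported in the shell. The function names `build_payload` and `check_article` are hypothetical; the `{"text": ...}` request body and the `results` response field are taken from the workflow file.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Build the JSON body the workflow sends: {"text": "<article contents>"}.
# jq -Rs reads the whole file as one raw string and JSON-escapes it.
build_payload() {
  jq -Rsc '{text: .}' "$1"
}

# POST an article to the Worker and print the `results` field of the reply.
check_article() {
  local url="${WORKER_URL:?set WORKER_URL to the deployed Worker URL}"
  build_payload "$1" |
    curl --silent --show-error --fail -X POST "$url" \
      -H "Content-Type: application/json" --data-binary @- |
    jq -r '.results'
}

# Act only when an article path is passed on the command line.
if [[ $# -ge 1 ]]; then
  check_article "$1"
fi
```

Saved as, say, `check.sh`, it could be run as `WORKER_URL=<worker-url> ./check.sh article.md`. Piping the payload through `jq` rather than interpolating it into a string keeps quotes and newlines in the article from breaking the JSON.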