👽️ Fix PDF regex
cristianlivella committed Oct 1, 2024
1 parent e555308 commit efaacb6
Showing 1 changed file with 1 addition and 1 deletion.
src/scraper.sh (2 changes: 1 addition, 1 deletion)
@@ -3,7 +3,7 @@ echo 'Starting scraping'
 html=$(wget --header "Cookie: pasw_law_cookie=yes" -qO - https://www.itispaleocapa.edu.it/orario-classi/)
 
 name=$(echo "$html" | grep -o -P '(?<=\<h2 class\="posttitle"\>).*(?=\<\/h2\>)')
-pdf_url=$(echo $(echo "$html" | grep -o -P 'src="https:\/\/www\.itispaleocapa\.edu\.it\?url=(\K.*\.pdf)') | sed 's@+@ @g;s@%@\\x@g' | xargs -0 printf "%b")
+pdf_url=$(echo $(echo "$html" | grep -o -P 'src="https:\/\/www\.itispaleocapa\.edu\.it[\/]{0,1}\?url=(\K.*\.pdf)') | sed 's@+@ @g;s@%@\\x@g' | xargs -0 printf "%b")
 
 wget -qO 'orario.pdf' $pdf_url

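The changed regex makes the `/` between the host and `?url=` optional: `[\/]{0,1}` matches zero or one slash, so the pattern accepts both `...edu.it?url=` and `...edu.it/?url=`. Below is a minimal sketch that checks this. It assumes GNU grep with PCRE support (`-P`) and GNU coreutils `printf` (which expands `\xHH` inside `%b`). The two `<iframe>` snippets and the `example.org` PDF URL are made-up inputs, and `extract`/`decode` are hypothetical helper names, not part of `src/scraper.sh`.

```bash
#!/usr/bin/env bash
# Sketch only: compares the old and new PDF regexes from src/scraper.sh.
# Assumes GNU grep with PCRE (-P) and GNU coreutils printf (\xHH in %b).
# The <iframe> snippets and the example.org URL are hypothetical inputs.

old_re='src="https:\/\/www\.itispaleocapa\.edu\.it\?url=(\K.*\.pdf)'
new_re='src="https:\/\/www\.itispaleocapa\.edu\.it[\/]{0,1}\?url=(\K.*\.pdf)'

# Hypothetical markup: without and with a "/" before "?url="
no_slash='<iframe src="https://www.itispaleocapa.edu.it?url=https%3A%2F%2Fexample.org%2Forario+classi.pdf"></iframe>'
with_slash='<iframe src="https://www.itispaleocapa.edu.it/?url=https%3A%2F%2Fexample.org%2Forario+classi.pdf"></iframe>'

# Hypothetical helper: print the text matched after \K, or "<no match>"
extract() {
  echo "$2" | grep -o -P "$1" || echo '<no match>'
}

# Hypothetical helper: same decoding as the script
# ("+" -> space, "%XX" -> "\xXX", then printf %b expands the escapes)
decode() {
  echo "$1" | sed 's@+@ @g;s@%@\\x@g' | xargs -0 printf "%b"
}

for label in no_slash with_slash; do
  html=${!label}
  printf '%-10s old: %s\n' "$label" "$(extract "$old_re" "$html")"
  printf '%-10s new: %s\n' "$label" "$(extract "$new_re" "$html")"
done
printf 'decoded: %s\n' "$(decode "$(extract "$new_re" "$with_slash")")"

# Expected output:
# no_slash   old: https%3A%2F%2Fexample.org%2Forario+classi.pdf
# no_slash   new: https%3A%2F%2Fexample.org%2Forario+classi.pdf
# with_slash old: <no match>
# with_slash new: https%3A%2F%2Fexample.org%2Forario+classi.pdf
# decoded: https://example.org/orario classi.pdf
```

`{0,1}` is equivalent to the `?` quantifier, so `\/?` would express the same optional slash. Because `.*` is greedy, the capture extends to the last `.pdf` on the matched line.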
