# Blocking JavaScript without Breaking the Web

Abdul Haddi Amjad, Zubair Shafiq, Muhammad Ali Gulzar

Proceedings on Privacy Enhancing Technologies Symposium (PETS), 2023
The arXiv version of the manuscript is available at: Blocking JS without Breaking the Web
This repository provides the complete instrumentation to evaluate different JS blocking strategies proposed in the paper.
In this paper, we propose a three-step process:

- **JavaScript Corpus Collection**: We crawl the landing pages of websites using a Chrome extension to capture network requests and their associated call stacks. Each request is then labeled using filter lists.
- **Localizing Tracking and Functional JS Code**: We use the labeled dataset to generate spectra of entities (scripts and methods) using spectra-based fault localization.
- **JS Blocking Impact Analysis**: Finally, we use the annotated spectra of entities to evaluate different JS blocking strategies.

We report (1) the network request count and (2) missing functional tag URLs as breakage metrics.
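The localization step above scores each entity (script or method) by how often it appears on the call stacks of tracking versus functional requests. A minimal sketch of such a spectra-based suspiciousness score, using the classic Tarantula formula purely as an illustration (the paper's exact scoring and the entity names below are not taken from the artifact):

```python
# Spectra-based scoring sketch: each entity records how often it appears
# on the call stacks of tracking vs. functional requests.
def tarantula(tracking_hits, functional_hits, total_tracking, total_functional):
    """Suspiciousness in [0, 1]: 1.0 means the entity appears only on tracking call stacks."""
    if tracking_hits == 0 and functional_hits == 0:
        return 0.0
    t = tracking_hits / total_tracking
    f = functional_hits / total_functional
    return t / (t + f)

# Hypothetical spectra: (tracking_hits, functional_hits) per entity
spectra = {
    "analytics.js::sendBeacon": (40, 0),   # only on tracking call stacks
    "app.js::renderMenu":       (0, 25),   # only on functional call stacks
    "vendor.js::loadWidget":    (10, 10),  # a mixed script
}
scores = {e: tarantula(t, f, 50, 35) for e, (t, f) in spectra.items()}
for entity, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{entity}: {score:.2f}")
```

A purely tracking entity scores 1.0, a purely functional one 0.0, and mixed scripts fall in between, which is what makes method-level blocking (TM) finer-grained than script-level blocking.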
These artifacts are rigorously tested on a MacBook with the following configuration:
- Intel Core i7 processor
- 4-core CPU
- 16 GB RAM
- 500 GB SSD

Note that the base image in the Dockerfile supports `amd64`, i.e., x86-64 machines.
- numpy==1.24.3
- pandas==2.0.1
- adblockparser==0.7
- openpyxl==3.1.2
- pyvirtualdisplay==3.0
- selenium==4.9.1
- seaborn==0.12.2
- tldextract==3.4.1
- webdriver-manager==3.8.6
- matplotlib==3.7.1
- xlrd==2.0.1
- beautifulsoup4==4.12.2
Clone the repository:

```
git clone https://github.com/hadiamjad/Blocking-JavaScript-without-Breaking-the-Web.git
```

and move into the directory using the `cd` command.
Make sure Docker has these minimum requirements:

- Expected build time: approximately 50 seconds
- Expected image size: 2.05 GB

- Running `docker build -t blockingjs .` will build the Docker image using the Dockerfile.
- Run the `ls` command in the Docker terminal; it will list all the files inside this repository.
In this tutorial we will use tmux sessions to create server and client sessions for testing. Some important commands:

- Create a new session: `tmux new -s {session-name}`
- Leave the current session: `Ctrl + b` followed by `d`
- Join an existing session: `tmux a -t {session-name}`
Make sure your Docker build was successful.

- Create a new tmux session for running all servers: `tmux new -s server`. This will automatically join the session as well.
- Run `bash server.sh`; this will start all servers for the different configurations.
- Leave the `server` session using `Ctrl + b` followed by `d`.
-
Create new tmux session for running JavaScript Corpus Collection & Localizing Tracking and Functional JS Code using this
tmux new -s client
command. This will automatically join the session as well. -
Now run
cd Control/webpage-crawler-extension
and then once you are inside the directory, simply runbash client.sh
orchmod +x ./client.sh
followed by./client.sh
to start crawler.
It involves crawlling the landing pages of 10 sample websites listed in 'Control\webpage-crawler-extension\csv\test.csv', then label it using filter lists, print the number of tracking and functional requests count in control setting, eventually run SBFL.py to generate tracking score for other configurations.
Expected time:
Once all steps are complete, the output will look like this:

Make sure you have a stable internet connection; otherwise the crawl might end with this error:
- Using the same tmux session, i.e. `client`, you can test the other configurations.
Expected time:
- Staying inside the `client` session, go to the directory with `cd ../../ALL/webpage-crawler-extension` and run `bash client.sh` (or `chmod +x ./client.sh` followed by `./client.sh`).

This will crawl the landing pages of the websites (from the previous step) in the ALL setting (all tracking, functional, and mixed scripts are blocked), label the requests using filter lists, and print the tracking and functional request counts in the ALL setting.
Note: the last two lines report the raw numbers for (1) the network request count and (2) missing functional tag URLs as a breakage metric. These numbers may differ from the screenshot due to the dynamic nature of websites.
The output will look like this (numbers may vary due to the dynamic nature of websites):

Make sure you have a stable internet connection; otherwise the crawl might end with this error:
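The two reported numbers can be derived from any pair of crawls: count the requests in the blocked setting, and take the functional request URLs seen in the Control crawl but absent from the blocked crawl. A minimal sketch, where the URL sets are illustrative placeholders for the crawler's logged requests:

```python
# Breakage metric sketch: compare a Control crawl against a blocked-setting crawl.
control_functional = {
    "http://site.example/menu.js",
    "http://site.example/cart.js",
    "http://cdn.example/app.css",
}
blocked_functional = {
    "http://site.example/menu.js",
    "http://cdn.example/app.css",
}

request_count = len(blocked_functional)            # (1) network request count
missing = control_functional - blocked_functional  # (2) missing functional URLs
print(f"requests: {request_count}, missing functional: {sorted(missing)}")
```

The more functional URLs go missing relative to Control, the more visible breakage the blocking strategy is likely to cause.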
- Staying inside the `client` session, go to the directory with `cd ../../TS/webpage-crawler-extension` and run `bash client.sh` (or `chmod +x ./client.sh` followed by `./client.sh`).

This will crawl the landing pages of the websites (from the Control setting) using the Chrome extension configured with the TS setting, where all tracking scripts are blocked, then label the requests using filter lists and print the tracking and functional request counts in the TS setting.
Note: the last two lines report the raw numbers of (1) the network request count and (2) missing functional tag URLs as a breakage metric. These numbers may differ from the screenshot due to the dynamic nature of websites.

The output will look like this (numbers may vary due to the dynamic nature of websites):

Make sure you have a stable internet connection; otherwise the crawl might end with this error:
- Staying inside the `client` session, go to the directory with `cd ../../MS/webpage-crawler-extension` and run `bash client.sh` (or `chmod +x ./client.sh` followed by `./client.sh`).

This will crawl the landing pages of the websites (from the Control setting) using the Chrome extension configured with the MS setting, where all mixed scripts are blocked, then label the requests using filter lists and print the tracking and functional request counts in the MS setting.
Note: the last two lines report the raw numbers of (1) the network request count and (2) missing functional tag URLs as a breakage metric. These numbers may differ from the screenshot due to the dynamic nature of websites.

The output will look like this (numbers may vary due to the dynamic nature of websites):

Make sure you have a stable internet connection; otherwise the crawl might end with this error:
- Staying inside the `client` session, go to the directory with `cd ../../TMS/webpage-crawler-extension` and run `bash client.sh` (or `chmod +x ./client.sh` followed by `./client.sh`).

This will crawl the landing pages of the websites (from the Control setting) using the Chrome extension configured with the TMS setting, where all tracking scripts and mixed scripts are blocked, then label the requests using filter lists and print the tracking and functional request counts in the TMS setting.
Note: the last two lines report the raw numbers of (1) the network request count and (2) missing functional tag URLs as a breakage metric. These numbers may differ from the screenshot due to the dynamic nature of websites.

The output will look like this (numbers may vary due to the dynamic nature of websites):

Make sure you have a stable internet connection; otherwise the crawl might end with this error:
- Staying inside the `client` session, go to the directory with `cd ../../TM/webpage-crawler-extension` and run `bash client.sh` (or `chmod +x ./client.sh` followed by `./client.sh`).

This will crawl the landing pages of the websites (from the Control setting) using the Chrome extension configured with the TM setting, where all tracking methods are blocked, then label the requests using filter lists and print the tracking and functional request counts in the TM setting.
Note: the last two lines report the raw numbers of (1) the network request count and (2) missing functional tag URLs as a breakage metric. These numbers may differ from the screenshot due to the dynamic nature of websites.

The output will look like this (numbers may vary due to the dynamic nature of websites):

Make sure you have a stable internet connection; otherwise the crawl might end with this error:
Before rerunning the `client.sh` script, clear the folders inside `server/output` for the specific configuration.

Go back to the main directory: `cd ../../`
- You can simply run the following command:

`python -W ignore requestCountBarPlots.py {Configuration 1} {Configuration 2}`

Here `{Configuration 1}` and `{Configuration 2}` are placeholders. For example:

- RQ4: `python -W ignore requestCountBarPlots.py TMS TM`
- RQ3: `python -W ignore requestCountBarPlots.py TS MS`
- RQ1: `python -W ignore requestCountBarPlots.py ALL`
This will generate the plot PDF at `Figures/BarPlot.pdf`.
- You can simply run the following command:

`python -W ignore reductionBarPlots.py {Configuration 1} {Configuration 2}`

Here `{Configuration 1}` and `{Configuration 2}` are placeholders. For example:

- RQ4: `python -W ignore reductionBarPlots.py TMS TM`
- RQ3: `python -W ignore reductionBarPlots.py TS MS`
- RQ1: `python -W ignore reductionBarPlots.py ALL`
This will generate the plot PDF at `Figures/ReductionPlot.pdf`.
- You can simply run the following command:

`python -W ignore requestDistribution.py {Configuration 1}`

Here `{Configuration 1}` is a placeholder. For example, for RQ2 you can run: `python -W ignore requestDistribution.py TMS`

- This will generate the plot PDF at `Figures/DistributionPlot.pdf`.
You can run the following command in the LOCAL REPOSITORY SHELL to copy the `Figures` folder to your local machine:

`docker cp {container_id}:/Crawler/Figures .`

The container ID is shown in the Docker shell. For example:

`docker cp 7c486e87b63a:/Crawler/Figures .`
- The artifact is as of hash `4260a02`
- Contact: [email protected]