Privacy Checklist

Code for the NAACL 2025 paper "Privacy Checklist: Privacy Violation Detection Grounding on Contextual Integrity Theory" (https://arxiv.org/abs/2408.10053).

Preparation

The code depends on the following Python packages:

numpy
torch
transformers
requests
tqdm
sentence-transformers (provides the SentenceTransformer class)
nltk
networkx
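
Before running the experiments, it can help to confirm that these packages are importable. The following is a minimal sketch, not part of the repository: the helper name `missing_modules` is hypothetical, and the module names are the usual import names of the packages listed above (for example, `sentence_transformers` for sentence-transformers).

```python
import importlib.util

# Import names of the packages listed in the Preparation section.
REQUIRED_MODULES = [
    "numpy",
    "torch",
    "transformers",
    "requests",
    "tqdm",
    "sentence_transformers",
    "nltk",
    "networkx",
]


def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    print("Missing packages:", ", ".join(missing) if missing else "none")
```

Any name reported as missing can then be installed with your package manager of choice.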

Reproduce the Results

We provide shell scripts that run each experiment with the corresponding arguments. The implementation currently supports LLMs loaded through Hugging Face Transformers and LLMs served through the SiliconFlow API. The main arguments are:

api_name: If api_name is None, the model is loaded with Hugging Face Transformers. Otherwise, the SiliconFlow API is used with the API_KEY specified in config.py.

model: If the SiliconFlow API is used, this is the name of the model in the API. Otherwise, it is the model_id passed to Transformers.

xxx_template: The template used for parsing.

xxx_round: The number of generation rounds for xxx.

xxx_tokens: The maximum number of generated tokens for xxx.
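
The way these arguments fit together can be sketched as follows. This is an illustration under stated assumptions, not the repository's actual parser: only `api_name` and `model` come from the description above, while the `answer_` prefix (standing in for `xxx`), the default values, and the helper names `build_parser` and `select_backend` are hypothetical.

```python
import argparse


def build_parser():
    """Build an illustrative parser; flag defaults here are assumptions."""
    parser = argparse.ArgumentParser(description="Illustrative argument layout")
    # When api_name is left as None, the local Transformers path is chosen.
    parser.add_argument("--api_name", default=None)
    # Either an API model name or a Transformers model_id, depending on api_name.
    parser.add_argument("--model", required=True)
    # "answer" is a hypothetical stand-in for the "xxx" prefix.
    parser.add_argument("--answer_round", type=int, default=1)
    parser.add_argument("--answer_tokens", type=int, default=512)
    return parser


def select_backend(api_name):
    """Return which backend the api_name setting selects."""
    return "transformers" if api_name is None else "siliconflow"


if __name__ == "__main__":
    args = build_parser().parse_args(["--model", "some-model-id"])
    print(select_backend(args.api_name), args.model, args.answer_tokens)
```

Check the provided shell scripts for the argument names and values actually used in each experiment.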

Direct prompt (DP)

For real court cases, run:

run_dp_real.sh

For synthetic cases, run:

run_dp_generate.sh

CoT prompt with automatic planning (CoT-auto)

For real court cases, run:

run_cot_auto_real.sh

For synthetic cases, run:

run_cot_auto_generate.sh

CoT prompt with manual guidelines (CoT-manual)

For real court cases, run:

run_cot_manual_real.sh

For synthetic cases, run:

run_cot_manual_generate.sh

CoT prompt with regulation IDs from agent-based retrieval (Agent-ID)

For real court cases, run:

run_agent_id_real.sh

For synthetic cases, run:

run_agent_id_generate.sh

CoT prompt with LLM explanation and regulations retrieved via BM25 (BM25-content)

For real court cases, run:

run_bm25_content_real.sh

For synthetic cases, run:

run_bm25_content_generate.sh
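
For readers unfamiliar with BM25, the ranking idea behind this retrieval step can be sketched in a few lines. This is a generic, dependency-free illustration of BM25 scoring, not the repository's implementation: the function name `bm25_scores`, the whitespace tokenizer, the parameter values `k1=1.5` and `b=0.75`, and the toy texts are all assumptions.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with the BM25 formula."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Number of documents that contain each term at least once.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue
            # Smoothed inverse document frequency (always non-negative).
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1.0)
            # Term frequency saturated by k1 and normalized by document length.
            norm = term_freq[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * term_freq[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


if __name__ == "__main__":
    # Toy placeholder texts, not actual regulation content.
    toy_docs = [
        "an entity may disclose health information for treatment",
        "associates must report breaches",
        "individuals can request access to records",
    ]
    print(bm25_scores("disclose health information", toy_docs))
```

The highest-scoring texts would then be passed to the LLM as retrieved context.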

CoT prompt with CI characteristics extraction and regulations retrieved via embedding similarity (CI-ES-content)

For real court cases, run:

run_emb_content_real.sh

For synthetic cases, run:

run_emb_content_generate.sh
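
Retrieval by embedding similarity typically ranks candidates by cosine similarity between a query vector and each candidate vector. The sketch below shows that ranking step in plain Python so it runs without extra dependencies. It is not the repository's code: the helper names `cosine` and `top_k` and the two-dimensional toy vectors are assumptions, and in practice the vectors would come from an embedding model such as one loaded with sentence-transformers.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # Treat zero vectors as dissimilar to everything.
    return dot / (norm_u * norm_v)


def top_k(query_vec, doc_vecs, k=2):
    """Return the indices of the k vectors most similar to the query."""
    ranked = sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine(query_vec, doc_vecs[i]),
        reverse=True,
    )
    return ranked[:k]


if __name__ == "__main__":
    # Toy vectors standing in for real embeddings.
    print(top_k([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], k=2))
```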

Citation

Please cite the following paper if you find our method and resources helpful.

@misc{li-2024-checklist,
      title={Privacy Checklist: Privacy Violation Detection Grounding on Contextual Integrity Theory}, 
      author={Li, Haoran and Fan, Wei and Chen, Yulin and Cheng, Jiayang and Chu, Tianshu and Zhou, Xuebing and Hu, Peizhao and Song, Yangqiu},
      journal={arXiv preprint arXiv:2408.10053},
      year={2024}
}

Miscellaneous

Please send any questions about the code and/or the algorithm to [email protected]

