Automation of Text-Based Economic Indicator Construction:
A Pilot Exploration on Economic Policy Uncertainty Index
This repo aims to facilitate reproducing tables in this paper. You can find the poster here and for more details, please check out our paper!
Our work has been accepted at CIKM 2024 for the Short Research Paper track.
This is a poetry project so you can set up the enviroment through the command:
poetry install
To check the prompt for either definition or simple task, you can browse the configuration file config/main.yaml
, modify some parameters and run the command:
poetry run python auto_EPU/keyword.py
Or configure it directly in CLI through the functionality of the package hydra.
poetry run python auto_EPU/keyword.py keyword.role=economist
We utilize the package llm-research built on top of the LangChain framework to interact with OpenAI API. The llm-research package will log predictions using MLflow. We choose this package because it provides a easy-to-use API to generate structured outputs and adopt few-shot prompting strategy. Notice that we' ve inspected the implementation carefully and you can use your own dataset to follow instructions below.
- add your OpenAI API key in the file .env.example and reanme it as .env
- modify the configuration file
config/model/openai.yaml
- run the command:
poetry run python auto_EPU/denoise.py
- check out the MLflow ui
poetry run mlflow ui
- modify the configuration
config/model/openai.yaml
- run the command to prepare the training dataset for OpenAI Fine-tuning API
poetry run python auto_EPU/finetune_format.py
- Head to the OpenAI platform to create a new fine-tuning job
Please check out the directory notebooks
. For Table 3, we directly utilize the MLflow tracking service supported by llm-research package to record the metrics. We use the model gpt-3.5-turbo-1106
with 0 temperature to perform denoise task. The number of few-shot examples is 6. Moreover, we leverage 1000 training examples fine-tuning on gpt-3.5-turbo-1106
with default parameters of OpenAI's Fine-tuning API.