This is a simple script to run the code from scratch. With the default settings, you can get a macro F1 score of 0.70769.
The data preprocessing steps are not the same as the ones I used during the competition, so the final F1 score may differ.
Some major differences:
- Jieba is used instead of LTP here
- NER is not used here
- No custom dictionary is used here
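The reported score is macro F1, which averages per-class F1 with equal weight for every class, so preprocessing differences like the ones above can shift it noticeably. A minimal self-contained sketch of the metric (illustrative only, not this repo's actual evaluation code):

```python
def macro_f1(y_true, y_pred, labels):
    """Macro F1: compute F1 per class, then average with equal weight."""
    scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)
```

Because every class counts equally, rare classes affect the macro score as much as frequent ones.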
1. Download the data from this data link, then unzip the files to get the raw csv files.
2. Download the embedding file from this embedding link, then unzip it to get the embedding file.
Modify the file paths in preprocess.sh:
- TRAIN_FILE
- VALIDATION_FILE
- TESTA_FILE
- TESTB_FILE
  Refer to step 1 for these data file paths.
- EMBEDDING_FILE
  Refer to step 2 for the embedding file path.
- VOCAB_SIZE
  You can experiment with different vocab sizes.
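After editing, the variable section of `preprocess.sh` might look like the following. All paths and the vocab size below are placeholders for your local layout, not values shipped with the repo:

```shell
# Paths to the unzipped raw csv files (placeholders)
TRAIN_FILE=/path/to/train.csv
VALIDATION_FILE=/path/to/validation.csv
TESTA_FILE=/path/to/testa.csv
TESTB_FILE=/path/to/testb.csv
# Path to the unzipped embedding file (placeholder)
EMBEDDING_FILE=/path/to/embedding.txt
# Number of most frequent tokens to keep (example value)
VOCAB_SIZE=50000
```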
Then run
bash preprocess.sh
This will create all the files needed under the ./data folder.
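One of the generated files is typically a vocabulary truncated to VOCAB_SIZE entries. A minimal sketch of how such a truncated vocabulary can be built from tokenized documents (the function name and special tokens are illustrative, not this repo's actual code):

```python
from collections import Counter

def build_vocab(tokenized_docs, vocab_size):
    """Keep the vocab_size most frequent tokens, plus special tokens."""
    counts = Counter(tok for doc in tokenized_docs for tok in doc)
    kept = [tok for tok, _ in counts.most_common(vocab_size)]
    # Reserve ids 0 and 1 for padding and out-of-vocabulary tokens.
    vocab = {"<pad>": 0, "<unk>": 1}
    for tok in kept:
        vocab[tok] = len(vocab)
    return vocab
```

Tokens outside the top VOCAB_SIZE are mapped to the `<unk>` id at lookup time, which is why trying different vocab sizes can change the final score.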
Change your working directory to the parent folder, and run the training script:
bash bash/elmo_train.sh
After training, run the inference script to get predictions on the test files:
bash bash/elmo_inference.sh