This repository has been archived by the owner on Mar 29, 2022. It is now read-only.

allow multiple token values #185

Merged
de-code merged 16 commits into develop from allow-multiple-tokens on Apr 4, 2020

de-code commented Apr 2, 2020

de-code self-assigned this Apr 2, 2020
de-code changed the title from "[wip] allow multiple tokens" to "allow multiple token values" Apr 4, 2020
de-code merged commit 9825915 into develop Apr 4, 2020
de-code deleted the allow-multiple-tokens branch April 4, 2020 11:31

de-code commented Apr 7, 2020

Results with a new dataset, from which line numbers have been removed.

With only the first token (for reference)

DL glove.840B.300d word embedding, mseq 3000, no stride

Evaluation:
	f1 (micro): 91.58
                  precision    recall  f1-score   support

          <body>     0.8904    0.8773    0.8838     17911
        <header>     0.7907    0.7777    0.7842       967
          <page>     0.9414    0.9803    0.9605     17644
    <references>     0.9075    0.9197    0.9136     47515

all (micro avg.)     0.9099    0.9218    0.9158     84037
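
As a sanity check on how the reported figures relate, the micro-averaged F1 can be recomputed from the micro-averaged precision and recall in the table above. This is only a minimal sketch and not part of the evaluation code that produced these reports; the helper name `f1_score` is hypothetical.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (hypothetical helper)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Micro-averaged precision and recall copied from the "all (micro avg.)" row above.
micro_f1 = f1_score(0.9099, 0.9218)
print(f"{micro_f1:.4f}")  # prints 0.9158, matching the reported f1 (micro) of 91.58
```

Because the inputs are already rounded to four decimals, a recomputed value can in general differ from a reported one in the last digit.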

With the first two tokens

Wapiti (max epochs: 2000)

Evaluation:
	f1 (micro): 89.17
                  precision    recall  f1-score   support

          <body>     0.8532    0.8314    0.8421     18378
        <header>     0.6598    0.7322    0.6941       967
          <page>     0.9432    0.9485    0.9458     17974
    <references>     0.8993    0.8894    0.8943     48262

all (micro avg.)     0.8959    0.8876    0.8917     85581

DL glove.840B.300d word embedding, mseq 3000, no stride

Evaluation:
	f1 (micro): 93.59
                  precision    recall  f1-score   support

          <body>     0.8918    0.8925    0.8922     17911
        <header>     0.8065    0.7932    0.7998       967
          <page>     0.9395    0.9833    0.9609     17644
    <references>     0.9419    0.9494    0.9457     47515

all (micro avg.)     0.9293    0.9426    0.9359     84037

DL no word embeddings, mseq 3000, no stride

Evaluation:
	f1 (micro): 92.75
                  precision    recall  f1-score   support

          <body>     0.8931    0.8833    0.8882     17911
        <header>     0.7987    0.7839    0.7912       967
          <page>     0.9439    0.9718    0.9576     17644
    <references>     0.9345    0.9326    0.9336     47515

all (micro avg.)     0.9263    0.9286    0.9275     84037

DL glove.840B.300d word embedding, mseq 3000, stride 1500

Evaluation:
	f1 (micro): 93.59
                  precision    recall  f1-score   support

          <body>     0.8944    0.8922    0.8933     17911
        <header>     0.8246    0.8118    0.8181       967
          <page>     0.9436    0.9777    0.9603     17644
    <references>     0.9394    0.9507    0.9450     47515

all (micro avg.)     0.9296    0.9423    0.9359     84037

DL no word embeddings, mseq 3000, stride 1500

Evaluation:
	f1 (micro): 90.21
                  precision    recall  f1-score   support

          <body>     0.8786    0.8720    0.8753     17911
        <header>     0.7787    0.7642    0.7714       967
          <page>     0.9334    0.9716    0.9521     17644
    <references>     0.8849    0.9075    0.8960     47515

all (micro avg.)     0.8928    0.9117    0.9021     84037

DL glove.840B.300d word embedding, mseq 3000, stride 3000

Evaluation:
	f1 (micro): 93.90
                  precision    recall  f1-score   support

          <body>     0.9097    0.8900    0.8998     17911
        <header>     0.8219    0.8066    0.8142       967
          <page>     0.9544    0.9760    0.9651     17644
    <references>     0.9446    0.9482    0.9464     47515

all (micro avg.)     0.9381    0.9400    0.9390     84037


de-code commented Apr 14, 2020

For comparison, using the previous dataset:

DL glove.840B.300d word embedding, mseq 3000, no stride (first token only)

Evaluation:
	f1 (micro): 73.66
                  precision    recall  f1-score   support

          <body>     0.8774    0.8727    0.8750     79092
        <header>     0.8035    0.6762    0.7344      5973
          <page>     0.7616    0.8838    0.8182     11824
    <references>     0.2557    0.7706    0.3840     16897

all (micro avg.)     0.6508    0.8483    0.7366    113786

DL glove.840B.300d word embedding, mseq 3000, no stride (two tokens)

Evaluation:
	f1 (micro): 74.17
                  precision    recall  f1-score   support

          <body>     0.8770    0.8749    0.8759     79092
        <header>     0.8131    0.6678    0.7333      5973
          <page>     0.8019    0.8756    0.8371     11824
    <references>     0.2610    0.7698    0.3898     16897

all (micro avg.)     0.6588    0.8485    0.7417    113786
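
To put the two datasets side by side, the following sketch computes the change in micro F1 when going from the first token only to the first two tokens, using the same configuration in each case (glove.840B.300d, mseq 3000, no stride). The scores are copied from the reports in this thread; the `results` dictionary and its keys are only illustrative names.

```python
# Micro F1 scores (in percent) copied from the evaluation reports in this thread.
results = {
    "new dataset": {"first token": 91.58, "two tokens": 93.59},
    "previous dataset": {"first token": 73.66, "two tokens": 74.17},
}

# Difference in micro F1 (percentage points) between the two-token
# and the first-token-only runs, per dataset.
deltas = {
    dataset: scores["two tokens"] - scores["first token"]
    for dataset, scores in results.items()
}

for dataset, delta in deltas.items():
    print(f"{dataset}: {delta:+.2f}")
```

On these numbers the gain from the second token is about 2.01 points on the new dataset and about 0.51 points on the previous one.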
