
Report for openai-community/gpt2-medium

Model info

  • Model Info:
    • Tied embeddings: True
    • LM head uses bias: False
    • Embeddings shape: [50257, 1024]
  • Tokenizer Info:
    • Vocab Size: 50257
    • Tokenizer Class: GPT2Tokenizer
    • Bytes handling: Byte Input
    • Tokenizer Type: BPE
    • Token for verification prompt building: BuyableInstoreAndOnline
    • Token id for verification prompt building: 40242
  • Indicator summary:
    • Indicator for under-trained tokens: E_{out} Cosine Distance
    • Overall distribution: 0.489 +/- 0.053
  • Detected Token Counts:
    • Number of candidate under-trained tokens tested: 999, of which 967 non-special; 17 below the p = 0.01 threshold and 11 below the soft indicator threshold
    • Number of single byte tokens: 256, of which 45 below indicator threshold
    • Number of special tokens: 0, of which 0 below indicator threshold
    • Number of non-single-byte UTF-fragment tokens: 216, of which 1 below soft indicator threshold
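The report names the indicator (cosine distance on the output embeddings, E_out) but not the vector the distance is measured against. A minimal sketch, assuming a single reference vector built as the mean of a few rows taken to be known-unused tokens (a hypothetical choice; any extra normalization the real tool applies is omitted), shows how low scores single out candidates. Because the embeddings are tied and the LM head has no bias, E_out here would be the same [50257, 1024] matrix used for input lookup; the 4 x 3 matrix below is invented for illustration.

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def mean_vector(rows):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def indicator_scores(e_out, reference_ids):
    """Hypothetical indicator: cosine distance of every output-embedding row
    to the mean of the rows in `reference_ids` (assumed known-unused tokens).
    A low score means the row points the same way as the reference."""
    ref = mean_vector([e_out[i] for i in reference_ids])
    return [cosine_distance(row, ref) for row in e_out]

# Toy 4-token, 3-dimensional matrix (invented numbers, not GPT-2 weights).
# With tied embeddings, this one matrix would serve as input and output embeddings.
E_OUT = [
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.2],
    [0.1, 0.1, 0.1],     # reference row (assumed unused)
    [0.11, 0.1, 0.09],   # nearly parallel to the reference row
]

scores = indicator_scores(E_OUT, reference_ids=[2])
# 0.041 is the report's soft threshold; reused here only for illustration.
suspects = [i for i, s in enumerate(scores) if s < 0.041]
```

Rows 2 and 3 fall below the cut-off, while rows 0 and 1 score above 0.3, mirroring how the flagged tokens in this report sit far below the overall mean of 0.489.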

Under-trained token indicators plot

[Figure: scatter plots of the under-trained token indicators]

Verification plot

[Figure: verification plot]
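The max_prob column in the verification results is consistent with a check that prompts the model to repeat each candidate token and records the highest probability it assigns to that token, with p = 0.01 as the cut-off. A minimal sketch of the scoring step only, assuming next-token logits from several prompts are already available (prompt construction, which the report ties to the token BuyableInstoreAndOnline, is not shown; the helper names and toy logits are hypothetical):

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def max_prob_of_token(logits_per_prompt, token_id):
    """Highest probability given to `token_id` across all prompts.
    `logits_per_prompt` is a list of next-token logit vectors, one per prompt."""
    return max(softmax(logits)[token_id] for logits in logits_per_prompt)

def is_verified_under_trained(logits_per_prompt, token_id, p_threshold=0.01):
    """Treat a candidate as verified if no prompt gives it p >= p_threshold."""
    return max_prob_of_token(logits_per_prompt, token_id) < p_threshold

# Toy 4-token vocabulary and two prompts; token 3 is almost never predicted.
prompts = [
    [2.0, 1.0, 0.5, -6.0],
    [1.5, 2.5, 0.0, -5.0],
]
```

Here token 3 peaks at roughly 0.00038 and passes the check, while token 0 (about 0.63 on the first prompt) does not; the flagged tokens in this report have max_prob between about 2.4e-07 and 6.7e-06.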

Under-trained token verification results

11 entries below threshold of 0.041

| token_id | token | indicator | max_prob | in_other_tokens |
| --- | --- | --- | --- | --- |
| 30897 | reportprint | 0.00444567 | 3.9e-07 | embedreportprint, cloneembedreportprint, rawdownloadcloneembedreportprint |
| 45544 | ▁サーティ | 0.00456727 | 2.7e-07 | ▁サーティワン |
| 30212 | ▁externalToEVA | 0.00459385 | 3.3e-07 | ▁externalToEVAOnly |
| 30905 | rawdownload | 0.00463021 | 3.3e-07 | rawdownloadcloneembedreportprint |
| 39752 | quickShip | 0.00471389 | 2.4e-07 | quickShipAvailable |
| 36173 | ▁RandomRedditor | 0.00473255 | 2.7e-07 | ▁RandomRedditorWithNo |
| 42089 | ▁TheNitrome | 0.00477672 | 3.1e-07 | ▁TheNitromeFan |
| 40241 | InstoreAndOnline | 0.00498092 | 4.4e-07 | BuyableInstoreAndOnline |
| 30898 | embedreportprint | 0.00511497 | 3.1e-07 | cloneembedreportprint, rawdownloadcloneembedreportprint |
| 40240 | oreAndOnline | 0.00512677 | 3e-07 | InstoreAndOnline, BuyableInstoreAndOnline |
| 30208 | ▁externalTo | 0.0234748 | 6.7e-06 | ▁externalToEVA, ▁externalToEVAOnly |
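The in_other_tokens column appears to list longer vocabulary entries that contain the flagged token as a substring. A minimal sketch of that lookup, assuming plain substring containment (the tool's actual rule may differ, for example in how the leading-space marker ▁ is handled) and using only a hand-picked subset of the vocabulary:

```python
def containing_tokens(token, vocab):
    """Return the other vocabulary entries that contain `token` as a substring,
    shortest first."""
    return sorted(
        (other for other in vocab if other != token and token in other),
        key=len,
    )

# Hand-picked subset of GPT-2 tokens from the table above (not the full vocabulary).
VOCAB = [
    "reportprint",
    "embedreportprint",
    "cloneembedreportprint",
    "rawdownload",
    "rawdownloadcloneembedreportprint",
    "oreAndOnline",
    "InstoreAndOnline",
    "BuyableInstoreAndOnline",
]

for tok in ["reportprint", "oreAndOnline", "rawdownload"]:
    print(tok, "->", ", ".join(containing_tokens(tok, VOCAB)))
```

On this subset the printed lists match the corresponding in_other_tokens cells, for example reportprint maps to embedreportprint, cloneembedreportprint, rawdownloadcloneembedreportprint.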

Tokens with partial UTF-8 sequences

1 entry below threshold of 0.041

| token_id | token | indicator | in_other_tokens |
| --- | --- | --- | --- |
| 39820 | 龍<0xE5><0xA5> | 0.00473803 | 龍契士 |
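Token 39820 is 龍 followed by the bytes 0xE5 0xA5, which are the first two bytes of the three-byte UTF-8 encoding of 契 (E5 A5 91), consistent with its containing token 龍契士. A minimal sketch of detecting such a dangling sequence with Python's incremental UTF-8 decoder (the helper name is hypothetical):

```python
import codecs

def split_partial_utf8(token_bytes):
    """Split raw token bytes into (decoded text, trailing incomplete UTF-8 bytes).
    With the default strict handler, bytes that can never form valid UTF-8
    raise UnicodeDecodeError instead of being buffered."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    text = decoder.decode(token_bytes, final=False)  # keeps incomplete tail buffered
    pending, _flag = decoder.getstate()              # (buffered bytes, flag)
    return text, pending

token_39820 = "龍".encode("utf-8") + b"\xe5\xa5"     # 龍<0xE5><0xA5>
text, pending = split_partial_utf8(token_39820)       # "龍", b"\xe5\xa5"

# Appending the missing continuation byte completes 契 (U+5951).
completed = pending + b"\x91"
```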

Byte tokens

45 entries below threshold of 0.005

| token_id | token | indicator | ord | hex | byte_type |
| --- | --- | --- | --- | --- | --- |
| 178 | <0xF6> | 0.00357658 | 246 | 0xF6 | unused_utf8 |
| 183 | <0xFB> | 0.00366396 | 251 | 0xFB | unused_utf8 |
| 185 | <0xFD> | 0.00367862 | 253 | 0xFD | unused_utf8 |
| 180 | <0xF8> | 0.0036788 | 248 | 0xF8 | unused_utf8 |
| 184 | <0xFC> | 0.00368083 | 252 | 0xFC | unused_utf8 |
| 187 | <0xFF> | 0.00386095 | 255 | 0xFF | unused_utf8 |
| 179 | <0xF7> | 0.00395703 | 247 | 0xF7 | unused_utf8 |
| 186 | <0xFE> | 0.00401849 | 254 | 0xFE | unused_utf8 |
| 177 | <0xF5> | 0.00404406 | 245 | 0xF5 | unused_utf8 |
| 182 | <0xFA> | 0.00411302 | 250 | 0xFA | unused_utf8 |
| 210 | \x16 | 0.00416565 | 22 | 0x16 | ascii |
| 197 | \t | 0.00420243 | 9 | 0x09 | ascii |
| 181 | <0xF9> | 0.00422907 | 249 | 0xF9 | unused_utf8 |
| 207 | \x13 | 0.00434786 | 19 | 0x13 | ascii |
| 124 | <0xC0> | 0.00435007 | 192 | 0xC0 | unused_utf8 |
| 189 | \x01 | 0.00436115 | 1 | 0x01 | ascii |
| 192 | \x04 | 0.00437033 | 4 | 0x04 | ascii |
| 215 | \x1b | 0.00446457 | 27 | 0x1B | ascii |
| 217 | \x1d | 0.00447732 | 29 | 0x1D | ascii |
| 188 | \x00 | 0.00448298 | 0 | 0x00 | ascii |
25 additional entries below threshold
| token_id | token | indicator | ord | hex | byte_type |
| --- | --- | --- | --- | --- | --- |
| 205 | \x11 | 0.00454181 | 17 | 0x11 | ascii |
| 221 | \x7f | 0.0045619 | 127 | 0x7F | ascii |
| 196 | \x08 | 0.00457859 | 8 | 0x08 | ascii |
| 191 | \x03 | 0.00458455 | 3 | 0x03 | ascii |
| 211 | \x17 | 0.00458777 | 23 | 0x17 | ascii |
| 209 | \x15 | 0.00459915 | 21 | 0x15 | ascii |
| 218 | \x1e | 0.00462502 | 30 | 0x1E | ascii |
| 219 | \x1f | 0.00462788 | 31 | 0x1F | ascii |
| 201 | \r | 0.00464106 | 13 | 0x0D | ascii |
| 199 | \x0b | 0.00464898 | 11 | 0x0B | ascii |
| 125 | <0xC1> | 0.00469691 | 193 | 0xC1 | unused_utf8 |
| 213 | \x19 | 0.00470841 | 25 | 0x19 | ascii |
| 214 | \x1a | 0.00472432 | 26 | 0x1A | ascii |
| 216 | \x1c | 0.00474203 | 28 | 0x1C | ascii |
| 204 | \x10 | 0.00481361 | 16 | 0x10 | ascii |
| 195 | \x07 | 0.00482035 | 7 | 0x07 | ascii |
| 208 | \x14 | 0.00483632 | 20 | 0x14 | ascii |
| 202 | \x0e | 0.00483972 | 14 | 0x0E | ascii |
| 200 | \x0c | 0.0048672 | 12 | 0x0C | ascii |
| 193 | \x05 | 0.00487089 | 5 | 0x05 | ascii |
| 206 | \x12 | 0.00490856 | 18 | 0x12 | ascii |
| 190 | \x02 | 0.00498682 | 2 | 0x02 | ascii |
| 194 | \x06 | 0.00503385 | 6 | 0x06 | ascii |
| 212 | \x18 | 0.00508702 | 24 | 0x18 | ascii |
| 203 | \x0f | 0.005108 | 15 | 0x0F | ascii |
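The token_id and byte_type columns follow from GPT-2's byte-level BPE. Every id in the tables above is consistent with the bytes_to_unicode ordering from the original GPT-2 encoder: the 188 printable bytes (0x21 to 0x7E, 0xA1 to 0xAC, 0xAE to 0xFF) take ids 0 to 187 in byte order, and the remaining 68 bytes (controls, space, 0x7F to 0xA0, 0xAD) take ids 188 to 255. A sketch that reproduces those columns, assuming single-byte ids equal positions in that ordering; the two category names other than ascii and unused_utf8 are hypothetical labels:

```python
def gpt2_byte_order():
    """Byte values in the order used by GPT-2's bytes_to_unicode():
    printable bytes first, then every remaining byte in ascending order."""
    printable = (
        list(range(0x21, 0x7F))      # '!' .. '~'
        + list(range(0xA1, 0xAD))    # '¡' .. '¬'
        + list(range(0xAE, 0x100))   # '®' .. 'ÿ'
    )
    rest = [b for b in range(256) if b not in printable]
    return printable + rest

# Assumption: a single-byte token's id is its position in this ordering.
BYTE_TO_ID = {b: i for i, b in enumerate(gpt2_byte_order())}

def byte_type(b):
    """Classify a byte value. 'ascii' and 'unused_utf8' match the report;
    'utf8_continuation' and 'utf8_lead' are hypothetical names."""
    if b < 0x80:
        return "ascii"
    if b in (0xC0, 0xC1) or b >= 0xF5:
        return "unused_utf8"          # never appears in well-formed UTF-8
    if b < 0xC0:
        return "utf8_continuation"    # 0x80..0xBF
    return "utf8_lead"                # 0xC2..0xF4

# Columns in the report's order: token_id, ord, hex, byte_type.
for b in (0x00, 0x09, 0x7F, 0xC0, 0xF6, 0xFF):
    print(BYTE_TO_ID[b], b, f"0x{b:02X}", byte_type(b))
```

The printed ids (188, 197, 221, 124, 178, 187) match the corresponding rows of the byte-token tables.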

Special tokens

0 entries below threshold of 0.005