Report for HuggingFaceH4/zephyr-7b-beta
Model Info:
Tied embeddings: False
LM head uses bias: False
Embeddings shape: [32000, 4096]
Tokenizer Info:
Vocab Size: 32000
Tokenizer Class: LlamaTokenizer
Tokenizer Type: BPE
Bytes handling: Byte Fallback
Token for verification prompt building: includegraphics
Token id for verification prompt building: 7621
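Byte fallback means that any character without its own vocabulary entry is split into its UTF-8 bytes, and each byte is emitted as a <0xNN> token. The sketch below assumes a Llama-style layout in which ids 0-2 hold <unk>, <s> and </s>, so <0xNN> has id NN + 3. This matches the single-byte table later in this report (e.g. <0x0D> is id 16). The character-level encoder is a deliberate simplification, not the real SentencePiece algorithm, and the helper names are hypothetical.

```python
# Hypothetical sketch of byte fallback, ASSUMING a Llama-style layout:
# ids 0-2 are <unk>, <s>, </s>, and <0x00>..<0xFF> occupy ids 3..258.
BYTE_OFFSET = 3  # assumed number of special tokens before <0x00>


def byte_token(byte_value: int) -> tuple[int, str]:
    """Return (token_id, token_string) for one raw byte under byte fallback."""
    if not 0 <= byte_value <= 0xFF:
        raise ValueError("byte value must be in 0..255")
    return byte_value + BYTE_OFFSET, f"<0x{byte_value:02X}>"


def encode_chars(text: str, vocab: dict[str, int]) -> list[int]:
    """Toy character-level encoder: use the character's own token when the
    vocabulary has one, otherwise emit one byte token per UTF-8 byte.
    (Real SentencePiece BPE merges multi-character pieces first.)"""
    ids: list[int] = []
    for ch in text:
        if ch in vocab:
            ids.append(vocab[ch])
        else:
            ids.extend(byte_token(b)[0] for b in ch.encode("utf-8"))
    return ids


vocab = {"A": 28741, "!": 28808}  # ids taken from the reencoded column below
print(byte_token(0x0D))            # (16, '<0x0D>')
print(encode_chars("A!é", vocab))  # [28741, 28808, 198, 172]
```

Because "A" already has its own token (28741 in the reencoded column), the byte token <0x41> is never produced for well-formed text. This is one plausible reason why ASCII byte tokens show up as under-trained.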
Indicator summary:
Indicator for under-trained tokens: E_{in} L2 Norm
Overall distribution: 0.177 +/- 0.021
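Going by its name, the indicator above is the L2 norm of each row of the input embedding matrix E_in (shape [32000, 4096] here), summarized as mean +/- standard deviation. The following is a minimal pure-Python sketch on a toy matrix. The use of population rather than sample standard deviation and the function names are assumptions, not taken from the report's tooling.

```python
import math


def l2_norm_indicator(embeddings: list[list[float]]) -> list[float]:
    """Indicator per token: the L2 norm of its row in the input embedding matrix."""
    return [math.sqrt(sum(x * x for x in row)) for row in embeddings]


def summarize(values: list[float]) -> tuple[float, float]:
    """Mean and population standard deviation ('mean +/- std')."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, math.sqrt(variance)


def below_threshold(values: list[float], threshold: float) -> list[int]:
    """Ids of tokens whose indicator is below the threshold, lowest first."""
    ids = [i for i, v in enumerate(values) if v < threshold]
    return sorted(ids, key=lambda i: values[i])


# Toy 3-token, 2-dimensional embedding matrix (not real model data).
toy = [[3.0, 4.0], [0.0, 0.0], [6.0, 8.0]]
norms = l2_norm_indicator(toy)        # [5.0, 0.0, 10.0]
mean, std = summarize(norms)          # 5.0, about 4.08
print(f"{mean:.3f} +/- {std:.3f}")
print(below_threshold(norms, 6.0))    # [1, 0]
```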
Detected Token Counts:
Number of under-trained token candidates tested: 637 (529 non-special); 70 below the p = 0.01 threshold; 45 below the soft indicator threshold
Number of single-byte tokens: 380, of which 145 are below the indicator threshold
Number of special tokens: 0, of which 0 are below the indicator threshold
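One reading of "below the p = 0.01 threshold" is that a candidate counts as confirmed when its max_prob is under 0.01. Here max_prob is presumably the highest probability the model assigned to reproducing the token during verification. Assuming that reading, the split can be sketched as follows, using three rows copied from the verification results in this report. The Verified type and the function name are hypothetical.

```python
from typing import NamedTuple


class Verified(NamedTuple):
    """One row of the verification table (field names follow its columns)."""
    token_id: int
    token: str
    indicator: float
    max_prob: float


def split_by_max_prob(
    rows: list[Verified], p: float = 0.01
) -> tuple[list[Verified], list[Verified]]:
    """ASSUMPTION: a candidate is confirmed under-trained when max_prob < p."""
    confirmed = [r for r in rows if r.max_prob < p]
    not_confirmed = [r for r in rows if r.max_prob >= p]
    return confirmed, not_confirmed


# Three rows copied from the verification results in this report.
rows = [
    Verified(25900, "iNdEx", 0.0259386, 0.00025),
    Verified(26831, "▁febbra", 0.0298659, 1.2e-05),
    Verified(31656, "≮", 0.0500745, 0.025),
]
confirmed, not_confirmed = split_by_max_prob(rows)
print([r.token for r in confirmed])      # ['iNdEx', '▁febbra']
print([r.token for r in not_confirmed])  # ['≮']
```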
[Figure: Under-trained token indicators plot]
Under-trained token verification results
45 entries below threshold of 0.050:
token_id  token     indicator   max_prob  in_other_tokens
31738     \uefc0    0.00256505  2.6e-08
20418     ▁/**\r    0.00368849  3.1e-08
26636     });\r     0.00488573  3.7e-09
26407     };\r      0.00519729  7.3e-09
26392     ▁});\r    0.00557457  1.8e-08
18759     ';\r      0.00600828  2.1e-08
26083     ▁//\r     0.00611446  3.7e-08
9823      */\r      0.00744269  1.6e-08
25833     >?[<      0.00774109  7.4e-08
7608      ▁*/\r     0.00841445  3.7e-08
28171     ]);\r     0.00898351  5.5e-08
23139     ▁};\r     0.00917953  3e-08
17695     },\r      0.0093152   1.4e-08   ▁},\r
15056     ());\r    0.00938823  1.8e-08
12193     ▁);\r     0.00941279  5.1e-08
31363     \x85      0.00975407  1.4e-09
14756     /**\r     0.010301    2.3e-08   ▁/**\r
16943     ');\r     0.0108607   3.1e-08
20692     ▁},\r     0.0110284   6.4e-08
10278     ',\r      0.0124934   5.5e-07
25 additional entries below threshold:
token_id  token     indicator   max_prob  in_other_tokens
11880     ";\r      0.0141034   2e-07
30929     ᥀         0.0149118   2e-07
14420     ];\r      0.0156988   6.5e-08
18055     ){\r      0.0159617   1.4e-07
10941     ));\r     0.0173721   8.9e-08   ());\r
14980     ">\r      0.0174355   4.2e-07
6913      ");\r     0.0252151   6.3e-07
25900     iNdEx     0.0259386   0.00025
22186     ')\r      0.0270944   2.4e-06
10939     ",\r      0.027903    6.4e-07
26831     ▁febbra   0.0298659   1.2e-05   ▁febbraio
4420      ();\r     0.0299867   5.1e-06
19248     NdEx      0.03231     0.00035   iNdEx
3426      ▁}\r      0.0359886   4.6e-06
9962      ()\r      0.0381682   0.00012
31853     ⇽         0.039285    0.00056
4441      {\r       0.0398455   2.2e-06   ){\r
23486     ),\r      0.0402817   6.9e-06
14619     ▁)\r      0.0432961   1.6e-05
17334     (\r       0.0452383   4.9e-05
15641     ▁uitgen   0.0471153   3.6e-05   ▁uitgenodigd
27732     '\r       0.0474714   9.2e-05
2519      }\r       0.0483518   2.9e-05   ▁}\r
1969      ▁{\r      0.0494827   1.9e-05
31656     ≮         0.0500745   0.025
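The in_other_tokens column appears to name a longer vocabulary entry that contains the under-trained token as a substring (e.g. ▁febbra inside ▁febbraio). Such tokens are plausibly intermediate BPE merges that are almost always absorbed into the longer token, so they are rarely seen on their own. The sketch below performs this lookup under the assumption of plain substring containment; the function name is hypothetical.

```python
def containing_tokens(token: str, vocab: list[str]) -> list[str]:
    """Other vocabulary entries that contain `token` as a substring."""
    return [t for t in vocab if t != token and token in t]


# Small excerpt of tokens that appear in the table above; "\r" is a real
# carriage return, shown escaped in the report.
vocab = ["▁febbra", "▁febbraio", "NdEx", "iNdEx", "{\r", "){\r", "▁{\r"]
print(containing_tokens("▁febbra", vocab))  # ['▁febbraio']
print(containing_tokens("NdEx", vocab))     # ['iNdEx']
print(containing_tokens("{\r", vocab))      # ['){\r', '▁{\r']
```

The last call returns both matches, while the report lists only ){\r for {\r. The report's column therefore presumably shows a single example rather than every containing token.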
145 entries below threshold of 0.039:
token_id  token   indicator  ord  hex   byte_type    reencoded
4         <0x01>  0          1    0x01  ascii        29534: \x01
5         <0x02>  0          2    0x02  ascii        30551: \x02
6         <0x03>  0          3    0x03  ascii        30662: \x03
7         <0x04>  0          4    0x04  ascii        30724: \x04
8         <0x05>  0          5    0x05  ascii        30550: \x05
9         <0x06>  0          6    0x06  ascii        30314: \x06
10        <0x07>  0          7    0x07  ascii        30963: \x07
11        <0x08>  0          8    0x08  ascii        31129: \x08
14        <0x0B>  0          11   0x0B  ascii        30638: \x0b
15        <0x0C>  0          12   0x0C  ascii        29683: \x0c
16        <0x0D>  0          13   0x0D  ascii        28801: \r
17        <0x0E>  0          14   0x0E  ascii        30517: \x0e
18        <0x0F>  0          15   0x0F  ascii        30698: \x0f
19        <0x10>  0          16   0x10  ascii        30388: \x10
20        <0x11>  0          17   0x11  ascii        30557: \x11
21        <0x12>  0          18   0x12  ascii        30298: \x12
22        <0x13>  0          19   0x13  ascii        30453: \x13
23        <0x14>  0          20   0x14  ascii        30721: \x14
24        <0x15>  0          21   0x15  ascii        30675: \x15
25        <0x16>  0          22   0x16  ascii        30935: \x16
125 additional entries below threshold:
token_id  token   indicator  ord  hex   byte_type    reencoded
26        <0x17>  0          23   0x17  ascii        30841: \x17
27        <0x18>  0          24   0x18  ascii        30555: \x18
28        <0x19>  0          25   0x19  ascii        30969: \x19
29        <0x1A>  0          26   0x1A  ascii        30759: \x1a
30        <0x1B>  0          27   0x1B  ascii        30246: \x1b
31        <0x1C>  0          28   0x1C  ascii        31134: \x1c
32        <0x1D>  0          29   0x1D  ascii        31236: \x1d
33        <0x1E>  0          30   0x1E  ascii        31150: \x1e
34        <0x1F>  0          31   0x1F  ascii        31217: \x1f
35        <0x20>  0          32   0x20  ascii        28705: ▁
36        <0x21>  0          33   0x21  ascii        28808: !
37        <0x22>  0          34   0x22  ascii        28739: "
38        <0x23>  0          35   0x23  ascii        28771: #
39        <0x24>  0          36   0x24  ascii        28776: $
40        <0x25>  0          37   0x25  ascii        28823: %
41        <0x26>  0          38   0x26  ascii        28800: &
42        <0x27>  0          39   0x27  ascii        28742: '
43        <0x28>  0          40   0x28  ascii        28732: (
44        <0x29>  0          41   0x29  ascii        28731: )
45        <0x2A>  0          42   0x2A  ascii        28736: *
46        <0x2B>  0          43   0x2B  ascii        28806: +
47        <0x2C>  0          44   0x2C  ascii        28725: ,
48        <0x2D>  0          45   0x2D  ascii        28733: -
49        <0x2E>  0          46   0x2E  ascii        28723: .
50        <0x2F>  0          47   0x2F  ascii        28748: /
51        <0x30>  0          48   0x30  ascii        28734: 0
52        <0x31>  0          49   0x31  ascii        28740: 1
53        <0x32>  0          50   0x32  ascii        28750: 2
54        <0x33>  0          51   0x33  ascii        28770: 3
55        <0x34>  0          52   0x34  ascii        28781: 4
56        <0x35>  0          53   0x35  ascii        28782: 5
57        <0x36>  0          54   0x36  ascii        28784: 6
58        <0x37>  0          55   0x37  ascii        28787: 7
59        <0x38>  0          56   0x38  ascii        28783: 8
60        <0x39>  0          57   0x39  ascii        28774: 9
61        <0x3A>  0          58   0x3A  ascii        28747: :
62        <0x3B>  0          59   0x3B  ascii        28745: ;
63        <0x3C>  0          60   0x3C  ascii        28789: <
64        <0x3D>  0          61   0x3D  ascii        28746: =
65        <0x3E>  0          62   0x3E  ascii        28767: >
66        <0x3F>  0          63   0x3F  ascii        28804: ?
67        <0x40>  0          64   0x40  ascii        28818: @
68        <0x41>  0          65   0x41  ascii        28741: A
69        <0x42>  0          66   0x42  ascii        28760: B
70        <0x43>  0          67   0x43  ascii        28743: C
71        <0x44>  0          68   0x44  ascii        28757: D
72        <0x45>  0          69   0x45  ascii        28749: E
73        <0x46>  0          70   0x46  ascii        28765: F
74        <0x47>  0          71   0x47  ascii        28777: G
75        <0x48>  0          72   0x48  ascii        28769: H
76        <0x49>  0          73   0x49  ascii        28737: I
77        <0x4A>  0          74   0x4A  ascii        28798: J
78        <0x4B>  0          75   0x4B  ascii        28796: K
79        <0x4C>  0          76   0x4C  ascii        28758: L
80        <0x4D>  0          77   0x4D  ascii        28755: M
81        <0x4E>  0          78   0x4E  ascii        28759: N
82        <0x4F>  0          79   0x4F  ascii        28762: O
83        <0x50>  0          80   0x50  ascii        28753: P
84        <0x51>  0          81   0x51  ascii        28824: Q
85        <0x52>  0          82   0x52  ascii        28754: R
86        <0x53>  0          83   0x53  ascii        28735: S
87        <0x54>  0          84   0x54  ascii        28738: T
88        <0x55>  0          85   0x55  ascii        28779: U
89        <0x56>  0          86   0x56  ascii        28790: V
90        <0x57>  0          87   0x57  ascii        28780: W
91        <0x58>  0          88   0x58  ascii        28814: X
92        <0x59>  0          89   0x59  ascii        28802: Y
93        <0x5A>  0          90   0x5A  ascii        28828: Z
94        <0x5B>  0          91   0x5B  ascii        28792: [
95        <0x5C>  0          92   0x5C  ascii        28756: \
96        <0x5D>  0          93   0x5D  ascii        28793: ]
97        <0x5E>  0          94   0x5E  ascii        28815: ^
98        <0x5F>  0          95   0x5F  ascii        28730: _
99        <0x60>  0          96   0x60  ascii        28832: `
100       <0x61>  0          97   0x61  ascii        28708: a
101       <0x62>  0          98   0x62  ascii        28726: b
102       <0x63>  0          99   0x63  ascii        28717: c
103       <0x64>  0          100  0x64  ascii        28715: d
104       <0x65>  0          101  0x65  ascii        28706: e
105       <0x66>  0          102  0x66  ascii        28722: f
106       <0x67>  0          103  0x67  ascii        28721: g
107       <0x68>  0          104  0x68  ascii        28716: h
108       <0x69>  0          105  0x69  ascii        28710: i
109       <0x6A>  0          106  0x6A  ascii        28768: j
110       <0x6B>  0          107  0x6B  ascii        28729: k
111       <0x6C>  0          108  0x6C  ascii        28714: l
112       <0x6D>  0          109  0x6D  ascii        28719: m
113       <0x6E>  0          110  0x6E  ascii        28711: n
114       <0x6F>  0          111  0x6F  ascii        28709: o
115       <0x70>  0          112  0x70  ascii        28720: p
116       <0x71>  0          113  0x71  ascii        28775: q
117       <0x72>  0          114  0x72  ascii        28712: r
118       <0x73>  0          115  0x73  ascii        28713: s
119       <0x74>  0          116  0x74  ascii        28707: t
120       <0x75>  0          117  0x75  ascii        28718: u
121       <0x76>  0          118  0x76  ascii        28728: v
122       <0x77>  0          119  0x77  ascii        28727: w
123       <0x78>  0          120  0x78  ascii        28744: x
124       <0x79>  0          121  0x79  ascii        28724: y
125       <0x7A>  0          122  0x7A  ascii        28764: z
126       <0x7B>  0          123  0x7B  ascii        28751: {
127       <0x7C>  0          124  0x7C  ascii        28766: |
128       <0x7D>  0          125  0x7D  ascii        28752: }
129       <0x7E>  0          126  0x7E  ascii        28845: ~
130       <0x7F>  0          127  0x7F  ascii        30982: \x7f
195       <0xC0>  0          192  0xC0  unused_utf8
196       <0xC1>  0          193  0xC1  unused_utf8
197       <0xC2>  0          194  0xC2  utf8
198       <0xC3>  0          195  0xC3  utf8
248       <0xF5>  0          245  0xF5  unused_utf8
249       <0xF6>  0          246  0xF6  unused_utf8
250       <0xF7>  0          247  0xF7  unused_utf8
251       <0xF8>  0          248  0xF8  unused_utf8
252       <0xF9>  0          249  0xF9  unused_utf8
253       <0xFA>  0          250  0xFA  unused_utf8
254       <0xFB>  0          251  0xFB  unused_utf8
255       <0xFC>  0          252  0xFC  unused_utf8
256       <0xFD>  0          253  0xFD  unused_utf8
257       <0xFE>  0          254  0xFE  unused_utf8
258       <0xFF>  0          255  0xFF  unused_utf8
31134     \x1c    0.0109612  28   0x1C  ascii
31150     \x1e    0.0117701  30   0x1E  ascii
31236     \x1d    0.0138711  29   0x1D  ascii
29683     \x0c    0.0178993  12   0x0C  ascii
30638     \x0b    0.0250177  11   0x0B  ascii
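The byte_type column can be reproduced from the UTF-8 rules. Bytes 0x00-0x7F are complete ASCII characters, while 0xC0, 0xC1 and 0xF5-0xFF never occur in well-formed UTF-8 (RFC 3629). Byte fallback on valid text therefore never emits those byte tokens, which presumably leaves their embeddings untrained. The sketch below assumes that all remaining bytes (continuation bytes 0x80-0xBF and lead bytes 0xC2-0xF4) share the "utf8" label. Since the table only shows 0xC2 and 0xC3, that grouping is an assumption.

```python
def byte_type(b: int) -> str:
    """Classify a raw byte by how it can occur in well-formed UTF-8 text."""
    if not 0 <= b <= 0xFF:
        raise ValueError("byte value must be in 0..255")
    if b <= 0x7F:
        return "ascii"  # a complete one-byte character
    if b in (0xC0, 0xC1) or b >= 0xF5:
        return "unused_utf8"  # never valid in UTF-8 (RFC 3629)
    # ASSUMPTION: continuation bytes (0x80-0xBF) and lead bytes (0xC2-0xF4)
    # are grouped together under the label used in the table.
    return "utf8"


for b in (0x41, 0x0D, 0xC0, 0xC3, 0xA9, 0xF5):
    print(f"<0x{b:02X}>", byte_type(b))
```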
1 entry below threshold of 0.039:
token_id  token   indicator    max_prob
0         <unk>   0.000668355  1.1e-08