Report for mistralai/Mixtral-8x7B-v0.1
Model Info:
Tied embeddings: False
LM head uses bias: False
Embeddings shape: [32000, 4096]
Tokenizer Info:
Vocab Size: 32000
Tokenizer Class: LlamaTokenizer
Tokenizer Type: BPE
Bytes handling: Byte Fallback
Token for verification prompt building: includegraphics
Token id for verification prompt building: 7621
Indicator summary:
Indicator for under-trained tokens: E_{in} L2 Norm
Overall distribution: 0.709 +/- 0.080
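As a rough illustration of what an L2-norm indicator over the input embeddings involves, the sketch below computes one norm per vocabulary row and lists the rows under a threshold. It assumes the indicator is the plain Euclidean norm of each embedding row, as its name suggests. The function names and the small synthetic matrix are hypothetical and are not taken from the tool that produced this report.

```python
import numpy as np


def l2_norm_indicator(embeddings: np.ndarray) -> np.ndarray:
    """Return the Euclidean (L2) norm of each row of the embedding matrix."""
    return np.linalg.norm(embeddings, axis=1)


def below_threshold(indicator: np.ndarray, threshold: float) -> np.ndarray:
    """Return token ids whose indicator is below the threshold, lowest first."""
    ids = np.flatnonzero(indicator < threshold)
    return ids[np.argsort(indicator[ids], kind="stable")]


# Synthetic stand-in for the real [32000, 4096] matrix (assumption: random data,
# kept small so the sketch runs instantly).
rng = np.random.default_rng(0)
emb = rng.normal(size=(1000, 64))
emb[[7, 42]] *= 1e-3  # two hypothetical rows that barely moved from zero

norms = l2_norm_indicator(emb)
print(f"Overall distribution: {norms.mean():.3f} +/- {norms.std():.3f}")

# 0.065 is the threshold quoted in this report; the synthetic scale differs.
candidates = below_threshold(norms, threshold=0.065)
print("Candidate token ids:", candidates.tolist())
```

With a real checkpoint, `emb` would be the model's input embedding weight converted to a NumPy array. How the report derives its thresholds is not stated here, so the value is passed in as a parameter.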
Detected Token Counts:
Number of tested under-trained tokens: 637, 542 non-special, 42 below p = 0.01 threshold, 23 below soft indicator threshold
Number of single byte tokens: 380, of which 143 below indicator threshold
Number of special tokens: 0, of which 0 below indicator threshold
Under-trained token indicators plot
Under-trained token verification results
23 entries below threshold of 0.065
| token_id | token | indicator | max_prob | in_other_tokens |
| --- | --- | --- | --- | --- |
| 31738 | \uefc0 | 0.00952411 | 1.2e-05 | |
| 20418 | ▁/**\r | 0.0137132 | 0.0011 | |
| 26083 | ▁//\r | 0.0149266 | 0.0032 | |
| 26636 | });\r | 0.0164495 | 0.0025 | |
| 26392 | ▁});\r | 0.0213868 | 0.0028 | |
| 9823 | */\r | 0.0227401 | 0.0022 | |
| 26407 | };\r | 0.0242334 | 0.0011 | |
| 28171 | ]);\r | 0.0242576 | 0.0016 | |
| 23139 | ▁};\r | 0.026813 | 0.001 | |
| 7608 | ▁*/\r | 0.0271379 | 0.0021 | |
| 15056 | ());\r | 0.0323113 | 0.0011 | |
| 12193 | ▁);\r | 0.0341091 | 0.0016 | |
| 20692 | ▁},\r | 0.0352363 | 0.0015 | |
| 18759 | ';\r | 0.0359167 | 0.0013 | |
| 16943 | ');\r | 0.037474 | 0.0026 | |
| 17695 | },\r | 0.0383723 | 0.0025 | ▁},\r |
| 14756 | /**\r | 0.0385622 | 0.0013 | ▁/**\r |
| 10278 | ',\r | 0.0440239 | 0.0014 | |
| 11880 | ";\r | 0.0494156 | 0.0012 | |
| 14420 | ];\r | 0.0494746 | 0.00095 | |
3 additional entries below threshold
| token_id | token | indicator | max_prob | in_other_tokens |
| --- | --- | --- | --- | --- |
| 30929 | ᥀ | 0.0521718 | 0.013 | |
| 10941 | ));\r | 0.0574462 | 0.0014 | ());\r |
| 25833 | >?[< | 0.0619776 | 0.27 | |
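The in_other_tokens column can be read as listing other vocabulary entries that contain the token as a substring; that reading is an assumption, though it is consistent with the three filled cells in the tables above. A minimal sketch of such a check, with a hypothetical helper name and a vocabulary excerpt copied from this report:

```python
def containing_tokens(token: str, vocab: dict[int, str]) -> list[str]:
    """List other vocabulary strings that contain `token` as a substring."""
    return [t for t in vocab.values() if token in t and t != token]


# A few entries copied from this report; the full vocabulary has 32000 entries.
vocab = {
    17695: "},\r",
    20692: "▁},\r",
    14756: "/**\r",
    20418: "▁/**\r",
    10941: "));\r",
    15056: "());\r",
    14420: "];\r",
}

for token_id in (17695, 14756, 10941, 14420):
    print(token_id, repr(vocab[token_id]), containing_tokens(vocab[token_id], vocab))
```

On this excerpt the first three tokens each match one longer entry, and `];\r` matches none.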
143 entries below threshold of 0.062
| token_id | token | indicator | ord | hex | byte_type | reencoded |
| --- | --- | --- | --- | --- | --- | --- |
| 4 | <0x01> | 0 | 1 | 0x01 | ascii | 29534: \x01 |
| 5 | <0x02> | 0 | 2 | 0x02 | ascii | 30551: \x02 |
| 6 | <0x03> | 0 | 3 | 0x03 | ascii | 30662: \x03 |
| 7 | <0x04> | 0 | 4 | 0x04 | ascii | 30724: \x04 |
| 8 | <0x05> | 0 | 5 | 0x05 | ascii | 30550: \x05 |
| 9 | <0x06> | 0 | 6 | 0x06 | ascii | 30314: \x06 |
| 10 | <0x07> | 0 | 7 | 0x07 | ascii | 30963: \x07 |
| 11 | <0x08> | 0 | 8 | 0x08 | ascii | 31129: \x08 |
| 14 | <0x0B> | 0 | 11 | 0x0B | ascii | 30638: \x0b |
| 15 | <0x0C> | 0 | 12 | 0x0C | ascii | 29683: \x0c |
| 16 | <0x0D> | 0 | 13 | 0x0D | ascii | 28801: \r |
| 17 | <0x0E> | 0 | 14 | 0x0E | ascii | 30517: \x0e |
| 18 | <0x0F> | 0 | 15 | 0x0F | ascii | 30698: \x0f |
| 19 | <0x10> | 0 | 16 | 0x10 | ascii | 30388: \x10 |
| 20 | <0x11> | 0 | 17 | 0x11 | ascii | 30557: \x11 |
| 21 | <0x12> | 0 | 18 | 0x12 | ascii | 30298: \x12 |
| 22 | <0x13> | 0 | 19 | 0x13 | ascii | 30453: \x13 |
| 23 | <0x14> | 0 | 20 | 0x14 | ascii | 30721: \x14 |
| 24 | <0x15> | 0 | 21 | 0x15 | ascii | 30675: \x15 |
| 25 | <0x16> | 0 | 22 | 0x16 | ascii | 30935: \x16 |
123 additional entries below threshold
| token_id | token | indicator | ord | hex | byte_type | reencoded |
| --- | --- | --- | --- | --- | --- | --- |
| 26 | <0x17> | 0 | 23 | 0x17 | ascii | 30841: \x17 |
| 27 | <0x18> | 0 | 24 | 0x18 | ascii | 30555: \x18 |
| 28 | <0x19> | 0 | 25 | 0x19 | ascii | 30969: \x19 |
| 29 | <0x1A> | 0 | 26 | 0x1A | ascii | 30759: \x1a |
| 30 | <0x1B> | 0 | 27 | 0x1B | ascii | 30246: \x1b |
| 31 | <0x1C> | 0 | 28 | 0x1C | ascii | 31134: \x1c |
| 32 | <0x1D> | 0 | 29 | 0x1D | ascii | 31236: \x1d |
| 33 | <0x1E> | 0 | 30 | 0x1E | ascii | 31150: \x1e |
| 34 | <0x1F> | 0 | 31 | 0x1F | ascii | 31217: \x1f |
| 35 | <0x20> | 0 | 32 | 0x20 | ascii | 28705: ▁ |
| 36 | <0x21> | 0 | 33 | 0x21 | ascii | 28808: ! |
| 37 | <0x22> | 0 | 34 | 0x22 | ascii | 28739: " |
| 38 | <0x23> | 0 | 35 | 0x23 | ascii | 28771: # |
| 39 | <0x24> | 0 | 36 | 0x24 | ascii | 28776: $ |
| 40 | <0x25> | 0 | 37 | 0x25 | ascii | 28823: % |
| 41 | <0x26> | 0 | 38 | 0x26 | ascii | 28800: & |
| 42 | <0x27> | 0 | 39 | 0x27 | ascii | 28742: ' |
| 43 | <0x28> | 0 | 40 | 0x28 | ascii | 28732: ( |
| 44 | <0x29> | 0 | 41 | 0x29 | ascii | 28731: ) |
| 45 | <0x2A> | 0 | 42 | 0x2A | ascii | 28736: * |
| 46 | <0x2B> | 0 | 43 | 0x2B | ascii | 28806: + |
| 47 | <0x2C> | 0 | 44 | 0x2C | ascii | 28725: , |
| 48 | <0x2D> | 0 | 45 | 0x2D | ascii | 28733: - |
| 49 | <0x2E> | 0 | 46 | 0x2E | ascii | 28723: . |
| 50 | <0x2F> | 0 | 47 | 0x2F | ascii | 28748: / |
| 51 | <0x30> | 0 | 48 | 0x30 | ascii | 28734: 0 |
| 52 | <0x31> | 0 | 49 | 0x31 | ascii | 28740: 1 |
| 53 | <0x32> | 0 | 50 | 0x32 | ascii | 28750: 2 |
| 54 | <0x33> | 0 | 51 | 0x33 | ascii | 28770: 3 |
| 55 | <0x34> | 0 | 52 | 0x34 | ascii | 28781: 4 |
| 56 | <0x35> | 0 | 53 | 0x35 | ascii | 28782: 5 |
| 57 | <0x36> | 0 | 54 | 0x36 | ascii | 28784: 6 |
| 58 | <0x37> | 0 | 55 | 0x37 | ascii | 28787: 7 |
| 59 | <0x38> | 0 | 56 | 0x38 | ascii | 28783: 8 |
| 60 | <0x39> | 0 | 57 | 0x39 | ascii | 28774: 9 |
| 61 | <0x3A> | 0 | 58 | 0x3A | ascii | 28747: : |
| 62 | <0x3B> | 0 | 59 | 0x3B | ascii | 28745: ; |
| 63 | <0x3C> | 0 | 60 | 0x3C | ascii | 28789: < |
| 64 | <0x3D> | 0 | 61 | 0x3D | ascii | 28746: = |
| 65 | <0x3E> | 0 | 62 | 0x3E | ascii | 28767: > |
| 66 | <0x3F> | 0 | 63 | 0x3F | ascii | 28804: ? |
| 67 | <0x40> | 0 | 64 | 0x40 | ascii | 28818: @ |
| 68 | <0x41> | 0 | 65 | 0x41 | ascii | 28741: A |
| 69 | <0x42> | 0 | 66 | 0x42 | ascii | 28760: B |
| 70 | <0x43> | 0 | 67 | 0x43 | ascii | 28743: C |
| 71 | <0x44> | 0 | 68 | 0x44 | ascii | 28757: D |
| 72 | <0x45> | 0 | 69 | 0x45 | ascii | 28749: E |
| 73 | <0x46> | 0 | 70 | 0x46 | ascii | 28765: F |
| 74 | <0x47> | 0 | 71 | 0x47 | ascii | 28777: G |
| 75 | <0x48> | 0 | 72 | 0x48 | ascii | 28769: H |
| 76 | <0x49> | 0 | 73 | 0x49 | ascii | 28737: I |
| 77 | <0x4A> | 0 | 74 | 0x4A | ascii | 28798: J |
| 78 | <0x4B> | 0 | 75 | 0x4B | ascii | 28796: K |
| 79 | <0x4C> | 0 | 76 | 0x4C | ascii | 28758: L |
| 80 | <0x4D> | 0 | 77 | 0x4D | ascii | 28755: M |
| 81 | <0x4E> | 0 | 78 | 0x4E | ascii | 28759: N |
| 82 | <0x4F> | 0 | 79 | 0x4F | ascii | 28762: O |
| 83 | <0x50> | 0 | 80 | 0x50 | ascii | 28753: P |
| 84 | <0x51> | 0 | 81 | 0x51 | ascii | 28824: Q |
| 85 | <0x52> | 0 | 82 | 0x52 | ascii | 28754: R |
| 86 | <0x53> | 0 | 83 | 0x53 | ascii | 28735: S |
| 87 | <0x54> | 0 | 84 | 0x54 | ascii | 28738: T |
| 88 | <0x55> | 0 | 85 | 0x55 | ascii | 28779: U |
| 89 | <0x56> | 0 | 86 | 0x56 | ascii | 28790: V |
| 90 | <0x57> | 0 | 87 | 0x57 | ascii | 28780: W |
| 91 | <0x58> | 0 | 88 | 0x58 | ascii | 28814: X |
| 92 | <0x59> | 0 | 89 | 0x59 | ascii | 28802: Y |
| 93 | <0x5A> | 0 | 90 | 0x5A | ascii | 28828: Z |
| 94 | <0x5B> | 0 | 91 | 0x5B | ascii | 28792: [ |
| 95 | <0x5C> | 0 | 92 | 0x5C | ascii | 28756: \ |
| 96 | <0x5D> | 0 | 93 | 0x5D | ascii | 28793: ] |
| 97 | <0x5E> | 0 | 94 | 0x5E | ascii | 28815: ^ |
| 98 | <0x5F> | 0 | 95 | 0x5F | ascii | 28730: _ |
| 99 | <0x60> | 0 | 96 | 0x60 | ascii | 28832: ` |
| 100 | <0x61> | 0 | 97 | 0x61 | ascii | 28708: a |
| 101 | <0x62> | 0 | 98 | 0x62 | ascii | 28726: b |
| 102 | <0x63> | 0 | 99 | 0x63 | ascii | 28717: c |
| 103 | <0x64> | 0 | 100 | 0x64 | ascii | 28715: d |
| 104 | <0x65> | 0 | 101 | 0x65 | ascii | 28706: e |
| 105 | <0x66> | 0 | 102 | 0x66 | ascii | 28722: f |
| 106 | <0x67> | 0 | 103 | 0x67 | ascii | 28721: g |
| 107 | <0x68> | 0 | 104 | 0x68 | ascii | 28716: h |
| 108 | <0x69> | 0 | 105 | 0x69 | ascii | 28710: i |
| 109 | <0x6A> | 0 | 106 | 0x6A | ascii | 28768: j |
| 110 | <0x6B> | 0 | 107 | 0x6B | ascii | 28729: k |
| 111 | <0x6C> | 0 | 108 | 0x6C | ascii | 28714: l |
| 112 | <0x6D> | 0 | 109 | 0x6D | ascii | 28719: m |
| 113 | <0x6E> | 0 | 110 | 0x6E | ascii | 28711: n |
| 114 | <0x6F> | 0 | 111 | 0x6F | ascii | 28709: o |
| 115 | <0x70> | 0 | 112 | 0x70 | ascii | 28720: p |
| 116 | <0x71> | 0 | 113 | 0x71 | ascii | 28775: q |
| 117 | <0x72> | 0 | 114 | 0x72 | ascii | 28712: r |
| 118 | <0x73> | 0 | 115 | 0x73 | ascii | 28713: s |
| 119 | <0x74> | 0 | 116 | 0x74 | ascii | 28707: t |
| 120 | <0x75> | 0 | 117 | 0x75 | ascii | 28718: u |
| 121 | <0x76> | 0 | 118 | 0x76 | ascii | 28728: v |
| 122 | <0x77> | 0 | 119 | 0x77 | ascii | 28727: w |
| 123 | <0x78> | 0 | 120 | 0x78 | ascii | 28744: x |
| 124 | <0x79> | 0 | 121 | 0x79 | ascii | 28724: y |
| 125 | <0x7A> | 0 | 122 | 0x7A | ascii | 28764: z |
| 126 | <0x7B> | 0 | 123 | 0x7B | ascii | 28751: { |
| 127 | <0x7C> | 0 | 124 | 0x7C | ascii | 28766: \| |
| 128 | <0x7D> | 0 | 125 | 0x7D | ascii | 28752: } |
| 129 | <0x7E> | 0 | 126 | 0x7E | ascii | 28845: ~ |
| 130 | <0x7F> | 0 | 127 | 0x7F | ascii | 30982: \x7f |
| 195 | <0xC0> | 0 | 192 | 0xC0 | unused_utf8 | |
| 196 | <0xC1> | 0 | 193 | 0xC1 | unused_utf8 | |
| 197 | <0xC2> | 0 | 194 | 0xC2 | utf8 | |
| 198 | <0xC3> | 0 | 195 | 0xC3 | utf8 | |
| 248 | <0xF5> | 0 | 245 | 0xF5 | unused_utf8 | |
| 249 | <0xF6> | 0 | 246 | 0xF6 | unused_utf8 | |
| 250 | <0xF7> | 0 | 247 | 0xF7 | unused_utf8 | |
| 251 | <0xF8> | 0 | 248 | 0xF8 | unused_utf8 | |
| 252 | <0xF9> | 0 | 249 | 0xF9 | unused_utf8 | |
| 253 | <0xFA> | 0 | 250 | 0xFA | unused_utf8 | |
| 254 | <0xFB> | 0 | 251 | 0xFB | unused_utf8 | |
| 255 | <0xFC> | 0 | 252 | 0xFC | unused_utf8 | |
| 256 | <0xFD> | 0 | 253 | 0xFD | unused_utf8 | |
| 257 | <0xFE> | 0 | 254 | 0xFE | unused_utf8 | |
| 258 | <0xFF> | 0 | 255 | 0xFF | unused_utf8 | |
| 31150 | \x1e | 0.0481115 | 30 | 0x1E | ascii | |
| 31134 | \x1c | 0.0546376 | 28 | 0x1C | ascii | |
| 31236 | \x1d | 0.054877 | 29 | 0x1D | ascii | |
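The byte_type labels in the single-byte tables match the usual UTF-8 byte ranges: values up to 0x7F are ASCII, while 0xC0, 0xC1 and 0xF5 through 0xFF never occur in valid UTF-8. The sketch below reproduces that labelling and the byte-fallback id mapping. The helper names are hypothetical, and the offset of 3 is an assumption read off the rows above (id 4 is <0x01>, id 258 is <0xFF>).

```python
def byte_type(value: int) -> str:
    """Classify a byte value using the labels of the byte_type column."""
    if not 0 <= value <= 0xFF:
        raise ValueError("not a single byte")
    if value <= 0x7F:
        return "ascii"
    if value in (0xC0, 0xC1) or value >= 0xF5:
        return "unused_utf8"  # never appears in a valid UTF-8 sequence
    return "utf8"             # lead or continuation byte of a multi-byte sequence


# Assumption inferred from the tables: byte-fallback token id = byte value + 3.
BYTE_TOKEN_OFFSET = 3


def byte_token_id(value: int) -> int:
    """Token id of the byte-fallback token for `value` under the assumed offset."""
    return value + BYTE_TOKEN_OFFSET


def byte_token_name(value: int) -> str:
    """Token string in the <0xNN> form used by the token column."""
    return f"<0x{value:02X}>"


for value in (0x01, 0x41, 0xC1, 0xC2, 0xFF):
    print(byte_token_id(value), byte_token_name(value), byte_type(value))
```

For the ascii rows, the reencoded column shows that the same character already has a regular vocabulary entry (for example 28741 for A), which would explain why the fallback token for that byte is rarely or never produced by the tokenizer.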
2 entries below threshold of 0.062
| token_id | token | indicator | max_prob |
| --- | --- | --- | --- |
| 0 | <unk> | 0 | 1.5e-05 |
| 2 | </s> | 0.00266472 | 0.082 |