Report for google/gemma-7b

Model info

  • Model Info:
    • Tied embeddings: True
    • LM head uses bias: False
    • Embeddings shape: [256000, 3072]
  • Tokenizer Info:
    • Vocab Size: 256000
    • Tokenizer Class: GemmaTokenizer
    • Tokenizer Type: BPE
    • Bytes handling: Byte Fallback
    • Token for verification prompt building: TouchableOpacity
    • Token id for verification prompt building: 39886
  • Indicator summary:
    • Indicator for under-trained tokens: E_{out} Cosine Distance
    • Overall distribution: 0.074 +/- 0.034
  • Detected Token Counts:
    • Number of tested under-trained tokens: 5119, 5013 non-special, 694 below p = 0.01 threshold, 353 below soft indicator threshold
    • Number of single byte tokens: 380, of which 144 below indicator threshold
    • Number of special tokens: 1, of which 1 below indicator threshold
    • Number of non-single-byte unreachable tokens: 1, of which 1 below indicator threshold
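
The summary above names the indicator (E_{out} cosine distance) without showing a computation. The sketch below is one plausible reading, not the report's actual code: it assumes the indicator is the cosine distance between each token's output-embedding row and the mean of a set of reference tokens that are known to be unused. The function names, the toy vectors, and the choice of reference set are all hypothetical; only the 0.001 threshold is taken from the tables in this report. Because the model has tied embeddings, the output embeddings here would be the same [256000, 3072] matrix as the input embeddings.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cos(u, v) for two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def mean_vector(rows):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(column) / len(rows) for column in zip(*rows)]


def indicators(output_embeddings, reference_ids):
    """Map token id -> cosine distance to the mean reference embedding.

    Assumption: a small distance means the token's embedding looks like
    the embeddings of tokens that were never seen in training.
    """
    reference = mean_vector([output_embeddings[i] for i in reference_ids])
    return {
        token_id: cosine_distance(row, reference)
        for token_id, row in output_embeddings.items()
    }


# Toy 3-dimensional "embeddings"; real rows would have 3072 entries.
toy_embeddings = {
    0: [1.0, 0.0, 0.0],   # stand-in for a known-unused reference token
    1: [1.0, 0.01, 0.0],  # nearly parallel to the reference
    2: [0.0, 1.0, 0.5],   # clearly different direction
}
scores = indicators(toy_embeddings, reference_ids=[0])

THRESHOLD = 0.001  # threshold value quoted in the tables of this report
flagged = sorted(t for t, s in scores.items() if s < THRESHOLD)
print(flagged)  # [0, 1]
```

With real weights the same loop would run over every row of the embedding matrix, and the flagged ids would then be checked against the verification results listed further down.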

Under-trained token indicators plot

Indicators scatter plots

Verification plot


Under-trained token verification results

353 entries below threshold of 0.001

token_id token indicator max_prob in_other_tokens
164525 हिंदीखरीदारी 6.55651e-06 0.00042
196609 \u200cآمباردا 7.39098e-06 0.00044 ▁ویکی\u200cآمباردا
229433 ^(@)$_ 7.7486e-06 0.00043
127237 ▁coachTry 9.41753e-06 0.00043
185507 ▁queſto 1.60933e-05 0.00041
121349 ▁AcceptedLoading 2.11e-05 0.00042
134910 ſammen 2.59876e-05 0.00047 ▁zuſammen
184138 ▁zuſammen 2.83122e-05 0.00046
222309 ▁queſta 2.98619e-05 0.00042
59098 EnglishChoose 3.00407e-05 0.00045 ▁EnglishChoose
91282 ▁ſelb 3.34382e-05 0.00057 ▁ſelbſt
213138 ſſung 3.34382e-05 0.00051
143473 )$_. 3.54052e-05 0.00039
252915 \uf3f5 3.74317e-05 0.00037
227644 ▁ſeines 3.94583e-05 0.00046
225573 ▁Geiſt 4.20213e-05 0.00048
158454 ▁unſer 4.55976e-05 0.00044
255245 \uf3cc 4.56572e-05 0.00044
220218 ▁ſehen 4.95315e-05 0.00046
254350 \uf5ce 5.01871e-05 0.00052
333 additional entries below threshold
token_id token indicator max_prob in_other_tokens
45971 ▁linkCC 5.126e-05 0.0004
216622 ▁Dieſe 5.24521e-05 0.00041
230983 ▁wiſſen 5.43594e-05 0.00041
210616 ▁geweſen 5.64456e-05 0.00039
121705 ▁ſondern 5.75185e-05 0.00044
203019 ▁daſs 5.99027e-05 0.0005
123221 >\<^ 6.03199e-05 0.00045
254175 𐁘 6.14524e-05 0.00072
161080 ▁ſeyn 6.39558e-05 0.00041
254455 \ued90 6.6936e-05 0.00057
255645 \uef0e 6.86049e-05 0.00063
153473 ▁Menſchen 7.04527e-05 0.00053
143114 ▁ſeinem 7.45654e-05 0.00042
173899 ▁メンテナ 7.46846e-05 0.0005 ▁メンテナンス
253613 \U000e0041 7.59363e-05 0.00054
174176 ▁ſoll 7.689e-05 0.00045
42380 ▁stockbild 7.90358e-05 0.00053 ▁stockbilder
2873 ICTOGRAM 8.15392e-05 0.00041 ▁PICTOGRAM, PICTOGRAM
255790 \ue734 8.36253e-05 0.0004
254944 8.45194e-05 0.0005
253228 \ue275 8.4579e-05 0.00045
148617 ▁deſſen 8.63075e-05 0.00039
253377 \ue386 8.66055e-05 0.0005
252631 \uf51a 8.93474e-05 0.00046
192547 ▁erſten 9.05395e-05 0.00048
123984 ▁ſeinen 9.28044e-05 0.00051
151521 ▁müſſen 9.42349e-05 0.00046
254071 \uef5a 0.000103891 0.00061
254591 \u0e72 0.00010401 0.00045
233201 ▁Weiſe 0.000105441 0.00035
255154 0.000105619 0.0007
113990 ▁ſehr 0.000106096 0.00039
193385 iſen 0.000109136 0.00044
171300 rbrakk 0.000110984 0.00056
151848 ▁ſei 0.000113904 0.00055 ▁ſeines
128625 ▁dieſem 0.000115395 0.00046
255279 0.00011605 0.00057
195121 ▁Waſſer 0.000116706 0.00041
255011 𓇠 0.000118256 0.00091
255267 \u0e63 0.000119805 0.00037
253034 \uf7a0 0.000120521 0.00046
255795 \uec4c 0.000121891 0.00059
232866 ▁stiefe 0.000124753 0.00059
153064 ▁stockbilder 0.000126243 0.00049
159234 ſehen 0.000127852 0.00046 ▁ſehen
255647 \uf35e 0.000129342 0.00053
96098 ▁ſelbſt 0.000129521 0.00063
251499 0.000130177 0.00041
254456 \uefa6 0.000132561 0.00042
253441 \ue984 0.000135124 0.00042
255849 0.000136852 0.0011
255122 \uf540 0.000138819 0.00058
255807 𝆣 0.000141025 0.00042
136616 ▁verſch 0.000143588 0.00053
171654 lbrakk 0.000143647 0.00061
109547 ▁ſchon 0.000144184 0.00048
155980 ▁beſch 0.000144601 0.00095
123190 ſelben 0.000145137 0.00047
97619 ▁ſeiner 0.000145972 0.00042
224365 ikusbot 0.000146985 0.00064 haikusbot
255510 \ue51e 0.000149727 0.00045
254566 \ue776 0.000150383 0.00094
177069 ▁티즈 0.000151753 0.00062
254258 \ue5d0 0.00015229 0.00062
167982 ▁stockfotografie 0.000154436 0.00063
130087 ▁daysTop 0.000155449 0.00092
125919 Билгалда 0.000157297 0.0005 Билгалдахарш
225539 isGridAdvEx 0.000159562 0.00082
195351 niſſe 0.000160456 0.00045
172465 iſche 0.000162423 0.00059 ▁zwiſchen
206857 ▁tartalo 0.00016278 0.0005 ▁tartalomajánló
202616 ▁erſt 0.000164807 0.00062
252436 0.000169873 0.00087
250800 \u0ba1 0.00017488 0.00069
152266 ▁imagui 0.000178754 0.00048
254908 𖧹 0.000181615 0.00065
254885 0.000183046 0.00067
108162 久しぶ 0.000184536 0.00068 久しぶりに, 久しぶり, 久しぶりの
255379 \uf2ba 0.000187278 0.00055
253901 \ue676 0.000188291 0.00053
118456 ロウィン 0.000191033 0.00053 ハロウィン, ▁ハロウィン
135639 ▁dieſen 0.000191927 0.00044
120213 iſchen 0.000192106 0.00045 ▁zwiſchen
208306 ▁beſte 0.000198185 0.00076
75807 ▁dieſe 0.000198483 0.0004 ▁dieſer, ▁dieſes, ▁dieſem, ▁dieſen
199696 ſicht 0.000198543 0.0005
208438 ▁ſuo 0.000199556 0.00078
167630 ▁PeEn 0.000201643 0.00058 ▁PeEnEo
114402 ▁Geſch 0.000203252 0.00075
198203 ▁zwiſchen 0.000207782 0.00041
255242 \ue6f0 0.00020808 0.00076
255792 \ue762 0.000208259 0.00065
214340 ▁パンチラ 0.000217378 0.00038
254213 0.000219405 0.0019
255420 0.000222504 0.00064
253992 \ue7b5 0.000225425 0.00047
158241 vorschaubild 0.000226021 0.00053
255793 \ue777 0.000228763 0.00081
89379 ▁ſeine 0.000231385 0.00039 ▁ſeiner, ▁ſeinen, ▁ſeinem, ▁ſeines
253100 \ue791 0.000232697 0.0004
253326 0.000239789 0.0006
255271 0.000239789 0.00086
252790 0.000241637 0.00082
254565 \ue67b 0.000246167 0.00042
225065 bildtitel 0.000247598 0.00043
255948 \ue2d6 0.000247717 0.0007
251525 \ueae4 0.000249863 0.0004
253758 0.000250518 0.00095
252858 0.000252485 0.00046
181784 ▁་་ 0.000257254 0.0004
253247 0.00025773 0.00048
200906 ▁ſua 0.000258088 0.0012
25269 NdEx 0.000258386 0.00053 iNdEx, ▁iNdEx
128951 ▁laſſen 0.000258386 0.00046
167294 ▁GoogleContinue 0.000258386 0.00052
252787 \ue2cd 0.000259042 0.0007
254114 0.000265062 0.0023
88138 ſchaft 0.000269771 0.00053
255947 \ue2ca 0.000273049 0.00075
80527 ▁dieſer 0.000273585 0.00048
253927 0.000280917 0.00098
253173 \ue2f9 0.000281632 0.00072
254600 0.000283241 0.00075
254486 0.000285625 0.00062
209936 ▁展板 0.000291109 0.00043
255641 \ue290 0.000296474 0.00072
221674 ">😂 0.000299156 0.0011
252567 0.000300705 0.00094
254686 𑄮 0.000302613 0.0029
253899 \ue2bd 0.000308573 0.0012
255371 \ue2c9 0.000308692 0.00071
176775 ▁盗撮 0.000308812 0.00052
255003 \ue2e0 0.000318527 0.0008
251560 \ue978 0.000319481 0.0054
254549 0.000323772 0.00049
150747 ſcher 0.000327587 0.00063
254169 \ue2f1 0.000338376 0.00088
254259 \uf117 0.000339627 0.00043
182427 )$_, 0.000339985 0.00071
255123 𑄥 0.000342607 0.0023
252682 \uf55f 0.00034374 0.00047
253898 \ue298 0.000344336 0.0008
248337 \uf21d 0.000353098 0.00088
255791 \ue73e 0.000356615 0.00055
169039 ▁ſche 0.000357211 0.0013
249717 0.000357747 0.00089
131560 ▁desmotivaciones 0.000358701 0.00062
255517 𑄝 0.000361145 0.0031
253989 \ue2b1 0.000361383 0.00087
89812 存档备份 0.000362992 0.00057 页面存档备份
255945 \ue29a 0.000364125 0.0008
247641 ܇ 0.000366628 0.0011
255237 \ue2db 0.000370324 0.0007
220260 ▁beſti 0.000372589 0.00052
255248 𐑥 0.000376582 0.0024
253828 0.000377059 0.00046
254927 0.000377953 0.0011
253987 \ue2a4 0.000381231 0.00081
249784 ܞ 0.000383794 0.0011
72182 ▁版税 0.000385821 0.0013
253104 𓆱 0.000395 0.0012
255236 \ue2d1 0.000395238 0.0013
249361 0.00040108 0.00081
255806 𑄠 0.000403821 0.0033
253904 𑄣 0.000407517 0.0029
205674 нгред 0.000409305 0.0014 нгредіє, нгредієнти
116882 ▁geſch 0.000409842 0.00053
255124 𑄪 0.000410199 0.0043
187776 ▁Verſ 0.000416636 0.00052
207398 ▁plufieurs 0.000417054 0.0019
251632 𑄨 0.00041759 0.0054
255955 \ue6ec 0.000421107 0.00047
249663 0.0004251 0.0019
248691 0.000426531 0.00062
255376 \ueb9a 0.000429034 0.00087
253030 0.000430763 0.0014
255116 \ue2e1 0.000434518 0.0011
255814 𞤑 0.000439942 0.0009
220916 ▁vooz 0.00044179 0.00044
112171 Diwed 0.000448287 0.00048 Diwedd, Diweddar, Diweddarwch
254460 0.000451505 0.0017
253187 ݯ 0.000452101 0.0014
176309 enablog 0.000455081 0.0016 hatenablog
253926 0.000455916 0.00094
250433 0.000455976 0.0011
253103 𑄚 0.000456214 0.0046
68314 ▁例证 0.000458121 0.0016
253229 \ue2fd 0.000459373 0.00086
140439 ▁stockfotos 0.000461638 0.0011
254255 \ue293 0.000463605 0.0013
252788 \ue2d7 0.000467896 0.0013
250887 0.000470579 0.0019
255382 0.000472665 0.0012
159995 ▁剪影 0.000474334 0.0008
255705 0.000475764 0.0037
254349 \uf412 0.000479341 0.026
252594 0.000487804 0.0016
253510 \uec1d 0.000488579 0.0007
255934 0.000490963 0.00074
251778 0.000493109 0.0053
255840 0.000493765 0.0035
251670 0.000496209 0.0024
250639 \ue977 0.000497162 0.0015
253371 0.000497341 0.0017
250185 0.000498891 0.0015
252005 0.000499725 0.0019
206788 majánló 0.00050211 0.00079 ▁tartalomajánló
254482 \u0bab 0.0005036 0.00086
255956 \ue823 0.000505447 0.0031
252680 \ue2ef 0.000506103 0.00091
255380 \uf8e0 0.000507832 0.09
253027 0.000508428 0.00054
255682 0.000509262 0.0017
253841 𑄬 0.000510991 0.0041
255275 0.000512838 0.0094
254484 0.000517905 0.00099
252372 𑄢 0.000523031 0.0095
253523 \U000900b0 0.000523746 0.14
254076 𑄟 0.00052619 0.011
255233 \ue297 0.00052774 0.0011
255953 \ue65a 0.000529408 0.0025
255117 \ue2e4 0.000533402 0.00092
72920 ▁ſind 0.000534296 0.00069
244450 ܃ 0.000536621 0.00085
208229 $_? 0.000541389 0.0017
248384 0.000542283 0.0011
190189 lxtask 0.0005427 0.014
139931 Дерекк 0.000543594 0.0018 Дереккөздер
136017 ▁简谱 0.00054872 0.00072
254574 𖡻 0.000550449 0.0005
254798 \ufe67 0.000560582 0.0015
246547 0.000561237 0.0021
196059 باردا 0.000564396 0.0019 \u200cآمباردا, ▁ویکی\u200cآمباردا
247780 0.000564754 0.00065
253705 0.000570357 0.0026
252789 \uf172 0.000571489 0.00041
254833 0.000572681 0.0095
214470 ოლიო 0.000577271 0.00065 სქოლიო
64069 ディネート 0.000577867 0.00057 ▁コーディネート, コーディネート
252852 🜲 0.000579238 0.0012
253052 \u0bc4 0.000579536 0.0012
254903 \ue66e 0.000581205 0.00051
252176 \ue738 0.000582874 0.0032
255950 \ue2f0 0.000584722 0.0018
172769 征詢我 0.000584781 0.0023
188927 ▁ddelwed 0.0005849 0.0014 ▁ddelweddau
251496 \u0ba5 0.000585079 0.00078
79309 AnswerStep 0.000588417 0.004
254911 𞤶 0.000589073 0.0012
255663 \U000f023b 0.000594616 0.0024
255389 𞤼 0.000596225 0.0012
255797 \uf17d 0.000604093 0.00062
125835 харш 0.000612795 0.0021 Билгалдахарш
254901 \ue2d9 0.000616252 0.00073
252966 𝆺 0.000617087 0.00056
252626 0.000624418 0.0026
253706 0.000629127 0.0055
254075 𑄇 0.000631154 0.036
75991 ▁indígen 0.000640988 0.0014 ▁indígenas, ▁indígena
255241 \ue614 0.000643373 0.0032
255243 \ue704 0.00064677 0.00065
180346 ſſo 0.000648201 0.00067
252570 \ufff3 0.000650048 0.0014
255642 \ue2ab 0.000652373 0.0012
254496 0.000652671 0.0026
141456 isOraColElement 0.00065881 0.17
253376 \ue2a7 0.000660658 0.0015
252910 \ue2ee 0.000663877 0.00093
195112 ▁好文分享 0.000667393 0.05
65939 \<^ 0.000668466 0.0059 >\<^
252054 0.000671506 0.0021
254626 0.000678182 0.0053
248619 0.000681937 0.00052
254257 \ue2c0 0.000684977 0.0045
134830 往下閱讀 0.000685275 0.0036 請繼續往下閱讀
254270 𞤴 0.000687361 0.0019
253442 \uf141 0.000690043 0.4
255728 0.00069201 0.0013
171349 ▁FacebookSign 0.000702322 0.00047
165739 ▁$_- 0.000707328 0.00087
254573 𑄃 0.000707388 0.1
147134 Чыгана 0.000713706 0.0033 Чыганаклар
253460 \u0b8b 0.000716746 0.00082
251646 ܖ 0.00072068 0.0015
255368 \ue201 0.000721753 0.0016
248911 \ue5f1 0.000725865 0.008
255785 \ue2e8 0.000729561 0.0035
35321 ſchen 0.000731885 0.00093 iſchen, ▁Menſchen, ▁zwiſchen
253509 \ue5cf 0.000731885 0.00094
129755 ſam 0.000734329 0.0019 ſammen, ▁ſame, ▁zuſammen
254449 \ue12d 0.000735223 0.0018
254089 \u0e6c 0.000738978 0.00087
254453 \ue609 0.000743926 0.0057
255018 𞥄 0.000749648 0.0077
254256 \ue2b4 0.000752091 0.0012
253590 \ue2d8 0.000754297 0.0035
254822 0.000754476 0.0029
171222 征詢 0.000755668 0.0013 征詢我
254790 \ue2be 0.00075829 0.001
247445 0.000760376 0.021
252707 0.000765145 0.00094
255838 0.000768423 0.021
255796 \uf102 0.00076896 0.0076
97155 ))$_ 0.000771999 0.00078
245817 \U00071706 0.000773191 0.00038
253723 0.000775218 0.0032
254709 0.0007779 0.0084
139168 ▁巨乳 0.000778913 0.00085
114373 ▁témoig 0.000778973 0.0032 ▁témoignage, ▁témoignages
251214 \u3130 0.000789344 0.00088
254561 \ue305 0.000792801 0.0028
253231 \ue835 0.000796974 0.00064
252872 0.000797212 0.011
253699 ݮ 0.000802636 0.0008
254488 0.000804663 0.0015
255234 \ue2ba 0.000805557 0.00084
253787 0.000805795 0.0035
115666 ▁verſ 0.000811815 0.0018 ▁verſch
253036 𑄧 0.000814319 0.42
42993 ésultats 0.000819087 0.00072 ▁résultats, ▁Résultats, Résultats
159588 ziyaretçi 0.000820756 0.00058
248959 0.000822008 0.0019
255001 \ue184 0.000826716 0.24
255050 0.000828862 0.019
252145 \ue5d9 0.000832438 0.00085
32602 ▁ſich 0.000833631 0.00083
251959 𞥅 0.000833809 0.0037
253604 𞤳 0.000837028 0.0041
254560 \ue2d3 0.000837624 0.00092
250199 𑄴 0.00083828 0.37
254681 \uf131 0.000841856 0.059
254110 0.000847757 0.002
252909 \ue2dc 0.000849187 0.0035

Byte tokens

144 entries below threshold of 0.001

token_id token indicator ord hex byte_type reencoded
313 <0x60> 6.07967e-06 96 0x60 ascii 235376: `
248 <0x1F> 6.19888e-06 31 0x1F ascii 251698: \x1f
309 <0x5C> 6.19888e-06 92 0x5C ascii 235286: \
336 <0x77> 6.19888e-06 119 0x77 ascii 235271: w
274 <0x39> 6.25849e-06 57 0x39 ascii 235315: 9
241 <0x18> 6.31809e-06 24 0x18 ascii 250600: \x18
320 <0x67> 6.3777e-06 103 0x67 ascii 235264: g
412 <0xC3> 6.3777e-06 195 0xC3 utf8
237 <0x14> 6.55651e-06 20 0x14 ascii 250861: \x14
265 <0x30> 6.55651e-06 48 0x30 ascii 235276: 0
311 <0x5E> 6.55651e-06 94 0x5E ascii 235393: ^
293 <0x4C> 6.61612e-06 76 0x4C ascii 235301: L
299 <0x52> 6.61612e-06 82 0x52 ascii 235294: R
219 <0x02> 6.67572e-06 2 0x02 ascii 247977: \x02
278 <0x3D> 6.67572e-06 61 0x3D ascii 235293: =
307 <0x5A> 6.67572e-06 90 0x5A ascii 235382: Z
308 <0x5B> 6.67572e-06 91 0x5B ascii 235309: [
230 <0x0D> 6.73532e-06 13 0x0D ascii 235316: \r
264 <0x2F> 6.73532e-06 47 0x2F ascii 235283: /
326 <0x6D> 6.73532e-06 109 0x6D ascii 235262: m
124 additional entries below threshold
token_id token indicator ord hex byte_type reencoded
334 <0x75> 6.79493e-06 117 0x75 ascii 235261: u
409 <0xC0> 6.79493e-06 192 0xC0 unused_utf8
252 <0x23> 6.85453e-06 35 0x23 ascii 235345: #
266 <0x31> 6.85453e-06 49 0x31 ascii 235274: 1
296 <0x4F> 6.85453e-06 79 0x4F ascii 235302: O
471 <0xFE> 6.85453e-06 254 0xFE unused_utf8
316 <0x63> 6.91414e-06 99 0x63 ascii 235260: c
330 <0x71> 6.91414e-06 113 0x71 ascii 235317: q
340 <0x7B> 6.91414e-06 123 0x7B ascii 235282: {
239 <0x16> 6.97374e-06 22 0x16 ascii 254362: \x16
250 <0x21> 6.97374e-06 33 0x21 ascii 235341: !
268 <0x33> 6.97374e-06 51 0x33 ascii 235304: 3
276 <0x3B> 6.97374e-06 59 0x3B ascii 235289: ;
284 <0x43> 6.97374e-06 67 0x43 ascii 235288: C
319 <0x66> 6.97374e-06 102 0x66 ascii 235266: f
282 <0x41> 7.03335e-06 65 0x41 ascii 235280: A
333 <0x74> 7.03335e-06 116 0x74 ascii 235251: t
231 <0x0E> 7.09295e-06 14 0x0E ascii 252689: \x0e
272 <0x37> 7.09295e-06 55 0x37 ascii 235324: 7
285 <0x44> 7.09295e-06 68 0x44 ascii 235299: D
306 <0x59> 7.09295e-06 89 0x59 ascii 235342: Y
318 <0x65> 7.09295e-06 101 0x65 ascii 235249: e
238 <0x15> 7.15256e-06 21 0x15 ascii 253776: \x15
256 <0x27> 7.15256e-06 39 0x27 ascii 235303: '
262 <0x2D> 7.15256e-06 45 0x2D ascii 235290: -
267 <0x32> 7.15256e-06 50 0x32 ascii 235284: 2
315 <0x62> 7.15256e-06 98 0x62 ascii 235268: b
414 <0xC5> 7.15256e-06 197 0xC5 utf8
247 <0x1E> 7.21216e-06 30 0x1E ascii 253777: \x1e
249 <0x20> 7.21216e-06 32 0x20 ascii 235248:
289 <0x48> 7.21216e-06 72 0x48 ascii 235314: H
229 <0x0C> 7.27177e-06 12 0x0C ascii 238092: \x0c
244 <0x1B> 7.27177e-06 27 0x1B ascii 242385: \x1b
257 <0x28> 7.27177e-06 40 0x28 ascii 235278: (
277 <0x3C> 7.27177e-06 60 0x3C ascii 235322: <
324 <0x6B> 7.27177e-06 107 0x6B ascii 235273: k
411 <0xC2> 7.27177e-06 194 0xC2 utf8
228 <0x0B> 7.33137e-06 11 0x0B ascii 249154: \x0b
469 <0xFC> 7.33137e-06 252 0xFC unused_utf8
290 <0x49> 7.39098e-06 73 0x49 ascii 235285: I
304 <0x57> 7.39098e-06 87 0x57 ascii 235325: W
312 <0x5F> 7.39098e-06 95 0x5F ascii 235298: _
232 <0x0F> 7.45058e-06 15 0x0F ascii 249949: \x0f
275 <0x3A> 7.45058e-06 58 0x3A ascii 235292: :
301 <0x54> 7.45058e-06 84 0x54 ascii 235279: T
338 <0x79> 7.45058e-06 121 0x79 ascii 235267: y
233 <0x10> 7.51019e-06 16 0x10 ascii 248775: \x10
292 <0x4B> 7.51019e-06 75 0x4B ascii 235333: K
302 <0x55> 7.51019e-06 85 0x55 ascii 235327: U
468 <0xFB> 7.62939e-06 251 0xFB unused_utf8
221 <0x04> 7.689e-06 4 0x04 ascii 250124: \x04
223 <0x06> 7.689e-06 6 0x06 ascii 251368: \x06
251 <0x22> 7.689e-06 34 0x22 ascii 235281: "
263 <0x2E> 7.689e-06 46 0x2E ascii 235265: .
270 <0x35> 7.689e-06 53 0x35 ascii 235308: 5
294 <0x4D> 7.689e-06 77 0x4D ascii 235296: M
323 <0x6A> 7.689e-06 106 0x6A ascii 235312: j
222 <0x05> 7.7486e-06 5 0x05 ascii 250940: \x05
279 <0x3E> 7.7486e-06 62 0x3E ascii 235313: >
295 <0x4E> 7.7486e-06 78 0x4E ascii 235300: N
314 <0x61> 7.7486e-06 97 0x61 ascii 235250: a
225 <0x08> 7.80821e-06 8 0x08 ascii 245584: \x08
258 <0x29> 7.80821e-06 41 0x29 ascii 235275: )
310 <0x5D> 7.80821e-06 93 0x5D ascii 235307: ]
253 <0x24> 7.86781e-06 36 0x24 ascii 235323: $
298 <0x51> 7.86781e-06 81 0x51 ascii 235368: Q
331 <0x72> 7.86781e-06 114 0x72 ascii 235255: r
332 <0x73> 7.86781e-06 115 0x73 ascii 235256: s
227 <0x0A> 7.92742e-06 10 0x0A ascii 108: \n
234 <0x11> 7.92742e-06 17 0x11 ascii 253614: \x11
245 <0x1C> 7.92742e-06 28 0x1C ascii 255818: \x1c
224 <0x07> 7.98702e-06 7 0x07 ascii 249340: \x07
255 <0x26> 7.98702e-06 38 0x26 ascii 235343: &
325 <0x6C> 7.98702e-06 108 0x6C ascii 235257: l
291 <0x4A> 8.04663e-06 74 0x4A ascii 235338: J
300 <0x53> 8.04663e-06 83 0x53 ascii 235277: S
339 <0x7A> 8.04663e-06 122 0x7A ascii 235306: z
271 <0x36> 8.10623e-06 54 0x36 ascii 235318: 6
317 <0x64> 8.10623e-06 100 0x64 ascii 235258: d
342 <0x7D> 8.10623e-06 125 0x7D ascii 235270: }
466 <0xF9> 8.10623e-06 249 0xF9 unused_utf8
467 <0xFA> 8.10623e-06 250 0xFA unused_utf8
472 <0xFF> 8.10623e-06 255 0xFF unused_utf8
260 <0x2B> 8.16584e-06 43 0x2B ascii 235340: +
281 <0x40> 8.16584e-06 64 0x40 ascii 235348: @
410 <0xC1> 8.16584e-06 193 0xC1 unused_utf8
236 <0x13> 8.22544e-06 19 0x13 ascii 252752: \x13
269 <0x34> 8.22544e-06 52 0x34 ascii 235310: 4
421 <0xCC> 8.22544e-06 204 0xCC utf8
303 <0x56> 8.28505e-06 86 0x56 ascii 235330: V
273 <0x38> 8.34465e-06 56 0x38 ascii 235321: 8
283 <0x42> 8.34465e-06 66 0x42 ascii 235305: B
235 <0x12> 8.40425e-06 18 0x12 ascii 252232: \x12
280 <0x3F> 8.40425e-06 63 0x3F ascii 235336: ?
297 <0x50> 8.46386e-06 80 0x50 ascii 235295: P
305 <0x58> 8.46386e-06 88 0x58 ascii 235356: X
254 <0x25> 8.52346e-06 37 0x25 ascii 235358: %
288 <0x47> 8.58307e-06 71 0x47 ascii 235319: G
462 <0xF5> 8.64267e-06 245 0xF5 unused_utf8
464 <0xF7> 8.70228e-06 247 0xF7 unused_utf8
242 <0x19> 8.76188e-06 25 0x19 ascii 254472: \x19
259 <0x2A> 8.76188e-06 42 0x2A ascii 235287: *
465 <0xF8> 8.76188e-06 248 0xF8 unused_utf8
470 <0xFD> 8.76188e-06 253 0xFD unused_utf8
327 <0x6E> 8.82149e-06 110 0x6E ascii 235254: n
218 <0x01> 8.88109e-06 1 0x01 ascii 238213: \x01
243 <0x1A> 8.88109e-06 26 0x1A ascii 243931: \x1a
343 <0x7E> 8.88109e-06 126 0x7E ascii 235436: ~
335 <0x76> 9.05991e-06 118 0x76 ascii 235272: v
287 <0x46> 9.23872e-06 70 0x46 ascii 235311: F
220 <0x03> 9.35793e-06 3 0x03 ascii 249006: \x03
286 <0x45> 9.35793e-06 69 0x45 ascii 235291: E
322 <0x69> 9.35793e-06 105 0x69 ascii 235252: i
413 <0xC4> 9.35793e-06 196 0xC4 utf8
422 <0xCD> 9.47714e-06 205 0xCD utf8
341 <0x7C> 9.53674e-06 124 0x7C ascii 235371: |
261 <0x2C> 9.95398e-06 44 0x2C ascii 235269: ,
337 <0x78> 9.95398e-06 120 0x78 ascii 235297: x
328 <0x6F> 1.00136e-05 111 0x6F ascii 235253: o
329 <0x70> 1.03116e-05 112 0x70 ascii 235263: p
344 <0x7F> 1.03116e-05 127 0x7F ascii 244423: \x7f
246 <0x1D> 1.04308e-05 29 0x1D ascii 254363: \x1d
463 <0xF6> 1.04904e-05 246 0xF6 unused_utf8
321 <0x68> 1.12653e-05 104 0x68 ascii 235259: h
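
The byte_type column in the table above appears to follow the standard UTF-8 byte ranges: values up to 0x7F are ASCII, the values 0xC0, 0xC1, and 0xF5 through 0xFF never occur in valid UTF-8, and the remaining values are lead or continuation bytes. The helper below is a minimal sketch of that classification; the function name is hypothetical and this is not the code that produced the report.

```python
def byte_type(value):
    """Classify a single byte value using the labels from the table above."""
    if not 0 <= value <= 0xFF:
        raise ValueError(f"not a byte value: {value!r}")
    if value <= 0x7F:
        return "ascii"
    # 0xC0 and 0xC1 could only start overlong encodings, and 0xF5..0xFF
    # would encode code points beyond U+10FFFF, so valid UTF-8 never
    # contains them.
    if value in (0xC0, 0xC1) or value >= 0xF5:
        return "unused_utf8"
    return "utf8"


for sample in (0x60, 0xC3, 0xC0, 0xFE):
    print(f"<0x{sample:02X}>", byte_type(sample))
```

The four sample values are rows from the table, and the labels returned for them match the byte_type column.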

Special tokens

106 entries below threshold of 0.001

token_id token indicator max_prob
100 <unused93> 6.07967e-06 0.00043
14 <unused7> 6.13928e-06 0.00041
38 <unused31> 6.13928e-06 0.00042
87 <unused80> 6.13928e-06 0.00042
34 <unused27> 6.19888e-06 0.00045
90 <unused83> 6.31809e-06 0.00048
19 <unused12> 6.3777e-06 0.00041
30 <unused23> 6.3777e-06 0.00043
80 <unused73> 6.3777e-06 0.00043
12 <unused5> 6.4373e-06 0.00043
51 <unused44> 6.4373e-06 0.00042
18 <unused11> 6.55651e-06 0.00047
93 <unused86> 6.55651e-06 0.00045
31 <unused24> 6.61612e-06 0.00041
74 <unused67> 6.61612e-06 0.00044
55 <unused48> 6.67572e-06 0.00044
75 <unused68> 6.67572e-06 0.00049
105 <unused98> 6.67572e-06 0.00046
29 <unused22> 6.73532e-06 0.00043
11 <unused4> 6.79493e-06 0.00044
86 additional entries below threshold
token_id token indicator max_prob
33 <unused26> 6.79493e-06 0.00046
35 <unused28> 6.79493e-06 0.00045
91 <unused84> 6.79493e-06 0.00042
97 <unused90> 6.79493e-06 0.00044
56 <unused49> 6.85453e-06 0.00045
67 <unused60> 6.85453e-06 0.00043
76 <unused69> 6.85453e-06 0.00044
10 <unused3> 6.91414e-06 0.00043
48 <unused41> 6.91414e-06 0.00042
72 <unused65> 6.91414e-06 0.00046
92 <unused85> 6.91414e-06 0.00044
21 <unused14> 7.03335e-06 0.0004
69 <unused62> 7.09295e-06 0.00044
3 <unk> 7.15256e-06 0.00042
25 <unused18> 7.15256e-06 0.00044
94 <unused87> 7.15256e-06 0.00048
104 <unused97> 7.15256e-06 0.00044
15 <unused8> 7.21216e-06 0.00042
85 <unused78> 7.21216e-06 0.00044
89 <unused82> 7.21216e-06 0.00044
50 <unused43> 7.27177e-06 0.00045
99 <unused92> 7.27177e-06 0.00042
13 <unused6> 7.33137e-06 0.00044
0 <pad> 7.39098e-06 1.2e-09
6 [@BOS@] 7.39098e-06 0.00044
17 <unused10> 7.39098e-06 0.0004
22 <unused15> 7.39098e-06 0.00044
78 <unused71> 7.45058e-06 0.00042
61 <unused54> 7.51019e-06 0.00044
16 <unused9> 7.56979e-06 0.00042
43 <unused36> 7.56979e-06 0.0004
62 <unused55> 7.56979e-06 0.00041
81 <unused74> 7.56979e-06 0.00044
77 <unused70> 7.62939e-06 0.00045
42 <unused35> 7.689e-06 0.00043
49 <unused42> 7.689e-06 0.00046
86 <unused79> 7.689e-06 0.00045
98 <unused91> 7.689e-06 0.0004
101 <unused94> 7.689e-06 0.00043
23 <unused16> 7.7486e-06 0.00043
27 <unused20> 7.7486e-06 0.00041
47 <unused40> 7.7486e-06 0.00043
52 <unused45> 7.7486e-06 0.00042
9 <unused2> 7.86781e-06 0.00041
40 <unused33> 7.86781e-06 0.00043
102 <unused95> 7.86781e-06 0.00045
39 <unused32> 7.98702e-06 0.00043
44 <unused37> 7.98702e-06 0.00045
66 <unused59> 7.98702e-06 0.00047
73 <unused66> 8.04663e-06 0.00043
106 <start_of_turn> 8.04663e-06 0.00041
36 <unused29> 8.10623e-06 0.00045
58 <unused51> 8.16584e-06 0.00044
64 <unused57> 8.16584e-06 0.00044
107 <end_of_turn> 8.22544e-06 0.00042
57 <unused50> 8.28505e-06 0.00043
26 <unused19> 8.34465e-06 0.00047
82 <unused75> 8.34465e-06 0.00042
83 <unused76> 8.46386e-06 0.00044
95 <unused88> 8.52346e-06 0.00032
71 <unused64> 8.64267e-06 0.00042
28 <unused21> 8.70228e-06 0.00043
59 <unused52> 8.70228e-06 0.00041
60 <unused53> 8.70228e-06 0.00044
24 <unused17> 8.82149e-06 0.00045
103 <unused96> 8.82149e-06 0.00043
84 <unused77> 8.88109e-06 0.00042
88 <unused81> 8.88109e-06 0.00043
20 <unused13> 8.9407e-06 0.00032
37 <unused30> 8.9407e-06 0.00045
53 <unused46> 9.23872e-06 0.00043
96 <unused89> 9.35793e-06 0.00042
45 <unused38> 9.47714e-06 0.00041
79 <unused72> 9.71556e-06 0.00043
70 <unused63> 9.89437e-06 0.00042
68 <unused61> 1.00136e-05 0.0004
54 <unused47> 1.01924e-05 0.0004
32 <unused25> 1.03116e-05 0.00044
63 <unused56> 1.055e-05 0.00043
8 <unused1> 1.07884e-05 0.00047
46 <unused39> 1.0848e-05 0.00043
65 <unused58> 1.09673e-05 0.00044
5 <2mass> 1.13249e-05 0.00043
41 <unused34> 1.40667e-05 0.00041
255999 <unused99> 1.51992e-05 0.00048
7 <unused0> 6.16908e-05 0.0005

Unreachable tokens

1 entry below threshold of 0.001

token_id token indicator reencoded
158576 ▁ссср 7.15256e-06 941: ▁с, 15497: сс, 235334: р
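
The reencoded column suggests how an unreachable token can be recognised: the token's own text, when tokenized again, comes back as a different id sequence. The sketch below illustrates that round-trip check under stated assumptions. `greedy_encode` is a toy longest-match encoder standing in for the real tokenizer, and its three-entry vocabulary is hypothetical apart from the ids and strings copied from the row above.

```python
def greedy_encode(text, vocab):
    """Toy longest-match encoder; a stand-in for a real tokenizer's encode()."""
    by_text = {token: token_id for token_id, token in vocab.items()}
    max_len = max(len(token) for token in by_text)
    ids, pos = [], 0
    while pos < len(text):
        # Try the longest candidate piece first, then shorter ones.
        for length in range(min(max_len, len(text) - pos), 0, -1):
            piece = text[pos:pos + length]
            if piece in by_text:
                ids.append(by_text[piece])
                pos += length
                break
        else:
            raise ValueError(f"cannot encode {text[pos]!r}")
    return ids


def is_unreachable(token_id, token_text, encode):
    """True if encoding the token's own text does not yield that single id."""
    return encode(token_text) != [token_id]


# Pieces the toy encoder can produce; ids and strings are taken from the
# reencoded column above. Token 158576 is deliberately absent, mimicking
# a tokenizer that never emits it.
producible = {941: "▁с", 15497: "сс", 235334: "р"}


def encode(text):
    return greedy_encode(text, producible)


print(encode("▁ссср"))                          # [941, 15497, 235334]
print(is_unreachable(158576, "▁ссср", encode))  # True
print(is_unreachable(235334, "р", encode))      # False
```

Against a real tokenizer the `encode` callable would be replaced by the tokenizer's own encoding function, with special-token insertion disabled so that the comparison sees only the ids for the text itself.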