Report for HuggingFaceH4/zephyr-7b-beta

Model info

  • Model Info:
    • Tied embeddings: False
    • LM head uses bias: False
    • Embeddings shape: [32000, 4096]
  • Tokenizer Info:
    • Vocab Size: 32000
    • Tokenizer Class: LlamaTokenizer
    • Tokenizer Type: BPE
    • Bytes handling: Byte Fallback
    • Token for verification prompt building: includegraphics
    • Token id for verification prompt building: 7621
  • Indicator summary:
    • Indicator for under-trained tokens: E_{in} L2 Norm
    • Overall distribution: 0.177 +/- 0.021
  • Detected Token Counts:
    • Number of tested under-trained tokens: 637, 529 non-special, 70 below p = 0.01 threshold, 45 below soft indicator threshold
    • Number of single byte tokens: 380, of which 145 below indicator threshold
    • Number of special tokens: 0, of which 0 below indicator threshold
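
As a rough illustration of the indicator named above, the sketch below computes a per-token L2 norm over the rows of an input embedding matrix and lists the rows that fall below a threshold. It is a sketch under stated assumptions, not the code that produced this report: the names `l2_indicator` and `below_threshold`, the toy matrix, and the optional `reference` vector (subtracted before taking the norm) are all hypothetical.

```python
import math


def l2_indicator(embeddings, reference=None):
    """Return one L2 norm per embedding row.

    `reference` is a hypothetical optional vector subtracted from every
    row first; with the default (None) this is the plain row norm.
    """
    dim = len(embeddings[0])
    ref = reference if reference is not None else [0.0] * dim
    return [
        math.sqrt(sum((x - r) ** 2 for x, r in zip(row, ref)))
        for row in embeddings
    ]


def below_threshold(indicators, threshold):
    """Return row indices whose indicator is under `threshold`, lowest first."""
    ranked = sorted(enumerate(indicators), key=lambda pair: pair[1])
    return [index for index, value in ranked if value < threshold]


# Toy 3 x 2 matrix standing in for the [32000, 4096] embedding matrix.
toy_embeddings = [[0.3, 0.4], [0.0, 0.01], [0.06, 0.08]]
scores = l2_indicator(toy_embeddings)
flagged = below_threshold(scores, 0.050)
```

With a real model, the rows would come from the embedding matrix whose shape is listed under Model Info, and the threshold would be the one quoted in the relevant table.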

Under-trained token indicators plot

[Figure: indicators scatter plots]

Verification plot

[Figure: verification plot]

Under-trained token verification results

45 entries below threshold of 0.050
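
The rows in the tables that follow are whitespace-separated, and some rows show no visible token (presumably characters that do not render). The sketch below parses such rows and selects those under the 0.050 threshold. `parse_row` and its heuristic for detecting a missing token are assumptions made for illustration and are not part of the tooling behind this report; the heuristic can misread rows whose token itself looks like a number.

```python
def _is_float(text):
    """True if `text` can be read as a float (e.g. '0.05' or '2e-07')."""
    try:
        float(text)
    except ValueError:
        return False
    return True


def parse_row(line):
    """Parse 'token_id [token] indicator max_prob [in_other_tokens...]'."""
    fields = line.split()
    token_id = int(fields[0])
    rest = fields[1:]
    numeric = [_is_float(field) for field in rest[:3]]
    # Heuristic: the token is missing when the first two fields after the id
    # are numbers and the third (if present) is not.
    token_missing = numeric[:2] == [True, True] and (
        len(numeric) < 3 or not numeric[2]
    )
    if token_missing:
        token, values, extra = None, rest[:2], rest[2:]
    else:
        token, values, extra = rest[0], rest[1:3], rest[3:]
    return {
        "token_id": token_id,
        "token": token,
        "indicator": float(values[0]),
        "max_prob": float(values[1]),
        "in_other_tokens": " ".join(extra),  # kept as raw text
    }


sample_lines = [
    "26831 ▁febbra 0.0298659 1.2e-05 ▁febbraio",
    "30929 0.0149118 2e-07",
    "272 ▁the 0.0766494 1 ▁they, ▁their",
]
rows = [parse_row(line) for line in sample_lines]
below = [row["token_id"] for row in rows if row["indicator"] < 0.050]
```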

token_id token indicator max_prob in_other_tokens
31738 \uefc0 0.00256505 2.6e-08
20418 ▁/**\r 0.00368849 3.1e-08
26636 });\r 0.00488573 3.7e-09
26407 };\r 0.00519729 7.3e-09
26392 ▁});\r 0.00557457 1.8e-08
18759 ';\r 0.00600828 2.1e-08
26083 ▁//\r 0.00611446 3.7e-08
9823 */\r 0.00744269 1.6e-08
25833 >?[< 0.00774109 7.4e-08
7608 ▁*/\r 0.00841445 3.7e-08
28171 ]);\r 0.00898351 5.5e-08
23139 ▁};\r 0.00917953 3e-08
17695 },\r 0.0093152 1.4e-08 ▁},\r
15056 ());\r 0.00938823 1.8e-08
12193 ▁);\r 0.00941279 5.1e-08
31363 \x85 0.00975407 1.4e-09
14756 /**\r 0.010301 2.3e-08 ▁/**\r
16943 ');\r 0.0108607 3.1e-08
20692 ▁},\r 0.0110284 6.4e-08
10278 ',\r 0.0124934 5.5e-07
25 additional entries below threshold
token_id token indicator max_prob in_other_tokens
11880 ";\r 0.0141034 2e-07
30929 0.0149118 2e-07
14420 ];\r 0.0156988 6.5e-08
18055 ){\r 0.0159617 1.4e-07
10941 ));\r 0.0173721 8.9e-08 ());\r
14980 ">\r 0.0174355 4.2e-07
6913 ");\r 0.0252151 6.3e-07
25900 iNdEx 0.0259386 0.00025
22186 ')\r 0.0270944 2.4e-06
10939 ",\r 0.027903 6.4e-07
26831 ▁febbra 0.0298659 1.2e-05 ▁febbraio
4420 ();\r 0.0299867 5.1e-06
19248 NdEx 0.03231 0.00035 iNdEx
3426 ▁}\r 0.0359886 4.6e-06
9962 ()\r 0.0381682 0.00012
31853 0.039285 0.00056
4441 {\r 0.0398455 2.2e-06 ){\r
23486 ),\r 0.0402817 6.9e-06
14619 ▁)\r 0.0432961 1.6e-05
17334 (\r 0.0452383 4.9e-05
15641 ▁uitgen 0.0471153 3.6e-05 ▁uitgenodigd
27732 '\r 0.0474714 9.2e-05
2519 }\r 0.0483518 2.9e-05 ▁}\r
1969 ▁{\r 0.0494827 1.9e-05
31656 0.0500745 0.025
484 additional entries above threshold
token_id token indicator max_prob in_other_tokens
16949 ")\r 0.0504424 0.00017
1761 );\r 0.0505434 0.00026 ();\r, ");\r, ));\r, ▁);\r, ());\r, ...
31645 0.0514121 0.01
30413 0.0525205 0.045
27456 :%.*]] 0.0542301 0.0097
14668 ))\r 0.058685 2.4e-06
16724 tagHelper 0.0603997 0.89
16772 :%.* 0.0612881 0.065 :%.*]]
15880 >:]< 0.063233 0.022
30813 0.0658322 0.046
31932 ҽ 0.0676692 0.0089
7941 ICENSE 0.0710506 8.4e-05 LICENSE, ▁LICENSE
27265 ▁SDValue 0.0715061 0.04
10762 qpoint 0.0726682 0.99 pgfqpoint
15500 itempty 0.0748776 0.092 omitempty
31179 0.0759263 0.9
272 ▁the 0.0766494 1 ▁they, ▁their, ▁them, ▁there, ▁then, ...
31733 0.0783485 0.016
31841 0.0809174 0.57
17779 ▁gepublice 0.0811037 0.0043 ▁gepubliceerd
31922 0.0815264 0.00021
15630 odigd 0.0836839 0.0022 ▁uitgenodigd
30897 0.0837612 0.42
3685 >\r 0.0848595 0.0011 ">\r
14052 ▁Jahrhund 0.0849249 0.0003 ▁Jahrhundert, ▁Jahrhunderts
18766 ]\r 0.0871711 0.00024
31895 0.0883292 0.89
1271 ;\r 0.088446 0.015 );\r, ();\r, ");\r, ));\r, ";\r, ...
11167 ityEngine 0.0890044 0.082 ▁UnityEngine, UnityEngine
288 ing 0.0895147 1 ring, ings, tring, ning, ating, ...
31469 ӏ 0.0917858 0.12
302 ▁of 0.0920416 1 ▁off, ▁offer, ▁often, ▁offic, ▁office, ...
31172 0.0922086 0.52
31443 0.0924198 0.0012
264 ▁a 0.092889 1 ▁and, ▁al, ▁as, ▁an, ▁at, ...
30867 🟠 0.0935719 0.73
11525 "\r 0.0940426 0.00052
286 ed 0.0948464 1 ated, ied, hed, red, ▁need, ...
298 ▁to 0.0966393 1 ▁too, ▁top, ▁took, ▁tot, ▁told, ...
30983 ڕ 0.0967776 0.28
31317 0.0968221 0.022
274 es 0.0972934 1 est, ess, res, ies, ▁res, ...
29934 0.0977994 0.15
30770 🟡 0.0990421 0.92
263 er 0.100088 1 ver, ter, ere, ers, ser, ...
28593 pgfscope 0.100196 0.55
28705 0.100198 1
404 ers 0.101383 1 vers, erson, ▁person, ters, ivers, ...
31731 Ӏ 0.102405 0.62
12683 pgfpathlineto 0.102959 0.15
24713 vscale 0.103341 1
269 en 0.104398 1 ent, end, ment, ▁en, hen, ...
352 ation 0.104696 0.99 ations, ational, lation, formation, translation, ...
31901 0.10489 0.67
31636 0.105043 0.91
10765 pgfqpoint 0.105359 0.011
31933 0.10538 0.76
282 al 0.105493 1 ▁al, all, ial, ▁all, ally, ...
262 in 0.106046 1 ing, ▁in, ain, ine, int, ...
266 on 0.10728 1 ion, ation, ▁on, ▁con, ction, ...
725 ER 0.107431 1 ERR, VER, ERT, ERROR, TER, ...
2043 ING 0.107661 1 STRING, TING, CLUDING, WARNING, SETTING, ...
395 ▁with 0.107818 1 ▁without, ▁within, ▁withdraw, ▁withd, ▁withdrawal
297 ▁in 0.107951 1 ▁int, ▁into, ▁inter, ▁inst, ▁incl, ...
304 ▁and 0.107989 1 ▁android, ▁andere, ▁anderen, ▁ander, ▁andra, ...
23270 ByComparator 0.108252 0.022
26939 ▁invån 0.10833 3.2e-05 ▁invånare
276 an 0.108559 1 ▁and, and, ▁an, ant, ans, ...
497 ies 0.108568 1 ities, ries, ories, perties, ▁series, ...
354 ▁for 0.109391 1 ▁form, ▁fore, ▁forward, ▁force, ▁former, ...
20411 ][< 0.109658 0.93
415 ▁The 0.110072 1 ▁They, ▁There, ▁Then, ▁These, ▁Their, ...
697 ations 0.110262 0.52 ▁relations, ▁relationship, ulations, ▁operations, ifications, ...
385 os 0.111199 1 ost, ose, ▁pos, pos, ▁most, ...
380 ate 0.111248 1 ated, ater, ates, rivate, date, ...
278 is 0.111293 1 ▁is, ist, ▁this, ▁his, ▁dis, ...
31394 0.111319 0.015
356 ▁on 0.111635 1 ▁one, ▁only, ▁once, ▁online, ▁ones, ...
29091 0.111665 0.97
30690 ێ 0.11186 0.13
31264 0.111928 0.97
349 ▁is 0.112533 1 ▁iss, ▁ist, ▁isn, ▁issue, ▁issues, ...
11370 pgfpath 0.112772 0.013 pgfpathlineto
271 or 0.112907 1 ▁for, ort, ore, ▁or, port, ...
325 ▁( 0.113048 1 ▁(!, ▁(*, ▁((, ▁(), ▁($, ...
301 el 0.113273 1 ell, elf, ▁el, ely, iel, ...
369 ▁that 0.1136 1 ▁thats
2255 ES 0.113649 1 EST, CESS, TIES, RES, ▁WARRANTIES, ...
31692 0.113698 0.27
21876 imeq 0.113818 0.82 simeq
294 ic 0.113856 1 ice, ich, lic, ublic, ick, ...
291 le 0.114177 1 ▁le, able, ile, ple, lect, ...
267 re 0.114348 1 ▁re, ere, res, ore, ▁are, ...
299 et 0.114472 1 get, ▁return, ▁get, set, eth, ...
30660 0.114554 0.91
270 at 0.114852 1 ation, ▁that, ate, ▁at, ath, ...
390 ▁as 0.11506 1 ▁ass, ▁ask, ▁assert, ▁asked, ▁associ, ...
283 ar 0.115118 1 art, ▁are, ard, are, ▁ar, ...
31734 0.115191 0.87
477 ▁from 0.115242 1
293 as 0.115466 1 ▁as, ▁was, ass, ast, ase, ...
381 us 0.116275 1 ust, ▁us, ous, ▁just, ause, ...
346 ly 0.116278 1 ally, ely, ▁only, ily, ually, ...
601 ated 0.116617 1 ▁created, ▁related, dated, ▁associated, inated, ...
515 ia 0.116655 1 ian, ially, ential, aterial, iam, ...
31441 0.116867 0.019
460 ▁are 0.116934 1 ▁area, ▁areas, ▁aren, ▁arena
1020 EN 0.117002 1 ENT, END, MENT, ENSE, ICENSE, ...
31956 0.117007 0.027
31238 0.117148 0.84
1014 The 0.11719 1 ▁They, ▁There, ▁Then, ▁These, There, ...
31849 0.117227 0.84
330 ▁A 0.11729 1 ▁Al, ▁Ar, ▁And, ▁An, ▁As, ...
21399 TagHelpers 0.117675 0.55
279 it 0.117736 1 ith, ▁it, ▁with, ity, ite, ...
31826 0.117954 0.53
31949 0.117985 0.99
495 ive 0.11816 1 ative, ivers, ives, ived, iver, ...
396 ▁an 0.118242 1 ▁any, ▁another, ▁ann, ▁anything, ▁ant, ...
1086 AL 0.11857 1 VAL, ALL, INVAL, ALSE, VALUE, ...
31648 0.118586 0.84
734 ors 0.118613 1 ators, ctors, ▁worse, ▁errors, ▁horse, ...
472 ity 0.118711 1 ility, ality, ability, ivity, ▁University, ...
557 ), 0.11881 1 (),, "),, '),, ▁),, }),, ...
360 ter 0.118966 1 ▁inter, ater, fter, tern, ▁after, ...
1251 AN 0.119007 1 ▁AN, RAN, AND, ▁AND, ▁ANY, ...
486 ▁by 0.119115 1 ▁byte, ▁bytes, ▁byl, ▁był, ▁byla
31803 0.119231 0.92
30654 0.119245 0.14
609 ). 0.119257 1 ()., ")., ')., })., ))., ...
832 ON 0.119453 1 ION, CON, ▁CON, SON, ATION, ...
522 able 0.119485 1 ailable, ▁able, ▁table, ables, table, ...
1077 ating 0.119774 1 ▁creating, ▁dating, ▁eating, ▁operating, inating, ...
440 ant 0.119843 1 ▁want, ants, ante, ▁important, ▁wanted, ...
1002 ates 0.11994 1 ▁States, ▁states, ▁latest, ▁rates, dates, ...
28786 т 0.120069 1
31412 0.120166 0.96
1906 ED 0.120214 1 RED, ATED, LED, ▁ED, DED, ...
322 ot 0.120354 1 ▁not, ▁other, oth, other, ▁got, ...
867 les 0.120462 1 less, ▁les, ▁less, ales, ules, ...
742 ings 0.120549 1 ▁things, tings, Settings, settings, ▁settings, ...
313 id 0.120568 1 ide, ▁said, oid, ▁did, ▁void, ...
15320 ▁/***/ 0.120583 0.74
466 ment 0.120621 1 lement, ement, ument, ments, ament, ...
314 am 0.120719 1 ame, aram, ▁am, name, Name, ...
296 ion 0.120724 1 ation, ction, ions, ition, ations, ...
896 RE 0.12073 1 ▁RE, REG, URE, PRE, ARE, ...
424 te 0.120775 1 ite, ated, ▁te, text, ▁inter, ...
594 ions 0.120776 1 ations, ctions, ptions, itions, ▁options, ...
628 ary 0.120852 1 mary, inary, summary, uary, ibrary, ...
31941 0.120938 0.76
31032 0.120947 0.95
438 ▁at 0.121075 1 ▁att, ▁attack, ▁attempt, ▁attention, ▁attribute, ...
13130 ▁aapt 0.12109 0.98
10291 ERS 0.121476 1
30890 0.12152 0.0075
31837 0.121539 0.0019
532 to 0.121613 1 ▁into, ator, ton, ▁too, ustom, ...
345 ▁" 0.121683 1 ▁""", ▁"\, ▁"<, ▁"/, ▁"", ...
412 ie 0.12171 1 ies, ient, ier, iel, ied, ...
473 ine 0.121834 1 line, ines, ined, ▁line, iness, ...
465 age 0.121876 1 essage, ages, message, ager, aged, ...
745 ical 0.121946 1 ically, ▁political, ological, ▁physical, ▁medical, ...
31707 0.121961 0.66
31396 0.122052 0.049
1180 LE 0.122079 1 ABLE, FILE, ULE, LECT, LEN, ...
403 ▁was 0.122094 1 ▁wasn, ▁waste, ▁wash, ▁washing, ▁washed, ...
414 ▁\ 0.122144 1 ▁\\, ▁\,, ▁\], ▁\[, ▁\", ...
318 ▁S 0.122145 1 ▁St, ▁She, ▁Se, ▁Sh, ▁So, ...
31015 0.122157 0.038
30832 🟢 0.122274 0.98
31966 0.122441 0.27
31798 0.122489 0.21
31802 0.122526 0.45
31100 0.122563 0.99
31741 0.123025 0.24
973 als 0.12304 1 alse, ▁false, ▁als, false, Equals, ...
315 ▁I 0.123056 1 ▁In, ▁It, ▁If, ▁Is, ▁Ind, ...
308 ent 0.123288 1 ment, ient, ents, ▁ent, lement, ...
338 ch 0.123457 1 ▁ch, ich, ach, che, ▁which, ...
482 ure 0.123466 1 ures, ature, ▁sure, ured, atures, ...
378 ▁it 0.123532 1 ▁its, ▁item, ▁itself, ▁items, ▁iter, ...
1532 ters 0.123563 1 eters, ▁parameters, acters, ▁characters, Parameters, ...
31083 0.123634 0.91
324 ur 0.123685 1 our, urn, ure, turn, ▁your, ...
31789 0.123699 0.55
10530 ▁franç 0.123749 0.84 ▁français, ▁française
31737 0.123787 0.77
31904 0.12384 0.27
309 il 0.123907 1 ill, ile, ail, ▁will, ild, ...
31026 0.124007 0.0021
1339 ments 0.124042 1 uments, ements, ▁elements, ▁arguments, ▁comments, ...
31903 Ս 0.124043 0.83
31527 0.124076 0.85
1053 ons 0.124167 1 ▁cons, ctions, ponse, ptions, ▁consider, ...
1126 ins 0.12417 1 ▁inst, ▁ins, ains, ▁against, ▁instance, ...
4033 ▁OF 0.124242 1 ▁OFF
316 ad 0.124438 1 ▁had, ▁ad, ade, read, ▁add, ...
30762 ಿ 0.124449 0.0039
4866 ATION 0.124497 1 ICATION
1074 ts 0.124502 1 ments, ats, ets, ants, ists, ...
374 est 0.124602 1 ▁est, ▁test, ▁best, test, ▁quest, ...
1468 ets 0.124711 1 ▁gets, ▁sets, sets, lets, ▁streets, ...
311 ro 0.12479 1 ▁pro, rom, ▁from, rou, row, ...
31543 0.124794 0.99
31252 ۆ 0.124804 0.44
303 st 0.12488 1 ▁st, est, ist, ust, ost, ...
26570 AtA 0.125006 0.99
612 на 0.125074 1 ▁на, она, ная, зна, ▁насе, ...
331 se 0.125173 1 ▁se, ser, ase, ose, set, ...
362 th 0.125231 1 ▁that, ith, ▁with, ▁this, ath, ...
28809 0.125253 1
321 im 0.125273 1 ▁im, ime, ▁him, ▁import, ▁time, ...
30964 0.125343 0.95
13667 *\r 0.125427 0.02 /**\r, ▁/**\r
368 ▁you 0.125493 1 ▁your, ▁young, ▁yourself, ▁youth, ▁younger, ...
339 ay 0.12558 1 ays, ray, ▁may, ▁way, way, ...
31143 0.125584 0.02
31726 0.125637 0.059
28788 с 0.12568 1
506 ▁have 0.125938 1 ▁haven, ▁havet
383 um 0.125942 1 umber, ument, ▁number, umn, sum, ...
411 res 0.125946 1 ▁res, ress, ▁result, ures, ▁pres, ...
1063 ics 0.125961 1 istics, graphics, rics, includegraphics, ▁politics, ...
981 ▁“ 0.125996 1
775 IN 0.12601 1 ING, ▁IN, INT, INE, IND, ...
31066 0.126054 0.64
653 ize 0.126085 1 ized, size, ▁size, Size, izer, ...
1238 ures 0.126095 1 atures, ▁features, ▁pictures, ▁figures, ▁measures, ...
31863 Մ 0.12611 0.99
659 ▁has 0.1262 1 ▁hash, ▁hasta, ▁hasn, ▁hast, ▁hass
31702 ʐ 0.1262 0.9
16613 CLUD 0.126252 0.87 CLUDING, ▁INCLUDING, INCLUDING
1017 OR 0.126272 1 ORT, ▁OR, ERROR, PORT, ORD, ...
31775 0.126335 0.67
1332 ized 0.126356 1 ▁realized, ialized, ▁recognized, ▁organized, sized, ...
582 ▁up 0.126374 1 ▁upon, ▁update, ▁upper, ▁updated, ▁updates, ...
333 ve 0.1264 1 ver, ave, ive, ▁have, very, ...
351 ▁M 0.126414 1 ▁Mar, ▁My, ▁Man, ▁May, ▁Me, ...
391 and 0.126418 1 ▁hand, land, stand, ▁stand, ands, ...
400 ▁he 0.126425 1 ▁her, ▁hel, ▁here, ▁help, ▁head, ...
491 ak 0.126432 1 ake, ▁make, reak, aking, ▁take, ...
31946 0.126486 0.031
596 ens 0.126511 1 ense, icense, ▁License, ension, ▁sense, ...
418 ▁N 0.126514 1 ▁New, ▁No, ▁NULL, ▁Not, ▁Now, ...
1046 its 0.126522 1 ▁itself, ▁benefits, bits, ▁units, ▁bits, ...
12251 ября 0.126631 0.012 ▁сентября, ▁октября, ▁ноября
578 ally 0.126671 1 ually, ▁really, ially, ically, ▁actually, ...
399 ▁R 0.126705 1 ▁Re, ▁Res, ▁Reg, ▁Rep, ▁Rec, ...
3864 izing 0.126708 1 ▁realizing, ▁utilizing
31196 0.126769 0.96
488 ard 0.12677 1 ward, ▁hard, ards, wards, ▁heard, ...
715 up 0.126771 1 roup, ▁sup, ▁support, ▁group, ▁super, ...
28799 д 0.126822 1
617 ance 0.126833 1 ances, stance, ▁instance, anced, Instance, ...
846 ys 0.126855 1 ystem, ways, ▁system, ▁always, ▁System, ...
300 om 0.126915 1 ▁com, rom, ▁from, ome, ▁comp, ...
946 та 0.126956 1 ста, ▁та, ▁ста, став, ▁так, ...
1087 AR 0.127018 1 ART, ▁AR, ▁WAR, ARE, ▁WARRAN, ...
15947 BPACK 0.127024 0.85 WEBPACK
393 ▁L 0.127048 1 ▁Le, ▁La, ▁License, ▁Let, ▁List, ...
31942 0.127079 0.96
31913 0.127172 0.98
31379 0.127223 0.97
1218 ities 0.127263 1 ilities, ▁activities, abilities, ▁opportunities, ▁cities, ...
326 ig 0.127294 1 ight, ign, fig, igh, ▁right, ...
2435 .” 0.127365 0.99
4604 ,\r 0.127395 0.051 ',\r, ",\r, },\r, ▁},\r, ),\r
864 ise 0.127413 1 ised, wise, ises, aise, ▁otherwise, ...
2287 ▁▁▁ 0.127444 1 ▁▁▁▁▁▁▁▁▁, ▁▁▁▁▁▁▁, ▁▁▁▁▁▁▁▁▁▁▁
1905 ling 0.127466 1 elling, ▁feeling, aling, iling, bling, ...
30845 0.127537 0.0039
323 ac 0.127551 1 ack, ace, act, ach, ▁back, ...
31251 0.127572 0.072
31048 0.127668 0.9
643 ry 0.12776 1 very, ory, ▁every, ery, ▁very, ...
392 ist 0.127825 1 List, ▁dist, ▁list, ister, ists, ...
5004 izes 0.127923 1 ▁sizes, Sizes
2458 ised 0.127974 1 ▁raised, ▁surprised, ▁promised, ▁advised, vised, ...
575 ▁out 0.128004 1 ▁outside, ▁output, ▁outer, ▁outcome, ▁outdoor, ...
26292 emperaturen 0.128017 0.43 eltemperaturen
509 ans 0.128023 1 ▁trans, trans, translation, ▁dans, ▁means, ...
416 end 0.128037 1 ▁end, riend, pend, ▁friend, ender, ...
320 ▁T 0.128105 1 ▁The, ▁Th, ▁This, ▁They, ▁Tr, ...
367 ▁P 0.128107 1 ▁Pro, ▁Pl, ▁Pr, ▁Ph, ▁Par, ...
31938 0.128112 0.66
384 ▁D 0.12813 1 ▁De, ▁Do, ▁Des, ▁Die, ▁Dr, ...
753 ian 0.128301 1 ians, iant, iance, iano, iana, ...
328 ol 0.128303 1 old, oll, ool, ▁col, ▁pol, ...
31486 0.128378 0.00041
31061 0.128384 0.036
485 ne 0.128388 1 one, ▁one, ▁new, ener, ▁need, ...
13078 ERCHANTABILITY 0.128399 0.00096 ▁MERCHANTABILITY
31468 0.128415 0.9
520 ra 0.128432 1 aram, ray, param, ▁trans, rap, ...
727 ▁time 0.128444 1 ▁times, ▁timeout, ▁timer, ▁timestamp
31976 0.128475 0.74
375 ab 0.128507 1 able, ▁ab, ▁about, abel, label, ...
410 op 0.128554 1 ople, ▁people, ▁op, rop, ▁open, ...
1294 man 0.128658 1 ▁human, ▁woman, ▁command, ▁performance, Command, ...
265 he 0.128724 1 ▁the, ▁he, ▁The, hen, ▁her, ...
30765 0.128728 0.96
350 od 0.128748 1 ode, ▁mod, ood, ody, ▁good, ...
387 ▁- 0.128773 1 ▁--, ▁->, ▁-->, ▁-=, ▁---, ...
524 ▁K 0.128795 1 ▁King, ▁Ke, ▁Kl, ▁Key, ▁Kar, ...
20358 ):\r 0.129017 0.00031
31053 0.129083 0.98
1157 ting 0.12909 1 tings, ▁getting, itting, ▁writing, iting, ...
487 per 0.129109 1 ▁per, ▁person, ▁exper, perty, ▁oper, ...
30460 0.129154 0.98
366 em 0.129165 1 ▁them, ▁em, ystem, ▁rem, lement, ...
30973 0.129213 0.93
420 ▁G 0.129265 1 ▁Gr, ▁Ge, ▁Gu, ▁Get, ▁God, ...
590 ▁they 0.129291 1
560 ▁In 0.129325 1 ▁Ind, ▁Inst, ▁Intern, ▁Inter, ▁Int, ...
1006 led 0.129439 1 ▁called, ailed, illed, abled, ledge, ...
401 ▁F 0.129443 1 ▁For, ▁Fr, ▁Fl, ▁From, ▁Fin, ...
764 ▁– 0.129473 1 ▁–,
334 ▁C 0.129527 1 ▁Ch, ▁Com, ▁Con, ▁Cl, ▁Col, ...
28778 н 0.129562 1
365 ▁B 0.129617 1 ▁But, ▁Be, ▁Br, ▁Bl, ▁By, ...
20896 ▁Станов 0.129678 0.76 ▁Становништво
538 one 0.129744 1 ▁one, oney, ▁done, ione, ones, ...
570 ite 0.129749 1 ited, iter, item, cite, rite, ...
962 AT 0.129751 1 ATE, ATION, ATA, STAT, ATH, ...
456 ▁this 0.129797 1
405 ke 0.129831 1 ake, ▁like, ▁ke, ▁make, ▁take, ...
611 ." 0.129838 1 ...", .",, .");, ▁.", .""", ...
15617 netje 0.129878 0.89 ▁beginnetje
28803 м 0.129938 1
31427 0.130124 0.19
358 ce 0.130138 1 ice, ace, ance, ence, ource, ...
28838 0.130156 1
377 ap 0.130202 1 app, ▁app, ▁ap, rap, apt, ...
31963 0.130233 0.72
630 ▁she 0.130304 1 ▁shel, ▁shell, ▁sheet, ▁shelter, ▁sheets, ...
382 ▁H 0.130357 1 ▁He, ▁How, ▁His, ▁Her, ▁However, ...
31616 0.130492 0.17
31674 0.130513 0.11
28513 dentry 0.130518 1
31287 0.13054 0.016
2980 EL 0.130571 1 SELECT, ELD, FIELD, VEL, SEL, ...
28794 л 0.1306 1
1549 ants 0.13064 1 ▁wants, ▁plants, Constants, ▁participants, ▁restaurants, ...
450 de 0.130676 1 ide, ode, ▁des, ade, ▁def, ...
31506 0.130677 0.094
290 ▁m 0.130679 1 ▁me, ▁my, ▁man, ▁more, ▁mod, ...
361 ir 0.130705 1 ire, ▁their, irst, ▁first, air, ...
406 out 0.130836 1 ▁out, ▁about, ▁without, outh, ayout, ...
464 ▁' 0.130839 1 ▁'./, ▁'/, ▁'@, ▁'\, ▁'<, ...
459 ▁not 0.130854 1 ▁nothing, ▁note, ▁notice, ▁noticed, ▁notes, ...
1927 ches 0.130943 1 aches, ▁chest, ▁matches, chester, anches, ...
31489 ڈ 0.130953 0.92
31892 0.130962 0.22
370 un 0.130996 1 ▁un, ▁und, ound, ount, ▁fun, ...
684 ▁about 0.131029 1
25931 tcx 0.131061 1
347 ▁be 0.131065 1 ▁been, ▁bec, ▁bet, ▁because, ▁before, ...
490 ge 0.131107 1 get, ▁get, essage, ange, ger, ...
1151 SE 0.131121 1 SET, ▁SE, ALSE, ENSE, ICENSE, ...
31865 ٔ 0.1312 0.13
2854 AM 0.131208 1 NAME, AME, ▁AM, PARAM, AMP, ...
1702 ION 0.131219 1 ATION, CTION, ITION, SION, VERSION, ...
31943 0.131252 0.98
1079 ner 0.131297 1 ▁gener, ainer, ▁Gener, ▁general, ▁ener, ...
1449 ats 0.131383 1 stats, aats, Stats, ▁seats, ▁stats, ...
1791 ▁To 0.131457 1 ▁Tom, ▁Tor, ▁Tod, ▁Top, ▁Tour, ...
426 ain 0.131479 1 ▁again, ains, aint, ained, aining, ...
1009 of 0.131511 1 off, ▁prof, ▁offer, ▁often, ▁soft, ...
1190 els 0.131537 1 else, ▁models, ▁levels, annels, ▁els, ...
665 ра 0.131561 1 ▁ра, гра, ран, ▁раз, кра, ...
2094 ET 0.131566 1 SET, GET, NET, RET, LETE, ...
2252 US 0.131596 1 ▁USA, USE, STATUS, UST, USER, ...
344 ); 0.131608 1 ();, ");, ));, ());, );\r, ...
4896 GE 0.131628 1 GET, AGE, GER, GEN, MAGE, ...
28797 é 0.131632 1
7148 sembly 0.131687 0.75 ▁Assembly, ▁assembly, assembly, Assembly
31627 0.131748 0.028
547 ide 0.131753 1 ident, ider, ▁ide, ▁consider, ides, ...
31266 0.131769 0.99
782 ps 0.131773 1 aps, ips, ops, eps, roups, ...
336 ow 0.131779 1 own, row, ▁know, ▁how, ▁now, ...
31880 0.131819 0.69
737 ▁like 0.13185 1 ▁likely, ▁liked, ▁likes, ▁likelihood, ▁likewise
31532 0.131955 0.59
425 ill 0.131977 1 ▁will, ▁still, ille, ▁mill, illed, ...
3728 VE 0.131988 1 IVE, VERSION, EVENT, VEL, ACTIVE, ...
31994 ٓ 0.132017 0.56
31355 0.132025 0.0059
31985 0.132058 0.82
1263 ▁For 0.132079 1 ▁Form, ▁Fort, ▁Fore, ▁Force, ▁Ford, ...
535 ice 0.132089 1 ices, icense, ervice, ▁License, Service, ...
1237 the 0.132149 1 ▁another, ither, thers, ▁either, ▁together, ...
1225 да 0.132163 1 ▁года, ▁да, дар, зда, жда, ...
31190 0.132167 0.58
31487 0.132183 0.18
31857 0.132184 0.91
31227 ి 0.132276 0.00022
1308 ctions 0.132289 0.52 lections, actions, ▁functions, ▁actions, ructions, ...
1224 ти 0.132308 0.99 сти, тив, ности, ▁ти, кти, ...
343 ver 0.132358 1 very, vers, ▁over, ▁every, ▁very, ...
661 ▁It 0.132365 1 ▁Ital, ▁Its, ▁Italian, ▁Italy, ▁Item, ...
371 ▁{ 0.132365 1 ▁{\r, ▁{\, ▁{}, ▁{@, ▁{", ...
31276 0.132403 0.016
1392 for 0.132432 1 formation, ▁information, ▁perform, ▁fore, fort, ...
30208 0.132449 0.97
6991 ▁TO 0.132491 1 ▁TODO
394 ▁W 0.132615 1 ▁We, ▁Wh, ▁When, ▁What, ▁With, ...
30264 0.132635 0.95
31679 0.132649 0.72
277 ▁c 0.132673 1 ▁con, ▁com, ▁ch, ▁cl, ▁can, ...
28773 е 0.132676 1
1604 IC 0.132764 1 ICE, ICENSE, LICENSE, DEVICE, ▁PARTIC, ...
28795 к 0.132868 1
31886 0.132888 0.73
2178 ards 0.13289 1 wards, ▁towards, ▁cards, ▁standards, ▁Awards, ...
1033 ms 0.13292 1 ▁himself, ▁terms, params, ▁themselves, msg, ...
1851 IS 0.13292 1 ▁IS, IST, ▁ISBN, DIS, LIST, ...
2047 ley 0.132922 1 ▁Valley, iley, ▁valley, ▁Stanley, keley, ...
492 are 0.132958 1 ared, arent, ▁care, ware, ▁parent, ...
1100 ta 0.132968 1 ▁start, ▁data, Data, ▁take, eta, ...
550 ▁V 0.132995 1 ▁Ver, ▁Val, ▁Vol, ▁Vir, ▁Vis, ...
30078 0.133008 0.93
691 io 0.13311 1 ations, ption, ▁function, ious, ational, ...
513 ▁if 0.13313 1
1265 son 0.133163 1 ▁person, ason, ▁son, ison, ▁reason, ...
31421 0.133184 0.028
470 () 0.133203 1 ();, ()., (),, ());, ()), ...
586 ▁my 0.133257 1 ▁myself, ▁myst, ▁myth, ▁myster, ▁mystery, ...
1520 na 0.133274 1 ▁una, ▁na, ination, ana, nal, ...
1168 ars 0.133313 1 ▁years, parse, ears, ▁parse, ▁stars, ...
28811 я 0.133395 1
455 all 0.133409 1 ▁all, ally, ▁call, ually, ▁really, ...
357 ag 0.133414 1 age, ▁ag, essage, ▁again, ages, ...
31910 0.133422 0.63
1523 ler 0.133435 1 roller, Handler, eller, Controller, iler, ...
541 ▁can 0.133476 1 ▁cannot, ▁candid, ▁cant, ▁cancer, ▁candidate, ...
30073 0.133538 1
748 ays 0.13355 1 ways, ▁always, ▁days, ▁says, ▁ways, ...
1837 ization 0.133578 1 ▁organization, ▁organizations, izations, ▁optimization, Serialization, ...
1291 ages 0.133583 1 ▁images, ▁pages, ▁messages, essages, pages, ...
749 ier 0.133586 1 iers, ifier, rier, arlier, ▁earlier, ...
31522 0.133605 0.17
329 ut 0.133606 1 out, ▁but, ▁out, ▁about, put, ...
884 ty 0.133607 1 type, ▁type, ility, perty, ality, ...
789 ish 0.133633 1 ished, lish, lished, ▁English, ▁British, ...
567 ▁& 0.133692 1 ▁&&, ▁&=, ▁&\, ▁&#, ▁&=&, ...
1536 time 0.133716 1 ▁times, times, ▁sometimes, etime, ometimes, ...
1303 ines 0.133774 1 iness, ▁business, ▁lines, inese, ▁Chinese, ...
624 ▁one 0.133848 1 ▁ones
1403 by 0.133849 1 aby, byte, ▁baby, bytes, ▁byte, ...
1841 AD 0.133882 1 READ, ▁AD, LOAD, ADD, ADDR, ...
478 ▁we 0.133898 1 ▁were, ▁well, ▁week, ▁went, ▁wer, ...
636 ence 0.133931 1 ience, ference, ences, ▁experience, idence, ...
413 ▁E 0.13395 1 ▁Ex, ▁En, ▁El, ▁Eng, ▁Ed, ...
657 In 0.134015 1 ▁Ind, Info, Ind, Index, Int, ...
949 ▁don 0.134017 1 ▁done, ▁dont, ▁donde, ▁donc, ▁donne
28785 р 0.134037 1
284 ▁p 0.134096 1 ▁pro, ▁pl, ▁per, ▁pre, ▁pr, ...
920 ST 0.134103 1 STR, ▁ST, STAT, EST, IST, ...
3836 AY 0.13411 1 RAY, LAY, PLAY, ARRAY, DAY, ...
30991 0.134139 0.22
574 ▁your 0.134215 1 ▁yourself, ▁yours
30114 0.134237 0.99
516 ▁his 0.134245 1 ▁hist, ▁history, ▁histor, ▁historical, ▁historic, ...
3148 ATE 0.134271 1 STATE, DATE, ATED, UPDATE, CREATE, ...
429 ▁$ 0.134283 1 ▁$\, ▁$(, ▁${\, ▁$$, ▁${, ...
402 oc 0.134328 1 ock, ▁loc, lock, loc, ▁process, ...
30880 0.134463 0.87
373 ri 0.134474 1 ring, rib, riv, tring, rit, ...
317 ▁e 0.134485 1 ▁ex, ▁en, ▁el, ▁ev, ▁em, ...
1000 ла 0.134489 1 кла, ▁обла, лав, лан, лась, ...
1081 line 0.134526 1 ▁line, ▁online, eline, ▁lines, overline, ...
30406 0.134538 0.00046
22801 crtc 0.134546 1
442 ▁or 0.134554 1 ▁order, ▁organ, ▁orig, ▁org, ▁original, ...
30862 0.13471 0.41
31848 0.134769 0.51
2065 itions 0.13477 0.95 ▁conditions, ▁positions, ▁definitions, ▁traditions, initions, ...
2431 ENT 0.134782 1 MENT, EVENT, IENT, ▁EVENT, ENTRY, ...
826 der 0.134799 1 ▁der, ▁under, ider, ▁order, ▁consider, ...
1038 ▁make 0.134816 1 ▁makes, ▁makeup, ▁maker
4529 ▁Of 0.134838 1 ▁Office, ▁Officer, ▁Often, ▁Official
305 ▁l 0.13489 1 ▁le, ▁la, ▁li, ▁like, ▁look, ...
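
The in_other_tokens column appears to list vocabulary entries that contain the token as a substring, cut off after five entries. A minimal sketch of such a lookup follows, assuming plain substring matching. The helper name `in_other_tokens`, the `limit` parameter, the ordering of results, and the small vocabulary are all illustrative and not taken from the report's tooling.

```python
def in_other_tokens(token, vocab, limit=5):
    """List other vocabulary entries containing `token`, truncated to `limit`."""
    others = [entry for entry in vocab if entry != token and token in entry]
    shown = others[:limit]
    if len(others) > limit:
        shown.append("...")  # mark that more matches exist
    return ", ".join(shown)


# A handful of tokens copied from the table above.
sample_vocab = [
    "NdEx", "iNdEx", ":%.*", ":%.*]]",
    "▁febbra", "▁febbraio", "qpoint", "pgfqpoint",
]
contained_in = in_other_tokens("NdEx", sample_vocab)
```

A token that only ever occurs inside a longer token is rarely emitted on its own by the tokenizer, which is one plausible reason such fragments score low on the indicator.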

Byte tokens

145 entries below threshold of 0.039
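
The byte_type column in the table below separates ASCII bytes, bytes that can occur in multi-byte UTF-8 sequences, and bytes that never occur in valid UTF-8 (0xC0, 0xC1, and 0xF5 to 0xFF). The sketch below reproduces that classification and parses the `<0xNN>` names used for byte-fallback tokens. The function names are illustrative and are not taken from the tool that generated this report.

```python
import re

# Matches byte-fallback token names such as "<0x41>".
_BYTE_TOKEN = re.compile(r"^<0x([0-9A-Fa-f]{2})>$")


def byte_value(token):
    """Return the byte value for a '<0xNN>' token name, or None otherwise."""
    match = _BYTE_TOKEN.match(token)
    return int(match.group(1), 16) if match else None


def byte_type(value):
    """Classify a byte value the way the byte_type column labels it."""
    if value < 0x80:
        return "ascii"
    # 0xC0, 0xC1 and 0xF5..0xFF cannot appear in well-formed UTF-8.
    if value in (0xC0, 0xC1) or value >= 0xF5:
        return "unused_utf8"
    return "utf8"


labels = {name: byte_type(byte_value(name)) for name in ("<0x41>", "<0xC1>", "<0xC2>", "<0xF4>", "<0xF5>")}
```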

token_id token indicator ord hex byte_type reencoded
4 <0x01> 0 1 0x01 ascii 29534: \x01
5 <0x02> 0 2 0x02 ascii 30551: \x02
6 <0x03> 0 3 0x03 ascii 30662: \x03
7 <0x04> 0 4 0x04 ascii 30724: \x04
8 <0x05> 0 5 0x05 ascii 30550: \x05
9 <0x06> 0 6 0x06 ascii 30314: \x06
10 <0x07> 0 7 0x07 ascii 30963: \x07
11 <0x08> 0 8 0x08 ascii 31129: \x08
14 <0x0B> 0 11 0x0B ascii 30638: \x0b
15 <0x0C> 0 12 0x0C ascii 29683: \x0c
16 <0x0D> 0 13 0x0D ascii 28801: \r
17 <0x0E> 0 14 0x0E ascii 30517: \x0e
18 <0x0F> 0 15 0x0F ascii 30698: \x0f
19 <0x10> 0 16 0x10 ascii 30388: \x10
20 <0x11> 0 17 0x11 ascii 30557: \x11
21 <0x12> 0 18 0x12 ascii 30298: \x12
22 <0x13> 0 19 0x13 ascii 30453: \x13
23 <0x14> 0 20 0x14 ascii 30721: \x14
24 <0x15> 0 21 0x15 ascii 30675: \x15
25 <0x16> 0 22 0x16 ascii 30935: \x16
125 additional entries below threshold
token_id token indicator ord hex byte_type reencoded
26 <0x17> 0 23 0x17 ascii 30841: \x17
27 <0x18> 0 24 0x18 ascii 30555: \x18
28 <0x19> 0 25 0x19 ascii 30969: \x19
29 <0x1A> 0 26 0x1A ascii 30759: \x1a
30 <0x1B> 0 27 0x1B ascii 30246: \x1b
31 <0x1C> 0 28 0x1C ascii 31134: \x1c
32 <0x1D> 0 29 0x1D ascii 31236: \x1d
33 <0x1E> 0 30 0x1E ascii 31150: \x1e
34 <0x1F> 0 31 0x1F ascii 31217: \x1f
35 <0x20> 0 32 0x20 ascii 28705: (space)
36 <0x21> 0 33 0x21 ascii 28808: !
37 <0x22> 0 34 0x22 ascii 28739: "
38 <0x23> 0 35 0x23 ascii 28771: #
39 <0x24> 0 36 0x24 ascii 28776: $
40 <0x25> 0 37 0x25 ascii 28823: %
41 <0x26> 0 38 0x26 ascii 28800: &
42 <0x27> 0 39 0x27 ascii 28742: '
43 <0x28> 0 40 0x28 ascii 28732: (
44 <0x29> 0 41 0x29 ascii 28731: )
45 <0x2A> 0 42 0x2A ascii 28736: *
46 <0x2B> 0 43 0x2B ascii 28806: +
47 <0x2C> 0 44 0x2C ascii 28725: ,
48 <0x2D> 0 45 0x2D ascii 28733: -
49 <0x2E> 0 46 0x2E ascii 28723: .
50 <0x2F> 0 47 0x2F ascii 28748: /
51 <0x30> 0 48 0x30 ascii 28734: 0
52 <0x31> 0 49 0x31 ascii 28740: 1
53 <0x32> 0 50 0x32 ascii 28750: 2
54 <0x33> 0 51 0x33 ascii 28770: 3
55 <0x34> 0 52 0x34 ascii 28781: 4
56 <0x35> 0 53 0x35 ascii 28782: 5
57 <0x36> 0 54 0x36 ascii 28784: 6
58 <0x37> 0 55 0x37 ascii 28787: 7
59 <0x38> 0 56 0x38 ascii 28783: 8
60 <0x39> 0 57 0x39 ascii 28774: 9
61 <0x3A> 0 58 0x3A ascii 28747: :
62 <0x3B> 0 59 0x3B ascii 28745: ;
63 <0x3C> 0 60 0x3C ascii 28789: <
64 <0x3D> 0 61 0x3D ascii 28746: =
65 <0x3E> 0 62 0x3E ascii 28767: >
66 <0x3F> 0 63 0x3F ascii 28804: ?
67 <0x40> 0 64 0x40 ascii 28818: @
68 <0x41> 0 65 0x41 ascii 28741: A
69 <0x42> 0 66 0x42 ascii 28760: B
70 <0x43> 0 67 0x43 ascii 28743: C
71 <0x44> 0 68 0x44 ascii 28757: D
72 <0x45> 0 69 0x45 ascii 28749: E
73 <0x46> 0 70 0x46 ascii 28765: F
74 <0x47> 0 71 0x47 ascii 28777: G
75 <0x48> 0 72 0x48 ascii 28769: H
76 <0x49> 0 73 0x49 ascii 28737: I
77 <0x4A> 0 74 0x4A ascii 28798: J
78 <0x4B> 0 75 0x4B ascii 28796: K
79 <0x4C> 0 76 0x4C ascii 28758: L
80 <0x4D> 0 77 0x4D ascii 28755: M
81 <0x4E> 0 78 0x4E ascii 28759: N
82 <0x4F> 0 79 0x4F ascii 28762: O
83 <0x50> 0 80 0x50 ascii 28753: P
84 <0x51> 0 81 0x51 ascii 28824: Q
85 <0x52> 0 82 0x52 ascii 28754: R
86 <0x53> 0 83 0x53 ascii 28735: S
87 <0x54> 0 84 0x54 ascii 28738: T
88 <0x55> 0 85 0x55 ascii 28779: U
89 <0x56> 0 86 0x56 ascii 28790: V
90 <0x57> 0 87 0x57 ascii 28780: W
91 <0x58> 0 88 0x58 ascii 28814: X
92 <0x59> 0 89 0x59 ascii 28802: Y
93 <0x5A> 0 90 0x5A ascii 28828: Z
94 <0x5B> 0 91 0x5B ascii 28792: [
95 <0x5C> 0 92 0x5C ascii 28756: \
96 <0x5D> 0 93 0x5D ascii 28793: ]
97 <0x5E> 0 94 0x5E ascii 28815: ^
98 <0x5F> 0 95 0x5F ascii 28730: _
99 <0x60> 0 96 0x60 ascii 28832: `
100 <0x61> 0 97 0x61 ascii 28708: a
101 <0x62> 0 98 0x62 ascii 28726: b
102 <0x63> 0 99 0x63 ascii 28717: c
103 <0x64> 0 100 0x64 ascii 28715: d
104 <0x65> 0 101 0x65 ascii 28706: e
105 <0x66> 0 102 0x66 ascii 28722: f
106 <0x67> 0 103 0x67 ascii 28721: g
107 <0x68> 0 104 0x68 ascii 28716: h
108 <0x69> 0 105 0x69 ascii 28710: i
109 <0x6A> 0 106 0x6A ascii 28768: j
110 <0x6B> 0 107 0x6B ascii 28729: k
111 <0x6C> 0 108 0x6C ascii 28714: l
112 <0x6D> 0 109 0x6D ascii 28719: m
113 <0x6E> 0 110 0x6E ascii 28711: n
114 <0x6F> 0 111 0x6F ascii 28709: o
115 <0x70> 0 112 0x70 ascii 28720: p
116 <0x71> 0 113 0x71 ascii 28775: q
117 <0x72> 0 114 0x72 ascii 28712: r
118 <0x73> 0 115 0x73 ascii 28713: s
119 <0x74> 0 116 0x74 ascii 28707: t
120 <0x75> 0 117 0x75 ascii 28718: u
121 <0x76> 0 118 0x76 ascii 28728: v
122 <0x77> 0 119 0x77 ascii 28727: w
123 <0x78> 0 120 0x78 ascii 28744: x
124 <0x79> 0 121 0x79 ascii 28724: y
125 <0x7A> 0 122 0x7A ascii 28764: z
126 <0x7B> 0 123 0x7B ascii 28751: {
127 <0x7C> 0 124 0x7C ascii 28766: |
128 <0x7D> 0 125 0x7D ascii 28752: }
129 <0x7E> 0 126 0x7E ascii 28845: ~
130 <0x7F> 0 127 0x7F ascii 30982: \x7f
195 <0xC0> 0 192 0xC0 unused_utf8
196 <0xC1> 0 193 0xC1 unused_utf8
197 <0xC2> 0 194 0xC2 utf8
198 <0xC3> 0 195 0xC3 utf8
248 <0xF5> 0 245 0xF5 unused_utf8
249 <0xF6> 0 246 0xF6 unused_utf8
250 <0xF7> 0 247 0xF7 unused_utf8
251 <0xF8> 0 248 0xF8 unused_utf8
252 <0xF9> 0 249 0xF9 unused_utf8
253 <0xFA> 0 250 0xFA unused_utf8
254 <0xFB> 0 251 0xFB unused_utf8
255 <0xFC> 0 252 0xFC unused_utf8
256 <0xFD> 0 253 0xFD unused_utf8
257 <0xFE> 0 254 0xFE unused_utf8
258 <0xFF> 0 255 0xFF unused_utf8
31134 \x1c 0.0109612 28 0x1C ascii
31150 \x1e 0.0117701 30 0x1E ascii
31236 \x1d 0.0138711 29 0x1D ascii
29683 \x0c 0.0178993 12 0x0C ascii
30638 \x0b 0.0250177 11 0x0B ascii
235 additional entries above threshold
token_id token indicator ord hex byte_type
244 <0xF1> 0.0750657 241 0xF1 utf8
245 <0xF2> 0.0851321 242 0xF2 utf8
28713 s 0.086673 115 0x73 ascii
28723 . 0.0888948 46 0x2E ascii
28740 1 0.090978 49 0x31 ascii
28725 , 0.0943044 44 0x2C ascii
28750 2 0.0963412 50 0x32 ascii
28724 y 0.0978571 121 0x79 ascii
28731 ) 0.0991102 41 0x29 ascii
28735 S 0.0997779 83 0x53 ascii
28734 0 0.101295 48 0x30 ascii
28708 a 0.102304 97 0x61 ascii
28707 t 0.102341 116 0x74 ascii
28706 e 0.103177 101 0x65 ascii
28747 : 0.103533 58 0x3A ascii
28710 i 0.103537 105 0x69 ascii
30841 \x17 0.104708 23 0x17 ascii
28802 Y 0.105231 89 0x59 ascii
31217 \x1f 0.106339 31 0x1F ascii
28709 o 0.106805 111 0x6F ascii
30935 \x16 0.10847 22 0x16 ascii
28719 m 0.108601 109 0x6D ascii
30453 \x13 0.108805 19 0x13 ascii
30298 \x12 0.108985 18 0x12 ascii
28770 3 0.109098 51 0x33 ascii
28741 A 0.1095 65 0x41 ascii
30675 \x15 0.109537 21 0x15 ascii
28738 T 0.110005 84 0x54 ascii
28757 D 0.110252 68 0x44 ascii
30517 \x0e 0.111379 14 0x0E ascii
28711 n 0.111425 110 0x6E ascii
28781 4 0.11153 52 0x34 ascii
28715 d 0.111582 100 0x64 ascii
28782 5 0.111764 53 0x35 ascii
30698 \x0f 0.111874 15 0x0F ascii
28717 c 0.111985 99 0x63 ascii
13 <0x0A> 0.112075 10 0x0A ascii
28749 E 0.112204 69 0x45 ascii
28733 - 0.112489 45 0x2D ascii
28720 p 0.112711 112 0x70 ascii
28742 ' 0.112775 39 0x27 ascii
30314 \x06 0.112993 6 0x06 ascii
30721 \x14 0.113171 20 0x14 ascii
28718 u 0.113353 117 0x75 ascii
28762 O 0.114535 79 0x4F ascii
28712 r 0.114582 114 0x72 ascii
28714 l 0.114659 108 0x6C ascii
28729 k 0.114673 107 0x6B ascii
28743 C 0.1155 67 0x43 ascii
28755 M 0.115726 77 0x4D ascii
28756 \ 0.116065 92 0x5C ascii
30557 \x11 0.116151 17 0x11 ascii
28764 z 0.11665 122 0x7A ascii
28784 6 0.116867 54 0x36 ascii
28726 b 0.116869 98 0x62 ascii
28722 f 0.1171 102 0x66 ascii
28769 H 0.117487 72 0x48 ascii
28744 x 0.117603 120 0x78 ascii
28721 g 0.117654 103 0x67 ascii
30724 \x04 0.118335 4 0x04 ascii
28759 N 0.11841 78 0x4E ascii
28783 8 0.118523 56 0x38 ascii
28765 F 0.118992 70 0x46 ascii
28739 " 0.119433 34 0x22 ascii
28716 h 0.119691 104 0x68 ascii
28753 P 0.119694 80 0x50 ascii
28737 I 0.119705 73 0x49 ascii
28758 L 0.120198 76 0x4C ascii
28777 G 0.12057 71 0x47 ascii
28745 ; 0.120591 59 0x3B ascii
28793 ] 0.121062 93 0x5D ascii
28730 _ 0.121131 95 0x5F ascii
28796 K 0.121487 75 0x4B ascii
28767 > 0.122217 62 0x3E ascii
28751 { 0.122245 123 0x7B ascii
28774 9 0.12234 57 0x39 ascii
28728 v 0.122484 118 0x76 ascii
28732 ( 0.122578 40 0x28 ascii
28727 w 0.122755 119 0x77 ascii
28787 7 0.122916 55 0x37 ascii
28752 } 0.122984 125 0x7D ascii
28760 B 0.123052 66 0x42 ascii
28754 R 0.123309 82 0x52 ascii
30969 \x19 0.123521 25 0x19 ascii
28779 U 0.123757 85 0x55 ascii
28780 W 0.123965 87 0x57 ascii
30963 \x07 0.124607 7 0x07 ascii
28790 V 0.126235 86 0x56 ascii
28828 Z 0.126257 90 0x5A ascii
224 <0xDD> 0.126426 221 0xDD utf8
28748 / 0.126949 47 0x2F ascii
28768 j 0.127853 106 0x6A ascii
28814 X 0.128844 88 0x58 ascii
28804 ? 0.128848 63 0x3F ascii
28736 * 0.130238 42 0x2A ascii
30555 \x18 0.131105 24 0x18 ascii
30388 \x10 0.131674 16 0x10 ascii
28808 ! 0.131779 33 0x21 ascii
30550 \x05 0.131906 5 0x05 ascii
30662 \x03 0.132958 3 0x03 ascii
12 <0x09> 0.133516 9 0x09 ascii
28798 J 0.133871 74 0x4A ascii
28771 # 0.133998 35 0x23 ascii
28775 q 0.134677 113 0x71 ascii
28776 $ 0.135957 36 0x24 ascii
28792 [ 0.13671 91 0x5B ascii
28746 = 0.137702 61 0x3D ascii
226 <0xDF> 0.139587 223 0xDF utf8
28824 Q 0.140163 81 0x51 ascii
225 <0xDE> 0.140859 222 0xDE utf8
30551 \x02 0.143268 2 0x02 ascii
28832 ` 0.143384 96 0x60 ascii
30246 \x1b 0.144255 27 0x1B ascii
28766 | 0.144602 124 0x7C ascii
28789 < 0.14517 60 0x3C ascii
28806 + 0.146968 43 0x2B ascii
3 <0x00> 0.147016 0 0x00 ascii
247 <0xF4> 0.147204 244 0xF4 utf8
28815 ^ 0.148092 94 0x5E ascii
29534 \x01 0.148759 1 0x01 ascii
131 <0x80> 0.15049 128 0x80 utf8
223 <0xDC> 0.152093 220 0xDC utf8
28845 ~ 0.152329 126 0x7E ascii
30759 \x1a 0.152532 26 0x1A ascii
233 <0xE6> 0.15431 230 0xE6 utf8
163 <0xA0> 0.154364 160 0xA0 utf8
28823 % 0.157816 37 0x25 ascii
28800 & 0.159108 38 0x26 ascii
175 <0xAC> 0.161479 172 0xAC utf8
28818 @ 0.16148 64 0x40 ascii
139 <0x88> 0.162135 136 0x88 utf8
147 <0x90> 0.162291 144 0x90 utf8
179 <0xB0> 0.162564 176 0xB0 utf8
152 <0x95> 0.162839 149 0x95 utf8
137 <0x86> 0.162841 134 0x86 utf8
178 <0xAF> 0.163081 175 0xAF utf8
159 <0x9C> 0.163138 156 0x9C utf8
229 <0xE2> 0.16385 226 0xE2 utf8
155 <0x98> 0.163885 152 0x98 utf8
184 <0xB5> 0.164032 181 0xB5 utf8
167 <0xA4> 0.164148 164 0xA4 utf8
180 <0xB1> 0.164171 177 0xB1 utf8
174 <0xAB> 0.16431 171 0xAB utf8
161 <0x9E> 0.164691 158 0x9E utf8
140 <0x89> 0.16476 137 0x89 utf8
173 <0xAA> 0.165152 170 0xAA utf8
144 <0x8D> 0.165181 141 0x8D utf8
134 <0x83> 0.165182 131 0x83 utf8
168 <0xA5> 0.165215 165 0xA5 utf8
219 <0xD8> 0.165712 216 0xD8 utf8
171 <0xA8> 0.165877 168 0xA8 utf8
181 <0xB2> 0.16588 178 0xB2 utf8
177 <0xAE> 0.166021 174 0xAE utf8
211 <0xD0> 0.166134 208 0xD0 utf8
135 <0x84> 0.166231 132 0x84 utf8
169 <0xA6> 0.166451 166 0xA6 utf8
138 <0x87> 0.166524 135 0x87 utf8
187 <0xB8> 0.166546 184 0xB8 utf8
157 <0x9A> 0.166727 154 0x9A utf8
188 <0xB9> 0.166853 185 0xB9 utf8
183 <0xB4> 0.167181 180 0xB4 utf8
148 <0x91> 0.167265 145 0x91 utf8
143 <0x8C> 0.167373 140 0x8C utf8
191 <0xBC> 0.167453 188 0xBC utf8
166 <0xA3> 0.167696 163 0xA3 utf8
176 <0xAD> 0.168126 173 0xAD utf8
189 <0xBA> 0.168396 186 0xBA utf8
154 <0x97> 0.168417 151 0x97 utf8
185 <0xB6> 0.168527 182 0xB6 utf8
164 <0xA1> 0.168591 161 0xA1 utf8
170 <0xA7> 0.168792 167 0xA7 utf8
162 <0x9F> 0.168999 159 0x9F utf8
141 <0x8A> 0.169207 138 0x8A utf8
227 <0xE0> 0.169655 224 0xE0 utf8
182 <0xB3> 0.169676 179 0xB3 utf8
145 <0x8E> 0.170031 142 0x8E utf8
132 <0x81> 0.170131 129 0x81 utf8
158 <0x9B> 0.170198 155 0x9B utf8
234 <0xE7> 0.170245 231 0xE7 utf8
172 <0xA9> 0.170257 169 0xA9 utf8
153 <0x96> 0.170553 150 0x96 utf8
193 <0xBE> 0.170612 190 0xBE utf8
190 <0xBB> 0.170666 187 0xBB utf8
235 <0xE8> 0.170832 232 0xE8 utf8
136 <0x85> 0.17085 133 0x85 utf8
186 <0xB7> 0.171103 183 0xB7 utf8
151 <0x94> 0.171129 148 0x94 utf8
142 <0x8B> 0.17126 139 0x8B utf8
31129 \x08 0.171393 8 0x08 ascii
160 <0x9D> 0.171562 157 0x9D utf8
150 <0x93> 0.171585 147 0x93 utf8
232 <0xE5> 0.17162 229 0xE5 utf8
149 <0x92> 0.172192 146 0x92 utf8
133 <0x82> 0.172375 130 0x82 utf8
156 <0x99> 0.172498 153 0x99 utf8
192 <0xBD> 0.1728 189 0xBD utf8
165 <0xA2> 0.172991 162 0xA2 utf8
221 <0xDA> 0.173104 218 0xDA utf8
212 <0xD1> 0.17314 209 0xD1 utf8
239 <0xEC> 0.174296 236 0xEC utf8
215 <0xD4> 0.174438 212 0xD4 utf8
194 <0xBF> 0.17479 191 0xBF utf8
146 <0x8F> 0.174801 143 0x8F utf8
238 <0xEB> 0.175881 235 0xEB utf8
228 <0xE1> 0.176468 225 0xE1 utf8
243 <0xF0> 0.17669 240 0xF0 utf8
213 <0xD2> 0.177937 210 0xD2 utf8
240 <0xED> 0.17859 237 0xED utf8
216 <0xD5> 0.179407 213 0xD5 utf8
246 <0xF3> 0.179979 243 0xF3 utf8
237 <0xEA> 0.180086 234 0xEA utf8
220 <0xD9> 0.180366 217 0xD9 utf8
236 <0xE9> 0.182062 233 0xE9 utf8
218 <0xD7> 0.18216 215 0xD7 utf8
231 <0xE4> 0.184283 228 0xE4 utf8
214 <0xD3> 0.185144 211 0xD3 utf8
242 <0xEF> 0.186289 239 0xEF utf8
217 <0xD6> 0.186448 214 0xD6 utf8
203 <0xC8> 0.187053 200 0xC8 utf8
222 <0xDB> 0.187636 219 0xDB utf8
201 <0xC6> 0.188027 198 0xC6 utf8
202 <0xC7> 0.190104 199 0xC7 utf8
209 <0xCE> 0.191158 206 0xCE utf8
208 <0xCD> 0.191409 205 0xCD utf8
204 <0xC9> 0.193083 201 0xC9 utf8
206 <0xCB> 0.195482 203 0xCB utf8
241 <0xEE> 0.19673 238 0xEE utf8
207 <0xCC> 0.196865 204 0xCC utf8
230 <0xE3> 0.197025 227 0xE3 utf8
199 <0xC4> 0.199003 196 0xC4 utf8
200 <0xC5> 0.199067 197 0xC5 utf8
210 <0xCF> 0.199114 207 0xCF utf8
30982 \x7f 0.199932 127 0x7F ascii
205 <0xCA> 0.200114 202 0xCA utf8
28801 \r 0.208957 13 0x0D ascii
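
The byte columns in the rows above follow a regular pattern: a byte-fallback token written as `<0xNN>` has a token id equal to its byte value plus 3 (id 13 for `<0x0A>`, id 224 for `<0xDD>`), and bytes below 128 are labelled `ascii` while the rest are labelled `utf8`. The sketch below reproduces those columns for a byte-fallback token. The function name `describe_byte_token` is hypothetical, and the offset of 3 is inferred from the rows in this table, not taken from the tokenizer's source.

```python
import re

# Assumption: offset inferred from the table rows (e.g. id 3 -> <0x00>, id 13 -> <0x0A>).
BYTE_FALLBACK_OFFSET = 3


def describe_byte_token(token: str):
    """Return the table columns for a `<0xNN>` byte-fallback token, or None otherwise."""
    match = re.fullmatch(r"<0x([0-9A-Fa-f]{2})>", token)
    if match is None:
        # Regular vocabulary tokens such as "o" or "\r" are not byte-fallback tokens.
        return None
    value = int(match.group(1), 16)
    return {
        "token_id": value + BYTE_FALLBACK_OFFSET,
        "decimal": value,
        "hex": f"0x{value:02X}",
        # Bytes below 128 are plain ASCII; the rest only occur inside multi-byte UTF-8 sequences.
        "kind": "ascii" if value < 128 else "utf8",
    }


print(describe_byte_token("<0x0A>"))
print(describe_byte_token("<0xDD>"))
```

Single-character tokens with ids in the 28000 to 31000 range (for example `o` at 28709) are ordinary vocabulary entries that happen to encode to one byte, so the offset rule does not apply to them.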

Special tokens

1 entry below threshold of 0.039

token_id token indicator max_prob
0 <unk> 0.000668355 1.1e-08
2 additional entries above threshold
token_id token indicator max_prob
2 </s> 0.0787296 0.99
1 <s> 0.139665 4.4e-07
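
The split of the special tokens into "below" and "above" the threshold can be reproduced from the indicator column alone. The sketch below assumes a strict less-than comparison against the reported threshold of 0.039; the comparison operator and the helper name `split_by_threshold` are assumptions, while the rows are copied from the tables above.

```python
# Threshold as reported for the special-token table.
SPECIAL_THRESHOLD = 0.039

# (token_id, token, indicator) rows copied from the special-token tables.
special_tokens = [
    (0, "<unk>", 0.000668355),
    (2, "</s>", 0.0787296),
    (1, "<s>", 0.139665),
]


def split_by_threshold(rows, threshold):
    """Partition rows by indicator value (assumed strict '<' for the below group)."""
    below = [row for row in rows if row[2] < threshold]
    above = [row for row in rows if row[2] >= threshold]
    return below, above


below, above = split_by_threshold(special_tokens, SPECIAL_THRESHOLD)
print("below:", [token for _, token, _ in below])
print("above:", [token for _, token, _ in above])
```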