
Report for mistralai/Mistral-7B-v0.1

Model info

  • Model Info:
    • Tied embeddings: False
    • LM head uses bias: False
    • Embeddings shape: [32000, 4096]
  • Tokenizer Info:
    • Vocab Size: 32000
    • Tokenizer Class: LlamaTokenizer
    • Tokenizer Type: BPE
    • Bytes handling: Byte Fallback
    • Token for verification prompt building: includegraphics
    • Token id for verification prompt building: 7621
  • Indicator summary:
    • Indicator for under-trained tokens: E_{in} L2 Norm
    • Overall distribution: 0.176 +/- 0.021
  • Detected Token Counts:
    • Number of tokens tested for under-training: 637 (529 non-special); 43 below the p = 0.01 threshold; 36 below the soft indicator threshold
    • Number of single-byte tokens: 380, of which 143 are below the indicator threshold
    • Number of special tokens: 0, of which 0 are below the indicator threshold
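The indicator named above ("E_{in} L2 Norm") is the L2 norm of each row of the input embedding matrix. The NumPy sketch below shows that computation on a small random matrix standing in for the real [32000, 4096] embeddings. The function names are hypothetical. The block assumes that "0.176 +/- 0.021" reports the mean and standard deviation of the indicator over the vocabulary. The tool's actual computation may differ in details.

```python
import numpy as np

def embedding_norm_indicator(e_in: np.ndarray) -> np.ndarray:
    """Return the L2 norm of each row (one row per token) of an input embedding matrix."""
    return np.linalg.norm(e_in, axis=1)

def summarize(indicator: np.ndarray) -> str:
    """Format the indicator distribution as 'mean +/- std' (assumed meaning of the summary line)."""
    return f"{indicator.mean():.3f} +/- {indicator.std():.3f}"

# Toy stand-in for E_in: 8 tokens with 16-dimensional embeddings.
rng = np.random.default_rng(0)
e_in = rng.normal(scale=0.05, size=(8, 16))
e_in[3] = 0.0  # a row that was never updated during training keeps a (near-)zero norm

indicator = embedding_norm_indicator(e_in)
print(summarize(indicator))
# Token ids with the lowest indicator come first, as in the tables below.
print(np.argsort(indicator)[:3])
```

On the real model, `e_in` would be the input embedding weight matrix. With Hugging Face transformers, for example, this is `model.get_input_embeddings().weight`. Low-norm rows are the candidates that the verification step below then tests.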

Under-trained token indicators plot

[Figure: indicator scatter plots]

Verification plot

[Figure: verification plot]

Under-trained token verification results

36 entries below threshold of 0.040
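The two cut-offs in this report can be applied to verification rows as shown in the sketch below. It reads "p = 0.01" as a threshold on `max_prob`, the highest probability the model assigned to reproducing the token in the verification prompts. Counting rows with `max_prob < 0.01` in the tables below gives 43, which matches the summary. The `Row` type and `classify` function are hypothetical. The sample rows are copied from the table.

```python
from typing import NamedTuple

class Row(NamedTuple):
    token_id: int
    token: str
    indicator: float
    max_prob: float

SOFT_INDICATOR_THRESHOLD = 0.040  # "36 entries below threshold of 0.040"
P_THRESHOLD = 0.01                # "43 below p = 0.01 threshold"

def classify(rows):
    """Split rows into those below the indicator threshold and those below the max_prob cut-off."""
    below_indicator = [r for r in rows if r.indicator < SOFT_INDICATOR_THRESHOLD]
    below_p = [r for r in rows if r.max_prob < P_THRESHOLD]
    return below_indicator, below_p

# A few rows from the table below.
rows = [
    Row(31738, "\uefc0", 0.00256505, 2.5e-10),
    Row(19248, "NdEx", 0.03231, 0.56),
    Row(4441, "{\r", 0.0397886, 0.00082),
    Row(23486, "),\r", 0.0402804, 0.0017),
    Row(272, "▁the", 0.0762231, 1.0),
]

below, verified = classify(rows)
print([r.token_id for r in below])     # [31738, 19248, 4441]
print([r.token_id for r in verified])  # [31738, 4441, 23486]
```

The two criteria do not always agree. `NdEx` has a low indicator but is still reproduced with probability 0.56. `),\r` sits just above the indicator threshold yet almost never gets reproduced.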

token_id token indicator max_prob in_other_tokens
31738 \uefc0 0.00256505 2.5e-10
20418 ▁/**\r 0.00386389 2.9e-10
26636 });\r 0.00487029 6.7e-07
26407 };\r 0.00519749 5.5e-09
26392 ▁});\r 0.0054809 4.9e-06
26083 ▁//\r 0.00611446 1.3e-05
18759 ';\r 0.00615078 7.1e-06
9823 */\r 0.00744269 6.2e-07
25833 >?[< 0.00774109 0.00017
7608 ▁*/\r 0.00839155 0.0011
28171 ]);\r 0.00891207 0.00032
23139 ▁};\r 0.0090801 0.00048
15056 ());\r 0.00931322 5.5e-07
17695 },\r 0.0093152 2.3e-05 ▁},\r
12193 ▁);\r 0.00948616 0.00048
31363 \x85 0.00975407 1.7e-09
14756 /**\r 0.0103307 0.00097 ▁/**\r
16943 ');\r 0.0108719 7.1e-07
20692 ▁},\r 0.0110457 0.0015
10278 ',\r 0.0124804 0.00076
16 additional entries below threshold
token_id token indicator max_prob in_other_tokens
11880 ";\r 0.0140456 0.0014
30929 0.0149118 0.002
14420 ];\r 0.0156465 0.00053
18055 ){\r 0.0159617 0.0013
10941 ));\r 0.0173568 0.00013 ());\r
14980 ">\r 0.0173889 0.0025
6913 ");\r 0.025225 0.00064
25900 iNdEx 0.0259386 0.0019
22186 ')\r 0.0271444 0.00096
10939 ",\r 0.0279031 0.001
26831 ▁febbra 0.0298659 0.00083 ▁febbraio
4420 ();\r 0.0299747 0.0033
19248 NdEx 0.03231 0.56 iNdEx
3426 ▁}\r 0.0361206 0.00099
9962 ()\r 0.0381206 0.014
31853 0.039285 0.57
493 additional entries above threshold
token_id token indicator max_prob in_other_tokens
4441 {\r 0.0397886 0.00082 ){\r
23486 ),\r 0.0402804 0.0017
14619 ▁)\r 0.0433053 0.0044
17334 (\r 0.0452076 0.0063
15641 ▁uitgen 0.0471153 0.0019 ▁uitgenodigd
27732 '\r 0.0474389 0.0028
2519 }\r 0.0483977 0.00035 ▁}\r
1969 ▁{\r 0.0495068 0.14
31656 0.0500745 0.79
1761 );\r 0.0503821 0.037 ();\r, ");\r, ));\r, ▁);\r, ());\r, ...
16949 ")\r 0.0504 0.0082
31645 0.0514121 0.73
30413 0.0525205 0.97
27456 :%.*]] 0.0542301 0.21
14668 ))\r 0.0587438 0.011
16724 tagHelper 0.0603997 0.84
16772 :%.* 0.0612881 0.62 :%.*]]
15880 >:]< 0.063233 0.25
30813 0.0658322 0.99
31932 ҽ 0.0676692 0.98
7941 ICENSE 0.070442 0.1 LICENSE, ▁LICENSE
27265 ▁SDValue 0.0715061 0.95
10762 qpoint 0.0726682 0.99 pgfqpoint
15500 itempty 0.0748776 0.82 omitempty
31179 0.0759263 0.99
272 ▁the 0.0762231 1 ▁they, ▁their, ▁them, ▁there, ▁then, ...
31733 0.0783485 0.58
31841 0.0809174 0.98
17779 ▁gepublice 0.0811037 0.0095 ▁gepubliceerd
31922 0.0815264 0.17
15630 odigd 0.0836839 0.09 ▁uitgenodigd
30897 0.0837612 0.95
3685 >\r 0.0848695 0.93 ">\r
14052 ▁Jahrhund 0.0849249 0.064 ▁Jahrhundert, ▁Jahrhunderts
18766 ]\r 0.0872433 0.27
31895 0.0883292 1
1271 ;\r 0.0883544 0.82 );\r, ();\r, ");\r, ));\r, ";\r, ...
288 ing 0.0884252 1 ring, ings, tring, ning, ating, ...
11167 ityEngine 0.0890044 0.93 ▁UnityEngine, UnityEngine
302 ▁of 0.0907202 1 ▁off, ▁offer, ▁often, ▁offic, ▁office, ...
31469 ӏ 0.0917887 1
264 ▁a 0.0920783 1 ▁and, ▁al, ▁as, ▁an, ▁at, ...
31172 0.0922086 0.98
30867 🟠 0.092405 0.98
31443 0.0924198 0.12
286 ed 0.0935538 1 ated, ied, hed, red, ▁need, ...
11525 "\r 0.0940681 0.27
298 ▁to 0.095134 1 ▁too, ▁top, ▁took, ▁tot, ▁told, ...
274 es 0.0953476 1 est, ess, res, ies, ▁res, ...
30983 ڕ 0.0967776 0.94
31317 0.0968221 0.53
29934 0.0977994 0.98
263 er 0.0979869 1 ver, ter, ere, ers, ser, ...
30770 🟡 0.0990252 0.99
28705 0.0994336 1
28593 pgfscope 0.100196 0.77
404 ers 0.100763 1 vers, erson, ▁person, ters, ivers, ...
31731 Ӏ 0.102405 1
269 en 0.102659 1 ent, end, ment, ▁en, hen, ...
12683 pgfpathlineto 0.102959 0.78
24713 vscale 0.103341 1
282 al 0.103778 1 ▁al, all, ial, ▁all, ally, ...
352 ation 0.103888 0.99 ations, ational, lation, formation, translation, ...
31901 0.104889 0.97
262 in 0.104921 1 ing, ▁in, ain, ine, int, ...
31636 0.105023 1
10765 pgfqpoint 0.105359 0.92
31933 0.105362 0.96
266 on 0.10583 1 ion, ation, ▁on, ▁con, ction, ...
725 ER 0.106118 1 ERR, VER, ERT, ERROR, TER, ...
2043 ING 0.106734 1 STRING, TING, CLUDING, WARNING, SETTING, ...
297 ▁in 0.106894 1 ▁int, ▁into, ▁inter, ▁inst, ▁incl, ...
395 ▁with 0.107302 1 ▁without, ▁within, ▁withdraw, ▁withd, ▁withdrawal
497 ies 0.107524 1 ities, ries, ories, perties, ▁series, ...
304 ▁and 0.107526 1 ▁android, ▁andere, ▁anderen, ▁ander, ▁andra, ...
276 an 0.10762 1 ▁and, and, ▁an, ant, ans, ...
23270 ByComparator 0.108252 0.84
26939 ▁invån 0.10833 0.01 ▁invånare
354 ▁for 0.10858 1 ▁form, ▁fore, ▁forward, ▁force, ▁former, ...
415 ▁The 0.109138 1 ▁They, ▁There, ▁Then, ▁These, ▁Their, ...
20411 ][< 0.109658 0.98
697 ations 0.109937 0.99 ▁relations, ▁relationship, ulations, ▁operations, ifications, ...
385 os 0.109969 1 ost, ose, ▁pos, pos, ▁most, ...
278 is 0.110116 1 ▁is, ist, ▁this, ▁his, ▁dis, ...
356 ▁on 0.110708 1 ▁one, ▁only, ▁once, ▁online, ▁ones, ...
380 ate 0.110728 1 ated, ater, ates, rivate, date, ...
31394 0.111301 0.3
349 ▁is 0.111417 1 ▁iss, ▁ist, ▁isn, ▁issue, ▁issues, ...
29091 0.111665 1
30690 ێ 0.111861 0.9
301 el 0.111887 1 ell, elf, ▁el, ely, iel, ...
31264 0.111937 1
271 or 0.112019 1 ▁for, ort, ore, ▁or, port, ...
2255 ES 0.112413 1 EST, CESS, TIES, RES, ▁WARRANTIES, ...
325 ▁( 0.112422 1 ▁(!, ▁(*, ▁((, ▁(), ▁($, ...
291 le 0.112465 1 ▁le, able, ile, ple, lect, ...
369 ▁that 0.112754 1 ▁thats
11370 pgfpath 0.112772 0.22 pgfpathlineto
299 et 0.112877 1 get, ▁return, ▁get, set, eth, ...
267 re 0.113046 1 ▁re, ere, res, ore, ▁are, ...
294 ic 0.113214 1 ice, ich, lic, ublic, ick, ...
31692 0.113691 0.92
270 at 0.113797 1 ation, ▁that, ate, ▁at, ath, ...
21876 imeq 0.113818 0.99 simeq
390 ▁as 0.113856 1 ▁ass, ▁ask, ▁assert, ▁asked, ▁associ, ...
293 as 0.11407 1 ▁as, ▁was, ass, ast, ase, ...
283 ar 0.114092 1 art, ▁are, ard, are, ▁ar, ...
477 ▁from 0.114112 1
30660 0.114554 1
381 us 0.115095 1 ust, ▁us, ous, ▁just, ause, ...
31734 0.115191 1
346 ly 0.115555 1 ally, ely, ▁only, ily, ually, ...
515 ia 0.11572 1 ian, ially, ential, aterial, iam, ...
330 ▁A 0.115918 1 ▁Al, ▁Ar, ▁And, ▁An, ▁As, ...
1020 EN 0.116173 1 ENT, END, MENT, ENSE, ICENSE, ...
601 ated 0.116287 1 ▁created, ▁related, dated, ▁associated, inated, ...
1014 The 0.116339 1 ▁They, ▁There, ▁Then, ▁These, There, ...
31849 0.116351 0.98
460 ▁are 0.116427 1 ▁area, ▁areas, ▁aren, ▁arena
31441 0.116846 0.041
396 ▁an 0.116887 1 ▁any, ▁another, ▁ann, ▁anything, ▁ant, ...
279 it 0.116901 1 ith, ▁it, ▁with, ity, ite, ...
31956 0.11698 0.72
31238 0.117117 0.97
1086 AL 0.117128 1 VAL, ALL, INVAL, ALSE, VALUE, ...
21399 TagHelpers 0.117675 0.99
1906 ED 0.117809 1 RED, ATED, LED, ▁ED, DED, ...
495 ive 0.117829 1 ative, ivers, ives, ived, iver, ...
31826 0.117937 0.97
486 ▁by 0.117952 1 ▁byte, ▁bytes, ▁byl, ▁był, ▁byla
734 ors 0.117982 1 ators, ctors, ▁worse, ▁errors, ▁horse, ...
31949 0.117985 0.99
557 ), 0.118079 1 (),, "),, '),, ▁),, }),, ...
360 ter 0.118231 1 ▁inter, ater, fter, tern, ▁after, ...
472 ity 0.118241 1 ility, ality, ability, ivity, ▁University, ...
1251 AN 0.118392 1 ▁AN, RAN, AND, ▁AND, ▁ANY, ...
31648 0.118425 0.97
31803 0.118735 0.96
28786 т 0.118999 1
1077 ating 0.119069 1 ▁creating, ▁dating, ▁eating, ▁operating, inating, ...
522 able 0.119093 1 ailable, ▁able, ▁table, ables, table, ...
609 ). 0.119138 1 ()., ")., ')., })., ))., ...
30654 0.119245 0.62
314 am 0.119282 1 ame, aram, ▁am, name, Name, ...
832 ON 0.119284 1 ION, CON, ▁CON, SON, ATION, ...
440 ant 0.119366 1 ▁want, ants, ante, ▁important, ▁wanted, ...
438 ▁at 0.119412 1 ▁att, ▁attack, ▁attempt, ▁attention, ▁attribute, ...
322 ot 0.119415 1 ▁not, ▁other, oth, other, ▁got, ...
424 te 0.119451 1 ite, ated, ▁te, text, ▁inter, ...
1002 ates 0.119568 1 ▁States, ▁states, ▁latest, ▁rates, dates, ...
867 les 0.119697 1 less, ▁les, ▁less, ales, ules, ...
466 ment 0.119731 1 lement, ement, ument, ments, ament, ...
313 id 0.11977 1 ide, ▁said, oid, ▁did, ▁void, ...
628 ary 0.119828 1 mary, inary, summary, uary, ibrary, ...
296 ion 0.119836 1 ation, ction, ions, ition, ations, ...
742 ings 0.119911 1 ▁things, tings, Settings, settings, ▁settings, ...
345 ▁" 0.119923 1 ▁""", ▁"\, ▁"<, ▁"/, ▁"", ...
594 ions 0.120164 1 ations, ctions, ptions, itions, ▁options, ...
31412 0.120196 0.99
532 to 0.120359 1 ▁into, ator, ton, ▁too, ustom, ...
896 RE 0.120544 1 ▁RE, REG, URE, PRE, ARE, ...
15320 ▁/***/ 0.120583 0.99
473 ine 0.120681 1 line, ines, ined, ▁line, iness, ...
31032 0.120939 0.99
31941 0.120947 0.99
412 ie 0.120967 1 ies, ient, ier, iel, ied, ...
13130 ▁aapt 0.12109 0.99
465 age 0.121304 1 essage, ages, message, ager, aged, ...
403 ▁was 0.121315 1 ▁wasn, ▁waste, ▁wash, ▁washing, ▁washed, ...
31837 0.121399 0.26
10291 ERS 0.121406 1
30890 0.121501 0.17
414 ▁\ 0.121519 1 ▁\\, ▁\,, ▁\], ▁\[, ▁\", ...
31798 0.12167 0.76
31707 0.121707 0.97
30832 🟢 0.121749 0.99
1180 LE 0.121928 1 ABLE, FILE, ULE, LECT, LEN, ...
745 ical 0.121985 1 ically, ▁political, ological, ▁physical, ▁medical, ...
308 ent 0.122047 1 ment, ient, ents, ▁ent, lement, ...
31396 0.122053 0.77
31966 0.122066 0.61
338 ch 0.122116 1 ▁ch, ich, ach, che, ▁which, ...
31802 0.122153 0.96
31741 0.122181 0.91
31015 0.122256 0.35
482 ure 0.122435 1 ures, ature, ▁sure, ured, atures, ...
315 ▁I 0.122471 1 ▁In, ▁It, ▁If, ▁Is, ▁Ind, ...
973 als 0.122507 1 alse, ▁false, ▁als, false, Equals, ...
31100 0.122557 0.99
28809 0.122612 1
324 ur 0.12263 1 our, urn, ure, turn, ▁your, ...
318 ▁S 0.122701 1 ▁St, ▁She, ▁Se, ▁Sh, ▁So, ...
378 ▁it 0.122819 1 ▁its, ▁item, ▁itself, ▁items, ▁iter, ...
1532 ters 0.12298 1 eters, ▁parameters, acters, ▁characters, Parameters, ...
1074 ts 0.122986 1 ments, ats, ets, ants, ists, ...
316 ad 0.123019 1 ▁had, ▁ad, ade, read, ▁add, ...
31737 0.123178 0.99
4033 ▁OF 0.123186 1 ▁OFF
309 il 0.123315 1 ill, ile, ail, ▁will, ild, ...
1126 ins 0.123416 1 ▁inst, ▁ins, ains, ▁against, ▁instance, ...
303 st 0.123483 1 ▁st, est, ist, ust, ost, ...
31904 0.123509 0.78
1053 ons 0.123621 1 ▁cons, ctions, ponse, ptions, ▁consider, ...
31083 0.123633 0.98
10530 ▁franç 0.12365 0.96 ▁français, ▁française
31026 0.123691 0.018
31789 0.123724 0.98
30762 ಿ 0.123829 0.17
1339 ments 0.123831 1 uments, ements, ▁elements, ▁arguments, ▁comments, ...
374 est 0.123896 1 ▁est, ▁test, ▁best, test, ▁quest, ...
1468 ets 0.123902 1 ▁gets, ▁sets, sets, lets, ▁streets, ...
4866 ATION 0.123944 1 ICATION
362 th 0.124028 1 ▁that, ith, ▁with, ▁this, ath, ...
31903 Ս 0.124043 0.99
31527 0.124076 0.99
311 ro 0.124093 1 ▁pro, rom, ▁from, rou, row, ...
31543 0.124209 1
383 um 0.124213 1 umber, ument, ▁number, umn, sum, ...
411 res 0.124427 1 ▁res, ress, ▁result, ures, ▁pres, ...
331 se 0.124455 1 ▁se, ser, ase, ose, set, ...
339 ay 0.124587 1 ays, ray, ▁may, ▁way, way, ...
2435 .” 0.124588 0.99
612 на 0.124624 1 ▁на, она, ная, зна, ▁насе, ...
506 ▁have 0.124766 1 ▁haven, ▁havet
321 im 0.124805 1 ▁im, ime, ▁him, ▁import, ▁time, ...
31252 ۆ 0.124808 0.95
28799 д 0.124939 1
31143 0.124994 0.022
26570 AtA 0.125006 1
28788 с 0.125006 1
31726 0.125012 0.89
659 ▁has 0.125035 1 ▁hash, ▁hasta, ▁hasn, ▁hast, ▁hass
391 and 0.125192 1 ▁hand, land, stand, ▁stand, ands, ...
30964 0.125303 0.99
368 ▁you 0.125345 1 ▁your, ▁young, ▁yourself, ▁youth, ▁younger, ...
300 om 0.12535 1 ▁com, rom, ▁from, ome, ▁comp, ...
31863 Մ 0.125368 1
596 ens 0.125381 1 ense, icense, ▁License, ension, ▁sense, ...
13667 *\r 0.125427 0.73 /**\r, ▁/**\r
1063 ics 0.125452 1 istics, graphics, rics, includegraphics, ▁politics, ...
491 ak 0.125464 1 ake, ▁make, reak, aking, ▁take, ...
981 ▁“ 0.125528 1
946 та 0.125536 1 ста, ▁та, ▁ста, став, ▁так, ...
1238 ures 0.125585 1 atures, ▁features, ▁pictures, ▁figures, ▁measures, ...
1332 ized 0.125613 1 ▁realized, ialized, ▁recognized, ▁organized, sized, ...
582 ▁up 0.125658 1 ▁upon, ▁update, ▁upper, ▁updated, ▁updates, ...
775 IN 0.125772 1 ING, ▁IN, INT, INE, IND, ...
333 ve 0.125829 1 ver, ave, ive, ▁have, very, ...
846 ys 0.126034 1 ystem, ways, ▁system, ▁always, ▁System, ...
400 ▁he 0.126056 1 ▁her, ▁hel, ▁here, ▁help, ▁head, ...
617 ance 0.126093 1 ances, stance, ▁instance, anced, Instance, ...
31702 ʐ 0.126202 0.99
31066 0.126212 0.97
326 ig 0.126214 1 ight, ign, fig, igh, ▁right, ...
3864 izing 0.12624 1 ▁realizing, ▁utilizing
16613 CLUD 0.126252 0.99 CLUDING, ▁INCLUDING, INCLUDING
653 ize 0.126272 1 ized, size, ▁size, Size, izer, ...
1017 OR 0.12631 1 ORT, ▁OR, ERROR, PORT, ORD, ...
31775 0.126315 0.97
864 ise 0.126318 1 ised, wise, ises, aise, ▁otherwise, ...
715 up 0.126328 1 roup, ▁sup, ▁support, ▁group, ▁super, ...
488 ard 0.12634 1 ward, ▁hard, ards, wards, ▁heard, ...
31946 0.126383 0.41
1046 its 0.126406 1 ▁itself, ▁benefits, bits, ▁units, ▁bits, ...
578 ally 0.126455 1 ually, ▁really, ially, ically, ▁actually, ...
328 ol 0.126571 1 old, oll, ool, ▁col, ▁pol, ...
323 ac 0.126586 1 ack, ace, act, ach, ▁back, ...
392 ist 0.126607 1 List, ▁dist, ▁list, ister, ists, ...
12251 ября 0.126631 0.54 ▁сентября, ▁октября, ▁ноября
2287 ▁▁▁ 0.126649 1 ▁▁▁▁▁▁▁▁▁, ▁▁▁▁▁▁▁, ▁▁▁▁▁▁▁▁▁▁▁
31196 0.126729 0.99
485 ne 0.12679 1 one, ▁one, ▁new, ener, ▁need, ...
31942 0.126844 1
575 ▁out 0.126899 1 ▁outside, ▁output, ▁outer, ▁outcome, ▁outdoor, ...
1218 ities 0.126959 1 ilities, ▁activities, abilities, ▁opportunities, ▁cities, ...
15947 BPACK 0.127024 1 WEBPACK
1905 ling 0.12703 1 elling, ▁feeling, aling, iling, bling, ...
509 ans 0.12708 1 ▁trans, trans, translation, ▁dans, ▁means, ...
520 ra 0.127153 1 aram, ray, param, ▁trans, rap, ...
1087 AR 0.127157 1 ART, ▁AR, ▁WAR, ARE, ▁WARRAN, ...
643 ry 0.127167 1 very, ory, ▁every, ery, ▁very, ...
31913 0.127172 0.95
410 op 0.12718 1 ople, ▁people, ▁op, rop, ▁open, ...
1294 man 0.127218 1 ▁human, ▁woman, ▁command, ▁performance, Command, ...
31379 0.127219 0.99
416 end 0.127239 1 ▁end, riend, pend, ▁friend, ender, ...
753 ian 0.127356 1 ians, iant, iance, iano, iana, ...
2458 ised 0.127385 1 ▁raised, ▁surprised, ▁promised, ▁advised, vised, ...
4604 ,\r 0.127394 0.88 ',\r, ",\r, },\r, ▁},\r, ),\r
31976 0.127466 0.88
30845 0.127497 0.88
418 ▁N 0.12753 1 ▁New, ▁No, ▁NULL, ▁Not, ▁Now, ...
31251 0.127541 0.44
375 ab 0.127587 1 able, ▁ab, ▁about, abel, label, ...
31048 0.127594 1
366 em 0.12775 1 ▁them, ▁em, ystem, ▁rem, lement, ...
31468 0.127791 0.98
1157 ting 0.127843 1 tings, ▁getting, itting, ▁writing, iting, ...
387 ▁- 0.127874 1 ▁--, ▁->, ▁-->, ▁-=, ▁---, ...
351 ▁M 0.127966 1 ▁Mar, ▁My, ▁Man, ▁May, ▁Me, ...
350 od 0.127973 1 ode, ▁mod, ood, ody, ▁good, ...
727 ▁time 0.128007 1 ▁times, ▁timeout, ▁timer, ▁timestamp
26292 emperaturen 0.128017 0.78 eltemperaturen
31938 0.128114 0.99
5004 izes 0.128178 1 ▁sizes, Sizes
265 he 0.128247 1 ▁the, ▁he, ▁The, hen, ▁her, ...
590 ▁they 0.128319 1
538 one 0.128335 1 ▁one, oney, ▁done, ione, ones, ...
31061 0.128346 0.69
13078 ERCHANTABILITY 0.128399 0.037 ▁MERCHANTABILITY
1006 led 0.128411 1 ▁called, ailed, illed, abled, ledge, ...
399 ▁R 0.128462 1 ▁Re, ▁Res, ▁Reg, ▁Rep, ▁Rec, ...
31486 0.128497 0.097
393 ▁L 0.128527 1 ▁Le, ▁La, ▁License, ▁Let, ▁List, ...
28778 н 0.128633 1
30765 0.128694 0.99
611 ." 0.128716 1 ...", .",, .");, ▁.", .""", ...
405 ke 0.128988 1 ake, ▁like, ▁ke, ▁make, ▁take, ...
28838 0.129001 1
20358 ):\r 0.129028 0.0018
570 ite 0.129064 1 ited, iter, item, cite, rite, ...
30460 0.129071 0.99
31053 0.129083 0.99
560 ▁In 0.129131 1 ▁Ind, ▁Inst, ▁Intern, ▁Inter, ▁Int, ...
28803 м 0.129134 1
764 ▁– 0.129159 1 ▁–,
28794 л 0.129194 1
30973 0.129196 0.99
962 AT 0.129241 1 ATE, ATION, ATA, STAT, ATH, ...
456 ▁this 0.129242 1
487 per 0.129367 1 ▁per, ▁person, ▁exper, perty, ▁oper, ...
358 ce 0.129409 1 ice, ace, ance, ence, ource, ...
377 ap 0.129471 1 app, ▁app, ▁ap, rap, apt, ...
450 de 0.129548 1 ide, ode, ▁des, ade, ▁def, ...
384 ▁D 0.129652 1 ▁De, ▁Do, ▁Des, ▁Die, ▁Dr, ...
20896 ▁Станов 0.129678 0.98 ▁Становништво
320 ▁T 0.129714 1 ▁The, ▁Th, ▁This, ▁They, ▁Tr, ...
367 ▁P 0.129763 1 ▁Pro, ▁Pl, ▁Pr, ▁Ph, ▁Par, ...
31616 0.129775 0.23
401 ▁F 0.129794 1 ▁For, ▁Fr, ▁Fl, ▁From, ▁Fin, ...
361 ir 0.129823 1 ire, ▁their, irst, ▁first, air, ...
15617 netje 0.129878 0.89 ▁beginnetje
524 ▁K 0.12992 1 ▁King, ▁Ke, ▁Kl, ▁Key, ▁Kar, ...
630 ▁she 0.129929 1 ▁shel, ▁shell, ▁sheet, ▁shelter, ▁sheets, ...
334 ▁C 0.129944 1 ▁Ch, ▁Com, ▁Con, ▁Cl, ▁Col, ...
2980 EL 0.129994 1 SELECT, ELD, FIELD, VEL, SEL, ...
370 un 0.130033 1 ▁un, ▁und, ound, ount, ▁fun, ...
1549 ants 0.130054 1 ▁wants, ▁plants, Constants, ▁participants, ▁restaurants, ...
406 out 0.130063 1 ▁out, ▁about, ▁without, outh, ayout, ...
31427 0.130108 0.45
31674 0.130137 0.64
1009 of 0.13015 1 off, ▁prof, ▁offer, ▁often, ▁soft, ...
31506 0.130166 0.92
459 ▁not 0.130188 1 ▁nothing, ▁note, ▁notice, ▁noticed, ▁notes, ...
31963 0.130235 0.95
347 ▁be 0.130351 1 ▁been, ▁bec, ▁bet, ▁because, ▁before, ...
684 ▁about 0.130444 1
28513 dentry 0.130518 1
31287 0.130541 0.96
1927 ches 0.130647 1 aches, ▁chest, ▁matches, chester, anches, ...
1079 ner 0.13065 1 ▁gener, ainer, ▁Gener, ▁general, ▁ener, ...
665 ра 0.130728 1 ▁ра, гра, ран, ▁раз, кра, ...
31489 ڈ 0.13074 0.99
1151 SE 0.130743 1 SET, ▁SE, ALSE, ENSE, ICENSE, ...
490 ge 0.130818 1 get, ▁get, essage, ange, ger, ...
31943 0.130821 1
2854 AM 0.13087 1 NAME, AME, ▁AM, PARAM, AMP, ...
782 ps 0.130919 1 aps, ips, ops, eps, roups, ...
464 ▁' 0.130929 1 ▁'./, ▁'/, ▁'@, ▁'\, ▁'<, ...
31892 0.130963 0.98
737 ▁like 0.130997 1 ▁likely, ▁liked, ▁likes, ▁likelihood, ▁likewise
420 ▁G 0.131026 1 ▁Gr, ▁Ge, ▁Gu, ▁Get, ▁God, ...
547 ide 0.131036 1 ident, ider, ▁ide, ▁consider, ides, ...
426 ain 0.131044 1 ▁again, ains, aint, ained, aining, ...
25931 tcx 0.131061 1
28797 é 0.131065 1
1702 ION 0.131098 1 ATION, CTION, ITION, SION, VERSION, ...
2094 ET 0.131159 1 SET, GET, NET, RET, LETE, ...
1449 ats 0.131172 1 stats, aats, Stats, ▁seats, ▁stats, ...
1237 the 0.131179 1 ▁another, ither, thers, ▁either, ▁together, ...
31865 ٔ 0.131206 0.98
31880 0.131208 0.57
535 ice 0.131255 1 ices, icense, ervice, ▁License, Service, ...
425 ill 0.131312 1 ▁will, ▁still, ille, ▁mill, illed, ...
2252 US 0.131314 1 ▁USA, USE, STATUS, UST, USER, ...
1190 els 0.131319 1 else, ▁models, ▁levels, annels, ▁els, ...
1791 ▁To 0.131338 1 ▁Tom, ▁Tor, ▁Tod, ▁Top, ▁Tour, ...
371 ▁{ 0.131359 1 ▁{\r, ▁{\, ▁{}, ▁{@, ▁{", ...
382 ▁H 0.131491 1 ▁He, ▁How, ▁His, ▁Her, ▁However, ...
336 ow 0.131509 1 own, row, ▁know, ▁how, ▁now, ...
3728 VE 0.131573 1 IVE, VERSION, EVENT, VEL, ACTIVE, ...
365 ▁B 0.131575 1 ▁But, ▁Be, ▁Br, ▁Bl, ▁By, ...
1263 ▁For 0.131587 1 ▁Form, ▁Fort, ▁Fore, ▁Force, ▁Ford, ...
4896 GE 0.131609 1 GET, AGE, GER, GEN, MAGE, ...
290 ▁m 0.131631 1 ▁me, ▁my, ▁man, ▁more, ▁mod, ...
31985 0.131673 0.87
7148 sembly 0.131687 1 ▁Assembly, ▁assembly, assembly, Assembly
344 ); 0.131695 1 ();, ");, ));, ());, );\r, ...
1225 да 0.13171 1 ▁года, ▁да, дар, зда, жда, ...
31266 0.131769 1
31627 0.131782 0.45
343 ver 0.131832 1 very, vers, ▁over, ▁every, ▁very, ...
31190 0.13189 0.8
1224 ти 0.131916 1 сти, тив, ности, ▁ти, кти, ...
28773 е 0.131963 1
31532 0.131977 0.96
661 ▁It 0.131991 1 ▁Ital, ▁Its, ▁Italian, ▁Italy, ▁Item, ...
31994 ٓ 0.132015 0.98
31276 0.132038 0.86
31355 0.132046 0.17
1308 ctions 0.13208 0.96 lections, actions, ▁functions, ▁actions, ructions, ...
31227 ి 0.132104 0.018
31679 0.132107 0.95
1520 na 0.132145 1 ▁una, ▁na, ination, ana, nal, ...
1100 ta 0.132151 1 ▁start, ▁data, Data, ▁take, eta, ...
1392 for 0.132172 1 formation, ▁information, ▁perform, ▁fore, fort, ...
31857 0.132173 0.98
31487 0.132192 0.82
1851 IS 0.13221 1 ▁IS, IST, ▁ISBN, DIS, LIST, ...
470 () 0.132223 1 ();, ()., (),, ());, ()), ...
6991 ▁TO 0.132268 1 ▁TODO
1523 ler 0.132322 1 roller, Handler, eller, Controller, iler, ...
492 are 0.132325 1 ared, arent, ▁care, ware, ▁parent, ...
28795 к 0.132337 1
1604 IC 0.132392 1 ICE, ICENSE, LICENSE, DEVICE, ▁PARTIC, ...
2047 ley 0.132392 1 ▁Valley, iley, ▁valley, ▁Stanley, keley, ...
30208 0.132457 0.99
586 ▁my 0.132488 1 ▁myself, ▁myst, ▁myth, ▁myster, ▁mystery, ...
691 io 0.132505 1 ations, ption, ▁function, ious, ational, ...
2178 ards 0.132509 1 wards, ▁towards, ▁cards, ▁standards, ▁Awards, ...
1033 ms 0.132544 1 ▁himself, ▁terms, params, ▁themselves, msg, ...
1168 ars 0.132569 1 ▁years, parse, ears, ▁parse, ▁stars, ...
567 ▁& 0.132617 1 ▁&&, ▁&=, ▁&\, ▁&#, ▁&=&, ...
884 ty 0.13262 1 type, ▁type, ility, perty, ality, ...
30264 0.132646 0.99
1265 son 0.13265 1 ▁person, ason, ▁son, ison, ▁reason, ...
624 ▁one 0.132652 1 ▁ones
357 ag 0.132709 1 age, ▁ag, essage, ▁again, ages, ...
28811 я 0.132718 1
329 ut 0.132761 1 out, ▁but, ▁out, ▁about, put, ...
1403 by 0.132781 1 aby, byte, ▁baby, bytes, ▁byte, ...
373 ri 0.132802 1 ring, rib, riv, tring, rit, ...
31421 0.132845 0.022
513 ▁if 0.132881 1
31886 0.132884 0.97
455 all 0.132913 1 ▁all, ally, ▁call, ually, ▁really, ...
541 ▁can 0.132934 1 ▁cannot, ▁candid, ▁cant, ▁cancer, ▁candidate, ...
30078 0.133033 0.99
749 ier 0.133064 1 iers, ifier, rier, arlier, ▁earlier, ...
277 ▁c 0.133078 1 ▁con, ▁com, ▁ch, ▁cl, ▁can, ...
748 ays 0.133146 1 ways, ▁always, ▁days, ▁says, ▁ways, ...
1536 time 0.133172 1 ▁times, times, ▁sometimes, etime, ometimes, ...
1837 ization 0.133187 1 ▁organization, ▁organizations, izations, ▁optimization, Serialization, ...
478 ▁we 0.133297 1 ▁were, ▁well, ▁week, ▁went, ▁wer, ...
949 ▁don 0.13331 1 ▁done, ▁dont, ▁donde, ▁donc, ▁donne
28785 р 0.133313 1
31522 0.133328 0.52
429 ▁$ 0.133353 1 ▁$\, ▁$(, ▁${\, ▁$$, ▁${, ...
636 ence 0.13338 1 ience, ference, ences, ▁experience, idence, ...
31910 0.133387 0.93
1303 ines 0.133392 1 iness, ▁business, ▁lines, inese, ▁Chinese, ...
1291 ages 0.133404 1 ▁images, ▁pages, ▁messages, essages, pages, ...
30073 0.133408 1
789 ish 0.133418 1 ished, lish, lished, ▁English, ▁British, ...
516 ▁his 0.133464 1 ▁hist, ▁history, ▁histor, ▁historical, ▁historic, ...
4529 ▁Of 0.133479 1 ▁Office, ▁Officer, ▁Often, ▁Official
402 oc 0.133513 1 ock, ▁loc, lock, loc, ▁process, ...
550 ▁V 0.133551 1 ▁Ver, ▁Val, ▁Vol, ▁Vir, ▁Vis, ...
3836 AY 0.133658 1 RAY, LAY, PLAY, ARRAY, DAY, ...
574 ▁your 0.133756 1 ▁yourself, ▁yours
657 In 0.133758 1 ▁Ind, Info, Ind, Index, Int, ...
1000 ла 0.133809 1 кла, ▁обла, лав, лан, лась, ...
30406 0.133816 0.048
1081 line 0.13399 1 ▁line, ▁online, eline, ▁lines, overline, ...
31666 0.134068 1
3148 ATE 0.134107 1 STATE, DATE, ATED, UPDATE, CREATE, ...
30991 0.134132 0.99
442 ▁or 0.134153 1 ▁order, ▁organ, ▁orig, ▁org, ▁original, ...
826 der 0.13416 1 ▁der, ▁under, ider, ▁order, ▁consider, ...
30114 0.134198 1
317 ▁e 0.134231 1 ▁ex, ▁en, ▁el, ▁ev, ▁em, ...
536 ire 0.134315 1 irect, ired, ▁direct, ▁require, ▁required, ...
705 ma 0.134337 1 math, ▁may, ▁make, ▁made, ▁many, ...
1841 AD 0.134364 1 READ, ▁AD, LOAD, ADD, ADDR, ...
441 ue 0.134383 1 alue, ▁que, que, ▁true, ues, ...
920 ST 0.134391 1 STR, ▁ST, STAT, EST, IST, ...
31848 0.13446 0.89
30880 0.134461 1
284 ▁p 0.134489 1 ▁pro, ▁pl, ▁per, ▁pre, ▁pr, ...
529 ast 0.134496 1 ▁last, aster, ▁least, ▁past, cast, ...
917 ка 0.134522 1 ▁ка, ска, ника, ская, ▁как, ...
22801 crtc 0.134546 0.99
1038 ▁make 0.134567 1 ▁makes, ▁makeup, ▁maker
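The `in_other_tokens` column appears to list other vocabulary entries that contain the token's text as a substring. For example, `▁febbra` appears inside `▁febbraio`, and `ICENSE` inside `LICENSE`. The list is cut to five entries followed by "..." when there are more. The sketch below reproduces that reading on a toy vocabulary. The function name, the five-entry limit and the ordering (plain vocabulary order here) are assumptions, not the tool's confirmed behaviour.

```python
def in_other_tokens(token: str, vocab: list, limit: int = 5) -> str:
    """List other vocabulary entries containing `token` as a substring (hypothetical reconstruction)."""
    hits = [t for t in vocab if t != token and token in t]
    shown = ", ".join(hits[:limit])
    return shown + (", ..." if len(hits) > limit else "")

toy_vocab = ["▁febbra", "▁febbraio", "ICENSE", "LICENSE", "▁LICENSE",
             "ing", "ring", "ings", "tring", "ning", "ating", "string", "▁uitgen"]

print(in_other_tokens("▁febbra", toy_vocab))  # ▁febbraio
print(in_other_tokens("ICENSE", toy_vocab))   # LICENSE, ▁LICENSE
print(in_other_tokens("ing", toy_vocab))      # ring, ings, tring, ning, ating, ...
print(in_other_tokens("▁uitgen", toy_vocab))  # prints an empty line
```

A token that occurs only as part of longer tokens, such as `▁febbra` or `▁uitgen`, may rarely be emitted on its own in training text. This is a plausible reason such fragments show low indicators, though the report itself does not state it.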

Byte tokens

143 entries below threshold of 0.015
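The `byte_type` column follows from the structure of UTF-8:

- Bytes 0x00-0x7F are ASCII.
- Bytes 0xC0, 0xC1 and 0xF5-0xFF never occur in well-formed UTF-8, so they are `unused_utf8`.
- All remaining bytes occur as continuation or lead bytes (`utf8`).

The `reencoded` column appears to give the id of a regular vocabulary token whose text is the same single character, i.e. a second way to represent that byte. The sketch below illustrates both readings. The function names are hypothetical, and the toy vocabulary uses ids copied from the table.

```python
from typing import Dict, Optional

def byte_type(b: int) -> str:
    """Classify a byte value the way the byte_type column appears to."""
    if not 0 <= b <= 0xFF:
        raise ValueError(f"not a byte value: {b}")
    if b <= 0x7F:
        return "ascii"
    if b in (0xC0, 0xC1) or b >= 0xF5:
        return "unused_utf8"  # these bytes never occur in well-formed UTF-8
    return "utf8"  # continuation byte (0x80-0xBF) or lead byte (0xC2-0xF4)

def reencoded_id(b: int, vocab: Dict[str, int]) -> Optional[int]:
    """Id of a regular token whose text is the single character chr(b), if one exists."""
    if b > 0x7F:
        return None  # a lone non-ASCII byte is not a complete character
    return vocab.get(chr(b))

# Toy vocabulary with ids copied from the table below (hypothetical subset).
toy_vocab = {"\x01": 29534, "\r": 28801, "a": 28708}

for b in (0x01, 0x0D, 0x61, 0xC0, 0xC3, 0xFF):
    print(f"<0x{b:02X}>", byte_type(b), reencoded_id(b, toy_vocab))
```

This matches the table: the ASCII byte tokens carry a `reencoded` entry, while the `utf8` and `unused_utf8` rows leave it empty.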

token_id token indicator ord hex byte_type reencoded
4 <0x01> 0 1 0x01 ascii 29534: \x01
5 <0x02> 0 2 0x02 ascii 30551: \x02
6 <0x03> 0 3 0x03 ascii 30662: \x03
7 <0x04> 0 4 0x04 ascii 30724: \x04
8 <0x05> 0 5 0x05 ascii 30550: \x05
9 <0x06> 0 6 0x06 ascii 30314: \x06
10 <0x07> 0 7 0x07 ascii 30963: \x07
11 <0x08> 0 8 0x08 ascii 31129: \x08
14 <0x0B> 0 11 0x0B ascii 30638: \x0b
15 <0x0C> 0 12 0x0C ascii 29683: \x0c
16 <0x0D> 0 13 0x0D ascii 28801: \r
17 <0x0E> 0 14 0x0E ascii 30517: \x0e
18 <0x0F> 0 15 0x0F ascii 30698: \x0f
19 <0x10> 0 16 0x10 ascii 30388: \x10
20 <0x11> 0 17 0x11 ascii 30557: \x11
21 <0x12> 0 18 0x12 ascii 30298: \x12
22 <0x13> 0 19 0x13 ascii 30453: \x13
23 <0x14> 0 20 0x14 ascii 30721: \x14
24 <0x15> 0 21 0x15 ascii 30675: \x15
25 <0x16> 0 22 0x16 ascii 30935: \x16
123 additional entries below threshold
token_id token indicator ord hex byte_type reencoded
26 <0x17> 0 23 0x17 ascii 30841: \x17
27 <0x18> 0 24 0x18 ascii 30555: \x18
28 <0x19> 0 25 0x19 ascii 30969: \x19
29 <0x1A> 0 26 0x1A ascii 30759: \x1a
30 <0x1B> 0 27 0x1B ascii 30246: \x1b
31 <0x1C> 0 28 0x1C ascii 31134: \x1c
32 <0x1D> 0 29 0x1D ascii 31236: \x1d
33 <0x1E> 0 30 0x1E ascii 31150: \x1e
34 <0x1F> 0 31 0x1F ascii 31217: \x1f
35 <0x20> 0 32 0x20 ascii 28705:
36 <0x21> 0 33 0x21 ascii 28808: !
37 <0x22> 0 34 0x22 ascii 28739: "
38 <0x23> 0 35 0x23 ascii 28771: #
39 <0x24> 0 36 0x24 ascii 28776: $
40 <0x25> 0 37 0x25 ascii 28823: %
41 <0x26> 0 38 0x26 ascii 28800: &
42 <0x27> 0 39 0x27 ascii 28742: '
43 <0x28> 0 40 0x28 ascii 28732: (
44 <0x29> 0 41 0x29 ascii 28731: )
45 <0x2A> 0 42 0x2A ascii 28736: *
46 <0x2B> 0 43 0x2B ascii 28806: +
47 <0x2C> 0 44 0x2C ascii 28725: ,
48 <0x2D> 0 45 0x2D ascii 28733: -
49 <0x2E> 0 46 0x2E ascii 28723: .
50 <0x2F> 0 47 0x2F ascii 28748: /
51 <0x30> 0 48 0x30 ascii 28734: 0
52 <0x31> 0 49 0x31 ascii 28740: 1
53 <0x32> 0 50 0x32 ascii 28750: 2
54 <0x33> 0 51 0x33 ascii 28770: 3
55 <0x34> 0 52 0x34 ascii 28781: 4
56 <0x35> 0 53 0x35 ascii 28782: 5
57 <0x36> 0 54 0x36 ascii 28784: 6
58 <0x37> 0 55 0x37 ascii 28787: 7
59 <0x38> 0 56 0x38 ascii 28783: 8
60 <0x39> 0 57 0x39 ascii 28774: 9
61 <0x3A> 0 58 0x3A ascii 28747: :
62 <0x3B> 0 59 0x3B ascii 28745: ;
63 <0x3C> 0 60 0x3C ascii 28789: <
64 <0x3D> 0 61 0x3D ascii 28746: =
65 <0x3E> 0 62 0x3E ascii 28767: >
66 <0x3F> 0 63 0x3F ascii 28804: ?
67 <0x40> 0 64 0x40 ascii 28818: @
68 <0x41> 0 65 0x41 ascii 28741: A
69 <0x42> 0 66 0x42 ascii 28760: B
70 <0x43> 0 67 0x43 ascii 28743: C
71 <0x44> 0 68 0x44 ascii 28757: D
72 <0x45> 0 69 0x45 ascii 28749: E
73 <0x46> 0 70 0x46 ascii 28765: F
74 <0x47> 0 71 0x47 ascii 28777: G
75 <0x48> 0 72 0x48 ascii 28769: H
76 <0x49> 0 73 0x49 ascii 28737: I
77 <0x4A> 0 74 0x4A ascii 28798: J
78 <0x4B> 0 75 0x4B ascii 28796: K
79 <0x4C> 0 76 0x4C ascii 28758: L
80 <0x4D> 0 77 0x4D ascii 28755: M
81 <0x4E> 0 78 0x4E ascii 28759: N
82 <0x4F> 0 79 0x4F ascii 28762: O
83 <0x50> 0 80 0x50 ascii 28753: P
84 <0x51> 0 81 0x51 ascii 28824: Q
85 <0x52> 0 82 0x52 ascii 28754: R
86 <0x53> 0 83 0x53 ascii 28735: S
87 <0x54> 0 84 0x54 ascii 28738: T
88 <0x55> 0 85 0x55 ascii 28779: U
89 <0x56> 0 86 0x56 ascii 28790: V
90 <0x57> 0 87 0x57 ascii 28780: W
91 <0x58> 0 88 0x58 ascii 28814: X
92 <0x59> 0 89 0x59 ascii 28802: Y
93 <0x5A> 0 90 0x5A ascii 28828: Z
94 <0x5B> 0 91 0x5B ascii 28792: [
95 <0x5C> 0 92 0x5C ascii 28756: \
96 <0x5D> 0 93 0x5D ascii 28793: ]
97 <0x5E> 0 94 0x5E ascii 28815: ^
98 <0x5F> 0 95 0x5F ascii 28730: _
99 <0x60> 0 96 0x60 ascii 28832: `
100 <0x61> 0 97 0x61 ascii 28708: a
101 <0x62> 0 98 0x62 ascii 28726: b
102 <0x63> 0 99 0x63 ascii 28717: c
103 <0x64> 0 100 0x64 ascii 28715: d
104 <0x65> 0 101 0x65 ascii 28706: e
105 <0x66> 0 102 0x66 ascii 28722: f
106 <0x67> 0 103 0x67 ascii 28721: g
107 <0x68> 0 104 0x68 ascii 28716: h
108 <0x69> 0 105 0x69 ascii 28710: i
109 <0x6A> 0 106 0x6A ascii 28768: j
110 <0x6B> 0 107 0x6B ascii 28729: k
111 <0x6C> 0 108 0x6C ascii 28714: l
112 <0x6D> 0 109 0x6D ascii 28719: m
113 <0x6E> 0 110 0x6E ascii 28711: n
114 <0x6F> 0 111 0x6F ascii 28709: o
115 <0x70> 0 112 0x70 ascii 28720: p
116 <0x71> 0 113 0x71 ascii 28775: q
117 <0x72> 0 114 0x72 ascii 28712: r
118 <0x73> 0 115 0x73 ascii 28713: s
119 <0x74> 0 116 0x74 ascii 28707: t
120 <0x75> 0 117 0x75 ascii 28718: u
121 <0x76> 0 118 0x76 ascii 28728: v
122 <0x77> 0 119 0x77 ascii 28727: w
123 <0x78> 0 120 0x78 ascii 28744: x
124 <0x79> 0 121 0x79 ascii 28724: y
125 <0x7A> 0 122 0x7A ascii 28764: z
126 <0x7B> 0 123 0x7B ascii 28751: {
127 <0x7C> 0 124 0x7C ascii 28766: |
128 <0x7D> 0 125 0x7D ascii 28752: }
129 <0x7E> 0 126 0x7E ascii 28845: ~
130 <0x7F> 0 127 0x7F ascii 30982: \x7f
195 <0xC0> 0 192 0xC0 unused_utf8
196 <0xC1> 0 193 0xC1 unused_utf8
197 <0xC2> 0 194 0xC2 utf8
198 <0xC3> 0 195 0xC3 utf8
248 <0xF5> 0 245 0xF5 unused_utf8
249 <0xF6> 0 246 0xF6 unused_utf8
250 <0xF7> 0 247 0xF7 unused_utf8
251 <0xF8> 0 248 0xF8 unused_utf8
252 <0xF9> 0 249 0xF9 unused_utf8
253 <0xFA> 0 250 0xFA unused_utf8
254 <0xFB> 0 251 0xFB unused_utf8
255 <0xFC> 0 252 0xFC unused_utf8
256 <0xFD> 0 253 0xFD unused_utf8
257 <0xFE> 0 254 0xFE unused_utf8
258 <0xFF> 0 255 0xFF unused_utf8
31134 \x1c 0.0109402 28 0x1C ascii
31150 \x1e 0.0117701 30 0x1E ascii
31236 \x1d 0.0138711 29 0x1D ascii
237 additional entries above threshold
token_id token indicator ord hex byte_type
29683 \x0c 0.0179352 12 0x0C ascii
30638 \x0b 0.0249671 11 0x0B ascii
244 <0xF1> 0.0750657 241 0xF1 utf8
28713 s 0.0851114 115 0x73 ascii
245 <0xF2> 0.0851321 242 0xF2 utf8
28723 . 0.0874161 46 0x2E ascii
28740 1 0.0899923 49 0x31 ascii
28725 , 0.0932242 44 0x2C ascii
28750 2 0.0956001 50 0x32 ascii
28724 y 0.0968967 121 0x79 ascii
28731 ) 0.0975677 41 0x29 ascii
28735 S 0.0978537 83 0x53 ascii
28708 a 0.100198 97 0x61 ascii
28734 0 0.100932 48 0x30 ascii
28707 t 0.10118 116 0x74 ascii
28706 e 0.101341 101 0x65 ascii
28710 i 0.101846 105 0x69 ascii
28747 : 0.10209 58 0x3A ascii
28802 Y 0.103663 89 0x59 ascii
30841 \x17 0.104708 23 0x17 ascii
28709 o 0.105572 111 0x6F ascii
31217 \x1f 0.106343 31 0x1F ascii
28719 m 0.106589 109 0x6D ascii
28741 A 0.107557 65 0x41 ascii
30453 \x13 0.10797 19 0x13 ascii
28770 3 0.108155 51 0x33 ascii
30935 \x16 0.108467 22 0x16 ascii
28738 T 0.108858 84 0x54 ascii
28757 D 0.108986 68 0x44 ascii
30298 \x12 0.109009 18 0x12 ascii
28711 n 0.109022 110 0x6E ascii
28715 d 0.109509 100 0x64 ascii
30675 \x15 0.109537 21 0x15 ascii
28717 c 0.109739 99 0x63 ascii
28749 E 0.110014 69 0x45 ascii
28733 - 0.110577 45 0x2D ascii
28720 p 0.110955 112 0x70 ascii
28781 4 0.110961 52 0x34 ascii
30517 \x0e 0.111379 14 0x0E ascii
28782 5 0.111676 53 0x35 ascii
28718 u 0.111794 117 0x75 ascii
13 <0x0A> 0.111866 10 0x0A ascii
30698 \x0f 0.111867 15 0x0F ascii
28742 ' 0.111903 39 0x27 ascii
28714 l 0.112549 108 0x6C ascii
28729 k 0.112961 107 0x6B ascii
30314 \x06 0.112971 6 0x06 ascii
30721 \x14 0.113171 20 0x14 ascii
28712 r 0.113257 114 0x72 ascii
28762 O 0.113382 79 0x4F ascii
28755 M 0.114244 77 0x4D ascii
28764 z 0.114876 122 0x7A ascii
28743 C 0.114906 67 0x43 ascii
28726 b 0.115131 98 0x62 ascii
28756 \ 0.115259 92 0x5C ascii
30557 \x11 0.115686 17 0x11 ascii
28722 f 0.11596 102 0x66 ascii
28784 6 0.11648 54 0x36 ascii
28721 g 0.116486 103 0x67 ascii
28769 H 0.11662 72 0x48 ascii
28744 x 0.117057 120 0x78 ascii
28759 N 0.117523 78 0x4E ascii
28737 I 0.117738 73 0x49 ascii
28783 8 0.117775 56 0x38 ascii
28716 h 0.118067 104 0x68 ascii
28765 F 0.118144 70 0x46 ascii
30724 \x04 0.118335 4 0x04 ascii
28739 " 0.118492 34 0x22 ascii
28745 ; 0.118721 59 0x3B ascii
28753 P 0.118816 80 0x50 ascii
28758 L 0.119115 76 0x4C ascii
28730 _ 0.11997 95 0x5F ascii
28777 G 0.120017 71 0x47 ascii
28793 ] 0.120238 93 0x5D ascii
28796 K 0.120359 75 0x4B ascii
28728 v 0.120778 118 0x76 ascii
28732 ( 0.12091 40 0x28 ascii
28727 w 0.121242 119 0x77 ascii
28754 R 0.121461 82 0x52 ascii
28751 { 0.121626 123 0x7B ascii
30969 \x19 0.121797 25 0x19 ascii
28767 > 0.121902 62 0x3E ascii
28774 9 0.122193 57 0x39 ascii
28787 7 0.122342 55 0x37 ascii
28760 B 0.122497 66 0x42 ascii
28779 U 0.122887 85 0x55 ascii
28752 } 0.12315 125 0x7D ascii
28780 W 0.123712 87 0x57 ascii
30963 \x07 0.124607 7 0x07 ascii
28828 Z 0.12497 90 0x5A ascii
28790 V 0.125895 86 0x56 ascii
224 <0xDD> 0.126424 221 0xDD utf8
28768 j 0.126452 106 0x6A ascii
28748 / 0.126572 47 0x2F ascii
28804 ? 0.127466 63 0x3F ascii
28814 X 0.128276 88 0x58 ascii
12 <0x09> 0.129375 9 0x09 ascii
28736 * 0.129478 42 0x2A ascii
30555 \x18 0.130593 24 0x18 ascii
28808 ! 0.130781 33 0x21 ascii
30388 \x10 0.131677 16 0x10 ascii
30550 \x05 0.131906 5 0x05 ascii
30662 \x03 0.131952 3 0x03 ascii
28775 q 0.133587 113 0x71 ascii
28771 # 0.133623 35 0x23 ascii
28798 J 0.134013 74 0x4A ascii
28776 $ 0.13496 36 0x24 ascii
28792 [ 0.135907 91 0x5B ascii
28746 = 0.136915 61 0x3D ascii
226 <0xDF> 0.138305 223 0xDF utf8
28824 Q 0.140546 81 0x51 ascii
225 <0xDE> 0.140859 222 0xDE utf8
30246 \x1b 0.142965 27 0x1B ascii
30551 \x02 0.14327 2 0x02 ascii
28832 ` 0.143591 96 0x60 ascii
28766 | 0.144523 124 0x7C ascii
28789 < 0.144997 60 0x3C ascii
247 <0xF4> 0.145228 244 0xF4 utf8
3 <0x00> 0.146657 0 0x00 ascii
28806 + 0.146756 43 0x2B ascii
28815 ^ 0.148671 94 0x5E ascii
29534 \x01 0.148759 1 0x01 ascii
131 <0x80> 0.149107 128 0x80 utf8
28845 ~ 0.151332 126 0x7E ascii
223 <0xDC> 0.151333 220 0xDC utf8
30759 \x1a 0.152042 26 0x1A ascii
163 <0xA0> 0.153763 160 0xA0 utf8
233 <0xE6> 0.154005 230 0xE6 utf8
28823 % 0.1573 37 0x25 ascii
28800 & 0.158935 38 0x26 ascii
28818 @ 0.160446 64 0x40 ascii
175 <0xAC> 0.160779 172 0xAC utf8
179 <0xB0> 0.161513 176 0xB0 utf8
147 <0x90> 0.161731 144 0x90 utf8
139 <0x88> 0.161808 136 0x88 utf8
152 <0x95> 0.162132 149 0x95 utf8
178 <0xAF> 0.162318 175 0xAF utf8
229 <0xE2> 0.162322 226 0xE2 utf8
167 <0xA4> 0.162684 164 0xA4 utf8
137 <0x86> 0.162724 134 0x86 utf8
159 <0x9C> 0.162936 156 0x9C utf8
184 <0xB5> 0.163269 181 0xB5 utf8
180 <0xB1> 0.163326 177 0xB1 utf8
155 <0x98> 0.163463 152 0x98 utf8
174 <0xAB> 0.163657 171 0xAB utf8
219 <0xD8> 0.164266 216 0xD8 utf8
161 <0x9E> 0.164447 158 0x9E utf8
168 <0xA5> 0.164498 165 0xA5 utf8
173 <0xAA> 0.16468 170 0xAA utf8
140 <0x89> 0.164763 137 0x89 utf8
134 <0x83> 0.164894 131 0x83 utf8
144 <0x8D> 0.164988 141 0x8D utf8
169 <0xA6> 0.164992 166 0xA6 utf8
181 <0xB2> 0.165137 178 0xB2 utf8
171 <0xA8> 0.165476 168 0xA8 utf8
177 <0xAE> 0.165546 174 0xAE utf8
135 <0x84> 0.165587 132 0x84 utf8
138 <0x87> 0.165841 135 0x87 utf8
187 <0xB8> 0.166033 184 0xB8 utf8
188 <0xB9> 0.16611 185 0xB9 utf8
211 <0xD0> 0.166134 208 0xD0 utf8
157 <0x9A> 0.166502 154 0x9A utf8
183 <0xB4> 0.166523 180 0xB4 utf8
148 <0x91> 0.166652 145 0x91 utf8
191 <0xBC> 0.166796 188 0xBC utf8
143 <0x8C> 0.167209 140 0x8C utf8
166 <0xA3> 0.167269 163 0xA3 utf8
154 <0x97> 0.167477 151 0x97 utf8
176 <0xAD> 0.16764 173 0xAD utf8
227 <0xE0> 0.167941 224 0xE0 utf8
170 <0xA7> 0.168105 167 0xA7 utf8
189 <0xBA> 0.168228 186 0xBA utf8
164 <0xA1> 0.168339 161 0xA1 utf8
185 <0xB6> 0.168706 182 0xB6 utf8
141 <0x8A> 0.168821 138 0x8A utf8
162 <0x9F> 0.168885 159 0x9F utf8
132 <0x81> 0.169029 129 0x81 utf8
158 <0x9B> 0.169218 155 0x9B utf8
172 <0xA9> 0.169241 169 0xA9 utf8
182 <0xB3> 0.169279 179 0xB3 utf8
190 <0xBB> 0.16954 187 0xBB utf8
145 <0x8E> 0.169548 142 0x8E utf8
193 <0xBE> 0.169603 190 0xBE utf8
153 <0x96> 0.169723 150 0x96 utf8
234 <0xE7> 0.169944 231 0xE7 utf8
136 <0x85> 0.170296 133 0x85 utf8
232 <0xE5> 0.170374 229 0xE5 utf8
151 <0x94> 0.170403 148 0x94 utf8
160 <0x9D> 0.170445 157 0x9D utf8
186 <0xB7> 0.17049 183 0xB7 utf8
142 <0x8B> 0.170869 139 0x8B utf8
150 <0x93> 0.171165 147 0x93 utf8
31129 \x08 0.171379 8 0x08 ascii
149 <0x92> 0.171539 146 0x92 utf8
192 <0xBD> 0.171675 189 0xBD utf8
133 <0x82> 0.171695 130 0x82 utf8
156 <0x99> 0.171773 153 0x99 utf8
212 <0xD1> 0.171788 209 0xD1 utf8
235 <0xE8> 0.171824 232 0xE8 utf8
239 <0xEC> 0.172081 236 0xEC utf8
221 <0xDA> 0.172186 218 0xDA utf8
165 <0xA2> 0.172571 162 0xA2 utf8
215 <0xD4> 0.172914 212 0xD4 utf8
146 <0x8F> 0.173752 143 0x8F utf8
194 <0xBF> 0.173923 191 0xBF utf8
243 <0xF0> 0.174015 240 0xF0 utf8
238 <0xEB> 0.175265 235 0xEB utf8
228 <0xE1> 0.17592 225 0xE1 utf8
213 <0xD2> 0.177153 210 0xD2 utf8
240 <0xED> 0.178534 237 0xED utf8
216 <0xD5> 0.179047 213 0xD5 utf8
237 <0xEA> 0.179785 234 0xEA utf8
220 <0xD9> 0.179819 217 0xD9 utf8
246 <0xF3> 0.179979 243 0xF3 utf8
236 <0xE9> 0.181481 233 0xE9 utf8
218 <0xD7> 0.181902 215 0xD7 utf8
214 <0xD3> 0.18501 211 0xD3 utf8
242 <0xEF> 0.185461 239 0xEF utf8
231 <0xE4> 0.185838 228 0xE4 utf8
217 <0xD6> 0.185873 214 0xD6 utf8
201 <0xC6> 0.186931 198 0xC6 utf8
203 <0xC8> 0.187043 200 0xC8 utf8
222 <0xDB> 0.18706 219 0xDB utf8
202 <0xC7> 0.189307 199 0xC7 utf8
208 <0xCD> 0.191241 205 0xCD utf8
209 <0xCE> 0.191477 206 0xCE utf8
204 <0xC9> 0.193493 201 0xC9 utf8
206 <0xCB> 0.195525 203 0xCB utf8
230 <0xE3> 0.196557 227 0xE3 utf8
241 <0xEE> 0.196669 238 0xEE utf8
207 <0xCC> 0.197093 204 0xCC utf8
210 <0xCF> 0.198095 207 0xCF utf8
199 <0xC4> 0.198415 196 0xC4 utf8
200 <0xC5> 0.198902 197 0xC5 utf8
205 <0xCA> 0.1997 202 0xCA utf8
30982 \x7f 0.199901 127 0x7F ascii
28801 \r 0.208373 13 0x0D ascii
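In the byte table above, every `<0xXX>` byte-fallback row has a token ID equal to its byte value plus 3. For example, `<0x00>` is 3, `<0x0A>` is 13 and `<0xDD>` is 224, which is consistent with IDs 0 to 2 being `<unk>`, `<s>` and `</s>`. The last column labels bytes below 0x80 as `ascii` and the rest as `utf8`. The sketch below reproduces that mapping under that layout assumption; `byte_fallback_id` and `byte_label` are hypothetical helper names, not tokenizer APIs.

```python
# Sketch: how byte-fallback token IDs and the ascii/utf8 label relate to the
# byte value in the table above. Assumption (consistent with the rows shown,
# not read from the tokenizer): IDs 0-2 are <unk>, <s>, </s>, and the 256
# byte pieces <0x00>..<0xFF> follow immediately as IDs 3..258.

NUM_SPECIAL_TOKENS = 3  # <unk>=0, <s>=1, </s>=2


def byte_fallback_id(byte_value: int) -> int:
    """Return the (assumed) token ID of the byte piece <0xXX> for byte_value."""
    if not 0 <= byte_value <= 0xFF:
        raise ValueError(f"not a single byte: {byte_value}")
    return byte_value + NUM_SPECIAL_TOKENS


def byte_label(byte_value: int) -> str:
    """Label used in the last column: 'ascii' below 0x80, else 'utf8'."""
    return "ascii" if byte_value < 0x80 else "utf8"


# (token_id, byte value) pairs copied from <0xXX> rows of the table.
TABLE_ROWS = [(3, 0x00), (12, 0x09), (13, 0x0A), (131, 0x80), (224, 0xDD), (247, 0xF4)]

for token_id, byte_value in TABLE_ROWS:
    # Each table row should match the assumed offset-by-3 layout.
    assert byte_fallback_id(byte_value) == token_id
    print(f"<0x{byte_value:02X}> id={token_id} dec={byte_value} {byte_label(byte_value)}")
```

The offset applies only to `<0xXX>` rows. Rows such as `28738 T` or `30298 \x12` are ordinary single-character vocabulary pieces, so their IDs do not follow this rule.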

Special tokens

1 entry below threshold of 0.015

token_id token indicator max_prob
0 <unk> 0 1e-07
2 additional entries above threshold
token_id token indicator max_prob
2 </s> 0.0777459 0.019
1 <s> 0.144863
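The special-token table is split around a single indicator threshold (0.015). The indicator is the E_in L2 norm named in the indicator summary. The sketch below reproduces that split from the three indicator values listed. The strict `<` comparison and the helper name `split_by_threshold` are assumptions, not taken from the report's tooling.

```python
# Sketch: reproduce the split of special tokens around the 0.015 threshold.
# Indicator values are copied from the table; using a strict "<" comparison
# is an assumption.

SPECIAL_THRESHOLD = 0.015

# token_id -> (token, E_in L2-norm indicator)
SPECIAL_TOKENS = {
    0: ("<unk>", 0.0),
    1: ("<s>", 0.144863),
    2: ("</s>", 0.0777459),
}


def split_by_threshold(tokens, threshold):
    """Return (below, above) lists of (token_id, token, indicator), sorted by indicator."""
    rows = sorted(
        ((tid, name, ind) for tid, (name, ind) in tokens.items()),
        key=lambda row: row[2],
    )
    below = [row for row in rows if row[2] < threshold]
    above = [row for row in rows if row[2] >= threshold]
    return below, above


below, above = split_by_threshold(SPECIAL_TOKENS, SPECIAL_THRESHOLD)
print(f"{len(below)} entry below threshold of {SPECIAL_THRESHOLD}")
print(f"{len(above)} additional entries above threshold")
```

Sorting ascending by indicator matches the order of the rows above threshold, with `</s>` before `<s>`.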