meta_llama_Llama_2_7b_hf.md

Report for meta-llama/Llama-2-7b-hf

Model info

  • Model Info:
    • Tied embeddings: False
    • LM head uses bias: False
    • Embeddings shape: [32000, 4096]
  • Tokenizer Info:
    • Vocab Size: 32000
    • Tokenizer Class: LlamaTokenizer
    • Tokenizer Type: BPE
    • Bytes handling: Byte Fallback
    • Token for verification prompt building: springframework
    • Token id for verification prompt building: 6688
  • Indicator summary:
    • Indicator for under-trained tokens: E_{in} L2 Norm
    • Overall distribution: 1.080 +/- 0.109
  • Detected Token Counts:
    • Number of tested under-trained tokens: 639, 551 non-special, 19 below p = 0.01 threshold, 11 below soft indicator threshold
    • Number of single byte tokens: 351, of which 115 below indicator threshold
    • Number of special tokens: 0, of which 0 below indicator threshold
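
The indicator named above, the L2 norm of each input-embedding row, can be sketched in a few lines. This is a minimal illustration, not the report's actual pipeline: the toy matrix, the function names, and the use of a population standard deviation for the "+/-" figure are all assumptions.

```python
import math
import statistics


def l2_norm(row):
    """Euclidean length of one embedding row (one row per token id)."""
    return math.sqrt(sum(x * x for x in row))


def indicator_summary(embeddings):
    """Return per-token norms plus their mean and spread.

    Assumption: the spread is a population standard deviation; the report
    does not say which estimator produced its "+/-" value.
    """
    norms = [l2_norm(row) for row in embeddings]
    return norms, statistics.mean(norms), statistics.pstdev(norms)


# Hypothetical stand-in for a [vocab_size, hidden_dim] embedding matrix.
toy_embeddings = [
    [0.6, 0.8, 0.0],      # norm 1.0
    [0.0, 1.2, 0.0],      # norm 1.2
    [0.003, 0.004, 0.0],  # norm 0.005, an unusually small row
    [1.0, 0.0, 0.0],      # norm 1.0
]

norms, mean, spread = indicator_summary(toy_embeddings)
# Token id with the smallest norm, i.e. the strongest candidate.
lowest_id = min(range(len(norms)), key=norms.__getitem__)
```

In this toy case the third row stands out because its norm is far below the others, which is the pattern the indicator column in the tables below is meant to expose.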

Under-trained token indicators plot

[Figure: indicators scatter plots]

Verification plot

[Figure: verification plot]

Under-trained token verification results

11 entries below threshold of 0.232

token_id token indicator max_prob in_other_tokens
28574 ▁Mediabestanden 0.0286962 1.5e-08
20609 ▁Portály 0.0955797 1.2e-06
3798 oreferrer 0.112781 2e-05 ▁noreferrer, noreferrer
12731 ederbörd 0.114027 3.6e-06 ▁nederbörd, nederbörd, ▁årsnederbörd
28354 ▁Расподела 0.135537 0.00013
28633 nederbörd 0.154854 0.0024 ▁årsnederbörd
31664 ߬ 0.175596 0.015
23313 Obrázky 0.202286 0.46
11193 ▁Normdaten 0.209872 0.034
12882 ITableView 0.211368 0.91 ▁UITableView, UITableView
9831 ▁челов 0.228664 0.95 ▁человек, ▁челове, ▁человека
540 additional entries above threshold
token_id token indicator max_prob in_other_tokens
28642 ▁regnig 0.23206 0.0012 ▁regnigaste
19539 ▁demsel 0.236618 9.8e-05 ▁demselben
28623 ▁Genomsnitt 0.238265 0.011 ▁Genomsnittlig
30772 0.253437 0.98
20486 tatywna 0.266416 0.0047 ▁autorytatywna
31899 0.276749 0.84
31477 0.278053 0.88
16056 љашње 0.378021 0.0092 ▁Спољашње
31772 0.385905 0.99
10688 ▁gepublic 0.391154 0.061 ▁gepubliceerd
31926 𝓝 0.398527 0.79
18596 ципа 0.400826 0.99 ніципа, ніципалі, ▁муніципалі, ниципа, ▁муниципа
31483 0.401468 1
12472 ateien 0.418931 0.36 ▁Audiodateien
30994 𝕜 0.424236 0.74
31884 ѫ 0.436651 0.99
31591 0.460256 0.98
278 ▁the 0.466007 1 ▁there, ▁then, ▁they, ▁them, ▁their, ...
31589 0.467678 1
16013 temperaturen 0.474151 1 eltemperaturen
27706 ]{' 0.49282 0.92
20448 ▁Kontrola 0.496692 0.99
304 ▁to 0.496999 1 ▁tot, ▁too, ▁top, ▁total, ▁took, ...
9236 ▁pobla 0.501446 0.68 ▁población, ▁població
16196 textt 0.502152 1 texttt
310 ▁of 0.503424 1 ▁off, ▁offic, ▁often, ▁offer, ▁official, ...
297 ▁in 0.510731 1 ▁inst, ▁int, ▁into, ▁inter, ▁incl, ...
27660 ckså 0.512958 0.012 ▁också
23767 egyzetek 0.514052 0.055 ▁Jegyzetek
29871 0.52217 1
21042 lês 0.525851 0.89 ▁inglês
2639 Portail 0.537209 1
322 ▁and 0.539863 1 ▁android, ▁andere, ▁anderen, ▁anderem, ▁andra
21721 archivi 0.540432 1 archiviato
27600 prilis 0.544835 0.89 ▁április
26335 llaços 0.544876 0.066 ▁Enllaços
12479 ▁Audiod 0.546042 1 ▁Audiodateien
31216 0.546382 0.98
28294 usztus 0.552103 0.98 ▁augusztus
11766 хівовано 0.556475 0.021 Архівовано
26498 >\<^ 0.563593 0.82
31626 0.563609 0.98
263 ▁a 0.566613 1 ▁and, ▁an, ▁al, ▁as, ▁at, ...
16916 ▁invån 0.579019 0.07 ▁invånare
30010 0.582847 1
23441 któber 0.585471 0.066 ▁október
30267 0.591219 1
30841 0.596323 0.98
31528 0.600201 1
7368 ября 0.601386 0.059 ▁сентября, ▁октября, ▁ноября
30935 0.610601 0.99
31336 ʐ 0.613597 0.92
338 ▁is 0.614278 1 ▁iss, ▁ist, ▁issue, ▁isn, ▁issues, ...
20072 ywna 0.62077 0.9 tatywna, ▁autorytatywna
31625 0.622474 0.33
6663 ▁Einzelnach 0.622708 0.062 ▁Einzelnachweise
393 ▁that 0.624136 1 ▁thats
363 ▁for 0.625213 1 ▁form, ▁format, ▁former, ▁force, ▁fort, ...
31771 0.6255 1
31489 0.626007 0.99
20696 adratkil 0.628716 0.033 adratkilometer
31586 0.62956 0.98
408 ▁as 0.629896 1 ▁ass, ▁ask, ▁assign, ▁assum, ▁associ, ...
10553 \<^ 0.631176 0.98 >\<^
31806 ӏ 0.638546 0.95
313 ▁( 0.6398 1 ▁(*, ▁($, ▁(\, ▁(), ▁(", ...
856 ... 0.640126 1 ▁..., ...., ........, ...), ▁...., ...
31956 0.641976 0.97
23280 ździer 0.644954 0.027 ▁paździer, ▁października
10775 ▁formatt 0.649433 1 ▁formatting, ▁formatted
31128 0.650196 0.9
373 ▁on 0.653508 1 ▁one, ▁only, ▁once, ▁ont, ▁onder, ...
25145 ▁kwiet 0.656178 0.22 ▁kwietnia
292 ing 0.656331 1 ▁using, ings, tring, ning, ating, ...
30770 0.657743 0.99
31670 ʑ 0.661318 0.98
306 ▁I 0.661876 1 ▁In, ▁It, ▁If, ▁Is, ▁Il, ...
491 ▁by 0.665202 1 ▁bytes, ▁byte, ▁byl, ▁był, ▁byla, ...
17835 ▁Станов 0.666019 1 ▁Становништво
14545 ewnę 0.666601 0.01 ewnętrz, ▁zewnętrz, ▁zewnętrzne
411 ▁with 0.667623 1 ▁without, ▁within, ▁withdraw
31178 0.667951 1
31892 ҡ 0.66824 1
31638 0.671281 0.4
526 ▁are 0.675708 1 ▁area, ▁aren, ▁areas
27563 datei 0.679052 1 ▁Normdatei
366 ▁you 0.682033 1 ▁your, ▁young, ▁yourself, ▁youth, ▁yours, ...
31808 0.684182 0.98
450 ▁The 0.684837 1 ▁There, ▁Then, ▁They, ▁These, ▁Therefore, ...
23247 ▁dátum 0.68489 1 ▁dátummal
13606 oreign 0.685292 0.98 ▁Foreign, Foreign
28416 ▁Мексичка 0.685299 4.7e-06
15394 usetts 0.685612 0.77 achusetts, ▁Massachusetts
287 ed 0.689768 1 ated, ▁need, led, ied, red, ...
22918 prüft 0.69104 0.88 ▁geprüft
31766 0.693351 1
19330 ▁Википеди 0.694212 0.99 ▁Википедии
31794 ˠ 0.694586 0.98
372 ▁it 0.696264 1 ▁its, ▁item, ▁itself, ▁iter, ▁items, ...
15022 ▁zewnętrz 0.698878 0.97 ▁zewnętrzne
376 ▁" 0.700889 1 ▁"$, ▁"", ▁"/, ▁"\, ▁",, ...
367 ▁be 0.701545 1 ▁bet, ▁been, ▁bec, ▁bel, ▁because, ...
6009 perties 0.701874 0.99 properties, Properties, ▁Properties
26194 ▁Савез 0.701947 0.99 ▁Савезне
471 ▁was 0.702149 1 ▁wasn, ▁waste
515 ▁from 0.703087 1
30098 0.703405 1
6002 entication 0.703582 0.98 ▁authentication, Authentication, authentication, ▁Authentication
262 in 0.70458 1 ing, ▁in, ine, egin, begin, ...
1346 ▁“ 0.704837 1
267 es 0.705122 1 est, ess, ▁des, ies, estion, ...
414 ers 0.705673 1 vers, ▁vers, erson, ivers, ▁version, ...
467 ). 0.705857 1 ()., ")., ')., .)., })., ...
31515 0.706253 1
30330 0.706472 1
319 ▁A 0.706813 1 ▁An, ▁Answer, ▁Ar, ▁Al, ▁As, ...
31673 0.706914 1
1966 \\ 0.70887 1 ▁\\, }\\, )\\, :\\
14755 ewnętrz 0.708959 0.024 ▁zewnętrz, ▁zewnętrzne
13591 ongodb 0.709042 0.19 mongodb, ▁mongodb
472 ▁at 0.71051 1 ▁att, ▁attempt, ▁attack, ▁attribute, ▁attributes, ...
28409 Sito 0.710765 1
25104 Zygote 0.710799 1 ZygoteInit
511 ), 0.710897 1 (),, "),, '),, )),, }),, ...
9611 ViewById 0.711494 0.96 findViewById, ▁findViewById
20739 ▁надмор 0.711682 0.27 ▁надморској
385 ▁an 0.712137 1 ▁any, ▁answer, ▁android, ▁another, ▁ang, ...
31777 ѐ 0.71393 0.98
540 ▁he 0.716529 1 ▁her, ▁hel, ▁here, ▁help, ▁het, ...
591 ▁we 0.717607 1 ▁were, ▁well, ▁web, ▁wer, ▁went, ...
31444 0.718143 0.98
445 ▁this 0.71933 1
508 ▁can 0.720757 1 ▁cannot, ▁cant, ▁candid, ▁canvas, ▁cancel, ...
31281 0.720993 1
316 ▁de 0.721067 1 ▁des, ▁der, ▁del, ▁def, ▁den, ...
1576 The 0.721348 1 ▁There, ▁Then, ▁They, ▁These, ▁Therefore, ...
470 ▁or 0.723038 1 ▁org, ▁orig, ▁order, ▁original, ▁organ, ...
448 ▁- 0.72385 1 ▁--, ▁->, ▁-->, ▁---, ▁-\, ...
27061 ▁Резултати 0.724027 0.91
512 ▁In 0.724611 1 ▁Ind, ▁Intern, ▁Inst, ▁Int, ▁Inter, ...
261 er 0.725054 1 ter, ver, ere, ers, ber, ...
1213 ." 0.725721 1 ...", .");, .",, .")
756 ▁has 0.729532 1 ▁hash, ▁hasta, ▁hast, ▁hasn
23658 ▁geprüft 0.730693 0.89
30024 0.732824 1
896 ▁they 0.733075 1
28142 ightarrow 0.733924 0.76 trightarrow
31646 0.734102 0.76
28090 ▁Савезне 0.734794 0.00046
277 it 0.735217 1 ▁it, ith, ▁with, ity, ite, ...
30560 0.735275 1
31585 0.736171 0.15
360 ▁D 0.73682 1 ▁De, ▁Die, ▁Do, ▁Der, ▁Des, ...
636 .. 0.736835 1 ..., ▁..., ...., ▁.., ../, ...
31913 0.737005 0.99
10400 ▁Mitg 0.737324 1 ▁Mitglied, ▁Mitglieder
505 ▁have 0.737698 1 ▁haven, ▁havet
10164 loyee 0.737993 0.92 ▁employee, Employee, ▁employees, ▁Employee, employee
26964 ▁Хронологи 0.739664 0.91 ▁Хронологија
264 en 0.739835 1 ent, end, ment, ▁en, ▁Comment, ...
272 or 0.741115 1 ▁for, ort, ▁or, ore, ord, ...
451 ▁not 0.741546 1 ▁nothing, ▁note, ▁notice, ▁noticed, ▁notes, ...
284 al 0.742362 1 ▁al, all, ▁all, ial, ally, ...
341 ▁M 0.74241 1 ▁Mar, ▁My, ▁Me, ▁Man, ▁Mon, ...
30015 0.7431 0.99
739 ▁It 0.743645 1 ▁Ital, ▁Its, ▁Italian, ▁Item, ▁Italia, ...
273 an 0.744107 1 ▁and, ▁an, and, ant, ▁can, ...
2023 ▁... 0.744117 1 ▁...., ▁...)
265 on 0.744387 1 ion, ation, ▁on, ▁con, ction, ...
15571 ▁февра 0.744496 0.045 ▁февраля
315 ▁C 0.746425 1 ▁Com, ▁Comment, ▁Ch, ▁Con, ▁Col, ...
350 ▁B 0.74702 1 ▁But, ▁Be, ▁Br, ▁Bo, ▁Bar, ...
525 ▁' 0.747153 1 ▁'', ▁'/, ▁'\, ▁'<, ▁',, ...
317 ▁S 0.747435 1 ▁St, ▁Se, ▁Sch, ▁So, ▁Sh, ...
3178 .” 0.747608 0.98
785 ▁– 0.747623 1
31672 0.747648 0.97
271 at 0.748533 1 ation, ▁that, ate, ▁at, ath, ...
15407 ▁statunit 0.749229 0.43 ▁statunitense
365 ▁L 0.749477 1 ▁Le, ▁La, ▁List, ▁Les, ▁Li, ...
31342 0.74979 0.95
323 ▁T 0.749881 1 ▁The, ▁Th, ▁This, ▁Tags, ▁Tr, ...
20638 ungsseite 0.75109 0.67
349 ▁P 0.751925 1 ▁Pro, ▁Par, ▁Pr, ▁Pl, ▁Ph, ...
402 ▁G 0.752072 1 ▁Gr, ▁Ge, ▁Gu, ▁Go, ▁Gener, ...
31269 0.752595 0.99
390 ▁R 0.753104 1 ▁Re, ▁Ro, ▁Reg, ▁Res, ▁Rec, ...
27918 ▁Хронологија 0.755252 0.00093
31921 0.755666 0.94
30929 0.755818 1
31896 0.756161 0.027
31889 0.756423 0.97
31663 Ս 0.756946 1
383 ▁F 0.757435 1 ▁For, ▁Fran, ▁Fl, ▁Fil, ▁France, ...
24366 ▁sierp 0.757604 0.76 ▁sierpnia
31800 ܝ 0.757795 0.83
16153 gresql 0.758062 0.86 ▁postgresql, postgresql
20716 ▁Begriffsklär 0.758192 0.061
30375 0.759576 0.97
275 is 0.760574 1 ▁is, ist, ▁this, ▁his, ish, ...
596 ▁your 0.760705 1 ▁yourself, ▁yours
379 ▁H 0.761464 1 ▁He, ▁How, ▁Here, ▁However, ▁Her, ...
12867 лання 0.76216 0.17 силання, ▁Посилання
1763 ▁To 0.762175 1 ▁Tom, ▁Tor, ▁Tour, ▁Top, ▁Tod, ...
1001 ER 0.762692 1 HER, TER, HERE, VER, ▁WHERE, ...
294 as 0.763092 1 ▁as, ass, ▁was, ase, ast, ...
405 ▁N 0.764016 1 ▁New, ▁No, ▁Not, ▁Ne, ▁Now, ...
18206 braio 0.764144 0.3 ▁febbraio
31289 0.764195 1
674 ▁will 0.764756 1 ▁willing
476 ▁K 0.765823 1 ▁Kar, ▁King, ▁Ke, ▁Kir, ▁Kon, ...
279 ar 0.765827 1 art, ▁are, ard, ▁ar, are, ...
517 to 0.766292 1 ton, ▁into, ator, ato, ustom, ...
399 ▁W 0.76696 1 ▁Wh, ▁We, ▁What, ▁When, ▁Web, ...
382 ▁E 0.767649 1 ▁En, ▁Ex, ▁El, ▁Er, ▁Ed, ...
5129 ▁‘ 0.76785 1
22755 źdz 0.767895 0.69 ździer, ▁paździer, ▁października
892 ▁were 0.768279 1 ▁wereld
1838 ▁doesn 0.768324 0.97 ▁doesnt
276 re 0.768364 1 ▁re, ere, ore, ▁are, ire, ...
31705 0.768417 0.96
23795 ▁paździer 0.769237 0.1 ▁października
30957 0.769576 0.9
1152 ▁For 0.769649 1 ▁Form, ▁Fort, ▁Force, ▁Ford, ▁Forest, ...
22768 ▁жовт 0.769739 0.96 ▁жовтня
1334 ▁We 0.769835 1 ▁Web, ▁West, ▁Well, ▁Weblinks, ▁Welt, ...
590 ▁my 0.769838 1 ▁mysql, ▁myself, ▁mysq, ▁myst, ▁mysqli, ...
670 ▁his 0.770169 1 ▁histor, ▁history, ▁hist, ▁historia, ▁historical, ...
20070 ▁autory 0.770414 0.56 ▁autorytatywna
565 ▁if 0.771369 1 ▁iframe
403 ate 0.771448 1 ated, ater, ates, date, ▁create, ...
2277 ## 0.772135 1 ▁####, ####, ########, ################, ▁#####
293 ic 0.772614 1 ich, lic, ▁which, ice, ublic, ...
17391 ▁савез 0.77285 0.88 ▁савезној
300 et 0.77314 1 eth, ▁et, get, ▁get, ▁set, ...
314 am 0.773255 1 ame, ▁am, name, ample, ▁same, ...
17047 omsnitt 0.773649 0.11 ▁genomsnitt, ▁Genomsnitt, ▁Genomsnittlig
27900 ▁eredetiből 0.773994 0.0047
541 ▁but 0.77416 1 ▁button, ▁buttons
1964 AL 0.774616 1 VAL, ALL, ALSE, ▁VAL, ▁AL, ...
30347 0.774968 1
424 ant 0.77513 1 ▁want, ante, ants, anti, ▁ant, ...
355 end 0.775139 1 ▁end, ender, ends, enden, ending, ...
31764 ɫ 0.775702 1
27566 sime 0.777157 1 simeq
333 id 0.778476 1 ide, roid, ▁id, ider, ▁did, ...
1016 ▁don 0.778596 1 ▁done, ▁dont, ▁donde, ▁donc, ▁donn, ...
1699 ," 0.778662 1 ",", ▁","
4214 ING 0.778983 1 STRING, ARNING
362 ation 0.779225 1 ations, ational, lication, ization, ▁application, ...
359 os 0.77998 1 ost, ose, ▁pos, pos, ▁los, ...
31614 0.780165 0.96
3995 ,” 0.780697 1
295 el 0.78108 1 ell, ▁el, ▁del, iel, elf, ...
31372 0.781548 0.99
17578 estanden 0.781648 0.81 abestanden, ▁Mediabestanden
31879 0.782221 1
280 le 0.782356 1 ▁le, ile, able, ple, ble, ...
20870 kreich 0.782993 0.46 ▁Frankreich
9035 férés 0.783171 0.28 ozzáférés, Hozzáférés
869 ▁. 0.783199 1 ▁..., ▁.., ▁./, ▁...., ▁.=, ...
301 ▁l 0.784529 1 ▁la, ▁le, ▁li, ▁lo, ▁like, ...
31601 0.784639 0.6
701 ▁up 0.784693 1 ▁upon, ▁update, ▁upd, ▁updated, ▁upload, ...
425 ▁la 0.784794 1 ▁last, ▁las, ▁lar, ▁later, ▁large, ...
1009 ▁their 0.78501 1
309 il 0.78523 1 ill, ile, ▁will, ail, ild, ...
23910 ритор 0.785594 0.69 ▁територ
1183 ▁she 0.785908 1 ▁shell, ▁sheet, ▁sheets, ▁shelter, ▁shed, ...
887 ▁You 0.785941 1 ▁Your, ▁Young, ▁YouTube, ▁Youth
1164 ON 0.785945 1 ION, SON, ▁JSON, CON, ▁ON, ...
438 ▁O 0.786866 1 ▁Or, ▁On, ▁One, ▁Ok, ▁Ob, ...
474 ▁i 0.787613 1 ▁im, ▁if, ▁inst, ▁int, ▁into, ...
26641 ▁Мексика 0.787652 0.65
20172 ▁Przyp 0.788312 0.99 ▁Przypisy
24935 ▁RewriteCond 0.788473 0.9
31106 0.78855 0.98
797 In 0.789896 1 ▁Ind, ▁Intern, Ind, ▁Inst, Int, ...
910 ▁This 0.790258 1
31890 ɵ 0.790295 1
30214 0.790455 1
714 ▁out 0.790612 1 ▁output, ▁outside, ▁outer, ▁outputs, ▁outros, ...
8079 ▁OF 0.791113 1
20422 ніципалі 0.791367 0.016 ▁муніципалі
327 ot 0.791589 1 ▁not, oth, ote, ▁other, other, ...
368 ly 0.79164 1 ally, ▁only, ely, ually, ically, ...
1058 ▁who 0.791885 1 ▁whole, ▁whose, ▁whom
270 ▁d 0.791887 1 ▁de, ▁do, ▁des, ▁der, ▁del, ...
31588 0.79226 1
697 ▁one 0.792265 1 ▁ones
1955 OR 0.792583 1 ▁OR, ROR, ORT, ERROR, WOR, ...
332 ur 0.792664 1 our, ure, urn, ▁your, turn, ...
2190 AN 0.793038 1 ▁AND, AND, ANT, ▁AN, ANG, ...
16088 ▁… 0.79315 1
21765 ▁(\< 0.793362 0.98
478 ▁V 0.793369 1 ▁Ver, ▁Val, ▁Vol, ▁View, ▁Vo, ...
296 ent 0.793378 1 ment, ▁Comment, vent, ▁ent, ement, ...
18418 ▁людя 0.794161 0.21 ▁людях
940 ▁He 0.794396 1 ▁Here, ▁Her, ▁Hen, ▁Het, ▁Hel, ...
348 un 0.794674 1 ▁un, ▁und, ound, unction, ung, ...
750 ▁had 0.794701 1 ▁hade, ▁hadn
305 ch 0.794887 1 ich, ach, ▁ch, ▁which, isch, ...
345 ve 0.795093 1 ver, ▁have, ive, vent, ven, ...
26338 ▁Års 0.795769 1 ▁Årsmed
23875 ▁Насеље 0.79712 0.09
902 ▁her 0.797895 1 ▁here, ▁hers, ▁herself, ▁hero, ▁herm, ...
303 st 0.798546 1 est, ▁st, ist, ust, ost, ...
31575 Մ 0.798665 1
328 ad 0.799262 1 ▁ad, ▁had, ▁add, ado, read, ...
4806 We 0.799669 1 ▁Well, ▁Weblinks, ▁Welt, ▁Wer, ▁Wel, ...
2193 ▁That 0.80006 1
260 ▁t 0.800686 1 ▁th, ▁the, ▁to, ▁that, ▁this, ...
3112 It 0.801271 1 ▁Ital, Items, ▁Its, Ital, ▁Italian, ...
27865 ]`. 0.801804 0.59
11628 ▁исполь 0.801844 0.94 ▁использова, ▁использу
289 ▁b 0.801875 1 ▁be, ▁by, ▁but, ▁bet, ▁bo, ...
599 ▁all 0.801946 1 ▁allow, ▁alla, ▁alle, ▁allowed, ▁allows, ...
931 ▁time 0.802436 1 ▁times, ▁timeout, ▁timer, ▁timestamp, ▁timezone
606 ▁и 0.802441 1 ▁из, ▁ин, ▁или, ▁име, ▁исто, ...
386 th 0.802525 1 ith, ▁that, ▁with, ▁this, ath, ...
1177 IN 0.802603 1 ▁IN, ING, OIN, ▁JOIN, INE, ...
1430 EN 0.802872 1 ENT, ▁THEN, ▁END, END, ▁EN, ...
437 ▁do 0.803212 1 ▁does, ▁don, ▁doc, ▁down, ▁doesn, ...
31644 0.803349 0.6
1718 AR 0.804209 1 ART, ▁AR, CHAR, ARN, ARCHAR, ...
518 ▁[ 0.804391 1 ▁[], ▁[[, ▁[`, ▁[', ▁[", ...
326 im 0.80445 1 ▁im, ime, ▁time, ▁sim, ▁import, ...
613 ", 0.804904 1 ",", ▁",, ▁"",, ",\r, }",, ...
1749 ▁our 0.805432 1 ▁ourselves
31226 ە 0.805628 0.96
427 ▁en 0.805708 1 ▁ent, ▁end, ▁enc, ▁entre, ▁eng, ...
1218 ating 0.805737 1 ▁creating, ▁updating, ▁operating, ▁generating, ▁floating, ...
3282 ▁didn 0.805836 0.97 ▁didnt
484 ne 0.806328 1 one, ▁one, ▁new, ener, ▁need, ...
964 ▁into 0.80634 1
13594 ▁янва 0.806481 0.44 ▁января
24264 '}[ 0.806639 0.99
31351 0.806662 0.89
311 de 0.806729 1 ▁de, ode, code, ▁des, ▁der, ...
712 ▁que 0.807002 1 ▁question, ▁query, ▁questions, ▁quel, ▁queries, ...
1307 LE 0.80719 1 LECT, ▁SELECT, ABLE, SELECT, FILE, ...
3508 ▁isn 0.807359 1
342 est 0.807677 1 estion, ▁est, ▁Question, quest, ▁question, ...
269 ▁s 0.807712 1 ▁st, ▁se, ▁su, ▁sh, ▁so, ...
630 ated 0.807867 1 ▁created, ▁related, ▁updated, ▁generated, ▁located, ...
13959 ▁окт 0.807956 0.98 ▁октября
31769 0.808135 0.98
607 ▁which 0.808269 1
1207 ▁make 0.808397 1 ▁makes
3045 .... 0.8084 1 ........, ▁...., ....., ................
371 te 0.808519 1 ate, ite, ated, item, tern, ...
397 od 0.808639 1 code, ▁code, ▁mod, ethod, ▁method, ...
321 ▁e 0.808942 1 ▁en, ▁ex, ▁el, ▁er, ▁et, ...
746 ▁when 0.809821 1 ▁whenever
813 ▁— 0.810015 1
435 ▁J 0.810578 1 ▁Joh, ▁John, ▁Jah, ▁Je, ▁Jan, ...
31923 0.810991 0.99
26782 ▁пописа 0.811647 0.24
1525 RE 0.811674 1 HERE, ▁RE, ▁WHERE, REATE, URE, ...
727 ▁there 0.812044 1 ▁therefore, ▁thereby
577 ▁so 0.81222 1 ▁some, ▁sol, ▁som, ▁son, ▁something, ...
26199 ▁mieszkań 0.812228 0.44 ▁mieszkańców
1078 ates 0.812789 1 ▁States, ▁states, ▁latest, plates, ▁creates, ...
31511 0.813018 1
28906 ▁листопада 0.813436 0.64
1048 ▁about 0.813462 1
31743 ʎ 0.813532 1
381 ir 0.814067 1 ire, irst, ▁first, ▁their, irect, ...
14155 multicol 0.814708 1 multicolumn
1094 ▁As 0.814822 1 ▁Ass, ▁Associ, ▁Association, ▁Ast, ▁Ash, ...
16252 tembre 0.814912 0.51 ▁settembre
282 ▁p 0.814932 1 ▁pro, ▁pr, ▁par, ▁per, ▁pl, ...
413 ▁k 0.815154 1 ▁kn, ▁know, ▁ke, ▁key, ▁km, ...
1299 AT 0.815308 1 ATE, DATE, ATH, ATION, ATA, ...
2890 ES 0.815436 1 UES, QUEST, ▁VALUES, RES, REQUEST, ...
330 ▁g 0.815628 1 ▁get, ▁go, ▁gr, ▁gener, ▁gre, ...
13648 ══ 0.815716 0.99 ════
375 us 0.816083 1 ▁us, ust, ▁use, ous, ▁using, ...
396 ▁# 0.816103 1 ▁##, ▁###, ▁####, ▁#[, ▁#####, ...
5911 bolds 0.81613 1 boldsymbol
583 ies 0.816198 1 ities, cies, ries, ties, ▁dies, ...
344 se 0.816793 1 ▁se, ase, ser, ▁use, ▁ser, ...
388 ay 0.816933 1 ray, ▁way, ays, ▁may, ▁array, ...
274 ▁c 0.817231 1 ▁con, ▁com, ▁can, ▁ch, ▁cont, ...
31643 0.817615 0.99
370 ab 0.817829 1 able, ▁ab, ▁about, abel, ▁table, ...
31119 0.817946 1
3026 ?" 0.817963 1
31983 ɯ 0.818478 1
3352 ED 0.818798 1 ▁ED, ▁EDIT, EDIT, RED, LED, ...
290 om 0.819087 1 ▁com, ▁Com, rom, ▁Comment, com, ...
423 ia 0.819314 1 ial, ian, edia, cial, cia, ...
30925 Қ 0.819377 1
307 ro 0.819471 1 ▁pro, rom, ▁from, ▁ro, ror, ...
353 ▁= 0.819997 1 ▁=>, ▁==, ▁===, ▁=~, ▁=\, ...
975 ▁over 0.820111 1 ▁override, ▁overflow, ▁overall, ▁overhead, ▁overrid, ...
286 ▁m 0.820338 1 ▁my, ▁me, ▁ma, ▁man, ▁mod, ...
443 ▁un 0.820442 1 ▁und, ▁under, ▁una, ▁une, ▁understand, ...
453 ill 0.820582 1 ▁will, ▁still, ille, ▁Will, illa, ...
800 ations 0.820681 1 ▁relations, lications, ▁operations, ifications, ulations, ...
31816 զ 0.820684 0.81
694 ▁no 0.820905 1 ▁now, ▁non, ▁nom, ▁nov, ▁node, ...
30964 0.820945 0.36
1806 IT 0.82126 1 ITE, ▁EDIT, ITY, ITable, EDIT, ...
1551 ▁On 0.82133 1 ▁One, ▁Once, ▁Only, ▁Online, ▁Ont, ...
25726 ▁травня 0.821405 0.34
2965 IC 0.822643 1 ICE, ▁IC, VICE, LIC, ICATION
11547 ▁konn 0.823143 1 ▁konnte, ▁konnten
31499 0.823163 0.99
391 ist 0.823423 1 ▁list, List, ▁dist, ister, ▁ist, ...
26415 ríguez 0.82392 0.5 ▁Rodríguez
31421 ܐ 0.824415 1
1678 ▁▁▁ 0.824427 1 ▁▁▁▁▁▁▁▁▁, ▁▁▁▁▁▁▁, ▁▁▁▁▁▁▁▁▁▁▁, ▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
30301 0.824791 1
31452 0.824802 0.97
26711 gså 0.824894 0.67 ▁også
331 em 0.825236 1 item, ement, lement, ▁em, ▁them, ...
346 ce 0.825365 1 ice, ence, ance, ces, ace, ...
16270 %%%% 0.825592 1 %%%%%%%%
14840 пня 0.826108 0.82 ▁липня, ▁серпня
335 ig 0.826123 1 ight, ign, fig, igh, right, ...
457 ine 0.826473 1 ▁line, line, ined, ines, ▁eine, ...
329 ut 0.826842 1 out, ▁but, put, ▁out, ution, ...
4587 ▁Of 0.827148 1 ▁Off, ▁Official, ▁Office, ▁Offic, ▁Offiz, ...
2688 ▁They 0.827486 1
23217 ▁zvuky 0.82753 0.47
347 ie 0.82766 1 ies, ier, iew, iel, ▁die, ...
943 ors 0.828163 1 ators, ▁errors, ▁lors, ▁alors, ▁horse, ...
562 ac 0.828743 1 act, ace, rac, ▁acc, ▁act, ...
1135 ▁than 0.828831 1 ▁thanks, ▁thank
6677 ?” 0.828935 0.98
298 ▁h 0.829181 1 ▁ha, ▁have, ▁he, ▁his, ▁had, ...
3143 ▁book 0.829556 1 ▁books
7702 ▁daugh 0.829715 1 ▁daughter, ▁daughters
31742 Ė 0.829805 1
31224 0.829911 0.99
6169 .' 0.829961 1
25460 ▁жовтня 0.830206 0.68
392 and 0.830416 1 land, ▁hand, ▁android, stand, ando, ...
23582 ▁RewriteRule 0.830449 1
358 ment 0.830566 1 ▁Comment, ement, ument, lement, ament, ...
26334 ▁квітня 0.83071 0.66
30686 0.831152 1
14262 ▁фев 0.831519 0.98 ▁февра, ▁февраля
14374 ▁апре 0.831674 0.77 ▁апреля
325 ▁v 0.831699 1 ▁val, ▁var, ▁vo, ▁value, ▁von, ...
336 ra 0.831725 1 ran, ray, rac, ▁tra, raph, ...
573 ive 0.832019 1 ative, ivers, ▁Univers, iver, ▁given, ...
481 ap 0.832292 1 ▁app, app, raph, ograph, map, ...
31311 Ű 0.832677 1
786 up 0.832924 1 oup, ▁supp, ▁support, ▁group, ▁super, ...
967 ▁its 0.833083 1 ▁itself
454 ▁le 0.833248 1 ▁les, ▁let, ▁left, ▁leg, ▁less, ...
324 ol 0.833315 1 ▁col, ▁sol, ▁fol, old, col, ...
28498 ▁лютого 0.833476 0.35
1383 ▁Sh 0.833489 1 ▁She, ▁Show, ▁Should, ▁Short, ▁Sher, ...
30003 0.833517 1
749 ance 0.833664 1 ▁instance, ances, ▁France, ▁performance, Instance, ...
30086 0.833705 1
1724 ▁What 0.833762 1
901 ▁more 0.833945 1 ▁moreover
519 able 0.833963 1 ▁table, ables, ▁able, ▁variable, table, ...
7495 ▁TO 0.834014 1 ▁TODO
3919 ENT 0.834107 1 MENT, RENT
490 ▁в 0.83422 1 ▁во, ▁ви, ▁вы, ▁ве, ▁від, ...
398 um 0.834374 1 ument, ▁num, umn, ▁number, ▁document, ...
960 ▁If 0.835028 1
24294 Webachiv 0.835318 0.00025
459 op 0.835417 1 ▁op, rop, ▁proper, ▁open, ▁oper, ...
763 ▁like 0.835448 1 ▁likely, ▁liked
28653 ▁regnigaste 0.835609 0.00099
25229 лтати 0.835853 0.0013 ▁Резултати
669 ▁& 0.836025 1 ▁&&, ▁&=, ▁&\, ▁&=\
25308 стову 0.836086 0.13 ▁використову
479 ge 0.836145 1 age, get, ▁get, ger, ange, ...
281 ▁w 0.836671 1 ▁wh, ▁with, ▁was, ▁we, ▁which, ...
291 ion 0.836854 1 ation, ction, estion, unction, ition, ...
1156 ▁after 0.837179 1 ▁afterwards, ▁afternoon
426 ▁{ 0.837573 1 ▁{\, ▁{\r, ▁{}, ▁{{, ▁{", ...
8098 ATION 0.837579 0.99 ICATION
357 ter 0.837849 1 tern, fter, ▁inter, ater, ▁after, ...
575 ens 0.837913 1 ense, ension, ensch, ▁sense, ▁sens, ...
3235 IS 0.838464 1 ▁IS, ISBN, IST, ▁ISSN, ▁ISO, ...
31518 ǧ 0.838596 0.99
1438 ▁these 0.838762 1
416 ); 0.839243 1 ();, ");, ');, ));, ▁});, ...
446 ke 0.839317 1 ▁like, ▁make, ake, ▁ke, ken, ...
17467 ▁inwon 0.839414 0.43 ▁inwoners
2973 ▁With 0.839776 1 ▁Without, ▁Within
742 ', 0.84032 1 ',', ▁',, ▁'',, ',\r, (',
20824 >\< 0.84041 0.99 >\<^
796 ▁Z 0.840531 1 ▁Ze, ▁Zeit, ▁Zwe, ▁Zw, ▁Zealand, ...
31470 ʋ 0.840643 1
9007 ▁wasn 0.841017 0.99
352 ul 0.841126 1 ould, ult, ▁would, ▁should, ull, ...
16412 .’ 0.841312 0.98
10137 itmap 0.841381 1 Bitmap, ▁Bitmap, ▁bitmap
3035 AD 0.841514 1 ▁AD, READ, ADD, HEAD, ▁ADD, ...
26557 embros 0.842088 0.99 ▁miembros
25840 ▁државе 0.842159 0.51
6824 !! 0.842362 1 !!!, ▁!!
25908 éricaine 0.842415 0.0015 ▁américaine
288 ▁o 0.842477 1 ▁of, ▁on, ▁or, ▁one, ▁ob, ...
1254 ST 0.842482 1 OST, POST, ▁ST, IST, STR, ...
679 ▁get 0.842581 1 ▁getting, ▁gets
3850 !" 0.84281 1 !");
530 ▁An 0.842991 1 ▁Answer, ▁And, ▁Any, ▁Ang, ▁Ant, ...
7654 ▁beskre 0.843145 0.01 ▁beskrevs
302 ▁n 0.84321 1 ▁not, ▁ne, ▁no, ▁new, ▁need, ...
26378 iből 0.843666 0.041 ▁eredetiből
2125 ▁take 0.843817 1 ▁taken, ▁takes
15534 ▁mysq 0.844016 1 ▁mysqli
2725 ION 0.844058 1 ATION, CTION, SION, SSION, VERSION, ...
31215 0.844348 0.99
343 ▁y 0.844906 1 ▁you, ▁your, ▁year, ▁years, ▁yet, ...
364 ▁r 0.845036 1 ▁res, ▁ro, ▁return, ▁run, ▁reg, ...
545 ure 0.845093 1 ature, ▁sure, ures, ured, atures, ...
21437 bráz 0.845228 0.61 brázky, Obrázky
537 ity 0.845604 1 ility, ivity, ality, ▁University, ability, ...
30676 0.845649 0.99
31401 0.845651 0.87
1372 ts 0.845928 1 ats, ets, ments, ots, ants, ...
1367 ID 0.846238 1 ▁ID, UID, VID, ▁IDE, WID, ...
394 ▁al 0.846358 1 ▁all, ▁also, ▁als, ▁already, ▁always, ...
937 ▁first 0.846995 1
2054 indows 0.847046 1 ▁Windows, ▁windows, Windows, windows
886 ings 0.847102 1 ▁things, ▁strings, ▁settings, ettings, Settings, ...
1642 ". 0.847118 0.99 ▁"., ("., ▁$("., ".$, ="., ...
2178 ▁All 0.847183 1 ▁Allen, ▁Alle, ▁Allow, ▁AllMovie, ▁Alliance
26734 ▁Årsmed 0.847424 0.00011
29404 ▁lutego 0.847523 0.77
369 ver 0.847689 1 vers, over, ▁over, ▁ver, very, ...
8643 ”. 0.847762 0.99
557 ak 0.847844 1 ▁make, ake, ▁tak, ▁take, ▁mak, ...
825 ▁what 0.847895 1 ▁whatever
24184 achiv 0.848613 1 Webachiv
4741 CE 0.848924 1 ACE, ICE, ▁CE, CCE, VICE, ...
1126 ▁And 0.849246 1 ▁Android, ▁Andrew, ▁André, ▁Anderson, ▁Anders, ...
285 ▁f 0.849248 1 ▁for, ▁from, ▁function, ▁form, ▁fol, ...
1080 ions 0.849448 1 ctions, ptions, itions, estions, questions, ...
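
The two cut-offs the report mentions (the indicator threshold of 0.232 and the p = 0.01 threshold on max_prob) and the in_other_tokens column can be illustrated with a short sketch. The rows are copied from the table above; the `Row` type, the function names, and the plain substring test used for in_other_tokens are assumptions made for illustration, not the report's own implementation.

```python
from typing import NamedTuple


class Row(NamedTuple):
    token_id: int
    token: str
    indicator: float
    max_prob: float


# A handful of rows copied from the verification table.
ROWS = [
    Row(28574, "▁Mediabestanden", 0.0286962, 1.5e-08),
    Row(3798, "oreferrer", 0.112781, 2e-05),
    Row(12882, "ITableView", 0.211368, 0.91),
    Row(28642, "▁regnig", 0.23206, 0.0012),
    Row(278, "▁the", 0.466007, 1.0),
]

INDICATOR_THRESHOLD = 0.232  # "11 entries below threshold of 0.232"
PROB_THRESHOLD = 0.01        # "below p = 0.01 threshold"


def below_indicator(rows, threshold=INDICATOR_THRESHOLD):
    """Rows whose indicator falls under the indicator threshold."""
    return [r for r in rows if r.indicator < threshold]


def below_prob(rows, threshold=PROB_THRESHOLD):
    """Rows whose max_prob falls under the probability threshold."""
    return [r for r in rows if r.max_prob < threshold]


def containing_tokens(token, vocab):
    """Other vocabulary entries that contain `token` as a substring.

    Assumption: in_other_tokens is a simple substring match.
    """
    return [t for t in vocab if token in t and t != token]


flagged = below_indicator(ROWS)
low_prob = below_prob(ROWS)
others = containing_tokens(
    "oreferrer", ["oreferrer", "▁noreferrer", "noreferrer", "▁the"]
)
```

The two filters do not select the same rows: `ITableView` is under the indicator threshold but has a max_prob of 0.91, while `▁regnig` sits just above the indicator threshold with a max_prob of 0.0012. That is why both columns are worth reading together.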

Byte tokens

115 entries below threshold of 0.378

token_id token indicator ord hex byte_type reencoded
40 <0x25> 0.000349553 37 0x25 ascii 29995: %
87 <0x54> 0.00035289 84 0x54 ascii 29911: T
198 <0xC3> 0.000352937 195 0xC3 utf8
106 <0x67> 0.00035321 103 0x67 ascii 29887: g
98 <0x5F> 0.000353294 95 0x5F ascii 29918: _
75 <0x48> 0.000353442 72 0x48 ascii 29950: H
76 <0x49> 0.000353553 73 0x49 ascii 29902: I
78 <0x4B> 0.000353974 75 0x4B ascii 29968: K
112 <0x6D> 0.000353992 109 0x6D ascii 29885: m
36 <0x21> 0.000354813 33 0x21 ascii 29991: !
71 <0x44> 0.000355259 68 0x44 ascii 29928: D
257 <0xFE> 0.00035526 254 0xFE unused_utf8
54 <0x33> 0.000355356 51 0x33 ascii 29941: 3
57 <0x36> 0.000355367 54 0x36 ascii 29953: 6
125 <0x7A> 0.000355599 122 0x7A ascii 29920: z
51 <0x30> 0.000355822 48 0x30 ascii 29900: 0
121 <0x76> 0.000355903 118 0x76 ascii 29894: v
72 <0x45> 0.000355949 69 0x45 ascii 29923: E
61 <0x3A> 0.000356287 58 0x3A ascii 29901: :
115 <0x70> 0.000356474 112 0x70 ascii 29886: p
95 additional entries below threshold
token_id token indicator ord hex byte_type reencoded
62 <0x3B> 0.000356522 59 0x3B ascii 29936: ;
64 <0x3D> 0.000356552 61 0x3D ascii 29922: =
73 <0x46> 0.000356636 70 0x46 ascii 29943: F
107 <0x68> 0.000356722 104 0x68 ascii 29882: h
103 <0x64> 0.000356724 100 0x64 ascii 29881: d
53 <0x32> 0.000356775 50 0x32 ascii 29906: 2
69 <0x42> 0.000356784 66 0x42 ascii 29933: B
105 <0x66> 0.000356812 102 0x66 ascii 29888: f
127 <0x7C> 0.000356898 124 0x7C ascii 29989: |
43 <0x28> 0.000357088 40 0x28 ascii 29898: (
80 <0x4D> 0.000357193 77 0x4D ascii 29924: M
92 <0x59> 0.000357211 89 0x59 ascii 29979: Y
116 <0x71> 0.000357213 113 0x71 ascii 29939: q
48 <0x2D> 0.000357224 45 0x2D ascii 29899: -
35 <0x20> 0.000357337 32 0x20 ascii 29871:
94 <0x5B> 0.000357362 91 0x5B ascii 29961: [
70 <0x43> 0.000357616 67 0x43 ascii 29907: C
250 <0xF7> 0.000357626 247 0xF7 unused_utf8
68 <0x41> 0.000357736 65 0x41 ascii 29909: A
85 <0x52> 0.000357893 82 0x52 ascii 29934: R
82 <0x4F> 0.000357944 79 0x4F ascii 29949: O
129 <0x7E> 0.000358006 126 0x7E ascii 30022: ~
102 <0x63> 0.000358035 99 0x63 ascii 29883: c
111 <0x6C> 0.000358327 108 0x6C ascii 29880: l
120 <0x75> 0.000358404 117 0x75 ascii 29884: u
258 <0xFF> 0.00035851 255 0xFF unused_utf8
77 <0x4A> 0.000358534 74 0x4A ascii 29967: J
50 <0x2F> 0.000358599 47 0x2F ascii 29914: /
41 <0x26> 0.000358785 38 0x26 ascii 29987: &
128 <0x7D> 0.000358788 125 0x7D ascii 29913: }
45 <0x2A> 0.000358817 42 0x2A ascii 29930: *
83 <0x50> 0.000359294 80 0x50 ascii 29925: P
104 <0x65> 0.000359346 101 0x65 ascii 29872: e
63 <0x3C> 0.000359508 60 0x3C ascii 29966: <
122 <0x77> 0.000359554 119 0x77 ascii 29893: w
66 <0x3F> 0.000359587 63 0x3F ascii 29973: ?
58 <0x37> 0.00035966 55 0x37 ascii 29955: 7
254 <0xFB> 0.000359866 251 0xFB unused_utf8
74 <0x47> 0.000359876 71 0x47 ascii 29954: G
119 <0x74> 0.000360028 116 0x74 ascii 29873: t
110 <0x6B> 0.000360232 107 0x6B ascii 29895: k
95 <0x5C> 0.00036028 92 0x5C ascii 29905: \
109 <0x6A> 0.000360433 106 0x6A ascii 29926: j
47 <0x2C> 0.000360453 44 0x2C ascii 29892: ,
86 <0x53> 0.000360453 83 0x53 ascii 29903: S
196 <0xC1> 0.000360566 193 0xC1 unused_utf8
67 <0x40> 0.00036067 64 0x40 ascii 29992: @
195 <0xC0> 0.000360678 192 0xC0 unused_utf8
255 <0xFC> 0.000361047 252 0xFC unused_utf8
39 <0x24> 0.000361137 36 0x24 ascii 29938: $
100 <0x61> 0.000361144 97 0x61 ascii 29874: a
256 <0xFD> 0.00036137 253 0xFD unused_utf8
88 <0x55> 0.000361525 85 0x55 ascii 29965: U
253 <0xFA> 0.000361625 250 0xFA unused_utf8
101 <0x62> 0.000361671 98 0x62 ascii 29890: b
90 <0x57> 0.000361719 87 0x57 ascii 29956: W
113 <0x6E> 0.000361883 110 0x6E ascii 29876: n
93 <0x5A> 0.000361913 90 0x5A ascii 29999: Z
60 <0x39> 0.00036192 57 0x39 ascii 29929: 9
251 <0xF8> 0.000362142 248 0xF8 unused_utf8
81 <0x4E> 0.000362173 78 0x4E ascii 29940: N
126 <0x7B> 0.000362406 123 0x7B ascii 29912: {
49 <0x2E> 0.000362415 46 0x2E ascii 29889: .
118 <0x73> 0.000362494 115 0x73 ascii 29879: s
59 <0x38> 0.0003625 56 0x38 ascii 29947: 8
123 <0x78> 0.000362563 120 0x78 ascii 29916: x
37 <0x22> 0.000362826 34 0x22 ascii 29908: "
38 <0x23> 0.000362847 35 0x23 ascii 29937: #
52 <0x31> 0.000362981 49 0x31 ascii 29896: 1
46 <0x2B> 0.000363121 43 0x2B ascii 29974: +
84 <0x51> 0.000363154 81 0x51 ascii 29984: Q
55 <0x34> 0.000363318 52 0x34 ascii 29946: 4
79 <0x4C> 0.000363319 76 0x4C ascii 29931: L
44 <0x29> 0.000363431 41 0x29 ascii 29897: )
42 <0x27> 0.000363599 39 0x27 ascii 29915: '
56 <0x35> 0.000363746 53 0x35 ascii 29945: 5
97 <0x5E> 0.000363756 94 0x5E ascii 29985: ^
249 <0xF6> 0.00036379 246 0xF6 unused_utf8
16 <0x0D> 0.000363873 13 0x0D ascii 30004: \r
65 <0x3E> 0.000363895 62 0x3E ascii 29958: >
252 <0xF9> 0.000364292 249 0xF9 unused_utf8
117 <0x72> 0.000365058 114 0x72 ascii 29878: r
124 <0x79> 0.000365178 121 0x79 ascii 29891: y
96 <0x5D> 0.00036609 93 0x5D ascii 29962: ]
114 <0x6F> 0.000366186 111 0x6F ascii 29877: o
99 <0x60> 0.000366264 96 0x60 ascii 29952: `
89 <0x56> 0.000366503 86 0x56 ascii 29963: V
108 <0x69> 0.000367985 105 0x69 ascii 29875: i
248 <0xF5> 0.000368799 245 0xF5 unused_utf8
91 <0x58> 0.000369175 88 0x58 ascii 29990: X
13 <0x0A> 0.200063 10 0x0A ascii
244 <0xF1> 0.294427 241 0xF1 utf8
245 <0xF2> 0.295276 242 0xF2 utf8
29889 . 0.319513 46 0x2E ascii
29892 , 0.358964 44 0x2C ascii
236 additional entries above threshold
token_id token indicator ord hex byte_type
29906 2 0.559211 50 0x32 ascii
29896 1 0.568724 49 0x31 ascii
29915 ' 0.590714 39 0x27 ascii
29901 : 0.594011 58 0x3A ascii
29899 - 0.6154 45 0x2D ascii
29949 O 0.622797 79 0x4F ascii
29879 s 0.625092 115 0x73 ascii
29991 ! 0.644942 33 0x21 ascii
29936 ; 0.651857 59 0x3B ascii
29898 ( 0.675037 40 0x28 ascii
29897 ) 0.677739 41 0x29 ascii
29908 " 0.682318 34 0x22 ascii
29900 0 0.693217 48 0x30 ascii
29973 ? 0.697434 63 0x3F ascii
29873 t 0.711536 116 0x74 ascii
29884 u 0.714017 117 0x75 ascii
29891 y 0.715158 121 0x79 ascii
29874 a 0.719115 97 0x61 ascii
29902 I 0.719642 73 0x49 ascii
29903 S 0.719642 83 0x53 ascii
29911 T 0.721485 84 0x54 ascii
29909 A 0.722032 65 0x41 ascii
29979 Y 0.723329 89 0x59 ascii
29877 o 0.724567 111 0x6F ascii
29875 i 0.726322 105 0x69 ascii
29950 H 0.72635 72 0x48 ascii
29928 D 0.729377 68 0x44 ascii
29941 3 0.731278 51 0x33 ascii
29876 n 0.737293 110 0x6E ascii
29907 C 0.73769 67 0x43 ascii
29882 h 0.738248 104 0x68 ascii
29923 E 0.73837 69 0x45 ascii
29883 c 0.739972 99 0x63 ascii
29914 / 0.740379 47 0x2F ascii
29924 M 0.744238 77 0x4D ascii
29954 G 0.745466 71 0x47 ascii
29885 m 0.745968 109 0x6D ascii
29886 p 0.746119 112 0x70 ascii
29946 4 0.747275 52 0x34 ascii
29890 b 0.748493 98 0x62 ascii
29931 L 0.749878 76 0x4C ascii
29872 e 0.749895 101 0x65 ascii
29888 f 0.753054 102 0x66 ascii
29918 _ 0.753355 95 0x5F ascii
29881 d 0.753777 100 0x64 ascii
29940 N 0.754321 78 0x4E ascii
29887 g 0.754817 103 0x67 ascii
29878 r 0.755272 114 0x72 ascii
29943 F 0.755471 70 0x46 ascii
29934 R 0.757096 82 0x52 ascii
29945 5 0.762486 53 0x35 ascii
29933 B 0.762505 66 0x42 ascii
29968 K 0.763037 75 0x4B ascii
29895 k 0.765298 107 0x6B ascii
29925 P 0.76573 80 0x50 ascii
29893 w 0.767202 119 0x77 ascii
29880 l 0.770884 108 0x6C ascii
29894 v 0.770886 118 0x76 ascii
29956 W 0.781035 87 0x57 ascii
29962 ] 0.781241 93 0x5D ascii
29920 z 0.78254 122 0x7A ascii
29953 6 0.786947 54 0x36 ascii
29916 x 0.791137 120 0x78 ascii
29926 j 0.793477 106 0x6A ascii
29965 U 0.79422 85 0x55 ascii
224 <0xDD> 0.794313 221 0xDD utf8
29990 X 0.80137 88 0x58 ascii
29963 V 0.804148 86 0x56 ascii
29912 { 0.804758 123 0x7B ascii
29947 8 0.804946 56 0x38 ascii
29955 7 0.80685 55 0x37 ascii
232 <0xE5> 0.809739 229 0xE5 utf8
29930 * 0.81135 42 0x2A ascii
29929 9 0.811631 57 0x39 ascii
26 <0x17> 0.814664 23 0x17 ascii
12 <0x09> 0.818432 9 0x09 ascii
29961 [ 0.818713 91 0x5B ascii
233 <0xE6> 0.818935 230 0xE6 utf8
29999 Z 0.822533 90 0x5A ascii
29967 J 0.825465 74 0x4A ascii
235 <0xE8> 0.83363 232 0xE8 utf8
29913 } 0.83586 125 0x7D ascii
225 <0xDE> 0.838118 222 0xDE utf8
31 <0x1C> 0.83904 28 0x1C ascii
29958 > 0.839625 62 0x3E ascii
29905 \ 0.844405 92 0x5C ascii
29922 = 0.84918 61 0x3D ascii
24 <0x15> 0.859993 21 0x15 ascii
247 <0xF4> 0.861869 244 0xF4 utf8
25 <0x16> 0.865116 22 0x16 ascii
29966 < 0.8695 60 0x3C ascii
234 <0xE7> 0.878172 231 0xE7 utf8
29938 $ 0.878677 36 0x24 ascii
29984 Q 0.878979 81 0x51 ascii
29937 # 0.879781 35 0x23 ascii
3 <0x00> 0.881064 0 0x00 ascii
17 <0x0E> 0.88375 14 0x0E ascii
18 <0x0F> 0.885594 15 0x0F ascii
29987 & 0.887163 38 0x26 ascii
33 <0x1E> 0.887258 30 0x1E ascii
29974 + 0.888913 43 0x2B ascii
29995 % 0.90072 37 0x25 ascii
29939 q 0.903412 113 0x71 ascii
29989 | 0.905458 124 0x7C ascii
226 <0xDF> 0.908763 223 0xDF utf8
30004 \r 0.913981 13 0x0D ascii
20 <0x11> 0.915301 17 0x11 ascii
238 <0xEB> 0.915337 235 0xEB utf8
29992 @ 0.920946 64 0x40 ascii
21 <0x12> 0.921773 18 0x12 ascii
231 <0xE4> 0.926385 228 0xE4 utf8
136 <0x85> 0.926476 133 0x85 utf8
7 <0x04> 0.928402 4 0x04 ascii
29952 ` 0.941949 96 0x60 ascii
236 <0xE9> 0.943978 233 0xE9 utf8
27 <0x18> 0.945344 24 0x18 ascii
239 <0xEC> 0.94627 236 0xEC utf8
29985 ^ 0.952448 94 0x5E ascii
23 <0x14> 0.95405 20 0x14 ascii
19 <0x10> 0.960914 16 0x10 ascii
156 <0x99> 0.96862 153 0x99 utf8
29 <0x1A> 0.975459 26 0x1A ascii
167 <0xA4> 0.976253 164 0xA4 utf8
152 <0x95> 0.977705 149 0x95 utf8
34 <0x1F> 0.979578 31 0x1F ascii
22 <0x13> 0.980158 19 0x13 ascii
132 <0x81> 0.98108 129 0x81 utf8
191 <0xBC> 0.985249 188 0xBC utf8
135 <0x84> 0.985688 132 0x84 utf8
242 <0xEF> 0.98711 239 0xEF utf8
139 <0x88> 0.988618 136 0x88 utf8
240 <0xED> 0.989035 237 0xED utf8
228 <0xE1> 0.992292 225 0xE1 utf8
227 <0xE0> 0.994288 224 0xE0 utf8
155 <0x98> 0.995086 152 0x98 utf8
187 <0xB8> 0.996373 184 0xB8 utf8
131 <0x80> 0.996599 128 0x80 utf8
30022 ~ 0.996621 126 0x7E ascii
162 <0x9F> 0.997467 159 0x9F utf8
189 <0xBA> 0.997707 186 0xBA utf8
193 <0xBE> 0.998706 190 0xBE utf8
151 <0x94> 1.00162 148 0x94 utf8
145 <0x8E> 1.00203 142 0x8E utf8
179 <0xB0> 1.00248 176 0xB0 utf8
30 <0x1B> 1.00548 27 0x1B ascii
243 <0xF0> 1.00723 240 0xF0 utf8
183 <0xB4> 1.00748 180 0xB4 utf8
8 <0x05> 1.00766 5 0x05 ascii
223 <0xDC> 1.0097 220 0xDC utf8
159 <0x9C> 1.01063 156 0x9C utf8
32 <0x1D> 1.01214 29 0x1D ascii
169 <0xA6> 1.01221 166 0xA6 utf8
146 <0x8F> 1.01491 143 0x8F utf8
134 <0x83> 1.01537 131 0x83 utf8
138 <0x87> 1.01674 135 0x87 utf8
229 <0xE2> 1.01814 226 0xE2 utf8
4 <0x01> 1.02057 1 0x01 ascii
178 <0xAF> 1.02075 175 0xAF utf8
144 <0x8D> 1.02177 141 0x8D utf8
181 <0xB2> 1.02625 178 0xB2 utf8
192 <0xBD> 1.02656 189 0xBD utf8
143 <0x8C> 1.02667 140 0x8C utf8
148 <0x91> 1.02763 145 0x91 utf8
133 <0x82> 1.02871 130 0x82 utf8
154 <0x97> 1.02876 151 0x97 utf8
237 <0xEA> 1.02962 234 0xEA utf8
137 <0x86> 1.03068 134 0x86 utf8
176 <0xAD> 1.03135 173 0xAD utf8
164 <0xA1> 1.03277 161 0xA1 utf8
212 <0xD1> 1.0328 209 0xD1 utf8
28 <0x19> 1.03405 25 0x19 ascii
177 <0xAE> 1.03681 174 0xAE utf8
180 <0xB1> 1.03812 177 0xB1 utf8
142 <0x8B> 1.0387 139 0x8B utf8
9 <0x06> 1.03871 6 0x06 ascii
168 <0xA5> 1.04245 165 0xA5 utf8
190 <0xBB> 1.0427 187 0xBB utf8
158 <0x9B> 1.04283 155 0x9B utf8
184 <0xB5> 1.04289 181 0xB5 utf8
161 <0x9E> 1.04352 158 0x9E utf8
188 <0xB9> 1.04478 185 0xB9 utf8
153 <0x96> 1.04569 150 0x96 utf8
147 <0x90> 1.04579 144 0x90 utf8
194 <0xBF> 1.04655 191 0xBF utf8
182 <0xB3> 1.04923 179 0xB3 utf8
197 <0xC2> 1.05009 194 0xC2 utf8
140 <0x89> 1.05022 137 0x89 utf8
5 <0x02> 1.05226 2 0x02 ascii
171 <0xA8> 1.05256 168 0xA8 utf8
173 <0xAA> 1.05265 170 0xAA utf8
166 <0xA3> 1.05291 163 0xA3 utf8
175 <0xAC> 1.05383 172 0xAC utf8
160 <0x9D> 1.05422 157 0x9D utf8
216 <0xD5> 1.05593 213 0xD5 utf8
174 <0xAB> 1.05647 171 0xAB utf8
15 <0x0C> 1.05699 12 0x0C ascii
14 <0x0B> 1.05754 11 0x0B ascii
203 <0xC8> 1.0604 200 0xC8 utf8
214 <0xD3> 1.06043 211 0xD3 utf8
163 <0xA0> 1.06218 160 0xA0 utf8
211 <0xD0> 1.06461 208 0xD0 utf8
246 <0xF3> 1.06576 243 0xF3 utf8
141 <0x8A> 1.06626 138 0x8A utf8
150 <0x93> 1.06708 147 0x93 utf8
157 <0x9A> 1.06942 154 0x9A utf8
201 <0xC6> 1.07865 198 0xC6 utf8
185 <0xB6> 1.07918 182 0xB6 utf8
172 <0xA9> 1.0792 169 0xA9 utf8
207 <0xCC> 1.07969 204 0xCC utf8
170 <0xA7> 1.08086 167 0xA7 utf8
186 <0xB7> 1.08208 183 0xB7 utf8
149 <0x92> 1.08253 146 0x92 utf8
222 <0xDB> 1.08257 219 0xDB utf8
218 <0xD7> 1.09124 215 0xD7 utf8
213 <0xD2> 1.09259 210 0xD2 utf8
165 <0xA2> 1.09325 162 0xA2 utf8
208 <0xCD> 1.09347 205 0xCD utf8
210 <0xCF> 1.09486 207 0xCF utf8
11 <0x08> 1.09729 8 0x08 ascii
217 <0xD6> 1.10089 214 0xD6 utf8
202 <0xC7> 1.10894 199 0xC7 utf8
209 <0xCE> 1.11113 206 0xCE utf8
230 <0xE3> 1.11225 227 0xE3 utf8
130 <0x7F> 1.11301 127 0x7F ascii
215 <0xD4> 1.11486 212 0xD4 utf8
204 <0xC9> 1.11921 201 0xC9 utf8
10 <0x07> 1.11938 7 0x07 ascii
199 <0xC4> 1.12015 196 0xC4 utf8
241 <0xEE> 1.12529 238 0xEE utf8
205 <0xCA> 1.12782 202 0xCA utf8
200 <0xC5> 1.13029 197 0xC5 utf8
221 <0xDA> 1.13734 218 0xDA utf8
6 <0x03> 1.1526 3 0x03 ascii
206 <0xCB> 1.15413 203 0xCB utf8
219 <0xD8> 1.15969 216 0xD8 utf8
220 <0xD9> 1.1775 217 0xD9 utf8
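
The byte-fallback rows above follow a regular pattern: the token id is the byte value plus 3 (for example, `<0x00>` has id 3 and `<0xDD>` has id 224), and the last column reads `ascii` for byte values below 0x80 and `utf8` otherwise. The following is a minimal sketch of that mapping, inferred only from the rows listed here. The function name `byte_fallback_row` and the constant `BYTE_FALLBACK_OFFSET` are hypothetical and not part of the report tooling. Single-character vocabulary tokens such as `a` (id 29874) are ordinary tokens and do not follow this offset.

```python
# Offset inferred from the table rows above (<0x00> -> id 3); an assumption,
# not a value read from the tokenizer itself.
BYTE_FALLBACK_OFFSET = 3


def byte_fallback_row(byte_value: int) -> tuple[int, str, int, str, str]:
    """Return (token_id, token, decimal, hex, kind) for one byte-fallback token."""
    if not 0 <= byte_value <= 0xFF:
        raise ValueError("byte_value must be in the range 0..255")
    token_id = byte_value + BYTE_FALLBACK_OFFSET
    token = f"<0x{byte_value:02X}>"
    hex_repr = f"0x{byte_value:02X}"
    # The report labels bytes below 0x80 as "ascii" and the rest as "utf8".
    kind = "ascii" if byte_value < 0x80 else "utf8"
    return token_id, token, byte_value, hex_repr, kind


if __name__ == "__main__":
    for b in (0x00, 0x7F, 0x80, 0xDD):
        print(byte_fallback_row(b))
```

Running this for 0x7F and 0x80 reproduces the boundary between the `ascii` and `utf8` labels seen in the table (ids 130 and 131).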

Special tokens

1 entry below threshold of 0.378

| token_id | token | indicator | max_prob |
|---|---|---|---|
| 0 | `<unk>` | 0.000358604 | 3e-08 |
2 additional entries above threshold
| token_id | token | indicator | max_prob |
|---|---|---|---|
| 1 | `<s>` | 0.484084 | 0.089 |
| 2 | `</s>` | 0.598099 | 0.021 |
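
The split between "below threshold" and "above threshold" entries can be illustrated with a short sketch that uses only the three special-token rows and the 0.378 threshold reported above. The helper `split_by_threshold` is hypothetical, and the use of a strict `<` comparison is an assumption; the report does not state how a value exactly equal to the threshold is treated.

```python
# Rows copied from the special-token tables above: (token_id, token, indicator).
SPECIAL_TOKENS = [
    (0, "<unk>", 0.000358604),
    (1, "<s>", 0.484084),
    (2, "</s>", 0.598099),
]
THRESHOLD = 0.378  # threshold stated in the report for special tokens


def split_by_threshold(entries, threshold):
    """Partition entries by indicator value (strict '<' is an assumption)."""
    below = [e for e in entries if e[2] < threshold]
    above = [e for e in entries if e[2] >= threshold]
    return below, above


if __name__ == "__main__":
    below, above = split_by_threshold(SPECIAL_TOKENS, THRESHOLD)
    print(len(below), "below threshold:", [tok for _, tok, _ in below])
    print(len(above), "above threshold:", [tok for _, tok, _ in above])
```

With these inputs only `<unk>` falls below the threshold, matching the counts given in this section.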