uwu
Daniel-Liu-c0deb0t committed Mar 25, 2021
1 parent d1b1c43 commit e9654bd
Showing 8 changed files with 197 additions and 127 deletions.
2 changes: 1 addition & 1 deletion Cargo.lock

2 changes: 1 addition & 1 deletion Cargo.toml
@@ -1,6 +1,6 @@
[package]
name = "uwuify"
-version = "0.1.0"
+version = "0.2.0"
authors = ["c0deb0t <[email protected]>"]
edition = "2018"
license = "MIT"
19 changes: 19 additions & 0 deletions README.md
@@ -58,6 +58,7 @@ if they were lucky enough to cross the boundary of a simd vector or a thread's b
</details>

### ok i want uwu'd text, how do i run this myself?
#### install command-line tool
1. install rust: run `curl https://sh.rustup.rs -sSf | sh` on unix,
or go [here](https://www.rust-lang.org/tools/install) for more options
2. run `cargo install uwuify`
@@ -71,6 +72,24 @@ it is possible to read and write from files by specifying the input file and
output file, in that order. u can use `--help` for more info. pass in
`-v` for timings

this is on crates.io [here](https://crates.io/crates/uwuify)

#### include as library
1. put `uwuify = "^0.2"` under `[dependencies]` in your `Cargo.toml` file
2. the library is called `uwuifier` (slightly different from the name of the binary!)
use it like so:
```
use uwuifier::{uwuify_sse, round_up16};
let s = "hello world";
let b = s.as_bytes();
let mut temp1 = vec![0u8; round_up16(b.len()) * 16];
let mut temp2 = vec![0u8; round_up16(b.len()) * 16];
let res = uwuify_sse(b, &mut temp1, &mut temp2);
assert_eq!(std::str::from_utf8(res).unwrap(), "hewwo wowwd");
```

documentation is [here](https://docs.rs/uwuify/latest/uwuifier/)

#### build from this repo
<details>
<summary>click for more info</summary>
233 changes: 126 additions & 107 deletions README_UWU.txt
@@ -3,11 +3,11 @@ fastest t-text uwuifiew in t-the west

twansfowms
```
hey... i-i think i weawwy wuv you. òωó do you w-want a headpat?
hey... i-i think i weawwy wuv you. OwO do you w-want a headpat?
```
into
```
hey... i-i think i w-weawwy wuv you. ʘwʘ (⑅˘꒳˘) d-do you want a headpat?
hey... i-i think i w-weawwy wuv you. (⑅˘꒳˘) (⑅˘꒳˘) d-do you want a headpat?
```

t-thewe's an [uwu'd](weadme_uwu.txt) v-vewsion of this w-weadme
@@ -17,7 +17,7 @@ t-thewe's an [uwu'd](weadme_uwu.txt) v-vewsion of this w-weadme
u want wawge amounts of text uwu'd in a smow amount of time

### whewe?
uw computew, ( ͡o ω ͡o ) if i-it has a wecent x86 cpu (intew, (U ᵕ U❁) amd) that suppowts sse4.1
uw computew, UwU if i-it has a wecent x86 cpu (intew, (///ˬ///✿) amd) that suppowts sse4.1

### why?
why nyot?
@@ -29,198 +29,217 @@ twdw: 128-bit s-simd vectowization pwus s-some big bwain awgos
<summawy>cwick fow mowe info</summawy>
<p>

aftew houws of weseawch, o.O i-i've finawwy undewstood the e-essence of uwu'd t-text
aftew houws of weseawch, ( ͡o ω ͡o ) i-i've finawwy undewstood the e-essence of uwu'd t-text

thewe awe a-a few twansfowmations:
1. OwO n-nyya-ify (eg. o.O `nawuhodo` -> `nyawuhodo`)
2. rawr x3 wepwace `w` a-and `w` with `w`
3. σωσ stuttew sometimes (`hi` -> `h-hi`)
4. (˘ω˘) a-add a-a text emoji aftew p-punctuation (`,`, rawr x3 `.`, ow `!`) sometimes
5. OwO wepwace some wowds (`smow` -> `smow`, (///ˬ///✿) e-etc.)

these twansfowmation p-passes take advantage of sse4.1 vectow intwinsics to pwocess 16 bytes at once. -.-
f-fow stwing seawching, rawr x3 i'm using a custom simd i-impwementation of the
[bitap](https://en.wikipedia.owg/wiki/bitap_awgowithm) awgowithm f-fow matching a-against muwtipwe s-stwings. -.-
fow wandom nyumbew genewation, (˘ω˘) i'm using [xowshift32](https://en.wikipedia.owg/wiki/xowshift). σωσ fow most
chawactew-wevew detection w-within simd wegistews, (˘ω˘) i-its aww masking a-and shifting t-to simuwate b-basic state
1. o.O n-nyya-ify (eg. UwU `nawuhodo` -> `nyawuhodo`)
2. (˘ω˘) wepwace `w` a-and `w` with `w`
3. (U ᵕ U❁) stuttew sometimes (`hi` -> `h-hi`)
4. ʘwʘ a-add a-a text emoji aftew p-punctuation (`,`, -.- `.`, ow `!`) sometimes
5. σωσ wepwace some wowds (`smow` -> `smow`, UwU e-etc.)

these twansfowmation p-passes take advantage of sse4.1 vectow intwinsics to pwocess 16 bytes at once. σωσ
f-fow stwing seawching, OwO i'm using a custom simd i-impwementation of the
[bitap](https://en.wikipedia.owg/wiki/bitap_awgowithm) awgowithm f-fow matching a-against muwtipwe s-stwings. OwO
fow wandom nyumbew genewation, o.O i'm using [xowshift32](https://en.wikipedia.owg/wiki/xowshift). (U ﹏ U) fow most
chawactew-wevew detection w-within simd wegistews, σωσ i-its aww masking a-and shifting t-to simuwate b-basic state
machines i-in pawawwew

muwtithweading is suppowted, rawr x3 so u-u can expwoit aww of uw cpu cowes f-fow the nyobwe goaw
muwtithweading is suppowted, ʘwʘ so u-u can expwoit aww of uw cpu cowes f-fow the nyobwe goaw
of uwu-ing m-massive amounts o-of text

utf-8 is handwed ewegantwy by simpwy ignowing nyon-ascii c-chawactews in the input

unfowtunatewy, (///ˬ///✿) due t-to both simd pawawwewism and muwtithweading, some wowds may nyot b-be fuwwy uwu'd
if they wewe wucky e-enough to cwoss t-the boundawy o-of a simd vectow o-ow a thwead's buffew. (˘ω˘)
unfowtunatewy, (U ﹏ U) due t-to both simd pawawwewism and muwtithweading, some wowds may nyot b-be fuwwy uwu'd
if they wewe wucky e-enough to cwoss t-the boundawy o-of a simd vectow o-ow a thwead's buffew. (ꈍᴗꈍ)
*they won't e-escape so easiwy n-nyext time*

</p>
</detaiws>

### o-ok i want uwu'd text, o.O how d-do i wun this mysewf?
1. ( ͡o ω ͡o ) instaww wust: wun `cuww h-https://sh.wustup.ws -ssf | s-sh` on unix, >w<
### o-ok i want uwu'd text, -.- how d-do i wun this mysewf?
#### instaww command-wine t-toow
1. o.O instaww w-wust: wun `cuww https://sh.wustup.ws -ssf | s-sh` on unix, (⑅˘꒳˘)
ow go [hewe](https://www.wust-wang.owg/toows/instaww) f-fow mowe options
2. (U ﹏ U) wun `cawgo i-instaww uwuify`
3. OwO w-wun `uwuify` which wiww wead f-fwom stdin and output t-to stdout. OwO m-make suwe u
pwess ctww + d (unix) o-ow ctww + z and entew (windows) a-aftew u type s-stuff in stdin to s-send an eof
2. w-wun `cawgo instaww uwuify`
3. w-wun `uwuify` which w-wiww wead fwom s-stdin and output to stdout. ( ͡o ω ͡o ) m-make suwe u
pwess ctww + d (unix) o-ow ctww + z and e-entew (windows) a-aftew u type stuff in stdin to s-send an eof

if you awe having t-twoubwe wunning `uwuify`, rawr x3 m-make suwe you have `~/.cawgo/bin`
i-in y-youw `$path`
if y-you awe having twoubwe wunning `uwuify`, (///ˬ///✿) m-make suwe y-you have `~/.cawgo/bin`
i-in youw `$path`

it i-is possibwe to w-wead and wwite fwom f-fiwes by specifying the input fiwe and
output f-fiwe, -.- in that owdew. OwO u can use `--hewp` f-fow mowe info. (⑅˘꒳˘) pass in
i-it i-is possibwe to wead and wwite fwom fiwes by specifying t-the input fiwe and
output f-fiwe, >w< in that owdew. σωσ u can use `--hewp` fow mowe info. o.O pass in
`-v` fow timings

#### buiwd fwom this wepo
this is on cwates.io [hewe](https://cwates.io/cwates/uwuify)

#### incwude as w-wibwawy
1. -.- put `uwuify = "^0.2"` u-undew `[dependencies]` in youw `cawgo.tomw` fiwe
2. o.O t-the wibwawy i-is cawwed `uwuifiew` (swightwy d-diffewent fwom the nyame of the binawy!)
use it w-wike so:
```
use uwuifiew::{uwuify_sse, ( ͡o ω ͡o ) w-wound_up16};
w-wet s = "hewwo wowwd";
wet b-b = s.as_bytes();
w-wet mut temp1 = v-vec![0u8; wound_up16(b.wen()) * 16];
wet mut temp2 = vec![0u8; wound_up16(b.wen()) * 16];
wet w-wes = uwuify_sse(b, o.O &mut temp1, (U ﹏ U) &mut t-temp2);
assewt_eq!(std::stw::fwom_utf8(wes).unwwap(), (U ﹏ U) "hewwo w-wowwd");
```

documentation is [hewe](https://docs.ws/uwuify/watest/uwuifiew/)

#### buiwd fwom t-this wepo
<detaiws>
<summawy>cwick fow mowe info</summawy>
<summawy>cwick f-fow mowe info</summawy>
<p>

1. UwU instaww wust
2. (///ˬ///✿) w-wun `git cwone h-https://github.com/daniew-wiu-c0deb0t/uwu.git && cd uwu`
3. ( ͡o ω ͡o ) wun `cawgo wun --wewease`
1. (U ﹏ U) instaww wust
2. (U ᵕ U❁) wun `git cwone h-https://github.com/daniew-wiu-c0deb0t/uwu.git && cd uwu`
3. wun `cawgo wun --wewease`

##### t-testing
1. o.O wun `cawgo t-test`
##### testing
1. (U ᵕ U❁) wun `cawgo t-test`

##### b-benchmawking
1. UwU wun `mkdiw test && cd test`
##### benchmawking
1. (U ᵕ U❁) w-wun `mkdiw test && c-cd test`

*wawning: w-wawge fiwes of 100mb a-and 1gb, (˘ω˘) wespectivewy*
*wawning: w-wawge fiwes of 100mb and 1gb, (///ˬ///✿) wespectivewy*

2. (U ᵕ U❁) w-wun `cuww -ow http://mattmahoney.net/dc/enwik8.zip && u-unzip enwik8.zip`
3. ʘwʘ w-wun `cuww -ow h-http://mattmahoney.net/dc/enwik9.zip && unzip enwik9.zip`
4. -.- wun `cd .. σωσ && ./bench.sh`
2. >w< w-wun `cuww -ow h-http://mattmahoney.net/dc/enwik8.zip && unzip enwik8.zip`
3. òωó w-wun `cuww -ow h-http://mattmahoney.net/dc/enwik9.zip && unzip enwik9.zip`
4. (˘ω˘) w-wun `cd .. && ./bench.sh`

</p>
</detaiws>

### i don't bewieve t-that this is fast. UwU i need pwoof!!1! σωσ
t-twdw: can b-be awmost as fast as simpwy copying a fiwe
### i don't bewieve that this is fast. ʘwʘ i nyeed pwoof!!1! (U ᵕ U❁)
twdw: can be awmost as fast a-as simpwy copying a fiwe

<detaiws>
<summawy>cwick f-fow mowe info</summawy>
<summawy>cwick fow mowe info</summawy>
<p>

w-waw nyumbews fwom wunning `./bench.sh` on a 2019 m-macbook pwo with eight
intew 2.3 ghz i9 cpus and 16 gb of wam a-awe shown bewow. OwO the dataset
used i-is the fiwst 100mb a-and fiwst 1gb o-of engwish wikipedia. OwO the same
dataset is used f-fow the [huttew p-pwize](http://pwize.huttew1.net/)
fow text compwession
waw nyumbews fwom wunning `./bench.sh` o-on a 2019 m-macbook pwo with e-eight
intew 2.3 g-ghz i9 cpus a-and 16 gb of wam awe shown bewow. t-the dataset
used i-is the fiwst 100mb a-and fiwst 1gb of engwish wikipedia. (˘ω˘) the same
d-dataset is used f-fow the [huttew pwize](http://pwize.huttew1.net/)
f-fow text compwession

```
1 t-thwead uwu enwik8
t-time taken: 178 ms
time taken: 178 ms
input size: 100000000 bytes
o-output size: 115095591 bytes
thwoughput: 0.55992 gb/s
output size: 115095591 b-bytes
thwoughput: 0.55992 g-gb/s

2 thwead uwu enwik8
time taken: 105 ms
input size: 100000000 b-bytes
output size: 115095591 bytes
time t-taken: 105 ms
i-input size: 100000000 bytes
output s-size: 115095591 bytes
thwoughput: 0.94701 gb/s

4 thwead uwu e-enwik8
4 thwead uwu enwik8
time taken: 60 m-ms
input size: 100000000 bytes
o-output size: 115095591 b-bytes
t-thwoughput: 1.64883 gb/s

8 thwead u-uwu enwik8
t-time taken: 47 ms
i-input size: 100000000 bytes
output size: 115095591 b-bytes
thwoughput: 1.64883 gb/s

8 thwead uwu enwik8
time taken: 47 ms
input size: 100000000 bytes
output size: 115095591 bytes
thwoughput: 2.12590 gb/s

c-copy enwik8
copy enwik8

weaw 0m0.035s
u-usew 0m0.001s
sys 0m0.031s

1 t-thwead uwu enwik9
1 thwead u-uwu enwik9
time taken: 2087 ms
input size: 1000000000 bytes
o-output size: 1149772651 b-bytes
input size: 1000000000 b-bytes
output size: 1149772651 b-bytes
thwoughput: 0.47905 gb/s

2 thwead u-uwu enwik9
time t-taken: 992 ms
input size: 1000000000 b-bytes
output size: 1149772651 bytes
output s-size: 1149772651 b-bytes
thwoughput: 1.00788 gb/s

4 thwead uwu enwik9
4 t-thwead uwu enwik9
time taken: 695 m-ms
input size: 1000000000 b-bytes
input s-size: 1000000000 bytes
output size: 1149772651 bytes
thwoughput: 1.43854 gb/s

8 thwead uwu enwik9
time taken: 436 ms
8 t-thwead uwu enwik9
t-time taken: 436 ms
input size: 1000000000 bytes
output size: 1149772651 bytes
thwoughput: 2.29214 gb/s
t-thwoughput: 2.29214 g-gb/s

copy e-enwik9
copy enwik9

weaw 0m0.387s
usew 0m0.001s
s-sys 0m0.341s
u-usew 0m0.001s
sys 0m0.341s
```

*//todo: compawe with othew toows*
*//todo: compawe with othew t-toows*

</p>
</detaiws>

### w-why isn't this weadme uwu'd?
so i-its weadabwe
### why isn't this w-weadme uwu'd?
so its weadabwe

if u happen to find u-uwu'd text mowe w-weadabwe, o.O thewe's awways an [uwu'd](weadme_uwu.txt) v-vewsion
if u happen to find uwu'd text mowe w-weadabwe, (ꈍᴗꈍ) thewe's a-awways an [uwu'd](weadme_uwu.txt) v-vewsion

### o-ok but why awen't t-thewe any s-settings i can change?!1?!!1
### ok but why awen't thewe any settings i can change?!1?!!1
fwee w-wiww is an iwwusion

### w-wtf this is so unpwofessionaw how awe u gonna get hiwed at faang nyow?! (U ﹏ U)
d-don't wowwy, σωσ i-i've got u covewed
### w-wtf this i-is so unpwofessionaw h-how awe u gonna get hiwed at faang nyow?! (U ᵕ U❁)
don't wowwy, i've got u covewed

#### titwe: uwu is aww you nyeed
#### t-titwe: u-uwu is aww you nyeed

#### abstwact

w-wecent advances i-in computing have made stwides i-in pawawwewization, ʘwʘ whethew
at a fine-gwained w-wevew with simd instwuctions, (U ﹏ U) o-ow at a high wevew with muwtipwe
cpu cowes. (ꈍᴗꈍ) taking advantage of t-these advances, -.- w-we expwowe how the u-usefuw
task of pewfowming an uwu twansfowmation on pwain text can be scawed up t-to wawge
input d-datasets. o.O ouw contwibutions i-in t-this papew awe thweefowd: fiwst, (⑅˘꒳˘) we pwesent, ( ͡o ω ͡o )
to ouw knowwedge, (///ˬ///✿) the fiwst wigowous d-definition of u-uwu'd text. >w< second, σωσ we show
ouw n-nyovew awgowithms f-fow uwu-ing text, o.O expwoiting vectowization a-and
m-muwtithweading f-featuwes that awe avaiwabwe on modewn cpus. -.- finawwy, o.O w-we pwovide
w-wigowous expewimentaw w-wesuwts that s-show how ouw i-impwementation couwd be the
"fastest in the west." i-in ouw benchmawks, ( ͡o ω ͡o ) w-we obsewve t-that ouw impwementation
was awmost as a fast as a-a simpwe fiwe copy, o.O w-which is entiwewy i-io-bound. (U ﹏ U)
w-we bewieve ouw w-wowk has potentiaw appwications i-in vawious domains, (U ﹏ U) f-fwom data
augmentation and text p-pwepwocessing fow nyatuwaw wanguage p-pwocessing, (U ﹏ U) to
giving authows t-the abiwity to convey potentiawwy w-whowesome ow kawaii~ meme m-messages
with minimaw time and effowt. (U ᵕ U❁)
w-wecent advances i-in computing have made stwides i-in pawawwewization, w-whethew
a-at a fine-gwained wevew with simd instwuctions, UwU o-ow at a high wevew w-with muwtipwe
c-cpu cowes. (U ﹏ U) taking a-advantage of t-these advances, (U ﹏ U) we expwowe how the usefuw
task of p-pewfowming an u-uwu twansfowmation o-on pwain text can be scawed up to wawge
input d-datasets. UwU ouw contwibutions i-in t-this papew awe thweefowd: f-fiwst, -.- w-we pwesent, σωσ
to ouw knowwedge, òωó the f-fiwst wigowous d-definition of uwu'd text. OwO second, w-we show
ouw nyovew awgowithms f-fow uwu-ing text, (˘ω˘) expwoiting vectowization a-and
muwtithweading f-featuwes that awe avaiwabwe on modewn c-cpus. finawwy, (ꈍᴗꈍ) we pwovide
wigowous expewimentaw w-wesuwts that s-show how ouw impwementation couwd be the
"fastest i-in the west." in ouw benchmawks, we obsewve that ouw impwementation
was awmost as a fast as a-a simpwe fiwe copy, >w< w-which is entiwewy i-io-bound. rawr x3
w-we bewieve ouw w-wowk has potentiaw appwications in vawious domains, (U ᵕ U❁) f-fwom data
augmentation a-and text pwepwocessing f-fow nyatuwaw wanguage pwocessing, σωσ t-to
giving authows the abiwity t-to convey potentiawwy whowesome o-ow kawaii~ meme m-messages
with m-minimaw time and effowt. ( ͡o ω ͡o )

*// todo: w-wwite papew*

*// t-todo: wwite mowe about machine weawning so i-i get funding*
*// t-todo: wwite m-mowe about machine w-weawning so i get funding*

### ok i nyeed to use this fow something and i nyeed the wicense info
### ok i nyeed to use this fow something and i n-nyeed the wicense info
mit wicense

### o-ok but i h-have an issue with t-this ow a suggestion o-ow a question n-nyot answewed hewe
### ok but i have an issue with this ow a suggestion ow a question nyot answewed hewe
open an issue, (U ᵕ U❁) be nyice

### w-wefewences
* h-https://honk.moe/toows/owo.htmw
* https://github.com/iamwifki/uwuizew
* h-https://github.com/deadshot465/owoify_ws
* https://kawaii~kaomoji.com/chawactews/uwu/
### wefewences
* https://honk.moe/toows/owo.htmw
* h-https://github.com/iamwifki/uwuizew
* https://github.com/deadshot465/owoify_ws
* h-https://kawaii~kaomoji.com/chawactews/uwu/
* h-https://kawaii~kaomoji.com/chawactews/owo/
* https://kawaii~kaomoji.com/chawactews/fwowew-giww/
* a-and many mowe; wet me know i-if i missed anything
* h-https://kawaii~kaomoji.com/chawactews/fwowew-giww/
* a-and many mowe; wet me know if i missed anything
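
note for readers of this diff: the readme lists the transformation passes (replace `l` and `r` with `w`, stutter sometimes) and says random numbers come from xorshift32. The block below is a minimal scalar sketch of those ideas in plain Rust. It is not the crate's SSE4.1 implementation and does not use its API. The names `replace_lr` and `maybe_stutter`, the 1-in-4 stutter probability, and the seed are assumptions made for illustration only; the shift triple (13, 17, 5) is the standard xorshift32 variant.

```rust
/// One step of the xorshift32 generator (shift triple 13, 17, 5).
/// The state must be nonzero, otherwise it stays at zero forever.
fn xorshift32(state: &mut u32) -> u32 {
    let mut x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    x
}

/// Scalar version of the "replace `l` and `r` with `w`" pass.
/// Only the two ASCII letters are touched; everything else,
/// including non-ASCII characters, passes through unchanged.
fn replace_lr(input: &str) -> String {
    input
        .chars()
        .map(|c| match c {
            'l' | 'r' => 'w',
            other => other,
        })
        .collect()
}

/// Stutter a word some of the time (`hi` -> `h-hi`).
/// The 1-in-4 probability is a hypothetical choice for this sketch.
fn maybe_stutter(word: &str, state: &mut u32) -> String {
    match word.chars().next() {
        Some(first) if first.is_ascii_alphabetic() && xorshift32(state) % 4 == 0 => {
            format!("{}-{}", first, word)
        }
        _ => word.to_string(),
    }
}

fn main() {
    // Arbitrary nonzero seed, chosen only for the demo.
    let mut state = 0x9E37_79B9u32;
    let out: Vec<String> = replace_lr("hello world")
        .split(' ')
        .map(|w| maybe_stutter(w, &mut state))
        .collect();
    println!("{}", out.join(" "));
}
```

The scalar version handles one character at a time, so it never misses a word at a vector or buffer boundary; the crate trades that away to process 16 bytes per step.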