Codon slower than CPython in task with pattern matching #624

rikfarrow · 2025-02-04T18:36:27Z

Codon is slower than CPython in a task with pattern matching

rik@nuke:~/Reports/GR$ time report > t2

real 0m6.443s
user 0m8.682s
sys 0m1.122s
rik@nuke:~/Reports/GR$ time report.py > t1

real 0m4.947s
user 0m4.419s
sys 0m0.510s
rik@nuke:~/Reports/GR$ wc -l weblogs
7282338 weblogs

Here, report is the binary produced by "codon build report.py".

rik@nuke:~/Reports/GR$ cat ~/bin/report.py

#!/usr/bin/python3

import re
with open('e') as file:
    d = {}
    # total number of hits; Codon [int]
    total = 0
    regexString = re.compile('/login.* H')
    for line in file:
        # Codon disliked: m = re.findall(regexString,line)
        m = regexString.findall(line)
        if len(m) > 0: # make sure there was a match
            if re.search('destination=', m[0]):
                continue # skip lines containing destination=
            y = re.split(' ',m[0])
            # add 1 to each dict entry, or init as 1
            d[y[0]] = d.get(y[0], 0) + 1
            # I took this from Karpathy on makemore, and the sort
            # lambda where making -kv means a reverse order sort
            # https://www.youtube.com/watch?v=PaCmpygFfXo 12:36
            total = total+1
    sorted_count=sorted(d.items(), key = lambda kv: -kv[1])
    for item in sorted_count:
        print(item[0], item[1])
    print("Total number of hits: ", total)

arshajii · 2025-02-05T16:40:39Z

Hi Rik, thanks for the report. Haven't tried your code yet but just checking if you used the -release flag when compiling (e.g. codon build -release report.py)? That flag is required to enable optimizations.

rikfarrow · 2025-02-05T17:07:18Z

No, I didn't know about using the -release flag. I just rebuilt my script with codon build -release report.py and the results are the same (near enough):

***@***.***:~/Reports/GR$ time report > t2

real 0m6.599s
user 0m8.175s
sys 0m1.650s

My assumption is that the 50% increase in time has to do with pattern matching, although your docs do mention possible issues with IO. The web log file is a large one at 3.2 GB and seven million lines.

Rik
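One way to check whether the extra time comes from pattern matching or from file I/O is to time the two separately. The following is a minimal sketch, not something posted in the thread: the function name time_passes and its parameters are hypothetical, and it has only been reasoned through for CPython, so small changes may be needed before Codon accepts it.

```python
import re
import time


def time_passes(path, pattern_text=r'/login.* H'):
    """Return (read_seconds, match_seconds, hits) for the file at path.

    Hypothetical helper: pass 1 only iterates over the lines, pass 2
    iterates and also runs the regex, so the difference between the two
    timings approximates the cost of pattern matching.
    """
    pattern = re.compile(pattern_text)

    # Pass 1: read every line and do nothing else (I/O baseline).
    start = time.time()
    n_lines = 0
    with open(path) as f:
        for line in f:
            n_lines += 1
    read_seconds = time.time() - start

    # Pass 2: read every line and also run the regex on it.
    start = time.time()
    hits = 0
    with open(path) as f:
        for line in f:
            if len(pattern.findall(line)) > 0:
                hits += 1
    match_seconds = time.time() - start

    return read_seconds, match_seconds, hits
```

Calling time_passes('e') under both python3 and a -release Codon build would show which of the two passes accounts for the gap. Note that the second pass may benefit from the operating system's page cache, so running the whole comparison twice and using the second set of numbers is a fairer test.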


elisbyberi · 2025-02-05T22:14:15Z

Here is a benchmark using a sample file:

import re
import time

# with open('e', 'w') as w:
#     for i in range(1000000):
#         w.write("""192.168.1.1 - - [05/Feb/2025:10:15:32] "/login user=admin H"
#         192.168.1.3 - - [05/Feb/2025:10:17:12] "/home page=dashboard H"
#         192.168.1.1 - - [05/Feb/2025:10:18:07] "/login user=guest H"
#         192.168.1.4 - - [05/Feb/2025:10:19:55] "/login user=admin destination=somewhere H"
#         192.168.1.2 - - [05/Feb/2025:10:20:22] "/login user=root H"
#         """)


def main():
    with open('e') as file:
        d = {}
        # total number of hits; Codon [int]
        total = 0
        regexString = re.compile(r'/login.* H')

        for line in file:
            m = regexString.findall(line)
            if len(m) > 0:  # make sure there was a match
                if re.search(r'destination=', m[0]):
                    continue  # skip lines containing destination=

                y = re.split(r' ', m[0])
                # add 1 to each dict entry, or init as 1
                d[y[0]] = d.get(y[0], 0) + 1
                total = total + 1

        # sorted_count = sorted(d.items(), key=lambda kv: -kv[1])
        # for item in sorted_count:
        #     print(item[0], item[1])

        # print("Total number of hits: ", total)


if __name__ == '__main__':
    t = time.time()
    main()
    print(time.time() - t)

Result:

10.817  # Codon 0.18.0
12.429415702819824  # Python 3.10
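Since the thread suspects the regex calls, a variant that makes only one regex call per line may be worth timing on both runtimes. This is a sketch under the assumption that the counts should stay the same as in the benchmark above: pattern.search returns the first match, which is what findall(line)[0] yields for a pattern without groups, and the literal checks for 'destination=' and the space split do not need a regex. The function name count_logins is hypothetical, and the code has not been run under Codon.

```python
import re


def count_logins(lines):
    """Count /login hits per first token, skipping 'destination=' matches.

    Hypothetical rewrite of the loop body with a single regex call per line.
    """
    pattern = re.compile(r'/login.* H')
    counts = {}
    total = 0
    for line in lines:
        m = pattern.search(line)      # first match only, like findall(line)[0]
        if m is None:
            continue
        text = m.group(0)
        if 'destination=' in text:    # literal substring test, no regex needed
            continue
        key = text.split(' ')[0]      # same token as re.split(' ', text)[0]
        counts[key] = counts.get(key, 0) + 1
        total += 1
    return counts, total
```

An open file can be passed directly, for example count_logins(file) inside the with block, because the function only iterates over its argument.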

rikfarrow · 2025-02-05T23:01:27Z

Interesting. With the sort and print commented out, not a lot more time spent. But still slower on my system:

$ python3 t.py # where t.py is the program included in the email
5.171091318130493
$ codon run -release t.py
6.33556
$ cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 140
model name : 11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz
stepping : 1
microcode : 0xb8
cpu MHz : 1247.248
cache size : 8192 KB
...

I'm running a fully patched Debian. Just weird that the one test I tried is slower for me than it is for you.

What I did like is that compilation just worked. When I played with codon in 2023, I had lots of trouble getting my little scripts to compile.

I'll try some of your synthetic examples from the blog post next.

Rik


rikfarrow · 2025-02-22T18:26:29Z

I tried the first two examples in the blog post, forgetting to include -release. And they both run slower than python. I thought perhaps having the very large (500_000_000) arrays might be a problem in the approximate pi script, so I divided both x and y by ten and tried again. Now codon is even slower than python.

I was about to give up, but decided to at least let you know what's happening when I try the examples in your blog post. In one of your emails, you mention needing to include -release when running or building using codon. When I tried that, the codon version of calculating pi does run ten times faster.

Your blog post never mentions using -release. And why is that not the default? Imagine what happens when other people try what I just did and discover crappy performance when using codon, compared to python3?

As an editor, I constantly have to point out to authors that they have left out critical information because that information is so familiar to the authors that they unconsciously assume everybody knows about it. I think you are doing that with -release.

Rik
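To make the effect of -release easy to see on any script, the three invocations can be timed side by side. The harness below is a sketch that runs under CPython and assumes codon is on the PATH; the names time_command and compare are hypothetical. It uses only the command forms that appear in this thread (codon run and codon run -release), and the codon run timings include compilation time.

```python
import subprocess
import sys
import time


def time_command(cmd, repeats=3):
    """Run cmd `repeats` times and return the fastest wall-clock time in seconds."""
    best = None
    for _ in range(repeats):
        start = time.perf_counter()
        # Discard the program's output so printing does not dominate the timing.
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        elapsed = time.perf_counter() - start
        if best is None or elapsed < best:
            best = elapsed
    return best


def compare(script):
    """Time one script under CPython and under Codon with and without -release."""
    commands = {
        'python3': [sys.executable, script],
        'codon run': ['codon', 'run', script],
        'codon run -release': ['codon', 'run', '-release', script],
    }
    return {name: time_command(cmd) for name, cmd in commands.items()}
```

For example, compare('t.py') returns a dict of the best time per command, which makes a missing -release stand out immediately.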

