Codon slower than CPython in task with pattern matching #624

Open
rikfarrow opened this issue Feb 4, 2025 · 5 comments


@rikfarrow

Codon is slower than CPython in a task with pattern matching

rik@nuke:~/Reports/GR$ time report > t2

real 0m6.443s
user 0m8.682s
sys 0m1.122s
rik@nuke:~/Reports/GR$ time report.py > t1

real 0m4.947s
user 0m4.419s
sys 0m0.510s
rik@nuke:~/Reports/GR$ wc -l weblogs
7282338 weblogs

report is the binary produced by "codon build report.py"; report.py is the original script, run under CPython:

rik@nuke:~/Reports/GR$ cat ~/bin/report.py

#!/usr/bin/python3

import re
with open('e') as file:
    d = {}
    # total number of hits; Codon [int]
    total = 0
    regexString = re.compile('/login.* H')
    for line in file:
        # Codon disliked: m = re.findall(regexString,line)
        m = regexString.findall(line)
        if len(m) > 0: # make sure there was a match
            if re.search('destination=', m[0]):
                continue # skip lines containing destination=
            y = re.split(' ',m[0])
            # add 1 to each dict entry, or init as 1
            d[y[0]] = d.get(y[0], 0) + 1
            # I took this from Karpathy on makemore, and the sort
            # lambda where making -kv means a reverse order sort
            # https://www.youtube.com/watch?v=PaCmpygFfXo 12:36
            total = total+1
    sorted_count=sorted(d.items(), key = lambda kv: -kv[1])
    for item in sorted_count:
        print(item[0], item[1])
    print("Total number of hits: ", total)
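
For anyone comparing the two runtimes on this script, a leaner version of the inner loop may be worth timing. This is a minimal sketch, not a measured fix. It relies on the script above only ever reading m[0], so it replaces findall with search, and it uses plain string operations for the destination= check and the split. The names LOGIN_RE and count_logins are hypothetical, and no claim is made here about how much either Codon or CPython would gain.

```python
import re

# Hypothetical name; same pattern as in the script above.
LOGIN_RE = re.compile(r'/login.* H')


def count_logins(lines):
    """Count login hits per first token of the match, skipping destination= hits.

    Returns (counts, total), where counts maps the first space-separated
    token of each match to its number of occurrences.
    """
    counts = {}
    total = 0
    for line in lines:
        m = LOGIN_RE.search(line)      # first match only, like findall(...)[0]
        if m is None:
            continue
        hit = m.group(0)
        if 'destination=' in hit:      # substring test instead of re.search
            continue
        key = hit.split(' ', 1)[0]     # same first token as re.split(' ', hit)[0]
        counts[key] = counts.get(key, 0) + 1
        total += 1
    return counts, total
```

It can be called as count_logins(open('e')) and the resulting dictionary sorted and printed exactly as in the original script.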

@arshajii
Contributor

arshajii commented Feb 5, 2025

Hi Rik, thanks for the report. Haven't tried your code yet but just checking if you used the -release flag when compiling (e.g. codon build -release report.py)? That flag is required to enable optimizations.

@rikfarrow
Author

rikfarrow commented Feb 5, 2025 via email

@elisbyberi

Here is a benchmark using a sample file:

import re
import time

# with open('e', 'w') as w:
#     for i in range(1000000):
#         w.write("""192.168.1.1 - - [05/Feb/2025:10:15:32] "/login user=admin H"
#         192.168.1.3 - - [05/Feb/2025:10:17:12] "/home page=dashboard H"
#         192.168.1.1 - - [05/Feb/2025:10:18:07] "/login user=guest H"
#         192.168.1.4 - - [05/Feb/2025:10:19:55] "/login user=admin destination=somewhere H"
#         192.168.1.2 - - [05/Feb/2025:10:20:22] "/login user=root H"
#         """)


def main():
    with open('e') as file:
        d = {}
        # total number of hits; Codon [int]
        total = 0
        regexString = re.compile(r'/login.* H')

        for line in file:
            m = regexString.findall(line)
            if len(m) > 0:  # make sure there was a match
                if re.search(r'destination=', m[0]):
                    continue  # skip lines containing destination=

                y = re.split(r' ', m[0])
                # add 1 to each dict entry, or init as 1
                d[y[0]] = d.get(y[0], 0) + 1
                total = total + 1

        # sorted_count = sorted(d.items(), key=lambda kv: -kv[1])
        # for item in sorted_count:
        #     print(item[0], item[1])

        # print("Total number of hits: ", total)


if __name__ == '__main__':
    t = time.time()
    main()
    print(time.time() - t)

Result:

10.817  # Codon 0.18.0
12.429415702819824  # Python 3.10
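
To reproduce the benchmark, the commented-out generator at the top of the script can be turned into a small helper. This is a sketch that assumes the same five sample log lines and the same output file name 'e'. The names SAMPLE_LINES and write_sample are hypothetical, and the lines are written without the leading indentation that the triple-quoted string in the commented-out version would carry.

```python
# The five sample log lines from the commented-out generator above.
SAMPLE_LINES = [
    '192.168.1.1 - - [05/Feb/2025:10:15:32] "/login user=admin H"',
    '192.168.1.3 - - [05/Feb/2025:10:17:12] "/home page=dashboard H"',
    '192.168.1.1 - - [05/Feb/2025:10:18:07] "/login user=guest H"',
    '192.168.1.4 - - [05/Feb/2025:10:19:55] "/login user=admin destination=somewhere H"',
    '192.168.1.2 - - [05/Feb/2025:10:20:22] "/login user=root H"',
]


def write_sample(path="e", repeats=1_000_000):
    """Write `repeats` copies of the sample lines to `path`.

    Returns the number of lines written.
    """
    block = "\n".join(SAMPLE_LINES) + "\n"
    with open(path, "w") as out:
        for _ in range(repeats):
            out.write(block)
    return repeats * len(SAMPLE_LINES)
```

Calling write_sample() with the defaults writes five million lines to 'e', which the benchmark script can then read.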

@rikfarrow
Author

rikfarrow commented Feb 5, 2025 via email

@rikfarrow
Author

rikfarrow commented Feb 22, 2025 via email
