The script below shows the issue:
from python import struct

@python
def to_bytes(s):
    #return bytes(s, 'iso-8859-1') if type(s) is str else s
    return bytes(s, 'cp1252') if type(s) is str else s

def to_bytes_4(b):
    return struct.pack('4B', ord(b[0]), ord(b[1]), ord(b[2]), ord(b[3]))

# Codon's file.read currently returns a string. Attempt to unpack as int and float, given ASCII and binary strings.
# These inputs represent data that could come from file.read.
sa = 'abcd'
sb = '\xb0\xb1\xb2\xb3'
assert 4 == len(sa)
assert 4 == len(sb)
print("convert using pack 4B works for ascii")
print(to_bytes_4(sa))
print(struct.unpack('<i', to_bytes_4(sa))[0], struct.unpack('<f', to_bytes_4(sa))[0])
print("convert using python 'bytes' works for ascii")
print(to_bytes(sa))
print(struct.unpack('<i', to_bytes(sa))[0], struct.unpack('<f', to_bytes(sa))[0])
print("convert using pack 4B works for binary, though limited to fixed size")
print(to_bytes_4(sb))
print(struct.unpack('<i', to_bytes_4(sb))[0], struct.unpack('<f', to_bytes_4(sb))[0])
print("convert using python 'bytes' fails for binary using codon")
print(to_bytes(sb))
print(struct.unpack('<i', to_bytes(sb))[0], struct.unpack('<f', to_bytes(sb))[0])
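A side note on the codec choice in to_bytes, as a minimal sketch in plain CPython (not Codon, so it does not by itself explain the Codon failure above). In CPython, 'iso-8859-1' maps every code point 0..255 to the identical byte, while 'cp1252' rejects some of them. The bytes in sb (0xb0..0xb3) happen to be representable in both. The helper name `encodable` is made up for illustration.

```python
def encodable(codec):
    """Return the code points in 0..255 that `codec` encodes to the identical single byte."""
    ok = []
    for cp in range(256):
        try:
            if chr(cp).encode(codec) == bytes([cp]):
                ok.append(cp)
        except UnicodeEncodeError:
            pass  # this code point has no mapping in the codec
    return ok

latin1_ok = encodable('iso-8859-1')
cp1252_ok = encodable('cp1252')

# Compare how many of the 256 possible byte values each codec can carry unchanged.
print(len(latin1_ok), len(cp1252_ok))
```

If a str-to-bytes conversion is needed at all, 'iso-8859-1' looks like the safer choice for arbitrary binary data.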
Hi @dmahurin -- thanks for sharing this! It seems like the best long-term solution would simply be to support the struct module in Codon. However, you can actually read the binary data as whatever type you need via pointers in Codon:
s = '\xb0\xb1\xb2\xb3'
print(int(Ptr[i32](s.ptr)[0]))  # equivalent to struct.unpack('<i', b'\xb0\xb1\xb2\xb3')
This should be a lot more efficient than going through CPython, until we add proper support for struct.
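For comparison, a rough CPython analogue of the pointer read can be sketched with ctypes, which also reinterprets the raw bytes in the machine's native byte order. This is plain Python for illustration only, not Codon code, and the helper names `read_i32` and `read_f32` are hypothetical. Native order matches '<i' only on little-endian machines, so the sketch compares against struct's '=' (native) format instead.

```python
import ctypes
import struct

def read_i32(buf, offset=0):
    """Reinterpret 4 bytes of `buf` as a native-endian signed 32-bit int."""
    return ctypes.c_int32.from_buffer_copy(buf, offset).value

def read_f32(buf, offset=0):
    """Reinterpret 4 bytes of `buf` as a native-endian 32-bit float."""
    return ctypes.c_float.from_buffer_copy(buf, offset).value

data = b'\xb0\xb1\xb2\xb3'

# '=' selects native byte order with standard sizes, like a raw pointer read.
same_int = read_i32(data) == struct.unpack('=i', data)[0]
same_float = read_f32(data) == struct.unpack('=f', data)[0]
print(same_int, same_float)
```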
Llama code now runs on Codon (with a 74x speedup compared to Python).
https://github.com/dmahurin/llama2.codon/
tairov/llama2.py#5
However, getting this to work required changes/hacks to support using 'struct' to read typed values from a file.
(The Codon workarounds slow down model and token loading.)
See:
dmahurin/llama2.codon@1e2e7fa
Currently with Codon, file.read returns strings instead of bytes. Bytes are required by Python's struct module.
One workaround is implementing a new file interface using file descriptors in Python:
https://github.com/dmahurin/llama2.codon/blob/codon/fdfile.py
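The general idea of a descriptor-based reader can be sketched as follows. This is an illustrative guess at the approach in plain Python, not the contents of the linked fdfile.py; the names `FdReader` and `read_header` are hypothetical.

```python
import os
import struct

class FdReader:
    """Hypothetical minimal reader on a raw file descriptor; read() always returns bytes."""

    def __init__(self, path):
        # O_BINARY only exists on Windows; fall back to 0 elsewhere.
        self.fd = os.open(path, os.O_RDONLY | getattr(os, 'O_BINARY', 0))

    def read(self, n):
        # os.read may return fewer bytes than requested, so loop until done or EOF.
        chunks = []
        while n > 0:
            chunk = os.read(self.fd, n)
            if not chunk:
                break
            chunks.append(chunk)
            n -= len(chunk)
        return b''.join(chunks)

    def close(self):
        os.close(self.fd)

def read_header(path):
    """Read one little-endian int32 followed by one float32 from the start of the file."""
    f = FdReader(path)
    try:
        return struct.unpack('<if', f.read(8))
    finally:
        f.close()
```

Because os.read hands back bytes directly, the result can go straight into struct.unpack with no string conversion step.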
The other, uglier workaround was to pack the bytes with '4B', 4 bytes at a time (inefficiently), then unpack with 'i' or 'f'.
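The '4B' trick generalizes to any length by building the format string from the input size. A minimal sketch in plain Python, assuming every character has a code point in 0..255; the helper names are hypothetical and this has not been verified under Codon:

```python
import struct

def str_to_bytes(s):
    """Pack each character's code point (must be 0..255) as one unsigned byte."""
    return struct.pack('%dB' % len(s), *[ord(c) for c in s])

def unpack_floats(s):
    """Interpret a string of raw bytes as consecutive little-endian float32 values."""
    raw = str_to_bytes(s)
    count = len(raw) // 4  # ignore any trailing partial value
    return struct.unpack('<%df' % count, raw[:count * 4])
```

This still pays for one ord call per byte, so it stays a workaround rather than a fix.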
The best solution needs a fix from Codon.
I think perhaps the smallest Codon fix would be to return bytes (b'...') if the file is opened in binary mode, and a string otherwise.
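This would mirror what CPython already does, where the open mode decides the return type of read. A small sketch of that CPython behavior (the helper name `read_both_modes` is made up for illustration):

```python
import os
import tempfile

def read_both_modes(payload):
    """Write `payload` to a temp file, then read it back in binary and in text mode."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, 'wb') as f:
            f.write(payload)
        with open(path, 'rb') as f:
            as_binary = f.read()  # bytes in binary mode
        with open(path, 'r', encoding='iso-8859-1', newline='') as f:
            as_text = f.read()    # str in text mode
        return as_binary, as_text
    finally:
        os.remove(path)

binary, text = read_both_modes(b'\xb0\xb1\xb2\xb3')
print(type(binary).__name__, type(text).__name__)
```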