Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should C functions be called FFI-safely on big inputs? #327

Open
phadej opened this issue Apr 2, 2021 · 1 comment
Open

Should C functions be called FFI-safely on big inputs? #327

phadej opened this issue Apr 2, 2021 · 1 comment
Labels
internal No API-level changes

Comments

@phadej
Copy link
Contributor

phadej commented Apr 2, 2021

Something like

diff --git a/src/Data/Text/Encoding.hs b/src/Data/Text/Encoding.hs
index 239e15e..9b6a01c 100644
--- a/src/Data/Text/Encoding.hs
+++ b/src/Data/Text/Encoding.hs
@@ -436,7 +436,10 @@ encodeUtf8 (Text arr off len)
   fp <- mallocByteString (len*3) -- see https://github.com/haskell/text/issues/194 for why len*3 is enough
   withForeignPtr fp $ \ptr ->
     with ptr $ \destPtr -> do
-      c_encode_utf8 destPtr (A.aBA arr) (fromIntegral off) (fromIntegral len)
+      -- When ByteArray# is large enough, it is pinned.
+      -- Then we can use safe FFI calls, which we indeed want for big inputs!
+      (if len > 4096 then c_encode_utf8_safe else c_encode_utf8_unsafe)
+        destPtr (A.aBA arr) (fromIntegral off) (fromIntegral len)
       newDest <- peek destPtr
       let utf8len = newDest `minusPtr` ptr
       if utf8len >= len `shiftR` 1
@@ -535,5 +538,8 @@ foreign import ccall unsafe "_hs_text_decode_utf8_state" c_decode_utf8_with_stat
 foreign import ccall unsafe "_hs_text_decode_latin1" c_decode_latin1
     :: MutableByteArray# s -> Ptr Word8 -> Ptr Word8 -> IO ()
 
-foreign import ccall unsafe "_hs_text_encode_utf8" c_encode_utf8
+foreign import ccall unsafe "_hs_text_encode_utf8" c_encode_utf8_unsafe
+    :: Ptr (Ptr Word8) -> ByteArray# -> CSize -> CSize -> IO ()
+
+foreign import ccall safe "_hs_text_encode_utf8" c_encode_utf8_safe
     :: Ptr (Ptr Word8) -> ByteArray# -> CSize -> CSize -> IO ()

The benchmark suite doesn't have exercises for multithreaded programs.

As far as I understand the code now, encoding or decoding big Text will prevent GC from running. Is that an issue, hard to tell.

E.g. cryptohash-sha256 branches on size: https://hackage.haskell.org/package/cryptohash-sha256-0.11.102.0/docs/src/Crypto.Hash.SHA256.html#c_sha256_update, (my diff above is probably written better in that way too)

@Bodigrim
Copy link
Contributor

Bodigrim commented Apr 2, 2021

Cf. haskell/bytestring#282

@Lysxia Lysxia added the internal No API-level changes label Apr 3, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
internal No API-level changes
Projects
None yet
Development

No branches or pull requests

3 participants