Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UUIDs are not actually 128-bits of randomness #1

Open
ppxxcc opened this issue Mar 5, 2023 · 7 comments
Open

UUIDs are not actually 128-bits of randomness #1

ppxxcc opened this issue Mar 5, 2023 · 7 comments
Assignees
Labels
bug Something isn't working

Comments

@ppxxcc
Copy link

ppxxcc commented Mar 5, 2023

Problem

In gimbal_uuid.c, UUIDs are generated with the following:

// Implemented per RFC4122 section 4.4
GBL_EXPORT GBL_RESULT GblUuid_initV4(GblUuid* pSelf) {
    GBL_CTX_BEGIN(NULL);
    GBL_CTX_VERIFY_POINTER(pSelf);

    // First initialize all bytes to random/psuedo-random numbers
    for(unsigned b = 0; b < GBL_UUID_BYTE_COUNT; ++b) {
        pSelf->bytes[b] = gblRandRange(0, 255);
    }

where gblRandRange(0,255) resolves to

GBL_EXPORT int gblRandRange(int min, int max)  {
    GBL_ASSERT(max <= RAND_MAX);
    static GblBool seeded = GBL_FALSE;
    if(!seeded) GBL_UNLIKELY {
        srand((unsigned)gblSeed(0));
        seeded = GBL_TRUE;
    }
    return (rand() % (max - min + 1)) + min;
}

The C standard library rand() will typically only have a $2^{32} -1$ long period before it begins repeating for any given seed.
The issue arises, because generating multiple random numbers, and concatenating them, does not increase randomness. Since a PRNG's output is determinate, for any given seed, every next value is determinate.

Also, there is no reseeding after it is initially seeded.

In the following program I wrote which basically implements the same behavior

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <string.h>
#include <stdint.h>

#define SIZE 16

int main(void)
{
    srand(time(NULL));
    
    uint8_t reference[SIZE] = {0};
    uint8_t current[SIZE] = {0};
    
    printf("Reference: ");
    for(int i = 0; i < SIZE; i++)
    {
        reference[i] = (uint8_t)(rand() % 256);
        printf("%02X ", reference[i]);
    }
    printf("\nFinding duplicates...\n");
    
    clock_t start = clock();

    for(uint64_t i = 1; i < UINT64_MAX; i++)
    {
        for(int j = 0; j < SIZE; j++)
            current[j] = (uint8_t)(rand() % 256);
        
        
        if(!memcmp(reference, current, SIZE))
        {
            printf("Duplicate found after %llu generations\n", i);
            printf("Duplicate: ");
            for(int k = 0; k < SIZE; k++) printf("%02X ", current[k]);
            printf("\n");
            break;
        }
    }
    
    clock_t end = clock();
    float sec = (float)(end-start) / CLOCKS_PER_SEC;
    
    printf("CPU time: %.3f seconds.\n", sec);
    
    return 0;
}

Gives the following output

Reference: 51 82 D7 92 DF 69 AB B6 D1 B4 A1 8E 60 0E C8 B9
Finding duplicates...
Duplicate found after 1048576 generations
Duplicate: 51 82 D7 92 DF 69 AB B6 D1 B4 A1 8E 60 0E C8 B9
CPU time: 0.157 seconds.

Solution

To generate proper UUIDs with all the necessary bits of randomness, implementing a 128-bit PRNG yourself will likely be the best option. An LFSR or an LCG are commonly used in PRNGs.

In addition, you could also find out a scheme for reseeding your RNG in-between number generations, that could help too.

@ppxxcc
Copy link
Author

ppxxcc commented Mar 5, 2023

Curiously enough, regardless of the seed, it always finds a duplicate in 1048576 generations.

Since we are generating 16-bytes each time, 1048576 * 16 = 16,777,216 which is exactly equal to $2^{24}$ .

Since each time we are doing a modulus $256=2^8$ to get a random byte, we are losing 8 bits of precision.

So, since the rand() period length is $2^{32}$, and we are losing 8-bits of precision from the modulus to get a 0-255 byte, and since we are generating 16-bytes every time, that is how we get the duplicate in 1048576 generations.

@gyrovorbis
Copy link
Owner

gyrovorbis commented Mar 9, 2023

Well, well, well. Also within gimbal_hash.h are:


GBL_EXPORT void     gblRandBuffer    (void*   pData,
                                      GblSize size)                    GBL_NOEXCEPT;

GBL_EXPORT int      gblRandString    (char*       pBuffer,
                                      int         minSize,
                                      int         maxSize,
                                      const char* pCharList)           GBL_NOEXCEPT;

Which are going to suffer from the same deficiencies... and I use them like everywhere for unit tests... Noted. Damn fine bug reporting.

@gyrovorbis gyrovorbis added the bug Something isn't working label Mar 9, 2023
@gyrovorbis gyrovorbis self-assigned this Mar 9, 2023
@gyrovorbis
Copy link
Owner

I implemented a new "randomness" API and back-end with utilities for generating all sorts of random types and bounded data sets as well as various discrete and continuous statistical distributions for said randomness...

The new distributions generator back-end is backed by an LRG algorithm which looks very good and does reseed itself after every pass. When I get time, I'll add unit tests to prove its randomness and hook it up to GblUuid.

The plan is to have a generic API for seeding and setting the random generator algorithm which is then used by the rest of the library, so you can use libc or the new LRG generator libGimbal will provide or whatever else.

Newfangled randomness API:
https://github.com/gyrovorbis/libgimbal/blob/master/lib/api/gimbal/algorithms/gimbal_random.h

"Lehmer" LRG back-end (not yet exposed):

GBL_EXPORT double gblRandLehmer(void) {

@gyrovorbis
Copy link
Owner

gyrovorbis commented Aug 8, 2023

Holy shit, dude, how terrible is whatever version of libC you're testing with?

I'm redoing your test setup with pitting the default libc rand() against my new lehmer LCG random generator, and I'm basically getting an infinite loop with either algorithm trying to match 4 bytes in a row, neither ever completes.

* ********* Starting TestSuite [GblRandomTestSuite] *********
* [ INIT      ]: GblRandomTestSuite
* [ RUN       ]: GblRandomTestSuite::libCDuplicate
Duplicate found after 4512138 generations
Duplicate: 67 C6 69 
* [      PASS ]: GblRandomTestSuite::libCDuplicate (153.970 ms)
* [ RUN       ]: GblRandomTestSuite::lehmerDuplicate
Duplicate found after 11685257 generations
Duplicate: 8C AC 7E 
* [      PASS ]: GblRandomTestSuite::lehmerDuplicate (264.649 ms)
* [ FINAL     ]: GblRandomTestSuite
* Totals: 2 passed, 0 failed, 0 skipped, 418.619ms, 0/0 leaked (0/0 bytes)
* ********* Finished TestSuite [GblRandomTestSuite] *********

Check out how many iterations I have to run just to find a match of 3 bytes being duplicated... At least it does look as though the new generator does collide less than the C stdlib one does, and it's also faster too.

@ppxxcc
Copy link
Author

ppxxcc commented Aug 14, 2023

Currently on vacation away from my computer. When I get back in a week I’ll do more testing and find out why it found a collision so soon.

@gyrovorbis
Copy link
Owner

Ugh. Okay, the new generator has issues on the Dreamcast build, because it relies on double types, while the Dreamcast KOS SDK is configured to treat them as floats, causing a loss of precision that just breaks the whole thing. Need to see if I can do some freaky uint64_t casting to fix it there.

@gyrovorbis
Copy link
Owner

We now have full 64-bits of floating point precision with double on Dreamcast with KOS, when configuring the environment and toolchain (it's optional). This should be resolved now, even on DC. Check me, brah.

@gyrovorbis gyrovorbis assigned ppxxcc and unassigned gyrovorbis Mar 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants