Reusing instances for data generation #569

Open
GabrielMarquezMatte opened this issue Oct 26, 2024 · 0 comments

Description

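I made a simple reimplementation of the Generate functions after noticing that the overload for multiple objects simply executes `Enumerable.Range(1, count).Select(i => Generate(ruleSets)).ToList();`. That repeats the per-call setup (parsing the rule sets and looking up the create rule) for every element. By moving the per-element work into a helper function and reusing the setup results, I got a speedup of roughly 80 times (21.00 μs vs. 1,688.76 μs for 100 objects), roughly 200 times less allocated memory (10.18 KB vs. 2,008.43 KB), and zero Gen1 collections.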
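To make the difference between the two loop shapes concrete, here is a minimal, library-independent C# sketch. `RuleSetParser`, `NaiveGenerate`, and `HoistedGenerate` are hypothetical stand-ins, not Bogus APIs: the naive shape repeats the setup once per element, the way `Enumerable.Range(1, count).Select(i => Generate(ruleSets))` does, while the hoisted shape runs the setup once and reuses the result.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for per-call setup such as parsing rule sets.
// It only counts how often it runs; it is not part of Bogus.
public static class RuleSetParser
{
    public static int ParseCalls;

    public static string[] Parse(string? ruleSets)
    {
        ParseCalls++;
        return string.IsNullOrWhiteSpace(ruleSets)
            ? new[] { "default" }
            : ruleSets.Split(',', StringSplitOptions.TrimEntries | StringSplitOptions.RemoveEmptyEntries);
    }
}

public static class GenerationPatterns
{
    // Naive shape: the setup runs again for every generated element.
    public static List<T> NaiveGenerate<T>(int count, string? ruleSets, Func<string[], T> build)
    {
        return Enumerable.Range(1, count)
            .Select(_ => build(RuleSetParser.Parse(ruleSets)))
            .ToList();
    }

    // Hoisted shape: the setup runs once and its result is shared by all elements.
    public static T[] HoistedGenerate<T>(int count, string? ruleSets, Func<string[], T> build)
    {
        var rules = RuleSetParser.Parse(ruleSets);
        var result = new T[count];
        for (var i = 0; i < count; i++)
        {
            result[i] = build(rules);
        }
        return result;
    }
}
```

For `count = 100`, the setup counter ends at 100 with the naive shape and at 1 with the hoisted shape; the hoisted version also writes into a pre-sized array instead of growing a `List<T>`.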


BenchmarkDotNet v0.14.0, Windows 11 (10.0.22631.4317/23H2/2023Update/SunValley3)
AMD Ryzen 7 5700X, 1 CPU, 16 logical and 8 physical cores
.NET SDK 9.0.100-rc.2.24474.11
  [Host]     : .NET 9.0.0 (9.0.24.47305), X64 RyuJIT AVX2
  DefaultJob : .NET 9.0.0 (9.0.24.47305), X64 RyuJIT AVX2


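| Method              | count | Mean        | Error     | StdDev    | Gen0     | Gen1   | Allocated  |
|---------------------|------:|------------:|----------:|----------:|---------:|-------:|-----------:|
| GenerateMyFaker     | 100   | 21.00 μs    | 0.196 μs  | 0.164 μs  | 0.6104   | -      | 10.18 KB   |
| GenerateFaker       | 100   | 1,688.76 μs | 27.635 μs | 24.498 μs | 121.0938 | 7.8125 | 2008.43 KB |
| GenerateMyFakerLazy | 100   | 22.09 μs    | 0.145 μs  | 0.129 μs  | 0.7019   | -      | 11.63 KB   |
| GenerateFakerLazy   | 100   | 1,665.26 μs | 31.268 μs | 42.799 μs | 121.0938 | 7.8125 | 2008.29 KB |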
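The ratios implied by the table can be worked out directly (each is the Bogus `Faker<Student>` result divided by the matching `MyFaker` result, for `count = 100`):

$$
\begin{aligned}
\text{time, eager} &: \frac{1688.76\ \mu\text{s}}{21.00\ \mu\text{s}} \approx 80.4 \\
\text{time, lazy} &: \frac{1665.26\ \mu\text{s}}{22.09\ \mu\text{s}} \approx 75.4 \\
\text{allocated, eager} &: \frac{2008.43\ \text{KB}}{10.18\ \text{KB}} \approx 197.3 \\
\text{allocated, lazy} &: \frac{2008.29\ \text{KB}}{11.63\ \text{KB}} \approx 172.7
\end{aligned}
$$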

LINQPad Code Example

The benchmark code is below:

using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using Bogus;

namespace Test
{
    public sealed class Student
    {
        public required string Name { get; set; }
        public required string Email { get; set; }
        public int Age { get; set; }
        public DateTime RegisteredAt { get; set; }
    }
    public sealed class MyFaker : Faker<Student>
    {
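        // Builds one student from a create rule and rule sets that the caller resolved once.
        // Note: no new FakerHub context is created here, so every element is generated
        // in the same context.
        private Student Generate(Func<Faker, Student> creator, Faker faker, string[] rules)
        {
            var instance = creator(faker);
            PopulateInternal(instance, rules);
            return instance;
        }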
        public Student[] Generate(int count)
        {
            // Look up the default create rule once instead of once per element.
            ref var createRule = ref CollectionsMarshal.GetValueRefOrNullRef(CreateActions, Default);
            if (Unsafe.IsNullRef(ref createRule))
            {
                return [];
            }
            var students = new Student[count];
            // Parse the (default) rule sets once and share the result across all elements.
            var cleanRules = ParseDirtyRulesSets(null);
            foreach (ref var student in students.AsSpan())
            {
                student = Generate(createRule, FakerHub, cleanRules);
            }
            return students;
        }
        public IEnumerable<Student> GenerateLazy(int count)
        {
            // Same one-time setup as Generate(int), but elements are produced on demand.
            if (!CreateActions.TryGetValue(Default, out var createRule))
            {
                return [];
            }
            var cleanRules = ParseDirtyRulesSets(null);
            return new StudentEnumerable(count, createRule, FakerHub, this, cleanRules);
        }
        private readonly struct StudentEnumerable(int size, Func<Faker, Student> creator, Faker faker, MyFaker myFaker, string[] rules) : IEnumerable<Student>
        {
            private struct StudentEnumerator(int size, Func<Faker, Student> creator, Faker faker, MyFaker myFaker, string[] rules) : IEnumerator<Student>
            {
                private int _index;
                private Student? _current;

                // Both Current properties return the student built by MoveNext, so reading
                // Current more than once does not generate (or skip populating) a new student.
                public readonly Student Current => _current!;
                readonly object IEnumerator.Current => Current;

                public bool MoveNext()
                {
                    if (_index < size)
                    {
                        _index++;
                        _current = myFaker.Generate(creator, faker, rules);
                        return true;
                    }
                    _current = null;
                    return false;
                }

                public void Reset()
                {
                    _index = 0;
                    _current = null;
                }

                public void Dispose()
                {
                }
            }

            public IEnumerator<Student> GetEnumerator() => new StudentEnumerator(size, creator, faker, myFaker, rules);

            IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
        }
    }

    [MemoryDiagnoser]
    public class BenchmarkTest
    {
        private readonly MyFaker myFaker = new();
        private readonly Faker<Student> faker = new();
        [GlobalSetup]
        public void Setup()
        {
            myFaker.RuleFor(student => student.Name, faker => faker.Person.FullName);
            faker.RuleFor(student => student.Name, faker => faker.Person.FullName);
            myFaker.RuleFor(student => student.Email, faker => faker.Person.Email);
            faker.RuleFor(student => student.Email, faker => faker.Person.Email);
            myFaker.RuleFor(student => student.Age, faker => faker.Random.Number(18, 60));
            faker.RuleFor(student => student.Age, faker => faker.Random.Number(18, 60));
            myFaker.RuleFor(student => student.RegisteredAt, faker => faker.Date.Past());
            faker.RuleFor(student => student.RegisteredAt, faker => faker.Date.Past());
        }

        [Benchmark]
        [Arguments(100)]
        public Student[] GenerateMyFaker(int count)
        {
            return myFaker.Generate(count);
        }

        [Benchmark]
        [Arguments(100)]
        public List<Student> GenerateFaker(int count)
        {
            return faker.Generate(count);
        }
        
        [Benchmark]
        [Arguments(100)]
        public List<Student> GenerateMyFakerLazy(int count)
        {
            return myFaker.GenerateLazy(count).ToList();
        }

        [Benchmark]
        [Arguments(100)]
        public List<Student> GenerateFakerLazy(int count)
        {
            return faker.GenerateLazy(count).ToList();
        }
    }

    public static class Program
    {
        public static void Main(string[] args)
        {
            BenchmarkRunner.Run<BenchmarkTest>();
        }
    }
}

What alternatives have you considered?

The workaround is simply to reimplement the Generate functions so that they compute the cleanRules and createRule variables once per call and reuse them for every element, while still creating a new context for each element.
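A minimal sketch of the same idea for the lazy path, assuming a hypothetical `GenerationContext` type that stands in for a per-element generation context (it is not Bogus's `Faker` or `FakerHub`). `LazyGeneration.GenerateLazy` is likewise a hypothetical name: the rules are resolved once by the caller, a fresh context is created for every element, and a C# iterator (`yield return`) takes the place of a hand-written struct enumerable.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical per-element context; it only records the element's index.
public sealed class GenerationContext
{
    public GenerationContext(int index) => Index = index;
    public int Index { get; }
}

public static class LazyGeneration
{
    // Validates eagerly, then hands off to the iterator. The rules array is
    // resolved once by the caller and reused for every element.
    public static IEnumerable<T> GenerateLazy<T>(
        int count,
        string[] rules,
        Func<GenerationContext, string[], T> create)
    {
        if (count < 0)
        {
            throw new ArgumentOutOfRangeException(nameof(count));
        }
        return Iterate(count, rules, create);
    }

    private static IEnumerable<T> Iterate<T>(
        int count,
        string[] rules,
        Func<GenerationContext, string[], T> create)
    {
        for (var i = 0; i < count; i++)
        {
            // A new context per element, so per-context state is never shared.
            var context = new GenerationContext(i);
            yield return create(context, rules);
        }
    }
}
```

On the design choice: returning a struct enumerable through `IEnumerable<Student>` boxes it, and calling `GetEnumerator` through the interface boxes the struct enumerator as well, whereas a compiler-generated iterator object usually serves as its own first enumerator. An iterator is therefore unlikely to allocate more than the struct version, and it is shorter to write.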

Could you help with a pull-request?

Yes
