PayDayLoan


Fast cache now!

This project provides a framework for building on-demand caching in Elixir. It provides a synchronous API to a cache that is loaded asynchronously. The cache itself may be backed in any way that you choose, though the default is to use an ETS table backend that has several built-in features for managing the mapping of keys to process ids (e.g., a process registry). You have the option of implementing your own backend using Redis, mnesia, a single process, etc.

PDL is designed for low-latency access to cache elements after they are initially loaded and gives you a framework to minimize load time by performing batch loads. This works very well with data streaming applications that have multiple workers processing events in parallel and are sharing cache state across workers.

Think of PDL as a cache "frontend". In a typical application, we may want to load data from a database and cache it for fast lookup later. PDL provides a "frontend" so that MyCache.get(some_id) will automatically make sure that the data corresponding to some_id is loaded into the cache and will return the value once it is available (or time out if the load takes too long). It batches the loading of data so that you can take advantage of, e.g., database queries that fetch multiple records in one call.

The actual storage of the data is done by a cache "backend". PDL provides a default backend via PayDayLoan.EtsBackend that is quite flexible. You can, however, implement your own backend using the PayDayLoan.Backend behaviour. This is useful for using an external service (e.g., Redis) as a cache backend. See the examples below.

NOTE: The _pid functions (e.g., PayDayLoan.get_pid/2) were deprecated and have since been removed. Replace them with their non-_pid equivalents: get_pid becomes get, peek_pid becomes peek, and with_pid becomes with_value. 0.3.0 was the last release that included the _pid functions.

Key ideas

  • Presents a synchronous API for asynchronous cache loading
  • The cache consists of key-value pairs
  • Provides a default backend for storing values in an ETS table but allows arbitrary backend implementations
  • Tries very hard not to use process messaging in the main lookup API because that can be a bottleneck. Uses ETS tables for state management.
  • Encourages bulk queries for cache loading.
  • Provides hooks for instrumentation
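
To make the "synchronous API over asynchronous, batched loading" idea concrete, here is a minimal, self-contained sketch. It is not how PDL is implemented: it deliberately uses a single GenServer and process messaging for brevity, which is the pattern PDL avoids in its lookup path. The module name `BatchingSketch`, the `loader_fun` argument, and the 50 ms interval are all assumptions made for illustration.

```elixir
defmodule BatchingSketch do
  use GenServer

  # loader_fun is a hypothetical function that takes a list of keys and
  # returns [{key, value}], in the spirit of a bulk_load callback.
  def start_link(loader_fun, interval_ms \\ 50) do
    GenServer.start_link(__MODULE__, {loader_fun, interval_ms})
  end

  # Synchronous API: blocks until the key is served from the cache, the next
  # load cycle replies, or the call times out.
  def get(server, key, timeout \\ 1_000) do
    GenServer.call(server, {:get, key}, timeout)
  end

  @impl true
  def init({loader_fun, interval_ms}) do
    Process.send_after(self(), :load, interval_ms)
    {:ok, %{loader: loader_fun, interval: interval_ms, cache: %{}, waiting: %{}}}
  end

  @impl true
  def handle_call({:get, key}, from, state) do
    case Map.fetch(state.cache, key) do
      {:ok, value} ->
        # already cached: answer immediately
        {:reply, {:ok, value}, state}

      :error ->
        # not cached: remember the caller and answer after the next load cycle
        waiting = Map.update(state.waiting, key, [from], &[from | &1])
        {:noreply, %{state | waiting: waiting}}
    end
  end

  @impl true
  def handle_info(:load, state) do
    keys = Map.keys(state.waiting)
    # one bulk call for every key requested since the last cycle
    loaded = if keys == [], do: %{}, else: Map.new(state.loader.(keys))

    for {key, callers} <- state.waiting, from <- callers do
      reply =
        case Map.fetch(loaded, key) do
          {:ok, value} -> {:ok, value}
          :error -> {:error, :not_found}
        end

      GenServer.reply(from, reply)
    end

    Process.send_after(self(), :load, state.interval)
    {:noreply, %{state | cache: Map.merge(state.cache, loaded), waiting: %{}}}
  end
end
```

Callers only ever see a blocking `get`, while the loader function is invoked once per cycle with every key that was requested in the meantime.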

Example usage: Default backend

# cache wrapper module - this wraps the PDL functions so that they
#   make sense within the context of your application
defmodule MyCache do
  # defines MyCache.pay_day_loan/0 (and alias pdl/0),
  #    which is set up with defaults and the supplied callback module
  use PayDayLoan, callback_module: MyCacheLoader

  # optionally pass in other arguments to override defaults, e.g.,
  #   use PayDayLoan, callback_module: MyCacheLoader, batch_size: 100
  
  # also defines pass-through functions for the PayDayLoan module -
  #  e.g., `MyCache.get(key)` is a pass-through to
  #   `PayDayLoan.get(MyCache.pdl(), key)`
end

# cache loader callback module - this will, for example, execute database
#   queries and turn the results into cache elements (e.g., Agent or
#   GenServer processes)
defmodule MyCacheLoader do
  @behaviour PayDayLoan.Loader
  require Logger # needed for the Logger.info call in refresh/3 below
 
  def key_exists?(key) do
    # should return true if the key exists -
    #   e.g., if "SELECT count(1) FROM some_table WHERE id = #{key}" returns > 0
  end

  def bulk_load(keys) do
    # code to look up records for keys in database (or whatever)
    #  should return a list of tuples of the format
    #  [{key, load_datum}]
  end
  
  def new(key, load_datum) do
    # note these are three separate examples - your callback will not do
    #   all three

    # if we are using processes:
    Agent.start_link(fn -> load_datum end)

    # if we want to store a callback:
    {:ok, fn -> {:ok, load_datum} end}

    # if we want to store the bare value
    {:ok, load_datum}
  end
  
  def refresh(existing_value, key, load_datum) do
    # note these are three separate examples - your callback will not do
    #   all three

    # if we are using processes, the existing_value is the pid of the
    #   already-started process
    pid = existing_value
    Agent.update(pid, fn(_cached_datum) -> load_datum end)
    # we need to return the pid back
    {:ok, pid}

    # or we could stop the existing pid and replace it with a new one
    Agent.stop(pid)
    Agent.start_link(fn -> load_datum end)

    # or if we stored a callback
    {:ok, cached_datum} = existing_value.()
    Logger.info("Replacing #{inspect cached_datum} with #{inspect load_datum}")
    {:ok, fn -> {:ok, load_datum} end}

    # or to store the new datum as a bare value
    {:ok, load_datum}
  end
end

# Add PDL to your existing supervision tree so that everything initializes properly
defmodule MyOTPApp do
  use Application 

  # existing Application.start callback
  def start(_type, _args) do
    my_supervisor_children = [
      # ... existing children specs
      PayDayLoan.supervisor_specification(MyCache.pdl)
    ]
    
    # for example
    Supervisor.start_link(my_supervisor_children, supervisor_opts)
  end
end

# synchronous API - behind the scenes will add the key (1) to the
#   load state table and the asynchronous loader will include that
#   in its next load cycle - this call does not return until either
#   the cache is loaded (via new above) or the request times out
{:ok, value} = MyCache.get(1)
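
The loader callbacks above are mostly placeholders. As a minimal sketch of what a concrete loader might look like, the module below uses an in-memory map as a stand-in for the database and stores bare values. The module name `InMemoryLoader` and its `@table` data are hypothetical, and the `@behaviour PayDayLoan.Loader` declaration is left out only so that the snippet runs without the dependency.

```elixir
defmodule InMemoryLoader do
  # Hypothetical stand-in for a database table: id => record.
  @table %{1 => "alice", 2 => "bob", 3 => "carol"}

  # In a real project you would also declare `@behaviour PayDayLoan.Loader`.

  # true when the key is known to the backing store
  def key_exists?(key), do: Map.has_key?(@table, key)

  # one "query" for the whole batch; keys without a record are omitted
  def bulk_load(keys) do
    @table
    |> Map.take(keys)
    |> Map.to_list()
  end

  # store the bare value
  def new(_key, load_datum), do: {:ok, load_datum}

  # replace the cached value with the freshly loaded one
  def refresh(_existing_value, _key, load_datum), do: {:ok, load_datum}
end
```

With a real database, `bulk_load/1` is where a single multi-row query (for example, one `WHERE id IN (...)` lookup) would replace one query per key.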

Example usage: Process backend (e.g., Redis connection)

# cache wrapper module - this wraps the PDL functions so that they
#   make sense within the context of your application
defmodule MyCache do
  # same as above but we specify a `backend` module and disable the
  #  cache monitor, we also specify a `backend_payload` so that we can
  #  specify a unique identifier for the backend process 
  use(
    PayDayLoan,
    callback_module: MyCacheLoader,
    backend: MyCacheBackend,
    backend_payload: :my_cache,
    cache_monitor: false # we won't be storing pids
  )
end

# same ideas as above but the new/refresh callbacks are different
defmodule MyCacheLoader do
  @behaviour PayDayLoan.Loader
 
  def key_exists?(key) do
    # should return true if the key exists -
    #   e.g., if "SELECT count(1) FROM some_table WHERE id = #{key}" returns > 0
  end

  def bulk_load(keys) do
    # code to look up records for keys in database (or whatever)
    #  should return a list of tuples of the format
    #  [{key, load_datum}]
  end
  
  def new(key, load_datum) do
    # we could modify the data here, but we are just going to store it raw
    {:ok, load_datum}
  end
  
  def refresh(_existing_value, key, load_datum) do
    # we could merge the existing value and the load_datum or we could modify
    #  before we store, but we're just going to replace
    {:ok, load_datum}
  end
end

# backend behaviour implementation
defmodule MyCacheBackend do
  @behaviour PayDayLoan.Backend

  # this shows an example of how we might use a single-process backend; using
  # Redis is very similar - the process would be a Redis connection and the
  # various callbacks would use Redis commands

  def start_link(name), do: Agent.start_link(fn -> %{} end, name: name)

  # nothing to do for setup
  def setup(_pdl), do: :ok

  # this would be a little more involved with redis - you could use the KEYS
  #   command and then MGET but with a large cache, that approach is not
  #   advised.  SCAN can be used with larger caches.
  def reduce(pdl, acc0, reducer) do
    Agent.get(pdl.backend_payload, fn(m) -> Enum.reduce(m, acc0, reducer) end)
  end

  # with redis this could be a call to DBSIZE
  def size(pdl), do: Agent.get(pdl.backend_payload, &map_size/1) 

  # with redis this could be a call to the KEYS command
  def keys(pdl), do: Agent.get(pdl.backend_payload, &Map.keys/1)

  # see comments on the reduce command
  def values(pdl), do: Agent.get(pdl.backend_payload, &Map.values/1)

  # this should be a simple GET command in redis
  def get(pdl, key) do
    case Agent.get(pdl.backend_payload, fn(m) -> Map.get(m, key) end) do
      nil -> {:error, :not_found}
      v -> {:ok, v}
    end
  end

  # with redis you could use SET here
  def put(pdl, key, val) do
    Agent.update(pdl.backend_payload, fn(m) -> Map.put(m, key, val) end)
  end

  # corresponds to redis DEL
  def delete(pdl, key) do
    Agent.update(pdl.backend_payload, fn(m) -> Map.delete(m, key) end)
  end
end

# Add PDL to your existing supervision tree so that everything initializes properly
defmodule MyOTPApp do
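  use Application
  # worker/2 comes from Supervisor.Spec (deprecated in newer Elixir versions)
  import Supervisor.Spec, only: [worker: 2]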

  # existing Application.start callback
  def start(_type, _args) do
    my_supervisor_children = [
      # start the backend with the payload as its name
      worker(MyCacheBackend, [MyCache.pdl().backend_payload]),
      # ... existing children specs
      PayDayLoan.supervisor_specification(MyCache.pdl)
    ]
    
    # for example
    Supervisor.start_link(my_supervisor_children, supervisor_opts)
  end
end

# synchronous API - behind the scenes will add the key (1) to the
#   load state table and the asynchronous loader will include that
#   in its next load cycle - this call does not return until either
#   the cache is loaded (via new above) or the request times out
{:ok, value} = MyCache.get(1)

Logging & Instrumentation

The use macro accepts an event_loggers option, which should be a list of two-argument functions. When one of the events below occurs, each function is called with the event atom and the requested key. The events are:

  • :timed_out - Timed out while loading cache.
  • :disappeared - Key was marked as :loaded but the backend did not return a value
  • :failed - The loader failed to load a value for the key
  • :cache_miss - A requested value was not already cached
  • :no_key - The loader says this key does not exist

Example usage:

defmodule CacheEventLogger do
  require Logger

  def log(event, key) do
    Logger.debug("Requesting key #{inspect key} caused event #{inspect event}")
  end
end

defmodule CacheEventStats do
  def log(_event, _key) do
    # update a statsd counter, etc.
  end
end

defmodule MyCache do
  use PayDayLoan, event_loggers: [&CacheEventLogger.log/2, &CacheEventStats.log/2]
end
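
As one possible way to fill in a stats logger like the one above, the sketch below tallies events in an ETS table, so logging does not route through a single process. The module name `CacheEventCounter`, the table name, and the `init/0` and `counts/0` helpers are hypothetical and not part of PDL.

```elixir
defmodule CacheEventCounter do
  # hypothetical table name
  @table :cache_event_counts

  # Create the counter table once, e.g. during application start.
  # Note: the table is owned by the calling process and is deleted when
  # that process exits.
  def init do
    :ets.new(@table, [:named_table, :public, write_concurrency: true])
    :ok
  end

  # Same shape as the loggers above: (event, key).
  # Increments the counter for the event, inserting {event, 0} first if the
  # event has not been seen yet.
  def log(event, _key) do
    :ets.update_counter(@table, event, 1, {event, 0})
    :ok
  end

  # Snapshot of the counters as a map of event => count.
  def counts, do: Map.new(:ets.tab2list(@table))
end
```

It would be registered like the other loggers, by adding `&CacheEventCounter.log/2` to the event_loggers list.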

The PayDayLoan.load_state_stats/1 function returns the count of keys in each load state and is also useful for instrumentation.

Development & Contributing

The usual Elixir and GitHub contribution workflows apply. Pull requests are welcome!

mix deps.get
mix compile
mix test

License

See LICENSE.txt