Caching (Tue Mar 22, lect 17)

Caching is one of the first, simplest ways of attacking scale

Logistics

  • Magic Code:

Review: Achieving Scale

  • Measuring performance:
    • How many Xs per second?
    • and/or how long does it take to Y?
  • Analysis
    • Instrumentation (basically logging)
    • Deep thought
    • Identify the bottleneck
  • Action
    • Remove the bottleneck
  • Remember: One of the cardinal “sins” is optimizing early
    • Instead, optimize based on measurement
    • Discover which of your product’s features are causing a scaling problem
    • Consider which of your techniques might be brought to bear

Scalability Pattern: General Caching

What is caching?

  • Save the result of a request with a given set of parameters.
  • On a future request with the same parameters, (maybe) return the saved result
  • System level caching. Storage:
    1. In ‘local’ memory
    2. In ‘remote’ memory
    3. In database
    4. In Cloud
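
The save-then-reuse idea in the first two bullets can be sketched in a few lines of Ruby. This is only an illustration under simple assumptions: `SimpleCache`, `fetch`, and `slow_lookup` are made-up names, and a plain Hash inside the process stands in for the "local memory" storage option.

```ruby
# A minimal sketch of a cache keyed by request parameters.
# SimpleCache and slow_lookup are hypothetical names for illustration.
class SimpleCache
  attr_reader :hits, :misses

  def initialize
    @store = {}   # 'local' memory: a Hash inside this process
    @hits = 0
    @misses = 0
  end

  # Return the saved result for these params, or run the block,
  # save its result, and return it.
  def fetch(*params)
    if @store.key?(params)
      @hits += 1
      @store[params]
    else
      @misses += 1
      @store[params] = yield
    end
  end
end

# Stand-in for an expensive request (hypothetical).
def slow_lookup(user_id, page)
  "results for user #{user_id}, page #{page}"
end

cache = SimpleCache.new
cache.fetch(42, 1) { slow_lookup(42, 1) }   # miss: computed and saved
cache.fetch(42, 1) { slow_lookup(42, 1) }   # hit: saved result reused
puts "hits=#{cache.hits} misses=#{cache.misses}"   # hits=1 misses=1
```

The "(maybe)" in the bullet is the hard part this sketch leaves out: nothing here ever decides that a saved result has become stale.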

Review of storage system architectural hierarchy

  • Processor
    1. Cores
    2. Caches
    3. On board memory
  • Offboard Memory
    1. Very different speeds depending on cost
    2. On a special very fast connection (bus)
  • External local storage
    1. USB connected SSD
  • Ethernet connected storage
    1. On local LAN
  • Ethernet connected storage
    1. On separate network (internet/cloud)

Cost of operations

  • Awareness of order of magnitude speed of operations:
  • Access registers inside CPU
  • Access CPU caches
  • Access standard RAM
  • Access local disk
    1. Access files
    2. Access local database
  • Access over network
    1. To a nearby server
    2. To a nearby database server
  • Access over the internet
    1. To a remote server
    2. To a remote database server
    3. To a remote Web Service
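
One way to build that order-of-magnitude awareness is to measure rather than memorize. The sketch below times an in-memory Hash lookup against reading a small local file, using only Ruby's standard library. The helper `average_seconds` and the repetition count are assumptions for illustration. The file read will usually be served from the operating system's own cache, so it understates true disk cost.

```ruby
require "benchmark"
require "tempfile"

# Time `reps` repetitions of a block and return average seconds per call.
# average_seconds is a hypothetical helper, not a library method.
def average_seconds(reps)
  total = Benchmark.realtime { reps.times { yield } }
  total / reps
end

in_memory = { "answer" => "42" }

file = Tempfile.new("cost-demo")
file.write("42")
file.flush

ram_time  = average_seconds(10_000) { in_memory["answer"] }
disk_time = average_seconds(10_000) { File.read(file.path) }

puts format("Hash lookup: %.9f s per access", ram_time)
puts format("File read:   %.9f s per access", disk_time)

file.close
file.unlink
```

The same helper could wrap a database query or an HTTP call to place the network rows of the list on the same scale.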

Memoization:

  • caching applied to an individual method
  • A basic programming technique
  • Simple
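
A minimal sketch of memoization in Ruby, using Fibonacci as the stand-in for an expensive method. The class name `Fib` and the `calls` counter are assumptions added only to make the saving visible.

```ruby
# Memoized Fibonacci: the cache lives alongside the one method it serves.
class Fib
  attr_reader :calls

  def initialize
    @memo = {}
    @calls = 0   # counts real computations, to show the saving
  end

  def fib(n)
    return @memo[n] if @memo.key?(n)
    @calls += 1
    @memo[n] = n < 2 ? n : fib(n - 1) + fib(n - 2)
  end
end

f = Fib.new
puts f.fib(30)   # 832040
puts f.calls     # 31: one computation per distinct n from 0 to 30
```

Without the `@memo` lookup the same call would recompute the small cases more than a million times. For a method with no arguments, Ruby code often uses the shorter `@value ||= expensive_call` idiom to the same effect.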

Name-value databases

  • Very fast searches and lookups
  • Distributed searches and distributed databases
  • Robust across system and application failures

Database Caching

  • To a certain extent, it’s what databases do
  • Caching both at the server (postgres itself)
  • And at the client (the postgres and activerecord subsystems)
  • Yet a lot more can be done

HTML page caching

  • Done at the web server
  • Don’t regenerate the page if it’s requested again
  • As long as you know it hasn’t changed
  • Page fragment caching, including “russian doll caching”
  • A key feature of good frameworks
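
The "Russian doll" idea can be sketched with plain Ruby objects: each fragment is cached under a key that includes the record's version, and an inner change also bumps the version of the enclosing record. Everything named here (`Post`, `Comment`, `FRAGMENTS`, `RENDERS`, the `render_*` helpers) is hypothetical, and `version` stands in for a timestamp such as `updated_at`. A framework's own fragment-caching helper is not used.

```ruby
# Sketch of "Russian doll" fragment caching with plain Ruby objects.
Comment = Struct.new(:id, :body, :version)
Post    = Struct.new(:id, :title, :version, :comments)

FRAGMENTS = {}            # fragment cache: key => rendered HTML
RENDERS   = Hash.new(0)   # how many times each kind was really rendered

# The key changes whenever the record's version changes, so a stale
# fragment is simply never looked up again.
def cache_key(kind, record)
  "#{kind}/#{record.id}-#{record.version}"
end

def render_comment(comment)
  FRAGMENTS[cache_key("comment", comment)] ||= begin
    RENDERS[:comment] += 1
    "<li>#{comment.body}</li>"
  end
end

def render_post(post)
  FRAGMENTS[cache_key("post", post)] ||= begin
    RENDERS[:post] += 1
    items = post.comments.map { |c| render_comment(c) }.join
    "<h1>#{post.title}</h1><ul>#{items}</ul>"
  end
end

# Editing a comment bumps its version and "touches" the enclosing post.
def edit_comment(post, comment, new_body)
  comment.body = new_body
  comment.version += 1
  post.version += 1
end
```

After `edit_comment`, the outer post fragment is rebuilt, but every unchanged comment is still served from the cache. Only the edited comment is rendered again.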

Caching with “Redis”

Advantages

  • Blindingly fast
  • Many data types: list, set, sorted set and hashes
  • Atomic operations
  • Has many uses: caching, message queue, publish subscribe, sharing application global state

An instance of “network caching”

  • Evolved from the original memcached
  • Typical structure is a key-value store
  • A nosql database. But in memory!
  • Ruby bindings: gem redis

Wait, where’s the data actually stored?

  • A redis host, accessible by tcp/ip: dns name + port number
  • You can run it: $ redis-server
  • Heroku can run it for you with Redis To Go. Nano size is free!
  • In all cases, if the host dies, the in-memory data is gone (not 100% true: Redis can optionally persist snapshots or an append-only log to disk)

It has some interesting characteristics

  • ATOMIC operations, e.g. “INCR” operation
  • keys that expire (TTL)
  • Supports other values: lists, sets, hashes
  • And many many more
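
To make the first two characteristics concrete, here is an in-memory sketch of expiring keys and an atomic counter. `TinyStore` is a hypothetical stand-in written for this note, not the Redis client, and the injectable `clock` exists only so that expiry can be demonstrated without waiting.

```ruby
# In-memory sketch of two Redis behaviours: keys that expire and an
# atomic increment. TinyStore is hypothetical, not part of any library.
class TinyStore
  def initialize(clock: -> { Time.now.to_f })
    @clock = clock
    @data = {}      # key => value
    @expires = {}   # key => absolute expiry time in seconds
    @lock = Mutex.new
  end

  def set(key, value, ttl: nil)
    @lock.synchronize do
      @data[key] = value
      if ttl
        @expires[key] = @clock.call + ttl
      else
        @expires.delete(key)
      end
    end
  end

  def get(key)
    @lock.synchronize { read(key) }
  end

  # Read, add one and write back under a single lock, so concurrent
  # callers can never both see the same old value.
  def incr(key)
    @lock.synchronize do
      @data[key] = read(key).to_i + 1
    end
  end

  private

  # Drop the key if its time to live has passed, then return its value.
  def read(key)
    deadline = @expires[key]
    if deadline && @clock.call >= deadline
      @data.delete(key)
      @expires.delete(key)
    end
    @data[key]
  end
end
```

A separate read followed by a separate write could lose updates when two clients interleave. Performing both as one indivisible step is what makes an operation like INCR safe to call from many clients at once.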

Heroku

  • Can provide a basic free instance of it
  • Remember it has a URL and can be shared across applications
  • heroku redis:cli

Redis Concepts

  • Play with Redis
  • Keys
    • are text with colons, e.g. global:usercount by convention
    • but can be anything. You decide your structure. Colons are recommended.
  • Values
    • Are strings
    • Or compounds: lists, sets, sorted sets, hashes
  • Note we play with commands (a kind of a REPL)
  • But you will be doing API calls

SET key value                   # store a singular key and give it a value
GET key                         # retrieve its value
INCR key                        # add one to key value, should be an integer. Atomic!
DECR key                        # subtract one from key value. Atomic!
EXPIRE key seconds              # key will cease to exist `seconds` later
RPUSH key value                 # Append value to list
LPUSH key value                 # Prepend value to list
LPOP key                        # Remove first value
RPOP key                        # Remove last value
SADD key value                  # Add value to set
SREM key value                  # Remove value from set
SISMEMBER key value             # Is value a member of set
SUNION key1 key2                # Get union of two sets
ZADD key score value            # Add value with score to sorted set
HSET hashname hashkey value     # Add entry to hash
HGET hashname hashkey           # Retrieve a value from hash
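
From Ruby, these commands become method calls on a client object. In the `redis` gem they are lowercase methods such as `get`, `set`, and `incr`. The sketch below is a read-through helper that needs only `get(key)` and `set(key, value, ex: seconds)`, so it runs against any object offering those two methods. The helper name `cached_json`, the `REDIS_URL` environment variable, and `User.count` in the commented wiring are assumptions for illustration.

```ruby
require "json"

# Read-through cache helper. `redis` is any client that responds to
# get(key) and set(key, value, ex: seconds), as the redis gem's client does.
# cached_json is a hypothetical helper, not part of the gem.
def cached_json(redis, key, ttl_seconds)
  raw = redis.get(key)                 # GET key
  return JSON.parse(raw) if raw        # hit: reuse the stored value

  value = yield                        # miss: do the expensive work
  redis.set(key, JSON.generate(value), ex: ttl_seconds)  # SET with a TTL
  value
end

# With a real server this might be wired up as follows (not executed here;
# the environment variable and User.count are assumptions):
#   require "redis"
#   redis = Redis.new(url: ENV["REDIS_URL"])
#   stats = cached_json(redis, "global:usercount", 60) { { "count" => User.count } }
```

Values are serialized to JSON because Redis stores strings. A Hash with symbol keys would therefore come back from the cache with string keys.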

Putting it together

  • Redis is a power tool!
  • Don’t be scared of putting lots of information into it
  • Remember that it’s a cache. You need a real persistent store to recover from
  • Example:
    1. Display list of 50 most recent posts for users who are followed by user uid
    2. Key is: 50_tweets_for_user:uid
    3. Value is: ordered list of tweet ids
  • Processing:
    1. When list is displayed
    2. When user :u tweets

Thank you. Questions?