Cosi 105b

### Logistics

* Magic code is: 
* Looking at "nano twitter core" submissions
* Excellent progress! Keep it up. 
* There is still a lot of time but there is also a lot to do! I do glance at the commit history and in some cases there are big lags with no updates.
* Each team has received a Slack message from me today with a question about your submissions
* Remember to update the readme with the name of whoever contributed to each feature or change.
* How can we overcome the "server down" problem? Maybe a staging server? Not sure.
* In most cases the functionality that was required for the submission was not there. Believe me, I try really hard to find it!
* The grade weight of these interim submissions is not super high. But you will not do well if you try to cram it all in at the end!

<slide_break></slide_break>

<h2 id="scalability-pattern-database-paritiioning">Scalability Pattern: Database Partitioning</h2>
<ul>
  <li>What are the considerations in deciding whether and how to partition the database?</li>
</ul>

<slide_break></slide_break>

<h3 id="considering-the-code">Considering the code</h3>
<ul>
  <li>Minimize the number of times code calls the database (which is usually the same as the number of SQL statements sent).</li>
  <li>Investigate and know the capabilities of your database system</li>
  <li>Check whether there is a bulk operation that will do the job (e.g., inserting ten records with one call)</li>
</ul>
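<p>A minimal sketch of the bulk-insert idea, using Python's built-in <code>sqlite3</code> module as a stand-in for whatever database your team uses. The <code>users</code> table and the sample names are assumptions made up for illustration. With a networked database, the multi-row <code>INSERT</code> form is the one that most directly turns ten statements into one.</p>

```python
import sqlite3

# Hypothetical table and data, chosen only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

names = [(f"user{i}",) for i in range(10)]

# Option 1: one call from the application; the statement is prepared once
# and each row of parameters is bound in turn.
conn.executemany("INSERT INTO users (name) VALUES (?)", names)

# Option 2: a single multi-row INSERT statement carrying all ten rows.
placeholders = ", ".join(["(?)"] * len(names))
conn.execute(
    f"INSERT INTO users (name) VALUES {placeholders}",
    [name for (name,) in names],
)
conn.commit()

# Both options inserted ten rows each.
row_count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
```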

<slide_break></slide_break>

<h3 id="consider-the-schema">Consider the Schema</h3>

<ul>
  <li>Are the right columns indexed? Either too many or too few can be bad for performance, depending on the scenario.</li>
  <li>Check whether there are database constraints that you can add</li>
  <li>Check whether there are stored procedures that could be useful</li>
  <li>Check whether denormalizing might help in some cases</li>
</ul>
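<p>One way to check whether a column is usefully indexed is to ask the database for its query plan before and after adding the index. The sketch below does this with SQLite's <code>EXPLAIN QUERY PLAN</code>; the <code>follows</code> table and the index name are hypothetical, and other database systems have their own equivalent (typically some form of <code>EXPLAIN</code>) with different output.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE follows ("
    "id INTEGER PRIMARY KEY, follower_id INTEGER, following_id INTEGER)"
)


def plan(sql):
    """Return the human-readable detail column of SQLite's query plan."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


query = "SELECT follower_id FROM follows WHERE following_id = 42"

# Without an index the planner has to scan the whole table.
plan_before = plan(query)

# Hypothetical index on the column used in the WHERE clause.
conn.execute("CREATE INDEX idx_follows_following ON follows (following_id)")

# With the index the planner can search instead of scan.
plan_after = plan(query)
```

<p>Printing <code>plan_before</code> and <code>plan_after</code> shows whether the index is actually picked up for the query you care about.</p>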

<div class="callout callout-small">
  <span class="callout-badge">teams</span> What specific columns in what specific tables would you index or are you indexing?
</div>

<slide_break></slide_break>

<h2 id="scalability-pattern-database-partitioning">Scalability Pattern: Database Partitioning</h2>
<ul>
  <li>Advanced and central technique to deal with database scaling</li>
  <li>It can address performance (how long an operation takes)</li>
  <li>Or throughput (how many operations can be done per second)</li>
  <li>Or both</li>
</ul>

<slide_break></slide_break>

<h3 id="conflicting-definitions---partitioning-and-sharding">Conflicting definitions - Partitioning and Sharding</h3>
<ul>
  <li>When you divide a big database into several smaller ones</li>
  <li>Partitioning: Horizontal and Vertical</li>
  <li>Sharding: Horizontal Partitioning</li>
</ul>

<slide_break></slide_break>

<h3 id="whats-the-problem">What’s the problem?</h3>
<ul>
  <li>When the database is the bottleneck</li>
  <li>Add a second database server</li>
  <li>What to do with the data?</li>
</ul>

<slide_break></slide_break>

<h3 id="some-options">Some options</h3>
<ul>
  <li>Replication: Put a complete copy of the data on the second db server</li>
  <li>Pay attention to read vs. write</li>
  <li>What to do about data consistency?</li>
  <li>Partition</li>
</ul>

<slide_break></slide_break>

<h3 id="scenario">Scenario</h3>
<ul>
  <li>User Database
    <ol>
      <li>Often happens to be a monster</li>
      <li>Lots of records</li>
      <li>Each record with lots of information</li>
      <li>Accessed a lot</li>
    </ol>
  </li>
  <li>Schema
    <ol>
      <li>User: (id, name, email, biography, hobbies, college, last_login, encrypted_pw, profile_photo_jpg, …)</li>
    </ol>
  </li>
</ul>

<slide_break></slide_break>

<h3 id="vertical-partition">Vertical partition</h3>
<ul>
  <li>Often associated with an SOA</li>
  <li>Divide the User table into three different database servers:
    <ol>
      <li>User: (id, name, biography, email)</li>
      <li>Authentication: (id, last_login, encrypted_pw)</li>
      <li>Photos: (id, profile_photo_jpg)</li>
    </ol>
  </li>
  <li>How it changes your application</li>
  <li>Pretty basic rearchitecture into separate services</li>
</ul>
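<p>A rough sketch of how the vertical split above changes application code. Three in-memory SQLite databases stand in for the three database servers; the function names (<code>create_user</code>, <code>record_login</code>, <code>last_login</code>, <code>profile</code>) are hypothetical. Note that creating a user now writes to more than one database, with no single transaction covering both writes.</p>

```python
import sqlite3

# Three in-memory databases stand in for three separate database servers.
user_db = sqlite3.connect(":memory:")
auth_db = sqlite3.connect(":memory:")
photo_db = sqlite3.connect(":memory:")

user_db.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, biography TEXT, email TEXT)"
)
auth_db.execute(
    "CREATE TABLE authentication (id INTEGER PRIMARY KEY, last_login TEXT, encrypted_pw TEXT)"
)
photo_db.execute(
    "CREATE TABLE photos (id INTEGER PRIMARY KEY, profile_photo_jpg BLOB)"
)


def create_user(user_id, name, email, encrypted_pw):
    # One logical user now spans several databases, and no single
    # transaction covers all of the writes.
    user_db.execute(
        "INSERT INTO users (id, name, biography, email) VALUES (?, ?, '', ?)",
        (user_id, name, email),
    )
    auth_db.execute(
        "INSERT INTO authentication (id, last_login, encrypted_pw) VALUES (?, NULL, ?)",
        (user_id, encrypted_pw),
    )
    user_db.commit()
    auth_db.commit()


def record_login(user_id, timestamp):
    # Login bookkeeping touches only the authentication database.
    auth_db.execute(
        "UPDATE authentication SET last_login = ? WHERE id = ?", (timestamp, user_id)
    )
    auth_db.commit()


def last_login(user_id):
    row = auth_db.execute(
        "SELECT last_login FROM authentication WHERE id = ?", (user_id,)
    ).fetchone()
    return None if row is None else row[0]


def profile(user_id):
    # The profile page reads only the user database.
    row = user_db.execute(
        "SELECT name, biography, email FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    if row is None:
        return None
    return {"name": row[0], "biography": row[1], "email": row[2]}
```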

<slide_break></slide_break>

<h3 id="sharding-horizontal-partition">Sharding (Horizontal partition)</h3>
<ul>
  <li>“buckets” of users (== shards)</li>
  <li>How? Create multiple database servers with
    <ol>
      <li>the same schema</li>
      <li>different subset or clump of records</li>
    </ol>
  </li>
  <li>Need a way to direct requests to the right “shard”
    <ol>
      <li>inspect something about the record</li>
      <li>determine what shard to look in</li>
    </ol>
  </li>
</ul>

<slide_break></slide_break>

<h3 id="three-common-algorithms-to-decide-what-bucket-gets-a-record">Three common algorithms to decide what bucket gets a record</h3>
<ul>
  <li>Range Based: Range of some scalar value (record id, first letter of name, etc.)</li>
  <li>List Based: Take some other property (e.g. zipcode, department)</li>
  <li>Hash Based: Compute a hash on some value</li>
  <li>How it changes your application
    <ol>
      <li>Whenever you either read, write, or search</li>
      <li>You must include enough information to pick the right shard</li>
    </ol>
  </li>
</ul>
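<p>The three approaches can be sketched as three small routing functions. Everything concrete here is an assumption for illustration: the shard names, the id ranges, the zipcode table, and the choice of email as the hashed value. The hash version uses <code>hashlib</code> rather than Python's built-in <code>hash()</code>, because the built-in is salted per process for strings and so would not route consistently across servers.</p>

```python
import hashlib

SHARDS = ["db0", "db1", "db2"]  # hypothetical shard names


def range_shard(user_id):
    # Range based: ids below 1000 on db0, 1000-1999 on db1, the rest on db2.
    if user_id < 1000:
        return "db0"
    if user_id < 2000:
        return "db1"
    return "db2"


# List based: an explicit table from a property value to a shard.
ZIP_TO_SHARD = {"02453": "db0", "10001": "db1"}  # hypothetical mapping


def list_shard(zipcode):
    # Values missing from the table fall back to a default shard.
    return ZIP_TO_SHARD.get(zipcode, "db2")


def hash_shard(email):
    # Hash based: a stable hash of the value, reduced modulo the shard count.
    digest = hashlib.sha256(email.encode("utf-8")).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]
```

<p>In each case the caller has to supply the routing value (id, zipcode, or email) on every read, write, or search, which is exactly the application change listed above.</p>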

<slide_break></slide_break>

<h3 id="pros-and-cons">Pros and Cons</h3>
<ul>
  <li>Joins become a problem
    <ol>
      <li>What was once one db is now spread over more than one db</li>
      <li>Can lead to denormalization</li>
    </ol>
  </li>
  <li>Data Integrity
    <ol>
      <li>Foreign keys might now point to another database</li>
      <li>Databases can get out of sync</li>
    </ol>
  </li>
</ul>
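<p>To make the join problem concrete, here is a small sketch in which follow records and user records live in two different databases (two in-memory SQLite connections standing in for two servers). The tables, sample rows, and the <code>follower_names</code> function are hypothetical. What used to be one SQL <code>JOIN</code> becomes two queries stitched together in application code.</p>

```python
import sqlite3

users_db = sqlite3.connect(":memory:")
follows_db = sqlite3.connect(":memory:")

users_db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
follows_db.execute("CREATE TABLE follows (follower_id INTEGER, following_id INTEGER)")

users_db.executemany(
    "INSERT INTO users VALUES (?, ?)", [(1, "ann"), (2, "bob"), (3, "cy")]
)
follows_db.executemany("INSERT INTO follows VALUES (?, ?)", [(2, 1), (3, 1)])


def follower_names(user_id):
    # Step 1: ask the follows database for the follower ids.
    ids = [
        row[0]
        for row in follows_db.execute(
            "SELECT follower_id FROM follows WHERE following_id = ?", (user_id,)
        )
    ]
    if not ids:
        return []
    # Step 2: a second query, against a different database, replaces the JOIN.
    marks = ", ".join("?" * len(ids))
    rows = users_db.execute(
        f"SELECT name FROM users WHERE id IN ({marks}) ORDER BY name", ids
    )
    return [row[0] for row in rows]
```

<p>Nothing stops <code>follows</code> from holding an id that no longer exists in <code>users</code>, which is the data integrity concern in the list above.</p>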

<div class="callout callout-small">
  <span class="callout-badge">NB</span> Both kinds of partitioning are advanced techniques, and you should only use them when you have quantitative reasons to believe they will improve a measured performance issue.
</div>

<slide_break></slide_break>

<div class="callout callout-small">
  <span class="callout-badge">Teams</span> Work out a plan for sharding your databases. What would you shard, why and how?
</div>

<slide_break></slide_break>

<h2 id="scalability-pattern-database-caching">Scalability Pattern: Database Caching</h2>
<ul>
  <li>Using caching (e.g. redis) to reduce db access</li>
</ul>

<slide_break></slide_break>

<h3 id="example-social-graph">Example: Social Graph</h3>
<ul>
  <li>Schema (like all of you have)
    <ol>
      <li>User(id, name)</li>
      <li>Follow(id, follower_id, following_id)</li>
      <li>Content(id, author_id)</li>
    </ol>
  </li>
  <li>Nicely normalized
    <ol>
      <li>First, Second and Third Normal form</li>
      <li>Origins of the relational database</li>
    </ol>
  </li>
  <li>Queries like:
    <ol>
      <li>How many people are following user X?</li>
      <li>Who is following user Y?</li>
      <li>What are the most recent “n” posts (i.e. content) for user “u”?</li>
      <li>What are the most recent “n” posts for users that “u” is following?</li>
    </ol>
  </li>
  <li>But to display each and every user, a join is needed!</li>
</ul>
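<p>The four queries above might look roughly like this in SQL, shown here through Python's <code>sqlite3</code> module. The plural table names, the sample rows, and the function names are assumptions. Since the schema on the slide has no timestamp column, "most recent" is approximated by ordering on <code>id</code>, which assumes ids grow over time.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE follows (
        id INTEGER PRIMARY KEY, follower_id INTEGER, following_id INTEGER);
    CREATE TABLE content (id INTEGER PRIMARY KEY, author_id INTEGER);
    INSERT INTO users VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
    INSERT INTO follows (follower_id, following_id) VALUES (2, 1), (3, 1), (1, 3);
    INSERT INTO content (author_id) VALUES (1), (3), (1), (3);
    """
)


def follower_count(user_id):
    # How many people are following this user?
    sql = "SELECT COUNT(*) FROM follows WHERE following_id = ?"
    return conn.execute(sql, (user_id,)).fetchone()[0]


def follower_names(user_id):
    # Who is following this user? Displaying names requires a join.
    sql = (
        "SELECT users.name FROM follows "
        "JOIN users ON users.id = follows.follower_id "
        "WHERE follows.following_id = ? ORDER BY users.name"
    )
    return [row[0] for row in conn.execute(sql, (user_id,))]


def recent_posts(user_id, n):
    # The n most recent posts by this user (ids assumed to grow over time).
    sql = "SELECT id FROM content WHERE author_id = ? ORDER BY id DESC LIMIT ?"
    return [row[0] for row in conn.execute(sql, (user_id, n))]


def timeline(user_id, n):
    # The n most recent posts by users that this user is following.
    sql = (
        "SELECT content.id FROM content "
        "JOIN follows ON follows.following_id = content.author_id "
        "WHERE follows.follower_id = ? ORDER BY content.id DESC LIMIT ?"
    )
    return [row[0] for row in conn.execute(sql, (user_id, n))]
```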

<slide_break></slide_break>

<h3 id="measurement">Measurement</h3>
<ul>
  <li>Ask database system to analyze SQL queries that are slow
    <ol>
      <li>Discover that the social graph access was very slow</li>
    </ol>
  </li>
  <li>Discussion
    <ol>
      <li>Have you started using redis yet in your projects?</li>
      <li>What do your redis keys look like?</li>
      <li>How do you compute your cache key?</li>
    </ol>
  </li>
</ul>

<slide_break></slide_break>

<h3 id="db-caching">DB: Caching</h3>
<ul>
  <li>Use network-scale caching (Redis) to store and share across servers
    <ol>
      <li><code>count:followers:u = number</code></li>
      <li><code>count:following:u = number</code></li>
    </ol>
  </li>
  <li>How to maintain this number?</li>
  <li>How important is it that it is correct?</li>
  <li>What might make it incorrect?</li>
</ul>
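<p>A minimal sketch of reading such a count through the cache, and of one way it can become incorrect. A plain dictionary stands in for Redis and a list of pairs stands in for the follow table, so the block runs on its own; the names <code>cache</code>, <code>follows</code>, <code>stats</code>, and both functions are hypothetical. Only the key format <code>count:followers:u</code> comes from the slide.</p>

```python
cache = {}  # stand-in for Redis: string key -> string value
follows = [(2, 1), (3, 1), (1, 3)]  # stand-in table of (follower_id, following_id)
stats = {"db_queries": 0}  # counts how often the "database" is hit


def count_followers_in_db(user_id):
    # The expensive path: scan the follow records.
    stats["db_queries"] += 1
    return sum(1 for _, following_id in follows if following_id == user_id)


def follower_count(user_id):
    key = f"count:followers:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        # Cache hit: no database access at all.
        return int(cached)
    # Cache miss: compute once, then remember the answer.
    count = count_followers_in_db(user_id)
    cache[key] = str(count)
    return count


first = follower_count(1)   # miss, goes to the database
second = follower_count(1)  # hit, served from the cache

# A new follow is written to the database but the cache is not updated,
# so the cached number is now stale.
follows.append((4, 1))
stale = follower_count(1)
fresh = count_followers_in_db(1)
```

<p>Here <code>stale</code> and <code>fresh</code> disagree, which is the situation the questions above are asking about: every write path that changes the follow table also has to update or invalidate the cached number.</p>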

<slide_break></slide_break>

<ul>
  <li><code>get_follower_count(user)</code>, <code>get_following_count(user)</code>, <code>incr_follower_count(user)</code>, <code>decr_follower_count(user)</code>, <code>incr_following_count(user)</code>, <code>decr_following_count(user)</code></li>
  <li>What class has those methods?</li>
  <li>Where are they invoked?</li>
  <li>Result of queries?</li>
  <li>Result of search?</li>
  <li>Creating the cache key</li>
  <li>What do you store in the cache?</li>
</ul>
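<p>One possible answer to "what class has those methods?" is a small wrapper around the cache client, invoked from the follow and unfollow code paths. This is a sketch under assumptions: <code>FollowCounts</code>, <code>InMemoryCache</code>, <code>follow</code>, and <code>unfollow</code> are hypothetical names, and <code>InMemoryCache</code> is a stand-in that only imitates the <code>get</code>, <code>incr</code>, and <code>decr</code> methods of a Redis client so the example runs without a server.</p>

```python
class InMemoryCache:
    """Stand-in for a Redis client, imitating get, incr and decr."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def incr(self, key):
        self._data[key] = int(self._data.get(key, 0)) + 1
        return self._data[key]

    def decr(self, key):
        self._data[key] = int(self._data.get(key, 0)) - 1
        return self._data[key]


class FollowCounts:
    """Keeps follower and following counts under count:<kind>:<user_id> keys."""

    def __init__(self, cache):
        self.cache = cache

    def _key(self, kind, user_id):
        return f"count:{kind}:{user_id}"

    def get_follower_count(self, user_id):
        return int(self.cache.get(self._key("followers", user_id)) or 0)

    def get_following_count(self, user_id):
        return int(self.cache.get(self._key("following", user_id)) or 0)

    def incr_follower_count(self, user_id):
        return self.cache.incr(self._key("followers", user_id))

    def decr_follower_count(self, user_id):
        return self.cache.decr(self._key("followers", user_id))

    def incr_following_count(self, user_id):
        return self.cache.incr(self._key("following", user_id))

    def decr_following_count(self, user_id):
        return self.cache.decr(self._key("following", user_id))


def follow(counts, follower_id, followee_id):
    # Called wherever the app writes a new follow record to the database.
    counts.incr_follower_count(followee_id)
    counts.incr_following_count(follower_id)


def unfollow(counts, follower_id, followee_id):
    # Called wherever the app deletes a follow record.
    counts.decr_follower_count(followee_id)
    counts.decr_following_count(follower_id)
```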

<div class="callout callout-small">
  <span class="callout-badge">Teams</span> Discuss and design how you would incorporate this idea in your specific nanoTwitter
</div>

<slide_break></slide_break>

<h2 class="shadow p-3 bg-white rounded">Thank you. Questions?<img class="img-fluid w-100" src="https://picsum.photos/800/100.jpg" /> <small> (random image from picsum.photos)</small></h2>