diff --git a/_performance/004-Cache.md b/_performance/004-Cache.md
index 50b8c5d..578b8e5 100644
--- a/_performance/004-Cache.md
+++ b/_performance/004-Cache.md
@@ -13,8 +13,8 @@ Cache
 The typical rule for most applications is that only a fraction of their data is regularly accessed. As with many other things, data tends to follow the 80/20 rule, with 20% of your data accounting for 80% of the reads, and oftentimes it's higher than this. Postgres itself tracks access patterns of your data and will on its own keep frequently accessed data in cache. Generally you want your database to have a cache hit rate of about 99%. You can find your cache hit rate with:
 
     SELECT
-      sum(heap_blks_read) as heap_read,
-      sum(heap_blks_hit) as heap_hit,
+      sum(heap_blks_read) as heap_read,
+      sum(heap_blks_hit) as heap_hit,
       (sum(heap_blks_hit) - sum(heap_blks_read)) / sum(heap_blks_hit) as ratio
     FROM
@@ -33,12 +33,12 @@ The other primary piece for improving performance is [indexes](
-      seq_scan + idx_scan > 0
+      seq_scan + idx_scan > 0
     ORDER BY n_live_tup DESC;
@@ -50,21 +50,22 @@ Pro tip: If you're adding an index on a production database use `CREATE INDEX CO
 Looking at a real world example of the recently launched Heroku dashboard, we can run this query and see our results:
 
-    SELECT relname,
-      100 * idx_scan / (seq_scan + idx_scan) percent_of_times_index_used,
-      n_live_tup rows_in_table
-    FROM pg_stat_user_tables
+    SELECT relname,
+      100 * idx_scan / (seq_scan + idx_scan) percent_of_times_index_used,
+      n_live_tup rows_in_table
+    FROM pg_stat_user_tables
     ORDER BY n_live_tup DESC;
 
            relname      | percent_of_times_index_used | rows_in_table
-    ---------------------+-----------------------------+---------------
+    ---------------------+-----------------------------+---------------
     events              |                           0 |        669917
-    app_infos_user_info |                           0 |        198218
+    app_infos_user_info |                           0 |        198218
     app_infos           |                          50 |        175640
-    user_info           |                           3 |         46718
-    rollouts            |                           0 |         34078
-    favorites           |                           0 |          3059
-    schema_migrations   |                           0 |             2
-    authorizations      |                           0 |             0
+    user_info           |                           3 |         46718
+    rollouts            |                           0 |         34078
+    favorites           |                           0 |          3059
+    schema_migrations   |                           0 |             2
+    authorizations      |                           0 |             0
     delayed_jobs        |                          23 |             0
 
 From this we can see the events table, which has around 700,000 rows, has no indexes that have been used. From here you could investigate within my application and see some of the common queries that are used; one example is pulling the events for this blog post which you are reading. You can see your [execution plan]() by running [`EXPLAIN ANALYZE`](), which gives you a better idea of the performance of a specific query:
@@ -79,10 +80,10 @@ From this we can see the events table which has around 700,000 rows has no index
 Given there's a sequential scan across all that data, this is an area we can optimize with an index. We can add our index concurrently to prevent locking on that table and then see how performance is:
 
     CREATE INDEX CONCURRENTLY idx_events_app_info_id ON
-      events(app_info_id);
+      events(app_info_id);
 
     EXPLAIN ANALYZE SELECT * FROM events WHERE app_info_id = 7559;
 
-    ----------------------------------------------------------------------
+    ----------------------------------------------------------------------
     Index Scan using idx_events_app_info_id on events (cost=0.00..23.40 rows=38 width=688) (actual time=0.021..0.115 rows=89 loops=1)
       Index Cond: (app_info_id = 7559)
@@ -98,9 +99,9 @@ examine the results in [New Relic](https://elements.heroku.com/addons/newrelic)a
 Finally, to combine the two: if you're interested in how many of your indexes are within your cache, you can run:
 
     SELECT
-      sum(idx_blks_read) as idx_read,
-      sum(idx_blks_hit) as idx_hit,
-      sum(idx_blks_hit) - sum(idx\_blks\_read)) sum(idx_blks_hit) as ratio
+      sum(idx_blks_read) as idx_read,
+      sum(idx_blks_hit) as idx_hit,
+      (sum(idx_blks_hit) - sum(idx_blks_read)) / sum(idx_blks_hit) as ratio
    FROM
      pg_statio_user_indexes;
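Reviewer note: the cache-hit-ratio expression this diff touches is plain counter arithmetic, so it can be sanity-checked outside the database. A minimal sketch (Python here; the function name and sample counts are illustrative, not from the post):

```python
# Mirror of (sum(heap_blks_hit) - sum(heap_blks_read)) / sum(heap_blks_hit)
# from the cache-hit-rate query: heap_blks_read counts blocks read from
# disk, heap_blks_hit counts blocks served from Postgres' buffer cache.

def cache_hit_ratio(blks_read: int, blks_hit: int) -> float:
    return (blks_hit - blks_read) / blks_hit

# A database serving 99 of every 100 block requests from cache sits at
# the ~99% target the post recommends:
print(cache_hit_ratio(blks_read=1_000, blks_hit=100_000))  # 0.99
```

The same shape applies to the index version in the final hunk, with `idx_blks_read`/`idx_blks_hit` from `pg_statio_user_indexes` in place of the heap counters.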
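Reviewer note: the `percent_of_times_index_used` column can likewise be checked by hand. In Postgres the `seq_scan` and `idx_scan` counters are bigints, so `100 * idx_scan / (seq_scan + idx_scan)` truncates toward zero; Python's `//` reproduces that below (sample counts are illustrative):

```python
# Mirror of 100 * idx_scan / (seq_scan + idx_scan) from the index-usage
# query; integer division, matching bigint / bigint in Postgres.

def percent_index_used(seq_scan: int, idx_scan: int) -> int:
    return 100 * idx_scan // (seq_scan + idx_scan)

# A table hit by 97 sequential scans and only 3 index scans:
print(percent_index_used(seq_scan=97, idx_scan=3))  # 3
```

The truncation is why a table whose index is used occasionally can still report 0 in this column.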