[pgSQL] EXPLAIN ANALYZE－碎碎念

https://docs.postgresql.tw/the-sql-language/performance-tips/using-explain

在SQL指令前面加上EXPLAIN ANALYZE，可以得到該SQL指令的執行時間，方便DB QUERY時間分析使用。

可以使用 EXPLAIN 的 ANALYZE 選項檢查計劃員估算的準確性。使用此選項，EXPLAIN 實際執行查詢，然後顯示每個計劃節點中累積的真實資料列計數和真實執行時間，以及簡單 EXPLAIN 顯示的相同估計值。例如，我們可能得到這樣的結果：

EXPLAIN ANALYZE SELECT *

FROM tenk1 t1, tenk2 t2

WHERE t1.unique1 < 10 AND t1.unique2 = t2.unique2;

QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------

Nested Loop (cost=4.65..118.62 rows=10 width=488) (actual time=0.128..0.377 rows=10 loops=1)

-> Bitmap Heap Scan on tenk1 t1 (cost=4.36..39.47 rows=10 width=244) (actual time=0.057..0.121 rows=10 loops=1)

Recheck Cond: (unique1 < 10)

-> Bitmap Index Scan on tenk1_unique1 (cost=0.00..4.36 rows=10 width=0) (actual time=0.024..0.024 rows=10 loops=1)

Index Cond: (unique1 < 10)

-> Index Scan using tenk2_unique2 on tenk2 t2 (cost=0.29..7.91 rows=1 width=244) (actual time=0.021..0.022 rows=1 loops=10)

Index Cond: (unique2 = t1.unique2)

Planning time: 0.181 ms

Execution time: 0.501 ms

注意，「實際時間」值是實際執行的時間，以毫秒為單位，而成本估算會以任意單位表示；所以他們不太可能匹配。通常最重要的事情是估計的資料列數量是否與現實相當接近。在這個例子中，估計都是死的，但這在實務上很不尋常。

在某些查詢計劃中，子計劃節點可能不止一次執行。例如，內部索引掃描將在上述巢狀循環計劃中的每個外部資料列執行一次。在這種情況下，循環的值回報節點的總執行次數，顯示的實際時間和資料列的值是每次執行的平均值。這樣做是為了使數字與顯示成本估算的方式相當。乘以 loopsvalue 得到實際花費在節點上的總時間。在上面的例子中，我們總共花了 0.220 毫秒來執行 tenk2 上的索引掃描。

在某些情況下，EXPLAIN ANALYZE 會在計劃節點執行時間和資料列計數之外顯示其他執行統計訊息。例如，Sort 和 Hash 節點提供了額外的訊息：

EXPLAIN ANALYZE SELECT *

FROM tenk1 t1, tenk2 t2

WHERE t1.unique1 < 100 AND t1.unique2 = t2.unique2 ORDER BY t1.fivethous;

QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------

Sort (cost=717.34..717.59 rows=101 width=488) (actual time=7.761..7.774 rows=100 loops=1)

Sort Key: t1.fivethous

Sort Method: quicksort Memory: 77kB

-> Hash Join (cost=230.47..713.98 rows=101 width=488) (actual time=0.711..7.427 rows=100 loops=1)

Hash Cond: (t2.unique2 = t1.unique2)

-> Seq Scan on tenk2 t2 (cost=0.00..445.00 rows=10000 width=244) (actual time=0.007..2.583 rows=10000 loops=1)

-> Hash (cost=229.20..229.20 rows=101 width=244) (actual time=0.659..0.659 rows=100 loops=1)

Buckets: 1024 Batches: 1 Memory Usage: 28kB

-> Bitmap Heap Scan on tenk1 t1 (cost=5.07..229.20 rows=101 width=244) (actual time=0.080..0.526 rows=100 loops=1)

Recheck Cond: (unique1 < 100)

-> Bitmap Index Scan on tenk1_unique1 (cost=0.00..5.04 rows=101 width=0) (actual time=0.049..0.049 rows=100 loops=1)

Index Cond: (unique1 < 100)

Planning time: 0.194 ms

Execution time: 8.008 ms

Sort 節點顯示使用的排序方法（特別是，排序是在記憶體中還是在磁碟上）以及所需的記憶體量或磁盤空間量。Hash 節點顯示 hash buckets 和批次的數量以及用於雜湊表的尖峰記憶體用量。（如果批次次數超過 1，則還會涉及磁碟空間使用，但不會顯示出來。）

另一種類型的額外訊息是過濾條件移除的資料列數目：

EXPLAIN ANALYZE SELECT * FROM tenk1 WHERE ten < 7;

QUERY PLAN

---------------------------------------------------------------------------------------------------------

Seq Scan on tenk1 (cost=0.00..483.00 rows=7000 width=244) (actual time=0.016..5.107 rows=7000 loops=1)

Filter: (ten < 7)

Rows Removed by Filter: 3000

Planning time: 0.083 ms

Execution time: 5.905 ms

這些計數對於在連接節點處應用的過濾條件特別有用。僅當過濾條件拒絕了至少一個掃描資料列或加入節點情況下的潛在交叉查詢配對時，才會顯示「Rows Removed」。

類似於過濾條件的情況發生在「lossy」索引掃描中。例如，考慮搜尋包含特定點的多邊形：

EXPLAIN ANALYZE SELECT * FROM polygon_tbl WHERE f1 @> polygon '(0.5,2.0)';

QUERY PLAN

------------------------------------------------------------------------------------------------------

Seq Scan on polygon_tbl (cost=0.00..1.05 rows=1 width=32) (actual time=0.044..0.044 rows=0 loops=1)

Filter: (f1 @> '((0.5,2))'::polygon)

Rows Removed by Filter: 4

Planning time: 0.040 ms

Execution time: 0.083 ms

規劃程序（非常正確地）認為這個樣本資料表太小而無法進行索引掃描，因此我們進行了簡單的循序掃描，其中所有資料列都被過濾條件拒絕。但是如果我們強制使用索引掃描，我們會看到：

SET enable_seqscan TO off;

EXPLAIN ANALYZE SELECT * FROM polygon_tbl WHERE f1 @> polygon '(0.5,2.0)';

QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------

Index Scan using gpolygonind on polygon_tbl (cost=0.13..8.15 rows=1 width=32) (actual time=0.062..0.062 rows=0 loops=1)

Index Cond: (f1 @> '((0.5,2))'::polygon)

Rows Removed by Index Recheck: 1

Planning time: 0.034 ms

Execution time: 0.144 ms

在這裡我們可以看到索引回傳了一個候選資料列，然後透過重新檢查索引條件來拒絕該資料列。發生這種情況是因為 GiST 索引對於多邊形包含測試來說是「lossy」的：它實際上回傳的多個資料列與目標重疊的多邊形，然後我們必須對這些資料列進行精確的測試。

EXPLAIN 有一個 BUFFERS 選項，可以與 ANALYZE 一起使用以獲得更多的執行時統計訊息：

EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM tenk1 WHERE unique1 < 100 AND unique2 > 9000;

QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------

Bitmap Heap Scan on tenk1 (cost=25.08..60.21 rows=10 width=244) (actual time=0.323..0.342 rows=10 loops=1)

Recheck Cond: ((unique1 < 100) AND (unique2 > 9000))

Buffers: shared hit=15

-> BitmapAnd (cost=25.08..25.08 rows=10 width=0) (actual time=0.309..0.309 rows=0 loops=1)

Buffers: shared hit=7

-> Bitmap Index Scan on tenk1_unique1 (cost=0.00..5.04 rows=101 width=0) (actual time=0.043..0.043 rows=100 loops=1)

Index Cond: (unique1 < 100)

Buffers: shared hit=2

-> Bitmap Index Scan on tenk1_unique2 (cost=0.00..19.78 rows=999 width=0) (actual time=0.227..0.227 rows=999 loops=1)

Index Cond: (unique2 > 9000)

Buffers: shared hit=5

Planning time: 0.088 ms

Execution time: 0.423 ms

BUFFERS 提供的數字有助於識別查詢的哪些部分是 I/O 密集程度最高的。

請記住，因為 EXPLAIN ANALYZE 實際上執行查詢，所以任何副作用都會照常發生，即使查詢可能輸出的任何結果都被丟棄，有利於輸出 EXPLAIN 數據。如果要在不更改資料表的情況下分析資料修改查詢，可以在之後回溯事務，例如：

BEGIN;

EXPLAIN ANALYZE UPDATE tenk1 SET hundred = hundred + 1 WHERE unique1 < 100;

QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------

Update on tenk1 (cost=5.07..229.46 rows=101 width=250) (actual time=14.628..14.628 rows=0 loops=1)

-> Bitmap Heap Scan on tenk1 (cost=5.07..229.46 rows=101 width=250) (actual time=0.101..0.439 rows=100 loops=1)

Recheck Cond: (unique1 < 100)

-> Bitmap Index Scan on tenk1_unique1 (cost=0.00..5.04 rows=101 width=0) (actual time=0.043..0.043 rows=100 loops=1)

Index Cond: (unique1 < 100)

Planning time: 0.079 ms

Execution time: 14.727 ms

ROLLBACK;

如此範例所示，當查詢是 INSERT，UPDATE 或 DELETE 指令時，套用資料表變更的實際工作由最上層的 INSERT，UPDATE 或 DELETE 計劃節點完成。此節點下的計劃節點執行定位舊的資料列和計算新資料的工作。所以上面，我們看到了我們已經看到的相同類型的 bitmap 資料表掃描，並且它的輸出被遞送到儲存更新資料列的 Update 節點。值得注意的是，儘管資料修改節點可能需要相當長的執行時間（此時，它佔用了大部分時間），但規劃程序目前並未在成本估算中加入任何內容來解釋該工作。這是因為要完成的工作對於每個正確的查詢計劃都是相同的，因此它不會影響計劃決策。

當 UPDATE 或 DELETE 指令影響繼承結構時，輸出可能如下所示：

EXPLAIN UPDATE parent SET f2 = f2 + 1 WHERE f1 = 101;

QUERY PLAN

-----------------------------------------------------------------------------------

Update on parent (cost=0.00..24.53 rows=4 width=14)

Update on parent

Update on child1

Update on child2

Update on child3

-> Seq Scan on parent (cost=0.00..0.00 rows=1 width=14)

Filter: (f1 = 101)

-> Index Scan using child1_f1_key on child1 (cost=0.15..8.17 rows=1 width=14)

Index Cond: (f1 = 101)

-> Index Scan using child2_f1_key on child2 (cost=0.15..8.17 rows=1 width=14)

Index Cond: (f1 = 101)

-> Index Scan using child3_f1_key on child3 (cost=0.15..8.17 rows=1 width=14)

Index Cond: (f1 = 101)

在此範例中，Update 節點需要考慮三個子資料表以及最初提到的父資料表。因此，有四個輸入掃描子計劃，每個資料表一個。為清楚起見，更新節點的註釋是為了顯示將要更新的特定目標資料表，其順序與相應的子計劃相同。（這些註釋是 PostgreSQL 9.5 的新註釋；在以前的版本中，讀取者必須透過檢查子計劃來看到目標資料表。）

EXPLAIN ANALYZE 顯示的 Planning 時間是從解析的查詢産生查詢計劃並對其進行最佳化所花費的時間。它不包括解析或重寫。

dizzy03

碎碎念

dizzy03 發表在痞客邦留言(0) 人氣()

E-mail轉寄

碎碎念

murmur

[pgSQL] EXPLAIN ANALYZE

歷史上的今天

留言列表

熱門文章

文章精選

文章搜尋

誰來我家

參觀人氣

POWERED BY

站方公告

我的連結