As I wrote in this blog, we got issues that some of the products have been often down (actually not down but become just very heavy) and I've not resolved it. Symptoms are like them.
- Load average is very high.
- Mainly I/O wait. Uses swap.
- No slow query log was found.
Total investigation flow
I advanced the investigation with this flow.- Know more detail of current situation
- Detect the specific process to consume memory
- Detect the specific program to consume memory
Know more detail of current situation
I've already detected the bottleneck is I/O wait of CPU. And this is occurred by swap. But I didn't know what actual process or program occurred that. So I tried to get more detailed information.Install dstat
If you want to observe server resource in real time in detail, I recommend dstat command. This allows us to monitor only important information of CPU in real time. In order to install dstat command, just run this command.$ yum install dstat
Main usage of dstat command
We can specify statistic information that we want to output with options like this.- -Tclm: Related to CPU and memory
- -Tcldr: Related to disk I/O (byte and I/O requests)
- -Tclmdrn: All info including network I/O
Also if we specify --top-bio option, we can show most expensive block I/O process together.
For example, we can get this kind of output every 1 sec.
# dstat --top-bio -Tcldr ----most-expensive---- --epoch--- ----total-cpu-usage---- ---load-avg--- -dsk/total- --io/total- block i/o process | epoch |usr sys idl wai hiq siq| 1m 5m 15m | read writ| read writ mysqld 795B 140k|1482059825| 13 3 84 0 0 0|0.39 0.23 0.16| 24k 98k|3.54 4.68 php-fpm: po 0 4096B|1482059826| 65 35 0 0 0 0|0.39 0.23 0.16| 0 8192B| 0 2.00 |1482059827| 57 43 0 0 0 0|0.39 0.23 0.16| 0 136k| 0 32.0 nginx: work 0 4096B|1482059828| 66 34 0 0 0 0|0.39 0.23 0.16| 0 0 | 0 0 jbd2/xvda1- 0 20k|1482059829| 62 38 0 0 0 0|0.39 0.23 0.16| 0 56k| 0 4.00 |1482059830| 64 36 0 0 0 0|0.44 0.24 0.17| 0 0 | 0 0
Then I got this result with dstat command.
----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system-- ---load-avg--- usr sys idl wai hiq siq| read writ| recv send| in out | int csw | 1m 5m 15m 1 0 0 99 0 0|1704k 112k| 10k 112k|4096B 0 | 301 152 |3.92 4.19 4.88 1 0 0 99 0 0|1288k 80k|2715B 17k| 0 0 | 212 154 |3.92 4.19 4.88 0 0 0 100 0 0|1568k 144k| 774B 2167B| 24k 0 | 212 184 |3.92 4.19 4.88 0 1 0 99 0 0|1360k 96k| 981B 1016B| 0 0 | 186 129 |3.92 4.19 4.88 0 0 0 100 0 0|1064k 160k|3918B 130k| 0 0 | 250 182 |3.93 4.19 4.88 6 1 0 93 0 0|1208k 32k|4406B 117k| 0 0 | 226 198 |3.93 4.19 4.88 34 1 46 19 0 0| 304k 320k| 33k 92k| 0 0 | 391 685 |3.93 4.19 4.88 22 0 78 0 0 0| 0 8192B| 46k 73k| 0 0 | 338 486 |3.93 4.19 4.88 0 0 100 0 0 0| 0 0 | 800B 1569B| 0 0 | 83 98 |3.93 4.19 4.88 0 0 100 0 0 0| 0 0 |5000B 147k| 0 0 | 182 121 |3.62 4.12 4.85 0 0 100 0 0 0| 0 0 |9228B 213k| 0 0 | 258 127 |3.62 4.12 4.85
It's a little difficult to see on this page but I found I/O wait raised suddenly when disk read occurred. So as the next step, I tried to detect what process consumed a lot of memory.
Detect the specific program to consume memory
A little different from the flow above but I tried to detect what program consume a lot of memory first.From the fact that this application didn't have slow query log, I hypothesized that too many times DB access occurred. Then I checked the source code of pages first which have the most access. Then I found some these kinds of code.
$views = App\RecipeView::where('Month', '=', Carbon\Carbon::now()->month)->orderBy('Count', 'desc')->paginate(6); foreach ($views as $view) { $recipe = App\Recipe::where('id', '=', $view->RecipeId)->first();This kind of code occurs DB access every time of foreach loop but actually, enough output can be retrieved by one query. Then we modify these codes.
Detect the specific process to consume memory
Then I tried to detect which process consumed a lot of memory. It was easy. We can get this info with top command + M. Then I found mysqld and php-fpm process consumed the memory the most. But mysqld consumed only about 40MB. So I focused on php-fpm process. We can calculate the total memory that a block of process consumes.# ps aux | grep php-fpm-5.6 | awk '{sum += $6}END{print sum}' => About 1.5GBWe're using the server with 2GB memory so it was too big.
Restrict the max number of processes of PHP-FPM
In order to decrease the total volume of memory PHP-FPM consume, I set the config like this.; pm = dynamic pm = static ; pm.max_children = 50 pm.max_children = 20 ; pm.start_servers = 5 pm.start_servers = 4 ; pm.min_spare_servers = 5 pm.min_spare_servers = 2 pm.max_spare_servers = 10 pm.max_requests = 500max_children is the directive to specify the maximum number of processes. But with this config, I got this error when I restarted PHP-FPM.
Starting php-fpm-5.6: [17-Dec-2016 02:18:58] ALERT: [pool www] pm.min_spare_servers(2) and pm.max_spare_servers(35) cannot be greater than pm.max_children(15) [17-Dec-2016 02:18:58] ERROR: failed to post process the configuration [17-Dec-2016 02:18:58] ERROR: FPM initialization failedSo finally I set like this.
pm = dynamic ; pm = static ; pm.max_children = 50 pm.max_children = 20 ; pm.start_servers = 5 pm.start_servers = 4 pm.max_spare_servers = 10 ; pm.min_spare_servers = 5 pm.min_spare_servers = 2 pm.max_requests = 500
0 Comments:
Post a Comment