
How we fixed a memory leak in Solaris 10

Hi,

Here is an example of finding out who is causing a kernel memory leak in Solaris 10:

We can see that the memory usage is in the kernel:
> ::memstat
Page Summary Pages MB %Tot
------------ ---------------- ---------------- ----
Kernel 1160173 9063 75%
Anon 17864 139 1%
Exec and libs 1816 14 0%
Page cache 128638 1004 8%
Free (cachelist) 230961 1804 15%
Free (freelist) 2158 16 0%
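The MB column of ::memstat follows directly from the page counts, assuming an 8 KB page size (typical for sun4u-class hardware; check with `pagesize` on the actual host). A quick sanity check of the Kernel row above:

```python
# Convert ::memstat page counts to MB, assuming an 8 KB page size
# (an assumption about this host; page size varies by platform).
PAGE_SIZE = 8192

def pages_to_mb(pages):
    return pages * PAGE_SIZE // (1024 * 1024)

# Kernel row above: 1160173 pages -> 9063 MB
print(pages_to_mb(1160173))   # 9063
# Page cache row: 128638 pages -> 1004 MB
print(pages_to_mb(128638))    # 1004
```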

Looking at the history, we can see how the kernel's memory consumption keeps growing:

> cat mem.out | grep -i Kernel
Kernel 91897 717 6%
Kernel 106725 833 7%
Kernel 446842 3490 29%
Kernel 789369 6166 51%
Kernel 922117 7204 60%
Kernel 1109375 8666 72%
Kernel 1145430 8948 74%
Kernel 1177550 9199 76%

With the ::kmastat dcmd we can see the memory usage of the kernel's caches:
> ::kmastat
cache buf buf buf memory alloc alloc
name size in use total in use succeed fail
------------------------- ------ ------ ------ --------- --------- -----
kmem_magazine_1 16 76 203 8192 265 0
kmem_magazine_3 32 2060 6858 442368 24429 0
kmem_magazine_7 64 10354 41832 4079616 226370 0
kmem_magazine_15 128 2039 5250 1024000 61193 0
kmem_magazine_31 256 210 350 114688 6273 0
kmem_magazine_47 384 0 72 32768 18827 0
kmem_magazine_63 512 159 210 122880 5041 0
kmem_magazine_95 768 47 747 679936 5275 0
kmem_magazine_143 1152 28 30 40960 35 0
kmem_slab_cache 56 687756 687810 55787520 2198017 0
kmem_bufctl_cache 24 6 169 8192 6 0
kmem_bufctl_audit_cache 128 4665032 4665166 721076224 8552470 0
kmem_va_8192 8192 64511 64512 528482304 75782 0
kmem_va_16384 16384 9 16 262144 9 0
kmem_va_24576 24576 5 10 262144 6 0
kmem_va_32768 32768 0 0 0 0 0
kmem_va_40960 40960 0 0 0 0 0
kmem_va_49152 49152 0 0 0 0 0
kmem_va_57344 57344 0 0 0 0 0
kmem_va_65536 65536 0 0 0 0 0
kmem_alloc_8 8 47645 52992 1695744 3428208 0
kmem_alloc_16 16 32102 32232 1294336 7272576 0
kmem_alloc_24 24 8621 12410 598016 931616 0
kmem_alloc_32 32 20325 22046 1236992 2334067 0
kmem_alloc_40 40 24837 25984 1662976 1239818 0
kmem_alloc_48 48 39372 70964 5144576 389272 0
kmem_alloc_56 56 73515 84864 6815744 1432321 0
kmem_alloc_64 64 53458 53568 6856704 4798232 0
kmem_alloc_80 80 36757 45318 4759552 1106961 0
kmem_alloc_96 96 7510 9248 1114112 719615 0
kmem_alloc_112 112 598758 598800 81756160 643102 0
kmem_alloc_128 128 6192 6594 1286144 451576 0
kmem_alloc_160 160 57690 57772 10756096 221363 0
kmem_alloc_192 192 8441 8544 2187264 112003 0
kmem_alloc_224 224 7302 7359 1826816 39801 0
kmem_alloc_256 256 701 900 294912 2024996 0
kmem_alloc_320 320 429 609 237568 1633155 0
kmem_alloc_384 384 517 666 303104 683826 0
kmem_alloc_448 448 37724 37888 19398656 2167167 0
kmem_alloc_512 512 213 3752 2195456 143832 0
kmem_alloc_640 640 8283 8305 6184960 339701 0
kmem_alloc_768 768 146 162 147456 1005 0
kmem_alloc_896 896 21 72 73728 4687 0
kmem_alloc_1152 1152 797 832 1048576 2319848 0
kmem_alloc_1344 1344 30 44 65536 44963 0
kmem_alloc_1600 1600 143 162 294912 58641 0
kmem_alloc_2048 2048 97 121 270336 44814 0
kmem_alloc_2688 2688 112 136 417792 147313 0
kmem_alloc_4096 4096 95 95 778240 60506 0
kmem_alloc_8192 8192 670067 670067 1194221568 1139229 0
kmem_alloc_12288 12288 7 7 114688 180 0
kmem_alloc_16384 16384 39 39 638976 90995 0
kmem_alloc_24576 24576 5 5 122880 332 0
kmem_alloc_32768 32768 231 231 7569408 51998 0
kmem_alloc_40960 40960 14 14 573440 48 0
kmem_alloc_49152 49152 0 0 0 25 0
kmem_alloc_57344 57344 2 2 114688 30 0
kmem_alloc_65536 65536 9 9 589824 126370 0
kmem_alloc_73728 73728 2 2 147456 1982 0
kmem_alloc_81920 81920 1 1 81920 8 0
kmem_alloc_90112 90112 0 0 0 15 0
kmem_alloc_98304 98304 1 1 98304 3 0
kmem_alloc_106496 106496 1 1 106496 3 0
kmem_alloc_114688 114688 0 0 0 4 0
kmem_alloc_122880 122880 3 3 368640 5 0
kmem_alloc_131072 131072 49 49 6422528 86 0
streams_mblk 64 1377955 1378700 132874240 74864627 0
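With dozens of caches in the listing, it helps to sort by the "memory in use" column. A small parser sketch (assuming the whitespace-separated row layout shown above; the helper name is ours):

```python
# Rank kernel caches by bytes in use, parsing ::kmastat-style rows.
def top_caches(kmastat_lines, n=3):
    rows = []
    for line in kmastat_lines:
        parts = line.split()
        # Columns: name, buf size, buf in use, buf total,
        #          memory in use, alloc succeed, alloc fail
        if len(parts) == 7 and parts[1].isdigit():
            rows.append((parts[0], int(parts[4])))
    return sorted(rows, key=lambda r: r[1], reverse=True)[:n]

# A few rows copied from the ::kmastat output above.
sample = [
    "kmem_alloc_8192 8192 670067 670067 1194221568 1139229 0",
    "streams_mblk 64 1377955 1378700 132874240 74864627 0",
    "kmem_alloc_112 112 598758 598800 81756160 643102 0",
    "kmem_alloc_16 16 32102 32232 1294336 7272576 0",
]
print(top_caches(sample))
# [('kmem_alloc_8192', 1194221568), ('streams_mblk', 132874240),
#  ('kmem_alloc_112', 81756160)]
```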

Two caches show very high values, streams_mblk and kmem_alloc_8192, and looking at the history we can see that they grow steadily:

> cat mem.out | grep -i streams_mblk

streams_mblk 64 12914 13260 1277952 3582059 0
streams_mblk 64 39034 39355 3792896 9471794 0
streams_mblk 64 64922 66470 6406144 16645023 0
streams_mblk 64 90763 91375 8806400 21412926 0
streams_mblk 64 116630 117215 11296768 24648200 0
streams_mblk 64 505651 506515 48816128 62339448 0
streams_mblk 64 531731 532525 51322880 62648285 0
streams_mblk 64 816173 816935 78733312 66043097 0
streams_mblk 64 841749 842605 81207296 66350514 0
streams_mblk 64 867747 868445 83697664 66658109 0
streams_mblk 64 1204251 1204960 116129792 70856763 0
streams_mblk 64 1230299 1230970 118636544 71466589 0

> cat mem.out | grep -i kmem_alloc_8192

kmem_alloc_8192 8192 9154 9154 74989568 23705 0
kmem_alloc_8192 8192 22477 22477 184131584 61092 0
kmem_alloc_8192 8192 145702 145702 1193590784 412073 0
kmem_alloc_8192 8192 393626 393626 3224584192 761686 0
kmem_alloc_8192 8192 405977 405977 3325763584 777320 0
kmem_alloc_8192 8192 417683 417683 3421659136 792291 0
kmem_alloc_8192 8192 429820 429820 3521085440 807243 0
kmem_alloc_8192 8192 441753 441753 3618840576 822454 0
kmem_alloc_8192 8192 453330 453330 3713679360 837346 0
kmem_alloc_8192 8192 465812 465812 3815931904 852977 0
kmem_alloc_8192 8192 477691 477691 3913244672 879410 0
kmem_alloc_8192 8192 489594 489594 4010754048 894613 0
kmem_alloc_8192 8192 502870 502870 4119511040 924176 0
kmem_alloc_8192 8192 515319 515319 4221493248 939473 0
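The bytes-in-use column for kmem_alloc_8192 grows from about 75 MB to about 4 GB across the fourteen samples. If the samples were taken at a roughly constant interval (an assumption; the original sampling cadence is not stated), a crude leak rate falls out:

```python
# Bytes in use for kmem_alloc_8192, first and last of the 14 samples above.
first, last = 74989568, 4221493248
num_samples = 14

leaked = last - first
per_interval = leaked // (num_samples - 1)
print(leaked)        # 4146503680 bytes (~3.9 GB) over the window
print(per_interval)  # ~304 MB average growth per sampling interval
```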

But to find out which functions/stack traces are filling these buffers, we first have to enable three of the kmem debugging flags:

#define KMF_AUDIT 0x00000001 /* transaction auditing */
#define KMF_DEADBEEF 0x00000002 /* deadbeef checking */
#define KMF_REDZONE 0x00000004 /* redzone checking */
#define KMF_CONTENTS 0x00000008 /* freed-buffer content logging */
#define KMF_STICKY 0x00000010 /* if set, override /etc/system */
#define KMF_NOMAGAZINE 0x00000020 /* disable per-cpu magazines */
#define KMF_FIREWALL 0x00000040 /* put all bufs before unmapped pages */
#define KMF_LITE 0x00000100 /* lightweight debugging */

So we add the sum (bitwise OR) of the first three, in hex, to /etc/system:

set kmem_flags=0x7
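The value 0x7 is simply the OR of KMF_AUDIT, KMF_DEADBEEF and KMF_REDZONE from the header excerpt above:

```python
# KMF flag values quoted from sys/kmem_impl.h above.
KMF_AUDIT    = 0x00000001  # transaction auditing
KMF_DEADBEEF = 0x00000002  # deadbeef checking
KMF_REDZONE  = 0x00000004  # redzone checking

kmem_flags = KMF_AUDIT | KMF_DEADBEEF | KMF_REDZONE
print(hex(kmem_flags))  # 0x7
```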

After rebooting the server, we can query the two fast-growing caches:

> ::kmausers kmem_alloc_8192
623206400 bytes for 76075 allocations with data size 8192:
kmem_slab_alloc+0xac
kmem_cache_alloc+0x2dc
kmem_alloc+0x58
ce_allocb+0xc
ce_replace_page+0xa8
ce_intr+0xc60
pci_intr_wrapper+0xb8

> ::kmausers streams_mblk
408475522 bytes for 638243 allocations with data size 64:
kmem_cache_alloc+0x150
dblk_esb_constructor+0xc
kmem_cache_alloc_debug+0x27c
kmem_cache_alloc+0x150
gesballoc+0xc
ce_dupb+0x18
ce_intr+0xf14
pci_intr_wrapper+0xb8
39229312 bytes for 612958 allocations with data size 64:
kmem_cache_alloc+0x150
dblk_esb_constructor+0xc
kmem_cache_alloc_debug+0x27c
kmem_cache_alloc+0x150
gesballoc+0xc
ce_dupb+0x18
ce_intr+0xca4
pci_intr_wrapper+0xb8
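Both streams_mblk stacks, like the kmem_alloc_8192 one, go through the ce driver's interrupt path. Intersecting the frame names mechanically (a toy illustration, offsets stripped) makes the common culprit obvious:

```python
# Function names from the ::kmausers stack traces above.
alloc_8192_stack = {"kmem_slab_alloc", "kmem_cache_alloc", "kmem_alloc",
                    "ce_allocb", "ce_replace_page", "ce_intr",
                    "pci_intr_wrapper"}
mblk_stack = {"kmem_cache_alloc", "dblk_esb_constructor",
              "kmem_cache_alloc_debug", "gesballoc", "ce_dupb",
              "ce_intr", "pci_intr_wrapper"}

common = alloc_8192_stack & mblk_stack
print(sorted(common))  # ['ce_intr', 'kmem_cache_alloc', 'pci_intr_wrapper']
```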

In every trace, the driver for the CE Sun GigaSwift Ethernet card is involved.

Looking through the bugs fixed by the latest patches for the Sun GigaSwift Ethernet driver, we find:

Abstract: SUNBT6214285 leak of streams mblk with Sun trunking on network which has type 88

*** 01/05/05 09:59 am ***
----------- Migrated Synopsis from BugTraq ------------
SUNBT6214285 leak of streams mblk with Sun trunking on network which has type
886d ethernet packets broadcast
-------- End of Migrated Synopsis from BugTraq --------
*** 01/05/05 09:59 am ***
----------- Migrated Description from BugTraq ------------
The customer system runs out of memory after a few days when kernel has grown
to 7.6G leaving .13G for user processes to try to run in which does not allow
it to make forward progress, as all user threads try to find a page to swap
data in.

95% of memory is in kmem_va_8192, this includes 78% which is allocated to
kmem_alloc_8192 and 4% to streams_mblk, most of this memory was allocated for
rx ring buffers by the ce driver

The problem has been reproduced in the PTS lab by broadcasting type 0x886d
packets on the network

The leak is fixed by installing patch 118777-18 and its dependency 121181-06.
