
HBase Notes (17): Monitoring Hadoop and HBase Clusters with Ganglia

Published: 2016-04-23 18:57:51

1. Install ganglia-webfrontend and ganglia-monitor on the master node

    sudo apt-get install ganglia-webfrontend ganglia-monitor

Install both ganglia-webfrontend and ganglia-monitor on the master node. On every other monitored node, only ganglia-monitor is needed.
Link the Ganglia web files into Apache's default document root:

    sudo ln -s /usr/share/ganglia-webfrontend /var/www/ganglia

2. Install ganglia-monitor on the other nodes
On every other monitored node, only ganglia-monitor is needed:

    sudo apt-get install ganglia-monitor
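Installing the agent node by node gets tedious on larger clusters. A minimal sketch of scripting it from the master over ssh; the hostnames slave1/slave2 are hypothetical, and passwordless sudo on the slaves is assumed:

```python
import subprocess

SLAVES = ["slave1", "slave2"]  # hypothetical hostnames; replace with your own

def install_cmd(host):
    # ssh command that installs the gmond agent (ganglia-monitor) on one node
    return ["ssh", host, "sudo apt-get -y install ganglia-monitor"]

def install_all(slaves=SLAVES, runner=subprocess.run):
    # runner is injectable so the loop can be exercised without a live cluster
    for host in slaves:
        runner(install_cmd(host), check=True)
```

Tools like pssh or Ansible do the same job; this is only to show the shape of the loop.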

3. Ganglia configuration
gmond.conf
/etc/ganglia/gmond.conf must be configured on every node, with the same contents on each:

    sudo vim /etc/ganglia/gmond.conf

The modified /etc/ganglia/gmond.conf:

    globals {
      daemonize = yes                 # run as a background daemon
      setuid = yes
      user = ganglia                  # user Ganglia runs as
      debug_level = 0
      max_udp_msg_len = 1472
      mute = no
      deaf = no
      host_dmax = 0 /*secs */
      cleanup_threshold = 300 /*secs */
      gexec = no
      send_metadata_interval = 10     # interval between metadata sends
    }

    /* If a cluster attribute is specified, then all gmond hosts are wrapped inside
    * of a <CLUSTER> tag.  If you do not specify a cluster tag, then all <HOSTS> will
    * NOT be wrapped inside of a <CLUSTER> tag. */
    cluster {
      name = "hadoop-cluster"         # cluster name
      owner = "ganglia"               # user Ganglia runs as
      latlong = "unspecified"
      url = "unspecified"
    }

    /* The host section describes attributes of the host, like the location */
    host {
      location = "unspecified"
    }

    /* Feel free to specify as many udp_send_channels as you like.  Gmond
       used to only support having a single channel */
    udp_send_channel {
      #mcast_join = 239.2.11.71     # multicast commented out; use unicast
      host = master                 # send to the host running gmetad
      port = 8649                   # listening port
      ttl = 1
    }

    /* You can specify as many udp_recv_channels as you like as well. */
    udp_recv_channel {
      #mcast_join = 239.2.11.71     # multicast commented out
      port = 8649
      #bind = 239.2.11.71
    }

    /* You can specify as many tcp_accept_channels as you like to share
       an xml description of the state of the cluster */
    tcp_accept_channel {
      port = 8649
    }


gmetad.conf
On the master node, /etc/ganglia/gmetad.conf must also be configured; the name hadoop-cluster in it must match the name in gmond.conf above.

    sudo vim /etc/ganglia/gmetad.conf

Change it to the following:

    data_source "hadoop-cluster" 10 master:8649 slave:8649
    setuid_username "nobody"
    rrd_rootdir "/var/lib/ganglia/rrds"
    gridname "hadoop-cluster"

Note: master:8649 and slave:8649 are the hosts and ports to poll; the hadoop-cluster in data_source must match name in gmond.conf.
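The data_source line follows the syntax `data_source "name" [polling interval] host[:port] ...`, with the interval defaulting to 15 seconds and the port to 8649. A small parser sketch (not part of Ganglia, purely illustrative) that makes those pieces explicit:

```python
import shlex

def parse_data_source(line):
    """Split a gmetad data_source line into (name, interval, [(host, port), ...])."""
    tokens = shlex.split(line)           # shlex handles the quoted cluster name
    assert tokens[0] == "data_source"
    name = tokens[1]
    rest = tokens[2:]
    interval = 15                        # gmetad's default polling interval
    if rest and rest[0].isdigit():       # the interval is optional
        interval = int(rest.pop(0))
    hosts = []
    for spec in rest:
        host, _, port = spec.partition(":")
        hosts.append((host, int(port) if port else 8649))  # 8649 = default gmond port
    return name, interval, hosts
```

For the line above this returns ("hadoop-cluster", 10, [("master", 8649), ("slave", 8649)]).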


4. Hadoop configuration
On every node running Hadoop, configure hadoop-metrics2.properties as follows:
    #   Licensed to the Apache Software Foundation (ASF) under one or more
    #   contributor license agreements.  See the NOTICE file distributed with
    #   this work for additional information regarding copyright ownership.
    #   The ASF licenses this file to You under the Apache License, Version 2.0
    #   (the "License"); you may not use this file except in compliance with
    #   the License.  You may obtain a copy of the License at
    #
    #       http://www.apache.org/licenses/LICENSE-2.0
    #
    #   Unless required by applicable law or agreed to in writing, software
    #   distributed under the License is distributed on an "AS IS" BASIS,
    #   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    #   See the License for the specific language governing permissions and
    #   limitations under the License.
    #

    # syntax: [prefix].[source|sink].[instance].[options]
    # See javadoc of package-info.java for org.apache.hadoop.metrics2 for details

    # comment out the previous default configuration

    #*.sink.file.class=org.apache.hadoop.metrics2.sink.FileSink
    # default sampling period, in seconds
    #*.period=10

    # The namenode-metrics.out will contain metrics from all context
    #namenode.sink.file.filename=namenode-metrics.out
    # Specifying a special sampling period for namenode:
    #namenode.sink.*.period=8

    #datanode.sink.file.filename=datanode-metrics.out

    # the following example split metrics of different
    # context to different sinks (in this case files)
    #jobtracker.sink.file_jvm.context=jvm
    #jobtracker.sink.file_jvm.filename=jobtracker-jvm-metrics.out
    #jobtracker.sink.file_mapred.context=mapred
    #jobtracker.sink.file_mapred.filename=jobtracker-mapred-metrics.out

    #tasktracker.sink.file.filename=tasktracker-metrics.out

    #maptask.sink.file.filename=maptask-metrics.out

    #reducetask.sink.file.filename=reducetask-metrics.out

    *.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
    *.sink.ganglia.period=10

    *.sink.ganglia.slope=jvm.metrics.gcCount=zero,jvm.metrics.memHeapUsedM=both
    *.sink.ganglia.dmax=jvm.metrics.threadsBlocked=70,jvm.metrics.memHeapUsedM=40

    namenode.sink.ganglia.servers=master:8649
    resourcemanager.sink.ganglia.servers=master:8649

    datanode.sink.ganglia.servers=master:8649
    nodemanager.sink.ganglia.servers=master:8649

    maptask.sink.ganglia.servers=master:8649
    reducetask.sink.ganglia.servers=master:8649
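The active keys all follow the `[prefix].[source|sink].[instance].[options]` syntax from the file header: for example, `namenode.sink.ganglia.servers` configures the `servers` option of the `ganglia` sink instance for the namenode daemon. A tiny sketch (illustrative only) of how a fully-qualified key decomposes:

```python
def split_metrics2_key(key):
    """Decompose a fully-qualified metrics2 key into its four syntax parts.

    Abbreviated keys such as '*.period' use fewer parts and are not handled here.
    """
    prefix, kind, instance, option = key.split(".", 3)
    return {"prefix": prefix, "kind": kind, "instance": instance, "option": option}
```

The `*` prefix applies a setting to every daemon, which is why `*.sink.ganglia.class` only needs to be declared once.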


5. HBase configuration
On every HBase node, configure hadoop-metrics2-hbase.properties as follows:
    # syntax: [prefix].[source|sink].[instance].[options]
    # See javadoc of package-info.java for org.apache.hadoop.metrics2 for details

    #*.sink.file*.class=org.apache.hadoop.metrics2.sink.FileSink
    # default sampling period
    #*.period=10

    # Below are some examples of sinks that could be used
    # to monitor different hbase daemons.

    # hbase.sink.file-all.class=org.apache.hadoop.metrics2.sink.FileSink
    # hbase.sink.file-all.filename=all.metrics

    # hbase.sink.file0.class=org.apache.hadoop.metrics2.sink.FileSink
    # hbase.sink.file0.context=hmaster
    # hbase.sink.file0.filename=master.metrics

    # hbase.sink.file1.class=org.apache.hadoop.metrics2.sink.FileSink
    # hbase.sink.file1.context=thrift-one
    # hbase.sink.file1.filename=thrift-one.metrics

    # hbase.sink.file2.class=org.apache.hadoop.metrics2.sink.FileSink
    # hbase.sink.file2.context=thrift-two
    # hbase.sink.file2.filename=thrift-two.metrics

    # hbase.sink.file3.class=org.apache.hadoop.metrics2.sink.FileSink
    # hbase.sink.file3.context=rest
    # hbase.sink.file3.filename=rest.metrics


    *.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
    *.sink.ganglia.period=10

    hbase.sink.ganglia.period=10
    hbase.sink.ganglia.servers=master:8649


6. Start the Hadoop and HBase clusters

    start-dfs.sh
    start-yarn.sh
    start-hbase.sh
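After these scripts finish, jps on the master should list the HDFS, YARN, and HBase master daemons. A sketch that flags missing daemons from captured jps output; the expected set below is an assumption for a typical master node and should be adjusted to your layout:

```python
EXPECTED = {"NameNode", "ResourceManager", "HMaster"}  # assumed master-node daemons

def missing_daemons(jps_output, expected=EXPECTED):
    """Return the expected daemons absent from `jps` output ('pid Name' lines)."""
    running = {parts[1] for line in jps_output.splitlines()
               if len(parts := line.split()) == 2}
    return expected - running
```

Feed it the output of `subprocess.run(["jps"], capture_output=True, text=True).stdout`; an empty set means everything expected is up.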


7. Start Ganglia
Restart Hadoop and HBase first. Then start the gmond service on every node; the master node additionally needs the gmetad service.
Since Ganglia was installed via apt-get, both can be started with service:

    sudo service ganglia-monitor start    # on every node
    sudo service gmetad start             # on the node with ganglia-webfrontend


8. Verify
Open http://master/ganglia in a browser. If "Hosts up" equals the number of nodes in your cluster (9 in this example), the installation succeeded.
If not, a few commands are useful for debugging:
Start gmetad in debug mode: gmetad -d 9
Inspect the XML that gmond serves: telnet master 8649 (gmetad's own aggregated XML is served on port 8651)
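The "Hosts up" check can also be done against the XML returned by telnet master 8649: each reporting node appears as a HOST element. A sketch that counts them; the sample document here is illustrative, not real gmond output:

```python
import xml.etree.ElementTree as ET

def hosts_up(xml_text):
    """Count the HOST elements in a gmond/gmetad XML dump."""
    return len(ET.fromstring(xml_text).findall(".//HOST"))

SAMPLE = """<GANGLIA_XML VERSION="3.1.7" SOURCE="gmond">
  <CLUSTER NAME="hadoop-cluster" OWNER="ganglia" LATLONG="unspecified" URL="unspecified">
    <HOST NAME="master" IP="10.0.0.1"/>
    <HOST NAME="slave" IP="10.0.0.2"/>
  </CLUSTER>
</GANGLIA_XML>"""
```

If the count is lower than expected, the missing nodes' gmond is down or their udp_send_channel is pointing at the wrong host.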


9. Screenshots
(Screenshots of the Ganglia web frontend are omitted here.)

master node gmetad.conf (full file)
    # This is an example of a Ganglia Meta Daemon configuration file
    #                http://ganglia.sourceforge.net/
    #
    #
    #-------------------------------------------------------------------------------
    # Setting the debug_level to 1 will keep daemon in the foreground and
    # show only error messages. Setting this value higher than 1 will make
    # gmetad output debugging information and stay in the foreground.
    # default: 0
    # debug_level 10
    #
    #-------------------------------------------------------------------------------
    # What to monitor. The most important section of this file.
    #
    # The data_source tag specifies either a cluster or a grid to
    # monitor. If we detect the source is a cluster, we will maintain a complete
    # set of RRD databases for it, which can be used to create historical
    # graphs of the metrics. If the source is a grid (it comes from another gmetad),
    # we will only maintain summary RRDs for it.
    #
    # Format:
    # data_source "my cluster" [polling interval] address1:port addreses2:port ...
    # The keyword 'data_source' must immediately be followed by a unique
    # string which identifies the source, then an optional polling interval in
    # seconds. The source will be polled at this interval on average.
    # If the polling interval is omitted, 15sec is assumed.
    #
    # If you choose to set the polling interval to something other than the default,
    # note that the web frontend determines a host as down if its TN value is less
    # than 4 * TMAX (20sec by default).  Therefore, if you set the polling interval
    # to something around or greater than 80sec, this will cause the frontend to
    # incorrectly display hosts as down even though they are not.
    #
    # A list of machines which service the data source follows, in the
    # format ip:port, or name:port. If a port is not specified then 8649
    # (the default gmond port) is assumed.
    # default: There is no default value
    #
    # data_source "my cluster" 10 localhost  my.machine.edu:8649  1.2.3.5:8655
    # data_source "my grid" 50 1.3.4.7:8655 grid.org:8651 grid-backup.org:8651
    # data_source "another source" 1.3.4.7:8655  1.3.4.8

    data_source "hadoop-cluster" 10 master:8649 slave:8649
    setuid_username "nobody"
    rrd_rootdir "/var/lib/ganglia/rrds"
    gridname "hadoop-cluster"

    #
    # Round-Robin Archives
    # You can specify custom Round-Robin archives here (defaults are listed below)
    #
    # Old Default RRA: Keep 1 hour of metrics at 15 second resolution. 1 day at 6 minute
    # RRAs "RRA:AVERAGE:0.5:1:244" "RRA:AVERAGE:0.5:24:244" "RRA:AVERAGE:0.5:168:244" "RRA:AVERAGE:0.5:672:244" \
    #      "RRA:AVERAGE:0.5:5760:374"
    # New Default RRA
    # Keep 5856 data points at 15 second resolution assuming 15 second (default) polling. That's 1 day
    # Two weeks of data points at 1 minute resolution (average)
    #RRAs "RRA:AVERAGE:0.5:1:5856" "RRA:AVERAGE:0.5:4:20160" "RRA:AVERAGE:0.5:40:52704"

    #
    #-------------------------------------------------------------------------------
    # Scalability mode. If on, we summarize over downstream grids, and respect
    # authority tags. If off, we take on 2.5.0-era behavior: we do not wrap our output
    # in <GRID></GRID> tags, we ignore all <GRID> tags we see, and always assume
    # we are the "authority" on data source feeds. This approach does not scale to
    # large groups of clusters, but is provided for backwards compatibility.
    # default: on
    # scalable off
    #
    #-------------------------------------------------------------------------------
    # The name of this Grid. All the data sources above will be wrapped in a GRID
    # tag with this name.
    # default: unspecified
    # gridname "MyGrid"
    #
    #-------------------------------------------------------------------------------
    # The authority URL for this grid. Used by other gmetads to locate graphs
    # for our data sources. Generally points to a ganglia/
    # website on this machine.
    # default: "http://hostname/ganglia/",
    #   where hostname is the name of this machine, as defined by gethostname().
    # authority "http://mycluster.org/newprefix/"
    #
    #-------------------------------------------------------------------------------
    # List of machines this gmetad will share XML with. Localhost
    # is always trusted.
    # default: There is no default value
    # trusted_hosts 127.0.0.1 169.229.50.165 my.gmetad.org
    #
    #-------------------------------------------------------------------------------
    # If you want any host which connects to the gmetad XML to receive
    # data, then set this value to "on"
    # default: off
    # all_trusted on
    #
    #-------------------------------------------------------------------------------
    # If you don't want gmetad to setuid then set this to off
    # default: on
    # setuid off
    #
    #-------------------------------------------------------------------------------
    # User gmetad will setuid to (defaults to "nobody")
    # default: "nobody"
    # setuid_username "nobody"
    #
    #-------------------------------------------------------------------------------
    # Umask to apply to created rrd files and grid directory structure
    # default: 0 (files are public)
    # umask 022
    #
    #-------------------------------------------------------------------------------
    # The port gmetad will answer requests for XML
    # default: 8651
    # xml_port 8651
    #
    #-------------------------------------------------------------------------------
    # The port gmetad will answer queries for XML. This facility allows
    # simple subtree and summation views of the XML tree.
    # default: 8652
    # interactive_port 8652
    #
    #-------------------------------------------------------------------------------
    # The number of threads answering XML requests
    # default: 4
    # server_threads 10
    #
    #-------------------------------------------------------------------------------
    # Where gmetad stores its round-robin databases
    # default: "/var/lib/ganglia/rrds"
    # rrd_rootdir "/some/other/place"
    #
    #-------------------------------------------------------------------------------
    # In earlier versions of gmetad, hostnames were handled in a case
    # sensitive manner
    # If your hostname directories have been renamed to lower case,
    # set this option to 0 to disable backward compatibility.
    # From version 3.2, backwards compatibility will be disabled by default.
    # default: 1   (for gmetad < 3.2)
    # default: 0   (for gmetad >= 3.2)
    case_sensitive_hostnames 0

    #-------------------------------------------------------------------------------
    # It is now possible to export all the metrics collected by gmetad directly to
    # graphite by setting the following attributes.
    #
    # The hostname or IP address of the Graphite server
    # default: unspecified
    # carbon_server "my.graphite.box"
    #
    # The port on which Graphite is listening
    # default: 2003
    # carbon_port 2003
    #
    # A prefix to prepend to the metric names exported by gmetad. Graphite uses dot-
    # separated paths to organize and refer to metrics.
    # default: unspecified
    # graphite_prefix "datacenter1.gmetad"
    #
    # Number of milliseconds gmetad will wait for a response from the graphite server
    # default: 500
    # carbon_timeout 500
    #

master node gmond.conf (full file; closing braces restored where the original paste dropped them)
    /* This configuration is as close to 2.5.x default behavior as possible
       The values closely match ./gmond/metric.h definitions in 2.5.x */
    globals {
      daemonize = yes
      setuid = yes
      user = ganglia
      debug_level = 0
      max_udp_msg_len = 1472
      mute = no
      deaf = no
      host_dmax = 0 /*secs */
      cleanup_threshold = 300 /*secs */
      gexec = no
      send_metadata_interval = 10
    }

    /* If a cluster attribute is specified, then all gmond hosts are wrapped inside
     * of a <CLUSTER> tag.  If you do not specify a cluster tag, then all <HOSTS> will
     * NOT be wrapped inside of a <CLUSTER> tag. */
    cluster {
      name = "hadoop-cluster"
      owner = "ganglia"
      latlong = "unspecified"
      url = "unspecified"
    }

    /* The host section describes attributes of the host, like the location */
    host {
      location = "unspecified"
    }

    /* Feel free to specify as many udp_send_channels as you like.  Gmond
       used to only support having a single channel */
    udp_send_channel {
      #mcast_join = 239.2.11.71
      host = master
      port = 8649
      ttl = 1
    }

    /* You can specify as many udp_recv_channels as you like as well. */
    udp_recv_channel {
      #mcast_join = 239.2.11.71
      port = 8649
      #bind = 239.2.11.71
    }

    /* You can specify as many tcp_accept_channels as you like to share
       an xml description of the state of the cluster */
    tcp_accept_channel {
      port = 8649
    }

    /* Each metrics module that is referenced by gmond must be specified and
       loaded. If the module has been statically linked with gmond, it does not
       require a load path. However all dynamically loadable modules must include
       a load path. */
    modules {
      module {
        name = "core_metrics"
      }
      module {
        name = "cpu_module"
        path = "/usr/lib/ganglia/modcpu.so"
      }
      module {
        name = "disk_module"
        path = "/usr/lib/ganglia/moddisk.so"
      }
      module {
        name = "load_module"
        path = "/usr/lib/ganglia/modload.so"
      }
      module {
        name = "mem_module"
        path = "/usr/lib/ganglia/modmem.so"
      }
      module {
        name = "net_module"
        path = "/usr/lib/ganglia/modnet.so"
      }
      module {
        name = "proc_module"
        path = "/usr/lib/ganglia/modproc.so"
      }
      module {
        name = "sys_module"
        path = "/usr/lib/ganglia/modsys.so"
      }
    }

    include ('/etc/ganglia/conf.d/*.conf')


    /* The old internal 2.5.x metric array has been replaced by the following
       collection_group directives.  What follows is the default behavior for
       collecting and sending metrics that is as close to 2.5.x behavior as
       possible. */

    /* This collection group will cause a heartbeat (or beacon) to be sent every
       20 seconds.  In the heartbeat is the GMOND_STARTED data which expresses
       the age of the running gmond. */
    collection_group {
      collect_once = yes
      time_threshold = 20
      metric {
        name = "heartbeat"
      }
    }

    /* This collection group will send general info about this host every 1200 secs.
       This information doesn't change between reboots and is only collected once. */
    collection_group {
      collect_once = yes
      time_threshold = 1200
      metric {
        name = "cpu_num"
        title = "CPU Count"
      }
      metric {
        name = "cpu_speed"
        title = "CPU Speed"
      }
      metric {
        name = "mem_total"
        title = "Memory Total"
      }
      /* Should this be here? Swap can be added/removed between reboots. */
      metric {
        name = "swap_total"
        title = "Swap Space Total"
      }
      metric {
        name = "boottime"
        title = "Last Boot Time"
      }
      metric {
        name = "machine_type"
        title = "Machine Type"
      }
      metric {
        name = "os_name"
        title = "Operating System"
      }
      metric {
        name = "os_release"
        title = "Operating System Release"
      }
      metric {
        name = "location"
        title = "Location"
      }
    }

    /* This collection group will send the status of gexecd for this host every 300 secs */
    /* Unlike 2.5.x the default behavior is to report gexecd OFF.  */
    collection_group {
      collect_once = yes
      time_threshold = 300
      metric {
        name = "gexec"
        title = "Gexec Status"
      }
    }

    /* This collection group will collect the CPU status info every 20 secs.
       The time threshold is set to 90 seconds.  In honesty, this time_threshold could be
       set significantly higher to reduce unnecessary network chatter. */
    collection_group {
      collect_every = 20
      time_threshold = 90
      /* CPU status */
      metric {
        name = "cpu_user"
        value_threshold = "1.0"
        title = "CPU User"
      }
      metric {
        name = "cpu_system"
        value_threshold = "1.0"
        title = "CPU System"
      }
      metric {
        name = "cpu_idle"
        value_threshold = "5.0"
        title = "CPU Idle"
      }
      metric {
        name = "cpu_nice"
        value_threshold = "1.0"
        title = "CPU Nice"
      }
      metric {
        name = "cpu_aidle"
        value_threshold = "5.0"
        title = "CPU aidle"
      }
      metric {
        name = "cpu_wio"
        value_threshold = "1.0"
        title = "CPU wio"
      }
      /* The next two metrics are optional if you want more detail...
         ... since they are accounted for in cpu_system.
      metric {
        name = "cpu_intr"
        value_threshold = "1.0"
        title = "CPU intr"
      }
      metric {
        name = "cpu_sintr"
        value_threshold = "1.0"
        title = "CPU sintr"
      }
      */
    }

    collection_group {
      collect_every = 20
      time_threshold = 90
      /* Load Averages */
      metric {
        name = "load_one"
        value_threshold = "1.0"
        title = "One Minute Load Average"
      }
      metric {
        name = "load_five"
        value_threshold = "1.0"
        title = "Five Minute Load Average"
      }
      metric {
        name = "load_fifteen"
        value_threshold = "1.0"
        title = "Fifteen Minute Load Average"
      }
    }

    /* This group collects the number of running and total processes */
    collection_group {
      collect_every = 80
      time_threshold = 950
      metric {
        name = "proc_run"
        value_threshold = "1.0"
        title = "Total Running Processes"
      }
      metric {
        name = "proc_total"
        value_threshold = "1.0"
        title = "Total Processes"
      }
    }

    /* This collection group grabs the volatile memory metrics every 40 secs and
       sends them at least every 180 secs.  This time_threshold can be increased
       significantly to reduce unneeded network traffic. */
    collection_group {
      collect_every = 40
      time_threshold = 180
      metric {
        name = "mem_free"
        value_threshold = "1024.0"
        title = "Free Memory"
      }
      metric {
        name = "mem_shared"
        value_threshold = "1024.0"
        title = "Shared Memory"
      }
      metric {
        name = "mem_buffers"
        value_threshold = "1024.0"
        title = "Memory Buffers"
      }
      metric {
        name = "mem_cached"
        value_threshold = "1024.0"
        title = "Cached Memory"
      }
      metric {
        name = "swap_free"
        value_threshold = "1024.0"
        title = "Free Swap Space"
      }
    }

    collection_group {
      collect_every = 40
      time_threshold = 300
      metric {
        name = "bytes_out"
        value_threshold = 4096
        title = "Bytes Sent"
      }
      metric {
        name = "bytes_in"
        value_threshold = 4096
        title = "Bytes Received"
      }
      metric {
        name = "pkts_in"
        value_threshold = 256
        title = "Packets Received"
      }
      metric {
        name = "pkts_out"
        value_threshold = 256
        title = "Packets Sent"
      }
    }

    /* Different than 2.5.x default since the old config made no sense */
    collection_group {
      collect_every = 1800
      time_threshold = 3600
      metric {
        name = "disk_total"
        value_threshold = 1.0
        title = "Total Disk Space"
      }
    }

    collection_group {
      collect_every = 40
      time_threshold = 180
      metric {
        name = "disk_free"
        value_threshold = 1.0
        title = "Disk Space Available"
      }
      metric {
        name = "part_max_used"
        value_threshold = 1.0
        title = "Maximum Disk Space Used"
      }
    }
  325. master-hadoop-metrics2-hbase.properties.md Raw
  326. master节点hadoop-metrics2-hbase.properties配置
  327.  
  328. # syntax: [prefix].[source|sink].[instance].[options]
  329. # See javadoc of package-info.java for org.apache.hadoop.metrics2 for details
  330.  
  331. #*.sink.file*.class=org.apache.hadoop.metrics2.sink.FileSink
  332. # default sampling period
  333. #*.period=10
  334.  
  335. # Below are some examples of sinks that could be used
  336. # to monitor different hbase daemons.
  337.  
  338. # hbase.sink.file-all.class=org.apache.hadoop.metrics2.sink.FileSink
  339. # hbase.sink.file-all.filename=all.metrics
  340.  
  341. # hbase.sink.file0.class=org.apache.hadoop.metrics2.sink.FileSink
  342. # hbase.sink.file0.context=hmaster
  343. # hbase.sink.file0.filename=master.metrics
  344.  
  345. # hbase.sink.file1.class=org.apache.hadoop.metrics2.sink.FileSink
  346. # hbase.sink.file1.context=thrift-one
  347. # hbase.sink.file1.filename=thrift-one.metrics
  348.  
  349. # hbase.sink.file2.class=org.apache.hadoop.metrics2.sink.FileSink
  350. # hbase.sink.file2.context=thrift-two
  351. # hbase.sink.file2.filename=thrift-one.metrics
  352.  
  353. # hbase.sink.file3.class=org.apache.hadoop.metrics2.sink.FileSink
  354. # hbase.sink.file3.context=rest
  355. # hbase.sink.file3.filename=rest.metrics
  356.  
  357.  
  358. *.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31  
  359. *.sink.ganglia.period=10  
  360.  
  361. hbase.sink.ganglia.period=10  
  362. hbase.sink.ganglia.servers=master:8649
  363. master-hadoop-metrics2.properties.md Raw
  364. master节点hadoop-metrics2.properties配置
  365.  
  366. #
  367. #   Licensed to the Apache Software Foundation (ASF) under one or more
  368. #   contributor license agreements.  See the NOTICE file distributed with
  369. #   this work for additional information regarding copyright ownership.
  370. #   The ASF licenses this file to You under the Apache License, Version 2.0
  371. #   (the "License"); you may not use this file except in compliance with
  372. #   the License.  You may obtain a copy of the License at
  373. #
  374. #       http://www.apache.org/licenses/LICENSE-2.0
  375. #
  376. #   Unless required by applicable law or agreed to in writing, software
  377. #   distributed under the License is distributed on an "AS IS" BASIS,
  378. #   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  379. #   See the License for the specific language governing permissions and
  380. #   limitations under the License.
  381. #
  382.  
  383. # syntax: [prefix].[source|sink].[instance].[options]
  384. # See javadoc of package-info.java for org.apache.hadoop.metrics2 for details
  385.  
  386. #*.sink.file.class=org.apache.hadoop.metrics2.sink.FileSink
  387. # default sampling period, in seconds
  388. #*.period=10
  389.  
  390. # The namenode-metrics.out will contain metrics from all context
  391. #namenode.sink.file.filename=namenode-metrics.out
  392. # Specifying a special sampling period for namenode:
  393. #namenode.sink.*.period=8
  394.  
  395. #datanode.sink.file.filename=datanode-metrics.out
  396.  
  397. # the following example split metrics of different
  398. # context to different sinks (in this case files)
  399. #jobtracker.sink.file_jvm.context=jvm
  400. #jobtracker.sink.file_jvm.filename=jobtracker-jvm-metrics.out
  401. #jobtracker.sink.file_mapred.context=mapred
  402. #jobtracker.sink.file_mapred.filename=jobtracker-mapred-metrics.out
  403.  
  404. #tasktracker.sink.file.filename=tasktracker-metrics.out
  405.  
  406. #maptask.sink.file.filename=maptask-metrics.out
  407.  
  408. #reducetask.sink.file.filename=reducetask-metrics.out
  409.  
  410. *.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31  
  411. *.sink.ganglia.period=10
  412.  
  413. *.sink.ganglia.slope=jvm.metrics.gcCount=zero,jvm.metrics.memHeapUsedM=both  
  414. *.sink.ganglia.dmax=jvm.metrics.threadsBlocked=70,jvm.metrics.memHeapUsedM=40  
  415.  
  416. namenode.sink.ganglia.servers=master:8649  
  417. resourcemanager.sink.ganglia.servers=master:8649  
  418.  
  419. datanode.sink.ganglia.servers=master:8649    
  420. nodemanager.sink.ganglia.servers=master:8649    
  421.  
  422.  
  423. maptask.sink.ganglia.servers=master:8649    
  424. reducetask.sink.ganglia.servers=master:8649
  425. slave-gmond.conf.md Raw
  426. slave节点gmond.conf配置
  427.  
/* This configuration is as close to 2.5.x default behavior as possible
   The values closely match ./gmond/metric.h definitions in 2.5.x */
globals {
  daemonize = yes
  setuid = yes
  user = ganglia
  debug_level = 0
  max_udp_msg_len = 1472
  mute = no
  deaf = no
  host_dmax = 0 /*secs */
  cleanup_threshold = 300 /*secs */
  gexec = no
  send_metadata_interval = 10
}

/* If a cluster attribute is specified, then all gmond hosts are wrapped inside
 * of a <CLUSTER> tag.  If you do not specify a cluster tag, then all <HOSTS> will
 * NOT be wrapped inside of a <CLUSTER> tag. */
cluster {
  name = "hadoop-cluster"
  owner = "ganglia"
  latlong = "unspecified"
  url = "unspecified"
}

/* The host section describes attributes of the host, like the location */
host {
  location = "unspecified"
}

/* Feel free to specify as many udp_send_channels as you like.  Gmond
   used to only support having a single channel */
udp_send_channel {
  #mcast_join = 239.2.11.71
  host = master
  port = 8649
  ttl = 1
}

/* You can specify as many udp_recv_channels as you like as well. */
udp_recv_channel {
  #mcast_join = 239.2.11.71
  port = 8649
  #bind = 239.2.11.71
}

/* You can specify as many tcp_accept_channels as you like to share
   an xml description of the state of the cluster */
tcp_accept_channel {
  port = 8649
}

/* Each metrics module that is referenced by gmond must be specified and
   loaded. If the module has been statically linked with gmond, it does not
   require a load path. However all dynamically loadable modules must include
   a load path. */
modules {
  module {
    name = "core_metrics"
  }
  module {
    name = "cpu_module"
    path = "/usr/lib/ganglia/modcpu.so"
  }
  module {
    name = "disk_module"
    path = "/usr/lib/ganglia/moddisk.so"
  }
  module {
    name = "load_module"
    path = "/usr/lib/ganglia/modload.so"
  }
  module {
    name = "mem_module"
    path = "/usr/lib/ganglia/modmem.so"
  }
  module {
    name = "net_module"
    path = "/usr/lib/ganglia/modnet.so"
  }
  module {
    name = "proc_module"
    path = "/usr/lib/ganglia/modproc.so"
  }
  module {
    name = "sys_module"
    path = "/usr/lib/ganglia/modsys.so"
  }
}

include ('/etc/ganglia/conf.d/*.conf')

/* The old internal 2.5.x metric array has been replaced by the following
   collection_group directives.  What follows is the default behavior for
   collecting and sending metrics that is as close to 2.5.x behavior as
   possible. */

/* This collection group will cause a heartbeat (or beacon) to be sent every
   20 seconds.  In the heartbeat is the GMOND_STARTED data which expresses
   the age of the running gmond. */
collection_group {
  collect_once = yes
  time_threshold = 20
  metric {
    name = "heartbeat"
  }
}

/* This collection group will send general info about this host every 1200 secs.
   This information doesn't change between reboots and is only collected once. */
collection_group {
  collect_once = yes
  time_threshold = 1200
  metric {
    name = "cpu_num"
    title = "CPU Count"
  }
  metric {
    name = "cpu_speed"
    title = "CPU Speed"
  }
  metric {
    name = "mem_total"
    title = "Memory Total"
  }
  /* Should this be here? Swap can be added/removed between reboots. */
  metric {
    name = "swap_total"
    title = "Swap Space Total"
  }
  metric {
    name = "boottime"
    title = "Last Boot Time"
  }
  metric {
    name = "machine_type"
    title = "Machine Type"
  }
  metric {
    name = "os_name"
    title = "Operating System"
  }
  metric {
    name = "os_release"
    title = "Operating System Release"
  }
  metric {
    name = "location"
    title = "Location"
  }
}

/* This collection group will send the status of gexecd for this host every 300 secs */
/* Unlike 2.5.x the default behavior is to report gexecd OFF.  */
collection_group {
  collect_once = yes
  time_threshold = 300
  metric {
    name = "gexec"
    title = "Gexec Status"
  }
}

/* This collection group will collect the CPU status info every 20 secs.
   The time threshold is set to 90 seconds.  In honesty, this time_threshold could be
   set significantly higher to reduce unnecessary network chatter. */
collection_group {
  collect_every = 20
  time_threshold = 90
  /* CPU status */
  metric {
    name = "cpu_user"
    value_threshold = "1.0"
    title = "CPU User"
  }
  metric {
    name = "cpu_system"
    value_threshold = "1.0"
    title = "CPU System"
  }
  metric {
    name = "cpu_idle"
    value_threshold = "5.0"
    title = "CPU Idle"
  }
  metric {
    name = "cpu_nice"
    value_threshold = "1.0"
    title = "CPU Nice"
  }
  metric {
    name = "cpu_aidle"
    value_threshold = "5.0"
    title = "CPU aidle"
  }
  metric {
    name = "cpu_wio"
    value_threshold = "1.0"
    title = "CPU wio"
  }
  /* The next two metrics are optional if you want more detail...
     ... since they are accounted for in cpu_system.
  metric {
    name = "cpu_intr"
    value_threshold = "1.0"
    title = "CPU intr"
  }
  metric {
    name = "cpu_sintr"
    value_threshold = "1.0"
    title = "CPU sintr"
  }
  */
}

collection_group {
  collect_every = 20
  time_threshold = 90
  /* Load Averages */
  metric {
    name = "load_one"
    value_threshold = "1.0"
    title = "One Minute Load Average"
  }
  metric {
    name = "load_five"
    value_threshold = "1.0"
    title = "Five Minute Load Average"
  }
  metric {
    name = "load_fifteen"
    value_threshold = "1.0"
    title = "Fifteen Minute Load Average"
  }
}

/* This group collects the number of running and total processes */
collection_group {
  collect_every = 80
  time_threshold = 950
  metric {
    name = "proc_run"
    value_threshold = "1.0"
    title = "Total Running Processes"
  }
  metric {
    name = "proc_total"
    value_threshold = "1.0"
    title = "Total Processes"
  }
}

/* This collection group grabs the volatile memory metrics every 40 secs and
   sends them at least every 180 secs.  This time_threshold can be increased
   significantly to reduce unneeded network traffic. */
collection_group {
  collect_every = 40
  time_threshold = 180
  metric {
    name = "mem_free"
    value_threshold = "1024.0"
    title = "Free Memory"
  }
  metric {
    name = "mem_shared"
    value_threshold = "1024.0"
    title = "Shared Memory"
  }
  metric {
    name = "mem_buffers"
    value_threshold = "1024.0"
    title = "Memory Buffers"
  }
  metric {
    name = "mem_cached"
    value_threshold = "1024.0"
    title = "Cached Memory"
  }
  metric {
    name = "swap_free"
    value_threshold = "1024.0"
    title = "Free Swap Space"
  }
}

collection_group {
  collect_every = 40
  time_threshold = 300
  metric {
    name = "bytes_out"
    value_threshold = 4096
    title = "Bytes Sent"
  }
  metric {
    name = "bytes_in"
    value_threshold = 4096
    title = "Bytes Received"
  }
  metric {
    name = "pkts_in"
    value_threshold = 256
    title = "Packets Received"
  }
  metric {
    name = "pkts_out"
    value_threshold = 256
    title = "Packets Sent"
  }
}

/* Different than 2.5.x default since the old config made no sense */
collection_group {
  collect_every = 1800
  time_threshold = 3600
  metric {
    name = "disk_total"
    value_threshold = 1.0
    title = "Total Disk Space"
  }
}

collection_group {
  collect_every = 40
  time_threshold = 180
  metric {
    name = "disk_free"
    value_threshold = 1.0
    title = "Disk Space Available"
  }
  metric {
    name = "part_max_used"
    value_threshold = 1.0
    title = "Maximum Disk Space Used"
  }
}
hadoop-metrics2-hbase.properties configuration on slave nodes

# syntax: [prefix].[source|sink].[instance].[options]
# See javadoc of package-info.java for org.apache.hadoop.metrics2 for details

#*.sink.file*.class=org.apache.hadoop.metrics2.sink.FileSink
# default sampling period
#*.period=10

# Below are some examples of sinks that could be used
# to monitor different hbase daemons.

# hbase.sink.file-all.class=org.apache.hadoop.metrics2.sink.FileSink
# hbase.sink.file-all.filename=all.metrics

# hbase.sink.file0.class=org.apache.hadoop.metrics2.sink.FileSink
# hbase.sink.file0.context=hmaster
# hbase.sink.file0.filename=master.metrics

# hbase.sink.file1.class=org.apache.hadoop.metrics2.sink.FileSink
# hbase.sink.file1.context=thrift-one
# hbase.sink.file1.filename=thrift-one.metrics

# hbase.sink.file2.class=org.apache.hadoop.metrics2.sink.FileSink
# hbase.sink.file2.context=thrift-two
# hbase.sink.file2.filename=thrift-two.metrics

# hbase.sink.file3.class=org.apache.hadoop.metrics2.sink.FileSink
# hbase.sink.file3.context=rest
# hbase.sink.file3.filename=rest.metrics


*.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
*.sink.ganglia.period=10

hbase.sink.ganglia.period=10
hbase.sink.ganglia.servers=master:8649
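The header comment gives the key syntax: `[prefix].[source|sink].[instance].[options]`. So `hbase.sink.ganglia.servers=master:8649` means "for the hbase daemons, the sink instance named ganglia should send to master:8649", while the `*` prefix applies to every daemon. As a reading aid only (this helper is made up for illustration, not part of Hadoop), the split can be sketched like this:

```python
# Illustrative sketch of the metrics2 property-key syntax
# [prefix].[source|sink].[instance].[options] from the file's header comment.

def parse_key(key):
    """Split a metrics2 property key into (prefix, kind, instance, option)."""
    prefix, kind, instance, option = key.split(".", 3)
    return prefix, kind, instance, option

print(parse_key("hbase.sink.ganglia.servers"))
# -> ('hbase', 'sink', 'ganglia', 'servers')
print(parse_key("*.sink.ganglia.class"))
# -> ('*', 'sink', 'ganglia', 'class')
```

Note that the instance name ("ganglia" here) is arbitrary; it just has to be the same across the `.class`, `.period`, and `.servers` lines so they configure the same sink.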
hadoop-metrics2.properties configuration on slave nodes

#
#   Licensed to the Apache Software Foundation (ASF) under one or more
#   contributor license agreements.  See the NOTICE file distributed with
#   this work for additional information regarding copyright ownership.
#   The ASF licenses this file to You under the Apache License, Version 2.0
#   (the "License"); you may not use this file except in compliance with
#   the License.  You may obtain a copy of the License at
#
#       http://www.apache.org/licenses/LICENSE-2.0
#
#   Unless required by applicable law or agreed to in writing, software
#   distributed under the License is distributed on an "AS IS" BASIS,
#   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#   See the License for the specific language governing permissions and
#   limitations under the License.
#

# syntax: [prefix].[source|sink].[instance].[options]
# See javadoc of package-info.java for org.apache.hadoop.metrics2 for details

#*.sink.file.class=org.apache.hadoop.metrics2.sink.FileSink
# default sampling period, in seconds
#*.period=10

# The namenode-metrics.out will contain metrics from all context
#namenode.sink.file.filename=namenode-metrics.out
# Specifying a special sampling period for namenode:
#namenode.sink.*.period=8

#datanode.sink.file.filename=datanode-metrics.out

# the following example split metrics of different
# context to different sinks (in this case files)
#jobtracker.sink.file_jvm.context=jvm
#jobtracker.sink.file_jvm.filename=jobtracker-jvm-metrics.out
#jobtracker.sink.file_mapred.context=mapred
#jobtracker.sink.file_mapred.filename=jobtracker-mapred-metrics.out

#tasktracker.sink.file.filename=tasktracker-metrics.out

#maptask.sink.file.filename=maptask-metrics.out

#reducetask.sink.file.filename=reducetask-metrics.out

*.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
*.sink.ganglia.period=10

*.sink.ganglia.slope=jvm.metrics.gcCount=zero,jvm.metrics.memHeapUsedM=both
*.sink.ganglia.dmax=jvm.metrics.threadsBlocked=70,jvm.metrics.memHeapUsedM=40

namenode.sink.ganglia.servers=master:8649
resourcemanager.sink.ganglia.servers=master:8649

datanode.sink.ganglia.servers=master:8649
nodemanager.sink.ganglia.servers=master:8649

maptask.sink.ganglia.servers=master:8649
reducetask.sink.ganglia.servers=master:8649
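The `*.sink.ganglia.slope` and `*.sink.ganglia.dmax` lines pack per-metric Ganglia settings into a single property value: a comma-separated list of `metric=setting` pairs (slope controls how Ganglia interprets the value over time, dmax is the seconds before a stale metric is dropped). As an illustration only (this helper is made up here, not Hadoop code), the value decodes like this:

```python
# Sketch: decode a GangliaSink per-metric property value such as
#   jvm.metrics.gcCount=zero,jvm.metrics.memHeapUsedM=both
# into a {metric name: setting} mapping.  Illustrative helper only.

def parse_metric_settings(value):
    """'a=zero,b=both' -> {'a': 'zero', 'b': 'both'}"""
    pairs = (item.split("=", 1) for item in value.split(",") if item)
    return {name: setting for name, setting in pairs}

slope = parse_metric_settings(
    "jvm.metrics.gcCount=zero,jvm.metrics.memHeapUsedM=both")
print(slope["jvm.metrics.gcCount"])  # -> zero
```

In other words, the file above pins gcCount to a zero slope (a counter-style value that Ganglia should treat as flat) while memHeapUsedM can move in both directions, and the two dmax entries let those metrics age out after 70 and 40 seconds respectively.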