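Monitoring server performance is a routine part of Linux system administration. Linux itself ships with useful commands and components, and almost every distribution comes with a large set of monitoring tools. These tools report system state so that you can gauge the health of a system and track down the cause of a performance problem. The commands listed here are common, basic tools for analyzing and debugging system issues such as:
Performance bottlenecks.
Disk (storage) problems.
CPU and memory usage.
Network performance.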
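#1: top – Process Activity
The top command provides a dynamic, real-time view of a running system, including its processes and their memory and CPU usage. By default, top lists processes sorted by CPU usage and refreshes the display every few seconds; the interval can be changed with the -d option.
Commonly Used Hot Keys
The following hot keys are commonly used with the top command:
Hot Key Usage
t Toggle the summary information on and off.
m Toggle the memory information on and off.
A Sort the display by the top consumers of system resources, which makes resource-hungry processes easy to spot.
f Enter the interactive configuration screen, useful for setting up top for a specific task.
o Interactively select the sort order within top.
r Run the renice command on a given process (change its priority).
k Run the kill command on a given process (terminate it).
z Toggle between color and monochrome display.
=> Related: How do I Find Out Linux CPU Utilization?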
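#2: vmstat – System Activity, Memory, and CPU Usage
The vmstat command reports information about processes, memory, paging, block I/O, traps, and CPU activity. Given an interval, it prints a new sample at that interval; the following prints one every 3 seconds:
# vmstat 3
Sample output:
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------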
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 0 0 2540988 522188 5130400 0 0 2 32 4 2 4 1 96 0 0
1 0 0 2540988 522188 5130400 0 0 0 720 1199 665 1 0 99 0 0
0 0 0 2540956 522188 5130400 0 0 0 0 1151 1569 4 1 95 0 0
0 0 0 2540956 522188 5130500 0 0 0 6 1117 439 1 0 99 0 0
0 0 0 2540940 522188 5130512 0 0 0 536 1189 932 1 0 98 0 0
0 0 0 2538444 522188 5130588 0 0 0 0 1187 1417 4 1 96 0 0
0 0 0 2490060 522188 5130640 0 0 0 18 1253 1123 5 1 94 0 0
Display slab memory usage:
# vmstat -m
Display active and inactive memory pages:
# vmstat -a
=> Related: How do I find out Linux Resource utilization to detect system bottlenecks?
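When reading vmstat output for a possible disk bottleneck, the wa column (CPU time spent waiting for I/O) and the b column (processes blocked in uninterruptible sleep) are a common first place to look. As a minimal sketch, the helper below filters vmstat output and prints only the samples whose wa value exceeds a threshold. The function name flag_high_iowait and the 20% threshold in the example are assumptions chosen for illustration; they are not part of vmstat.

```shell
# flag_high_iowait THRESHOLD
# Hypothetical helper: reads vmstat output on stdin and prints the samples
# whose "wa" column (percent of CPU time waiting for I/O) exceeds THRESHOLD.
flag_high_iowait() {
    awk -v limit="$1" '
        # Locate the "wa" column from the header line that starts with "r b".
        $1 == "r" && $2 == "b" {
            for (i = 1; i <= NF; i++) if ($i == "wa") col = i
            next
        }
        # Data lines start with a number; compare once the column is known.
        col && $1 ~ /^[0-9]+$/ && $col + 0 > limit + 0 {
            printf "high iowait: wa=%s%% (r=%s b=%s)\n", $col, $1, $2
        }
    '
}

# Example (assumed values): five samples, 3 seconds apart, 20% threshold.
# vmstat 3 5 | flag_high_iowait 20
```

The column is looked up by name in the header rather than by position, so the sketch does not depend on a fixed column layout.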
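#3: w – Find Out Who Is Logged In and What They Are Doing
The w command shows the users currently logged in to the system and the process or command each one is running.
# w username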
# w root
Sample Outputs:
[root@smtp ~]# w root
12:56:52 up 238 days, 13:52, 3 users, load average: 0.00, 0.00, 0.00
USER TTY FROM LOGIN@ IDLE JCPU PCPU WHAT
root pts/0 10.10.3.38 12:56 26.00s 0.01s 0.00s vi /etc/hosts
root pts/1 10.10.3.38 12:56 0.00s 0.01s 0.00s w root
root pts/2 10.10.3.38 12:56 3.00s 0.02s 0.00s top
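#4: uptime – System Running Time
The uptime command shows how long the system has been running, how many users are currently logged in, and the system load averages for the past 1, 5, and 15 minutes.
# uptime
Sample output: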
18:02:41 up 41 days, 23:42, 1 user, load average: 0.00, 0.00, 0.00
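A load average is easier to judge relative to the number of CPUs: as a rough rule of thumb, a sustained value well above 1.0 per CPU suggests that tasks are queuing. The sketch below divides a load figure by a CPU count. The function name load_per_cpu is an assumption made for this example, and reading the inputs from /proc/loadavg and getconf is one possible way to obtain them on Linux.

```shell
# load_per_cpu LOAD NCPU
# Hypothetical helper: prints LOAD divided by NCPU, to two decimal places.
load_per_cpu() {
    awk -v load="$1" -v ncpu="$2" 'BEGIN { printf "%.2f\n", load / ncpu }'
}

# Example: 1-minute load average and the number of online CPUs.
# load_per_cpu "$(cut -d " " -f 1 /proc/loadavg)" "$(getconf _NPROCESSORS_ONLN)"
```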
#5: ps – Display Processes
The ps command reports a snapshot of the current processes. To select all processes, use the -A or -e option:
# ps -A
Sample output:
PID TTY TIME CMD
1 ? 00:00:02 init
2 ? 00:00:02 migration/0
3 ? 00:00:01 ksoftirqd/0
4 ? 00:00:00 watchdog/0
5 ? 00:00:00 migration/1
6 ? 00:00:15 ksoftirqd/1
….
…..
4881 ? 00:53:28 java
4885 tty1 00:00:00 mingetty
4886 tty2 00:00:00 mingetty
4887 tty3 00:00:00 mingetty
4888 tty4 00:00:00 mingetty
4891 tty5 00:00:00 mingetty
4892 tty6 00:00:00 mingetty
4893 ttyS1 00:00:00 agetty
12853 ? 00:00:00 cifsoplockd
12854 ? 00:00:00 cifsdnotifyd
14231 ? 00:10:34 lighttpd
14232 ? 00:00:00 php-cgi
54981 pts/0 00:00:00 vim
55465 ? 00:00:00 php-cgi
55546 ? 00:00:00 bind9-snmp-stat
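55704 pts/1 00:00:00 ps
ps is similar to top, but it prints a one-time snapshot and can show more information about each process.
Long format output:
# ps -Al
Extra full format output (includes the command-line arguments passed to each process):
# ps -AlF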
Show threads (adds the LWP and NLWP columns):
# ps -AlFL
Show threads after their processes:
# ps -AlLm
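Show every process running on the server:
# ps ax
# ps axu
Print a process tree:
# ps -ejH
# ps axjf
# pstree
Print security information:
# ps -eo euser,ruser,suser,fuser,f,comm,label
# ps axZ
# ps -eM
Show every process running as a given user (here root, matched on both real and effective ID), in user format:
# ps -U root -u root u
Print process status in a user-defined format:
# ps -eo pid,tid,class,rtprio,ni,pri,psr,pcpu,stat,wchan:14,comm
# ps axo stat,euid,ruid,tty,tpgid,sess,pgrp,ppid,pid,pcpu,comm
# ps -eopid,tt,user,fname,tmout,f,wchan
Display only the process IDs of a given program, for example lighttpd:
# ps -C lighttpd -o pid=
OR
# pgrep lighttpd
OR
# pgrep -u vivek php-cgi
Display the name of the process with PID 55977:
# ps -p 55977 -o comm=
Find the top 10 memory-consuming processes:
# ps -auxf | sort -nr -k 4 | head -10
Find the top 10 CPU-consuming processes:
# ps -auxf | sort -nr -k 3 | head -10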
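One side effect of piping ps straight into sort is that the header line is sorted along with the data and usually disappears from the result. A minimal sketch of one way around this is shown below: it reads the header first, prints it, and sorts only the remaining rows. The function name top_by_column is an assumption made for this example.

```shell
# top_by_column COLUMN COUNT
# Hypothetical helper: reads ps-style output on stdin, keeps the header line
# in place, and prints the COUNT rows with the largest value in COLUMN.
top_by_column() {
    col=$1
    count=$2
    # Read and print the header first, then sort only the data rows.
    IFS= read -r header
    printf '%s\n' "$header"
    sort -nr -k "$col,$col" | head -n "$count"
}

# Example: the 10 processes using the most memory (%MEM is column 4 of "ps aux").
# ps aux | top_by_column 4 10
```

The procps version of ps can also sort by itself, for example ps -eo pid,pmem,pcpu,comm --sort=-pmem, which avoids the external sort when that option is available.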
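#6: free – Memory Usage
The free command shows the total, used, and free amounts of physical memory and swap space, as well as the buffers used by the kernel.
# free
Sample output: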
total used free shared buffers cached
Mem: 12302896 9739664 2563232 0 523124 5154740
-/+ buffers/cache: 4061800 8241096
Swap: 1052248 0 1052248
=> Related:
Linux Find Out Virtual Memory PAGESIZE
Linux Limit CPU Usage Per Process
How much RAM does my Ubuntu / Fedora Linux desktop PC have?
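The "-/+ buffers/cache" line in the free output shown earlier is derived from the Mem line: buffers and page cache are subtracted from "used" and added to "free", because the kernel can give that memory back to applications when needed. The arithmetic can be sketched as follows, using the numbers from the sample output. The function names app_used_kb and free_plus_cache_kb are assumptions made for this example.

```shell
# app_used_kb USED BUFFERS CACHED
# Hypothetical helper: memory held by applications, in KB
# (used minus buffers and page cache).
app_used_kb() {
    echo $(( $1 - $2 - $3 ))
}

# free_plus_cache_kb FREE BUFFERS CACHED
# Hypothetical helper: free memory plus buffers and page cache, in KB.
free_plus_cache_kb() {
    echo $(( $1 + $2 + $3 ))
}

# Values taken from the Mem line of the sample output.
app_used_kb 9739664 523124 5154740         # prints 4061800
free_plus_cache_kb 2563232 523124 5154740  # prints 8241096
```

Both results match the "-/+ buffers/cache" line of the sample (4061800 used, 8241096 free).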
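#7: iostat – Average CPU Load and Disk Activity
The iostat command reports CPU statistics and input/output statistics for devices, partitions, and network file systems (NFS).
# iostat
Sample output: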
Linux 2.6.18-128.1.14.el5 (www03.nixcraft.in) 06/26/2009
avg-cpu: %user %nice %system %iowait %steal %idle
3.50 0.09 0.51 0.03 0.00 95.86
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 22.04 31.88 512.03 16193351 260102868
sda1 0.00 0.00 0.00 2166 180
sda2 22.04 31.87 512.03 16189010 260102688
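sda3 0.00 0.00 0.00 1615 0
=> Related: Linux Track NFS Directory / Disk I/O Stats
#8: sar – Collect and Report System Activity
The sar command collects, reports, and saves system activity information. For example, to see the network counters, enter:
# sar -n DEV | more
To display the network counters recorded on the 24th:
# sar -n DEV -f /var/log/sa/sa24 | more
sar can also display real-time usage. The following takes 5 samples, 4 seconds apart:
# sar 4 5
Sample output: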
Linux 2.6.18-128.1.14.el5 (www03.nixway.net) 06/26/2010
06:45:12 PM CPU %user %nice %system %iowait %steal %idle
06:45:16 PM all 2.00 0.00 0.22 0.00 0.00 97.78
06:45:20 PM all 2.07 0.00 0.38 0.03 0.00 97.52
06:45:24 PM all 0.94 0.00 0.28 0.00 0.00 98.78
06:45:28 PM all 1.56 0.00 0.22 0.00 0.00 98.22
06:45:32 PM all 3.53 0.00 0.25 0.03 0.00 96.19
Average: all 2.02 0.00 0.27 0.01 0.00 97.70
=> Related: How to collect Linux system utilization data into a file
#9: mpstat – Multiprocessor Usage
The mpstat command reports the activity of each individual processor. Use mpstat -P ALL to display the average utilization per processor:
# mpstat -P ALL
Sample output:
Linux 2.6.18-128.1.14.el5 (www.6688.cc ) 06/26/2010
06:48:11 PM CPU %user %nice %sys %iowait %irq %soft %steal %idle intr/s
06:48:11 PM all 3.50 0.09 0.34 0.03 0.01 0.17 0.00 95.86 1218.04
06:48:11 PM 0 3.44 0.08 0.31 0.02 0.00 0.12 0.00 96.04 1000.31
06:48:11 PM 1 3.10 0.08 0.32 0.09 0.02 0.11 0.00 96.28 34.93
06:48:11 PM 2 4.16 0.11 0.36 0.02 0.00 0.11 0.00 95.25 0.00
06:48:11 PM 3 3.77 0.11 0.38 0.03 0.01 0.24 0.00 95.46 44.80
06:48:11 PM 4 2.96 0.07 0.29 0.04 0.02 0.10 0.00 96.52 25.91
06:48:11 PM 5 3.26 0.08 0.28 0.03 0.01 0.10 0.00 96.23 14.98
06:48:11 PM 6 4.00 0.10 0.34 0.01 0.00 0.13 0.00 95.42 3.75
06:48:11 PM 7 3.30 0.11 0.39 0.03 0.01 0.46 0.00 95.69 76.89
=> Related: Linux display each multiple SMP CPU processors utilization individually.
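With many processors, scanning the per-CPU rows by eye gets tedious. As a minimal sketch, the helper below reads mpstat -P ALL output and reports the processor with the lowest %idle value. The function name busiest_cpu is an assumption made for this example, and the sketch assumes the output has a header containing the CPU and %idle column names, as in the sample output.

```shell
# busiest_cpu
# Hypothetical helper: reads "mpstat -P ALL" output on stdin and prints the
# CPU with the lowest %idle value.
busiest_cpu() {
    awk '
        # Header line: remember which columns hold the CPU id and %idle.
        /%idle/ {
            for (i = 1; i <= NF; i++) {
                if ($i == "CPU") cpu_col = i
                if ($i == "%idle") idle_col = i
            }
            next
        }
        # Per-CPU rows only: the "all" summary row has a non-numeric id.
        idle_col && $cpu_col ~ /^[0-9]+$/ {
            if (min == "" || $idle_col + 0 < min) {
                min = $idle_col + 0
                cpu = $cpu_col
            }
        }
        END {
            if (cpu != "") printf "CPU %s is busiest (%.2f%% idle)\n", cpu, min
        }
    '
}

# Example:
# mpstat -P ALL | busiest_cpu
```

Applied to the sample output, this picks CPU 2, whose %idle of 95.25 is the lowest of the eight processors.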
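#10: pmap – Process Memory Usage
The pmap command reports the memory map of a single process, which can help track down memory bottlenecks.
# pmap -d PID
To display memory usage for the process with PID 47394, enter:
# pmap -d 47394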
Sample Outputs:
47394: /usr/bin/php-cgi
Address Kbytes Mode Offset Device Mapping
0000000000400000 2584 r-x-- 0000000000000000 008:00002 php-cgi
0000000000886000 140 rw--- 0000000000286000 008:00002 php-cgi
00000000008a9000 52 rw--- 00000000008a9000 000:00000 [ anon ]
0000000000aa8000 76 rw--- 00000000002a8000 008:00002 php-cgi
000000000f678000 1980 rw--- 000000000f678000 000:00000 [ anon ]
000000314a600000 112 r-x-- 0000000000000000 008:00002 ld-2.5.so
000000314a81b000 4 r---- 000000000001b000 008:00002 ld-2.5.so
000000314a81c000 4 rw--- 000000000001c000 008:00002 ld-2.5.so
000000314aa00000 1328 r-x-- 0000000000000000 008:00002 libc-2.5.so
000000314ab4c000 2048 ----- 000000000014c000 008:00002 libc-2.5.so
…..
……
..
00002af8d48fd000 4 rw--- 0000000000006000 008:00002 xsl.so
00002af8d490c000 40 r-x-- 0000000000000000 008:00002 libnss_files-2.5.so
00002af8d4916000 2044 ----- 000000000000a000 008:00002 libnss_files-2.5.so
00002af8d4b15000 4 r---- 0000000000009000 008:00002 libnss_files-2.5.so
00002af8d4b16000 4 rw--- 000000000000a000 008:00002 libnss_files-2.5.so
00002af8d4b17000 768000 rw-s- 0000000000000000 000:00009 zero (deleted)
00007fffc95fe000 84 rw--- 00007ffffffea000 000:00000 [ stack ]
ffffffffff600000 8192 ----- 0000000000000000 000:00000 [ anon ]
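mapped: 933712K writeable/private: 4304K shared: 768000K
The last line is the important one:
mapped: 933712K total amount of memory mapped to files
writeable/private: 4304K the amount of private address space
shared: 768000K the amount of address space this process shares with other processes
=> Related: Linux find the memory used by a program / process using pmap command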
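For a process with many mappings, it can help to total the Kbytes column per mapping name to see which libraries or anonymous regions account for most of the address space. The helper below is a minimal sketch under the assumption that the input has the six-column layout of the pmap -d output shown above (Address, Kbytes, Mode, Offset, Device, Mapping). The function name pmap_by_mapping is an assumption made for this example.

```shell
# pmap_by_mapping
# Hypothetical helper: reads "pmap -d PID" output on stdin and prints the
# total size in KB per mapping name, largest first.
pmap_by_mapping() {
    awk '
        # Mapping rows start with a hex address followed by a size in KB.
        $1 ~ /^[0-9a-f]+$/ && $2 ~ /^[0-9]+$/ && NF >= 6 {
            # The mapping name may contain spaces, e.g. "[ anon ]".
            name = $6
            for (i = 7; i <= NF; i++) name = name " " $i
            kb[name] += $2
        }
        END { for (name in kb) printf "%d %s\n", kb[name], name }
    ' | sort -nr
}

# Example, using the PID from the text above:
# pmap -d 47394 | pmap_by_mapping | head
```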
#11 and #12: netstat and ss – Network Statistics
The netstat command displays network connections, routing tables, interface statistics, masquerade connections, and multicast memberships. The ss command dumps socket statistics and can show information similar to netstat. See the following resources about the ss and netstat commands:
ss: Display Linux TCP / UDP Network and Socket Information
Get Detailed Information About Particular IP address Connections Using netstat Command
#13: iptraf – Real-time Network Statistics
The iptraf command is an interactive, colorful, ncurses-based IP LAN monitor. It generates various network statistics including TCP info, UDP counts, ICMP and OSPF information, Ethernet load info, node stats, IP checksum errors, and others. It can provide the following information in an easy-to-read format:
Network traffic statistics by TCP connection
IP traffic statistics by network interface
Network traffic statistics by protocol
Network traffic statistics by TCP/UDP port and by packet size
Network traffic statistics by Layer2 address
#14: tcpdump – Detailed Network Traffic Analysis
tcpdump is a simple command that dumps traffic on a network. However, you need a good understanding of the TCP/IP protocols to use it effectively. For example, to display traffic information about DNS, enter:
# tcpdump -i eth1 'udp port 53'
To display all IPv4 HTTP packets to and from port 80 that contain data (that is, excluding SYN, FIN, and ACK-only packets), enter:
# tcpdump 'tcp port 80 and (((ip[2:2] - ((ip[0]&0xf)<<2)) - ((tcp[12]&0xf0)>>2)) != 0)'
To display all FTP sessions to 202.54.1.5, enter:
# tcpdump -i eth1 'dst 202.54.1.5 and (port 21 or 20)'
To display all HTTP sessions to 192.168.1.5:
# tcpdump -ni eth0 'dst 192.168.1.5 and tcp and port http'
To save packets to a capture file that can be examined in detail with wireshark, enter:
# tcpdump -n -i eth1 -s 0 -w output.txt src or dst port 80
#15: strace – System Calls
strace traces system calls and signals. This is useful for debugging web server and other server problems. See how to use strace to trace a process and see what it is doing.
#16: /proc File System – Various Kernel Statistics
The /proc file system provides detailed information about various hardware devices and other Linux kernel information. See the Linux kernel /proc documentation for further details. Common /proc examples:
# cat /proc/cpuinfo
# cat /proc/meminfo
# cat /proc/zoneinfo
# cat /proc/mounts
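Because the files under /proc are plain text, individual values are easy to extract in scripts. As a minimal sketch, the helper below prints the value of one field from /proc/meminfo, which uses a "Name: value kB" layout. The function name meminfo_kb and its optional file argument are assumptions made for this example.

```shell
# meminfo_kb FIELD [FILE]
# Hypothetical helper: prints the numeric value of FIELD (for example
# MemTotal) from FILE, which defaults to /proc/meminfo.
meminfo_kb() {
    awk -v key="$1:" '$1 == key { print $2 }' "${2:-/proc/meminfo}"
}

# Example:
# meminfo_kb MemTotal
# meminfo_kb MemFree
```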
#17: Nagios – Server and Network Monitoring
Nagios is a popular open source computer system and network monitoring application. You can easily monitor all your hosts, network equipment, and services. It can send an alert when things go wrong and again when they get better. FAN is "Fully Automated Nagios". FAN's goal is to provide a Nagios installation that includes most of the tools provided by the Nagios community. FAN provides a CD-ROM image in the standard ISO format, making it easy to install a Nagios server. In addition, the distribution includes a wide range of tools to improve the user experience around Nagios.
#18: Cacti – Web-based Monitoring Tool
Cacti is a complete network graphing solution designed to harness the power of RRDTool’s data storage and graphing functionality. Cacti provides a fast poller, advanced graph templating, multiple data acquisition methods, and user management features out of the box. All of this is wrapped in an intuitive, easy to use interface that makes sense for LAN-sized installations up to complex networks with hundreds of devices. It can provide data about network, CPU, memory, logged in users, Apache, DNS servers and much more. See how to install and configure Cacti network graphing tool under CentOS / RHEL.
#19: KDE System Guard – Real-time System Monitoring and Reporting for KDE
KSysguard is a network-enabled task and system monitor application for the KDE desktop. This tool can be run over an ssh session. It provides many features, such as a client/server architecture that enables monitoring of local and remote hosts. The graphical front end uses so-called sensors to retrieve the information it displays. A sensor can return simple values or more complex information like tables. For each type of information, one or more displays are provided. Displays are organized in worksheets that can be saved and loaded independently from each other. So KSysguard is not only a simple task manager but also a very powerful tool to control large server farms.
See the KSysguard handbook for detailed usage.
#20: Gnome System Monitor – Real-time System Status and Reporting for Gnome
The System Monitor application enables you to display basic system information and monitor system processes, usage of system resources, and file systems. You can also use System Monitor to modify the behavior of your system. Although not as powerful as the KDE System Guard, it provides the basic information which may be useful for new users:
Displays various basic information about the computer’s hardware and software.
Linux Kernel version
GNOME version
Hardware
Installed memory
Processors and speeds
System Status
Currently available disk space
Processes
Memory and swap space
Network usage
File Systems
Lists all mounted filesystems along with basic information about each.
Other tools:
nmap – scan your server for open ports.
lsof – list open files, network connections and much more.
ntop web based tool – ntop is the best tool to see network usage in a way similar to what top command does for processes i.e. it is network traffic monitoring software. You can see network status, protocol wise distribution of traffic for UDP, TCP, DNS, HTTP and other protocols.
Conky – Another good monitoring tool for the X Window System. It is highly configurable and is able to monitor many system variables including the status of the CPU, memory, swap space, disk storage, temperatures, processes, network interfaces, battery power, system messages, e-mail inboxes etc.
GKrellM – It can be used to monitor the status of CPUs, main memory, hard disks, network interfaces, local and remote mailboxes, and many other things.
vnstat – vnStat is a console-based network traffic monitor. It keeps a log of hourly, daily and monthly network traffic for the selected interface(s).
htop – htop is an enhanced version of top, the interactive process viewer, which can display the list of processes in a tree form.
mtr – mtr combines the functionality of the traceroute and ping programs in a single network diagnostic tool.