Most administrative files on UNIX are simple text files that can be edited, printed, and read without any special-purpose tools.
Most of them live under the standard directory /etc, for example:
the password and group files (passwd, group);
the filesystem mount table (fstab or vfstab);
the hosts file (hosts);
the default shell startup file (profile);
the system startup and shutdown shell scripts (stored under the subdirectory trees rc0.d, rc1.d ... rc6.d).
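Because they are plain text, these files can be inspected with ordinary commands; a minimal sketch (assuming a typical Linux system):

```shell
# These are ordinary text files, so standard tools read them directly:
head -n 2 /etc/passwd     # the first two account records
wc -l < /etc/group        # number of group records
```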
Extracting data from structured text files
Exercise 1: extract fields 1 and 7 of /etc/passwd
[linuxidc@test ~]$ vi patest.sh
#!/bin/bash
umask 077
PERSON=/tmp/pd.key.person.$$
OFFICE=/tmp/pd.key.office.$$
TELEPHONE=/tmp/pd.key.telephone.$$
USER=/tmp/pd.key.user.$$
trap "exit 1" HUP INT PIPE QUIT TERM
trap "rm -f $PERSON $OFFICE $TELEPHONE $USER" EXIT
awk -F: '{ print $1 ":" $7 }' /etc/passwd > $USER
awk -F: '{ print $1 }' < $USER | sort > $PERSON
sed -e 's=^\([^:]*\):[^/]*/\([^/]*\)/.*$=\1:\2=' < $USER | sort > $OFFICE
sed -e 's=^\([^:]*\):[^/]*/[^/]*/\([^/]*\)=\1:\2=' < $USER | sort > $TELEPHONE
join -t: $PERSON $OFFICE |
join -t: - $TELEPHONE |
sort -t: -k1,1 -k2,2 -k3,3 |
awk -F: '{ printf("%-39s\t%s\t%s\n", $1, $2, $3) }'
[linuxidc@test ~]$ chmod +x patest.sh
[linuxidc@test ~]$ ./patest.sh
adm sbin nologin
alert2system bin bash
alert2systemtest bin bash
avahi sbin nologin
bin sbin nologin
cvsroot bin bash
dbus sbin nologin
dovecot sbin nologin
ftp sbin nologin
ftpuser bin bash
games sbin nologin
gdm sbin nologin
git_test usr local/git/bin/git-shell
gopher sbin nologin
gup sbin nologin
….
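Since Exercise 1 only needs fields 1 and 7, the raw extraction step (before the join/format stages) can also be done in a single command with cut; a minimal sketch:

```shell
# Extract the username (field 1) and login shell (field 7) from /etc/passwd,
# using ':' as the field delimiter.
cut -d: -f1,7 /etc/passwd | head -n 5
```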
Exercise 2: given that field 5 of /etc/passwd holds the person's name, office number, telephone number, and so on,
as in the file below, build an office directory.
[linuxidc@test ~]$ vi passwd1
gz_willwu:x:843:843:Will wu/SN091/555-6728:/home/gz_willwu:/bin/bash
ninf_thomaschan:x:853:853:Thomas chan/INF002/554-4565:/home/sninf_thomaschan:/bin/bash
llwu:x:843:843:Will wu/SN091/555-6728:/home/gz_willwu:/bin/bash
sninf_thomaschan:x:853:853:Thomas chan/INF002/554-4565:/home/sninf_thomaschan:/bin/bash
sninf_tonyhung:x:856:856:Tonny huang/HK0501/553-6465:/home/sninf_tonyhung:/bin/bash
gz_kinma:x:857:857:Kin ma/SN021/555-6733:/home/gz_kinma:/bin/bash
linuxidc:x:859:859:Field yang/SN001/555-6765:/home/linuxidc:/bin/bash
gz_hilwu:x:843:843:hil wu/SN021/555-6744:/home/gz_willwu:/bin/bash
Step-by-step breakdown:
①[linuxidc@test ~]$ awk -F: '{ print $1 ":" $5 }' passwd1 |
> sed -e 's=/.*==' -e 's=^\([^:]*\):\(.*\) \([^ ]*\)=\1:\3, \2='
ninf_thomaschan:chan, Thomas
llwu:wu, Will
sninf_thomaschan:chan, Thomas
sninf_tonyhung:huang, Tonny
gz_kinma:ma, Kin
linuxidc:yang, Field
gz_willwu:wu, Will
# ^\([^:]*\) matches the username field, e.g. gz_willwu
# \(.*\) matches the text up to the last space, e.g. "Will" in "Will wu"
# \([^ ]*\) matches the remaining non-space text, e.g. "wu"
# \1:\3, \2 substitutes the first captured group, a colon, the third group, a comma, then the second group
# giving results such as sninf_thomaschan:chan, Thomas
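The effect of step ① can be checked on a single record; a small sketch using one sample value:

```shell
# Rearrange "user:First Last" into "user:Last, First".
# \1 = username, \2 = text before the last space, \3 = the last word.
echo 'gz_willwu:Will wu' |
sed -e 's=^\([^:]*\):\(.*\) \([^ ]*\)=\1:\3, \2='
# → gz_willwu:wu, Will
```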
②[linuxidc@test ~]$ awk -F: '{ print $1 ":" $5 }' passwd1 |
> sed -e 's=^\([^:]*\):[^/]*/\([^/]*\)/.*$=\1:\2='
ninf_thomaschan:INF002
llwu:SN091
sninf_thomaschan:INF002
sninf_tonyhung:HK0501
gz_kinma:SN021
linuxidc:SN001
gz_willwu:SN091
③[linuxidc@test ~]$ awk -F: '{ print $1 ":" $5 }' passwd1 |
> sed -e 's=^\([^:]*\):[^/]*/[^/]*/\([^/]*\)=\1:\2='
ninf_thomaschan:554-4565
llwu:555-6728
sninf_thomaschan:554-4565
sninf_tonyhung:553-6465
gz_kinma:555-6733
linuxidc:555-6765
gz_willwu:555-6728
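Steps ② and ③ can likewise be verified on one record; a sketch:

```shell
REC='gz_willwu:Will wu/SN091/555-6728'
# Office: keep the field between the first and second '/'
echo "$REC" | sed -e 's=^\([^:]*\):[^/]*/\([^/]*\)/.*$=\1:\2='
# → gz_willwu:SN091
# Telephone: keep the field after the second '/'
echo "$REC" | sed -e 's=^\([^:]*\):[^/]*/[^/]*/\([^/]*\)=\1:\2='
# → gz_willwu:555-6728
```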
The full working script for building the office directory:
[linuxidc@test ~]$ vi patest.sh
#!/bin/bash
# Filter an input stream formatted like /etc/passwd
# and derive an office directory from it
#
umask 077
PERSON=/tmp/pd.key.person.$$
OFFICE=/tmp/pd.key.office.$$
TELEPHONE=/tmp/pd.key.telephone.$$
USER=/tmp/pd.key.user.$$
trap "exit 1" HUP INT PIPE QUIT TERM
trap "rm -f $PERSON $OFFICE $TELEPHONE $USER" EXIT
awk -F: '{ print $1 ":" $5 }' passwd1 > $USER
# s=/.*== deletes everything from the first / to the end of the line,
# leaving e.g. gz_willwu:Will wu
sed -e 's=/.*==' \
    -e 's=^\([^:]*\):\(.*\) \([^ ]*\)=\1:\3, \2=' < $USER | sort > $PERSON
sed -e 's=^\([^:]*\):[^/]*/\([^/]*\)/.*$=\1:\2=' < $USER | sort > $OFFICE
sed -e 's=^\([^:]*\):[^/]*/[^/]*/\([^/]*\)=\1:\2=' < $USER | sort > $TELEPHONE
join -t: $PERSON $OFFICE |
# join person info with office location
join -t: - $TELEPHONE |
# then add the telephone number
cut -d: -f 2- |
# drop the key field: cut keeps field 2 through the end
sort -t: -k1,1 -k2,2 -k3,3 |
# split fields on ':' and sort by fields 1, 2, 3 in turn
awk -F: '{ printf("%-39s\t%s\t%s\n", $1, $2, $3) }'
# reformat the output
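Both join stages assume their inputs are sorted on the join field; a minimal standalone sketch of `join -t:` (with `-` reading the left input from standard input, and arbitrary demo file names under /tmp):

```shell
# Build two small sorted lookup files keyed on the username.
printf 'alice:SN001\nbob:SN002\n'       > /tmp/office.demo
printf 'alice:555-1111\nbob:555-2222\n' > /tmp/phone.demo

# Feed the person list on stdin; '-' tells join to read its
# left input from standard input.
printf 'alice:Alice A\nbob:Bob B\n' |
join -t: - /tmp/office.demo |   # append the office field
join -t: - /tmp/phone.demo      # append the telephone field
# → alice:Alice A:SN001:555-1111
# → bob:Bob B:SN002:555-2222

rm -f /tmp/office.demo /tmp/phone.demo
```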
Appendix: shell special parameters
$# is the number of arguments passed to the script
$0 is the name of the script itself
$1 is the first argument passed to the shell script
$2 is the second argument passed to the shell script
$@ is the list of all arguments passed to the script
$* is all the arguments passed to the script as a single string; unlike the positional parameters, it is not limited to 9 arguments
$$ is the process ID of the running script
$? is the exit status of the last command: 0 means success, anything else indicates an error
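These special parameters can be checked with a throwaway script; a minimal sketch (the script name /tmp/args-demo.sh and its arguments are arbitrary):

```shell
# Write a tiny script that echoes its special parameters, then run it.
cat > /tmp/args-demo.sh <<'EOF'
#!/bin/sh
echo "count=$# first=$1 second=$2 all=$*"
EOF
sh /tmp/args-demo.sh foo bar
# → count=2 first=foo second=bar all=foo bar
rm -f /tmp/args-demo.sh
```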
[linuxidc@test ~]$ ./patest.sh
chan, Thomas INF002 554-4565
chan, Thomas INF002 554-4565
huang, Tonny HK0501 553-6465
ma, Kin SN021 555-6733
wu, hil SN021 555-6744
wu, Will SN091 555-6728
yang, Field SN001 555-6765
[linuxidc@test ~]$
Exercise 3: build a script that finds words matching a given pattern (a word-puzzle helper)
[linuxidc@test ~]$ vi puzzle-help.sh
#!/bin/bash
# Match a pattern against a collection of word lists
# Usage: ./puzzle-help.sh egrep-pattern [word-list-file]
FILES="/usr/share/dict/words
/usr/dict/words
/usr/share/lib/dict/words
/usr/local/share/dict/words.biology
/usr/local/share/dict/words.chemistry
/usr/local/share/dict/words.general
/usr/local/share/dict/words.knuth
/usr/local/share/dict/words.latin
/usr/local/share/dict/words.manpages
/usr/local/share/dict/words.mathematics
/usr/local/share/dict/words.physics
/usr/local/share/dict/words.roget
/usr/local/share/dict/words.sciences
/usr/local/share/dict/words.UNIX
/usr/local/share/dict/words.webster
"
# FILES holds a built-in list of word-list files, which each local site can customize
pattern="$1"
egrep -h -i "$pattern" $FILES 2>/dev/null | sort -u -f
# egrep -h: suppress file-name prefixes in the output; -i: ignore case
# sort -u: keep only unique records, discarding duplicates of the same key
# sort -f: fold case while sorting, treating lowercase as uppercase
①[linuxidc@test ~]$ ./puzzle-help.sh '^b.....[xz]...$' | fmt
Babelizing bamboozled bamboozler bamboozles baronizing Bellinzona
Belshazzar bigamizing bilharzial Birobizhan botanizing Brontozoum
Buitenzorg bulldozers bulldozing
# matches words starting with b, then any five characters, then x or z, then any three characters
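One of the listed words can confirm the pattern on its own; a quick sketch:

```shell
# "bamboozles" is b + 5 letters + z + 3 letters, so it matches:
echo bamboozles | egrep '^b.....[xz]...$'
# → bamboozles
```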
②[linuxidc@test ~]$ ./puzzle-help.sh '[^aeiouy]{7}' /usr/dict/words | fmt
2,4,5-t A.M.D.G. arch-christendom arch-christianity A.R.C.S.
branch-strewn B.R.C.S. bright-striped drought-stricken earth-sprung
earth-strewn first-string K.C.M.G. latch-string light-spreading
light-struck Llanfairpwllgwyngyll night-straying night-struck
Nuits-St-Georges pgnttrp R.C.M.P. rock-‘n’-roll R.S.V.P. scritch-scratch
scritch-scratching strength-bringing substrstrata thought-straining
tight-stretched tsktsks witch-stricken witch-struck world-schooled
world-spread world-strange world-thrilling
# find words containing a run of 7 consecutive non-vowel characters
[linuxidc@test ~]$ ./puzzle-help.sh '[^aeiouy]{8}' /usr/dict/words | fmt
B.R.C.S. K.C.M.G. R.C.M.P. rock-‘n’-roll R.S.V.P.
③[linuxidc@test ~]$ ./puzzle-help.sh '[aeiouy]{6}' /usr/dict/words | fmt
AAAAAA euouae
# find words containing a run of 6 consecutive vowels
[linuxidc@test ~]$ ./puzzle-help.sh '[aeiouy]{5}' /usr/dict/words | fmt
AAAAAA Aeaea Aeaean AIEEE ayuyu Bayeau Blueeye cadiueio Chaouia cooeeing
cooeyed cooeying euouae fooyoung gayyou Guauaenok Iyeyasu Jayuya
Liaoyang Mayeye miaoued miaouing Pauiie queueing Taiyuan taoiya theyaou
trans-Paraguayian ukiyoye Waiyeung
[linuxidc@test ~]$
Exercise 4: build a script that acts as a word-frequency filter
[linuxidc@test ~]$ vi wf.sh
#!/bin/bash
# Read a text stream from standard input and write the n most frequent
# words, each preceded by its frequency count, in descending order of
# count, to standard output
# Usage: ./wf [n] < file
#
tr -cs A-Za-z\' '\n' |
# squeeze runs of characters other than letters (and apostrophes)
# into single newlines, one word per line
tr A-Z a-z |
sort |
uniq -c |
# remove duplicates, prefixing each word with its count
sort -k1,1nr -k2 |
# sort by count, largest first, then by word in ascending order
# sort -k: choose the sort-key field(s)
# sort -n: compare by numeric value
# sort -r: reverse the order, largest first
# sort -k1,1nr: sort on field 1 only, numerically and in reverse
sed ${1:-25}q
# print the first n lines; n defaults to 25
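The core pipeline of wf.sh can be exercised on a short inline sentence; a sketch:

```shell
# Count word frequencies in a short sample sentence:
echo 'To be or not to be' |
tr -cs "A-Za-z'" '\n' |    # one word per line
tr A-Z a-z |               # fold to lowercase
sort | uniq -c |           # count duplicates
sort -k1,1nr -k2           # highest count first, then alphabetical
# "be" and "to" appear twice; "not" and "or" once
```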
[linuxidc@test ~]$ vi test   # create a test file from an arbitrary passage of text
Patent interference cases are historically rare; but they’ve become basically
non-existent since a change in the patent law in 2013. Today, patents are
awarded on a “first to file” basis. However, prior to 2013, patents were granted
on a “first to invent” basis, meaning whoever could prove they invented the idea
first would have rights to the patent. Since Doudna’s and Zhang’s patents were filed
before the switch went into effect, the case falls under the “first to invent” standard.
In the past, patent interference cases like this were concluded within a year,
Sherkow said, but given the value of this patent, it seems more than likely that
the losing party will appeal the decision. That process could stretch out for years.
Test runs:
①. Default output, formatted into columns
[linuxidc@test ~]$ ./wf.sh < test | pr -c4 -t -w80
10 the 3 were 2 interferenc 2 they
5 patent 2 are 2 invent 2 this
5 to 2 basis 2 on 1 and
4 a 2 but 2 s 1 appeal
4 first 2 cases 2 since 1 awarded
3 in 2 could 2 that 1 basically
3 patents
#pr -cn: produce n-column output (can be abbreviated to -n)
#pr -t: omit headers
#pr -wn: limit each line to at most n characters
②. Take the first 12 entries, then format
[linuxidc@test ~]$ ./wf.sh 12 < test | pr -c4 -t -w80
10 the 4 a 3 patents 2 basis
5 patent 4 first 3 were 2 but
5 to 3 in 2 are 2 cases
③. Count how many distinct words appear
[linuxidc@test ~]$ ./wf.sh 9999 < test | wc -l
82
[linuxidc@test ~]$ ./wf.sh 9999 < test | wc -w
164
[linuxidc@test ~]$ ./wf.sh 999 < test | wc -c
1153
# wc -l: count lines; -c: count bytes; -w: count words
④. Show the least common words
[linuxidc@test ~]$ ./wf.sh 999 < test | tail -n -12 | pr -c4 -t -w80
1 today 1 ve 1 will 1 year
1 under 1 went 1 within 1 years
1 value 1 whoever 1 would 1 zhang
⑤. Count the words that appear exactly once in the test file
[linuxidc@test ~]$ ./wf.sh 999 < test | grep -c '^ *1.'
62
# the . after the 1 matches the separator that follows the count; the argument 999
# is arbitrary, any number larger than the document's word count will do
# grep -c: print the number of matching lines per file
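The count-1 filter can be checked on synthetic "uniq -c"-style input (the sample lines below are made up for the demo):

```shell
# Each line mimics "uniq -c" output: a right-justified count, a space, a word.
printf '      2 the\n      1 alpha\n      1 beta\n' |
grep -c '^ *1.'
# → 2
# Caveat: counts like 10 or 12 would also match '^ *1.', since the '.'
# then matches the count's second digit rather than the separator.
```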
⑥. Count the core words that occur frequently (3 times or more)
[linuxidc@test ~]$ ./wf.sh 999 < test | awk ‘$1 >=3’ | wc -l
8
[linuxidc@test ~]$
Permanent link to this article: http://www.linuxidc.com/Linux/2016-04/130078.htm