First, start the three nodes of the Hadoop cluster. master is used to aggregate the logs; slave1 and slave2 collect them.
1. Download the installation package on the master host:
http://archive.apache.org/dist/flume/1.8.0/
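If the master host has direct Internet access, the package can also be downloaded in place instead of being uploaded with rz in step 3 (the file name matches the archive extracted there):
wget http://archive.apache.org/dist/flume/1.8.0/apache-flume-1.8.0-bin.tar.gz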
2. Create the flume directory
mkdir /flume
3. Upload the installation package to the flume directory (rz uploads the file from the local machine over the terminal session)
cd /flume
rz
Extract the archive: tar -xzf apache-flume-1.8.0-bin.tar.gz
4. Configure environment variables in flume-env.sh
cd /flume/apache-flume-1.8.0-bin/conf
Create the configuration file from the template:
cp flume-env.sh.template flume-env.sh
Edit the configuration file:
vi flume-env.sh
Add the following to flume-env.sh (adjust the path to match the JDK installed on your nodes):
export JAVA_HOME=/usr/java/jdk1.8.0_221
5. Configure the Flume environment variables system-wide
vi /etc/profile
Add the following two lines at the end:
export FLUME_HOME=/flume/apache-flume-1.8.0-bin
export PATH=$PATH:$FLUME_HOME/bin
Apply the environment variables immediately:
source /etc/profile
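To confirm the PATH change took effect, print the Flume version; it should report 1.8.0:
flume-ng version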
6. Copy the prepared installation and configuration to the slave1 and slave2 nodes
(1) Copy the installation directory
scp -r /flume/ root@slave1:/
scp -r /flume/ root@slave2:/
(2) Copy the environment variables
scp -r /etc/profile root@slave1:/etc/
scp -r /etc/profile root@slave2:/etc/
(3) On slave1 and slave2, run the source command:
source /etc/profile
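A quick check on each slave confirms that both the copied installation and the environment variables are in place:
echo $FLUME_HOME
flume-ng version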
7. Create the log files on slave1 and slave2 (an echo-based shortcut is shown after this step)
(1) On slave1, run:
mkdir /flume/logs
cd /flume/logs
vi access.log    and add the line: slave1 access.log
vi nginx.log    and add the line: slave1 nginx.log
vi web.log    and add the line: slave1 web.log
(2) On slave2, run:
mkdir /flume/logs
cd /flume/logs
vi access.log    and add the line: slave2 access.log
vi nginx.log    and add the line: slave2 nginx.log
vi web.log    and add the line: slave2 web.log
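As an alternative to editing each file in vi, the same files can be created non-interactively with echo (shown here for slave2; use the slave1 text on slave1):
echo "slave2 access.log" > /flume/logs/access.log
echo "slave2 nginx.log" > /flume/logs/nginx.log
echo "slave2 web.log" > /flume/logs/web.log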
8. Create the log-collection configuration files on master, slave1, and slave2
(1) slave1 configuration file:
cd /flume/apache-flume-1.8.0-bin/conf
vi flume-conf.properties
Add the following content:
# Name the agent components
a1.sources = r1 r2 r3
a1.sinks = k1
a1.channels = c1
# Source 1
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /flume/logs/access.log
a1.sources.r1.channels = c1
# All three sources feed the same channel and are forwarded to master, which otherwise cannot tell which log file an event came from, so a static interceptor tags each event
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = static
a1.sources.r1.interceptors.i1.key = key
a1.sources.r1.interceptors.i1.value = access_log
# Source 2
a1.sources.r2.type = exec
a1.sources.r2.command = tail -F /flume/logs/nginx.log
a1.sources.r2.channels = c1
# Interceptor
a1.sources.r2.interceptors = i2
a1.sources.r2.interceptors.i2.type = static
a1.sources.r2.interceptors.i2.key = key
a1.sources.r2.interceptors.i2.value = nginx_log
# Source 3
a1.sources.r3.type = exec
a1.sources.r3.command = tail -F /flume/logs/web.log
a1.sources.r3.channels = c1
# Interceptor
a1.sources.r3.interceptors = i3
a1.sources.r3.interceptors.i3.type = static
a1.sources.r3.interceptors.i3.key = key
a1.sources.r3.interceptors.i3.value = web_log
#sink
a1.sinks.k1.type = avro
a1.sinks.k1.channel = c1
a1.sinks.k1.hostname = master
a1.sinks.k1.port = 41414
#channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000
a1.channels.c1.transactionCapacity = 1000
a1.channels.c1.byteCapacityBufferPercentage = 20
a1.channels.c1.byteCapacity = 800000
(2) Copy slave1's collection configuration file to slave2
scp -r /flume/apache-flume-1.8.0-bin/conf/flume-conf.properties root@slave2:/flume/apache-flume-1.8.0-bin/conf/
(3) Create the master collection configuration file
cd /flume/apache-flume-1.8.0-bin/conf
vi flume-conf.properties
Add the following content:
# Name the agent components
a1.sources = r1
a1.channels = c1
a1.sinks = k1
# Source
a1.sources.r1.type = avro
a1.sources.r1.channels = c1
a1.sources.r1.bind = 192.168.1.4
a1.sources.r1.port = 41414
# Define the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000
a1.channels.c1.transactionCapacity = 10000
a1.channels.c1.byteCapacityBufferPercentage = 20
a1.channels.c1.byteCapacity = 800000
# Define the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = hdfs://192.168.1.4:9000/flume/logs/%{key}/%y-%m-%d/
a1.sinks.k1.hdfs.filePrefix = events-
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.useLocalTimeStamp = true
a1.sinks.k1.hdfs.roundValue = 10
a1.sinks.k1.hdfs.roundUnit = minute
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.writeFormat = Text
9. Start the Flume agents on master, slave1, and slave2
Run on each of the three nodes:
flume-ng agent --conf /flume/apache-flume-1.8.0-bin/conf --conf-file /flume/apache-flume-1.8.0-bin/conf/flume-conf.properties --name a1 -Dflume.root.logger=INFO,console
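Start the master agent first so that its avro source is already listening on port 41414 when the slaves connect; the command runs in the foreground with console logging, so use a separate terminal session per node. Once all three agents are running, appending to one of the tailed files on a slave generates new events, for example (the appended text is arbitrary):
echo "slave1 access.log new line" >> /flume/logs/access.log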
10. Check the log collection results
Open a new window to 192.168.1.4 (the master node) in CRT
View the file paths generated on HDFS: hadoop fs -lsr /
View file contents: hadoop fs -cat /<file path>
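If collection is working, the listing should contain one directory per interceptor key under the date-based path defined in the master configuration, roughly like the following (the date and the numeric file suffix are illustrative only; files still being written carry a .tmp suffix until they roll):
/flume/logs/access_log/19-10-01/events-.1569912345678
/flume/logs/nginx_log/19-10-01/events-.1569912345679
/flume/logs/web_log/19-10-01/events-.1569912345680
For example: hadoop fs -cat /flume/logs/access_log/19-10-01/events-.1569912345678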

