Hadoop Eco System Installation – Contents

Here is the list of pages that can help you install Hadoop and its ecosystem products:

Distributed HBASE & ZooKeeper Installation and Configuration

Hue Installation and Configuration

Flume Installation and Configuration

Hi,


Here is another exercise from my course.


Apache Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. It has a simple and flexible architecture based on streaming data flows. It is robust and fault tolerant with tunable reliability mechanisms and many failover and recovery mechanisms. It uses a simple extensible data model that allows for online analytic application.

Download and Extract

hadoop@gandhari:/opt/hadoop-2.6.4$ wget https://repository.cloudera.com/artifactory/public/org/apache/flume/flume-ng-dist/1.6.0-cdh5.5.1/flume-ng-dist-1.6.0-cdh5.5.1-bin.tar.gz

hadoop@gandhari:/opt/hadoop-2.6.4$ gunzip flume-ng-dist-1.6.0-cdh5.5.1-bin.tar.gz

hadoop@gandhari:/opt/hadoop-2.6.4$ tar -xvf flume-ng-dist-1.6.0-cdh5.5.1-bin.tar
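As an aside, the gunzip + tar pair above can be collapsed into a single command with GNU tar's `-z` flag. A throwaway demonstration with a dummy archive (so it can be tried anywhere; the real archive name stays as downloaded):

```shell
# Throwaway demo: gunzip + tar -xvf can be a single "tar -xzf"
tmp=$(mktemp -d)
cd "$tmp"
mkdir demo && echo hello > demo/file.txt
tar -czf demo.tar.gz demo    # build a small .tar.gz to play with
rm -r demo
tar -xzf demo.tar.gz         # decompress and extract in one step
cat demo/file.txt
```

For the real archive, the equivalent one-liner is `tar -xzf flume-ng-dist-1.6.0-cdh5.5.1-bin.tar.gz`.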

hadoop@gandhari:~$ ln -s apache-flume-1.6.0-cdh5.5.1-bin/ flume

hadoop@gandhari:~$ vi .bashrc

#FLUME VARIABLES
export FLUME_HOME=/opt/hadoop/flume
export PATH=$PATH:$FLUME_HOME/bin
export FLUME_CONF_DIR=/etc/hadoop/conf
export FLUME_CLASSPATH=/etc/hadoop/conf

hadoop@gandhari:~$ source .bashrc
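After sourcing .bashrc, it is worth confirming the variables took effect before moving on. A quick sanity check (the exports are restated here only so the snippet is self-contained; the paths are the ones set above):

```shell
# Quick sanity check that the Flume variables from .bashrc are in effect.
# Exports restated so the snippet is self-contained.
export FLUME_HOME=/opt/hadoop/flume
export PATH=$PATH:$FLUME_HOME/bin
echo "$FLUME_HOME"            # should print /opt/hadoop/flume
```

Once the PATH is right, `flume-ng version` should also resolve and print the CDH Flume build details.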

Flume Setup

hadoop@gandhari:~$ cd flume

hadoop@gandhari:~/flume$ mkdir logs

hadoop@gandhari:~/flume$ cd conf/

hadoop@gandhari:~/flume/conf$ cp flume-conf.properties.template flume.conf


hadoop@gandhari:~/flume/conf$ vi flume.conf

agent.sources = avroSrc
agent.channels = memoryChannel
agent.sinks = loggerSink hdfs-sink

# For each one of the sources, the type is defined.
# Note: despite the name "avroSrc", this is an exec source that tails a
# file. Properties like port/bind/threads apply only to an avro source,
# so they are omitted here.
agent.sources.avroSrc.type = exec
agent.sources.avroSrc.command = tail -f /opt/hadoop/logs/test.log

# The channel can be defined as follows.
agent.sources.avroSrc.channels = memoryChannel

# Each sink's type must be defined
agent.sinks.loggerSink.type = logger

# Specify the channel the sink should use
agent.sinks.loggerSink.channel = memoryChannel

agent.sinks.hdfs-sink.type = hdfs
agent.sinks.hdfs-sink.channel = memoryChannel
agent.sinks.hdfs-sink.hdfs.path = hdfs://gandhari:9000/test/flume
agent.sinks.hdfs-sink.hdfs.fileType = DataStream
agent.sinks.hdfs-sink.hdfs.writeFormat = Text
# Roll a new file every second (demo setting; produces many small files)
agent.sinks.hdfs-sink.hdfs.rollInterval = 1

# Each channel's type is defined.
agent.channels.memoryChannel.type = memory

# Other config values specific to each type of channel(sink or source)
# can be defined as well
# In this case, it specifies the capacity of the memory channel
agent.channels.memoryChannel.capacity = 100
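One caveat worth noting: a memory channel loses any buffered events if the agent process dies. If durability matters, the same pipeline can run on Flume's file channel instead; a hedged sketch (the checkpoint/data paths below are placeholders, not part of the original setup):

```
# Alternative: durable file channel (placeholder paths)
agent.channels.memoryChannel.type = file
agent.channels.memoryChannel.checkpointDir = /opt/hadoop/flume/channel/checkpoint
agent.channels.memoryChannel.dataDirs = /opt/hadoop/flume/channel/data
```

The trade-off is throughput: the file channel writes every event to disk before acknowledging it.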

Execution

Start the agent. (Do not append the console output to /opt/hadoop/logs/test.log itself: the exec source tails that file, so the logger sink's output would be fed straight back into the pipeline.)

hadoop@gandhari:~/flume/bin$ flume-ng agent --name agent --conf ../conf --conf-file ../conf/flume.conf -Dflume.root.logger=DEBUG,console
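Once the agent is up, appending a few lines to the tailed file is enough to see events flow. A minimal smoke test (a temporary file stands in for /opt/hadoop/logs/test.log so the snippet runs anywhere):

```shell
# Append sample events to the file the exec source tails.
# A temp file stands in for /opt/hadoop/logs/test.log here.
LOG=$(mktemp)
for i in 1 2 3; do
  echo "sample event $i" >> "$LOG"
done
wc -l < "$LOG"              # three lines written
```

With the real path, the events should land in HDFS under /test/flume; `hdfs dfs -ls /test/flume` lists the rolled files.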