I. Required software
1. Ubuntu 14.04
   Three hosts:
   192.168.71.136  cloud01
   192.168.71.135  cloud02
   192.168.71.137  cloud03
2. jdk-7u51-linux-i586.tar.gz
3. hadoop-2.2.0.tar.gz
Baidu Cloud link: pan.baidu.com/s/1pKADKNL
II. Procedure
Standalone setup
1. Change the hostnames. Set the three machines' hostnames to cloud01, cloud02, and cloud03 respectively:
sudo gedit /etc/hostname   (reboot afterwards)
2. Add the host entries:
sudo gedit /etc/hosts
192.168.71.136 cloud01
192.168.71.135 cloud02
192.168.71.137 cloud03
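A quick way to confirm the entries resolve (each ping should report the IP just added):
ping -c 1 cloud01
ping -c 1 cloud02
ping -c 1 cloud03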
3. Install Java (on each machine)
Create a directory and copy the Java tarball into it:
sudo mkdir /usr/java
Unpack it:
sudo tar -zxvf jdk-7u51-linux-i586.tar.gz
Edit the profile:
sudo gedit /etc/profile
Add the following lines:
export JAVA_HOME=/usr/java/jdk1.7.0_51
export CLASSPATH=.:$JAVA_HOME/lib:$JAVA_HOME/lib/tools.jar
export PATH=$JAVA_HOME/bin:$PATH
Apply it:
source /etc/profile
Check that the installation succeeded:
java -version
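Optionally (not part of the original steps), the JDK can also be registered with Ubuntu's update-alternatives so that /usr/bin/java points at it even in shells that never source /etc/profile. A sketch, with an arbitrary priority of 1:
sudo update-alternatives --install /usr/bin/java java /usr/java/jdk1.7.0_51/bin/java 1
sudo update-alternatives --install /usr/bin/javac javac /usr/java/jdk1.7.0_51/bin/javac 1
sudo update-alternatives --config java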
4. Install Hadoop
Copy the tarball to the home directory and unpack it:
sudo tar -zxvf hadoop-2.2.0.tar.gz
After unpacking, grant permissions on the extracted directory: chmod -R 777 hadoop-2.2.0
At this point the standalone installation is complete; verify it.
Run the following inside the hadoop-2.2.0 directory:
./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar pi 10 20
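Another quick standalone check is the grep example from the same jar; input and output here are plain local directories, since HDFS is not running yet:
mkdir input
cp etc/hadoop/*.xml input
./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar grep input output 'dfs[a-z.]+'
cat output/*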
Pseudo-distributed setup (continuing from above)
5. Install SSH
Run: sudo apt-get install ssh
Create a .ssh directory in the home directory: mkdir ~/.ssh
Enter it: cd ~/.ssh
ssh-keygen -t rsa   (press Enter through every prompt)
cat id_rsa.pub >> authorized_keys
sudo service ssh restart
Test it: ssh localhost (it should log in without prompting for a password)
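If the password-less login still prompts for a password, the usual culprit is permissions on the key files; a fix that is commonly needed (paths as above):
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys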
6. Configure Hadoop
First create the following directories in the home directory (a one-line command follows the list):
~/hddata/dfs/name
~/hddata/dfs/data
~/hddata/tmp
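A single mkdir call can create all three at once (home directory of the current user assumed):
mkdir -p ~/hddata/dfs/name ~/hddata/dfs/data ~/hddata/tmp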
Then, inside the hadoop-2.2.0 directory, edit the following configuration files:
gedit etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/java/jdk1.7.0_51
gedit etc/hadoop/core-site.xml
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hduser/hddata/tmp</value>
  </property>
</configuration>
gedit etc/hadoop/hdfs-site.xml
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/hduser/hddata/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/hduser/hddata/dfs/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml
gedit etc/hadoop/mapred-site.xml
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:54311</value>
  </property>
  <property>
    <name>mapred.map.tasks</name>
    <value>10</value>
  </property>
  <property>
    <name>mapred.reduce.tasks</name>
    <value>2</value>
  </property>
</configuration>
Format the NameNode:
./bin/hdfs namenode -format
Start all the daemons:
./sbin/start-all.sh
Check the running Java processes:
jps
3776 ResourceManager
3354 NameNode
3645 SecondaryNameNode
3467 DataNode
3895 NodeManager
4382 Jps
Test it by opening localhost:50070 in a browser.
At this point the pseudo-distributed setup is complete.
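As an extra check that HDFS itself is usable, a small smoke test (the /user/hduser path matches the hduser account used in the config above; any file will do):
./bin/hdfs dfs -mkdir -p /user/hduser
./bin/hdfs dfs -put etc/hadoop/core-site.xml /user/hduser/
./bin/hdfs dfs -ls /user/hduser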
Cluster setup
1. Unpack the cluster configuration files and bring up all three machines in the virtual machine software.
2. Give each machine a static IP address; double-check the gateway and DNS settings.
3. Edit /etc/hosts on each machine:
sudo gedit /etc/hosts
192.168.71.136 cloud01
192.168.71.135 cloud02
192.168.71.137 cloud03
Note: delete the 127.0.1.1 line (the second line at the top of the stock file) on every machine; otherwise the hostname resolves to the loopback address and the Hadoop daemons end up binding to it.
4. Set up a public/private key pair on each machine:
sudo apt-get install ssh
mkdir .ssh
cd .ssh
ssh-keygen -t rsa
cat id_rsa.pub >> authorized_keys
sudo service ssh restart
ssh localhost
If a .ssh directory already exists, delete it first (rm -rf .ssh).
5. Send the master's public key to the other machines and append it to each machine's authorized_keys file (the cloud02 commands are shown here; cloud03 is sketched right after them):
cd .ssh
scp authorized_keys hduser@cloud02:~/.ssh/authorized_keys_from_cloud01
Then, on cloud02 and cloud03 respectively, run:
cd .ssh
cat authorized_keys_from_cloud01 >> authorized_keys
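The same transfer has to happen for cloud03; a sketch of the remaining commands, plus ssh-copy-id as a one-command alternative per target (both assume the hduser account used above):
scp authorized_keys hduser@cloud03:~/.ssh/authorized_keys_from_cloud01    # on cloud01
cat ~/.ssh/authorized_keys_from_cloud01 >> ~/.ssh/authorized_keys         # then on cloud03

ssh-copy-id hduser@cloud02    # alternative, run from cloud01
ssh-copy-id hduser@cloud03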
6. Install the JDK on each machine (same as step 3 of the standalone setup).
7. Install hadoop-2.2.0 on the master (tar -zxvf hadoop-2.2.0.tar.gz).
8. Create the following three directories under the home directory on every machine (the scp commands below copy them from the master; an SSH alternative is sketched after them):
~/hddata/dfs/name
~/hddata/dfs/data
~/hddata/tmp
scp -r ~/hddata hduser@cloud02:~/
scp -r ~/hddata hduser@cloud03:~/
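Since the directories start out empty, an equivalent option is to create them on the workers directly over SSH (hostnames and paths as above):
for h in cloud02 cloud03; do
  ssh hduser@$h 'mkdir -p ~/hddata/dfs/name ~/hddata/dfs/data ~/hddata/tmp'
done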
9. Edit the following seven configuration files on the master:
cd hadoop-2.2.0
(1) gedit etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/java/jdk1.7.0_51
(2) gedit etc/hadoop/yarn-env.sh
export JAVA_HOME=/usr/java/jdk1.7.0_51
(3) gedit etc/hadoop/slaves
cloud01
cloud02
cloud03
(Listing cloud01 here as well means the master also runs a DataNode and a NodeManager.)
(4) gedit etc/hadoop/core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://cloud01:9000</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hduser/hddata/tmp</value>
  </property>
</configuration>
(5) gedit etc/hadoop/hdfs-site.xml
<configuration>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>cloud01:9001</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/hduser/hddata/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/hduser/hddata/dfs/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
</configuration>
(6) cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml
gedit etc/hadoop/mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>cloud01:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>cloud01:19888</value>
  </property>
</configuration>
(7) gedit etc/hadoop/yarn-site.xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>cloud01:8132</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>cloud01:8130</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>cloud01:8131</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>cloud01:8133</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>cloud01:8188</value>
  </property>
</configuration>
(Note that the ResourceManager web UI port is set to 8188 here rather than the default 8088; the URL used after startup below matches this.)
10. Send the hadoop-2.2.0 directory from the master to the other two machines:
scp -r hadoop-2.2.0 hduser@cloud02:~/
scp -r hadoop-2.2.0 hduser@cloud03:~/
11. Format the NameNode:
cd hadoop-2.2.0
./bin/hdfs namenode -format
12. Start Hadoop:
./sbin/start-all.sh
Check the block layout and cluster status:
./bin/hdfs fsck / -files -blocks
./bin/hdfs dfsadmin -report
HDFS web UI:  http://192.168.71.136:50070
YARN ResourceManager web UI:  http://192.168.71.136:8188
Start the job history server:
./sbin/mr-jobhistory-daemon.sh start historyserver
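To confirm the daemons landed where expected, jps can be run on each machine (if jps is not on the non-interactive PATH, log in to the worker and run it there):
jps                # on cloud01: NameNode, SecondaryNameNode, ResourceManager, DataNode, NodeManager, JobHistoryServer
ssh cloud02 jps    # each worker should show DataNode and NodeManager
ssh cloud03 jps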
13. Run the pi example:
./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar pi 10 20
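To confirm the job went through YARN rather than the local runner, it should show up at http://cloud01:8188 while running and, once the history server above is up, at http://cloud01:19888 afterwards. A command-line check (lists applications currently known to the ResourceManager):
./bin/yarn application -list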