Runbook: Hadoop/HDFS Cluster on RAID6 (Single NameNode) – Part 1

Machine inventory

Node-hadoop01 (NameNode)

  • Public LAN: 192.168.1.50/24
  • Private LAN: 172.16.185.50/24

Node-hadoop02 (DataNode01)

  • Public LAN: 192.168.1.51/24
  • Private LAN: 172.16.185.51/24

Node-hadoop03 (DataNode02)

  • Public LAN: 192.168.1.52/24
  • Private LAN: 172.16.185.52/24

Node-hadoop04 (DataNode03)

  • Public LAN: 192.168.1.53/24
  • Private LAN: 172.16.185.53/24

Node-hadoop05 (DataNode04)

  • Public LAN: 192.168.1.54/24
  • Private LAN: 172.16.185.54/24

Node-hadoop06 (DataNode05)

  • Public LAN: 192.168.1.55/24
  • Private LAN: 172.16.185.55/24

Hadoop node prerequisites

System update

[root@node-hadoop0x ~]# dnf update -y

Disable SELinux

[root@node-hadoop0x ~]# setenforce 0
[root@node-hadoop0x ~]# sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config

Set the timezone to Paris

[root@node-hadoop0x ~]# timedatectl set-timezone Europe/Paris

Cluster hosts, private and public LANs (no DNS)

Private LAN hosts (172.16.185.0/24)

[root@node-hadoop0x ~]# echo "# Cluster Prive" >> /etc/hosts
[root@node-hadoop0x ~]# for i in {0..5};do echo "172.16.185.5$i node-hadoop0$((i + 1))" >> /etc/hosts ;done
[root@node-hadoop0x ~]# cat /etc/hosts

Public LAN hosts (192.168.1.0/24)

[root@node-hadoop0x ~]# echo "# Cluster Public" >> /etc/hosts
[root@node-hadoop0x ~]# for i in {0..5};do echo "192.168.1.5$i hadoop0$((i + 1))" >> /etc/hosts ;done
[root@node-hadoop0x ~]# cat /etc/hosts
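
After both loops, /etc/hosts should end with one private and one public entry per node, exactly as generated above:

# Cluster Prive
172.16.185.50 node-hadoop01
172.16.185.51 node-hadoop02
172.16.185.52 node-hadoop03
172.16.185.53 node-hadoop04
172.16.185.54 node-hadoop05
172.16.185.55 node-hadoop06
# Cluster Public
192.168.1.50 hadoop01
192.168.1.51 hadoop02
192.168.1.52 hadoop03
192.168.1.53 hadoop04
192.168.1.54 hadoop05
192.168.1.55 hadoop06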

Stop and disable firewalld (all nodes)

[root@node-hadoop0x ~]# systemctl disable --now firewalld

Create the RAID6 array for HDFS (all nodes)

Disk inventory

[root@node-hadoop0x ~]# lsblk

Install software RAID (mdadm)

[root@node-hadoop0x ~]# dnf -y install mdadm

[root@node-hadoop0x ~]# echo "modprobe raid6" >> /etc/rc.local
[root@node-hadoop0x ~]# chmod +x /etc/rc.local
[root@node-hadoop0x ~]# source /etc/rc.local
[root@node-hadoop01 ~]# cat /proc/mdstat

Prepare the disks

[root@node-hadoop0x ~]# for disk in sdb sdc sdd sde sdf sdg;do parted -s /dev/$disk mklabel msdos ; done
[root@node-hadoop0x ~]# for disk in sdb sdc sdd sde sdf sdg;do parted -s /dev/$disk mkpart primary 1MiB 100%; done
[root@node-hadoop0x ~]# for disk in sdb sdc sdd sde sdf sdg;do parted -s /dev/$disk set 1 raid on; done
[root@node-hadoop0x ~]# fdisk -l /dev/sd[b-g] |grep Linux

[root@node-hadoop0x ~]# mdadm -E /dev/sd[b-g]

Create the RAID6 array

[root@node-hadoop0x ~]# mdadm --create /dev/md0 --level=6 --raid-devices=6 /dev/sd[b-g]1
[root@node-hadoop0x ~]# watch -n1 cat /proc/mdstat

The RAID build begins

[root@node-hadoop0x ~]# mdadm --detail /dev/md0

[root@node-hadoop0x ~]# mkfs.xfs -f /dev/md0
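
Optionally (this step is not part of the original procedure), the array definition can be persisted so it reassembles under the same name at boot; on RHEL-family systems the file is /etc/mdadm.conf:

[root@node-hadoop0x ~]# mdadm --detail --scan >> /etc/mdadm.conf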

Accounts and directory structure (all nodes)

Create the RAID mount point (/dev/md0)

[root@node-hadoop0x ~]# mkdir /hadoop_dir
[root@node-hadoop0x ~]# echo "/dev/md0 /hadoop_dir xfs defaults 0 0" >> /etc/fstab
[root@node-hadoop0x ~]# mount /hadoop_dir
[root@node-hadoop0x ~]# df -Th /hadoop_dir

Create the hadoop user and group

[root@node-hadoop0x ~]# groupadd hadoop
[root@node-hadoop0x ~]# useradd hduser
[root@node-hadoop0x ~]# passwd hduser

[root@node-hadoop0x ~]# usermod -aG hadoop hduser
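
A quick check that the account and its group membership are in place (uid/gid values will vary):

[root@node-hadoop0x ~]# id hduser
uid=1000(hduser) gid=1000(hduser) groups=1000(hduser),1001(hadoop)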

Create the HDFS directory structure on the RAID

For the NameNode

[root@node-hadoop0x ~]# mkdir /hadoop_dir/hdfs
[root@node-hadoop0x ~]# mkdir /hadoop_dir/hdfs/namenode

For the DataNodes

[root@node-hadoop01 ~]# mkdir /hadoop_dir/hdfs/datanode   # optional: only if node01 also acts as a DataNode

[root@node-hadoop02 ~]# mkdir /hadoop_dir/hdfs/datanode
[root@node-hadoop03 ~]# mkdir /hadoop_dir/hdfs/datanode
[root@node-hadoop04 ~]# mkdir /hadoop_dir/hdfs/datanode
[root@node-hadoop05 ~]# mkdir /hadoop_dir/hdfs/datanode
[root@node-hadoop06 ~]# mkdir /hadoop_dir/hdfs/datanode

Adjust ownership

[root@node-hadoop0x ~]# chown hduser:hadoop -R /hadoop_dir

Download Hadoop and Java (all nodes)

JDK package

The Oracle JDK archive used below (jdk-8u202-linux-x64.tar.gz) requires an Oracle account and must be downloaded manually, then placed in /home/hduser.

Hadoop package

[root@node-hadoop0x hduser]# dnf install wget tar sshpass vim -y
[root@node-hadoop0x hduser]# wget https://dlcdn.apache.org/hadoop/common/hadoop-3.3.5/hadoop-3.3.5.tar.gz
[root@node-hadoop0x hduser]# chown hduser:hadoop *
[root@node-hadoop0x hduser]# ls -al |grep gz
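
Optionally, before extracting, the archive can be verified against the SHA-512 checksum Apache publishes next to it (assuming the .sha512 file is still available at the same path; GNU sha512sum understands its BSD-style format):

[root@node-hadoop0x hduser]# wget https://dlcdn.apache.org/hadoop/common/hadoop-3.3.5/hadoop-3.3.5.tar.gz.sha512
[root@node-hadoop0x hduser]# sha512sum -c hadoop-3.3.5.tar.gz.sha512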

Install the packages (all nodes)

JDK package

Install the JDK

[root@node-hadoop0x hduser]# tar -xvzf jdk-8u202-linux-x64.tar.gz -C /opt
[root@node-hadoop0x hduser]# ls /opt/

Add the environment variables

Since the JDK comes from a tarball under /opt (there is no /usr/bin/java symlink), point JAVA_HOME at it directly:

[root@node-hadoop0x hduser]# su - hduser
[hduser@node-hadoop0x ~]$ vi ~/.bashrc
export JAVA_HOME=/opt/jdk1.8.0_202
export PATH=$JAVA_HOME/bin:$PATH
[hduser@node-hadoop0x ~]$ source ~/.bashrc
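
Check that the freshly extracted JDK is the one being picked up (first line of the output shown):

[hduser@node-hadoop0x ~]$ java -version
java version "1.8.0_202"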

Hadoop package

Install Hadoop

[root@node-hadoop0x hduser]# tar -xzvf hadoop-3.3.5.tar.gz -C /opt/
[root@node-hadoop0x hduser]# mv /opt/hadoop-3.3.5/ /opt/hadoop/
[root@node-hadoop0x hduser]# chown -R hduser:hadoop /opt/hadoop/

Add the environment variables

Append the Hadoop variables to the same ~/.bashrc (JAVA_HOME is already set from the previous step):

[root@node-hadoop0x hduser]# su - hduser
[hduser@node-hadoop0x ~]$ vi ~/.bashrc
export HADOOP_HOME=/opt/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export PATH=$PATH:$HADOOP_INSTALL/bin
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
[hduser@node-hadoop0x ~]$ source ~/.bashrc
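
A quick check that the Hadoop binaries are on the PATH (first line of the output shown):

[hduser@node-hadoop0x ~]$ hadoop version
Hadoop 3.3.5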

Add the Hadoop log directory

[hduser@node-hadoop0x ~]$ mkdir /opt/hadoop/logs
[hduser@node-hadoop0x ~]$ chown -R hduser:hadoop /opt/hadoop/logs

Create an SSH key on each node (all nodes)

node-hadoop01

[hduser@node-hadoop01 ~]$ ssh-keygen -t rsa
[hduser@node-hadoop01 ~]$ echo "hduser_password" > /home/hduser/.hduser
[hduser@node-hadoop01 ~]$ chmod 600 /home/hduser/.hduser
[hduser@node-hadoop01 ~]$ for ssh in `cat /etc/hosts |grep node |awk '{print $2}'`;do sshpass -f /home/hduser/.hduser ssh-copy-id -o StrictHostKeyChecking=no hduser@${ssh}; done

node-hadoop02

[hduser@node-hadoop02 ~]$ ssh-keygen -t rsa
[hduser@node-hadoop02 ~]$ echo "hduser_password" > /home/hduser/.hduser
[hduser@node-hadoop02 ~]$ chmod 600 /home/hduser/.hduser
[hduser@node-hadoop02 ~]$ for ssh in `cat /etc/hosts |grep node |awk '{print $2}'`;do sshpass -f /home/hduser/.hduser ssh-copy-id -o StrictHostKeyChecking=no hduser@${ssh}; done

node-hadoop03

[hduser@node-hadoop03 ~]$ ssh-keygen -t rsa
[hduser@node-hadoop03 ~]$ echo "hduser_password" > /home/hduser/.hduser
[hduser@node-hadoop03 ~]$ chmod 600 /home/hduser/.hduser
[hduser@node-hadoop03 ~]$ for ssh in `cat /etc/hosts |grep node |awk '{print $2}'`;do sshpass -f /home/hduser/.hduser ssh-copy-id -o StrictHostKeyChecking=no hduser@${ssh}; done

node-hadoop04

[hduser@node-hadoop04 ~]$ ssh-keygen -t rsa
[hduser@node-hadoop04 ~]$ echo "hduser_password" > /home/hduser/.hduser
[hduser@node-hadoop04 ~]$ chmod 600 /home/hduser/.hduser
[hduser@node-hadoop04 ~]$ for ssh in `cat /etc/hosts |grep node |awk '{print $2}'`;do sshpass -f /home/hduser/.hduser ssh-copy-id -o StrictHostKeyChecking=no hduser@${ssh}; done

node-hadoop05

[hduser@node-hadoop05 ~]$ ssh-keygen -t rsa
[hduser@node-hadoop05 ~]$ echo "hduser_password" > /home/hduser/.hduser
[hduser@node-hadoop05 ~]$ chmod 600 /home/hduser/.hduser
[hduser@node-hadoop05 ~]$ for ssh in `cat /etc/hosts |grep node |awk '{print $2}'`;do sshpass -f /home/hduser/.hduser ssh-copy-id -o StrictHostKeyChecking=no hduser@${ssh}; done

node-hadoop06

[hduser@node-hadoop06 ~]$ ssh-keygen -t rsa
[hduser@node-hadoop06 ~]$ echo "hduser_password" > /home/hduser/.hduser
[hduser@node-hadoop06 ~]$ chmod 600 /home/hduser/.hduser
[hduser@node-hadoop06 ~]$ for ssh in `cat /etc/hosts |grep node |awk '{print $2}'`;do sshpass -f /home/hduser/.hduser ssh-copy-id -o StrictHostKeyChecking=no hduser@${ssh}; done

Check the keys (node01)

[hduser@node-hadoop06 ~]$ ssh hduser@node-hadoop01
[hduser@node-hadoop01 ~]$ cat .ssh/authorized_keys
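
To confirm passwordless SSH works from the master to every node in one pass, a quick non-interactive check (BatchMode makes ssh fail instead of prompting for a password):

[hduser@node-hadoop01 ~]$ for h in node-hadoop0{1..6}; do ssh -o BatchMode=yes hduser@$h hostname; done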

Hadoop configuration (all nodes)

Set JAVA_HOME

mapred-env.sh configuration file (master)

[hduser@node-hadoop01 ~]$ vim /opt/hadoop/etc/hadoop/mapred-env.sh

export JAVA_HOME=/opt/jdk1.8.0_202

Copy mapred-env.sh to the DataNodes

[hduser@node-hadoop01 ~]$ for ssh in `cat /etc/hosts |grep node |grep -v $HOSTNAME|awk '{print $2}'`;do scp /opt/hadoop/etc/hadoop/mapred-env.sh hduser@${ssh}:/opt/hadoop/etc/hadoop/mapred-env.sh ; done

Hadoop site configuration

The core-site.xml file

core-site.xml (master)

[hduser@node-hadoop01 ~]$ vi /opt/hadoop/etc/hadoop/core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://node-hadoop01:50000</value>
  </property>
</configuration>
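
Once the Hadoop environment variables are loaded, the effective setting can be read back with getconf as a quick sanity check:

[hduser@node-hadoop01 ~]$ hdfs getconf -confKey fs.defaultFS
hdfs://node-hadoop01:50000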

Deploy core-site.xml to the DataNodes

[hduser@node-hadoop01 ~]$ for ssh in `cat /etc/hosts |grep node |grep -v $HOSTNAME|awk '{print $2}'`;do scp /opt/hadoop/etc/hadoop/core-site.xml hduser@${ssh}:/opt/hadoop/etc/hadoop/core-site.xml ; done

The yarn-site.xml file

yarn-site.xml (master)

[hduser@node-hadoop01 ~]$ vim /opt/hadoop/etc/hadoop/yarn-site.xml
<configuration>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>node-hadoop01:8025</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>node-hadoop01:8035</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>node-hadoop01:8050</value>
  </property>
</configuration>

Deploy yarn-site.xml to the DataNodes

[hduser@node-hadoop01 ~]$ for ssh in `cat /etc/hosts |grep node |grep -v $HOSTNAME|awk '{print $2}'`;do scp /opt/hadoop/etc/hadoop/yarn-site.xml hduser@${ssh}:/opt/hadoop/etc/hadoop/yarn-site.xml ; done

HDFS configuration (distributed filesystem, block mode)

The hdfs-site.xml file

hdfs-site.xml (master)

[hduser@node-hadoop01 ~]$ vi /opt/hadoop/etc/hadoop/hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>6</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///hadoop_dir/hdfs/datanode</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///hadoop_dir/hdfs/namenode</value>
  </property>
</configuration>
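
With dfs.replication set to 6 and six DataNodes, every block is stored on every node. Once the cluster is up, the configured value can be verified (output is simply the configured value):

[hduser@node-hadoop01 ~]$ hdfs getconf -confKey dfs.replication
6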

Deploy hdfs-site.xml to the DataNodes (slave0x)

[hduser@node-hadoop01 ~]$ for ssh in `cat /etc/hosts |grep node |grep -v $HOSTNAME|awk '{print $2}'`;do scp /opt/hadoop/etc/hadoop/hdfs-site.xml hduser@${ssh}:/opt/hadoop/etc/hadoop/hdfs-site.xml ; done

MapReduce configuration

The mapred-site.xml file

mapred-site.xml (master)

[hduser@node-hadoop01 ~]$ vim /opt/hadoop/etc/hadoop/mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>

Deploy mapred-site.xml to the DataNodes (slave0x)

[hduser@node-hadoop01 ~]$ for ssh in `cat /etc/hosts |grep node |grep -v $HOSTNAME|awk '{print $2}'`;do scp /opt/hadoop/etc/hadoop/mapred-site.xml hduser@${ssh}:/opt/hadoop/etc/hadoop/mapred-site.xml ; done

Configure the "workers" file (called "slaves" before Hadoop 3)

[hduser@node-hadoop01 ~]$ vi /opt/hadoop/etc/hadoop/workers
172.16.185.50 (only if the NameNode is also used as a DataNode)
172.16.185.51
172.16.185.52
172.16.185.53
172.16.185.54
172.16.185.55

Workers file (master)

Private IP addresses of the worker nodes

Workers file (DataNodes)

[hduser@node-hadoop0x ~]$ cat /opt/hadoop/etc/hadoop/workers

[hduser@node-hadoop01 ~]$ for ssh in `cat /etc/hosts |grep node |grep -v $HOSTNAME|awk '{print $2}'`;do ssh hduser@$ssh -t "cat /opt/hadoop/etc/hadoop/workers" ; done

Bring up the Hadoop cluster and HDFS

Format the cluster (master)

[hduser@node-hadoop01 ~]$ hdfs namenode -format

Stop the cluster (master)

[hduser@node-hadoop01 ~]$ /opt/hadoop/sbin/stop-all.sh

Start the cluster (master)

[hduser@node-hadoop01 ~]$ /opt/hadoop/sbin/start-all.sh
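
Note that start-all.sh and stop-all.sh are deprecated wrappers in Hadoop 3; the equivalent is to start HDFS and YARN separately:

[hduser@node-hadoop01 ~]$ /opt/hadoop/sbin/start-dfs.sh
[hduser@node-hadoop01 ~]$ /opt/hadoop/sbin/start-yarn.sh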

Cluster service status

Master status

[hduser@node-hadoop01 ~]$ jps
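
On the master, jps should list at least the NameNode, SecondaryNameNode and ResourceManager daemons (plus DataNode and NodeManager if node01 is also used as a worker); the PIDs below are illustrative:

12101 NameNode
12345 SecondaryNameNode
12567 ResourceManager
12890 Jps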

Slave status
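
The same check can be run on every DataNode from the master, reusing the loop pattern used throughout this runbook; each worker should report DataNode and NodeManager:

[hduser@node-hadoop01 ~]$ for ssh in `cat /etc/hosts |grep node |grep -v $HOSTNAME|awk '{print $2}'`;do ssh hduser@$ssh -t "jps" ; done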

HDFS capacity

On the master

[hduser@node-hadoop01 ~]$ hdfs dfs -df -h /

On the DataNodes

[hduser@node-hadoop01 ~]$ for ssh in `cat /etc/hosts |grep node |grep -v $HOSTNAME|awk '{print $2}'`;do ssh hduser@$ssh -t "hdfs dfs -df -h /" ; done

So we do get the aggregation of the six 200 GB RAID6 nodes into HDFS: 6 × 200 GB = 1.2 TB of raw capacity. (Keep in mind that with dfs.replication = 6, every block is written to all six nodes, so the usable capacity is about 1.2 TB / 6 ≈ 200 GB.)

HDFS status

[hduser@node-hadoop01 sbin]$ hdfs dfsadmin -report

Web UI view (ResourceManager)

  • http://Master_Public_IP:8088


Cluster "Nodes"

Hadoop Web UI view (NameNode)

  • http://Master_Public_IP:9870

Overview


"Summary"


"FS size"

View of the HDFS nodes (DataNodes)

HDFS replication test

On node-hadoop01 – create a directory

[hduser@node-hadoop01 sbin]$ hdfs dfs -ls /
[hduser@node-hadoop01 sbin]$ hdfs dfs -mkdir /spongeBob
[hduser@node-hadoop01 sbin]$ hdfs dfs -ls /

On node-hadoop02 through 06

[hduser@node-hadoop01 ~]$ for ssh in `cat /etc/hosts |grep node |grep -v $HOSTNAME|awk '{print $2}'`;do ssh hduser@$ssh -t "hdfs dfs -ls /" ; done

On node-hadoop01 – copy a file

[hduser@node-hadoop01 ~]$ hdfs dfs -appendToFile /etc/hosts /spongeBob/hosts
[hduser@node-hadoop01 ~]$ hdfs dfs -ls /spongeBob
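
To confirm the replication factor actually applied to the file, hdfs fsck can be pointed at it; with dfs.replication = 6 each block should report six replicas:

[hduser@node-hadoop01 ~]$ hdfs fsck /spongeBob/hosts -files -blocks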

Replication view (directory/files)
