Installation for:

1. Oracle VirtualBox (optional)

  1. Visit the link below and choose the build for your OS: https://www.virtualbox.org/wiki/Downloads

  2. Install it, following the instructions for your OS. If your host runs Ubuntu, I recommend downloading and installing from the Ubuntu Software Center.

  3. Download the Ubuntu 18.04 desktop image from https://releases.ubuntu.com/18.04/

  4. Create a new VM in Oracle VirtualBox and install Ubuntu 18.04 on it, following the installer's instructions.

  5. (Optional) After the installation, clone the newly installed VM as a backup in case something goes wrong.
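
    If you prefer the command line over the VirtualBox GUI, the bundled VBoxManage tool can create and clone the VM for you. The sketch below is only an illustration: the VM name, memory size, and CPU count are placeholders I chose, and attaching a disk and the installer ISO is omitted.

    # create and register a 64-bit Ubuntu VM (name and sizes are placeholders)
    VBoxManage createvm --name "hadoop-ubuntu18" --ostype Ubuntu_64 --register
    VBoxManage modifyvm "hadoop-ubuntu18" --memory 4096 --cpus 2

    # after installing the OS, clone the VM as a backup
    VBoxManage clonevm "hadoop-ubuntu18" --name "hadoop-ubuntu18-backup" --register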


2. Install Java, Hadoop, Kafka, Spark

  1. Update your packages:

    sudo apt-get update
    
  2. Install Git:

    sudo apt-get install git -y
    
  3. Let's clone the repository onto the desktop:

    cd ~/Desktop
    sudo git clone https://github.com/dseneh-eit/hadoop
  4. cd into the cloned repository and run the install script:

    cd hadoop/
    sudo bash install.sh

    Wait for the installation to complete; it can take a while.

  5. Test your installation:

    jps

    If the output includes Jps, then congratulations! If not, source your .bash_profile and .bashrc files and try again:

    source ~/.bash_profile
    source ~/.bashrc
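
    As an extra sanity check, you can ask each tool for its version. This assumes install.sh placed Java, Hadoop, Spark, and Kafka on your PATH, which may differ on your setup:

    java -version
    hadoop version
    spark-submit --version
    which kafka-topics.sh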

3. Install Hive

  1. In your terminal, paste the below code:

    cd ~/opt
    sudo wget http://archive.apache.org/dist/hive/hive-2.3.5/apache-hive-2.3.5-bin.tar.gz
  2. Extract the downloaded archive and rename the folder:

    tar -xvf apache-hive-2.3.5-bin.tar.gz
    sudo mv apache-hive-2.3.5-bin hive
  3. Let's open and edit the .bash_profile file:

    sudo gedit ~/.bash_profile
    
  4. In your .bash_profile file, paste the following:

    #HIVE_HOME
    export HIVE_HOME=~/opt/hive
    export PATH=$PATH:$HIVE_HOME/bin
  5. Source your .bash_profile file:

    source ~/.bash_profile
  6. Give it a quick test with:

    hive --version

    You should get the Hive version back.

  7. Next, we need to create some directories in HDFS. But before that, let's start our Hadoop cluster. If yours is already running, skip this step:

    start-all.sh

    To verify that your cluster is running, run the following command:

    jps

    If all goes well, you should see the below (the order doesn't matter):

    NameNode
    DataNode
    ResourceManager
    Jps
    NodeManager
    SecondaryNameNode

    If you didn't get them all, then please check your configurations.

  8. Create directories and add permissions in HDFS:

    hadoop fs -mkdir -p /user/hive/warehouse
    hadoop fs -chmod g+w /user/hive/warehouse
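
    To confirm that the warehouse directory exists and is group-writable, you can list it:

    hadoop fs -ls /user/hive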
  9. cd into the Hive config folder and create/edit hive-env.sh:

    cd ~/opt/hive/conf
    sudo gedit hive-env.sh
  10. In the hive-env.sh file, find and uncomment the following variables, then set their values to look like this:

    export HADOOP_HOME=~/opt/hadoop-2.7.3
    export HADOOP_HEAPSIZE=512
    export HIVE_CONF_DIR=~/opt/hive/conf
  11. While still in ~/opt/hive/conf, create/edit hive-site.xml:

    sudo gedit hive-site.xml

    Paste the below and save:

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
        <property>
            <name>javax.jdo.option.ConnectionURL</name>
            <value>jdbc:derby:;databaseName=~/opt/hive/metastore_db;create=true</value>
            <description>JDBC connect string for a JDBC metastore.</description>
        </property>	
        <property>
            <name>hive.metastore.warehouse.dir</name>
            <value>/user/hive/warehouse</value>
            <description>location of default database for the warehouse</description>
        </property>
        <property>
            <name>hive.metastore.uris</name>
            <value>thrift://localhost:9083</value>
            <description>Thrift URI for the remote metastore.</description>
        </property>
        <property>
            <name>javax.jdo.option.ConnectionDriverName</name>
            <value>org.apache.derby.jdbc.EmbeddedDriver</value>
            <description>Driver class name for a JDBC metastore</description>
        </property>
        <property>
            <name>javax.jdo.PersistenceManagerFactoryClass</name>
            <value>org.datanucleus.api.jdo.JDOPersistenceManagerFactory</value>
            <description>class implementing the jdo persistence</description>
        </property>
        <property>
            <name>hive.server2.enable.doAs</name>
            <value>false</value>
        </property>
    </configuration>
  12. (Optional) Since several of the installed components bundle their own SLF4J binding, you may see a warning about multiple SLF4J bindings when starting Hive. From your Hive home you can simply rename Hive's copy:

    cd ~/opt/hive
    sudo mv lib/log4j-slf4j-impl-2.6.2.jar lib/log4j-slf4j-impl-2.6.2.jar.bak
  13. Now we need to initialize the metastore schema for Hive using schematool:

    schematool -initSchema -dbType derby
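
    If you want to double-check the result, schematool can also print the schema version it just created; this is only an optional sanity check:

    schematool -info -dbType derby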
  14. We are now ready to enter the Hive shell and create the database for holding tweets. First, we need to start the Hive Metastore server with the following command:

    hive --service metastore

    This should give some output indicating that the metastore server is running. You'll need to keep it running, so open a new terminal tab to continue with the next steps.

  15. Now, leave the hive service running and open a new tab, start the Hive shell with the hive command:

    hive
    
  16. If you are able to get to this point: CONGRATULATIONS!
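
    As a quick smoke test you could create the tweets database now, either at the hive> prompt or from a regular terminal with hive -e as shown below. The database name and table layout are placeholders of my own; this guide doesn't prescribe a schema:

    hive -e "CREATE DATABASE IF NOT EXISTS tweets;"
    hive -e "CREATE TABLE IF NOT EXISTS tweets.raw_tweets (id BIGINT, user_name STRING, text STRING) STORED AS TEXTFILE;"
    hive -e "SHOW TABLES IN tweets;"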


4. Install MySQL

  1. First, let's update our packages:

    sudo apt-get update

  2. Next, install the MySQL server:

    sudo apt-get install mysql-server

    If it prompts you to set a password, enter root.

  3. Log in to MySQL and check the available default databases:

    sudo mysql -u root -p
    show databases;

    Enter your password when prompted.

  4. (Optional) Set the root user's password to 'root':

    ALTER USER 'root'@'localhost' IDENTIFIED WITH mysql_native_password BY 'root';

  5. Install the MySQL JDBC connector:

    sudo apt-get install libmysql-java
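
    This guide doesn't tie MySQL to a particular database yet, but as a quick smoke test you could create a scratch database and a dedicated user. The database name, user name, and password below are placeholders of my own:

    sudo mysql -u root -p -e "CREATE DATABASE IF NOT EXISTS test_db;"
    sudo mysql -u root -p -e "CREATE USER IF NOT EXISTS 'hadoopuser'@'localhost' IDENTIFIED BY 'hadooppass';"
    sudo mysql -u root -p -e "GRANT ALL PRIVILEGES ON test_db.* TO 'hadoopuser'@'localhost'; FLUSH PRIVILEGES;"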

5. Install HBase

  1. Let's cd into our opt folder and download hbase:

     cd ~/opt
     sudo wget http://archive.apache.org/dist/hbase/1.1.4/hbase-1.1.4-bin.tar.gz
  2. Extract the .tar.gz file:

    tar -xvf hbase-1.1.4-bin.tar.gz
  3. In your .bash_profile file, paste the following:

     #HBASE_HOME
     export HBASE_HOME=~/opt/hbase-1.1.4
     export PATH=$PATH:$HBASE_HOME/bin
  4. Source your .bash_profile file:

    source ~/.bash_profile
  5. cd into the hbase conf folder and edit the hbase-env.sh file:

     cd ~/opt/hbase-1.1.4/conf/
     sudo gedit hbase-env.sh
  6. In the hbase-env.sh file, find and uncomment the JAVA_HOME variable and set it to:

     export JAVA_HOME=~/opt/jdk1.8.0_221

    Also find and uncomment the following, then save and close the file:

     export HBASE_REGIONSERVERS=${HBASE_HOME}/conf/regionservers
     export HBASE_MANAGES_ZK=true
  7. While still in the hbase conf directory, also open and edit the hbase-site.xml file:

    sudo gedit hbase-site.xml
  8. Paste the below between the <configuration> tags:

     <property>
         <name>hbase.rootdir</name>
         <value>hdfs://localhost:9000/hbase</value>
     </property>
     <property>
         <name>hbase.cluster.distributed</name>
         <value>true</value>
     </property>
     <property>
         <name>hbase.zookeeper.quorum</name>
         <value>localhost</value>
     </property>
     <property>
         <name>dfs.replication</name>
         <value>1</value>
     </property>
     <property>
         <name>hbase.zookeeper.property.clientPort</name>
         <value>2181</value>
     </property>
     <property>
         <name>hbase.zookeeper.property.dataDir</name>
         <value>~/opt/hbase-1.1.4/zookeeper</value>
     </property>
  9. Start the HBase daemons:

    start-hbase.sh

    To ensure everything is working, run the jps command; in addition to the Hadoop daemons, you should now also see the following. If you don't, please check your configurations:

    HQuorumPeer
    HMaster
    HRegionServer

  10. To log in to the HBase shell:

    hbase shell
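
    Once the daemons are up, a quick way to confirm HBase is healthy is to create a small table, write a cell, and scan it back. You can type the commands interactively in the shell, or script them from a regular terminal as in the sketch below; the table and column family names are placeholders:

    printf "%s\n" \
      "create 'smoke_test', 'cf'" \
      "put 'smoke_test', 'row1', 'cf:msg', 'hello'" \
      "scan 'smoke_test'" | hbase shell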

6. Install Airflow

  1. Let's first install pip for Python 3:

    sudo apt-get install python3-pip python-dev
  2. Verify the installation:

    pip3 --version
  3. Let's create an airflow directory, and inside it a dags directory; this is where we'll store our Python DAG files:

     mkdir ~/airflow
     cd ~/airflow
     mkdir dags
  4. (Optional) Uninstall any old apache-airflow installations using pip:

    sudo pip3 uninstall apache-airflow
  5. Install apache-airflow using pip:

    sudo pip3 install apache-airflow
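
    Note that the Airflow documentation recommends installing with a constraints file so that dependency versions stay compatible. If the plain install above runs into dependency errors, something along these lines may help; AIRFLOW_VERSION below is a placeholder you must replace with the release you want:

    AIRFLOW_VERSION=2.x.y   # placeholder: pick a real release
    PYTHON_VERSION="$(python3 -c 'import sys; print("{}.{}".format(*sys.version_info[:2]))')"
    sudo pip3 install "apache-airflow==${AIRFLOW_VERSION}" \
      --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"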
  6. Initialize the Airflow metadata database (SQLite by default):

    airflow db init
  7. Create an admin user (you will be prompted to set a password):

     airflow users create \
     --username admin \
     --firstname [YOUR_FIRST_NAME] \
     --lastname [YOUR_LAST_NAME] \
     --role Admin \
     --email spiderman@superhero.org
  8. Open another terminal, start the web server and let it run:

    airflow webserver --port 8080
  9. Open another terminal, start the scheduler and let it run:

    airflow scheduler
  10. Visit localhost:8080 in the browser to access the GUI

  11. Log in with the username and password you created in step 7
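
    Once logged in, you can also confirm the installation from the command line; both commands below should run without errors (the DAG list will stay empty until you add files under ~/airflow/dags):

    airflow version
    airflow dags list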
