Monday, 7 April 2014

PUPPET INSTALLATION AND CONFIGURATIONS



Step 1 :-

Install the EPEL and Remi repositories on CentOS 6.x on each system


wget http://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm
wget http://rpms.famillecollet.com/enterprise/remi-release-6.rpm
sudo rpm -Uvh remi-release-6*.rpm epel-release-6*.rpm
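To confirm both repositories registered correctly, a quick check (note the Remi repo ships disabled by default, so list all repos, not just enabled ones):

```shell
# Both epel and remi should show up in the repo list after the rpm -Uvh above
yum repolist all | grep -Ei 'epel|remi'
```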

Step 2 :- 
Install Puppet on each system 
On Master server :- 
[root@PUPPET-MASTER ] # yum install puppet-server
[root@PUPPET-MASTER ] # /etc/init.d/puppetmaster start
On all client agent systems :- 
[root@PUPPET-CLIENT1 ~] # yum install puppet
[root@PUPPET-CLIENT1 ~] # /etc/init.d/puppet restart

Step 3 :-

Configuration


On all client agent systems :-

[root@PUPPET-CLIENT1 ~] # vim /etc/puppet/puppet.conf

Add the below parameters into the [agent] section

server = PUPPET-MASTER
runinterval = 120


[root@PUPPET-CLIENT1 ~] # /etc/init.d/puppet restart
Step 4 :-
On Master server :-
List all certificates from the clients connected to the server.
[root@PUPPET-MASTER ] # puppetca -l
  "PUPPET-CLIENT1" (E3:2B:85:FD:56:2E:34:A3:E0:FF:7A:33:3A:36:33:8C)
Sign each certificate using the below command 
[root@PUPPET-MASTER ] # puppetca -s PUPPET-CLIENT1
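Note: on newer Puppet 3.x releases the `puppetca` command is deprecated in favour of the `puppet cert` subcommand; the equivalents would be:

```shell
# List pending certificate requests on the master
puppet cert list
# Sign a specific client's certificate
puppet cert sign PUPPET-CLIENT1
```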

Step 5 :-
On all client agent system
[root@PUPPET-CLIENT1 ~] # puppet agent --test
info: Caching catalog for ug-th-0215-nn
notice: Finished catalog run in 0.10 seconds
[root@PUPPET-CLIENT1 ~] #

The above output confirms the agent connected to the server successfully with a signed certificate.
Step 6 :-
Push customized conf files from the master to the clients.
[root@PUPPET-MASTER ] # vim  /etc/puppet/manifests/site.pp
Add below entry into site.pp
import 'nodes.pp'
Step 7 :-
Create nodes.pp file 
[root@PUPPET-MASTER ] # vim  /etc/puppet/manifests/nodes.pp 
node 'PUPPET-CLIENT1' {
include nginx
}

node 'PUPPET-CLIENT2' {
include nginx
}
Step 8 :-

[root@PUPPET-MASTER ] # mkdir -p /etc/puppet/modules/nginx/{manifests,files}

Let's create the nginx class file, which lives in /etc/puppet/modules/nginx/manifests/init.pp

[root@PUPPET-MASTER ] # vim /etc/puppet/modules/nginx/manifests/init.pp

# Manage nginx webserver
class nginx {
  package { 'nginx':
    ensure => installed,
  }

  service { 'nginx':
    ensure  => running,
    require => Package['nginx'],
  }

  file { 'nginxconfig':
    name   => '/etc/nginx/nginx.conf',
    source => 'puppet:///modules/nginx/nginx.conf',
    notify => Service['nginx'],
  }
}
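Before relying on the agents to pull this down, the class can be syntax-checked on the master using the puppet CLI installed above:

```shell
# Validate manifest syntax without applying anything;
# exit status 0 means the manifest parsed cleanly
puppet parser validate /etc/puppet/modules/nginx/manifests/init.pp
```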


Add your customized nginx.conf into /etc/puppet/modules/nginx/files/

Step 9 :-

[root@PUPPET-MASTER ] # puppet apply /etc/puppet/manifests/site.pp

Step 10 :-

Log in to a client system and check /etc/nginx/nginx.conf; you will see your customized file. With runinterval = 120, it can take up to 2 minutes for the client to update. 


Step 11(Optional) :-

Hadoop Manifests

# Make sure /etc/puppet/modules/ is owned by puppet:puppet, and unzip your Hadoop archive into /etc/puppet/modules/hadoop/files/
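The comment above can be carried out roughly like this (the archive name hadoop-1.2.1.tar.gz is only an example; substitute your own file):

```shell
# Create the module layout and fix ownership so the master can serve the files
mkdir -p /etc/puppet/modules/hadoop/{manifests,files}
chown -R puppet:puppet /etc/puppet/modules
# Unpack your Hadoop archive into the module's files directory
tar -xzf hadoop-1.2.1.tar.gz -C /etc/puppet/modules/hadoop/files/
```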


class hadoop {

  group { 'hadoop':
    ensure => present,
    gid    => 1000,
  }

  user { 'hadoop':
    ensure     => present,
    shell      => '/bin/bash',
    managehome => true,
    home       => '/home/hadoop',
    password   => '$1$jE8T0sFs$PMKB4bfP21IRqzZ14mwTR/',
    require    => Group['hadoop'],
  }

  file { 'Hadoophome':
    name    => '/home/hadoop',
    ensure  => directory,
    owner   => 'hadoop',
    group   => 'hadoop',
    mode    => '0700',
    require => User['hadoop'],
  }

  file { 'hadoopconf':
    name    => '/usr/local/hadoop',
    ensure  => directory,
    recurse => true,
    owner   => 'hadoop',
    group   => 'hadoop',
    mode    => '0755',
    source  => 'puppet:///modules/hadoop/hadoop/',
  }
}

Thursday, 13 March 2014

STANDALONE HBASE INSTALLATION CENTOS 6.X




In standalone mode, HBase does not use HDFS -- it uses the local filesystem instead, and it runs all HBase daemons and a local ZooKeeper in the same JVM. ZooKeeper binds to a well-known port so clients can talk to HBase.

Step 1: -

Configure Cloudera repo

[nitin@nitin-ubuntu ~]# cat /etc/yum.repos.d/cdh.repo
[cloudera-cdh4]
name = Cloudera CDH, Version 4.4.0
baseurl = http://archive.cloudera.com/cdh4/redhat/5/x86_64/cdh/4.4.0/
gpgkey = http://archive.cloudera.com/redhat/cdh/RPM-GPG-KEY-cloudera
gpgcheck = 1
[nitin@nitin-ubuntu ~]#

Step 2: -

Install Hbase-master

[nitin@nitin-ubuntu ~]# yum clean all
[nitin@nitin-ubuntu ~]# yum install hbase-master

Step 3 :- 

Add the following lines into hbase-site.xml

[nitin@nitin-ubuntu ~]# cat /etc/hbase/conf/hbase-site.xml
<configuration>
<property>
<name>hbase.rootdir</name>
<value>file:///BIG_DATA/hbase</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/BIG_DATA/zookeeper</value>
</property>
</configuration>



Step 4 :-

Edit datadir for zookeeper

[nitin@nitin-ubuntu ~]# cat /etc/zookeeper/conf/zoo.cfg
dataDir=/BIG_DATA/zookeeper/


Step 5:- 

Create level one directory and change permission

[nitin@nitin-ubuntu ~]# mkdir /BIG_DATA/
[nitin@nitin-ubuntu ~]# chown -R hbase:hbase /BIG_DATA/

Step 6:-
Configure /etc/hosts on the standalone HBase system as well as on any client connecting to it.

[nitin@nitin-ubuntu ~]# cat /etc/hosts
10.10.10.110 nitin-ubuntu


[nitin@nitin-CLIENT1 ~]# cat /etc/hosts
10.10.10.110 nitin-ubuntu

Step 7 :-
Restart Hbase

[nitin@nitin-ubuntu ~]# /etc/init.d/hbase-master restart
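Once the master restarts cleanly, a quick smoke test from the HBase shell (the table and column family names below are just examples):

```shell
# Create a test table, write one cell, and read it back
echo "create 'test', 'cf'
put 'test', 'row1', 'cf:a', 'value1'
scan 'test'" | hbase shell
```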

Monday, 17 February 2014

HDFS TO AWS S3





Step 1 :-

Login to Cloudera manager

Go to services and Click on hdfs

Go to configuration and click on view and edit roles

Click on service-wide configuration

Click on Advanced

Step 2 :-

Add the below details in the Cluster-wide Configuration Safety Valve for core-site.xml

<property>
<name>fs.s3n.awsAccessKeyId</name>
<value>XXXXXXXXXXXXXXXXXXX</value>
</property>

<property>
<name>fs.s3n.awsSecretAccessKey</name>
<value>XXXXXXXXXXXXXXXXXXXXXXX</value>
</property>


Step 3 :-

Save configuration. 

Click on Action and Deploy Configuration 

Restart hdfs once 


Step 4:-

[nitin@nitin-ubuntu ~]# sudo -u hdfs hadoop distcp s3n://big-store/getting_price.sh hdfs://<CLUSTER1>:8020/user/nitin/

[nitin@nitin-ubuntu ~]# sudo -u hdfs hadoop distcp s3n://big-store/getting_price.sh hdfs://10.10.10.216:8020/user/nitin/
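The commands above copy from S3 into HDFS; the title direction (HDFS to S3) uses the same tool with the arguments reversed (the bucket path below is an example):

```shell
# Copy a file from HDFS out to an S3 bucket
sudo -u hdfs hadoop distcp hdfs://10.10.10.216:8020/user/nitin/getting_price.sh s3n://big-store/backup/
```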

Friday, 7 February 2014

BACKUP HIVE META-STORE WITH POSTGRESQL IN CDH4.X




1) Go to the SCM server database directory

[nitin@nitin-ubuntu:~] # cd /var/lib/cloudera-scm-server-db/data

2) Check the file generated_password.txt. This file is created by Cloudera Manager.


[nitin@nitin-ubuntu:~] # cat generated_password.txt

8UlBunj0MM

The password above was generated by /usr/share/cmf/bin/initialize_embedded_db.sh (part of the cloudera-manager-server-db package)
and is the password for the user 'cloudera-scm' for the database in the current directory.

Generated at 20140128-230553.


3) Log in to PostgreSQL with the password from the above file, i.e. 8UlBunj0MM

[nitin@nitin-ubuntu:~] # psql --user cloudera-scm --port=7432 --dbname=postgres
Password for user cloudera-scm:************
postgres=# \q

Note :- Make sure you give exact username and password

4) Take a dump of the hive database

[nitin@nitin-ubuntu:~] # pg_dump hive -U cloudera-scm --port=7432 > hive.sql
Password:********
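To restore this dump later (assuming the same embedded PostgreSQL instance and an empty hive database), standard psql input redirection works:

```shell
# Restore the hive metastore from the dump
psql hive -U cloudera-scm --port=7432 < hive.sql
```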

5) It's done.

Tuesday, 28 January 2014

ACCESS HBASE TABLE WITH TABLEAU DESKTOP 8.0




This assumes you already have Tableau installed on your system.

Concept :-

You can't connect Tableau directly to an HBase table; you need to connect to a Hive table that is internally mapped to the HBase table.

Please check below link for more explanation :

http://nosql.mypopescu.com/post/17262685876/visualizing-hadoop-data-with-tableau-software-and

Step 1 :- 


Download Tableau driver for hive


Step 2 :- (Driver installation)

Install Above downloaded driver.

Step 3 :- (Configure ODBC driver)

Click on start go to Data Source (ODBC).

Click on System DSN.

Select Cloudera ODBC driver for Apache Hive.

Fill the details.

Save Setting.

Step 4 :- (Run Hive as Thrift service)

[nitin@nitin-ubuntu ~]$ sudo hive --service hiveserver --config /etc/hive/conf

Make sure hive.aux.jars.path is set in the above hive-site.xml and that all the jars are present.

The below jars are needed by the Hive client to talk to HBase and fetch data from it.

For example :- 

<property>
    <name>hive.aux.jars.path</name>
    <value>file:///usr/lib/hive/lib/hive-hbase-handler-0.10.0-cdh4.4.0.jar,file:///usr/lib/hbase/lib/hbase-0.94.6-cdh4.4.0.jar,file:///usr/lib/zookeeper/zookeeper-3.4.5-cdh4.4.0.jar,file:///usr/share/cmf/lib/guava-14.0.jar
   </value>
</property>
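For reference, the Hive-to-HBase mapping itself is created with the HBase storage handler; a sketch (the table, column family, and column names here are hypothetical):

```shell
# Create a Hive external table backed by an existing HBase table 'prices'
hive -e "CREATE EXTERNAL TABLE hbase_prices(key string, price float)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,cf:price')
TBLPROPERTIES ('hbase.table.name' = 'prices');"
```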


Step 5 :- (Connect Tableau to Hive tables)

Select tableau from start menu.

Go to Data, click on Connect to Data, then click on the Cloudera database connector.

It will ask you to make connections.

Give your hive thrift  server IP and port as 10000.

Click on connect.

If it's connected properly, then you will see "default" in the schema section.

Select table where you want to make computation.

Click OK.







