Wednesday, 23 December 2015

Configure HUE with OpenLDAP



Step 1:

Add the lines below in the [[ldap]] section of hue.ini:

[[ldap]]
    base_dn="dc=tuxhub,dc=com"
    ldap_url=ldap://adp034.tuxhub.com:389
    use_start_tls=false
    bind_dn="uid=nitin,ou=Users,dc=tuxhub,dc=com"
    bind_password=nitin
    ldap_username_pattern="uid=<username>,ou=Users,dc=tuxhub,dc=com"
    search_bind_authentication=false
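
Before restarting HUE, it is worth confirming that the bind DN and password actually work against the LDAP server. A quick check with ldapsearch (assuming the openldap-clients package is available on the node):

    ldapsearch -x -H ldap://adp034.tuxhub.com:389 \
        -D "uid=nitin,ou=Users,dc=tuxhub,dc=com" -w nitin \
        -b "dc=tuxhub,dc=com" "uid=nitin" dn

If the bind succeeds the search returns the user's DN; an "Invalid credentials (49)" error means the bind_dn/bind_password pair in hue.ini is wrong.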





Thursday, 17 December 2015

Impala OpenLDAP configuration



Step 1:

Add the below flags to IMPALA_SERVER_ARGS in Impala's env.sh:


[root@mfs022 ~]# vim /opt/mapr/impala/impala-1.4.1/conf/env.sh
 
   IMPALA_SERVER_ARGS=" \
    -enable_ldap_auth \
    -ldap_uri ldap://adp034.tuxhub.com:389 \
    -ldap_bind_pattern uid=#UID,ou=Users,dc=tuxhub,dc=com"
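
After restarting the Impala server, the flags can be confirmed from the impalad debug web UI, which lists the active startup flags on its /varz page (the default web UI port 25000 is assumed here):

    curl -s http://mfs022.tuxhub.com:25000/varz | grep -i ldap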


Step 2:

Connect to the Impala shell:

[root@mfs022 ~]# impala-shell -l -u nitin
Starting Impala Shell using LDAP-based authentication
LDAP password for nitin:
Connected to mfs022.tuxhub.com:21000
Server version: impalad version 1.4.1 RELEASE (build 2b626c8e9f4c666d23872c228cf43daae4c9acbb)
Welcome to the Impala shell. Press TAB twice to see a list of available commands.

Copyright (c) 2012 Cloudera, Inc. All rights reserved.

(Shell build version: Impala Shell v1.4.1 (2b626c8) built on Thu Feb  5 14:53:44 PST 2015)
[mfs022.tuxhub.com:21000] >

Saturday, 5 December 2015

Impala through a Proxy for High Availability



Step 1
       
       Installing HAProxy

       [root@mfs021 ~]# yum install haproxy

Step 2

      Edit the HAProxy config file:

      [root@mfs021 ~]# cat /etc/haproxy/haproxy.cfg

#---------------------------------------------------------------------
# Example configuration for a possible web application.  See the
# full configuration options online.
#
#   http://haproxy.1wt.eu/download/1.4/doc/configuration.txt
#
#---------------------------------------------------------------------

#---------------------------------------------------------------------
# Global settings
#---------------------------------------------------------------------
global
    # to have these messages end up in /var/log/haproxy.log you will
    # need to:
    #
    # 1) configure syslog to accept network log events.  This is done
    #    by adding the '-r' option to the SYSLOGD_OPTIONS in
    #    /etc/sysconfig/syslog
    #
    # 2) configure local2 events to go to the /var/log/haproxy.log
    #   file. A line like the following can be added to
    #   /etc/sysconfig/syslog
    #
    #    local2.*                       /var/log/haproxy.log
    #
    log         127.0.0.1 local2

    chroot      /var/lib/haproxy
    pidfile     /var/run/haproxy.pid
    maxconn     4000
    user        haproxy
    group       haproxy
    daemon

    # turn on stats unix socket
    stats socket /var/lib/haproxy/stats

#---------------------------------------------------------------------
# common defaults that all the 'listen' and 'backend' sections will
# use if not designated in their block
#---------------------------------------------------------------------

defaults
    mode                    tcp
    log                     global
    retries                 3
    maxconn                 3000
    timeout connect 5000ms
    timeout client 50000ms
    timeout server 50000ms

#---------------------------------------------------------------------
# main frontend which proxies to the backends
#---------------------------------------------------------------------
frontend  main *:5000
    acl url_static       path_beg       -i /static /images /javascript /stylesheets
    acl url_static       path_end       -i .jpg .gif .png .css .js

    use_backend static          if url_static
    default_backend             impala

#---------------------------------------------------------------------
# static backend for serving up images, stylesheets and such
#---------------------------------------------------------------------
backend static
    balance     roundrobin
    server      static 127.0.0.1:4331 check

#---------------------------------------------------------------------
# round robin balancing between the various backends
#---------------------------------------------------------------------

backend impala
    mode tcp
    option tcplog
    balance leastconn

    server impala1 mfs022.tuxhub.com:21000
    server impala2 mfs023.tuxhub.com:21000
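
The proxy port will not accept connections until HAProxy is started; optionally enable it at boot as well. With the packaged init script on RHEL/CentOS 6 that is:

    [root@mfs021 ~]# service haproxy start
    [root@mfs021 ~]# chkconfig haproxy on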

Step 3:

       Connect to Impala through the proxy:

[root@mfs021 ~]# impala-shell
Starting Impala Shell without Kerberos authentication
Error connecting: TTransportException, Could not connect to mfs021.tuxhub.com:21000
Welcome to the Impala shell. Press TAB twice to see a list of available commands.

Copyright (c) 2012 Cloudera, Inc. All rights reserved.

(Shell build version: Impala Shell v1.4.1 (2b626c8) built on Thu Feb  5 14:53:44 PST 2015)
[Not connected] > connect localhost:5000;
Connected to localhost:5000
Server version: impalad version 1.4.1 RELEASE (build 2b626c8e9f4c666d23872c228cf43daae4c9acbb)
[localhost:5000] > show tables;

     
  

Wednesday, 2 December 2015

Apache Sentry configuration on MapR Hadoop



Step 1) Add the below properties in hive-site.xml:

[root@mfs021 ~]# vim /opt/mapr/hive/hive-0.13/conf/hive-site.xml

<property>
  <name>hive.server2.session.hook</name>
  <value>org.apache.sentry.binding.hive.HiveAuthzBindingSessionHook</value>
</property>

<property>
  <name>hive.sentry.conf.url</name>
   <value>file:///opt/mapr/sentry/sentry-1.4.0/conf/sentry-site.xml</value>
 </property>

<property>
  <name>hive.security.authorization.task.factory</name>
  <value>org.apache.sentry.binding.hive.SentryHiveAuthorizationTaskFactoryImpl</value>
</property>

<property>
  <name>hive.metastore.execute.setugi</name>
  <value>true</value>
</property>

Step 2) Add the below properties in sentry-site.xml:

[root@mfs021 ~]# vim /opt/mapr/sentry/sentry-1.4.0/conf/sentry-site.xml


<property>
    <name>sentry.hive.provider.backend</name>
    <value>org.apache.sentry.provider.file.SimpleFileProviderBackend</value>
</property>

<property>
    <name>sentry.hive.provider.resource</name>
    <value>file:///opt/mapr/sentry/sentry-1.4.0/conf/global-policy.ini</value>
</property>

Step 3) Add the below entries in global-policy.ini:

[root@mfs021 ~]# vim /opt/mapr/sentry/sentry-1.4.0/conf/global-policy.ini

[groups]
mapr = admin_role
sentry_user = user_role

[roles]
admin_role = server=HS2
user_role = server=HS2->db=default->table=*->action=Select
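
The role syntax follows the pattern server -> db -> table -> action. As a purely illustrative example (not part of the setup above), a role limited to inserting into a single table would look like:

insert_role = server=HS2->db=default->table=mno->action=Insert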


Step 4) Add user and group

groupadd sentry_user
useradd -G sentry_user sentry_user1


Step 5) Check your configuration:




[mapr@maprdemo ~]$ /opt/mapr/hive/hive-0.13/bin/beeline
beeline> !connect jdbc:hive2://localhost:10000
scan complete in 4ms
Connecting to jdbc:hive2://localhost:10000
Enter username for jdbc:hive2://localhost:10000: sentry_user1
Enter password for jdbc:hive2://localhost:10000: *
Connected to: Apache Hive (version 0.13.0-mapr-1510)
Driver: Hive JDBC (version 0.13.0-mapr-1510)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://localhost:10000> show tables;
+-----------+
| tab_name  |
+-----------+
+-----------+
No rows selected (0.409 seconds)
0: jdbc:hive2://localhost:10000> create table xyz(id int);
Error: Error while compiling statement: FAILED: SemanticException No valid privileges
 Required privileges for this query: Server=HS2->Db=default->action=*; (state=42000,code=40000)
0: jdbc:hive2://localhost:10000> show tables;
+-----------+
| tab_name  |
+-----------+
| mno       |
+-----------+
1 row selected (0.346 seconds)
0: jdbc:hive2://localhost:10000> drop table mno;
Error: Error while compiling statement: FAILED: SemanticException No valid privileges
 Required privileges for this query: Server=HS2->Db=default->Table=mno->action=*; (state=42000,code=40000)
0: jdbc:hive2://localhost:10000> select * from mno;
+---------+
| mno.id  |
+---------+
+---------+
No rows selected (0.616 seconds)
0: jdbc:hive2://localhost:10000>


Tuesday, 24 November 2015

Hive configuration to access HiveServer2


Add the below properties in hive-site.xml:

<property>
<name>hive.support.concurrency</name>
<value>true</value>
</property>

<property>
<name>hive.zookeeper.quorum</name>
<value>adp031.tuxhub.com</value>
</property>

<property>
<name>hive.server2.thrift.port</name>
<value>10000</value>
</property>
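
With the Thrift port fixed at 10000, clients can reach HiveServer2 over JDBC, for example via beeline (the host below is a placeholder for whichever node runs HiveServer2):

    beeline -u jdbc:hive2://<hiveserver2-host>:10000 -n <username>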

Tuesday, 17 November 2015

HUE Integration with Resource manager HA



1) Edit hue.ini and add the below parameters in the [[yarn_clusters]] section:


[[[default]]]
     security_enabled=${security_enabled}
     mechanism=${mechanism}
     history_server_api_url=http://node6.tuxhub.com:19888
     ssl_cert_ca_verify=False
     logical_name=my.cluster.com
     submit_to=True


     [[[ha]]]
     resourcemanager_host=node6.tuxhub.com
     resourcemanager_api_url=http://node6.tuxhub.com:8088
     proxy_api_url=http://node6.tuxhub.com:8088
     logical_name=my.cluster.com

     [[[ha1]]]
     resourcemanager_host=node5.tuxhub.com
     resourcemanager_api_url=http://node5.tuxhub.com:8088
     proxy_api_url=http://node5.tuxhub.com:8088
     logical_name=my.cluster.com


2) Restart hue.
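
3) To confirm that the API URLs above are reachable and to see which ResourceManager is currently active, the RM REST API can be queried directly (standard webapp port 8088, as in the config above); with HA enabled the response includes a haState field showing ACTIVE or STANDBY:

    curl -s http://node6.tuxhub.com:8088/ws/v1/cluster/info
    curl -s http://node5.tuxhub.com:8088/ws/v1/cluster/info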


Wednesday, 4 November 2015

Hive Storage Authorization


Add the below properties in hive-site.xml:

<configuration>
 <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://setup1:3306/hive?createDatabaseIfNotExist=true</value>
    <description>JDBC connect string for a JDBC metastore</description>
</property>
 <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
 </property>
 <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
    <description>username to use against metastore database</description>
 </property>
 <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hive</value>
    <description>password to use against metastore database</description>
 </property>
 <property>
    <name>hive.metastore.uris</name>
    <value>thrift://setup1:9083</value>
 </property>
 <property>
    <name>hive.server2.authentication</name>
    <value>NOSASL</value>
 </property>

<!-- SECURITY -->

<property>
    <name>hive.metastore.pre.event.listeners</name>
    <value>org.apache.hadoop.hive.ql.security.authorization.AuthorizationPreEventListener</value>
</property>


<property>
    <name>hive.security.metastore.authorization.manager</name>
    <value>org.apache.hadoop.hive.ql.security.authorization.StorageBasedAuthorizationProvider</value>
</property>


<property>
    <name>hive.security.metastore.authenticator.manager</name>
    <value>org.apache.hadoop.hive.ql.security.HadoopDefaultMetastoreAuthenticator</value>
</property>

<property>
    <name>hive.security.authorization.manager</name>
    <value>org.apache.hadoop.hive.ql.security.authorization.StorageBasedAuthorizationProvider</value>
</property>


<property>
        <name>hive.security.authorization.enabled</name>
        <value>true</value>
</property>


<property>
        <name>hive.server2.enable.doAs</name>
        <value>true</value>
</property>
<!-- SECURITY CONFIG DONE -->
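
With StorageBasedAuthorizationProvider, what a user may do is decided by the filesystem permissions on the underlying database and table directories, so the quickest sanity check is to look at the warehouse directory permissions (the path below assumes the default hive.metastore.warehouse.dir):

    hadoop fs -ls /user/hive/warehouse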

Wednesday, 21 October 2015

Kerberos Installation


Kerberos Server configurations

Kerberos server :- adp031.tuxhub.com
Kerberos client :- adp032.tuxhub.com
Realm           :- TUXHUB.COM


1) Install Packages :

[root@adp031 ~]# yum install krb5-libs krb5-server krb5-workstation

2) Edit config file :

[root@adp031 ~]# cat /etc/krb5.conf
[logging]
 default = FILE:/var/log/krb5libs.log
 kdc = FILE:/var/log/krb5kdc.log
 admin_server = FILE:/var/log/kadmind.log

[libdefaults]
 default_realm = TUXHUB.COM
 dns_lookup_realm = false
 dns_lookup_kdc = false
 ticket_lifetime = 24h
 renew_lifetime = 7d
 forwardable = true

[realms]
 TUXHUB.COM = {
  kdc = adp031.tuxhub.com
  admin_server = adp031.tuxhub.com
 }

[domain_realm]
 .tuxhub.com = TUXHUB.COM
 tuxhub.com = TUXHUB.COM
[root@adp031 ~]#


3) Edit kerberos database config file

[root@adp031 ~]# cat /var/kerberos/krb5kdc/kdc.conf
[kdcdefaults]
 kdc_ports = 88
 kdc_tcp_ports = 88

[realms]
 TUXHUB.COM = {
  master_key_type = aes256-cts
  acl_file = /var/kerberos/krb5kdc/kadm5.acl
  dict_file = /usr/share/dict/words
  admin_keytab = /var/kerberos/krb5kdc/kadm5.keytab
  supported_enctypes = aes256-cts:normal aes128-cts:normal des3-hmac-sha1:normal arcfour-hmac:normal des-hmac-sha1:normal des-cbc-md5:normal des-cbc-crc:normal
 }
[root@adp031 ~]#


4) Set up the database for the KDC.

[root@adp031 ~]# kdb5_util create -s
Loading random data
Initializing database '/var/kerberos/krb5kdc/principal' for realm 'TUXHUB.COM',
master key name 'K/M@TUXHUB.COM'
You will be prompted for the database Master Password.
It is important that you NOT FORGET this password.
Enter KDC database master key:
Re-enter KDC database master key to verify:

5) Edit acl file :

[root@adp031 ~]# cat /var/kerberos/krb5kdc/kadm5.acl
*/admin@TUXHUB.COM      *
[root@adp031 ~]#

6) Create a Kerberos principal

[root@adp031 ~]# kadmin.local
Authenticating as principal root/admin@TUXHUB.COM with password.
kadmin.local:  addprinc root/admin
WARNING: no policy specified for root/admin@TUXHUB.COM; defaulting to no policy
Enter password for principal "root/admin@TUXHUB.COM":
Re-enter password for principal "root/admin@TUXHUB.COM":
Principal "root/admin@TUXHUB.COM" created.


7) Start services 


[root@adp031 ~]# /etc/init.d/krb5kdc start
Starting Kerberos 5 KDC:                                   [  OK  ]

[root@adp031 ~]# /etc/init.d/kadmin start
Starting Kerberos 5 Admin Server:                          [  OK  ]
[root@adp031 ~]#


[root@adp031 ~]# chkconfig krb5kdc on
[root@adp031 ~]# chkconfig kadmin on


8) Check configuration 

[root@adp031 ~]# kinit root/admin@TUXHUB.COM
Password for root/admin@TUXHUB.COM:

9) Verify TGT 

[root@adp031 ~]# klist
Ticket cache: FILE:/tmp/krb5cc_0
Default principal: root/admin@TUXHUB.COM

Valid starting     Expires            Service principal
10/21/15 15:50:57  10/22/15 15:50:57  krbtgt/TUXHUB.COM@TUXHUB.COM
        renew until 10/21/15 15:50:57
[root@adp031 ~]#


Kerberos Client configurations:

1) Install packages :

[root@adp032 ~]$ sudo yum install krb5-libs krb5-workstation

2) Copy the krb5.conf file from the server to all client nodes.

[root@adp031 ~]# scp -pr  /etc/krb5.conf adp032:/etc/
root@adp032's password:
krb5.conf                                                                             100%  437     0.4KB/s   00:00
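
3) Verify the client by requesting a ticket for the principal created earlier and listing it:

[root@adp032 ~]# kinit root/admin@TUXHUB.COM
[root@adp032 ~]# klist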


 


Saturday, 17 October 2015

YARN Memory and CPU Allocation




1) Add the below properties in yarn-site.xml

Note: calculate the RAM allocation per node by following one of the links below:

http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.0.9.1/bk_installing_manually_book/content/rpm-chap1-11.html    OR

http://www.cloudera.com/content/www/en-us/documentation/enterprise/latest/topics/cdh_ig_yarn_tuning.html


[hadoop@adp032 hadoop]$ cat /usr/local/hadoop/hadoop/etc/hadoop/yarn-site.xml


<property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>3072</value>
</property>


<property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>256</value>
</property>

<property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>3072</value>
</property>


<property>
        <name>yarn.app.mapreduce.am.resource.mb</name>
        <value>512</value>
</property>

<property>
        <name>yarn.app.mapreduce.am.command-opts</name>
        <value>-Xmx409m</value>
</property>




2) Add the below properties in mapred-site.xml

[hadoop@adp032 hadoop]$ cat /usr/local/hadoop/hadoop/etc/hadoop/mapred-site.xml

<property>
    <name>mapreduce.map.memory.mb</name>
    <value>256</value>
</property>

<property>
    <name>mapreduce.map.java.opts</name>
    <value>-Xmx201m</value>
</property>

<property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>512</value>
</property>

<property>
    <name>mapreduce.reduce.java.opts</name>
    <value>-Xmx410m</value>
</property>


3) Restart the resourcemanager and nodemanager on all nodes.

[hadoop@adp032 hadoop]$  yarn-daemon.sh stop resourcemanager;yarn-daemon.sh start resourcemanager

[hadoop@adp032 hadoop]$ yarn-daemon.sh stop nodemanager;yarn-daemon.sh start nodemanager



CPU Allocation:

4) Add the below properties in yarn-site.xml

[hadoop@adp032 hadoop]$ cat /usr/local/hadoop/hadoop/etc/hadoop/yarn-site.xml

<property>
        <name>yarn.nodemanager.resource.cpu-vcores</name>
        <value>1</value>
</property>

<property>
        <name>mapreduce.map.cpu.vcores</name>
        <value>1</value>
</property>


<property>
         <name>mapreduce.reduce.cpu.vcores</name>
         <value>1</value>
</property>

5) Repeat step 3 to restart the daemons.

6) Verify the changes on the ResourceManager web UI.
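
   The same totals can be cross-checked from the command line through the ResourceManager REST API (webapp port 8088 assumed; replace the host with whichever node runs the ResourceManager):

    curl -s http://adp032:8088/ws/v1/cluster/metrics

   The response reports totalMB and totalVirtualCores, which should match the values configured above.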




7) You may see the below error:

Diagnostics: Container [pid=24068,containerID=container_1445068083523_0001_02_000001] is running beyond virtual memory limits. Current usage: 106.7 MB of 512 MB physical memory used; 1.1 GB of 1.0 GB virtual memory used. Killing container.

This happens on CentOS/RHEL 6 because of its aggressive allocation of virtual memory.


Add the following property in yarn-site.xml to disable the virtual memory check:


<property>
   <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
</property>
Ref : http://stackoverflow.com/questions/21005643/container-is-running-beyond-memory-limits


Friday, 16 October 2015

Hive OpenLDAP Integration



1) Add the below properties in hive-site.xml

[hadoop@adp031 conf]$ vim /usr/local/hadoop/hive/conf/hive-site.xml

<property>
  <name>hive.server2.authentication</name>
  <value>LDAP</value>
</property>

<property>
      <name>hive.server2.authentication.ldap.url</name>
       <value>ldap://adp034</value>
</property>

<property>
  <name>hive.server2.authentication.ldap.baseDN</name>
  <value>ou=Users,dc=tuxhub,dc=com</value>
</property>



2) Restart hiveserver2.

[hadoop@adp031 ~]$ nohup hiveserver2  &
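
A quick way to confirm HiveServer2 came back up with the new settings is to check that it is listening on the Thrift port (10000 by default):

    netstat -tln | grep 10000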

3) Connect via beeline

[hadoop@adp031 conf]$ beeline
Beeline version 1.2.1 by Apache Hive
beeline> !connect jdbc:hive2://adp031:10000
Connecting to jdbc:hive2://adp031:10000
Enter username for jdbc:hive2://adp031:10000: nitin
Enter password for jdbc:hive2://adp031:10000: *****
Connected to: Apache Hive (version 1.2.1)
Driver: Hive JDBC (version 1.2.1)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://adp031:10000>

0: jdbc:hive2://adp031:10000> show tables;
+-----------+--+
| tab_name  |
+-----------+--+
| hive1     |
+-----------+--+
1 row selected (0.326 seconds)
0: jdbc:hive2://adp031:10000>





Wednesday, 30 September 2015

Hadoop 2.6 Historyserver Configuration





1. Add the below properties in yarn-site.xml

[hadoop@adp031 hadoop]$ cat /usr/local/hadoop/hadoop/etc/hadoop/yarn-site.xml

<property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
</property>

<property>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>/tmp/logs</value>
</property>

<property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>259200</value>
</property>

<property>
    <name>yarn.log-aggregation.retain-check-interval-seconds</name>
    <value>3600</value>
</property>

<property>
<name>yarn.log.server.url</name>
<value>http://adp031:19888/jobhistory/logs/</value>
</property>


2. Add the below properties in mapred-site.xml

[hadoop@adp031 hadoop]$ cat /usr/local/hadoop/hadoop/etc/hadoop/mapred-site.xml

<property>
<name>mapreduce.jobhistory.address</name>
<value>adp031:10020</value>
</property>

<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>adp031:19888</value>
</property>

<property>
<name>mapreduce.jobhistory.intermediate-done-dir</name>
<value>/mr-history/tmp</value>
</property>

<property>
<name>mapreduce.jobhistory.done-dir</name>
<value>/mr-history/done</value>
</property>
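
If the history directories referenced above do not already exist, create them in the distributed filesystem before starting the history server. The permissions below are a common choice (world-writable with the sticky bit for the intermediate directory); adjust them to your own policy:

hadoop fs -mkdir -p /mr-history/tmp /mr-history/done
hadoop fs -chmod 1777 /mr-history/tmp
hadoop fs -chmod 770 /mr-history/done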


3. Start the historyserver

[hadoop@adp031 hadoop]$ mr-jobhistory-daemon.sh start historyserver

4. Run a job 

yarn  jar /usr/local/hadoop/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar  wordcount /user/hadoop/IN_wordcount /user/hadoop/out_wordcount_12/

5. Open the history server web UI at http://<history server name>:19888/jobhistory
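
6. Since log aggregation was enabled in step 1, the container logs of a finished job can also be fetched from the command line; the application ID comes from the job output or the ResourceManager UI:

yarn logs -applicationId <application_id>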






Install Ansible  # yum install ansible Host file configuration  File  [ansible@kuber2 ~]$ cat /etc/ansible/hosts     [loca...