Monday, 18 April 2016

How to Configure YARN Fair Scheduler on a MapR Cluster



1) Add the lines below to yarn-site.xml

<property><name>yarn.scheduler.fair.allocation.file</name><value>/opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop/fair-scheduler.xml</value></property>

<property><name>yarn.acl.enable</name><value>true</value></property>

<property><name>yarn.admin.acl</name><value>mapr mapr</value></property>

<property><name>yarn.resourcemanager.scheduler.class</name><value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value></property>
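Before restarting anything, it can help to confirm that the four properties are actually present. The sketch below is only an illustration: it assumes the properties sit inside the usual top-level <configuration> element, it embeds the XML as a string rather than reading the real file, and the helper names load_properties and find_problems are made up for this example.

```python
import xml.etree.ElementTree as ET

# Sample content; on a real node you would read yarn-site.xml from disk instead.
YARN_SITE = """<configuration>
<property><name>yarn.scheduler.fair.allocation.file</name><value>/opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop/fair-scheduler.xml</value></property>
<property><name>yarn.acl.enable</name><value>true</value></property>
<property><name>yarn.admin.acl</name><value>mapr mapr</value></property>
<property><name>yarn.resourcemanager.scheduler.class</name><value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value></property>
</configuration>"""

# Values expected by this walkthrough.
REQUIRED = {
    "yarn.acl.enable": "true",
    "yarn.resourcemanager.scheduler.class":
        "org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler",
}


def load_properties(xml_text):
    """Turn Hadoop-style <property> entries into a name -> value dict."""
    root = ET.fromstring(xml_text)
    return {p.findtext("name"): p.findtext("value") for p in root.findall("property")}


def find_problems(props, required):
    """Return the required settings that are missing or have a different value."""
    return {name: value for name, value in required.items() if props.get(name) != value}


props = load_properties(YARN_SITE)
problems = find_problems(props, REQUIRED)
print("problems:", problems)
```

An empty dict means every required setting matched; anything else names the property to fix.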

2) Configure the Fair Scheduler allocation file:

[root@mfs071 ~]#  vim /opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop/fair-scheduler.xml

<allocations>


  <queuePlacementPolicy>
    <rule name="specified" create="false"/>
    <rule name="primaryGroup" create="false"/>
    <rule name="secondaryGroupExistingQueue" create="false"/>
    <!--Create interactive queue for generic shape group?-->
    <rule name="user" create="false"/>
    <rule name="reject"/>
  </queuePlacementPolicy>
  <queue name="root">
    <minResources>2000 mb, 1 vcores,1 disks</minResources>
    <maxResources>5000 mb, 1 vcores,2 disks</maxResources>
    <maxRunningApps>10</maxRunningApps>
    <weight>2.0</weight>
    <schedulingPolicy>fair</schedulingPolicy>
    <aclSubmitApps> </aclSubmitApps>
    <aclAdministerApps>root</aclAdministerApps>
  <!--  <aclAdministerApps>mapr mapr</aclAdministerApps> -->
    <queue name="sample_sub_queue1">
        <minResources>1024 mb, 1 vcores,1 disks</minResources>
        <aclSubmitApps>nitin</aclSubmitApps>
        <aclAdministerApps>root</aclAdministerApps>
    </queue>
    <queue name="sample_sub_queue2">
        <minResources>1024 mb, 1 vcores,1 disks</minResources>
        <aclSubmitApps>mapr</aclSubmitApps>
        <aclAdministerApps>root</aclAdministerApps>
    </queue>
    <queue name="sample_sub_queue3">
        <minResources>1024 mb, 1 vcores,1 disks</minResources>
        <aclSubmitApps>kunal</aclSubmitApps>
        <aclAdministerApps>root</aclAdministerApps>
    </queue>
  </queue>

</allocations>
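The ACLs above decide who may submit to each queue. As a rough illustration (not the scheduler's actual code), the sketch below evaluates aclSubmitApps the way the Fair Scheduler documentation describes it: a user is allowed if they appear in the ACL of the queue or of any ancestor, and a root queue without an ACL defaults to "*". It deliberately ignores groups and aclAdministerApps, uses a trimmed copy of the allocation file, and the function names acl_users and can_submit are hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the allocation file: only queue names and submit ACLs.
ALLOCATIONS = """<allocations>
  <queue name="root">
    <aclSubmitApps> </aclSubmitApps>
    <queue name="sample_sub_queue1"><aclSubmitApps>nitin</aclSubmitApps></queue>
    <queue name="sample_sub_queue2"><aclSubmitApps>mapr</aclSubmitApps></queue>
    <queue name="sample_sub_queue3"><aclSubmitApps>kunal</aclSubmitApps></queue>
  </queue>
</allocations>"""


def acl_users(queue):
    """Return the user names in a queue's aclSubmitApps, or None if it is not set."""
    elem = queue.find("aclSubmitApps")
    if elem is None:
        return None
    text = elem.text or ""
    if text.strip() == "*":
        return {"*"}
    # ACL format is "user1,user2 group1,group2"; users come before the first space.
    # A single space therefore yields an empty user list (nobody).
    users_part = text.split(" ", 1)[0]
    return {u for u in users_part.split(",") if u}


def can_submit(allocations, queue_path, user):
    """True if user is in aclSubmitApps of the queue or of any ancestor (users only)."""
    node = allocations
    for name in queue_path.split("."):
        node = next((q for q in node.findall("queue") if q.get("name") == name), None)
        if node is None:
            raise KeyError("no such queue: " + queue_path)
        users = acl_users(node)
        if users is None:
            # Assumed defaults: everybody on root, nobody on other queues.
            users = {"*"} if name == "root" else set()
        if "*" in users or user in users:
            return True
    return False


allocations = ET.fromstring(ALLOCATIONS)
for queue in ("root.sample_sub_queue1", "root.sample_sub_queue3"):
    print("nitin ->", queue, can_submit(allocations, queue, "nitin"))
```

This is why the root queue carries a single space in aclSubmitApps: without it, the default "*" on root would be inherited and every user could submit to every sub-queue.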


3) Restart the ResourceManager so that the yarn-site.xml changes take effect.

4) Log in as nitin and submit a job to sample_sub_queue3.

[nitin@mfs071 ~]$ yarn jar "/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.0-mapr-1602.jar" teragen  -Dmapred.map.tasks=20 -Dmapred.reduce.tasks=0 -Dmapred.job.queue.name=sample_sub_queue3 1000 /tmp/teragen1

The job fails with the following exception, because aclSubmitApps for sample_sub_queue3 lists only kunal:

java.io.IOException: org.apache.hadoop.yarn.exceptions.YarnException: Failed to submit application_1460957957156_0002 to YARN : User nitin cannot submit applications to queue root.sample_sub_queue3

5) Submit the same job to sample_sub_queue1 instead. It succeeds, because nitin is listed in that queue's aclSubmitApps.

6) Log in as kunal and try to kill the job.

[kunal@mfs072 ~]$ yarn application -kill application_1460957957156_0001

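The kill request is rejected with the following exception, because kunal neither owns the application nor appears in aclAdministerApps or yarn.admin.acl: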

Exception in thread "main" org.apache.hadoop.yarn.exceptions.YarnException: java.security.AccessControlException: User kunal cannot perform operation MODIFY_APP on application_1460957957156_0001








Tuesday, 5 April 2016

HiveServer2 High Availability from Beeline



Step 1:

Edit hive-site.xml on every HiveServer2 (hs2) node

<property>
        <name>hive.server2.support.dynamic.service.discovery</name>
        <value>true</value>
</property>


<property>
        <name>hive.zookeeper.quorum</name>
        <value>mfs071:5181,mfs072:5181,mfs073:5181</value>
</property>

<property>
        <name>hive.server2.zookeeper.namespace</name>
        <value>hiveserver2</value>
</property>


Step 2:

Restart HiveServer2 on every node


Step 3:

Connect from Beeline using the ZooKeeper quorum instead of a single HiveServer2 host

!connect jdbc:hive2://mfs071:5181,mfs072:5181,mfs073:5181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2
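The connection string reuses the values from hive-site.xml: the host list is hive.zookeeper.quorum and zooKeeperNamespace is hive.server2.zookeeper.namespace. A minimal sketch of assembling it, useful when the URL has to be generated for several clusters; the function name hs2_discovery_url and its database parameter are made up for this example.

```python
def hs2_discovery_url(quorum, namespace, database=""):
    """Build a HiveServer2 JDBC URL that locates a server through ZooKeeper.

    quorum    -- comma-separated host:port list (value of hive.zookeeper.quorum)
    namespace -- value of hive.server2.zookeeper.namespace
    database  -- optional database name; empty keeps the URL as written above
    """
    # Normalise the host list so stray spaces do not end up in the URL.
    hosts = ",".join(h.strip() for h in quorum.split(",") if h.strip())
    return (
        "jdbc:hive2://" + hosts + "/" + database
        + ";serviceDiscoveryMode=zooKeeper"
        + ";zooKeeperNamespace=" + namespace
    )


url = hs2_discovery_url("mfs071:5181,mfs072:5181,mfs073:5181", "hiveserver2")
print("!connect " + url)
```

Keeping the quorum and namespace as parameters makes it harder for the client URL to drift from what the servers registered under.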


[mapr@mfs072 ~]$ /opt/mapr/hive/hive-1.2/bin/beeline
Beeline version 1.2.0-mapr-1601 by Apache Hive
beeline> !connect jdbc:hive2://mfs071:5181,mfs072:5181,mfs073:5181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2
Connecting to jdbc:hive2://mfs071:5181,mfs072:5181,mfs073:5181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2
Enter username for jdbc:hive2://mfs071:5181,mfs072:5181,mfs073:5181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2: mapr
Enter password for jdbc:hive2://mfs071:5181,mfs072:5181,mfs073:5181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2: ****
Connected to: Apache Hive (version 1.2.0-mapr-1601)
Driver: Hive JDBC (version 1.2.0-mapr-1601)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://mfs071:5181,mfs072:5181,mfs07> show databases;
+----------------+--+
| database_name  |
+----------------+--+
| default        |
+----------------+--+
1 row selected (0.178 seconds)
