DBA -Hadoop




Online and Classroom Sessions


Weekdays and Weekends

Duration: 45 Days

DBA -Hadoop Objectives

•How to work with the DBA -Hadoop tool.
•How to implement DBA -Hadoop on different platforms.
•Learn how to code and deploy DBA -Hadoop.
•Eliminate duplicate code and consolidate script files using DBA -Hadoop.
•From A to Z: The Complete Beginner-to-Advanced Masterclass – Learn DBA -Hadoop.
•Learn DBA -Hadoop from scratch, from basic to advanced level.
•Components, state and props: how to pass variables between components in DBA -Hadoop.
•You will be able to do web development projects on your own.
•Gain an understanding of DBA -Hadoop and how to apply it in your own programs.

DBA -Hadoop Training Features

•Supportive faculty available 24×7, 365 days a year
•Get training from certified professionals
•We assist with internships on real-time projects
•Personal attention and guidance for every student
•Fast track and Sunday Batches available on request
•Training by Proficient Trainers with more than a decade of experience
•Our trainers have experience in training end users, students and corporate employees.
•Lifetime access to our 24×7 online support team, who will resolve all your technical queries through a ticket-based tracking system.

Who is eligible for DBA -Hadoop

•Architect, Program Manager, Delivery Head, Technical Specialist, Developer, Sr. Developer, Transition Manager, Quality Manager, Consultant
•iOS Developer, .NET, C#, C, C++, Java, Accounts and Finance, SAP FICO, SAP MM Functional Consultant
•Java, .NET, C#, Manual Testing, Automation Testing, Manual Testing with Healthcare, Android and iOS Developer
•PHP, OpenCart Developer, Magento Developer, HTML, CSS, Bootstrap, JavaScript, jQuery, Photoshop, Business Development
•Software Developer, Quality Testing Engineers, UI Developers, Oracle Apps, SAP Modules, SharePoint Developers, TIBCO Developers, .NET Developers, Java

DBA -HADOOP Syllabus

Introduction to Hadoop

✔High Availability
✔Advantages and Challenges
✔Introduction to Big Data
✔What is Big Data
✔Big Data Opportunities, Challenges
✔Characteristics of Big Data
✔Introduction to Hadoop
✔Hadoop Distributed File System
✔Comparing Hadoop & SQL
✔Industries using Hadoop
✔Data Locality
✔Hadoop Architecture
✔Map Reduce & HDFS
✔Using the Hadoop single node image (Clone)

Hadoop Distributed File System (HDFS)

✔HDFS Design & Concepts
✔Blocks, NameNodes and DataNodes
✔HDFS High-Availability and HDFS Federation
✔Hadoop DFS: The Command-Line Interface
✔Basic File System Operations
✔Anatomy of File Read, File Write
✔Block Placement Policy and Modes
✔More detailed explanation about Configuration files
✔Metadata, FsImage, Edit Log, Secondary NameNode and Safe Mode
✔How to add a new DataNode and decommission a DataNode dynamically (without stopping the cluster)
✔FSCK Utility (Block Report)
✔How to override the default configuration at system level and programming level
✔ZooKeeper Leader Election Algorithm
✔Exercise and small use case on HDFS

Map Reduce

✔Map Reduce Functional Programming Basics
✔Map and Reduce Basics
✔How Map Reduce Works
✔Anatomy of a Map Reduce Job Run
✔Legacy Architecture -> Job Submission, Job Initialization, Task Assignment, Task Execution, Progress and Status Updates
✔Job Completion, Failures
✔Shuffling and Sorting
✔Splits, Record reader, Partition, Types of partitions & Combiner
✔Optimization Techniques -> Speculative Execution, JVM Reuse and Number of Slots
✔Types of Schedulers and Counters
✔Comparisons between Old and New API at code and Architecture Level
✔Getting the data from RDBMS into HDFS using Custom data types
✔Distributed Cache and Hadoop Streaming (Python, Ruby and R)


✔Sequential Files and Map Files
✔Enabling Compression Codecs
✔Map-side Join with Distributed Cache
✔Types of I/O Formats: MultipleOutputs, NLineInputFormat
✔Handling small files using CombineFileInputFormat
✔Map Reduce Programming – Java Programming
✔Hands on “Word Count” in Map Reduce in standalone and pseudo-distributed mode
✔Sorting files using Hadoop Configuration API discussion
✔Emulating “grep” for searching inside a file in Hadoop
✔DBInputFormat
✔Job Dependency API discussion
✔Input Format API discussion, Split API discussion
✔Custom Data type creation in Hadoop
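
For the hands-on “Word Count” exercise listed above, the map, shuffle-and-sort and reduce phases can be sketched locally in plain Python, in the spirit of a Hadoop Streaming job. This is only an illustration: the function names (`mapper`, `reducer`, `run_word_count`) are assumptions made for this example and are not part of any Hadoop API, and a real streaming job would read lines from standard input and emit tab-separated key/value pairs instead.

```python
from itertools import groupby
from operator import itemgetter


def mapper(line):
    """Map phase: emit a (word, 1) pair for every word in the line."""
    for word in line.lower().split():
        yield word, 1


def reducer(word, counts):
    """Reduce phase: sum all counts emitted for one word."""
    return word, sum(counts)


def run_word_count(lines):
    """Simulate map -> shuffle/sort -> reduce locally (no cluster needed)."""
    # Map: apply the mapper to every input line.
    pairs = [pair for line in lines for pair in mapper(line)]
    # Shuffle and sort: bring identical keys together, as the framework would.
    pairs.sort(key=itemgetter(0))
    # Reduce: call the reducer once per distinct key.
    return dict(
        reducer(word, (count for _, count in group))
        for word, group in groupby(pairs, key=itemgetter(0))
    )


if __name__ == "__main__":
    sample = ["Hadoop stores data", "Hadoop processes data"]
    print(run_word_count(sample))
```

Sorting the pairs before grouping mirrors what the framework does between the map and reduce phases: each reducer call sees all values for one key together.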


NoSQL

✔ACID in RDBMS and BASE in NoSQL
✔CAP Theorem and Types of Consistency
✔Types of NoSQL Databases in detail
✔Columnar Databases in Detail (HBase and Cassandra)
✔TTL, Bloom Filters and Compaction
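
As a rough illustration of the Bloom filter topic above, here is a minimal sketch in plain Python. It is not how HBase or Cassandra implement their filters; the class name, the bit-array size and the number of hash functions are all assumptions chosen for this example. The idea it shows is general: a Bloom filter can say "definitely not present" or "possibly present", which lets a store skip reading files that cannot contain a requested row.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits      # assumed size, chosen for illustration
        self.num_hashes = num_hashes    # assumed number of hash functions
        self.bits = [False] * size_bits

    def _positions(self, item):
        # Derive several bit positions by hashing the item with different seeds.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).hexdigest()
            yield int(digest, 16) % self.size_bits

    def add(self, item):
        """Set every bit position derived from the item."""
        for pos in self._positions(item):
            self.bits[pos] = True

    def might_contain(self, item):
        """False means definitely absent; True means possibly present."""
        return all(self.bits[pos] for pos in self._positions(item))


if __name__ == "__main__":
    bf = BloomFilter()
    bf.add("row-0001")
    print(bf.might_contain("row-0001"))
```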


HBase

✔HBase Installation, Concepts
✔HBase Data Model and Comparison between RDBMS and NoSQL
✔Master & Region Servers
✔HBase Operations (DDL and DML) through Shell and Programming and HBase Architecture
✔Catalog Tables
✔Block Cache and Sharding
✔Data Modeling (Sequential, Salted, Promoted and Random Keys)
✔Java APIs and REST Interface
✔Client-side Buffering and processing 1 million records using client-side buffering
✔HBase Counters
✔Enabling Replication and HBase Raw Scans
✔HBase Filters
✔Bulk Loading and Coprocessors (Endpoints and Observers with programs)
✔Real world use case consisting of HDFS, MR and HBase
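
The salted-key approach named in the data modeling topic above can be sketched as follows. This is an illustrative example in plain Python, not HBase client code: the `salted_key` helper, the bucket count of 8 and the `NN-` prefix format are assumptions made here. The point is that a prefix derived from a hash spreads otherwise sequential row keys across several key ranges, which helps avoid a write hotspot on a single region.

```python
import hashlib

NUM_BUCKETS = 8  # assumption: number of salt buckets, chosen for illustration


def salted_key(row_key, num_buckets=NUM_BUCKETS):
    """Prefix the row key with a deterministic salt bucket, e.g. '03-<key>'."""
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    bucket = digest[0] % num_buckets  # same key always maps to the same bucket
    return f"{bucket:02d}-{row_key}"


if __name__ == "__main__":
    # Sequential keys would normally land next to each other;
    # with a salt prefix they are scattered across buckets.
    for i in range(5):
        print(salted_key(f"event-{i:06d}"))
```

Because the salt is computed from the key itself, a reader can recompute the prefix for a point lookup; a range scan over the original key order has to query every bucket.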


Hive

✔Hive Installation, Introduction and Architecture
✔Hive Services, Hive Shell, Hive Server and Hive Web Interface (HWI)
✔Metastore, HiveQL
✔Working with Tables
✔Primitive data types and complex data types
✔Working with Partitions
✔User Defined Functions
✔Hive Bucketed Tables and Sampling
✔External partitioned tables, mapping data to a partition in the table, writing the output of one query to another table, multiple inserts
✔Dynamic Partition
✔Differences between ORDER BY, DISTRIBUTE BY and SORT BY
✔Bucketing and Sorted Bucketing with Dynamic partition
✔RCFile
✔Compression on Hive tables and Migrating Hive tables
✔Dynamic substitution in Hive and different ways of running Hive
✔How to enable Update in Hive
✔Log Analysis on Hive
✔Access HBase tables using Hive
✔Hands on Exercises


Pig

✔Pig Installation
✔Execution Types
✔Grunt Shell
✔Pig Latin
✔Data Processing
✔Schema on read
✔Primitive data types and complex data types
✔Tuple schema, Bag schema and Map schema
✔Loading and Storing
✔Filtering, Grouping and Joining
✔Debugging commands (Illustrate and Explain)
✔Validations, Type casting in Pig
✔Working with Functions
✔User Defined Functions
✔Types of Joins in Pig and Replicated Join in detail
✔SPLIT and Multi-query Execution
✔Error Handling, FLATTEN and ORDER BY
✔Parameter Substitution
✔Nested FOREACH
✔User Defined Functions, Dynamic Invokers and Macros
✔How to access HBase using Pig, Load and Write JSON data using Pig
✔Piggybank
✔Hands on Exercises


Sqoop and HCatalog

✔Sqoop Installation
✔Import Data (full table, subset only, target directory, protecting the password, file formats other than CSV, compression, controlling parallelism, importing all tables)
✔Incremental Import (importing only new data, last imported data, storing the password in the Metastore, sharing the Metastore between Sqoop clients)
✔Free Form Query Import
✔Export data to RDBMS, Hive and HBase
✔Hands on Exercises
✔HCatalog Installation
✔Introduction to HCatalog
✔Using HCatalog with Pig, Hive and MR


Flume

✔Flume Installation
✔Introduction to Flume
✔Flume Agents: Sources, Channels and Sinks
✔Log user information from a Java program into HDFS using Log4j and Avro Source, Tail Source
✔Log user information from a Java program into HBase using Log4j and Avro Source, Tail Source
✔Flume Commands
✔Use case of Flume: stream data from Twitter into HDFS and HBase, then do some analysis using Hive and Pig

More Ecosystems

✔Hue (Hortonworks and Cloudera)


Oozie

✔Workflow (Start, Action, End, Kill, Join and Fork), Schedulers, Coordinators and Bundles; how to schedule Sqoop, Hive, MR and Pig jobs
✔Real world use case that finds the top websites used by users of certain ages, scheduled to run every hour
✔ZooKeeper
✔HBase Integration with Hive and Pig
✔Proof of concept (POC)


Spark

✔Spark Overview
✔Linking with Spark, Initializing Spark
✔Using the Shell
✔Resilient Distributed Datasets (RDDs)
✔Parallelized Collections
✔External Datasets
✔RDD Operations
✔Basics, Passing Functions to Spark
✔Working with Key-Value Pairs
✔RDD Persistence
✔Which Storage Level to Choose?
✔Removing Data
✔Shared Variables