Dr. D. Y. Patil Pratishthan's
Institute for Advanced Computing and Software Development

PG-DBDA


COURSE OUTCOME

After completing this course, students will be trained in statistics and machine learning using Python, enabling them to make data-driven decisions that give them a competitive advantage in the market. Technologies such as Hadoop, Spark, Hive and Machine Learning provide a springboard into AI and prepare students for Industry 4.0. At the end of the course, students will be able to work as Data Analysts and Data Engineers. Studying Big Data will broaden their horizons in a market that has been surpassing forecasts and predictions for Big Data Analytics.

ELIGIBILITY

The Diploma in Big Data Analytics (DBDA) is a 24-week, full-time postgraduate course comprising 10 compulsory modules and a project.

Qualification:

1. Graduate in Engineering (10+2+4 or 10+3+3 years) in IT / Computer Science / Electronics / Telecommunications / Electrical / Instrumentation, OR
2. MSc/MS (10+2+3+2 years) in Computer Science, IT, Electronics with Mathematics in 10+2, OR
3. Graduate in any discipline of Engineering, OR
4. MCA, MCM, OR
5. Post Graduate Degree in Physics / Mathematics / Statistics, OR
6. Post Graduate Degree in Management with graduation in IT / Computer Science / Computer Applications.

    Note: Candidates must have secured a minimum of 55% marks in their qualifying examination.

Application Form:

C-DAC's application form is common to all PG Diploma courses, including the Post Graduate Diploma in Big Data Analytics (PG-DBDA). Application forms for all the courses are to be filled in online at http://acts.cdac.in (recommended).

C-CAT Application Fee:
Category-wise C-CAT examination fee

Course Category | C-CAT Paper(s) | Examination Fee
I | A | Rs. 1350/-
II | A+B | Rs. 1550/-
III | A+B+C | Rs. 1750/-

After filling the online C-CAT application form, the examination fee may be paid online through the ‘Make Payment’ step on the main menu of the online application. No cheque or demand draft (DD) will be accepted towards payment of C-CAT examination fee.

Online: The examination fee can be paid using credit/debit cards and net banking through the payment gateway that will be opened upon clicking the 'Online' option of the 'Make Payment' step. Candidates are advised to follow the instructions/steps given on the payment gateway, and also print/keep the transaction details for their records.

SELECTION PROCESS

Admissions to all PG Diploma courses of C-DAC are made through C-DAC's Computerised Common Admission Test (C-CAT). Candidates have to apply for C-CAT online at www.cdac.in or acts.cdac.in. Every year, C-CAT is usually conducted in June (for August admissions) and December (for February admissions).

Candidates will be provided ranks based on their performance in Section A, Sections A+B, and Sections A+B+C of C-CAT. Along with each rank, the number of candidates ranked above him/her in the courses applied for will also be indicated.

If a candidate appears for multiple sections, he/she will be provided multiple ranks depending on his/her choice of courses at the time of filling the application form. For example, if a candidate appears for Sections A and B and had chosen courses under Category I and Category II in the application form, he/she shall be provided two ranks: (i) based on the performance in Section A, and (ii) based on the performance in Sections A+B. However, if a candidate appears for Sections A and B but had chosen only courses under Category II in the application form, he/she will be provided only one rank based on the performance in Sections A+B.
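
The rule above amounts to a simple mapping from the chosen course categories to the sections a candidate actually took. The Python sketch below is only an illustration, assuming the category-to-paper mapping from the fee table (Category I: A; Category II: A+B; Category III: A+B+C). The function name `ranks_provided` is hypothetical and not part of any C-DAC system.

```python
# Sections each course category requires (from the C-CAT fee table).
REQUIRED_SECTIONS = {
    "I": {"A"},
    "II": {"A", "B"},
    "III": {"A", "B", "C"},
}
CATEGORY_ORDER = ["I", "II", "III"]


def ranks_provided(chosen_categories, sections_appeared):
    """Return the rank bases a candidate receives, e.g. ["A", "A+B"].

    A rank is given for a chosen category only if the candidate appeared
    in every section that category requires (hypothetical helper).
    """
    ranks = []
    for category in sorted(chosen_categories, key=CATEGORY_ORDER.index):
        required = REQUIRED_SECTIONS[category]
        if required <= set(sections_appeared):
            ranks.append("+".join(sorted(required)))
    return ranks


# The two examples described in the paragraph above:
print(ranks_provided({"I", "II"}, {"A", "B"}))  # ['A', 'A+B']
print(ranks_provided({"II"}, {"A", "B"}))       # ['A+B']
```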

Candidates with the lowest 10% performances in Section A, Section B and Section C will not be considered for ranking in any category. Even after the removal of the lowest 10% performers as stated above, if there exist candidates in any category with zero or less than zero marks, then these candidates are also not considered for ranking. The remaining candidates will be ranked based on their performance in Section A (for candidates who have applied for Category I courses), total performance in Sections A+B (for candidates who have applied for Category II courses), and total performance in Sections A+B+C (for candidates who have applied for Category III courses).

If two or more candidates have the same marks in Section A, Sections A+B or Sections A+B+C, the candidate with more marks in Section A will be given the higher rank. If these candidates also have the same marks in Section A, the candidate with the higher ratio of 'number of correct answers / number of attempted questions' in the sections required for that category of courses will be given the higher rank. Candidates with the same ratio, the same total marks and the same marks in Section A will be given the same rank.
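
The exclusion and tie-breaking rules can be read as a sort on a composite key (total, Section A marks, correct/attempted ratio). The sketch below is a hypothetical illustration, not C-DAC's actual procedure. It assumes (a) that a candidate in the lowest 10% of any section he/she took is excluded from every category, with the cut-off taken as the mark at the 10th-percentile position; (b) that the ratio is computed over the sections required for the category; and (c) that tied candidates share a rank and the next rank skips accordingly (1, 1, 3). All class and function names are invented for this example.

```python
from dataclasses import dataclass, field
from fractions import Fraction


@dataclass
class SectionResult:
    marks: int      # +3 per correct answer, -1 per wrong answer
    correct: int    # number of correct answers
    attempted: int  # number of attempted questions


@dataclass
class Candidate:
    name: str
    sections: dict = field(default_factory=dict)  # section letter -> SectionResult


def bottom_decile(candidates, section):
    """Names of candidates below the 10th-percentile mark in one section (assumed cut-off)."""
    takers = [c for c in candidates if section in c.sections]
    if not takers:
        return set()
    marks = sorted(c.sections[section].marks for c in takers)
    cutoff = marks[len(marks) // 10]
    return {c.name for c in takers if c.sections[section].marks < cutoff}


def rank_category(candidates, required):
    """Rank candidates for one category; `required` is e.g. ("A", "B") for Category II."""
    excluded = set()
    for section in ("A", "B", "C"):
        excluded |= bottom_decile(candidates, section)

    def total(c):
        return sum(c.sections[s].marks for s in required)

    def sort_key(c):
        correct = sum(c.sections[s].correct for s in required)
        attempted = sum(c.sections[s].attempted for s in required)
        ratio = Fraction(correct, attempted) if attempted else Fraction(0)
        # Higher total first, then higher Section A marks, then higher ratio.
        return (total(c), c.sections["A"].marks, ratio)

    eligible = [
        c for c in candidates
        if c.name not in excluded
        and all(s in c.sections for s in required)
        and total(c) > 0  # zero or negative totals are not ranked
    ]
    eligible.sort(key=sort_key, reverse=True)

    ranks, prev_key, prev_rank = {}, None, 0
    for position, c in enumerate(eligible, start=1):
        key = sort_key(c)
        rank = prev_rank if key == prev_key else position  # identical keys share a rank
        ranks[c.name] = rank
        prev_key, prev_rank = key, rank
    return ranks


pool = [
    Candidate("P", {"A": SectionResult(60, 25, 40), "B": SectionResult(30, 15, 30)}),
    Candidate("Q", {"A": SectionResult(50, 20, 30), "B": SectionResult(40, 15, 20)}),
    Candidate("R", {"A": SectionResult(60, 25, 40), "B": SectionResult(30, 15, 30)}),
    Candidate("S", {"A": SectionResult(-5, 5, 25), "B": SectionResult(3, 3, 9)}),
]
print(rank_category(pool, ("A", "B")))  # {'P': 1, 'R': 1, 'Q': 3}
```

Here P, Q and R all total 90 marks in Sections A+B; Q drops to third on Section A marks, P and R stay tied on every criterion, and S is not ranked because his/her total is negative.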

Admissions to C-DAC's PG Diploma courses at various training centres will be offered in the order of ranks obtained in C-CAT and based on the preferences of courses and centres given by the candidates. Only those candidates who are in the C-CAT rank-list will be considered for admissions to C-DAC's PG Diploma courses.
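
Offering seats "in the order of ranks ... and based on the preferences" can be sketched as a single pass over the rank list in which each candidate receives the most preferred course/centre that still has a vacancy. This is a simplified, single-round illustration under assumed data structures; the actual process runs over several counseling rounds with payment deadlines, and every name below (including "Centre X" and "Centre Y") is hypothetical.

```python
def allocate_seats(rank_order, preferences, seats):
    """Single-round, rank-order seat allocation (simplified sketch).

    rank_order:  candidate names, best rank first
    preferences: name -> list of (course, centre) choices, most preferred first
    seats:       (course, centre) -> number of vacant seats
    """
    vacancies = dict(seats)  # copy so the caller's seat matrix is untouched
    allocation = {}
    for name in rank_order:
        for choice in preferences.get(name, []):
            if vacancies.get(choice, 0) > 0:
                vacancies[choice] -= 1
                allocation[name] = choice
                break  # stop at the first preference that still has a vacancy
    return allocation


seats = {("PG-DBDA", "Centre X"): 1, ("PG-DBDA", "Centre Y"): 1}
preferences = {
    "P": [("PG-DBDA", "Centre X")],
    "Q": [("PG-DBDA", "Centre X"), ("PG-DBDA", "Centre Y")],
    "R": [("PG-DBDA", "Centre X")],
}
print(allocate_seats(["P", "Q", "R"], preferences, seats))
# {'P': ('PG-DBDA', 'Centre X'), 'Q': ('PG-DBDA', 'Centre Y')}
```

R, although ranked in the list, receives no seat because his/her only preference is already full; this mirrors the brochure's point that being in the rank-list does not guarantee admission.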

Rank-lists of August 2021 C-CAT are only applicable for admission to the Sept 2021 intake of C-DAC's PG Diploma courses. Candidates should note that mere appearance in C-CAT or being in any of the rank-lists neither guarantees nor provides any automatic entitlement to admission. Qualified candidates will have to apply for admission as per the prescribed procedure.

Important dates related to admission to C-DAC’s PG Diploma courses of the Sept 2021 batch:

Sr No. | Event | Dates
1 | Beginning of Online Registration and Application for C-CAT | 9 July 2021
2 | Closing of Online Registration & Application, and Payment of Application Fee | 29 July 2021
3 | Downloading of C-CAT Admit Cards | 5 - 7 August 2021
4 | C-DAC's Common Admission Test (C-CAT) | 7 August 2021 and 8 August 2021
5 | Announcement of C-CAT Ranks | 20 August 2021
6 | Online Selection of Courses and Centers (1st Counseling) | 20 - 26 August 2021
7 | Declaration of First Round of Seat Allocation | 28 August 2021
8 | Last Date of Payment of First Installment for candidates allocated seats through the first round | 3 September 2021 (till 5 pm)
9 | Declaration of Second Round of Seat Allocation | 6 September 2021
10 | Last Date of Payment of First Installment for candidates allocated seats through the second round | 9 September 2021 (till 5 pm)
11 | Payment of Caution Deposit and Online Selection of Course and Centre (2nd Counseling) | 11 - 14 September 2021 (till 5 pm)
12 | Declaration of Third Round of Seat Allocation (based on 2nd Counseling) | 15 September 2021
13 | Last Date of Payment of Balance Course Fee | 17 September 2021
14 | Last Date of Online Registration of Students | 20 September 2021
15 | Start of Online Diploma Courses across India | 21 September 2021


COURSE FEE

The fee for PG-DBDA is Rs. 97,750/- plus Goods and Services Tax (GST), currently 18%. The course fee includes expenses towards course materials, computer lab usage, examinations, the final mark-list/certificate, and placement assistance provided at the local training centre. The fee for all PG Diploma courses is to be paid in two installments. Candidates should note that no Demand Draft (DD), cheque or cash will be accepted at any C-DAC training centre towards payment of any installment of the course fee.

The course fee has to be paid in two installments as per the schedule:
   1. First installment: Rs. 10,000/- plus GST (currently 18%).
   2. Second installment: Rs. 87,750/- plus GST (currently 18%).
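
As a worked example of the fee arithmetic, the sketch below adds GST to each installment, assuming the 18% rate quoted above (the brochure notes this is the current rate, so it may change). The two installments together come to the same amount as GST on the full Rs. 97,750/-, i.e. Rs. 1,15,345/-. `Decimal` is used so that rupee amounts are computed exactly; the function name `add_gst` is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

GST_PERCENT = Decimal("18")  # "currently 18%" per the brochure; subject to change


def add_gst(base_fee: Decimal) -> Decimal:
    """Return the base fee plus GST, rounded to paise."""
    total = base_fee * (1 + GST_PERCENT / 100)
    return total.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


installments = {"first": Decimal("10000"), "second": Decimal("87750")}
payable = {name: add_gst(fee) for name, fee in installments.items()}

print(payable)                 # {'first': Decimal('11800.00'), 'second': Decimal('103545.00')}
print(sum(payable.values()))   # 115345.00
```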

C-CAT PREPARATION

1) From the current academic year, admissions to all PG Diploma courses will be made through a Common Admission Test (C-CAT).
2) C-CAT will be conducted in the form of three test papers, labeled as:

SECTION - A (English, Logical Reasoning and Quantitative Aptitude),
SECTION - B (Computer Fundamentals, C Programming, Data Structures, Data Communications & Networks, Object Oriented Programming, Operating Systems)
SECTION - C (Computer Architecture, Digital Electronics, Microprocessors)

Depending upon the choice(s) of programme(s) made by the candidate, he/she will have to appear in either just one test paper (SECTION - A), two test papers (SECTION - A and SECTION - B) or all three test papers (SECTION - A, SECTION - B and SECTION - C).

Depending on the course chosen, candidates need to appear for the test papers (relevant sections) as per the table given below:


Programme(s) | Test paper(s) to be taken
PG Diploma in Big Data Analytics (PG-DBDA) | Section A + Section B

3) Candidates who qualify in C-CAT will be offered admission to the various PG Diploma courses on the basis of their ranks and choices. There is no age restriction to appear for C-CAT.

4) Candidates may choose one of the test dates as per their convenience while filling in the application. The choice of date, once made, will not be altered unless approved in writing by C-DAC.

5) To apply for admission to a desired programme, a candidate is required to qualify in the corresponding test paper(s) and also satisfy the minimum eligibility criteria of the respective academic programme.

6) Candidates who have either appeared or are due to appear in the final examination of their qualifying degree are also eligible to appear for the test. By qualifying in C-CAT, such candidates can apply for provisional admission, subject to the conditions that: (a) all parts of their final examination are completed by the date of registration in the programme, and (b) they produce proof of having passed the qualifying degree with the required eligibility by 31 May 2020.

7) Candidates will be provided ranks based on their performance in Section A, Sections A+B and Sections A+B+C. If a candidate appears in multiple sections, he/she shall be provided multiple ranks accordingly. For example, if a candidate appears in Section A and Section B, he/she shall be provided two ranks: one based on performance in Section A and one based on performance in Sections A and B. A candidate can appear only in those sections chosen at the time of filling in the application. A candidate who has not appeared for a particular section will not get any position in the merit lists that include that section. For each programme, a separate merit list will be prepared from the candidates opting for that programme. Admissions to various programmes at different centres will be made on the basis of merit in C-CAT, subject to fulfilment of the eligibility requirements.

8) Candidates should note that mere appearance in C-CAT or being in any of the merit lists neither guarantees nor provides any automatic entitlement to admission. Qualified candidates will have to apply for admission as per the prescribed procedure. Admissions shall be made in order of merit, based on the choices exercised by the candidates and depending on the number of seats available in the programmes at the Admitting Centre(s).

9) With regard to the interpretation of the provisions of any matter not covered in this Information Brochure, the decision of the C-DAC shall be final and binding on all the parties concerned.

The C-CAT will test the candidate's knowledge of the above topics. The candidate must possess good knowledge of the C language in terms of its syntax and appropriate use. The candidate should carefully study the books recommended herein. However, merely reading about language constructs in a book cannot develop programming ability. It is absolutely necessary to write one's own code in C and implement at least 100 good C programs on a computer. These programs should be of increasing complexity and should exploit appropriate constructs and advanced features of C. Candidates should solve all the problems given in the recommended books. This will help candidates not only master the language but also develop good problem-solving ability, which is critical for any successful career.

The applicant should also practise using the good features of the language, modularise his/her code, add suitable comments to improve readability, make extensive use of library routines, and format programs to express the logical flow clearly.

The candidate should note that the rigorous programming practice prescribed above is required not only to succeed in the C-CAT but also to learn the various modules of PG-DBDA at a rapid pace. Rigorous programming practice is in fact the most important prerequisite for undertaking the PG-DBDA course and for a successful career in the IT industry thereafter. The candidate may avail of the online Pre-DAC course on the ACTS website, or contact the nearest Authorised Training Centre to attend the Pre-DAC course.



SYLLABUS FOR COMMON ADMISSION TEST (C-CAT)

The C-CAT will be conducted in computerised mode in various cities across India. C-CAT centres will be allocated to candidates on a first-come, first-served basis of application, depending on each centre's seating capacity.

The C-CAT date and city once selected in the online application form cannot be changed unless approved in writing by C-DAC, subject to availability of seats in requested city. All such signed letters of requests with proof of valid reasons should be received at C-DAC ACTS, 5th Floor, Innovation Park, Sr. No. 34/B/1, Panchvati, Pashan, Pune 411008, before the last date of C-CAT application.
C-CAT will be conducted in the form of three objective-type test papers, labeled as:

Section – A (English, Critical Reasoning and Quantitative Aptitude),
Section – B (Computer Fundamentals, C Programming, Data Structures, Data Communications & Networks, Object Oriented Programming, Operating Systems) and
Section – C (Computer Architecture, Digital Electronics, Microprocessors)

Every section will have 50 objective-type questions of 3 marks each (maximum 150 marks for any one section). Each question will have four choices, of which only one is correct. There will be +3 (plus three) marks for each correct answer and -1 (minus one) mark for each wrong answer. Marking multiple answers to a question will be treated as a wrong answer. Each unattempted question will be awarded 0 (zero) marks.
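
To make the marking scheme concrete, here is a minimal Python sketch that scores one section from a candidate's responses. Representing each response as the set of options marked (an empty set meaning unattempted) is an assumption made for this example, and the function name `section_score` is hypothetical. With 50 questions, a section score can range from -50 (all attempted, all wrong) to 150 (all correct).

```python
MARKS_CORRECT = 3
MARKS_WRONG = -1
QUESTIONS_PER_SECTION = 50


def section_score(responses, answer_key):
    """Score one C-CAT section.

    responses:  one set per question holding the option(s) marked; an empty
                set means the question was not attempted
    answer_key: the single correct option for each question
    """
    if len(responses) != len(answer_key):
        raise ValueError("responses and answer_key must have the same length")
    score = 0
    for marked, correct in zip(responses, answer_key):
        if not marked:
            continue                      # unattempted: 0 marks
        if marked == {correct}:
            score += MARKS_CORRECT        # exactly one option marked, and it is right
        else:
            score += MARKS_WRONG          # wrong option, or more than one option marked
    return score


# correct (+3), wrong (-1), unattempted (0), two options marked (-1)
print(section_score([{"A"}, {"B"}, set(), {"A", "C"}], ["A", "C", "D", "A"]))  # 1

key = ["A"] * QUESTIONS_PER_SECTION
print(section_score([{k} for k in key], key))  # 150
```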

Use of Candidates' Laptops for C-CAT

Candidates will have the option to use their own laptops for the C-CAT. Such candidates have to choose the 'Own Laptop' option in the online application form.

1) Minimum configuration required for the laptop:
  Processor : Intel Pentium 4 or above
  RAM : 1 GB or more (Recommended: 2 GB)
  Operating System : Windows-based (XP, 7, 8 or 8.1)
  Browser : Any one of IE6, IE7, IE8, IE9, Mozilla 16, Mozilla 20, Chrome 26

2) JavaScript must be enabled in the browser.
3) The laptop should be LAN and Wi-Fi enabled, and these ports should be working.
4) The laptop should have antivirus running on it and should be free from any virus, malware and spyware.
5) The laptop should have proper power backup for the entire duration of the exam.

Note:
1. Use of logarithmic tables/calculator of any kind/cellular phone/electronic gadgets is NOT permitted in the examination hall.
2. The medium for all the test papers will be English only.
3. Use of unfair means by a candidate in C-CAT, whether detected at the time of the test, during evaluation or at any other stage, will lead to cancellation of his/her candidature as well as disqualification from appearing in C-CAT in future.

Sept 2021 BATCH C-CAT SCHEDULE

Schedule of the August 2021 C-CAT for the Sept 2021 batch (slot timings may vary slightly; the final timings will be printed on the admit cards).

C-CAT Dates: 07 August 2021 and 08 August 2021

Test Paper | Morning Slot Timings | Afternoon Slot Timings
Section A | 9:30 am – 10:30 am | 2:00 pm – 3:00 pm
Section B | 10:45 am – 11:45 am | 3:15 pm – 4:15 pm
Section C | 12:00 noon – 1:00 pm | 4:30 pm – 5:30 pm



IMPORTANT NOTE
In all matters concerning C-CAT, the decision of C-DAC will be final and binding on all applicants.


COMPUTING FACILITIES

Given below is the computing setup at our institute. Each dedicated client node is shared by 2 students, with a minimum of 6 hours of computer time per day. The institute is open 24 hours a day, including Sundays and holidays.

Servers
Windows 2012 Server,
SCO Unix Server with ODT or Fedora or Sun Solaris,
Application Servers / Dummy Servers configured for various modules.

Configuration
Quad Core 1.3 GHz with 8 GB RAM,
Fast Wide SCSI Interface,
1 TB Fast HDD (minimum),
LED Color Monitor (18.5"),
AGP Card with 4/8 MB VRAM,
PCI Network Card 10/100 BaseT UTP Ethernet,
DVD RW Drive,

Client Machines / Network Nodes
Configuration
Core i7 3.0 GHz, 8 GB RAM,
1 TB Hard Disk (IDE),
LED Color Monitor (18.5"),
AGP- 64 Bit VGA Card with 8 MB/4 MB VRAM,
PCI Network Card 10/100 BaseT UTP Ethernet,
Microsoft/Logitech Mouse,
2 serial ports; 1 parallel port,
104 Keys Keyboard.

Network
Network 10/100 BaseT UTP Switches

Communication and Internet
Leased Line 16 Mbps Connectivity.

Printers
HP LaserJet Printer

Additional Lab Equipment / Audio Visual Equipment
Sound cards,
Video cards,
Color Scanner,
Modem 56 KBPS,
Microphones,
Speakers,
Television Set,
Hi-Lumen OHPs,
Video Projection Unit (SVGA/XGA Compatible).

Common Software and Operating Environments
SCO Unix OR Fedora and Windows 2008 Server,
Windows IIS Server, etc.
Suitable CASE Tools
JDK 1.8, Java Web Server, Eclipse, JBoss,
Oracle 11g,
MS SQL Server 2008
Microsoft Office 2010
Python 3.3.4
MySQL, MongoDB and NOSQL
D3.js, npm packages for Node.js, Tableau software
VMware
Eclipse IDE Juno
Packages of Hadoop & Hadoop Distribution
Wireshark
Intel Parallel Studio XE
R Packages
MPI


EVALUATION METHODOLOGY

The evaluation process forms an important part of the course that leads to conferring the Diploma in Advanced Computing upon the eligible students.
The evaluation is a continuous process that goes on throughout the duration of the course. Normally, evaluation for each module is carried out as soon as the module ends, and the results for each module are announced within 15 days of the end of the module. The final result of the Diploma in Advanced Computing course is usually declared within 15 days of completing evaluation of the final module of the course.
The evaluation will consist of three components: a written test, a laboratory test and ongoing evaluation of lab assignments.

The weightage for each component will normally be:

Component | Weightage
Theory examination (CEE), conducted by C-DAC ACTS | 40%
Laboratory examination | 40%
Internal marks (lab assignments, surprise tests, viva, seminars, etc.) | 20%

There may be variation in these ratios for the following modules: Operating System Concepts, Software Engineering, and Data Communication and Networking.

A student will have to score a minimum of 40% marks in each component of the evaluation in order to successfully complete any module. A student will have to successfully complete all modules of the course to be eligible to receive the Diploma in Advanced Computing. The question papers for the theory as well as the laboratory examinations at all centres will be set by ACTS, Pune. The written and laboratory examinations will be evaluated locally by the centres according to guidelines and model answers provided by ACTS, Pune. The lab examination problems will also be provided by ACTS, Pune.
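
A minimal sketch of the module-completion rule, assuming the normal 40/40/20 weightage above and component scores expressed as percentages; the function name `module_result` is hypothetical. Note that the 40% minimum applies to each component separately, so a high weighted total does not compensate for a failed component. Integer weights are used so the weighted percentage is computed without floating-point rounding surprises.

```python
# Normal weightage in percent; the brochure notes it may vary for some modules.
WEIGHTS = {"theory": 40, "lab": 40, "internal": 20}
MIN_COMPONENT_PERCENT = 40


def module_result(percent_by_component):
    """Return (weighted module percentage, passed?) for one module."""
    weighted = sum(WEIGHTS[c] * percent_by_component[c] for c in WEIGHTS) / 100
    passed = all(percent_by_component[c] >= MIN_COMPONENT_PERCENT for c in WEIGHTS)
    return weighted, passed


print(module_result({"theory": 70, "lab": 55, "internal": 80}))  # (66.0, True)
print(module_result({"theory": 35, "lab": 90, "internal": 90}))  # (68.0, False)
```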



The student will be awarded a grade based on his/her aggregate score across all modules, as per the following scale:

Grade Percentage
A+ 85% and above
A 70-84.9 %
B 60-69.9 %
C 50-59.9 %
D 40-49.9 %
F Below 40%

A student who is absent for a test or is unable to clear any module at the first attempt may be allowed to appear for a re-examination at the discretion of the course coordinator. However, his/her score in the re-examination will be de-rated by 20%. Only one re-examination will be conducted.
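
The grade scale and the re-examination rule can be sketched as follows. Interpreting "de-rated by 20%" as keeping 80% of the re-examination score (multiplicative) is an assumption, as are the function names `grade` and `reexam_score`.

```python
# Lower bound of each grade band, highest first (from the grade table above).
GRADE_BANDS = [(85, "A+"), (70, "A"), (60, "B"), (50, "C"), (40, "D")]
REEXAM_DERATE_PERCENT = 20  # assumed to mean: keep 80% of the re-examination score


def grade(aggregate_percent):
    """Map an aggregate percentage to a letter grade."""
    for lower_bound, letter in GRADE_BANDS:
        if aggregate_percent >= lower_bound:
            return letter
    return "F"  # below 40%


def reexam_score(raw_percent):
    """Apply the re-examination de-rating (assumed multiplicative)."""
    return raw_percent * (100 - REEXAM_DERATE_PERCENT) / 100


print(grade(84.9), grade(85))  # A A+
print(reexam_score(60))        # 48.0
```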

A student has to successfully complete all the modules and clear both lab and theory exam in order to be eligible to receive the Diploma in Advanced Computing. Students unable to complete all the modules within the course duration will be awarded a certificate for the modules successfully cleared by him/her. No student will be allowed to appear for any module after completion of the course duration. Performance statements and certificates will be issued to all students by ACTS, Pune within 15 days of completing evaluation of the final module of the course.


Course Contents

Linux Programming

Installation (Ubuntu and CentOS), Basics of Linux, Configuring Linux, Shells, Commands, and Navigation, Common Text Editors, Administering Linux, Introduction to Users and Groups, Linux shell scripting, shell computing, Introduction to enterprise computing, Remote access.

Introduction to Cloud Computing

Cloud Computing Basics, Understanding Cloud Vendors (AWS/Azure/GCP), Definition, Characteristics, Components, Cloud Provider, SaaS, PaaS, IaaS and other Organizational scenarios of clouds, Administering & Monitoring cloud services, benefits and limitations, Deploying applications over the cloud, Comparison among SaaS, PaaS, IaaS, Cloud Products and Solutions, Cloud Pricing, Compute Products and Services, Elastic Compute Cloud (EC2), Dashboard.

Python Programming

Python basics, If, If- else, Nested if-else, Looping, For, While, Nested loops, Control Structure, Break, Continue, Pass, Strings and Tuples, Accessing Strings, Basic Operations, String slices, Working with Lists, Accessing list, Operations, Function and Methods, Files, Modules, Dictionaries, Functions and Functional Programming, Declaring and calling Functions, Declare, assign and retrieve values from Lists, Introducing Tuples, Accessing tuples, Visualizing using Matplotlib, Seaborn, OOPs concept, Class and object, Attributes, Inheritance, Overloading, Overriding, Data hiding, Operations Exception, Exception Handling, except clause, Try-finally clause, User Defined Exceptions, Data wrangling, Data cleaning.

R Programming

Reading and Getting Data into R, Exporting Data from R, Data Objects: Data Types & Data Structures, Viewing Named Objects, Structure of Data Items, Manipulating and Processing Data in R (Creating, Accessing, Sorting data frames, Extracting, Combining, Merging, reshaping data frames), Control Structures, Functions in R (numeric, character, statistical), working with objects, Viewing Objects within Objects, Constructing Data Objects, Packages – Tidyverse, dplyr, tidyr etc., Queuing Theory, Parametric and Non-parametric Tests: ANOVA, Chi-square, t-Test, U-Test, Interactive reporting with R Markdown, Introduction to R Shiny.

OOP Concepts, Data Types, Operators and Language Constructs, Inner Classes and Inheritance, Interfaces and Packages, Exceptions, Collections, Threads, java.lang, java.util, Java Virtual Machine, Reflection in JVM, JVM Architecture, Lambda Expressions, Functional Programming and Interfaces, Introduction to Streams, Introduction to the JDBC API.

Introduction to Business Analytics using case studies, Summary Statistics, Making Right Business Decisions based on data, Statistical Concepts, Descriptive Statistics and its measures, Probability theory, Probability Distributions (continuous and discrete: Normal, Binomial and Poisson distributions) and Data, Sampling and Estimation, Statistical Inference, Predictive modeling and analysis, Bayes’ Theorem, Central Limit Theorem, Data Exploration & preparation, Concepts of Correlation, Covariance, Outliers, Regression Analysis, Forecasting Techniques, Simulation and Risk Analysis, Optimization (Linear, Nonlinear, Integer), Overview of Factor Analysis, Directional Data Analytics, Functional Data Analysis, Predictive Modelling (From Correlation To Supervised Segmentation): Identifying Informative Attributes, Segmenting Data By Progressive Attribute Selection, Models, Induction And Prediction, Supervised Segmentation, Visualizing Segmentations, Trees As Sets Of Rules, Probability Estimation; Overfitting And Its Avoidance: Generalization, Holdout Evaluation Vs Cross-Validation; Decision Analytics: Evaluating Classifiers, Analytical Framework, Evaluation, Baseline, Performance And Implications For Investments In Data; Evidence And Probabilities, Explicit Evidence Combination With Bayes’ Rule, Probabilistic Reasoning, Business Strategy, Achieving Competitive Advantages, Sustaining Competitive Advantages.

Python Libraries

Pandas, NumPy, SciPy, Scrapy, Plotly, Beautiful Soup

Database Concepts (File System and DBMS), OLAP vs OLTP, Database Storage Structures (Tablespace, Control files, Data files), Structured and Unstructured data, SQL Commands (DDL, DML & DCL), Stored functions and procedures in SQL, Conditional Constructs in SQL, data collection, Designing Database schema, Normal Forms and ER Diagram, Relational Database modelling, Stored Procedures, Triggers, Tools and techniques for gathering data systematically, Data Warehousing concepts, NoSQL, Data Models - XML, working with MongoDB, Cassandra - overview, architecture, comparison with MongoDB, working with Cassandra, Connecting DBs with Python, Introduction to Data Driven Decisions, Enterprise Data Management, data preparation and cleaning techniques.

Introduction to Big Data

Beyond the Hype, Big Data Skills and Sources of Big Data, Big Data Adoption, Research and Changing Nature of Data Repositories, Data Sharing and Reuse Practices and Their Implications for Repository Data Curation.

Hadoop

Introduction to Big Data programming: Hadoop, The ecosystem and stack, The Hadoop Distributed File System (HDFS), Components of Hadoop, Design of HDFS, Java interfaces to HDFS, Architecture overview, Development Environment, Hadoop distribution and basic commands, Eclipse development, The HDFS command line and web interfaces, The HDFS Java API (lab), Analyzing the Data with Hadoop, Scaling Out, Hadoop event stream processing, complex event processing, MapReduce Introduction, Developing a MapReduce Application, How MapReduce Works, Anatomy of a MapReduce Job Run, Failures, Job Scheduling, Shuffle and Sort, Task Execution, MapReduce Types and Formats, MapReduce Features, Real-World MapReduce.
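
The map, shuffle-and-sort and reduce phases listed above can be illustrated with the classic word-count example. The sketch below simulates the three phases in a single Python process purely to show how data flows between them; it does not use Hadoop's API (real Hadoop jobs are typically written against the Java MapReduce API or run through Hadoop Streaming), and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_and_sort(pairs):
    """Group values by key and order the keys, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(key, values):
    """Reducer: sum the counts emitted for one word."""
    return key, sum(values)


def word_count(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return [reduce_phase(key, values) for key, values in shuffle_and_sort(mapped)]


print(word_count(["big data big ideas", "data pipelines"]))
# [('big', 2), ('data', 2), ('ideas', 1), ('pipelines', 1)]
```

In a real cluster, each call to `map_phase` would run on a different node holding part of the input, and the shuffle would move all pairs with the same key to the same reducer over the network.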

Hadoop Environment

Setting up a Hadoop Cluster, Cluster specification, Cluster Setup and Installation, Hadoop Configuration, Security in Hadoop, Administering Hadoop, HDFS – Monitoring & Maintenance, Hadoop benchmarks.

Apache Airflow

Introduction to Data warehousing and Data lakes, Designing Data warehousing for an ETL Data Pipeline, Designing Data Lakes for ETL Data Pipeline, ETL vs ELT.

Introduction to HIVE

Programming with Hive: Data warehouse system for Hadoop, Optimizing with Combiners and Partitioners (lab), Bucketing, more common algorithms: sorting, indexing and searching (lab), Relational manipulation: map-side and reduce-side joins (lab), evolution, purpose and use, Case Studies on Ingestion and Warehousing.

HBase

Overview, comparison and architecture, java client API, CRUD operations and security

Apache Spark APIs for large-scale data processing:

Overview, Linking with Spark, Initializing Spark, Resilient Distributed Datasets (RDDs), External Datasets, RDD Operations, Passing Functions to Spark, Job optimization, Working with Key-Value Pairs, Shuffle operations, RDD Persistence, Removing Data, Shared Variables, EDA using PySpark, Deploying to a Cluster, Spark Streaming, Spark MLlib and ML APIs, Spark DataFrames/Spark SQL, Integration of Spark and Kafka, Setting up Kafka Producer and Consumer, Kafka Connect API, MapReduce, Connecting DBs with Spark.

Business Intelligence: requirements, content and management, Information Visualization, Data Analytics Life Cycle, Analytic Processes and Tools, Analysis vs. Reporting, MS Excel: Functions, Formulas, Charts, Pivots and Lookups, Analysis ToolPak: Descriptive Summaries, Correlation, Regression, Introduction to Power BI, Modern Data Analytic Tools, Visualization Techniques.

Supervised and Unsupervised Learning, Uses of Machine Learning, Clustering, K-means, Hierarchical Clustering, Decision Trees, Classification problems, Bayesian analysis and Naïve Bayes classifier, Random Forest, Gradient Boosting Machines, Association rules learning, PCA, Apriori, Support Vector Machines, Linear and Non-linear classification, ARIMA, XGBoost, CatBoost, Neural Networks and their applications, TensorFlow 2.x framework, Deep learning algorithms, KNN, NLP, BERT in NLP, NLP transformers, NLTK, Introduction to the PyTorch framework, AI and its applications.
