Big Data Analytics Using NoSQL and Hadoop: Big Data Analytics Module B

Course name: Big Data Analytics Using NoSQL and Hadoop (Big-Data-B)

Duration: 5 days

Course objectives:

– Handle large datasets that exceed the storage and processing capacity of a single machine
– Build a cluster for analyzing the required data using Hadoop as a service on the cloud
– Understand and analyze big data using technology that ships programs to where the data is stored, avoiding the need to move large datasets for processing
– Work with unstructured data such as text files, semi-structured data such as CSV and JSON, and structured data such as data from relational databases
– Understand why NoSQL is needed for certain types of data
– Handle missing data with appropriate methods
– Transform different data types so they can be analyzed numerically without distorting their original meaning
– Detect anomalies in data or in customer behavior

Case-study data

– Bank customer transaction data
– Customer census (demographic) data
– Customer website access history
– Credit card holder data
– Real data from other sources related to the training topics, such as diabetes patient treatment records, used as examples for analyzing missing data and data transformation

Notes

– All customer data is realistic simulated data and cannot be used to identify any individual
– The analysis uses two kinds of tools: general-purpose data tools and tools built specifically for big data

Who should attend

– Data analytics teams
– Database Staff / IT Staff / IT Developer / Big Data Developer

Prerequisites

At least 1 year of experience working with data

Instructor

Dr. Praisan Padungweang (Profile)

* Participants must bring their own computers.

Course Outline

Day 1 / 5
Time	Title
9.00 – 10.30	Introduction to Hadoop – Hadoop cluster – Hadoop Distributed File System (HDFS) – MapReduce and YARN
10.30 – 10.45	Break
10.45 – 12.00	Comparison of Hadoop software distribution products – Comparison of hardware for the Hadoop ecosystem – LAB: Hadoop as a service on cloud providers
12.00 – 13.00	Lunch
13.00 – 14.30	MapReduce framework – LAB: Hadoop MapReduce programming
14.30 – 14.45	Break
14.45 – 16.30	LAB: Bank customer transaction statistics using Spark’s map and reduce API
Day 2 / 5
Time	Title
9.00 – 10.30	Introduction to Spark – Overview and concepts – Spark architecture – Spark Core – Spark’s APIs for operating on large datasets: Resilient Distributed Dataset (RDD) and DataFrame
10.30 – 10.45	Break
10.45 – 12.00	LAB: Spark operations: transformations, actions, and caching
12.00 – 13.00	Lunch
13.00 – 14.30	LAB: Web application log analytics using the Spark DataFrame API and SQL
14.30 – 14.45	Break
14.45 – 16.30	LAB: Web application log analytics using the Spark DataFrame API and SQL (cont.)
Day 3 / 5
Time	Title
9.00 – 10.30	Introduction to NoSQL – What is NoSQL? – NoSQL architecture – What makes NoSQL different – Advantages and disadvantages of NoSQL databases
10.30 – 10.45	Break
10.45 – 12.00	Introduction to NoSQL (cont.) – NoSQL vs. relational databases – Types of NoSQL datastores
12.00 – 13.00	Lunch
13.00 – 14.30	HBase: the NoSQL column-family big database – Introduction to HBase – HBase architecture – HBase data model
14.30 – 14.45	Break
14.45 – 16.30	HBase shell and general commands – LAB: Storing NoSQL data in an HBase table
Day 4 / 5
Time	Title
9.00 – 10.30	Big Data Analytics using Hive – Hive data model and managing tables – Hive data types – Partitioning and bucketing
10.30 – 10.45	Break
10.45 – 12.00	LAB: Hive queries: create, alter, and drop tables
12.00 – 13.00	Lunch
13.00 – 14.30	LAB: Hive external tables vs. Hive managed tables
14.30 – 14.45	Break
14.45 – 16.30	LAB: Working with semi-structured data from various sources using Hive. Use cases: logs of users accessing a website (text files); customer transactions (from an RDBMS and CSV files)
Day 5 / 5
Time	Title
9.00 – 10.30	Data sampling and preprocessing – Working with tabular data in Spark DataFrames – Types of data and attribute transformation – Missing values
10.30 – 10.45	Break
10.45 – 12.00	LAB: Missing values and attribute transformation on small and big data (use case: credit card applications and related data)
12.00 – 13.00	Lunch
13.00 – 14.30	Outlier/anomaly detection – Useful data visualizations
14.30 – 14.45	Break
14.45 – 16.30	LAB: Customer anomaly detection and visualization on small and big data

Download the training schedule

Notes

– The schedule may be adjusted as appropriate.
– Every hands-on exercise comes with examples, and instructors are available to give guidance throughout the workshop.

More information: e-mail sales@rdbi.co.th or call 064-798-4192