Introduction to Big Data Management
BASIC DATA
course listing
A - main register
course code
IXX9108
course title in Estonian
Sissejuhatus suurandmete haldamisse
course title in English
Introduction to Big Data Management
course volume CP
-
ECTS credits
6.00
to be declared
yes
fully online course
yes
assessment form
Pass/fail assessment
teaching semester
autumn - spring
language of instruction
Estonian
English
Study programmes that contain the course
code of the study programme version
course compulsory
IAXD22/22
no
Structural units teaching the course
IT - Department of Software Science
VERSION SPECIFIC DATA
course aims in Estonian
Õppeaine põhieesmärk on tutvustada doktorandile kõige uuemaid ja võimsamaid infotehnoloogiaid, mida kasutatakse suurandmete töötlemiseks, salvestamiseks ning analüüsiks.
course aims in English
The key objective of this course is to familiarize Ph.D. students with the newest and most powerful information technologies used for processing, storing, and analyzing big data.
learning outcomes in the course in Est.
Aine läbinud üliõpilane:
- selgitab suurandmete olemust ja peamisi suurandmete töötlemise tööriistu;
- kirjeldab Sparki mitmekihilist keskkonda;
- paigaldab Sparki klastri (paigaldades ja seadistades sõlmed, seadistades MESOS-i);
- kasutab struktureeritud andmeid Spark SQLi abil;
- töötleb jooksvalt sissetulevaid andmeid;
- kohaldab suurandmete ennustavat analüüsi tegeliku elu kasutusstsenaariumites.
learning outcomes in the course in Eng.
Upon successful completion of this course, the student:
- identifies Big Data challenges and recognizes the main Big Data tools and frameworks;
- describes the multi-layer ecosystem of Spark;
- sets up and configures a Spark cluster (installing and configuring nodes, configuring Mesos);
- leverages structured data with Spark SQL;
- processes streaming data as it arrives;
- applies predictive analytics on big data in real-life use cases.
brief description of the course in Estonian
Oleme tunnistajaks sotsiaalmeedia plahvatuslikule kasvule ning sotsiaalse ja majandustegevuse kõigi aspektide arvutipõhiseks muutumisele, mistõttu kasvavad andmemahud kiiremini kui töötlemiskiirus. See on kaasa toonud suure hulga peamiselt struktureerimata andmete loomise: ajaveebid, videod, kõnesalvestused, fotod, e-kirjad, säutsud, kui nimetada vaid mõnda.
Õppeaine põhieesmärk on tutvustada doktorandile kõige uuemaid ja võimsamaid infotehnoloogiaid, mida kasutatakse suurandmete töötlemiseks, salvestamiseks ja analüüsiks. Kõige suurem väljakutse doktorantidele oleks näha oma uurimisküsimusi uues valguses suurandmete haldamise vaatenurgast.
Selles loengus saate teada, kuidas koguda, säilitada ja töödelda suuri ja heterogeenseid andmevorminguid, kasutades suurandmete raamistikku Spark, et luua infosüsteemi integreeritud töötlusahelaid. Kuna andmeid saab salvestada kettale või edastada voona, saate mudeli koostamiseks ja uute andmete hõlpsaks klassifitseerimiseks rakendada masinõppemudeleid.
Loengud on mitmekeelsed, st doktorant saab kasutada keelt, mida valdab kõige paremini (Java, Python, Scala).
Kava:
1. Suurandmete tutvustus
2. Core Spark
3. Spark SQL and Data Frames
4. Striimingu analüütika, kasutades Kafka ja Sparki Striimingut
5. SparkML – sissejuhatus masinõppe vahenditesse
brief description of the course in English
Data volumes are growing faster than processing speeds, driven by the explosion of social media and the computerization of every aspect of social and economic activity. This has led to the creation of overwhelming volumes of primarily unstructured data: weblogs, videos, speech recordings, photographs, e-mails, and tweets, to name but a few.

The key objective of this course is to familiarize Ph.D. students with the newest and most powerful information technologies used for processing, storing, and analyzing big data. The main challenge for Ph.D. students is to re-examine their own research questions from a big data management point of view.

In this course, you will learn how to collect, store, and process large volumes of data in heterogeneous formats using the big data framework Spark, and how to set up processing chains integrated into an information system. Whether the data is stored on disk or arrives as a stream, you will be able to apply machine learning tools to build a model and classify new data easily.

The lectures are multilingual with respect to programming languages, i.e., each Ph.D. student can use the language he or she knows best (Java, Python, or Scala).
Outline:

1. Big Data introduction
2. Core Spark: RDD transformations and actions
3. Spark SQL and Data Frames
4. Streaming analytics using Kafka and Spark Streaming
5. SparkML: Introduction to machine learning tools
type of assessment in Estonian
Arvestuse saamiseks esitavad doktorandid sooritatud praktiliste ülesannete kohta aruanded.
type of assessment in English
To pass the course, doctoral students submit reports on the hands-on activities they have completed.
independent study in Estonian
-
independent study in English
-
study literature
-
study forms and load
daytime study: weekly hours
4.0
lectures
2.0
practices
2.0
exercises
0.0
session-based study work load (in a semester):
lectures
-
practices
-
exercises
-
lecturer in charge
-
LECTURER SYLLABUS INFO
semester of studies
teaching lecturer / unit
language of instruction
Extended syllabus
2022/2023 spring
Sadok Ben Yahia, IT - Department of Software Science
English
    2021/2022 spring
    Sadok Ben Yahia, IT - Department of Software Science
    English