New generation integration (NiFi, Kylo) and Spark SQL internals
Dear DataKRKers,
We will soon be hosting another event, with two great presentations confirmed:
- New generation data integration tools: NiFi and Kylo
Abstract:
Many enterprise organizations lack the expertise to make the transition from traditional data warehousing strategies to operationalized big data. To assist with this issue, companies have started using a multitude of newer generation big data integration tools. In this session, we will explore two such tools: NiFi and Kylo. Apache NiFi comes from the NSA project NiagaraFiles, and supports powerful and scalable directed graphs of data routing, transformation, and system mediation logic — essentially, flow-based programming for big data. Kylo is a data lake management software platform for self-service data ingest and data preparation with integrated metadata management, governance, security and best practices inspired by Think Big’s 150+ big data implementation projects.
Bio:
Nicholas Fish works with Think Big Analytics, a Teradata Company, helping companies gain insights about their businesses from the many datasets they have accrued over time. When he’s not wrangling data in Hive, he can be found in Copenhagen, Denmark, usually either riding his bike or walking his dog Bobbie.
- Spark SQL internals, debugging and optimization
Abstract:
In recent years, Apache Spark has received a lot of hype in the Big Data community. It is seen as a silver bullet for all problems related to gathering, processing and analysing massive datasets. Due to its rapid evolution (do not forget that Spark is one of the most active open source projects), some of the ideas behind it remain unclear and require digging through various blog posts and presentations. During this talk we will dive into the internals of Spark SQL, look at how our queries are translated into the actual code executed on the nodes, and find different ways to debug and optimize them.
Bio:
Mikołaj Kromka is a software engineer and Spark trainer at VirtusLab, focused on finding new connections between Scala, Functional Programming and Big Data. In his spare time he likes to analyse complex networks, take photos and explore Cracow's museums.