Feifei Li
Associate Professor, School of Computing, University of Utah; currently an associate editor for ACM TODS and IEEE TKDE
Biography:
Feifei Li is currently an associate professor at the School of Computing, University of Utah. He obtained his Bachelor's degree from Nanyang Technological University (transferred from Tsinghua University) in 2001 and his PhD from Boston University in 2007. His research focuses on improving the scalability, efficiency, and effectiveness of data analytics and large-scale data management systems. He also works on data security problems in these systems. He received an NSF CAREER Award in 2011, two HP IRP awards in 2011 and 2012 respectively, a Google App Engine award in 2013, the IEEE ICDE best paper award in 2004, the IEEE ICDE 10+ Years Most Influential Paper Award in 2014, a Google Faculty award in 2015, the SIGMOD Best Demonstration Award at SIGMOD 2015, the SIGMOD 2016 Best Paper Award, a SIGMOD Research Highlight Award in 2017, and a VISA research faculty award in 2017. His service includes demo PC co-chair for SIGMOD 2018, member of the SIGMOD Jim Gray Dissertation Award selection committee in 2017, member of the CIKM 2017 best paper award selection committee, PC area chair for SIGMOD 2015 and ICDE 2014, demo PC co-chair for VLDB 2014, and general co-chair for SIGMOD 2014. He currently serves as an associate editor for both ACM TODS and IEEE TKDE.
Topic:
Towards Building Interactive and Online Analytical Systems
Abstract:
Supporting interactive queries and analytics over large data is a critical requirement in many data-driven applications. The classic external-memory model, which is built around I/O optimizations, no longer works well in the era of big data because of its high latency. Instead, newer systems (e.g., Spark, Impala) rely on in-memory computing over a cluster of commodity machines to offer scale-out interactive data analytics. In the context of large spatio-temporal data, this talk presents the Simba system, which offers scalable and efficient in-memory analytics over a cluster. Simba extends the Spark SQL engine to support rich query and analytical semantics (e.g., spatial join, kNN join, trajectories) through both SQL and the DataFrame API. It also includes a query optimizer that leverages Simba's indexing support and geometry-aware optimizations. Furthermore, the system provides online analytics that explore the accuracy-efficiency tradeoff through novel online aggregation techniques supporting complex multi-way join queries and random sampling over joins. Lastly, we will present ongoing extensions to Simba that explore spatio-temporal learning and sentiment analysis over large data.
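To make the accuracy-efficiency tradeoff behind online aggregation concrete, here is a minimal single-machine sketch in plain Python. It is not Simba's implementation or API, and it covers only a simple mean over one table rather than joins. The function name `online_mean` and its parameters are hypothetical. The sketch reads rows in random order, keeps a running mean and variance, and stops once an approximate 95% confidence interval (normal approximation) is narrow enough.

```python
import math
import random


def online_mean(values, target_half_width, batch_size=100, z=1.96, seed=0):
    """Estimate the mean of `values` from a growing random sample.

    Returns (estimate, half_width, rows_sampled). Sampling stops as soon as
    the approximate confidence half-width drops to `target_half_width`,
    or when every row has been read (in which case the answer is exact).
    All names here are illustrative assumptions, not part of any real system.
    """
    rng = random.Random(seed)
    order = list(range(len(values)))
    rng.shuffle(order)  # a random permutation yields sampling without replacement

    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations (Welford's method)
    half_width = float("inf")

    for idx in order:
        x = values[idx]
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)

        # Re-check the interval only once per batch to keep the loop cheap.
        if n % batch_size == 0 and n > 1:
            std_err = math.sqrt(m2 / (n - 1) / n)
            half_width = z * std_err
            if half_width <= target_half_width:
                break

    if n == len(values):
        half_width = 0.0  # full scan: the mean is exact, no sampling error
    return mean, half_width, n


if __name__ == "__main__":
    data = [float(i % 100) for i in range(100_000)]
    estimate, half_width, rows = online_mean(data, target_half_width=1.0)
    print(f"estimate={estimate:.2f} +/- {half_width:.2f} after {rows} rows")
```

A looser `target_half_width` returns sooner with a wider interval, while a tighter one reads more rows. That is the tradeoff an online aggregation engine exposes to the user. Extending the idea to multi-way joins requires sampling from the join result itself, which this sketch does not attempt.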