DISA

Centre for Data Intensive Sciences and Applications

New PhD course in Statistical Learning, 7.5 credits

2020-01-21

The main objective of this course is to give an introduction to modern statistical methods for modeling and prediction of data. After successfully completing the course, the student is expected to be able to

  • Demonstrate a conceptual understanding of the following fields in statistics: classification, resampling methods, linear model selection and regularization, and unsupervised learning.
  • Apply modern statistical software for classification, resampling methods, linear model selection and regularization, and unsupervised learning.

The course contains

  • Linear regression: simple and multiple linear regression, assessing the accuracy of the coefficients and of the model, and comparison with K-nearest neighbors
  • Classification: logistic regression, linear discriminant analysis, K-nearest neighbors
  • Resampling methods: cross-validation, bootstrap
  • Linear model selection and regularization: subset selection, shrinkage methods, dimension reduction methods, considerations in high dimensions
  • Unsupervised learning: principal component analysis, clustering methods
  • Writing and presenting a report in which real data are analyzed with appropriate statistical approaches from the relevant statistical field

Type of Instruction
Teaching consists of lectures, presentations, laboratory work, and tutoring.

Examination
The course is assessed with the grades A, B, C, D, E, Fx or F. A is the highest grade and the remaining grades follow in descending order, with E the lowest grade that results in a pass. The grade F means that the student’s performance is assessed as a fail. The student’s knowledge is assessed in the form of

  • Graded conceptual assignments (3 credits), grades A to F
  • Graded computer assignments (3 credits), grades A to F
  • Presentation of the use of statistics in the student’s research or, alternatively, presentation of a statistical topic not covered in this course (1.5 credits), grading scale U-G.

Required reading
G. James, D. Witten, T. Hastie, R. Tibshirani, An Introduction to Statistical Learning: with Applications in R, Springer, latest edition.
T. Hastie, R. Tibshirani, J. Friedman, The Elements of Statistical Learning, Springer, latest edition.

Timetable
The course will start on Friday, March 27, and finish around the end of May or the beginning of June. If convenient for participants, we suggest meeting weekly on Fridays at, say, 13:15.

Registration
Please complete your registration no later than March 17 to make planning easier. Register here: https://forms.gle/a3zzgG5toouMFqYb7

Prerequisites
1MA501 Probability Theory and Statistics, 7.5 credits, or an equivalent course in mathematics, mathematical statistics, or statistics.

If you have any questions, please contact Roger Pettersson (https://lnu.se/personal/roger.pettersson/).

Seminar: “A Few Notes on Artificial Intelligence and Database Technology” – Thursday Jan. 23 at 13-14

Title: A Few Notes on Artificial Intelligence and Database Technology

Place: D1173
Time: 13-14
Date: January 23, 2020

Abstract: In this seminar I would like to clarify the importance of the scientific theory of functional object-types for approaching reality. This theory helps to evaluate knowledge representation formalisms, deep learning, data modelling and data transformations. Further, the theory, together with mathematical logic, is the foundation for the Match™ Technology Ecosystem, which enables organisations to model, build no-code applications and simplify IT architectures.

About: Dr. Larry Lucardie has a background in Artificial Intelligence and Semantic Database modelling. At the Technical University of Eindhoven he graduated on a theory of complexity, functional classifications, which is fundamental to knowledge representation, deep learning and data modelling. Larry is the main architect of the Match™ AI & Data Technology Platform, which is aligned to functional classifications. The Match™ platform enables organisations to design ISO compliant models of enterprise content, business fluid no-code applications, simplified IT architectures and smart internet portals.

As a professor at Uppsala University in Sweden, Larry lectured on logic programming, knowledge and data modelling, and e-business, and supervised PhD students. He is the founder and current CEO of Knowledge Values and as such is involved in improvement projects in the areas of value chain and process re-engineering and of process-underlying technology such as application development, IT architectures and data processing. Business areas: regulation management and compliance, e-commerce, financial processes products, incident management, compliance and Brexit.

Warm welcome,
Arianit Kurti, Associate Professor, Department of Computer Science and Media Technology

Keynote: Machine Learning for better entertainment recommendations: A Nordic perspective

2019-11-22

During this year's Big Data Conference at Linnaeus University on December 5-6, 2019, we have several very interesting keynote speakers. One of them is Antonina Danylenko, Head of Applied Machine Learning at the Nordic Entertainment Group, which offers video-on-demand streaming, linear TV channels and radio broadcasting and is probably best known for its Viaplay, Viafree & Viasat platforms.

She will talk about how the entertainment industry is transforming at a rapid rate. This is driven by new trends, growing customer expectations and AI technologies allowing for more innovation, disruption and opportunities for growth. At the same time, the industry is getting increasingly crowded – as the use of streaming services is on the rise, and the Nordic region spends more time online than ever before. Nearly four out of ten people watch video content on a daily basis, with three-quarters of the 16-24 year-old age bracket streaming that content from subscription-based services. We are seeing a new phenomenon emerge known as ‘stacking’ behaviour – where households typically subscribe to more than one service, just to keep their options open when it comes to deciding what to watch. With so many options out there, people can be paralysed by what’s known as the ‘paradox of choice.’ Personalising every aspect of the customer journey has become our main focus in the recommendation space, as consumers of entertainment have never been more spoilt for choice. Serving up relevant content recommendations at the right time is key to making the decision process as easy as possible. However, building and maintaining the lifecycle of recommender systems to capture customers’ behavior and use different algorithms to guide them towards something they will enjoy watching is not easy. In this presentation, I will outline the end-to-end process of building a recommender system utilising Big Data and Machine Learning to address this challenge.

Don’t miss out on the opportunity to listen to her and take part in the conference by signing up here by November 25th.

More about Antonina Danylenko: she holds a PhD in Computer Science from Linnaeus University, Sweden, where she wrote a dissertation on the topic of “Decision Algebra: A General Approach to Learning and Using Classifiers”. After several years working at IKEA within the Solution Architecture and Data Science domains, she joined the Nordic Entertainment Group, where she is now the Head of Applied Machine Learning. The Nordic Entertainment Group offers video-on-demand streaming, linear TV channels and radio broadcasting – probably best known for their Viaplay, Viafree & Viasat platforms. They’re responsible for connecting over 1.4 million subscribers to the content they love, with more than 1900 employees across the Nordics and the UK.

Meet Keynote speaker: Flaminio Squazzoni, University of Milan

During this year's Big Data Conference at Linnaeus University on December 5-6, 2019, we have several very interesting keynote speakers. One of them is Flaminio Squazzoni, University of Milan, Italy. He will give a talk titled “When ready-made data must be tailored and repurposed. The challenge of creating big confidential dataset in science in a public-private partnership”.

Research on science relies on available data. However, while we have plenty of data on publications and citations, which help measure the prestige of scientists and their institutions, we lack data on internal processes of peer review at journals and funding agencies.

These data are crucial to understand whether allocation of resources and merit in science is biased and assess if science is still a cooperative, civilized game between disinterested experts or a corrupted race towards hyper-competition and the ‘publish-or-perish’ mentality. In this talk, I will share my experience as leader of a large-scale European project that developed a protocol for data sharing of journal data with a group of publishers representing the vast majority of the current scholarly communication market. This experience testifies to the nexus of technological, legal and organisational aspects involved in data sharing between stakeholders, the power of hybridization of data sharing models and the beauty of the digital age. And it tells you that science is not corrupted!

Don’t miss out on the opportunity to listen to him and take part in the conference by signing up here by November 25th.

More information about Flaminio Squazzoni: he is full professor of Sociology at the University of Milan, Department of Social and Political Sciences, where he teaches Behavioural Sociology. He is the head of BEHAVE (www.behavelab.org), editor of JASSS (Journal of Artificial Societies and Social Simulation), co-editor of Sociologica (International Journal for Sociological Debate) and member of the editorial boards of Research Integrity and Peer Review, Sistemi Intelligenti and Socio-Cognitive Systems. He is advisory editor of the Wiley Series in Computational and Quantitative Social Science and the Springer Series in Computational Social Science. He is former President of the European Social Simulation Association (Sept 2012/Sept 2016) and former Director of the NASP ESLS PhD Programme in Economic Sociology and Labour Studies (2015-2016). E-mail: flaminio.squazzoni@unimi.it

Keynote: The false truth about everybody being data-driven

2019-11-21

During this year's Big Data Conference at Linnaeus University on December 5-6, 2019, we have several very interesting keynote speakers. One of them is Tobias Wagenknecht, Head of Data & Analytics at Aftonbladet.

He will talk about how everybody is stressing out and feeling the urgency to become data-driven. Established businesses disappear, and unicorns disrupt the market and question well-established workflows. There is a hysteria about the need to change and to do it all at once across each and every business area. This presentation is supposed to put things into perspective. I will speak about my own mistakes and how the general perception of everybody else succeeding tricks us into feeling bad. In the end you will realise that you are not alone and that changes take time – no matter how fast paced we have become.

Don’t miss out on the opportunity to listen to him and take part in the conference by signing up here by November 25th.

More information about Tobias Wagenknecht: Born in Germany, raised in Spain, migrated to Sweden in 2011 – I consider myself a European data-nerd, who loves the beauty of numbers and charts as much as the satisfaction of being able to come up with an actionable decision instead of just another report. I spent almost half of my life within travel & hospitality and learned a lot about the eternal struggle of making a conservative industry more data-driven. It is a story about many failures, learnings and iterations – so let’s have a talk and then try again!

Keynote: Open Science with the European Open Science Cloud

2019-11-20

During this year's Big Data Conference at Linnaeus University on December 5-6, 2019, we have several very interesting keynote speakers. One of them is Gergely Sipos, who works as Customer and Technical Outreach Manager for the EGI Foundation. He will give a talk about Open Science with the European Open Science Cloud.

Don’t miss out on the opportunity to listen to him and take part in the conference by signing up here by November 25th.

In recent years, the vision of Open Science has emerged as a new paradigm for transparent, data-driven science capable of accelerating competitiveness and innovation. The embodiment of this vision in Europe is the European Open Science Cloud (EOSC). This presentation will introduce the EOSC initiative and its current implementation from the EOSC-hub and other projects, and will show how EOSC can already facilitate Open Science. EOSC-hub is a 33 million euro project that started in January 2018 with the involvement of over 100 institutes. EOSC-hub defines, creates and operates the integration and management system of the EOSC. This integration and management system (the Hub) builds on mature processes, policies and tools from the leading European e-infrastructures to cover the whole life-cycle of services from planning to delivery. Through this management system, online and ‘human’ services, software and data are delivered to researchers via the EOSC Portal and its Marketplace. The Portal already includes over 100 services from 3 e-infrastructure communities (EGI, EUDAT, INDIGO-DataCloud), and from over 20 Research Infrastructures and scientific service providers. The catalogue of services is expected to grow radically in the coming years. The Hub acts as a single contact point for researchers and innovators to discover, access, use and reuse a broad spectrum of services, from baseline infrastructure services (such as HTC clusters, IaaS clouds, storage, security) to domain-specific applications, datasets and portals.

You can also meet Gergely Sipos during the tutorial session about Open Science with Jupyter, Zenodo and Binder on December 4th.

More information about Gergely Sipos: he coordinates EGI’s engagement programme and supports researcher communities and educators from academia and industry in tackling big-data and big-compute challenges using state-of-the-art services from the EGI community. Gergely holds an MSc and a PhD in computer science and project management from the University of Miskolc, Hungary. He became involved in grid computing in 2002 and researched high-level user environments and collaborative design tools. Prior to EGI, Gergely worked in training, consultancy and user support for the EGEE project from his base in Budapest, where he promoted grid technology and distributed computing practices to scientific communities.

New course: Digital Humanities Research Methods (7.5 credits)

2019-10-08

The course “Digital Humanities Research Methods” is given at Linnaeus University, Sweden, online, in English, from 30 March 2020 to 3 May 2020, and is free of charge for EU citizens.

The aim of this course is to learn about digital research methods to address research questions from the humanities. The course gives an overview of the impact of digitization on the way research is conducted, an insight into a range of different digital methods, as well as an awareness of difficulties related to the methodology. The deadline to apply is 15 October.

For more information about the course and how to apply see: https://lnu.se/en/course/digital-humanities-research-methods/distance-international-autumn/