THE DATA SCIENCE PROGRAM FOR 2021 INCLUDES ONLINE AND FACE-TO-FACE COURSES.
The Barcelona GSE Data Science Summer School introduces participants to some of the tools and methods of Data Science.
Course list for 2021
Week 0 (June 21-25, 2021)
One week before the rest of the Barcelona GSE Summer Schools begin
- Foundations of Data Science Online
Instructor: Joan Verdú (Barcelona GSE Data Science Center)
Week 1 (June 28 - July 2, 2021)
No Data Science courses offered during Week 1
Week 2 (July 5-9, 2021)
- Foundations of Data Science FACE-TO-FACE
Instructor: Joan Verdú (Barcelona GSE Data Science Center)
- Using Text as Data: Methods and Applications FACE-TO-FACE
Instructors: Hannes Mueller (IAE-CSIC and Barcelona GSE) and Rubén Durante (ICREA-UPF and Barcelona GSE)
Week 3 (July 12-16, 2021)
- Deep Learning and Applications FACE-TO-FACE
Instructors: Vicenç Gómez (UPF Artificial Intelligence and Machine Learning group) and Anders Jonsson (UPF Artificial Intelligence and Machine Learning group)
Program director
Apply for Summer School
Applications will open in February 2021
Early-bird payment deadline: May 14, 2021
Fees and discounts
Fees vary by course. You may be eligible for one or more available Summer School discounts. Our staff can provide a personalized quote for you.
Deep Learning and Applications
Overview and Objectives
Data Science Summer School Week 1 and 2 participants are expected to have a basic knowledge of linear algebra, basic computing skills, and familiarity with any kind of programming language.
However, Deep Learning and Applications in Week 3 is taught at a higher level. Participants must know the fundamentals of programming and data analysis with Python, as well as fundamental concepts in statistical learning: broadly, the type of material covered in Week 1 of the Data Science Summer School (Foundations of Data Science).
Course Outline
In this course we will introduce several aspects of modern machine learning, deep learning and its applications:
- An overall introduction to Deep Learning covering the fundamental architectures, such as convolutional and recurrent neural networks and auto-encoders, as well as the basic optimization algorithms for training deep models.
- Three individual, largely practical, sessions which cover different paradigms or application domains:
2.1. Computer Vision, with focus on advanced convolutional network architectures and generative models for tasks such as image classification, object detection or image captioning.
2.2. Natural Language Processing, with focus on several aspects of language modeling, word and document embeddings, conversational models and visualization.
2.3. Deep Reinforcement Learning, with focus on how deep learning can be used in sequential decision making problems such as board/video-games.
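As a rough illustration of the "basic optimization algorithms for training deep models" mentioned in the outline, the sketch below runs stochastic gradient descent on a single sigmoid unit using only the Python standard library. The toy data (logical OR), the learning rate and the number of epochs are assumptions chosen for illustration; this is not course material, and real deep models would use a framework such as PyTorch.

```python
import math

# Hypothetical toy data: learn the logical OR of two binary inputs.
data = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0), ((1.0, 0.0), 1.0), ((1.0, 1.0), 1.0)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    """Output of a single sigmoid unit with weights w and bias b."""
    return sigmoid(w[0] * x[0] + w[1] * x[1] + b)

def loss(w, b):
    """Mean binary cross-entropy over the toy data."""
    total = 0.0
    for x, y in data:
        p = predict(w, b, x)
        total -= y * math.log(p) + (1.0 - y) * math.log(1.0 - p)
    return total / len(data)

w, b = [0.0, 0.0], 0.0
learning_rate = 0.5  # assumed value, not tuned
initial_loss = loss(w, b)

for epoch in range(200):
    for x, y in data:
        # For a sigmoid output with cross-entropy loss, the gradient with
        # respect to the pre-activation is (prediction - target).
        error = predict(w, b, x) - y
        w[0] -= learning_rate * error * x[0]
        w[1] -= learning_rate * error * x[1]
        b -= learning_rate * error

final_loss = loss(w, b)
print(f"loss before: {initial_loss:.3f}, after: {final_loss:.3f}")
```

Deep networks stack many such units and obtain the gradients by backpropagation, but the update rule (step against the gradient, one example or mini-batch at a time) has the same shape.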
Candidates for this course who are not also taking a Week 1 course must upload a statement of purpose when applying, explaining their qualifications for the course. The statement should include any relevant information about academic knowledge, personal skills and interests, and professional experience in the sector, if any. Candidates who have attended the Data Science Summer School in the past should mention this in the statement.
About the Instructors
Vicenç Gómez is currently a tenure-track faculty member under a Ramón y Cajal fellowship in the Artificial Intelligence and Machine Learning group at UPF, which he joined with a transnational academic career grant (FP7 Marie Curie Actions) in 2014. Prior to this, he worked for more than six years as a research scientist in the machine learning group at the Radboud University Nijmegen, the Netherlands. His main research interests are probabilistic inference and reinforcement learning. He works on developing novel machine learning methods derived from first principles and understanding their theoretical properties, as well as their application for modeling, understanding and improving the functioning of networked systems.
Anders Jonsson is professor in the ICT Department of UPF, and director of the Artificial Intelligence and Machine Learning group. He received his MSc in Engineering Physics from the Royal Institute of Technology in Stockholm, Sweden, and his PhD in Computer Science from the University of Massachusetts Amherst, USA. His research work is mainly focused on sequential decision problems, either in the form of reinforcement learning, or in the form of automatic planning. In particular, his work focuses on finding and exploiting the structure of these types of problems in order to simplify their solution. He has participated in numerous international projects.
Reading Material:
1. Deep Learning for Coders with fastai and PyTorch: AI Applications Without a PhD. Jeremy Howard, Sylvain Gugger. Publisher: O'Reilly Media
2. Representation Learning: A Review and New Perspectives. Yoshua Bengio, Aaron Courville, Pascal Vincent
3. Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding. Song Han, Huizi Mao, William J. Dally
4. Distilling the Knowledge in a Neural Network. Geoffrey Hinton, Oriol Vinyals, Jeff Dean
Vicenç Gómez
Ramón y Cajal Fellow, UPF Artificial Intelligence and Machine Learning Group
Foundations of Data Science
Prerequisites to Enroll
Although not mandatory, some knowledge of Python, Jupyter notebooks, and algebra is recommended.
Students must have their own laptop or desktop computer and a good internet connection. The latest version of Anaconda with Python and R should be installed, following instructions that will be sent to participants. Check the hardware requirements for running Anaconda beforehand.
Notebooks can also be run using Google Colaboratory.
Further instructions will be sent prior to the start of the course.
Overview and Objectives
This is an intensive 20-hour course based on a hands-on approach using Jupyter notebooks. All material is motivated by specific information retrieval and data analysis questions, and each thematic unit concludes with a small project. The course provides basic training in data analysis and machine learning with Python and R.
The course will be delivered by Joan Verdú, Head of Consulting and Knowledge Transfer of the BGSE Data Science Center, in collaboration with Data Scientists affiliated to the Data Science Center (as TAs).
Course Outline
Classes last 5 days, 4 hours per day
The course provides training in data analysis and machine learning with Python and R and is organized into the following thematic units:
1. Programming with Python. Main topics:
- Intro to Jupyter Notebooks
- Loops, control flow
- Lists, maps, reductions
- Functions and classes
- Inputs and Outputs
Keywords: data types, functions, objects
2. Data analysis with Python
- Inputs and Outputs
- Series and DataFrame
- Group, apply, combine
- Merge and concat
- Non-rectangular data
Keywords: pandas, database management
3. Data visualization
- Elements of data visualization
- Scatter plots
- Line plots
- Exploration plots: barplots, boxplots
- Advanced plots: correlation, regression, biplots
- Special plots
- Reporting using visualization
Keywords: seaborn, plotly
4. Data preparation
- Handling missing data: imputation methods
- Feature transformation and engineering: normalization, dimensionality reduction, category encoding
Keywords: sklearn
5. Supervised learning
5.1 Linear models for regression
- Linear models and non-linear feature maps
- Model evaluation
- Convexity
- Bias-Variance tradeoff
- Penalized likelihood and lasso
- Cross validation and model selection
5.2. Linear models for classification
- Logistic regression
- Misclassification, ROC, AUC
- Class imbalance
- Generative vs. discriminative models
5.3. Non-linear models: decision trees
- Decision trees
- Variable selection
- Forests
- Bagging and boosting
Keywords: sklearn, linear models, cross validation, regularisation, lasso, trees, ensembles, boosting
6. Unsupervised learning
- Continuous latent variables
- PCA and SVD
- Multidimensional Scaling
- Clustering
Keywords: clustering, factors, independent component analysis
7. Intro to data science in R
7.1. R basic programming
- Loops, control flow
- Lists, vectors, matrices, dataframes
- Functions
7.2. Data exploration and visualization in R
- Scatter plots
- Line plots
- Exploration plots
- Advanced plots
7.3. Sample data preparation, supervised and unsupervised learning with R
- Data preparation: missing data imputation, feature transformation
- Supervised learning: linear, non-linear
- Unsupervised learning: PCA, clustering
Keywords: tidyr, data.table, caret, ggplot2, plotly
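To make the "Cross validation and model selection" topic in unit 5.1 more concrete, here is a minimal sketch of k-fold cross-validation used to choose a penalty strength. It is an illustration under stated assumptions, not course material: the data are made up, and it uses a one-parameter ridge penalty (which has a closed-form solution) rather than the lasso named in the outline. In the course itself this would typically be done with sklearn.

```python
# Hypothetical toy data: y is roughly 2 * x with small fixed perturbations.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]

def fit_ridge(x, y, lam):
    """Closed-form slope of a no-intercept ridge regression with penalty lam."""
    return sum(a * b for a, b in zip(x, y)) / (sum(a * a for a in x) + lam)

def mse(x, y, w):
    """Mean squared error of the prediction w * x."""
    return sum((b - w * a) ** 2 for a, b in zip(x, y)) / len(x)

def kfold_cv_error(x, y, lam, k=3):
    """Average held-out MSE over k folds built by striding through the data."""
    errors = []
    for fold in range(k):
        test_idx = set(range(fold, len(x), k))
        x_train = [a for i, a in enumerate(x) if i not in test_idx]
        y_train = [b for i, b in enumerate(y) if i not in test_idx]
        x_test = [a for i, a in enumerate(x) if i in test_idx]
        y_test = [b for i, b in enumerate(y) if i in test_idx]
        w = fit_ridge(x_train, y_train, lam)
        errors.append(mse(x_test, y_test, w))
    return sum(errors) / k

# Candidate penalties are arbitrary illustrative values.
candidate_lams = [0.0, 0.1, 1.0, 10.0]
cv_errors = {lam: kfold_cv_error(xs, ys, lam) for lam in candidate_lams}
best_lam = min(cv_errors, key=cv_errors.get)
print("selected penalty:", best_lam)
```

The key point is that each candidate penalty is scored only on data the model did not see during fitting, so the selected value reflects out-of-sample error rather than training fit.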
Required Activities:
Class participation is compulsory. During classes there will be short projects that students are expected to complete in class.
Students will have access to videos covering most of the topics, in which the notebooks and materials are explained. Students are expected to watch them in advance, so that the online sessions can move faster and focus on Q&A and on solving the short exercises embedded in the notebooks.
At the end of each module (5 of them, one per day) there will be a project assignment, to be done individually, with some support from TAs (Teaching Assistants).
About the Instructors
The course will be delivered by Joan Verdú, Head of Consulting and Knowledge Transfer of the Data Science Center. His background combines academia, research, consultancy and engineering. Joan is an Agronomic and Forest Engineer and received his PhD in Environmental Sciences from the University of Lleida. After the PhD, he spent some years in the public sector as a planning engineer at the Catalan Water Agency.
Joan Verdú
Barcelona GSE Data Science Center
Using Text as Data: Methods and Applications
Overview and Objectives
An ever-increasing share of human communication is recorded as digital text. Analyzing and making sense of this vast amount of data is increasingly important for research in the social sciences.
This course provides an accelerated introduction to the theory and practice of text analysis by surveying methods for systematically extracting quantitative information from text, from classical content analysis and dictionary-based methods, to classification methods, scaling methods, and topic models.
The course introduces the theoretical foundations for text analysis but mainly takes a practical approach, illustrating the methods through state-of-the-art applications to research questions in economics, political science, and finance. Lectures will cover a series of case-studies from economics and related fields, and will be complemented by hands-on programming sessions in Python.
By the end of the course students will be able to:
i) Convert vast archives of text into a format that can be used for data analysis
ii) Use the data generated from text to tackle research and policy problems relevant to their interests and organizations.
Course schedule
1. From text to data
- Introduction to documents, metadata, corpora
- Word counts, document-feature matrix
- Collocation and n-grams
2. Important notions
- TF-IDF
- Measuring text length, diversity, and complexity
- Measuring similarity between documents
3. Machine Learning
- Introduction to statistical learning theory
- Supervised and unsupervised learning
4. Statistical methods
- Dictionary-based methods
- Penalized linear models
- Dimension reduction and feature selection
- Non-Linear text regression
- Random forests
5. Generative language models: unsupervised methods
- Latent semantic analysis
- Topic models and Latent Dirichlet Allocation (LDA)
- K-means clustering
6. Word Embeddings
- Word2Vec
- Doc2Vec
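The first two units of the schedule (word counts, the document-feature matrix, TF-IDF and document similarity) can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions: the three-sentence corpus is invented, tokenization is a plain whitespace split, and the IDF uses the unsmoothed log(N / df) variant, whereas libraries often apply smoothing.

```python
import math
from collections import Counter

# Hypothetical toy corpus; the course's actual datasets are not specified here.
docs = [
    "the central bank raised interest rates",
    "the bank cut interest rates again",
    "protests erupted after the election",
]

tokenized = [doc.split() for doc in docs]
vocab = sorted({tok for doc in tokenized for tok in doc})

# Document-feature matrix: one row per document, one column per vocabulary term.
counts = [Counter(doc) for doc in tokenized]
dfm = [[c[term] for term in vocab] for c in counts]

# Inverse document frequency, plain log(N / df) variant (an assumption).
n_docs = len(docs)
df = {term: sum(1 for c in counts if term in c) for term in vocab}
idf = {term: math.log(n_docs / df[term]) for term in vocab}

# TF-IDF weights: raw term counts scaled by each term's IDF.
tfidf = [[row[j] * idf[term] for j, term in enumerate(vocab)] for row in dfm]

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

sim_01 = cosine(tfidf[0], tfidf[1])  # two documents about banks and rates
sim_02 = cosine(tfidf[0], tfidf[2])  # documents sharing only the word "the"
print(sim_01 > sim_02)
```

Because "the" appears in every document, its IDF is zero and it contributes nothing to similarity, which is why the first and third documents end up with no measured overlap despite sharing a word.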
About the instructors
Hannes Mueller is a tenured researcher at the Institute for Economic Analysis (IAE-CSIC) and an Associate Research Professor at the Barcelona GSE. His fields of interest are Political Economy, Development Economics and Conflict Studies with a particular focus on the effect of violent conflict on the economy. Most recently, Prof. Mueller has been working to adopt supervised and unsupervised machine learning techniques for economics and political science research. He has published in leading journals in Economics and Political Science such as the American Economic Review (AER), the American Political Science Review (APSR), the Journal of the European Economic Association (JEEA) and the American Economic Journal: Macroeconomics (AEJ: Macro). He has contributed reports for the International Growth Centre (UK government) and the World Bank on the economic effects of conflict, a joint UN/World Bank study on conflict prevention and the UN Economic Commission for Africa on structural change in Northern Africa. He is currently involved in projects with the Banco de España developing techniques for nowcasting and forecasting economic conditions with text.
Ruben Durante is ICREA Research Professor at UPF and Affiliated Professor of the Barcelona GSE. He works in the field of political economics, with a focus on the functioning and impact of traditional and new media in democratic societies. His work has been published in a number of top economic journals, including the Journal of Political Economy, the American Economic Journal: Applied Economics, and the Journal of the European Economic Association, and has been featured extensively in the popular press.
Main readings:
- Bengfort, B., Ojeda, T., Bilbro, R., Applied Text Analysis with Python, 2018, O’Reilly Media.
- Gentzkow, M., Kelly, B. T., and Taddy, M., 2017, Text as Data. NBER Working Paper #23276.
- Grimmer, J. and Stewart, B., 2013, Text as Data: The Promise and Pitfalls of Automatic Content Analysis Methods for Political Texts, Political Analysis, vol. 21, n. 3, pp. 267-297.
- Krippendorff, K., 2013, Content Analysis: An Introduction to Its Methodology, Sage.
- Manning, C. D., Raghavan, P., and Schütze, H., 2008, An Introduction to Information Retrieval, Cambridge University Press.
Applications:
- Ash, E., Chen, D., and Naidu, S., 2018, Ideas Have Consequences: The Impact of Law and Economics on American Justice, Working Paper.
- Baker, S. R., Bloom, N., and Davis, S. J., 2016, Measuring Economic Policy Uncertainty, Quarterly Journal of Economics, vol. 131, n.4, pp. 1593-1636.
- Bandiera, O., Prat, A., Hansen, S., and Sadun, R., 2020, CEO Behavior and Firm Performance, Journal of Political Economy, vol. 128, n. 4, pp. 1325-1369.
- Blei, D., Ng, A., and Jordan, M., 2003, Latent Dirichlet Allocation, Journal of Machine Learning Research, vol. 3, pp. 993–1022.
- Blei, D., and Lafferty, J., 2006, Dynamic Topic Models, In Proceedings of the 23rd International Conference on Machine Learning.
- Cage, J., Hervé, N., and Viaud, M.-L., The Production of Information in an Online World, forthcoming, Review of Economic Studies.
- Garg, N., Schiebinger, L., Jurafsky, D., and Zou, J., 2018, Word embeddings quantify 100 years of gender and ethnic stereotypes, Proceedings of the National Academy of Sciences, vol. 115, n. 16, pp. 3635-3644.
- Gentzkow, M. and Shapiro, J., 2010, What Drives Media Slant?, Econometrica, vol. 78, n. 1, pp. 35-71.
- Hassan, T., Hollander, S., van Lent, L., and Tahoun, A., 2018, Firm-Level Political Risk: Measurement and Effects, Working Paper.
- Hansen, S., McMahon, M., and Prat, A., 2018, Transparency and Deliberation within the FOMC: a Computational Linguistics Approach, Quarterly Journal of Economics, vol. 133, n. 2, pp. 801-870.
- Kelly, B. T., Papanikolaou, D., Seru, A., and Taddy, M., Measuring Technological Innovation Over the Long Run, forthcoming, American Economic Review: Insights.
- Loughran, T. and McDonald, B., 2011, When is a Liability not a Liability? Textual Analysis, Dictionaries, and 10-Ks, The Journal of Finance, vol. 66, n. 1, pp. 35-65.
- Tetlock, P. C., 2007, Giving Content to Investor Sentiment: The Role of Media in the Stock Market, Journal of Finance, vol. 62, n. 3, pp. 1139-1168.
Laptop required for face-to-face practical courses
Practical sessions for face-to-face courses will be held in a lecture room, not in a computer lab. Participants must bring a laptop in order to follow these sessions.
Entry requirements
Applicants to all Summer School programs should meet the basic entry requirements. In addition, Data Science participants are expected to have a basic knowledge of linear algebra, basic computing skills, and familiarity with any kind of programming language (not necessarily R or Python).
Certificate of attendance
At the conclusion of the Summer Schools, participants will receive a Certificate of Attendance stating the courses and number of hours completed. All Barcelona GSE courses require an average of twice the lecture hours for readings, pre-readings and class preparation. Interested students should check with their universities to see if these hours are transferable into ECTS credits.
Fees
Multiple course discounts are available. Fees for courses in other Summer School programs may vary.
Course | Modality | Hours | Regular Fee | Reduced Fee* |
---|---|---|---|---|
Deep Learning and Applications | Face-to-face | 20 | 2200€ | 1320€ |
Foundations of Data Science (Week 0 - Online) | Online | 20 | 2000€ | 1200€ |
Foundations of Data Science (Week 2 - Face-to-face) | Face-to-face | 20 | 2200€ | 1320€ |
Using Text as Data: Methods and Applications | Face-to-face | 20 | 2200€ | 1320€ |
* Reduced Fee applies for PhD or Master's students, Alumni of Barcelona GSE Master's programs, and participants who are unemployed.
See more information about available discounts or request a personalized discount quote by email.
Course schedule
The schedule is designed to allow students to participate in all courses in the Data Science program. Courses can also be taken individually or in combination with courses in other Barcelona GSE Summer School programs, schedule permitting.
Week 0 (June 21-25, 2021)
Day / Time | Mon | Tue | Wed | Thu | Fri |
---|---|---|---|---|---|
15:00 - 17:00 | Foundations of Data Science - Online Edition | ||||
17:00 - 17:15 | Foundations of Data Science - Online Edition (Networking breaks) | ||||
17:30 - 19:30 | Foundations of Data Science - Online Edition | ||||
19:30 - 20:30 | Happy Hour |
There are no Data Science courses scheduled for Week 1 (June 28 - July 2, 2021).
Week 2 (July 5-9, 2021)
Day / Time | Mon | Tue | Wed | Thu | Fri |
---|---|---|---|---|---|
9:00 - 11:00 | Foundations of Data Science - Face-to-face Edition | ||||
11:30 - 13:30 | Using Text as Data: Methods and Applications | ||||
14:00 - 16:00 | Foundations of Data Science - Face-to-face Edition | ||||
16:00 - 18:00 | Using Text as Data: Methods and Applications |
Week 3 (July 12-16, 2021)
Day / Time | Mon | Tue | Wed | Thu | Fri |
---|---|---|---|---|---|
9:00 - 11:00 | Deep Learning and Applications | ||||
11:00 - 14:00 | |||||
14:00 - 16:00 | Deep Learning and Applications |
Mix and match your summer courses!
Remember that you can combine Data Science courses with courses in other programs happening during Week 1, Week 2, and Week 3 (schedule permitting).