In dynamic and rapidly changing labour markets, the identification of skills needs is an important challenge. Imbalances on the labour market, reflected by difficulties faced by businesses in sourcing the skills they need, high incidences of skills mismatch, and significant unemployment or underemployment especially among youth, are common to most countries, independently of their level of economic development. In order to tackle these issues, policy-makers, employers, workers, providers of education and training and students all need timely and accurate information about demand for skills on the labour market and how it relates to skills supply.
This publication collects the contributions presented at the ILO workshop “Can we use big data for skills anticipation and matching?”, which took place on 19–20 September 2019 at ILO headquarters in Geneva, Switzerland. Discussions during the workshop considered the feasibility of using big data in the context of skills anticipation and matching, and both the potential and the limitations of big data in skills analysis.
First-generation applications of “big data” techniques to skills analysis have focused principally on analysis of large-scale collections of vacancies data, either scraped from multiple online sources or provided directly by jobs websites or employment agencies. Assembly, coding and cleaning of these data sets, and subsequent analysis and interpretation of the data, present significant challenges, but this type of application is progressing towards maturity. There are multiple competing vendors, covering a range of countries with different languages, providing services commercially to customers such as universities and employers. The technology is being adopted by multiple public labour market information (LMI) and skills anticipation systems to complement their existing methodologies and information products.
Information gained from applying big data techniques to large-scale vacancies data is a higher-resolution, bigger-data version of the analyses of print advertisements that were used for content analysis in some LMI systems in the past. It is not a direct substitute for the core data sources for LMI and skills anticipation, such as labour force surveys (LFS). Even when the challenges presented by the data sources are resolved to the greatest extent feasible, the nature of the information provided is different from that provided by regular statistics.
For instance, a household survey conducted through a statistically well-founded sampling process provides a detailed snapshot of employment in an economy for the time at which the survey is conducted. It can also be used to derive information on change, both at aggregate level by comparing successive iterations of the survey, and at micro-level, through recording responses to questions on how the employment characteristics of individuals in each household survey have changed in the preceding month or year. The main proxy indicators that provide insights into workers’ skills are those on occupation, sector of employment and qualifications. Age also provides a proxy for years of experience.
By contrast, vacancies data provide indicators on skills requirements associated with a combination of new jobs and churn in existing jobs. The indicators are imperfect in ways that are discussed in the contributions to this report, but at their best they can provide information on advertised vacancies sorted by occupations and qualifications, coded similarly to regular statistics, taking into account standard classifications such as ISCO (SOC), ISIC and ISCED. As standard national statistics rarely provide data on specific salient skills available and required, big data in online job vacancies (OJV) becomes a unique complementary source to fill out the up-to-date picture with information on specific job tasks, skills, qualifications and certifications valued, and experience demanded. Such analyses are already proving valuable to private users of OJV big data, and collections of indicators combining different information sources seem likely to become a regular feature of national LMI and skills anticipation systems as they integrate OJV into their processes and products.
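To make the idea of occupation-coded skill indicators more concrete, the following is a minimal sketch in Python, not a method taken from the report. It assumes vacancy adverts have already been cleaned and coded, each with an occupation code and a list of extracted skill terms. The records, field names (`isco`, `skills`) and the function `skills_by_occupation` are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical, hand-made records for illustration only. Real OJV data
# would come from a scraped or vendor-supplied collection that has
# already been cleaned and coded to an occupational classification.
vacancies = [
    {"isco": "2512", "skills": ["python", "sql"]},
    {"isco": "2512", "skills": ["python", "git"]},
    {"isco": "5223", "skills": ["customer service"]},
]


def skills_by_occupation(records):
    """Count, per occupation code, how many adverts mention each skill."""
    counts = defaultdict(Counter)
    for rec in records:
        # set() ensures a skill is counted at most once per advert
        counts[rec["isco"]].update(set(rec["skills"]))
    return counts


result = skills_by_occupation(vacancies)
for code, skill_counts in sorted(result.items()):
    print(code, skill_counts.most_common())
```

Counts of this kind describe advertised demand only. They would normally be read alongside regular statistics rather than as a substitute for them.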
In developing economies, especially in low-income countries, LMI on skills proxies such as occupation and level of education, let alone on specific skills and competencies, is often missing. Usually, such information is collected by qualitative and semi-qualitative methods, such as occupational research, expert committees or surveys among employers and workers. Such approaches may be rather costly, especially if the information collected is to provide the level of granularity needed for analysis capable of informing the design of education and training. In low-income countries, regular statistical sources of information, such as LFS, may not be conducted frequently or regularly enough. The use of real-time big data analytics may potentially resolve this problem.
However, access to the Internet and levels of digital literacy are important limitations in developing countries. Furthermore, given the large size of the informal economy in these countries, only a small portion of actual vacancies are covered by online job platforms, where it is very likely that only the formal sector is represented. These considerations indicate significant challenges facing the use of OJV data, particularly in developing countries. Therefore, once the focus shifts to developing countries, it is important to keep these limitations in mind and treat data analysis as valid in the specific context only.
Because of how job advertisements are written, the information on skills requirements available for extraction is better for technology jobs and high-skill white-collar jobs, thinner for mid-level jobs, and often marginal for low-level jobs. Vacancies not advertised online remain unobserved, typically meaning that lower-skill vacancies and vacancies in less developed economies are under-represented, and that vacancies intended to be filled through internal organizational movements are largely absent. That is why, at least for the time being, big data is best used when analysed in relation to what it represents – specific occupations and sectors, mostly digitally intensive and more highly skilled.
Up to now, the main focus of skills-related big data activity has been on OJV. Circumstances have been favourable for this. The principal source of vacancies data, in the form of online job advertisements, is accessible without substantial restrictions or fees. Because the data do not relate to individuals, and are already published online, constraints associated with individual and commercial privacy have so far been minimal, though they are likely to surface and require attention in the future. There is a commercial market in industrialized countries for vacancies analysis services – for example, for course planning and careers services at universities, for recruitment intelligence in major businesses and for analysis of financial instruments – so there is already a business model that does not rely on public policy applications.
However, a wide range of other types of big and “small” data resources on skills exists, going far beyond vacancies data, which could potentially be combined for analysis, subject to the satisfactory management of issues around access, privacy, individual identifiers and data protection. These may be summarized as follows:
- Public administrative microdata: Administrative data drawn from across ministries and other public organizations is increasingly used by national statistical offices to complement or replace survey data in the preparation of both statistics for publication and customized analyses in support of policy formation. Individual ministries and their agencies have access to their own administrative data, and may also negotiate access to administrative data held by other ministries for policy analysis. Most if not all countries have large holdings of public administrative data relevant to skills.
- National statistical office microdata: National statistical offices have huge microdata holdings, including data relevant to skills from statistical surveys, such as LFS, census of population and other household surveys; the US Occupational Requirements Survey; Eurostat’s Adult Education Survey and Continuing Vocational Training Survey.
- Large-scale skills survey and skills measurement microdata: In many countries, public and private organizations beyond national statistical offices undertake substantial enterprise skills surveys, some of them as one-off events, others repeated periodically. Various others provide services such as large-scale analysis of CVs on behalf of recruiting companies, or large-scale measurement of skills for purposes of issuing certifications.
In this report, a number of studies and approaches have been presented, showing the multitude of contexts in which job vacancy big data can be used to gain a better understanding of the underlying dynamics of the labour market, in particular on the demand side. The studies range from more technical ones, focusing on the actual extraction and management of OJV and how these data can be made amenable to analysis, to national and cross-country studies that combine big data analytics with other types of regular LMI. While the majority of experience so far comes from advanced economies, the studies of India, Myanmar, Latin America and the Caribbean show that real-time big data may open up new ways of better understanding the dynamics in emerging and developing economies and may compensate, to some extent at least, for the absence of standard statistics. The future will show to what extent big data analytics will be mainstreamed into national LMI systems, replacing or complementing other sources of information, and whether it will allow less developed countries that do not have established data systems to leapfrog into effective skills needs anticipation and matching.
Chosen excerpts by Job Market Monitor. Read the whole story @ The feasibility of using big data in anticipating and matching skills needs