How is the sample selected?
The sample for the survey is drawn in two stages, with municipalities and households as, respectively, the primary and secondary sampling units. Before the primary units are selected, they are stratified by region and population size.
Within each stratum, the municipalities in which interviews are conducted are selected as follows: all municipalities with a population of more than 40,000 and all those with panel households are included (self-representing municipalities), while the smaller towns are selected with probability proportional to size. The individual households to be interviewed are then selected randomly from the population register.
To increase design efficiency, since the 2020 edition the second-stage units have been drawn after stratifying households by income and debt (second-stage unit stratification).
For more information see Methodological Notes for the Survey on Household Income and Wealth.
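The first-stage selection described above can be sketched roughly as follows. This is only an illustration, not the Bank of Italy's procedure: it ignores stratification and panel households, uses systematic sampling as one common way of drawing with probability proportional to size, and all function names and municipality figures are hypothetical.

```python
import random

def pps_systematic(sizes, n, rng):
    """Draw n units with probability proportional to size (systematic PPS).

    sizes: dict mapping unit name -> size measure (e.g. population).
    Assumes every size is smaller than the sampling interval total/n,
    so no unit can be drawn twice.
    """
    total = sum(sizes.values())
    step = total / n                    # sampling interval
    start = rng.random() * step         # random start in [0, step)
    points = [start + k * step for k in range(n)]
    chosen, cumulative, i = [], 0.0, 0
    for name, size in sizes.items():
        cumulative += size
        # A unit is selected when a sampling point falls in its size range.
        while i < n and points[i] < cumulative:
            chosen.append(name)
            i += 1
    return chosen

def select_municipalities(populations, n_small, rng, threshold=40_000):
    """Take large municipalities with certainty, sample the rest by PPS."""
    certain = [m for m, p in populations.items() if p > threshold]
    small = {m: p for m, p in populations.items() if p <= threshold}
    return certain, pps_systematic(small, n_small, rng)

# Hypothetical stratum: one large municipality and five small towns.
towns = {"A": 5000, "B": 12000, "C": 8000, "D": 15000, "E": 10000, "X": 120000}
certain, sampled = select_municipalities(towns, 2, random.Random(0))
```

Here `X` is always included because it exceeds the threshold, while a town such as `D` is three times as likely to be drawn as `A`.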
What are the terms and conditions for using the data?
Anonymized survey data are distributed for research purposes.
The author takes full responsibility for the use of the data and the Bank of Italy is not involved in any way. Authors who refer in their publications to results obtained using this survey data are required to cite the source (Bank of Italy, Survey on Household Income and Wealth) and to indicate the version of the historical or annual archive used.
In order to keep the bibliography of the survey up to date, the author must send the complete reference of his/her research work based on the data to firstname.lastname@example.org.
Which types of database are available?
Two types of database are available. The annual database contains information relating to the surveys starting from the 1989 wave. The historical database contains information relating to surveys from the 1977 wave to the most recent. Both databases are available as comma-separated ASCII files (CSV) and in compressed SAS and STATA (version 7 or higher) formats, and are accompanied by explanatory notes on their contents (methodologies, classifications, definitions).
Visit the page Distribution of the microdata to access the databases.
What is the Household Finance and Consumption Survey (HFCS)?
The Household Finance and Consumption Survey (HFCS) is a harmonized sample survey on euro-area households' wealth, income and consumption carried out by the national central banks.
The Italian data are drawn from the Survey on Household Income and Wealth and have been suitably harmonized (e.g. incomes are reported as gross values, including taxes and social security contributions).
The national central banks have followed a methodology that is as homogeneous as possible and have used harmonized definitions to gather data on households' balance sheets, both stocks and flows, with a special focus on the wealth components.
The full documentation is available on the project website.
How can I access the Household Finance and Consumption Survey (HFCS) data?
The microdata, in anonymous form, are available on request to researchers for academic and research purposes. Instructions on requesting access to the data can be found on the ECB website. The microdata for Italy and the related documentation are available on the Eurosystem Survey data (HFCS) page.
Who is the reference person of the household (RP)?
The reference person (RP) is the person primarily responsible for or most knowledgeable about the household budget. The RP, once identified, provides the interviewer with all the information about the household and about its individual members if they are absent. In individual data files, the NORD variable indicates the member to whom the information refers. The NORD variable is always set to '1' for the reference person.
How are households defined in this survey?
The term 'household' as used in the survey refers to all persons that normally reside in the same dwelling on 31 December of the year to which the survey refers and that contributed at least part of their income to the household. It also includes any members temporarily absent (e.g. on vacation, away for study, etc.) and any non-relatives living permanently in the home at 31 December of the reference year. Therefore it does not include children born in the year following the reference year, students living away from home, or people who lived in the household dwelling for only part of the year.
How can I link data from households interviewed in previous surveys (panel households)?
Panel households have the same household ID (NQUEST) as in the previous surveys.
The household members can be linked through the variable NORDP (in the CARCOM file) which indicates the household member ID (NORD) that they had in the previous survey.
However, the same household members may not be found from one survey to the next: those who have died or have left the household for other reasons may no longer be present, while new members may be added through births or other arrivals that occur during the period between surveys. These cases are noted in a specific section of the questionnaire. In the CARCOM file, members who have left the household are simply absent in the subsequent survey (i.e. they can be detected by difference), while new members are present but are given a NORDP that indicates missing data ('.').
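The linking rules above can be sketched as follows. This is a minimal illustration that assumes the CARCOM rows of two consecutive surveys have already been loaded as lists of dictionaries, with a missing NORDP ('.') represented as `None`; the function name and the sample rows are hypothetical.

```python
def link_members(previous, current):
    """Match members of panel households across two consecutive surveys.

    previous: rows with keys NQUEST and NORD.
    current:  rows with keys NQUEST, NORD and NORDP (None when missing).
    Returns (linked, new_members, leavers), where linked maps the
    (NQUEST, NORD) key in the previous survey to the key in the current one.
    """
    prev_keys = {(r["NQUEST"], r["NORD"]) for r in previous}
    # Panel households are those present in both surveys.
    panel = {r["NQUEST"] for r in previous} & {r["NQUEST"] for r in current}
    linked, new_members = {}, []
    for r in current:
        if r["NQUEST"] not in panel:
            continue  # household not interviewed in the previous survey
        old_key = (r["NQUEST"], r["NORDP"])
        if old_key in prev_keys:
            linked[old_key] = (r["NQUEST"], r["NORD"])
        else:
            new_members.append((r["NQUEST"], r["NORD"]))
    # Leavers are detected by difference: previous members never matched.
    leavers = sorted(k for k in prev_keys if k[0] in panel and k not in linked)
    return linked, new_members, leavers

previous = [
    {"NQUEST": 10, "NORD": 1}, {"NQUEST": 10, "NORD": 2},
    {"NQUEST": 10, "NORD": 3}, {"NQUEST": 20, "NORD": 1},
]
current = [
    {"NQUEST": 10, "NORD": 1, "NORDP": 1},
    {"NQUEST": 10, "NORD": 2, "NORDP": 3},
    {"NQUEST": 10, "NORD": 3, "NORDP": None},   # new member
    {"NQUEST": 30, "NORD": 1, "NORDP": None},   # non-panel household
]
linked, new_members, leavers = link_members(previous, current)
```

In this example the member who was number 3 in the previous survey is number 2 in the current one, the previous number 2 has left the household, and the current number 3 is a new arrival.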
What does 'net wealth' mean?
The Survey on Household Income and Wealth (SHIW) was launched to gather data on the income and wealth of Italian households. All components of the two aggregates are net values, recorded after tax.
Net wealth (W) is defined as the difference between households' total assets and their total liabilities, aggregated as indicated below. For further details refer to the Methodological Notes for the Survey on Household Income and Wealth.
- Real assets (real estate, business equity, valuables);
- Financial assets (deposits, government and other securities, trade credit or credit due from other households);
- Financial liabilities (liabilities to banks and financial companies, trade debt and liabilities to other households).
The W variable is available in the RICFAMxx file of the annual database, where xx are the last digits of the year to which the survey refers, and in the RICF file of the historical database.
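As a small worked illustration of the definition above, net wealth is the sum of real and financial assets minus financial liabilities. The component keys and amounts below are hypothetical and are not SHIW variable names.

```python
def net_wealth(real_assets, financial_assets, financial_liabilities):
    """Net wealth = total assets - total liabilities.

    Each argument is a dict of component amounts for one household.
    """
    total_assets = sum(real_assets.values()) + sum(financial_assets.values())
    total_liabilities = sum(financial_liabilities.values())
    return total_assets - total_liabilities

# Hypothetical household, amounts in euros.
w = net_wealth(
    real_assets={"real_estate": 200_000, "business_equity": 0, "valuables": 5_000},
    financial_assets={"deposits": 20_000, "securities": 10_000},
    financial_liabilities={"bank_debt": 80_000, "debt_to_households": 0},
)
```

A household whose liabilities exceed its assets has negative net wealth under this definition.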
What does 'net income' mean?
The Survey on Household Income and Wealth (SHIW) was launched to gather data on the income and wealth of Italian households. All components of the two aggregates are net values, recorded after tax.
Net disposable income (Y) is defined as the sum of the after-tax incomes of all members of the household (i.e. wages and salaries, retirement income and self-employment income), plus any transfer payments and income from property and financial assets. For further details refer to the Methodological Notes for the Survey on Household Income and Wealth.
The Y variable is available in the RFAMxx and RISFAMxx files of the annual databases, where xx denotes the last two digits of the survey year, and in the RFAM and CONS files of the historical database.
Similarly, net income data for each member of the household are available in the RPERxx files of the annual databases and in the RPER file of the historical database. Property income is assigned entirely to the reference person (NORD = 1).
Have missing data been imputed?
The questionnaire variables included in the annual and historical databases do not contain imputed values. Missing data ('don't know', 'no answer', 'not applicable') are denoted by '.'.
Imputed values are used only to compute aggregated variables, such as household income and wealth, where data are missing.
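When reading the CSV files, the '.' marker therefore needs to be treated as missing rather than as a number. A minimal sketch, assuming a comma-separated file with a header row; the column name `VAR1` and the sample content are hypothetical.

```python
import csv
import io

def parse_cell(text):
    """Map '.' (or an empty cell) to None; convert numbers to float."""
    text = (text or "").strip()
    if text in ("", "."):
        return None
    try:
        return float(text)
    except ValueError:
        return text  # keep non-numeric codes as strings

def read_rows(handle):
    """Read a CSV file object into a list of dicts with missing values as None."""
    return [{k: parse_cell(v) for k, v in row.items()}
            for row in csv.DictReader(handle)]

# Hypothetical extract: household 2 did not answer VAR1.
sample = io.StringIO("NQUEST,VAR1\n1,100\n2,.\n")
rows = read_rows(sample)
```

Keeping missing values as `None` makes it harder to include them in sums or means by accident.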
What are sampling weights and how do I use them?
The sampling weight is a weighting coefficient assigned to each household interviewed that makes it possible to obtain unbiased estimates of the phenomena of interest.
This weight is obtained through the following steps:
- an initial weight is computed as the inverse of selection probability (design weight);
- this weight is then adjusted for unit nonresponse by multiplying the design weight by the inverse of the response rate for the municipality;
- last, the weight is calibrated to account for additional social and demographic information coming from the Italian National Institute of Statistics (Istat) - i.e. population distribution by gender, age group, geographical area, size of the municipality of residence.
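The three steps above can be sketched as follows. This is a simplified illustration, not the actual SHIW procedure: the calibration step is replaced by a simple post-stratification on a single grouping variable, and the field names (`p_select`, `municipality`, `group`) and all figures are hypothetical.

```python
def build_weights(units, response_rate, population_totals):
    """Compute final weights for a list of responding households.

    units: dicts with 'p_select' (selection probability),
           'municipality' and 'group' (calibration cell).
    response_rate: municipality -> share of sampled households responding.
    population_totals: group -> known number of households in the population.
    """
    adjusted = []
    for u in units:
        design = 1.0 / u["p_select"]                       # step 1: design weight
        adjusted.append(design / response_rate[u["municipality"]])  # step 2
    # Step 3 (simplified): rescale so that the weighted count in each
    # group matches the external population total.
    group_sum = {}
    for u, w in zip(units, adjusted):
        group_sum[u["group"]] = group_sum.get(u["group"], 0.0) + w
    return [w * population_totals[u["group"]] / group_sum[u["group"]]
            for u, w in zip(units, adjusted)]

units = [
    {"p_select": 0.01, "municipality": "A", "group": "north"},
    {"p_select": 0.02, "municipality": "A", "group": "north"},
    {"p_select": 0.01, "municipality": "B", "group": "south"},
]
weights = build_weights(
    units,
    response_rate={"A": 0.5, "B": 0.8},
    population_totals={"north": 600, "south": 200},
)
```

After the last step the weights of the two northern households sum to 600 and the southern household carries a weight of 200, matching the assumed population totals.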
In the annual databases, the household sampling weights are available in the CARCOMxx file (where xx denotes the last two digits of the survey year) as the PESOFIT variable. All members of the same household have the same weight. In the historical database, the sampling weights are available in the PESO file as the PESO variable. For each survey, the sum of weights over all households is the total number of households interviewed.
PESO and PESOFIT may differ because the variable included in the annual databases (PESOFIT) is not revised, while the one included in the historical database (PESO) is aligned to the demographic statistics on the Italian population published by Istat (e.g. reconstructions between censuses) at the time the data are revised.
The PESO file also contains the PESOPOP variable, which is obtained by multiplying PESO by a constant (different for each year of the survey) and allows the estimate of the totals for the universe (the Italian resident population).
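A minimal sketch of how these weights enter an estimate, assuming the values and weights have already been read into plain lists. The function names and numbers are illustrative only.

```python
def weighted_mean(values, weights):
    """Weighted mean, skipping observations whose value is missing (None)."""
    pairs = [(v, w) for v, w in zip(values, weights) if v is not None]
    return sum(v * w for v, w in pairs) / sum(w for _, w in pairs)

def weighted_total(values, pop_weights):
    """Estimated population total, using weights scaled to the universe."""
    return sum(v * w for v, w in zip(values, pop_weights) if v is not None)

# Hypothetical household incomes with PESO-like and PESOPOP-like weights.
income = [10.0, 20.0, 40.0]
peso = [1.0, 1.0, 2.0]
pesopop = [100.0, 100.0, 200.0]   # peso multiplied by an assumed constant of 100

mean_income = weighted_mean(income, peso)        # (10 + 20 + 80) / 4
total_income = weighted_total(income, pesopop)   # 1000 + 2000 + 8000
```

Because PESOPOP is PESO multiplied by a constant, both give the same weighted mean; only totals require the population-scaled weight.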
As of the 2020 survey, sampling weight construction has been revised to take account of the introduction of household stratification in the second stage of survey design. Moreover, for the purposes of historical data comparison, a specific weight was built using an iterative weight rebalancing technique (raking) to reduce the difference in household selection probability under the new design compared with the previous one.
As of the 2020 edition, the PESOFIT variable of the annual database is the sampling weight obtained with the new design and does not allow for comparison with previous years, while the PESO variable of the historical database is the sampling weight obtained using the rebalancing technique that allows for comparison with previous editions.
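For readers unfamiliar with raking, the general idea of iterative weight rebalancing can be sketched as follows. This is a generic two-margin example and not the specific bridging procedure used for the survey; the grouping variables and target totals are hypothetical.

```python
def rake(weights, groups_a, groups_b, targets_a, targets_b, iterations=50):
    """Iteratively rescale weights so that weighted totals match two margins.

    groups_a, groups_b: category of each unit on the two margins.
    targets_a, targets_b: category -> target weighted total (same grand total).
    """
    w = list(weights)
    for _ in range(iterations):
        for groups, targets in ((groups_a, targets_a), (groups_b, targets_b)):
            # Current weighted total in each category of this margin.
            sums = {}
            for g, wi in zip(groups, w):
                sums[g] = sums.get(g, 0.0) + wi
            # Scale every unit so its category hits the target.
            w = [wi * targets[g] / sums[g] for g, wi in zip(groups, w)]
    return w

start = [1.0, 2.0, 3.0, 4.0]
area = ["x", "x", "y", "y"]
size = ["p", "q", "p", "q"]
raked = rake(start, area, size, {"x": 60.0, "y": 40.0}, {"p": 50.0, "q": 50.0})
```

Each pass adjusts the weights to one margin at a time, and repeating the passes brings both margins close to their targets simultaneously.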
In statistical analysis, in general, the use of sampling weights is recommended to obtain unbiased estimates.
For a detailed description of the weighting scheme used in the SHIW until the 2016 survey, see I. Faiella and R. Gambacorta, 'The weighting process in the SHIW', Banca d'Italia, Temi di Discussione (Working Papers), 636, 2007.
For more details on the changes made to the weighting process in the SHIW as of the 2020 edition, see Methodological Notes for the Survey on Household Income and Wealth, as well as R. Gambacorta and E. Porreca, 'Bridging techniques in the redesign of the Italian survey on household income and wealth', Banca d'Italia, Questioni di economia e finanza (Occasional Papers), forthcoming.
What are replication weights and how can I use them?
Replication weights make it possible to compute the standard errors of the estimators taking into account sampling design features.
These standard errors can then be used in hypothesis testing and in the construction of confidence intervals around the point estimate of interest.
Replication weights are constructed by selecting random sub-samples from the original sample based on the same sampling design and using a specific replication methodology (e.g. jackknife, bootstrap, BRR). For each sub-sample, the sampling weights are recalculated as if that were the sample actually interviewed. These weights are then used to replicate the calculation of the estimator of interest on all the sub-samples. An important advantage of this methodology is that it allows researchers to correctly calculate standard errors without disseminating information about the sampled municipalities (thus ensuring the privacy of the households interviewed).
In the SHIW, the replication weights are calculated using the jackknife method and are available from the 2008 survey onwards. The weights are contained in the annual archives in the PESIJACKxx dataset, where xx indicates the last two digits of the reference year. The generic replication weight is called PWTx, where x indicates the progressive number of the replication. For details on the construction of replication weights and on their use in the estimation of sample variance in the SHIW, the reader can refer to the Methodological Notes for the Survey on Household Income and Wealth.
For a description of the inferential problems associated with complex sample designs and of the variance estimation, the reader can refer to Faiella I., 'Accounting for sampling design in the SHIW', Bank of Italy Temi di Discussione (Working Papers), 662, 2008.
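The use of replication weights described above can be sketched as follows for a weighted mean. The scaling factor (R - 1) / R is the generic delete-one-group jackknife formula and is an assumption here; the factor appropriate for the SHIW should be taken from the Methodological Notes. Function names and data are hypothetical.

```python
import math

def jackknife_se(values, full_weights, replicate_weights):
    """Point estimate and jackknife standard error of a weighted mean.

    replicate_weights: list of weight vectors (one per replication,
    e.g. the PWTx columns), each aligned with `values`.
    """
    def wmean(w):
        return sum(v * wi for v, wi in zip(values, w)) / sum(w)

    theta = wmean(full_weights)                    # full-sample estimate
    reps = [wmean(w) for w in replicate_weights]   # one estimate per replication
    r = len(reps)
    # Assumed generic jackknife factor; check the survey documentation.
    variance = (r - 1) / r * sum((t - theta) ** 2 for t in reps)
    return theta, math.sqrt(variance)

# Toy example: three units, each replication drops one unit and
# rescales the remaining weights.
values = [1.0, 2.0, 3.0]
full = [1.0, 1.0, 1.0]
replicates = [[0.0, 1.5, 1.5], [1.5, 0.0, 1.5], [1.5, 1.5, 0.0]]
estimate, se = jackknife_se(values, full, replicates)
```

An approximate 95 per cent confidence interval can then be formed as the estimate plus or minus 1.96 times the standard error, under the usual normal approximation.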
Why are some variables in the questionnaire or in the historical archive documentation not available?
Out of respect for the privacy of the households interviewed, the Bank of Italy does not release microdata for any of the questionnaire variables marked with an asterisk. These variables (e.g. day and month of birth, municipality and province of residence, the ABI code, etc.) are classified as sensitive information because they may enable the households to be identified.