Outputs from the 2002 NATSISS

There are three broad categories of output from the 2002 NATSISS that individuals are likely to use in their analysis of the data. These are: ABS publications (mainly the initial results publication), customised tables ordered from the ABS, and the CURF. We critically explore the first and the last of these categories as the chapter by Webster, Rogers and Black (in this volume) documents the process of securing customised tables in some detail.

ABS publications using the 2002 NATSISS

In June 2004, the ABS released the initial results from the 2002 NATSISS (ABS 2004c). This document included a 15-page summary of findings as well as 23 tables presenting the data in different ways and making comparisons with other data sources. The summary of findings focuses on selected changes through time (i.e. comparing the 1994 NATSIS and the 2002 NATSISS) as well as making explicit comparisons between the Indigenous population and the non-Indigenous population (as measured by the GSS).

While the depth of data available is impressive, and the provision of standard errors commendable, one criticism of ABS (2004c) can be made. Table 9 of that publication presents the proportion of people with various characteristics, broken down by their equivalised household income. Based on analysis of the non-Indigenous population (ABS 2003c), the ABS outputs data for the second and third deciles as a measure of ‘low income’, arguing that the lowest income decile has characteristics closer to those with higher incomes. ABS (2004c) uses this ‘low income’ group as an indicator of ‘poverty’ in its discussion of results. This appears to be an implicit endorsement of criticism of income-based measures of poverty, in particular the claim that measurement error (i.e. under-reporting) is pronounced for low income earners, especially those who report an income less than or equal to zero (Tsumori, Saunders & Hughes 2002). One group whose income is particularly difficult to measure accurately is the self-employed. The recent public debate on the difficulties in identifying a socially accepted minimum standard of living illustrates that this criticism is itself contested (Harding, Lloyd & Greenwell 2001; Saunders 2002; Tsumori, Saunders & Hughes 2002). One reason to be cautious about adopting the ABS definition of ‘low income’ without question is that self-employment is not prominent in the Indigenous population (Hunter 2004a: chapter 5).

Even if one can make a case for using this ‘low income’ indicator for the non-Indigenous population, a cursory examination of Table 4.2 shows that this assumption is suspect for the Indigenous population. Compared to those in the second and third deciles, those in the first decile are less likely to be employed, to have a qualification or to have completed Year 12, and to own or be purchasing a home, and are more likely to have fair or poor health.

Table 4.2. Implications of ABS definition of low income (a)

| | 1st decile | 2nd–3rd decile | 4th–10th decile | Not stated |
| Employed | 16.9 | 30.2 | 76.0 | 48.0 |
| Has qualification and/or completed Year 12 | 17.7 | 21.7 | 44.5 | 27.6 |
| Owns or purchasing home | 11.9 | 15.1 | 43.8 | 29.0 |
| Health fair or poor | 30.5 | 26.5 | 16.6 | 23.4 |

(a) The difference between the values for the 1st decile and the 2nd–3rd decile is always significant at the 5% level of significance.

Source: Customised cross-tabulations

Other variables not reported in Table 4.2 indicate that the bottom decile respondents are more disadvantaged than the ‘low income’ group, as they are more likely to have not completed Year 12, be unemployed or outside the labour force, have been arrested in the last year, and have transport difficulties. The ABS definition of low income therefore tends to understate the incidence of Indigenous disadvantage, and should not be adopted uncritically for the Indigenous population.

Another criticism of ABS (2004c) is the lack of clarity regarding the difference between the concept of remote areas and the different sampling methodology employed in CAs. For example, information on cash flow problems was collected in NCAs, but not in CAs (ABS 2004d: 32). In Table 4 of ABS (2004c), however, the footnote says the data was collected in non-remote areas only. This raises the question of what information was collected from the 1997 individuals who were enumerated as part of the NCA enumeration strategy but whose usual residence is in a remote area. The ABS has probably retained this information in the MURF, and people may be able to access it when requesting customised tables. If this is the case, it would be helpful if the ABS made this clear. If the data is not available because of reliability concerns, the ABS could minimise misunderstanding by explaining why this is the case.

There also needs to be more discussion about the underlying models and theory that appear to motivate the layout and choice of variables for some of the tables. For example, a comparison of the age breakdowns in tables 10 and 11 of ABS (2004c) shows that quite different age groups are used. While a reader could speculate as to why the different age groups were chosen, the publication needs to provide links to the previous research or theoretical models the ABS used to inform its judgments. In addition, the ABS needs to justify why certain variables were deemed to be amenable to age standardisation in the official publication. The age standardisation of health is a reasonably standard technique, but it is unclear why employment status is better suited to age standardisation than, say, education participation (ABS 2004c: Table 5).

The ABS has also issued a number of publications and related tables for each State and Territory, but these follow the same basic conventions and formats used in ABS (2004c). Webster, Rogers & Black (this volume) give more detail about future planned output from the NATSISS.

The 2002 NATSISS CURF

On 7 June 2005 the ABS released the CURF for the 2002 NATSISS. Containing a unique record for all those households and individuals who were part of the survey, the data set enables the researcher to run their own cross-tabulations (subject to constraints) as well as more detailed statistical analysis of the data. Unlike the CURFs for the 1994 NATSIS and the GSS, the 2002 NATSISS CURF is only available via the RADL. That is, individuals who have access to the data submit Statistical Analysis System (SAS) or Statistical Package for the Social Sciences (SPSS) programming code, which is checked to make sure confidential information is not released. The code is then run by the ABS and the output posted on the user’s section of the RADL web site.

There are two separate files available via the RADL. The first has household information and a weight for the 5887 households in the sample. The second has information on each of the 9359 individuals, a household identifier that enables household information to be merged with the data, and a person level weight. These person and household weights can (and should) be used to turn information on the sample into estimates for the population, taking into account a person’s or household’s chance of selection. Furthermore—and this is a substantial improvement over the 1994 NATSIS—each household and person has 250 replicate weights which can be used to generate standard errors for estimates (as shown in ABS 2005c).
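As a rough illustration of how the household identifier and the two kinds of weight fit together, the following Python sketch merges toy household and person records, computes a weighted proportion, and derives a standard error from replicate weights. The record layout, the field names (hh_id, employed, weight, tenure) and the four toy replicates are all invented for the example. The jackknife formula used, in which the variance is (G - 1)/G times the sum of squared deviations of the G replicate estimates from the main estimate, is an assumption about the delete-a-group method and should be checked against ABS (2005c). The RADL itself accepts only SAS or SPSS code, so this is a sketch of the logic rather than a program that could be submitted.

```python
import math

# Hypothetical miniature stand-ins for the household and person files.
households = {"H1": {"tenure": "owner"}, "H2": {"tenure": "renter"}}
persons = [
    {"hh_id": "H1", "employed": 1, "weight": 10.0},
    {"hh_id": "H1", "employed": 0, "weight": 10.0},
    {"hh_id": "H2", "employed": 1, "weight": 10.0},
    {"hh_id": "H2", "employed": 0, "weight": 10.0},
]

# Attach household information to each person via the household identifier.
merged = [{**p, **households[p["hh_id"]]} for p in persons]


def weighted_proportion(flags, weights):
    """Weighted share of records whose flag equals 1."""
    return sum(f * w for f, w in zip(flags, weights)) / sum(weights)


def jackknife_se(flags, main_weights, replicate_weights):
    """Assumed delete-a-group jackknife: sqrt((G-1)/G * sum((est_g - est)^2))."""
    estimate = weighted_proportion(flags, main_weights)
    groups = len(replicate_weights)
    squared = sum(
        (weighted_proportion(flags, rw) - estimate) ** 2
        for rw in replicate_weights
    )
    return math.sqrt((groups - 1) / groups * squared)


flags = [p["employed"] for p in merged]
main = [p["weight"] for p in merged]

# Toy replicate weights: replicate g zeroes person g and rescales the rest.
n = len(main)
replicates = [
    [0.0 if i == g else w * n / (n - 1) for i, w in enumerate(main)]
    for g in range(n)
]

estimate = weighted_proportion(flags, main)
se = jackknife_se(flags, main, replicates)
print(f"employed share = {estimate:.3f}, SE = {se:.3f}")
```

With the real CURF the same calculation would loop over the 250 replicate weights supplied for each person or household rather than the four toy replicates constructed here.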

Another major improvement is that the 2002 NATSISS CURF has some information that has not been available to researchers on other ABS data sets. For example, previous data sets that included significant Indigenous populations, including the 1994 NATSIS, 2001 NHS and past censuses, have not included continuous income data. The 2002 NATSISS, on the other hand, contains both continuous individual and continuous household income data, up to a cut-off beyond which income data is censored. [6] This will enable distributional analysis outside the ABS that has not been possible before. Household rent and mortgage payments are also given as continuous values, although they are also right censored. [7]
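One practical consequence of right censoring can be sketched in Python. Quantiles that fall below the cut-off are unaffected by it, whereas a mean computed from the censored values is only a lower bound on the true mean. The cut-off of 1500, the incomes and the weights below are invented for the illustration and are not taken from the CURF.

```python
def weighted_quantile(values, weights, q):
    """Smallest value at which the cumulative weight share reaches q."""
    pairs = sorted(zip(values, weights))
    target = q * sum(weights)
    running = 0.0
    for value, weight in pairs:
        running += weight
        if running >= target:
            return value
    return pairs[-1][0]


CUTOFF = 1500  # hypothetical top-code; the actual cut-off is not assumed here
incomes = [100, 200, 300, 1500, 1500]  # the last two records are censored
weights = [1.0, 1.0, 1.0, 1.0, 1.0]

total_weight = sum(weights)
censored_share = (
    sum(w for y, w in zip(incomes, weights) if y >= CUTOFF) / total_weight
)
median = weighted_quantile(incomes, weights, 0.5)
median_identified = median < CUTOFF  # True when censoring cannot affect it
mean_lower_bound = sum(y * w for y, w in zip(incomes, weights)) / total_weight

print(censored_share, median, median_identified, mean_lower_bound)
```

In this toy case the median is identified because it lies below the cut-off, while the mean would rise if the two censored incomes were observed in full.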

There are, however, data items which have been made confidential, restricting the type of analysis that is possible. A number of employment variables are output in ranges. These include duration of unemployment and CDEP employment, as well as the number of hours usually worked per week. The banding of this last variable places particular restrictions on analysis of hourly (as opposed to weekly) income.
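The restriction on hourly income analysis can be made concrete with a small Python sketch. When hours are known only as a range, weekly income divided by hours can only be bounded, not calculated. The hours ranges and the weekly income used below are hypothetical and do not reproduce the CURF categories.

```python
def hourly_income_bounds(weekly_income, hours_low, hours_high=None):
    """Return (lower, upper) bounds on hourly income for a banded hours value.

    hours_low and hours_high are the ends of the hours range. For an
    open-ended top range pass hours_high=None, in which case no positive
    lower bound can be given and None is returned in its place.
    """
    if hours_low <= 0:
        raise ValueError("hours_low must be positive")
    upper = weekly_income / hours_low
    lower = weekly_income / hours_high if hours_high is not None else None
    return lower, upper


# A person earning 600 per week who usually works 35 to 39 hours.
lower, upper = hourly_income_bounds(600.0, 35, 39)
print(f"hourly income lies between {lower:.2f} and {upper:.2f}")

# An open-ended top range, such as 49 hours or more, gives only an upper bound.
open_lower, open_upper = hourly_income_bounds(600.0, 49)
```

The wider the hours range, the wider the interval, which is why banded hours limit what can be said about hourly rates.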

More importantly, apart from in Queensland, one cannot separately identify Aboriginal Australians from Torres Strait Islanders. This is despite the fact that the Torres Strait Islander population was over-sampled in order to produce reliable (separate) estimates of the characteristics of Torres Strait Islanders living in the Strait and the rest of Queensland (ABS 2005c: 53).

The most numerous, and arguably the most confusing, restrictions on CURF data are those imposed on the geographic variables. Apart from the variables mentioned in Table 4.1 that have different categories for CA/NCA or remote/non-remote areas, there are also restrictions on the type of geographic breakdowns that are possible using the 2002 NATSISS CURF. In particular, it is possible to undertake analysis by State/Territory or by remoteness, but not both simultaneously. As an example of the implication of such restrictions, it is possible to examine whether income is higher in remote versus non-remote areas throughout Australia, or higher in NSW versus other States. However, it is not possible to examine whether income is higher in remote versus non-remote areas within NSW specifically. While this restriction is understandable, given the imperative to protect the confidentiality of respondents, it does restrict the ability to make meaningful interstate comparisons (that is, by constraining the ability to control for the level of remoteness in the respective States).

If users want to obtain some sort of cross-classification of State/Territory and remoteness structure, the ABS insists they use a specific variable (STXREM) when attempting to do this on the RADL. Users are not permitted to submit programs that include both State/Territory and remoteness structure under any circumstances.

An additional restriction on geographic analysis is that Tasmania and the Australian Capital Territory (ACT) are combined into a single category. This is despite the statement that ‘the NATSISS was designed to provide reliable estimates at the national level and for each State and Territory’ (ABS 2005c) and the fact that ABS (2004c) output data for each State separately. Within this combined category, there were 736 individuals from Tasmania and 330 from the ACT, which when weighted represented about 10 900 and 2600 individuals respectively. The governments of these two jurisdictions have quite different policies, and the presence of the federal government and two large universities in Canberra (among other things) makes these two populations quite different. In addition, the geography of the two regions, in terms of remoteness and access to resources, is quite dissimilar. Thus the joining of these two localities somewhat constrains policy analysis and restricts a modeller’s ability to control for geography when analysing the relationship between other variables.

At the moment, one of the biggest constraints on analysing the 2002 NATSISS using the RADL is the requirement to undertake the analysis in one of only two statistical programs: SAS (version 8.2) or SPSS. While SAS performs most of the standard types of analysis, there are a number of things it either cannot do or does not do as easily as other statistical packages (such as the calculation of ‘robust’ standard errors and the easy output of marginal effects from estimation of the probit model). Perhaps more importantly, SAS is expensive. This may not be as big an issue in large government departments, or even perhaps large research centres, where the fixed costs can be spread across a number of users, but for smaller organisations or individual researchers, the costs can become prohibitive. [8] SPSS is more affordable, but it does not have the range of applications that other programs have. Users may be able to visit a data laboratory at the ABS site, where the ABS has been able to install and run the widely-used statistical program, Stata. However, it is likely that this option will cost the user extra money because the ABS may need to employ a staff member to ensure the user does not compromise confidentiality. The ABS hopes to have a fully operational version of Stata on the RADL in the near future.
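For readers unfamiliar with the term, the marginal effect of a continuous regressor in a probit model is the standard normal density evaluated at the fitted index, multiplied by that regressor's coefficient. The Python sketch below shows this calculation by hand. The intercept, coefficients and covariate values are invented for the illustration and are not estimates from the NATSISS.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def probit_probability(intercept, coefficients, x):
    """Predicted probability: Phi(intercept + sum of b_j * x_j)."""
    index = intercept + sum(b * v for b, v in zip(coefficients, x))
    return normal_cdf(index)


def probit_marginal_effects(intercept, coefficients, x):
    """Marginal effect of each continuous regressor at x: phi(index) * b_j."""
    index = intercept + sum(b * v for b, v in zip(coefficients, x))
    density = normal_pdf(index)
    return [density * b for b in coefficients]


# Hypothetical coefficients and covariate values, for illustration only.
intercept = -0.5
coefficients = [0.8, -0.3]
x = [1.0, 2.0]

probability = probit_probability(intercept, coefficients, x)
effects = probit_marginal_effects(intercept, coefficients, x)
print(probability, effects)
```

Packages such as Stata report these quantities directly after estimation, whereas in SAS 8.2 a user would typically have to code a calculation along these lines.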