standardized mortality ratio using istdize

Rodrigo Badilla

Join Date: Apr 2014
Posts: 340

standardized mortality ratio using istdize

03 Oct 2019, 08:48

Hi all,

I am trying to replicate results from https://data.princeton.edu/eco572/std, to calculate standardized mortality ratio by Indirect Standarization using istdize, lamentably this function give me a error: "using required".

Its possible get the same results with istdize??. Example of my do file is:

HTML Code:

clear all
set more off

infile str10 country str5 ageg pop deaths using https://data.princeton.edu/eco572/da...ston21long.dat

gen rates = 1000 * deaths / pop
bysort age (country): gen swrates = rates[2]

tabstat swrates [fw=pop], by(country)
quietly sum rates [fw=pop] if country == "Kazakhstan"
scalar kzcmr = r(mean)
quietly sum swrates [fw=pop] if country == "Kazakhstan"
di "SMR = ", kzcmr/r(mean)

SMR =  1.7671271

*here I tried to replicate but fail with it.

istdize deaths pop ageg , by( country )
using required
r(100);

I know that istdize require a .dta file but I am confuse what kind of file input whit using whit this web example.

I tried with a direct Standarization but gave differents results:

HTML Code:

dstdize deaths pop ageg , by( country )

Summary of Study Populations:
  country             N      Crude     Adj_Rate       Confidence Interval
 --------------------------------------------------------------------------
Kazakhsta       8698860   0.007423     0.010412    [  0.010333,    0.010491]
   Sweden       4385469   0.010548     0.006328    [  0.006270,    0.006385]

Thanks in advance.
Rodrigo

Last edited by Rodrigo Badilla; 03 Oct 2019, 09:25.

Tags: None

German Rodriguez

Join Date: Feb 2017

Posts: 169
#2

03 Oct 2019, 12:52

Hi Rodrigo Badilla. The page you cite does direct and indirect standardization “by hand”, which has the advantage that you see exactly how everything is done. However, it is possible to replicate the calculations using the Stata commands dstdize and istdize. First we read the data and save it locally

Code:

infile str10 country str5 ageg pop deaths /// using https://data.princeton.edu/eco572/datasets/preston21long.dat, clear save preston211, replace

Indirect Standardization

You are correct that to use istdize you need a Stata data file. In order to use Sweden as a standard, we just save the data for that country. If you then try to run istdize it will complain that the number of deaths is not constant by age within country. I suppose the command assumes you would not have age-specific counts but only the total, so we modify the deaths variable to have just the total and then run the command:

Code:

keep if country == "Sweden" save sweden, replace use preston211 egen temp = sum(deaths), by(country) replace deaths = temp istdize deaths pop age using sweden, by(country) popvars(deaths pop) ... long output suppressed ... Summary of Study Populations (Rates): Cases country Observed Crude Adj_Rate Confidence Interval ------------------------------------------------------------------- Kazakhstan 64,572 0.007423 0.018639 [0.018495, 0.018783] Sweden 46,256 0.010548 0.010548 [0.010452, 0.010644] Summary of Study Populations (SMR): Cases Cases Exact country Observed Expected SMR Confidence Interval -------------------------------------------------------------------- Kazakhstan 64,572 36,540.67 1.767 [1.753523, 1.780810] Sweden 46,256 46,256.00 1.000 [0.990907, 1.009155]

As you can see, the SMR of 1.767 is the same you get. As for the adjusted rate, the website shows that if Kazakhstan had Sweden's rates but its own age composition, the CDR would be 4.20. The ratio of the observed rate of 7.42 to that gives the SMR of 1.767. Multiplying this by Sweden's CDR of 10.55 gives 186.4, in agreement with istdize. This last step is shown at the end of this handout. I should probably add it to the website.

Direct Standardization

The reason you got different results with dstdize is that by default that command combines all populations to get a standard. The website follows Preston et al in using a weighted average age composition as the standard. We can pass this along to dstdize in a separate Stata file,as shown below

Code:

use preston211, clear egen pcpop = pc(pop), by(country) egen avgcomp = mean(pcpop), by(ageg) keep in 1/19 replace country = "Average" replace pop = round(1000000*avgcomp) replace deaths = 0 // not needed save avgcomp, replace use preston211, clear dstdize deaths pop age, by(country) using(avgcomp) ... long output supressed ... Summary of Study Populations: country N Crude Adj_Rate Confidence Interval ---------------------------------------------------------------------------- Kazakhsta 8698860 0.007423 0.011882 [ 0.011790, 0.011974] Sweden 4385469 0.010548 0.007374 [ 0.007308, 0.007440]

As you can see, the results agree exactly with the website. By the way. you can get the same results that dstdize produces by default if you follow the website but use pcpop as the standard.
Comment
Rodrigo Badilla

Join Date: Apr 2014

Posts: 340
#3

03 Oct 2019, 13:14

Thanks German for your complete reply!. Now I am clear, was not so easy like I thought.

Regards
Rodrigo
Comment

Rodrigo Badilla

Join Date: Apr 2014
Posts: 340

03 Oct 2019, 15:45

German Rodriguez, Just one question, please...

I am trying to replicate result to Direct Standarization using epidat and stata (using your last commands).

Lamentably I can not get the same results:

Code:

input str10 CIUDAD str10 EDAD POBLAC CASOS
"Cali" "<15" 217645 0
"Cali" "15-24" 145409 2
"Cali" "25-34" 86644 16
"Cali" "35-44" 63454 34
"Cali" "45-54" 41180 44
"Cali" "55-64" 24551 36
"Cali" "65+" 19042 37
"São_Paulo" "<15" 992534 0
"São_Paulo" "15-24" 746750 14
"São_Paulo" "25-34" 639214 76
"São_Paulo" "35-44" 423847 195
"São_Paulo" "45-54" 328074 266
"São_Paulo" "55-64" 208108 228
"São_Paulo" "65+" 173968 186
end

save cali, replace
egen pcpop = pc(POBLAC), by(CIUDAD)
egen avgcomp = mean(pcpop), by(EDAD)
keep if CIUDAD == "São_Paulo"
replace CIUDAD = "Average"
replace POBLAC = round(1000000*avgcomp)
replace CASOS = 0 // not needed
save avgcomp, replace

use cali,clear

dstdize CASOS POBLAC EDAD, by(CIUDAD) using(avgcomp)

*stata results
Summary of Study Populations:
   CIUDAD             N      Crude     Adj_Rate       Confidence Interval
 --------------------------------------------------------------------------
     Cali        597925   0.000283     0.000333    [  0.000283,    0.000384]
São_Paulo       3512495   0.000275     0.000240    [  0.000225,    0.000255]

Epidat results:

from: Londoño JL. Metodología de la investigación epidemiológica.2000. Ed. Universidad de Antioquia
and Epidat 4.1 Manual:
(Result Rates x100000)

Sao Paulo rate its very similar...

Click image for larger version

Name: Sin título.png
Views: 1
Size: 3.7 KB
ID: 1519121

Also I tried directly, but failed with the replicate results:

Code:

clear all
set more off

input str10 CIUDAD str10 EDAD POBLAC CASOS
"Cali" "<15" 217645 0
"Cali" "15-24" 145409 2
"Cali" "25-34" 86644 16
"Cali" "35-44" 63454 34
"Cali" "45-54" 41180 44
"Cali" "55-64" 24551 36
"Cali" "65+" 19042 37
"São_Paulo" "<15" 992534 0
"São_Paulo" "15-24" 746750 14
"São_Paulo" "25-34" 639214 76
"São_Paulo" "35-44" 423847 195
"São_Paulo" "45-54" 328074 266
"São_Paulo" "55-64" 208108 228
"São_Paulo" "65+" 173968 186
end

dstdize CASOS POBLAC EDAD, by(CIUDAD)


Summary of Study Populations:
   CIUDAD             N      Crude     Adj_Rate       Confidence Interval
 --------------------------------------------------------------------------
     Cali        597925   0.000283     0.000369    [  0.000313,    0.000425]
São_Paulo       3512495   0.000275     0.000265    [  0.000248,    0.000281]

Where could be the error?

Please I would grateful any comment
Regards
Rodrigo

Last edited by Rodrigo Badilla; 03 Oct 2019, 16:27.

Comment

German Rodriguez

Join Date: Feb 2017
Posts: 169

04 Oct 2019, 07:03

There is no error in your calculations. It all depends on what you use as standard. Here are the crude rates using the data you pasted above

Code:

. gen rates = CASOS/POBL

. tabstat rates [fw=POBL], by(CIUDAD)

Summary for variables: rates
     by categories of: CIUDAD 

    CIUDAD |      mean
-----------+----------
      Cali |  .0002826
 São_Paulo |  .0002747
-----------+----------
     Total |  .0002759
----------------------

To reproduce your first analysis we use the average composition

Code:

. egen pcpop = pc(POBL), by(CIUDAD)

. egen avgcomp = mean(pcpop), by(EDAD)

. tabstat rates [aw=avgcomp], by(CIUDAD)

Summary for variables: rates
     by categories of: CIUDAD 

    CIUDAD |      mean
-----------+----------
      Cali |  .0003334
 São_Paulo |  .0002401
-----------+----------
     Total |  .0002867
----------------------

To reproduce the dstdize default we use the total population (not pcpop as I said earlier)

Code:

. egen total = sum(POBL), by(EDAD)

. tabstat rates [aw=total], by(CIUDAD)

Summary for variables: rates
     by categories of: CIUDAD 

    CIUDAD |      mean
-----------+----------
      Cali |  .0003693
 São_Paulo |  .0002647
-----------+----------
     Total |   .000317
----------------------

I am not familiar with epitab. Surely the manual will say what they use as a standard, but clearly it is not one of these two. Actually, I found it online and they use a standard from CELADE. Using that information we can reproduce that result as well.

Code:

. mata std = 77500\ 43291\ 34589\ 24275\ 16355\ 11693\ 11220

. gen celade = .
(14 missing values generated)

. mata st_store(., "celade", std\std)

. tabstat rates [aw=celade], by(CIUDAD)

Summary for variables: rates
     by categories of: CIUDAD 

    CIUDAD |      mean
-----------+----------
      Cali |   .000349
 São_Paulo |  .0002474
-----------+----------
     Total |  .0002982
----------------------

As you can see, computing these rates as weighted means is quite simple. Alternatively, you can save the CELADE standard as a Stata file and use dstdize.

Comment

Rodrigo Badilla

Join Date: Apr 2014

Posts: 340
#6

04 Oct 2019, 07:18

Thanks German Rodriguez again for you answer!

Yes you are right, Epitab manual use a CELADE as standard: 1950-2000. Boletín Demográfico 1991; 48: 31.

I agree with you, your last method its more easy to undestand:

Code:

tabstat rates [aw=celade], by(CIUDAD)

By my side not more question! All its clear.

Regards
Rodrigo

Last edited by Rodrigo Badilla; 04 Oct 2019, 07:26.
Comment

Announcement

standardized mortality ratio using istdize

Comment

Comment

Comment

Comment

Comment