Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • standardized mortality ratio using istdize

    Hi all,

    I am trying to replicate results from https://data.princeton.edu/eco572/std, to calculate standardized mortality ratio by Indirect Standarization using istdize, lamentably this function give me a error: "using required".

    Its possible get the same results with istdize??. Example of my do file is:

    HTML Code:
    clear all
    set more off
    
    infile str10 country str5 ageg pop deaths using https://data.princeton.edu/eco572/da...ston21long.dat
    
    gen rates = 1000 * deaths / pop
    bysort age (country): gen swrates = rates[2]
    
    tabstat swrates [fw=pop], by(country)
    quietly sum rates [fw=pop] if country == "Kazakhstan"
    scalar kzcmr = r(mean)
    quietly sum swrates [fw=pop] if country == "Kazakhstan"
    di "SMR = ", kzcmr/r(mean)
    
    SMR =  1.7671271
    
    *here I tried to replicate but fail with it.
    
    istdize deaths pop ageg , by( country )
    using required
    r(100);
    I know that istdize require a .dta file but I am confuse what kind of file input whit using whit this web example.

    I tried with a direct Standarization but gave differents results:

    HTML Code:
    dstdize deaths pop ageg , by( country )
    
    Summary of Study Populations:
      country             N      Crude     Adj_Rate       Confidence Interval
     --------------------------------------------------------------------------
    Kazakhsta       8698860   0.007423     0.010412    [  0.010333,    0.010491]
       Sweden       4385469   0.010548     0.006328    [  0.006270,    0.006385]
    


    Thanks in advance.
    Rodrigo
    Last edited by Rodrigo Badilla; 03 Oct 2019, 09:25.

  • #2
    Hi Rodrigo Badilla. The page you cite does direct and indirect standardization “by hand”, which has the advantage that you see exactly how everything is done. However, it is possible to replicate the calculations using the Stata commands dstdize and istdize. First we read the data and save it locally

    Code:
    infile str10 country str5 ageg pop deaths ///
        using https://data.princeton.edu/eco572/datasets/preston21long.dat, clear
    save preston211, replace
    Indirect Standardization

    You are correct that to use istdize you need a Stata data file. In order to use Sweden as a standard, we just save the data for that country. If you then try to run istdize it will complain that the number of deaths is not constant by age within country. I suppose the command assumes you would not have age-specific counts but only the total, so we modify the deaths variable to have just the total and then run the command:

    Code:
    keep if country == "Sweden"
    save sweden, replace
    use preston211
    egen temp = sum(deaths), by(country)
    replace deaths = temp
    istdize deaths pop age using sweden, by(country) popvars(deaths pop)
    
    ... long output suppressed ...
    
    Summary of Study Populations (Rates):
    
                    Cases
      country    Observed     Crude   Adj_Rate  Confidence Interval
    -------------------------------------------------------------------
    Kazakhstan     64,572  0.007423   0.018639  [0.018495, 0.018783]
        Sweden     46,256  0.010548   0.010548  [0.010452, 0.010644]
     
     
    Summary of Study Populations (SMR):
    
                    Cases        Cases                   Exact
      country    Observed     Expected    SMR     Confidence Interval
    --------------------------------------------------------------------
    Kazakhstan     64,572    36,540.67   1.767    [1.753523, 1.780810]
        Sweden     46,256    46,256.00   1.000    [0.990907, 1.009155]
    As you can see, the SMR of 1.767 is the same you get. As for the adjusted rate, the website shows that if Kazakhstan had Sweden's rates but its own age composition, the CDR would be 4.20. The ratio of the observed rate of 7.42 to that gives the SMR of 1.767. Multiplying this by Sweden's CDR of 10.55 gives 186.4, in agreement with istdize. This last step is shown at the end of this handout. I should probably add it to the website.

    Direct Standardization

    The reason you got different results with dstdize is that by default that command combines all populations to get a standard. The website follows Preston et al in using a weighted average age composition as the standard. We can pass this along to dstdize in a separate Stata file,as shown below

    Code:
    use preston211, clear
    egen pcpop = pc(pop), by(country)
    egen avgcomp = mean(pcpop), by(ageg)
    keep in 1/19
    replace country = "Average"
    replace pop = round(1000000*avgcomp)
    replace deaths = 0 // not needed
    save avgcomp, replace
    use preston211, clear
    dstdize deaths pop age, by(country) using(avgcomp)
    
    ... long output supressed ...
    
    Summary of Study Populations:
    
    country N Crude Adj_Rate Confidence Interval
    ----------------------------------------------------------------------------
    Kazakhsta       8698860   0.007423     0.011882    [  0.011790,    0.011974]
       Sweden       4385469   0.010548     0.007374    [  0.007308,    0.007440]
    As you can see, the results agree exactly with the website. By the way. you can get the same results that dstdize produces by default if you follow the website but use pcpop as the standard.

    Comment


    • #3
      Thanks German for your complete reply!. Now I am clear, was not so easy like I thought.

      Regards
      Rodrigo

      Comment


      • #4
        German Rodriguez, Just one question, please...

        I am trying to replicate result to Direct Standarization using epidat and stata (using your last commands).

        Lamentably I can not get the same results:

        Code:
        input str10 CIUDAD str10 EDAD POBLAC CASOS
        "Cali" "<15" 217645 0
        "Cali" "15-24" 145409 2
        "Cali" "25-34" 86644 16
        "Cali" "35-44" 63454 34
        "Cali" "45-54" 41180 44
        "Cali" "55-64" 24551 36
        "Cali" "65+" 19042 37
        "São_Paulo" "<15" 992534 0
        "São_Paulo" "15-24" 746750 14
        "São_Paulo" "25-34" 639214 76
        "São_Paulo" "35-44" 423847 195
        "São_Paulo" "45-54" 328074 266
        "São_Paulo" "55-64" 208108 228
        "São_Paulo" "65+" 173968 186
        end
        
        save cali, replace
        egen pcpop = pc(POBLAC), by(CIUDAD)
        egen avgcomp = mean(pcpop), by(EDAD)
        keep if CIUDAD == "São_Paulo"
        replace CIUDAD = "Average"
        replace POBLAC = round(1000000*avgcomp)
        replace CASOS = 0 // not needed
        save avgcomp, replace
        
        use cali,clear
        
        dstdize CASOS POBLAC EDAD, by(CIUDAD) using(avgcomp)
        
        *stata results
        Summary of Study Populations:
           CIUDAD             N      Crude     Adj_Rate       Confidence Interval
         --------------------------------------------------------------------------
             Cali        597925   0.000283     0.000333    [  0.000283,    0.000384]
        São_Paulo       3512495   0.000275     0.000240    [  0.000225,    0.000255]
        Epidat results:

        from: Londoño JL. Metodología de la investigación epidemiológica.2000. Ed. Universidad de Antioquia
        and Epidat 4.1 Manual:
        (Result Rates x100000)

        Sao Paulo rate its very similar...
        Click image for larger version

Name:	Sin título.png
Views:	1
Size:	3.7 KB
ID:	1519121







        Also I tried directly, but failed with the replicate results:

        Code:
        clear all
        set more off
        
        input str10 CIUDAD str10 EDAD POBLAC CASOS
        "Cali" "<15" 217645 0
        "Cali" "15-24" 145409 2
        "Cali" "25-34" 86644 16
        "Cali" "35-44" 63454 34
        "Cali" "45-54" 41180 44
        "Cali" "55-64" 24551 36
        "Cali" "65+" 19042 37
        "São_Paulo" "<15" 992534 0
        "São_Paulo" "15-24" 746750 14
        "São_Paulo" "25-34" 639214 76
        "São_Paulo" "35-44" 423847 195
        "São_Paulo" "45-54" 328074 266
        "São_Paulo" "55-64" 208108 228
        "São_Paulo" "65+" 173968 186
        end
        
        dstdize CASOS POBLAC EDAD, by(CIUDAD)
        
        
        Summary of Study Populations:
           CIUDAD             N      Crude     Adj_Rate       Confidence Interval
         --------------------------------------------------------------------------
             Cali        597925   0.000283     0.000369    [  0.000313,    0.000425]
        São_Paulo       3512495   0.000275     0.000265    [  0.000248,    0.000281]
        Where could be the error?

        Please I would grateful any comment
        Regards
        Rodrigo
        Last edited by Rodrigo Badilla; 03 Oct 2019, 16:27.

        Comment


        • #5
          There is no error in your calculations. It all depends on what you use as standard. Here are the crude rates using the data you pasted above


          Code:
          . gen rates = CASOS/POBL
          
          . tabstat rates [fw=POBL], by(CIUDAD)
          
          Summary for variables: rates
               by categories of: CIUDAD 
          
              CIUDAD |      mean
          -----------+----------
                Cali |  .0002826
           São_Paulo |  .0002747
          -----------+----------
               Total |  .0002759
          ----------------------

          To reproduce your first analysis we use the average composition

          Code:
          . egen pcpop = pc(POBL), by(CIUDAD)
          
          . egen avgcomp = mean(pcpop), by(EDAD)
          
          . tabstat rates [aw=avgcomp], by(CIUDAD)
          
          Summary for variables: rates
               by categories of: CIUDAD 
          
              CIUDAD |      mean
          -----------+----------
                Cali |  .0003334
           São_Paulo |  .0002401
          -----------+----------
               Total |  .0002867
          ----------------------
          To reproduce the dstdize default we use the total population (not pcpop as I said earlier)

          Code:
          . egen total = sum(POBL), by(EDAD)
          
          . tabstat rates [aw=total], by(CIUDAD)
          
          Summary for variables: rates
               by categories of: CIUDAD 
          
              CIUDAD |      mean
          -----------+----------
                Cali |  .0003693
           São_Paulo |  .0002647
          -----------+----------
               Total |   .000317
          ----------------------

          I am not familiar with epitab. Surely the manual will say what they use as a standard, but clearly it is not one of these two. Actually, I found it online and they use a standard from CELADE. Using that information we can reproduce that result as well.

          Code:
          . mata std = 77500\ 43291\ 34589\ 24275\ 16355\ 11693\ 11220
          
          . gen celade = .
          (14 missing values generated)
          
          . mata st_store(., "celade", std\std)
          
          . tabstat rates [aw=celade], by(CIUDAD)
          
          Summary for variables: rates
               by categories of: CIUDAD 
          
              CIUDAD |      mean
          -----------+----------
                Cali |   .000349
           São_Paulo |  .0002474
          -----------+----------
               Total |  .0002982
          ----------------------
          As you can see, computing these rates as weighted means is quite simple. Alternatively, you can save the CELADE standard as a Stata file and use dstdize.

          Comment


          • #6
            Thanks German Rodriguez again for you answer!

            Yes you are right, Epitab manual use a CELADE as standard: 1950-2000. Boletín Demográfico 1991; 48: 31.

            I agree with you, your last method its more easy to undestand:
            Code:
            tabstat rates [aw=celade], by(CIUDAD)
            By my side not more question! All its clear.

            Regards
            Rodrigo
            Last edited by Rodrigo Badilla; 04 Oct 2019, 07:26.

            Comment

            Working...
            X