Looping with apply commands in R

After a long post Exploring data files with R, this is the time to get into looping. Instead of looping statements like while, for etc, we shall apply command in R.

Let’s take the mtcars data set available in R.

> mtcars
                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2

My aim is to find the mean of the above data. I have already written about summary() in my previous post. It gives the min, max, mean, median for each variables of mtcars.

> summary(mtcars)
      mpg             cyl             disp             hp             drat      
 Min.   :10.40   Min.   :4.000   Min.   : 71.1   Min.   : 52.0   Min.   :2.760  
 1st Qu.:15.43   1st Qu.:4.000   1st Qu.:120.8   1st Qu.: 96.5   1st Qu.:3.080  
 Median :19.20   Median :6.000   Median :196.3   Median :123.0   Median :3.695  
 Mean   :20.09   Mean   :6.188   Mean   :230.7   Mean   :146.7   Mean   :3.597  
 3rd Qu.:22.80   3rd Qu.:8.000   3rd Qu.:326.0   3rd Qu.:180.0   3rd Qu.:3.920  
 Max.   :33.90   Max.   :8.000   Max.   :472.0   Max.   :335.0   Max.   :4.930  
       wt             qsec             vs               am              gear      
 Min.   :1.513   Min.   :14.50   Min.   :0.0000   Min.   :0.0000   Min.   :3.000  
 1st Qu.:2.581   1st Qu.:16.89   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:3.000  
 Median :3.325   Median :17.71   Median :0.0000   Median :0.0000   Median :4.000  
 Mean   :3.217   Mean   :17.85   Mean   :0.4375   Mean   :0.4062   Mean   :3.688  
 3rd Qu.:3.610   3rd Qu.:18.90   3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:4.000  
 Max.   :5.424   Max.   :22.90   Max.   :1.0000   Max.   :1.0000   Max.   :5.000  
      carb      
 Min.   :1.000  
 1st Qu.:2.000  
 Median :2.000  
 Mean   :2.812  
 3rd Qu.:4.000  
 Max.   :8.000

Apply

To find the mean of all variables, we need to do a looping across all rows of mtcars, which is performed using apply() command.

> apply(mtcars, 2, mean)
       mpg        cyl       disp         hp       drat         wt       qsec 
 20.090625   6.187500 230.721875 146.687500   3.596563   3.217250  17.848750 
        vs         am       gear       carb 
  0.437500   0.406250   3.687500   2.812500 

Arguments:

  1. mtcars – dataset
  2. 2 – row or column wise calculation. 1 means row, 2 means column
  3. mean – function

Similar task shall be accomplished using colMeans(), rowMeans().

> rowMeans(mtcars)
          Mazda RX4       Mazda RX4 Wag          Datsun 710      Hornet 4 Drive 
           29.90727            29.98136            23.59818            38.73955 
  Hornet Sportabout             Valiant          Duster 360           Merc 240D 
           53.66455            35.04909            59.72000            24.63455 
           Merc 230            Merc 280           Merc 280C          Merc 450SE 
           27.23364            31.86000            31.78727            46.43091 
         Merc 450SL         Merc 450SLC  Cadillac Fleetwood Lincoln Continental 
           46.50000            46.35000            66.23273            66.05855 
  Chrysler Imperial            Fiat 128         Honda Civic      Toyota Corolla 
           65.97227            19.44091            17.74227            18.81409 
      Toyota Corona    Dodge Challenger         AMC Javelin          Camaro Z28 
           24.88864            47.24091            46.00773            58.75273 
   Pontiac Firebird           Fiat X1-9       Porsche 914-2        Lotus Europa 
           57.37955            18.92864            24.77909            24.88027 
     Ford Pantera L        Ferrari Dino       Maserati Bora          Volvo 142E 
           60.97182            34.50818            63.15545            26.26273 
> colMeans(mtcars)
       mpg        cyl       disp         hp       drat         wt       qsec 
 20.090625   6.187500 230.721875 146.687500   3.596563   3.217250  17.848750 
        vs         am       gear       carb 
  0.437500   0.406250   3.687500   2.812500

But these row or column commands do not have all functions like sd(),scale() etc which is possible with apply command. Lets take a small dataset.

> mtcars5by5 <- mtcars[1:5, 1:5]
> mtcars5by5
                   mpg cyl disp  hp drat
Mazda RX4         21.0   6  160 110 3.90
Mazda RX4 Wag     21.0   6  160 110 3.90
Datsun 710        22.8   4  108  93 3.85
Hornet 4 Drive    21.4   6  258 110 3.08
Hornet Sportabout 18.7   8  360 175 3.15

For the above data set, below given is the row wise and column wise sum.

> mtcars5by5$total <- apply(mtcars5by5, 1, sum)
> mtcars5by5$total
[1] 300.90 300.90 231.65 398.48 564.85
> #total is added a new variable in our data set
> mtcars5by5
                   mpg cyl disp  hp drat  total
Mazda RX4         21.0   6  160 110 3.90 300.90
Mazda RX4 Wag     21.0   6  160 110 3.90 300.90
Datsun 710        22.8   4  108  93 3.85 231.65
Hornet 4 Drive    21.4   6  258 110 3.08 398.48
Hornet Sportabout 18.7   8  360 175 3.15 564.85
> #column sum
> apply(mtcars5by5, 2, sum)
    mpg     cyl    disp      hp    drat   total 
 104.90   30.00 1046.00  598.00   17.88 1796.78

Transform

Transform() helps us to prepare data. Using this, we shall create n number of new variables.

> transform(mtcars5by5,tot=sum(mtcars5by5[,1:5]),mtper=mpg/drat,ntprod=mpg/hp)
                   mpg cyl disp  hp drat  total     tot    mtper    ntprod
Mazda RX4         21.0   6  160 110 3.90 300.90 1796.78 5.384615 0.1909091
Mazda RX4 Wag     21.0   6  160 110 3.90 300.90 1796.78 5.384615 0.1909091
Datsun 710        22.8   4  108  93 3.85 231.65 1796.78 5.922078 0.2451613
Hornet 4 Drive    21.4   6  258 110 3.08 398.48 1796.78 6.948052 0.1945455
Hornet Sportabout 18.7   8  360 175 3.15 564.85 1796.78 5.936508 0.1068571

I have added new variables tot, mtper, ntprod above.

lapply

This help us to issue a function over a list. It loops over a list and evaluate a function on each element

> lapply(mtcars5by5, mean)
$mpg
[1] 20.98

$cyl
[1] 6

$disp
[1] 209.2

$hp
[1] 119.6

$drat
[1] 3.576

$total
[1] 359.356

It went through the complete list to provide the mean of each variable.

tapply

tapply() helps us to apply the function in a ragged array or a subset of vector.

> tapply(mtcars5by5$mpg, mtcars5by5$cyl, mean)
       4        6        8 
22.80000 21.13333 18.70000

Consider our mtcars data set. I need to find out mean and maximum horse power hp grouped by different gears

> mtcars
                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
> ctmean
              v        s
automatic 15.05 20.74286
manual    19.75 28.37143
> tapply(mtcars$hp, mtcars$gear, mean)
       3        4        5 
176.1333  89.5000 195.6000 
> tapply(mtcars$hp, mtcars$gear, max)
  3   4   5 
245 123 335

We got mean horse power for 3, 4 and 5 gears.

We may even provide a list to group the mean operation. In the below given example, we shall calculate the mean for different transmission model am (0 – automatic; 1 – manual) and v/s.

> list(mtcars$am,mtcars$vs)
[[1]]
 [1] 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 1 1 1 1 1 1 1

[[2]]
 [1] 0 0 1 1 0 1 0 1 1 1 1 0 0 0 0 0 0 1 1 1 1 0 0 0 0 1 0 1 0 0 0 1

> ctmean <- tapply(mtcars$hp, list(mtcars$am,mtcars$vs), mean)
> rownames(ctmean) <- c("automatic", "manual")
> colnames(ctmean) <- c("v", "s")
> ctmean
                 v         s
automatic 194.1667 102.14286
manual    180.8333  80.57143

I think I shall stop here. See you in another interesting post.

Have a leisurely weekend.

 

 

 

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s