Looping with apply commands in R

After a long post, Exploring data files with R, it is time to get into looping. Instead of looping statements like while and for, we shall use the apply family of commands in R.

Let’s take the mtcars data set available in R.

> mtcars
                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2

My aim is to find the mean of each variable in the above data. I have already written about summary() in my previous post: it gives the min, quartiles, median, mean and max for each variable of mtcars.

> summary(mtcars)
      mpg             cyl             disp             hp             drat
 Min.   :10.40   Min.   :4.000   Min.   : 71.1   Min.   : 52.0   Min.   :2.760
 1st Qu.:15.43   1st Qu.:4.000   1st Qu.:120.8   1st Qu.: 96.5   1st Qu.:3.080
 Median :19.20   Median :6.000   Median :196.3   Median :123.0   Median :3.695
 Mean   :20.09   Mean   :6.188   Mean   :230.7   Mean   :146.7   Mean   :3.597
 3rd Qu.:22.80   3rd Qu.:8.000   3rd Qu.:326.0   3rd Qu.:180.0   3rd Qu.:3.920
 Max.   :33.90   Max.   :8.000   Max.   :472.0   Max.   :335.0   Max.   :4.930
       wt             qsec             vs               am              gear
 Min.   :1.513   Min.   :14.50   Min.   :0.0000   Min.   :0.0000   Min.   :3.000
 1st Qu.:2.581   1st Qu.:16.89   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:3.000
 Median :3.325   Median :17.71   Median :0.0000   Median :0.0000   Median :4.000
 Mean   :3.217   Mean   :17.85   Mean   :0.4375   Mean   :0.4062   Mean   :3.688
 3rd Qu.:3.610   3rd Qu.:18.90   3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:4.000
 Max.   :5.424   Max.   :22.90   Max.   :1.0000   Max.   :1.0000   Max.   :5.000
      carb
 Min.   :1.000
 1st Qu.:2.000
 Median :2.000
 Mean   :2.812
 3rd Qu.:4.000
 Max.   :8.000

Apply

To find the mean of every variable, we need to loop across all columns of mtcars, which is what the apply() command does.

> apply(mtcars, 2, mean)
       mpg        cyl       disp         hp       drat         wt       qsec
 20.090625   6.187500 230.721875 146.687500   3.596563   3.217250  17.848750
        vs         am       gear       carb
  0.437500   0.406250   3.687500   2.812500

Arguments:

  1. mtcars – the data set
  2. 2 – the margin: 1 applies the function to each row, 2 to each column
  3. mean – the function to apply
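To see what the second argument (the margin) does, here is a minimal sketch on a toy 2 x 3 matrix; the names m, row_sums and col_sums are made up for illustration. apply() is defined for matrices and arrays, so when it is given a data frame such as mtcars it first converts it with as.matrix(), which is harmless here because every column of mtcars is numeric.

```r
# Toy matrix: matrix() fills columns first, so row 1 is 1 3 5 and row 2 is 2 4 6
m <- matrix(1:6, nrow = 2)

row_sums <- apply(m, 1, sum)  # margin 1: one result per row    -> 9 12
col_sums <- apply(m, 2, sum)  # margin 2: one result per column -> 3 7 11

print(row_sums)
print(col_sums)
```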

Column means can also be computed with colMeans(), and row means with rowMeans().

> rowMeans(mtcars)
          Mazda RX4       Mazda RX4 Wag          Datsun 710      Hornet 4 Drive
           29.90727            29.98136            23.59818            38.73955
  Hornet Sportabout             Valiant          Duster 360           Merc 240D
           53.66455            35.04909            59.72000            24.63455
           Merc 230            Merc 280           Merc 280C          Merc 450SE
           27.23364            31.86000            31.78727            46.43091
         Merc 450SL         Merc 450SLC  Cadillac Fleetwood Lincoln Continental
           46.50000            46.35000            66.23273            66.05855
  Chrysler Imperial            Fiat 128         Honda Civic      Toyota Corolla
           65.97227            19.44091            17.74227            18.81409
      Toyota Corona    Dodge Challenger         AMC Javelin          Camaro Z28
           24.88864            47.24091            46.00773            58.75273
   Pontiac Firebird           Fiat X1-9       Porsche 914-2        Lotus Europa
           57.37955            18.92864            24.77909            24.88027
     Ford Pantera L        Ferrari Dino       Maserati Bora          Volvo 142E
           60.97182            34.50818            63.15545            26.26273
> colMeans(mtcars)
       mpg        cyl       disp         hp       drat         wt       qsec
 20.090625   6.187500 230.721875 146.687500   3.596563   3.217250  17.848750
        vs         am       gear       carb
  0.437500   0.406250   3.687500   2.812500
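R's documentation describes colMeans() and rowMeans() as equivalent to apply() with mean over the matching margin, only faster. A quick check using base R only (the via_apply_* names are illustrative):

```r
# apply() over columns (2) and rows (1)
via_apply_cols <- apply(mtcars, 2, mean)
via_apply_rows <- apply(mtcars, 1, mean)

# The dedicated functions return the same named vectors
print(all.equal(colMeans(mtcars), via_apply_cols))  # TRUE
print(all.equal(rowMeans(mtcars), via_apply_rows))  # TRUE
```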

However, these shortcuts only cover sums and means (rowSums(), colSums(), rowMeans(), colMeans()); for other functions such as sd() or scale(), we need apply(). Let's take a small data set.

> mtcars5by5 <- mtcars[1:5, 1:5]
> mtcars5by5
                   mpg cyl disp  hp drat
Mazda RX4         21.0   6  160 110 3.90
Mazda RX4 Wag     21.0   6  160 110 3.90
Datsun 710        22.8   4  108  93 3.85
Hornet 4 Drive    21.4   6  258 110 3.08
Hornet Sportabout 18.7   8  360 175 3.15

For the above data set, here are the row-wise and column-wise sums.

> mtcars5by5$total <- apply(mtcars5by5, 1, sum)
> mtcars5by5$total
[1] 300.90 300.90 231.65 398.48 564.85
> #total is added as a new variable in our data set
> mtcars5by5
                   mpg cyl disp  hp drat  total
Mazda RX4         21.0   6  160 110 3.90 300.90
Mazda RX4 Wag     21.0   6  160 110 3.90 300.90
Datsun 710        22.8   4  108  93 3.85 231.65
Hornet 4 Drive    21.4   6  258 110 3.08 398.48
Hornet Sportabout 18.7   8  360 175 3.15 564.85
> #column sum
> apply(mtcars5by5, 2, sum)
    mpg     cyl    disp      hp    drat   total
 104.90   30.00 1046.00  598.00   17.88 1796.78
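As a sketch of what apply() offers beyond the shortcut functions, the snippet below computes column standard deviations (base R has no colSds()) and a row-wise range with an anonymous function. It rebuilds mtcars5by5 without the total column; col_sd and row_range are illustrative names.

```r
# Rebuild the 5 x 5 subset (without the total column)
mtcars5by5 <- mtcars[1:5, 1:5]

# Column-wise standard deviation
col_sd <- apply(mtcars5by5, 2, sd)
print(col_sd)

# Row-wise range (largest minus smallest value) via an anonymous function
row_range <- apply(mtcars5by5, 1, function(x) max(x) - min(x))
print(row_range)
```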

Transform

transform() helps us prepare data: with it, we can create any number of new variables in a single call.

> transform(mtcars5by5,tot=sum(mtcars5by5[,1:5]),mtper=mpg/drat,ntprod=mpg/hp)
                   mpg cyl disp  hp drat  total     tot    mtper    ntprod
Mazda RX4         21.0   6  160 110 3.90 300.90 1796.78 5.384615 0.1909091
Mazda RX4 Wag     21.0   6  160 110 3.90 300.90 1796.78 5.384615 0.1909091
Datsun 710        22.8   4  108  93 3.85 231.65 1796.78 5.922078 0.2451613
Hornet 4 Drive    21.4   6  258 110 3.08 398.48 1796.78 6.948052 0.1945455
Hornet Sportabout 18.7   8  360 175 3.15 564.85 1796.78 5.936508 0.1068571

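This adds three new variables: tot (the grand total of the first five columns, a single number recycled to every row), mtper (mpg/drat) and ntprod (mpg/hp).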
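Because a whole-table call like sum() yields one number, transform() repeats it down every row. If a per-row total is wanted instead, rowSums() is one option. In this sketch grand_total and row_total are hypothetical column names; it also checks that transform() returns a new data frame rather than changing its input.

```r
mtcars5by5 <- mtcars[1:5, 1:5]

new_df <- transform(mtcars5by5,
                    grand_total = sum(mtcars5by5),      # one number, recycled to all rows
                    row_total   = rowSums(mtcars5by5),  # one total per car
                    mtper       = mpg / drat)
print(new_df)

# The input is untouched: 5 columns before, 8 in the result
print(ncol(mtcars5by5))
print(ncol(new_df))
```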

lapply

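lapply() applies a function over a list: it loops over the list, evaluates the function on each element and returns the results as a list. Since a data frame is a list of columns, lapply() works column by column.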

> lapply(mtcars5by5, mean)
$mpg
[1] 20.98

$cyl
[1] 6

$disp
[1] 209.2

$hp
[1] 119.6

$drat
[1] 3.576

$total
[1] 359.356

It went through every element of the list (every column) and returned the mean of each variable, again as a list.
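When every result is a single value, lapply()'s relatives sapply() (which simplifies to a vector where it can) and vapply() (the same, but with a declared result type) are often more convenient. A minimal comparison; the variable names are illustrative:

```r
mtcars5by5 <- mtcars[1:5, 1:5]

means_list <- lapply(mtcars5by5, mean)              # a list, one element per column
means_vec  <- sapply(mtcars5by5, mean)              # simplified to a named numeric vector
means_chk  <- vapply(mtcars5by5, mean, numeric(1))  # same, but each result must be one number

print(class(means_list))  # "list"
print(means_vec)
```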

tapply

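tapply() applies a function to subsets of a vector, where the subsets are defined by the levels of one or more grouping factors (a "ragged array"). Here, mpg is averaged separately for each number of cylinders.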

> tapply(mtcars5by5$mpg, mtcars5by5$cyl, mean)
       4        6        8
22.80000 21.13333 18.70000
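Conceptually, tapply() splits the vector into groups and then applies the function to each group. The sketch below reproduces the cylinder example in those two explicit steps with split() and sapply(); groups, by_split and by_tapply are illustrative names, and this mirrors the idea rather than tapply()'s exact internals.

```r
mtcars5by5 <- mtcars[1:5, 1:5]

# Step 1: split mpg into one vector per cylinder count
# (4 -> 22.8; 6 -> 21.0 21.0 21.4; 8 -> 18.7)
groups <- split(mtcars5by5$mpg, mtcars5by5$cyl)

# Step 2: apply mean() to each group
by_split <- sapply(groups, mean)

# tapply() does both steps in one call
by_tapply <- tapply(mtcars5by5$mpg, mtcars5by5$cyl, mean)

print(by_split)
print(all.equal(as.vector(by_split), as.vector(by_tapply)))  # TRUE
```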

Now consider the full mtcars data set. I need to find the mean and maximum horsepower (hp) grouped by the number of gears.

> tapply(mtcars$hp, mtcars$gear, mean)
       3        4        5
176.1333  89.5000 195.6000
> tapply(mtcars$hp, mtcars$gear, max)
  3   4   5
245 123 335

This gives the mean and maximum horsepower for cars with 3, 4 and 5 gears.
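If a data frame is preferred over tapply()'s named array, base R's aggregate() with a formula gives the same group means and maxima. This is an alternative technique shown as a sketch; hp_mean and hp_max are illustrative names.

```r
# Formula interface: hp grouped by gear; each result is a data frame
hp_mean <- aggregate(hp ~ gear, data = mtcars, FUN = mean)
hp_max  <- aggregate(hp ~ gear, data = mtcars, FUN = max)

print(hp_mean)
print(hp_max)
```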

We can also pass a list of factors to group by more than one variable. In the example below, we calculate the mean horsepower for each combination of transmission type am (0 – automatic; 1 – manual) and engine shape vs (0 – V-shaped; 1 – straight).

> list(mtcars$am,mtcars$vs)
[[1]]
 [1] 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 1 1 1 1 1 1 1

[[2]]
 [1] 0 0 1 1 0 1 0 1 1 1 1 0 0 0 0 0 0 1 1 1 1 0 0 0 0 1 0 1 0 0 0 1

> ctmean <- tapply(mtcars$hp, list(mtcars$am,mtcars$vs), mean)
> rownames(ctmean) <- c("automatic", "manual")
> colnames(ctmean) <- c("v", "s")
> ctmean
                 v         s
automatic 194.1667 102.14286
manual    180.8333  80.57143
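Renaming rows and columns by position, as above, relies on remembering that the 0 codes sort before the 1 codes. A slightly safer sketch labels the codes with factor() first, so tapply() picks up the labels automatically. The names am_f, vs_f and ctmean2 are made up for this example; the labels follow the post (v for V-shaped engines, s for straight).

```r
# Attach readable labels to the 0/1 codes (v = V-shaped engine, s = straight)
am_f <- factor(mtcars$am, levels = c(0, 1), labels = c("automatic", "manual"))
vs_f <- factor(mtcars$vs, levels = c(0, 1), labels = c("v", "s"))

# The factor labels become the row and column names automatically
ctmean2 <- tapply(mtcars$hp, list(am_f, vs_f), mean)
print(ctmean2)
```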

I think I shall stop here. See you in another interesting post.

Have a leisurely weekend.

 

 

 
