Exploring data files with R

I have written about data types and data structures of R in my previous post Working with data types of R. We shall explore a data set in this post.

mtcars is a dataset that already ships with R.

> mtcars
                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2

File structure

The very first question is about the size of the data set.

> dim(mtcars)
[1] 32 11

It contains 32 rows and 11 columns.
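
The two numbers can also be fetched one at a time. A minimal sketch using the base functions nrow and ncol (the variable names n_rows and n_cols are only for illustration):

```r
# mtcars ships with base R in the datasets package
n_rows <- nrow(mtcars)   # number of observations (cars)
n_cols <- ncol(mtcars)   # number of variables
c(rows = n_rows, cols = n_cols)
```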

Now, which variables do we have in mtcars?

> names(mtcars)
 [1] "mpg"  "cyl"  "disp" "hp"   "drat" "wt"   "qsec" "vs"   "am"   "gear" "carb"

We shall preview the data using head and tail commands.

> head(mtcars)
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
> tail(mtcars)
                mpg cyl  disp  hp drat    wt qsec vs am gear carb
Porsche 914-2  26.0   4 120.3  91 4.43 2.140 16.7  0  1    5    2
Lotus Europa   30.4   4  95.1 113 3.77 1.513 16.9  1  1    5    2
Ford Pantera L 15.8   8 351.0 264 4.22 3.170 14.5  0  1    5    4
Ferrari Dino   19.7   6 145.0 175 3.62 2.770 15.5  0  1    5    6
Maserati Bora  15.0   8 301.0 335 3.54 3.570 14.6  0  1    5    8
Volvo 142E     21.4   4 121.0 109 4.11 2.780 18.6  1  1    4    2

Similar to the Unix head and tail commands, head and tail in R show the first and last records, 6 by default.

Now it is time to look at the structure of the data set.

> str(mtcars)
'data.frame':	32 obs. of  11 variables:
 $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
 $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
 $ disp: num  160 160 108 258 360 ...
 $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
 $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
 $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
 $ qsec: num  16.5 17 18.6 19.4 17 ...
 $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
 $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
 $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
 $ carb: num  4 4 1 1 2 1 4 2 2 4 ...

So, this is a data frame with 32 observations of 11 variables. The type and the first few values of each variable are listed above.

How is the data stored?

> mode(mtcars)
[1] "list"

It is stored as a list.
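
Why a list? A data frame is a list with one element per column, all of the same length. A small sketch to confirm this on mtcars (col_lengths is just an illustrative name):

```r
is.list(mtcars)        # TRUE: a data frame is a list of columns
is.data.frame(mtcars)  # TRUE
class(mtcars)          # "data.frame"
length(mtcars)         # 11, one list element per column
col_lengths <- lengths(mtcars)  # length of every column
col_lengths
```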

Let’s take another dataset available with R – airquality.

> head(airquality, n=10)
   Ozone Solar.R Wind Temp Month Day
1     41     190  7.4   67     5   1
2     36     118  8.0   72     5   2
3     12     149 12.6   74     5   3
4     18     313 11.5   62     5   4
5     NA      NA 14.3   56     5   5
6     28      NA 14.9   66     5   6
7     23     299  8.6   65     5   7
8     19      99 13.8   59     5   8
9      8      19 20.1   61     5   9
10    NA     194  8.6   69     5  10

You may pass arguments to head, such as the number of rows n, as I have shown above.
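
The same n argument works for tail, and a negative n drops rows instead of keeping them. A short sketch, with variable names chosen only for illustration:

```r
first3 <- head(airquality, n = 3)          # first 3 rows
last3  <- tail(airquality, n = 3)          # last 3 rows
all_but_last2 <- head(airquality, n = -2)  # everything except the last 2 rows
nrow(all_but_last2)
```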

Let’s omit the records with NA.

> aqNoNA=na.omit(airquality)
> dim(aqNoNA)
[1] 111   6
> dim(airquality)
[1] 153   6

So, our new object aqNoNA contains only the records without missing values, 111 rows in total.
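
An alternative to na.omit is to build the row filter yourself with complete.cases, which returns TRUE for every row that has no NA. A minimal sketch (keep and aq_complete are illustrative names):

```r
keep <- complete.cases(airquality)   # TRUE for rows with no NA in any column
aq_complete <- airquality[keep, ]    # keep only the complete rows
sum(keep)                            # number of complete rows
```

This gives the same rows as na.omit, and the logical vector keep can be reused, for example to look at the incomplete rows with airquality[!keep, ].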

We have the is.na command to check whether each value is NA. Only the first six rows of its output are shown here.

> is.na(airquality)
       Ozone Solar.R  Wind  Temp Month   Day
  [1,] FALSE   FALSE FALSE FALSE FALSE FALSE
  [2,] FALSE   FALSE FALSE FALSE FALSE FALSE
  [3,] FALSE   FALSE FALSE FALSE FALSE FALSE
  [4,] FALSE   FALSE FALSE FALSE FALSE FALSE
  [5,]  TRUE    TRUE FALSE FALSE FALSE FALSE
  [6,] FALSE    TRUE FALSE FALSE FALSE FALSE
> sum(is.na(airquality))
[1] 44

So 44 NAs are found in total.
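
To see where the NAs are, we can count them per column instead of over the whole data set. A small sketch using colSums (na_per_column is an illustrative name):

```r
# is.na() gives a logical matrix; colSums() counts the TRUE values per column
na_per_column <- colSums(is.na(airquality))
na_per_column
```

Only Ozone and Solar.R contain missing values, and their counts add up to the total found above.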

The summary command gives us the minimum, 1st quartile, median, mean, 3rd quartile, maximum and number of NAs.

> summary(airquality)
     Ozone           Solar.R           Wind             Temp           Month
 Min.   :  1.00   Min.   :  7.0   Min.   : 1.700   Min.   :56.00   Min.   :5.000
 1st Qu.: 18.00   1st Qu.:115.8   1st Qu.: 7.400   1st Qu.:72.00   1st Qu.:6.000
 Median : 31.50   Median :205.0   Median : 9.700   Median :79.00   Median :7.000
 Mean   : 42.13   Mean   :185.9   Mean   : 9.958   Mean   :77.88   Mean   :6.993
 3rd Qu.: 63.25   3rd Qu.:258.8   3rd Qu.:11.500   3rd Qu.:85.00   3rd Qu.:8.000
 Max.   :168.00   Max.   :334.0   Max.   :20.700   Max.   :97.00   Max.   :9.000
 NA's   :37       NA's   :7
      Day
 Min.   : 1.0
 1st Qu.: 8.0
 Median :16.0
 Mean   :15.8
 3rd Qu.:23.0
 Max.   :31.0

The 1st quartile is the 25th percentile of the data.

The 2nd quartile (the median) is the 50th percentile of the data.

The 3rd quartile is the 75th percentile of the data.
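
The quartiles can be computed directly with the quantile function. A minimal sketch for Ozone, where na.rm = TRUE drops the missing values first (q is an illustrative name):

```r
# 25th, 50th and 75th percentiles of Ozone, ignoring NA
q <- quantile(airquality$Ozone, probs = c(0.25, 0.50, 0.75), na.rm = TRUE)
q
```

The three values match the 1st Qu., Median and 3rd Qu. entries of the Ozone column in the summary output.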

We can restrict the summary command to a single variable as given below.

> summary(airquality$Ozone)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
   1.00   18.00   31.50   42.13   63.25  168.00      37

We have given the summary of only one variable Ozone above.

We shall apply other mathematical functions as given below

> sd(aqNoNA$Ozone)
[1] 33.27597

We have calculated the standard deviation for Ozone above. I have taken the data without NA, as sd returns NA when the data contains NA (unless na.rm = TRUE is passed).
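
A short sketch of the na.rm route. Note that it is not quite the same calculation: na.rm = TRUE drops only the missing Ozone values, while na.omit on the whole data frame also drops rows where another column is NA (the variable names here are illustrative):

```r
sd(airquality$Ozone)                             # NA, Ozone has missing values
sd_ozone <- sd(airquality$Ozone, na.rm = TRUE)   # uses every non-missing Ozone value
n_used_na_rm  <- sum(!is.na(airquality$Ozone))   # values used with na.rm = TRUE
n_used_naomit <- nrow(na.omit(airquality))       # values used via aqNoNA
c(n_used_na_rm, n_used_naomit)
```

Because the two approaches use a different number of observations, the standard deviations they return differ slightly.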

Finding the Outlier

> myNumbers <- rep(c(1,4,10, 100), c(5, 5, 5, 1))
> myNumbers
 [1]   1   1   1   1   1   4   4   4   4   4  10  10  10  10  10 100
> mean(myNumbers)
[1] 10.9375
> sd(myNumbers)
[1] 24.04293
> scale(myNumbers)
             [,1]
 [1,] -0.41332316
 [2,] -0.41332316
 [3,] -0.41332316
 [4,] -0.41332316
 [5,] -0.41332316
 [6,] -0.28854636
 [7,] -0.28854636
 [8,] -0.28854636
 [9,] -0.28854636
[10,] -0.28854636
[11,] -0.03899275
[12,] -0.03899275
[13,] -0.03899275
[14,] -0.03899275
[15,] -0.03899275
[16,]  3.70431136
attr(,"scaled:center")
[1] 10.9375
attr(,"scaled:scale")
[1] 24.04293

Scaling (the z score) is calculated using the formula

z = (each sample value - mean) / standard deviation

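After scaling, values whose z score lies beyond ±2 are commonly treated as outliers (odd points located far away from the central measures). Here only the value 100, with a z score of 3.70, qualifies. A standard deviation larger than the mean, as in this sample, is also a hint that outliers may be present.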
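
The z score can also be calculated by hand, which makes the formula explicit. A minimal sketch, assuming the cut-off of 2 used in this post (z and outliers are illustrative names):

```r
myNumbers <- rep(c(1, 4, 10, 100), c(5, 5, 5, 1))

# z score: distance from the mean, measured in standard deviations
z <- (myNumbers - mean(myNumbers)) / sd(myNumbers)

# keep the values whose absolute z score exceeds the cut-off
outliers <- myNumbers[abs(z) > 2]
outliers
```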

Another way to test normality is the Shapiro-Wilk test.

> shapiro.test(myNumbers)

	Shapiro-Wilk normality test

data:  myNumbers
W = 0.40055, p-value = 3.532e-07

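If the p-value is greater than 0.05, we cannot reject the assumption that the data is normally distributed. Here the p-value is far below 0.05, so the data is not normally distributed.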
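
The p-value does not have to be read off the printed output. shapiro.test returns an object, so the comparison with 0.05 can be done in code. A small sketch (res and reject_normality are illustrative names):

```r
myNumbers <- rep(c(1, 4, 10, 100), c(5, 5, 5, 1))

res <- shapiro.test(myNumbers)
res$p.value                              # the p-value as a plain number

# TRUE means normality is rejected at the 0.05 level
reject_normality <- res$p.value < 0.05
reject_normality
```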

Data Analysis

Here comes another interesting part: how to analyze the data after you load your files.

To explain this, I’ll prepare a sample data set first.

> #Employee ID
> sn <- seq(1, 10, 1)
> sn
 [1]  1  2  3  4  5  6  7  8  9 10
> #Employee gender
> gender <- rep(c("male", "female"), c(6,4))
> gender
 [1] "male"   "male"   "male"   "male"   "male"   "male"   "female" "female" "female"
[10] "female"
> #available departments
> dept <- rep(c("Admin", "HR", "Prod", "Contractor"), c(1, 3, 3, 3))
> dept
 [1] "Admin"      "HR"         "HR"         "HR"         "Prod"       "Prod"
 [7] "Prod"       "Contractor" "Contractor" "Contractor"
> #Employee Salary
> sal <- rnorm(10, 1000, 200);
> sal
 [1]  888.2026  876.6272  919.7453 1005.9058 1084.4704 1337.7696  909.4302  801.1482
 [9] 1025.9457 1182.9774
> #Now our data set
> mydataset <- data.frame(sn, gender, sal, dept)
> mydataset
   sn gender       sal       dept
1   1   male  888.2026      Admin
2   2   male  876.6272         HR
3   3   male  919.7453         HR
4   4   male 1005.9058         HR
5   5   male 1084.4704       Prod
6   6   male 1337.7696       Prod
7   7 female  909.4302       Prod
8   8 female  801.1482 Contractor
9   9 female 1025.9457 Contractor
10 10 female 1182.9774 Contractor

So we have serial number, gender, salary and department.
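
Since rnorm draws random numbers, the salaries will be different every time the commands are run. If you want the same numbers on every run, set a seed first. A minimal sketch, where 42 is an arbitrary seed chosen for illustration:

```r
set.seed(42)                                # fix the random number generator
sal_a <- rnorm(10, mean = 1000, sd = 200)

set.seed(42)                                # same seed again
sal_b <- rnorm(10, mean = 1000, sd = 200)

identical(sal_a, sal_b)                     # the two draws are the same
```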

Which gender is the majority in the given data? We shall use the table function to arrive at a simple frequency distribution table.

> #which gender is more in each dept
> #this is frequency distribution
> table(mydataset$gender)

female   male
     4      6

So we have 4 females and 6 males. Let’s group the above FD by department now.

> table(mydataset$gender, mydataset$dept)

         Admin Contractor HR Prod
  female     0          3  0    1
  male       1          0  3    2
> #assign the results to an object
> freqDis <- table(mydataset$gender, mydataset$dept);
> #transpose the table
> t(freqDis)

             female male
  Admin           0    1
  Contractor      3    0
  HR              0    3
  Prod            1    2

So all departments except Contractor have males as the majority. The function t() stands for transpose.

Let’s compute the proportions of the data using prop.table().

> #proportion
> prop.table(freqDis)

         Admin Contractor  HR Prod
  female   0.0        0.3 0.0  0.1
  male     0.1        0.0 0.3  0.2

Rather than proportions, percentages of males and females give us better visibility. Hence we multiply the proportions by 100.

> #Percentage
> prop.table(freqDis)*100

         Admin Contractor HR Prod
  female     0         30  0   10
  male      10          0 30   20
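
The percentages above are relative to all 10 employees. prop.table also takes a margin argument to compute the share within each row or each column. A small sketch that rebuilds the frequency table from the gender and dept vectors of this post (by_dept and by_gender are illustrative names):

```r
gender <- rep(c("male", "female"), c(6, 4))
dept   <- rep(c("Admin", "HR", "Prod", "Contractor"), c(1, 3, 3, 3))
freqDis <- table(gender, dept)

by_dept   <- prop.table(freqDis, margin = 2) * 100  # each column sums to 100
by_gender <- prop.table(freqDis, margin = 1) * 100  # each row sums to 100
by_dept
```

With margin = 2 we can read, for example, what percentage of each department is female.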

Column sums and row sums are frequently needed in day-to-day work.

> #sum
> colSums(freqDis)
     Admin Contractor         HR       Prod
         1          3          3          3

Row sum shall be calculated as below.

> rowSums(freqDis)
female   male
     4      6
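
Both sums can also be attached to the table in one step with addmargins. A minimal sketch that rebuilds the same frequency table (with_totals is an illustrative name):

```r
gender <- rep(c("male", "female"), c(6, 4))
dept   <- rep(c("Admin", "HR", "Prod", "Contractor"), c(1, 3, 3, 3))
freqDis <- table(gender, dept)

# adds a "Sum" row and a "Sum" column to the table
with_totals <- addmargins(freqDis)
with_totals
```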

Salary is an interesting part of our profession. I’d like to see who earns more, male or female, using the aggregate() function.

> #who earns more - male or female?
> aggregate(sal~gender, mean, data = mydataset)
  gender       sal
1 female  979.8754
2   male 1018.7868
> #sal is continuous variable
> #gender is categorical variable
> #mean is the function
> #data is our data source

We used the mean salary above. Let’s use the sum now.

> aggregate(sal~gender, sum, data = mydataset)
  gender      sal
1 female 3919.501
2   male 6112.721

Similarly, we use standard deviation.

> aggregate(sal~gender, sd, data = mydataset)
  gender      sal
1 female 163.5837
2   male 175.1006

So salaries for females look slightly more consistent than those for males, since their standard deviation is lower.
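
The same grouped statistics can be obtained with tapply, which returns a named vector instead of a data frame. A sketch using the salary values printed earlier in this post, typed in by hand so the result is reproducible:

```r
gender <- rep(c("male", "female"), c(6, 4))
sal <- c(888.2026, 876.6272, 919.7453, 1005.9058, 1084.4704,
         1337.7696, 909.4302, 801.1482, 1025.9457, 1182.9774)

# apply a function to sal separately for each gender
mean_by_gender <- tapply(sal, gender, mean)
sd_by_gender   <- tapply(sal, gender, sd)
mean_by_gender
sd_by_gender
```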

We calculated all the above functions individually. The psych package helps to calculate everything in one command.

> install.packages("psych")
Installing package into ‘D:/gandhari/documents/R/win-library/3.4’
(as ‘lib’ is unspecified)
trying URL 'https://cran.rstudio.com/bin/windows/contrib/3.4/psych_1.7.5.zip'
Content type 'application/zip' length 3966969 bytes (3.8 MB)
downloaded 3.8 MB

package ‘psych’ successfully unpacked and MD5 sums checked

The downloaded binary packages are in
	C:\Users\pandian\AppData\Local\Temp\Rtmpy2L0Yq\downloaded_packages
> library(psych)
> describe (mydataset)
        vars  n    mean     sd median trimmed    mad    min     max  range  skew
sn         1 10    5.50   3.03   5.50    5.50   3.71   1.00   10.00   9.00  0.00
gender*    2 10    1.60   0.52   2.00    1.62   0.00   1.00    2.00   1.00 -0.35
sal        3 10 1003.22 162.35 962.83  986.66 119.22 801.15 1337.77 536.62  0.71
dept*      4 10    2.80   1.03   3.00    2.88   1.48   1.00    4.00   3.00 -0.20
        kurtosis    se
sn         -1.56  0.96
gender*    -2.05  0.16
sal        -0.72 51.34
dept*      -1.42  0.33

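n is the number of samples, sd is the standard deviation, and so on. All the statistics are calculated for serial number, gender, salary and department. Variables marked with * are categorical and were converted to numeric codes before the statistics were computed.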

From the above data, I now describe only the salary, which is the 3rd column.

> describe (mydataset[,3])
   vars  n    mean     sd median trimmed    mad    min     max  range skew kurtosis
X1    1 10 1003.22 162.35 962.83  986.66 119.22 801.15 1337.77 536.62 0.71    -0.72
      se
X1 51.34

 

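Let’s group the above described data by gender, i.e. mean, sd and so on for males and females separately. As the warning at the end of the output says, describe.by is deprecated and describeBy should be used in new code.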

> describe.by(mydataset, mydataset$gender)

 Descriptive statistics by group
group: female
        vars n   mean     sd median trimmed    mad    min     max  range skew
sn         1 4   8.50   1.29   8.50    8.50   1.48   7.00   10.00   3.00 0.00
gender*    2 4   1.00   0.00   1.00    1.00   0.00   1.00    1.00   0.00  NaN
sal        3 4 979.88 163.58 967.69  979.88 166.64 801.15 1182.98 381.83 0.14
dept*      4 4   2.50   1.00   2.00    2.50   0.00   2.00    4.00   2.00 0.75
        kurtosis    se
sn         -2.08  0.65
gender*      NaN  0.00
sal        -2.04 81.79
dept*      -1.69  0.50
---------------------------------------------------------------
group: male
        vars n    mean     sd median trimmed    mad    min     max  range  skew
sn         1 6    3.50   1.87   3.50    3.50   2.22   1.00    6.00   5.00  0.00
gender*    2 6    2.00   0.00   2.00    2.00   0.00   2.00    2.00   0.00   NaN
sal        3 6 1018.79 175.10 962.83 1018.79 119.22 876.63 1337.77 461.14  0.83
dept*      4 6    3.00   1.10   3.00    3.00   0.74   1.00    4.00   3.00 -0.76
        kurtosis    se
sn         -1.80  0.76
gender*      NaN  0.00
sal        -1.02 71.48
dept*      -0.92  0.45
Warning message:
describe.by is deprecated.  Please use the describeBy function
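
Following that warning, here is a hedged sketch of the same grouping with describeBy. It assumes the psych package may or may not be installed, so it checks with requireNamespace first and falls back to base R otherwise. The data set is rebuilt from the values printed in this post:

```r
gender <- rep(c("male", "female"), c(6, 4))
sal <- c(888.2026, 876.6272, 919.7453, 1005.9058, 1084.4704,
         1337.7696, 909.4302, 801.1482, 1025.9457, 1182.9774)
mydataset <- data.frame(sn = 1:10, gender, sal)

if (requireNamespace("psych", quietly = TRUE)) {
  # describeBy() is the replacement named in the deprecation warning
  print(psych::describeBy(mydataset$sal, group = mydataset$gender))
} else {
  # base R fallback when psych is not installed: group means only
  print(aggregate(sal ~ gender, data = mydataset, FUN = mean))
}
```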

 

 

 

Working with data types of R

I discussed package management in the previous post. This post concentrates on data types, data structures and data coercion.

Data types & Data Structure

We have different object types (data structures) in R as given below.

  • Vectors
  • Lists
  • Matrices
  • Arrays
  • Factors
  • Data Frames

We have the following data types in R.

  • Logical
    myvar <- TRUE
    
  • Numeric
    myvar <- 23.5
    
  • Integer
    myvar <- 2L
    
  • Complex
    myvar <- 2+5i
    
  • Character
    myvar <- "GOOD"
    
  • Raw
    myvar <- charToRaw("GOOD")
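
A quick way to confirm the data type of each value listed above is the class function. A minimal sketch (types is an illustrative name):

```r
# one example value per data type, in the order of the list above
values <- list(TRUE, 23.5, 2L, 2+5i, "GOOD", charToRaw("GOOD"))

# class() reports the data type of each value
types <- sapply(values, class)
types
```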
    

We could see them with examples below.

I have already written about c command in Basic functions in R language

> id1<-c(1,2,3,4)
> id1+10
[1] 11 12 13 14

rep command helps us to repeat the values.

> gender <- rep(c("male", "female"), c(6,4))
> gender
 [1] "male"   "male"   "male"   "male"   "male"   "male"   "female" "female"
 [9] "female" "female"

male is repeated 6 times whereas female is repeated 4 times.

A serial number is generated by the seq command.

> sn<-seq(1, 10, 1)
> sn
 [1]  1  2  3  4  5  6  7  8  9 10
> sn <-seq(1, 10, 2)
> sn
[1] 1 3 5 7 9

We passed the start value, end value and interval.
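
The arguments of seq can also be named, and there are a few shortcuts for the common cases. A small sketch:

```r
seq(1, 10, by = 2)           # start, end and interval: 1 3 5 7 9
seq(1, 10, length.out = 4)   # start, end and number of values: 1 4 7 10
seq_len(10)                  # the integers 1 to 10
1:10                         # same as seq_len(10)
```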

cbind stands for column bind.

> #vector to matrix - cbind
> slno_gender <- cbind(sn, gender)
> slno_gender
      sn   gender  
 [1,] "1"  "male"  
 [2,] "2"  "male"  
 [3,] "3"  "male"  
 [4,] "4"  "male"  
 [5,] "5"  "male"  
 [6,] "6"  "male"  
 [7,] "7"  "female"
 [8,] "8"  "female"
 [9,] "9"  "female"
[10,] "10" "female"

This gives column-wise data in a matrix. A matrix holds a single data type only. Hence the serial number is converted to the character data type.
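
We can check this coercion in code, and convert the serial numbers back when needed. A minimal sketch (sn_back is an illustrative name):

```r
sn     <- seq(1, 10, 1)
gender <- rep(c("male", "female"), c(6, 4))
slno_gender <- cbind(sn, gender)

is.matrix(slno_gender)   # TRUE
typeof(slno_gender)      # "character": the numbers became text

# convert the sn column back from text to numbers
sn_back <- as.numeric(slno_gender[, "sn"])
sn_back
```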

Let’s look at a similar example using a data frame, which overcomes this data type problem.

> #data frame
> df <- data.frame(sn, gender)
> df
   sn gender
1   1   male
2   2   male
3   3   male
4   4   male
5   5   male
6   6   male
7   7 female
8   8 female
9   9 female
10 10 female
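
To verify that each column kept its own data type, we can ask for the class of every column. A sketch, with stringsAsFactors = FALSE passed explicitly: R versions before 4.0.0 convert character columns to factors by default, so without it the result depends on your R version (col_classes is an illustrative name):

```r
sn     <- seq(1, 10, 1)
gender <- rep(c("male", "female"), c(6, 4))

# keep gender as character regardless of the R version
df <- data.frame(sn, gender, stringsAsFactors = FALSE)

col_classes <- sapply(df, class)   # class of each column
col_classes
```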

Let’s create another object called name.

> name <- c("Sophia","Jackson","Isabella","Lucas","Charlotte","Oliver","Amelia","Benjamin","Sarah","Julian")
> name
 [1] "Sophia"    "Jackson"   "Isabella"  "Lucas"     "Charlotte" "Oliver"   
 [7] "Amelia"    "Benjamin"  "Sarah"     "Julian"

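Let’s create another object called sal for the salary. When we specify the number of samples (10), mean (1000) and standard deviation (200), rnorm draws random numbers from a normal distribution with those parameters. The mean and standard deviation of such a small sample only approximate the requested values, as the output below shows.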

> sal <- rnorm(10, 1000, 200);
> sal
 [1]  898.5731 1105.1638  757.7067  934.9025 1006.0053 1014.4837 1188.6763  611.0265
 [9]  643.5498 1121.6005
> mean(sal)
[1] 928.1688
> sd(sal)
[1] 200.4666
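
The sample mean here is not 1000 because only 10 numbers were drawn. With a larger sample, the mean and standard deviation get closer to the requested values. A small sketch, where the seed 1 and the sample size 100000 are arbitrary choices for illustration:

```r
set.seed(1)                                       # arbitrary seed, for repeatable output
small <- rnorm(10, mean = 1000, sd = 200)         # same size as in the post
large <- rnorm(100000, mean = 1000, sd = 200)     # a much larger sample

c(small = mean(small), large = mean(large))       # sample means
c(small = sd(small), large = sd(large))           # sample standard deviations
```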

Let’s combine 3 objects together now to form a new object, using data.frame

> mydataset <- data.frame(sn, gender, sal)
> mydataset
   sn gender       sal
1   1   male  898.5731
2   2   male 1105.1638
3   3   male  757.7067
4   4   male  934.9025
5   5   male 1006.0053
6   6   male 1014.4837
7   7 female 1188.6763
8   8 female  611.0265
9   9 female  643.5498
10 10 female 1121.6005

After showing the above example for a data frame, let’s switch to a list.

> mydatalist <- list(v1=id1, v2=sn, v3=slno_gender, v4=mydataset)
> mydatalist
$v1
[1] 1 2 3 4

$v2
 [1]  1  2  3  4  5  6  7  8  9 10

$v3
      sn   gender  
 [1,] "1"  "male"  
 [2,] "2"  "male"  
 [3,] "3"  "male"  
 [4,] "4"  "male"  
 [5,] "5"  "male"  
 [6,] "6"  "male"  
 [7,] "7"  "female"
 [8,] "8"  "female"
 [9,] "9"  "female"
[10,] "10" "female"

$v4
   sn gender       sal
1   1   male  898.5731
2   2   male 1105.1638
3   3   male  757.7067
4   4   male  934.9025
5   5   male 1006.0053
6   6   male 1014.4837
7   7 female 1188.6763
8   8 female  611.0265
9   9 female  643.5498
10 10 female 1121.6005

I put different object types together in a single collection and stored it in one object. We can refer to an individual element by name with $ as shown below.

> mydatalist$v1
[1] 1 2 3 4
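
Besides $, list elements can be reached with double and single square brackets, which behave differently. A minimal sketch on a shortened version of the list (only v1 and v2 are rebuilt here):

```r
mydatalist <- list(v1 = c(1, 2, 3, 4), v2 = seq(1, 10, 1))

mydatalist$v1        # the element itself, by name
mydatalist[["v1"]]   # same element, name given as a string
mydatalist[[1]]      # same element, by position
mydatalist["v1"]     # single brackets return a list of length 1
names(mydatalist)    # names of all elements
```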

Cool isn’t it!

Data Coercion

Let’s look at data coercion now. The mode command helps us to find out the storage mode (data type) of an object.

> mode(mydatalist$v1)
[1] "numeric"
> mode(mydatalist$v3)
[1] "character"
> mode(gender)
[1] "character"

Let’s convert this character type to factor type now.
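
The conversion command itself does not appear in this post. A minimal sketch, assuming it was done with as.factor and stored in gender1 (the object name used further down):

```r
gender <- rep(c("male", "female"), c(6, 4))

# assumed conversion step; the original command is not shown in the post
gender1 <- as.factor(gender)

mode(gender)    # "character"
mode(gender1)   # "numeric": a factor stores integer codes plus text labels
```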

How is it possible to represent text as numeric? Let’s look at the class of the converted object gender1, which shows us the data structure.

> class(gender1)
[1] "factor"

Okay, let’s unclass gender1 now to see how it is stored.

> unclass(gender1)
 [1] 2 2 2 2 2 2 1 1 1 1
attr(,"levels")
[1] "female" "male" 

So 1 represents female, 2 represents male.
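
female gets code 1 because factor levels are sorted alphabetically by default. If you prefer a different order, pass the levels yourself. A small sketch (default_order and custom_order are illustrative names):

```r
gender <- rep(c("male", "female"), c(6, 4))

default_order <- factor(gender)                                # levels: female, male
custom_order  <- factor(gender, levels = c("male", "female"))  # levels: male, female

levels(custom_order)
as.integer(custom_order)   # male is now coded 1 and female 2
```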

See you in another interesting post.

Package management in R

Hi,

We have seen how to load the data into R language in my previous post Loading Data into R. It is an important part of this blog series. Let’s talk about packages now.

Packages are not new to programmers. Any programming language comes with packages, though only a limited set of them. Additional packages are added a la carte. We see the same behavior in R as well. The default installation of R is a thin solution, which has only basic packages. If needed, we have to add additional packages. Let’s see how.

Viewing the packages

search() helps us to check the list of packages currently attached to the session.

> search()
 [1] ".GlobalEnv"        "package:readr"     "tools:rstudio"     "package:stats"    
 [5] "package:graphics"  "package:grDevices" "package:utils"     "package:datasets" 
 [9] "package:methods"   "Autoloads"         "package:base"
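
Since search() returns a character vector, we can test in code whether a given package is attached. A minimal sketch, where is_attached is a hypothetical helper written for this example and not a function that ships with R:

```r
# hypothetical helper: TRUE when the package appears on the search path
is_attached <- function(pkg) {
  paste0("package:", pkg) %in% search()
}

is_attached("base")    # base is always attached
is_attached("stats")   # attached in a default session
loadedNamespaces()     # also lists namespaces that are loaded but not attached
```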

installed.packages() shows us all the installed packages, whether or not they are loaded.

> installed.packages()
             Package        LibPath                                   Version   
BH           "BH"           "D:/gandhari/documents/R/win-library/3.4" "1.65.0-1"
hms          "hms"          "D:/gandhari/documents/R/win-library/3.4" "0.3"     
R6           "R6"           "D:/gandhari/documents/R/win-library/3.4" "2.2.2"   
Rcpp         "Rcpp"         "D:/gandhari/documents/R/win-library/3.4" "0.12.12" 
readr        "readr"        "D:/gandhari/documents/R/win-library/3.4" "1.1.1"   
rlang        "rlang"        "D:/gandhari/documents/R/win-library/3.4" "0.1.2"   
tibble       "tibble"       "D:/gandhari/documents/R/win-library/3.4" "1.3.4"   
base         "base"         "C:/Program Files/R/R-3.4.1/library"      "3.4.1"   
boot         "boot"         "C:/Program Files/R/R-3.4.1/library"      "1.3-19"  
class        "class"        "C:/Program Files/R/R-3.4.1/library"      "7.3-14"  
cluster      "cluster"      "C:/Program Files/R/R-3.4.1/library"      "2.0.6"   
codetools    "codetools"    "C:/Program Files/R/R-3.4.1/library"      "0.2-15"  
compiler     "compiler"     "C:/Program Files/R/R-3.4.1/library"      "3.4.1"   
datasets     "datasets"     "C:/Program Files/R/R-3.4.1/library"      "3.4.1"   
foreign      "foreign"      "C:/Program Files/R/R-3.4.1/library"      "0.8-69"  
graphics     "graphics"     "C:/Program Files/R/R-3.4.1/library"      "3.4.1"   
grDevices    "grDevices"    "C:/Program Files/R/R-3.4.1/library"      "3.4.1"   
grid         "grid"         "C:/Program Files/R/R-3.4.1/library"      "3.4.1"   
KernSmooth   "KernSmooth"   "C:/Program Files/R/R-3.4.1/library"      "2.23-15" 
lattice      "lattice"      "C:/Program Files/R/R-3.4.1/library"      "0.20-35" 
MASS         "MASS"         "C:/Program Files/R/R-3.4.1/library"      "7.3-47"  
Matrix       "Matrix"       "C:/Program Files/R/R-3.4.1/library"      "1.2-10"  
methods      "methods"      "C:/Program Files/R/R-3.4.1/library"      "3.4.1"   
mgcv         "mgcv"         "C:/Program Files/R/R-3.4.1/library"      "1.8-17"  
nlme         "nlme"         "C:/Program Files/R/R-3.4.1/library"      "3.1-131" 
nnet         "nnet"         "C:/Program Files/R/R-3.4.1/library"      "7.3-12"  
parallel     "parallel"     "C:/Program Files/R/R-3.4.1/library"      "3.4.1"   
rpart        "rpart"        "C:/Program Files/R/R-3.4.1/library"      "4.1-11"  
spatial      "spatial"      "C:/Program Files/R/R-3.4.1/library"      "7.3-11"  
splines      "splines"      "C:/Program Files/R/R-3.4.1/library"      "3.4.1"   
stats        "stats"        "C:/Program Files/R/R-3.4.1/library"      "3.4.1"   
stats4       "stats4"       "C:/Program Files/R/R-3.4.1/library"      "3.4.1"   
survival     "survival"     "C:/Program Files/R/R-3.4.1/library"      "2.41-3"  
tcltk        "tcltk"        "C:/Program Files/R/R-3.4.1/library"      "3.4.1"   
tools        "tools"        "C:/Program Files/R/R-3.4.1/library"      "3.4.1"   
translations "translations" "C:/Program Files/R/R-3.4.1/library"      "3.4.1"   
utils        "utils"        "C:/Program Files/R/R-3.4.1/library"      "3.4.1"   
             Priority      Depends                                          
BH           NA            NA                                               
hms          NA            NA                                               
R6           NA            "R (>= 3.0)"                                     
Rcpp         NA            "R (>= 3.0.0)"                                   
readr        NA            "R (>= 3.0.2)"                                   
rlang        NA            "R (>= 3.1.0)"                                   
tibble       NA            "R (>= 3.1.0)"                                   
base         "base"        NA                                               
boot         "recommended" "R (>= 3.0.0), graphics, stats"                  
class        "recommended" "R (>= 3.0.0), stats, utils"                     
cluster      "recommended" "R (>= 3.0.1)"                                   
codetools    "recommended" "R (>= 2.1)"                                     
compiler     "base"        NA                                               
datasets     "base"        NA                                               
foreign      "recommended" "R (>= 3.0.0)"                                   
graphics     "base"        NA                                               
grDevices    "base"        NA                                               
grid         "base"        NA                                               
KernSmooth   "recommended" "R (>= 2.5.0), stats"                            
lattice      "recommended" "R (>= 3.0.0)"                                   
MASS         "recommended" "R (>= 3.1.0), grDevices, graphics, stats, utils"
Matrix       "recommended" "R (>= 3.0.1)"                                   
methods      "base"        NA                                               
mgcv         "recommended" "R (>= 2.14.0), nlme (>= 3.1-64)"                
nlme         "recommended" "R (>= 3.0.2)"                                   
nnet         "recommended" "R (>= 2.14.0), stats, utils"                    
parallel     "base"        NA                                               
rpart        "recommended" "R (>= 2.15.0), graphics, stats, grDevices"      
spatial      "recommended" "R (>= 3.0.0), graphics, stats, utils"           
splines      "base"        NA                                               
stats        "base"        NA                                               
stats4       "base"        NA                                               
survival     "recommended" "R (>= 2.13.0)"                                  
tcltk        "base"        NA                                               
tools        "base"        NA                                               
translations NA            NA                                               
utils        "base"        NA                                               
             Imports                                            LinkingTo 
BH           NA                                                 NA        
hms          "methods"                                          NA        
R6           NA                                                 NA        
Rcpp         "methods, utils"                                   NA        
readr        "Rcpp (>= 0.12.0.5), tibble, hms, R6"              "Rcpp, BH"
rlang        NA                                                 NA        
tibble       "methods, rlang, Rcpp (>= 0.12.3), utils"          "Rcpp"    
base         NA                                                 NA        
boot         NA                                                 NA        
class        "MASS"                                             NA        
cluster      "graphics, grDevices, stats, utils"                NA        
codetools    NA                                                 NA        
compiler     NA                                                 NA        
datasets     NA                                                 NA        
foreign      "methods, utils, stats"                            NA        
graphics     "grDevices"                                        NA        
grDevices    NA                                                 NA        
grid         "grDevices, utils"                                 NA        
KernSmooth   NA                                                 NA        
lattice      "grid, grDevices, graphics, stats, utils"          NA        
MASS         "methods"                                          NA        
Matrix       "methods, graphics, grid, stats, utils, lattice"   NA        
methods      "utils, stats"                                     NA        
mgcv         "methods, stats, graphics, Matrix"                 NA        
nlme         "graphics, stats, utils, lattice"                  NA        
nnet         NA                                                 NA        
parallel     "tools, compiler"                                  NA        
rpart        NA                                                 NA        
spatial      NA                                                 NA        
splines      "graphics, stats"                                  NA        
stats        "utils, grDevices, graphics"                       NA        
stats4       "graphics, methods, stats"                         NA        
survival     "graphics, Matrix, methods, splines, stats, utils" NA        
tcltk        "utils"                                            NA        
tools        NA                                                 NA        
translations NA                                                 NA        
utils        NA                                                 NA        
             Suggests                                                                                   
BH           NA                                                                                         
hms          "testthat, lubridate"                                                                      
R6           "knitr, microbenchmark, pryr, testthat, ggplot2, scales"                                   
Rcpp         "RUnit, inline, rbenchmark, highlight, pkgKitten (>= 0.1.2)"                               
readr        "curl, testthat, knitr, rmarkdown, stringi, covr"                                          
rlang        "knitr, rmarkdown (>= 0.2.65), testthat, covr"                                             
tibble       "covr, dplyr, knitr (>= 1.5.32), microbenchmark, nycflights13,\ntestthat, rmarkdown, withr"
base         "methods"                                                                                  
boot         "MASS, survival"                                                                           
class        NA                                                                                         
cluster      "MASS"                                                                                     
codetools    NA                                                                                         
compiler     NA                                                                                         
datasets     NA                                                                                         
foreign      NA                                                                                         
graphics     NA                                                                                         
grDevices    "KernSmooth"                                                                               
grid         "lattice"                                                                                  
KernSmooth   "MASS"                                                                                     
lattice      "KernSmooth, MASS, latticeExtra"                                                           
MASS         "lattice, nlme, nnet, survival"                                                            
Matrix       "expm, MASS"                                                                               
methods      "codetools"                                                                                
mgcv         "splines, parallel, survival, MASS"                                                        
nlme         "Hmisc, MASS"                                                                              
nnet         "MASS"                                                                                     
parallel     "methods"                                                                                  
rpart        "survival"                                                                                 
spatial      "MASS"                                                                                     
splines      "Matrix, methods"                                                                          
stats        "MASS, Matrix, SuppDists, methods, stats4"                                                 
stats4       NA                                                                                         
survival     NA                                                                                         
tcltk        NA                                                                                         
tools        "codetools, methods, xml2, curl"                                                           
translations NA                                                                                         
utils        "methods, XML"                                                                             
             Enhances                                License                    
BH           NA                                      "BSL-1.0"                  
hms          NA                                      "GPL-3"                    
R6           NA                                      "MIT + file LICENSE"       
Rcpp         NA                                      "GPL (>= 2)"               
readr        NA                                      "GPL (>= 2) | file LICENSE"
rlang        NA                                      "GPL-3"                    
tibble       NA                                      "MIT + file LICENSE"       
base         NA                                      "Part of R 3.4.1"          
boot         NA                                      "Unlimited"                
class        NA                                      "GPL-2 | GPL-3"            
cluster      NA                                      "GPL (>= 2)"               
codetools    NA                                      "GPL"                      
compiler     NA                                      "Part of R 3.4.1"          
datasets     NA                                      "Part of R 3.4.1"          
foreign      NA                                      "GPL (>= 2)"               
graphics     NA                                      "Part of R 3.4.1"          
grDevices    NA                                      "Part of R 3.4.1"          
grid         NA                                      "Part of R 3.4.1"          
KernSmooth   NA                                      "Unlimited"                
lattice      "chron"                                 "GPL (>= 2)"               
MASS         NA                                      "GPL-2 | GPL-3"            
Matrix       "MatrixModels, graph, SparseM, sfsmisc" "GPL (>= 2) | file LICENCE"
methods      NA                                      "Part of R 3.4.1"          
mgcv         NA                                      "GPL (>= 2)"               
nlme         NA                                      "GPL (>= 2) | file LICENCE"
nnet         NA                                      "GPL-2 | GPL-3"            
parallel     "snow, nws, Rmpi"                       "Part of R 3.4.1"          
rpart        NA                                      "GPL-2 | GPL-3"            
spatial      NA                                      "GPL-2 | GPL-3"            
splines      NA                                      "Part of R 3.4.1"          
stats        NA                                      "Part of R 3.4.1"          
stats4       NA                                      "Part of R 3.4.1"          
survival     NA                                      "LGPL (>= 2)"              
tcltk        NA                                      "Part of R 3.4.1"          
tools        NA                                      "Part of R 3.4.1"          
translations NA                                      "Part of R 3.4.1"          
utils        NA                                      "Part of R 3.4.1"          
             License_is_FOSS License_restricts_use OS_type MD5sum NeedsCompilation
BH           NA              NA                    NA      NA     "no"            
hms          NA              NA                    NA      NA     "no"            
R6           NA              NA                    NA      NA     "no"            
Rcpp         NA              NA                    NA      NA     "yes"           
readr        NA              NA                    NA      NA     "yes"           
rlang        NA              NA                    NA      NA     "yes"           
tibble       NA              NA                    NA      NA     "yes"           
base         NA              NA                    NA      NA     NA              
boot         NA              NA                    NA      NA     "no"            
class        NA              NA                    NA      NA     "yes"           
cluster      NA              NA                    NA      NA     "yes"           
codetools    NA              NA                    NA      NA     "no"            
compiler     NA              NA                    NA      NA     NA              
datasets     NA              NA                    NA      NA     NA              
foreign      NA              NA                    NA      NA     "yes"           
graphics     NA              NA                    NA      NA     "yes"           
grDevices    NA              NA                    NA      NA     "yes"           
grid         NA              NA                    NA      NA     "yes"           
KernSmooth   NA              NA                    NA      NA     "yes"           
lattice      NA              NA                    NA      NA     "yes"           
MASS         NA              NA                    NA      NA     "yes"           
Matrix       NA              NA                    NA      NA     "yes"           
methods      NA              NA                    NA      NA     "yes"           
mgcv         NA              NA                    NA      NA     "yes"           
nlme         NA              NA                    NA      NA     "yes"           
nnet         NA              NA                    NA      NA     "yes"           
parallel     NA              NA                    NA      NA     "yes"           
rpart        NA              NA                    NA      NA     "yes"           
spatial      NA              NA                    NA      NA     "yes"           
splines      NA              NA                    NA      NA     "yes"           
stats        NA              NA                    NA      NA     "yes"           
stats4       NA              NA                    NA      NA     NA              
survival     NA              NA                    NA      NA     "yes"           
tcltk        NA              NA                    NA      NA     "yes"           
tools        NA              NA                    NA      NA     "yes"           
translations NA              NA                    NA      NA     NA              
utils        NA              NA                    NA      NA     "yes"           
             Built  
BH           "3.4.1"
hms          "3.4.1"
R6           "3.4.1"
Rcpp         "3.4.1"
readr        "3.4.1"
rlang        "3.4.1"
tibble       "3.4.1"
base         "3.4.1"
boot         "3.4.1"
class        "3.4.1"
cluster      "3.4.1"
codetools    "3.4.1"
compiler     "3.4.1"
datasets     "3.4.1"
foreign      "3.4.1"
graphics     "3.4.1"
grDevices    "3.4.1"
grid         "3.4.1"
KernSmooth   "3.4.1"
lattice      "3.4.1"
MASS         "3.4.1"
Matrix       "3.4.1"
methods      "3.4.1"
mgcv         "3.4.1"
nlme         "3.4.1"
nnet         "3.4.1"
parallel     "3.4.1"
rpart        "3.4.1"
spatial      "3.4.1"
splines      "3.4.1"
stats        "3.4.1"
stats4       "3.4.1"
survival     "3.4.1"
tcltk        "3.4.1"
tools        "3.4.1"
translations "3.4.1"
utils        "3.4.1"

The R Studio IDE has a Packages tab which shows the installed packages and whether each one is loaded.

R studio 10 - packages tab

Installing new packages

To install a new package, we shall use the Install Packages option in R Studio or the install.packages() command in the console.

R studio 11 - installing new packages

> install.packages("regress")
Installing package into ‘D:/gandhari/documents/R/win-library/3.4’
(as ‘lib’ is unspecified)
trying URL 'https://cran.rstudio.com/bin/windows/contrib/3.4/regress_1.3-15.zip'
Content type 'application/zip' length 32695 bytes (31 KB)
downloaded 31 KB

package ‘regress’ successfully unpacked and MD5 sums checked

The downloaded binary packages are in
	C:\Users\pandian\AppData\Local\Temp\Rtmpq29bYI\downloaded_packages
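
A minimal sketch of installing a package only when it is missing. The helper name install_if_missing is hypothetical (it is not part of R); requireNamespace() and install.packages() are standard functions. The demonstration uses "stats", which ships with R, so nothing is downloaded here.

```r
# Hypothetical helper: install a package only if it is not already available.
install_if_missing <- function(pkg) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    return(invisible(FALSE))   # already installed, nothing to do
  }
  install.packages(pkg)        # needs network access and a CRAN mirror
  invisible(TRUE)
}

# "stats" ships with R, so this call installs nothing.
installed_now <- install_if_missing("stats")
print(installed_now)
```

The same call with "regress" would download the package only on the first run.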

Removing packages

Removing a package using R Studio is as easy as clicking the x mark.

R studio 12 - removing packages

To do it from the console, we shall use remove.packages().

> remove.packages("regress")
Removing package from ‘D:/gandhari/documents/R/win-library/3.4’
(as ‘lib’ is unspecified)

Loading the packages

To load a package, we just check ✅ the needed package in the Packages tab. The same task can be performed using the library() command in the console.

R studio 13 - loading the package

> library("BH", lib.loc="~/R/win-library/3.4")
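
A small sketch of how to confirm from the console that a package is loaded. It uses the tools package only because it ships with every R installation; any installed package would behave the same way.

```r
# tools ships with R, so it is safe to load in an example.
library(tools)

# search() lists the attached packages; a loaded package shows up
# as "package:<name>".
is_attached <- "package:tools" %in% search()
print(is_attached)

# A single function can also be called without attaching its package,
# using the :: operator.
ext <- tools::file_ext("marks.csv")
print(ext)
```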

See you in another interesting post. 💖 you all.

 

 

Loading Data into R

I have written about storing and retrieving objects in the R language in my previous post. Let's see how to load data in R here.

c Command

R offers a command called c, which stands for combine. It is used to enter numeric or alphanumeric data.

> marks = c (100, 80, 85, 70, 35)

R studio 3

The following commands show how to load numeric, alphabetic and alphanumeric data. See how R responds when you give alphanumeric data: the numbers are converted to character strings.

> marks = c (100, 80, 85, 70, 35)
> marks
[1] 100  80  85  70  35
> names = c("sun", "moon", "earth")
> names
[1] "sun"   "moon"  "earth"
> alphanu = c("sun", "moon", "earth", 2, 3)
> alphanu
[1] "sun"   "moon"  "earth" "2"     "3"    
> #append data
> marks = c(marks, 10, 20)
> marks
[1] 100  80  85  70  35  10  20
> #combine two objects
> combo = c(names, marks)
> combo
 [1] "sun"   "moon"  "earth" "100"   "80"    "85"    "70"    "35"    "10"    "20"
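
The quotes around the numbers in the output above come from type coercion: a vector holds a single type, so mixing text and numbers turns everything into text. A short sketch, with variable names chosen only for illustration:

```r
marks <- c(100, 80, 85)
mixed <- c("sun", 2, 3)

class(marks)   # "numeric"
class(mixed)   # "character": the numbers became strings

# Convert the strings back to numbers. "sun" cannot be converted,
# so it becomes NA (R would also issue a warning, suppressed here).
back <- suppressWarnings(as.numeric(mixed))
print(back)
```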

Scan command

With the c command, we give all the values at once as a comma-separated list. The scan command helps us to enter the data interactively. Press Enter on an empty line to complete the data loading process.

> #scan numbers
> scan()
1: 10
2: 20
3: 30
4: 
Read 3 items
[1] 10 20 30
> scan(what='character')
1: tamil
2: english
3: maths
4: 
Read 3 items
[1] "tamil"   "english" "maths"
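
scan() waits for keyboard input, which does not work inside a script. A minimal sketch of the same two reads using the text argument of scan(), so the values come from a string instead of the keyboard:

```r
# Numbers separated by white space, read without prompting.
nums <- scan(text = "10 20 30", quiet = TRUE)
print(nums)

# The same for character data.
subjects <- scan(text = "tamil english maths", what = "character", quiet = TRUE)
print(subjects)
```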

Loading single dimensional data from flat files

scan can also be used to read data files. I have a data file in D:/gandhari/videos/Advanced Business Analytics/marks.txt

R studio 4

Here is how we read the values.

> marks = scan(file = 'D:/gandhari/videos/Advanced Business Analytics/marks.txt')
Read 20 items
> marks;
 [1]  80  90 100 100  90  70  85  67  74  76  50  55  57  62  51  35  30  27  40  39

So scan reads everything into a single-dimension vector.

I have given the complete path of the file in the above example. If you have multiple files in the same folder, it is easier to change the working directory first. Then we give only the file name instead of the complete path.

> getwd()
[1] "D:/gandhari/documents"
> setwd("D:/gandhari/videos/Advanced Business Analytics/")
> marks = scan(file="marks.txt")
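
An alternative to changing the working directory is to build the full path with file.path(). The sketch below writes a small stand-in file into a temporary folder (the file name marks_example.txt and its contents are made up for the example) and reads it back with scan.

```r
# Write a small sample file into a temporary folder.
folder <- tempdir()
path <- file.path(folder, "marks_example.txt")
writeLines(c("80 90 100", "70 85 67"), path)

# Read it using the full path; the working directory stays unchanged.
marks <- scan(file = path, quiet = TRUE)
print(marks)
length(marks)
```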

Loading multi-dimensional data from CSV file

How do we load multi-dimensional data? Let’s use the read.csv command, which returns a data frame.

This is my input file.

R studio 5

> marks<-read.csv(file = 'marks.csv', header = FALSE, sep = ",")
> marks
  V1 V2  V3  V4 V5
1 80 90 100 100 90
2 70 85  67  74 76
3 50 55  57  62 51
4 35 30  27  40 39

V1, V2, … V5 are variables (columns)

1, 2, … 4 are rows
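
A short sketch of working with the loaded data frame. To keep it self-contained, the same four rows are passed through the text argument of read.csv instead of a file.

```r
csv_text <- "80,90,100,100,90
70,85,67,74,76
50,55,57,62,51
35,30,27,40,39"

marks <- read.csv(text = csv_text, header = FALSE)

dim(marks)        # number of rows and columns
marks[2, 3]       # row 2, column 3
marks$V1          # the whole first variable
colMeans(marks)   # average of each column
```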

R Studio data import

R Studio has an option to import CSV files interactively using its GUI.

Following is our input data

R studio 5

Follow the steps given below.

R studio 6

R studio 7

R studio 8

R studio 9

R studio 9A

 

 

Basic functions in R language

I have written about R installation in my previous post R language, R studio – Installation. Let’s do something more in this post.

Basic commands

> 1+2
[1] 3
> log(4)
[1] 1.386294
> tan(45)
[1] 1.619775
> atan(5)
[1] 1.373401
> #addition
> 10+15
[1] 25
> #Subtraction
> 450-300
[1] 150
> #Multiplication
> 3 * 4;
[1] 12
> #Division
> 5/2
[1] 2.5
> #Expressions
> 1+(4/2)/3
[1] 1.666667
> #Exponentiation
> 3^2
[1] 9
> #Square root
> sqrt(25)
[1] 5
> #Constants
> pi
[1] 3.141593
> oth<-1:100
> oth
  [1]   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19
 [20]  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38
 [39]  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57
 [58]  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72  73  74  75  76
 [77]  77  78  79  80  81  82  83  84  85  86  87  88  89  90  91  92  93  94  95
 [96]  96  97  98  99 100
> letters
 [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s" "t"
[21] "u" "v" "w" "x" "y" "z"
> letters[1:10]
 [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"
> letters[26:1]
 [1] "z" "y" "x" "w" "v" "u" "t" "s" "r" "q" "p" "o" "n" "m" "l" "k" "j" "i" "h" "g"
[21] "f" "e" "d" "c" "b" "a"
> LETTERS
 [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q" "R" "S" "T"
[21] "U" "V" "W" "X" "Y" "Z"
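
Two details of the commands above are easy to miss: the trigonometric functions work in radians, and log() is the natural logarithm. A short sketch (the variable names are only for illustration):

```r
# tan(45) above treats 45 as radians. Convert degrees to radians first.
deg <- 45
tan_45_degrees <- tan(deg * pi / 180)
print(tan_45_degrees)   # approximately 1

# log() is the natural logarithm; other bases are available too.
log_base2 <- log(4, base = 2)
log_base10 <- log10(100)
print(log_base2)
print(log_base10)
```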

() – function call; the arguments go inside the brackets

[] – indexing; a data set with rows and columns is indexed as [row, column]

{} – groups statements, such as the body of a user-defined function
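
The three kinds of brackets in one small sketch. The function name square is made up for the example.

```r
# {} wraps the body of a user-defined function.
square <- function(x) {
  x^2
}

# () calls a function with its arguments.
squared <- square(4)
print(squared)   # 16

# [] picks elements: one index for a vector, [row, column] for a matrix.
v <- c(10, 20, 30)
v[2]             # 20
m <- matrix(1:6, nrow = 2)
m[2, 3]          # 6
```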

> matrix(1:30)
      [,1]
 [1,]    1
 [2,]    2
 [3,]    3
 [4,]    4
 [5,]    5
 [6,]    6
 [7,]    7
 [8,]    8
 [9,]    9
[10,]   10
[11,]   11
[12,]   12
[13,]   13
[14,]   14
[15,]   15
[16,]   16
[17,]   17
[18,]   18
[19,]   19
[20,]   20
[21,]   21
[22,]   22
[23,]   23
[24,]   24
[25,]   25
[26,]   26
[27,]   27
[28,]   28
[29,]   29
[30,]   30
> matrix(1:30, nrow=3)
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]    1    4    7   10   13   16   19   22   25    28
[2,]    2    5    8   11   14   17   20   23   26    29
[3,]    3    6    9   12   15   18   21   24   27    30
> matrix(1:30, ncol=3)
      [,1] [,2] [,3]
 [1,]    1   11   21
 [2,]    2   12   22
 [3,]    3   13   23
 [4,]    4   14   24
 [5,]    5   15   25
 [6,]    6   16   26
 [7,]    7   17   27
 [8,]    8   18   28
 [9,]    9   19   29
[10,]   10   20   30
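
As the outputs above show, matrix() fills column by column. A minimal sketch of filling row by row with the byrow argument, using a smaller 1:6 series so the difference is easy to see:

```r
by_col <- matrix(1:6, nrow = 2)                 # default: fill down the columns
by_row <- matrix(1:6, nrow = 2, byrow = TRUE)   # fill across the rows

by_col[1, ]   # 1 3 5
by_row[1, ]   # 1 2 3

dim(by_row)   # rows and columns
t(by_col)     # transpose: rows become columns
```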

Store the results to environment

None of the results shown above are stored, so they are gone once they are printed. Let’s store the results in the R environment and retrieve them.

> result = 1+2+3+4+5
> result2 = 6+7+8+9+10
> final = result + result2
> #mean() takes one vector, so combine the values with c()
> avg = mean(c(result, result2))

The assignment may also use the <- arrow.

> tamil<-80
> tamil
[1] 80

Everything is an object in R. tamil is an object, and so are avg, final, result and result2.

We shall use the : operator to specify a series, as given below.

> 1:10
 [1]  1  2  3  4  5  6  7  8  9 10
> one_to_ten<-1:10
> one_to_ten
 [1]  1  2  3  4  5  6  7  8  9 10

Got it? If you are using R Studio, you will have seen that the values are being stored.

R studio 2

We can print them on the console as follows.

> result
[1] 15
> result2
[1] 40
> final
[1] 55
> avg
[1] 27.5
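
A short sketch of why the values passed to mean() belong in one vector built with c(). The second positional argument of mean() is trim, not another value to average.

```r
result <- 15
result2 <- 40

# Both values go into ONE vector, so mean() averages them.
avg <- mean(c(result, result2))
print(avg)       # 27.5

# Without c(), 40 is matched to the 'trim' argument. A trim of 0.5 or more
# makes mean() return the median of x, which here is just 15.
not_avg <- mean(result, result2)
print(not_avg)   # 15
```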

Object management

ls() and objects() return the names of all the user-defined objects in memory.

> #view all objects in use
> ls()
[1] "avg"     "final"   "result"  "result2" "tamil"  
> objects()
[1] "avg"     "final"   "result"  "result2" "tamil"

Let’s see how to remove those objects from memory.

> #remove one object
> rm(avg)
> remove(tamil)
> ls()
[1] "final"   "result"  "result2"
> #remove multiple objects
> rm(final, result)
> ls()
[1] "result2"
> rm(list=ls())
> ls()
character(0)
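
rm(list=ls()) removes everything. A sketch of removing only the objects whose names match a pattern is given below. It works in a separate environment so that it does not touch your workspace; the object names are made up for the example.

```r
# A scratch environment to hold the example objects.
e <- new.env()
assign("tmp_a", 1, envir = e)
assign("tmp_b", 2, envir = e)
assign("keep_me", 3, envir = e)

# Check whether an object exists before using it.
exists("tmp_a", envir = e, inherits = FALSE)   # TRUE

# Remove only the objects whose names start with "tmp_".
rm(list = ls(envir = e, pattern = "^tmp_"), envir = e)
remaining <- ls(envir = e)
print(remaining)   # "keep_me"
```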

I’ll see you in another post with an interesting subject.

 

R language, R studio – Installation

Having completed the series of statistics posts on this blog, I’ll be writing about the R language in this post.

R is an open-source language and environment for statistical computing and graphics. It provides an interactive workspace called the console.

Setup

Download the setup from https://cran.r-project.org/
Choose ‘base’ when you download.

R language setup 01

Download R Studio, which is an IDE to work with R. You may download RStudio Desktop under the Open Source License, which is free to use.

R language setup 02

Install R-3.4.1-win.exe

R language setup 03

Open the R console by clicking the above icon.

R language setup 04

Get familiar with the R console

Let’s familiarize ourselves by executing some basic commands.

> 2+3
[1] 5
> 10*12+4
[1] 124
> 10^3
[1] 1000
> sqrt(4)
[1] 2
> pi
[1] 3.141593
> 10 + (2*3)
[1] 16
> pi * 2^2
[1] 12.56637
> 100 + 6/3
[1] 102

R language setup 05

R Studio

Install R Studio, which we downloaded earlier.

R studio

We can give the same commands to get it working. See you in another detailed post regarding R programming.

why R

 

 

 

 

 

 

Linear Programming – Covering Model using LibreOffice Calc Solver

🕋 Eid Mubarak, Selamat Hari Raya Haji ☪️

I have written about the Linear Programming – Allocation model in my previous posts Linear Programming and Linear Programming with LibreOffice Calc Solver. This post talks about Linear Programming – Covering models.

The first question would be: what’s the difference between the Allocation model and the Covering model? There is no difference in the form of the optimization function. The difference lies in the constraints. In the allocation model, all our constraints talk about a maximum, so all of them had the ≤ symbol. Covering models talk about minimization, usually of cost, and their constraints set minimum requirements with the ≥ symbol.

Example

I’ll use the data set given in https://paginas.fe.up.pt/~mac/ensino/docs/OR/otherDocs/PowellAllocationCoveringBlendingConstraints.pdf

Dahlby Outfitters wishes to introduce packaged trail mix as a new product. The ingredients for the trail mix are seeds, raisins, flakes, and two kinds of nuts. Each ingredient contains certain amounts of vitamins, minerals, protein, and calories.

The marketing department has specified that the product be designed so that a certain minimum nutritional profile is met. The decision problem is to determine the optimal product composition—that is, to minimize the product cost by choosing the amount for each of the ingredients in the mix. The data shown below summarize the parameters of the problem:

Component    Grams per pound                            Nutritional
             Seeds  Raisins  Flakes  Pecans  Walnuts    requirement
Vitamins        10       20      10      30       20             20
Minerals         5        7       4       9        2             10
Protein          1        4      10       2        1             15
Calories       500      450     160     300      500            600
Cost/pound       4        5       3       7        6

Let’s denote the amounts (in pounds) of seeds, raisins, flakes, pecans and walnuts as S, R, F, P and W. Our objective function would look like this.

Total Cost = 4S+5R+3F+7P+6W

Rewriting the above statement as –

Zmin = 4S+5R+3F+7P+6W

subject to constraints –

Vitamin content: 10S + 20R + 10F + 30P + 20W greater than or equal to 20
Mineral content: 5S + 7R + 4F + 9P + 2W greater than or equal to 10
Protein content: 1S + 4R + 10F + 2P + 1W greater than or equal to 15
Calorie content: 500S + 450R + 160F + 300P + 500W greater than or equal to 600

Rewriting the above constraints as linear inequalities, as given below,

10S + 20R + 10F + 30P + 20W ≥ 20
5S + 7R + 4F + 9P + 2W ≥ 10
1S + 4R + 10F + 2P + 1W ≥ 15
500S + 450R + 160F + 300P + 500W ≥ 600
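
Before moving to the spreadsheet, the model can be written down in R to check any candidate mix by hand. This is only a sketch: the helper name evaluate_mix is made up, the trial mix of one pound of each ingredient is arbitrary, and the amounts are assumed to be non-negative, which the post does not state explicitly.

```r
# Nutrient content per pound; columns are S, R, F, P, W.
A <- matrix(c( 10,  20,  10,  30,  20,
                5,   7,   4,   9,   2,
                1,   4,  10,   2,   1,
              500, 450, 160, 300, 500),
            nrow = 4, byrow = TRUE,
            dimnames = list(c("Vitamins", "Minerals", "Protein", "Calories"),
                            c("S", "R", "F", "P", "W")))
requirement <- c(20, 10, 15, 600)
cost <- c(4, 5, 3, 7, 6)

# Hypothetical helper: cost, nutrient totals and feasibility of one mix,
# where x holds the pounds of each ingredient.
evaluate_mix <- function(x) {
  nutrients <- as.vector(A %*% x)
  list(cost = sum(cost * x),
       nutrients = setNames(nutrients, rownames(A)),
       feasible = all(nutrients >= requirement) && all(x >= 0))
}

# Arbitrary trial mix: one pound of every ingredient.
trial <- evaluate_mix(c(1, 1, 1, 1, 1))
print(trial)
```

The trial mix meets every requirement but costs 25, so the solver's job is to find a cheaper mix that still satisfies all four inequalities.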

Spreadsheet method (LibreOffice Calc)

Prepare the data set. G9 is highlighted in yellow colour. This is the figure we want to minimize.

Linear Programming covering model libreoffice calc solver 01

The data already give the cost of each ingredient. So, our aim is to find how much of each ingredient should go into the mix. These amounts are the decision variables we need to find. The cells of the decision variables are also highlighted in yellow colour.

Linear Programming covering model libreoffice calc solver 02

Let’s write the constraints now. Our aim is to find how much vitamin, mineral, etc. ends up in our product. Those cells are highlighted in yellow colour.

Linear Programming covering model libreoffice calc solver 03

Let’s open the Solver now. Following is my selection.

  1. Target cell is where we find the minimum cost.
  2. As we are talking about minimum, we choose ‘optimize result to’ as ‘Minimum’
  3. By changing cells = Decision variable cells
  4. Limiting Constraints are highlighted with => operator.

Linear Programming covering model libreoffice calc solver 04

Following is the result.

Linear Programming covering model libreoffice calc solver 05

The answer I get in Calc is not equal to what I see in the reference PDF. However, let’s take it as the decision for the moment.

We would take 24.6, 10, 15, 600 for vitamins, minerals, protein and calories.

The linear programming solution suggests that we avoid pecans and walnuts.

About 0.5 pounds of seeds, 0.3 pounds of raisins and 1.3 pounds of flakes are sufficient.

With this, we would be able to provide 24.6 vitamins, 10 minerals, 15 protein and 600 calories.
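
The reported figures can be cross-checked in R without a solver. The sketch below is not an optimizer. It assumes, as the Calc result indicates, that pecans and walnuts are zero and that the mineral, protein and calorie requirements are met exactly, and then solves those three equations for the three remaining ingredients.

```r
# Coefficients of S, R, F in the three requirements that are met exactly.
B <- matrix(c(  5,   7,   4,    # minerals
                1,   4,  10,    # protein
              500, 450, 160),   # calories
            nrow = 3, byrow = TRUE)
rhs <- c(10, 15, 600)

# Pounds of seeds, raisins and flakes that satisfy all three equations.
mix <- solve(B, rhs)
names(mix) <- c("Seeds", "Raisins", "Flakes")
print(round(mix, 2))

# Vitamins supplied by this mix and its total cost.
vitamins <- sum(c(10, 20, 10) * mix)
total_cost <- sum(c(4, 5, 3) * mix)
print(round(vitamins, 1))
print(round(total_cost, 2))
```

Rounded to one decimal, the amounts and the vitamin total agree with the figures quoted above.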

With this, I’m closing the statistics posts. I’ll be starting the next part of this series soon, which is R programming.