Write R Code(S) to Read the Contents of Data.csv and Store Them in a Data Frame Called Data.

Addressing Data

Overview

Pedagogy: twenty min
Exercises: 0 min

Questions

  • What are the different methods for accessing parts of a data frame?

Objectives

  • Empathize the three different ways R tin can address information inside a data frame.

  • Combine unlike methods for addressing data with the assignment operator to update subsets of data.

R is a powerful language for data manipulation. There are three master means for addressing information inside R objects.

  • By index (subsetting)
  • Past logical vector
  • Past proper noun

Lets commencement by loading some sample data:

                          dat                                          <-                                          read.csv              (              file                                          =                                          'data/sample.csv'              ,                                          header                                          =                                          TRUE              ,                                          stringsAsFactors                                          =                                          False              )                                                  

The outset row of this csv file is a list of cavalcade names. We used the header = TRUE argument to read.csv and so that R can interpret the file correctly. We are using the stringsAsFactors = FALSE argument to override the default behaviour for R. Using factors in R is covered in a dissever lesson.

Lets take a wait at this data.

R has loaded the contents of the .csv file into a variable chosen dat which is a data frame.

We tin compactly display the internal structure of a data frame using the structure role str.

            'data.frame':	100 obs. of  ix variables:  $ ID           : chr  "Sub001" "Sub002" "Sub003" "Sub004" ...  $ Gender       : chr  "m" "m" "thousand" "f" ...  $ Group        : chr  "Command" "Treatment2" "Treatment2" "Treatment1" ...  $ BloodPressure: int  132 139 130 105 125 112 173 108 131 129 ...  $ Historic period          : num  16 17.ii 19.5 15.7 19.ix fourteen.3 17.7 xix.viii 19.4 18.8 ...  $ Aneurisms_q1 : int  114 148 196 199 188 260 135 216 117 188 ...  $ Aneurisms_q2 : int  140 209 251 140 120 266 98 238 215 144 ...  $ Aneurisms_q3 : int  202 248 122 233 222 320 154 279 181 192 ...  $ Aneurisms_q4 : int  237 248 177 220 228 294 245 251 272 185 ...                      

The str part tell us that the data has 100 rows and ix columns. It is as well tell united states of america that the data frame is made upwards of grapheme chr, integer int and numeric vectors.

                          ID Gender      Grouping BloodPressure  Age Aneurisms_q1 Aneurisms_q2 one Sub001      g    Control           132 16.0          114          140 2 Sub002      k Treatment2           139 17.2          148          209 3 Sub003      m Treatment2           130 19.5          196          251 four Sub004      f Treatment1           105 fifteen.seven          199          140 5 Sub005      m Treatment1           125 19.nine          188          120 6 Sub006      M Treatment2           112 14.3          260          266   Aneurisms_q3 Aneurisms_q4 i          202          237 ii          248          248 3          122          177 4          233          220 5          222          228 6          320          294                      

The data is the results of an (not real) experiment, looking at the number of aneurysms that formed in the eyes of patients who undertook iii different treatments.

Addressing by Index

Information can be accessed past index. Nosotros have already seen how square brackets [ tin be used to subset data (sometimes also chosen "slicing"). The generic format is dat[row_numbers,column_numbers].

Selecting Values

What volition be returned by dat[ane, 1]? Think about the number of rows and columns you lot would expect as the result.

Solution

If we leave out a dimension R will interpret this as a asking for all values in that dimension.

Selecting More Values

What will be returned past dat[, 2]?

Solution

                                  [one] "m" "m" "m" "f" "k" "M" "f" "k" "m" "f" "m" "f" "f" "m" "grand" "yard" "f" "m"  [xix] "k" "F" "f" "m" "f" "f" "m" "Chiliad" "Chiliad" "f" "m" "f" "f" "m" "k" "1000" "yard" "f"  [37] "f" "grand" "M" "chiliad" "f" "g" "m" "m" "f" "f" "One thousand" "M" "m" "m" "m" "f" "f" "f"  [55] "m" "f" "m" "1000" "m" "f" "f" "f" "f" "M" "f" "g" "f" "f" "M" "k" "m" "yard"  [73] "F" "m" "thousand" "f" "M" "G" "Yard" "f" "chiliad" "M" "Chiliad" "m" "one thousand" "f" "f" "f" "1000" "grand"  [91] "f" "one thousand" "F" "f" "m" "m" "F" "m" "M" "One thousand"                              

The colon : can be used to create a sequence of integers.

Creates a vector of numbers from 6 to 9.

This can be very useful for addressing data.

Subsetting with Sequences

Utilise the colon operator to alphabetize just the aneurism count data (columns 6 to ix).

Solution

                                  Aneurisms_q1 Aneurisms_q2 Aneurisms_q3 Aneurisms_q4 1            114          140          202          237 two            148          209          248          248 3            196          251          122          177 4            199          140          233          220 five            188          120          222          228 vi            260          266          320          294 7            135           98          154          245 8            216          238          279          251 9            117          215          181          272 10           188          144          192          185 11           134          155          247          223 12           152          177          323          245 13           112          220          225          195 xiv           109          150          177          189 xv           146          140          239          223 sixteen            97          172          203          207 17           165          157          200          193 eighteen           158          265          243          187 19           178          109          206          182 xx           107          188          167          218 21           174          160          203          183 22            97          110          194          133 23           187          239          281          214 24           188          191          256          265 25           114          199          242          195 26           115          160          158          228 27           128          249          294          315 28           112          230          281          126 29           136          109          105          155 30           103          148          219          228 31           132          151          234          162 32           118          154          260          160 33           166          176          253          233 34           152          105          197          299 35           191          148          166          185 36           152          178          158          170 37           161          270          232          284 38           239          184          317          269 39           132          137          193          206 twoscore           168          255          273          274 41           140          184          239          202 42           166           85          179          196 43           141          160          179          239 44           161          168          212          181 45           103          111          254          126 46           231          240          260          310 47           192          141          180          225 48           178          180          169          183 49           167          123          236          224 50           135          150          208          279 51           150          166          153          204 52           192           eighty          138          222 53           153          153          236          216 54           205          264          269          207 55           117          194          216          211 56           199          119          183          251 57           182          129          226          218 58           180          196          250          294 59           111          111          244          201 lx           101           98          178          116 61           166          167          232          241 62           158          171          237          212 63           189          178          177          238 64           189          101          193          172 65           239          189          297          300 66           185          224          151          182 67           224          112          304          288 68           104          139          211          204 69           222          199          280          196 70           107           98          204          138 71           153          255          218          234 72           118          165          220          227 73           102          184          246          222 74           188          125          191          157 75           180          283          204          298 76           178          214          291          240 77           168          184          184          229 78           118          170          249          249 79           169          114          248          233 fourscore           156          138          218          258 81           232          211          219          246 82           188          108          180          136 83           169          168          180          211 84           241          233          292          182 85            65          207          234          235 86           225          185          195          235 87           104          116          173          221 88           179          158          216          244 89           103          140          209          186 xc           112          130          175          191 91           226          170          307          244 92           228          221          316          259 93           209          142          199          184 94           153          104          194          214 95           111          118          173          191 96           148          132          200          194 97           141          196          322          273 98           193          112          123          181 99           130          226          286          281 100          126          157          129          160                              

Finally nosotros can utilize the c() (combine) function to address non-sequential rows and columns.

                          ID Gender      Group BloodPressure  Age i Sub001      m    Control           132 16.0 5 Sub005      1000 Treatment1           125 xix.ix 7 Sub007      f    Control           173 17.vii nine Sub009      m Treatment2           131 19.four                      

Returns the offset 5 columns for patients in rows i,5,seven and ix

Subsetting Non-Sequential Data

Write code to render the age and gender values for the showtime 5 patients.

Solution

                                  Age Gender 1 16.0      m 2 17.2      thou 3 xix.5      m four 15.7      f 5 19.9      m                              

Addressing by Proper name

Columns in an R data frame are named.

            [1] "ID"            "Gender"        "Group"         "BloodPressure" [5] "Age"           "Aneurisms_q1"  "Aneurisms_q2"  "Aneurisms_q3"  [ix] "Aneurisms_q4"                      

Default Names

If column names are not specified e.g. using headers = Fake in a read.csv() office, R assigns default names V1, V2, ..., Vn

We usually employ the $ operator to address a column by name

                          [1] "g" "m" "thousand" "f" "1000" "G" "f" "m" "one thousand" "f" "m" "f" "f" "k" "m" "m" "f" "thousand"  [xix] "yard" "F" "f" "m" "f" "f" "m" "Grand" "M" "f" "m" "f" "f" "g" "thousand" "m" "1000" "f"  [37] "f" "grand" "One thousand" "m" "f" "grand" "one thousand" "m" "f" "f" "1000" "M" "one thousand" "m" "one thousand" "f" "f" "f"  [55] "m" "f" "m" "m" "m" "f" "f" "f" "f" "Yard" "f" "yard" "f" "f" "M" "g" "thousand" "m"  [73] "F" "m" "yard" "f" "M" "M" "G" "f" "m" "M" "M" "m" "chiliad" "f" "f" "f" "thousand" "m"  [91] "f" "m" "F" "f" "g" "m" "F" "m" "M" "M"                      

When we extract a single column from a data frame using the $ operator, R will return a vector of that column class and not a data frame.

Named addressing can likewise be used in foursquare brackets.

                          caput              (              dat              [,                                          c              (              'Age'              ,                                          'Gender'              )])                                                  
                          Age Gender i xvi.0      m 2 17.2      thousand 3 19.five      m four fifteen.7      f 5 19.9      m 6 fourteen.3      K                      

Best Practise

Best practice is to address columns past name. Often, you will create or delete columns and the cavalcade position will modify.

Rows in an R information frame can also be named, and rows can also be addressed by their names.
Past default, row names are indices (i.e. position of each row in the data frame):

                          [i] "1"   "ii"   "three"   "4"   "5"   "6"   "7"   "8"   "9"   "10"  "11"  "12"   [13] "13"  "14"  "15"  "16"  "17"  "18"  "19"  "20"  "21"  "22"  "23"  "24"   [25] "25"  "26"  "27"  "28"  "29"  "30"  "31"  "32"  "33"  "34"  "35"  "36"   [37] "37"  "38"  "39"  "40"  "41"  "42"  "43"  "44"  "45"  "46"  "47"  "48"   [49] "49"  "l"  "51"  "52"  "53"  "54"  "55"  "56"  "57"  "58"  "59"  "60"   [61] "61"  "62"  "63"  "64"  "65"  "66"  "67"  "68"  "69"  "seventy"  "71"  "72"   [73] "73"  "74"  "75"  "76"  "77"  "78"  "79"  "eighty"  "81"  "82"  "83"  "84"   [85] "85"  "86"  "87"  "88"  "89"  "90"  "91"  "92"  "93"  "94"  "95"  "96"   [97] "97"  "98"  "99"  "100"                      

Nosotros can add together row names as we read in the file with the row.names parameter in read.csv.
In the following example, we choose the outset column ID to get the vector of row names of the data frame, with row.names = 1.

                          dat2                                          <-                                          read.csv              (              file                                          =                                          'data/sample.csv'              ,                                          header                                          =                                          True              ,                                          stringsAsFactors                                          =                                          Simulated              ,                                          row.names              =              ane              )                                          rownames              (              dat2              )                                                  
                          [i] "Sub001" "Sub002" "Sub003" "Sub004" "Sub005" "Sub006" "Sub007" "Sub008"   [9] "Sub009" "Sub010" "Sub011" "Sub012" "Sub013" "Sub014" "Sub015" "Sub016"  [17] "Sub017" "Sub018" "Sub019" "Sub020" "Sub021" "Sub022" "Sub023" "Sub024"  [25] "Sub025" "Sub026" "Sub027" "Sub028" "Sub029" "Sub030" "Sub031" "Sub032"  [33] "Sub033" "Sub034" "Sub035" "Sub036" "Sub037" "Sub038" "Sub039" "Sub040"  [41] "Sub041" "Sub042" "Sub043" "Sub044" "Sub045" "Sub046" "Sub047" "Sub048"  [49] "Sub049" "Sub050" "Sub051" "Sub052" "Sub053" "Sub054" "Sub055" "Sub056"  [57] "Sub057" "Sub058" "Sub059" "Sub060" "Sub061" "Sub062" "Sub063" "Sub064"  [65] "Sub065" "Sub066" "Sub067" "Sub068" "Sub069" "Sub070" "Sub071" "Sub072"  [73] "Sub073" "Sub074" "Sub075" "Sub076" "Sub077" "Sub078" "Sub079" "Sub080"  [81] "Sub081" "Sub082" "Sub083" "Sub084" "Sub085" "Sub086" "Sub087" "Sub088"  [89] "Sub089" "Sub090" "Sub091" "Sub092" "Sub093" "Sub094" "Sub095" "Sub096"  [97] "Sub097" "Sub098" "Sub099" "Sub100"                      

We can now extract i or more rows using those row names:

                          Gender   Group BloodPressure  Age Aneurisms_q1 Aneurisms_q2 Aneurisms_q3 Sub072      m Control           116 17.4          118          165          220        Aneurisms_q4 Sub072          227                      
                          dat2              [              c              (              "Sub009"              ,                                          "Sub072"              ),                                          ]                                                  
                          Gender      Grouping BloodPressure  Age Aneurisms_q1 Aneurisms_q2 Sub009      m Treatment2           131 nineteen.4          117          215 Sub072      thousand    Control           116 17.iv          118          165        Aneurisms_q3 Aneurisms_q4 Sub009          181          272 Sub072          220          227                      

Notation that row names must be unique!
For example, if nosotros try and read in the data setting the Group column as row names, R volition throw an error because values in that cavalcade are duplicated:

                          dat2                                          <-                                          read.csv              (              file                                          =                                          'information/sample.csv'              ,                                          header                                          =                                          TRUE              ,                                          stringsAsFactors                                          =                                          Imitation              ,                                          row.names              =              3              )                                                  
            Error in read.table(file = file, header = header, sep = sep, quote = quote, : duplicate 'row.names' are non allowed                      

Addressing past Logical Vector

A logical vector contains only the special values True and Faux.

                          c              (              True              ,                                          True              ,                                          False              ,                                          Simulated              ,                                          TRUE              )                                                  
            [1]  True  True False FALSE  TRUE                      

Truth and Its Opposite

Note the values TRUE and Imitation are all capital letters and are not quoted.

Logical vectors tin can be created using relational operators east.1000. <, >, ==, !=, %in%.

                          x                                          <-                                          c              (              i              ,                                          2              ,                                          three              ,                                          eleven              ,                                          12              ,                                          13              )                                          x                                          <                                          ten                                                  
            [1]  TRUE  Truthful  TRUE FALSE FALSE FALSE                      
            [i]  True  TRUE  TRUE Faux Simulated Faux                      

We tin can use logical vectors to select data from a data frame. This is often referred to as logical indexing.

                          index                                          <-                                          dat              $              Grouping                                          ==                                          'Control'                                          dat              [              index              ,]              $              BloodPressure                                                  
                          [1] 132 173 129  77 158  81 137 111 135 108 133 139 126 125  99 122 155 133  94 [twenty]  98  74 116  97 104 117  90 150 116 108 102                      

Oftentimes this operation is written as one line of code:

                          plot              (              dat              [              dat              $              Group                                          ==                                          'Command'              ,                                          ]              $              BloodPressure              )                                                  

plot of chunk logical_vectors_indexing2

Using Logical Indexes

  1. Create a scatterplot showing BloodPressure for subjects not in the command group.
  2. How many ways are at that place to index this set of subjects?

Solution

  1. The lawmaking for such a plot:

                                                                                          plot                      (                      dat                      [                      dat                      $                      Group                                                                  !=                                                                  'Command'                      ,                                                                  ]                      $                      BloodPressure                      )                                                                                  

    plot of chunk plot-logical

  2. In addition to dat$Group != 'Control', 1 could employ dat$Group %in% c("Treatment1", "Treatment2").

Combining Addressing and Consignment

The consignment operator <- can be combined with addressing.

                          x                                          <-                                          c              (              1              ,                                          2              ,                                          3              ,                                          xi              ,                                          12              ,                                          thirteen              )                                          x              [              10                                          <                                          10              ]                                          <-                                          0                                          x                                                  

Updating a Subset of Values

In this dataset, values for Gender have been recorded every bit both capital M, F and lowercase yard, f. Combine the addressing and assignment operations to convert all values to lowercase.

Solution

                                  dat                  [                  dat                  $                  Gender                                                      ==                                                      'Grand'                  ,                                                      ]                  $                  Gender                                                      <-                                                      '1000'                                                      dat                  [                  dat                  $                  Gender                                                      ==                                                      'F'                  ,                                                      ]                  $                  Gender                                                      <-                                                      'f'                                                                  

Key Points

  • Data in data frames can be addressed by alphabetize (subsetting), by logical vector, or by name (columns only).

  • Employ the $ operator to address a column by name.

chittyyeagaing.blogspot.com

Source: https://swcarpentry.github.io/r-novice-inflammation/10-supp-addressing-data/

0 Response to "Write R Code(S) to Read the Contents of Data.csv and Store Them in a Data Frame Called Data."

Post a Comment

Iklan Atas Artikel

Iklan Tengah Artikel 1

Iklan Tengah Artikel 2

Iklan Bawah Artikel