
Tutorial # 9. Data import

Method 1: Importing a dataset from a web link

 

Example:

links = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"

# header = FALSE because the file has no header row; R names the columns V1 to V5
iris_web = read.csv(url(links), header = FALSE)
 
# Display the dataset in a viewer tab (RStudio)
View(iris_web)
 
Output:
  
     V1  V2  V3  V4              V5
1   5.1 3.5 1.4 0.2     Iris-setosa
2   4.9 3.0 1.4 0.2     Iris-setosa
-
-
-
149 6.2 3.4 5.4 2.3  Iris-virginica
150 5.9 3.0 5.1 1.8  Iris-virginica
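
When `header = FALSE` is used, `read.csv` names the columns V1, V2, and so on by default. A minimal sketch of supplying names at read time instead, assuming the column order matches the built-in iris dataset. The column names and the three inline sample rows (used so the example runs without a network connection) are illustrative choices, not part of the original tutorial:

```r
# Three sample rows in the same layout as iris.data (inline, so no network is needed)
sample_text <- "5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
5.9,3.0,5.1,1.8,Iris-virginica"

# Assumed column names, chosen to match the built-in iris dataset
iris_names <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width", "Species")

# col.names assigns the names while reading; header = FALSE keeps the first row as data
iris_small <- read.csv(text = sample_text, header = FALSE, col.names = iris_names)

print(iris_small)
```

To read the full file the same way, pass the URL connection in place of `text = sample_text`.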
 

Method 2: Importing a dataset from a package

Example:

print(iris)
 
Output: 
 
   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
1            5.1         3.5          1.4         0.2     setosa
2            4.9         3.0          1.4         0.2     setosa
-
-
-
149          6.2         3.4          5.4         2.3  virginica
150          5.9         3.0          5.1         1.8  virginica 
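
`iris` ships with the `datasets` package, which R attaches by default, so no import step is needed. A short sketch of inspecting it without printing all 150 rows, using only base R functions:

```r
# Load the built-in copy explicitly (optional; it is available by default)
data("iris")

# Show the first three rows instead of the full table
print(head(iris, 3))

# Number of rows and columns
print(dim(iris))

# Column types: four numeric measurements and one factor
str(iris)
```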
 

Method 3: Importing a dataset from a downloaded file
 
 
Example: 
 
In RStudio: File -> Import Dataset -> From Text (base) -> choose the file, then click Import

# Display the imported dataset
print(iris)

# Display the dimensions of the dataset
print(dim(iris))
 
Output:
  
     V1  V2  V3  V4              V5
1   5.1 3.5 1.4 0.2     Iris-setosa
2   4.9 3.0 1.4 0.2     Iris-setosa
-
-
-
149 6.2 3.4 5.4 2.3  Iris-virginica
150 5.9 3.0 5.1 1.8  Iris-virginica
[1] 150   5
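
The menu import can also be scripted, which makes the step repeatable. A minimal sketch, assuming a comma-separated file without a header row. The temporary file and its two rows are hypothetical stand-ins for the downloaded `iris.data`:

```r
# Create a small stand-in file so the example is self-contained
path <- tempfile(fileext = ".data")
writeLines(c("5.1,3.5,1.4,0.2,Iris-setosa",
             "6.2,3.4,5.4,2.3,Iris-virginica"), path)

# Read the local file; header = FALSE because the file has no header row
iris_file <- read.csv(path, header = FALSE)

print(iris_file)
print(dim(iris_file))

# Remove the temporary file
unlink(path)
```

For a real download, replace `path` with the location of the saved file.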
 
 

=> Add / modify the column names of a data frame

 
Example: 
 
colnames(iris) = c("C-1", "C-2","C-3", "C-4", "Item name")
View(iris)
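
Names such as `C-1` and `Item name` are not syntactically valid R names, so they have to be accessed with `[[ ]]` or wrapped in backticks later on. A sketch on a small hypothetical data frame, with `make.names` shown as one way to derive valid names:

```r
# Hypothetical two-column data frame for illustration
items <- data.frame(V1 = c(5.1, 4.9), V2 = c("a", "b"))

# Assign new column names
colnames(items) <- c("C-1", "Item name")

# Non-syntactic names need [[ ]] or backticks
print(items[["C-1"]])
print(items$`Item name`)

# make.names() converts them to syntactically valid names
colnames(items) <- make.names(colnames(items))
print(colnames(items))
```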
 

=> Handling missing values in a dataset (R denotes a missing value as NA)

 
Example: 
 
df = data.frame(c(1,2,3,4) , c(5,6,NA,NA) )
print(df)
 
Output: 
  
  c.1..2..3..4. c.5..6..NA..NA.
1             1               5
2             2               6
3             3              NA
4             4              NA
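
The long column headers in this output are generated by R because the columns were not given names. A sketch of the same values with explicit names; `scores`, `id`, and `score` are illustrative names, not from the original:

```r
# Same values, with explicit (illustrative) column names
scores <- data.frame(id = c(1, 2, 3, 4), score = c(5, 6, NA, NA))
print(scores)

# NA marks the two missing scores in rows 3 and 4
print(is.na(scores$score))
```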
 

=> Check each entry of the dataset for missing values

Example:
 
 is.na(df)
 
Output: 
 
     c.1..2..3..4. c.5..6..NA..NA.
[1,]         FALSE           FALSE
[2,]         FALSE           FALSE
[3,]         FALSE            TRUE
[4,]         FALSE            TRUE
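
To locate the missing entries rather than scan the logical matrix by eye, `which` with `arr.ind = TRUE` returns their row and column positions. A sketch on a hypothetical data frame holding the same values:

```r
# Hypothetical data frame with two missing values in the second column
scores <- data.frame(id = c(1, 2, 3, 4), score = c(5, 6, NA, NA))

# Row and column index of every missing entry
missing_pos <- which(is.na(scores), arr.ind = TRUE)
print(missing_pos)
```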
 
  

=> Check whether any entry in the entire dataset is missing

Example:
 
 any(is.na(df)) 
 
Output: 
 
[1] TRUE 
 

=> Display the total number of missing values in a dataset

 
Example:
 
sum(is.na(df))  
View(df)

Output: 

[1] 2
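
The total can be broken down per column, and incomplete rows can be dropped once they are found. A sketch using base R functions on a hypothetical data frame with the same values:

```r
# Hypothetical data frame with two missing values in the second column
scores <- data.frame(id = c(1, 2, 3, 4), score = c(5, 6, NA, NA))

# Number of missing values in each column
na_per_column <- colSums(is.na(scores))
print(na_per_column)

# Keep only the rows that have no missing values
complete_rows <- scores[complete.cases(scores), ]
print(complete_rows)

# na.omit() drops the incomplete rows in one call
print(na.omit(scores))
```

Whether dropping rows is appropriate depends on the analysis; it is shown here only as the simplest option.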


You may also be interested in:

An Introduction to language R
Tutorial # 1.Data Type & Variable  Declaration
Tutorial # 2.Operators
Tutorial # 3.Vector  & Element Access
Tutorial # 4.Matrix
Tutorial # 5.Data Frame
Tutorial # 6.Arrays
Tutorial # 7.Lists
Tutorial # 8.Factors
Tutorial # 9.Data import
Tutorial # 10.Machine Learning
Tutorial # 11.Visualization

