Hands-On Programming with R


SUBMITTED BY: Guest

DATE: Jan. 27, 2019, 6:24 p.m.

When you are finished, your deck of cards will look something like this:

    face   suit    value
    king   spades  13
    queen  spades  12
    jack   spades  11
    ten    spades  10
    nine   spades  9
    eight  spades  8

Do you need to build a data set from scratch to use it in R? No: you can load most data sets into R with one simple step. But this exercise will teach you how R stores data, and how you can assemble (or disassemble) your own data sets. You will also learn about the various types of objects available for you to use in R (not all R objects are the same). Consider this exercise a rite of passage; by doing it, you will become an expert on storing data in R.

The simplest type of object in R is an atomic vector. Atomic vectors are not nuclear powered, but they are very simple and they do show up everywhere. You can also make an atomic vector with just one value. Each atomic vector stores its values as a one-dimensional vector, and each atomic vector can only store one type of data. You can save different types of data in R by using different types of atomic vectors. Altogether, R recognizes six basic types of atomic vectors: doubles, integers, characters, logicals, complex, and raw.

To create your card deck, you will need to use different types of atomic vectors to save different types of information (text and numbers). You can do this by using some simple conventions when you enter your data. For example, you can create an integer vector by including a capital L with your input. R will recognize the convention and use it to create an atomic vector of the appropriate type. Vector types help R behave as you would expect. Get ready to say hello to the six types of atomic vectors in R.

A double vector stores regular numbers. The numbers can be positive or negative, large or small, and have digits to the right of the decimal place or not. In general, R will save any number that you type in R as a double. Double is a computer science term; it is short for double-precision floating-point number. You can specifically create an integer in R by typing a number followed by an uppercase L.
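
The conventions described above can be checked directly in the console. The following is a minimal sketch; the object names (die, five, int) and the values are illustrative assumptions, not taken from the text.

```r
# Numbers typed without an L are stored as doubles.
die <- c(1, 2, 3, 4, 5, 6)

# An atomic vector can hold just one value.
five <- 5

# The L suffix asks R for integers instead of doubles.
int <- c(-1L, 2L, 4L)

typeof(die)      # "double"
typeof(int)      # "integer"
is.vector(five)  # TRUE
length(five)     # 1
```

typeof reports how R stores the values, which is the quickest way to confirm which of the six atomic types you have created.
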
Integer numbers without the L will be saved as doubles. Why would you save your data as an integer instead of a double? Sometimes a difference in precision can have surprising effects. Your computer allocates 64 bits of memory to store each double in an R program. This allows a lot of precision, but some numbers cannot be expressed exactly in 64 bits, the equivalent of a sequence of 64 ones and zeroes. Many decimal numbers share this fate. As a result, each double is accurate to about 16 significant digits, which introduces a little bit of error. In most cases, this rounding error will go unnoticed. However, in some situations it can cause surprising results: when R has to round a quantity, an expression that should equal zero can resolve to something very close to, but not quite, zero.

These errors are known as floating-point errors, and doing arithmetic in these conditions is known as floating-point arithmetic. Floating-point arithmetic is not a feature of R; it is a feature of computer programming. Just keep in mind that floating-point errors may be the cause of surprising results. You can avoid floating-point errors by avoiding decimals and only using integers. However, this is not an option in most data-science situations. You cannot do much math with integers before you need a noninteger to express the result. Luckily, the errors caused by floating-point arithmetic are usually insignificant, and when they are not, they are easy to spot.

A character vector stores text as character strings. Note that a string can contain more than just letters. You can assemble a character string from numbers or symbols as well. You can tell strings from real numbers because strings come surrounded by quotes. In fact, anything surrounded by quotes in R will be treated as a character string, no matter what appears between the quotes. It is easy to confuse R objects with character strings, because both appear as pieces of text in R code.
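
Both points above, the rounding error in doubles and the effect of quotes, can be seen in a short session. This is a sketch under assumptions: the expression sqrt(2)^2 - 2 and the object names err and words are chosen here for illustration and do not appear in the text.

```r
# Mathematically this is zero, but sqrt(2) is rounded when stored as a double.
err <- sqrt(2)^2 - 2
err == 0                   # FALSE
isTRUE(all.equal(err, 0))  # TRUE: equal within a small tolerance

# Anything inside quotes is a character string, even digits.
words <- c("Hello", "1", "one")
typeof(words)  # "character"
typeof(1)      # "double"
typeof("1")    # "character"
```

Comparing doubles with all.equal rather than == is one common way to keep floating-point error from producing a surprising FALSE.
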
Expect an error whenever you forget your quotation marks; R will start looking for an object that probably does not exist.

It is doubtful that you will ever use complex and raw vectors to analyze data, but here they are for the sake of thoroughness. Complex vectors store complex numbers. Making raw vectors gets complicated, but you can make an empty raw vector of length n with raw(n).

Which type of vector will you use to save the names of the cards? A character vector is the most appropriate type of atomic vector in which to save card names.

You can build a more sophisticated object from an atomic vector by giving it some attributes and assigning it a class. Attributes are metadata attached to the object. R will normally ignore this metadata, but some R functions will check for specific attributes. These functions may use the attributes to do special things with the data. You can see which attributes an object has with the attributes function.

The most common attributes (names, dimensions, and classes) each have their own helper function that you can use to give attributes to an object. You can also use the helper functions to look up the value of these attributes for objects that already have them. For example, you can give names to die by assigning a character vector to the output of names.

You can transform an atomic vector into an n-dimensional array by giving it a dimensions attribute. To do this, set the dim attribute to a numeric vector of length n. R will reorganize the elements of the vector into n dimensions. Each dimension will have as many rows (or columns, etc.) as the corresponding value of the dim vector. In general, rows always come first in R operations that deal with both rows and columns. You do not get much control over how dim arranges the values, however: R always fills up each matrix by columns, instead of by rows.

The matrix and array functions do the same thing as changing the dim attribute, but they provide extra arguments to customize the process. To create a matrix, first give matrix an atomic vector to reorganize into a matrix. Then, define how many rows should be in the matrix by setting the nrow argument to a number. The array function creates an n-dimensional array. For example, you could use array to sort values into a cube of three dimensions or a hypercube in 4, 5, or n dimensions.
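
The attribute helpers described above can be sketched as follows. The vector die and its six names are assumptions made for illustration; byrow is a standard argument of matrix that the text only alludes to as an "extra argument".

```r
die <- c(1, 2, 3, 4, 5, 6)

# Give die a names attribute, inspect it, then remove it again.
names(die) <- c("one", "two", "three", "four", "five", "six")
attributes(die)
names(die) <- NULL

# Setting dim reorganizes the values into 2 rows and 3 columns,
# filling column by column.
dim(die) <- c(2, 3)
die[1, 2]  # 3: first row, second column

# matrix() does the same job and can fill by rows instead.
m_bycol <- matrix(1:6, nrow = 2)
m_byrow <- matrix(1:6, nrow = 2, byrow = TRUE)

# array() generalizes to more dimensions: here a 2 x 2 x 3 cube.
cube <- array(1:12, dim = c(2, 2, 3))
dim(cube)
```

Because R fills by columns, m_bycol[1, 2] is 3 while m_byrow[1, 2] is 2, even though both matrices hold the same six values.
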
There is more than one way to build such a matrix of card names, but in every case, you will need to start by making a character vector with 10 values.

Changing the dimensions of an object does not change its type. For example, the die matrix is a special case of a double vector. Every element in the matrix is still a double, but the elements have been arranged into a new structure. What did change is the class that R reports for die: once you changed its dimensions, R began to treat die as a matrix. R will expect objects of a class to share certain traits, such as attributes, that your object may not possess.

Classes also let R represent dates and times. R stores a time as the number of seconds that separate it from a fixed reference time. For example, a time that occurs 1,395,057,600 seconds after the reference time is created by building a double vector with one element, 1395057600, and then giving it a class. You can use this to ask questions such as: what day was it a million seconds after 12:00 a.m. on the reference date? A million seconds goes by faster than you would think.

There are many different classes of data in R and its packages, and new classes are invented every day. It would be difficult to learn about every class, but you do not have to. Most classes are only useful in specific situations. Since each class comes with its own help page, you can wait to learn about a class until you encounter it. However, there is one class of data that is so ubiquitous in R that you should learn about it alongside the atomic data types: the factor.

Think of a factor as something like a gender; it can only have certain values (male or female), and these values may have their own idiosyncratic order (ladies first). This arrangement makes factors very useful for recording the treatment levels of a study and other categorical variables. To make a factor, pass an atomic vector into the factor function. R will recode the data in the vector as integers and store the results in an integer vector, together with a levels attribute that holds the labels. For a gender factor, R will display each 1 as female, the first label in the levels vector, and each 2 as male, the second label.
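
The recoding that factor performs can be inspected step by step. This sketch assumes a four-element gender vector consistent with the labels discussed above; the exact input values are illustrative.

```r
gender <- factor(c("male", "female", "female", "male"))

typeof(gender)        # "integer": the data are stored as integer codes
attributes(gender)    # a levels attribute plus the class "factor"
levels(gender)        # "female" "male" (sorted alphabetically by default)
unclass(gender)       # the underlying codes: 2 1 1 2
as.character(gender)  # back to the display labels
```

unclass strips the class so that you can see what R actually stores, which explains why factors look like strings but behave like integers.
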
If the factor included 3s, they would be displayed as the third label, and so on:

    gender
    [1] male   female female male
    Levels: female male

Factors make it easy to put categorical variables into a statistical model because the variables are already coded as numbers. However, factors can be confusing since they look like character strings but behave like integers. R will often try to convert character strings to factors when you load and create data (this was the default behavior before R 4.0.0). In general, you will have a smoother experience if you do not let R make factors until you ask for them. You can convert a factor to a character string with the as.character function. R will retain the display version of the factor, not the integers stored in memory.

Each card in the deck can also carry a point value. For example, in blackjack, each face card is worth 10 points, each number card is worth between 2 and 10 points, and each ace is worth 1 or 11 points, depending on the final score. Suppose you combine a card's name, its suit, and its point value in a single atomic vector. What type of atomic vector will result? Check if you are right.

You may have guessed that this exercise would not go well. Each atomic vector can only store one type of data.

Data types in vectors

If you try to put multiple types of data into a vector, R will convert the elements to a single type of data. Since matrices and arrays are special cases of atomic vectors, they suffer from the same behavior. Each can only store one type of data. This creates a couple of problems. For one, many data sets contain multiple types of data. Simple programs like Excel and Numbers can save multiple types of data in the same data set, and you should hope that R can too.

R always follows the same rules when it coerces data types. So how does R coerce data types? If a character string is present in an atomic vector, R will convert everything else in the vector to character strings. Otherwise, logicals are coerced to numerics: every TRUE becomes a 1, and every FALSE becomes a 0.
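
The coercion rules above are easy to confirm. In this sketch the card values ("ace", "hearts", 1) and the object names are assumptions chosen for illustration.

```r
# A character string is present, so the number 1 becomes the string "1".
card <- c("ace", "hearts", 1)
typeof(card)  # "character"
card[3]       # "1"

# Only logicals and numbers: TRUE becomes 1 and FALSE becomes 0.
mixed <- c(TRUE, FALSE, 2)
typeof(mixed)  # "double"
mixed          # 1 0 2
```

Nothing warns you when this happens, so checking typeof after combining values is a cheap safeguard.
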
These rules help preserve information: it is easy to look at a character string and tell what information it used to contain. R uses the same coercion rules when you try to do math with logical values. You can explicitly ask R to convert data from one type to another with the as functions, such as as.character, as.logical, and as.numeric. R will convert the data whenever there is a sensible way to do so.

To store multiple types of data in one object, you will need to avoid coercion altogether. You can do this by using a new type of object, a list.

Many data sets contain multiple types of information. The inability of vectors, matrices, and arrays to store multiple data types seems like a major limitation. So why bother with them? In some cases, using only a single type of data is a huge advantage. Vectors, matrices, and arrays make it very easy to do math on large sets of numbers because R knows that it can manipulate each value the same way. Operations with vectors, matrices, and arrays also tend to be fast because the objects are so simple to store in memory. In other cases, allowing only a single type of data is not a disadvantage. Vectors are the most common data structure in R because they store variables very well.

Lists are like atomic vectors in that they group data into a one-dimensional set. However, lists do not group together individual values; lists group together R objects, such as atomic vectors and other lists. For example, you can make a list that contains a numeric vector of length 31 in its first element, a character vector of length 1 in its second element, and a new list of length 2 in its third element. To do this, use the list function.

When R prints a list, it uses two kinds of indexes. The double-bracketed indexes tell you which element of the list is being displayed. The single-bracket indexes tell you which subelement of an element is being displayed. For example, 100 is the first subelement of the first element in the list. This two-system notation arises because each element of a list can be any R object, including a new vector (or list) with its own indexes. Lists are a basic type of object in R, on par with atomic vectors.
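
A list matching the description above might look like the following sketch. The numeric vector 100:130 is consistent with the text (length 31, first value 100); the string "R" and the logical values in the inner list are assumptions added for illustration.

```r
list1 <- list(100:130, "R", list(TRUE, FALSE))

typeof(list1)       # "list"
length(list1)       # 3 elements
length(list1[[1]])  # 31: the first element is a whole vector
list1[[1]][1]       # 100: first subelement of the first element
length(list1[[3]])  # 2: the third element is itself a list
```

Double brackets pull out an element of the list, and the single bracket that follows indexes into that element, mirroring the two kinds of indexes R shows when it prints a list.
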
Like atomic vectors, they are used as building blocks to create many more sophisticated types of R objects. As you can imagine, the structure of lists can become quite complicated, but this flexibility makes lists a useful all-purpose storage tool in R: you can group together anything with a list. However, not every list needs to be complicated. You can store a playing card in a very simple list, in which the first element is a character vector of length 1.

Since you can save a single playing card as a list, you can save a deck of playing cards as a list of 52 sublists (one for each card). Instead, you can use a special class of list, known as a data frame. Data frames are far and away the most useful storage structure for data analysis, and they provide an ideal way to store an entire deck of cards.

Data frames group vectors together into a two-dimensional table. Each vector becomes a column in the table. As a result, each column of a data frame can contain a different type of data; but within a column, every cell must be the same type of data. Each column can be a different data type. Every column in a data frame must be the same length.

Creating a data frame by hand takes a lot of typing, but you can do it (if you like) with the data.frame function. Give data.frame any number of vectors, each separated with a comma. Each vector should be set equal to a name that describes the vector. data.frame will turn each vector into a column and use the name you gave it as the column name.

Names

You can also give names to a list or vector when you create one of these objects. Use the same syntax as with data.frame.

If you look at the type of a data frame, you will see that it is a list. In fact, each data frame is a list with class data.frame. If you inspect the structure of a data frame, you may also find that R saved your character strings as factors. I told you that R likes factors. You can make each row in the data frame a playing card, and each column a type of value, each with its own appropriate data type.
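
A small data frame of cards can be built by hand as sketched below. The three cards and the object name df are illustrative assumptions; stringsAsFactors = FALSE is passed explicitly so that the result is the same in R versions before and after 4.0.0.

```r
df <- data.frame(
  face  = c("ace", "two", "six"),
  suit  = c("clubs", "clubs", "clubs"),
  value = c(1, 2, 6),
  stringsAsFactors = FALSE  # keep strings as character vectors
)

typeof(df)  # "list"
class(df)   # "data.frame"
str(df)     # one line per column, with each column's type
```

The argument names (face, suit, value) become the column names, and each column keeps its own type.
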
The data frame for the full deck would look something like this:

    face   suit    value
    king   spades  13
    queen  spades  12
    jack   spades  11
    ten    spades  10
    nine   spades  9
    eight  spades  8
    seven  spades  7
    six    spades  6
    five   spades  5
    four   spades  4
    three  spades  3
    two    spades  2
    ace    spades  1
    king   clubs   13
    queen  clubs   12
    jack   clubs   11
    ten    clubs   10

You could create this data frame with data.frame, but that involves a lot of typing. It is always better to acquire large data sets as a computer file. You can then ask R to read the file and store the contents as an object. So instead, turn your attention toward loading data into R.

Please take a moment to download the file before reading on. It is a plain-text file: each row of the table is saved on its own line, and a comma is used to separate the cells within each row. Most data-science applications can open plain-text files and export data as plain-text files. This makes plain-text files a sort of lingua franca for data science.

To help you out, the import wizard shows you what the raw file looks like, as well as what your loaded data will look like based on the input settings. You can also tell the wizard not to convert strings to factors. If you do, R will load all of your character strings as character strings. If you do not, R will convert them to factors. Once everything looks right, click Import. Then look over the result, which is a good way to check that everything came through as expected. You can examine the data frame in the console with head(deck). You can open any data frame in a View tab at any time with the View function.

Now it is your turn. If everything goes correctly, the first few lines of your data frame should look like this:

    head(deck)
    face   suit    value
    king   spades  13
    queen  spades  12
    jack   spades  11
    ten    spades  10
    nine   spades  9
    eight  spades  8

head and tail are two functions that provide an easy way to peek at large data sets. To see a different number of rows, give head or tail a second argument, the number of rows you would like to view, for example, head(deck, 10). R can open other common types of files as well.
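
The same load can be done from code instead of the wizard. The sketch below reads comma-separated text with read.csv; to stay self-contained it passes the data through the text argument, and the four rows shown are taken from the table above. With a real file you would pass its path instead (for example read.csv("deck.csv"), where the file name is an assumption).

```r
# Stand-in for the contents of a comma-separated file.
csv_text <- "face,suit,value
king,spades,13
queen,spades,12
jack,spades,11
ten,spades,10"

# stringsAsFactors = FALSE keeps the strings as character vectors.
deck <- read.csv(text = csv_text, stringsAsFactors = FALSE)

head(deck, 2)  # first two rows
tail(deck, 1)  # last row
str(deck)      # column types
```

Passing stringsAsFactors = FALSE plays the same role as telling the wizard not to convert strings to factors.
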
Before going further, it is worth saving a copy of deck as a new file. That way you can email it to a colleague, store it on a thumb drive, or open it in a different program. You can save any data frame in R to a .csv file with write.csv. R will save the file in your working directory. To see where your working directory is, run getwd().

You can customize the save process with the arguments of write.csv. However, there are three arguments that you should use every time you run write.csv. First, you should give write.csv the name of the data frame that you wish to save. Next, you should provide a file name to give your file. R will take this name quite literally, so be sure to provide an extension. Finally, you should add the argument row.names = FALSE. This will prevent R from adding a column of numbers at the start of your data frame. These numbers will identify your rows from 1 to 52, but it is unlikely that whatever program you open cards.csv in will understand the row name system. More than likely, the program will assume that the row names are the first column of data in your data frame. In fact, this is exactly what R will assume if you reopen cards.csv. If you save and open cards.csv several times in R, extra columns of row numbers will pile up at the start of your data frame. write.csv has further options, and R can also compress saved files and save files in other formats.

You now have a virtual deck of cards to work with. Of the objects covered here, data frames are by far the most useful for data science. Data frames store one of the most common forms of data used in data science, tabular data.

Loading data from plain-text files is not as limiting a requirement as it sounds. Most software programs can export data as a plain-text file. In fact, opening a file in its original program and exporting it from there is good practice. Excel files use metadata, like sheets and formulas, that help Excel work with the file. No program is better at converting Excel files than Excel. However, you may find yourself with a program-specific file, but not the program that created it. Thankfully, R can open many types of files, including files from other programs and databases. R even has its own program-specific formats that can help you save memory and time if you know that you will be working entirely in R. Here, you learned how to store data in R.
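
The save step described above can be sketched as a round trip. The two-row deck and the use of a temporary directory are assumptions made so the example runs anywhere; in practice you would save the full deck to your working directory.

```r
deck <- data.frame(
  face  = c("king", "queen"),
  suit  = c("spades", "spades"),
  value = c(13, 12),
  stringsAsFactors = FALSE
)

# Save to a temporary location (an assumption; any writable path works).
path <- file.path(tempdir(), "cards.csv")

# The three arguments to use every time: data frame, file name, row.names.
write.csv(deck, file = path, row.names = FALSE)
reread <- read.csv(path, stringsAsFactors = FALSE)
names(reread)  # "face" "suit" "value"

# Without row.names = FALSE, the row numbers come back as an extra column.
write.csv(deck, file = path)
with_rownames <- read.csv(path, stringsAsFactors = FALSE)
ncol(with_rownames)  # 4
```

The extra column in the second read is the row-number column that the text warns about; it grows by one each time the file is saved and reopened without row.names = FALSE.
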
