3  Data

All of your classes so far have been about traditional statistical methods, and they have used small datasets for the examples. Essentially all of those datasets consisted of a single computer file with a limited number of columns and fewer than approximately 100,000 rows of data. Is that true?

In the modern world of Data Science, Machine Learning, Deep Learning, and Business Analytics, datasets include large numbers of columns, millions of rows, and usually multiple data files. Beyond this, data can come in the form of collections of words in documents, collections of images, videos, or songs in various file formats, collections of lyrics (which are text documents), and so on. While working with data in columns or dataframes is still the starting point of most data analysis, experience working with multiple files and multiple types of files is important to have before entering a modern data job.
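To make the idea of working with multiple data files concrete, here is a minimal sketch in Python that combines two related files using only the standard library. The file contents, column names (customer_id, name, order_id, amount), and values are invented for illustration, and in-memory text stands in for files on disk. In practice a dataframe package would usually handle this step.

```python
import csv
import io

# Two small in-memory "files" stand in for separate data files on disk.
# The columns and values are hypothetical, made up for this example.
customers_csv = io.StringIO(
    "customer_id,name\n"
    "1,Ana\n"
    "2,Ben\n"
)
orders_csv = io.StringIO(
    "order_id,customer_id,amount\n"
    "10,1,25.00\n"
    "11,1,40.00\n"
    "12,2,15.50\n"
)

# Read each file into a list of dictionaries, one dictionary per row.
customers = list(csv.DictReader(customers_csv))
orders = list(csv.DictReader(orders_csv))

# Build a lookup from customer_id to name, then attach the name to each order.
name_by_id = {row["customer_id"]: row["name"] for row in customers}
combined = [
    {**order, "name": name_by_id[order["customer_id"]]}
    for order in orders
]

# Total amount per customer, computed from the combined rows.
totals = {}
for row in combined:
    totals[row["name"]] = totals.get(row["name"], 0.0) + float(row["amount"])

print(totals)
```

The key step is the shared customer_id column: it is what allows rows from one file to be matched with rows from the other.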

Algorithms are also central to modern Data Science. These algorithms are usually distributed as software packages that are downloaded, installed, and loaded into R, Python, or Julia. It is not common to program a computer to implement the calculation of a t-test or linear regression by hand; functions from existing software are used instead. The number of algorithms now available is vast. Becoming familiar with how to discover these algorithms and how to implement them is basic research in Data Science.

Learning about an existing algorithm and how to apply it to your data is data science research.