Exploring Data with RapidMiner
Format: PDF / Kindle (mobi) / ePub
RapidMiner is a highly versatile tool that can make data work harder for you. This book will show you how to import, parse, and structure your data with remarkable speed and efficiency. It's data mining made accessible.
- See how to import, parse, and structure your data quickly and effectively
- Understand the visualization possibilities and be inspired to use these with your own data
- Structured in a modular way to adhere to standard industry processes
Data is everywhere, and the amount is increasing so quickly that the gap between what is available and what people can understand is widening relentlessly. There is huge value in data, but much of this value lies untapped. Roughly 80% of data mining is about understanding data, exploring it, cleaning it, and structuring it so that it can be mined. RapidMiner is an environment for machine learning, data mining, text mining, predictive analytics, and business analytics. It is used for research, education, training, rapid prototyping, application development, and industrial applications.
Exploring Data with RapidMiner is packed with practical examples to help practitioners get to grips with their own data. The chapters within this book are arranged within an overall framework and can additionally be consulted on an ad-hoc basis. It provides simple to intermediate examples showing modeling, visualization, and more using RapidMiner.
Exploring Data with RapidMiner is a helpful guide that presents the important steps in a logical order. This book starts with importing data and then leads you through cleaning, handling missing values, visualizing, and extracting additional information, as well as understanding the time constraints that real data places on getting a result. The book uses real examples to help you understand how to set up processes quickly.
This book will give you a solid understanding of the possibilities that RapidMiner gives for exploring data and you will be inspired to use it for your own work.
What you will learn from this book
- Import real data from files in multiple formats and from databases
- Extract features from structured and unstructured data
- Restructure, reduce, and summarize data to help you understand it more easily and process it more quickly
- Visualize data in new ways to help you understand it
- Detect outliers and apply methods to handle them
- Detect missing data and implement ways to handle it
- Understand resource constraints and what to do about them
The book follows a step-by-step tutorial style, using examples so that users of different levels can benefit from the facilities offered by RapidMiner.
Who this book is written for
If you are a computer scientist or an engineer who has real data from which you want to extract value, this book is ideal for you. You will need to have at least a basic awareness of data mining techniques and some exposure to RapidMiner.
get a sense of how big the data is and what its range is. This view is available to show example sets when the Results view is selected. It is always worthwhile to take a careful look at this view to check that the attributes are of the correct type. Numerical attributes should have an average and a standard deviation that look sensible, nominal attributes should have a full set of valid values, and dates should fall within an expected range. The Statistics View also shows which attributes have missing
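The checks described for the Statistics view can be mimicked outside RapidMiner to see what they involve. The following is a minimal Python sketch, not RapidMiner's own implementation: the function name `summarize`, the use of `None` to mark a missing value, and the sample rows are all assumptions made for illustration.

```python
import statistics


def summarize(rows):
    """Summarize each attribute: type guess, missing count, and basic statistics.

    `rows` is a list of dicts (hypothetical layout); None marks a missing value.
    """
    # Collect attribute names in first-seen order.
    names = []
    for row in rows:
        for name in row:
            if name not in names:
                names.append(name)

    summary = {}
    for name in names:
        values = [row.get(name) for row in rows]
        present = [v for v in values if v is not None]
        info = {"missing": len(values) - len(present)}

        is_numerical = bool(present) and all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
        )
        if is_numerical:
            info["type"] = "numerical"
            info["min"] = min(present)
            info["max"] = max(present)
            info["average"] = statistics.mean(present)
            # The sample standard deviation needs at least two values.
            info["deviation"] = statistics.stdev(present) if len(present) > 1 else 0.0
        else:
            info["type"] = "nominal"
            info["values"] = sorted({str(v) for v in present})
        summary[name] = info
    return summary


if __name__ == "__main__":
    sample = [
        {"age": 34, "city": "Leeds"},
        {"age": 41, "city": "York"},
        {"age": None, "city": "Leeds"},
        {"age": 29, "city": None},
    ]
    for attribute, info in summarize(sample).items():
        print(attribute, info)
```

Reading the output attribute by attribute mirrors the manual check: an unexpected type, an implausible range, or a non-zero missing count is a prompt to look more closely at that attribute.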
to exploring data using RapidMiner Studio. Something like 80 percent of a data mining or predictive analytics project is spent importing, cleaning, visualizing, restructuring, and summarizing data in order to understand it. This book focuses on this vital aspect and gives practical advice using RapidMiner Studio to help with the process. A number of techniques are illustrated and it is the nature of exploratory data analysis that they can be re-used and modified in different places. By drawing
take account of an outlier in an attribute that has not been seen before.
Manual inspection
Manual inspection is an important method. People are generally good at seeing patterns and can detect anomalies with ease. The challenge is presenting the data in a way that allows patterns to be seen. Creativity is important and some of the visualization techniques described in Chapter 3, Visualizing Data, will help in this case.
Outliers
As an example, the following screenshot shows some
examples will have a common id, an attribute with a value based on the name of the original attribute, and a value derived from the intersection of the original attribute and the common id. The parameters needed to make the De-Pivot operator work are shown in the following screenshot:
The index attribute is the name of an attribute that will be created as a result of the de-pivot operation. The attribute name parameter is set to value, and this will be the name of an
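The reshaping that a de-pivot performs can be sketched in plain Python. This is an illustration of the general wide-to-long idea only, not the De-Pivot operator's implementation: the function `de_pivot`, its parameter names, and the sample data are hypothetical, and the real operator's handling of attribute selection and missing values may differ.

```python
def de_pivot(rows, id_attribute, source_attributes,
             index_attribute="index", value_attribute="value"):
    """Turn one wide row per id into one long row per (id, original attribute).

    Each output row carries the common id, the name of the original
    attribute (stored under `index_attribute`), and the value found at the
    intersection of that attribute and the id (stored under `value_attribute`).
    """
    long_rows = []
    for row in rows:
        for name in source_attributes:
            long_rows.append({
                id_attribute: row[id_attribute],
                index_attribute: name,
                value_attribute: row[name],
            })
    return long_rows


if __name__ == "__main__":
    wide = [
        {"id": 1, "jan": 10, "feb": 12},
        {"id": 2, "jan": 7, "feb": 9},
    ]
    for example in de_pivot(wide, "id", ["jan", "feb"]):
        print(example)
```

Two wide rows with two source attributes each become four long rows, which is often the more convenient shape for grouping and plotting.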