Dr. Zakir Laliwala, Abdulbasit Shaikh - Web Crawling and Data Mining with Apache Nutch [2013, PDF/EPUB/MOBI, ENG]


Posted: 17-Jul-2015 11:40


Web Crawling and Data Mining with Apache Nutch
Year: 2013
Authors: Dr. Zakir Laliwala, Abdulbasit Shaikh
Publisher: Packt Publishing
ISBN: 978-1-78328-685-0
Language: English
Format: PDF/EPUB/MOBI
Quality: Originally digital (eBook)
Number of pages: 136
Description:
In Detail
Apache Nutch helps you create your own search engine and customize it to your needs. You can integrate Apache Nutch easily with your existing application and get the maximum benefit from it. It also integrates easily with components such as Apache Hadoop, Eclipse, and MySQL.
"Web Crawling and Data Mining with Apache Nutch" walks you through all the steps needed to crawl webpages for your application and use them to make searching within your application more efficient. You will build your own search engine and learn how to improve your application's page rank in search results.
"Web Crawling and Data Mining with Apache Nutch" starts with the basics of crawling webpages for your application. You will learn to deploy Apache Solr on a server containing data crawled by Apache Nutch, and to perform sharding with Apache Nutch using Apache Solr.
You will integrate your application with databases such as MySQL, HBase, and Accumulo, and also with Apache Solr, which serves as the searcher.
With this book, you will gain the skills needed to create your own search engine. You will also perform link analysis and scoring, which help improve the ranking of your application's pages.
What you will learn from this book
Carry out web crawling for your application
Make searching within your application efficient by integrating it with Apache Solr
Integrate your application with different databases for data storage purposes
Run your application in a cluster environment by integrating it with Apache Hadoop
Perform crawling operations from Eclipse, using the IDE instead of the command line
Create your own plugin in Apache Nutch
Integrate Apache Solr with Apache Nutch, and deploy Apache Solr on Apache Tomcat
Apply sharding on Apache Tomcat to get good search results from Apache Solr
Approach
This book is a user-friendly guide that covers all the necessary steps and examples related to web crawling and data mining using Apache Nutch.
Who this book is written for
"Web Crawling and Data Mining with Apache Nutch" is aimed at data analysts, application developers, web mining engineers, and data scientists. It is a good starting point for those who want to learn how web crawling and data mining are applied in today's business world. Readers who already have some knowledge of web crawling and data mining will benefit even more.


Table of Contents

Preface
Chapter 1: Getting Started with Apache Nutch

Introduction to Apache Nutch
Installing and configuring Apache Nutch
Installation dependencies
Verifying your Apache Nutch installation
Crawling your first website
Installing Apache Solr
Integration of Solr with Nutch
Crawling your website using the crawl script
Crawling the Web, the CrawlDb, and URL filters
InjectorJob
GeneratorJob
FetcherJob
ParserJob
DbUpdaterJob
Invertlinks
Indexing with Apache Solr
Parsing and parse filters
Webgraph
Loops
LinkRank
ScoreUpdater
A scoring example
The Apache Nutch plugin
The Apache Nutch plugin example
Modifying plugin.xml
Describing dependencies with the ivy module
The Indexer extension program
The Scoring extension program
Using your plugin with Apache Nutch
Compiling your plugin
Understanding the Nutch Plugin architecture
Chapter 2: Deployment, Sharding, and AJAX Solr with Apache Nutch
Deployment of Apache Solr
Introduction of deployment
Need of Apache Solr deployment
Setting up Java Development Kit
Setting up Tomcat
Setting up Apache Solr
Running Solr on Tomcat
Sharding using Apache Solr
Introduction to sharding
Use of sharding with Apache Nutch
Distributing documents across shards
Sharding Apache Solr indexes
Single cluster
Splitting shards with Apache Nutch
Cleaning up with Apache Nutch
Splitting cluster shards
Checking statistics of sharding with Apache Nutch
The final test with Apache Nutch
Working with AJAX Solr
Architectural overview of AJAX Solr
Applying AJAX Solr on Reuters' data
Running AJAX Solr
Chapter 3: Integration of Apache Nutch with Apache Hadoop and Eclipse
Integrating Apache Nutch with Apache Hadoop
Introducing Apache Hadoop
Installing Apache Hadoop and Apache Nutch
Downloading Apache Hadoop and Apache Nutch
Setting up Apache Hadoop with the cluster
Installing Java
Downloading Apache Hadoop
Configuring SSH
Disabling IPv6
Installing Apache Hadoop
Required ownerships and permissions
The configuration required for Hadoop_HOME/conf/*
Formatting the HDFS filesystem using the NameNode
Setting up the deployment architecture of Apache Nutch
Installing Apache Nutch
Key points of the Apache Nutch installation
Starting the cluster
Performing crawling on the Apache Hadoop cluster
Configuring Apache Nutch with Eclipse
Introducing Apache Nutch configuration with Eclipse
Installation and building Apache Nutch with Eclipse
Crawling in Eclipse
Chapter 4: Apache Nutch with Gora, Accumulo, and MySQL
Introduction to Apache Accumulo
Main features of Apache Accumulo
Introduction to Apache Gora
Supported data stores
Use of Apache Gora
Integration of Apache Nutch with Apache Accumulo
Configuring Apache Gora with Apache Nutch
Setting up Apache Hadoop and Apache ZooKeeper
Installing and configuring Apache Accumulo
Testing Apache Accumulo
Crawling with Apache Nutch on Apache Accumulo
Integration of Apache Nutch with MySQL
Introduction to MySQL
Benefits of integrating MySQL with Apache Nutch
Configuring MySQL with Apache Nutch
Crawling with Apache Nutch on MySQL
Index