diff --git a/.gitignore b/.gitignore
old mode 100644
new mode 100755
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
old mode 100644
new mode 100755
diff --git a/LICENSE b/LICENSE
old mode 100644
new mode 100755
diff --git a/README.md b/README.md
old mode 100644
new mode 100755
index 2fae14f..2e5e86c
--- a/README.md
+++ b/README.md
@@ -18,7 +18,7 @@ A collection of awesome web crawler,spider and resources in different languages.
 - [Go](#go)
 - [Scala](#scala)
 
-## Python
+## Python
 * [Scrapy](https://github.com/scrapy/scrapy) - A fast high-level screen scraping and web crawling framework.
 * [django-dynamic-scraper](https://github.com/holgerd77/django-dynamic-scraper) - Creating Scrapy scrapers via the Django admin interface.
 * [Scrapy-Redis](https://github.com/rolando/scrapy-redis) - Redis-based components for Scrapy.
@@ -35,14 +35,14 @@ A collection of awesome web crawler,spider and resources in different languages.
 * [portia](https://github.com/scrapinghub/portia) - Visual scraping for Scrapy.
 * [crawley](https://github.com/jmg/crawley) - Pythonic Crawling / Scraping Framework based on Non Blocking I/O operations.
 * [RoboBrowser](https://github.com/jmcarp/robobrowser) - A simple, Pythonic library for browsing the web without a standalone web browser.
-* [MSpider](https://github.com/manning23/MSpider) - A simple ,easy spider using gevent and js render.
+* [MSpider](https://github.com/manning23/MSpider) - A simple ,easy spider using gevent and js render.
 * [brownant](https://github.com/douban/brownant) - A lightweight web data extracting framework.
 * [PSpider](https://github.com/xianhu/PSpider) - A simple spider frame in Python3.
 * [Gain](https://github.com/gaojiuli/gain) - Web crawling framework based on asyncio for everyone.
 * [sukhoi](https://github.com/iogf/sukhoi) - Minimalist and powerful Web Crawler.
-* [spidy](https://github.com/rivermont/spidy) - The simple, easy to use command line web crawler.
+* [spidy](https://github.com/rivermont/spidy) - The simple, easy to use command line web crawler.
 * [newspaper](https://github.com/codelucas/newspaper) - News, full-text, and article metadata extraction in Python 3
-* [aspider](https://github.com/howie6879/aspider) - An async web scraping micro-framework based on asyncio.
+* [aspider](https://github.com/howie6879/aspider) - An async web scraping micro-framework based on asyncio.
 
 ## Java
 * [ACHE Crawler](https://github.com/ViDA-NYU/ache) - An easy to use web crawler for domain-specific search.
@@ -66,10 +66,10 @@ A collection of awesome web crawler,spider and resources in different languages.
 * [Norconex Web Crawler](https://github.com/Norconex/collector-http) - Norconex HTTP Collector is a full-featured web crawler (or spider) that can manipulate and store collected data into a repository of your choice (e.g. a search engine). Can be used as a stand alone application or be embedded into Java applications.
 
 
-## C#
+## C#
 * [ccrawler](http://www.findbestopensource.com/product/ccrawler) - Built in C# 3.5 version. it contains a simple extension of web content categorizer, which can separate between the web page depending on their content.
 * [SimpleCrawler](https://github.com/lei-zhu/SimpleCrawler) - Simple spider base on mutithreading, regluar expression.
-* [DotnetSpider](https://github.com/zlzforever/DotnetSpider) - This is a cross platfrom, ligth spider develop by C#.
+* [DotnetSpider](https://github.com/zlzforever/DotnetSpider) - This is a cross platform, light spider develop by C#.
 * [Abot](https://github.com/sjdirect/abot) - C# web crawler built for speed and flexibility.
 * [Hawk](https://github.com/ferventdesert/Hawk) - Advanced Crawler and ETL tool written in C#/WPF.
 * [SkyScraper](https://github.com/JonCanning/SkyScraper) - An asynchronous web scraper / web crawler using async / await and Reactive Extensions.
@@ -85,10 +85,10 @@ A collection of awesome web crawler,spider and resources in different languages.
 * [x-ray](https://github.com/lapwinglabs/x-ray) - Web scraper with pagination and crawler support.
 * [node-osmosis](https://github.com/rchipka/node-osmosis) - HTML/XML parser and web scraper for Node.js.
 * [web-scraper-chrome-extension](https://github.com/martinsbalodis/web-scraper-chrome-extension) - Web data extraction tool implemented as chrome extension.
-* [supercrawler](https://github.com/brendonboshell/supercrawler) - Define custom handlers to parse content. Obeys robots.txt, rate limits and concurrency limits.
+* [supercrawler](https://github.com/brendonboshell/supercrawler) - Define custom handlers to parse content. Obeys robots.txt, rate limits and concurrency limits.
 * [headless-chrome-crawler](https://github.com/yujiosaka/headless-chrome-crawler) - Headless Chrome crawls with jQuery support
 * [Squidwarc](https://github.com/n0tan3rd/squidwarc) - High fidelity, user scriptable, archival crawler that uses Chrome or Chromium with or without a head
-* [crawlee](https://github.com/apify/crawlee) - A web scraping and browser automation library for Node.js that helps you build reliable crawlers. Fast.
+* [crawlee](https://github.com/apify/crawlee) - A web scraping and browser automation library for Node.js that helps you build reliable crawlers. Fast.
 
 
 ## PHP
@@ -124,7 +124,7 @@ A collection of awesome web crawler,spider and resources in different languages.
 ## R
 * [rvest](https://github.com/hadley/rvest) - Simple web scraping for R.
 
-## Erlang
+## Erlang
 * [ebot](https://github.com/matteoredaelli/ebot) - A scalable, distribuited and highly configurable web cawler.
 
 ## Perl
@@ -134,7 +134,7 @@ A collection of awesome web crawler,spider and resources in different languages.
 * [pholcus](https://github.com/henrylee2cn/pholcus) - A distributed, high concurrency and powerful web crawler.
 * [gocrawl](https://github.com/PuerkitoBio/gocrawl) - Polite, slim and concurrent web crawler.
 * [fetchbot](https://github.com/PuerkitoBio/fetchbot) - A simple and flexible web crawler that follows the robots.txt policies and crawl delays.
-* [go_spider](https://github.com/hu17889/go_spider) - An awesome Go concurrent Crawler(spider) framework.
+* [go_spider](https://github.com/hu17889/go_spider) - An awesome Go concurrent Crawler(spider) framework.
 * [dht](https://github.com/shiyanhui/dht) - BitTorrent DHT Protocol && DHT Spider.
 * [ants-go](https://github.com/wcong/ants-go) - A open source, distributed, restful crawler engine in golang.
 * [scrape](https://github.com/yhat/scrape) - A simple, higher level interface for Go web scraping.