Kasha – Stealth Web Scraper (CLI)
Description
Kasha is a stealth web scraping utility designed for precision and silence. Named after the Japanese demon that steals corpses from funerals, Kasha moves with similar discretion — gathering data from remote web targets without drawing attention.
Built on the Python HTTPX client, with BeautifulSoup for parsing, it sends requests under a randomly chosen User-Agent drawn from a very large pool of known browser strings. The complete pool is too large to list here.
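As a rough illustration of random User-Agent selection, the sketch below picks one string from a small pool and passes it to HTTPX. The three strings, and the names `USER_AGENTS`, `random_headers`, and `fetch`, are assumptions made for this example; they are not Kasha's actual pool or internals.

```python
import random

# Hypothetical sample pool for illustration only; Kasha's real list is far larger.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.0 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0",
]


def random_headers(rng=random):
    """Return request headers with a User-Agent picked at random from the pool."""
    return {"User-Agent": rng.choice(USER_AGENTS)}


def fetch(url, timeout=10.0):
    """Fetch a page with a freshly chosen User-Agent and return its HTML text."""
    # Imported lazily so random_headers() can be used without httpx installed.
    import httpx

    response = httpx.get(
        url,
        headers=random_headers(),
        timeout=timeout,
        follow_redirects=True,
    )
    response.raise_for_status()
    return response.text
```

Calling `random_headers()` before every request gives each request its own User-Agent; passing a seeded `random.Random` instance as `rng` makes the choice reproducible for testing.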
Purpose
Kasha is crafted for developers, researchers, and digital samurai who need to collect remote data efficiently, whether for:
- Archiving or mirroring websites
- Analyzing HTML structures and links
- Harvesting assets such as images, CSS, or JavaScript
- Following internal or external link structures recursively
Usage
Invoke from the terminal:
./kasha <url> [options]
Available Options:
| Option | Description |
|---|---|
| `--resources` | Scrape and save all assets (images, CSS, JS) |
| `--dynamic` | Enable Playwright mode for dynamic pages |
| `--logging` | Activate detailed logging output |
| `--follow-internal` | Follow and scrape all internal links |
| `--follow-all` | Follow all links (internal & external) |
| `--rate-limit N` | Pause N seconds between requests |
Example Command:
./kasha https://example.com --resources --follow-internal --rate-limit 2
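For readers who want to script around the CLI, the options listed above could be modelled with `argparse` roughly as follows. This is a sketch, not Kasha's actual argument handling: the function name `build_parser`, the help strings, the default of `0.0`, and treating `N` as a float are all assumptions.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented options (hypothetical sketch)."""
    parser = argparse.ArgumentParser(prog="kasha", description="Stealth web scraper")
    parser.add_argument("url", help="target URL to scrape")
    parser.add_argument("--resources", action="store_true",
                        help="scrape and save all assets (images, CSS, JS)")
    parser.add_argument("--dynamic", action="store_true",
                        help="enable Playwright mode for dynamic pages")
    parser.add_argument("--logging", action="store_true",
                        help="activate detailed logging output")
    parser.add_argument("--follow-internal", action="store_true",
                        help="follow and scrape all internal links")
    parser.add_argument("--follow-all", action="store_true",
                        help="follow all links (internal and external)")
    # Assumption: N is a number of seconds and may be fractional.
    parser.add_argument("--rate-limit", type=float, default=0.0, metavar="N",
                        help="pause N seconds between requests")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

Parsing the example command's arguments with this sketch yields `resources=True`, `follow_internal=True`, and `rate_limit=2.0`, with the remaining flags left at `False`.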
Structure
All scraped data is preserved in the `scrapes/` directory, organized by domain:
scrapes/
├── example.com/
│ ├── index.html
│ ├── assets/
│ ├── css/
│ └── js/
└── anotherdomain.org/
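One plausible way to arrive at a layout like this is to map each URL to a path under `scrapes/<domain>/`. The sketch below assumes the URL path is mirrored directly and that directory-style URLs are saved as `index.html`; the helper name `local_path_for` is hypothetical, and Kasha's real mapping may differ.

```python
from pathlib import Path, PurePosixPath
from urllib.parse import urlsplit

SCRAPES_ROOT = Path("scrapes")


def local_path_for(url, root=SCRAPES_ROOT):
    """Map a URL to a file path under root/<domain>/ (hypothetical mapping)."""
    parts = urlsplit(url)
    domain = parts.hostname or "unknown-host"
    path = parts.path
    # Assumption: an empty or directory-style path is stored as index.html.
    if not path or path.endswith("/"):
        path += "index.html"
    # Drop the root marker and any ".." segments so the result
    # always stays inside root/<domain>/.
    segments = [s for s in PurePosixPath(path).parts if s not in ("/", ".", "..")]
    return root.joinpath(domain, *segments)


if __name__ == "__main__":
    for example in ("https://example.com/", "https://example.com/css/site.css"):
        print(example, "->", local_path_for(example).as_posix())
```

Filtering out `..` segments is a deliberate choice here: it keeps a hostile or malformed URL from writing outside the per-domain directory.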
Features
- Massive randomized User-Agent rotation
- Recursive link following (internal/external)
- Optional rate limiting for stealth operations
- Support for static and dynamic content (Playwright)
- Clean directory mirroring and structured output
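Recursive link following depends on deciding which links count as internal. A minimal sketch of that decision, assuming "internal" means "same hostname as the page being scraped", is shown below. The function name `classify_links` is hypothetical; the `hrefs` argument stands for values that could be collected with BeautifulSoup, for example from `soup.find_all("a", href=True)`.

```python
from urllib.parse import urldefrag, urljoin, urlsplit


def classify_links(page_url, hrefs):
    """Split hrefs found on page_url into (internal, external) absolute URLs."""
    base_host = urlsplit(page_url).hostname
    internal, external = [], []
    seen = set()
    for href in hrefs:
        # Resolve relative links against the page and drop any #fragment.
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        parts = urlsplit(absolute)
        # Skip non-web schemes (mailto:, javascript:, ...) and duplicates.
        if parts.scheme not in ("http", "https") or absolute in seen:
            continue
        seen.add(absolute)
        # Assumption: a link is internal when its hostname matches the page's.
        if parts.hostname == base_host:
            internal.append(absolute)
        else:
            external.append(absolute)
    return internal, external
```

Under this assumption, `--follow-internal` would queue only the first list, while `--follow-all` would queue both. Subdomains are treated as external in this sketch, which may or may not match Kasha's behaviour.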
Philosophy
In the tradition of the Ronin, Kasha acts without master or mercy — silent, methodical, and precise.
Each scrape is a strike: deliberate, unseen, and final.
“Strike once, unseen — leave only echoes.”
Requirements
- Python 3.8+
- Libraries: `httpx`, `beautifulsoup4`, `playwright` (optional)
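Install dependencies (Playwright is needed only for `--dynamic`):
pip install httpx beautifulsoup4 playwright
Playwright also needs its browser binaries, which are downloaded once with:
playwright install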
License
This project is distributed under a permissive open license. Use with responsibility and respect for target servers. The sword is sharp — wield it wisely.