Book Search Engine

This project consists of three components: a Scrapy crawler, a Solr server setup, and a UI. Together they crawl, index, and search book data.

Overview video

Features

Multiple languages: Supports searching in multiple languages by applying Solr analysis pipelines (stop-word removal, stemming, etc.).
Operators: Supports complex queries through UI elements called operators, which are essentially a reskin of Solr's query language.
Fast querying and retrieval
Scraping data from various sources

Project Structure

Root/
├── project3Book/         # Scrapy for crawling
│   └── data/
│       └── bookData3.json # Crawled data
├── bookSearchUI/         # User Interface
└── solr-8.11.0/          # Solr server

Components

Scrapy - Web Crawling

To run Scrapy:

Navigate to the spiders directory:
```
cd project3Book/project3Book/spiders
```

Execute the spider:

```
scrapy runspider bookSpider.py -o out.json
```
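After a crawl it can be useful to sanity-check the output before indexing it. The following is a minimal sketch, assuming `out.json` is the JSON array that Scrapy's `-o` option writes; the helper names `load_items` and `field_coverage` are hypothetical and not part of the project.

```python
import json
from collections import Counter
from pathlib import Path


def load_items(path):
    """Load the JSON array written by `scrapy runspider ... -o out.json`."""
    with Path(path).open(encoding="utf-8") as handle:
        items = json.load(handle)
    if not isinstance(items, list):
        raise ValueError(f"expected a JSON array, got {type(items).__name__}")
    return items


def field_coverage(items):
    """Count how many items carry a non-empty value for each field."""
    counts = Counter()
    for item in items:
        for key, value in item.items():
            if value not in (None, "", [], {}):
                counts[key] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Default to the file name used in the command above.
    items = load_items(sys.argv[1] if len(sys.argv) > 1 else "out.json")
    print(f"{len(items)} items")
    for field, count in field_coverage(items).most_common():
        print(f"{field}: {count}")
```

Note that `-o` appends to an existing file, so running the spider twice against the same `out.json` produces a file that is no longer valid JSON; `load_items` would then fail with a decode error.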

The scraper extracts data from the following sources:

Solr - Search Platform

This project uses Solr 8.11.0 as the search engine. Use the Solr distribution bundled with this repository rather than a fresh download, since some of its settings have been modified (refer to the guide for details).

To start Solr:

Navigate to the Solr directory:
```
cd solr-8.11.0
```
Start Solr in cloud mode on the default port (8983):
```
bin/solr start -e cloud
```
When prompted to create a new collection:
- Provide the collection name as bookTest2 to use the example scraped data.
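Once the collection exists, the crawled data has to be loaded into it. The guide may describe a different procedure; the sketch below only illustrates one possible way, using Solr's standard JSON update endpoint (`/solr/<collection>/update`). It assumes Solr is listening on `localhost:8983` and that the fields in `project3Book/data/bookData3.json` are accepted by the collection's schema. The function names are hypothetical.

```python
import json
import urllib.request


def build_update_request(docs, collection="bookTest2",
                         base_url="http://localhost:8983/solr"):
    """Build a POST request that sends a list of documents to Solr.

    `commit=true` makes the documents searchable as soon as the call returns.
    """
    url = f"{base_url}/{collection}/update?commit=true"
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def index_file(path, **kwargs):
    """Read a JSON array of documents from `path` and post it to Solr."""
    with open(path, encoding="utf-8") as handle:
        docs = json.load(handle)
    request = build_update_request(docs, **kwargs)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    # Path taken from the project structure above; run from the repository root.
    result = index_file("project3Book/data/bookData3.json")
    print(result["responseHeader"])
```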

UI - User Interface (Quasar/Vue/Vuex)

The UI is hosted on GitLab Pages so that no local dependency setup is needed. It still requires Solr to be running on localhost:8983:

🔗 Book Search UI

Quasar 2.0.0
Vue 3.2
Vuex 4

To run locally:

Navigate to the UI directory:
```
cd bookSearchUI/quasar-demo
```
Install dependencies:
```
npm install
```
Run the application in development mode:
```
quasar dev
```
To build the application:
```
quasar build
```
Note: Serving the built application requires setting up a web server or using a host such as GitLab Pages.

Guide/Report

For more detailed information, refer to the Guide PDF.

Additional Information

Solr Customizations

CORS: Disabled CORS protection for direct queries.
URI Size: Increased maxRequestHeaderSize in jetty.xml to handle long queries.

Query Strategy

Base Query: Defined in bookSearchUI/quasar-demo/src/store/solr.js.
ORed Words: The query is split on spaces and the resulting words are ORed together, which increases recall.
Field Boosts: Important fields like subjects and title have higher boosts.
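The actual base query lives in `solr.js`; the Python sketch below only illustrates the two ideas described above (ORing the words and boosting selected fields). The function name `build_or_query` and the boost values are hypothetical, and the field names `title` and `subjects` are used without the language suffixes the real query may apply.

```python
import re

# Characters with special meaning in Solr's standard query syntax.
_SPECIAL = re.compile(r'([+\-&|!(){}\[\]^"~*?:\\/])')


def escape_term(term):
    """Backslash-escape query syntax characters in a single word."""
    return _SPECIAL.sub(r"\\\1", term)


def build_or_query(user_input, field_boosts):
    """OR all space-separated words together and search them in each field.

    `field_boosts` maps a field name to its boost; the values used in the
    project are not stated in this README.
    """
    words = user_input.split()
    if not words:
        return "*:*"  # match everything when the input is empty
    ored = " OR ".join(escape_term(word) for word in words)
    clauses = [f"{field}:({ored})^{boost}" for field, boost in field_boosts.items()]
    return " OR ".join(clauses)


if __name__ == "__main__":
    print(build_or_query("harry potter", {"title": 2, "subjects": 3}))
    # title:(harry OR potter)^2 OR subjects:(harry OR potter)^3
```

ORing the words means a document matching any single word is returned, which is what raises recall; the boosts then push documents that match in the more important fields toward the top.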

Multiple Languages

Dynamic Fields: Uses Solr dynamic fields with language-specific suffixes (_txt_en, _txt_it).
Querying: Evaluates queries across all supported languages, leading to long queries.
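To see why querying across languages produces long queries, the sketch below expands each base field into one clause per language suffix. It assumes field names are formed as base name plus suffix (for example `title_txt_en`), which this README does not confirm; `expand_fields` and `multilingual_query` are hypothetical helpers, and term escaping is omitted for brevity.

```python
LANGUAGE_SUFFIXES = ("_txt_en", "_txt_it")


def expand_fields(base_fields, suffixes=LANGUAGE_SUFFIXES):
    """Return one dynamic-field name per (base field, language suffix) pair."""
    return [f"{field}{suffix}" for field in base_fields for suffix in suffixes]


def multilingual_query(text, base_fields, suffixes=LANGUAGE_SUFFIXES):
    """Repeat the ORed words once for every language-specific field."""
    words = text.split()
    if not words:
        return "*:*"
    ored = " OR ".join(words)
    return " OR ".join(
        f"{field}:({ored})" for field in expand_fields(base_fields, suffixes)
    )


if __name__ == "__main__":
    print(expand_fields(["title", "subjects"]))
    print(multilingual_query("il nome", ["title"]))
    # title_txt_en:(il OR nome) OR title_txt_it:(il OR nome)
```

The number of clauses grows as fields times languages, each repeating every word of the input, which is consistent with the need to raise `maxRequestHeaderSize` mentioned under Solr Customizations.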
