- Multiple languages: Supports searching in multiple languages, using Solr's analysis pipeline for stop-word removal, stemming, etc.
- Operators: Supports complex queries through UI elements called operators, which are essentially a reskin of Solr's query language.
- Fast querying and retrieval
- Scraping data from various sources
Root/
├── project3Book/            # Scrapy for crawling
│   └── data/
│       └── bookData3.json   # Crawled data
├── bookSearchUI/            # User Interface
└── Solr-8.11.0/             # Solr server
- Navigate to the spiders directory:
cd project3Book/project3Book/spiders
- Execute the spider:
scrapy runspider bookSpider.py -o out.json
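Scrapy's JSON feed export writes a single JSON array with one object per scraped item, so `out.json` can be sanity-checked with a few lines of Python. The sketch below is only an illustration: the `summarize` helper is hypothetical and deliberately makes no assumption about which fields the spider emits.

```python
import json
from collections import Counter
from pathlib import Path


def summarize(path):
    """Return (item_count, Counter of field names) for a Scrapy JSON export."""
    items = json.loads(Path(path).read_text(encoding="utf-8"))
    # Count in how many items each field name appears.
    fields = Counter(key for item in items for key in item)
    return len(items), fields


if __name__ == "__main__":
    target = Path("out.json")
    if target.exists():
        count, fields = summarize(target)
        print(f"{count} items")
        for name, seen in fields.most_common():
            print(f"  {name}: present in {seen} items")
    else:
        print(f"{target} not found; run the spider first")
```

Fields that show up in far fewer items than the total are a quick hint that one of the sources is being parsed incompletely.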
The scraper extracts data from the following sources:
- DOABooks
- Feedbooks
- Book Depository (closed)
This project uses Solr 8.11.0 as the search engine. Use the copy of Solr distributed with this project, because some of its settings have been modified (refer to the guide for details).
- Navigate to the Solr directory:
cd Solr-8.11.0
- Start Solr in cloud mode on the default port (8983):
bin/solr start -e cloud
- When prompted to create a new collection:
  - Provide the collection name as `bookTest2` to use the example scraped data.
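To confirm that the collection is up, one option is to send a match-all query to Solr's standard `/select` handler. The Python sketch below assumes the host, port, and collection name from the steps above; `select_url` and `count_documents` are hypothetical helper names, not part of the project.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

SOLR_BASE = "http://localhost:8983/solr"  # default port used in the steps above


def select_url(collection, query="*:*", rows=0):
    """Build the URL for a query against a collection's /select handler."""
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return f"{SOLR_BASE}/{collection}/select?{params}"


def count_documents(collection):
    """Return numFound for a match-all query (requires a running Solr)."""
    with urlopen(select_url(collection), timeout=10) as response:
        body = json.load(response)
    return body["response"]["numFound"]
```

With Solr running, `count_documents("bookTest2")` returns the number of documents in the collection; a connection error means the server is not reachable on port 8983.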
The UI is hosted on GitLab Pages, which avoids setting up the following dependencies locally (the Solr server must still be running on localhost:8983):
- Quasar 2.0.0
- Vue 3.2
- Vuex 4
- Navigate to the UI directory:
cd bookSearchUI/quasar-demo
- Install dependencies:
npm install
- Run the application in development mode:
quasar dev
- To build the application:
quasar build
Note: Serving the built application locally requires setting up a web server; alternatively, use GitLab Pages.
For more detailed information, refer to the Guide PDF.
- CORS: Disabled CORS protection for direct queries.
- URI Size: Increased `maxRequestHeaderSize` in `jetty.xml` to handle long queries.
- Base Query: Defined in `bookSearchUI/quasar-demo/src/store/solr.js`.
- ORed Words: Increases recall by ORing all space-separated words in the query.
- Field Boosts: Important fields like `subjects` and `title` have higher boosts.
- Dynamic Fields: Uses Solr dynamic fields with language-specific suffixes (`_txt_en`, `_txt_it`).
- Querying: Evaluates queries across all supported languages, leading to long queries.
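The query-related choices above (ORed words, per-field boosts, and one dynamic field per language) can be combined as in the following Python sketch. It is an illustration only and not the code in `solr.js`: the boost values, the `description` field, the `<field>_txt_<lang>` naming, and the `escape` and `build_query` helpers are all assumptions made for the example.

```python
import re

# Assumed values for illustration; the project's real fields and boosts may differ.
FIELD_BOOSTS = {"title": 3, "subjects": 2, "description": 1}
LANGUAGES = ["en", "it"]

# Characters with special meaning in Solr's standard query parser.
_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\/|&;])')


def escape(term):
    """Backslash-escape query-parser metacharacters in a single term."""
    return _SPECIAL.sub(r"\\\1", term)


def build_query(text, boosts=FIELD_BOOSTS, languages=LANGUAGES):
    """OR every word of `text` across every boosted field and language.

    Bare AND/OR/NOT keywords in the input are not treated specially here.
    """
    words = [escape(word) for word in text.split()]
    if not words:
        return "*:*"  # match-all fallback for an empty search box
    ored = " OR ".join(words)
    clauses = [
        f"{field}_txt_{lang}:({ored})^{boost}"
        for field, boost in boosts.items()
        for lang in languages
    ]
    return " OR ".join(clauses)


if __name__ == "__main__":
    query = build_query("open science")
    print(query)
    print(len(query), "characters")
```

The query repeats the word list once per field and language, so its length grows with words × fields × languages. That growth is consistent with the need to raise the request size limit noted above.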