This repository has been archived by the owner on Oct 29, 2024. It is now read-only.

fix(web): fix bugs and improve the performance #377

Merged: 2 commits merged on Sep 27, 2024

chuang8511 (Member) commented Sep 27, 2024

Because

  • the async process must acquire a lock before inserting data, to avoid a race condition
  • colly can parse the response in its OnResponse callback

This commit

  • fixes the race-condition bug by locking before the insert
  • refactors the web crawl to improve speed and save resources
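
For readers unfamiliar with the lock-before-insert pattern described above, here is a minimal sketch using only the Go standard library. It is not the code from `operator/web/v0/crawl_website.go`: the names `pageResult` and `crawlAll` are hypothetical, and the goroutines stand in for colly's asynchronous callbacks. The point it illustrates is that every concurrent writer takes the mutex before appending to the shared slice.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// pageResult is a hypothetical stand-in for one crawled page.
type pageResult struct {
	URL   string
	Title string
}

// crawlAll simulates an asynchronous crawl: each URL is handled in its own
// goroutine, and all goroutines insert into the same shared slice.
func crawlAll(urls []string) []pageResult {
	var (
		mu      sync.Mutex     // guards results
		wg      sync.WaitGroup // waits for all workers to finish
		results []pageResult
	)

	for _, u := range urls {
		wg.Add(1)
		go func(u string) {
			defer wg.Done()

			// Placeholder for fetching and parsing the page (assumption:
			// the real work happens in a response callback).
			page := pageResult{URL: u, Title: "title of " + u}

			// Lock first, then insert. Without the lock, concurrent
			// appends to the shared slice are a data race.
			mu.Lock()
			results = append(results, page)
			mu.Unlock()
		}(u)
	}

	wg.Wait()

	// Goroutine completion order is not deterministic, so sort for stable output.
	sort.Slice(results, func(i, j int) bool { return results[i].URL < results[j].URL })
	return results
}

func main() {
	pages := crawlAll([]string{
		"https://example.com/a",
		"https://example.com/b",
		"https://example.com/c",
	})
	for _, p := range pages {
		fmt.Println(p.URL, "->", p.Title)
	}
}
```

Running a version without the mutex under `go run -race` would normally report the race on the shared slice, which is one way to confirm that the lock is needed.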

linear bot commented Sep 27, 2024

codecov bot commented Sep 27, 2024

Codecov Report

Attention: Patch coverage is 0% with 19 lines in your changes missing coverage. Please review.

Project coverage is 38.57%. Comparing base (79b84b3) to head (37c260d).
Report is 29 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| operator/web/v0/crawl_website.go | 0.00% | 19 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #377      +/-   ##
==========================================
+ Coverage   37.69%   38.57%   +0.87%     
==========================================
  Files         215      226      +11     
  Lines       27830    27268     -562     
==========================================
+ Hits        10491    10519      +28     
+ Misses      15752    15155     -597     
- Partials     1587     1594       +7     

| Flag | Coverage Δ |
|---|---|
| unittests | 38.57% <0.00%> (+0.87%) ⬆️ |


@chuang8511 chuang8511 marked this pull request as draft September 27, 2024 14:05
@chuang8511 chuang8511 marked this pull request as ready for review September 27, 2024 16:18
@chuang8511 chuang8511 changed the title from "fix(web): add lock to fix race condition-related bugs" to "fix(web): fix bugs and improve the performance" Sep 27, 2024
@donch1989 donch1989 merged commit f0f4f89 into main Sep 27, 2024
10 checks passed
@donch1989 donch1989 deleted the chunhao/ins-6386 branch September 27, 2024 17:26
donch1989 pushed a commit that referenced this pull request Sep 30, 2024
🤖 I have created a release *beep* *boop*
---


## [0.29.0-beta](v0.28.0-beta...v0.29.0-beta) (2024-09-27)


### Features

* **compogen:** add table css and update doc ([#370](#370)) ([c59b167](c59b167))
* **document:** improve image extraction from pdf ([#372](#372)) ([39cdb2c](39cdb2c))
* **openai:** use batch inference for embedding ([#375](#375)) ([cc897af](cc897af))


### Bug Fixes

* **compogen:** fix json format ([#364](#364)) ([e6619ce](e6619ce))
* **text:** bug about table judgement in markdown chunking ([#367](#367)) ([1ab13e2](1ab13e2))
* the input validator can not validate array format ([#379](#379)) ([38074c8](38074c8))
* **web:** fix bugs and improve the performance ([#377](#377)) ([f0f4f89](f0f4f89))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

3 participants