fix: usecases formatting (#458)
innnotruong authored Feb 11, 2025
1 parent 51f0542 commit cc8ef8a
Showing 15 changed files with 162 additions and 205 deletions.
50 changes: 24 additions & 26 deletions Use Cases/ai-powered-monthly-project-reports.md
@@ -5,28 +5,31 @@ date: "2024-11-14"
description: "An in-depth look at Dwarves' monthly Project Reports system - a lean, efficient system that transforms communication data into actionable intelligence for Operations teams. This case study explores how we orchestrate multiple data streams into comprehensive project insights while maintaining enterprise-grade security and cost efficiency."
tags:
- "data-engineering"
- "project-management"
- "ai-agents"
- "llmops"
- "case-study"
title: "Project reports system: a case study"
---

At Dwarves, we've developed a monthly Project Reports system - a lean, efficient system that transforms our communication data into actionable intelligence for our Operations team. This system orchestrates multiple data streams into comprehensive project insights while maintaining enterprise-grade security and cost efficiency.
At Dwarves, we've developed a Monthly Project Reports system that transforms communication data into actionable intelligence. This lean system orchestrates multiple data streams into comprehensive project insights while maintaining enterprise-grade security and cost efficiency.

## The need for orchestrated intelligence
Our engineering teams exchange thousands of Discord messages daily across projects, capturing critical technical discussions, architectural decisions, and implementation details. However, while Discord excels at real-time communication, valuable insights often remain buried in chat histories, making it difficult to:

Our engineering teams generate thousands of Discord messages daily across multiple projects. These messages contain critical technical discussions, architectural decisions, and implementation details that traditionally remained trapped in chat histories. While Discord excels as a communication platform, its real-time nature makes it challenging to track project progress against client requirements or ensure alignment between ongoing discussions and formal documentation.
1. Track project progress against client requirements.
2. Align ongoing discussions with formal documentation.
3. Extract actionable insights from technical conversations.

This challenge sparked the development of our Project Reports system. Like a skilled conductor bringing order to complex musical pieces, our system coordinates multiple data streams into clear, actionable project intelligence
This challenge led us to develop the Project Reports system - an intelligent orchestration layer that transforms scattered communication data into structured project intelligence. Our system processes multiple data streams, extracting key insights and patterns to generate comprehensive project visibility.

## The foundation: Data architecture

Our architecture follows a simple yet powerful approach to data management, emphasizing efficiency and practicality over complexity. We've built our system on three core principles:

1. **Lean Storage**: S3 serves as our primary data lake and warehouse, using Parquet and CSV files to optimize for both cost and performance
2. **Efficient Processing**: DuckDB and Polars provide high-performance querying without the overhead of traditional data warehouses
3. **Secure Access**: Modal orchestrates our serverless functions, ensuring secure and efficient data processing
1. **Lean storage**: S3 serves as our primary data lake and warehouse, using Parquet and CSV files to optimize for both cost and performance
2. **Efficient processing**: DuckDB and Polars provide high-performance querying without the overhead of traditional data warehouses
3. **Secure access**: Modal orchestrates our serverless functions, ensuring secure and efficient data processing
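
To make this concrete, here is a minimal sketch of the lean-storage idea: DuckDB querying Parquet files in place on S3 and handing the result to Polars. The bucket layout, column names, and date filter are hypothetical stand-ins, not our production schema.

```python
# Illustrative only: DuckDB reads Parquet directly from S3 (no warehouse),
# then hands the result to Polars. Paths and columns are hypothetical.
import duckdb
import polars as pl

con = duckdb.connect()
con.execute("INSTALL httpfs; LOAD httpfs;")  # enables s3:// reads

monthly = con.execute("""
    SELECT project, COUNT(*) AS messages
    FROM read_parquet('s3://reports-gold/discord/*.parquet')
    WHERE created_at >= DATE '2024-10-01'
    GROUP BY project
""").pl()  # fetch the result as a Polars DataFrame

print(monthly.sort("messages", descending=True))
```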

### Data Flow Overview
### Data flow overview

```mermaid
graph TB
```
@@ -93,7 +96,7 @@

The system begins with raw data collection from various sources, primarily Discord at present, with planned expansion to Git, JIRA, Google Docs, and Notion. This data moves through our S3-based landing and gold zones, where it undergoes quality checks and transformations before feeding into our platform and AI engineering layers.

### Detailed Processing Pipeline
### Detailed processing pipeline

```mermaid
graph LR
```
@@ -157,12 +160,12 @@

Our processing pipeline emphasizes efficiency and security:

1. **Collection Layer**: Weekly scheduled collectors gather data from various sources
2. **Processing Pipeline**: Data undergoes PII scrubbing, validation, and schema enforcement
3. **Storage Layer**: Processed data is stored in S3 using Parquet and CSV formats
4. **Query Layer**: DuckDB and Polars engines provide fast, efficient data analysis
1. **Collection layer**: Weekly scheduled collectors gather data from various sources
2. **Processing pipeline**: Data undergoes PII scrubbing, validation, and schema enforcement
3. **Storage layer**: Processed data is stored in S3 using Parquet and CSV formats
4. **Query layer**: DuckDB and Polars engines provide fast, efficient data analysis
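
As a rough sketch of steps 2 and 3, a scrubbing, validation, and schema-enforcement pass might look like the following; the regex patterns, column names, and file paths are assumptions for illustration, not our production rules.

```python
# Illustrative sketch of the PII-scrubbing + validation + schema step.
# Regexes, column names, and paths below are hypothetical stand-ins.
import re
import polars as pl

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def scrub(text: str) -> str:
    """Mask obvious PII before a message reaches the gold zone."""
    return PHONE.sub("[phone]", EMAIL.sub("[email]", text))

def process(raw: pl.DataFrame) -> pl.DataFrame:
    return (
        raw.drop_nulls(["message_id", "channel", "created_at"])  # basic validation
        .with_columns(
            pl.col("content").map_elements(scrub, return_dtype=pl.Utf8),
            pl.col("created_at").str.to_datetime(),  # schema enforcement
        )
    )

process(pl.read_csv("landing/discord_week.csv")).write_parquet("gold/discord_week.parquet")
```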

## Dify - Operational Intelligence through Low-code Workflows
## Dify - Operational intelligence through low-code workflows

We use Dify to transform our raw data streams into intelligent insights through low-code workflows. This process bridges the gap between our data collection pipeline and the operational insights needed by our team.

@@ -213,17 +216,16 @@ The workflow system easily integrates with our existing data pipeline, pulling f
- **Maintainable Intelligence**
Templates and workflows are version-controlled and documented, making it easy for team members to understand and modify the intelligence generation process. This ensures our reporting system can evolve with our organizational needs.

## Operational Impact
## Operational impact

The Project Reports system serves as the foundation for our Operations team's project oversight. It provides:

- **Real-time Project Visibility**: Operations can track progress across multiple projects through consolidated communication data, enabling early identification of potential issues or bottlenecks.
- **Data-Driven Decision Making**: By analyzing communication patterns and project discussions, we can make informed decisions about resource allocation and project timelines.
- **Automated Reporting**: The system generates comprehensive monthly reports, reducing manual effort and ensuring consistent project tracking across the organization.

## Technical Implementation

### Secure Data Collection
## Technical implementation
### Secure data collection

The cornerstone of our system is a robust collection pipeline built on Modal. Our collection process runs weekly, automatically processing Discord messages through a sophisticated filtering system that preserves critical technical discussions while ensuring security and privacy.

@@ -242,8 +244,7 @@ def weekly_discord_collection():

Through Modal's serverless architecture, we've implemented separate landing zones for different project data, ensuring granular access control and comprehensive audit trails. Each message undergoes content filtering and PII scrubbing before being transformed into optimized Parquet format, providing both storage efficiency and query performance.
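
As a flavor of the shape such a scheduled collector can take, here is a pared-down sketch using Modal's scheduled functions; the app name, cron expression, secret name, and placeholder logic are assumptions, not our actual configuration.

```python
# Hedged sketch only: shows the shape of a weekly scheduled Modal collector,
# not the actual pipeline. App name, schedule, and secret name are made up.
import modal

app = modal.App("project-reports-collector")

@app.function(
    schedule=modal.Cron("0 2 * * 1"),                    # every Monday, 02:00 UTC
    secrets=[modal.Secret.from_name("discord-and-s3")],  # keeps tokens out of code
)
def weekly_discord_collection():
    # Placeholder logic: a real collector would page through the Discord API,
    # scrub PII, and write Parquet into the S3 landing zone.
    messages = [{"channel": "proj-a", "content": "deploy went fine"}]
    filtered = [m for m in messages if m["content"].strip()]
    print(f"collected {len(filtered)} messages")
```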

### Query Interface

### Query interface
The system provides a flexible API for accessing processed data:

```python
@@ -262,12 +263,10 @@ def query_messages(item: QueryRequest, token: str = Depends(verify_token)) -> Di

```
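
As a rough idea of the shape such a handler can take (the visible signature suggests FastAPI), consider this sketch; the request fields, auth check, and table layout are assumptions rather than the actual API.

```python
# Hedged sketch of a query endpoint in the same spirit; request fields,
# auth helper, and table layout are assumptions, not the actual API.
from typing import Dict

import duckdb
from fastapi import Depends, FastAPI, HTTPException
from pydantic import BaseModel

app = FastAPI()

class QueryRequest(BaseModel):
    project: str
    start_date: str  # ISO date, e.g. "2024-10-01"
    end_date: str

def verify_token(token: str = "") -> str:
    if not token:  # stand-in check; real auth would validate a bearer token
        raise HTTPException(status_code=401, detail="missing token")
    return token

@app.post("/messages/query")
def query_messages(item: QueryRequest, token: str = Depends(verify_token)) -> Dict:
    rows = duckdb.execute(
        "SELECT channel, content, created_at "
        "FROM read_parquet('s3://gold-zone/discord/*.parquet') "
        "WHERE project = ? AND created_at BETWEEN ? AND ?",
        [item.project, item.start_date, item.end_date],
    ).fetchall()
    return {"count": len(rows), "messages": rows}
```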

## Measured Impact

## Measured impact
The implementation of Project Reports has fundamentally transformed our project management approach. Our operations team now has greater visibility into project progress, with tracking and early issue identification becoming the norm rather than the exception. The automated documentation of key decisions has significantly reduced meeting overhead, while the correlation between discussions and deliverables ensures nothing falls through the cracks.

## Future Development

## Future development
We're expanding the system's capabilities in several key areas:

- **Additional Data Sources**: Integration with Git metrics, JIRA tickets, and documentation platforms will provide a more comprehensive view of project health.
@@ -277,7 +276,6 @@
We also don't plan to be locked into Modal as our only vendor. The foundations we've laid out for our landing zones and data lake make it easy to swap query and API architectures in and out.

## Conclusion

At Dwarves, our Project Reports system demonstrates the power of thoughtful data engineering in transforming raw communication into strategic project intelligence. By combining secure data collection, efficient processing, and AI-powered analysis, we've created a system that doesn't just track progress – it actively contributes to project success.

The system continues to coordinate our project data streams with precision and purpose, ensuring that every piece of information contributes to a clear picture of project health. Through this systematic approach, we're setting new standards for data-driven project management in software development, one report at a time.
3 changes: 1 addition & 2 deletions Use Cases/ai-ruby-travel-assistant-chatbot.md
@@ -4,9 +4,8 @@ authors:
date: "2024-11-21"
description: "A case study exploring how we built an AI-powered travel assistant using Ruby and AWS Bedrock, demonstrating how choosing the right tools over popular choices led to a more robust and maintainable solution. This study examines our approach to integrating AI capabilities within existing Ruby infrastructure while maintaining enterprise security standards."
tags:
- "ruby"
- "ai-agents"
- "ai-engineering"
- "ai"
- "case-study"
title: "AI-powered Ruby travel assistant"
---
24 changes: 11 additions & 13 deletions Use Cases/binance-transfer-matching.md
@@ -2,10 +2,10 @@
title: "Building better Binance transfer tracking"
date: 2024-11-18
tags:
- data
- sql
- binance
description: A deep dive into building a robust transfer tracking system for Binance accounts, transforming disconnected transaction logs into meaningful fund flow narratives through SQL and data analysis
- "data-engineering"
- fintech
- defi
description: A deep dive into building a robust transfer tracking system for Binance accounts, transforming disconnected transaction logs into meaningful fund flow narratives through SQL and data analysis
authors:
- bievh
---
@@ -16,8 +16,8 @@ Everything worked well at the beginning, motivating the clients to increase the

This urgency led us to begin recording every transfer between accounts in the system and continuously notifying the clients.

### Limitations of Binance income history
---
## Limitations of Binance income history

To record every transfer, we need the help of the Binance APIs, specifically [Get Income History (USER_DATA)](https://developers.binance.com/docs/derivatives/usds-margined-futures/account/rest-api/Get-Income-History). Calling this endpoint with the proper parameters returns the following `JSON` response.

```JSON
```
@@ -58,9 +58,8 @@ To me, it looks bad. Ignore the wrong destination balance because of another iss

If you pay attention to the `JSON` response of the Binance API, an idea may come to mind: "*Hmm, it looks easy to get a better version of the logging by just matching the transaction ID, aka the `tranId` field value.*" Yes, that was the first thing that popped into my mind. Unfortunately, when a transfer happens between two accounts, a different transaction ID is produced on each account's side.

### Our approach to transfer history mapping
---
#### Current implementation
## Our approach to transfer history mapping
### Current implementation
Looking at the response of the Binance API, you might spend a bit of time asking yourself, "Why does Binance give us such a bad API response?" But it is not a dilemma, and the Binance API is not as bad as I made it sound. The API serves Binance's own needs well enough, and a more general response can serve more use cases.

Explanations aside, we now get to the important part: matching transfers so that the transfer history logging becomes more robust. There is more than one way to do it, but because this issue comes from the data itself, we will use a database solution.
@@ -105,8 +104,7 @@ The flow chart above shows how the current system produced transfer tracking log
- From `Future Incomes`, we simply query transfer information such as amount, time, and its sign.
- Using the time of transfer, query `Balance snapshots` to detect balance before and after it is changed by the transfer.

#### How to make it better?

### How to make it better?
To do better, we need to match the transfers together to know the source and destination of the funds. To match them, we need to identify the transfer immediately before and after each one (**with the assumption that the send-side and receive-side transfers of the same fund happen within a small gap of time, and that no two transfers happen at the same time**). We are lucky that PostgreSQL provides two convenient window functions, LEAD and LAG. LEAD accesses a row following the current row at a given offset, while LAG accesses a preceding row. With simple syntax and good performance, they are our choice for transfer pairing, as illustrated in the sketch after the figure below.

```sql
```
@@ -219,8 +217,8 @@ flowchart TD
*Figure 4: Upgraded process to build transfer history*
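
As a self-contained illustration of the LAG-based pairing idea (not the production query), consider a toy `future_incomes` table with one row per transfer leg, negative amounts for the sender and positive for the receiver; all names here are assumptions.

```sql
-- Toy illustration of pairing transfer legs with LAG (LEAD is symmetric).
-- Assumes paired legs are adjacent in time and no two transfers coincide.
WITH ordered AS (
    SELECT
        account_id,
        amount,
        occurred_at,
        LAG(account_id) OVER (ORDER BY occurred_at) AS prev_account,
        LAG(amount)     OVER (ORDER BY occurred_at) AS prev_amount
    FROM future_incomes
    WHERE income_type = 'TRANSFER'
)
SELECT
    prev_account AS source_account,
    account_id   AS dest_account,
    amount,
    occurred_at
FROM ordered
WHERE amount > 0             -- the receiving leg
  AND prev_amount = -amount; -- immediately preceded by the matching sending leg
```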

### Conclusions
## Conclusions
From the problem to the idea and finally the implementation, nothing here is too difficult; any software developer could do it, perhaps even better. But to do big things, we should first begin with the small ones and finish them subtly and carefully. From this small problem, I learned a few things:
- **The answer may lie in the question itself.** Instead of blaming Binance API for being so bad, we can take a sympathetic look at it, and see if there is anything we can get out of it.
- **One small change can make everything better.** Comparing the original transfer tracking log with the version upgraded through some small changes in the DB query, the difference is huge. This reminds us that impactful solutions don't always require complex architectures – sometimes they just need careful refinement of existing approaches.
- **Data challenges are often best addressed through data-driven solutions.** Rather than seeking fixes elsewhere, the key is to leverage the inherent patterns and structure within the data itself.
2 changes: 2 additions & 0 deletions Use Cases/bitcoin-alt-performance-tracking.md
@@ -3,6 +3,8 @@ title: "Tracking Bitcoin-Altcoin Performance Indicators in BTC Hedging Strategy"
date: 2025-01-02
tags:
- data
- fintech
- blockchain
- crypto
description: "This article provides an overview of the importance of tracking Bitcoin-Altcoin performance indicators in a trading strategy known as Hedge, and explains how to visualize this data effectively. It also demonstrates how to render a chart for this strategy using Matplotlib and Seaborn"
authors:
@@ -5,8 +5,9 @@ authors:
date: '2024-11-21'
description: 'A technical case study detailing the implementation of an AI chatbot agent in a project management platform. Learn how the team leveraged LangChain, LangGraph, and GPT-4 to build a multi-agent system using the supervisor-worker pattern. '
tags:
- 'ai'
- 'project-management'
- 'ai-agents'
- 'aiops'
- 'langchain'
- 'case-study'
title: 'Building chatbot agent to streamline project management'
---
@@ -18,7 +19,6 @@ The challenge was to natively integrate a generative AI chatbot that could assis
Implementing the chatbot agent involved key technical domains such as developing an interface to communicate with external AI platforms like OpenAI, creating an agentic system to interpret and execute user requests, and setting up usage monitoring to control AI token consumption and track chatbot performance.

## System requirements

### Business requirements

- Chatbot should be able to answer general questions about project management, such as writing project proposals or epic planning.
@@ -87,7 +87,6 @@ Implementing the chatbot agent involved key technical domains such as developing
The data flows from the user to the Supervisor, which routes the request to the appropriate worker agent. The worker agent processes the request, interacting with the necessary tools and the database, and generates a response. The response is then returned to the Supervisor and finally to the user.
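
To make the routing concrete, here is a deliberately framework-free sketch of the supervisor-worker control flow; the real system builds this on LangGraph, and the worker logic here is stubbed.

```python
# Framework-free sketch of supervisor-worker routing; the production system
# uses LangGraph, but the control flow has the same shape.
from typing import Callable, Dict

def project_worker(request: str) -> str:
    return f"[project data for: {request}]"

def general_worker(request: str) -> str:
    return f"[general PM guidance for: {request}]"

WORKERS: Dict[str, Callable[[str], str]] = {
    "project": project_worker,
    "general": general_worker,
}

def supervisor(request: str) -> str:
    # In production an LLM picks the route; a keyword check stands in here.
    route = "project" if "sprint" in request.lower() else "general"
    response = WORKERS[route](request)
    return response  # returned to the supervisor, then to the user

print(supervisor("Summarize the current sprint status"))
```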

## Technical implementation

### Core workflows

```mermaid
```
@@ -147,7 +146,6 @@ To address the need for displaying custom UI elements instead of text-only respo
- **MongoDB**: NoSQL database for storing chat history, token usage, and other relevant data, offering flexibility and scalability.

## Lessons learned

### What worked well

1. Implementing the supervisor-worker pattern using LangGraph allowed us to build a scalable and extensible multi-agent AI system that could handle increasing functionalities without compromising performance.