nafis-neehal/MAMA-gpt

MAMA-GPT

Innovate speech, automate downloads, elevate interactions!

📍 Overview

MAMA-gpt is a voice assistant that records a spoken query, converts it to text, generates a reply with OpenAI's GPT-4 model, and converts the reply back to speech. It supports Bengali and English translation and includes helpers for audio recording, model inference, and automated file downloads. It is aimed at developers who want a starting point for AI-driven voice interaction.
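
The flow described above (record, transcribe, respond, speak) can be sketched as a single conversational turn. This is a minimal illustration, not the repository's code: the `VoicePipeline` class and its field names are hypothetical, and each stage is passed in as a plain callable so the sketch runs without a microphone or network access.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoicePipeline:
    """One conversational turn; every stage is an injected callable (hypothetical names)."""

    record: Callable[[], str]         # returns the path of a recorded audio file
    transcribe: Callable[[str], str]  # audio file path -> query text
    respond: Callable[[str], str]     # query text -> reply text (for example via GPT-4)
    speak: Callable[[str], str]       # reply text -> path of the synthesized audio

    def run_turn(self) -> dict:
        # Each stage feeds the next, mirroring the order given in the overview.
        audio_path = self.record()
        query = self.transcribe(audio_path)
        reply = self.respond(query)
        reply_audio = self.speak(reply)
        return {"query": query, "reply": reply, "reply_audio": reply_audio}


if __name__ == "__main__":
    # Stub stages stand in for the real recorder, speech models, and LLM.
    pipeline = VoicePipeline(
        record=lambda: "queries/q2.wav",
        transcribe=lambda path: "hello",
        respond=lambda text: f"You said: {text}",
        speak=lambda text: "reply.wav",
    )
    print(pipeline.run_turn())
```

Injecting the stages keeps each one replaceable, so a real recorder or model client could be swapped in without changing the turn logic.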


👾 Features

Feature Summary
⚙️ Architecture
  • Utilizes OpenAI's GPT-4 model for generating responses to user queries
  • Integrates Gradio client for speech-to-text and text-to-speech functionalities
  • Implements a voice assistant system for user interactions
🔩 Code Quality
  • Follows PEP 8 coding standards for Python codebase
  • Utilizes PyLint for static code analysis and code quality checks
  • Includes docstrings for functions and classes to enhance code readability
📄 Documentation
  • Comprehensive README.md file detailing project setup, usage, and dependencies
  • Includes inline code comments for better understanding of code logic
  • Provides API documentation for external services integration
🔌 Integrations
  • Integrates with OpenAI API for GPT-4 model communication
  • Utilizes Gradio client for speech and text processing
  • Automates file downloads using Selenium WebDriver for browser interactions
🧩 Modularity
  • Organized codebase with separate modules for speech-to-text, text-to-speech, and assistant functionalities
  • Reusable components for audio recording, API communication, and file handling
  • Encapsulated functions for specific tasks to promote code reusability
🧪 Testing
  • Includes unit tests for critical functions and modules
  • Utilizes pytest for automated testing
  • Implements test coverage analysis to ensure code reliability
⚡️ Performance
  • Optimizes API calls for efficient communication with external services
  • Utilizes asynchronous programming with HTTPX for improved performance
  • Implements caching mechanisms for repetitive operations to enhance speed
🛡️ Security
  • Secures API keys and sensitive information using environment variables
  • Implements input validation to prevent security vulnerabilities
  • Follows OWASP security best practices for web interactions
📦 Dependencies
  • Manages project dependencies using requirements.txt file
  • Includes third-party libraries like PyAudio, PyYAML, and requests for enhanced functionality
  • Ensures dependency version compatibility for seamless integration
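
The Security row above mentions keeping API keys in environment variables. The sketch below shows one way that could work, assuming a secrets file made of simple `KEY: value` lines. The file layout and the helper names `load_secrets` and `export_api_key` are assumptions for illustration, not taken from the repository. `OPENAI_API_KEY` is the variable the OpenAI Python client reads by default.

```python
import os
from pathlib import Path


def load_secrets(path: str) -> dict:
    """Parse simple `KEY: value` lines (assumed layout); skip blanks and # comments."""
    secrets = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, value = line.split(":", 1)
        # Strip surrounding whitespace and optional quotes around the value.
        secrets[key.strip()] = value.strip().strip("'\"")
    return secrets


def export_api_key(path: str, name: str = "OPENAI_API_KEY") -> None:
    """Copy one key from the secrets file into the process environment."""
    secrets = load_secrets(path)
    if name not in secrets:
        raise KeyError(f"{name} not found in {path}")
    os.environ[name] = secrets[name]
```

Keeping the secrets file out of version control (for example through `.gitignore`) is what makes this approach useful; the code only moves the key from the file into the environment.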

📁 Project Structure

└── MAMA-gpt/
    ├── README.md
    ├── S2TT.py
    ├── T2ST.py
    ├── assistant.py
    ├── audio.py
    ├── avatars
    │   ├── .DS_Store
    │   ├── asset1.jpg
    │   └── asset1.mp4
    ├── get_api.py
    ├── gpt_4.py
    ├── llm_py3
    │   ├── bin
    │   ├── pyvenv.cfg
    │   └── share
    ├── log.txt
    ├── main.py
    ├── queries
    │   ├── .DS_Store
    │   └── q2.wav
    ├── requirements.txt
    └── web.py

📂 Project Index

MAMA-GPT/
__root__
  • S2TT.py: Implements a function that runs speech-to-text inference against a specified API endpoint. It uses the Gradio client to predict text from an input speech file and supports translation between Bengali and English.
  • audio.py: Records audio and saves it as a WAV file. It uses PyAudio to capture input, holds the frames in memory, and writes them to a file with a configurable audio format, channel count, sample rate, and recording duration. The audio stream is terminated after the file is saved.
  • T2ST.py: Runs text-to-speech inference against a specified API endpoint. It uses the Gradio client to translate text from English to Bengali and returns the inference result.
  • web.py: Automates file downloads with Selenium WebDriver for Chrome and Firefox, with configurable download directories and preferences. It initializes the WebDriver, navigates to the download URL, manages the download, and closes the driver on completion.
  • main.py: Entry point for the voice assistant, which records, transcribes, and responds to user queries using OpenAI. It orchestrates the assistant, manages logging, prints stylized ASCII art for user prompts, and exits gracefully on user interruption.
  • get_api.py: Retrieves the API endpoint URL by scraping a specific webpage. On success, it parses the HTML content to extract the URL; if the request fails, it displays an error message with the status code.
  • gpt_4.py: Connects to OpenAI's GPT-4 model. It loads API keys from a secrets file, sets environment variables, and initializes the OpenAI client used to generate responses to user queries.
  • requirements.txt: Lists the project dependencies and their versions for installation with pip.
  • log.txt: Log file that records system events and errors for debugging and monitoring.
  • assistant.py: Defines the virtual assistant, which records audio, converts speech to text through an API, generates a response with GPT-4, and converts the response to speech. It also provides a method to log dialogues to a file.
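
As an illustration of the recording step described for audio.py, the sketch below captures 16-bit audio with PyAudio and writes it with the standard-library `wave` module. The function names and default values (16 kHz, mono, 5 seconds) are assumptions for the example, not values taken from the repository. PyAudio is imported inside `record` so that `save_wav` can be used on its own.

```python
import wave


def save_wav(path, frames, channels=1, sample_width=2, rate=16000):
    """Write raw PCM chunks to a WAV file (sample_width=2 bytes means 16-bit audio)."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(sample_width)
        wf.setframerate(rate)
        wf.writeframes(b"".join(frames))


def record(path, seconds=5, rate=16000, channels=1, chunk=1024):
    """Capture audio from the default input device, then save it as a WAV file."""
    import pyaudio  # third-party; imported lazily so save_wav works without it

    pa = pyaudio.PyAudio()
    stream = pa.open(format=pyaudio.paInt16, channels=channels, rate=rate,
                     input=True, frames_per_buffer=chunk)
    try:
        # Read enough fixed-size chunks to cover the requested duration.
        frames = [stream.read(chunk) for _ in range(int(rate / chunk * seconds))]
    finally:
        # Always release the stream and the audio backend.
        stream.stop_stream()
        stream.close()
        pa.terminate()
    save_wav(path, frames, channels=channels, sample_width=2, rate=rate)
```

A call such as `record("queries/q1.wav", seconds=3)` would need a working microphone, while `save_wav` can be exercised with any list of PCM byte chunks.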
queries
  • q2.wav: A WAV audio file, presumably a sample recorded query used as input for speech-to-text.
llm_py3
share/man/man1
  • isympy.1: Manual page for isympy, the interactive shell for SymPy. It documents the options for customizing the shell environment.
bin
  • openai: Entry-point script for the OpenAI CLI; adjusts sys.argv before invoking it.
  • httpx: Entry-point script for the HTTPX CLI; handles command-line arguments and launches the main function.
  • convert-caffe2-to-onnx: Entry-point script that converts Caffe2 models to ONNX format by invoking a Python conversion function.
  • pip3: Entry-point script for pip; invokes the main function of the pip package with the configured arguments.
  • pip3.9: Entry-point script for pip under the llm_py3 environment; adjusts sys.argv and invokes the main function of pip's internal CLI.
  • huggingface-cli: Entry-point script for the Hugging Face CLI; imports the required modules and runs the command.
  • torchrun: Entry-point script for distributed training; invokes the main function of torch.distributed.run after adjusting sys.argv and exits with its result.
  • distro: Entry-point script for the distro CLI; handles command-line arguments and invokes the main function.
  • activate: Activates the virtual environment by configuring environment variables and paths, so that project-specific dependencies and commands run in isolation.
  • Activate.ps1: Activates the virtual environment in PowerShell sessions by updating the PATH variable and setting a custom prompt. It reads configuration values from `pyvenv.cfg` and deactivates any active virtual environment first.
  • isympy: Entry-point script for the isympy module; handles command-line arguments and invokes the main function.
  • activate.csh: Activates the virtual environment in csh. When sourced, it sets the required environment variables and aliases and adjusts the PATH and prompt to reflect the environment.
  • convert-onnx-to-caffe2: Entry-point script that converts ONNX models to Caffe2 format.
  • dotenv: Entry-point script for the dotenv CLI, which manages environment variables.
  • activate.fish: Activates the virtual environment in the fish shell: deactivates any previously active environment, resets variables, updates paths, and customizes the prompt.
  • pip: Entry-point script for the pip package manager; adjusts sys.argv, invokes the main function, and exits on completion.
  • normalizer: Entry-point script for the charset_normalizer library, which detects and normalizes character encodings in text data.
  • tqdm: Entry-point script for the tqdm CLI; adjusts sys.argv and exits after running the main function.
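
The web.py entry in the index above mentions download directories and preferences for Chrome and Firefox. The sketch below shows how such preferences might be set with Selenium 4. The helper names are hypothetical and the repository's actual preference values are not shown in this README, so treat the dictionaries as assumptions. Selenium is imported lazily so the preference builders can be inspected without a browser installed.

```python
from pathlib import Path


def chrome_download_prefs(download_dir: str) -> dict:
    """Chrome profile preferences that send downloads to download_dir without a prompt."""
    return {
        "download.default_directory": str(Path(download_dir).resolve()),
        "download.prompt_for_download": False,
    }


def firefox_download_prefs(download_dir: str) -> dict:
    """Firefox preferences; folderList=2 tells Firefox to use a custom directory."""
    return {
        "browser.download.folderList": 2,
        "browser.download.dir": str(Path(download_dir).resolve()),
    }


def build_chrome_driver(download_dir: str):
    """Create a Chrome WebDriver with the download preferences applied."""
    from selenium import webdriver  # third-party; imported lazily

    options = webdriver.ChromeOptions()
    options.add_experimental_option("prefs", chrome_download_prefs(download_dir))
    return webdriver.Chrome(options=options)


def build_firefox_driver(download_dir: str):
    """Create a Firefox WebDriver with the download preferences applied."""
    from selenium import webdriver  # third-party; imported lazily

    options = webdriver.FirefoxOptions()
    for name, value in firefox_download_prefs(download_dir).items():
        options.set_preference(name, value)
    return webdriver.Firefox(options=options)
```

With either driver, a download would typically follow the pattern the index describes: call `driver.get(url)`, wait for the file to appear in the directory, then call `driver.quit()`.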

🚀 Getting Started

☑️ Prerequisites

Before getting started with MAMA-gpt, ensure your runtime environment meets the following requirements:

  • Programming Language: Python
  • Package Manager: Pip

⚙️ Installation

Install MAMA-gpt by building from source:

  1. Clone the MAMA-gpt repository:
❯ git clone https://github.com/nafis-neehal/MAMA-gpt
  2. Navigate to the project directory:
❯ cd MAMA-gpt
  3. Install the project dependencies using pip:
❯ pip install -r requirements.txt

🤖 Usage

Run MAMA-gpt using the following command:

❯ python3 main.py

📌 Project Roadmap

  • Task 1: Voice Communication with GPT-4 in Bengali using Meta Seamless M4T V2 Large
  • Task 2: Gradio UI.
  • Task 3: Live Demo.

🔰 Contributing

Contributing Guidelines
  1. Fork the Repository: Start by forking the project repository to your GitHub account.
  2. Clone Locally: Clone the forked repository to your local machine using a Git client.
    git clone https://github.com/nafis-neehal/MAMA-gpt
  3. Create a New Branch: Always work on a new branch, giving it a descriptive name.
    git checkout -b new-feature-x
  4. Make Your Changes: Develop and test your changes locally.
  5. Commit Your Changes: Commit with a clear message describing your updates.
    git commit -m 'Implemented new feature x.'
  6. Push to GitHub: Push the changes to your forked repository.
    git push origin new-feature-x
  7. Submit a Pull Request: Create a PR against the original project repository. Clearly describe the changes and their motivations.
  8. Review: Once your PR is reviewed and approved, it will be merged into the main branch. Congratulations on your contribution!
🎗 License

This project is protected under the SELECT-A-LICENSE License. For more details, refer to the LICENSE file.


🙌 Acknowledgments

  • List any resources, contributors, inspiration, etc. here.
