uhh-lt/dats


The AI-powered platform for multi-modal discourse analysis.


Quick start • Why DATS? • Demo • User Guide • FAQ • Wiki • D-WISE


Discourse Analysis Tool Suite (DATS)

DATS is a machine-learning-powered web application for multi-modal discourse analysis. It provides tools for the typical workflow of a discourse analysis project, including data collection, data management, exploration, annotation, qualitative & quantitative analysis, interpretation, and reflection. See the Features section to learn more about the various functionalities.

Why DATS?

  • Multi-modal: Support for 📝 text, 🖼 image, 🎵 audio, and 🎞 video documents
  • Multi-lingual: Support for 🇺🇸 English, 🇩🇪 German, 🇮🇹 Italian and more
  • ⚙️ Extensive pre-processing (e.g. automatic transcription, entity identification, keyword extraction, ...) eases data management
  • 🤖 AI Assistance: state-of-the-art machine-learning and large language models assist with time-consuming tasks
  • 👥 Collaborate with your team in shared projects
  • 📥 Export data to continue your project with other tools
  • 💻 No software installation or special hardware is required
  • 🔓 Free open source software

Quick start

The best way to get started is to watch our Tutorial Video Series, read the User Guide, and play with DATS on our Demo Instance.

Host it yourself

0. Requirements

  • Machine with NVIDIA GPU
  • Docker with NVIDIA Container Toolkit
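Before cloning, it may help to confirm that the required tools are on your PATH. The following is a minimal sketch and not part of the DATS scripts: the helper names `have` and `check_prereqs` are made up for illustration, and treating the presence of `nvidia-smi` as a sign of a working NVIDIA driver is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not shipped with DATS).

# Succeeds if the given command is on PATH.
have() { command -v "$1" >/dev/null 2>&1; }

# Prints one status line per tool; returns 1 if any tool is missing.
check_prereqs() {
  local missing=0 tool
  for tool in "$@"; do
    if have "$tool"; then
      echo "ok: $tool"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# git for cloning, docker for the containers,
# nvidia-smi as a proxy for an installed GPU driver (assumption).
check_prereqs git docker nvidia-smi || echo "Install the missing tools before continuing."
```

This only checks that the binaries exist. To see whether containers can actually reach the GPU, the NVIDIA Container Toolkit documentation suggests running `docker run --rm --gpus all ubuntu nvidia-smi`, which should print the same GPU table as on the host.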

1. Clone the repository

git clone https://github.com/uhh-lt/dats.git

2. Run setup scripts

./bin/setup-envs.sh --project_name dats --port_prefix 101
./bin/setup-folders.sh

3. Start docker containers

docker compose -f compose.ollama.yml up -d
docker compose -f compose.yml -f compose.production.yml up --wait

4. Open DATS

Open https://localhost:10100/ in your browser
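The steps above pair `--port_prefix 101` with port `10100`. Assuming the web interface port is simply the prefix followed by `00` (an inference from this single example, not documented behaviour of the setup script), choosing a different prefix would change the URL accordingly. `dats_url` below is a hypothetical helper that spells out this assumption:

```shell
#!/usr/bin/env bash
# Hypothetical helper: builds the local DATS URL from a port prefix.
# Assumption: the web interface listens on "<prefix>00", as in the
# example above (prefix 101 -> port 10100).
dats_url() {
  local prefix="${1:-101}"
  printf 'https://localhost:%s00/\n' "$prefix"
}

dats_url 101   # prints https://localhost:10100/
```

If the page does not load at the computed address, check the port values written by `./bin/setup-envs.sh` rather than relying on this sketch.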

Ask for a hosted instance

Hosted instance @ HCDS

We may be able to host DATS for your research institute. Please contact the House of Computing and Data Science (HCDS) here.

Further reading

  • User Guide: If you want to use DATS, we recommend starting with the Features below and playing around with the tool. If you have questions, you may find help in the User Guide or in the FAQ. If you encounter problems or bugs, please leave us some feedback.
  • Admin Guide: See the quick start guide above. For more information on how to configure DATS on a server, please see the Admin Guide.
  • Developer Guide: DATS is open source software. If you want to contribute to the project, please start with the Developer Guide.

Feedback?

DATS is still under development, so please feel free to give us feedback, tell us your wishes or report bugs:

  • For feedback, please write to us
  • To report bugs, please open an issue on GitHub

Features

Data collection

DATS can handle most data formats for text, image, audio, and video documents, and you can upload your files directly. If you need additional material, DATS also offers an integrated crawler, implemented with Scrapy and Beautiful Soup, that scrapes websites and their images.

Data pre-processing

Pre-processing

DATS automatically pre-processes documents as they are uploaded. This process extracts metadata and enriches the material with additional information, including:

  • Named entity recognition (people, organizations, locations, etc.)
  • Object detection in images and videos (cars, people, buildings, etc.)
  • Image captioning
  • Automatic speech recognition (transcription)

This feature enables you to precisely filter documents by keywords, entities, and other criteria later on.

Data management

Data Management in DATS

DATS makes it easy to organize and analyze your data. Each document can be assigned metadata (some of which DATS detects automatically) to help you categorize and find what you need. You can also add your own tags to documents.

Filtering and search options let you quickly sift through your data. Find documents containing specific keywords, entities (like people, organizations, or locations), or other criteria. This flexible system keeps you in control of your data and ensures you can quickly find the information that matters most to your research.

AI Assistance in DATS for Document Tagging

DATS offers an AI Assistant that can help you streamline your data management tasks. The AI Assistant can suggest tags and extract metadata for your documents, making it even easier to organize data.

Read more about LLM Assistance in our publication Exploring Large Language Models for Qualitative Data Analysis.

Exploration

DATS makes exploring your data easy and intuitive. Its similarity search allows you to quickly find related documents, even across different modalities.

Found an interesting article? DATS can instantly find others like it. Discovered a key image? DATS can locate similar images, or even text documents that relate to the same concept. This cross-modal capability may help you uncover hidden connections between documents and gain a deeper understanding of your data.

Further, when viewing search results, DATS presents an overview of the most frequent keywords, tags, and entities found within those documents. This frequency analysis feature allows you to:

  • Spot key themes: Quickly grasp the main topics being discussed.
  • Discover new avenues for research: Identify potentially relevant keywords or entities you hadn't previously considered.
  • Refine your searches: Use the frequency list to add new search terms or filters, leading you to new documents and a deeper understanding of your corpus.
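To illustrate what a frequency list is, independent of DATS, the sketch below counts the most frequent words in plain text with standard Unix tools. It is not how DATS computes its keyword, tag, or entity frequencies; `top_terms` is a hypothetical helper, and splitting on non-alphabetic characters is a deliberately crude assumption about tokenization.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads text on stdin and prints the N most
# frequent words as "word<TAB>count", most frequent first.
top_terms() {
  local n="${1:-10}"
  tr '[:upper:]' '[:lower:]' \
    | tr -cs '[:alpha:]' '\n' \
    | grep -v '^$' \
    | sort | uniq -c \
    | sort -k1,1nr -k2,2 \
    | head -n "$n" \
    | awk '{ print $2 "\t" $1 }'
}

printf 'Climate policy and climate justice\n' | top_terms 3
```

For example, `top_terms 20 < exported_documents.txt` would list the 20 most frequent words of a text export (the file name is a placeholder). Terms that rank high in such a list are natural candidates for new search terms or filters.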

Annotation

DATS provides tools for text (span & sentence) and image annotation. Annotating audio and video documents directly is not (yet) supported. Instead, the automatically generated transcript can be used.

For example, the sentence annotator enables you to:

  • Highlight important passages: Easily mark key sections of the text.
  • Develop a code hierarchy: Create a structured taxonomy of codes and sub-codes to organize your analysis. DATS's interface allows you to easily manage and update this code hierarchy as your research evolves.
  • Collaborate with others: Codes and annotations are shared with colleagues, fostering teamwork and discussion.

Sentence Annotation in DATS with AI Assistance

The AI Assistant integrated in DATS can also help you with the annotation process. It can suggest relevant text annotations, which you can then review and accept or reject. This can save you a lot of time and effort, especially if you are working with a large dataset.

Read more about the Sentence Annotation feature in our publication Semi-automatic Sequential Sentence Classification in the Discourse Analysis Tool Suite.

Analysis

DATS offers various tools for qualitative and quantitative analysis, including word frequency, code frequency, and timeline analyses. The more advanced Concept Over Time Analysis is explained below.

Concept-over-time Analysis

Concept Over Time Analysis in DATS

DATS includes Concept Over Time Analysis, a feature that visualizes how concepts evolve over time within your data. With it, you can:

  • Define and refine your concepts of interest.
  • Visualize the occurrence of concepts over time.
  • Uncover patterns, trends, and shifts in discourse.
  • Gain a deeper understanding of how concepts change.

To use Concept Over Time Analysis, you first define the concepts you are interested in. For example, if you are interested in the concept of "democracy", you would provide a short description of what you mean by "democracy". DATS uses this description to identify relevant sentences in your data. You can then review these sentences and provide feedback to DATS, which helps to refine the concept and improve the accuracy of the analysis. Finally, DATS visualizes the occurrences of the concept over time.

DATS's Concept Over Time Analysis is a valuable tool for qualitative data analysis, providing a unique perspective on the dynamics of discourse. Read more about COTA in our publication Concept Over Time Analysis: Unveiling Temporal Patterns for Qualitative Data Analysis.

Interpretation

Whiteboard

DATS features interactive Whiteboards that provide a customizable graph-based interface to organize and manipulate your research objects and analyses. With Whiteboards, you can:

  • Visualize your data and analyses in a flexible and customizable way.
  • Organize and refine your code taxonomies.
  • Keep track of your research process and findings.
  • Create a variety of visualizations, including sampling maps and actor networks, to gain new insights into your data.

To use Whiteboards, you simply drag and drop your research objects onto the canvas. You can then connect them with edges to represent relationships between them. You can also add text, shapes, and images to your Whiteboards to further annotate your data. Whiteboards are a powerful way to interact with your data, making it easier to conduct qualitative data analysis and uncover hidden connections.

Read more about the Whiteboards in our publication Extending the Discourse Analysis Tool Suite with Whiteboards for Visual Qualitative Analysis.

Reflection

DATS provides tools for reflection and documentation that are seamlessly integrated into your workflow, helping you to capture and organize your thoughts throughout the research process:

  • Memos: Capture your thoughts and ideas as you work by attaching notes to documents, annotations, codes, and tags. This ensures that valuable insights are not lost and provides a rich record of your evolving interpretations.
  • Logbook: Summarize your findings and document your research process in a logbook. You could use it to track your progress, identify patterns in your analysis, or ensure the transparency and reproducibility of your research.