DialogueAnalysis

Data Pre-Processing In order to use this scripts ensure that the chat logs have been preprocessed and the dataset contains only the following columns: [SessionID, UserID, Speaker, Utterance, Tag] saved in a .csv file e.g. sample_corpus1.csv

About Corpus:
The dialogue corpus in sample_corpus1 shows a one-to-one turn-taking interaction where each speaker takes a turn and each turn is an utterance. The dialogue corpus in sample_corpus2 shows does not necessarily follow a one-to-one turn-taking paradigm. Each speaker can have multiple turns before their partner responses. This shows a more naturalistic exchange between two speakers.

Note: Utterances can be split in either of the cases above to capture appropriate dialogue moves. Also, splitting dialogues in a one-to-one dialogue corpus can result in loss of the one-to-one speaker flow.

Step 1: Generate the n-grams for student partner exchanges (G_stu, G_par). Note: Each participant is captured as a student and a partner

//Generating n-grams (unigrams and bigrams) and assigning "_stu" and "_par"
Assuming no error: an output file will be generated in the same directory ngram_output.csv The ngram_output will contain the [User, Ngram, Utterance 1, Utterance 2]

In the command line:

Go to directory of your scripts
Enter $ python3 ngram_gen.py sample_corpus1.csv
click enter

Output:
- a csv file (output_corpus1.csv) with all generated unigrams and bigrams, and their corresponding utterances
- Total number of unigrams: (48)
- Total number of bigrams: (43)

Step 2: Build a datatable that shows the count of n-grams for each student within their session

//Generating ngram datatable In the command line:

Go to directory of your scripts
Enter $ python3 ngram_datatable.py output_corpus1.csv
click enter

Output:
- a csv file (output_ngramTable.csv) of n-grams predictors

Step 3: Apply a threshold to intentionally include ngrams that occur in X% of converstions (test X for 10, 25, 33 and 50) to establish the most appropriate threshold for the dataset wih respect to the sample. Note: The resulting total number of ngrams will be used to run a generalized regression model. Best practices for running regression model encourages the number of predictors less than the sample size. However, since we are using an exhaustive best subset method, the number of ngramas can be greater than the sample size, within a reasonable threshold.

For instance, in the output_ngramTable.csv, sample size is 10, and the number of n-grams is (48+43 = 91). In the dimension_reducer.py, test different values of X to establish an appropriate number of ngram predictors.

In command line:

Go to directory of your scripts
Enter $ python3 ngram_dimensionReducer.py output_ngramTable.csv
click enter

Output:
- a csv file (output_ngramTable_Xpercent.csv) of X% of the n-grams predictors (currently missing the Username)
- a csv file (output_ngram_numRow.csv) of the ngrams and the number of rows (i.e. number of sessions theuy occur in (not the number of times they occur***))

Step 4: Go To JMP (preferred JMP Pro 15):

open the resulting reduced n-grams predictors dataset (output_ngramTable_Xpercent.csv)
Add a new columns with outcomes measures
Go to Analyze -> Fit Model
- select all predictors and add to "Construct Model Effects"
- select outcome variable and add to "Pick Role Variables"
Go to "Personality"
- select "Generalized Regression"
- click Run
In Model Launch, go to "Estimation Method"
- select Best Subset -click Go

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
.ipynb_checkpoints		.ipynb_checkpoints
.DS_Store		.DS_Store
README.md		README.md
ngram_datatable.py		ngram_datatable.py
ngram_dimensionReducer.py		ngram_dimensionReducer.py
ngram_gen.py		ngram_gen.py
sample_corpus1.csv		sample_corpus1.csv

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

DialogueAnalysis

About

Releases

Packages

Languages

gkatuka/DialogueAnalysis

Folders and files

Latest commit

History

Repository files navigation

DialogueAnalysis

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages