About
Hi! My name is Phuong Pham (full name: Phuong Ngoc Viet Pham). I am a Ph.D. candidate in Computer Science @Pitt, working with Professor Jingtao Wang. Please check out our lab website and, in particular, our project AttentiveLearner, which improves the learning experience in large classrooms, flipped classrooms, and Massive Open Online Courses (MOOCs) via unmodified smartphones. Previously, I had the opportunity to work with Professor Janyce Wiebe and Professor Rebecca Hwa on a text classification project on clinical data.
Prior to coming to Pitt, I obtained my Bachelor's and Master's degrees in Information Technology and Computer Science at the University of Sciences, Ho Chi Minh City, Vietnam.
Research
My research interests include Human-Computer Interaction (HCI), Mobile Interfaces, Intelligent User Interfaces, Machine Learning and its Applications in User Interfaces and Education, Natural Language Processing (NLP), and Deep Learning. Currently, I am working on emotion-aware interfaces for mobile MOOC learning. Our app, AttentiveLearner, infers learners' cognitive and affective states from physiological signals and facial expressions collected with unmodified smartphones. The approach scales to large learning environments such as MOOCs because it runs on today's smartphones without dedicated sensors.
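For readers curious how an unmodified smartphone can pick up a physiological signal at all, here is a minimal sketch of camera-based photoplethysmography in Python. It is purely illustrative, not AttentiveLearner's actual code; the frame layout, filter band, and peak spacing are my assumptions. With a fingertip covering the back camera lens, blood volume pulses modulate frame brightness, and peaks in the band-pass-filtered brightness give a heart-rate estimate.

import numpy as np
from scipy.signal import butter, filtfilt, find_peaks

def estimate_heart_rate(frames, fps=30.0):
    # `frames` is assumed to be a list of RGB arrays recorded while the
    # learner's fingertip covers the back camera lens.
    # Blood volume pulses modulate the brightness of the red channel.
    brightness = np.array([f[:, :, 0].mean() for f in frames])
    # Keep only the plausible heart-rate band (0.7-4 Hz, i.e., 42-240 BPM).
    b, a = butter(2, [0.7, 4.0], btype="band", fs=fps)
    filtered = filtfilt(b, a, brightness)
    # Each peak in the filtered signal corresponds to one heartbeat.
    peaks, _ = find_peaks(filtered, distance=fps / 4.0)
    return 60.0 * len(peaks) / (len(frames) / fps)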
Publications
@inproceedings{pham2018its,
title={Predicting Learners’ Emotions in Mobile MOOC Learning via a Multimodal Intelligent Tutor},
author={Pham, Phuong and Wang, Jingtao},
booktitle={Proceedings of 14th International Conference on Intelligent Tutoring Systems (ITS 2018)},
year={2018}
}
Massive Open Online Courses (MOOCs) are a promising approach for scalable knowledge dissemination. However, they also face major challenges such as low engagement, low retention rate, and lack of personalization. We propose AttentiveLearner2, a multimodal intelligent tutor running on unmodified smartphones, to supplement today's clickstream-based learning analytics for MOOCs. AttentiveLearner2 uses both the front and back cameras of a smartphone as two complementary and fine-grained feedback channels in real time: the back camera monitors learners' photoplethysmography (PPG) signals and the front camera tracks their facial expressions during MOOC learning. AttentiveLearner2 implicitly infers learners' affective and cognitive states during learning from their PPG signals and facial expressions. Through a 26-participant user study, we found that: 1) AttentiveLearner2 can detect 6 emotions in mobile MOOC learning reliably with high accuracy (average accuracy = 84.4%); 2) the detected emotions can predict learning outcomes (best R^2 = 50.6%); and 3) it is feasible to track both PPG signals and facial expressions in real time in a scalable manner on today's unmodified smartphones.
-
Predicting Learners’ Emotions in Mobile MOOC Learning via a Multimodal Intelligent Tutor
Phuong Pham, Jingtao Wang
Proceedings of 14th International Conference on Intelligent Tutoring Systems (ITS 2018), 2018.
abstract
bib
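As a rough illustration of the multimodal idea in the entry above, the following sketch fuses per-window features from the two camera channels and trains a standard classifier. The feature layout, fusion scheme, and classifier choice are my assumptions for illustration, not the paper's published pipeline.

import numpy as np
from sklearn.ensemble import RandomForestClassifier

def fuse(ppg_feats, face_feats):
    # Feature-level fusion: concatenate the two modalities per time window.
    return np.hstack([ppg_feats, face_feats])

# X_ppg: heart-rate statistics per window (hypothetical layout);
# X_face: facial-expression features per window; y: one of 6 emotion labels.
clf = RandomForestClassifier(n_estimators=100)
# clf.fit(fuse(X_ppg, X_face), y)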
@inproceedings{pham2018icassp,
title={Eventness: Object Detection on Spectrograms for Temporal Localization of Audio Events},
author={Pham, Phuong and Li, Juncheng and Szurley, Joseph and Das, Samarjit},
booktitle={International Conference on Acoustics, Speech, and Signal Processing},
year={2018}
}
In this paper, we introduce the concept of Eventness for audio event detection, which can, in part, be thought of as an analogue to Objectness from computer vision. The key observation behind the eventness concept is that audio events reveal themselves as 2-dimensional time-frequency patterns with specific textures and geometric structures in spectrograms. These time-frequency patterns can then be viewed analogously to objects occurring in natural images (with the exception that scaling and rotation invariance properties do not apply). With this key observation in mind, we pose the problem of detecting monophonic or polyphonic audio events as an equivalent visual object(s) detection problem under partial occlusion and clutter in spectrograms. We adapt a state-of-the-art visual object detection model to evaluate the audio event detection task on publicly available datasets. The proposed network has comparable results with a state-of-the-art baseline and is more robust on minority events. Given large-scale datasets, we hope that our proposed conceptual model of eventness will be beneficial to the audio signal processing community towards improving the performance of audio event detection.
-
Eventness: Object Detection on Spectrograms for Temporal Localization of Audio Events
Phuong Pham, Juncheng Li, Joseph Szurley, Samarjit Das
International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2018), 2018.
abstract
bib
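The preprocessing implied by the eventness idea above can be sketched as follows: render the audio as a log-mel spectrogram, normalize it, and replicate it to three channels so it can be fed to an off-the-shelf visual object detector. This is a hedged sketch; `detect_boxes` is a hypothetical stand-in for the detection model, which is out of scope here.

import numpy as np
import librosa

def spectrogram_image(path, n_mels=128):
    y, sr = librosa.load(path, sr=None)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    log_mel = librosa.power_to_db(mel, ref=np.max)
    # Scale to [0, 1] and replicate to 3 channels so the spectrogram can be
    # treated like an RGB image by a pretrained visual detector.
    img = (log_mel - log_mel.min()) / (log_mel.max() - log_mel.min() + 1e-8)
    return np.repeat(img[..., np.newaxis], 3, axis=-1)

# boxes = detect_boxes(spectrogram_image("clip.wav"))  # hypothetical detector
# The horizontal extent of each box maps back to the event's onset/offset in time.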
@article{trivedi2017nlpreviz,
title={NLPReViz: an interactive tool for natural language processing on clinical text},
author={Trivedi, Gaurav and Pham, Phuong and Chapman, Wendy W and Hwa, Rebecca and Wiebe, Janyce and Hochheiser, Harry},
journal={Journal of the American Medical Informatics Association},
year={2017}
}
The gap between domain experts and natural language processing expertise is a barrier to extracting understanding from clinical text. We describe a prototype tool for interactive review and revision of natural language processing models of binary concepts extracted from clinical notes. We evaluated our prototype in a user study involving 9 physicians, who used our tool to build and revise models for 2 colonoscopy quality variables. We report changes in performance relative to the quantity of feedback. Using initial training sets as small as 10 documents, expert review led to final F1 scores for the “appendiceal-orifice” variable between 0.78 and 0.91 (with improvements ranging from 13.26% to 29.90%). F1 for “biopsy” ranged between 0.88 and 0.94 (−1.52% to 11.74% improvements). The average System Usability Scale score was 70.56. Subjective feedback also suggests possible design improvements.
-
NLPReViz: an interactive tool for natural language processing on clinical text
Gaurav Trivedi, Phuong Pham, Wendy W Chapman, Rebecca Hwa, Janyce Wiebe, Harry Hochheiser
Journal of the American Medical Informatics Association (JAMIA), 2017.
abstract
bib
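The review-and-revise loop at the heart of NLPReViz can be sketched with standard tooling: an expert corrects labels for a binary concept (e.g., "biopsy") and the model is retrained on the growing feedback set. This is an illustrative scikit-learn sketch, not the tool's actual implementation; the example notes are invented.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))

def revise(model, labeled_notes):
    # Retrain from scratch on the full feedback set after each review batch.
    texts, labels = zip(*labeled_notes)
    model.fit(texts, labels)
    return model

# Start from a tiny seed set (the study uses as few as 10 documents), then
# keep folding expert corrections back in and retraining.
feedback = [("Polyp seen at the appendiceal orifice.", 1),
            ("No abnormalities noted in the cecum.", 0)]
model = revise(model, feedback)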
@inproceedings{pham2017attentivelearner2,
title={AttentiveLearner2: A Multimodal Approach for Improving MOOC Learning on Mobile Devices},
author={Pham, Phuong and Wang, Jingtao},
booktitle={International Conference on Artificial Intelligence in Education},
pages={561--564},
year={2017},
organization={Springer}
}
We propose AttentiveLearner2, a multimodal mobile learning system for MOOCs running on unmodified smartphones. AttentiveLearner2 uses both the front and back cameras of a smartphone as two complementary and fine-grained feedback channels in real time: the back camera monitors learners' photoplethysmography (PPG) signals and the front camera tracks their facial expressions during MOOC learning. AttentiveLearner2 implicitly infers learners' affective and cognitive states during learning by analyzing learners' PPG signals and facial expressions. In a 26-participant user study, we found that it is feasible to detect 6 types of emotion during learning via the collected PPG signals and facial expressions, and that these two modalities complement each other.
-
AttentiveLearner2: A Multimodal Approach for Improving MOOC Learning on Mobile Devices (Poster)
Phuong Pham, Jingtao Wang
Proceedings of 18th International Conference on Artificial Intelligence in Education (AIED 2017), 2017.
abstract
bib
preprint
@inproceedings{xiao2017dynamics,
title={Dynamics of Affective States During MOOC Learning},
author={Xiao, Xiang and Pham, Phuong and Wang, Jingtao},
booktitle={International Conference on Artificial Intelligence in Education},
pages={586--589},
year={2017},
organization={Springer}
}
We investigate the temporal dynamics of learners' affective states (e.g., engagement/flow, boredom, confusion, frustration, etc.) during video-based learning sessions in Massive Open Online Courses (MOOCs) through a 22-subject user study. Through both quantitative analysis of the temporal transitions of learner affect and qualitative analysis of learners’ subjective feedback, we present a new model to understand and interpret the dynamics of learners’ affective states in MOOC contexts. We also demonstrate the feasibility of predicting learners' moment-to-moment affective states via implicit photoplethysmography (PPG) sensing on unmodified smartphones.
-
Dynamics of Affective States during Mobile MOOC Learning (Poster)
Xiang Xiao, Phuong Pham, and Jingtao Wang
Proceedings of 18th International Conference on Artificial Intelligence in Education (AIED 2017), 2017.
abstract
bib
preprint
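One way to make the "temporal transitions of learner affect" in the entry above concrete is to estimate a row-normalized transition matrix from a sequence of affective states. A minimal sketch follows; the state list comes from the abstract, and the example sequence is illustrative, not the study's data.

import numpy as np

STATES = ["engagement/flow", "boredom", "confusion", "frustration"]

def transition_matrix(sequence):
    # Row i, column j holds P(next state is STATES[j] | current is STATES[i]).
    idx = {s: i for i, s in enumerate(STATES)}
    counts = np.zeros((len(STATES), len(STATES)))
    for cur, nxt in zip(sequence, sequence[1:]):
        counts[idx[cur], idx[nxt]] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    return counts / np.where(row_sums == 0, 1, row_sums)

# Illustrative sequence of one learner's states over consecutive video segments:
print(transition_matrix(["engagement/flow", "confusion", "confusion", "boredom"]))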
@inproceedings{pham2017iui,
author = {Pham, Phuong and Wang, Jingtao},
title = {Understanding Emotional Responses to Mobile Video Advertisements via Physiological Signal Sensing and Facial Expression Analysis},
booktitle = {Proceedings of the 22nd International Conference on Intelligent User Interfaces},
year = {2017},
publisher = {ACM},
address = {New York, NY, USA},
}
Understanding a target audience's emotional responses to video advertisements is crucial to stakeholders. However, traditional methods for collecting such information are slow, expensive, and coarse-grained. We propose AttentiveVideo, an intelligent mobile interface with corresponding inference algorithms to monitor and quantify the effects of mobile video advertising. AttentiveVideo employs a combination of implicit photoplethysmography (PPG) sensing and facial expression analysis (FEA) to predict viewers' attention, engagement, and sentimentality when watching video advertisements on unmodified smartphones. In a 24-participant study, we found that AttentiveVideo achieved good accuracies on a wide range of emotional measures (best average accuracy = 73.59%, kappa = 0.46 across 9 metrics). We also found that the PPG sensing channel and the FEA technique are complementary. While FEA works better for strong emotions (e.g., joy and anger), the PPG channel is more informative for subtle responses or emotions. These findings show the potential for both low-cost collection and deep understanding of emotional responses to mobile video advertisements.
-
Understanding Emotional Responses to Mobile Video Advertisements via Physiological Signal Sensing and Facial Expression Analysis
Phuong Pham, Jingtao Wang
Proceedings of the 22nd International Conference on Intelligent User Interfaces (IUI 2017), 2017.
abstract
bib
preprint
@inproceedings{pham2016icmi,
author = {Pham, Phuong and Wang, Jingtao},
title = {Adaptive Review for Mobile MOOC Learning via Implicit Physiological Signal Sensing},
booktitle = {Proceedings of the 18th ACM International Conference on Multimodal Interaction},
series = {ICMI 2016},
year = {2016},
isbn = {978-1-4503-4556-9},
location = {Tokyo, Japan},
pages = {37--44},
numpages = {8},
url = {http://doi.acm.org/10.1145/2993148.2993197},
doi = {10.1145/2993148.2993197},
acmid = {2993197},
publisher = {ACM},
address = {New York, NY, USA},
keywords = {Affective Computing, Heart Rate, Intelligent Tutoring System, MOOC, Mobile Interface, Physiological Signal},
}
Massive Open Online Courses (MOOCs) have the potential to enable high-quality knowledge dissemination at large scale and low cost. However, today's MOOCs also suffer from low engagement, uni-directional information flow, and lack of personalization. In this paper, we propose AttentiveReview, an effective intervention technology for mobile MOOC learning. AttentiveReview infers a learner's perceived difficulty levels of the corresponding learning materials via implicit photoplethysmography (PPG) sensing on unmodified smartphones. AttentiveReview also recommends personalized review sessions through a user-independent model. In a 32-participant user study, we found that: 1) AttentiveReview significantly improved information recall (+14.6%) and learning gain (+17.4%) compared with the no-review condition; 2) AttentiveReview achieved comparable performance in significantly less time compared with the full-review condition; 3) as an end-to-end mobile tutoring system, the benefits of AttentiveReview outweigh the side effects of false positives and false negatives. Overall, we show that it is feasible to improve mobile MOOC learning by recommending review materials adaptively from rich but noisy physiological signals.
-
Adaptive Review for Mobile MOOC Learning via Implicit Physiological Signal Sensing (Best Student Paper Award)
Phuong Pham, Jingtao Wang
Proceedings of the 18th ACM International Conference on Multimodal Interaction (ICMI 2016), 2016.
abstract
bib
preprint
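AttentiveReview's intervention step, as the abstract above describes it, can be caricatured in a few lines: predict the perceived difficulty of each video section from PPG-derived features, then recommend only the difficult sections for review. The model interface (a scikit-learn-style classifier), the feature layout, and the threshold are my assumptions, not the paper's code.

import numpy as np

def recommend_review(difficulty_model, section_features, threshold=0.5):
    # `difficulty_model` is assumed to be a scikit-learn-style binary
    # classifier over per-section features (e.g., heart-rate statistics).
    probs = difficulty_model.predict_proba(section_features)[:, 1]
    # Recommend only the sections predicted as perceived-difficult.
    return np.flatnonzero(probs >= threshold)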
@inproceedings{pham2016icmidemo,
author = {Pham, Phuong and Wang, Jingtao},
title = {AttentiveVideo: Quantifying Emotional Responses to Mobile Video Advertisements},
booktitle = {Proceedings of the 18th ACM International Conference on Multimodal Interaction},
series = {ICMI 2016},
year = {2016},
isbn = {978-1-4503-4556-9},
location = {Tokyo, Japan},
pages = {423--424},
numpages = {2},
url = {http://doi.acm.org/10.1145/2993148.2998533},
doi = {10.1145/2993148.2998533},
acmid = {2998533},
publisher = {ACM},
address = {New York, NY, USA},
keywords = {Affective Computing, Computational Advertisement, Heart Rate, Mobile Interfaces, Physiological Signal},
}
This demo presents AttentiveVideo, a multi-modal video player that can collect and infer viewers’ emotional responses to video advertisements on unmodified smart phones. When a subsidized video advertisement is playing, AttentiveVideo uses on-lens finger gestures for tangible video control, and employs implicit photoplethysmography (PPG) sensing to infer viewers' attention, engagement, and sentimentality toward advertisements. Through a 24-participant pilot study, we found that AttentiveVideo is easy to learn and intuitive to use. More importantly, AttentiveVideo achieved good accuracies on a wide range of emotional measures (best average accuracy = 65.9%, kappa = 0.30 across 9 metrics). Our preliminary result shows the potential of both low-cost collection and deep understanding of emotional responses to mobile video advertisements.
-
AttentiveVideo: Quantifying Emotional Responses to Mobile Video Advertisements (Demo)
Phuong Pham, Jingtao Wang
ACM International Conference on Multimodal Interaction (ICMI 2016), 2016.
abstract
bib
preprint
@techreport{dai2016scene,
author = "Dai, Wei and Li, Juncheng and Pham, Phuong and Das, Samarjit and Qu, Shuhui",
title = "Acoustic Scene Recognition with Deep Neural Networks ({DCASE} Challenge 2016)",
month = "September",
year = "2016",
institution = "DCASE2016 Challenge"
}
Sounds carry a large amount of information about our everyday environment and the physical events that take place in it. Complementing visual inputs, sound can be more easily collected and stored. Increasingly, machines in various environments can hear, such as smartphones, autonomous robots, or security systems. This work applies state-of-the-art deep learning models that have revolutionized speech recognition to understanding general environmental sounds.
-
Acoustic Scene Recognition with Deep Neural Networks (DCASE Challenge 2016) (Technical report)
Wei Dai, Juncheng Li, Phuong Pham, Samarjit Das, Shuhui Qu
Detection and Classification of Acoustic Scenes and Events (DCASE 2016), 2016.
abstract
bib
preprint
@techreport{dai2016sed,
author = "Dai, Wei and Li, Juncheng and Pham, Phuong and Das, Samarjit and Qu, Shuhui",
title = "Sound Event Detection for Real Life Audio {DCASE} Challenge",
month = "September",
year = "2016",
pdf = "documents/challenge_technical_reports/Task3/Pham_2016_task3.pdf",
institution = "DCASE2016 Challenge"
}
We explore a logistic regression classifier (LogReg) and a deep neural network (DNN) on the DCASE 2016 Challenge task 3, i.e., sound event detection in real life audio. Our models use Mel Frequency Cepstral Coefficients (MFCCs) and their deltas and accelerations as detection features. The error rate metric favors the simple logistic regression model with a high activation threshold in both segment- and event-based contexts. On the other hand, the DNN model outperforms the baseline in the frame-based context.
-
Sound Event Detection for Real Life Audio DCASE Challenge (Technical report)
Wei Dai, Juncheng Li, Phuong Pham, Samarjit Das, Shuhui Qu
Detection and Classification of Acoustic Scenes and Events (DCASE 2016), 2016.
abstract
bib
preprint
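The frame-level pipeline the report above describes can be sketched as follows: MFCCs plus their deltas and accelerations per frame, classified by logistic regression, with a high activation threshold for detection. The feature dimensions and threshold value are assumptions for illustration.

import numpy as np
import librosa
from sklearn.linear_model import LogisticRegression

def frame_features(y, sr):
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)
    delta = librosa.feature.delta(mfcc)
    accel = librosa.feature.delta(mfcc, order=2)
    # One row per frame: MFCCs, deltas, and accelerations stacked together.
    return np.vstack([mfcc, delta, accel]).T

clf = LogisticRegression(max_iter=1000)
# clf.fit(X_train, y_train)  # X: frame features over the corpus, y: per-frame labels
# A high activation threshold trades recall for precision, as the report notes:
# active = clf.predict_proba(X_test)[:, 1] > 0.8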
@InProceedings{xiao2015icmi,
author = {Xiang Xiao and Phuong Pham and Jingtao Wang},
title = {AttentiveLearner: Adaptive Mobile MOOC Learning via Implicit Cognitive States Inference},
booktitle = {ACM International Conference on Multimodal Interaction (ICMI 2015)},
month = {November},
year = {2015}
}
This demo presents AttentiveLearner, a mobile learning system optimized for consuming lecture videos in Massive Open Online Courses (MOOCs) and flipped classrooms.
-
AttentiveLearner: Adaptive Mobile MOOC Learning via Implicit Cognitive States Inference (Demo)
Xiang Xiao, Phuong Pham, Jingtao Wang
ACM International Conference on Multimodal Interaction (ICMI 2015), 2015.
abstract
bib
preprint
@inproceedings{aied2015,
title={AttentiveLearner: Improving Mobile MOOC Learning via Implicit Heart Rate Tracking},
author={Pham, Phuong and Wang, Jingtao},
booktitle={Artificial Intelligence in Education},
year={2015},
organization={Springer}
}
We present AttentiveLearner, an intelligent mobile learning system optimized for consuming lecture videos in both Massive Open Online Courses (MOOCs) and flipped classrooms. AttentiveLearner uses on-lens finger gestures as an intuitive control channel for video playback. More importantly, AttentiveLearner implicitly extracts learners' heart rates and infers their attention by analyzing learners' fingertip transparency changes during learning on today's unmodified smartphones. In a 24-participant study, we found heart rates extracted from noisy image frames via mobile cameras can be used to predict both learners' "mind wandering" events in MOOC sessions and their performance in follow-up quizzes. The prediction performance of AttentiveLearner (accuracy = 71.22%, kappa = 0.22) is comparable with existing research using dedicated sensors. AttentiveLearner has the potential to improve mobile learning by reducing the sensing equipment required by many state-of-the-art intelligent tutoring algorithms.
-
AttentiveLearner: Improving Mobile MOOC Learning via Implicit Heart Rate Tracking
Phuong Pham, Jingtao Wang
Proceedings of 17th International Conference on Artificial Intelligence in Education (AIED 2015), 2015.
abstract
bib
preprint
@InProceedings{trivedi2015cri,
author = {Gaurav Trivedi and Phuong Pham and Wendy Chapman and Rebecca Hwa and Janyce Wiebe and Harry Hochheiser},
title = {Bridging the Natural Language Processing Gap: An Interactive Clinical Text Review Tool},
booktitle = {Proceedings of the 2015 AMIA Summit on Clinical Research Informatics (CRI 2015)},
month = {March},
year = {2015}
}
We present a prototype tool to review the results of natural language processing methods to extract structured variables from clinical text. We provide novel interactive visualizations to help the users understand these results and make any necessary corrections, thus helping improve the accuracy of the extracted results.
-
Bridging the Natural Language Processing Gap: An Interactive Clinical Text Review Tool (Poster)
Gaurav Trivedi, Phuong Pham, Wendy Chapman, Rebecca Hwa, Janyce Wiebe, Harry Hochheiser
Proceedings of the 2015 AMIA Summit on Clinical Research Informatics (CRI 2015), 2015.
abstract
bib
preprint
@InProceedings{trivedi2015vta,
author = {Gaurav Trivedi and Phuong Pham and Wendy Chapman and Rebecca Hwa and Janyce Wiebe and Harry Hochheiser},
title = {An Interactive Tool for Natural Language Processing on Clinical Text},
booktitle = {The 4th Workshop on Visual Text Analytics},
year = {2015}
}
Natural Language Processing (NLP) systems often make use of machine learning techniques that are unfamiliar to end-users who are interested in analyzing clinical records. Although NLP has been widely used in extracting information from clinical text, current systems generally do not support model revision based on feedback from domain experts.
We present a prototype tool that allows end users to visualize and review the outputs of an NLP system that extracts binary variables from clinical text. Our tool combines multiple visualizations to help the users understand these results and make any necessary corrections, thus forming a feedback loop and helping improve the accuracy of the NLP models. We have tested our prototype in a formative think-aloud user study with clinicians and researchers involved in colonoscopy research. Results from semi-structured interviews and a System Usability Scale (SUS) analysis show that the users are able to quickly start refining NLP models, despite having very little or no experience with machine learning. Observations from these sessions suggest revisions to the interface to better support review workflow and interpretation of results.
-
An Interactive Tool for Natural Language Processing on Clinical Text
Gaurav Trivedi, Phuong Pham, Wendy Chapman, Rebecca Hwa, Janyce Wiebe and Harry Hochheiser
The 4th Workshop on Visual Text Analytics, 2015.
abstract
bib
preprint
@InProceedings{pham2013cri,
author = {Phuong Pham and Janyce Wiebe and Rebecca Hwa and Wendy Chapman},
title = {Automated Annotation on Colonoscopy Reports},
booktitle = {Proceedings of the 2013 AMIA Summit on Clinical Research Informatics (CRI 2013)},
month = {March},
year = {2013},
pages = {201},
url = {http://proceedings.amia.org/C2013-nav/}
}
This work analyzes the viability of using automatic methods to evaluate the quality of colonoscopy procedures from free-text patient charts. We find that while simple text search suffices for some quality measures, others require more sophisticated methods. Experimental results suggest that quality measures extracted by a rule-based natural language processing system and an automatic machine learning classifier rival the judgments of human reviewers.
-
Automated Annotation on Colonoscopy Reports (Poster)
Phuong Pham, Janyce Wiebe, Rebecca Hwa, Wendy Chapman
Proceedings of the 2013 AMIA Summit on Clinical Research Informatics (CRI 2013), 2013.
abstract
bib
preprint
Misc
You are welcome to join us in the Mobile Interfaces and Pedagogical Systems Group at the University of Pittsburgh.
Thanks to Lingjia Deng for the HTML template.
Last Modified: 04/24/2017