Jae (Jaewook) Lee

HCI Researcher Studying Augmented Reality (AR), Human-AI interaction, and Accessibility.

About

About Me

Jae (Jaewook/재욱) Lee

HCI Ph.D. Student

Hey there! I'm Jae. I'm a Human-Computer Interaction (HCI) Ph.D. student at the University of Washington, advised by Prof. Jon Froehlich. I am currently supported by the NSF Graduate Research Fellowship (NSF GRFP). My research interests lie at the intersection of Augmented Reality (AR), Human-AI interaction, and Accessibility. I design context-aware AR solutions that offer tailored assistance throughout everyday tasks. By evaluating these systems, I aim to understand their potential to assist not only the general public, but also the blind or low vision (BLV) and deaf or hard of hearing (DHH) communities. I am re-imagining AR for all.

Outside of research, I hang out with friends, play tennis, cook Korean food, listen to music, and play/make video games!

I'm open to chatting about research & pretty much anything else! The best way to reach me is by email: jaewook4 [at] cs [dot] washington [dot] edu

Education

  • 2022 - Present

    University of Washington

    Doctoral Degree
  • 2018 - 2022

    University of Illinois Urbana-Champaign

    Bachelor's Degree

Experience

  • Jun. 2024 - Oct. 2024

    Niantic

    Research Scientist Intern
  • Jun. 2023 - Nov. 2023

    Meta

    Research Scientist Intern
  • Jun. 2022 - Aug. 2022

    NASA

    AR/VR Intern
  • May 2021 - Aug. 2022

    Microsoft Research

    Open Source Researcher
Papers

Publications

  • A first-person point-of-view of a low vision participant using ARSports. Different elements of basketball and tennis, such as balls and nets, are overlaid with colored augmentations.

    ISMAR '24 Workshop

    🏆 Best Paper

    AI-Powered AR for Enhancing Sports Playability for People with Low Vision: An Exploration of ARSports

    Jaewook Lee, Yang Li, Dylan Bunarto, Eujean Lee, Olivia Wang, Adrian Rodriguez, Yuhang Zhao, Yapeng Tian, Jon E. Froehlich

    People with low vision (LV) experience challenges in visually tracking balls and players in sports like basketball and tennis, which can adversely impact their health. While existing research has studied video analysis for sports viewing, proposed camera-based assistance for casual non-ball sports, and contributed third-person perspective sports datasets, none are suitable for enhancing first-person sports playability. We present ARSports, a wearable AR research prototype that overlays instance segmentation masks in real time to improve sports accessibility. ARSports also includes first-person perspective sports datasets, which we manually collected and annotated, as well as fine-tuned instance segmentation models. Our evaluations suggest that combining real-time computer vision and augmented reality to create scene-aware visual augmentations is a promising approach to enhancing sports participation for LV individuals. We contribute open-sourced egocentric basketball and tennis datasets and models, as well as insights and design recommendations from our pilot study with an LV research team member.

  • On the left, a low vision participant is using CookAR in a tool pickup task. On the right is the first-person point of view showing green augmentations overlaid on grabbable areas of cooking tools and red augmentations on hazardous areas.

    UIST '24 11 October 2024

    🏆 Best Paper

    CookAR: Affordance Augmentations in Wearable AR to Support Kitchen Tool Interactions for People with Low Vision

    Jaewook Lee, Andrew D. Tjahjadi, Jiho Kim, Junpu Yu, Minji Park, Jiawen Zhang, Jon E. Froehlich, Yapeng Tian, Yuhang Zhao

    Cooking is a central activity of daily living, supporting independence as well as mental and physical health. However, prior work has highlighted key barriers for people with low vision (LV) to cook, particularly around safely interacting with tools, such as sharp knives or hot pans. Drawing on recent advancements in computer vision (CV), we present CookAR, a head-mounted AR system with real-time object affordance augmentations to support safe and efficient interactions with kitchen tools. To design and implement CookAR, we collected and annotated the first egocentric dataset of kitchen tool affordances, fine-tuned an affordance segmentation model, and developed an AR system with a stereo camera to generate visual augmentations. To validate CookAR, we conducted a technical evaluation of our fine-tuned model as well as a qualitative lab study with 10 LV participants to identify suitable augmentation designs. Our technical evaluation demonstrates that our model outperforms the baseline on our tool affordance dataset, while our user study indicates a preference for affordance augmentations over traditional whole-object augmentations.

  • On the left is a female user using EARLL while reading a book. On the right is a first-person point of view, which shows a textbox with 'book' in English and its French translation.

    UIST '24 Demo 11 October 2024

    Embodied AR Language Learning Through Everyday Object Interactions: A Demonstration of EARLL

    Jaewook Lee*, Sieun Kim*, Minji Park, Catherine Rasgaitis, Jon E. Froehlich

    Learning a new language is an exciting and important, yet often challenging journey. To support foreign language acquisition, we introduce EARLL, an embodied and context-aware language learning application for AR glasses. EARLL leverages real-time computer vision and depth sensing to continuously segment and localize objects in users' surroundings, check for hand-object manipulations, and then subtly trigger foreign vocabulary prompts relevant to that object. In this demo paper, we present our initial EARLL prototype and highlight current challenges and future opportunities with always-available, wearable, embodied AR language learning.

  • A first-person point-of-view image containing two people with their faces blurred to preserve privacy.

    USENIX Security '24 14 August 2024

    When the User Is Inside the User Interface: An Empirical Study of UI Security Properties in Augmented Reality

    Kaiming Cheng, Arka Bhattacharya, Michelle Lin, Jaewook Lee, Aroosh Kumar, Jeffery F. Tian, Tadayoshi Kohno, Franziska Roesner

    Augmented reality (AR) experiences place users inside the user interface (UI), where they can see and interact with three-dimensional virtual content. This paper explores UI security for AR platforms, for which we identify three UI security-related properties: Same Space (how does the platform handle virtual content placed at the same coordinates?), Invisibility (how does the platform handle invisible virtual content?), and Synthetic Input (how does the platform handle simulated user input?). We demonstrate the security implications of different instantiations of these properties through five proof-of-concept attacks between distrusting AR application components (i.e., a main app and an included library), including a clickjacking attack and an object erasure attack. We then empirically investigate these UI security properties on five current AR platforms: ARCore (Google), ARKit (Apple), HoloLens (Microsoft), Oculus (Meta), and WebXR (browser). We find that all platforms enable at least three of our proof-of-concept attacks to succeed. We discuss potential future defenses, including applying lessons from 2D UI security and identifying new directions for AR UI security.

  • A person looking and pointing at a bag of chips with foreign language written on it, while asking 'What is this?'

    CHI '24 11 May 2024

    GazePointAR: A Context-Aware Multimodal Voice Assistant for Pronoun Disambiguation in Wearable Augmented Reality

    Jaewook Lee, Jun Wang, Elizabeth Brown, Liam Chu, Sebastian S Rodriguez, Jon E. Froehlich

    Voice assistants (VAs) like Siri and Alexa are transforming human-computer interaction; however, they lack awareness of users' spatiotemporal context, resulting in limited performance and unnatural dialogue. We introduce GazePointAR, a fully-functional context-aware VA for wearable augmented reality that leverages eye gaze, pointing gestures, and conversation history to disambiguate speech queries. With GazePointAR, users can ask "what's over there?" or "how do I solve this math problem?" simply by looking and/or pointing. We evaluated GazePointAR in a three-part lab study (N=12): (1) comparing GazePointAR to two commercial systems, (2) examining GazePointAR's pronoun disambiguation across three tasks, and (3) an open-ended phase where participants could suggest and try their own context-sensitive queries. Participants appreciated the naturalness and human-like nature of pronoun-driven queries, although pronoun use was sometimes counter-intuitive. We then iterated on GazePointAR and conducted a first-person diary study examining how GazePointAR performs in the wild. We conclude by enumerating limitations and design considerations for future context-aware VAs.

  • A drawing demonstrating RASSAR. Upon scanning a home environment, RASSAR marks different accessibility issues using icons.

    CHI '24 11 May 2024

    RASSAR: Room Accessibility and Safety Scanning in Augmented Reality

    Xia Su, Han Zhang, Kaiming Cheng, Jaewook Lee, Qiaochu Liu, Wyatt Olson, Jon E. Froehlich

    The safety and accessibility of our homes is critical to quality of life and evolves as we age, become ill, host guests, or experience life events such as having children. Researchers and health professionals have created assessment instruments such as checklists that enable homeowners and trained experts to identify and mitigate safety and access issues. With advances in computer vision, augmented reality (AR), and mobile sensors, new approaches are now possible. We introduce RASSAR, a mobile AR application for semi-automatically identifying, localizing, and visualizing indoor accessibility and safety issues such as an inaccessible table height or unsafe loose rugs using LiDAR and real-time computer vision. We present findings from three studies: a formative study with 18 participants across five stakeholder groups to inform the design of RASSAR, a technical performance evaluation across ten homes demonstrating state-of-the-art performance, and a user study with six stakeholders. We close with a discussion of future AI-based indoor accessibility assessment tools, RASSAR’s extensibility, and key application scenarios.

  • A first-person point-of-view image of a low vision person playing tennis while using ARTennis. The tennis ball is covered by a red dot, while four green arrows point towards the ball, forming a crosshair.

    UIST '23 Demo 29 October 2023

    Towards Real-time Computer Vision and Augmented Reality to Support Low Vision Sports: A Demonstration of ARTennis

    Jaewook Lee, Devesh P. Sarda, Eujean Lee, Amy S. Lee, Jun Wang, Adrian Rodriguez, Jon E. Froehlich

    Individuals with low vision (LV) can experience vision-related challenges when participating in sports, especially those with fast-moving objects. We introduce ARTennis, a prototype for wearable augmented reality (AR) that utilizes real-time computer vision (CV) to enhance the visual saliency of tennis balls. Preliminary findings indicate that while ARTennis is helpful, combining both visual and auditory cues may be more effective. As AR and CV technologies continue to improve, we expect head-worn AR to broaden the inclusivity of sports.

  • A person looking and pointing at a bag of chips with foreign language written on it, while asking 'What is this?'

    UIST '23 Demo 29 October 2023

    Towards Designing a Context-Aware Multimodal Voice Assistant for Pronoun Disambiguation: A Demonstration of GazePointAR

    Jaewook Lee, Jun Wang, Elizabeth Brown, Liam Gene Ping Chu, Sebastian S. Rodriguez, Jon E. Froehlich

    Voice assistants (VAs) like Siri and Alexa have transformed how humans interact with technology; however, their inability to consider a user's spatiotemporal context, such as surrounding objects, dramatically limits natural dialogue. In this demo paper, we introduce GazePointAR, a wearable augmented reality (AR) system that resolves ambiguity in speech queries using eye gaze, pointing gestures, and conversation history. With GazePointAR, a user can ask "what's over there?" or "how do I solve this math problem?" simply by looking and/or pointing. We describe GazePointAR's design and highlight supported use cases.

  • A drawing demonstrating RASSAR. Upon scanning a home environment, RASSAR marks different accessibility issues using icons.

    ASSETS '23 Demo 22 October 2023

    A Demonstration of RASSAR: Room Accessibility and Safety Scanning in Augmented Reality

    Xia Su, Kaiming Cheng, Han Zhang, Jaewook Lee, Wyatt Olson, Jon E. Froehlich

    In this demo paper, we introduce RASSAR, a mobile AR application for semi-automatically identifying, localizing, and visualizing indoor accessibility and safety issues using LiDAR and real-time computer vision. Our prototype supports four classes of detection problems: inaccessible object dimensions (e.g., table height), inaccessible object positions (e.g., a light switch out of reach), the presence of unsafe items (e.g., scissors), and the lack of proper assistive devices (e.g., grab bars). RASSAR’s design was informed by a formative interview study with 18 participants from five key stakeholder groups, including wheelchair users, blind and low vision participants, families with young children, and caregivers. Our envisioned use cases include vacation rental hosts, new caregivers, or people with disabilities themselves documenting issues in their homes or rental spaces and planning renovations. We present key findings from our formative interviews, the design of RASSAR, and results from an initial performance evaluation.

  • Our feedback visualization tool, which displays large amounts of textual data as clickable sentiment icons.

    TOCHI '23 10 June 2023

    Visualizing Topics and Opinions Helps Students Interpret Large Collections of Peer Feedback for Creative Projects

    Patrick Crain, Jaewook Lee, Yu-Chun (Grace) Yen, Joy Kim, Alyssa Aiello, Brian Bailey

    We deployed a feedback visualization tool to learn how students used the tool for interpreting feedback from peers and teaching assistants. The tool visualizes the topic and opinion structure in a collection of feedback and provides interaction for reviewing providers’ backgrounds. A total of 18 teams engaged with the tool to interpret feedback for course projects. We surveyed students (N = 69) to learn about their sensemaking goals, use of the tool to accomplish those goals, and perceptions of specific features. We interviewed students (N = 12) and TAs (N = 2) to assess the tool’s impact on students’ review processes and course instruction. Students discovered valuable feedback, assessed project quality, and justified design decisions to teammates by exploring specific icon patterns in the visualization. The interviews revealed that students mimicked strategies implemented in the tool when reviewing new feedback without the tool. Students found the benefits of the visualization outweighed the cost of labeling feedback.

  • A false positive AI intervention, which a human user corrects by conversing with the agent.

    CSCW '23 16 April 2023

    To Err is AI: Imperfect Interventions and Repair in a Conversational Agent Facilitating Group Chat Discussions

    Hyo Jin Do, Ha-Kyung Kong, Pooja Tetali, Jaewook Lee, Brian P Bailey

    Conversational agents (CAs) can analyze online conversations using natural language techniques and effectively facilitate group discussions by sending supervisory messages. However, if a CA makes imperfect interventions, users may stop trusting the CA and discontinue using it. In this study, we demonstrate how inaccurate interventions of a CA and a conversational repair strategy can influence user acceptance of the CA, members' participation in the discussion, perceived discussion experience between the members, and group performance. We built a CA that encourages the participation of members with low contributions in an online chat discussion in which a small group (3-6 members) performs a decision-making task. Two types of errors can occur when detecting under-contributing members: 1) false-positive (FP) errors happen when the CA falsely identifies a member as under-contributing and 2) false-negative (FN) errors occur when the CA fails to detect an under-contributing member. We designed a conversational repair strategy that gives users a chance to contest the detection results, in which the agent sends a corrective message if an error is detected. Through an online study with 175 participants, we found that participants who received FN error messages reported higher acceptance of the CA and better discussion experience, but participated less compared to those who received FP error messages. The conversational repair strategy moderated the effect of errors, for example by improving the perceived discussion experience of participants who received FP error messages. Based on our findings, we offer design implications for whether practitioners should select a high-precision (i.e., fewer FP errors) or high-recall (i.e., fewer FN errors) model, depending on the desired effects. When frequent FP errors are expected, we suggest using the conversational repair strategy to improve the perceived discussion experience.

  • A drawing of a researcher and a remote participant in a study, constantly sending data to one another.

    UIST '22 28 October 2022

    RemoteLab: A VR Remote Study Toolkit

    Jaewook Lee, Raahul Natarrajan, Sebastian S. Rodriguez, Payod Panda, Eyal Ofek

    User studies play a critical role in human subject research, including human-computer interaction. Virtual reality (VR) researchers tend to conduct user studies in person at their laboratory, where participants experiment with novel equipment to complete tasks in a simulated environment that is often new to many. However, due to social distancing requirements in recent years, VR research has been disrupted, as participants were prevented from attending in-person laboratory studies. On the other hand, affordable head-mounted displays are becoming common, enabling access to VR experiences and interactions outside traditional research settings. Recent research has shown that unsupervised remote user studies can yield reliable results; however, the setup of experiment software designed for remote studies can be technically complex and convoluted. We present a novel open-source Unity toolkit, RemoteLab, designed to facilitate the preparation of remote experiments by providing a set of tools that synchronize experiment state across multiple computers, record and collect data from various multimedia sources, and replay the accumulated data for analysis. This toolkit helps VR researchers conduct remote experiments when in-person experiments are not feasible, increase the sampling variety of a target population, and reach participants who otherwise would not be able to attend in person.

  • A drawing demonstrating RASSAR. Upon scanning a home environment, RASSAR marks different accessibility issues using icons.

    ASSETS '22 Workshop 5 October 2022

    Towards Semi-automatic Detection and Localization of Indoor Accessibility Issues using Mobile Depth Scanning and Computer Vision

    Xia Su, Kaiming Cheng, Han Zhang, Jaewook Lee, Yueqian Zhang, Jon E. Froehlich

    To help improve the safety and accessibility of indoor spaces, researchers and health professionals have created assessment instruments that enable homeowners and trained experts to audit and improve homes. With advances in computer vision, augmented reality (AR), and mobile sensors, new approaches are now possible. We introduce RASSAR (Room Accessibility and Safety Scanning in Augmented Reality), a new proof-of-concept prototype for semi-automatically identifying, categorizing, and localizing indoor accessibility and safety issues using LiDAR + camera data, machine learning, and AR. We present an overview of the current RASSAR prototype and a preliminary evaluation in a single home.

  • An AI agent intervenes privately to encourage everyone in a group to contribute equally.

    In online group discussions, balanced participation can improve the quality of discussion, members' satisfaction, and positive group dynamics. One approach to achieve balanced participation is to deploy a conversational agent (CA) that encourages participation of under-contributing members, and it is important to design communication strategies of the CA in a way that is supportive of the group. We implemented five communication strategies that a CA can use during a decision-making task in a small group synchronous chat discussion. The five strategies include messages sent to two types of recipients (@username vs. @everyone) crossed with two types of channels (public vs. private), plus a peer-mediated strategy in which the CA asks a peer to address the under-contributing member. Through an online study with 42 groups, we measured the balance of participation and perceptions about the CA by analyzing chat logs and survey responses. We found that the CA sending messages specifying an individual through a private channel is the most effective and preferred way to increase participation of under-contributing members. Participants also expressed that the peer-mediated strategy is a less intrusive and less embarrassing way of receiving the CA's messages compared to the conventional approach where the CA directly sends a message to the under-contributing member. Based on our findings, we discuss trade-offs of various communication strategies and explain design considerations for building an effective CA that adapts to different group dynamics and situations.

  • Four well-liked navigation instructions for mixed reality, including arrows (top left), avatar (top right), callouts (bottom left), and desaturation (bottom right).

    IEEE VR '22 20 April 2022

    User Preference for Navigation Instructions in Mixed Reality

    Jaewook Lee, Fanjie Jin, Younsoo Kim, David Lindlbauer

    Current solutions for providing navigation instructions to users who are walking are mostly limited to 2D maps on smartphones and voice-based instructions. Mixed Reality (MR) holds the promise of integrating navigation instructions directly in users’ visual field, potentially making them less obtrusive and more expressive. Current MR navigation systems, however, largely focus on using conventional designs such as arrows, and do not fully leverage the technological possibilities. While MR could present users with more sophisticated navigation visualizations, such as in-situ virtual signage, or visually modifying the physical world to highlight a target, it is unclear how such interventions would be perceived by users. We conducted two experiments to evaluate a set of navigation instructions and the impact of different contexts such as environment or task. In a remote survey (n = 50), we collected preference data with ten different designs in twelve different scenarios. Results indicate that while familiar designs such as arrows are well-rated, methods such as avatars or desaturation of non-target areas are viable alternatives. We confirmed and expanded our findings in an in-person virtual reality (VR) study (n = 16), comparing the highest-ranked designs from the initial study. Our findings serve as guidelines for MR content creators, and future MR navigation systems that can automatically choose the most appropriate navigation visualization based on users’ contexts.

  • Four of many hand interfaces, including scissors (made by extending the index and middle fingers), binoculars (making circles with both hands and putting them together), forks (made by extending the index, middle, and ring fingers), and staplers (forming an alligator mouth with one hand).

    CHI '22 29 April 2022

    🏆 Honorable Mention

    Hand Interfaces: Using Hands to Imitate Objects in AR/VR for Expressive Interactions

    Siyou Pei, Alexander Chen, Jaewook Lee, Yang Zhang

    Augmented reality (AR) and virtual reality (VR) technologies create exciting new opportunities for people to interact with computing resources and information. Less exciting is the need for holding hand controllers, which limits applications that demand expressive, readily available interactions. Prior research investigated freehand AR/VR input by transforming the user’s body into an interaction medium. In contrast to previous work that has users’ hands grasp virtual objects, we propose a new interaction technique that lets users’ hands become virtual objects by imitating the objects themselves. For example, a thumbs-up hand pose is used to mimic a joystick. We created a wide array of interaction designs around this idea to demonstrate its applicability in object retrieval and interactive control tasks. Collectively, we call these interaction designs Hand Interfaces. From a series of user studies comparing Hand Interfaces against various baseline techniques, we collected quantitative and qualitative feedback, which indicates that Hand Interfaces are effective, expressive, and fun to use.

  • An example of a user using ImageExplorer. Users can drag around the screen to learn more about an image. Users can also double tap to receive additional details about a specific object.

    Blind users rely on alternative text (alt-text) to understand an image; however, alt-text is often missing. AI-generated captions are a more scalable alternative, but they often miss crucial details or are completely incorrect, which users may still falsely trust. In this work, we sought to determine how additional information could help users better judge the correctness of AI-generated captions. We developed ImageExplorer, a touch-based multi-layered image exploration system that allows users to explore the spatial layout and information hierarchies of images, and compared it with popular text-based (Facebook) and touch-based (Seeing AI) image exploration systems in a study with 12 blind participants. We found that exploration was generally successful in encouraging skepticism towards imperfect captions. Moreover, many participants preferred ImageExplorer for its multi-layered and spatial information presentation, and Facebook for its summary and ease of use. Finally, we identify design improvements for effective and explainable image exploration systems for blind users.

  • A clickable graph, which summarizes overall contributions of each team member and expands to show details of their progress.

    CSCW '21 18 October 2021

    Challenges and Opportunities for Data-Centric Peer Evaluation Tools for Teamwork

    Wenxuan Wendy Shi, Akshaya Jagannadharao, Jaewook Lee, Brian P. Bailey

    Peer evaluations are critical for assessing teams, but are susceptible to bias and other factors that undermine their reliability. At the same time, collaborative tools that teams commonly use to perform their work are increasingly capable of logging activity that can signal useful information about individual contributions and teamwork. To investigate current and potential uses for activity traces in peer evaluation tools, we interviewed (N=11) and surveyed (N=242) students and interviewed (N=10) instructors at a single university. We found that nearly all of the students surveyed considered specific contributions to the team outcomes when evaluating their teammates, but also reported relying on memory and subjective experiences to make the assessment. Instructors desired objective sources of data to address challenges with administering and interpreting peer evaluations, and have already begun incorporating activity traces from collaborative tools into their evaluations of teams. However, both students and instructors expressed concern about using activity traces due to the diverse ecosystem of tools and platforms used by teams and the limited view into the context of the contributions. Based on our findings, we contribute recommendations and a speculative design for a data-centric peer evaluation tool.

  • A person pointing their phone at an ingredient after asking 'find me a recipe that uses these.' TouchVA generates a touch interface in which users can tap different objects and text to include them in their query.

    ICMI '21 18 October 2021

    What’s This? A Voice and Touch Multimodal Approach for Ambiguity Resolution in Voice Assistants

    Jaewook Lee, Sebastian S. Rodriguez, Raahul Natarrajan, Jacqueline Chen, Harsh Deep, Alex Kirlik

    Human speech often contains ambiguity stemming from the use of demonstrative pronouns (DPs), such as "this" and "these." While we can typically decipher which objects of interest DPs are referring to based on context, modern-day voice assistants (VAs) such as Google Assistant and Siri are as yet unable to process queries containing such ambiguity. For instance, to humans, a question such as "how much is this?" can be clarified through visual reference (e.g., a buyer gestures to the seller the object they would like to purchase). To bridge this gap between human and machine cognition, we built and examined a touch + voice multimodal VA prototype that enables users to select key spatial information to embed as context and query the VA. The prototype converts results of mobile, real-time object recognition and optical character recognition models into augmented reality buttons that represent features. Users can interact with and modify the selected features through a word grid. We conducted a study to investigate: 1) how touch performs as an additional modality to resolve ambiguity in queries, 2) how users use DPs when interacting with VAs, and 3) how users perceive a VA that can understand DPs. We found that as queries become more complex, users prefer the multimodal VA over the standard VA without experiencing elevated cognitive load. Additionally, even though it took some time to get used to, many participants eventually became comfortable using DPs to interact with the multimodal VA and appreciated the improved human-likeness of human-VA conversations.

  • An example of a user using ImageExplorer. Users can drag around the screen to learn more about an image. Users can also double tap to receive additional details about a specific object.

    ASSETS '21 Demo 17 October 2021

    Image Explorer: Multi-Layered Touch Exploration to Make Images Accessible

    Jaewook Lee, Yi-Hao Peng, Jaylin Herskovitz, Anhong Guo

    Blind or visually impaired (BVI) individuals often rely on alternative text (alt-text) in order to understand an image; however, alt-text is often missing or incomplete. Automatically-generated captions are a more scalable alternative, but they are also often missing crucial details and are sometimes completely incorrect, yet may still be falsely trusted by BVI users. We hypothesize that additional information could help BVI users better judge the correctness of an auto-generated caption. To achieve this, we present Image Explorer, a touch-based multi-layered image exploration system that enables users to explore the spatial layout and information hierarchies in an image. Image Explorer leverages several off-the-shelf deep learning models to generate segmentation and labeling results for an image, combines and filters the generated information, and presents the resulting information in hierarchical layers. In a pilot study with three BVI users, participants used Image Explorer, Seeing AI, and Facebook to explore images with auto-generated captions of diverging quality, and judge the correctness of the captions. Preliminary results show that participants made more accurate judgements about the correctness of the captions when using Image Explorer, although they were highly confident about their judgement regardless of the tool used. Overall, Image Explorer is a novel touch exploration system that makes images more accessible for BVI users by potentially encouraging skepticism and enabling users to independently validate auto-generated captions.

  • VPM connects students with similar interests and needs using gradient lines.

    LAK21 Short Paper 12 April 2021

    Explorations of Designing Spatial Classroom Analytics with Virtual Prototyping

    JiWoong Jang, Jaewook Lee, Vanessa Echeverria, LuEttaMae Lawrence, Vincent Aleven

    Despite the potential of spatial displays for supporting teachers’ classroom orchestration through real-time classroom analytics, the process to design these displays is a challenging and under-explored topic in the learning analytics (LA) community. This paper proposes a mid-fidelity Virtual Prototyping method (VPM), which involves simulating a classroom environment and candidate designs in virtual space to address these challenges. VPM allows for rapid prototyping of spatial features, requires no specialized hardware, and enables teams to conduct remote evaluation sessions. We report observations and findings from an initial exploration with five potential users through a design process utilizing VPM to validate designs for an AR-based spatial display in the context of middle-school orchestration tools. We found that designs created using virtual prototyping sufficiently conveyed a sense of three-dimensionality to address subtle design issues like occlusion and depth perception. We discuss the opportunities and limitations of applying virtual prototyping, particularly its potential to allow for more robust co-design with stakeholders earlier in the design process.

  • Our game interface consisting of a prey (blue) and predators (red) used to measure complacency.

    SPIE '20 21 April 2020

    Measuring complacency in humans interacting with autonomous agents in a multi-agent system

    Sebastian S. Rodriguez, Jacqueline Chen, Harsh Deep, Jaewook Lee, Derrik E. Asher, Erin Zaroukian

    With advances in machine learning, autonomous agents are increasingly able to navigate uncertain operational environments, as is the case within the multi-domain operations (MDO) paradigm. When teaming with humans, autonomous agents may flexibly switch between passive bystander and active executor depending on the task requirements and the actions being taken by partners (whether human or agent). In many tasks, it is possible that a well-trained agent's performance will exceed that of a human, in part because the agent's performance is less likely to degrade over time (e.g., due to fatigue). This potential difference in performance might lead to complacency, which is a state defined by over-trust in automated systems. This paper investigates the effects of complacency in human-agent teams, where agents and humans have the same capabilities in a simulated version of the predator-prey pursuit task. We compare subjective measures of the human's predisposition to complacency and trust using various scales, and we validate their beliefs by quantifying complacency through various metrics associated with the actions taken during the task with trained agents of varying reliability levels. By evaluating the effect of complacency on performance, we can attribute a degree of variation in human performance in this task to complacency. We can then account for an individual human's complacency measure to customize their agent teammates and human-in-the-loop requirements (either to minimize or compensate for the human's complacency) to optimize team performance.

Nerdy World of Jae

Hobbies

Revamping... Coming Soon!

Contact

Get in Touch
