Mengyu Serene Tu
- Email: serenemy@outlook.com
Background
Experienced data scientist skilled in applying cutting-edge machine learning techniques to extract insights from complex datasets, with a demonstrated ability to communicate findings effectively to non-technical audiences.
Education
Projects
Transfer Learning for Brain Tumor Detection on Multi-Planar MRI Images repo
- Cleaned a brain imaging dataset by addressing missing labels and built a custom PyTorch dataset of 1,200 images across axial, coronal, and sagittal planes
- Fine-tuned Faster R-CNN with ResNet-50-FPN backbone and achieved mAP@IoU50 of 0.669
- Fine-tuned state-of-the-art YOLOv8 and achieved mAP@IoU50 of 0.907
- Showed that axial-plane images were the most suitable for tumor detection in both models and that YOLOv8 outperformed Faster R-CNN
End-to-End Movie Recommendation System with Collaborative Filtering repo
- Conducted exploratory data analysis using pandas on a dataset of 100,000 movie ratings from 1,000 users across 1,700 movies and visualized results with matplotlib
- Implemented matrix factorization with gravity regularization using TensorFlow and reduced test loss by 35% compared to the unregularized model
- Deployed the recommendation system as both a Docker-containerized Flask API and a Flask web application
Large-Scale Behavioral Data Analysis and Reinforcement Learning Model for Decision-Making repo
- Preprocessed 100,000+ behavioral trials using custom cleaning scripts to handle missing data and remove outliers, yielding 48,063 high-quality data points for analysis
- Applied statistical methods (regression, ANCOVA, mixed-effects models) and ggplot2 in R to reveal how confidence and decision outcome affect decision bias and reaction time
- Demonstrated the underlying mechanism of the decision-making process by developing a novel partially observable Markov decision process (POMDP) reinforcement learning model in Python
- Presented as first author at the Neuroscience Annual Meeting (over 30,000 participants)
Machine Learning Pipeline for Brain-Imaging Time Series Data repo
- Developed a feature engineering algorithm for rapid feature extraction from brain-imaging time series that runs 50x faster than existing methods
- Implemented customized linear regression and maximum likelihood decoding models and attained over 95% cross-validated accuracy in decoding animal positions from brain time series
- Published as first author in Neural Computation
Skills
- Programming: Python, R, MATLAB, C++, Git
- Machine Learning and Data Handling: PyTorch, TensorFlow, Scikit-Learn, Pandas, SQL
Feel free to reach out to me for collaborations, questions, or just to say hello!