Research
Research Interests: Statistical Physics, Polymer Physics, Biophysics
My research work and projects include:
- Worked on the MathSeer project, a system to make finding mathematical information easier by creating innovative search engines, interfaces, and algorithms for extracting and recognizing math
- Built a new open-source math formula extraction pipeline for PDF files
- Adopted distributed multi-GPU parallelization and implemented a custom dataloader with dynamic batch sizes to fully utilize the GPUs, which increased the speed of the math formula parser sixfold
- Built new tools for visualization and evaluation of parsing results and errors
- Worked on a PDF symbol extractor that identifies precise bounding box locations in born-digital PDF documents
- Developed a simple and effective algorithm to detect math expressions using visual features alone
- Wrote an API that recognizes handwritten and typeset formulas and outputs the corresponding LaTeX and MathML
Current work:
- Working on improving the accuracy of the math formula parser by experimenting with better visual features and attention mechanisms
- Working on adapting the parser to work with more complex graphical structures such as chemical diagrams