BioTuring and Teddy Thinh

BioTuring 06/27: Gene Enrichment Analysis.
Motivation: Comparing 2 gene lists $A = {[G_{n1}, S_{n1}], [G_{n2}, S_{n2}], …}$ & $B = {[G_{m1}, S_{m1}], [G_{m2}, S_{m2}], …}$ to extrapolate on the results (e.g cancer vs non-cancer?)
Method: Get a gene pool, preferably on the whole $G_i$ set.
Arange from A $\rightarrow$ B to form a pathway.
Something about Fold Change (FC) metric here I don’t rmb well, but basically $FC = \log_2(\frac{G_{ni}}{G_{mi}})$, if FC > 0 then it’s good for hypo about gene list A, < 0 then good for gene list B.
For all gene in pathway, record its cummulative sum on FC. This will form a graph, axis x is A $\rightarrow$ B, axis y is score.
Highest one is Enrichment Score.
$\implies$ New problem, how do we know the Stats Significance of this score. Run it 10000 times and select one with p-score < 0.05? Seems reasonable, but resources are limited. This is what Bioturing Algo team been researching for now, when assigning a score towards a gene is approximately even to every other gene, you can apply some guys’ work on distribution to get 5000x faster than the conventional approach.

Took an entrance test @ BioTuring then

Met Teddy Thinh:
Here’s my two cents about the conversation:

Graduate education requires self-commitment and curiosity to learn new things, and applying it to the company at which you’re having internship.
Grad program are for connection probably, at least in HCMUS.
Prof. Ly Quoc Ngoc (Computer Vision) is goated, his mindset about research is something to be recognised by fellow researchers.