Can we graph all of science?

Views: 6,418

Mike Morrison, PhD

1 month ago

Part 2 of the "Better Citations Trilogy"
Even more points in this article:
www.linkedin.com/pulse/tradit...

Comments: 23

@MikeMorrisonPhD 1 month ago

I know that algorithms can detect themes in the text surrounding a citation. But often, ambiguous writing or citation placement can inhibit both robots' and readers' ability to understand what a particular citation is supposed to show. Better human linking should improve AI-readability AND human UX. I think. But CS people, please tell me what I'm missing.

@DeathSugar 1 month ago

References are always mentioned somewhere in the study itself, so you can find keywords and deduce the context in which each one is referenced. There are a bunch of algorithms that can measure the relevance of articles to given categories, so you manually label a couple thousand and the algorithm handles the rest.

@MikeMorrisonPhD 1 month ago

Yeah, the goal here is to meet the algorithms halfway to increase their accuracy, and to account for the many sentences where it's not clear why the author included each reference at the end. And the reverse of this is also true, right? If we link authors, we need complex NLP to find meaning. If we link semantically, we need only simple NLP to find authors (which are always consistently formatted).

@DeathSugar 1 month ago

@MikeMorrisonPhD Have you tried any simple text categorizer?

@MikeMorrisonPhD 1 month ago

@DeathSugar - Yeah, I did a (very simple) NLP algorithm for my master's, but it's been a while. Got any current favorites you can link me to?

@DeathSugar 1 month ago

@MikeMorrisonPhD There's a guy who implemented some of it from scratch. Some related videos: Stemming in Rust: kzfaq.info/get/bejne/sLiKjZuanb_Md4k.html Text classifier in C: kzfaq.info/get/bejne/r8uWZMKUsqzPnoE.html Both have timecodes and some references, so you might find some of it useful. Both are part of their own series, so the earlier videos may also help you build your own classifier for studies.

@austinmajeski9427 1 month ago

You don't. You get the AI to read the linked paper and determine the subject from there.

@MikeMorrisonPhD 1 month ago

Yeah, the goal is to help AI do its job better. And we eventually want to go far beyond the overall subject. We need really, really precise metadata about causal relationships, because every extra percentage point of accuracy makes it more viable and worthwhile to invest the $ in testing potential new treatments that pop out. Any inaccuracy in the data does the opposite: it makes testing new treatments riskier and costlier.

@theonlyjoe_ 1 month ago

Sure, but if the AI has a bit of text beforehand, it can filter out the ones that aren't relevant much quicker.

@MikeMorrisonPhD 1 month ago

@theonlyjoe_ Exactly. The human author linking the meaningful text creates an optimization parameter that AI can both use and train on. Papers that have human-linked text help the AI understand papers that don't.

@austinmajeski9427 1 month ago

@MikeMorrisonPhD This proposal will never work, and it is unnecessary. You're discussing a computation problem, not a categorization problem*. The text surrounding the citation already gives the context you're looking for. I personally feel, and this is a stretch, that this proposal resembles a major problem of object-oriented programming and the reason people have moved away from it. People have discovered that it's better to just let "data be data" (let citations be citations), and not to overthink how you group related functions and how they relate to one another (which part of the sentence best describes the work in the citation so I may link it?). *Your solution is categorizing citations with, presumably, a snippet of text from the sentence/paragraph that best summarizes the reference material. There is no reason to think this will save money. That is wishful thinking.

@austinmajeski9427 1 month ago

@theonlyjoe_ What about the paragraph the citation is already attached to?