2 Comments
Fo:

A late-night Corrections master-doc... yes please! Awesome work, Isabel, keep it up! It would be so awesome to have some sort of Elasticsearch or MongoDB database for the corrections; I think a schema-free architecture would be perfect for a dataset like this.

I'm working on a bot to crawl YouTube comments (and maybe closed captions in the future) for Jackal-like keywords, to identify the Jackal community's corrections, log them, and assign each a unique ID, associated episode, metadata, etc. As of now, each log entry has to be matched to a timecode by hand, but scraping the closed captions (CC) could help link a proposed Jackal correction to the error in the video. Each episode is then given a "jack-rate" of 0.1-5.0 based on the total correction count and the weight of each correction. Grafana could then visualize the data dynamically on a webpage, with analytics and the source material the Jackals referenced back to Studio 8G.

The same approach could be applied to Corrections itself, like taking minutes in a corporate meeting, except via speech-to-text from compressed audio :( As far back as I've looked, auto-generated closed captions are not enabled.

Thoughts? Got any CSVs or XLSs of any of this data that you could share? I need more training data so the model can learn the context of a given correction and detect it accurately. @fdaniels
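The keyword detection, log entry, and jack-rate ideas above could be sketched roughly like this. Everything here is a hypothetical placeholder: the keyword pattern, the default weight, and the count-to-score scaling are illustrations, not the actual bot's logic:

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical keyword pattern for spotting Jackal-style corrections in comments.
CORRECTION_PATTERN = re.compile(
    r"\b(correction|actually|jackal voice)\b", re.IGNORECASE
)

@dataclass
class CorrectionLog:
    comment_id: str               # unique ID for the logged comment
    episode: str                  # associated Corrections episode
    text: str                     # the comment text itself
    weight: float = 1.0           # editorial weight of the correction
    timecode: Optional[float] = None  # filled in later by CC matching

def detect_correction(comment_id: str, episode: str, text: str) -> Optional[CorrectionLog]:
    """Log a comment as a correction if it matches the keyword pattern."""
    if CORRECTION_PATTERN.search(text):
        return CorrectionLog(comment_id, episode, text)
    return None

def jack_rate(corrections: List[CorrectionLog]) -> float:
    """Score an episode 0.1-5.0 from total correction count and weights.

    The divide-by-2 scaling is arbitrary; the real mapping would need tuning.
    """
    raw = sum(c.weight for c in corrections)
    return round(min(5.0, max(0.1, raw / 2)), 1)
```

For example, a comment like "Jackal voice: actually, that was a moth" would be logged, and an episode with one weight-1.0 correction would score 0.5 under this (made-up) scaling.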

isabel parigi:

wow. I'm in awe... I understand very little of this plan but would be honored to pitch in. Regarding CC: if you press the three dots next to the save icon you can "view transcript," which might help connect corrections to a timecode. As we've learned, it's YouTube CC (sometimes inaccurate for proper nouns, etc.), but I've found it helpful when I don't have my full recap setup available (aka sitting on my couch). Lastly, for a long time I searched Reddit forums for a way to search YouTube comments and didn't have any luck, so your bot is much needed/appreciated!!
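The "view transcript" idea above suggests one way to connect a correction to a timecode: assuming transcript lines arrive as (start-seconds, text) pairs, pick the line that shares the most words with the correction. The bag-of-words matching here is only a crude sketch, not a real alignment algorithm:

```python
from typing import List, Set, Tuple

def tokenize(s: str) -> Set[str]:
    """Lowercase words, stripped of trailing punctuation."""
    return {w.lower().strip(".,!?:;\"'") for w in s.split() if w}

def match_timecode(correction: str, transcript: List[Tuple[float, str]]) -> float:
    """Return the start time of the transcript line sharing the most
    words with the correction text (crude bag-of-words overlap)."""
    corr = tokenize(correction)
    best_time, best_score = 0.0, -1
    for start, line in transcript:
        score = len(corr & tokenize(line))
        if score > best_score:
            best_time, best_score = start, score
    return best_time
```

A real pipeline would likely need fuzzy matching, since YouTube CC often garbles exactly the proper nouns the corrections are about.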
