
A late-night Corrections master-doc... yes please! Awesome work, Isabel, keep it up!

It would be so awesome to have some sort of Elasticsearch or MongoDB database for the corrections; a schema-free architecture would be perfect for a dataset like this. I'm working on a bot to crawl YouTube comments (closed captions maybe in the future) for Jackal-like keywords, identify the Jackal community's corrections, log them, and give each one a unique ID, associated episode, metadata, etc. As of now, each log entry has to be matched to its timecode by a human, but scraping the closed captions (CC) could help link a proposed Jackal correction to the error in the video. Each episode would then get a "jack-rate" of 0.1-5.0 based on its total correction count and the weight of each correction. Grafana could then visualize the data dynamically on a webpage, with analytics and the source material the Jackals referenced back to Studio 8G.

The same approach could be applied to Corrections itself, like taking minutes in a corporate meeting, but with speech-to-text from compressed audio :( As far back as I've looked, auto-generated closed captions are not enabled.

Thoughts? Got any CSVs or XLS files of this data that you could share? I need more training data to model on, to help detect the context of a given correction accurately. @fdaniels
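To make the logging step concrete, here's a minimal sketch of the keyword-matching and log-entry idea, assuming the comment text has already been fetched (e.g. via the YouTube Data API); the keyword patterns and entry fields are my own placeholders, not a real spec:

```python
import re
import uuid

# Hypothetical keyword patterns that flag a likely Jackal correction.
CORRECTION_PATTERNS = [
    re.compile(r"\bjackals?\b", re.IGNORECASE),
    re.compile(r"\bcorrections?\b", re.IGNORECASE),
    re.compile(r"\bactually\b", re.IGNORECASE),
]

def looks_like_correction(text: str) -> bool:
    """Return True if the comment matches any correction keyword."""
    return any(p.search(text) for p in CORRECTION_PATTERNS)

def make_log_entry(text: str, episode: str) -> dict:
    """Build a schema-free log entry: unique id, episode, raw text.
    timecode stays None, since matching a comment to the video still
    needs a human (or scraped closed captions)."""
    return {
        "id": str(uuid.uuid4()),
        "episode": episode,
        "text": text,
        "timecode": None,
    }
```

Entries like these would drop straight into a document store such as MongoDB without a fixed schema.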
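And a sketch of how the "jack-rate" might be computed, assuming each logged correction carries a weight and the score is clamped to the 0.1-5.0 range; the scaling constant is an arbitrary assumption for illustration:

```python
def jack_rate(weights: list[float], scale: float = 0.5) -> float:
    """Score an episode from its corrections' weights, clamped to 0.1-5.0.
    `scale` is an arbitrary tuning constant, not from any real spec.
    The sum folds in both the correction count and each correction's weight."""
    if not weights:
        return 0.1  # no corrections: floor of the range
    raw = scale * sum(weights)
    return max(0.1, min(5.0, raw))
```

For example, two corrections of weight 1.0 would score 1.0, while a pile of heavy corrections saturates at 5.0.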
