Mar 7 · Liked by Mikolaj Czerkawski

Very interesting work! I have been thinking about using this dataset. Could you give an example of a machine learning project that integrates this data? I have some ideas, but I would not want to misuse the data or use it outside its intended purpose.

Hi! Thank you! So far we are focusing heavily on unlabelled data, since it's a necessary starting point - we might expand to labelled tasks soon (hopefully with some helping hands from the community).

For the unlabelled use cases, I really recommend playing around with self-supervised learning and generative models! We were actually thinking of showing some examples in our project, but didn't prioritise it to avoid confusion (Major TOM is mostly about data).

For self-supervised learning, there are popular approaches that could be worth a try, like SimCLR or masked vision transformers.
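
To give a rough feel for the SimCLR idea, here is a minimal sketch of its contrastive (NT-Xent) loss in plain Python. It is only an illustration under assumptions: the function names are hypothetical, the embeddings are toy vectors rather than encoder outputs for Major TOM samples, and a real training run would use a tensor library and an image encoder instead of nested lists.

```python
import math


def _normalize(vector):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("cannot normalise a zero vector")
    return [x / norm for x in vector]


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def nt_xent_loss(views_a, views_b, temperature=0.5):
    """NT-Xent loss for a batch of paired embeddings (hypothetical helper).

    views_a[i] and views_b[i] are embeddings of two augmented views of the
    same sample; every other embedding in the batch acts as a negative.
    """
    if len(views_a) != len(views_b) or not views_a:
        raise ValueError("need two non-empty batches of equal size")

    n = len(views_a)
    z = [_normalize(v) for v in list(views_a) + list(views_b)]

    total = 0.0
    for i in range(2 * n):
        positive = (i + n) % (2 * n)  # index of the other view of the same sample
        # Similarities to every embedding except the anchor itself.
        logits = [_dot(z[i], z[k]) / temperature for k in range(2 * n) if k != i]
        positive_logit = _dot(z[i], z[positive]) / temperature
        # Numerically stable log-sum-exp for the softmax denominator.
        top = max(logits)
        log_denominator = top + math.log(sum(math.exp(l - top) for l in logits))
        total += log_denominator - positive_logit
    return total / (2 * n)


if __name__ == "__main__":
    aligned = nt_xent_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
    shuffled = nt_xent_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
    print(f"aligned pairs: {aligned:.4f}, mismatched pairs: {shuffled:.4f}")
```

The loss is low when the two views of the same sample sit close together and far from everything else in the batch, which is the signal the encoder is trained on.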



For generative modelling, I can't help but recommend my own course on diffusion models that includes example training notebooks and explains everything from scratch:


I realise it would be nice to have some examples that integrate Major TOM with a ready-to-use training pipeline. Hopefully we can deliver something like that soon enough!
