I am really new to data science in general. I am currently trying out semi-supervised learning using UMAP on my power consumption data, since I want to categorize which appliances are turned on during a particular time interval. The dataset looks like this:

| time       | value   | label | description        |
|------------|---------|-------|--------------------|
| 1582761600 | 4628.8  | 1     | 2 ACs, 4 computers |
| 1582761601 | 4624.98 | 2     | 1 AC, 2 computers  |
| 1582761602 | 4624.98 |       |                    |

Note that not all of the readings have labels and descriptions. I have already read the documentation on semi-supervised learning using UMAP at https://umap-learn.readthedocs.io/en/latest/supervised.html. The problem is that it uses the Fashion-MNIST dataset (https://github.com/zalandoresearch/fashion-mnist), and its dataset format is completely different from what I currently have. Take for example this code snippet:
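
    import numpy as np
    from mnist import MNIST  # provided by the python-mnist package

    # Load the Fashion-MNIST images and their integer labels
    mndata = MNIST('fashion-mnist/data/fashion')
    train, train_labels = mndata.load_training()
    test, test_labels = mndata.load_testing()
    # Stack train and test into one array and scale pixel values to [0, 1]
    data = np.array(np.vstack([train, test]), dtype=np.float64) / 255.0
    # One integer label per row of data
    target = np.hstack([train_labels, test_labels])
    # Human-readable names, indexed by label value
    classes = [
        'T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
        'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']
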
I tried looking for a detailed explanation of what each part of the code means, to no avail. So far, I have been able to split my data into training and testing sets with an 80-20 ratio, but other than that, I have not been able to find any tutorial where people do this on a plain .csv file. My question is: how do I go about using my data labels so that I can categorize the clusters I was already able to plot using UMAP? Thank you so much! I will be more than happy to adjust this post if anything is unclear.
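
To make the question concrete, here is a minimal sketch of what I imagine the equivalent of `data` and `target` would look like for my CSV. The column names, the inline sample rows, and the helper names `load_readings` and `embed` are my own assumptions, not something from the UMAP docs. Marking unlabeled readings with -1 follows the convention the UMAP documentation describes for semi-supervised use, but I am not sure this is the right way to apply it to my data.

```python
import io

import numpy as np
import pandas as pd

# Stand-in for the real file; with an actual CSV, pass its path to pd.read_csv.
# Column names (time, value, label, description) are assumed from the table above.
CSV_TEXT = """time,value,label,description
1582761600,4628.8,1,"2 ACs, 4 computers"
1582761601,4624.98,2,"1 AC, 2 computers"
1582761602,4624.98,,
"""


def load_readings(source):
    """Return (data, target) arrays shaped like those in the Fashion-MNIST snippet."""
    df = pd.read_csv(source)
    # One row per reading, one column per feature (here only the power value).
    data = df[["value"]].to_numpy(dtype=np.float64)
    # Readings without a label become -1, the "unlabeled" marker in the UMAP docs.
    target = df["label"].fillna(-1).astype(int).to_numpy()
    return data, target


def embed(data, target):
    """Fit UMAP using the partial labels; requires the umap-learn package."""
    import umap  # imported lazily so load_readings works without umap-learn

    return umap.UMAP().fit_transform(data, y=target)


data, target = load_readings(io.StringIO(CSV_TEXT))
print(data.shape, target)
```

On the full dataset I would then call `embedding = embed(data, target)` and color the plotted points by `target`. The three sample rows here are far too few for UMAP itself, so the sketch only builds the arrays.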