In this video, I'm going to break down some of the coolest stuff happening in AI right now, but I also want to show you exactly how I wrap my head around it all. Some of it can be very complicated and confusing, and I'm not that smart, so this is how I figure out what's actually going on. I'm going to start by breaking down my process.
Whenever I come across something really interesting on Twitter or YouTube, or in a blog post or an article, I save it to this tool called MyMind.com. It's a little browser extension that adds a plus button up here at the top of my browser, and whenever I come across anything online, I just press that plus and it saves it to this area. So whether it's a tweet, a YouTube video, or a blog article, I press this little plus and it saves it to MyMind. It also has an app on my iPhone: if I come across something interesting, I press the little share button here on the right, click "share via," then click on MyMind, and you can see it saves it right here.
This is the tweet that I just saved. Now, what's so special about MyMind? It's kind of like a Pinterest board for anything I find, but it actually uses AI, so it automatically tags stuff. Because I just added this one, it's still figuring out what to tag it with, but if I click on one of my older posts, like this YouTube video, you can see it tagged it with video gaming, text adventure, AI storytelling, and GPT-3. I can add my own notes here, like "Midjourney video," and I can also add my own tags. I happen to know this video is about Midjourney, so I can add a tag for that. It's just a way to organize everything I want, so when it comes time to make content, do research, or figure out what I'm doing next, I come in here. Now, some of this stuff looks quite complicated when you come across it.
So what I'll do is go to the page with all of the scientific data about it, usually on arxiv.org, and open the PDF. The PDF is typically a long scientific document full of big words that are over my head. For example, here's a 10-page paper on MobileBrick: building LEGO for 3D reconstruction on mobile devices. There's a little trick I learned from my friend Robert Scoble. He says pretty much everything you need to know is usually in the abstract, and then they bury the lede: a lot of the important information is down in the conclusion. So what I'll do is copy the abstract (let's go ahead and copy this whole thing), then jump into ChatGPT and paste it in. Then I'll put a little space, jump back over, scroll all the way down to the conclusion, copy all of that, paste it in here too, and say, "I'm going to ask questions about the above information. Reply that you understand." "Sure, I understand. Please go ahead with your questions." So I've essentially primed it with what I'm about to do: I plugged in my abstract and my conclusion.
And then I can say, "Explain what all of this means like I'm a five-year-old," and there it goes: the people who wrote the paper made a lot of different models using these blocks and took pictures of them with a special camera that can see both colors and how far away things are. The pictures they took are very accurate and can help the computer make really good 3D pictures. So that's the real basic explanation of what's going on in that paper.
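If you end up doing this priming step a lot, it can be scripted. Here's a minimal sketch of the same pattern against the OpenAI API; the model name `gpt-3.5-turbo` and the sample abstract/conclusion strings are placeholder assumptions, not the real paper text:

```python
def build_primer_messages(abstract: str, conclusion: str) -> list:
    """Recreate the manual workflow: paste abstract + conclusion,
    then tell the model you're about to ask questions about it."""
    context = abstract + "\n\n" + conclusion
    return [
        {"role": "user", "content": context},
        {"role": "user", "content": (
            "I'm going to ask questions about the above information. "
            "Reply that you understand."
        )},
    ]

messages = build_primer_messages(
    "We present MobileBrick, a dataset for 3D reconstruction...",   # stand-in abstract
    "In conclusion, our RGB-D captures enable accurate models...",  # stand-in conclusion
)
# The follow-up question, exactly like typing it into the chat box:
messages.append({"role": "user",
                 "content": "Explain what all of this means like I'm a five-year-old."})

def ask_openai(messages):
    """Actually send the primed conversation. Requires the `openai`
    package and an OPENAI_API_KEY; not called here."""
    from openai import OpenAI
    client = OpenAI()
    resp = client.chat.completions.create(model="gpt-3.5-turbo", messages=messages)
    return resp.choices[0].message.content
```

Calling `ask_openai(messages)` would then return the ELI5 answer, the same as pasting by hand.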
And then I can ask things like, "How is this new technology valuable?" It's useful for a lot of different fields like robotics, video games, and even medicine; high-quality 3D images created by these programs can help people better understand how things work, how they're put together, and how they move. So I just did that and got a quick explanation of MobileBrick and what it can do. Now, what if I want to dive deeper? Because obviously I didn't get all of the details here.
So now what we can do is copy the whole URL for the PDF and jump over to a tool called ChatPDF, which lets you chat with PDF files. If I click "from URL," paste in my URL, and click go, it fetches that PDF, and then I can literally ask questions of it. This time around, it uses the entire document to answer my questions, and it even gives us some example questions to start with.
How does MobileBrick differ from other models of 3D object reconstruction? Can you explain how the multi-view RGBD dataset was captured using a mobile device? What are some potential applications for MobileBrick's technology in the future? Let's use that one. I don't know why it gave us this service error, but it still seems to be answering: according to page eight of the PDF, the MobileBrick dataset can be used as a resource for researchers investigating multi-view reconstruction using RGBD images from mobile devices. And then I can do the same thing I did with ChatGPT (this is all based on ChatGPT in the background anyway) and say, "Explain that again like I'm a..." let's go with nine-year-old; sometimes I bump up the age so it makes the explanation a little smarter. The MobileBrick dataset is like a big collection of pictures and information about Lego models that were made using a special camera on a phone or tablet. It's like having a big library of Lego instructions that researchers can use to build new things and see if they did it right. This can help them make better robots or video games in the future.
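ChatPDF doesn't publish its internals, but the usual pattern behind "chat with a PDF" tools is retrieval: split the document into pages or chunks, find the ones most relevant to your question, and hand only those to the language model along with the question. Here's a toy sketch of the retrieval half, assuming the per-page text has already been extracted (the page strings here are made up, not quotes from the MobileBrick paper):

```python
def top_pages(pages: list, question: str, k: int = 2) -> list:
    """Score each page by how often it mentions the question's key words,
    and return the indices of the k best-matching pages."""
    words = {w.lower().strip("?.,") for w in question.split() if len(w) > 3}
    scores = [
        (sum(page.lower().count(w) for w in words), i)
        for i, page in enumerate(pages)
    ]
    scores.sort(reverse=True)
    return [i for _, i in scores[:k]]

# Hypothetical page texts standing in for an extracted PDF:
pages = [
    "Introduction: prior work on 3D object reconstruction.",
    "We capture a multi-view RGBD dataset using a mobile device.",
    "Conclusion: MobileBrick supports multi-view reconstruction research.",
]
best = top_pages(pages, "How was the multi-view RGBD dataset captured?")
# The selected pages, plus the question, would then be sent to the LLM.
```

Real tools use embeddings rather than keyword counts, but the shape of the pipeline is the same, which is why ChatPDF can cite "page eight" in its answer.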
Just from asking ChatGPT and ChatPDF some questions, I have a pretty clear understanding of the actual model they're building, what this research paper is talking about, and why it's useful. So that's my methodology for organizing all of this research and helping myself understand it. What I want to do next is dive into some of the advancements that have been happening and some of the crazy stuff going on in the AI space right now.
But before I do, I want to tell you about today's sponsor, Taplio.com. Taplio is a tool that helps you automate your LinkedIn. If you're like me and don't actually use LinkedIn that often, you probably want some automation and some help keeping up with it, because right now LinkedIn is actually a very powerful platform for people in the AI space. There's a ton of action and interest in AI, machine learning, and this kind of technology over there, and it's one of the places you probably want to be interacting if you're trying to get a foothold in this space. Taplio helps you do that. A couple of standout features: they have a post inspiration section where you can see posts currently going viral in your niche. For example, these are all posts on the screen from people talking about AI on LinkedIn, and the posts are going viral; this one has 3,000 likes, then 1,000 likes, 454 likes, 1,000 likes. You can click this little lightning bolt up here and it will generate a variation of a post that's already gone viral, so you can use that as inspiration and let AI write something similar. I just click generate and it crafts an entire new post over here that I can either post right now or add directly to my queue. Another thing you can do is come over to the AI-generated section in the left sidebar and let it write posts for you from scratch. You can see it already wrote a handful of posts; all I need to do is add one to my queue or edit and post it right now. And finally, I want to show you the engage feature down here.
This section will actually find people on LinkedIn that are talking about similar stuff to what you're talking about and then suggest that you go and comment on them.
This found 50 posts for me to engage with that are all similar to the type of content I talk about on LinkedIn. I can write my response right here inside of Taplio, or I can click this button and let Taplio generate a good response for me. Click reply and it will comment on that person's LinkedIn post for me. This can really automate the process of working LinkedIn for you. You can get more information about Taplio at email@example.com/taplio. Thanks again to Taplio for sponsoring this video.
Now let me show you some of the coolest stuff to come out of the AI space in the last few days.
All right, the first cool tech I want to show off is stable diffusion with brain activity, and this one is actually pretty insane. Basically, they put people in fMRI machines and recorded their brain activity while showing them images. Then they ran that brain data through an image decoder and through Stable Diffusion, and were able to reconstruct what the people were seeing.
So you can see they showed someone a picture of a teddy bear, and this is what their brain "saw" after decoding it through Stable Diffusion. They showed them a picture of some trees and a walkway, and it decoded that the brain was seeing these trees and this walkway; an airplane and an airplane; a snowboarder and a snowboarder; this clock tower and something that looks similar to a clock tower. It explains down here how this works: they reconstructed visual images from fMRI signals using a latent diffusion model, namely Stable Diffusion.
Essentially, the way I understand it, the fMRI data gave them these really noisy images with all of these weird pixels, and then the model essentially denoised them down to what the person was actually seeing. This is just wild technology, because it's really close to mind reading: being able to see the images people are picturing in their mind's eye. So yeah, this is some crazy, insane tech coming out right now.
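That denoising idea can be illustrated with a toy loop: start from pure noise and repeatedly nudge each pixel toward what a decoder predicts. To be clear, this is not the paper's actual method, just the general shape of denoising, with a made-up "predicted image" standing in for the fMRI-derived signal:

```python
import random

random.seed(0)

# Stand-in for what the decoder predicts the subject was seeing
# (in the real paper this target comes from fMRI features).
predicted = [0.2, 0.8, 0.5, 0.9]

# Start from pure noise, like the noisy pixels described above.
image = [random.random() for _ in predicted]

def error(img):
    """Squared distance between the current image and the prediction."""
    return sum((a - b) ** 2 for a, b in zip(img, predicted))

start_error = error(image)

# Denoising loop: each step removes a fraction of the remaining noise.
for step in range(50):
    image = [px + 0.2 * (target - px) for px, target in zip(image, predicted)]
```

A real diffusion model predicts and subtracts noise with a trained neural network over thousands of dimensions, but the "start noisy, step toward the signal" loop is the same intuition.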
The next one I want to look at is called Prismer, and it's being developed by NVIDIA. What it can do is take an image input and break it into all of this information. It can build a depth map, using these colors to represent the depth of the image. It can segment out the various layers: it sees the sand, the playing field, the person, the fence, the baseball bat, the chair in the background. It can do object detection: it notices the helmet, the baseball bat, the person, the belt, the footwear, people wearing hats.
It seems to notice the surface, or the texture, of what's going on in the image. It's got OCR detection, which is text and number detection: it noticed there's a 21 on the person's jersey. It's got edge detection, where it can outline the entire image. It pulls all of this data out of a single image, so much so that it can answer questions about the image. What's the person doing? Playing baseball. What's the number of this player? 21. They call it Prismer because of the analogy of a prism: white light goes in and all sorts of rainbow colors come out, and here, one image goes in and all of this information comes out. So this is a really cool and interesting model being developed by NVIDIA, because it lets us take one image input, pull a whole bunch of data from it, really understand what's going on in the image, and even ask questions about it. It's going to take image recognition, and how much computers can understand about images, to a whole new level.
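The prism analogy boils down to running several specialist "experts" on the same image and pooling their outputs into one bundle of facts that a language model can answer questions from. Here's a toy sketch of that structure; the expert outputs and the lookup-style answer function are stand-ins I made up, since the real Prismer experts are neural networks:

```python
# Hypothetical outputs from per-task "experts" run on one image.
experts = {
    "objects":  lambda img: ["person", "helmet", "baseball bat"],
    "ocr":      lambda img: ["21"],
    "segments": lambda img: ["sand", "playing field", "fence"],
}

def analyze(image) -> dict:
    """Run every expert on the same input and pool the results,
    like white light splitting into separate colors."""
    return {name: expert(image) for name, expert in experts.items()}

def answer(analysis: dict, question: str) -> str:
    """Crude stand-in for the language side: look up the pooled facts."""
    if "number" in question:
        return analysis["ocr"][0]
    if "doing" in question:
        return "playing baseball" if "baseball bat" in analysis["objects"] else "unknown"
    return "unknown"

facts = analyze("batter.jpg")  # the image path is a placeholder
```

The real model fuses the expert outputs inside a transformer rather than with if-statements, but this is the one-input, many-outputs shape being described.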
This one's kind of a fun one. It's called Word-As-Image for Semantic Typography. What it essentially does is take a word and then use the meaning of the word to change the font into what the word represents: "croissant" changes into a croissant, the C in "cupcake" turns into a cupcake, and in "Ashtanga yoga" the letters change into yoga poses.
It's able to take the meaning of the word and morph the font into imagery of that word, which is just kind of crazy to me. It's a semantic typography technique where a word illustration presents a visualization of the meaning of the word while also preserving its readability. Here are some other examples: "star," where it put a star in the letter A; "tulip," where it put a tulip in the U.
"Bunny," where it turned the Y into an actual bunny; "Paris," where it put the Eiffel Tower into the A. Now, if we click over to the GitHub page, it says "coming soon," so it doesn't appear to be quite ready for the public yet. I'm excited to play with this one. It looks like they're going to have a Hugging Face space available for it, and once it's ready, that'll be something for us all to go have some fun with.
Here's another one I came across. It's called StyO: stylize your face in only one shot.
Up until now, when we wanted to train our face into the AI, we had to use something like DreamBooth (I did a whole video on it). You upload around 20 images cropped to a certain size, with a certain number from the waist up and from the head up. It takes a good 45 minutes to train, and when it's all done, you have something like a five-gigabyte file that you can pull into Stable Diffusion to make images of yourself. This StyO model claims you can do that kind of thing with just one source image. You can see some of the examples here: they put this source image in, and the results are these images right here, all sorts of variations of this face, and you can tell that all of them are supposed to be the original source material. Same with this one: you can see the original face and what came out of it, all variations of the same face. Personally, I'm really excited about the idea of taking one image, throwing it into a website, and having it generate all sorts of variations off just that one image. It also raises the potential for more deepfakes and makes that kind of thing easier. At the same time, for the ways I use it, things like YouTube thumbnails and fun images for social media, this really excites me.
So Google kind of flopped with their Bard presentation, right? Well, then they went quiet. We didn't hear from them for a couple of weeks, and then they came back to us with PaLM-E.
This looks epic. If anybody had doubts about Google's place in the AI landscape, I'm sure those doubts are quickly going away after seeing what they're doing here. Basically, PaLM-E can understand and use information from the real world to complete tasks. The model uses a combination of words and sensory data, such as images and measurements, to plan and execute tasks like picking up or arranging blocks. It's trained on different tasks and can transfer its knowledge to new situations it hasn't even seen before; it can complete a task with an object it has never encountered. And similar to the Prismer model we were looking at earlier, it can answer questions about images it sees, but it can handle much more complex questions about those images. Here's an example where it was trained to open the drawer and pull out the bag of chips. You can see in the demonstration it pulls out the bag of chips, and the guy on the side messes with the robot for a couple of minutes, pulling the bag of chips out and putting it back, and the robot just continues to do what it was trained to do. This is all using Google's PaLM-E technology. Now, some other examples of how it uses images. It can look at this picture here, and it says: this is what the robot sees. "I am a robot operating in a kitchen. Given an image, when a human asks me to do a task, I will respond with the sequence of actions I would do to accomplish the task with only the items I see." The human input prompt is "use all of the ingredients you see to make a cake batter," and the output is: crack egg, put egg in bowl, put flour in bowl, put sugar in bowl, mix, put in pan, bake, eat, clean up. So based on just what it sees in the image, it gave instructions on how to bake the cake using only the items in the image. Here's another example with two images.
It asks what is in photo one but not photo two. Photo one has sunglasses on top of folded clothes; photo two does not have sunglasses on top of folded clothes; therefore, the sunglasses are in photo one but not photo two. So it spotted that the difference between these two photos is the sunglasses. I didn't even notice that the first time I looked at it. Then: given the image, "I am getting just two custom pizzas for me and my friend. How much should I pay in total?" You can see that the pizzas are $9.99.
The menu also has chicken marsala for $11.99, but it ignores that and only looks at the price of the pizza, saying two pizzas times $9.99 equals $19.98. So it ignored the information it didn't need, grabbed only the information it did need, and did the math on the cost of two pizzas based on just this image prompt.

I hope you enjoyed this video. Let me know what you think of it; I really want to do more videos like this, but I'm always curious what you think in the comments. Do you like videos like this? I can probably do a video a few times a week where I break down all of the coolest advances happening in AI and all of the cool tech that's popped up on my radar, and explain what it is and why I'm excited about it. If you want to see more videos like this, let me know, because I would absolutely love to make more; these are some of the most fun videos I can make. Don't worry, I'm going to keep making Midjourney, Stable Diffusion, ChatGPT, and Bing tutorials and all of that kind of stuff, and I've got some really epic videos in the works that I'm excited to share as well. But some of these are really fun for me: just going and checking out seven cool tools that I came across this week in my research and my building of Future Tools. So what do you think?
And speaking of Future Tools, check out futuretools.io. This is the site where I put all of the cool tools I come across. They're organized, filterable, and sortable.
You can even see which ones people upvoted the most to see which ones other people like the most. And if this is just too overwhelming and you just want the TLDR for the week, click this little button to join the free newsletter and every single Friday, I'll just send you the five coolest tools that I came across for the week. I'll also send you a handful of news articles, a handful of YouTube videos, and one cool way to make money with AI. It's the TLDR of everything that happened in the AI world for the week. And it's completely free and you can find it firstname.lastname@example.org. Thanks so much for tuning in.
I really appreciate you. If you haven't already, maybe click that little subscribe button below. Give it a little thumbs up. By doing that, it'll make sure you see more cool AI videos in your newsfeed. So thanks again for hanging out with me. I really appreciate you taking the time and watching these videos. See you in the next one.