How to Install and Use Midscene, the Revolutionary Automation Tool 🚀

Thursday, Jan 2, 2025 | 7 minute read

GitHub Trend

Experience the future of automated testing with an innovative tool that simplifies interaction through natural language! Improve UI testing efficiency, enjoy intuitive reporting, and take advantage of zero-code onboarding! Embrace seamless automation today! 🚀✨

The future is here, and AI is reshaping the way we work ✨

In today’s rapidly evolving tech landscape, automated testing has become an indispensable part of software development. As artificial intelligence advances, traditional test automation methods are starting to lag behind. Enter Midscene.js! 🤖 It aims to make UI testing more efficient and lowers the barriers of human-computer interaction, so developers can automate tests simply by describing what they want! ✨

Midscene is a powerful tool that enables users to interact with web applications through natural language. Imagine simply describing the actions you want instead of writing complex automation code. This lets developers genuinely experience the convenience and joy that technology brings. 🎉

The Game-Changer in UI Automation: What Is Midscene? 🤔

Midscene.js is an innovative, AI-driven automation SDK designed to transform the way we conduct UI testing and interactions! It leverages multimodal large language models (LLMs), enabling users to control web applications using natural language and significantly enhancing the overall automation experience. ✨

This AI-driven automation solution makes executing UI tests as simple as conversing with a person. It creates a new human-computer interaction paradigm, lowers the learning curve, and boosts productivity, making everything incredibly easy! 🎉

The Unique Appeal of Midscene 🌟

  • Natural Language Interaction: Users can simply describe actions, and the system will understand and execute operations, managing the UI with great ease! 🗣️
  • JSON Data Extraction: Midscene.js allows users to extract necessary data in JSON format, simplifying the complexity of information retrieval—truly a handy tool! 📊
  • Intuitive Assertion Mechanism: Users can validate using natural language, simplifying the verification process and enhancing overall user experience. 🔍
  • Zero-Code Onboarding: Users can utilize the Chrome extension to easily experience Midscene’s powerful features without manually writing code—get started in no time! ⚡
  • Detailed Visual Reports: After each automated task execution, the system generates comprehensive visual reports, allowing users to review every action and its outcome transparently—what a thoughtful feature! 📈
  • Robust Public LLM Support: By integrating public large language models, you can enjoy powerful features without the need for tailored training, accelerating project progress! 💻
  • Flexible Customization Options: Midscene supports various customization options to better suit users’ specific needs, offering flexibility. 🎨

Why Developers Love Midscene ❤️

  • User-Friendly Interface: Whether you’re a newbie or a seasoned pro, everyone can easily get started and quickly master core features! 👩‍💻👨‍💻
  • Open Source Project: Midscene empowers users with secure control over data, ensuring data privacy and security—definitely trustworthy! 🔐
  • Powerful API and Integration Features: Seamlessly integrates with existing tools and adapts to various workflows; its flexibility is unbeatable! 🔗
  • Active Community Support: On the official forums, you can find a wealth of learning resources, tutorials, and use cases, growing together and learning from others in the industry! 🛠️

Midscene.js not only offers a fresh perspective and methods for UI testing but also revolutionizes traditional automation processes, allowing developers to work in a simpler, safer, and more efficient environment. This powerful tool is worth a deep exploration by every developer and tester, ready to embark on a new chapter in automated testing! 🌈

Installing and Using Midscene 🚀

To start experiencing Midscene, you first need to install this command-line tool in your development environment! ✨ Simply run the provided npm command, and you’re good to go:

npm i -g @midscene/cli
  • Explanation: This command globally installs Midscene CLI using npm (the package manager for Node.js), allowing easy access from anywhere! If you only want to use Midscene in a specific project, run the command below:
npm i @midscene/cli --save-dev
  • Note: This command installs Midscene CLI as a development dependency, available only in the current project.

Once the installation is complete, you can execute automated tasks with a simple command, such as running a script file named bing-search.yaml:

midscene ./bing-search.yaml
  • Note: This command directly calls Midscene to perform the specific web operations defined in your YAML script.

You can also run Midscene using npx, for example:

npx midscene ./bing-search.yaml
  • 📝 Note: npx is a package runner that helps you execute CLI tools in your project without needing to install them globally—super convenient!

Code Examples and Explanations 📜

Midscene lets users interact with web pages through simple and intuitive commands. Now, let’s take a look at some code examples showcasing key features with detailed explanations!

Searching 🔍

// 👀 type keywords, perform a search
await ai('type "Headphones" in search box, hit Enter');
  • Explanation: This code utilizes the ai function to simulate a user typing the keyword “Headphones” in the search box and pressing Enter to trigger the search. It’s a smart way to automate user actions—how cool is that!

Querying Product Information 🛒

// 👀 find the items, return in JSON
const items = await aiQuery(
  "{itemTitle: string, price: Number}[], find item in list and corresponding price"
);
  • Breakdown: This snippet uses the aiQuery function to query product information, asking for an array of objects that each contain a product title (itemTitle) and a price (price). The result comes back as JSON-style data, making it perfect for further processing!

Outputting Query Results 📦

console.log("headphones in stock", items);
  • Comment: This line prints the label "headphones in stock" followed by the contents of the items variable to the console. Whatever aiQuery returned is displayed, so you can check the extracted titles and prices at a glance.
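
Because the items come from a language model, it can be worth checking their shape before using them. The following is a minimal sketch, not part of Midscene itself: isItem and affordableItems are hypothetical helper names, and the sample array stands in for a real aiQuery result with made-up values.

```typescript
// Shape requested in the aiQuery prompt above.
interface Item {
  itemTitle: string;
  price: number;
}

// Runtime check: the returned data is untyped at runtime, so verify each entry.
function isItem(value: unknown): value is Item {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.itemTitle === "string" &&
    typeof v.price === "number" &&
    Number.isFinite(v.price)
  );
}

// Hypothetical helper: keep well-formed items at or below a price cap,
// sorted from cheapest to most expensive.
function affordableItems(raw: unknown, maxPrice: number): Item[] {
  if (!Array.isArray(raw)) return [];
  return raw
    .filter(isItem)
    .filter((item) => item.price <= maxPrice)
    .sort((a, b) => a.price - b.price);
}

// Sample data standing in for a real aiQuery result (values are made up).
const sample: unknown = [
  { itemTitle: "Wireless Headphones", price: 59.99 },
  { itemTitle: "Studio Headphones", price: 129 },
  { itemTitle: "Broken entry", price: "N/A" },
];

const picks = affordableItems(sample, 100);
console.log(picks.map((item) => `${item.itemTitle}: ${item.price}`));
```

In a real script you would pass the value returned by aiQuery instead of sample. Entries that do not match the expected shape are dropped silently here; logging them instead may be a better fit when debugging prompts.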

Natural Language Assertion 🔑

// 👀 assert by natural language
await aiAssert("There is a category filter on the left");
  • Interpretation: This code uses the aiAssert function for natural language verification, ensuring that a category filter is present on the left side of the webpage, helping to confirm the layout meets expectations—smart and efficient!

Integration Methods 🔗

Midscene provides multiple integration options, so you can choose as needed:

  1. Automate using YAML scripts
  2. Integrate with Puppeteer
  3. Integrate with Playwright

YAML Example 📄

Here’s a basic example of a YAML script that searches Bing for today’s weather:

target:
  url: https://www.bing.com

tasks:
  - name: search weather
    flow:
      - ai: search for 'weather today'
      - sleep: 3000

  - name: check result
    flow:
      - aiAssert: the result shows the weather info
  • Breakdown: In this example, the target node specifies the web page to open, while the tasks array contains two tasks. The first searches for “weather today” and then uses sleep to pause for 3000 milliseconds so the results page has time to load. The second uses aiAssert to verify that the result shows weather information. Smart thinking!
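
One way to picture the structure described above is as a typed object. The sketch below is only an illustration: the FlowStep, Task, and Script types are hypothetical and cover just the fields that appear in this example, not Midscene's full script format. The totalSleepMs helper is likewise made up to show how the nested structure can be walked.

```typescript
// Hypothetical types mirroring only the fields used in the YAML example above.
type FlowStep = { ai: string } | { aiAssert: string } | { sleep: number };

interface Task {
  name: string;
  flow: FlowStep[];
}

interface Script {
  target: { url: string };
  tasks: Task[];
}

// The same script as the YAML example, written as a plain object.
const weatherScript: Script = {
  target: { url: "https://www.bing.com" },
  tasks: [
    {
      name: "search weather",
      flow: [{ ai: "search for 'weather today'" }, { sleep: 3000 }],
    },
    {
      name: "check result",
      flow: [{ aiAssert: "the result shows the weather info" }],
    },
  ],
};

// Sum every fixed wait so it is easy to see how much idle time a script adds.
function totalSleepMs(script: Script): number {
  let total = 0;
  for (const task of script.tasks) {
    for (const step of task.flow) {
      if ("sleep" in step) total += step.sleep;
    }
  }
  return total;
}

console.log(weatherScript.tasks.map((task) => task.name));
console.log(totalSleepMs(weatherScript));
```

Each task is a named list of steps that run in order, and each step has exactly one key saying what kind of step it is.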

Puppeteer Example 🌐

Here’s a code example for web automation using Puppeteer:

import puppeteer from "puppeteer";
import { PuppeteerAgent } from "@midscene/web/puppeteer";

const sleep = (ms: number) => new Promise((r) => setTimeout(r, ms));
Promise.resolve(
  (async () => {
    const browser = await puppeteer.launch({
      headless: false,
    });

    const page = await browser.newPage();
    await page.setViewport({
      width: 1280,
      height: 800,
      deviceScaleFactor: 1,
    });

    await page.goto("https://www.ebay.com");
    await sleep(5000);

    const mid = new PuppeteerAgent(page);

    await mid.aiAction('type "Headphones" in search box, hit Enter');
    await sleep(5000);

    const items = await mid.aiQuery(
      "{itemTitle: string, price: Number}[], find item in list and corresponding price"
    );
    console.log("headphones in stock", items);

    await mid.aiAssert("There is a category filter on the left");

    await browser.close();
  })()
);
  • Explanation: This code uses Puppeteer to launch a browser, open a new page, and navigate to the eBay website. It then wraps the page in a PuppeteerAgent to search for headphones, extract the titles and prices from the results, and assert that a category filter appears on the left, before closing the browser to free up resources. Such a compact and powerful solution!

Running Playwright Tests ⚙️

Use the following command to execute Playwright tests:

npx playwright test ./e2e/ebay-search.spec.ts
  • Note: This command runs the Playwright test file ./e2e/ebay-search.spec.ts, which is expected to contain the eBay search test. Passing a file path limits the run to matching spec files, so you get feedback quickly. Super easy!

Data Privacy 🔒

Midscene.js is an open-source project released under the MIT License. You can run Midscene in your local environment, and the data it collects is sent only to OpenAI or the model provider you select, so you know exactly where your data goes and can use it with peace of mind!

Custom Models 🧩

Midscene defaults to using the OpenAI GPT-4o model, but users can opt for custom multimodal models to increase flexibility and adapt to various use cases—how great is that!
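
Model settings are typically supplied through environment variables. The sketch below is a hypothetical preflight check you could run before starting a script, not something Midscene provides. The variable names OPENAI_API_KEY, OPENAI_BASE_URL, and MIDSCENE_MODEL_NAME are assumptions based on Midscene's documentation at the time of writing, so please verify them against the version you install. The readModelConfig helper and ModelConfig type are made up for illustration.

```typescript
// A minimal environment map; in Node.js you would pass process.env here.
type Env = Record<string, string | undefined>;

// Hypothetical shape for the settings this check cares about.
interface ModelConfig {
  apiKey: string;
  modelName: string;
  baseUrl?: string;
}

// Fail fast with a clear message when the API key is missing,
// and fall back to the default model named in the article.
function readModelConfig(env: Env): ModelConfig {
  const apiKey = env.OPENAI_API_KEY; // assumed variable name
  if (!apiKey) {
    throw new Error("OPENAI_API_KEY is not set");
  }
  return {
    apiKey,
    modelName: env.MIDSCENE_MODEL_NAME || "gpt-4o", // assumed variable name
    baseUrl: env.OPENAI_BASE_URL, // assumed variable name, optional
  };
}

// Example with placeholder values (not real credentials).
const config = readModelConfig({
  OPENAI_API_KEY: "sk-placeholder",
  MIDSCENE_MODEL_NAME: "my-custom-model",
});
console.log(config.modelName);
```

Midscene reads its own configuration; a helper like this only gives a clearer error earlier when a variable is missing.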

Through this series of steps and examples, I hope you’ll easily get started with Midscene, making the most of its powerful capabilities to automate various web tasks. Give it a try! 🎊

© 2024 - 2025 GitHub Trend
