Phraser aims to simplify prompt writing for DALL-E 2 and similar image AIs, while OpenAI’s Whisper enables free audio transcription.
Image AIs let even people who can barely hold a pen generate creative art, provided they master so-called “prompt engineering”: the art of giving the AI the right image command.
This is not as trivial as it sounds. For one thing, you have to be able to translate an image idea into language that is as visual as possible. For another, generative image AIs such as DALL-E 2, Midjourney, or Stable Diffusion have countless parameters and styles that strongly influence image generation.
The Phraser web software is designed to make prompt engineering easier. You still have to come up with the image idea yourself, but when it comes to finding the style, Phraser guides you through the various parameters of the individual systems.
Through a step-by-step menu, you can
- choose the medium (e.g., photo, template, movie poster),
- write a text description with the most important components,
- choose color, texture, and resolution,
- and decide on camera settings, mood, and era.
After logging in, you receive a prompt tailored to the image AI you selected at the start. The software also shows similar, previously generated images that roughly match your prompt, for inspiration.
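To make the idea concrete, here is a minimal sketch of how menu choices like these could be combined into a single prompt string. It is not Phraser's actual implementation: the `PromptChoices` class, the `build_prompt` function, and the comma-separated output format are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class PromptChoices:
    """Hypothetical container for the choices made in a step-by-step menu."""
    medium: str            # e.g. "photo" or "movie poster"
    description: str       # the image idea in plain words
    color: str = ""
    texture: str = ""
    resolution: str = ""
    camera: str = ""
    mood: str = ""
    era: str = ""


def build_prompt(choices: PromptChoices) -> str:
    """Join all non-empty choices into one comma-separated prompt string."""
    parts = [
        f"{choices.medium} of {choices.description}",
        choices.color,
        choices.texture,
        choices.resolution,
        choices.camera,
        choices.mood,
        choices.era,
    ]
    # Skip any step the user left blank.
    return ", ".join(p.strip() for p in parts if p.strip())


if __name__ == "__main__":
    example = PromptChoices(
        medium="photo",
        description="a lighthouse at dusk",
        mood="melancholic",
        era="1970s",
    )
    print(build_prompt(example))
```

Real image AIs differ in which keywords and parameters they respond to, so a tool of this kind would presumably need a separate template per system rather than one shared format.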
OpenAI Whisper arrives in first tools
With Whisper, OpenAI recently released an open-source model for speech recognition and transcription in various languages. The model is available free of charge, and the first developers are already downloading it and integrating it into tools.
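As a rough illustration of what such an integration can look like, the sketch below transcribes a few short recordings and joins the results. It assumes the open-source `openai-whisper` Python package and ffmpeg are installed. The helper names `whisper_transcriber` and `transcribe_clips` and the file names are hypothetical, and the model size `"base"` is just one of the published options.

```python
from typing import Callable, Iterable


def whisper_transcriber(model_name: str = "base") -> Callable[[str], str]:
    """Return a function that transcribes one audio file with Whisper.

    Assumes `pip install openai-whisper` and a working ffmpeg install.
    The import is deferred so the rest of this sketch runs without them.
    """
    import whisper

    model = whisper.load_model(model_name)

    def transcribe(path: str) -> str:
        # transcribe() returns a dict; the full transcript is under "text".
        result = model.transcribe(path)
        return result["text"].strip()

    return transcribe


def transcribe_clips(paths: Iterable[str], transcribe: Callable[[str], str]) -> str:
    """Transcribe several recordings in order and join the text with spaces."""
    return " ".join(transcribe(p) for p in paths)
```

With the package installed, a call such as `transcribe_clips(["part1.wav", "part2.wav"], whisper_transcriber())` would return one combined transcript. Passing the transcriber in as an argument keeps the joining logic testable without downloading a model.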
With YouTube Whisperer, the cloud platform Hugging Face already hosts an implementation of the model with a simple user interface for transcribing YouTube videos.
Whisper by OpenAI, also on Hugging Face, can turn words spoken into a microphone into text within a few seconds. However, it is only available as a demo that stops after 30 seconds, although you can record several passages in a row.
Probably the most interesting project at the moment is Stage Whisper, in which a team of volunteers is developing a simple, free transcription app based on Whisper for people who are less familiar with the technology. A first version is expected to be released in just a few weeks. Anyone who wants to get involved can sign up on Stage Whisper’s Discord channel.
Another project on GitHub, “Whispering,” aims to use Whisper for real-time transcription.