Got Social?

Experiment: Text to Voice for transcripts

In an effort to make these transcripts available to people who would normally need a screen reader, or who simply prefer listening to reading, the SLER is experimenting with converting the text chat transcripts to a voice-enabled version.

This entire effort is the brainchild of Margaret Czart (SL: Margaret Michalski).  Not only was this her idea, she is also the one who has invested the time to re-edit the written transcript to make it more listenable, taken care of the conversion and upload process, and put the test web page together.  HUGE thanks to Margaret for her work on this.

As you can imagine, this work does take time.  To be sure that time is well invested, we are going to post the next few text-to-voice (t2v) chats here on this blog.  The purpose is to give you an opportunity to provide feedback, which we think is really important.

If you love this idea, tell us.  If you hate this idea, tell us.  If you see or think of anything that can make it better, tell us.  No matter what your thoughts are, please tell us by leaving a comment below.  Even if you just write “hey, great idea” – it will let us know that there is demand for or interest in this.  We will use this feedback (and site traffic) to determine whether the beta test of this project will continue.

Here is the link; we look forward to your input.


Category: Second Life
  • Irix says:

    I couldn’t open the link, authorization required.

    October 26, 2009 at 8:39 am
    • ajblogsat says:

      Great – thanks for letting me know. Which link: the one to get to the Google Sites page from my blog, or the one to download the voice chat mp3 from the Google Sites page?

      October 26, 2009 at 8:43 am
    • ajblogsat says:

      Hmm – I think one setting was off on the Google Sites page, by accident – please try again and let me know if the problem still exists.

      October 26, 2009 at 8:45 am
  • Julia says:

    Hi guys – when I click on the link above to the Google site I get an “insufficient permissions to view” message.

    October 26, 2009 at 8:44 am
    • Julia says:

      ah – all fixed i see :o )

      October 26, 2009 at 8:45 am
  • Frederic Emam-Zade says:

    Hello AJ:

    Frederic here (Zage Farman in SL).

    That link responds that I have “insufficient privileges”, so I can’t evaluate it. I also tried with my Gmail account, but nothing changed; I still have “insufficient privileges”.

    October 26, 2009 at 8:46 am
    • ajblogsat says:

      Hello Frederic/Zage – please try again – I think I fixed the problem.

      October 26, 2009 at 8:51 am
      • Frederic Emam-Zade says:

        Now it’s fixed, yes. I am listening to the voiced transcript and I like it a lot.

        October 26, 2009 at 8:55 am
  • Julia Dando (aka Julala Demina) says:

    My first views:

    Great concept – improving accessibility to activities can only be a good thing. However, I found it difficult to listen to. If it were really clever, perhaps it could identify each individual and use a different voice for each, or at least alternate between several different voices.

    As with most electronic voice synthesizers, the intonation is somewhat of a challenge.

    I look forward to seeing your progress with this – good luck!

    October 26, 2009 at 8:55 am
  • Ilene Frank says:

    Hey, I think this works well! It’s one more way to make this material accessible. As mentioned above, different voices for different participants would be great – but I don’t know that that much fussing would be warranted with every chat transcript. However, I would like to know if that’s a possibility. Is it?

    October 26, 2009 at 9:24 am
    • ajblogsat says:

      I do not believe it is possible using the software that is currently being used. Margaret is at a conference, so once she gets back next week, hopefully she’ll have time to comb through these comments and let us know (here) if that is possible, but I think it’s not. I’m not even sure it is possible at all – how would the software know when to change voices? It’s an interesting idea; we’ll have to wait and see how doable it is.

      October 26, 2009 at 9:34 am
      • Ann Steckel says:

        I looked into this and, as far as I can determine, this is not possible with normal screen reader software. I do not know what Margaret used to create this, but it is doubtful that multiple speaking voices can be cued.

        October 26, 2009 at 1:32 pm
  • arieliondotcom says:

    I like the idea for those who can hear. I would also like to see automated speech-to-text for those who can’t, and to lessen the transcribing burden. I’d advise finding some program other than RealPlayer. It’s a problematic program.

    October 26, 2009 at 9:48 am
  • R. A. Lee says:

    This is a fantastic tool! How can we pilot this for you?

    October 26, 2009 at 10:11 am
    • ajblogsat says:

      Hi Ralee, thanks for checking this out. I’m glad you liked the t2v transcript, although I’m not sure what you mean when you ask “How can we pilot this for you?” thx aj

      October 26, 2009 at 10:14 am
  • Sabine Reljic says:

    This is a great tool, certainly. However, the uniform synthetic voice makes it difficult for me to multitask (I don’t multitask when reading the transcript, but a voice version allows me to, and therefore I do). I have to focus on what is said more than I would in a podcast in order to identify the different speakers. The different tonalities and registers of human voices work as markers (I unconsciously know who is talking and whether there is a change in argumentation, i.e., most of the time I can identify a topic that is important to the speaker), so I know how and when to redistribute my attention.
    (And I have to smile when trying to mentally catch up with the synthesizer when it spells out URLs :-)
    I don’t want to take away from this t2v option. I think that this is a great tool and an awesome idea to make the material more accessible, and to more people. I’d love to see how it develops further.

    October 26, 2009 at 10:52 am
  • David Smith says:

    This is great. In addition to multiple voices (I used to play around with this type of software, so I know that would be a chore), I would like to see what approach we can take to URLs. The addresses embedded in the text serve little purpose when read aloud, but if a future software package extracted the URLs to a text file (bibliography style), they might be put to better use. At the point of reference, the text-to-speech would state a number (Link3), and you could quickly make a note to view it from the bibliography. Others might have additional suggestions.

    October 26, 2009 at 11:10 am
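
The bibliography-style approach suggested in the comment above could be sketched roughly as follows. This is only an illustration, not something the SLER team or NaturalReader provides: the function names `extract_links` and `format_bibliography` are hypothetical, and the sketch assumes that URLs (including SLurls) start with `http://` or `https://` and end at the next whitespace. The edited transcript would be run through this step before being handed to whatever text-to-speech program is in use.

```python
import re

# Assumption: every URL starts with http:// or https:// and ends at whitespace.
URL_PATTERN = re.compile(r"https?://\S+")


def extract_links(transcript):
    """Replace each URL with a spoken marker such as 'Link 3' and collect the URLs."""
    links = []

    def replace(match):
        raw = match.group(0)
        # Keep sentence punctuation that happens to trail the URL in the text.
        url = raw.rstrip(".,;:!?)")
        trailing = raw[len(url):]
        if url not in links:
            links.append(url)  # a repeated URL reuses its earlier number
        return f"Link {links.index(url) + 1}{trailing}"

    spoken_text = URL_PATTERN.sub(replace, transcript)
    return spoken_text, links


def format_bibliography(links):
    """Build the numbered list that would go into the companion text file."""
    return "\n".join(f"Link {i}: {url}" for i, url in enumerate(links, start=1))


if __name__ == "__main__":
    sample = "See http://example.com/a, then https://example.org/b."
    spoken, found = extract_links(sample)
    print(spoken)
    print(format_bibliography(found))
```

The listener would then hear "Link 1" instead of a spelled-out address, and the numbered list could be posted next to the audio file.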
  • Birdie Newborn says:

    Useless. If the voices were all individual it might make sense, but so many people speak and respond in such an achronological order, and linear sound gives no way to check back to see who responded to whom, or to sort the various threads, that I found it …useless. It also takes up a lot of bandwidth. I can’t think of who might find it useful.

    October 26, 2009 at 11:44 am
    • Irix says:

      I agree with Birdie.
      I don’t think that such a translation would be very helpful.
      Instead, why don’t you do the opposite? That is, use voice during the meeting and provide an instant, revised transcript of it.

      I have participated in some meetings and decided to read the transcripts of the others. These are my thoughts:
      1. With voice there is a better use of time: in only one hour more issues can be discussed, and in a clearer way, if the discussion is well conducted.
      With text discussion a lot of simultaneous threads can develop, and that is not always a positive result. (I have often left meetings, after the end, with a sense of incompleteness about the subject discussed.)
      2. People who cannot use voice can follow the discussion through the instant transcript. In this way, both people who cannot read and people who cannot listen can participate.
      3. You could then make available both the entire conversation and the written transcript, so that everyone can make the proper choice.

      Of course, the difficult part would be managing the instant transcript. And, obviously, I do not have answers, only questions ;)

      I am pretty sure you have already discussed this solution, but I thought this could be a good occasion to share my thoughts.
      Anyway, I appreciate your initiatives and efforts!!!

      October 27, 2009 at 8:20 am
  • Rosanna says:

    This voice transcript is fantastic! I am especially impressed with the screen reader’s natural-ness!
    Would it be possible to add something like tracks, so I could skip the introductory SLER information or the shout-out and go directly to the content, for example?

    October 26, 2009 at 1:49 pm
    • Oronoque Westland says:

      In the absence of tracks, I opened the file in Windows Media Player. That allowed me to skip over sections as I pleased.

      October 27, 2009 at 10:18 am
  • Serenek Timeless says:

    Voice transcripts are a good idea in principle, but these are very hard to follow for many reasons. Others have commented on the problem of the URLs. But the lack of pauses at the ends of sentences and phrases also makes everything hard to parse and understand. Having different voices for male and female speakers would help make it clear when the speaker changes. For now I’m going to stick to the written transcripts, since I can never make the current meeting time.

    October 26, 2009 at 1:55 pm
  • Rosanna says:

    Furthermore, the editing of the text chat is extremely good.

    October 26, 2009 at 1:57 pm
  • Oronoque Westland says:

    I was able to download and listen to the file without any problem. I don’t have time to listen to the entire session right now, but wanted to let you know that what I heard sounded good. I think this is a big plus for persons who have visual impairments or have some other reason for opting for audio over text. Three cheers for Margaret. As someone who produces radio broadcasts, I know this was no quick and easy undertaking.

    October 27, 2009 at 10:16 am
  • H. L. Spiegel says:

    Everything was fine, and generally the pace was good, though the voice zoomed through the URLs (granted, anyone listening to these would figure out where to go!).

    October 27, 2009 at 10:28 am
  • Margaret Czart says:

    Hello Everyone,

    I have glanced through the comments and will do my best to answer a few of the questions.

    1. What program was used? I used NaturalReader Personal Version. It provides you with 1 female and 1 male voice. There is a professional version, but you just get 2 additional voices.

    2. Can you have multiple voices? Well, if you create a separate file for every individual and then somehow combine everything together, then the answer is “yes”. Unfortunately, I am not technical enough to do that.

    3. Can tracks be created? Well, from what I have tested for my own purposes, you can adjust the program’s settings to either divide the transcript automatically in some random manner OR divide it manually. Do we need this?

    4. Can something be done with URLs & SLurls? I don’t think there is an easy way. We can say “look at the transcript”, but that does not help those with disabilities. I am open to ideas.

    I understand that ideally it would be best to have a nice variety of voices, but I have not found any software that allows you to do that. Also, this is one of the best programs I found that actually sounds natural. There may be better ones out there, but I just did not find them. Finally, I do not want to take all of the credit for this idea, because all of us SLERs work as a team and every discussion brings new ideas.


    November 5, 2009 at 10:25 pm
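
The "separate file for every individual" idea in answer 2 above might be prepared along these lines. This is a rough sketch under stated assumptions, not a description of how NaturalReader works: it assumes each transcript line has the form `Name: text`, the function names are hypothetical, and the voice labels are plain placeholders rather than real program settings. The sketch only decides which voice should read which turn; producing and joining the audio would still have to happen in the text-to-speech program and an audio editor.

```python
from itertools import cycle


def split_by_speaker(transcript):
    """Parse 'Name: text' lines into (speaker, text) turns.

    Consecutive lines from the same speaker are merged into one turn.
    Lines that do not match the assumed format are skipped.
    """
    turns = []
    for line in transcript.splitlines():
        name, separator, text = line.partition(": ")
        if not separator:
            continue
        name, text = name.strip(), text.strip()
        if turns and turns[-1][0] == name:
            turns[-1] = (name, turns[-1][1] + " " + text)
        else:
            turns.append((name, text))
    return turns


def assign_voices(speakers, voices):
    """Give each distinct speaker a voice, reusing voices when they run out."""
    pool = cycle(voices)
    assignment = {}
    for name in speakers:
        if name not in assignment:
            assignment[name] = next(pool)
    return assignment


def build_script(transcript, voices):
    """Return (voice, speaker, text) entries in the original speaking order."""
    turns = split_by_speaker(transcript)
    voice_for = assign_voices([name for name, _ in turns], voices)
    return [(voice_for[name], name, text) for name, text in turns]


if __name__ == "__main__":
    chat = "Alice: Welcome all\nAlice: Let's start\nBob: Hello\nCarol: Hi"
    for voice, speaker, text in build_script(chat, ["female", "male"]):
        print(f"[{voice}] {speaker}: {text}")
```

With only two voices available, several participants end up sharing a voice, so this would help mark changes of speaker more than it would identify individuals.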
  • Ann Steckel says:

    Margaret – Your time and skill are much appreciated. Thank you for creating an accessible alternative to text.

    November 6, 2009 at 12:42 pm
  • Casey Ashe says:

    I echo Julia’s comments. It would also be good if the text could be marked for pauses at least, and for some accent stresses (if possible), which would then be reflected in the way the synthesized voice reads the text. This would help the listener follow transitions from one speaker to another and better digest what is said.

    It is a good idea and a useful concept.

    November 17, 2009 at 5:20 pm
