Voice recognition and activation!


dmale7
05-12-2003, 02:18 PM
I have lived through Wondersilk, Photoskins, launcher wars, Mythomania, DVD ripping to Clie viewing, WiFi, Bluetooth, tappads, JackFlash, and CF drivers... I want more!

Am I the only person who thinks that voice activation and recognition would make the NXs and NZs sickeningly awesome? I'd like to know how many of us agree, and what some of you think could be done to bring this about. :cool:

ldbobby
05-12-2003, 03:22 PM
I was thinking this too, that would be some sick stuff!! :D

ashVID
05-12-2003, 03:39 PM
Very doable... at least for simple commands. I see no reason you couldn't at least have a VA launcher plug-in...


ash =o)

george0
05-12-2003, 03:43 PM
not sure if the processor will be fast enough to do this. also, we'd probably need Sony to release some kind of API for the voice recording hardware?

doctor
05-12-2003, 03:46 PM
What about voice calling services (e.g. MSN), and what about a webcam? We have a mic, we have a speaker, and we have a camera.

OcellNuri
05-12-2003, 03:49 PM
And we don't have APIs....

javadog
05-12-2003, 03:52 PM
Originally posted by OcellNuri
And we don't have APIs....

Absolutely... Sony has not given out (and probably won't) the APIs for developers to make voice recordings. It's right along the lines of CF slot support... We've had no problem adding voice recording to the Tungsten, etc., but Sony is a no-go so far. (We even asked Sony about this at PalmSource last week :( )

PalmGearJenn
05-12-2003, 04:12 PM
I want this too. I have heard rumors that it is in the works but I cannot say to what extent.

OcellNuri
05-12-2003, 04:17 PM
Originally posted by javadog


Absolutely... Sony has not given out (and probably won't) the APIs for developers to make voice recordings. It's right along the lines of CF slot support... We've had no problem adding voice recording to the Tungsten, etc., but Sony is a no-go so far. (We even asked Sony about this at PalmSource last week :( )

I'm curious what the tone of their reply was.

Sicarius
05-12-2003, 09:06 PM
My ideal would be to have voice recognition on the Clie that could work through a bluetooth headset.

Couple that with the ability to hand off the headset to my phone and it would be perfect.

For those of you who may say "Doesn't your phone already have voice dial?" Well, yes, it does, but there are limits. There are only so many voice entries and the Clie, with much greater memory, would allow me to put a voice command on all entries.

Also, I can envision another form of "voice dialing" -- just say the number: five-five-five-one-two-one-two. My phone doesn't have that feature.

Just some thoughts, probably not possible.
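Sicarius's "just say the number" idea is actually one of the easier cases, because once a recognizer has turned the audio into words, the vocabulary is only the ten digits. A minimal sketch, assuming some hypothetical recognizer has already produced a list of digit words (no such Clie API is known from this thread):

```python
# Map the small, fixed vocabulary of spoken digits to dial characters.
DIGIT_WORDS = {
    "zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3",
    "four": "4", "five": "5", "six": "6", "seven": "7",
    "eight": "8", "nine": "9",
}


def words_to_dial_string(words):
    """Turn recognized words like ['five', 'five', ...] into '555...'.

    Raises ValueError on any word outside the digit vocabulary, so a
    misrecognition is rejected instead of silently dialing a wrong number.
    """
    digits = []
    for word in words:
        key = word.strip().lower()
        if key not in DIGIT_WORDS:
            raise ValueError(f"not a digit word: {word!r}")
        digits.append(DIGIT_WORDS[key])
    return "".join(digits)


spoken = "five-five-five-one-two-one-two".split("-")
print(words_to_dial_string(spoken))  # 5551212
```

Rejecting unknown words outright is a deliberate choice: with a vocabulary this small, "I didn't catch that" is safer than a best guess.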

dmale7
05-13-2003, 12:15 AM
It seems to me... like the CF driver, there must be a way around Big Brother Sony. Maybe an app that can be 'attached' to the voice recorder? I hope my techie friends out there get their brain juices going. It seems like if we want anything new, it always begins here at the SOURCE. Agreed?

mouthdrummer
05-13-2003, 12:18 PM
Originally posted by dmale7
It seems to me... like the CF driver, there must be a way around Big Brother Sony. Maybe an app that can be 'attached' to the voice recorder? I hope my techie friends out there get their brain juices going. It seems like if we want anything new, it always begins here at the SOURCE. Agreed?

That sounds like a challenge to me! What do you say folks?

;)

nx70
05-13-2003, 04:42 PM
Voice recognition might work at this processor speed, provided it were coded in assembler; but yes, the APIs...
Still, the webcam and voice features together might be too much, don't you think?

keesercc
05-13-2003, 07:05 PM
agreed, this would be cool. a CF cellular phone card w/voice commands would be the shiznittle. man you could do everything... call a friend, record a movie, surf the net, watch a movie, play a game, record a voice memo, draw a picture, take a picture, man the NX is versatile.

dmale7
05-14-2003, 08:57 AM
Question. Why would the NX or NZ require more processing power for voice recognition? Several people have stated that more power would be needed for this to work. Why? My old Audiovox cell phone could do it. I can understand how something like speech-to-text could require more juice, but I believe the NX and NZ should be able to handle it. Please correct me if I'm wrong.

(I'm tryin' to start a crusade here...if you haven't noticed)

patrickl
05-14-2003, 09:10 AM
I had a program called Incube on a 386 years ago. That hardware performed voice (command) recognition too. I guess a Clie has way more power than that.

nx70
05-14-2003, 10:16 AM
Do you two mean only voice commands like on cell phones (e.g. open "VoiceRecorder"), or do you mean speech-to-text recognition? I am talking about the latter, since I have no need for simple voice commands.

dmale7
05-14-2003, 03:48 PM
Originally posted by nx70
Do you two mean only voice commands like on cell phones (e.g. open "VoiceRecorder"), or do you mean speech-to-text recognition? I am talking about the latter, since I have no need for simple voice commands.

Both, actually. :D

dmale7
05-14-2003, 03:59 PM
How about something like this...

An app that allows you to 'train' the NX or NZ to write what you say, as with voice COMMANDS. Maybe a 'library' of voice commands could be compiled that do what we preset them to do, whether that's writing text or performing some other operation. Something like an audio McPhling... or shortcuts, but with recorded voice commands.
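The "train it, then build a library of commands" idea is essentially speaker-dependent template matching, a common approach for simple command recognition on modest hardware. A minimal sketch, assuming each utterance has already been reduced to a sequence of feature vectors (real systems typically use something like MFCCs; the numbers below are made up), that compares a new utterance against every trained template with dynamic time warping (DTW) and picks the closest:

```python
import math


def dtw_distance(a, b):
    """Dynamic time warping distance between two feature sequences.

    a, b: lists of equal-length feature vectors (lists of floats).
    DTW lets a command spoken slowly match the same command spoken quickly.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best cumulative distance aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # Euclidean frame distance
            cost[i][j] = d + min(cost[i - 1][j],      # stretch b
                                 cost[i][j - 1],      # stretch a
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def recognize(utterance, library, reject_above=None):
    """Return the command whose trained template is closest, or None."""
    best_name, best_dist = None, float("inf")
    for name, template in library.items():
        dist = dtw_distance(utterance, template)
        if dist < best_dist:
            best_name, best_dist = name, dist
    if reject_above is not None and best_dist > reject_above:
        return None  # too far from every template: not a known command
    return best_name


# Hypothetical library built during training (one template per command).
library = {
    "open memo": [[0.0, 1.0], [0.5, 1.5], [1.0, 2.0]],
    "play music": [[3.0, 0.0], [2.5, 0.5], [2.0, 1.0]],
}
# Same shape as "open memo" but spoken more slowly (frames repeated).
slow_open_memo = [[0.0, 1.0], [0.0, 1.0], [0.5, 1.5], [1.0, 2.0], [1.0, 2.0]]
print(recognize(slow_open_memo, library))  # open memo
```

The `reject_above` threshold matters in practice: without it, every noise burst is "recognized" as whichever command happens to be nearest.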

Pdasrock
05-14-2003, 05:17 PM
I think that the NX has the capability to do this, and I think there are people willing to capitalize on it and write some software; the problem is the APIs. Sony recently released some sort of sound API, right? (I didn't really follow what exactly it was.) If they are willing to release APIs where they see a genuine need, which may or may not be the case here, then why not start a petition? We could start one here and post links on other NX sites. Cheers

deadlyfoez
05-14-2003, 10:36 PM
Ya know, I thought of this forever ago and voiced my ideas, but people shot it down as "IMPOSSIBLE". It would be awesome to have a launcher that could work with voice recognition... and also have user-defined sounds other than "clicks". Too bad I don't know how to program Palm OS, or I would do it myself and give it away for free.

CosmicBlend
05-14-2003, 11:31 PM
Whatever happened to the NX webcam development for use with the WiFi card?

n2ifp
05-14-2003, 11:33 PM
I want some Sony recognition for our needs!

keesercc
05-15-2003, 12:27 AM
Originally posted by CosmicBlend
Whatever happened to the NX webcam development for use with the WiFi card?

yeah, whatever happened to that????

ballistic
05-15-2003, 06:37 AM
I started this thread (http://www.cliesource.com/forums/showthread.php?s=&threadid=1036&highlight=thinktank) a while back with a similar wish for voice recognition.

Excerpts.

Originally posted by ballistic

Text to Speech
My same 5 year old MP2100 can read text to me in a computerized voice (not a 'built-in' feature though). I'd like my NX to read my email, webclips, ebooks to me while I'm driving to work (using some type of cradle, and my car's speakers). This would make much better use of my commute than listening to Stern or Imus. The Newton does it with a 161MHz processor and 4MB of 'heap' memory. My NX's processing power should be able to handle the task without breaking a sweat.

Voice Recognition
If a cell phone can voice dial, why can't my NX recognize a limited dictionary of voice commands so that I can control my NX hands-free during my commute? I'd like to be able to tell my NX to read my email, ebooks, webclips, etc. How about having the NX read me an email message, then I can say, "Clie, Reply to Message." The NX could open a new voice memo and I could dictate my reply. When I get home or to work, I could put my NX in the cradle, the voice memo would get imported into a program such as Dragon Naturally Speaking on the desktop. Using the desktop's superior processing power for voice recognition, my voice memo would be converted to text, and I could then preview and edit my reply. When I'm satisfied, I hit SEND and my email reply is sent.


I'm still waiting.
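ballistic's "limited dictionary of voice commands" can be modeled as a fixed phrase table guarded by a wake word, so only speech that starts with "Clie" is treated as a command and everything else (the radio, a passenger) is ignored. A minimal sketch, assuming some hypothetical recognizer has already turned the audio into plain text; the phrases and action names are illustrative, not an existing Clie API:

```python
WAKE_WORD = "clie"

# Hypothetical fixed vocabulary: spoken phrase -> name of the action to run.
COMMANDS = {
    "read my email": "read_email",
    "read webclips": "read_webclips",
    "reply to message": "start_voice_memo_reply",
}


def parse_command(recognized_text):
    """Return the action name for 'Clie, <phrase>', else None."""
    # Normalize case and strip punctuation so "Clie, Reply to Message." matches.
    cleaned = "".join(
        ch for ch in recognized_text.lower() if ch.isalnum() or ch.isspace()
    )
    words = cleaned.split()
    if not words or words[0] != WAKE_WORD:
        return None  # no wake word: not addressed to the Clie
    phrase = " ".join(words[1:])
    return COMMANDS.get(phrase)  # None for phrases outside the dictionary


print(parse_command("Clie, Reply to Message."))  # start_voice_memo_reply
print(parse_command("reply to message"))         # None (no wake word)
```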

dmale7
05-15-2003, 07:59 AM
I think we should keep this going. I've seen a lot in the past 5 months as a proud NX owner. I wonder if Eruware would like to tackle this. Are you there Ayasin?

Pdasrock
05-15-2003, 12:01 PM
Come on sony, release the dang APIs...

fgarcia59
06-10-2003, 07:29 PM
Originally posted by dmale7
Question. Why would the NX or NZ require more processing power for voice recognition? Several people have stated that more power would be needed for this to work. Why? My old Audiovox cell phone could do it. I can understand how something like speech-to-text could require more juice, but I believe the NX and NZ should be able to handle it. Please correct me if I'm wrong.

(I'm tryin' to start a crusade here...if you haven't noticed)

What about text to speech? I would love my NX to speak my reminders and appointments, so I can hear them through the headphones while I am listening to my MP3 music.

Thanks

onyxworld
06-11-2003, 02:12 PM
Originally posted by keesercc
agreed, this would be cool. a CF cellular phone card w/voice commands would be the shiznittle. man you could do everything... man the NX is versatile.

hi, i agree; this would be cool. i found this link for a cf phone card.

http://www.convergentech.com/cfgsmgprs.htm

pocket pc software, though. i haven't done any pocket pc to palm os cross-compiling. don't know how hard that would be.

the card responds to standard at commands. therefore, from a terminal program, one should be able to make a call.

i took advantage of the amazon/free shipping deal. i have not received the nx-70 yet. hence, i haven't had a chance to test cf slot addressability using eruware drivers.


kp
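On a card that answers standard AT commands, placing a call comes down to writing a string to the serial port. A minimal sketch of building the command strings; the `send` callback is a hypothetical stand-in for whatever serial-port write a Palm driver would provide (no such driver exists yet in this thread). In the GSM AT command set, a trailing `;` on `ATD` requests a voice call rather than a data call, and `ATH` hangs up:

```python
def dial_command(number, voice=True):
    """Build a dial command such as 'ATD5551212;\\r'."""
    allowed = set("0123456789+*#")
    digits = "".join(ch for ch in number if ch in allowed)  # drop '-', spaces
    if not digits:
        raise ValueError(f"no dialable digits in {number!r}")
    # ';' = voice call in the GSM AT command set; '\r' terminates a command.
    return "ATD" + digits + (";" if voice else "") + "\r"


HANG_UP = "ATH\r"


def place_call(number, send):
    """send: hypothetical callback that writes a string to the serial port."""
    send(dial_command(number))


sent = []
place_call("555-1212", sent.append)
print(repr(sent[0]))  # 'ATD5551212;\r'
```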

ayasin
06-11-2003, 02:46 PM
Originally posted by onyxworld


i haven't done any pocket pc to palm os cross-compiling. don't know how hard that would be.

Nearly impossible. Palm does not use the standard C APIs such as those in stdio.h or stdlib.h; it has its own versions of everything. Also, the GUI and sound APIs are totally incompatible. You're basically looking at rewriting an app to go from PPC to Palm or the other way.

pturp
06-11-2003, 04:23 PM
Many years ago there was an add-on (from IBM, on the PCs they sold) for Windows 3.11: you could associate a 'voice print' with a command to execute. Since this was before the Pentium, I guess it did not take a lot of CPU cycles. And I gather it worked very well.

onyxworld
06-11-2003, 04:48 PM
Originally posted by ayasin


Nearly impossible. Palm does not use the standard C APIs such as those in stdio.h or stdlib.h; it has its own versions of everything. Also, the GUI and sound APIs are totally incompatible. You're basically looking at rewriting an app to go from PPC to Palm or the other way.

appforge's mobileVB development environment for visual basic seems to handle all of those issues seamlessly. sony ericsson licensed it for the p800. as of now, however, it does not support os 5. have you used it?

abosco
06-11-2003, 04:51 PM
I say we start a large Sony emailing campaign. It didn't do any good for getting Sony off their ***, but it got other developers to turn their heads and look into the project since so many people wanted it. I say we gather everybody at ClieSource and get them to send Sony Support mass emails demanding that their proprietary APIs for the camera, voice recorder, sound, CF, and everything else be released to developers. I honestly don't understand how something like a released voice recorder API would hurt them. They aren't making money off of NOT having anything available for it. If somebody makes an in-demand application for your hardware, wouldn't more people be attracted to it?

Sony, just because you're sitting on your fat *** doesn't mean motivated developers have to.

Get IBM ViaVoice or something! We are rapidly approaching the processing power and memory requirements to do so!

hecklerz
06-11-2003, 05:41 PM
Originally posted by keesercc
agreed, this would be cool. a CF cellular phone card w/voice commands would be the shiznittle. man you could do everything... call a friend, record a movie, surf the net, watch a movie, play a game, record a voice memo, draw a picture, take a picture, man the NX is versatile.

Yeah and never have to buy another Clie. Exactly what Sony doesn't want to have happen. Why give you the ability to add features to your current model and not give you a reason to buy the next greatest. :D Sony seem to want to be a bit like Apple where the Clie is concerned. Just my $.02 :)

ayasin
06-11-2003, 08:27 PM
Originally posted by onyxworld


appforge's mobileVB development environment for visual basic seems to handle all of those issues seamlessly. sony ericsson licensed it for the p800. as of now, however, it does not support os 5. have you used it?

I haven't used it, I mostly use C/C++ for Palm programming (and PPC for that matter). I don't want to start a war here but I personally don't believe that a VB clone would be up to the task for this type of app. For a GUI app like a checklist or something it's probably fine, but I question the performance and quality you'll get from it for an app of this nature.

onyxworld
06-11-2003, 09:21 PM
Originally posted by ayasin


I haven't used it, I mostly use C/C++ for Palm programming (and PPC for that matter) ... but I question the performance and quality you'll get from it for an app of this nature.

you're probably right about vb code robustness. i was surprised that sony used it for the p800.

the real question is why isn't everything (for mobile devices) written in java since it's conceived for portability. byte-code might be a little larger than compiled c++. this really isn't an issue, though, with the availability of relatively cheap memory.

incidentally, the card in question uses a 16550 uart serial controller. i contacted the chip manufacturer to see if palm os drivers are forthcoming for the cmcs gprs card in question.

hopefully this isn't too much information.

ayasin
06-11-2003, 10:01 PM
Originally posted by onyxworld
the real question is why isn't everything (for mobile devices) written in java since it's conceived for portability.

For several reasons
1. Java isn't fast; it needs to be interpreted. This can be somewhat mitigated by install-time compilation, but that is complicated by the fact that some features of Java rely on its being interpreted.

2. Java has quirks... for example, it doesn't do math correctly (IEEE 754 standard).

3. Write once run anywhere is a dream of java, not a reality...just ask people who write lots of java code for a living :).

On the plus side Java natively implements a threading model that can be used to create (green) threads in single threaded environments. A better solution of course would be for Palm to license the threading portion of the AMX kernel API for redistribution and compile the kernel with 4 or 5 extra thread slots reserved for user apps.

If you're interested in checking out Java for embedded systems, there's a decent implementation of it for Palm... I think you can get it from PalmGear.
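Green threads are scheduled by a language runtime rather than by the OS kernel, which is why they can run on a single-threaded system. A minimal sketch of the idea using Python generators as cooperative tasks: each task runs until it voluntarily yields, and a round-robin scheduler resumes the next one. This illustrates the concept only; it is not how any particular Java VM implements green threads, and the task names are made up:

```python
from collections import deque


def task(name, steps, log):
    """A cooperative task: does one unit of work, then yields control."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield  # hand control back to the scheduler


def run_round_robin(tasks):
    """Resume each task in turn until all of them have finished."""
    ready = deque(tasks)
    while ready:
        current = ready.popleft()
        try:
            next(current)          # run until the task's next 'yield'
            ready.append(current)  # still alive: back of the queue
        except StopIteration:
            pass                   # task finished: drop it


log = []
run_round_robin([task("A", 2, log), task("B", 3, log)])
print(log)  # ['A:0', 'B:0', 'A:1', 'B:1', 'B:2']
```

The catch is the word "cooperative": a task that never yields starves everyone else, which is one reason kernel-level thread slots (as suggested above) would be the better fix.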

dmale7
06-12-2003, 07:06 AM
Ayasin, in your opinion, is voice recognition and/or voice command doable on the NX and NZ? Can the mic be utilized independently from the voice recorder app?

onyxworld
06-12-2003, 10:09 AM
Originally posted by ayasin


3. Write once run anywhere is a dream of java, not a reality...just ask people who write lots of java code for a living :).

i have actually been coding java for two and three-tier systems for a living since 1997. the last two years, i have been developing systems to push content to embedded devices. i for one think java--o-o languages in general--is great. so maybe you meant people who code the embedded system devices.

part of the challenge is finding the right tool for the job. it's so easy to get stuck in one vendor's tool set; but even that might change with sun's introduction at javaone of the reduced or simplified tool set and two-tier development environment. we'll see.

i learned in the early 1990's while hacking linux that most things are doable--if not by me, then by someone else. you just need enough sets of eyeballs (information) to put the pieces in the puzzle.

to that end, i've been following the cf-modem thread in this forum. although i am not really a hardware guy, it seems to me that if the drivers they are using to access the (serial) modem work, then that driver or one very similar to it should be sufficient to control a phone card because all of the functionality of the card is accessed using hayes' at commands.
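Because everything on such a card goes through AT commands, the driver side mostly needs to write a command and then read lines back until a final result code arrives. A minimal sketch of that read loop, assuming the serial driver hands back received text one line at a time (the `lines` iterable is a stand-in for it). `OK`, `ERROR`, `NO CARRIER`, `BUSY`, `NO ANSWER`, `NO DIALTONE` and `CONNECT` are standard Hayes-style final result codes, and `+CME ERROR:` is the GSM extended error report:

```python
FINAL_CODES = ("OK", "ERROR", "NO CARRIER", "BUSY", "NO ANSWER", "NO DIALTONE")


def read_response(lines):
    """Collect information lines until a final result code is seen.

    lines: iterable of text lines as received from the card.
    Returns (final_code, info_lines). Raises RuntimeError if the input
    ends without a final result code.
    """
    info = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # responses are padded with blank lines
        if line in FINAL_CODES or line.startswith(("CONNECT", "+CME ERROR:")):
            return line, info
        info.append(line)
    raise RuntimeError("no final result code received")


# e.g. the card's reply to AT+CSQ (signal quality query)
reply = ["", "+CSQ: 17,99", "", "OK", ""]
print(read_response(reply))  # ('OK', ['+CSQ: 17,99'])
```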

now, for voice recognition, i have worked with voicexml client-server environments that could control external, web-enabled tasks. i don't know about controlling internal tasks. does the palm os 5 sdk allow access to the mic and dsp? i mean, if you can sample it, you can interpret it. then you pass the interpreted event to the rest of the environment. palm os must have event handlers, right?
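Palm OS applications are indeed built around an event loop (an app repeatedly calls `EvtGetEvent` and dispatches what it gets). A minimal sketch of how an interpreted voice command could travel through such a loop, modeled in Python with an ordinary queue; the `voiceCommand` event kind and the recognizer that posts it are hypothetical, since no Palm OS API for either is known from this thread:

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Event:
    kind: str            # illustrative event names, e.g. "voiceCommand"
    data: object = None  # payload, e.g. the recognized command name


event_queue = deque()
handlers = {}


def on(kind):
    """Decorator that registers a handler for one kind of event."""
    def register(fn):
        handlers[kind] = fn
        return fn
    return register


def post_voice_command(command_name):
    """Called by a (hypothetical) recognizer after it interprets audio."""
    event_queue.append(Event("voiceCommand", command_name))


def run_event_loop():
    """Drain the queue, dispatching each event to its handler if one exists."""
    results = []
    while event_queue:
        event = event_queue.popleft()
        handler = handlers.get(event.kind)
        if handler is not None:
            results.append(handler(event))
        # unhandled events are dropped here; a real loop would pass them
        # on to the system's default handling
    return results


@on("voiceCommand")
def launch_app(event):
    return f"launching {event.data}"


post_voice_command("VoiceRecorder")
print(run_event_loop())  # ['launching VoiceRecorder']
```

The design point is that the recognizer never calls the application directly; it only posts an event, so the rest of the app handles a voice command exactly like a tap.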

Unregistered
06-12-2003, 11:11 AM
They are doing some voice recognition work at Carnegie Mellon that resulted in code that runs nicely on a Toshiba Libretto, a wonderful little laptop variant in the Pentium 100-200 class.

The NX's should be of roughly equivalent power.

BTW, I have found that routing on my GPS software would be an excellent reason to have more processor power on my NX. Maybe we could have a switch that would allow the unit to run at higher speed for specific tasks and lower speeds for most mundane tasks to conserve power. Hmmmm....

BalBurgh
06-12-2003, 11:13 AM
Last post was by me.

So how come I can go to my profile page and the board knows it's me, but I have to log in when I want to post? Moreover, when I post, why am I listed as a guest instead of registered?

What's the deal?

lal2707
06-12-2003, 01:56 PM
There are so many cool features we could use the NX for, but those Sony folks do not want to consider their customers! If they checked out this site, they would get so much input that future products could be much better. As I keep saying, Sony will learn, just like IBM, MSoft et al., that in the end it's much better to be open. Go on, release the APIs and documentation... you know it makes sense!

ayasin
06-12-2003, 02:21 PM
Originally posted by dmale7
Ayasin, in your opinion, is voice recognition and/or voice command doable on the NX and NZ? Can the mic be utilized independently from the voice recorder app?

Yes, it's doable, with a huge but... someone will have to put in a huge amount of time to figure out exactly how to drive the Sony hardware to get the mic data, and then figure out how to write the code that analyzes the data to determine what to do (voice commands). Without Sony's help this will likely prove to be more difficult than the CF driver. The additional problem is that since Sony doesn't release the API, they are free to make whatever changes suit their fancy, even within the same model line (look what happened with JackFlash and some of the TG50s). If they release the API, there's an implied commitment from the vendor to support it... this may be part of the reason they don't want to do this.
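Once the mic data can be read at all, one of the first analysis steps is usually endpoint detection: finding where speech starts and stops so the recognizer only looks at that stretch. A minimal sketch using short-time energy, assuming the samples arrive as a list of signed 16-bit integers at an assumed 8 kHz rate (getting them off the Sony hardware is exactly the undocumented part); the threshold is an arbitrary illustrative value that would need tuning on real audio:

```python
def frame_energies(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(sum(s * s for s in frame) / frame_len)
    return energies


def find_speech(samples, frame_len=160, threshold=1e5):
    """Return (start_sample, end_sample) of the loud region, or None.

    frame_len=160 is 20 ms at the assumed 8 kHz sample rate.
    """
    energies = frame_energies(samples, frame_len)
    loud = [i for i, e in enumerate(energies) if e > threshold]
    if not loud:
        return None  # nothing above the threshold: silence
    return loud[0] * frame_len, (loud[-1] + 1) * frame_len


# Synthetic signal: silence, a burst standing in for speech, silence.
signal = [0] * 800 + [1000, -1000] * 400 + [0] * 800
print(find_speech(signal))  # (800, 1600)
```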

ayasin
06-12-2003, 02:28 PM
Originally posted by onyxworld
i for one think java--o-o languages in general--is great.

I don't disagree. I think that OO languages in general are very good (when you can spare the overhead). I love to use Java/.NET and C++ when possible. I think that Java needs to mature some, and I doubt it will ever replace C++ as some have contended, but I also don't see C++ replacing Java for many things. I think the main battle to be fought now will be between Java and .NET. C# has already fired the first volley by announcing support for templates and generic programming (not as complete as C++, but much simpler to use and understand; C++ templates and partial specialization are pretty tricky for novices). Java has announced something too, but it's not in the VM: it's handled entirely by the compiler (type erasure), while the C# version is actually specialized in the VM. I'm interested to see what happens next, but this discussion should probably move to the developer forum since no one here cares :).

Ezikial Anta
06-12-2003, 03:59 PM
Haha Sumone said Shizznittle LOL! Shizzle Shizzzitzzzle I am ghetto for rizzzle!!!

lal2707
06-12-2003, 04:18 PM
Have the Sony developers heard of backward compatibility? If they released the APIs, they could always add to them later while leaving current versions working.

ayasin
06-12-2003, 06:24 PM
Originally posted by lal2707
Have the Sony developers heard of backward compatibility? If they released the APIs, they could always add to them later while leaving current versions working.

That's not always possible when the underlying hardware is changing substantially.

dmale7
06-12-2003, 10:58 PM
Thanks, Ayasin. Sony! What a company. How can you suck and be cool at the same time? Maybe that's in Sony's mission statement.

rquinlan19
06-12-2003, 11:42 PM
I don't know a whole lot about programming, let me just set that straight. Could someone explain to me why the APIs are so important? What makes it so difficult to "hack the API"? It seems to me that you could monitor how another application does things and just mimic it. What part of the puzzle is missing? Thanks

ayasin
06-13-2003, 12:40 AM
Originally posted by rquinlan19
I don't know a whole lot about programming, let me just set that straight. Could someone explain to me why the APIs are so important? What makes it so difficult to "hack the API"? It seems to me that you could monitor how another application does things and just mimic it. What part of the puzzle is missing? Thanks

API stands for application programming interface. It's how other people interface with your library or application, and it's important because without it, people can't interface with your library/application. The API is generally distributed as a header file that you use when you compile, so that the compiler knows how to generate the assembly to call the functions you're interested in calling.

For example, you may be interested in calling a function called "bool SetClieScreenBrightness(int redSaturation, int blueSaturation, int greenSaturation)". In order to do this, you or your compiler have to know (1) how many parameters there are and in what order they should be put on the stack, (2) what the function will put on the stack (i.e. what it returns), and (3) who is responsible for cleaning up the stack. There's a lot more to it than that, but I don't want to write a compiler book here :) this is just an example.

To "hack" an API you need to figure out (1) what function is being called (i.e. what the purpose of that function is), (2) what the parameters are (you can do this by understanding the starting and ending state of the stack, assuming you know what's being put on there), and (3) what's coming out (and potentially what that value means). This is a pretty daunting task even on a well-supported system like Windows. You can use a debugger or a tool like SoftICE to do this if you have an excellent understanding of assembly.

On Palm devices it's much more difficult. For one, there isn't support for on-board debuggers that let you attach to arbitrary processes. Therefore you're pretty much limited to (1) hacking with applications and trying to find out stuff that way, or (2) finding out what the library is and using an on-board Forth-style language/interpreter (like Quartus Forth) to call the functions in order and see what happens (plan on lots of soft and some hard resets).
If you're not asleep :p and want more info, let me know, but that should give you a general answer to your questions.
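The stack bookkeeping described above can be simulated in a few lines to show why guessing the calling convention wrong is so dangerous: nothing crashes, the values just land in the wrong parameters. This is a toy model, not real machine code: the stack is a Python list, `SetClieScreenBrightness` is the hypothetical function from the post, and the right-to-left, caller-cleans convention (similar to C's cdecl on x86) is an assumption chosen for illustration, not the actual Palm or Sony convention:

```python
class Stack:
    """A toy call stack: a Python list whose end is the top."""

    def __init__(self):
        self.items = []

    def push(self, value):
        self.items.append(value)

    def peek(self, depth):
        """Read the value `depth` slots below the top (0 = top)."""
        return self.items[-1 - depth]

    def drop(self, n):
        """Remove the top n values (the 'stack cleanup')."""
        del self.items[len(self.items) - n:]


def set_clie_screen_brightness(stack):
    """Hypothetical library side of SetClieScreenBrightness.

    It assumes arguments were pushed right to left, so the first
    parameter (redSaturation) is on top of the stack.
    """
    return {"red": stack.peek(0), "blue": stack.peek(1), "green": stack.peek(2)}


def call_brightness(stack, args, push_right_to_left):
    """Caller side: push the args, call, then clean up (caller-cleans)."""
    ordered = list(reversed(args)) if push_right_to_left else list(args)
    for value in ordered:
        stack.push(value)
    result = set_clie_screen_brightness(stack)
    stack.drop(len(args))  # the caller removes its own arguments
    return result


s = Stack()
print(call_brightness(s, [10, 20, 30], push_right_to_left=True))
# {'red': 10, 'blue': 20, 'green': 30}  (caller matches the convention)
print(call_brightness(s, [10, 20, 30], push_right_to_left=False))
# {'red': 30, 'blue': 20, 'green': 10}  (wrong guess: values silently swapped)
print(s.items)  # [] because the caller cleaned up after both calls
```

A header file is what removes the guesswork: it pins down the parameter order and types so the compiler generates the first call, never the second.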

dmale7
06-13-2003, 12:50 AM
so basically, without the API's, trial and error is the only way to get anything done, right?

rquinlan19
06-13-2003, 12:51 AM
Thanks for the info. That makes sense, but what I'm also missing is why there isn't some built-in mechanism for "self-discovery". It seems to me you should be able to universally query an API and have it dump out what it's capable of doing. I'm a network nut, so the way I see it, I can log on to a router console with an IOS version I know nothing about, but I can still type "?", list my available commands for each part of the software, and stumble my way through. What prevents you from seeing a list of functions and their associated parameters? Thanks

ayasin
06-13-2003, 12:54 AM
Originally posted by dmale7
so basically, without the API's, trial and error is the only way to get anything done, right?

Pretty much... you can reduce the trial and error a bit by disassembling one of the programs that uses the API and studying it carefully, but it's still trial and error. Also, disassembling programs is almost always a violation of the license agreement, so if you're a company or plan to use the software you write in a commercial setting, it's a big no-no.

ayasin
06-13-2003, 01:03 AM
Originally posted by rquinlan19
What prevents you from seeing a list of functions and their associated parameters? Thanks

Java and .NET do this (it's called reflection in both), and people immediately came out with obfuscators to prevent it, because companies don't want you using code that they don't publish for that purpose. An obfuscator basically makes a reflection tool useless. Take for example:

DoSomething(int whatToDo, int whatNotToDo, bool ExplodeInYourHand)

would be converted to:

a(int a, int aa, bool aaa).

C/C++ doesn't have a mechanism for doing this... the investigative features of Java and .NET actually cost a lot in terms of binary size and require some compiler overhead. In addition, C/C++ are much older languages and predate many of these concepts. There are solutions that let you do what you're talking about in C/C++: COM and CORBA. These are very much like the router example you gave, but they are difficult to program and require LOTS of overhead.
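Python has the same kind of built-in self-description (its `inspect` module), which makes it easy to see both halves of the point above: reflection recovers a function's name and parameters, but after renaming, the recovered names carry no meaning. A minimal sketch using the `DoSomething` example; the obfuscated version `a` is written by hand here to mimic what an obfuscator would emit:

```python
import inspect


def DoSomething(whatToDo: int, whatNotToDo: int, ExplodeInYourHand: bool):
    """Published-looking function: the names document themselves."""
    return None


def a(a: int, aa: int, aaa: bool):
    """The same function after an obfuscator has renamed everything."""
    return None


def describe(fn):
    """List a function's name and parameters, as a reflection tool would."""
    params = ", ".join(
        f"{p.name}: {p.annotation.__name__}"
        for p in inspect.signature(fn).parameters.values()
    )
    return f"{fn.__name__}({params})"


print(describe(DoSomething))
# DoSomething(whatToDo: int, whatNotToDo: int, ExplodeInYourHand: bool)
print(describe(a))
# a(a: int, aa: int, aaa: bool)
```

Both calls "work" equally well as discovery; only the first tells you anything useful, which is exactly what the obfuscator is for.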

rquinlan19
06-13-2003, 01:15 AM
ahhh that makes much more sense... i'm sitting here thinking how simple my idea is, and that someone would have already thought of this and done it unless it wasn't possible. how come apis never get leaked from companies? seems like there's gotta be a developer somewhere who's sympathetic to our cause. i'm not suggesting that anyone do this, but i'm seriously surprised that it hasn't happened; seems like everything else manages to get out somewhat ahead of time.

ayasin
06-13-2003, 01:30 AM
Originally posted by rquinlan19
ahhh that makes much more sense... i'm sitting here thinking how simple my idea is, and that someone would have already thought of this and done it unless it wasn't possible. how come apis never get leaked from companies? seems like there's gotta be a developer somewhere who's sympathetic to our cause. i'm not suggesting that anyone do this, but i'm seriously surprised that it hasn't happened; seems like everything else manages to get out somewhat ahead of time.

Probably because we're not worth losing your job over :). Seriously it would be pretty easy to make a leaked API worthless...on palm all they have to do is change the offsets from sysLibTrapCustom for a bunch of the functions; your leaked header is useless and the only thing they have to do to make their stuff work again is recompile. If they really want to mess with you they can change the order of parameters and such as well...but this will require some work on their part (although most of it could be done in macros).
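The leaked-header point and the earlier backward-compatibility question both come down to how a shared library is called: through a numbered dispatch table, so a header is really just a list of name-to-offset pairs. A toy model, where `TRAP_CUSTOM` is a made-up stand-in for `sysLibTrapCustom` (its real value is not given here) and the `mic_*` functions are hypothetical: reordering the table silently breaks an old header, while only appending new entries keeps old offsets valid:

```python
TRAP_CUSTOM = 6  # hypothetical first custom offset (stand-in for sysLibTrapCustom)


def build_library(functions):
    """Dispatch table for one build: offset -> function, in build order."""
    return {TRAP_CUSTOM + i: fn for i, fn in enumerate(functions)}


def header_for(functions):
    """The matching 'header file': function name -> offset for that build."""
    return {fn.__name__: TRAP_CUSTOM + i for i, fn in enumerate(functions)}


def mic_open():
    return "mic opened"


def mic_read():
    return "samples"


def mic_close():
    return "mic closed"


def mic_gain():
    return "gain set"


v1_order = [mic_open, mic_read, mic_close]
leaked_header = header_for(v1_order)  # header that leaked from version 1

# Version 2 reorders the entries and recompiles: old offsets now point elsewhere.
v2_reordered = build_library([mic_close, mic_open, mic_read])
print(v2_reordered[leaked_header["mic_open"]]())  # mic closed (wrong function!)

# A backward-compatible version 2 only appends, so old offsets still work.
v2_appended = build_library(v1_order + [mic_gain])
print(v2_appended[leaked_header["mic_open"]]())   # mic opened
```

This also shows why a vendor that publishes an API gives something up: once outsiders depend on the offsets, the vendor is stuck with the append-only discipline.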

slippy4twenty
06-13-2003, 10:03 AM
Does Eruware interface with the API? I'm slightly confused: simply because the API isn't public, are they not supposed to be used?