
Thread: Machine learning TD Bot

  1. #1
    SF Pleb
    jak8222 is offline

    Posts
    141

    Machine learning TD Bot

    So I've rewritten this to help people understand how it will work.

    The bot will follow the same architecture as AlphaGo, the program that plays Go. The idea is that you have two neural nets (a policy net and a value net) and use Monte Carlo tree search to look ahead. I understand this isn't the most efficient way to play TD (rigger has had amazing success without doing it this way).
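
    To make the "two nets plus Monte Carlo search" idea a bit more concrete, here is a minimal sketch of the move-selection step inside the tree search. It uses a PUCT-style rule along the lines of what AlphaGo is described as using: the policy net supplies a prior for each move, and the value net's results are averaged into a per-move value. The function names, the `c_puct` constant and the `+ 1` under the square root are assumptions for illustration, not taken from the linked repo.

```python
import math


def puct_scores(priors, visit_counts, value_sums, c_puct=1.5):
    """Score each move as average value (Q) plus an exploration bonus (U).

    priors       -- policy-net probability for each move
    visit_counts -- how often the search has tried each move so far
    value_sums   -- sum of value-net results backed up through each move
    c_puct       -- exploration constant (assumed value, tune to taste)
    """
    total_visits = sum(visit_counts)
    scores = []
    for prior, visits, value_sum in zip(priors, visit_counts, value_sums):
        # Q: mean backed-up value, 0 for moves that were never visited
        q = value_sum / visits if visits > 0 else 0.0
        # U: large for high-prior, rarely visited moves; the "+ 1" keeps the
        # bonus non-zero before any visits (a common variant, assumed here)
        u = c_puct * prior * math.sqrt(total_visits + 1) / (1 + visits)
        scores.append(q + u)
    return scores


def select_move(priors, visit_counts, value_sums, c_puct=1.5):
    """Return the index of the move with the highest PUCT score."""
    scores = puct_scores(priors, visit_counts, value_sums, c_puct)
    return max(range(len(scores)), key=scores.__getitem__)
```

    With no visits yet, the move with the highest prior gets picked. Once one move has been visited a lot, the bonus shifts the search towards the moves that have not been tried.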

    I am using this as a way to learn how to create and train deep neural nets, and for a bit of fun.

    Currently I have the first net built, and it works. However, because of the number of games you have to play to train these nets, playing against it manually is not an option. To get around this I was originally getting it to learn vs a random player and then play vs old versions of itself. This led to a problem: the net learnt that simply always going left beat a random player every time, and it got stuck in a rut. To combat this I am going to write a simple negamax bot that searches to depth 7 for it to play against.
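
    For reference, a depth-limited negamax opponent can be sketched roughly like this. I don't want to guess at the TD rules here, so the example uses a toy Nim game (take 1 to 3 stones, whoever takes the last stone wins) as a stand-in. The `Nim` class and its method names are hypothetical; a TD version would need its own move generation and evaluation.

```python
import math


class Nim:
    """Toy stand-in game: take 1-3 stones, taking the last stone wins."""

    def legal_moves(self, stones):
        return [n for n in (1, 2, 3) if n <= stones]

    def apply(self, stones, move):
        return stones - move

    def is_terminal(self, stones):
        return stones == 0

    def evaluate(self, stones):
        # Score from the point of view of the player to move. With no stones
        # left, the previous player took the last one, so the mover has lost.
        return -1 if stones == 0 else 0


def negamax(state, depth, game):
    """Return (score, best_move) for the player to move, searching `depth` plies."""
    if depth == 0 or game.is_terminal(state):
        return game.evaluate(state), None
    best_score, best_move = -math.inf, None
    for move in game.legal_moves(state):
        # The opponent's best score, negated, is our score for this move
        child_score, _ = negamax(game.apply(state, move), depth - 1, game)
        score = -child_score
        if score > best_score:
            best_score, best_move = score, move
    return best_score, best_move
```

    Calling `negamax(5, 7, Nim())` searches 7 plies deep, which matches the depth mentioned above and is more than enough to solve a 5-stone game.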

    For anyone who is interested, a link to the code is here:

    https://github.com/JackThomson2/TD-TensorFlow

    Pull requests and suggestions are appreciated.
    Last edited by jak8222; 07-28-2017 at 08:56 AM.
    imsugar likes this.

  2. #2
    Worlds Smallest ePeen
    HouseStepFan is offline

    Posts
    524
    i see you've googled some computer science words, good progress!
    Pikachu likes this.

  3. #3
    Sobo's #1 bae
    Pikachu is offline

    Posts
    475
    Quote Originally Posted by HouseStepFan View Post
    i see you've googled some computer science words, good progress!
    But does it photosynthesize and read quantumly?
    HouseStepFan likes this.

  4. #4
    SF Pleb
    jak8222 is offline

    Posts
    141
    Quote Originally Posted by HouseStepFan View Post
    i see you've googled some computer science words, good progress!
    I'm doing a master's in software engineering o.0

    But whatever, I guess, and yeah, it photosynthesizes
    Last edited by jak8222; 07-28-2017 at 06:41 AM.

  5. #5
    pas
    Member
    pas is offline

    Posts
    63
    As you mentioned, this would never be able to compete with a DFS approach with alpha-beta pruning in the game of TD.
    There are only 5 or so possible moves per turn (which is extremely few), and hence the DFS approach will trump it.
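
    As a rough illustration of why a small branching factor suits depth-first search, the sketch below runs the same search with and without alpha-beta pruning on a synthetic tree with 5 moves per turn and counts the leaf positions evaluated. The tree and the `evaluate` scores are made up for the demonstration and have nothing to do with the actual TD rules.

```python
import math

BRANCHING = 5  # "5 or so" moves per turn, as estimated above


def children(node):
    """A node is the tuple of move indices played so far."""
    return [node + (m,) for m in range(BRANCHING)]


def evaluate(node):
    # Made-up deterministic leaf score in [-10, 10]; a real bot would
    # score the board from the point of view of the player to move.
    return sum((i + 1) * (m + 1) for i, m in enumerate(node)) * 37 % 21 - 10


def search(node, depth, alpha, beta, prune, stats):
    """Negamax DFS; when `prune` is true, cut branches that cannot matter."""
    if depth == 0:
        stats["leaves"] += 1
        return evaluate(node)
    best = -math.inf
    for child in children(node):
        score = -search(child, depth - 1, -beta, -alpha, prune, stats)
        best = max(best, score)
        alpha = max(alpha, score)
        if prune and alpha >= beta:
            break  # the opponent already has a better option elsewhere
    return best


def run(depth, prune):
    """Return (root value, number of leaves evaluated)."""
    stats = {"leaves": 0}
    value = search((), depth, -math.inf, math.inf, prune, stats)
    return value, stats["leaves"]


if __name__ == "__main__":
    for prune in (False, True):
        value, leaves = run(7, prune)
        print(f"prune={prune}: value={value}, leaves evaluated={leaves}")
```

    Without pruning, a depth-7 search with 5 moves per turn evaluates 5**7 = 78125 leaves, which is already cheap. Pruning returns the same root value and usually evaluates far fewer.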

    There may be a chance that it can win in a game with holes though.
    But as long as it's fun

    Maybe duty navigation would be good with this approach since the game is so damn complicated (but maybe not because stardust dnav bot already exists). Either way the training would be extremely tedious.
    jak8222 likes this.

  6. #6
    SF Pleb
    jak8222 is offline

    Posts
    141
    Quote Originally Posted by pas View Post
    As you mentioned, this would never be able to compete with a DFS approach with alpha-beta pruning in the game of TD.
    There are only 5 or so possible moves per turn (which is extremely few), and hence the DFS approach will trump it.

    There may be a chance that it can win in a game with holes though.
    But as long as it's fun

    Maybe duty navigation would be good with this approach since the game is so damn complicated (but maybe not because stardust dnav bot already exists). Either way the training would be extremely tedious.
    We will see. I'd like it to get reasonable performance. There are a few avenues I can pursue with a trained value net.

  7. #7
    Senior Member
    erik is offline

    Posts
    341
    Hi!

    If you're willing to PM me your Skype or Discord, I can help you with training.

  8. #8
    SF Pleb
    jak8222 is offline

    Posts
    141
    Quote Originally Posted by erik View Post
    Hi!

    If you're willing to PM me your Skype or Discord, I can help you with training.
    Sure thing

  9. #9
    SF Pleb
    jak8222 is offline

    Posts
    141
    For anyone interested, this has gone through a complete recode and now mimics AlphaZero. The advantage is that external training data is no longer needed, since it generates its own through self-play. I'll need to wait till I get back to uni to train it properly on my GPU to see how it performs.
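
    The "generates its own training data" part can be sketched as a self-play loop like the one below. This is only an illustration: it uses toy Nim (take 1 to 3 stones, last stone wins) instead of TD, and a uniform random `uniform_policy` where an AlphaZero-style bot would use the move probabilities coming out of its net-guided tree search. All names here are hypothetical and not from the repo.

```python
import random


def uniform_policy(stones):
    """Placeholder policy: equal probability for every legal move."""
    moves = [n for n in (1, 2, 3) if n <= stones]
    return {move: 1.0 / len(moves) for move in moves}


def self_play_game(start_stones, policy, rng):
    """Play one game against itself and return training examples.

    Each example is (state, move_probabilities, outcome), where outcome is
    +1 if the player who was to move in that state went on to win, else -1.
    """
    if start_stones <= 0:
        raise ValueError("start_stones must be positive")
    history = []  # (state, move_probabilities, player_to_move)
    stones, player, last_mover = start_stones, 0, None
    while stones > 0:
        probs = policy(stones)
        moves = list(probs)
        move = rng.choices(moves, weights=[probs[m] for m in moves])[0]
        history.append((stones, probs, player))
        stones -= move
        last_mover = player
        player = 1 - player
    winner = last_mover  # whoever took the last stone wins
    return [(state, probs, 1 if mover == winner else -1)
            for state, probs, mover in history]


if __name__ == "__main__":
    examples = self_play_game(10, uniform_policy, random.Random(0))
    for state, probs, outcome in examples:
        print(state, probs, outcome)
```

    The resulting (state, move probabilities, outcome) triples are what the net would be trained on, with a fresh batch of games generated after each training round.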

  10. #10
    SF Pleb
    jak8222 is offline

    Posts
    141
    Did a quick test on my laptop: after one generation it learnt how the scoring works and the basics of dropping coins. It will need some proper training to see its real performance.
